This paper reviews recent developments in the theory of supervised learning within the framework of statistical mechanics. The main focus is on the properties of zero-temperature learning, which selects at random one of the parameter sets that minimize the training error. The main results concerning the shapes of the learning curves are summarized and discussed. Several outstanding issues are considered, including the evolution of the architecture of learning systems during training and the role of the input distribution. New results on learning in multi-layer networks, which illustrate these issues, are reported.