The effect of the structure of the input distribution on the complexity of learning a pattern classification task is investigated. Using statistical mechanics, we study the performance of a winner-take-all machine at learning to classify points generated by a mixture of $K$ Gaussian distributions (``clusters'') in $\mathbb{R}^N$ with intercluster distance $u$ (relative to the cluster width). In the separation limit $u \gg 1$, the number of examples required for learning scales as $N K u^{-p}$, where the exponent $p$ is 2 for zero-temperature Gibbs learning and 4 for the Hebb rule.
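As a minimal sketch of the setup described above, the snippet below samples points from a mixture of $K$ unit-width spherical Gaussian clusters in $\mathbb{R}^N$ whose centers lie at pairwise distance $u$, and evaluates the stated scaling $N K u^{-p}$. The placement of the centers along random orthonormal directions (requiring $N \ge K$) and the helper names are illustrative assumptions, not details from the paper.

```python
import numpy as np

def sample_mixture(K, N, u, n_points, seed=None):
    """Sample points from K unit-variance Gaussian clusters in R^N.

    Assumption for illustration: centers are placed along random
    orthonormal directions, scaled so every pair of centers is at
    distance u. Requires N >= K.
    """
    rng = np.random.default_rng(seed)
    # QR of a random N x K matrix gives K orthonormal direction vectors.
    dirs = np.linalg.qr(rng.standard_normal((N, K)))[0].T   # shape (K, N)
    # Scaling by u / sqrt(2) puts each pair of centers at distance u,
    # since ||e_i - e_j|| = sqrt(2) for orthonormal e_i, e_j.
    centers = (u / np.sqrt(2)) * dirs
    labels = rng.integers(0, K, size=n_points)
    points = centers[labels] + rng.standard_normal((n_points, N))
    return points, labels

def examples_required(N, K, u, p):
    """Scaling law from the abstract, valid in the separation limit u >> 1:
    p = 2 for zero-temperature Gibbs learning, p = 4 for the Hebb rule."""
    return N * K * u ** (-p)

# Example: the Hebb rule (p = 4) needs asymptotically fewer examples
# than Gibbs learning (p = 2) by a factor u**2 in this limit.
X, y = sample_mixture(K=3, N=50, u=10.0, n_points=1000, seed=0)
print(examples_required(N=50, K=3, u=10.0, p=2))   # Gibbs
print(examples_required(N=50, K=3, u=10.0, p=4))   # Hebb
```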