Paper of the month

Sompolinsky's Lab: Predicting the outputs of finite deep neural networks trained with noisy gradients

Gadi Naveh, Oded Ben David, Haim Sompolinsky, and Zohar Ringel

Phys. Rev. E 104, 064301 (2021)

Lay summary:

Deep neural networks (DNNs) have been advancing the state of the art in machine learning, yet a complete theory of how they work remains elusive. Recently, several exact results were obtained in the limit where the width of each layer of the DNN tends to infinity. This limit facilitated the derivation of an exact correspondence with Gaussian Processes (GPs), non-parametric Bayesian models that are well understood analytically. Thus, as the DNN gains more and more degrees of freedom through the increase in width, it actually becomes easier to understand.
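
To make the infinite-width intuition concrete, the short sketch below (illustrative only, not code from the paper; the input dimension, widths, weight scaling, and ReLU nonlinearity are all our own assumptions) samples randomly initialized one-hidden-layer networks and shows that, as the width grows, the output at a fixed input becomes increasingly Gaussian, which is the central-limit-theorem intuition behind the DNN-GP correspondence.

import numpy as np

# Illustrative sketch: as the hidden-layer width N grows, the output of a
# randomly initialized one-hidden-layer ReLU network at a fixed input
# approaches a Gaussian (excess kurtosis tends to zero).
rng = np.random.default_rng(0)
x = rng.standard_normal(10)          # a fixed input of dimension 10 (arbitrary choice)
d = x.shape[0]

for N in (5, 50, 500, 5000):         # increasing hidden-layer widths
    outputs = []
    for _ in range(2000):            # sample many independent random networks
        W = rng.standard_normal((N, d)) / np.sqrt(d)   # input-to-hidden weights
        a = rng.standard_normal(N) / np.sqrt(N)        # hidden-to-output weights
        outputs.append(a @ np.maximum(W @ x, 0.0))     # network output at x
    outputs = np.array(outputs)
    # Excess kurtosis is zero for a Gaussian; it shrinks as N grows.
    kurt = np.mean((outputs - outputs.mean())**4) / outputs.var()**2 - 3.0
    print(f"width {N:5d}: output std {outputs.std():.3f}, excess kurtosis {kurt:+.3f}")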
However, this correspondence to GPs overlooks several important aspects of finite-width DNNs, most notably their ability to learn features from the training data. The features in the infinite-width limit depend only on the DNN architecture and the distribution of training inputs, but not on the training labels, namely the function or rule being learned. In contrast, the representations of finite DNNs evolve during training in response to the structure of the data. In this work we develop an analytical framework for moving away from the GP limit and derive an expression for the leading correction term (in 1/width) to the GP result, thus taking a first step towards closing the gap between GPs and real DNNs.
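
Schematically (this shows only the generic form of such an expansion, not the paper's exact expression), the finite-width prediction is written as the GP result plus a correction suppressed by the width N:

\[
  f_{\mathrm{DNN}}(x) \;\approx\; f_{\mathrm{GP}}(x) \;+\; \frac{1}{N}\, f_{1}(x) \;+\; \mathcal{O}\!\left(\frac{1}{N^{2}}\right),
\]

where \(f_{\mathrm{GP}}\) is the infinite-width GP prediction and \(f_{1}\) stands for the leading finite-width correction derived in the paper.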
