Publications

On-line Gibbs learning. I. General Theory

We study a model of on-line learning, the on-line Gibbs algorithm (OLGA), which is particularly suited to supervised learning of realizable and unrealizable discrete-valued functions. The algorithm is based on an on-line energy function E that balances the need to minimize the instantaneous error against the need to minimize the change in the weights. The relative importance of these terms in E is set by a parameter λ, whose inverse plays a role similar to that of the learning rate in other on-line learning algorithms. In the stochastic version of the algorithm, after each presentation of a labeled example the new weights are drawn from a Gibbs distribution with the on-line energy E and a temperature parameter T. In the zero-temperature version of OLGA, the new weights at each step are those that minimize the on-line energy E. The generalization performance of OLGA is studied analytically in the limit of small learning rate, i.e., λ → ∞. It is shown that at finite temperature OLGA converges to an equilibrium Gibbs distribution of weight vectors whose energy function equals the generalization error function. In its deterministic version, OLGA converges to a local minimum of the generalization error. The rate of convergence is studied for the case in which the parameter λ increases as a power law in time. It is shown that for a generic unrealizable dichotomy, the generalization error asymptotically decreases in proportion to the inverse square root of the number of presented examples, a rate similar to that of batch learning. The special cases of learning realizable dichotomies and dichotomies with output noise are also treated.
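The zero-temperature update described above can be sketched concretely. The following is a minimal, hypothetical illustration (not the paper's implementation) for a simple perceptron with 0/1 error: the on-line energy is assumed to be E(w') = err(w'; x, y) + (λ/2)‖w' − w‖², so its minimizer is either the current weight vector (accept the error) or the smallest weight change that classifies the new example correctly, whichever has lower energy.

```python
import numpy as np

def olga_zero_temperature_step(w, x, y, lam):
    """One zero-temperature OLGA step for a perceptron (illustrative sketch).

    Assumed energy: E(w') = err(w'; x, y) + (lam / 2) * ||w' - w||^2,
    where err is the 0/1 loss of sign(w'.x) against the label y in {-1, +1}
    (the boundary w'.x = 0 is counted as correct here, by convention).
    """
    margin = y * np.dot(w, x)
    if margin >= 0:
        return w  # example already correct: E is minimized by not moving
    # Smallest squared weight change that makes the example correct
    # (projection of w onto the constraint y * w'.x >= 0):
    dist2 = np.dot(w, x) ** 2 / np.dot(x, x)
    if 0.5 * lam * dist2 < 1.0:
        # Correcting the example costs less energy than keeping the 0/1 error.
        return w - (np.dot(w, x) / np.dot(x, x)) * x
    # Large lam heavily penalizes weight changes: keep w and accept the error.
    return w
```

For small λ the step behaves like an aggressive error-correcting update, while for large λ the weights barely move, mirroring the role of 1/λ as an effective learning rate.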

Authors: Sompolinsky H.; Kim J.
Year of publication: 1998
Journal: Physical Review E, 58: 2335–2347 (1998).

Labs: Working memory