Stanford ICME Lecture on Why Deep Learning Works. Jan 2020

3,870 Views · AI Lover · Published on 06/02/23 · In How-to & Learning

Random Matrix Theory (RMT) is applied to analyze the weight matrices of
Deep Neural Networks (DNNs), including production-quality, pre-trained
models and smaller models trained from scratch. Empirical and theoretical
results indicate that the DNN training process itself implements a
form of self-regularization, evident in the empirical spectral density (ESD)
of DNN layer matrices. To understand this, we provide a phenomenology
to identify 5+1 Phases of Training, corresponding to increasing amounts of
implicit self-regularization. For smaller and/or older DNNs, this implicit self-regularization
is like traditional Tikhonov regularization, with a "size scale" separating signal from
noise. For state-of-the-art DNNs, however, we identify a novel form of
heavy-tailed self-regularization, similar to the self-organization seen
in the statistical physics of disordered systems.
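
As a rough illustration of the kind of analysis described above (not the lecture's own code), the sketch below computes the empirical spectral density (ESD) of a single layer weight matrix W, i.e. the eigenvalues of the correlation matrix X = W^T W / N. For a random Gaussian matrix the ESD follows the Marchenko-Pastur bulk predicted by RMT; the 5+1 phases correspond to increasing deviations from that bulk, ending in heavy-tailed ESDs. The matrix shapes and NumPy usage here are illustrative assumptions.

import numpy as np

def empirical_spectral_density(W):
    """Eigenvalues of the normalized correlation matrix X = W^T W / N."""
    N = W.shape[0]                 # number of rows of the layer matrix
    X = W.T @ W / N                # M x M correlation matrix
    return np.linalg.eigvalsh(X)   # real, non-negative eigenvalues

# A random (untrained-like) layer: its ESD should stay within the
# Marchenko-Pastur bulk; trained layers progressively deviate from it,
# from bleeding-out and bulk-plus-spikes up to heavy-tailed shapes.
rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 500))
evals = empirical_spectral_density(W)
print(f"eigenvalue range: [{evals.min():.3f}, {evals.max():.3f}]")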

To that end, building on the statistical mechanics of generalization,
and applying recent results from RMT, we derive a new VC-like
complexity metric that resembles the familiar product norms but is
suited to studying average-case generalization behavior in real systems.
We then demonstrate its effectiveness by testing how well this new
metric correlates with trends in the reported test accuracies of
over 450 pretrained DNNs covering a range of data sets and architectures.
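
The correlation study lends itself to a similar sketch. The snippet below, again a hedged illustration rather than the paper's implementation, fits a power-law tail exponent to each layer's ESD with a simple Hill-type estimator (a stand-in for the fuller power-law fit), averages the per-layer scores into a single per-model metric, and rank-correlates that metric with reported test accuracies. The weighting by the log of the largest eigenvalue, and the `models` collection in the usage comment, are assumptions for illustration.

import numpy as np
from scipy.stats import spearmanr

def tail_alpha(eigvals, k=50):
    """Hill-type estimate of the power-law exponent of the ESD's upper tail."""
    tail = np.sort(eigvals)[-k:]                      # k largest eigenvalues
    return 1.0 + k / np.sum(np.log(tail / tail[0]))   # smallest of the tail acts as x_min

def model_score(weight_matrices):
    """Average per-layer score: tail exponent weighted by log of the largest eigenvalue."""
    scores = []
    for W in weight_matrices:
        evals = np.linalg.eigvalsh(W.T @ W / W.shape[0])
        scores.append(tail_alpha(evals) * np.log10(evals.max()))
    return float(np.mean(scores))

# Hypothetical usage across a collection of pretrained models, where
# `models` pairs each model's layer weight matrices with its reported accuracy:
# scores, accs = zip(*[(model_score(ws), acc) for ws, acc in models])
# rho, p_value = spearmanr(scores, accs)   # rank correlation of metric vs. accuracy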
