
Bias-Variance Trade-off

  • In statistical inference, models with too many parameters tend to overfit the data, memorizing it instead of generalizing.
  • Deep neural networks, despite having massive numbers of parameters, somehow still generalize well.

Double Descent Phenomenon

  • As model complexity (number of parameters) increases, test error initially decreases.
  • As complexity continues to increase, test error rises to a peak (typically near the point where the model can just fit the training data exactly) and then surprisingly decreases again; see the sketch after this list.
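
One standard setting where this curve can be reproduced numerically (not part of the original notes) is ridgeless regression on random ReLU features: a one-hidden-layer network whose hidden weights are frozen at random values, with only the linear output layer fitted. Everything below (target function, noise level, widths) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(3 * x)  # assumed ground-truth function

# Small 1-D regression task with noisy labels.
n_train = 30
x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + 0.1 * rng.standard_normal(n_train)
x_test = np.linspace(-1, 1, 500)
y_test = target(x_test)

def relu_features(x, W, b):
    # One hidden layer with frozen random weights; "width" plays the
    # role of model complexity because only the output layer is fitted.
    return np.maximum(0.0, np.outer(x, W) + b)

print(f"{'width':>6} {'test MSE':>10}")
for width in (5, 10, 20, 25, 30, 35, 40, 60, 100, 300):
    W = rng.standard_normal(width)
    b = rng.uniform(-1, 1, width)
    Phi_train = relu_features(x_train, W, b)
    Phi_test = relu_features(x_test, W, b)
    # Ridgeless (minimum-norm) least-squares fit of the output weights.
    coef, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    mse = np.mean((Phi_test @ coef - y_test) ** 2)
    print(f"{width:>6} {mse:>10.4f}")
```

The exact numbers depend on the random seed, but the test error typically falls, spikes near width 30 (the interpolation threshold, where the width equals the number of training points), and then falls again for much larger widths.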

Possible Explanations

  • Implicit regularization in neural networks: Neural networks might have inherent mechanisms that prevent overfitting even with numerous parameters.
  • Dynamics of gradient descent: The optimization process of gradient descent could lead to solutions that generalize better (see the sketch after this list).
  • Stochastic gradient descent: The noise introduced by stochastic gradient descent could act as a form of regularization.
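
One concrete instance of the gradient-descent explanations above can be checked directly: on an overparameterized linear least-squares problem, plain gradient descent initialized at zero converges to the minimum-L2-norm weight vector among all those that fit the training data exactly, so the optimizer itself supplies a form of regularization. The problem sizes and learning rate in this sketch are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized linear regression: more parameters than data points,
# so infinitely many weight vectors fit the training data exactly.
n_samples, n_features = 20, 100
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)

# Plain gradient descent on the squared loss, starting from w = 0.
w = np.zeros(n_features)
lr = 0.01
for _ in range(50_000):
    grad = X.T @ (X @ w - y) / n_samples
    w -= lr * grad

# The minimum-L2-norm interpolating solution, via the pseudoinverse.
w_min_norm = np.linalg.pinv(X) @ y

print("training residual        :", np.linalg.norm(X @ w - y))
print("distance to min-norm sol.:", np.linalg.norm(w - w_min_norm))
```

Both printed numbers should be close to zero: gradient descent never leaves the row space of X when started at zero, so among all interpolating solutions it ends up at the smallest one.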

Polynomial Example

  • Overfitting at the lower degree: When fitting a degree-10 polynomial to a small dataset of 11 points, the fit has exactly as many coefficients as data points and passes through every one of them, resulting in a highly irregular curve that does not generalize well to new data.
  • Regularization and higher-degree polynomials: However, if we increase the degree of the polynomial (e.g., to 20 or 30) and apply regularization to its coefficients, the test error can actually decrease. The polynomial still passes through all the data points, but it becomes smoother and less erratic, leading to better generalization (see the sketch after this list).
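
A minimal numpy sketch of this comparison, assuming a sine target, 11 equally spaced noisy points, and a Legendre polynomial basis (chosen for numerical conditioning; the original example does not specify one). "Regularization" is read here as picking the minimum-L2-norm coefficients among all interpolating fits, which is what np.linalg.lstsq returns once the degree exceeds 10 and the system becomes underdetermined.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(2 * np.pi * x)  # assumed ground-truth function

# 11 noisy training points, as in the example above, plus a dense test grid.
x_train = np.linspace(-1, 1, 11)
y_train = target(x_train) + 0.1 * rng.standard_normal(x_train.size)
x_test = np.linspace(-1, 1, 400)
y_test = target(x_test)

def test_mse(degree):
    # Legendre features keep the design matrix well conditioned.
    Phi_train = np.polynomial.legendre.legvander(x_train, degree)
    Phi_test = np.polynomial.legendre.legvander(x_test, degree)
    # Degree 10 gives an exact interpolation (11 coefficients, 11 points).
    # Degrees 20 and 30 are underdetermined, and lstsq returns the
    # minimum-L2-norm interpolant, i.e. the "regularized" coefficients.
    coef, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    return np.mean((Phi_test @ coef - y_test) ** 2)

for degree in (10, 20, 30):
    print(f"degree {degree:2d}: test MSE = {test_mse(degree):.4f}")
```

All three fits pass through every training point; the higher-degree fits simply do so with smaller coefficient vectors, which is the sense in which they tend to be smoother and to generalize better in this setup.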

Thoughts

There is some discussion at https://x.com/paraschopra/status/1788542691455664235. The polynomial example does not seem like a good one: regularization works well for both degree 10 and degree 20, so it emphasizes the importance of regularization but does not really illustrate the double descent phenomenon.