update summaries
MoHawastaken committed Nov 12, 2024
1 parent b7df8e8 commit 958df1f
Showing 3 changed files with 8 additions and 4 deletions.
4 changes: 3 additions & 1 deletion content/publication/cn_pitfalls/index.md
@@ -27,4 +27,6 @@ url_video: ''

---

In this paper, we explore in which ways climate networks are distorted due to false and missing edges. We find several mechanisms through which finite-sample noise can systematically distort climate networks. We also find that common resampling procedures to quantify significant behaviour in climate networks do not adequately capture intrinsic network variance. While we propose a new resampling framework, the question of how to reliably quantify intrinsic network variance from complex climatic time series remains a matter of ongoing work.
In this paper, we explore how estimation errors distort climate networks with false and missing edges. We find several mechanisms through which finite-sample noise can systematically distort climate networks. Most notably, spatially heterogeneous estimation variance (caused, for example, by heterogeneous autocorrelation patterns across locations) introduces a large bias toward overestimating the importance of particularly noisy locations. But even on isotropic data, many graph measures such as betweenness or clustering coefficients can be heavily distorted.

We also find that common resampling procedures to quantify significant behaviour in climate networks do not adequately capture intrinsic network variance. While we propose a new resampling framework, the question of how to reliably quantify intrinsic network variance from complex climatic time series remains a matter of ongoing work.
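The heterogeneous-variance mechanism is easy to reproduce in a small numpy sketch (a toy illustration with made-up parameters, not the analysis from the paper): for independent AR(1) series, every edge of a thresholded correlation network is spurious, yet the strongly autocorrelated nodes end up with far higher degree.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_steps, threshold = 30, 200, 0.25  # illustrative values

# Independent AR(1) series: half weakly, half strongly autocorrelated.
phis = np.r_[np.full(15, 0.1), np.full(15, 0.9)]
x = np.zeros((n_nodes, n_steps))
for t in range(1, n_steps):
    x[:, t] = phis * x[:, t - 1] + rng.standard_normal(n_nodes)

# The true cross-correlations are all zero, so every edge of the
# thresholded correlation network is a false positive from sampling noise.
corr = np.corrcoef(x)
adj = (np.abs(corr) > threshold) & ~np.eye(n_nodes, dtype=bool)
degree = adj.sum(axis=1)

# Strongly autocorrelated nodes have noisier correlation estimates and
# therefore collect far more spurious edges.
print("mean degree, weak-autocorrelation nodes  :", degree[:15].mean())
print("mean degree, strong-autocorrelation nodes:", degree[15:].mean())
```

Fewer effective samples per series (due to persistence) inflate the variance of each correlation estimate, so the noisy locations cross the threshold more often and look spuriously important.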
4 changes: 2 additions & 2 deletions content/publication/mindthespikes/index.md
@@ -30,6 +30,6 @@ url_video: ''

When can kernel or neural network models that overfit noisy data generalize nearly optimally?

Previous literature had suggested that kernel methods can only exhibit such 'benign overfitting', also called 'harmless interpolation', if the input dimension grows with the number of data points. We show that, while overfitting with common estimators leads to inconsistency, adequately designed spiky-smooth estimators can achieve benign overfitting in arbitrary fixed dimension. For neural networks with NTK parametrization, it suffices to add tiny fluctuations to the activation function. Remarkably, adding a Gaussian kernel with small bandwidth to the NTK approximately translates into adding a high-frequency, low-amplitude shifted sine curve to the activation function.
Previous literature had suggested that kernel methods can only exhibit such 'benign overfitting', also called 'harmless interpolation', if the input dimension grows with the number of data points. We show that, while overfitting with common estimators leads to inconsistency, adequately designed spiky-smooth estimators can achieve benign overfitting in arbitrary fixed dimension. For neural networks with NTK parametrization, it suffices to add tiny fluctuations to the activation function. Remarkably, adding a Gaussian kernel with small bandwidth to the NTK or NNGP kernel approximately translates into adding a high-frequency, low-amplitude shifted sine curve to the activation function.

It remains to study whether a similar adaptation of the activation function or some other inductive bias towards spiky-smooth functions can also lead to benign overfitting with feature-learning neural architectures on complex datasets.
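A minimal numpy sketch of the spiky-smooth idea (illustrative only, not the paper's construction; bandwidths, amplitudes, and the 1-D setup are made up): a wide Gaussian "smooth" component plus a tiny-bandwidth Gaussian "spike" component lets the minimum-norm interpolant fit noisy labels exactly while staying close to the clean signal between training points.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D toy data: a smooth target plus label noise (all values illustrative).
n = 20
x_train = np.linspace(0.0, 1.0, n)
y_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.standard_normal(n)

def gauss(a, b, bw):
    """Gaussian kernel matrix between point sets a and b."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * bw**2))

def spiky_smooth(a, b, bw_smooth=0.2, bw_spike=1e-3, eps=1.0):
    """Wide 'smooth' component plus a tiny-bandwidth 'spike' component."""
    return gauss(a, b, bw_smooth) + eps * gauss(a, b, bw_spike)

# Minimum-norm interpolant in the spiky-smooth RKHS.
K = spiky_smooth(x_train, x_train)
alpha = np.linalg.solve(K, y_train)

# Evaluate between training points, where the spikes have already decayed.
x_test = x_train + 0.5 / n
pred = spiky_smooth(x_test, x_train) @ alpha

train_pred = K @ alpha  # reproduces the noisy labels exactly
test_rmse = np.sqrt(np.mean((pred - np.sin(2 * np.pi * x_test)) ** 2))
print("max train error:", np.abs(train_pred - y_train).max())
print("test RMSE vs clean signal:", test_rmse)
```

The spikes absorb the label noise in vanishingly small neighborhoods of the training points, while the wide component governs predictions everywhere else; the neural-network counterpart would analogously add a low-amplitude, high-frequency shifted sine term to the activation function.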
4 changes: 3 additions & 1 deletion content/publication/sam-mupp/index.md
@@ -78,7 +78,9 @@ projects:
#slides: example
---

Naively scaling standard neural network architectures and optimization algorithms loses desirable properties such as feature learning in large models (see the Tensor Programs series by Greg Yang et al.). We show the same for sharpness-aware minimization (SAM) algorithms: there exists a unique nontrivial width-dependent and layerwise perturbation scaling for SAM that effectively perturbs all layers and yields width-independent dynamics. Crucial practical benefits include improved generalization and the joint transfer of the optimal learning rate and perturbation radius across model scales - even after multi-epoch training to convergence.
Naively scaling up standard neural network architectures and optimization algorithms loses desirable properties such as feature learning in large models (see the Tensor Programs series by [Greg Yang](https://scholar.google.com/citations?user=Xz4RAJkAAAAJ&hl=en) et al.). We show the same for sharpness-aware minimization (SAM) algorithms: there exists a unique nontrivial width-dependent and layerwise perturbation scaling for SAM that effectively perturbs all layers and yields width-independent perturbation dynamics.

Crucial practical benefits of our parameterization $\mu P^2$ include improved generalization, training stability, and the joint transfer of the optimal learning rate and perturbation radius across model scales - even after multi-epoch training to convergence. This makes it possible to tune and study small models, and to train the large model only once with optimal hyperparameters.
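For intuition, the structure of a SAM update with layerwise perturbation scaling can be sketched as follows (a generic illustration: the `layer_scales` values are hypothetical placeholders, not the width-dependent factors derived in the paper):

```python
import numpy as np

def sam_step(params, grad_fn, lr=0.1, rho=0.05, layer_scales=None):
    """One SAM update with a per-layer perturbation scale.

    `layer_scales` stands in for width-dependent layerwise factors
    (hypothetical values here); classical SAM corresponds to all ones.
    """
    if layer_scales is None:
        layer_scales = [1.0] * len(params)
    grads = grad_fn(params)
    # Global gradient norm used to normalize the ascent direction.
    gnorm = np.sqrt(sum(float((g ** 2).sum()) for g in grads)) + 1e-12
    # Ascent step: perturb each layer along its gradient, rescaled per layer.
    perturbed = [p + rho * s * g / gnorm
                 for p, g, s in zip(params, grads, layer_scales)]
    # Descent step: apply the gradient taken at the perturbed parameters.
    grads_adv = grad_fn(perturbed)
    return [p - lr * g for p, g in zip(params, grads_adv)]

# Toy usage on the quadratic loss 0.5 * ||p||^2 (its gradient is p itself).
params = [np.array([1.0, -2.0]), np.array([3.0])]
grad = lambda ps: [p.copy() for p in ps]
for _ in range(50):
    params = sam_step(params, grad, layer_scales=[1.0, 0.5])
print([np.round(p, 4) for p in params])
```

The point of the layerwise scales is that a single global perturbation radius stops perturbing some layers effectively as width grows; scaling each layer's perturbation appropriately keeps all layers active at every width.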


<!--
