---
title: 'Learning Machine Learning with Lorenz-96'
tags:
  - Python
  - Machine Learning
  - Neural Networks
  - Dynamical systems
authors:
  - name: Dhruv Balwada
    orcid: 0000-0001-6632-0187
    affiliation: 1
  - name: Ryan Abernathey
    orcid: 0000-0001-5999-4917
    affiliation: 1
  - name: Shantanu Acharya
    orcid: 0000-0002-9652-2991
    affiliation: 2
  - name: Alistair Adcroft
    orcid: 0000-0001-9413-1017
    affiliation: 3
  - name: Judith Brener
    orcid: 0000-0003-2168-0431
    affiliation: 12
  - name: V Balaji
    orcid: 0000-0001-7561-5438
    affiliation: 15
  - name: Mohamed Aziz Bhouri
    orcid: 0000-0003-1140-7415
    affiliation: 4
  - name: Joan Bruna
    orcid: 0000-0002-2847-1512
    affiliation: "2, 13"
  - name: Mitch Bushuk
    orcid: 0000-0002-0063-1465
    affiliation: 9
  - name: Will Chapman
    orcid: 0000-0002-0472-7069
    affiliation: 12
  - name: Alex Connolly
    orcid: 0000-0002-2310-0480
    affiliation: 4
  - name: Julie Deshayes
    orcid: 0000-0002-1462-686X
    affiliation: 10
  - name: Carlos Fernandez-Granda
    orcid: 0000-0001-7039-8606
    affiliation: "2, 13"
  - name: Pierre Gentine
    orcid: 0000-0002-0845-8345
    affiliation: "4, 14"
  - name: Anastasiia Gorbunova
    orcid: 0000-0002-3271-2024
    affiliation: 6
  - name: Will Gregory
    orcid: 0000-0001-8176-1642
    affiliation: 3
  - name: Arthur Guillaumin
    orcid: 0000-0003-1571-4228
    affiliation: 5
  - name: Shubham Gupta
    orcid: 0009-0002-6966-588X
    affiliation: 8
  - name: Marika Holland
    orcid: 0000-0001-5621-8939
    affiliation: 12
  - name: J Emmanuel Johnsson
    orcid: 0000-0002-6739-0053
    affiliation: 6
  - name: Julien Le Sommer
    orcid: 0000-0002-6882-2938
    affiliation: 6
  - name: Ziwei Li
    affiliation: 2
  - name: Nora Loose
    orcid: 0000-0002-3684-9634
    affiliation: 3
  - name: Feiyu Lu
    orcid: 0000-0001-6532-0740
    affiliation: 9
  - name: Paul O'Gorman
    orcid: 0000-0001-6532-0740
    affiliation: 11
  - name: Pavel Perezhogin
    orcid: 0000-0003-2098-3457
    affiliation: 2
  - name: Brandon Reichl
    orcid: 0000-0001-9047-0767
    affiliation: 9
  - name: Andrew Ross
    orcid: 0000-0002-2368-6979
    affiliation: 2
  - name: Aakash Sane
    orcid: 0000-0002-9642-008X
    affiliation: 3
  - name: Sara Shamekh
    orcid: 0000-0001-7441-4116
    affiliation: "4, 2"
  - name: Tarun Verma
    orcid: 0000-0001-7730-1483
    affiliation: 3
  - name: Janni Yuval
    orcid: 0000-0001-7519-0118
    affiliation: 11
  - name: Lorenzo Zampieri
    orcid: 0000-0003-1703-4162
    affiliation: 7
  - name: Cheng Zhang
    orcid: 0000-0003-4278-9786
    affiliation: 3
  - name: Laure Zanna
    orcid: 0000-0002-8472-4828
    affiliation: 2
affiliations:
  - name: Lamont Doherty Earth Observatory, Columbia University
    index: 1
  - name: Courant Institute of Mathematical Sciences, New York University
    index: 2
  - name: Program in Atmospheric and Oceanic Sciences, Princeton University
    index: 3
  - name: Earth and Environmental Engineering, Columbia University
    index: 4
  - name: Queen Mary University of London
    index: 5
  - name: Univ. Grenoble Alpes, CNRS, IRD, Grenoble INP, INRAE, IGE, 38000 Grenoble, France
    index: 6
  - name: Ocean Modeling and Data Assimilation Division, Fondazione Centro Euro-Mediterraneo sui Cambiamenti Climatici - CMCC
    index: 7
  - name: Tandon School of Engineering, New York University
    index: 8
  - name: NOAA Geophysical Fluid Dynamics Laboratory
    index: 9
  - name: Sorbonne Universités, LOCEAN Laboratory, Paris, France
    index: 10
  - name: Department of Earth, Atmospheric, and Planetary Sciences, Massachusetts Institute of Technology
    index: 11
  - name: National Center for Atmospheric Research
    index: 12
  - name: Center for Data Science, New York University
    index: 13
  - name: Columbia Climate School, Columbia University
    index: 14
  - name: Schmidt Futures
    index: 15
date: 10 October 2023
bibliography: paper.bib
---

# Summary

Machine learning (ML) is a rapidly growing field that is starting to touch all aspects of our lives, and science is not immune to this. In fact, recent work in scientific ML, i.e., combining ML with conventional scientific problems, is leading to breakthroughs in notoriously hard problems that might have seemed out of reach just a few years ago. One such age-old problem is that of turbulence closures in fluid flows. This closure, or parameterization, problem is particularly relevant for environmental fluids, which span a large range of scales from the size of the planet down to millimeters, and it remains a major obstacle to improving weather forecasts and climate projections.

The climate system is composed of many interacting components (e.g., ocean, atmosphere, ice) and is described by complex nonlinear equations. To simulate, understand, and predict climate, these equations are solved numerically under a number of simplifications, which inevitably introduce errors. The errors result from the numerics used to solve the equations and from the lack of appropriate representations of processes occurring below the resolution of the climate model grid (i.e., sub-grid processes).

This book aims to conceptualize the problems associated with climate models within a simple and computationally accessible framework, and to show how some basic ML methods can be used to approach these problems. We introduce readers to climate modeling using a simple tool, the @Lorenz1995 (L96) two-timescale model. We discuss the numerical aspects of the L96 model, the approximate representation of sub-grid processes (known as parameterizations or closures), and simple data assimilation problems (a data-model fusion method). We then use the L96 results to demonstrate how to learn sub-grid parameterizations from data with ML, and test those parameterizations offline (a priori) and online (a posteriori), with a focus on the interpretability of the results. This book is written primarily for climate scientists and physicists who are looking for a gentle introduction to how they can incorporate ML into their work. However, it may also help ML scientists learn about the parameterization problem in a framework that is relatively simple to understand and use.
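To make the model concrete, here is a minimal NumPy sketch of the two-timescale L96 tendencies; the function name, vector layout, and default parameter values (`F`, `h`, `b`, `c`) are illustrative assumptions, not code taken from the book:

```python
import numpy as np

def l96_tendencies(X, Y, F=10.0, h=1.0, b=10.0, c=10.0):
    """Tendencies of the two-timescale Lorenz-96 model.

    X: slow variables, shape (K,).
    Y: fast variables, shape (J*K,), with the first J entries
       belonging to the sector of X[0], and so on.
    """
    K, J = X.size, Y.size // X.size
    # Slow tendency: advection, damping, forcing, and coupling to the
    # mean of the fast variables in each sector.
    dX = (np.roll(X, 1) * (np.roll(X, -1) - np.roll(X, 2))
          - X + F
          - (h * c / b) * Y.reshape(K, J).sum(axis=1))
    # Fast tendency: analogous advection and damping on the fine scale,
    # driven by the slow variable of the corresponding sector.
    dY = (-c * b * np.roll(Y, -1) * (np.roll(Y, -2) - np.roll(Y, 1))
          - c * Y
          + (h * c / b) * np.repeat(X, J))
    return dX, dY
```

A standard ODE integrator (e.g., fourth-order Runge-Kutta) can then advance `X` and `Y` together, and the sub-grid coupling term diagnosed from such runs provides the training target for the parameterization experiments described below.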

The material in this Jupyter book is presented over five sections. The first section, Lorenz 96 and General Circulation Models, describes the Lorenz-96 model and how it can serve as a simple analog to the much more complex general circulation models used for simulating ocean and atmosphere dynamics. This section also introduces the essence of the parameterization, or closure, problem. In the second section, Neural Networks with Lorenz-96, we introduce the basics of ML, show how fully connected neural networks can be used to approach the parameterization task, and discuss how these neural networks can be optimized and interpreted (a minimal training sketch follows this paragraph). No model, even a well-parameterized one, is perfect, and the way we keep computer models close to reality is by guiding them with the help of observational data. This task is referred to as data assimilation, and it is introduced in the third section, Data Assimilation with Lorenz-96. Here, we use the L96 model to quickly introduce concepts from data assimilation and show how ML can be used to learn data assimilation increments that help reduce model biases. While neural networks can be great function approximators, they are usually quite opaque, and it is hard to figure out exactly what they have learned. Equation discovery is a class of ML techniques that estimates the function in terms of an equation rather than as a set of weights for a neural network. This approach produces results that are far more interpretable and can potentially even help discover novel physics. These techniques are presented in the fourth section, Equation Discovery with Lorenz-96. Finally, we describe a few more ML approaches in section five, Other ML approaches for Lorenz-96, with the acknowledgment that there are many more techniques in the fast-growing ML and scientific ML literature and that we have no intention of providing a comprehensive summary of the field.
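As a flavor of the workflow in the second section, the following PyTorch sketch shows an offline (a priori) training loop for a fully connected sub-grid parameterization. The placeholder data tensors, network width, and hyperparameters are assumptions for illustration, not the book's actual code:

```python
import torch
import torch.nn as nn

K = 8  # number of slow L96 variables (illustrative choice)

# Placeholder tensors: in practice, X_data would hold snapshots of the
# slow variables from a coupled two-timescale L96 run, and U_data the
# sub-grid tendencies diagnosed from the fast variables at those times.
X_data = torch.randn(1024, K)
U_data = torch.randn(1024, K)

# A small fully connected network mapping slow state -> sub-grid tendency.
net = nn.Sequential(
    nn.Linear(K, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, K),
)

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(net(X_data), U_data)  # offline (a priori) objective
    loss.backward()
    optimizer.step()
```

Online (a posteriori) evaluation then couples the trained network back into the slow-variable equations and compares long simulations against the full two-timescale truth.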

The book was created by and as part of M2LInES, an international collaboration supported by Schmidt Futures, to improve climate models with scientific ML. The original goal of these notebooks was for our team to work together and learn from each other; in particular, to get up to speed on the key scientific aspects of our collaboration (parameterizations, ML, data assimilation, uncertainty quantification) and to develop new ideas. This was done through a series of tutorials, each led by a few team members and held roughly once every two weeks for about 6-7 months. This Jupyter book is a collection of the notebooks used during these tutorials, which have been only lightly edited for continuity and clarity. Ultimately, we are happy to share these resources with the scientific community to introduce our research ideas and foster the use of ML techniques for tackling climate science problems.

# Statement of Need

Parameterization of sub-grid processes is a major challenge in climate modeling. The details of this problem are often very context dependent [@christensen2022parametrization], but much can be learned by addressing the issue in a general and simpler sense. A general approach also allows non-domain experts, e.g., ML researchers, to engage and contribute more meaningfully. This Jupyter book aims to achieve this goal with the help of a simple dynamical system, the Lorenz-96 model, so that the reader is introduced to the basic concepts with minimal superfluous complexity. The concepts presented here can be extended to other dynamical systems, and even to more complex parameterization tasks (some examples can be found at https://m2lines.github.io/publications/), and we hope that researchers and learners aiming to do so find the material presented here a useful stepping stone.

As described above, these notebooks were originally created to introduce non-domain experts to the parameterization aspects of climate modeling and to how ML could potentially be used to address them. They have since been adapted to serve as a pedagogical tool for self-learning, as a reference manual, or for teaching modules in an introductory class on ML. The book is organized into sections that are relatively independent, with the exception that the first section provides a general overview of the parameterization problem in climate models. Each notebook covers material that can be discussed in roughly an hour-long lecture, and sections can be mixed and matched or reordered as needed depending on the overall learning objectives.

# Acknowledgements

This work is supported by the generosity of Eric and Wendy Schmidt by recommendation of Schmidt Futures, as part of its Virtual Earth System Research Institute (VESRI). MAB acknowledges support from the National Science Foundation's AGS-PRF Fellowship Award (AGS2218197).

# References