---
title: Learning Machine Learning with Lorenz-96
date: 10 October 2023
bibliography: paper.bib
---
Machine learning (ML) is a rapidly growing field that is starting to touch all aspects of our lives, and science is not immune to this. In fact, recent work in the field of scientific ML, i.e. combining ML with conventional scientific problems, is leading to breakthroughs in notoriously hard problems that might have seemed out of reach just a few years ago. One such age-old problem is that of turbulence closures in fluid flows. This closure or parameterization problem is particularly relevant for environmental fluids, which span a large range of scales from the size of the planet down to millimeters, and it remains a major obstacle to improving forecasts of weather and projections of climate.
The climate system is composed of many interacting components (e.g., ocean, atmosphere, ice) and is described by complex nonlinear equations. To simulate, understand, and predict climate, these equations are solved numerically under a number of simplifications, which inevitably introduces errors. These errors result from the numerics used to solve the equations and from the lack of appropriate representations of processes occurring below the resolution of the climate model grid (i.e., sub-grid processes).
This book aims to conceptualize the problems associated with climate models within a simple and computationally accessible framework, and to show how some basic ML methods can be used to approach these problems. We introduce the reader to climate modeling using a simple tool, the two-timescale model of @Lorenz1995 (L96). We discuss the numerical aspects of the L96 model, the approximate representation of sub-grid processes (known as parameterizations or closures), and simple data assimilation problems (a data-model fusion method). We then use the L96 results to demonstrate how to learn sub-grid parameterizations from data with ML, and test those parameterizations offline (a priori) and online (a posteriori), with a focus on the interpretability of the results. This book is written primarily for climate scientists and physicists who are looking for a gentle introduction to how they can incorporate ML into their work. However, it may also help ML scientists learn about the parameterization problem in a framework that is relatively simple to understand and use.
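For concreteness, the two-timescale L96 system can be written down in a few lines of code. The sketch below is a minimal NumPy implementation of its tendencies; the function name, parameter values (`F`, `h`, `b`, `c`), and array layout are illustrative choices and are not necessarily those used in the book's notebooks.

```python
import numpy as np

def l96_two_scale_tendencies(X, Y, F=20.0, h=1.0, b=10.0, c=10.0):
    """Tendencies of the two-timescale Lorenz-96 model.

    X : slow, large-scale variables, shape (K,)
    Y : fast, small-scale variables, shape (K * J,), with J per slow variable
    """
    K, J = X.size, Y.size // X.size
    # Slow variables: advection, damping, forcing, plus coupling to the
    # sum of the J fast variables sitting "under" each X_k.
    coupling = (h * c / b) * Y.reshape(K, J).sum(axis=1)
    dX = np.roll(X, 1) * (np.roll(X, -1) - np.roll(X, 2)) - X + F - coupling
    # Fast variables: the same structure on a faster timescale, forced by
    # the slow variable they belong to.
    dY = (-c * b * np.roll(Y, -1) * (np.roll(Y, -2) - np.roll(Y, 1))
          - c * Y + (h * c / b) * np.repeat(X, J))
    return dX, dY
```

In this analogy, the coupling term plays the role of the sub-grid tendency that a parameterization must approximate when only the slow variables X are resolved.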
The material in this Jupyter book is presented over five sections. The first section, Lorenz 96 and General Circulation Models, describes the Lorenz-96 model and how it can serve as a simple analog to the much more complex general circulation models used for simulating ocean and atmosphere dynamics. This section also introduces the essence of the parameterization or closure problem. In the second section, Neural Networks with Lorenz-96, we introduce the basics of ML, show how fully connected neural networks can be used to approach the parameterization task, and discuss how these neural networks can be optimized and interpreted. No model, even a well-parameterized one, is perfect, and the way we keep computer models close to reality is by guiding them with the help of observational data. This task is referred to as data assimilation, and is introduced in the third section, Data Assimilation with Lorenz-96. Here, we use the L96 model to quickly introduce the concepts from data assimilation, and show how ML can be used to learn data assimilation increments that help reduce model biases. While neural networks can be great function approximators, they are usually quite opaque, and it is hard to figure out exactly what they have learnt. Equation discovery is a class of ML techniques that estimates the unknown function as an explicit equation rather than as a set of neural network weights. This approach produces results that are far more interpretable, and can potentially even help discover novel physics. These techniques are presented in the fourth section, Equation Discovery with Lorenz-96. Finally, we describe a few more ML approaches in section five, Other ML approaches for Lorenz-96, with the acknowledgment that there are many more techniques in the fast-growing ML and scientific ML literature and that we have no intention of providing a comprehensive summary of the field.
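As an illustration of the offline (a priori) learning step mentioned above, the sketch below fits a small fully connected network to map the resolved state X to a diagnosed sub-grid tendency. The random tensors stand in for training data that would, in practice, be diagnosed from a two-timescale "truth" run; the network architecture, optimizer settings, and use of PyTorch are assumptions made for illustration, not the book's exact setup.

```python
import torch
import torch.nn as nn

K = 8                           # number of resolved (slow) variables
X_train = torch.randn(1024, K)  # placeholder for resolved states X_k
U_train = torch.randn(1024, K)  # placeholder for diagnosed sub-grid tendencies

# A small fully connected network mapping the resolved state to the
# sub-grid tendency it should predict.
net = nn.Sequential(
    nn.Linear(K, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, K),
)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(net(X_train), U_train)  # offline (a priori) fit
    loss.backward()
    optimizer.step()
```

Online (a posteriori) testing then couples the trained network back into the single-timescale model and evaluates the statistics of the resulting simulation.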
The book was created by and as part of M2LInES, an international collaboration supported by Schmidt Futures, to improve climate models with scientific ML. The original goal for the notebooks in this Jupyter book was for our team to work together and learn from each other; in particular, to get up to speed on the key scientific aspects of our collaboration (parameterizations, ML, data assimilation, uncertainty quantification) and to develop new ideas. This was done as a series of tutorials, each of which was led by a few team members and held roughly once every two weeks over a period of about 6-7 months. This Jupyter book is a collection of the notebooks used during these tutorials, which have been only lightly edited for continuity and clarity. Ultimately, we are happy to share these resources with the scientific community to introduce our research ideas and foster the use of ML techniques for tackling climate science problems.
Parameterization of sub-grid processes is a major challenge in climate modeling. The details of this problem are often very context dependent [@christensen2022parametrization], but much can be learned by addressing the issue in a general and simpler sense. A general approach also allows non-domain experts, e.g. ML researchers, to engage and contribute more meaningfully. This Jupyter book aims to achieve this with the help of a simple dynamical system, the Lorenz-96 model, so that the reader is introduced to the basic concepts with minimal superfluous complexity. The concepts presented here can be extended to other dynamical systems, and even to more complex parameterization tasks (some examples can be found at https://m2lines.github.io/publications/), and we hope that researchers and learners aiming to do so find them a useful stepping stone in this pursuit.
As described above, these notebooks were originally created to introduce non-domain experts to ideas from the parameterization aspects of climate modeling and to how ML could potentially be used to address them. They have since been adapted to serve as a pedagogical tool for self-learning, as a reference manual, or as material for teaching some modules of an introductory class on ML. The book is organized into sections that are relatively independent, with the exception of the first section, which provides a general overview of the parameterization problem in climate models. Each notebook covers material that can be discussed in roughly an hour-long lecture, and sections can be mixed and matched or reordered as needed depending on the overall learning objectives.
This work is supported by the generosity of Eric and Wendy Schmidt by recommendation of Schmidt Futures, as part of its Virtual Earth System Research Institute (VESRI). MAB acknowledges support from National Science Foundation's AGS-PRF Fellowship Award (AGS2218197).