Add experimental GCHP configuration.
JiaweiZhuang committed May 15, 2018
1 parent faee080 commit 1d888f3
Showing 5 changed files with 84 additions and 20 deletions.
27 changes: 13 additions & 14 deletions doc/source/chapter02_beginner-tutorial/quick-start.rst
@@ -21,7 +21,7 @@ After entering some basic information, you will be required to enter your credit

Now you should have an AWS account! It's time to run the model in the cloud. (You can skip Step 1 next time, of course.)

Step 2: Launch a server with GEOS-Chem pre-installed
----------------------------------------------------

Log in to the AWS console and click on EC2 (Elastic Compute Cloud), which is the most basic cloud computing service.
@@ -74,10 +74,10 @@ Select your instance, click on the "Connect" button near the blue "Launch Instan
.. figure:: img/connect_instruction.png
:width: 500 px

- On Mac or Linux, copy the ``ssh -i "xx.pem" root@xxx.com`` command under "Example".
Before using that command to ssh to your server, take care of a few small things:

(1) ``cd`` to the directory where you store your key pair (preferably ``$HOME/.ssh``).
(2) Use ``chmod 400 xx.pem`` to restrict the key pair's permissions (also mentioned in the above figure; you only need to do this the first time).
(3) Change the user name in that command from ``root`` to ``ubuntu``. (You'll be asked to use ``ubuntu`` if you keep ``root``.)
- On Windows, please refer to the guide for `MobaXterm <http://angus.readthedocs.io/en/2016/amazon/log-in-with-mobaxterm-win.html>`_ and `Putty <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html>`_ (Your life would probably be easier with MobaXterm).
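
The permission step above can also be checked from Python. This is only a minimal sketch, assuming a Unix-like system; the helper names ``key_is_private`` and ``lock_down_key`` are made up for illustration and are not part of any AWS tooling.

```python
import os
import stat


def key_is_private(path):
    """Return True if group and others have no access to the key file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any group/other permission bit means the key is too open for ssh.
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def lock_down_key(path):
    """Python equivalent of ``chmod 400 path``: owner read-only."""
    os.chmod(path, stat.S_IRUSR)
```

Calling ``lock_down_key("xx.pem")`` once should have the same effect as the ``chmod 400 xx.pem`` command in step (2).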
@@ -93,11 +93,11 @@ That's a system with GEOS-Chem already built!
**Troubleshooting**: if you have trouble connecting to the server with ``ssh``, please :doc:`make sure you haven't misconfigured the "security group" settings <security-group>`.

Go to the pre-generated run directory::

$ cd ~/tutorial/geosfp_4x5_standard

Run the pre-compiled model with::

$ ./geos.mp

Or you can re-compile the model on your own::
@@ -134,8 +134,7 @@ If you wait for the simulation to finish (takes 5~10 min), it will produce `NetC
time:calendar = "gregorian" ;
time:axis = "T" ;

`Anaconda Python <https://www.anaconda.com/>`_ and `xarray <http://xarray.pydata.org>`_ are already installed on the server for analyzing all kinds of NetCDF files. If you are not familiar with Python and xarray, check out my tutorial on
`xarray for GEOS-Chem <http://gcpy-demo.readthedocs.io>`_.
`Anaconda Python <https://www.anaconda.com/>`_ and `xarray <http://xarray.pydata.org>`_ are already installed on the server for analyzing all kinds of NetCDF files. If you are not familiar with Python and xarray, check out my `Python/xarray tutorial for GEOS-Chem users <https://github.com/JiaweiZhuang/GEOSChem-python-tutorial>`_.

Activate the pre-installed `geoscientific Python environment <https://github.com/JiaweiZhuang/cloud_GC/blob/master/build_scripts/python/geo.yml>`_ by ``source activate geo`` (it is generally a bad idea to directly install things into the root Python environment), and then start ``ipython`` from the command line::

@@ -163,9 +162,9 @@ Activate the pre-installed `geoscientific Python environment <https://github.com
A much better data-analysis environment is `Jupyter notebooks <http://jupyter.org>`_. If you have been using Jupyter on your local machine, the user experience on the cloud would be exactly the same.

To use Jupyter on remote servers, log in to the server again with the port-forwarding option ``-L 8999:localhost:8999``::

$ ssh -i "xx.pem" ubuntu@xxx.com -L 8999:localhost:8999

Then simply run ``jupyter notebook --NotebookApp.token='' --no-browser --port=8999``::

$ jupyter notebook --NotebookApp.token='' --no-browser --port=8999
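
The two commands above only work together if they use the same port. As a rough sketch, the pairing can be generated from one place; the function name ``tunnel_commands`` and its defaults are hypothetical, and the key file and host are the same placeholders used in this guide.

```python
def tunnel_commands(key_file, host, port=8999, user="ubuntu"):
    """Return (ssh command, jupyter command) that share a single port."""
    if not 1024 <= port <= 65535:
        raise ValueError("choose an unprivileged port between 1024 and 65535")
    # Run this one on your local machine: forwards local `port` to the server.
    ssh_cmd = f'ssh -i "{key_file}" {user}@{host} -L {port}:localhost:{port}'
    # Run this one on the server after logging in.
    jupyter_cmd = (
        f"jupyter notebook --NotebookApp.token='' --no-browser --port={port}"
    )
    return ssh_cmd, jupyter_cmd


if __name__ == "__main__":
    for command in tunnel_commands("xx.pem", "xxx.com"):
        print(command)
```

Changing ``port`` in one call keeps both sides consistent, which helps if 8999 is already taken on your machine.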
@@ -190,7 +189,7 @@ We encourage users to try the new NetCDF diagnostics, but you can still use the

Also, you could indeed download the output data and use old tools like IDL & MATLAB to analyze them, but we highly recommend the open-source Python/Jupyter/xarray ecosystem. It will vastly improve user experience and working efficiency, and also help open science and reproducible research.

Step 5: Shut down the server (Very important!!)
-----------------------------------------------

Right-click on the instance in your console to get this menu:
@@ -199,10 +198,10 @@ Right-click on the instance in your console to get this menu:

There are two different ways to stop being charged:

- "Stop" will make the system inactive, so that you are not charged for CPU time
and only pay the negligible disk storage fee. You can restart the server at any time and all files will be preserved.
- "Terminate" will completely remove that virtual server, so you won't be charged at all after that.
Unless you save your system as an AMI or transfer the data to other storage services,
you will lose all your data and software.

You will learn how to save your data and configurations persistently in the next tutorials. You might also want to :doc:`simplify your ssh login command <../chapter06_appendix/ssh-config>`.
60 changes: 60 additions & 0 deletions doc/source/chapter03_advanced-tutorial/gchp.rst
@@ -0,0 +1,60 @@
GEOS-Chem High-Performance version (GCHP) (experimental)
========================================================

We have successfully run GCHP on the cloud. **It functions correctly**, but several issues remain to be resolved:

- GCHP compiles with gfortran, but the overall performance is ~20% slower than with ifort. The major slowdown comes from the new advection module (GFDL-FV3).
- The initial I/O takes a long time (does not affect long-term simulations).
- The data analysis pipeline is not fully documented. We do have some `preliminary scripts <http://ftp.as.harvard.edu/pub/exchange/elundgren/CSCI29/ipynb/>`_ to process and regrid cubed-sphere data, though.
- It is only set up to run on a single node, with at most 72 CPUs (c5.18xlarge).

Right now it is pretty good for learning and for small experiments. We will make a major update after the formal release of v11-02.

GCHP inside Singularity container
---------------------------------

We will use :doc:`containers <./container>` to run GCHP. This lets you set up GCHP quickly on almost **any machine**, not just on the Amazon cloud. You can adapt this guide for your own server.

The Singularity image for GCHP can be obtained from `Singularity Hub <https://singularity-hub.org/collections/946/usage>`_, with the command::

$ singularity pull --name GCHP.simg shub://JiaweiZhuang/Singularity_GC

Launch server
-------------

Launch from the AMI ID ``ami-21f37a5e``. This AMI only provides sample input data and a pre-configured run directory. The software libraries are provided by the Singularity container.

The minimum `hardware requirement <http://wiki.seas.harvard.edu/geos-chem/index.php/GCHP_Hardware_and_Software_Requirements>`_ is ``r4.2xlarge``, with 8 vCPUs and 61 GiB of memory. The minimum number of MPI processes for GCHP is 6 (one for each cubed-sphere panel). You can still start a GCHP simulation on an instance with fewer than 6 CPUs, but the program is likely to crash partway through.
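
Before starting a run, it may be worth checking the process count against the machine. The sketch below only encodes the two constraints stated above (at least 6 MPI processes, and no more processes than CPUs); the function name ``check_mpi_processes`` is made up for illustration, and GCHP itself may impose further layout rules that are not covered here.

```python
import os

# One MPI process per cubed-sphere panel is the stated minimum.
MIN_MPI_PROCESSES = 6


def check_mpi_processes(n_procs, n_cpus=None):
    """Return a list of problems with the requested layout (empty if none)."""
    if n_cpus is None:
        # Fall back to the CPU count of the current machine.
        n_cpus = os.cpu_count() or 1
    problems = []
    if n_procs < MIN_MPI_PROCESSES:
        problems.append(
            f"{n_procs} MPI processes requested, "
            f"but at least {MIN_MPI_PROCESSES} are needed"
        )
    if n_procs > n_cpus:
        problems.append(
            f"{n_procs} MPI processes requested, "
            f"but only {n_cpus} CPUs are available"
        )
    return problems
```

For example, ``check_mpi_processes(6)`` on an 8-CPU instance returns an empty list, while the same call on a 4-CPU instance reports that the instance is too small.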

Test run
--------

After launching the instance and logging in (username is ``ubuntu``), you should see::

$ ls
gcdata GCHP GCHP.simg miniconda singularity

Run the container interactively by::

$ singularity shell GCHP.simg

If you just execute the container by ``./GCHP.simg``, it will print some instructions.

Go to the run directory and execute the pre-compiled executable::

$ cd ~/GCHP/gchp_standard
$ mpirun -np 6 ./geos

Test compile
------------

To re-compile the model, for now you need to specify the code directory when starting the container::

$ SINGULARITYENV_GC_CODE_DIR=~/GCHP/Code.v11-02_gchp singularity shell GCHP.simg
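
Singularity passes any host variable named ``SINGULARITYENV_<NAME>`` into the container as ``<NAME>``, which is what the command above relies on. If you launch the container from a Python script instead of the shell, a minimal sketch could build that environment as follows; the helper name ``singularity_env`` is hypothetical. Note that the shell expands ``~`` for you, while Python needs ``os.path.expanduser``.

```python
import os


def singularity_env(container_vars, base_env=None):
    """Return an environment dict that exposes variables inside a container."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in container_vars.items():
        # The SINGULARITYENV_ prefix is stripped inside the container.
        env[f"SINGULARITYENV_{name}"] = os.path.expanduser(value)
    return env
```

The resulting dictionary can be handed to ``subprocess.run(["singularity", "shell", "GCHP.simg"], env=env)``, with ``env = singularity_env({"GC_CODE_DIR": "~/GCHP/Code.v11-02_gchp"})``.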

Then re-compile the model in the run directory::

$ cd ~/GCHP/gchp_standard
$ make compile_clean

For more information, please see `the official tutorial on the GCHP wiki <http://wiki.seas.harvard.edu/geos-chem/index.php/Getting_Started_With_GCHP>`_.
3 changes: 2 additions & 1 deletion doc/source/chapter03_advanced-tutorial/index.rst
@@ -9,4 +9,5 @@ This chapter provides advanced tutorials to improve your research workflow. Make
iam-role
advanced-awscli
container
hpc-overview
gchp
8 changes: 6 additions & 2 deletions doc/source/chapter06_appendix/aws-resources-for-gc.rst
@@ -1,4 +1,4 @@
List of public AWS resources for GEOS-Chem
==========================================

Currently all resources are in us-east-1 (N. Virginia).
@@ -7,7 +7,7 @@ Currently all resources are in us-east-1 (N. Virginia).
| Resource | ID/name | Size | Content |
+===================+========================+==========+==================================+
|| Tutorial AMI | ami-ab925cd6 | 70 GB | |
| | | | 1. gfortran 5.4.0, |
| | | | netCDF-Fortran 4.4.3 |
| | | | 2. GC environment variables |
| | | | 3. GC source code and Unit Tester|
@@ -18,6 +18,10 @@ Currently all resources are in us-east-1 (N. Virginia).
| | | | environment |
| | | | 7. Sample gcdata directory |
+-------------------+------------------------+----------+----------------------------------+
|| GCHP | ami-21f37a5e | 100 GB | 1. Pre-configured GCHP rundir |
|| experimental | | | 2. Sample GCHP input data |
|| AMI | | | 3. GCHP container environment |
+-------------------+------------------------+----------+----------------------------------+
|| S3 bucket for | s3://gcgrid | ~30 TB | All current GEOS-Chem input data |
|| all GC data | (requester-pay) | | |
+-------------------+------------------------+----------+----------------------------------+
6 changes: 3 additions & 3 deletions doc/source/index.rst
@@ -1,7 +1,7 @@
GEOS-Chem on cloud computing platforms
======================================

`GEOSChem-on-cloud <http://acmg.seas.harvard.edu/research.html#cloud>`_ project aims to build a cloud computing capability for `GEOS-Chem <http://acmg.seas.harvard.edu/geos/>`_ that can be easily accessed by researchers worldwide.

See :ref:`motivation-label` for the motivation of this project. See :ref:`quick-start-label` to start your first GEOS-Chem simulation on the `Amazon Web Services (AWS) <https://aws.amazon.com/>`_ cloud within 10 minutes (and within seconds for the next time).

@@ -15,9 +15,9 @@ How to use this documentation

**For GEOS-Chem users**, this website contains everything you need in order to use GEOS-Chem on the cloud. You will be able to finish a complete research workflow, from model simulations to output data analysis and management. **If it is your first time trying GEOS-Chem, this project is perhaps your best starting point**, because :ref:`you don't need to do any initial setup <motivation-label>` and the model is guaranteed to work correctly (see :ref:`quick start guide <quick-start-label>`). Note that this website is not a user guide on the GEOS-Chem model itself. Please refer to our comprehensive `user guide <http://acmg.seas.harvard.edu/geos/doc/man/>`_ and `wiki <http://wiki.seas.harvard.edu/geos-chem/index.php/Main_Page>`_ for all details about GEOS-Chem. To run on the cloud, we only support versions newer than `v11-02a <http://wiki.seas.harvard.edu/geos-chem/index.php/GEOS-Chem_v11-02#v11-02a>`_ for `GNU-Fortran compatibility <http://wiki.seas.harvard.edu/geos-chem/index.php/GNU_Fortran_compiler>`_.

**For non-GEOS-Chem-users**, this documentation can be used as an introduction to AWS for scientific computing, especially for **Earth science model simulations**. Since all Earth science models are highly similar from a software perspective, it should be quite easy to adapt this guide for your specific use case. More than 90% of this website is about general AWS concepts and tutorials, which don't require GEOS-Chem-specific knowledge. Please get a feel for the cloud computing workflow by exploring :doc:`beginner tutorials <../chapter02_beginner-tutorial/index>` and then refer to the :doc:`developer guide <../chapter04_developer-guide/index>` to build your own model. Although cloud computing has a lot of potential in Earth science, it is still significantly under-utilized due to :doc:`the lack of accessible tutorials <./chapter01_overview/external-resources>` for Earth science researchers. This project tries to fill this gap.
**For non-GEOS-Chem-users**, this documentation can be used as an introduction to AWS for scientific computing, especially for **Earth science model simulations**. Since all Earth science models are highly similar from a software perspective, it should be quite easy to adapt this guide for your specific use case. More than 90% of this website is about general AWS concepts and tutorials, which don't require GEOS-Chem-specific knowledge. Please get a feel for the cloud computing workflow by exploring :doc:`beginner tutorials <./chapter02_beginner-tutorial/index>` and then refer to the :doc:`developer guide <./chapter04_developer-guide/index>` to build your own model. Although cloud computing has a lot of potential in Earth science, it is still significantly under-utilized due to :doc:`the lack of accessible tutorials <./chapter01_overview/external-resources>` for Earth science researchers. This project tries to fill this gap.

For general reference, GEOS-Chem is a `Chemical Transport Model <https://en.wikipedia.org/wiki/Chemical_transport_model>`_ for simulating atmospheric chemical compositions. It has been developed over 20 years and is used by `more than 100 research groups worldwide <http://acmg.seas.harvard.edu/geos/geos_people.html>`_. The program is mainly written in Fortran 90. `All model source code <https://bitbucket.org/account/user/gcst/projects/GEOS>`_ is `distributed freely under the MIT license <http://acmg.seas.harvard.edu/geos/geos_licensing.html>`_. Input and output data formats are mostly NetCDF, which can be analyzed easily by most languages such as Python, R and MATLAB. IDL (Interactive Data Language) has historically been the major data analysis tool but now we embrace open-source tools especially Python, `Jupyter <http://jupyter.org>`_ and `xarray <http://xarray.pydata.org>`_. The classic version of GEOS-Chem uses OpenMP parallelization (shared-memory, multi-threading). `The MPI version of GEOS-Chem <https://www.geosci-model-dev-discuss.net/gmd-2018-55/>`_ has also been developed and we are working on making it available on the cloud.
For general reference, GEOS-Chem is a `Chemical Transport Model <https://en.wikipedia.org/wiki/Chemical_transport_model>`_ for simulating atmospheric chemical compositions. It has been developed over 20 years and is used by `more than 100 research groups worldwide <http://acmg.seas.harvard.edu/geos/geos_people.html>`_. The program is mainly written in Fortran 90. `All model source code <https://bitbucket.org/account/user/gcst/projects/GEOS>`_ is `distributed freely under the MIT license <http://acmg.seas.harvard.edu/geos/geos_licensing.html>`_. Input and output data formats are mostly NetCDF, which can be analyzed easily by most languages such as Python, R and MATLAB. IDL (Interactive Data Language) has historically been the major data analysis tool but now we embrace open-source tools, especially Python, `Jupyter <http://jupyter.org>`_ and `xarray <http://xarray.pydata.org>`_. The classic version of GEOS-Chem uses OpenMP parallelization (shared-memory, multi-threading). `The MPI version of GEOS-Chem <https://www.geosci-model-dev-discuss.net/gmd-2018-55/>`_ has also been developed and we have :doc:`an experimental version that runs on the cloud <./chapter03_advanced-tutorial/gchp>`.


Table of Contents