- Xarray -- Working with labeled, multi-dimensional datasets like NetCDF
- Geopandas -- Pandas-based geospatial analysis
- Pandas -- Analysis in data frames
- Zarr -- Cloud-optimized storage format; an alternative to HDF5
- Dask -- Distributed computing in Python. Works natively with Pandas, Xarray, and others; also supports custom computations.
- Work through the conda tutorial
- Read through Managing conda environments and experiment with creating your own environments. Pay particular attention to creating environments from YAML definition files: they make it much quicker to get up and running on different computing systems.
- Once you have conda running, also install mamba and have a look at its documentation. It’s a drop-in replacement for conda that is >10x faster for many operations, so I would highly recommend it.
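To make the YAML definition files mentioned above more concrete, here is a minimal sketch that renders one from Python. The environment name `geo-analysis`, the `conda-forge` channel, and the package list are illustrative assumptions, not a recommended setup.

```python
from pathlib import Path

# Hypothetical environment name and package list, for illustration only.
ENV_NAME = "geo-analysis"
PACKAGES = ["python", "xarray", "geopandas", "pandas", "zarr", "dask"]


def render_environment_yaml(name, packages, channels=("conda-forge",)):
    """Return the text of a conda environment definition file."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {package}" for package in packages]
    return "\n".join(lines) + "\n"


def write_environment_file(path, name=ENV_NAME, packages=PACKAGES):
    """Write the environment definition to `path` and return its text."""
    text = render_environment_yaml(name, packages)
    Path(path).write_text(text)
    return text


if __name__ == "__main__":
    # Print the definition instead of writing it, to avoid side effects.
    print(render_environment_yaml(ENV_NAME, PACKAGES))
```

Once the output is saved as `environment.yml`, running `conda env create -f environment.yml` (or `mamba env create -f environment.yml`) should recreate the same environment on another machine.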
NASA data usually require an extra level of authentication via Earthdata Login before you can download them. If you already have an API authorization token, you can use code like the following:
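import urllib.request

# Placeholder: replace with the URL of the file you want to download.
url = "<your data URL here...>"
token = "<your token here...>"  # E.g., "dHJpbmtld..."

# The cookie processor keeps any cookies set while following redirects.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor())
request = urllib.request.Request(url)
request.add_header("Authorization", f"Bearer {token}")
with opener.open(request) as response, open("/path/to/target/file", "wb") as f:
    f.write(response.read())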
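The snippet above reads the whole response into memory before writing it, which can be a problem for large files. A possible variant, sketched below, wraps the same steps in a function and streams the response to disk in chunks. The function name `download_with_token` and its parameters are hypothetical, chosen only for illustration.

```python
import shutil
import urllib.request


def download_with_token(url, token, target_path, chunk_size=1024 * 1024):
    """Download `url` to `target_path`, sending `token` as a Bearer token.

    The response is copied to disk `chunk_size` bytes at a time, so the
    whole file never has to fit in memory.
    """
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor())
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Bearer {token}")
    with opener.open(request) as response, open(target_path, "wb") as f:
        shutil.copyfileobj(response, f, chunk_size)
    return target_path
```

A call would look like `download_with_token(url, token, "/path/to/target/file")`, with `url` and `token` supplied as in the snippet above.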
TODO: How to retrieve the token using a netrc file or username/password.
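Until this is written up properly, here is a rough sketch of the netrc route using only the standard library. It assumes your `~/.netrc` has an entry for `urs.earthdata.nasa.gov`. The token endpoint path and the `access_token` field in the JSON response are assumptions to verify against the Earthdata Login API documentation, and all function names are hypothetical.

```python
import base64
import json
import netrc
import urllib.request

EDL_HOST = "urs.earthdata.nasa.gov"
# Assumption: verify this endpoint against the Earthdata Login API docs.
TOKEN_ENDPOINT = f"https://{EDL_HOST}/api/users/find_or_create_token"


def read_credentials(netrc_path=None, host=EDL_HOST):
    """Return (username, password) for `host` from a netrc file.

    With `netrc_path=None`, the default `~/.netrc` is used.
    """
    auth = netrc.netrc(netrc_path).authenticators(host)
    if auth is None:
        raise LookupError(f"No entry for {host} in netrc file")
    login, _account, password = auth
    return login, password


def build_token_request(username, password, endpoint=TOKEN_ENDPOINT):
    """Build a POST request that authenticates with HTTP Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(endpoint, data=b"", method="POST")
    request.add_header("Authorization", f"Basic {encoded}")
    return request


def fetch_token(netrc_path=None):
    """Request a token using netrc credentials (performs a network call)."""
    username, password = read_credentials(netrc_path)
    request = build_token_request(username, password)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumption: the response JSON carries the token under "access_token".
    return payload["access_token"]
```

If the assumptions hold, `token = fetch_token()` would give a value usable in the `Authorization: Bearer ...` header shown earlier. For the username/password route, `build_token_request(username, password)` can be called directly with credentials obtained some other way, for example from `getpass`.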