diff --git a/docs/derived_fields.md b/docs/derived_fields.md
index cd954b84..80d7efa9 100644
--- a/docs/derived_fields.md
+++ b/docs/derived_fields.md
@@ -2,8 +2,7 @@

!!! info

-    If you want to run the code below, consider using the demo data
-    as described [here](supported_datasets/tng.md#demo-data).
+    If you want to run the code below, consider downloading the [demo data](supported_datasets/tng.md#demo-data) or using the [TNGLab](supported_datasets/tng.md#tnglab) online.

Commonly during analysis, newly derived quantities/fields are to be synthesized from one or more snapshot fields into a new field. For example, while the temperature, pressure, or entropy of gas is not stored directly in the snapshots, they can be computed from fields which are present on disk.

@@ -14,14 +13,14 @@ There are two ways to create new derived fields. For quick analysis, we can simp

``` py
from scida import load
-ds = load("TNG50-4_snapshot") # (1)!
+ds = load("./snapdir_030") # (1)!
gas = ds.data['gas']
kineticenergy = 0.5*gas['Masses']*(gas['Velocities']**2).sum(axis=1)
```

-1. In this example, we assume a dataset, such as the 'TNG50\_snapshot' test data set, that has its fields (*Masses*, *Velocities*) nested by particle type (*gas*)
+1. In this example, we assume a dataset, such as the [demo data set](supported_datasets/tng.md#demo-data), that has its fields (*Masses*, *Velocities*) nested by particle type (*gas*).

-In the example above, we define a new dask array called kineticenergy. Note that just like all other dask arrays and dataset fields, these fields are "virtual", i.e. only the graph of their construction is held in memory, which can be instantiated by applying the *.compute()* method.
+In the example above, we define a new dask array called "kineticenergy". Note that, just like all other dask arrays and dataset fields, such fields are "virtual": only the graph of their construction is held in memory, and it can be instantiated by applying the *.compute()* method.

We can also add this field from the above example to the existing ones in the dataset.

@@ -41,7 +40,7 @@ For this purpose, **field recipes** are available. An example of such recipe is

import numpy as np
from scida import load

-ds = load("TNG50-4_snapshot")
+ds = load("./snapdir_030")

@ds.register_field("stars") # (1)!
def VelMag(arrs, **kwargs):
@@ -109,7 +108,7 @@ def GroupDistance(arrs, snap=None):
Finally, we just need to import the *fielddefs* object (if we have defined it in another file) and merge them with a dataset that we loaded:

``` py
-ds = load("TNG50-4_snapshot")
+ds = load("./snapdir_030")
ds.data.merge(fielddefs)
```
diff --git a/docs/faq.md b/docs/faq.md
index b408cdf1..e4375075 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -18,7 +18,7 @@ Please note that all fields within a container are expected to have the same sha

``` py
from scida import load
import dask.array as da
-ds = load('TNG50-4_snapshot')
+ds = load("./snapdir_030")
array = da.zeros_like(ds.data["PartType0"]["Density"])
ds.data['PartType0']["zerofield"] = array
```

@@ -27,7 +27,7 @@ As we operate with dask, make sure to cast your array accordingly.
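+For example, a plain NumPy array can be wrapped into a dask array with `da.from_array` before assigning it (a minimal sketch; the field name "myfield" and the shape lookup are illustrative):
+
+``` py
+import numpy as np
+import dask.array as da
+
+# build an in-memory NumPy array matching the field shape, then wrap it lazily;
+# "myfield" is a hypothetical field name
+nparr = np.zeros(ds.data["PartType0"]["Density"].shape)
+ds.data["PartType0"]["myfield"] = da.from_array(nparr)
+```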
Alternatively, if you have another dataset loaded, you can assign fields from one to another:

``` py
-ds2 = load('TNG50-4_snapshot')
+ds2 = load("./snapdir_030")
ds.data['PartType0']["NewDensity"] = ds2.data['PartType0']["Density"]
```
diff --git a/docs/halocatalogs.md b/docs/halocatalogs.md
index 8940e995..c80157ae 100644
--- a/docs/halocatalogs.md
+++ b/docs/halocatalogs.md
@@ -5,18 +5,17 @@ Cosmological simulations are often post-processed with a substructure identifica

!!! info

-    If you want to run the code below, consider using the demo data
-    as described [here](supported_datasets/tng.md#demo-data).
+    If you want to run the code below, consider downloading the [demo data](supported_datasets/tng.md#demo-data) or using the [TNGLab](supported_datasets/tng.md#tnglab) online.

## Adding and using halo/galaxy catalog information

Currently, we support the usual FOF/Subfind combination and format. Their presence will be automatically detected and the catalogs will be loaded into *ds.data* as shown below.

``` py
from scida import load
-ds = load("TNG50-4_snapshot") # (1)!
+ds = load("./snapdir_030") # (1)!
```

-1. In this example, we assume a dataset, such as the 'TNG50\_snapshot' test data set, that has its fields (*Masses*, *Velocities*) nested by particle type (*gas*)
+1. In this example, we assume a dataset, such as the [demo data set](supported_datasets/tng.md#demo-data), that has its fields (*Masses*, *Velocities*) nested by particle type (*gas*).

The dataset itself passed to load does not possess information about the FoF/Subfind outputs as they are commonly saved in a separate folder or hdf5 file. For typical folder structures of GADGET/AREPO style simulations, an attempt is made to automatically discover and add such information. The path to the catalog can otherwise explicitly be passed to *load()* via the *catalog=...* keyword.
diff --git a/docs/largedatasets.md b/docs/largedatasets.md
index 0664b3a6..5911056b 100644
--- a/docs/largedatasets.md
+++ b/docs/largedatasets.md
@@ -3,7 +3,8 @@

!!! info

-    If you want to run the code below, you need access to the full [TNG](https://www.tng-project.org) simulation dataset.
+    If you want to run the code below, you need access to (or a local download of) the full [TNG](https://www.tng-project.org) simulation dataset.
+    The easiest way to access all TNG data sets is to use the [TNGLab](https://www.tng-project.org/data/lab/), which supports [scida](https://www.tng-project.org/data/forum/topic/742/scida-analysis-toolkit-example-within-tng-lab/).

Until now, we have applied our framework to a very small simulation. However, what if we are working with a very large data set
@@ -22,7 +23,8 @@ the `mass.sum().compute()` will chunk the operation up in a way that the task ca

```pycon
>>> from scida import load
->>> ds = load("TNG50_snapshot")
+>>> sim = load("TNG50-1")
+>>> ds = sim.get_dataset(99)
```

Before we start, let's enable a progress indicator from dask:

``` pycon
>>> from dask.diagnostics import ProgressBar
>>> ProgressBar().register()
```

-Let's benchmark this operation on our location machine.
+Let's benchmark this operation on our local machine.
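+Note that `%time` used below is an IPython/Jupyter magic. Outside of IPython, one could time the
+computation along these lines (a minimal sketch, reusing the `ds` loaded above):
+
+``` py
+import time
+
+t0 = time.perf_counter()
+total = ds.data["PartType0"]["Masses"].sum().compute()  # materializes the lazy sum
+print(f"elapsed: {time.perf_counter() - t0:.1f} s")
+```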
```pycon
>>> %time ds.data["PartType0"]["Masses"].sum().compute()
@@ -108,7 +110,8 @@ We configure the job and node resources before submitting the job via the `scale
>>> client = Client(cluster)

>>> from scida import load
->>> ds = load("TNG50_snapshot")
+>>> sim = load("TNG50-1")
+>>> ds = sim.get_dataset(99)
>>> %time ds.data["PartType0"]["Masses"].sum().compute()
CPU times: user 1.27 s, sys: 152 ms, total: 1.43 s
Wall time: 21.4 s
diff --git a/docs/series.md b/docs/series.md
index 9b4facd6..18640235 100644
--- a/docs/series.md
+++ b/docs/series.md
@@ -2,8 +2,12 @@

!!! info

-    If you want to run the code below, you will need to have an AREPO simulation available.
-    Specify the path in load() to the base directory of the simulation, which contains the "output" sub directory.
+
+    If you want to run the code below, you need a folder containing multiple scida datasets as subfolders.
+    Specify the path in load() to the base directory of the series.
+    The example below uses an AREPO simulation, TNG50-4, as a series of snapshots.
+    This simulation can be downloaded from the [TNG website](https://www.tng-project.org/data/)
+    or directly accessed online in the [TNGLab](https://www.tng-project.org/data/lab/).

In the tutorial section, we have only considered individual data sets. Often data sets are given in a series (e.g. multiple snapshots of a simulation, multiple exposures in a survey).
Loading this as a series provides convenient access to all contained objects.

``` pycon
>>> from scida import load
->>> series = load("TNGvariation_simulation") #(1)!
+>>> series = load("TNG50-4") #(1)!
```

1. Pass the base path of the simulation.
diff --git a/docs/supported_data.md b/docs/supported_data.md
index d3b518ab..a6dea525 100644
--- a/docs/supported_data.md
+++ b/docs/supported_data.md
@@ -3,19 +3,19 @@
The following table shows a selection of supported datasets. The table is not exhaustive, but should give an idea of the range of supported datasets. If you want to use a dataset that is not listed here, read on [here](dataset_structure.md) and consider opening an issue or contacting us directly.
-| Name | Support | Description |
-|------|---------|-------------|
-| [AURIGA](https://wwwmpa.mpa-garching.mpg.de/auriga/) | :material-check-all: | Cosmological zoom-in galaxy formation *simulations* |
-| [EAGLE](https://icc.dur.ac.uk/Eagle/) | :material-check-all: | Cosmological galaxy formation *simulations* |
-| [FIRE2](https://wetzel.ucdavis.edu/fire-simulations/) | :material-check-all: | Cosmological zoom-in galaxy formation *simulations* |
-| [FLAMINGO](https://flamingo.strw.leidenuniv.nl/) | :material-check-all: | Cosmological galaxy formation *simulations* |
-| [Gaia](https://www.cosmos.esa.int/web/gaia/dr3) | :material-database-check-outline:[^1] | *Observations* of a billion nearby stars |
-| [Illustris](https://www.illustris-project.org/) | :material-check-all: | Cosmological galaxy formation *simulations* |
-| [LGalaxies](customs/lgalaxies.md) | :material-check-all: | Semi-analytical model for [Millenium](https://wwwmpa.mpa-garching.mpg.de/galform/virgo/millennium/) simulations |
-| [SDSS DR16](https://www.sdss.org/dr16/) | :material-check: | *Observations* for millions of galaxies |
-| [SIMBA](http://simba.roe.ac.uk/) | :material-check-all: | Cosmological galaxy formation *simulations* |
-| [TNG](./supported_datasets/tng.md) | :material-check-all: | Cosmological galaxy formation *simulations* |
-| [TNG-Cluster](https://www.tng-project.org/cluster/) | :material-check-all: | Cosmological zoom-in galaxy formation *simulations* |
+| Name | Support | Description |
+|------|---------|-------------|
+| [AURIGA](https://wwwmpa.mpa-garching.mpg.de/auriga/) | :material-check-all: | Cosmological zoom-in galaxy formation *simulations* |
+| [EAGLE](https://icc.dur.ac.uk/Eagle/) | :material-check-all: | Cosmological galaxy formation *simulations* |
+| [FIRE2](https://wetzel.ucdavis.edu/fire-simulations/) | :material-check-all: | Cosmological zoom-in galaxy formation *simulations* |
+| [FLAMINGO](https://flamingo.strw.leidenuniv.nl/) | :material-check-all: | Cosmological galaxy formation *simulations* |
+| [Gaia](https://www.cosmos.esa.int/web/gaia/dr3) | :material-database-check-outline:[\[download\]](https://www.tng-project.org/data/obs/) | *Observations* of a billion nearby stars |
+| [Illustris](https://www.illustris-project.org/) | :material-check-all: | Cosmological galaxy formation *simulations* |
+| [LGalaxies](customs/lgalaxies.md) | :material-check-all: | Semi-analytical model for [Millennium](https://wwwmpa.mpa-garching.mpg.de/galform/virgo/millennium/) simulations |
+| [SDSS DR16](https://www.sdss.org/dr16/) | :material-check: | *Observations* for millions of galaxies |
+| [SIMBA](http://simba.roe.ac.uk/) | :material-check-all: | Cosmological galaxy formation *simulations* |
+| [TNG](./supported_datasets/tng.md) | :material-check-all: | Cosmological galaxy formation *simulations* |
+| [TNG-Cluster](https://www.tng-project.org/cluster/) | :material-check-all: | Cosmological zoom-in galaxy formation *simulations* |

@@ -28,5 +28,3 @@ A :material-database-check-outline: checkmark indicates support for converted HD

As of now, two underlying file formats are supported: hdf5
and zarr. Multi-file hdf5 is supported: a directory is passed as *path*, containing only hdf5 files of the pattern *prefix.XXX.hdf5*, where *prefix* will be determined automatically and *XXX* is a contiguous list of integers indicating the order of hdf5 files to be merged. Hdf5 files are expected to have the same structure, and all fields, i.e. hdf5 datasets, will be concatenated along their first axis.
Support for FITS is work in progress; see [here](tutorial/observations.md#fits-files) for a proof of concept.
-
-[^1]: The HDF5 version of GAIA DR3 is available [here](https://www.tng-project.org/data/obs/).
diff --git a/docs/supported_datasets/tng.md b/docs/supported_datasets/tng.md
index 7db75e21..85b801bc 100644
--- a/docs/supported_datasets/tng.md
+++ b/docs/supported_datasets/tng.md
@@ -10,7 +10,7 @@ available at [www.tng-project.org](https://www.tng-project.org/).
Many of the examples in this documentation use the TNG50-4 simulation. In particular, we make a snapshot and group catalog available to run these examples. You can download and extract the snapshot and its group
-catalog from the TNG50-4 test data:
+catalog from the TNG50-4 test data using the following commands:

``` bash
wget https://heibox.uni-heidelberg.de/f/dc65a8c75220477eb62d/?dl=1 -O snapshot.tar.gz
tar -xvf snapshot.tar.gz
wget https://heibox.uni-heidelberg.de/f/ff27fb6975fb4dc391ef/?dl=1 -O catalog.tar.gz
tar -xvf catalog.tar.gz
```

@@ -19,6 +19,31 @@
+These files are exactly [the same files](https://www.tng-project.org/api/TNG50-4/files/snapshot-30/)
+that can be downloaded from the official IllustrisTNG data release.
+
The snapshot and group catalog should be placed in the same folder.
-Then you can load the snapshot with `ds = load("./snapdir_030")`. The group catalog should automatically be detected,
+Then you can load the snapshot with `ds = load("./snapdir_030")`.
+If you are executing the code from a different folder, adjust the path accordingly.
+The group catalog should automatically be detected when it is available in the same parent folder as the snapshot;
otherwise you can also pass the path to the catalog via the `catalog` keyword to `load()`.
+
+## TNGLab
+
+The [TNGLab](https://www.tng-project.org/data/lab/) is a web-based analysis platform running a [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) instance with access to dedicated computational resources and all TNG data sets,
+providing a convenient way to run analysis code on them. As TNGLab supports scida, it is a great way to get started and to run the examples.
+
+To run the examples that use the [demo data](#demo-data), replace
+
+``` py
+ds = load("./snapdir_030")
+```
+
+with
+
+``` py
+sim = load("TNG50-4")
+ds = sim.get_dataset(30)
+```
diff --git a/docs/tutorial/observations.md b/docs/tutorial/observations.md
index 729843d5..ee76f292 100644
--- a/docs/tutorial/observations.md
+++ b/docs/tutorial/observations.md
@@ -3,7 +3,7 @@ This package is designed to aid in the efficient analysis of large datasets, such as GAIA DR3.

!!! info "Tutorial dataset"
-    In the following, we will subset from the [GAIA data release 3](https://www.cosmos.esa.int/web/gaia/dr3). The reduced dataset contains 100000 randomly selected entries only. The reduced dataset can be downloaded [here](https://heibox.uni-heidelberg.de/f/3b05069b1b524c0fa57e/?dl=1).
+    In the following, we will use a subset of the [GAIA data release 3](https://www.cosmos.esa.int/web/gaia/dr3).
+    The reduced dataset contains only 100000 randomly selected entries and can be downloaded [here](https://www.tng-project.org/files/obs/GAIA/gaia_dr3_mini.hdf5).

Check [Supported Datasets](../supported_data.md) for an incomplete list of supported datasets and requirements for support of new datasets. A tutorial for a cosmological simulation can be found [here](simulations.md).
@@ -17,7 +17,9 @@ It uses the [dask](https://dask.org/) library to perform computations, which has

## Loading an individual dataset

-Here, we choose the [GAIA data release 3](https://www.cosmos.esa.int/web/gaia/dr3) as an example.
+Here we use the [GAIA data release 3](https://www.cosmos.esa.int/web/gaia/dr3) as an example.
+In particular, we support the [single HDF5 version of DR3](https://www.tng-project.org/data/obs/).
+
The dataset is obtained in HDF5 format as used at ITA Heidelberg. We intentionally select a small subset of the data to work with. Choosing a subset means that the data size is small and easy to work with. We demonstrate how to work with larger data sets at a later stage.
diff --git a/docs/tutorial/simulations.md b/docs/tutorial/simulations.md
index 9e1b84be..447b73bf 100644
--- a/docs/tutorial/simulations.md
+++ b/docs/tutorial/simulations.md
@@ -42,7 +42,7 @@ First, we load the dataset using the convenience function `load()` that will det

```pycon title="Loading a dataset"
>>> from scida import load
->>> ds = load("snapdir_030")
+>>> ds = load("./snapdir_030")
>>> ds.info() #(1)!
class: ArepoSnapshotWithUnitMixinAndCosmologyMixin
source: /vera/u/byrohlc/Downloads/snapdir_030
diff --git a/docs/units.md b/docs/units.md
index 4ab05c2f..1b53c41c 100644
--- a/docs/units.md
+++ b/docs/units.md
@@ -2,9 +2,7 @@

!!! info

-    If you want to run the code below, consider using the demo data
-    as described [here](supported_datasets/tng.md#demo-data).
-
+    If you want to run the code below, consider downloading the [demo data](supported_datasets/tng.md#demo-data) or using the [TNGLab](supported_datasets/tng.md#tnglab) online.

## Loading data with units

@@ -12,7 +10,7 @@ Loading data sets with

``` py
from scida import load
-ds = load("TNG50-4_snapshot")
+ds = load("./snapdir_030")
```

will automatically attach units to the data. This can be deactivated by passing "units=False" to the load function.
diff --git a/docs/visualization.md b/docs/visualization.md
index 2008f0c7..1598b12c 100644
--- a/docs/visualization.md
+++ b/docs/visualization.md
@@ -2,8 +2,7 @@

!!! info

-    If you want to run the code below, consider using the demo data
-    as described [here](supported_datasets/tng.md#demo-data).
+    If you want to run the code below, consider downloading the [demo data](supported_datasets/tng.md#demo-data) or using the [TNGLab](supported_datasets/tng.md#tnglab) online.

## Creating plots

@@ -15,7 +14,7 @@ For example, we can select a subset of particles by applying a cut on a given fi
from scida import load
import matplotlib.pyplot as plt

-ds = load("TNG50-4_snapshot")
+ds = load("./snapdir_030")
dens = ds.data["PartType0"]["Density"][:10000].compute() # (1)!
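+# note: Temperature is not stored in the snapshot but provided as a derived field (see derived_fields.md)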
temp = ds.data["PartType0"]["Temperature"][:10000].compute()
plt.plot(dens, temp, "o", markersize=0.1)
@@ -35,8 +34,7 @@
from scida import load
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

-sim = load("TNG50-4")
-ds = sim.get_dataset(redshift=3.0)
+ds = load("./snapdir_030")

dens10 = da.log10(ds.data["PartType0"]["Density"].to("Msun/kpc^3").magnitude)
temp10 = da.log10(ds.data["PartType0"]["Temperature"].to("K").magnitude)
@@ -69,8 +67,7 @@ import holoviews.operation.datashader as hd
import datashader as dshdr
from scida import load

-sim = load("TNG50-4")
-ds = sim.get_dataset(redshift=3.0)
+ds = load("./snapdir_030")

ddf = ds.data["PartType0"].get_dataframe(["Coordinates0", "Coordinates1", "Masses"]) # (1)!

hv.extension("bokeh")
diff --git a/mkdocs.yml b/mkdocs.yml
index f2d9e68a..b510d77b 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -24,8 +24,7 @@ nav:
    - 'Data series': series.md
    - 'Large datasets': largedatasets.md
    - 'Advanced Topics':
-      - 'Arepo Simulations':
-        - 'Halo Catalogs': halocatalogs.md
+      - 'Halo/Galaxy Catalogs': halocatalogs.md
    - 'Configuration': configuration.md
    - 'FAQ': faq.md
    - 'API':