From 177dfe65d9609080cd5c9e674593a67842b949cb Mon Sep 17 00:00:00 2001 From: Chris Byrohl <9221545+cbyrohl@users.noreply.github.com> Date: Thu, 1 Feb 2024 13:37:22 +0100 Subject: [PATCH] fix #147 --- docs/customs/lgalaxies.md | 10 --- docs/dataset_structure.md | 34 --------- docs/derived_fields.md | 4 +- docs/halocatalogs.md | 4 +- docs/impressions.md | 2 + docs/supported_data.md | 129 ++++++++++++++++++++++++++++----- docs/supported_datasets/tng.md | 57 --------------- docs/units.md | 2 +- docs/userguide.md | 2 - docs/visualization.md | 2 +- mkdocs.yml | 1 + 11 files changed, 121 insertions(+), 126 deletions(-) delete mode 100644 docs/customs/lgalaxies.md delete mode 100644 docs/dataset_structure.md delete mode 100644 docs/supported_datasets/tng.md delete mode 100644 docs/userguide.md diff --git a/docs/customs/lgalaxies.md b/docs/customs/lgalaxies.md deleted file mode 100644 index 06cb9fde..00000000 --- a/docs/customs/lgalaxies.md +++ /dev/null @@ -1,10 +0,0 @@ -# LGalaxies - -Access via individual datasets are supported, e.g.: - -```pycon ->>> from scida import load ->>> load("LGal_Ayromlou2021_snap58.hdf5") -``` - -while access to the series at once (i.e. loading all data for all snapshots in a folder) is **not supported**. diff --git a/docs/dataset_structure.md b/docs/dataset_structure.md deleted file mode 100644 index 2f99fe27..00000000 --- a/docs/dataset_structure.md +++ /dev/null @@ -1,34 +0,0 @@ -# Dataset structure - -Here, we discuss the requirements for easy extension/support of new datasets. - -## Supported file formats - -Currently, input files need to have one of the following formats: - -* [hdf5](https://www.hdfgroup.org/solutions/hdf5/) -* multi-file hdf5 -* [zarr](https://zarr.readthedocs.io/en/stable/) - -## Supported file structures - -Just like this package, above file formats use a hierarchical structure to store data with three fundamental objects: - -* **Groups** are containers for other groups or datasets. 
-* **Datasets** are multidimensional arrays of a homogeneous type, usually bundled into some *Group*.
-* **Attributes** provide various metadata.
-
-## Supported data structures
-
-At this point, we only support unstructured datasets, i.e. datasets that do not depend on the memory layout for their
-interpretation. For example, this implies that simulation codes utilizing uniform or adaptive grids are not supported.
-
-
-## Examples of supported simulation codes
-
-We explicitly support simulations run with the following codes:
-
-* [Gadget](https://wwwmpa.mpa-garching.mpg.de/gadget4/)
-* [Gizmo](http://www.tapir.caltech.edu/~phopkins/Site/GIZMO.html)
-* [Arepo](https://arepo-code.org/)
-* [Swift](https://swift.strw.leidenuniv.nl/)
diff --git a/docs/derived_fields.md b/docs/derived_fields.md
index 80d7efa9..ad514398 100644
--- a/docs/derived_fields.md
+++ b/docs/derived_fields.md
@@ -2,7 +2,7 @@
 
 !!! info
 
-    If you want to run the code below, consider downloading the [demo data](supported_datasets/tng.md#demo-data) or use the [TNGLab](supported_datasets/tng.md#tnglab) online.
+    If you want to run the code below, consider downloading the [demo data](supported_data.md#demo-data) or using the [TNGLab](supported_data.md#tnglab) online.
 
 Commonly during analysis, newly derived quantities/fields are to be synthesized from one or more snapshot fields into a new field. For example, while the temperature, pressure, or entropy of gas is not stored directly in the snapshots, they can be computed from fields which are present on disk.
 
@@ -18,7 +18,7 @@ gas = ds.data['gas']
 kineticenergy = 0.5*gas['Masses']*(gas['Velocities']**2).sum(axis=1)
 ```
 
-1. In this example, we assume a dataset, such as the [demo data set](supported_datasets/tng.md#demo-data), that has its fields (*Masses*, *Velocities*) nested by particle type (*gas*)
+1. 
In this example, we assume a dataset, such as the [demo data set](supported_data.md#demo-data), that has its fields (*Masses*, *Velocities*) nested by particle type (*gas*)
 
 In the example above, we define a new dask array called "kineticenergy". Note that just like all other dask arrays and dataset fields, these fields are "virtual", i.e. only the graph of their construction is held in memory, which can be instantiated by applying the *.compute()* method.
diff --git a/docs/halocatalogs.md b/docs/halocatalogs.md
index c80157ae..c3a3574b 100644
--- a/docs/halocatalogs.md
+++ b/docs/halocatalogs.md
@@ -5,7 +5,7 @@ Cosmological simulations are often post-processed with a substructure identifica
 
 !!! info
 
-    If you want to run the code below, consider downloading the [demo data](supported_datasets/tng.md#demo-data) or use the [TNGLab](supported_datasets/tng.md#tnglab) online.
+    If you want to run the code below, consider downloading the [demo data](supported_data.md#demo-data) or using the [TNGLab](supported_data.md#tnglab) online.
 
 ## Adding and using halo/galaxy catalog information
 Currently, we support the usual FOF/Subfind combination and format. Their presence will be automatically detected and the catalogs will be loaded into *ds.data* as shown below.
 
 ```py
 from scida import load
 ds = load("./snapdir_030") # (1)!
 ```
 
-1. In this example, we assume a dataset, such as the [demo data set](supported_datasets/tng.md#demo-data), that has its fields (*Masses*, *Velocities*) nested by particle type (*gas*)
+1. In this example, we assume a dataset, such as the [demo data set](supported_data.md#demo-data), that has its fields (*Masses*, *Velocities*) nested by particle type (*gas*)
 
 The dataset itself passed to load does not possess information about the FoF/Subfind outputs as they are commonly saved in a separate folder or hdf5 file. For typical folder structures of GADGET/AREPO style simulations, an attempt is made to automatically discover and add such information.
The path to the catalog can otherwise explicitly be passed to *load()* via the *catalog=...* keyword. diff --git a/docs/impressions.md b/docs/impressions.md index be13c12c..d0655ad0 100644 --- a/docs/impressions.md +++ b/docs/impressions.md @@ -1,5 +1,7 @@ # Visual impressions using scida +The following images were created using scida with matplotlib. + ![Projection from the SIMBA simulation at redshift 2.](images/projection_SIMBA_Temperature.jpg){: style="height:300px"} ![Projection from the FLAMINGO simulation at redshift 2.](images/projection_FLAMINGO_Density.jpg){: style="height:300px"} ![Projection from the TNG100 simulation at redshift 2.](images/projection_TNG100_GFM_Metallicity.jpg){: style="height:300px"} diff --git a/docs/supported_data.md b/docs/supported_data.md index a6dea525..b5716627 100644 --- a/docs/supported_data.md +++ b/docs/supported_data.md @@ -1,30 +1,125 @@ # Supported datasets The following table shows a selection of supported datasets. The table is not exhaustive, but should give an idea of the range of supported datasets. -If you want to use a dataset that is not listed here, read on [here](dataset_structure.md) and consider opening an issue or contact us directly. 
- -| Name | Support | Description | -|-------------------------------------------------------|---------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------| -| [AURIGA](https://wwwmpa.mpa-garching.mpg.de/auriga/) | :material-check-all: | Cosmological zoom-in galaxy formation *simulations* | -| [EAGLE](https://icc.dur.ac.uk/Eagle/) | :material-check-all: | Cosmological galaxy formation *simulations* | -| [FIRE2](https://wetzel.ucdavis.edu/fire-simulations/) | :material-check-all: | Cosmological zoom-in galaxy formation *simulations* | -| [FLAMINGO](https://flamingo.strw.leidenuniv.nl/) | :material-check-all: | Cosmological galaxy formation *simulations* | -| [Gaia](https://www.cosmos.esa.int/web/gaia/dr3) | :material-database-check-outline:[\[download\]](https://www.tng-project.org/data/obs/) | *Observations* of a billion nearby stars | -| [Illustris](https://www.illustris-project.org/) | :material-check-all: | Cosmological galaxy formation *simulations* | -| [LGalaxies](customs/lgalaxies.md) | :material-check-all: | Semi-analytical model for [Millenium](https://wwwmpa.mpa-garching.mpg.de/galform/virgo/millennium/) simulations | -| [SDSS DR16](https://www.sdss.org/dr16/) | :material-check: | *Observations* for millions of galaxies | -| [SIMBA](http://simba.roe.ac.uk/) | :material-check-all: | Cosmological galaxy formation *simulations* | -| [TNG](./supported_datasets/tng.md) | :material-check-all: | Cosmological galaxy formation *simulations* | -| [TNG-Cluster](https://www.tng-project.org/cluster/) | :material-check-all: | Cosmological zoom-in galaxy formation *simulations* | +If you want to use a dataset that is not listed here, read on [here](#supported-file-formats-and-their-structure) and consider opening an issue or contact us directly. 
+| Name | Support | Description |
+|------|---------|-------------|
+| [AURIGA](https://wwwmpa.mpa-garching.mpg.de/auriga/) | :material-check-all: | Cosmological zoom-in galaxy formation *simulations* |
+| [EAGLE](https://icc.dur.ac.uk/Eagle/) | :material-check-all: | Cosmological galaxy formation *simulations* |
+| [FIRE2](https://wetzel.ucdavis.edu/fire-simulations/) | :material-check-all: | Cosmological zoom-in galaxy formation *simulations* |
+| [FLAMINGO](https://flamingo.strw.leidenuniv.nl/) | :material-check-all: | Cosmological galaxy formation *simulations* |
+| [Gaia](https://www.cosmos.esa.int/web/gaia/dr3) | :material-database-check-outline:[\[download\]](https://www.tng-project.org/data/obs/) | *Observations* of a billion nearby stars |
+| [Illustris](https://www.illustris-project.org/) | :material-check-all: | Cosmological galaxy formation *simulations* |
+| [LGalaxies](https://lgalaxiespublicrelease.github.io/) [\[1\]](#lgalaxies) | :material-check-all: | Semi-analytical model for [Millennium](https://wwwmpa.mpa-garching.mpg.de/galform/virgo/millennium/) simulations |
+| [SDSS DR16](https://www.sdss.org/dr16/) | :material-check: | *Observations* of millions of galaxies |
+| [SIMBA](http://simba.roe.ac.uk/) | :material-check-all: | Cosmological galaxy formation *simulations* |
+| [TNG](https://www.tng-project.org/)[\[2\]](#the-tng-simulation-suite) | :material-check-all: | Cosmological galaxy formation *simulations* |
+| [TNG-Cluster](https://www.tng-project.org/cluster/) | :material-check-all: | Cosmological zoom-in galaxy formation *simulations* |
 
 A :material-check-all: checkmark indicates support out-of-the-box, a :material-check: checkmark indicates work-in-progress support or the
need to create a suitable configuration file.
A :material-database-check-outline: checkmark indicates support for converted HDF5 versions of the original data.
+## Dataset Details
+
+### LGalaxies
+
+Access to individual datasets is supported, e.g.:
+
+```pycon
+>>> from scida import load
+>>> load("LGal_Ayromlou2021_snap58.hdf5")
+```
+
+Access to the series at once (i.e. loading all data for all snapshots in a folder) is **not supported**.
+
+
+### The TNG Simulation Suite
+
+#### Overview
+The IllustrisTNG project is a series of large-scale cosmological
+magnetohydrodynamical simulations of galaxy formation. The data is
+available at [www.tng-project.org](https://www.tng-project.org/).
+
+#### Demo data
+
+Many of the examples in this documentation use the TNG50-4 simulation.
+In particular, we make a snapshot and group catalog available to run
+these examples. You can download and extract the snapshot and its group
+catalog from the TNG50-4 test data using the following commands:
+
+``` bash
+wget https://heibox.uni-heidelberg.de/f/dc65a8c75220477eb62d/?dl=1 -O snapshot.tar.gz
+tar -xvf snapshot.tar.gz
+wget https://heibox.uni-heidelberg.de/f/ff27fb6975fb4dc391ef/?dl=1 -O catalog.tar.gz
+tar -xvf catalog.tar.gz
+```
+
+These files are exactly [the same files](https://www.tng-project.org/api/TNG50-4/files/snapshot-30/)
+that can be downloaded from the official IllustrisTNG data release.
+
+The snapshot and group catalog should be placed in the same folder.
+Then you can load the snapshot with `ds = load("./snapdir_030")`.
+If you are executing the code from a different folder, you need to adjust the path accordingly.
+The group catalog should automatically be detected when available in the same parent folder as the snapshot,
+otherwise you can also pass the path to the catalog via the `catalog` keyword to `load()`.
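The auto-detection described above can be pictured with a small standard-library sketch. The helper `find_catalog` and the `snapdir_XXX`/`groups_XXX` folder naming are assumptions for this illustration only; scida's actual discovery logic is internal and more general:

```python
from pathlib import Path

def find_catalog(snapshot_path):
    # Illustrative only: guess a sibling "groups_XXX" catalog folder for a
    # "snapdir_XXX" snapshot folder,
    # e.g. ".../output/snapdir_030" -> ".../output/groups_030".
    snap = Path(snapshot_path)
    if not snap.name.startswith("snapdir_"):
        return None  # pattern not recognized; pass catalog=... explicitly instead
    snapnum = snap.name[len("snapdir_"):]  # e.g. "030"
    candidate = snap.parent / ("groups_" + snapnum)
    # Only return the candidate when it actually exists on disk.
    return candidate if candidate.is_dir() else None
```

When such discovery fails, the catalog path is given explicitly, e.g. `load("./snapdir_030", catalog="path/to/groups_030")` (with a hypothetical catalog path).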
+ +#### TNGLab + +The [TNGLab](https://www.tng-project.org/data/lab/) is a web-based analysis platform running a [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) instance with access to dedicated computational resources and all TNG data sets to provide +a convenient way to run analysis code on the TNG data sets. As TNGLab supports scida, it is a great way to get started and for running the examples. + +In order to run the examples which use the [demo data](#demo-data), replace + +``` py +ds = load("./snapdir_030") +``` -# File-format requirements +with -As of now, two underlying file formats are supported: hdf5 and zarr. Multi-file hdf5 is supported, for which a directory is passed as *path*, which contains only hdf5 files of the pattern *prefix.XXX.hdf5*, where *prefix* will be determined automatically and *XXX* is a contiguous list of integers indicating the order of hdf5 files to be merged. Hdf5 files are expected to have the same structure and all fields, i.e. hdf5 datasets, will be concatenated along their first axis. +``` py +ds = load("/home/tnguser/sims.TNG/TNG50-4/output/snapdir_030") +``` + +for these examples. + +Alternatively, you can use + +``` py +sim = load("TNG50-4") +ds = sim.get_dataset(30) +``` + +where "TNG50-4" is a pre-defined shortcut to the TNG50-4 simulation path on TNGLab. After having loaded the simulation, we request the snapshot "30" as used in the demo data. Custom shortcuts can be defined in the [simulation configuration](configuration.md#simulation-configuration). + + + + +## Supported file formats and their structure + +Here, we discuss the requirements for easy extension/support of new datasets. 
+Currently, input files need to have one of the following formats:
+
+* [hdf5](https://www.hdfgroup.org/solutions/hdf5/)
+* multi-file hdf5: We assume a directory containing hdf5 files of the pattern *prefix.XXX.hdf5*, where *prefix* will be determined automatically and *XXX* is a contiguous list of integers indicating the order of the hdf5 files to be merged. The hdf5 files are expected to have the same structure, and all fields, i.e. hdf5 datasets, will be concatenated along their first axis.
+* [zarr](https://zarr.readthedocs.io/en/stable/)
 
 Support for FITS is work-in-progress, also see [here](tutorial/observations.md#fits-files) for a proof-of-concept.
+
+
+Scida and the above file formats use a hierarchical structure to store data with three fundamental objects:
+
+* **Groups** are containers for other groups or datasets.
+* **Datasets** are multidimensional arrays of a homogeneous type, usually bundled into some *Group*.
+* **Attributes** provide various metadata.
+
+At this point, we only support unstructured datasets, i.e. datasets that do not depend on the memory layout for their
+interpretation. For example, this implies that simulation codes utilizing uniform or adaptive grids are not supported.
+
+We explicitly support simulations run with the following codes:
+
+* [Gadget](https://wwwmpa.mpa-garching.mpg.de/gadget4/)
+* [Gizmo](http://www.tapir.caltech.edu/~phopkins/Site/GIZMO.html)
+* [Arepo](https://arepo-code.org/)
+* [Swift](https://swift.strw.leidenuniv.nl/)
diff --git a/docs/supported_datasets/tng.md b/docs/supported_datasets/tng.md
deleted file mode 100644
index a09856f5..00000000
--- a/docs/supported_datasets/tng.md
+++ /dev/null
@@ -1,57 +0,0 @@
-# The TNG Simulation Suite
-
-## Overview
-The IllustrisTNG project is a series of large-scale cosmological
-magnetohydrodynamical simulations of galaxy formation. The data is
-available at [www.tng-project.org](https://www.tng-project.org/).
- -## Demo data - -Many of the examples in this documentation use the TNG50-4 simulation. -In particular, we make a snapshot and group catalog available to run -these examples. You can download and extract the snapshot and its group -catalog from the TNG50-4 test data using the following commands: - -``` bash -wget https://heibox.uni-heidelberg.de/f/dc65a8c75220477eb62d/?dl=1 -O snapshot.tar.gz -tar -xvf snapshot.tar.gz -wget https://heibox.uni-heidelberg.de/f/ff27fb6975fb4dc391ef/?dl=1 -O catalog.tar.gz -tar -xvf catalog.tar.gz -``` - -These files are exactly [the same files](https://www.tng-project.org/api/TNG50-4/files/snapshot-30/) -that can be downloaded from the official IllustrisTNG data release. - -The snapshot and group catalog should be placed in the same folder. -Then you can load the snapshot with `ds = load("./snapdir_030")`. -If you are executing the code from a different folder, you need to adjust the path accordingly. -The group catalog should automatically be detected when available in the same parent folder as the snapshot, -otherwise you can also pass the path to the catalog via the `catalog` keyword to `load()`. - -## TNGLab - -The [TNGLab](https://www.tng-project.org/data/lab/) is a web-based analysis platform running a [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) instance with access to dedicated computational resources and all TNG data sets to provide -a convenient way to run analysis code on the TNG data sets. As TNGLab supports scida, it is a great way to get started and for running the examples. - -In order to run the examples which use the [demo data](#demo-data), replace - -``` py -ds = load("./snapdir_030") -``` - -with - -``` py -ds = load("/home/tnguser/sims.TNG/TNG50-4/output/snapdir_030") -``` - -for these examples. - -Alternatively, you can use - -``` py -sim = load("TNG50-4") -ds = sim.get_dataset(30) -``` - -where "TNG50-4" is a pre-defined shortcut to the TNG50-4 simulation path on TNGLab. 
After having loaded the simulation, we request the snapshot "30" as used in the demo data. Custom shortcuts can be defined in the [simulation configuration](../configuration.md#simulation-configuration).
diff --git a/docs/units.md b/docs/units.md
index 1b53c41c..34c0d7c8 100644
--- a/docs/units.md
+++ b/docs/units.md
@@ -2,7 +2,7 @@
 
 !!! info
 
-    If you want to run the code below, consider downloading the [demo data](supported_datasets/tng.md#demo-data) or use the [TNGLab](supported_datasets/tng.md#tnglab) online.
+    If you want to run the code below, consider downloading the [demo data](supported_data.md#demo-data) or using the [TNGLab](supported_data.md#tnglab) online.
 
 ## Loading data with units
 
diff --git a/docs/userguide.md b/docs/userguide.md
deleted file mode 100644
index 6921ceac..00000000
--- a/docs/userguide.md
+++ /dev/null
@@ -1,2 +0,0 @@
-
-TODO
diff --git a/docs/visualization.md b/docs/visualization.md
index 1598b12c..873de8d4 100644
--- a/docs/visualization.md
+++ b/docs/visualization.md
@@ -2,7 +2,7 @@
 
 !!! info
 
-    If you want to run the code below, consider downloading the [demo data](supported_datasets/tng.md#demo-data) or use the [TNGLab](supported_datasets/tng.md#tnglab) online.
+    If you want to run the code below, consider downloading the [demo data](supported_data.md#demo-data) or using the [TNGLab](supported_data.md#tnglab) online.
 
 ## Creating plots
 
diff --git a/mkdocs.yml b/mkdocs.yml
index 15489327..270fdced 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -28,6 +28,7 @@ nav:
   - 'FAQ': faq.md
   - 'Configuration': configuration.md
   - 'Development': developer.md
+  - 'Visual Impressions': impressions.md
   - 'API':
     - 'Basic': api/base_api.md
     - 'Index': api/moduleindex.md