Minor fixes from December 2024 #168

Merged
merged 9 commits into from
Dec 18, 2024
2 changes: 1 addition & 1 deletion content/45.user_friendliness.md
Original file line number Diff line number Diff line change
@@ -75,7 +75,7 @@ When loading a dataset, yt will attempt to determine what the format of the data

While these may seem like simple, obvious changes to make, they can hide difficult technical challenges, and more importantly, have dramatically improved the user experience for people using yt.

### Jupyter Integration {#sec:jupyter_integrationt}
### Jupyter Integration {#sec:jupyter_integration}

Project Jupyter is an overarching term for a collection of related projects that provide an extensive, end-to-end suite for the user experience of developing code and narrative, as described in depth in (among other papers) @doi:10.1109/MCSE.2021.3059263 and @soton403913.
While many in the yt community utilize yt through python scripts executed on the command line or through submission queues on high-performance computing resources, a large fraction utilize Jupyter Notebooks for their data exploration.
2 changes: 1 addition & 1 deletion content/50.halo_finding_and_catalogs.md
@@ -6,7 +6,7 @@ Being able to identify halos, as well as their associated baryonic content, is n
Furthermore, convergence studies and cross-simulation comparisons require a consistent method for identifying dark matter halos, as well as the ability to track their growth over time.

In past versions of `yt`, several specific halo finders were bundled and made available to work on any class of data `yt` was able to read.
These included the HOP halo finder, the classic Friends-of-Friends (FOF) halo finder [@doi:10.1086/191003], a scalable and Parallel HOP [@doi: 10.1086/305535], and a wrapping of the ORIGAMI code [@doi:10.1142/9789814623995_0378] for filament identification.
These included the HOP halo finder, the classic Friends-of-Friends (FOF) halo finder [@doi:10.1086/191003], a scalable and Parallel HOP [@doi:10.1086/305535], and a wrapping of the ORIGAMI code [@doi:10.1142/9789814623995_0378] for filament identification.
To do so, `yt` would utilize direct in-memory connectors with these implementations; whereas typically data connectors are written for each individual dataset format for individual halo finding methods, this enabled a single connector to be written from `yt` to the halo finder.
In addition to these bundled halo finders, a direct in-memory interface with Rockstar [@doi:10.1088/0004-637X/762/2/109] was developed that sidestepped Rockstar's built-in load-balancing to minimize data duplication and transfer.

20 changes: 8 additions & 12 deletions content/55.scaling_parallelism.md
@@ -1,4 +1,4 @@
## Scaling and Parallelism
## Scaling and Parallelism

To support cases where data volume results in long processing time or large memory requirements, yt operations have been parallelized using the Message Passing Interface (MPI; @mpi40).
When designing the parallel interface for yt, as discussed in [@doi:10.1088/0067-0049/192/1/9], the design goals included ensuring that scripts required little to no adjustments to be run in parallel.
@@ -8,19 +8,18 @@ In the intervening time, the parallel operation infrastructure has been rewritte

Almost all of the operations in yt that are conducted in parallel follow a straightforward method of decomposing work and consolidating results:

1. Identify which chunking method (see @sec:chunking) is most appropriate for the operation.
2. Consolidate chunks according to IO minimization and assign to individual MPI tasks
3. Join (potentially applying reduction operations) final results to provide solution to *all tasks* in the group
1. Identify which chunking method (see @sec:chunking) is most appropriate for the operation.
2. Consolidate chunks according to IO minimization and assign to individual MPI tasks
3. Join (potentially applying reduction operations) final results to provide solution to _all tasks_ in the group

The final step, of joining across tasks, results in the final set of values being accessible to all tasks; this is not a universal "final step" in parallel operations, and in some cases results in substantial duplication of memory.
This compromise was accepted as a result of the design goals of ensuring that scripts can run unmodified.
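
The decompose, assign, and join pattern described above can be sketched in plain Python. This is only an illustration and not yt's implementation: the function names `assign_chunks` and `parallel_sum` are hypothetical, MPI tasks are simulated with a list, and round-robin assignment stands in for the IO-aware consolidation step.

```python
def assign_chunks(chunks, n_tasks):
    """Hypothetical stand-in for IO-aware consolidation: round-robin by rank."""
    return [chunks[rank::n_tasks] for rank in range(n_tasks)]


def parallel_sum(chunks, n_tasks):
    """Simulate the three steps for a simple sum reduction."""
    # Steps 1-2: decompose the work and assign chunks to each (simulated) task.
    assignments = assign_chunks(chunks, n_tasks)
    # Each task reduces only the chunks it was assigned.
    partials = [sum(sum(chunk) for chunk in mine) for mine in assignments]
    # Step 3: join the partial results so that *every* task holds the final
    # value (the analogue of an all-reduce), at the cost of duplicating it.
    total = sum(partials)
    return [total] * n_tasks


chunks = [[1, 2], [3], [4, 5, 6], [7]]
results = parallel_sum(chunks, n_tasks=3)
print(results)
```

Because every simulated task ends up with the same value, code that follows the reduction does not need to know which task it is running on, which is the property that lets scripts run unmodified.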

The parallelism in yt heavily leans upon the "index" for a dataset either being available *already* at initiation time on all tasks, or that index being *accessible* through IO operations or fast generation.
The parallelism in yt heavily leans upon the "index" for a dataset either being available _already_ at initiation time on all tasks, or that index being _accessible_ through IO operations or fast generation.
This provides a degree of load-balancing that can be conducted, as estimates of memory and processing requirements are available on all tasks (and thus the load-balancing calculations are deterministic across all tasks).
In essence, this means that for grid-based datasets, the entire grid hierarchy is available on all processors; for octrees or particle datasets, it means that at least a rough estimate of the distribution of values must be available (and identical) on all processors.
This doesn't prevent opaquely distributed datasets from being decomposed, but it does allow datasets whose distribution is well-described to be decomposed with greater precision.
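
To illustrate why an identical index on every task makes load-balancing deterministic, the following sketch has each simulated task compute the full assignment independently from the same cost estimates. It is an assumption-laden example rather than yt's algorithm: the greedy "largest cost first" strategy and the name `balance` are chosen here for illustration only.

```python
import heapq


def balance(costs, n_tasks):
    """Greedy assignment: give the next-largest chunk to the least-loaded task.

    Ties are broken by chunk index and by rank, so the result depends only
    on the inputs and is therefore identical on every task.
    """
    heap = [(0.0, rank) for rank in range(n_tasks)]
    heapq.heapify(heap)
    owner = [None] * len(costs)
    for idx in sorted(range(len(costs)), key=lambda i: (-costs[i], i)):
        load, rank = heapq.heappop(heap)
        owner[idx] = rank
        heapq.heappush(heap, (load + costs[idx], rank))
    return owner


# Hypothetical per-chunk cost estimates, as they might be read from an index.
costs = [5.0, 3.0, 8.0, 2.0, 4.0]
# Every task runs the same calculation and reaches the same answer,
# so no communication is needed to agree on who owns which chunk.
owners_by_task = [balance(costs, 2) for _ in range(2)]
print(owners_by_task[0])
```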


### Multi-Level Parallelism

In its original implementation of parallelism, yt utilized a single, global MPI communicator (`MPI_COMM_WORLD`).
@@ -31,7 +30,7 @@ For example, when conducting halo finding and analysis (see @sec:halo_finding) y
This takes place by specifying a task size at the top level (or allowing yt's internal heuristics to determine it) and then distributing work to sub-communicators, each of which is then used for decomposition inside that top-level task.
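
A rough sketch of the grouping step, assuming a fixed task size: ranks are partitioned into groups in the same way that the `color` argument of `MPI_Comm_split` partitions a communicator, and top-level work items are then dealt out to the groups. The names `split_ranks`, `work_items`, and the halo labels are hypothetical, and yt's internal heuristics for choosing the task size are not modeled.

```python
def split_ranks(world_size, task_size):
    """Partition global ranks into groups of `task_size` ranks each."""
    groups = {}
    for rank in range(world_size):
        color = rank // task_size  # ranks sharing a color share a sub-communicator
        groups.setdefault(color, []).append(rank)
    return groups


groups = split_ranks(world_size=8, task_size=4)

# Distribute top-level work items across the groups; each group would then
# decompose its own items internally across its member ranks.
work_items = ["halo_0", "halo_1", "halo_2"]
work = {color: work_items[color::len(groups)] for color in groups}
print(groups)
print(work)
```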

In addition to multi-level communicators, yt utilizes OpenMP constructs exposed in Cython in several places.
This includes in the software volume rendering (see @sec:software-volume-rendering), in the pixelization operations for SPH data (see @sec:sph-analysis), calculation of gravitational binding energy (see @sec:analysis-modules) and for computing the bounding volume hierarchy for rendering finite element meshes (see @sec:unstructured-mesh).
This includes in the software volume rendering (see @sec:software-volume-rendering), in the pixelization operations for SPH data (see @sec:sph-analysis), calculation of gravitational binding energy (see @sec:analysis-modules) and for computing the bounding volume hierarchy for rendering finite element meshes (see @sec:unstructured_mesh).
In some instances, the Cython interface to OpenMP has had unpredictable performance implications; owing to this, the usage of OpenMP within yt has been somewhat conservative.

### Parallelism Interfaces
@@ -42,7 +41,7 @@ This parallelism is instrumented through the use of the yt "chunking" interface,
The high-level interface to the `DerivedQuantity` subclasses computes the data chunks in the source data object and then assigns these to individual MPI tasks in the current top-level communicator.
Each initializes storage space for the intermediate values, iterates over its assigned chunks and constructs intermediate reductions, and then the finalization step involves broadcasting the values to all other tasks and completing the final set of operations.
For projections, the procedure is very similar; those datasets with an index duplicated across MPI tasks (such as patch-based grid datasets) are collapsed along a dimension and each MPI task fills in the values, which are then reduced through a broadcast operation.
Utilizing these operations requires *no* modifications to user-facing code other than a call to `yt.enable_parallelism()` at the start of the script.
Utilizing these operations requires _no_ modifications to user-facing code other than a call to `yt.enable_parallelism()` at the start of the script.

The user-facing parallel constructs allow for somewhat greater flexibility in defining parallel task decomposition.
Many objects in yt, particularly those such as the `DatasetSeries` object, have constituent data objects on which analysis can be conducted.
@@ -72,8 +71,6 @@ For many types of data analysis, particularly those operations conducted across

### Performance of Operations



### Inline Analysis

It is possible to instrument a simulation code to call Python routines inline during its execution.
@@ -84,9 +81,8 @@ In these cases, `yt` did not pass around datasets between MPI tasks, but rather
Within Enzo, all of the communication between Python and C++ was managed through Enzo's usage of the C API.
This required some knowledge of how Python conducts garbage collection, and required ensuring that reference counting was managed correctly to avoid memory leaks.

This non-standardized approach to conducting *in situ* visualization led to the creation and development of the library `libyt` which serves as an intermediary layer between simulation codes and `yt` (and Python in general.)
This non-standardized approach to conducting _in situ_ visualization led to the creation and development of the library `libyt` which serves as an intermediary layer between simulation codes and `yt` (and Python in general.)
This library encapsulates all Python API calls, manages references, and provides a systematic method for providing data pointers to Python.
`libyt` provides a stable C-based API, and is accessible from numerous different languages.
It also provides a custom-built `yt` frontend for accepting data.
A more complete description is outside the scope of this paper, and we refer the reader to (**MJT: cite in prep manuscript**).

9 changes: 9 additions & 0 deletions content/68.future_directions.md
@@ -1 +1,10 @@
## Future Directions

- More integration with _in situ_ analysis systems like `libyt`
- Much improved optimization
- Integration with other domains besides astronomy
- Refactoring for the long term
- Static typing
- Improving visual representation of `yt` objects
- Testing infrastructure
- Integration with external libraries such as pytorch-spatial, etc
2 changes: 1 addition & 1 deletion content/70.sustainability.md
@@ -11,7 +11,7 @@ However, at the risk of belaboring a point that has been well-explored elsewhere
A tension exists, however, between support of an existing project and the support of new projects in an ecosystem.
By supporting an existing project, resources can tend to become concentrated; conversely, if a project supports a broader research agenda, that resource concentration can result in greater effort-multipliers for individuals who utilize the project.
We're aware of this tension in yt; in fact, while yt has been grant-supported, most of the grant development has gone to a very small number of groups.
This grant funding has been provided through the National Science Foundation, the Gordon and Betty Moore Foundation, the Department of Energy, the Chan Zuckerberg Initiative and other sources. [@doi:10.6084/m9.figshare.2061465.v1, [@doi:10.6084/m9.figshare.909413.v1], [@doi:10.5281/zenodo.4158589].
This grant funding has been provided through the National Science Foundation, the Gordon and Betty Moore Foundation, the Department of Energy, the Chan Zuckerberg Initiative and other sources. [@doi:10.6084/m9.figshare.2061465.v1], [@doi:10.6084/m9.figshare.909413.v1], [@doi:10.5281/zenodo.4158589].
Grants have supported the development of new features, including specific functionality for analysis routines and support for non-astronomical domains.

Each of these grants has included explicit support for community building, in the form of documentation, videos, and tutorials, as well as mentoring of new contributors and shepherding the growth of the project through code review and issue management.
32 changes: 32 additions & 0 deletions content/addl_authors.yaml
@@ -8,6 +8,7 @@
initials: DN
name: Desika Narayanan
orcid: 0000-0002-7064-4309
corresponding: false
- affiliations:
- Outer Loop LLC
email: samskillman@gmail.com
@@ -16,6 +17,7 @@
initials: SWS
name: Samuel W. Skillman
orcid: 0000-0002-7626-522X
corresponding: false
- affiliations:
- Lawrence Berkeley National Laboratory
email: axelhuebl@lbl.gov
@@ -24,6 +26,7 @@
initials: AH
name: Axel Huebl
orcid: 0000-0003-1943-7141
corresponding: false
- affiliations:
- Oak Ridge National Laboratory
email: veb@ornl.gov
@@ -32,6 +35,7 @@
initials: EB
name: Elliott Biondo
orcid: 0000-0002-9088-1360
corresponding: false
- affiliations:
- University of California, Davis
email: awetzel@ucdavis.edu
@@ -40,6 +44,7 @@
initials: AW
name: Andrew Wetzel
orcid: 0000-0003-0603-8942
corresponding: false
- affiliations:
- UC Santa Cruz
email: cjstrawn@ucsc.edu
@@ -48,6 +53,7 @@
initials: CS
name: Clayton Strawn
orcid: 0000-0001-9695-4017
corresponding: false
- affiliations:
- Idaho National Laboratory
email: alexander.lindsay@inl.gov
@@ -56,6 +62,7 @@
initials: AL
name: Alexander Lindsay
orcid: 0000-0002-6988-2123
corresponding: false
- affiliations:
- Free Agent
email: gabriel.altay@gmail.com
@@ -64,6 +71,7 @@
initials: GA
name: Gabriel Altay
orcid: 0000-0002-4120-2907
corresponding: false
- affiliations:
- Center for Astrophysics - Harvard & Smithsonian
- University of Miami
@@ -73,6 +81,7 @@
initials: ETL
name: Erwin T. Lau
orcid: 0000-0001-8914-8885
corresponding: false
- affiliations:
- Center for Astrophysics - Harvard & Smithsonian
email: aaron.smith@cfa.harvard.edu
@@ -81,6 +90,7 @@
initials: AS
name: Aaron Smith
orcid: 0000-0002-2838-9033
corresponding: false
- affiliations:
- Center for Theoretical Physics, Seoul National University
email: mornkr@snu.ac.kr
@@ -89,6 +99,7 @@
initials: JK
name: Ji-hoon Kim
orcid: 0000-0003-4464-1160
corresponding: false
- affiliations:
- Institute of Astrophysics, National Taiwan University, Taipei 10617, Taiwan
- Physics Division, National Center for Theoretical Sciences, Taipei 10617, Taiwan
@@ -98,6 +109,7 @@
initials: HS
name: Hsi-Yu Schive
orcid: 0000-0002-1249-279X
corresponding: false
- affiliations:
- Indian Institute of Technology Kharagpur
email: navaneeths1998@gmail.com
@@ -106,6 +118,7 @@
initials: NS
name: Navaneeth S
orcid: 0009-0007-6922-0369
corresponding: false
- affiliations:
- Michigan State University
email: oshea@msu.edu
@@ -114,6 +127,7 @@
initials: BWO
name: Brian W. O'Shea
orcid: 0000-0002-2786-0348
corresponding: false
- affiliations:
- Kavli Institute for Particle Astrophysics and Cosmology, Stanford University
email: tabel@stanford.edu
@@ -122,6 +136,7 @@
initials: TA
name: Tom Abel
orcid: 0000-0002-5969-1251
corresponding: false
- affiliations:
- Birla Institute of Technology and Science, Pilani, Sancoale, Goa 403726, India
email: yashgondhalekar567@gmail.com
@@ -130,13 +145,15 @@
initials: YG
name: Yash Gondhalekar
orcid: 0000-0002-6646-4225
corresponding: false
- affiliations: []
email: graywilliamj@gmail.com
funders: []
github: ''
initials: WJG
name: William J Gray
orcid: 0000-0001-9014-3125
corresponding: false
- affiliations:
- Fermilab
email: gnedin@fnal.gov
@@ -145,6 +162,7 @@
initials: NYG
name: Nickolay Y. Gnedin
orcid: 0000-0001-5925-4580
corresponding: false
- affiliations:
- Institute of theoretical physics, Chinese Academy of Science
email: cristian.joana@itp.ac.cn
@@ -153,6 +171,7 @@
initials: CJ
name: Cristian Joana
orcid: 0000-0003-4642-3028
corresponding: false
- affiliations:
- 'University of Central Lancashire '
email: bthompson2090@gmail.com
@@ -161,6 +180,7 @@
initials: BBT
name: 'Benjamin B Thompson '
orcid: 0000-0003-4383-9183
corresponding: false
- affiliations:
- University of North Texas
email: yuan.astro@gmail.com
@@ -169,6 +189,7 @@
initials: YL
name: Yuan Li
orcid: 0000-0001-5262-6150
corresponding: false
- affiliations:
- Max Planck Institute for Astrophysics
email: rjfarber@umich.edu
@@ -177,6 +198,7 @@
initials: RJF
name: Ryan Jeffrey Farber
orcid: 0000-0002-0649-9055
corresponding: false
- affiliations:
- Los Alamos National Laboratory
email: jonahm@lanl.gov
@@ -185,6 +207,7 @@
initials: JMM
name: Jonah M Miller
orcid: 0000-0001-6432-7860
corresponding: false
- affiliations:
- Penn State University
email: mzr55@psu.edu
@@ -193,6 +216,7 @@
initials: MR
name: Michael Ryan
orcid: 0000-0002-0378-5195
corresponding: false
- affiliations:
- Michigan State University
email: dsilvia@msu.edu
@@ -201,6 +225,7 @@
initials: DWS
name: Devin W. Silvia
orcid: 0000-0002-4109-9313
corresponding: false
- affiliations:
- Argonne National Laboratory
email: rjackson@anl.gov
@@ -209,6 +234,7 @@
initials: RJ
name: Robert Jackson
orcid: 0000-0003-2518-1234
corresponding: false
- affiliations:
- none
email: karraki@gmail.com
@@ -217,6 +243,7 @@
initials: KA
name: Kenz Arraki
orcid: 0000-0002-3012-1167
corresponding: false
- affiliations:
- Indian Institute of Science, Bangalore, India
email: alankardutta@iisc.ac.in
@@ -225,6 +252,7 @@
initials: AD
name: Alankar Dutta
orcid: 0000-0002-9287-4033
corresponding: false
- affiliations:
- 'Indian Institute of Science'
email: ritalighosh@iisc.ac.in
@@ -233,6 +261,7 @@
initials: RG
name: Ritali Ghosh
orcid: 0000-0001-8643-7104
corresponding: false
- affiliations:
- Shanghai Astronomical Observatory, Chinese Academy of Sciences
- School of Astronomy and Space Sciences, University of Chinese Academy of Sciences
@@ -242,6 +271,7 @@
initials: SX
name: Shaokun Xie
orcid: 0000-0001-5624-6008
corresponding: false
- affiliations:
- Posit PBC
email: tkteal@gmail.com
@@ -250,6 +280,7 @@
initial: TKT
name: Tracy K. Teal
orcid: 0000-0002-9180-9598
corresponding: false
- affiliations:
- University of California San Diego, San Diego Supercomputer Center
email: rpwagner@ucsd.edu
@@ -258,3 +289,4 @@
initials: RPW
name: Rick Wagner
orcid: 0000-0003-1291-5876
corresponding: false
28 changes: 14 additions & 14 deletions content/frontends.yaml
@@ -301,20 +301,20 @@ frontends:
- octree
name: ramses
usage_citations:
- '@doi:10.1051/0004-6361/202243170oi'
- '@doi:10.1051/0004-6361/202037698oi'
- '@doi:10.1051/0004-6361/201936188oi'
- '@doi:10.1051/0004-6361/201935504oi'
- '@doi:10.1093/mnras/sty2859oi'
- '@doi:10.1093/mnras/sty024oi'
- '@doi:10.3847/1538-4357/aa989aoi'
- '@doi:10.1093/mnras/stx1706oi'
- '@doi:10.3847/1538-4357/833/2/202oi'
- '@doi:10.3847/0004-637X/826/1/22oi'
- '@doi:10.1088/0004-637X/807/1/67oi'
- '@doi:10.1088/0067-0049/210/1/14oi'
- '@doi:10.1093/mnras/stt1789oi'
- '@doi:10.1088/0004-637X/765/1/39oi'
- '@doi:10.1051/0004-6361/202243170'
- '@doi:10.1051/0004-6361/202037698'
- '@doi:10.1051/0004-6361/201936188'
- '@doi:10.1051/0004-6361/201935504'
- '@doi:10.1093/mnras/sty2859'
- '@doi:10.1093/mnras/sty024'
- '@doi:10.3847/1538-4357/aa989a'
- '@doi:10.1093/mnras/stx1706'
- '@doi:10.3847/1538-4357/833/2/202'
- '@doi:10.3847/0004-637X/826/1/22'
- '@doi:10.1088/0004-637X/807/1/67'
- '@doi:10.1088/0067-0049/210/1/14'
- '@doi:10.1093/mnras/stt1789'
- '@doi:10.1088/0004-637X/765/1/39'
- '@doi:10.3847/1538-4357/aa6dff'
- '@doi:10.48550/arXiv.2206.11913'
- '@doi:10.1051/0004-6361/201834496'