A new data interface
The following page hashes out a new interface for accessing data with Glue. The idea is to enable Glue to visualize data which are inconvenient or infeasible to load into memory.
The interface described below is pseudocode, and not yet an actual API. It describes the functionality that Glue requires from a data backend.
data.components
: Lists the attributes that exist in the data. Each item is a ComponentID.
data.aggregate(one_or_more_component_id, bin_edges, reducer)
: Generalized histogram/heatmap interface. To make a histogram of a particular component, call data.aggregate(component_id, [1, 2, 3]). This counts the number of records where component_id falls in each of the ranges [1, 2] and [2, 3]. Likewise, passing two ComponentIDs and two sets of bin edges would compute a heat map. Optionally, a reducer function could compute quantities such as the average of another component within each bin.
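As a rough illustration of the one-component case, the sketch below uses plain Python lists. The function name, the `other` parameter, and the bin-edge convention (half-open bins, with the last bin closed) are assumptions made for this example, not part of the proposed interface.

```python
from bisect import bisect_right
from statistics import mean


def aggregate(values, bin_edges, other=None, reducer=None):
    """Hypothetical stand-in for data.aggregate on a single component.

    Without a reducer, returns the number of records per bin. With a reducer,
    applies it to the `other` values of the records that fall in each bin.
    """
    nbins = len(bin_edges) - 1
    groups = [[] for _ in range(nbins)]
    for i, v in enumerate(values):
        # Skip records outside the binned range (this also skips NaN).
        if not (bin_edges[0] <= v <= bin_edges[-1]):
            continue
        # Assumed convention: bins are half-open [lo, hi), except the last,
        # which also includes its upper edge.
        idx = min(bisect_right(bin_edges, v) - 1, nbins - 1)
        groups[idx].append(v if other is None else other[i])
    if reducer is None:
        return [len(g) for g in groups]
    # Empty bins have nothing to reduce, so report None for them.
    return [reducer(g) if g else None for g in groups]


x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
y = [10, 20, 30, 40, 50, 60, 70]

counts = aggregate(x, [1, 2, 3])                         # histogram of x
mean_y = aggregate(x, [1, 2, 3], other=y, reducer=mean)  # mean of y per x bin
```

A real backend would likely push this work into the data store instead of iterating over records in Python; the point here is only the shape of the inputs and outputs.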
data.slice(component_id, slice_info)
: Computes a fixed-resolution, cropped 2D slice through a dataset with two or more dimensions. slice_info contains information about the orientation of the slice, the view limits, and the resolution. Similar to the concept of a fixed resolution buffer in yt.
data.slab(component_id, slab_info, aggregation_func)
: Similar to slice, but with a range of values on the dimensions perpendicular to the slice. The slab is collapsed using aggregation_func (e.g., a max projection, sum projection, ...) to produce a slice.
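To make the relationship between slice and slab concrete, here is a minimal sketch for the axis-aligned case only. It assumes the data is a nested list indexed as cube[z][y][x], that view limits are given in pixel coordinates inside the cube, and that sampling is nearest-neighbor. The function signature is hypothetical, and arbitrary slice orientations are not handled.

```python
def slab(cube, z_range, aggregation_func, xlim, ylim, shape):
    """Collapse planes z_range[0] <= z < z_range[1] into a fixed-size image.

    `shape` is (rows, columns) of the output; `xlim` and `ylim` are the
    view limits. All names here are assumptions made for this sketch.
    """
    z_lo, z_hi = z_range
    ny, nx = shape
    image = []
    for j in range(ny):
        # Map the center of output row j to the nearest input row.
        y = int(ylim[0] + (j + 0.5) * (ylim[1] - ylim[0]) / ny)
        row = []
        for i in range(nx):
            x = int(xlim[0] + (i + 0.5) * (xlim[1] - xlim[0]) / nx)
            # Collapse the perpendicular dimension with the given function.
            row.append(aggregation_func(cube[z][y][x] for z in range(z_lo, z_hi)))
        image.append(row)
    return image


# A small 3 x 4 x 4 cube whose values encode their own coordinates.
cube = [[[100 * z + 10 * y + x for x in range(4)] for y in range(4)]
        for z in range(3)]

# Max projection over all three planes, rendered at 2 x 2 resolution.
projection = slab(cube, (0, 3), max, xlim=(0, 4), ylim=(0, 4), shape=(2, 2))

# A slice is the special case of a slab that is one plane thick.
single_plane = slab(cube, (1, 2), max, xlim=(0, 4), ylim=(0, 4), shape=(2, 2))
```

Because the output size is fixed by `shape`, the cost of rendering depends on the view resolution and slab thickness, not on the full extent of the dataset.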
data.stats(component_id)
: Summary statistics for a particular component, including min, max, median, standard deviation, mean, 5th and 95th percentiles, and the number of finite records.
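A minimal in-memory sketch of these statistics, using only the Python standard library, might look like the following. The dictionary keys, the choice of population standard deviation, and the inclusive percentile method are assumptions for illustration; the page does not specify them.

```python
import math
import statistics


def stats(values):
    """Hypothetical stand-in for data.stats on one component."""
    # Ignore NaN and infinite values for every statistic.
    finite = [v for v in values if math.isfinite(v)]
    # With n=20, the first and last cut points are the 5th and 95th percentiles.
    cuts = statistics.quantiles(finite, n=20, method="inclusive")
    return {
        "min": min(finite),
        "max": max(finite),
        "median": statistics.median(finite),
        "mean": statistics.fmean(finite),
        "std": statistics.pstdev(finite),  # population standard deviation
        "p5": cuts[0],
        "p95": cuts[-1],
        "n_finite": len(finite),
    }


summary = stats([1.0, 2.0, 3.0, 4.0, 5.0, float("nan"), float("inf")])
```

Exact medians and percentiles require seeing all of the data, so a backend for very large datasets might return approximate values for those entries.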
Only a subset of these features is needed to do everything Glue does at the moment:
- the reducer function for aggregate isn't needed
- slab isn't needed
Subsets would also provide the aggregate and stats functions.
The complexity of a subset determines how difficult it is to perform aggregations and stats in real time. This list enumerates the various ways subsets can be defined, in increasing order of complexity:
- A simple inequality on one component: x > 5
- Boolean combinations of the above: (x > 5) & (y < 10)
- 2D polygon constraint on 2 components
- Boolean combinations of the above
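The subset kinds listed above can be sketched as composable predicates over records. Everything in this example (the helper names, the dict-per-record layout, and the ray-casting polygon test) is an assumption chosen for illustration, not a description of how Glue represents subsets.

```python
def greater_than(name, threshold):
    """Simple inequality on one component."""
    return lambda rec: rec[name] > threshold


def less_than(name, threshold):
    return lambda rec: rec[name] < threshold


def both(a, b):
    """Boolean AND of two subset definitions."""
    return lambda rec: a(rec) and b(rec)


def in_polygon(xname, yname, vertices):
    """2D polygon constraint on two components, via ray casting."""
    def test(rec):
        x, y = rec[xname], rec[yname]
        inside = False
        n = len(vertices)
        for k in range(n):
            (x1, y1), (x2, y2) = vertices[k], vertices[(k + 1) % n]
            # Does this edge cross the horizontal ray going right from (x, y)?
            if (y1 > y) != (y2 > y):
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside
    return test


records = [
    {"x": 6, "y": 3},
    {"x": 6, "y": 12},
    {"x": 2, "y": 3},
    {"x": 8, "y": 8},
]

combined = both(greater_than("x", 5), less_than("y", 10))
square = in_polygon("x", "y", [(5, 0), (10, 0), (10, 5), (5, 5)])

in_combined = [i for i, rec in enumerate(records) if combined(rec)]
in_square = [i for i, rec in enumerate(records) if square(rec)]
```

The ordering by complexity shows up here too: an inequality needs one comparison per record, while the polygon test needs work proportional to the number of vertices and is harder to push down into an index or a database query.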
Note also that some components in a dataset are derived fields, created by passing one or more other components through an arbitrary transformation function.
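One way to picture a derived field is as a function plus the names of its parent components, evaluated only when values are requested. The class below is a hypothetical sketch over a dict of lists, not Glue's actual component classes.

```python
import math


class DerivedComponent:
    """Hypothetical derived field: a function applied to parent components."""

    def __init__(self, func, *parents):
        self.func = func
        self.parents = parents

    def values(self, data):
        # Evaluate lazily, one record at a time, from the parent columns.
        columns = [data[name] for name in self.parents]
        return [self.func(*args) for args in zip(*columns)]


data = {"x": [3.0, 6.0], "y": [4.0, 8.0]}
radius = DerivedComponent(math.hypot, "x", "y")
radius_values = radius.values(data)
```

Because the transformation is arbitrary, a backend generally cannot answer aggregate or stats queries on a derived field without evaluating the function over the parent data.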
For expensive operations, you can imagine yielding a sequence of increasingly-accurate results. Glue could render the "first impression" of a histogram immediately, and then improve the rendering over time as long as the user wants to wait.
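The idea of increasingly accurate results maps naturally onto a generator. The sketch below processes the data in chunks and yields the running histogram after each one; the function name, the chunking strategy, and the (fraction, counts) return shape are assumptions for illustration.

```python
from bisect import bisect_right


def progressive_histogram(values, bin_edges, chunk_size):
    """Yield (fraction_processed, counts_so_far) after each chunk."""
    nbins = len(bin_edges) - 1
    counts = [0] * nbins
    total = len(values)
    for start in range(0, total, chunk_size):
        for v in values[start:start + chunk_size]:
            # Count only records inside the binned range (this skips NaN).
            if bin_edges[0] <= v <= bin_edges[-1]:
                counts[min(bisect_right(bin_edges, v) - 1, nbins - 1)] += 1
        seen = min(start + chunk_size, total)
        # Yield a copy so earlier results are not mutated by later chunks.
        yield seen / total, list(counts)


values = [1.5, 2.5, 1.2, 2.9, 1.1, 2.2]
steps = list(progressive_histogram(values, [1, 2, 3], chunk_size=2))
```

A viewer could redraw after each yielded result and stop iterating as soon as the user changes the view. For early results to be representative, the chunks would need to be drawn in an order that does not correlate with the values, for example from shuffled records.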