Collapsing UniformNdMappings of Dataset types #1417

Closed
philippjfr opened this issue May 7, 2017 · 0 comments

UniformNdMapping types (such as NdOverlay, HoloMap, NdLayout and GridSpace) wrap one or more Elements, adding outer indices to the data. This means that, at least in theory, they can always be reduced to a single Element whose dimensions are the union of the container's and the element's dimensions. The current API for doing this is the table method. As a very simple example, let's take three curves and collapse them using the table method:

import holoviews as hv
hv.HoloMap({chr(65 + i): hv.Curve(range(3)) for i in range(3)}, kdims=['Groups']).table().data
	Groups	x	y
0	A	0	0
1	A	1	1
2	A	2	2
0	B	0	0
1	B	1	1
2	B	2	2
0	C	0	0
1	C	1	1
2	C	2	2

Using the table method to collapse the data in this way can be useful, but since a Table is usually columnar it does not make much sense for gridded data. We also have a collapse method on HoloMap, which first combines the data by generating a table like the one above and then applies some aggregation. This approach is highly inefficient when dealing with gridded data, and it is also incorrect, because once an image has been converted to tabular format it can't easily be converted back.
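To make the cost concrete, here is a minimal sketch in plain Python (no HoloViews involved) of what flattening a small grid into rows looks like. The helper `grid_to_rows` and the variable names are made up for illustration and are not part of any library:

```python
def grid_to_rows(xs, ys, values):
    """Flatten a 2D grid into one (x, y, z) row per cell.

    `values` is indexed as values[row][column], i.e. its shape is
    (len(ys), len(xs)). Hypothetical helper, for illustration only.
    """
    return [(x, y, values[j][i])
            for j, y in enumerate(ys)
            for i, x in enumerate(xs)]


xs = [0.0, 1.0, 2.0]
ys = [10.0, 20.0]
values = [[1, 2, 3],
          [4, 5, 6]]

rows = grid_to_rows(xs, ys, values)

# The gridded form stores each coordinate once, plus one value per cell.
gridded_numbers = len(xs) + len(ys) + len(xs) * len(ys)
# The tabular form repeats both coordinates on every row.
tabular_numbers = 3 * len(rows)
```

The table also no longer records that the samples lie on a regular grid; that has to be rediscovered from the coordinate columns, which is why converting back is awkward.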

Therefore my proposal is that we provide an API that allows combining both tabular and gridded datasets correctly, without always converting to a tabular format. In practical terms this just means that we implement Interface.concat methods for the gridded interfaces and then replace the table method with a more general .to_dataset or similar. This will allow collapsing a HoloMap/GridSpace/NdOverlay of Images, Rasters, etc. into an n-dimensional cube. Once that's implemented, HoloMap.collapse will just work for gridded datasets again.
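As a rough sketch of the idea, and not of how Interface.concat would actually be implemented, the following plain-Python example stacks several 2D grids along a new outer dimension and then aggregates over that dimension without ever going through a table. The dictionary layout and the names `concat_grids` and `collapse_outer` are assumptions made for this illustration:

```python
def concat_grids(grids, dim):
    """Stack same-shaped grids along a new outer dimension `dim`.

    `grids` maps a key to {"dims": [...], "coords": {...}, "values": nested list}.
    Hypothetical layout, chosen only for this sketch.
    """
    keys = sorted(grids)
    first = grids[keys[0]]
    for key in keys[1:]:
        if grids[key]["coords"] != first["coords"]:
            raise ValueError("all grids must share the same coordinates")
    coords = {dim: keys}
    coords.update(first["coords"])
    return {
        "dims": [dim] + first["dims"],
        "coords": coords,
        "values": [grids[key]["values"] for key in keys],
    }


def collapse_outer(cube, function):
    """Aggregate a cube over its outermost dimension, cell by cell."""
    def reduce(slices):
        # Recurse until the slices hold scalars, then aggregate them.
        if isinstance(slices[0], list):
            return [reduce(list(group)) for group in zip(*slices)]
        return function(slices)

    outer = cube["dims"][0]
    return {
        "dims": cube["dims"][1:],
        "coords": {k: v for k, v in cube["coords"].items() if k != outer},
        "values": reduce(cube["values"]),
    }


def image(offset):
    """Build a tiny 2x2 'image' whose values start at `offset`."""
    return {
        "dims": ["y", "x"],
        "coords": {"y": [0, 1], "x": [0, 1]},
        "values": [[offset, offset + 1], [offset + 2, offset + 3]],
    }


images = {t: image(10 * t) for t in range(3)}
cube = concat_grids(images, "time")
mean_image = collapse_outer(cube, lambda v: sum(v) / len(v))
```

The result of the aggregation is still a grid with the original `y` and `x` coordinates, which is the property the table-based collapse loses.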

Another way of looking at this is as the reverse of a .groupby or .to conversion: a multi-dimensional dataset can be expanded into multiple elements in a container type, and .to_dataset would do the reverse and collapse them back down into a single Dataset.
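The inverse relationship can be illustrated with plain Python rows. This is only a sketch of the concept: the two functions below borrow the names `groupby` and `to_dataset` for readability, but they are hypothetical stand-ins and not the HoloViews methods:

```python
def groupby(rows, dim):
    """Split rows into groups keyed by `dim`, dropping `dim` from each row."""
    groups = {}
    for row in rows:
        rest = {k: v for k, v in row.items() if k != dim}
        groups.setdefault(row[dim], []).append(rest)
    return groups


def to_dataset(groups, dim):
    """Reverse of groupby: put the group key back onto every row."""
    return [{dim: key, **row} for key, rows in groups.items() for row in rows]


# Same data as the three-curve example: y equals x for each group.
rows = [{"Groups": g, "x": x, "y": x} for g in "ABC" for x in range(3)]

groups = groupby(rows, "Groups")
restored = to_dataset(groups, "Groups")
```

Here `restored` equals `rows` because the input was already ordered by group; in general the round trip preserves the content but not necessarily the row order.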

I believe this could also be leveraged for an efficient storage protocol: on serialization, complex containers could be collapsed down into a monolithic datastore representing a large table or multi-dimensional array. The collapsed data can then be stored along with a spec (recall my prototype for expressing .to specifications) to recreate the complex container on deserialization. That way a serialization tool could take advantage of large datastores such as a database, PyTables or NetCDF by collapsing the data down to a single data structure which can be stored efficiently.
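A toy version of such a protocol might look like the sketch below. It uses JSON as a stand-in for a real datastore, and the spec format (`kdims` plus `columns`) is invented for this example; it is not the .to specification prototype mentioned above:

```python
import json


def serialize(container, kdim, vdims):
    """Collapse a {key: [samples]} container into one table plus a spec."""
    data = [[key] + list(sample)
            for key, samples in container.items()
            for sample in samples]
    # Hypothetical spec: which column holds the container key.
    spec = {"kdims": [kdim], "columns": [kdim] + vdims}
    return json.dumps({"spec": spec, "data": data})


def deserialize(payload):
    """Rebuild the container from the stored table, guided by the spec."""
    stored = json.loads(payload)
    spec = stored["spec"]
    key_index = spec["columns"].index(spec["kdims"][0])
    container = {}
    for row in stored["data"]:
        key = row[key_index]
        sample = row[:key_index] + row[key_index + 1:]
        container.setdefault(key, []).append(sample)
    return container


# Three "curves", each a list of [x, y] samples.
container = {name: [[x, x] for x in range(3)] for name in "ABC"}

payload = serialize(container, "Groups", ["x", "y"])
restored = deserialize(payload)
```

The point of the design is that only one flat structure is written, so the storage backend never needs to know about the container type; the spec alone carries what is needed to split the data up again.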
