Add the ability to view/load single datasets in a data collection #554
Comments
Some more thoughts on this: I'm currently iterating over multiple datasets to run some analysis, and I ran into precisely the problem described above. I need one for loop over the "normal" datasets and another over all the datasets from the hyperbard collection. If we add another collection, I'd need one more loop. It's not very practical, and my code feels unnecessarily complicated. Also, when we look at the datasets table, no stats appear for hyperbard (because it's a collection, of course). I'm thinking: what if each dataset had its own separate record in Zenodo? Then, because we're starting to have many datasets, it would be useful to have a way of sorting/filtering them by stats/collection/category/... in the above table, so that one can quickly find what they're looking for. What do you think?
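To make the two-loop problem concrete, here is a minimal sketch of a helper that walks standalone datasets and collection members in a single loop. It assumes only what is said in this thread, namely that loading a collection gives a dict of datasets. The helper name `iter_datasets`, the `collection/member` label format, and the placeholder strings standing in for loaded hypergraphs are all made up for illustration; nothing here is part of XGI.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_datasets(loaded: Dict[str, Any]) -> Iterator[Tuple[str, Any]]:
    """Yield (label, dataset) pairs, descending one level into collections.

    Hypothetical helper: a value that is a dict is treated as a collection,
    anything else as a single dataset.
    """
    for name, item in loaded.items():
        if isinstance(item, dict):
            # A collection: yield each member with a "collection/member" label.
            for member, dataset in item.items():
                yield f"{name}/{member}", dataset
        else:
            yield name, item


# Placeholder strings stand in for loaded hypergraphs.
loaded = {
    "dataset1": "H-a",
    "hyperbard": {"coriolanus": "H-b"},
    "dataset2": "H-c",
}

labels = [label for label, _ in iter_datasets(loaded)]
```

With a helper like this, adding another collection would not require another loop in the analysis code, though it still hides the collection structure from the datasets table.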
This could be a great solution to this problem! I think it could be quite tedious to give a lot of datasets their own pages in Zenodo, though. So could we modify this to allow more than one dataset per record? Then each dataset in the collection would have the same DOI, but we would still treat them as individual datasets, like you're suggesting. In addition, how do you picture us efficiently accessing all the datasets in each collection? One thing that comes to mind is specifying collections in index.json. Regarding the table, I 100% agree. What I originally had in mind was grouping the datasets of a collection in an expandable section of the table, but I like the idea of searching the datasets more flexibly.
Okay, thinking out loud about how we could make this work in practice. Right now, the hyperbard collection has a single Zenodo record that contains one .json file per dataset in the collection. This sounds like a minimal change that we can already make and see if we like it. (The only small downside I see: if we need to make a correction to a single dataset, we need a new version for the whole collection.) So in practice we could now:
Did I miss anything?
Second step would be to make the table more flexible for searching, adding more attributes in index.json, and maybe add filtering capabilities to the |
Okay, I've had time to think more about this. What about this: we make collections top-level items and list the datasets contained in each collection under that item, with paths relative to the collection URL. Then we can identify a single dataset in a collection with a (collection, dataset) tuple, and xgi.load_xgi_data() would return:

Datasets:
    dataset1
    dataset2
    ...
Collections:
    collection1
        ("collection1", "dataset1")
        ("collection1", "dataset2")
    collection2
        ("collection2", "dataset1")
        ("collection2", "dataset2")
    ...

I don't think that this is a perfect solution, but it would be nice to preserve the connection between the collection and its constituent datasets.
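As a rough sketch of what this proposal could look like, the snippet below models an index where collections are top-level items whose members are stored as paths relative to the collection URL. The schema (the `datasets`, `collections`, and `url` keys), the `example.org` URLs, and the helpers `list_keys` and `resolve_url` are assumptions for illustration only; the actual layout of index.json is not shown in this thread.

```python
from typing import Dict, List, Tuple, Union
from urllib.parse import urljoin

Key = Union[str, Tuple[str, str]]

# Hypothetical index layout; the real index.json may look different.
index: Dict[str, dict] = {
    "datasets": {
        "dataset1": {"url": "https://example.org/dataset1.json"},
        "dataset2": {"url": "https://example.org/dataset2.json"},
    },
    "collections": {
        "collection1": {
            # Trailing slash so that relative paths resolve inside the collection.
            "url": "https://example.org/collection1/",
            "datasets": {
                "dataset1": "dataset1.json",
                "dataset2": "dataset2.json",
            },
        },
    },
}


def list_keys(index: Dict[str, dict]) -> List[Key]:
    """Return plain names for single datasets and tuples for collection members."""
    keys: List[Key] = list(index["datasets"])
    for cname, collection in index["collections"].items():
        keys.extend((cname, dname) for dname in collection["datasets"])
    return keys


def resolve_url(index: Dict[str, dict], key: Key) -> str:
    """Map a dataset key to its URL, joining relative paths for collection members."""
    if isinstance(key, tuple):
        cname, dname = key
        collection = index["collections"][cname]
        return urljoin(collection["url"], collection["datasets"][dname])
    return index["datasets"][key]["url"]


keys = list_keys(index)
```

Because every key, plain or tuple, resolves through the same function, a single loop over `list_keys(index)` would cover all datasets while the tuple still records which collection a dataset belongs to.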
From #540:
"xgi.load_xgi_data("hyperbard") loads a dict of datasets, so xgi.load_xgi_data("hyperbard")["coriolanus"] should load a single dataset, if I understand correctly... I'm wondering if we don't want to be able to access them directly with something like xgi.load_xgi_data("hyperbard-coriolanus"), to be able to iterate over all datasets in XGI-data. They could also appear in the list we get with xgi.load_xgi_data() this way?"
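One wrinkle with a flat name like "hyperbard-coriolanus" is that a standalone dataset name may itself contain a hyphen, so splitting on the first hyphen is not safe. A minimal sketch of one way around this, assuming the set of known collections and their members is available, is to match against known collection prefixes. The function `split_flat_name`, the `collections` mapping, and the name "some-standalone-dataset" are hypothetical and not part of XGI.

```python
from typing import Dict, Optional, Set, Tuple


def split_flat_name(
    name: str, collections: Dict[str, Set[str]]
) -> Tuple[Optional[str], str]:
    """Split "collection-member" into (collection, member).

    Returns (None, name) when the name does not match a known collection
    member, so hyphenated standalone names pass through unchanged.
    """
    for cname, members in collections.items():
        prefix = cname + "-"
        if name.startswith(prefix) and name[len(prefix):] in members:
            return cname, name[len(prefix):]
    return None, name


# Hypothetical registry of collections and their member names.
collections = {"hyperbard": {"coriolanus"}}

in_collection = split_flat_name("hyperbard-coriolanus", collections)
standalone = split_flat_name("some-standalone-dataset", collections)
```

This keeps the flat, iterable naming suggested above, at the cost of needing the collection registry at lookup time.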