🏛️ Better abstraction for STGraph Dataset Loaders #85
Graph Dataset Handling
The following steps can be implemented to better handle, store and access the graph datasets.
We also need to create a hidden folder named |
STGraphDataset is the base class for all other dataset classes. So far, I have created the private method `_has_dataset_cache()`.
Added STGraphStaticDataset as an abstract class for static graph datasets. It inherits from STGraphDataset.
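A minimal sketch of the class hierarchy, assuming the shared hook is an abstract `_process_dataset()` (the method name comes from this thread; its signature, the `gdata` dict usage here, and `ToyStaticDataset` are hypothetical). Because `STGraphStaticDataset` does not implement the abstract hook, Python refuses to instantiate it, which is what makes it usable as an abstract parent for the concrete static loaders.

```python
from abc import ABC, abstractmethod


class STGraphDataset(ABC):
    """Base class for every dataset loader (sketch, not the actual implementation)."""

    def __init__(self) -> None:
        self.name = ""    # dataset name, e.g. used to build the cache file name
        self.gdata = {}   # graph attributes such as node/edge counts

    @abstractmethod
    def _process_dataset(self) -> None:
        """Turn the raw downloaded data into the arrays the loader exposes."""


class STGraphStaticDataset(STGraphDataset):
    """Abstract parent for static graph datasets; still cannot be instantiated."""

    def __init__(self) -> None:
        super().__init__()
        self._edge_list = None  # hypothetical: a static graph has one fixed edge list


class ToyStaticDataset(STGraphStaticDataset):
    """Hypothetical concrete loader, used only to show the hierarchy working."""

    def __init__(self) -> None:
        super().__init__()
        self.name = "toy"
        self._process_dataset()

    def _process_dataset(self) -> None:
        # A real loader would read the cached dataset here.
        self._edge_list = [(0, 1), (1, 2)]
        self.gdata["num_edges"] = len(self._edge_list)
```

Trying `STGraphStaticDataset()` raises `TypeError`, while `ToyStaticDataset()` builds normally because it fills in the abstract hook.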
Importing Datasets
Previously we had to write the full module path of a dataset to import it. Instead of `from stgraph.dataset.static.CoraDataLoader import CoraDataLoader`, we can now write `from stgraph.dataset import CoraDataLoader`. Refer to the |
Added the `_download_dataset()`, `_save_dataset()` and `_load_dataset()` methods, which handle cached graph datasets. These three methods are common to all datasets in STGraph and are therefore implemented in the STGraphDataset base class.
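The cache flow described above can be sketched as follows. This assumes the datasets are served as JSON and that the cache folder is passed in explicitly; `CachedDatasetSketch`, `cache_dir`, `_cache_path()` and `load()` are hypothetical names, and the real hidden-folder name is not filled in here.

```python
import json
import os
import urllib.request


class CachedDatasetSketch:
    """Sketch of the shared cache helpers; JSON format and folder layout are assumptions."""

    def __init__(self, name: str, url: str, cache_dir: str) -> None:
        self.name = name
        self._url = url
        self._cache_dir = cache_dir  # hypothetical stand-in for the hidden cache folder
        self._dataset = None

    def _cache_path(self) -> str:
        return os.path.join(self._cache_dir, f"{self.name}.json")

    def _has_dataset_cache(self) -> bool:
        return os.path.isfile(self._cache_path())

    def _download_dataset(self) -> None:
        # Fetch the raw dataset over HTTP; assumes the URL serves JSON.
        with urllib.request.urlopen(self._url) as response:
            self._dataset = json.loads(response.read().decode("utf-8"))

    def _save_dataset(self) -> None:
        os.makedirs(self._cache_dir, exist_ok=True)
        with open(self._cache_path(), "w", encoding="utf-8") as f:
            json.dump(self._dataset, f)

    def _load_dataset(self) -> None:
        with open(self._cache_path(), encoding="utf-8") as f:
            self._dataset = json.load(f)

    def load(self) -> dict:
        # Only hit the network when no cached copy exists yet.
        if self._has_dataset_cache():
            self._load_dataset()
        else:
            self._download_dataset()
            self._save_dataset()
        return self._dataset
```

The point of keeping these three helpers in the base class is that every loader gets "download once, then read from disk" behaviour without repeating it.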
Added more private attributes, such as the train/test masks and splits. Implemented the logic for the `_process_dataset()` method.
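One way the train/test masks could be built is a seeded random permutation of the node indices. This is a sketch only: `make_split_masks`, `train_ratio` and `seed` are hypothetical, and Cora's actual split may be fixed by the dataset rather than random.

```python
import numpy as np


def make_split_masks(num_nodes: int, train_ratio: float, seed: int = 0):
    """Return boolean train/test masks over node indices (hypothetical helper)."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    rng = np.random.default_rng(seed)       # seeded so the split is reproducible
    order = rng.permutation(num_nodes)      # shuffled node indices
    num_train = int(num_nodes * train_ratio)
    train_mask = np.zeros(num_nodes, dtype=bool)
    train_mask[order[:num_train]] = True
    test_mask = ~train_mask                 # every node lands in exactly one split
    return train_mask, test_mask
```

Storing masks rather than index lists lets a model select nodes with `features[train_mask]` directly.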
Also documented how dataset handling works in the docstrings of the STGraphDataset class.
Started writing pytest tests for CoraDataLoader. So far, I have written the test for the `__init__()` method.
HungaryCPDataLoader is part of the temporal graph dataset group.
Added the `__init__()` method for HungaryCPDataLoader. Also added initialization of the `lags` attribute inside STGraphTemporalDataset.
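A sketch of what `lags` is typically used for in temporal loaders: each training example uses the previous `lags` timestamps of every node as features and the next timestamp as the target. This follows the common sliding-window convention (as in PyTorch Geometric Temporal's chickenpox loader) and is an assumption about STGraph; `build_lagged_windows` is a hypothetical helper.

```python
import numpy as np


def build_lagged_windows(series: np.ndarray, lags: int):
    """Split a (timestamps, nodes) array into lagged features and next-step targets."""
    num_timestamps = series.shape[0]
    if lags < 1 or lags >= num_timestamps:
        raise ValueError("lags must be in [1, num_timestamps - 1]")
    # Window t holds timestamps t .. t+lags-1, transposed to (nodes, lags).
    features = np.stack(
        [series[t : t + lags].T for t in range(num_timestamps - lags)]
    )  # shape: (num_timestamps - lags, nodes, lags)
    targets = series[lags:]  # shape: (num_timestamps - lags, nodes)
    return features, targets
```

With 6 timestamps and `lags=2`, this yields 4 windows; the first pairs each node's values at timestamps 0 and 1 with its value at timestamp 2.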
Added descriptive docstrings for HungaryCPDataLoader and made a couple of additions to the docstrings of CoraDataLoader. Added assert checks for the parameters passed to these two data loaders. Also opened issue #86 regarding the node features array of HungaryCPDataLoader.
Initialised the EnglandCovidDataLoader class
Added the various dataset pre-processing methods for EnglandCovidDataLoader.
Added descriptive documentation for EnglandCovidDataLoader, along with code to calculate and set the graph attributes in the `gdata` field. Chose better names, e.g. renamed `_get_total_timestamps()` to `_set_total_timestamps()`. Added tests for EnglandCovidDataLoader, though they are still incomplete. Added more documentation and the corresponding renames for CoraDataLoader and HungaryCPDataLoader.
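The `_set_*` naming reflects that these methods write into `gdata` instead of returning values. Below is a minimal sketch, assuming the dataset provides one edge list per timestamp and an optional cutoff; `TemporalGdataSketch`, `cutoff_time`, `_set_graph_sizes()` and the `gdata` keys are hypothetical.

```python
class TemporalGdataSketch:
    """Fills a gdata dict from per-timestamp edge lists (hypothetical layout)."""

    def __init__(self, edge_lists, num_nodes, cutoff_time=None):
        self._edge_lists = edge_lists  # one list of (src, dst) pairs per timestamp
        self._num_nodes = num_nodes
        self._cutoff_time = cutoff_time  # hypothetical: keep only the first N timestamps
        self.gdata = {}
        self._set_total_timestamps()
        self._set_graph_sizes()

    def _set_total_timestamps(self):
        total = len(self._edge_lists)
        if self._cutoff_time is not None:
            total = min(total, self._cutoff_time)
        self.gdata["total_timestamps"] = total

    def _set_graph_sizes(self):
        self.gdata["num_nodes"] = self._num_nodes
        # Edge count per timestamp, since the edge set can change over time.
        self.gdata["num_edges"] = {
            t: len(self._edge_lists[t]) for t in range(self.gdata["total_timestamps"])
        }
```

Calling the setters in `__init__` means `gdata` is always populated once the loader is constructed.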
Added the METRLADataLoader constructor and a method to find the total number of timestamps in the dataset.
The METRLADataLoader has been added with the necessary documentation.
The MontevideoBusDataLoader has been added. Instead of assert checks, we now use if conditions and raise exceptions when the user passes incorrect parameters. Also added the logic for `_set_edge_weights()`, along with documentation for the `__init__()` method.
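A sketch of the if/raise style of parameter checking. Apart from `lags`, the parameter names are hypothetical. One concrete reason to prefer this over `assert` is that Python strips assert statements when run with `python -O`, so asserts cannot be relied on to validate user input.

```python
def validate_loader_args(lags, cutoff_time=None, verbose=False) -> None:
    """Raise descriptive exceptions for bad loader arguments (hypothetical parameters)."""
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(lags, int) or isinstance(lags, bool):
        raise TypeError("lags must be an integer")
    if lags <= 0:
        raise ValueError("lags must be a positive integer")
    if cutoff_time is not None:
        if not isinstance(cutoff_time, int) or isinstance(cutoff_time, bool):
            raise TypeError("cutoff_time must be an integer or None")
        if cutoff_time <= 0:
            raise ValueError("cutoff_time must be a positive integer")
    if not isinstance(verbose, bool):
        raise TypeError("verbose must be a bool")
```

Using `TypeError` for wrong types and `ValueError` for out-of-range values also lets callers catch the two cases separately.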
Added the PedalMeDataLoader along with its docstrings. Also made some changes to the MontevideoBusDataLoader. Previously `_all_features` and `_all_targets` were lists; they are now NumPy arrays.
Added the WikiMath data loader. Also fixed the names of some previously added datasets: a few of them contained spaces, which caused the downloaded cache files to have spaces in their file names too.
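The fix here was renaming the datasets themselves. A complementary, defensive option is to derive the cache file name from the dataset name so whitespace can never leak into it. A sketch, where `cache_file_name` and the lowercase/underscore convention are hypothetical choices:

```python
import re


def cache_file_name(dataset_name: str, extension: str = "json") -> str:
    """Derive a whitespace-free cache file name from a dataset name (hypothetical helper)."""
    # Collapse any run of whitespace into a single underscore, then lowercase.
    slug = re.sub(r"\s+", "_", dataset_name.strip()).lower()
    return f"{slug}.{extension}"
```

For example, `"Montevideo Bus"` maps to `montevideo_bus.json`, so a stray space in a dataset name no longer ends up in the cached file.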
Tests for England COVID are still in progress.
Merging changes from v1.1.0 into the `dataset-abstraction` branch.
Noticed that a lot of code is being reused/repeated in the `stgraph.dataset` module. Working on improving the abstraction for the dataset layer, similar to what we did for the `stgraph.graph` module.
List of Datasets
Static
Temporal
Dynamic