🏛️ Better abstraction for STGraph Dataset Loaders #85

Merged: 35 commits merged on Jan 6, 2024


@nithinmanoj10 (Contributor) commented Oct 19, 2023

I noticed that a lot of code is being reused or repeated in the stgraph.dataset module. I am working on improving the abstraction for the dataset layer, similar to what we did for the stgraph.graph module.

List of Datasets

Static

  • Cora

Temporal

  • Hungary CP
  • METRLA
  • Montevideo Bus
  • PedalMe
  • WikiMath
  • Windmill Output

Dynamic

  • EnglandCOVID

@nithinmanoj10 (Contributor, Author)

Graph Dataset Handling

The following steps can be implemented to better handle, store and access the graph datasets.

  1. Check whether the dataset is present in a cache directory.
  2. If it is not present in the cache, download it online using its URL.
  3. Process the downloaded dataset.
  4. Save the processed dataset into the cache folder.
  5. Load the cached dataset.

We also need to create a hidden folder, ~/.stgraph, that will contain the cached dataset folder. Each of the dataset-handling steps should be an abstract method, implemented in each child class as necessary.
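The five steps above can be sketched as a single function. This is a minimal illustration, not STGraph's implementation: the `get_dataset` name, the `dataset_cache` subfolder under `~/.stgraph`, the JSON file format, and the injectable `download` callable (which lets the example run without network access) are all assumptions.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable

# Assumed cache location: a hypothetical "dataset_cache" folder inside ~/.stgraph.
DEFAULT_CACHE_DIR = Path.home() / ".stgraph" / "dataset_cache"


def _download_json(url: str) -> dict:
    """Fetch a JSON document over HTTP (step 2)."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def get_dataset(
    name: str,
    url: str,
    process: Callable[[dict], dict],
    cache_dir: Path = DEFAULT_CACHE_DIR,
    download: Callable[[str], dict] = _download_json,
) -> dict:
    """Hypothetical cache-first loader following steps 1 to 5."""
    cache_file = cache_dir / f"{name}.json"

    # Step 1: check whether the processed dataset is already cached.
    if not cache_file.exists():
        # Step 2: download the raw dataset from its URL.
        raw = download(url)
        # Step 3: process the downloaded data.
        processed = process(raw)
        # Step 4: save the processed dataset into the cache folder.
        cache_dir.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps(processed))

    # Step 5: always load from the cache, so first and later runs return the same data.
    return json.loads(cache_file.read_text())
```

Loading from the cache even right after saving keeps a single code path for the returned data, at the cost of one extra file read on the first call.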

STGraphDataset is the base class for all other dataset classes. So far, I have created the private method `_has_dataset_cache()`.
Added STGraphStaticDataset, an abstract class for static graph datasets that inherits from STGraphDataset.
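A minimal sketch of this hierarchy using Python's `abc` module. Only `STGraphDataset`, `STGraphStaticDataset`, `_has_dataset_cache()` and `_process_dataset()` come from this thread; the constructor arguments, the `<name>.json` cache-file naming and the `ToyStaticLoader` class are hypothetical.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class STGraphDataset(ABC):
    """Sketch of the common base class (constructor and file naming are assumptions)."""

    def __init__(self, name: str, cache_dir: Path) -> None:
        self.name = name
        self._cache_dir = Path(cache_dir)

    def _cache_file_path(self) -> Path:
        return self._cache_dir / f"{self.name}.json"

    def _has_dataset_cache(self) -> bool:
        # True if a cached copy of this dataset already exists on disk.
        return self._cache_file_path().is_file()

    @abstractmethod
    def _process_dataset(self) -> None:
        """Each concrete loader defines how its raw data is processed."""


class STGraphStaticDataset(STGraphDataset):
    """Intermediate class for static graph datasets such as Cora.

    It adds nothing yet and stays abstract because _process_dataset is unimplemented.
    """


class ToyStaticLoader(STGraphStaticDataset):
    """Hypothetical concrete loader used only for illustration."""

    def _process_dataset(self) -> None:
        self.num_nodes = 3
```

Because `STGraphStaticDataset` does not implement `_process_dataset()`, Python raises `TypeError` if you try to instantiate it directly, which is what keeps it abstract; only concrete loaders such as the toy class can be created.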
@nithinmanoj10 (Contributor, Author)

Importing Datasets

Previously, importing a dataset required writing the full path of the module where it was located. Now the dataset loaders are imported into the __init__.py file of the stgraph.dataset module, so instead of writing

from stgraph.dataset.static.CoraDataLoader import CoraDataLoader

we can now write

from stgraph.dataset import CoraDataLoader

Refer to the __init__.py inside stgraph.dataset to get a better idea.
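To see why the shorter import works, the script below builds a throwaway package with the same layout and a one-line re-export in `dataset/__init__.py`. The package name `demo_stgraph` and the empty class body are hypothetical stand-ins; the real `stgraph/dataset/__init__.py` may re-export more loaders.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a temporary package mirroring stgraph/dataset/static/CoraDataLoader.py.
# "demo_stgraph" is a hypothetical stand-in for the real stgraph package.
root = Path(tempfile.mkdtemp())
static_dir = root / "demo_stgraph" / "dataset" / "static"
static_dir.mkdir(parents=True)
(root / "demo_stgraph" / "__init__.py").write_text("")
(static_dir / "__init__.py").write_text("")
(static_dir / "CoraDataLoader.py").write_text("class CoraDataLoader:\n    pass\n")

# The re-export: dataset/__init__.py lifts the class up to the dataset package.
(root / "demo_stgraph" / "dataset" / "__init__.py").write_text(
    "from .static.CoraDataLoader import CoraDataLoader\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Short import path, made possible by the re-export.
from demo_stgraph.dataset import CoraDataLoader

# The long path still works and points at the same class object.
long_path_module = importlib.import_module("demo_stgraph.dataset.static.CoraDataLoader")
```

Both import styles resolve to one class object, so existing code using the long path keeps working.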

Added the _download_dataset(), _save_dataset() and _load_dataset() methods, which are used to handle cached graph datasets. These three methods are common to all datasets in STGraph and are therefore implemented in the STGraphDataset base class.
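Keeping the three caching helpers in the base class means each concrete loader only has to write its dataset-specific `_process_dataset()`. The sketch below assumes the data is a JSON document held in a `_dataset` attribute and cached as `<name>.json`; those details, the constructor signature and the `ToyLoader` class are illustrative rather than STGraph's actual code.

```python
import json
import urllib.request
from abc import ABC, abstractmethod
from pathlib import Path


class STGraphDataset(ABC):
    """Sketch: shared caching helpers live once in the base class."""

    def __init__(self, name: str, url: str, cache_dir: Path) -> None:
        self.name = name
        self._url = url
        self._cache_file = Path(cache_dir) / f"{name}.json"
        self._dataset: dict = {}

    def _download_dataset(self) -> None:
        # Shared: fetch the JSON document from the dataset URL.
        with urllib.request.urlopen(self._url) as response:
            self._dataset = json.loads(response.read().decode("utf-8"))

    def _save_dataset(self) -> None:
        # Shared: write the current dataset into the cache folder.
        self._cache_file.parent.mkdir(parents=True, exist_ok=True)
        self._cache_file.write_text(json.dumps(self._dataset))

    def _load_dataset(self) -> None:
        # Shared: read the cached copy back into memory.
        self._dataset = json.loads(self._cache_file.read_text())

    @abstractmethod
    def _process_dataset(self) -> None:
        """Dataset-specific: turn self._dataset into loader attributes."""


class ToyLoader(STGraphDataset):
    """Hypothetical loader: only the dataset-specific step is written here."""

    def _process_dataset(self) -> None:
        self.num_edges = len(self._dataset["edges"])
```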
Added more private attributes, such as the train/test masks and splits. Implemented the logic for the _process_dataset() method.
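The thread does not say how the train/test splits are produced. As one possible sketch, assuming a random node-level split controlled by a ratio and a seed, boolean masks can be built with NumPy; `make_split_masks`, `train_ratio` and `seed` are hypothetical names.

```python
import numpy as np


def make_split_masks(num_nodes: int, train_ratio: float = 0.8, seed: int = 0):
    """Return boolean train/test masks over node indices (hypothetical helper)."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")

    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)  # shuffled node indices
    num_train = int(round(num_nodes * train_ratio))

    train_mask = np.zeros(num_nodes, dtype=bool)
    train_mask[perm[:num_train]] = True
    # Every node not used for training is used for testing.
    test_mask = ~train_mask
    return train_mask, test_mask
```

Masks (rather than index lists) can be applied directly to node-feature and label arrays, e.g. `features[train_mask]`.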
Also documented how dataset handling works in the docstrings of the STGraphDataset class.
Started writing pytest tests for CoraDataLoader; so far, only the __init__ method is tested.
HungaryCPDataLoader is part of the temporal graph dataset group.
Added the __init__() method for HungaryCPDataLoader. Also added initialization of the lags variable inside STGraphTemporalDataset.
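The thread only notes that `lags` is initialized in `STGraphTemporalDataset`. In many temporal-graph loaders, `lags` is the number of past snapshots used as node features when predicting the next snapshot; the sketch below assumes that interpretation and a `(num_timestamps, num_nodes)` NumPy array of node signals. `build_lagged_samples` is a hypothetical helper.

```python
import numpy as np


def build_lagged_samples(series: np.ndarray, lags: int):
    """Turn a (T, N) node time series into lagged features and targets.

    Sample t uses snapshots t .. t + lags - 1 as features and snapshot t + lags as target.
    """
    if series.ndim != 2:
        raise ValueError("series must have shape (num_timestamps, num_nodes)")
    num_timestamps = series.shape[0]
    if not 0 < lags < num_timestamps:
        raise ValueError("lags must be between 1 and num_timestamps - 1")

    # Transpose each window so every node gets its own row of `lags` past values.
    features = np.stack(
        [series[t : t + lags].T for t in range(num_timestamps - lags)]
    )  # shape: (num_timestamps - lags, num_nodes, lags)
    targets = series[lags:]  # shape: (num_timestamps - lags, num_nodes)
    return features, targets
```

A larger `lags` gives the model more history per sample but leaves `num_timestamps - lags` samples in total.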
Added descriptive docstrings for HungaryCPDataLoader. Made a couple of additions to the docstrings of CoraDataLoader.

Added assert checks for the parameters passed to the above two dataloaders. Also opened issue #86 for HungaryCPDataLoader regarding the node features array.
Initialised the EnglandCovidDataLoader class.
Added the various dataset pre-processing methods for EnglandCovidDataLoader.
Added descriptive documentation for EnglandCovidDataLoader, along with code to calculate and set the graph attributes in the gdata field. Also chose clearer method names, for example renaming _get_total_timestamps() to _set_total_timestamps().

Added tests for EnglandCovidDataLoader; however, they are still incomplete.

Added more documentation and made the necessary name changes for CoraDataLoader and HungaryCPDataLoader.
Added the METRLADataLoader constructor and a method to find the total number of timestamps in the dataset.
Added the METRLADataLoader along with the necessary documentation.
Added the MontevideoBusDataLoader. Instead of using assert checks, we will now use if conditions that raise exceptions when the user passes incorrect parameters.
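One practical reason to prefer explicit checks: `assert` statements are removed when Python runs with the `-O` flag, whereas `if ... raise` checks always run and can raise specific exception types such as `TypeError` or `ValueError`. A minimal sketch follows; the function name and the `cutoff_time` and `total_timestamps` parameters are hypothetical, and only `lags` is mentioned in this thread.

```python
def validate_temporal_params(lags, cutoff_time, total_timestamps: int) -> None:
    """Hypothetical validator that rejects bad arguments with explicit exceptions."""
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(lags, int) or isinstance(lags, bool):
        raise TypeError("lags must be an int")
    if lags <= 0:
        raise ValueError("lags must be a positive integer")

    if cutoff_time is not None:
        if not isinstance(cutoff_time, int) or isinstance(cutoff_time, bool):
            raise TypeError("cutoff_time must be an int or None")
        if not lags < cutoff_time <= total_timestamps:
            raise ValueError(
                "cutoff_time must be greater than lags and at most total_timestamps"
            )
```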

Also added the logic for _set_edge_weights(), along with documentation for the __init__ method.
Added the PedalMeDataLoader along with its docstrings.

Made certain changes to the MontevideoBusDataLoader as well: _all_features and _all_targets were previously lists and are now NumPy arrays.
@nithinmanoj10 self-assigned this Dec 24, 2023
@nithinmanoj10 added the new feature label (Adding a new feature) Dec 24, 2023
@nithinmanoj10 linked an issue that may be closed by this pull request Dec 24, 2023
Added the WikiMath Data Loader. Also fixed the names of some previously added datasets: a few of them contained spaces, which caused the downloaded cache files to have spaces in their file names as well.
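The fix described here renames the datasets. A defensive complement, sketched below under the assumption that cache files are named after the dataset, is to normalize whitespace when building the file name; `cache_file_name` and the lowercase/underscore convention are hypothetical.

```python
import re


def cache_file_name(dataset_name: str, extension: str = "json") -> str:
    """Hypothetical helper: derive a space-free cache file name from a dataset name."""
    # Trim the ends, collapse each run of whitespace into one underscore, then lowercase.
    slug = re.sub(r"\s+", "_", dataset_name.strip()).lower()
    return f"{slug}.{extension}"
```

With this in place, a display name such as "Windmill Output" could stay human-readable while its cache file would be `windmill_output.json`.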
@nithinmanoj10 changed the base branch from main to v1.1.0 on December 28, 2023, 17:06
Labels: new feature (Adding a new feature), v1.1.0 (Tasks for STGraph v1.1.0)