🔢 Not computing the node features in HungaryCPDataLoader #86

nithinmanoj10 · 2023-11-05T06:03:55Z

While going through the methods in HungaryCPDataLoader during the dataset abstraction task, I noticed the following in the _get_targets_and_features method:

Our Version

def _get_targets_and_features(self):
    stacked_target = np.array(self._dataset["FX"])
    self._all_targets = np.array(
        [stacked_target[i, :].T for i in range(stacked_target.shape[0])]
    )

However, the PyTorch Geometric Temporal (PyG-T) version also computes the features list:

PyG-T Version

def _get_targets_and_features(self):
    stacked_target = np.array(self._dataset["FX"])
    self.features = [
        stacked_target[i : i + self.lags, :].T
        for i in range(stacked_target.shape[0] - self.lags)
    ]
    self.targets = [
        stacked_target[i + self.lags, :].T
        for i in range(stacked_target.shape[0] - self.lags)
    ]

We need to confirm why this computation was omitted from our dataloader and add it back if necessary.
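For comparison, here is a minimal standalone sketch of the windowing the PyG-T version performs, assuming stacked_target is a (num_timestamps, num_nodes) NumPy array like the one built from self._dataset["FX"]. The function name make_lagged_windows and the toy data are hypothetical.

```python
import numpy as np

def make_lagged_windows(stacked_target: np.ndarray, lags: int):
    """Split a (T, N) array into lagged features and next-step targets.

    Mirrors the PyG-T list comprehensions: feature i holds timestamps
    i .. i+lags-1 transposed to shape (N, lags), and target i is
    timestamp i+lags with shape (N,).
    """
    num_timestamps = stacked_target.shape[0]
    if not 0 < lags < num_timestamps:
        raise ValueError("lags must be between 1 and num_timestamps - 1")
    features = [
        stacked_target[i : i + lags, :].T for i in range(num_timestamps - lags)
    ]
    targets = [
        stacked_target[i + lags, :].T for i in range(num_timestamps - lags)
    ]
    return features, targets

# Toy data: 6 timestamps, 3 nodes (hypothetical values).
fx = np.arange(18, dtype=float).reshape(6, 3)
features, targets = make_lagged_windows(fx, lags=4)
print(len(features), features[0].shape, targets[0].shape)  # 2 (3, 4) (3,)
```

With 6 timestamps and lags=4 this yields 2 feature/target pairs, whereas our current version yields 6 target snapshots and no features. Each target is the timestamp immediately after its own window, so it never appears in its own input.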


Added descriptive docstrings for HungaryCPDataLoader. Made a couple of additions to the docstrings of CoraDataLoader. Added assert checks for the parameters passed to these two dataloaders. Also opened an issue related to HungaryCPDataLoader (#86) regarding the node features array.

nithinmanoj10 · 2023-11-25T10:16:27Z

Montevideo Bus

Something similar happens in MontevideoBusDataLoader: we only calculate the targets, not the features, and we do not take the lags into account when calculating the targets.

Our Version

def _get_targets(self, target_var: str = "y"):
    targets = []
    for node in self._dataset["nodes"]:
        y = node.get(target_var)
        targets.append(np.array(y))
    stacked_targets = np.stack(targets).T
    standardized_targets = (
        stacked_targets - np.mean(stacked_targets, axis=0)
    ) / np.std(stacked_targets, axis=0)
    self._all_targets = np.array([
        standardized_targets[i, :].T
        for i in range(len(standardized_targets))
    ])

PyG-T Version

def _get_features(self, feature_vars: List[str] = ["y"]):
    features = []
    for node in self._dataset["nodes"]:
        X = node.get("X")
        for feature_var in feature_vars:
            features.append(np.array(X.get(feature_var)))
    stacked_features = np.stack(features).T
    standardized_features = (
        stacked_features - np.mean(stacked_features, axis=0)
    ) / np.std(stacked_features, axis=0)
    self.features = [
        standardized_features[i : i + self.lags, :].T
        for i in range(len(standardized_features) - self.lags)
    ]

def _get_targets(self, target_var: str = "y"):
    targets = []
    for node in self._dataset["nodes"]:
        y = node.get(target_var)
        targets.append(np.array(y))
    stacked_targets = np.stack(targets).T
    standardized_targets = (
        stacked_targets - np.mean(stacked_targets, axis=0)
    ) / np.std(stacked_targets, axis=0)
    self.targets = [
        standardized_targets[i + self.lags, :].T
        for i in range(len(standardized_targets) - self.lags)
    ]
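The PyG-T _get_features turns every (node, feature variable) pair into its own column before standardizing. A minimal sketch of that stacking, assuming each node dict carries an "X" mapping from variable name to a per-timestamp list as in the code above; the helper names and toy values are hypothetical.

```python
import numpy as np

def stack_node_features(nodes, feature_vars):
    """Stack per-node series into a (T, N * F) array.

    Same loop order as the PyG-T _get_features: for every node, each
    requested variable becomes one column, so columns are grouped by node
    (node0-var0, node0-var1, node1-var0, ...).
    """
    columns = []
    for node in nodes:
        X = node.get("X")
        for feature_var in feature_vars:
            columns.append(np.array(X.get(feature_var), dtype=float))
    return np.stack(columns).T

def standardize(stacked):
    # Column-wise z-score, as in both quoted versions.
    return (stacked - np.mean(stacked, axis=0)) / np.std(stacked, axis=0)

# Two hypothetical nodes, 4 timestamps each.
nodes = [
    {"X": {"y": [1.0, 2.0, 3.0, 4.0]}},
    {"X": {"y": [10.0, 20.0, 10.0, 20.0]}},
]
stacked = stack_node_features(nodes, ["y"])
standardized = standardize(stacked)

# Lagged windows over the standardized columns (lags=2).
lags = 2
features = [standardized[i : i + lags, :].T for i in range(len(standardized) - lags)]
print(stacked.shape, len(features), features[0].shape)  # (4, 2) 2 (2, 2)
```

If more than one feature variable is requested, each window has shape (num_nodes * len(feature_vars), lags) rather than (num_nodes, lags), so a reshape would be needed to recover per-node feature vectors when porting this to the new dataloader.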

For the new dataloader being implemented in #85, I will follow the PyG-T version unless our previous version turns out to be the better choice.

nithinmanoj10 · 2023-11-26T16:07:36Z

PedalMe

The same issue exists in our version of the PedalMe dataloader.

nithinmanoj10 · 2024-01-06T09:25:10Z

WikiMath

Our version calculates the standardized_target array differently: it adds a small epsilon (10**-10) to the standard deviation. We also do not use the lags parameter when calculating self._all_targets, and we omit the calculation of self._all_features entirely.
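The standardization difference comes down to the 10**-10 added to the standard deviation. A minimal sketch with a toy array (hypothetical values) shows why it matters: a node whose target never changes has zero standard deviation, so the PyG-T formula produces NaN while ours produces zeros. The standardize helper is hypothetical.

```python
import numpy as np

def standardize(stacked, eps=0.0):
    """Column-wise z-score; eps is added to the std as in our _set_targets."""
    return (stacked - np.mean(stacked, axis=0)) / (np.std(stacked, axis=0) + eps)

# Column 1 is constant (hypothetical node whose value never changes).
stacked = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])

with np.errstate(invalid="ignore"):
    pyg_t = standardize(stacked)           # PyG-T style: 0 / 0 -> nan
ours = standardize(stacked, eps=10**-10)   # our style: 0 / 1e-10 -> 0.0

print(np.isnan(pyg_t[:, 1]).all())  # True
print(ours[:, 1])                   # [0. 0. 0.]
```

For columns with non-zero variance the two results are practically identical, so the epsilon only changes behaviour for constant nodes.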

PyG-T Version

def _get_targets_and_features(self):
    targets = []
    for time in range(self._dataset["time_periods"]):
        targets.append(np.array(self._dataset[str(time)]["y"]))
    stacked_target = np.stack(targets)
    standardized_target = (
        stacked_target - np.mean(stacked_target, axis=0)
    ) / np.std(stacked_target, axis=0)
    self.features = [
        standardized_target[i : i + self.lags, :].T
        for i in range(len(targets) - self.lags)
    ]
    self.targets = [
        standardized_target[i + self.lags, :].T
        for i in range(len(targets) - self.lags)
    ]

Our Version

def _set_targets(self):
    r"""Calculates and sets the target attributes"""
    targets = []
    for time in range(self.gdata["total_timestamps"]):
        targets.append(np.array(self._dataset[str(time)]["y"]))
    stacked_target = np.stack(targets)
    standardized_target = (stacked_target - np.mean(stacked_target, axis=0)) / (
        np.std(stacked_target, axis=0) + 10**-10
    )
    self._all_targets = np.array(
        [standardized_target[i, :].T for i in range(len(targets))]
    )
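If we keep our epsilon but adopt the PyG-T lag handling, _set_targets could be split into something like the following. This is a sketch only: the function lagged_features_and_targets is hypothetical, and it assumes stacked_target is the (total_timestamps, num_nodes) array built in the loop above.

```python
import numpy as np

def lagged_features_and_targets(stacked_target, lags, eps=10**-10):
    """Hypothetical helper: standardize a (T, N) array with our epsilon,
    then build PyG-T style lagged features and next-step targets.

    Returns NumPy arrays (like our self._all_targets) rather than lists:
    features has shape (T - lags, N, lags), targets has shape (T - lags, N).
    """
    if not 0 < lags < stacked_target.shape[0]:
        raise ValueError("lags must be between 1 and T - 1")
    standardized = (stacked_target - np.mean(stacked_target, axis=0)) / (
        np.std(stacked_target, axis=0) + eps
    )
    num_samples = standardized.shape[0] - lags
    features = np.array(
        [standardized[i : i + lags, :].T for i in range(num_samples)]
    )
    targets = np.array([standardized[i + lags, :].T for i in range(num_samples)])
    return features, targets

rng = np.random.default_rng(0)
stacked = rng.normal(size=(10, 5))    # 10 timestamps, 5 nodes (synthetic)
features, targets = lagged_features_and_targets(stacked, lags=4)
print(features.shape, targets.shape)  # (6, 5, 4) (6, 5)
```

Because each sample needs lags preceding snapshots, this yields total_timestamps - lags samples instead of the total_timestamps snapshots our current self._all_targets contains.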

nithinmanoj10 assigned nithinmanoj10 and JoelMathewC Nov 5, 2023

nithinmanoj10 added the help wanted Extra attention is needed label Nov 5, 2023

nithinmanoj10 linked a pull request Dec 24, 2023 that will close this issue

🏛️ Better abstraction for STGraph Dataset Loaders #85

Merged


nithinmanoj10 added this to the 🎁 STGraph 1.1.0 Release milestone Dec 24, 2023

nithinmanoj10 removed a link to pull request 🏛️ Better abstraction for STGraph Dataset Loaders #85 Jan 6, 2024

nithinmanoj10 mentioned this issue May 27, 2024

💡 Decide How We Plan to Calculate Node Features for All Dataloaders #136

Open
