Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🔢 Not computing the node features in HungaryCPDataLoader #86

Open
nithinmanoj10 opened this issue Nov 5, 2023 · 3 comments
Open
Assignees
Labels
help wanted Extra attention is needed

Comments

@nithinmanoj10
Copy link
Contributor

Was going through the methods present in the HungaryCPDataLoader during the dataset abstraction task and noticed the following in the _get_targets_and_features method

Our Version

def _get_targets_and_features(self):
    stacked_target = np.array(self._dataset["FX"])
    self._all_targets = np.array(
        [stacked_target[i, :].T for i in range(stacked_target.shape[0])]
    )

But inside the PyTorch Geometric Temporal version they are computing the features list

PyG-T Version

def _get_targets_and_features(self):
    stacked_target = np.array(self._dataset["FX"])
    self.features = [
        stacked_target[i : i + self.lags, :].T
        for i in range(stacked_target.shape[0] - self.lags)
    ]
    self.targets = [
        stacked_target[i + self.lags, :].T
        for i in range(stacked_target.shape[0] - self.lags)
    ]

Need to confirm why we omitted this computation in our dataloader and add it back in if necessary.

@nithinmanoj10 nithinmanoj10 added the help wanted Extra attention is needed label Nov 5, 2023
nithinmanoj10 added a commit that referenced this issue Nov 5, 2023
Added descriptive docstrings for HungaryCPDataLoader. Made couple of additions to the docstrings of CoraDataLoader.

Made assert checks for the parameters passed to the above two dataloaders. Also made an issue related to HungaryCPDataLoader #86 reagarding the node features array.
@nithinmanoj10
Copy link
Contributor Author

Montevideo Bus

Something similar was also noticed in MontevideoBusDataLoader, where we are only calculating the target and not the features. We were also not considering the lags while calculating the target.

Our Version

def _get_targets(self, target_var: str = "y"):
    targets = []
    for node in self._dataset["nodes"]:
        y = node.get(target_var)
        targets.append(np.array(y))
    stacked_targets = np.stack(targets).T
    standardized_targets = (
        stacked_targets - np.mean(stacked_targets, axis=0)
    ) / np.std(stacked_targets, axis=0)
    self._all_targets = np.array([
        standardized_targets[i, :].T
        for i in range(len(standardized_targets))
    ])

PyG-T version

def _get_features(self, feature_vars: List[str] = ["y"]):
    features = []
    for node in self._dataset["nodes"]:
        X = node.get("X")
        for feature_var in feature_vars:
            features.append(np.array(X.get(feature_var)))
    stacked_features = np.stack(features).T
    standardized_features = (
        stacked_features - np.mean(stacked_features, axis=0)
    ) / np.std(stacked_features, axis=0)
    self.features = [
        standardized_features[i : i + self.lags, :].T
        for i in range(len(standardized_features) - self.lags)
    ]

def _get_targets(self, target_var: str = "y"):
    targets = []
    for node in self._dataset["nodes"]:
        y = node.get(target_var)
        targets.append(np.array(y))
    stacked_targets = np.stack(targets).T
    standardized_targets = (
        stacked_targets - np.mean(stacked_targets, axis=0)
    ) / np.std(stacked_targets, axis=0)
    self.targets = [
        standardized_targets[i + self.lags, :].T
        for i in range(len(standardized_targets) - self.lags)
    ]

For the new dataloader that is being implemented #85 , I will be following the PyG-T version unless our previous version is a better choice.

@nithinmanoj10
Copy link
Contributor Author

PedalMe

Something similar was also noticed for PedalMe dataset in our version.

@nithinmanoj10 nithinmanoj10 linked a pull request Dec 24, 2023 that will close this issue
8 tasks
@nithinmanoj10
Copy link
Contributor Author

WikiMath

Noticed that there is a difference in our version when calculating the standardized_target array. We are also not using the lags parameter when calculating self._all_targets. We are also omitting the calculation of self._all_features

PyG-T Version

 def _get_targets_and_features(self):

    targets = []
    for time in range(self._dataset["time_periods"]):
        targets.append(np.array(self._dataset[str(time)]["y"]))
    stacked_target = np.stack(targets)
    standardized_target = (
        stacked_target - np.mean(stacked_target, axis=0)
    ) / np.std(stacked_target, axis=0)
    self.features = [
        standardized_target[i : i + self.lags, :].T
        for i in range(len(targets) - self.lags)
    ]
    self.targets = [
        standardized_target[i + self.lags, :].T
        for i in range(len(targets) - self.lags)
    ]

Our Version

def _set_targets(self):
    r"""Calculates and sets the target attributes"""
    targets = []
    for time in range(self.gdata["total_timestamps"]):
        targets.append(np.array(self._dataset[str(time)]["y"]))
    stacked_target = np.stack(targets)
    standardized_target = (stacked_target - np.mean(stacked_target, axis=0)) / (
        np.std(stacked_target, axis=0) + 10**-10
    )
    breakpoint()
    self._all_targets = np.array(
        [standardized_target[i, :].T for i in range(len(targets))]
    )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants