Added validation to the training script and exposed more of the settings for network #113

Open · cfd1 wants to merge 2 commits into main from mgn/train_validation
Conversation

@cfd1 (Contributor) commented Sep 25, 2023

Modulus Pull Request

Description

  • Added validation to the training script
  • Exposed more parameters in constants.py

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.

Dependencies

@NickGeneva NickGeneva requested a review from mnabian September 26, 2023 18:11
@NickGeneva NickGeneva added external Issues/PR filed by people outside the team enhancement New feature or request labels Sep 26, 2023
@cfd1 cfd1 force-pushed the mgn/train_validation branch from 5b543a5 to 1a36344 Compare October 3, 2023 12:26
@NickGeneva NickGeneva added the 2 - In Progress Currently a work in progress label Oct 3, 2023
LimitingFactor and others added 2 commits October 4, 2023 19:28
…ngs for network

Signed-off-by: LimitingFactor <aswift0n3@gmail.com>
Updated the wandb initialisation

Signed-off-by: LimitingFactor <aswift0n3@gmail.com>
@cfd1 cfd1 force-pushed the mgn/train_validation branch from f715d7e to 8ad3efc Compare October 4, 2023 18:28
@mnabian (Collaborator) commented Oct 10, 2023

Hi @cfd1 , thanks for the PR! Is this ready for review?

@cfd1 (Contributor, Author) commented Oct 11, 2023

@mnabian yes, it's ready for review, please.

do_concat_trick: bool = False
num_processor_checkpoint_segments: int = 0
# activation_fn: str = "relu"

# performance configs
amp: bool = False
jit: bool = False

# test & visualization configs
Collaborator:
Change to "visualization configs"
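The suggested rename, applied to a minimal sketch of the constants dataclass (field names follow the diff context above; the dataclass layout and the example field under the renamed section are assumptions):

```python
from dataclasses import dataclass


@dataclass
class Constants:
    # field names taken from the diff context above; the dataclass
    # layout itself is an assumption about constants.py
    do_concat_trick: bool = False
    num_processor_checkpoint_segments: int = 0

    # performance configs
    amp: bool = False
    jit: bool = False

    # visualization configs  (renamed per the review comment above)
    frame_skip: int = 10  # hypothetical example field under the renamed section
```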

def get_options():
    parser = argparse.ArgumentParser()

    parser.add_argument("--entity", "-e", type=str, default=None)
Collaborator:
Can we add these to the config file and not use argparse here?
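One way to honor this suggestion is to fold the CLI options into the same dataclass-style config instead of a separate argparse parser. A sketch under stated assumptions: the `wandb_entity` field name and the `get_options` signature are hypothetical, not the PR's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constants:
    # replaces parser.add_argument("--entity", "-e", type=str, default=None);
    # the field name wandb_entity is hypothetical
    wandb_entity: Optional[str] = None


def get_options(constants: Constants = Constants()) -> Constants:
    # with options on the config dataclass, callers override fields
    # directly instead of parsing CLI flags
    return constants
```

Callers would then write `get_options(Constants(wandb_entity="my-team"))` rather than passing `--entity` on the command line.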

# initialize distributed manager
DistributedManager.initialize()
dist = DistributedManager()

# initialize loggers
if wandb:
Collaborator:
Is this if statement necessary? This can be done by changing the mode argument in initialize_wandb
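The reviewer's point: `wandb.init` (and wrappers such as `initialize_wandb`) accept a `mode` argument, and `mode="disabled"` turns every subsequent logging call into a no-op, so the surrounding `if wandb:` guard becomes unnecessary. A sketch of the mode selection (the helper name and the commented usage line are hypothetical):

```python
def select_wandb_mode(use_wandb: bool) -> str:
    # "disabled" makes wandb initialization and all later log calls no-ops,
    # so no if-statement around the initialization is needed
    return "online" if use_wandb else "disabled"


# hypothetical usage:
# initialize_wandb(project="meshgraphnet", mode=select_wandb_mode(C.wandb))
```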


if __name__ == "__main__":
    @torch.no_grad()
    def validation(self):
Collaborator:
This is very similar to the predict method in `inference.py`. Can we make the prediction code more modular to avoid code duplication?
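One way to address the duplication is to extract the prediction step into a shared helper that both `validation` and `inference.py` call. A structural sketch with plain callables standing in for the model and graphs (all names here are hypothetical; in the real script the helper would run under `torch.no_grad()`):

```python
def predict(model, graph):
    # shared single-step prediction used by both validation and inference;
    # in the real script this would be wrapped in torch.no_grad()
    return model(graph)


class Trainer:
    def __init__(self, model):
        self.model = model

    def validation(self, graphs):
        # the validation loop reuses the shared predict helper instead of
        # re-implementing the prediction logic from inference.py
        return [predict(self.model, g) for g in graphs]
```

`inference.py` would then call `predict(model, graph)` directly, so the prediction logic lives in exactly one place.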

# Train the model
tmp_start = time.time()
loss_train_agg = 0
for graph in tqdm(trainer.dataloader):
Collaborator:
Does tqdm play nicely in multi-GPU runs?
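tqdm itself is safe under multi-process launches, but every rank renders its own progress bar, which garbles the console. The usual fix is tqdm's `disable` argument keyed on the rank; here a plain integer stands in for `DistributedManager().rank`:

```python
from tqdm import tqdm

rank = 0  # stand-in for DistributedManager().rank

results = []
for graph in tqdm(range(3), disable=(rank != 0)):
    # only rank 0 renders a progress bar; other ranks iterate silently
    results.append(graph)
```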

Labels: 2 - In Progress (Currently a work in progress) · enhancement (New feature or request) · external (Issues/PR filed by people outside the team)
3 participants