Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grid Analyzer! #81

Merged
merged 40 commits into from
Nov 13, 2018
Merged

Grid Analyzer! #81

merged 40 commits into from
Nov 13, 2018

Conversation

tkornuta-ibm
Copy link
Contributor

@tkornuta-ibm tkornuta-ibm commented Nov 9, 2018

Finally, working mip-grid-analyzer is here!

This required to solve several issues and introduce many changes in the whole experimental pipeline, starting from:

In order to test the whole pipeline, I have also prepared a config for grid_trainer with LeNet5 and SimpleCNN model trained on MNIST (fixes #51 )

The use case is as follows:

  • run grid-trainer on the configuration file (2 models, 1 problem, 5 repetitions)
    'mip-grid-trainer-cpu --c configs/vision/grid_trainer_mnist.yaml'

  • run grid-tested on the newly created directory with containing 10 different models
    'mip-grid-tester-cpu --e --expdir ./experiments_20181109_131153'

  • run grid-analyzer to aggregate all the results in a single csv file
    'mip-grid-analyzer --e --expdir ./experiments_20181109_131153'

Attaching the resulting file:

20181109_134802_grid_analysis.xlsx

(What I have described above will in fact become a part of "reproducible research: MNIST Image Classification with LeNet-5 - issue #12 )

Besides, I've made plenty of minor changes, like:

… with wrongly formated values in grid-analyzer
@tkornuta-ibm
Copy link
Contributor Author

This pull request introduces 3 alerts when merging 1ae12d3 into d8c20f1 - view on LGTM.com

new alerts:

  • 3 for Unused import

Comment posted by LGTM.com

@vmarois
Copy link
Contributor

vmarois commented Nov 10, 2018

It looks good! I haven't add the time to test it so will do so on Monday.

This PR adds a dependency over psutil.. Is there no way around it?

@tkornuta-ibm
Copy link
Contributor Author

psutil

Sadly not, please review the solutions that I have analyzed when fixing issue #52

Copy link
Contributor

@vmarois vmarois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good, thanks!
I have done some doc polishing and set a few basic methods of GridAnalyzer as static ones.
I have tested the pipeline on Ubuntu CPUs (fixes #82 ) and GPUs (fixes #83 ), and it worked flawlessly.

Nonetheless, I have some remarks about the resulting csv file:

  • The train_status value does not always match the actual one reported in the trainer.log. Critically, some runs are indicated as "Not converged" when they actually did. I am thinking that this happens as we do not always save the model. Thus, it is possible that we save the model mid-training, and the training status is then different than the final one

  • I find the key name train_start to be somewhat confusing when compared to model_timestamp (I thought at the beginning that we were not reporting the correct timestamp). Could we clarify this?

  • We should report the aggregated statistics for the test, in place of the data point for the last episode. I think this is more valuable to the user, and shouldn't be hard to implement (since we have only 1 data point for the test, and the corresponding csv file exists in all cases)

Otherwise, this is an important PR, and this is great work! 👍

…ged in online trainer 2) added saving model after last epoch in offline trainer 3) importing aggregated test statistics in grid-analyzer
…ts at the end of training (when epoch/episode limit is hit), modifies model save. Fixes #85
@vmarois vmarois merged commit 9d0a1e2 into develop Nov 13, 2018
@vmarois vmarois deleted the fix/grid-analyzer branch November 13, 2018 00:14
@tkornuta-ibm tkornuta-ibm restored the fix/grid-analyzer branch November 13, 2018 00:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment