[Doc|Train] Add Pytorch ResNet finetuning starter example #32936

woshiyyya · 2023-03-01T09:14:56Z

Why are these changes needed?

This PR adds a starter example showing how to fine-tune a model on GPUs with TorchTrainer.

- Ray Train
   - examples
     - ...
     - Finetuning a Pytorch Image Classifier with Ray AIR

Task: Image Classification
Model: Pretrained Pytorch ResNet-50
Dataset: hymenoptera_data (2-class, ants and bees)

The workflow is:

1. Download hymenoptera_data dataset and create a ray dataset
2. Define the training loop (Load a pre-trained model)
3. Define the ScalingConfig and RunConfig
4. Build a TorchTrainer and run Trainer.fit()
5. Load the model weights from checkpoint file, and run an evaluation loop in pure PyTorch.
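
As a rough illustration of the workflow above, here is a minimal sketch rather than the notebook's actual code. It assumes Ray 2.7 or later (where `ray.train.report`, `ray.train.Checkpoint`, `ScalingConfig` and `RunConfig` are importable from `ray.train`) and a recent torchvision. The helper names `make_train_config` and `build_trainer`, the hyperparameter values, the run name, the `model.pt` file name, and the `<data_dir>/train/<class>/` folder layout are all assumptions made for illustration. It uses a plain PyTorch `DataLoader` passed through `prepare_data_loader` instead of a Ray dataset.

```python
import os
import tempfile


def make_train_config(data_dir, smoke_test=False):
    """Illustrative hyperparameters; the values are assumptions, not the PR's."""
    return {
        "data_dir": data_dir,    # assumed layout: <data_dir>/train/<class>/*.jpg
        "num_classes": 2,        # ants and bees
        "batch_size": 32,
        "lr": 1e-3,
        "num_epochs": 1 if smoke_test else 10,
    }


def train_loop_per_worker(config):
    # Heavy imports live here so they resolve on each Ray worker.
    import torch
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms
    import ray.train
    import ray.train.torch

    # Load a pretrained ResNet-50 and replace the classification head.
    model = models.resnet50(weights="DEFAULT")
    model.fc = nn.Linear(model.fc.in_features, config["num_classes"])
    model = ray.train.torch.prepare_model(model)  # device placement, DDP if needed

    transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    train_set = datasets.ImageFolder(os.path.join(config["data_dir"], "train"), transform)
    loader = DataLoader(train_set, batch_size=config["batch_size"], shuffle=True)
    loader = ray.train.torch.prepare_data_loader(loader)  # shards data across workers

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"], momentum=0.9)

    for epoch in range(config["num_epochs"]):
        model.train()
        running_loss = 0.0
        for inputs, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        metrics = {"epoch": epoch, "loss": running_loss / max(len(loader), 1)}

        # Every worker reports metrics; only rank 0 attaches a checkpoint.
        with tempfile.TemporaryDirectory() as tmp:
            checkpoint = None
            if ray.train.get_context().get_world_rank() == 0:
                # Unwrap DDP (if present) so the weights load into a plain model later.
                state = getattr(model, "module", model).state_dict()
                torch.save(state, os.path.join(tmp, "model.pt"))
                checkpoint = ray.train.Checkpoint.from_directory(tmp)
            ray.train.report(metrics, checkpoint=checkpoint)


def build_trainer(data_dir, num_workers=2, use_gpu=True, smoke_test=False):
    from ray.train import RunConfig, ScalingConfig
    from ray.train.torch import TorchTrainer

    return TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        train_loop_config=make_train_config(data_dir, smoke_test=smoke_test),
        scaling_config=ScalingConfig(num_workers=num_workers, use_gpu=use_gpu),
        run_config=RunConfig(name="resnet_finetune_sketch"),  # hypothetical run name
    )
```

Calling `build_trainer("/path/to/hymenoptera_data").fit()` would launch the run and return a result whose `checkpoint` attribute points at the last reported checkpoint. The saved `model.pt` state dict can then be loaded into a plain `resnet50` for the pure-PyTorch evaluation step.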

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(


richardliaw

any reason we need to use ray data here?

woshiyyya · 2023-03-01T21:00:49Z

> any reason we need to use ray data here?

I've replaced Ray Data with a plain PyTorch DataLoader wrapped by prepare_data_loader().

amogkam · 2023-03-01T21:11:37Z

should this be in train examples or air examples?

woshiyyya · 2023-03-01T21:23:38Z

> should this be in train examples or air examples?

I added a bookmark under the "Ray Train - Examples" section of the documentation.

Regarding the .ipynb file, I placed it under ray-air/examples because I could not find any other notebooks under the doc/source/train/examples folder. Is it acceptable to keep it there, or should we move it?

Update:
I've put this example under the doc/source/train/examples folder.

richardliaw · 2023-03-01T23:43:22Z

I think we should put it in train examples for now.

doc/source/ray-air/examples/pytorch_resnet_finetune.ipynb


justinvyu

I think it's pretty much ready - have some minor suggestions:

One question - should we showcase TorchTrainer(resume_from_checkpoint) as the way to take in the AIR checkpoint in this example?
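
One possible shape for the `resume_from_checkpoint` idea raised in this question, offered as a sketch only and not as what the merged notebook does: the trainer receives a checkpoint, and the training loop retrieves it with `ray.train.get_checkpoint()`. This assumes Ray 2.7 or later. The `model.pt` file name and the helper names `restore_model_state` and `build_resumed_trainer` are hypothetical.

```python
import os

CHECKPOINT_FILE = "model.pt"  # hypothetical file name, not taken from the PR


def restore_model_state(model):
    """Call inside the training loop, before prepare_model(), on the plain model.

    Returns True if weights were loaded from a checkpoint, False otherwise.
    """
    import torch
    import ray.train

    checkpoint = ray.train.get_checkpoint()  # None when no checkpoint was supplied
    if checkpoint is None:
        return False
    with checkpoint.as_directory() as ckpt_dir:
        state = torch.load(os.path.join(ckpt_dir, CHECKPOINT_FILE), map_location="cpu")
    model.load_state_dict(state)
    return True


def build_resumed_trainer(train_loop_per_worker, checkpoint_dir, num_workers=2, use_gpu=True):
    """Build a TorchTrainer that starts from a checkpoint stored in a local directory."""
    from ray.train import Checkpoint, ScalingConfig
    from ray.train.torch import TorchTrainer

    return TorchTrainer(
        train_loop_per_worker=train_loop_per_worker,
        scaling_config=ScalingConfig(num_workers=num_workers, use_gpu=use_gpu),
        resume_from_checkpoint=Checkpoint.from_directory(checkpoint_dir),
    )
```

The trade-off is that this path needs a Ray cluster to load the weights, whereas loading the state dict directly keeps the evaluation step in pure PyTorch.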

doc/source/train/examples/pytorch/BUILD

doc/source/train/examples/pytorch/pytorch_resnet_finetune.ipynb


richardliaw

clean and concise!


Yunxuan Xiao and others added 6 commits February 5, 2023 18:21

[doc] Add Ray AIR Pytorch ResNet Finetuning Example

952dc52

Signed-off-by: Yunxuan Xiao <yunxuanx@Yunxuans-MacBook-Pro.local>

Merge remote-tracking branch 'origin/master' into doc/air_resnet_fine…

9d77128

…tune_example

improve description & change local_dir

ec82769

Signed-off-by: Yunxuan Xiao <yunxuanx@Yunxuans-MacBook-Pro.local>

load ckpt from S3

94be8da

Signed-off-by: Yunxuan Xiao <yunxuanx@Yunxuans-MBP.local.meter>

Merge remote-tracking branch 'upstream/master' into doc/resnet_finetu…

5643ea0

…ne_starter_example

init starter fine-tune example

24a1ec5

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

woshiyyya changed the title ~~Doc/resnet finetune starter example~~ [Doc] Add Pytorch ResNet finetuning starter example Mar 1, 2023

add SMOKE_TEST flag

e952695

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

woshiyyya marked this pull request as ready for review March 1, 2023 18:13

woshiyyya requested review from richardliaw, gjoliver, krfricke, xwjiang2010, amogkam, matthewdeng, Yard1, maxpumperla and a team as code owners March 1, 2023 18:13

woshiyyya assigned justinvyu and richardliaw Mar 1, 2023

richardliaw reviewed Mar 1, 2023

woshiyyya assigned matthewdeng Mar 1, 2023

replace ray data to prepare_dataloader

ff0b75c

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

angelinalg reviewed Mar 2, 2023

doc/source/ray-air/examples/pytorch_resnet_finetune.ipynb

angelinalg reviewed Mar 2, 2023

doc/source/ray-air/examples/pytorch_resnet_finetune.ipynb

angelinalg reviewed Mar 2, 2023

doc/source/ray-air/examples/pytorch_resnet_finetune.ipynb

woshiyyya and others added 3 commits March 3, 2023 12:47

Apply suggestions from code review

372e6fa

Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>

polish the wording

cdd0ced

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

move example under train/examples folder

56e45be

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

woshiyyya force-pushed the doc/resnet_finetune_starter_example branch from 1954168 to d0392b0 Compare March 9, 2023 19:20

fixing path not exist issue

ef3a5ad

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

woshiyyya force-pushed the doc/resnet_finetune_starter_example branch from d0392b0 to ef3a5ad Compare March 9, 2023 20:31

fix doc link

a2a3c4e

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

woshiyyya force-pushed the doc/resnet_finetune_starter_example branch from ee8cb67 to a2a3c4e Compare March 9, 2023 22:50

woshiyyya added 2 commits March 9, 2023 15:16

Merge branch 'master' into doc/resnet_finetune_starter_example

2ce50cc

Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>

remove false book link

19ccf35

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

woshiyyya added the train Ray Train Related Issue label Mar 10, 2023

woshiyyya changed the title ~~[Doc] Add Pytorch ResNet finetuning starter example~~ [Doc|Train] Add Pytorch ResNet finetuning starter example Mar 10, 2023

justinvyu approved these changes Mar 12, 2023

woshiyyya and others added 3 commits March 13, 2023 01:48

Apply suggestions from code review

309673f

Co-authored-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: Yunxuan Xiao <xiaoyunxuan1998@gmail.com>

load checkpoint from uri

f6e8c90

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

Merge remote-tracking branch 'upstream/master' into doc/resnet_finetu…

f2b53bd

…ne_starter_example

woshiyyya requested a review from richardliaw March 16, 2023 01:37

woshiyyya added 2 commits March 15, 2023 19:43

fix wording

58e3d93

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>

Merge remote-tracking branch 'upstream/master' into doc/resnet_finetu…

d18dfbe

…ne_starter_example

richardliaw approved these changes Mar 17, 2023

Merge remote-tracking branch 'upstream/master' into doc/resnet_finetu…

01e0f43

…ne_starter_example

richardliaw merged commit 305dc2a into ray-project:master Mar 20, 2023
