-
Notifications
You must be signed in to change notification settings - Fork 212
Lightning Trainer/PyTorch Lightning 1.7.0 support + CI Fixes (JIT Tracing and Functions to Classes conversion) #1410
Conversation
for more information, see https://pre-commit.ci
Codecov Report
@@ Coverage Diff @@
## master #1410 +/- ##
==========================================
- Coverage 91.77% 91.76% -0.01%
==========================================
Files 286 286
Lines 12857 12861 +4
==========================================
+ Hits 11799 11802 +3
- Misses 1058 1059 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
…AI/lightning-flash into flash-ci/functions-to-classes
model.eval() | ||
|
||
model.trainer = trainer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Lightning-AI/pytorch-lightning#14036 (comment) for a workaround
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, anything wrong with the current workaround?
for more information, see https://pre-commit.ci
…AI/lightning-flash into flash-ci/functions-to-classes
…AI/lightning-flash into flash-ci/functions-to-classes
About the 5 failures left in the CI:
Thanks to @otaj for his help with a few failures related to PL 1.7, it was very helpful. ❤️ Also @rohitgr7 for a fix! 🔥 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @krshrimali ! Some very painful bugs here. Couple of suggestions. Also, let's add some things to the CHANGELOG here 😃
def change_collate_fn_dataloader(self, data_loader): | ||
# Starting PL 1.7.0 - changing attributes after the DataLoader is initialized - will not work | ||
# So we manually set the collate_fn to the wrapped version, for now. | ||
new_kwargs = getattr(data_loader, "__pl_saved_kwargs", None) | ||
new_collate_fn = functools.partial(self._wrap_collate_fn, data_loader.collate_fn) | ||
if new_kwargs: | ||
new_kwargs["collate_fn"] = new_collate_fn | ||
setattr(data_loader, "__pl_saved_kwargs", new_kwargs) | ||
data_loader.collate_fn = new_collate_fn | ||
return data_loader | ||
|
||
def update_collate_fn_dataloader(self, new_collate_fn, data_loader): | ||
# Starting PL 1.7.0 - changing attributes after the DataLoader is initialized - will not work | ||
# So we manually update the collate_fn for the dataloader, for now. | ||
new_kwargs = getattr(data_loader, "__pl_saved_kwargs", None) | ||
if new_kwargs: | ||
new_kwargs["collate_fn"] = new_collate_fn | ||
setattr(data_loader, "__pl_saved_kwargs", new_kwargs) | ||
data_loader.collate_fn = new_collate_fn | ||
return data_loader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a lot of duplication. Seems like change_collate_fn_dataloader
is equivalent to:
self.update_collate_fn_dataloader(functools.partial(self._wrap_collate_fn, data_loader.collate_fn), data_loader)
Maybe better to just do that and remove change_collate_fn_dataloader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ethanwharris, you are right, this could be merged into only one function. Btw, while technically not a bug on PL side, it is definitely a regression on user experience regarding dataloaders and I'm sorry @krshrimali had to run into that. I suspect though, he will not be the last one, so I will send a PR today/early next week to PL to improve UX in there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ethanwharris - Agreed. Fixed!
@otaj - Thank you for your help with this, please don't be sorry - looking forward to your PR which improves the UX further. Thank you again!
.github/workflows/ci-testing.yml
Outdated
@@ -33,6 +33,7 @@ jobs: | |||
- { os: 'ubuntu-20.04', python-version: 3.9, requires: 'latest', release: 'pre', topic: [ 'core' ] } | |||
- { os: 'ubuntu-20.04', python-version: 3.9, requires: 'latest', release: 'stable', topic: [ 'image' ] } | |||
- { os: 'ubuntu-20.04', python-version: 3.9, requires: 'latest', release: 'stable', topic: [ 'image','image_extras' ] } | |||
- { os: 'ubuntu-20.04', python-version: 3.9, requires: 'latest', release: 'stable', topic: [ 'image','image_extras_effdet' ] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to have no effect as it will lead to just image deps being installed and effdet is installed only for serving. I think it would be cleaner to not add a new job but just set the effdet version in requirements/datatype_image_extras
to be the one from github since we don't expect users to install the image_extras
extras anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, should have mentioned - I tried putting the version of effdet in datatype_image_extras.txt
and doing pip install -r requirements/datatype_image_extras.txt
fails miserably :/ Lots of version conflicts. Hence I thought to create a new file, and put effdet there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the offline conversation with @ethanwharris - we decided to limit PyTorch version to <1.12 until icevision
supports effdet
0.3.0.
Hi, @ethanwharris - The 5 failures are expected in the CI, only 1 of these is something that I haven't investigated yet. Fixes for others are either already merged in PL, or we have a PR opened in Flash. Could you please take a look again, and help merge if it all looks good? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
@@ -91,7 +91,7 @@ def on_run_start(self, *args: Any, **kwargs: Any) -> None: | |||
assert isinstance(self.trainer.datamodule, ActiveLearningDataModule) | |||
if self._datamodule_state_dict is not None: | |||
self.trainer.datamodule.load_state_dict(self._datamodule_state_dict) | |||
self.trainer.predict_loop._return_predictions = True | |||
self.trainer.predict_loop.return_predictions = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR, but we should figure out what to do with the baal loop. It is so bound to a particular PL and Baal version that I'm not sure it makes sense to have it as part of a framework. Maybe it would be better as a tutorial or in bolts? cc @otaj
…AI/lightning-flash into flash-ci/functions-to-classes
What does this PR do?
This PR adds support for PL 1.7.0 to Flash, and fixes a few issues with the CI. Summary:
ClassFromFunction
class. We were not checking if its type is subclass ofCustomClassFunctionBase
.Trainer
class, else it will give an error similar to:ImageClassifier is not attached to Trainer
.on_predict_dataloader, on_train_dataloader, on_test_dataloader
were deprecated, and removed in PL 1.7. I'm not very sure if we need them, removing that works?probabilities
need to have a check if it's not empty elsetorch.cat
will fail.DataLoader
class won't change the attributes except when they are in the__init__
method. Flash changes the collate function after theDataLoader
is initialized - this was causing multiple errors. This PR fixes it.This PR attempts to fix both of the issues. Note that we should document that the best way to JIT trace the model in Flash is to:
Issue: #1413
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃