
Support Lightning >=2.0.0 and Pandas >=2.0.0 #301

Merged · 25 commits into master · Apr 27, 2023

Conversation

@SagiPolaczek (Collaborator) commented on Apr 13, 2023

Ready for CR 🕺🏼

Warning - Lightning >=2.0.0 breaks backward compatibility

Support Lightning 2.0.0:

References:

https://lightning.ai/pages/releases/2.0.0/ (recommended!)
https://lightning.ai/docs/pytorch/latest/upgrade/migration_guide.html (thanks to @smartdanny)
*_epoch_end() migration guide

Changes in Fuse:

  1. Trainer(strategy=None) is no longer supported; we should now use strategy="auto" (see the release notes linked above for more info).
  2. Trainer(auto_select_gpus=...) was also removed.
  3. Tuner and Trainer broke up 💔 -> relevant for KNIGHT’s fuse baseline.
  4. Migrate *_epoch_end() to on_*_epoch_end(); see the changes in run_mnist_custom_pl_imp.py, the migration guide referenced above, and the sketch below for more info.
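
A minimal sketch of the hook migration and the Trainer changes under Lightning >=2.0.0 (the module, loss, and Trainer arguments here are illustrative placeholders, not Fuse's actual code):

```python
import torch
import pytorch_lightning as pl


class LitExample(pl.LightningModule):
    """Illustrative module (not Fuse's code) showing the *_epoch_end() -> on_*_epoch_end() migration."""

    def __init__(self) -> None:
        super().__init__()
        self._val_losses: list = []  # Lightning >=2.0.0: we accumulate step outputs ourselves

    def validation_step(self, batch, batch_idx):
        loss = self._compute_loss(batch)
        self._val_losses.append(loss.detach())
        return loss

    # Lightning <2.0.0 (removed in 2.0.0): the hook received the collected step outputs.
    # def validation_epoch_end(self, outputs):
    #     self.log("val_loss", torch.stack(outputs).mean())

    # Lightning >=2.0.0: no `outputs` argument is passed to the hook.
    def on_validation_epoch_end(self) -> None:
        self.log("val_loss", torch.stack(self._val_losses).mean())
        self._val_losses.clear()

    def _compute_loss(self, batch) -> torch.Tensor:
        return torch.tensor(0.0)  # dummy value, just to keep the sketch self-contained


# Trainer changes: strategy=None -> strategy="auto"; auto_select_gpus=... was removed.
trainer = pl.Trainer(max_epochs=1, accelerator="auto", devices="auto", strategy="auto")
```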

Remarks:

Support pandas 2.0.0:

References:

https://pandas.pydata.org/docs/whatsnew/v2.0.0.html#deprecations

Changes in Fuse:

  1. df.append() was removed (see the pandas 2.0.0 deprecations linked above). Relevant for an EHR transformer dataset; had to switch to pd.concat(), as in the sketch below.
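
A minimal sketch of the replacement with made-up DataFrames (not the actual EHR dataset code):

```python
import pandas as pd

df_a = pd.DataFrame({"RecordID": [1, 2], "In-hospital_death": [0, 1]})
df_b = pd.DataFrame({"RecordID": [3], "In-hospital_death": [0]})

# pandas <2.0.0 (removed in 2.0.0):
# combined = df_a.append(df_b).reset_index(drop=True)

# pandas >=2.0.0:
combined = pd.concat([df_a, df_b]).reset_index(drop=True)
```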

@SagiPolaczek SagiPolaczek marked this pull request as draft April 13, 2023 14:44
@SagiPolaczek SagiPolaczek changed the title Support PyTorch Lightning >= 2.0 Support **Lightning >= 2.0.0** and **Pandas >= 2.0.0** Apr 16, 2023
@SagiPolaczek SagiPolaczek changed the title Support **Lightning >= 2.0.0** and **Pandas >= 2.0.0** Support Lightning >= 2.0.0 and Pandas >= 2.0.0 Apr 16, 2023
@SagiPolaczek (Collaborator, Author) left a comment:

minor explanations

@@ -246,9 +246,10 @@ def _read_raw_data(raw_data_path: str) -> Tuple[pd.DataFrame, pd.DataFrame]:
outcomes = ["Outcomes-a.txt", "Outcomes-b.txt"]
for o in outcomes:
o_file = os.path.join(raw_data_path + "/" + o)
- df_outcomes = df_outcomes.append(pd.read_csv(o_file)[["RecordID", "In-hospital_death"]]).reset_index(
+ df_outcomes = pd.concat([df_outcomes, pd.read_csv(o_file)[["RecordID", "In-hospital_death"]]]).reset_index(
Collaborator Author:

append was removed in 2.0.0

setup.py review comment (outdated, resolved)
@SagiPolaczek SagiPolaczek linked an issue Apr 16, 2023 that may be closed by this pull request
@SagiPolaczek SagiPolaczek marked this pull request as ready for review April 16, 2023 12:06
@mosheraboh (Collaborator) left a comment:

Looks good; see if it's possible to be backward compatible though.

@@ -8,7 +8,7 @@ matplotlib>=3.3.3
scikit-learn>=0.23.2
termcolor>=1.1.0
pycocotools>=2.0.1
- pytorch_lightning>=1.6,<2.0.0 # temp - need to make all tests pass with this version first
+ pytorch_lightning>=2.0.0 # Lightning 2.0.0 is not backward compatible
Collaborator:

Can we be backward compatible in some way? It seems too restrictive.

Collaborator Author:

I agree

setup.py review comment (outdated, resolved)
@SagiPolaczek (Collaborator, Author):

> Looks good; see if it's possible to be backward compatible though.

Thanks!
I thought about it a bit later, and it might indeed be possible. Will try that on Wednesday.

@SagiPolaczek (Collaborator, Author) commented on Apr 19, 2023

@mosheraboh
Now we support both <2.0.0 and >=2.0.0!

Please see the last two runs, which differ only by the Lightning version:
[Screenshot 2023-04-19 at 13:26: the two runs]

I'll do another self-review now.

NOTE

To support both versions I had to delete arguments (and argument values) that break backward compatibility, such as strategy and auto_scale_batch_size.
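
For reference, a runtime version check is one common way to keep both ranges working; a minimal sketch, assuming a hypothetical PL_VERSION_GE_2 helper (the PR itself may simply drop the incompatible arguments, as described above):

```python
import pytorch_lightning as pl
from packaging import version

# Hypothetical helper (not necessarily what the PR uses): detect the installed Lightning version.
PL_VERSION_GE_2 = version.parse(pl.__version__) >= version.parse("2.0.0")

trainer_kwargs = dict(max_epochs=1, num_sanity_val_steps=-1)
if not PL_VERSION_GE_2:
    # This argument only exists in Lightning <2.0.0; in >=2.0.0 batch-size scaling moved to the Tuner.
    trainer_kwargs["auto_scale_batch_size"] = "binsearch"

trainer = pl.Trainer(**trainer_kwargs)
```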

num_sanity_val_steps=-1,
- auto_scale_batch_size="binsearch",
+ # auto_scale_batch_size="binsearch", # should use Tuner - https://lightning.ai/pages/releases/2.0.0/#tuner
Collaborator Author:

This option varies between <2.0.0 and >=2.0.0!
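
Under Lightning >=2.0.0 the same behavior goes through the Tuner class; a minimal sketch based on the release notes linked above (the function around it is illustrative, not the exact Fuse/KNIGHT code):

```python
import pytorch_lightning as pl
from pytorch_lightning.tuner import Tuner


def tune_batch_size(model: pl.LightningModule, trainer: pl.Trainer) -> None:
    # Lightning <2.0.0:  pl.Trainer(auto_scale_batch_size="binsearch") followed by trainer.tune(model)
    # Lightning >=2.0.0: batch-size scaling moved out of Trainer into the Tuner class.
    tuner = Tuner(trainer)
    # The model is expected to expose a batch_size attribute (or hparam) for the tuner to adjust.
    tuner.scale_batch_size(model, mode="binsearch")
```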

@SagiPolaczek SagiPolaczek changed the title Support Lightning >= 2.0.0 and Pandas >= 2.0.0 Support Lightning >=2.0.0 and Pandas >=2.0.0 Apr 19, 2023
mosheraboh previously approved these changes Apr 20, 2023

@mosheraboh (Collaborator) left a comment:

Looks good!

mosheraboh previously approved these changes Apr 24, 2023

@mosheraboh (Collaborator) left a comment:

Looks great, one question inline

@@ -180,13 +180,20 @@ def run_train(params: dict) -> None:
distributed=distributed,
)

+ # A workaround to support multiple GPUs with a custom batch_sampler for both Lightning versions
+ # see: https://lightning.ai/pages/releases/2.0.0/#sampler-replacement
+ kwargs = {}
Collaborator:

Shouldn't it be wrapped with if distributed?

Collaborator Author:

It could be wrapped, but it does nothing when not in distributed mode (for example, the default value is True).

Collaborator Author:

I'll wrap it anyway for clarity, thanks!
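
A minimal sketch of how those version-dependent kwargs might be filled, wrapped in the "if distributed" check discussed above (the flag names follow the Lightning documentation; the helper itself is an assumption, not the exact Fuse code):

```python
import pytorch_lightning as pl
from packaging import version

PL_VERSION_GE_2 = version.parse(pl.__version__) >= version.parse("2.0.0")


def custom_sampler_kwargs(distributed: bool) -> dict:
    """Tell Lightning not to replace our custom batch_sampler when running distributed."""
    kwargs = {}
    if distributed:
        if PL_VERSION_GE_2:
            kwargs["use_distributed_sampler"] = False  # Lightning >=2.0.0 name for the flag
        else:
            kwargs["replace_sampler_ddp"] = False  # Lightning <2.0.0 name (default True)
    return kwargs


trainer = pl.Trainer(accelerator="auto", devices="auto", **custom_sampler_kwargs(distributed=True))
```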

@mosheraboh (Collaborator) left a comment:

Looks great

@SagiPolaczek SagiPolaczek merged commit 2d74f98 into master Apr 27, 2023
@SagiPolaczek SagiPolaczek deleted the sagi/support_pl_2.0 branch April 27, 2023 12:18
SagiPolaczek added a commit to BiomedSciAI/fuse-drug that referenced this pull request Jun 15, 2023
# ✅ 

Same as [Fuse's](BiomedSciAI/fuse-med-ml#301)

## IMPORTANT
Couldn't fix the following unit-test: 
```
/dccstor/mm_hcls/usr/sagi/git_repos/fuse-drug/fusedrug_examples/tests/test_bimodal_mca.py BimodalMCATestCase.test_runner
```

Got some weird issue where it just crashes at the end of the multiprocessing part (**without multiprocessing it works OK!**):
```
column names used=['molecule_sequence', 'molecule_id'] - if it does not look like column names make sure that the following args are properly set: first_row_is_columns_names, columns_names
allow_access_by_id is enabled, building in-memory offset map (num_workers=128)
multiprocess pool created with 128 workers.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 441/441 [00:35<00:00, 12.55it/s]
[rank: 0] Received SIGTERM: 15████████████████████████████████████████████████████████████████████████████████████████▍| 439/441 [00:35<00:00, 14.29it/s]
[rank: 0] Received SIGTERM: 15
[rank: 0] Received SIGTERM: 15
[rank: 0] Received SIGTERM: 15
[rank: 0] Received SIGTERM: 15
[rank: 0] Received SIGTERM: 15
[rank: 0] Received SIGTERM: 15
.
.
.
```

I spent a few hours trying to fix it but didn't succeed. The closest thing
I found was
[this](https://lightning.ai/forums/t/support-for-pytorchdata-dataloader2-multiprocessing-issue/2756)
Lightning x PyData issue.

---------

Co-authored-by: Sagi Polaczek <sagi.polaczek@ibm.com>
Successfully merging this pull request may close these issues.

Support pytorch lightning 2.0.0