Support Lightning >=2.0.0 and Pandas >=2.0.0 #301
Conversation
minor explanations
```diff
@@ -246,9 +246,10 @@ def _read_raw_data(raw_data_path: str) -> Tuple[pd.DataFrame, pd.DataFrame]:
     outcomes = ["Outcomes-a.txt", "Outcomes-b.txt"]
     for o in outcomes:
         o_file = os.path.join(raw_data_path + "/" + o)
-        df_outcomes = df_outcomes.append(pd.read_csv(o_file)[["RecordID", "In-hospital_death"]]).reset_index(
+        df_outcomes = pd.concat([df_outcomes, pd.read_csv(o_file)[["RecordID", "In-hospital_death"]]]).reset_index(
```
`DataFrame.append` was removed in pandas 2.0.0
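For context, a minimal sketch of the replacement (the column names come from the diff above; the sample values are illustrative): `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, while `pd.concat` works on both major versions.

```python
import pandas as pd

# Accumulate rows from several frames without DataFrame.append
# (removed in pandas 2.0); pd.concat works in pandas 1.x and 2.x alike.
parts = [
    pd.DataFrame({"RecordID": [1, 2], "In-hospital_death": [0, 1]}),
    pd.DataFrame({"RecordID": [3], "In-hospital_death": [0]}),
]
df_outcomes = pd.DataFrame()
for part in parts:
    # old (pandas <2.0): df_outcomes = df_outcomes.append(part)
    df_outcomes = pd.concat([df_outcomes, part]).reset_index(drop=True)
```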
Looks good; see if it's possible to be backward compatible though.
fuse/requirements.txt (outdated)
```diff
@@ -8,7 +8,7 @@ matplotlib>=3.3.3
 scikit-learn>=0.23.2
 termcolor>=1.1.0
 pycocotools>=2.0.1
-pytorch_lightning>=1.6,<2.0.0  # temp - need to make all tests pass with this version first
+pytorch_lightning>=2.0.0  # Lightning 2.0.0 is not backward compatible
```
Can we be backward compatible in some way? This seems too restrictive.
I agree
Thanks!
@mosheraboh Please see the last two runs, which differ only in the Lightning version. I'll do another self-review now.

NOTE: To support both versions I had to delete arguments (and argument values) that break backward compatibility, such as:
```diff
     num_sanity_val_steps=-1,
-    auto_scale_batch_size="binsearch",
+    # auto_scale_batch_size="binsearch",  # should use Tuner - https://lightning.ai/pages/releases/2.0.0/#tuner
```
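One generic way to stay compatible with both majors (a sketch, not the approach actually taken in this PR; `filter_supported_kwargs` is a hypothetical helper) is to drop any option the installed `Trainer.__init__` no longer accepts:

```python
import inspect

def filter_supported_kwargs(cls, kwargs):
    """Keep only the kwargs that cls.__init__ explicitly accepts, so options
    removed in a newer release (e.g. auto_scale_batch_size in Lightning >=2.0)
    are dropped instead of raising TypeError at construction time."""
    params = inspect.signature(cls.__init__).parameters
    return {k: v for k, v in kwargs.items() if k in params}

# Demo with a stand-in class instead of pytorch_lightning.Trainer:
class FakeTrainer:
    def __init__(self, max_epochs=1, num_sanity_val_steps=2):
        self.max_epochs = max_epochs
        self.num_sanity_val_steps = num_sanity_val_steps

opts = {"max_epochs": 5, "auto_scale_batch_size": "binsearch"}
trainer = FakeTrainer(**filter_supported_kwargs(FakeTrainer, opts))
```

Note the caveat: this relies on the constructor declaring its parameters explicitly; it cannot filter anything if `__init__` takes `**kwargs`.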
This option varies between <2.0.0 and >=2.0.0!
Looks good!
Looks great, one question inline
```diff
@@ -180,13 +180,20 @@ def run_train(params: dict) -> None:
         distributed=distributed,
     )

+    # A workaround to support multiple GPUs with a custom batch_sampler for both Lightning versions
+    # see: https://lightning.ai/pages/releases/2.0.0/#sampler-replacement
+    kwargs = {}
```
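The version-dependent kwarg being built here is presumably Lightning's sampler-replacement flag, which 2.0 renamed from `replace_sampler_ddp` to `use_distributed_sampler`. A minimal sketch of selecting the right name from the version string (the helper name is illustrative, not the PR's actual code):

```python
def sampler_replacement_kwargs(pl_version: str, distributed: bool) -> dict:
    """Build the Trainer kwarg that disables Lightning's automatic sampler
    replacement, under the name each major version expects.
    Lightning 2.0 renamed replace_sampler_ddp -> use_distributed_sampler."""
    if not distributed:
        return {}
    major = int(pl_version.split(".")[0])
    key = "use_distributed_sampler" if major >= 2 else "replace_sampler_ddp"
    return {key: False}

# Usage sketch (assuming `pl` is the installed pytorch_lightning module):
# trainer = pl.Trainer(**sampler_replacement_kwargs(pl.__version__, distributed))
```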
Shouldn't it be wrapped with if distributed?
It could be wrapped, but it does nothing when not in distributed mode (for example, the default value is `True`).
I'll wrap it anyway for clarity, thanks!
Looks great
# ✅ Same as [Fuse's](BiomedSciAI/fuse-med-ml#301)

## IMPORTANT

Couldn't fix the following unit test:

```
/dccstor/mm_hcls/usr/sagi/git_repos/fuse-drug/fusedrug_examples/tests/test_bimodal_mca.py BimodalMCATestCase.test_runner
```

It crashes with some weird issue at the end of the multiprocessing part (**without multiprocessing it works OK!**):

```
column names used=['molecule_sequence', 'molecule_id'] - if it does not look like column names make sure that the following args are properly set: first_row_is_columns_names, columns_names
allow_access_by_id is enabled, building in-memory offset map (num_workers=128)
multiprocess pool created with 128 workers.
100%|██████████| 441/441 [00:35<00:00, 12.55it/s]
[rank: 0] Received SIGTERM: 15
[rank: 0] Received SIGTERM: 15
[rank: 0] Received SIGTERM: 15
[rank: 0] Received SIGTERM: 15
[rank: 0] Received SIGTERM: 15
[rank: 0] Received SIGTERM: 15
[rank: 0] Received SIGTERM: 15
.
.
.
```

I spent a few hours trying to fix it but didn't succeed. The closest thing I found was [this](https://lightning.ai/forums/t/support-for-pytorchdata-dataloader2-multiprocessing-issue/2756) Lightning x PyData issue.

---------

Co-authored-by: Sagi Polaczek <sagi.polaczek@ibm.com>
Ready for CR 🕺🏼
Warning - Lightning >=2.0.0 breaks backward compatibility
Support Lightning 2.0.0:

References:
- https://lightning.ai/pages/releases/2.0.0/ (recommended!!)
- https://lightning.ai/docs/pytorch/latest/upgrade/migration_guide.html (thanks to @smartdanny)
- `*_epoch_end()` migration guide

Changes in Fuse:
- `Trainer(strategy=None)` is no longer supported. We should now use `strategy="auto"`. See here for more info.
- `Trainer(auto_select_gpus=...)` also got removed.
- `Tuner` and `Trainer` broke up 💔 -> relevant for KNIGHT's fuse baseline.
- Renamed `*_epoch_end()` to `on_*_epoch_end()`. See changes in `run_mnist_custom_pl_imp.py` and the relevant reference above for more info.

Remarks:

Support pandas 2.0.0:

References:
- https://pandas.pydata.org/docs/whatsnew/v2.0.0.html#deprecations

Changes in Fuse:
- `df.append()` was removed. See here. Relevant for an EHR transformer dataset. Had to switch to `pd.concat()`.
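The `*_epoch_end()` rename also changes the hook's signature: in Lightning >=2.0, `validation_epoch_end(self, outputs)` is gone and `on_validation_epoch_end(self)` receives no outputs, so the module must accumulate step results itself. A Lightning-free sketch of that accumulation pattern (the class and attribute names are illustrative, not Fuse's code):

```python
class EpochEndMigrationSketch:
    """Mimics the Lightning >=2.0 pattern: validation_step stores its
    outputs on the module; on_validation_epoch_end (which no longer
    receives an `outputs` argument) consumes and clears them."""

    def __init__(self):
        self.val_step_outputs = []

    def validation_step(self, loss: float) -> float:
        self.val_step_outputs.append(loss)
        return loss

    def on_validation_epoch_end(self) -> float:
        avg = sum(self.val_step_outputs) / len(self.val_step_outputs)
        self.val_step_outputs.clear()  # avoid leaking outputs across epochs
        return avg

m = EpochEndMigrationSketch()
for loss in (1.0, 3.0):
    m.validation_step(loss)
avg_loss = m.on_validation_epoch_end()
```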