
enable ffcv with PL #11538

Closed
williamFalcon opened this issue Jan 19, 2022 · 20 comments · Fixed by #15598
@williamFalcon
Contributor

williamFalcon commented Jan 19, 2022

A team introduced ffcv, which does optimizations at the dataloader level.
As long as it can be a drop-in replacement for the DataLoader, it should bring its benefits to any PyTorch and Lightning script.

ie:

# current
data = DataLoader(...)

trainer.fit(..., data)

# 10x faster with ffcv
data = FFCVDataloader(...)

trainer.fit(..., data)

Let's make sure when users want to get the benefits of both FFCV and PL they can do that!
cc @Borda @rohitgr7 @Felonious-Spellfire @justusschock @awaelchli @ninginthecloud @otaj @tchaton @carmocca

@williamFalcon williamFalcon added the feature label Jan 19, 2022
@carmocca carmocca self-assigned this Jan 19, 2022
@carmocca carmocca added the 3rd party, data handling, and performance labels Jan 19, 2022
@carmocca carmocca added this to the 1.6 milestone Jan 19, 2022
@tchaton
Contributor

tchaton commented Jan 20, 2022

@ethanwharris This could be built into Flash.

@paantya

paantya commented Jan 20, 2022

libffcv/ffcv#72

@justusschock
Member

justusschock commented Jan 20, 2022

This is related to #10696.

When we support arbitrary iterables, ffcv should be supported automatically. Maybe not with all the features we support for basic torch loaders (I haven't looked at the ffcv details so far), but it should definitely be usable for training.

@siddk

siddk commented Mar 17, 2022

Any update on this?

@carmocca
Contributor

carmocca commented Mar 17, 2022

FFCV should be perfectly usable with no modifications to the PL source code after the 1.6 release.

Before, some patching to the loops was necessary to disable PL's pre-fetching as it conflicted with the pre-fetching done by FFCV. Now, PL avoids pre-fetching whenever it is possible.

From my own benchmarks, the speed-up from FFCV applies fully when used with PL. We'll be releasing public benchmarks in the future.

We'll also explore making our internal data processing more flexible so that maybe we can provide an opinionated LightningDataModule integration that makes switching existing LightningDataModules easier.

A caveat I've noticed is that you'll want to remove the ToDevice(device) from the label pipeline if it's the last operation in your transformations (example), as PL already takes care of moving the tensors to the correct devices. A minimal sketch of this is below.
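
As an illustration only (not an official recipe; the .beton path, field names, and transform choices are assumptions), the label pipeline ends without ToDevice, while the image pipeline keeps ToDevice because it is followed by more transforms:

import torch
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import SimpleRGBImageDecoder, IntDecoder
from ffcv.transforms import ToTensor, ToTorchImage, ToDevice, Convert, Squeeze

device = torch.device("cuda:0")

# ToDevice is fine here: it is not the last op, so the remaining transforms run on the GPU.
image_pipeline = [
    SimpleRGBImageDecoder(),
    ToTensor(),
    ToDevice(device, non_blocking=True),
    ToTorchImage(),
    Convert(torch.float32),
]

# No trailing ToDevice: Lightning already moves the batch to the correct device.
label_pipeline = [IntDecoder(), ToTensor(), Squeeze()]

train_loader = Loader(
    "train.beton",  # assumed path to a dataset written in FFCV's format
    batch_size=256,
    num_workers=8,
    order=OrderOption.RANDOM,
    pipelines={"image": image_pipeline, "label": label_pipeline},
)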

Features like fault tolerance are not supported with FFCV.

Note that there's a lot of surface for bugs here and I think most of the battle-testing has been done on map-style datasets. If you encounter any memory or speed issues, please open a separate issue and you can ping me on it.

Cheers!

@siddk

siddk commented Mar 17, 2022

Ok - awesome! Thanks so much!

@Borda Borda modified the milestones: 1.6, 1.6.x Mar 21, 2022
@siddk

siddk commented Mar 25, 2022

Second question (sorry for the delayed follow-up): is the recommended pattern to wrap FFCV data loading in a LightningDataModule, or should I just pass the FFCV loaders directly to the Trainer?

@carmocca
Contributor

It's up to you, but both will work.

The advantage of the LightningDataModule is that you can put the FFCV-specific data-writing code (https://docs.ffcv.io/writing_datasets.html#writing-a-dataset-to-ffcv-format) inside the setup hook, save any arguments for your FFCV Loaders in the instance state, and create the Loaders (https://docs.ffcv.io/making_dataloaders.html#dataset-ordering) inside the *_dataloader methods. A rough sketch follows.
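
For illustration only (not an official integration; the dataset, field names, and paths below are placeholders), such a LightningDataModule could look like:

import pytorch_lightning as pl
from ffcv.writer import DatasetWriter
from ffcv.fields import RGBImageField, IntField
from ffcv.fields.decoders import SimpleRGBImageDecoder, IntDecoder
from ffcv.loader import Loader, OrderOption
from ffcv.transforms import ToTensor, ToTorchImage, Squeeze


class FFCVDataModule(pl.LightningDataModule):
    def __init__(self, dataset, beton_path="train.beton", batch_size=256, num_workers=8):
        super().__init__()
        self.dataset = dataset  # any indexed (map-style) dataset to convert
        # Loader arguments kept in the instance state, as suggested above.
        self.beton_path = beton_path
        self.batch_size = batch_size
        self.num_workers = num_workers

    def setup(self, stage=None):
        # FFCV-specific data writing (https://docs.ffcv.io/writing_datasets.html).
        # Note: setup runs on every process; prepare_data could be used instead
        # if the file should only be written once.
        writer = DatasetWriter(
            self.beton_path,
            {"image": RGBImageField(max_resolution=256), "label": IntField()},
        )
        writer.from_indexed_dataset(self.dataset)

    def train_dataloader(self):
        return Loader(
            self.beton_path,
            batch_size=self.batch_size,
            num_workers=self.num_workers,
            order=OrderOption.RANDOM,
            pipelines={
                "image": [SimpleRGBImageDecoder(), ToTensor(), ToTorchImage()],
                "label": [IntDecoder(), ToTensor(), Squeeze()],
            },
        )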

@leejiahe

Hi,

Do we need to write a custom FitLoop to define how we want to pre-fetch the batch, as documented on the 1.6.0 release page, to be compatible with the FFCV dataloader?

@carmocca
Contributor

@leejiahe You don't; that was just to showcase the customization. It shouldn't be necessary.

@codestar12

A caveat I've noticed is that you'll want to remove the ToDevice(device) from the label pipeline if it's the last operation in your transformations (example), as PL already takes care of moving the tensors to the correct devices.

Do you want to remove all ToDevice(device) calls? I'm a bit confused because I've mostly been able to ignore manual device placement since switching to PyTorch Lightning.

@codestar12

I have actually been having a lot of trouble getting this to work with DDP and multiple GPUs. What is the best way to get the GPU rank from a LightningDataModule? Their data loader seems pretty opinionated about these transforms and really wants to move the data to the device.

@Data-drone

How did you go in the end @codestar12?

@carmocca
Contributor

do you want to remove all ToDevice(device)?

No, only if it's the last operation in the pipeline. Otherwise, you'll want to keep it because doing more operations with the data on device could be faster.

What is the best way to get the GPU rank from a LightningDataModule?

You can use self.trainer.local_rank, for example:
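
A rough sketch of this (the path, batch size, and pipelines are placeholders, not a prescribed setup):

import torch
import pytorch_lightning as pl
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import SimpleRGBImageDecoder, IntDecoder
from ffcv.transforms import ToTensor, ToTorchImage, ToDevice, Squeeze


class FFCVDataModule(pl.LightningDataModule):
    def train_dataloader(self):
        # self.trainer is available once the DataModule is attached to the Trainer,
        # so the local rank can be read inside the *_dataloader hooks.
        device = torch.device("cuda", self.trainer.local_rank)
        return Loader(
            "train.beton",  # placeholder path
            batch_size=256,
            num_workers=8,
            order=OrderOption.RANDOM,
            pipelines={
                # ToDevice is not the last op here, so it can stay.
                "image": [SimpleRGBImageDecoder(), ToTensor(), ToDevice(device, non_blocking=True), ToTorchImage()],
                "label": [IntDecoder(), ToTensor(), Squeeze()],
            },
        )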

@codestar12

How did you go in the end @codestar12?

I basically removed a lot of the FFCV transforms from the pipeline and kept just the data preprocessing bit. It looks like @carmocca finally has the answer, though.

@paantya

paantya commented Jun 28, 2022

Please add an example somewhere in Lightning or Bolts showing how to use Lightning in conjunction with FFCV.

@carmocca carmocca modified the milestones: pl:1.6.x, pl:future Jul 28, 2022
@carmocca carmocca added the docs label and removed the feature and performance labels Oct 31, 2022
@Borda Borda assigned justusschock and unassigned carmocca Oct 31, 2022
@carmocca carmocca modified the milestones: future, v1.9 Oct 31, 2022
@justusschock justusschock mentioned this issue Nov 9, 2022
@rmchurch

Do I understand right from the docs, though, that you can't use FFCV + Lightning when using DDP? I pass the distributed arg to the FFCV loader and get errors when running with Lightning. Is this a fundamental issue preventing it, or could multi-GPU training with FFCV + Lightning potentially be implemented some day?

@carmocca
Contributor

It was working last time I tried it. What errors do you see? FFCV requires quite a bit of configuration to work as you expect. This comment contains the script I used in the past: #15598 (review)

@rmchurch

Ah, I had seen your example, but I thought it was incomplete because for gpus>1 it was marked FIXME: distributed, with distributed=False passed to the FFCV Loader. Thinking it was old code, I was trying distributed=True and getting the init_process_group not initialized error. But should I still pass distributed=False to the FFCV Loader, with Lightning inserting the DistributedSampler? If so, perhaps the paragraph below from the docs should be rephrased, or additional detail given, to say that DDP will work with FFCV but you have to leave distributed=False.

In a distributed multi-GPU setting (ddp), Lightning automatically replaces the DataLoader’s sampler with its distributed counterpart. This makes sure that each GPU sees a different part of the dataset. As sampling can be implemented in arbitrary ways with custom iterables, there is no way for Lightning to know how to replace the sampler.

@carmocca
Contributor

No, Lightning will not insert it automatically for FFCV. You might need to initialize the Loader class under one of the *_dataloader hooks so that the process groups are initialized before you create it.
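
A hedged sketch of that suggestion (the path and arguments are placeholders; see the script linked in #15598 for the configuration actually used):

import pytorch_lightning as pl
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import SimpleRGBImageDecoder, IntDecoder
from ffcv.transforms import ToTensor, ToTorchImage, Squeeze


class DistributedFFCVDataModule(pl.LightningDataModule):
    def train_dataloader(self):
        # Creating the Loader here (rather than in __init__ or setup) means the DDP
        # process group already exists by the time FFCV needs it.
        return Loader(
            "train.beton",  # placeholder path
            batch_size=256,
            num_workers=8,
            order=OrderOption.RANDOM,  # not every order option may be supported in distributed mode
            distributed=True,  # FFCV shards the data itself; Lightning does not inject a DistributedSampler here
            pipelines={
                "image": [SimpleRGBImageDecoder(), ToTensor(), ToTorchImage()],
                "label": [IntDecoder(), ToTensor(), Squeeze()],
            },
        )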
