-
@rwightman quick question: at which point, and where in BITS, does data sharding occur? My understanding (based on HF accelerate) is that each process should get its own part of the data, but I am not sure how BITS deals with this.
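To make the question concrete, here is a minimal plain-Python sketch of the index-striding strategy that per-process sharding commonly follows (the same idea as PyTorch's `DistributedSampler`, without shuffling). The function name `shard_indices` is hypothetical and this is not taken from the BITS source; it only illustrates what "each process gets a part of the data" means.

```python
import math


def shard_indices(num_items, world_size, rank):
    """Return the dataset indices one process (rank) would read.

    Hypothetical helper for illustration only; not a timm bits or
    accelerate API.
    """
    if num_items <= 0 or world_size <= 0 or not 0 <= rank < world_size:
        raise ValueError("invalid num_items, world_size, or rank")
    # Every rank gets the same number of samples, so round up.
    per_rank = math.ceil(num_items / world_size)
    total = per_rank * world_size
    # Pad by wrapping around to the start of the dataset.
    padded = [i % num_items for i in range(total)]
    # Rank r takes every world_size-th index, starting at offset r.
    return padded[rank:total:world_size]


if __name__ == "__main__":
    for r in range(4):
        print(r, shard_indices(10, 4, r))
```

With 10 items and 4 processes, each rank reads 3 indices, and the last two ranks reuse indices 0 and 1 as padding so that all ranks run the same number of steps.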
-
Also, is there a way to use your version of the ImageNet data? I believe it would save me some $$ to kick off training with your version rather than hosting my own copy of the dataset.
-
@amaarora BTW, re the timm bits vs accelerate comparison, a few corrections. First, timm bits is a collection of multiple 'bits', the idea being you can use up to whatever level of abstraction you want. At its lowest level the DeviceEnv is a thinner abstraction than accelerate and doesn't lock you in to any other design decisions re prepare. It's a very low-level device wrapper that covers most PyTorch + CPU vs PyTorch + CUDA vs PyTorch XLA + TPU/GPU/CPU differences for common float32/(b)float16/AMP and distributed scenarios. I decided to let some hardware detail leak into the Updater classes to keep that logic clean and performant. The coupling of the grad scaler with clipping, and the grad synchronization you need to do in PyTorch XLA vs PyTorch w/ DDP, made it hard to do nicely w/ just the device env. Also, the monster that is DeepSpeed (one massive Engine class that includes every model, optimizer, and detail imaginable) makes it really difficult to use without making DeviceEnv cover 'too much'. I feel I can probably do that reasonably at the Updater level w/ some DeepSpeed hacks, but we shall see. One can use just
There is absolutely no need to use the timm bits provided train or eval loops/fns. My train script will use them, but they won't cover every use case, and I will encourage users to write their own when they need to vs me trying to add interfaces, hooks, and fns to cover all foreseeable use cases.
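As an aside on the grad scaler and clipping coupling mentioned above, here is a small plain-Python sketch of why the two steps cannot be treated independently. It is an arithmetic illustration only, not timm bits or PyTorch code; all names in it (`global_norm`, `clip_by_norm`, `loss_scale`) are hypothetical. With loss scaling, gradients come out multiplied by the scale factor, so clipping has to happen after unscaling (PyTorch's `GradScaler` exposes `unscale_` for this purpose).

```python
import math


def global_norm(grads):
    """L2 norm over a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))


def clip_by_norm(grads, max_norm):
    """Rescale grads so their global norm is at most max_norm."""
    norm = global_norm(grads)
    if norm <= max_norm:
        return list(grads)
    factor = max_norm / norm
    return [g * factor for g in grads]


loss_scale = 1024.0            # assumed mixed-precision loss scale
true_grads = [0.3, -0.4]       # true gradient, global norm 0.5
scaled_grads = [g * loss_scale for g in true_grads]

# Wrong order: clip the still-scaled grads, then unscale.
# The threshold is compared against a norm that is 1024x too large.
wrong = [g / loss_scale for g in clip_by_norm(scaled_grads, 1.0)]

# Right order: unscale first, then clip against the true norm.
right = clip_by_norm([g / loss_scale for g in scaled_grads], 1.0)

if __name__ == "__main__":
    print("wrong order norm:", global_norm(wrong))
    print("right order norm:", global_norm(right))
```

In the wrong order the gradient is shrunk far below the threshold even though its true norm never exceeded it, which is the kind of ordering constraint that pulls the scaler and the clipping logic into the same place.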
-
Starting a new discussion post for things related to timm bits. I will continue to add comments/posts here to keep this a central place in case anybody else wants to read it in the future too.
The comments below aren't really answers but rather further questions as I go through the source code.