How can I use imagenet-22k-wds and imagenet-w21-wds from Hugging Face with timm/train.py? #2152
@TheDarkKnight-21th you need to find the local path with the .tar files in it, if it also has the ... It's actually faster to download the large wds tar datasets using the CLI tool with HF transfer enabled. It will use the shard info in the info file. There is no val split for w21-wds, which is why I use an empty '' to disable val. The in12k and in22k datasets have a val subset that I created, so that wouldn't be needed for them. You can also manually specify the splits and use a subset of shards for val (they are shuffled). See below for the manual split format: shard names followed by | and then the number of samples.
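As a rough illustration of the two ideas in this reply, here is a minimal Python sketch. It assumes the `huggingface_hub` package (and the optional `hf_transfer` package) is installed for the download helper. The helper names `build_split_spec` and `download_wds_dataset`, the shard names, and the sample counts are all hypothetical and only show the "shard names, then |, then number of samples" shape. The real shard names and counts have to come from the dataset's own files.

```python
import os


def build_split_spec(shard_pattern: str, num_samples: int) -> str:
    """Join a shard pattern and a sample count as '<shards>|<num_samples>'."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if "|" in shard_pattern:
        raise ValueError("shard pattern must not contain '|'")
    return f"{shard_pattern}|{num_samples}"


def download_wds_dataset(repo_id: str, local_dir: str) -> str:
    """Download a dataset repo from the Hub and return the local path.

    Assumption: hf_transfer is installed. The environment variable only takes
    effect if it is set before huggingface_hub is first imported.
    """
    os.environ.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "1")
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        local_dir=local_dir,
    )


# Hypothetical shard names and sample counts, for illustration only.
train_split = build_split_spec("example-train-{0000..0007}.tar", 8000)
val_split = build_split_spec("example-train-{0008..0009}.tar", 2000)
print(train_split)
print(val_split)
```

The two printed strings are the kind of value you would pass as the train and val splits when specifying them manually. `download_wds_dataset` is defined but not called here because the datasets are very large.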
I know that wds means "web dataset" and that it differs from image files stored locally.
So how can I use timm/imagenet-22k-wds and timm/imagenet-w21-wds from Hugging Face with timm/train.py?
What arguments should I pass to that script?
Also, is it possible to stream the .tar (wds) shards from a URL, without storing the data locally? If so,
how can I get the URL, like imagenet-12k-wds in the webdataset example? https://huggingface.co/docs/hub/datasets-webdataset
If loading data from a URL is possible, which is faster: loading from a URL or loading from a locally stored dataset?
< imagenet-w21-wds path>