-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calls to map are not cached. #2322
Comments
I tried upgrading to Downgrading to Downloading: 9.20kB [00:00, 3.94MB/s]
Downloading: 5.99kB [00:00, 3.29MB/s]
No config specified, defaulting to: sst/default
Downloading and preparing dataset sst/default (download: 6.83 MiB, generated: 3.73 MiB, post-processed: Unknown size, total: 10.56 MiB) to /home/johannes/.cache/huggingface/datasets/sst/default/1.0.0/a16a45566b63b2c3179e6c1d0f8edadde56e45570ee8cf99394fbb738491d34b...
Dataset sst downloaded and prepared to /home/johannes/.cache/huggingface/datasets/sst/default/1.0.0/a16a45566b63b2c3179e6c1d0f8edadde56e45570ee8cf99394fbb738491d34b. Subsequent calls will reuse this data.
executed [0, 1]
#0: 0%| | 0/5 [00:00<?, ?ba/s]
#1: 0%| | 0/5 [00:00<?, ?ba/s]
executed [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
executed [4272, 4273, 4274, 4275, 4276, 4277, 4278, 4279, 4280, 4281]
executed [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009]
executed [5272, 5273, 5274, 5275, 5276, 5277, 5278, 5279, 5280, 5281]
executed [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009]
executed [6272, 6273, 6274, 6275, 6276, 6277, 6278, 6279, 6280, 6281]
executed [3000, 3001, 3002, 3003, 3004, 3005, 3006, 3007, 3008, 3009]
executed [7272, 7273, 7274, 7275, 7276, 7277, 7278, 7279, 7280, 7281]
executed [4000, 4001, 4002, 4003, 4004, 4005, 4006, 4007, 4008, 4009]
#0: 100%|██████████| 5/5 [00:00<00:00, 94.83ba/s]
executed [8272, 8273, 8274, 8275, 8276, 8277, 8278, 8279, 8280, 8281]
#1: 100%|██████████| 5/5 [00:00<00:00, 92.75ba/s]
executed [0, 1]
#0: 0%| | 0/1 [00:00<?, ?ba/s]
#1: 0%| | 0/1 [00:00<?, ?ba/s]
executed [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
executed [551, 552, 553, 554, 555, 556, 557, 558, 559, 560]
#0: 100%|██████████| 1/1 [00:00<00:00, 118.81ba/s]
#1: 100%|██████████| 1/1 [00:00<00:00, 123.06ba/s]
executed [0, 1]
#0: 0%| | 0/2 [00:00<?, ?ba/s]
#1: 0%| | 0/2 [00:00<?, ?ba/s]
executed [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
executed [1105, 1106, 1107, 1108, 1109, 1110, 1111, 1112, 1113, 1114]
executed [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009]
#0: 100%|██████████| 2/2 [00:00<00:00, 119.42ba/s]
executed [2105, 2106, 2107, 2108, 2109, 2110, 2111, 2112, 2113, 2114]
#1: 100%|██████████| 2/2 [00:00<00:00, 123.33ba/s]
##############################
executed [0, 1]
Loading cached processed dataset at /home/johannes/.cache/huggingface/datasets/sst/default/1.0.0/a16a45566b63b2c3179e6c1d0f8edadde56e45570ee8cf99394fbb738491d34b/cache-6079777aa097c8f8.arrow
Loading cached processed dataset at /home/johannes/.cache/huggingface/datasets/sst/default/1.0.0/a16a45566b63b2c3179e6c1d0f8edadde56e45570ee8cf99394fbb738491d34b/cache-2dc05c46f68eda6e.arrow
executed [0, 1]
Loading cached processed dataset at /home/johannes/.cache/huggingface/datasets/sst/default/1.0.0/a16a45566b63b2c3179e6c1d0f8edadde56e45570ee8cf99394fbb738491d34b/cache-1ca347e7430b98f1.arrow
Loading cached processed dataset at /home/johannes/.cache/huggingface/datasets/sst/default/1.0.0/a16a45566b63b2c3179e6c1d0f8edadde56e45570ee8cf99394fbb738491d34b/cache-c0f1a73ce3ba40cd.arrow
executed [0, 1]
Loading cached processed dataset at /home/johannes/.cache/huggingface/datasets/sst/default/1.0.0/a16a45566b63b2c3179e6c1d0f8edadde56e45570ee8cf99394fbb738491d34b/cache-832a1407bf1ac5b7.arrow
Loading cached processed dataset at /home/johannes/.cache/huggingface/datasets/sst/default/1.0.0/a16a45566b63b2c3179e6c1d0f8edadde56e45570ee8cf99394fbb738491d34b/cache-036316a259b773c4.arrow
- Datasets: 1.5.0
- Python: 3.8.3 (default, May 19 2020, 18:47:26)
[GCC 7.3.0]
- Platform: Linux-5.4.0-72-generic-x86_64-with-glibc2.10 |
Hi, set datasets/src/datasets/arrow_dataset.py Line 1718 in 241a0b4
@albertvillanova It seems like this behavior was overlooked in #2182. |
Hi @villmow, thanks for reporting. As @mariosasko has pointed out, we did not consider this case when introducing the feature of automatic in-memory for small datasets. This needs to be fixed. |
Hi ! Currently a dataset that is in memory doesn't know doesn't know in which directory it has to read/write cache files. Because of that, currently in-memory datasets simply don't use caching. Maybe a Dataset object could have a |
Please @villmow, feel free to update to |
Describe the bug
Somehow caching does not work for me anymore. Am I doing something wrong, or is there anything that I missed?
Steps to reproduce the bug
Actual results
This code prints the following output for me:
Expected results
Caching should work.
The text was updated successfully, but these errors were encountered: