Accessing Arrow dataset cache_files #2386

Mehrad0711 · 2021-05-20T23:57:43Z

Describe the bug

In datasets 1.5.0, the following snippet printed the filename of the first cache file:

from datasets import load_dataset

train_data = load_dataset('conll2003', split='train', cache_dir='data')
print(train_data.cache_files[0]['filename'])

However, in the newest release (1.6.1), cache_files is an empty list.

I also tried loading the dataset with the keep_in_memory=True argument, but cache_files is still empty.

Is this a bug, or do I need to pass additional arguments to access cache_files?


Mehrad0711 · 2021-05-21T19:18:03Z

Thanks @bhavitvyamalik for referencing the workaround. Setting keep_in_memory=False works.
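For anyone landing here, a minimal sketch of the workaround, assuming the dataset is loaded with keep_in_memory=False as described above. The helper names cache_filenames and load_conll_train_from_disk are made up for illustration, and the stand-in objects use a hypothetical path so the helper can be tried without downloading anything:

```python
from types import SimpleNamespace


def cache_filenames(dataset):
    """Return the filenames listed in dataset.cache_files (empty if there are none)."""
    return [entry["filename"] for entry in dataset.cache_files]


def load_conll_train_from_disk(cache_dir="data"):
    """Load the train split with keep_in_memory=False, the workaround from this thread."""
    # Imported lazily so cache_filenames can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        "conll2003", split="train", cache_dir=cache_dir, keep_in_memory=False
    )


# Stand-ins that only mimic the cache_files attribute; the path is hypothetical.
on_disk = SimpleNamespace(cache_files=[{"filename": "data/conll2003/train.arrow"}])
in_memory = SimpleNamespace(cache_files=[])

print(cache_filenames(on_disk))
print(cache_filenames(in_memory))
```

With a real dataset, cache_filenames(load_conll_train_from_disk()) should list the Arrow files under cache_dir. Returning a list instead of indexing [0] directly avoids an IndexError when cache_files is empty.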

Mehrad0711 added the bug (Something isn't working) label May 20, 2021

bhavitvyamalik mentioned this issue May 21, 2021

datasets 1.6 ignores cache #2387

Closed

Mehrad0711 closed this as completed May 21, 2021