Accessing Arrow dataset cache_files #2386

Closed
Mehrad0711 opened this issue May 20, 2021 · 1 comment
Labels
bug Something isn't working

Comments

@Mehrad0711

Describe the bug

In datasets 1.5.0, the following snippet printed the filename of the first cache file:

from datasets import load_dataset

train_data = load_dataset('conll2003', split='train', cache_dir='data')
print(train_data.cache_files[0]['filename'])

However, in the newest release (1.6.1), cache_files is an empty list.

I also tried loading the dataset with the keep_in_memory=True argument, but cache_files is still empty.

I was wondering whether this is a bug or whether I need to pass additional arguments to access cache_files.

@Mehrad0711 added the bug (Something isn't working) label May 20, 2021
@Mehrad0711
Author

Thanks @bhavitvyamalik for referencing the workaround. Setting keep_in_memory=False is working.
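
For anyone landing here later, a minimal sketch of that workaround, assuming the same conll2003 call as in the report above. The helper names `first_cache_file` and `load_conll_train` are made up for illustration and are not part of datasets. The first helper returns None instead of raising an IndexError when cache_files is empty.

```python
from typing import Any, Optional


def first_cache_file(dataset: Any) -> Optional[str]:
    """Return the filename of the first Arrow cache file, or None if there is none.

    Hypothetical helper: it only assumes that `dataset.cache_files` is a list
    of dicts with a 'filename' key, as used in the report above.
    """
    cache_files = getattr(dataset, "cache_files", [])
    if not cache_files:
        # Nothing is backed by a file on disk, e.g. the data was loaded in memory.
        return None
    return cache_files[0]["filename"]


def load_conll_train(cache_dir: str = "data"):
    """Load the conll2003 train split with keep_in_memory=False (hypothetical wrapper)."""
    # Imported lazily so first_cache_file can be used without datasets installed.
    from datasets import load_dataset

    # keep_in_memory=False is the workaround reported in this thread.
    return load_dataset(
        "conll2003",
        split="train",
        cache_dir=cache_dir,
        keep_in_memory=False,
    )
```

Calling `print(first_cache_file(load_conll_train()))` should then print a path under the cache directory rather than None, though I have only checked this with the setup described in this issue.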
