Issue with offline mode #4760

SaulLu · 2022-07-28T12:45:14Z

Describe the bug

I can't retrieve a cached dataset with offline mode enabled

Steps to reproduce the bug

To reproduce my issue, first run a script that caches the dataset:

import os
# Set the flag before importing datasets so the library picks it up.
os.environ["HF_DATASETS_OFFLINE"] = "0"

import datasets

datasets.logging.set_verbosity_info()
ds_name = "SaulLu/toy_struc_dataset"
ds = datasets.load_dataset(ds_name)
print(ds)

Then try to reload it in offline mode:

import os
# Same script as above, but with offline mode enabled before the import.
os.environ["HF_DATASETS_OFFLINE"] = "1"

import datasets

datasets.logging.set_verbosity_info()
ds_name = "SaulLu/toy_struc_dataset"
ds = datasets.load_dataset(ds_name)
print(ds)

Expected results

I expected the second snippet to load the dataset from the cache without raising an error.

Actual results

The second snippet raises:

Traceback (most recent call last):
  File "/home/lucile_huggingface_co/sandbox/evaluate/test_cache_datasets.py", line 8, in <module>
    ds = datasets.load_dataset(ds_name)
  File "/home/lucile_huggingface_co/anaconda3/envs/evaluate-dev/lib/python3.8/site-packages/datasets/load.py", line 1723, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/lucile_huggingface_co/anaconda3/envs/evaluate-dev/lib/python3.8/site-packages/datasets/load.py", line 1500, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/home/lucile_huggingface_co/anaconda3/envs/evaluate-dev/lib/python3.8/site-packages/datasets/load.py", line 1241, in dataset_module_factory
    raise ConnectionError(f"Couln't reach the Hugging Face Hub for dataset '{path}': {e1}") from None
ConnectionError: Couln't reach the Hugging Face Hub for dataset 'SaulLu/toy_struc_dataset': Offline mode is enabled.

Environment info

datasets version: 2.4.0
Platform: Linux-4.19.0-21-cloud-amd64-x86_64-with-glibc2.17
Python version: 3.8.13
PyArrow version: 8.0.0
Pandas version: 1.4.3

Maybe I'm misunderstanding how offline mode is meant to be used (see doc). Is that the case?
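
For anyone re-running this report, here is a minimal sketch that executes both steps in fresh interpreter processes, so each run starts with its own HF_DATASETS_OFFLINE value. It assumes, as the two snippets above suggest, that the flag has to be in the environment before datasets is imported. The helper name run_with_offline_flag and the REPRO string are illustrative and not part of datasets.

```python
import os
import subprocess
import sys


def run_with_offline_flag(code: str, offline: bool) -> subprocess.CompletedProcess:
    """Run `code` in a fresh Python process with HF_DATASETS_OFFLINE set.

    Hypothetical helper for this report; it only wraps subprocess.run.
    """
    env = dict(os.environ)  # copy, so the parent environment is left untouched
    env["HF_DATASETS_OFFLINE"] = "1" if offline else "0"
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,  # keep stdout/stderr for inspection
        text=True,
    )


# Same body as the two snippets in the report, minus the os.environ assignment.
REPRO = (
    "import datasets\n"
    "datasets.logging.set_verbosity_info()\n"
    "ds = datasets.load_dataset('SaulLu/toy_struc_dataset')\n"
    "print(ds)\n"
)
```

Calling run_with_offline_flag(REPRO, offline=False) first and run_with_offline_flag(REPRO, offline=True) second, then comparing the two returncode values and the tail of stderr, should show whether the cached copy is reused in offline mode.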


albertvillanova · 2022-07-28T15:33:18Z

Hi @SaulLu, thanks for reporting.

I think offline mode is not supported for datasets containing only data files (without any loading script). I'm having a look into this...

SaulLu · 2022-07-28T15:37:30Z

Thanks for your feedback!

To give you a little more info: if you don't set the offline mode flag, the script loads the dataset from the cache. I first noticed this behavior with the evaluate library, and while trying to understand the downloading flow I realized that I got a similar error with datasets.

albertvillanova · 2022-07-28T15:49:05Z

This is an issue we have to fix.

lhoestq · 2022-07-28T16:05:36Z

This is related to #3547

thuzhf · 2023-05-10T13:10:29Z

Still not fixed? ......

lhoestq · 2023-05-11T10:11:48Z

#5331 will be helpful to fix this, as it updates the cache directory template to be aligned with the other datasets

ManuelFay · 2023-10-26T13:18:32Z

Any updates ?

je-santos · 2024-01-23T01:33:13Z

I'm facing the same problem

lhoestq · 2024-01-23T10:58:18Z

This issue has been fixed in datasets 2.16 by #6493. The cache is now working properly :)

You just have to update datasets:

pip install -U datasets
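
A minimal sketch for checking whether the installed datasets is at least 2.16, the version named above. It uses only the standard library. The helper names (parse_release, is_at_least, installed_datasets_version) are hypothetical, and the parser is a simplification that ignores pre-release suffixes, so 2.16.0rc1 would count as 2.16.0.

```python
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Turn '2.16.1' or '2.17.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:  # e.g. 'dev0': stop at the first non-numeric component
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(version: str, minimum: str) -> bool:
    """Compare release numbers, padding with zeros so '2.16' equals '2.16.0'."""
    a, b = parse_release(version), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_datasets_version():
    """Return the installed datasets version string, or None if it is absent."""
    try:
        return metadata.version("datasets")
    except metadata.PackageNotFoundError:
        return None
```

For example, is_at_least(installed_datasets_version() or "0", "2.16") is False for the 2.4.0 environment in the original report and True after the upgrade.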

jaded0 · 2024-02-15T17:24:31Z

I'm on version 2.17.0, and this exact problem still persists.

lhoestq · 2024-02-15T17:27:07Z

Can you share some code to reproduce your issue ?

Also make sure your cache was populated with recent versions of datasets. Datasets cached with old versions may not be reloadable in offline mode, though we did our best to keep as much backward compatibility as possible.
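
A rough sketch for seeing what is currently in the cache before deciding whether to repopulate it with a recent version. The location logic is an assumption: it checks the HF_DATASETS_CACHE environment variable, then HF_HOME, then falls back to ~/.cache/huggingface/datasets, and it ignores other overrides such as XDG_CACHE_HOME or a cache_dir argument. Both function names are hypothetical.

```python
import os
from pathlib import Path


def datasets_cache_dir() -> Path:
    """Best guess at the datasets cache location (assumed defaults, see above)."""
    explicit = os.environ.get("HF_DATASETS_CACHE")
    if explicit:
        return Path(explicit).expanduser()
    hf_home = os.environ.get("HF_HOME", "~/.cache/huggingface")
    return Path(hf_home).expanduser() / "datasets"


def list_cached_entries(cache_dir: Path) -> list:
    """Return the names of the top-level directories in the cache, sorted."""
    if not cache_dir.is_dir():
        return []
    return sorted(p.name for p in cache_dir.iterdir() if p.is_dir())
```

Printing list_cached_entries(datasets_cache_dir()) before and after re-downloading a dataset gives a quick view of which entries exist. It does not tell you which datasets version wrote them.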

BramVanroy · 2024-03-19T08:48:17Z

I'm not sure if this is related @lhoestq but I am experiencing a similar issue when using offline mode:

$ python -c "from datasets import load_dataset; load_dataset('openai_humaneval', split='test')"
$ HF_DATASETS_OFFLINE=1 python -c "from datasets import load_dataset; load_dataset('openai_humaneval', split='test')"
Using the latest cached version of the dataset since openai_humaneval couldn't be found on the Hugging Face Hub (offline mode is enabled).
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/dodrio/scratch/projects/2023_071/alignment-handbook/.venv/lib/python3.10/site-packages/datasets/load.py", line 2556, in load_dataset
    builder_instance = load_dataset_builder(
  File "/dodrio/scratch/projects/2023_071/alignment-handbook/.venv/lib/python3.10/site-packages/datasets/load.py", line 2265, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/dodrio/scratch/projects/2023_071/alignment-handbook/.venv/lib/python3.10/site-packages/datasets/packaged_modules/cache/cache.py", line 122, in __init__
    config_name, version, hash = _find_hash_in_cache(
  File "/dodrio/scratch/projects/2023_071/alignment-handbook/.venv/lib/python3.10/site-packages/datasets/packaged_modules/cache/cache.py", line 48, in _find_hash_in_cache
    raise ValueError(
ValueError: Couldn't find cache for openai_humaneval for config 'default'
Available configs in the cache: ['openai_humaneval']

lhoestq · 2024-03-19T10:48:53Z

Thanks for reporting @BramVanroy, I managed to reproduce and I opened a fix here: #6741

BramVanroy · 2024-03-19T11:43:28Z

Awesome, thanks for the quick fix @lhoestq! Looking forward to updating my dependency version list.

noforit · 2024-03-25T09:11:10Z

> Thanks for reporting @BramVanroy, I managed to reproduce and I opened a fix here: #6741

Thanks a lot! I have faced the same problem. Can I use the code from your fix to directly replace the code in my installed version? I noticed that the fix has not been merged yet. Will it affect other functionality?

lhoestq · 2024-03-25T16:24:44Z

I just merged the fix. You can install datasets from source or wait for the patch release, which will be out in the coming days.

SaulLu added the bug (Something isn't working) label Jul 28, 2022

albertvillanova self-assigned this Jul 28, 2022

lhoestq closed this as completed Jan 23, 2024

lhoestq mentioned this issue Mar 19, 2024

Fix offline mode with single config #6741

Merged

jungle-gym-ac mentioned this issue Oct 18, 2024

[Feature Request] Update Datasets Version, so that lmms-eval can be used in Offline Environment EvolvingLMMs-Lab/lmms-eval#335

Open
