Failed to download CelebA dataset using download=True #1920

rrmina · 2020-02-27T15:02:53Z

🐛 Bug

It fails to download the following files:

1. img_align_celeba.zip

   Rather than the zip file, it downloads an HTML file ("Google Drive - Quota exceeded"). Loading then fails with a BadZipFile error.

2. list_attr_celeba.txt

   Similarly, the download is the "Google Drive - Quota exceeded" page. This time it raises RuntimeError('Dataset not found or corrupted.' + ' You can use download=True to download it').

3. list_landmarks_align_celeba.txt

   Same as number 2.
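To tell whether a downloaded file is the real archive or the "Quota exceeded" page, you can inspect its contents. The sketch below uses hypothetical helpers (looks_like_quota_page and check_download are not part of torchvision). It assumes the error page is HTML and that the text "Quota exceeded" appears near the top, which matches the page reported above.

```python
import zipfile


def looks_like_quota_page(path: str, max_bytes: int = 65536) -> bool:
    """Heuristically decide whether `path` is a Google Drive HTML error page.

    Hypothetical helper: assumes the error page is HTML and that the text
    "Quota exceeded" appears within its first `max_bytes` bytes.
    """
    with open(path, "rb") as f:
        head = f.read(max_bytes)
    # HTML pages usually start with "<!DOCTYPE html" or "<html" (any case).
    is_html = head.lstrip().lower().startswith((b"<!doctype html", b"<html"))
    return is_html and b"quota exceeded" in head.lower()


def check_download(path: str) -> str:
    """Classify a downloaded file as "zip", "quota_page", or "other"."""
    if zipfile.is_zipfile(path):
        return "zip"
    if looks_like_quota_page(path):
        return "quota_page"
    return "other"
```

For example, if check_download("data/celeba/img_align_celeba.zip") returns "quota_page", you are hitting the failure described above.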

To Reproduce

Steps to reproduce the behavior:

from torchvision import datasets, transforms

train_dataset = datasets.CelebA('data', split="train", transform=transforms.ToTensor(), download=True)

Expected behavior

Environment

PyTorch version: 1.2.0
Is debug build: No
CUDA used to build PyTorch: 10.0

OS: Microsoft Windows 10 Home Single Language
GCC version: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.17.0
[pip3] torch==1.2.0
[pip3] torchtext==0.4.0
[pip3] torchvision==0.4.0
[conda] Could not collect

Additional context


pmeier · 2020-02-27T18:14:37Z

The error message

Google Drive - Quota exceeded

means that the traffic for this file (size and number of downloads) exceeds a limit or quota set by Google Drive. Since we do not host the dataset and this is not an error on our side, there is nothing we can do to help you with this. According to the answer in the above link, this quota is reset every 24 hours, so a possible fix is to try again later and hope that the limit has not been reached yet.

fmassa · 2020-02-27T19:16:14Z

Thanks @pmeier for the help!

It looks like there is not much we can do. Please try again later and let us know if the problem persists. As such, I'm closing this issue.

MohamedAliRashad · 2021-02-18T09:56:56Z

It has been nearly a year since this issue was opened, and the error still pops up, @pmeier.

pmeier · 2021-02-18T10:02:00Z

@MohamedAliRashad What do you mean by "the error still pops up"? There is no way for us to get around this error, since we are not hosting the dataset. See my previous comment above for details.

MohamedAliRashad · 2021-02-18T11:53:59Z

@pmeier
Can't the dataset be hosted by other services?

pmeier · 2021-02-18T12:28:07Z

Of course they can, but this is not for us to decide. If you think there is a better hosting solution, you need to get in contact with the authors. Note the disclaimer at the bottom of our README:

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.

MohanadOdema · 2021-06-01T15:53:32Z

Can I just point out a workaround that worked for me, rather than trying my luck every 24 hours?

The files needed for the CelebA dataset, as defined in the file_list of torchvision's CelebA class, are as follows:

img_align_celeba.zip, list_attr_celeba.txt, identity_CelebA.txt, list_bbox_celeba.txt, list_landmarks_align_celeba.txt, list_eval_partition.txt

I downloaded them directly from the authors' Google Drive link and placed them in {root}/celeba, where root is the directory you specify when calling the CelebA class.
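Before constructing the dataset, you can confirm that every file from the list above is actually in {root}/celeba. This is a minimal sketch. It assumes the six file names listed above, and missing_celeba_files is a hypothetical helper, not a torchvision function. It checks only that the files are present, not the MD5 checksums that torchvision verifies.

```python
from pathlib import Path
from typing import List

# File names as listed above (taken from torchvision's CelebA file list).
REQUIRED_FILES = [
    "img_align_celeba.zip",
    "list_attr_celeba.txt",
    "identity_CelebA.txt",
    "list_bbox_celeba.txt",
    "list_landmarks_align_celeba.txt",
    "list_eval_partition.txt",
]


def missing_celeba_files(root: str) -> List[str]:
    """Return the required files that are absent from {root}/celeba."""
    folder = Path(root) / "celeba"
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

For example, missing_celeba_files("data") should return an empty list before you call datasets.CelebA('data', ...).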

MohamedAliRashad · 2021-06-01T18:21:17Z

@MohanadOdema
I think your link could be added as a fallback in the download class. It would be nice if you made a PR with this.

pmeier · 2021-06-02T05:36:43Z

@MohanadOdema, we should be doing exactly the same thing within the download functionality, albeit automatically. I can confirm that I get different links when doing this manually. I'll investigate.

nikste · 2021-08-14T20:42:58Z

Can we reopen this? I just ran into this issue again.
The authors seem to have hosted it on Baidu Drive as well. It would be really great if both sources could be used (or Baidu, if that does not have a restriction).
https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html

I was so happy to have this super simple solution and so disappointed when I ran into this issue :)

nikste · 2021-08-14T21:34:43Z

Friendly ping to one of the authors, @liuziwei7, just to make you aware:
There seems to be an issue with downloading the CelebA dataset from Google Drive in PyTorch. I think it would help more people use CelebA if this got fixed. I'm not sure whether downloading from Baidu automatically is easy to implement, or whether it would help to host it somewhere else.

pmeier · 2021-08-16T07:30:33Z

Can we reopen this? I just ran into this issue again.

This was fixed as well as we can in #4109. Starting with the next release, we bail out early if the download fails instead of simply writing the failure message into the file.
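The idea of bailing out early can be illustrated with a small sketch. This is not the code from #4109. It is a hypothetical save_or_fail helper that inspects the first chunk of a download stream and raises before writing anything if the stream looks like an HTML page instead of the expected file. It assumes the response body is available as a binary file-like object.

```python
import shutil
from typing import BinaryIO


def save_or_fail(stream: BinaryIO, dest: str, chunk_size: int = 32768) -> None:
    """Write `stream` to `dest`, unless it starts like an HTML page.

    Hypothetical illustration: an HTML body where a zip/txt file was
    expected is treated as a failed download (e.g. a quota page).
    """
    first = stream.read(chunk_size)
    if first.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        # Nothing has been written yet, so no corrupt file is left behind.
        raise RuntimeError(
            "The download returned an HTML page instead of the file; "
            "the Google Drive quota may have been exceeded. Try again later."
        )
    with open(dest, "wb") as f:
        f.write(first)
        shutil.copyfileobj(stream, f)  # copy the remaining bytes
```

Because the check runs before the output file is opened, a failed download does not leave a misleading file on disk.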

The authors seem to have hosted it on Baidu Drive as well. It would be really great if both sources could be used (or Baidu, if that does not have a restriction).

Since, AFAIK, CelebA is the only dataset hosted on Baidu Cloud, and the problem can be solved by waiting and trying again, this currently has no priority. We would accept a PR, though, if you or someone else wants to add the functionality.
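A fallback to a second source, as suggested earlier in the thread, could try each source in order. The sketch below is hypothetical and is not torchvision code. Each source is any callable that downloads to a path and raises on failure. It assumes no particular Google Drive or Baidu client.

```python
from typing import Callable, Sequence


def download_with_fallback(
    sources: Sequence[Callable[[str], None]], dest: str
) -> int:
    """Try each download callable in order; return the index that succeeded.

    Each source is a hypothetical callable `fetch(dest)` that writes the
    file to `dest` and raises an exception on failure (e.g. quota exceeded).
    """
    errors = []
    for index, fetch in enumerate(sources):
        try:
            fetch(dest)
            return index
        except Exception as exc:  # remember the failure and try the next source
            errors.append(f"source {index}: {exc}")
    raise RuntimeError("All sources failed:\n" + "\n".join(errors))
```

Collecting every error before giving up means the final message shows why each mirror failed, not just the last one.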

There seems to be an issue with downloading the CelebA dataset from Google Drive in PyTorch.

This has nothing to do with PyTorch, but with the dataset hosted on Google Drive. Each file has a daily quota there. If it is met, i.e. the file was downloaded X times on that day, Google Drive simply refuses the download if you try again.

wsh3776 · 2021-11-28T02:00:21Z

Thank you, @MohanadOdema's workaround works for me.

cooperflourens · 2022-06-01T00:21:47Z

I'm currently trying @MohanadOdema's workaround, to no avail. Do you know if it is still a functional workaround?

abhi-glitchhg · 2022-06-01T04:36:40Z

Hey @cooperflourens,

Try downloading the files manually from the Google Drive link; you need to log in to Google for this. For more information, please see the discussions in #5704 and #6052.

cooperflourens · 2022-06-01T21:08:29Z


Hey @abhi-glitchhg,

Thanks for your reply. I downloaded those files, set download=True, and it worked. I think my problem before was that I had download set to False.

Thank you for your help!

anthonyquint · 2023-03-14T19:48:51Z

@MohanadOdema's workaround worked for me too. Thank you!

wtipton · 2023-12-25T16:17:23Z

Thanks for the workaround. Not sure if the code has changed recently, but FWIW, I also had to unzip img_align_celeba.zip into the celeba/ directory to get it working.
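You can unzip the archive from Python with the standard library. This is a minimal sketch. It assumes the files sit in {root}/celeba as described above and that the archive contains a top-level img_align_celeba/ folder, which is the folder torchvision's CelebA class appears to look for. extract_celeba_images is a hypothetical helper name. From a reading of torchvision's source, download=True seems to extract the zip itself when the files are already present, while download=False expects the extracted folder. That may explain why setting download=True helped earlier in the thread.

```python
import zipfile
from pathlib import Path


def extract_celeba_images(root: str) -> Path:
    """Unzip {root}/celeba/img_align_celeba.zip into {root}/celeba.

    Hypothetical helper; skips extraction if the image folder already exists.
    """
    folder = Path(root) / "celeba"
    images = folder / "img_align_celeba"
    if not images.is_dir():
        with zipfile.ZipFile(folder / "img_align_celeba.zip") as zf:
            # Assumes the archive's entries live under "img_align_celeba/".
            zf.extractall(folder)
    return images
```

After this, datasets.CelebA('data', split="train", download=False) should find the images, assuming root is 'data'.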

fmassa closed this as completed Feb 27, 2020

fmassa added the module: datasets label Feb 27, 2020

AntixK mentioned this issue Mar 9, 2020:
Installation error AntixK/PyTorch-VAE#1 (closed)

pmeier mentioned this issue May 27, 2020:
Unable to load CelebA dataset. File is not zip file error. #2262 (closed)

This was referenced Jun 15, 2020:
Failed to download CelebA in Colab #2317 (closed)
add descriptive error message if Google Drive quota is exceeded #2321 (merged)

pmeier mentioned this issue Mar 30, 2022:
improve error handling for GDrive downloads #5704 (merged)

cuiboyuan mentioned this issue May 8, 2022:
Add CelebA Dataset to Plato TL-System/plato#164 (merged)

This was referenced Apr 25, 2024:
Downoading celeba dataset chapter 12 page 387. rasbt/machine-learning-book#172 (closed)
Chapter 14 rasbt/machine-learning-book#146 (closed)
