failed to download data_CVPR2021.zip #4

Closed
Hlic818 opened this issue Oct 21, 2021 · 8 comments

Comments

@Hlic818

Hlic818 commented Oct 21, 2021

I tried to download the data from this link: https://www.dropbox.com/sh/1s6r4slurc5ei2n/AACg6TqoDfGdKe8t40Em1fgxa?dl=0&preview=data_CVPR2021.zip on different computers connected to different networks, but every attempt failed. Is there a problem with the data as a whole?
At your convenience, could you please send me the data (excluding the synthetic data) by email? My email: 1050217987@qq.com

@ku21fan
Owner

ku21fan commented Oct 21, 2021

Hello,

Sorry for the inconvenience. The file is large (8.42 GB), so it is hard to send via email.

Can you try to download it with the following command? (via the original download URL)

wget -O data_CVPR2021.zip "https://www.dropbox.com/sh/1s6r4slurc5ei2n/AABJZzmWTCNt6EWVXbQ-QdDUa/data_CVPR2021.zip?dl=0"

Or try this command (we just reset the file's download URL):

wget -O data_CVPR2021.zip "https://www.dropbox.com/s/o27gunx16usjhgu/data_CVPR2021.zip?dl=0"

In our environment, both commands can still download the file data_CVPR2021.zip.
So, we don't know why the problem happens :(

If you still cannot download this file, I am planning to upload it to Baidu.

Hope it helps.

@Hlic818
Author

Hlic818 commented Oct 25, 2021 via email

@ku21fan
Owner

ku21fan commented Oct 25, 2021

OK. I am going to upload it to Baidu right now.

@ku21fan
Owner

ku21fan commented Oct 25, 2021

I uploaded the data to Baidu (password: datm).

The file data_CVPR2021.zip is split into 3 files: data_CVPR2021_split.z01, data_CVPR2021_split.z02, and data_CVPR2021_split.zip.

You should download them all and then run the following commands

cat data_CVPR2021_split.z* > tmp.zip
unzip tmp.zip

then you will get data_CVPR2021.zip (about 8.5GB)
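
Because the `cat` step depends on all three parts being present, a small guard can help before concatenating. The sketch below is only an illustration: `join_parts` is a hypothetical helper name (not part of this repository), and it assumes the parts should be joined in the same order the glob above produces (`.z01`, `.z02`, `.zip`).

```shell
#!/bin/sh
# join_parts OUTPUT PART...
# Hypothetical helper: concatenates the given parts into OUTPUT, in the order
# given, but only after confirming that every part exists.
join_parts() {
    out="$1"
    shift
    # Check all parts first so that no partial output file is written
    for part in "$@"; do
        if [ ! -f "$part" ]; then
            echo "missing part: $part" >&2
            return 1
        fi
    done
    cat "$@" > "$out"
}
```

A possible invocation would be `join_parts tmp.zip data_CVPR2021_split.z01 data_CVPR2021_split.z02 data_CVPR2021_split.zip && unzip tmp.zip`. A missing part is only one possible reason for a failed `unzip`; the guard does not detect a part that is present but truncated.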

Hope it helps :)

@yusirhhh

When I run the above command `unzip tmp.zip`,
it returns an error: End-of-centdir-64 signature not where expected (prepended bytes?)

Did this happen when you unzipped the file?

@ku21fan
Owner

ku21fan commented Nov 18, 2021

@yusirhhh It did not happen to me.
Could you try re-downloading, or check the md5sum of the downloaded files?

The md5sum of each file is as follows.

77c78ac256ffbf3cc5e36c8bd5e00b4d  data_CVPR2021_split.z01
ac4424d99c8c8ccdcb17dd7e2b8b9ae6  data_CVPR2021_split.z02
9c5c71ad13f72f700434bcd438b49d1c  data_CVPR2021_split.zip
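
One way to run that check is sketched below. `verify_md5` is a hypothetical helper name introduced here for illustration, and the sketch assumes the GNU coreutils `md5sum` command is available (on macOS the equivalent tool is `md5`, which prints its output differently).

```shell
#!/bin/sh
# verify_md5 EXPECTED_HASH FILE
# Hypothetical helper: returns 0 if FILE's MD5 digest equals EXPECTED_HASH,
# and 1 if the file is missing or the digest differs.
verify_md5() {
    expected="$1"
    file="$2"
    if [ ! -f "$file" ]; then
        echo "MISSING  $file"
        return 1
    fi
    # md5sum prints "<hash>  <filename>"; keep only the hash field
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK       $file"
    else
        echo "MISMATCH $file (got $actual)"
        return 1
    fi
}
```

For example, `verify_md5 77c78ac256ffbf3cc5e36c8bd5e00b4d data_CVPR2021_split.z01` checks the first part against the hash listed above. Any part that reports MISMATCH or MISSING would be the one to download again.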

@yusirhhh

Hello, for scene text recognition the input images are converted to lmdb format. I want to use this code for research on handwritten text. Does converting the data to lmdb format have a large impact on training speed? I look forward to your reply.

@ku21fan
Owner

ku21fan commented Dec 5, 2021

@yusirhhh
Hello, I have not carefully compared training speed, but I believe that using the lmdb format is faster than not using it.

I usually use the lmdb format, following the convention of the CRNN implementation.
The lmdb format also makes it possible to handle many image files as a single DB file.
So I use it for reasons of convention and convenience.

In my opinion, if you don't need to follow the convention and don't get a speed improvement from lmdb, you may not need to use the lmdb format.
