cgnorthcutt released this 05 May 22:33
We release the cross-validated predicted probabilities for the QuickDraw dataset. These probabilities were computed using 4-fold cross-validation for all 50,426,266 examples and 345 classes. The resulting predicted probabilities (the pyx numpy matrix) have shape 50426266 x 345. The resulting file is 33GB in np.float16 format.
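As a rough sanity check on the size quoted above, the raw array size can be worked out from the shape and the 2 bytes that each np.float16 value occupies. This is only a sketch: the variable names are illustrative, and the file on disk is slightly larger than the raw data because the .npy format adds a small header.

```shell
# Raw size of a 50426266 x 345 array of np.float16 values (2 bytes each).
# Variable names are illustrative; the .npy header is not included.
rows=50426266
cols=345
bytes_per_value=2
total_bytes=$((rows * cols * bytes_per_value))
echo "$total_bytes bytes"    # 34794123540 bytes
```

That is about 34.8 x 10^9 bytes, or roughly 32.4 GiB, which is in line with the approximately 33GB figure.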
Note: pyx is short for prob(y = label | data example x).
Download the QuickDraw Cross-validated Predicted Probabilities as a numpy matrix.
Make sure pigz and wget are installed:
# on Mac OS
brew install wget pigz
# on Ubuntu
sudo apt-get install wget pigz
Download the pyx files:
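base_url="https://github.com/cgnorthcutt/label-errors/releases/download/"
base_filename="quickdraw-pyx-v1/quickdraw_pyx.tar.gz-parta"
# The parts carry two-letter suffixes, from -partaa through -partak
for part in {a..k}; do
  # --continue resumes a partially downloaded part instead of restarting it
  wget --continue "$base_url$base_filename$part"
done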
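Because the download spans eleven large parts, it may be worth confirming that all of them arrived before decompressing. The helper below is a hypothetical sketch (`missing_parts` is not part of this release); it assumes the parts keep their original names, `quickdraw_pyx.tar.gz-partaa` through `quickdraw_pyx.tar.gz-partak`, and it only checks that each file exists, not that it is complete.

```shell
# Hypothetical helper: print the name of every expected part that is
# missing from the given directory (defaults to the current directory).
missing_parts() {
  dir="${1:-.}"
  for part in a b c d e f g h i j k; do
    name="quickdraw_pyx.tar.gz-parta$part"
    [ -f "$dir/$name" ] || echo "$name"
  done
}

# Example: run in the download directory; no output means no part is missing.
# missing_parts .
```

If any names are printed, re-running the download loop should fetch them, since `wget --continue` picks up where an earlier attempt stopped.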
Decompress the tar.gz file parts into the final pyx numpy matrix:
cat quickdraw_pyx.tar.gz-part?? | unpigz | tar -xvC .
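After extraction, one inexpensive plausibility check is to compare the file size against the raw size implied by the matrix shape (50426266 x 345 values at 2 bytes each). The sketch below uses a hypothetical helper, `at_least_bytes`, and assumes the archive extracts to `quickdraw_pyx.npy`. It tests for "at least" rather than "exactly" because a .npy file adds a small header on top of the raw values.

```shell
# Hypothetical helper: succeed if FILE is at least MIN_BYTES bytes long.
at_least_bytes() {
  file="$1"
  min_bytes="$2"
  actual=$(wc -c < "$file" | tr -d ' ')   # strip padding some wc versions add
  [ "$actual" -ge "$min_bytes" ]
}

# Raw data size implied by the shape and np.float16 (2 bytes per value).
expected_min=$((50426266 * 345 * 2))

# Example (assumes the extracted file is named quickdraw_pyx.npy):
# at_least_bytes quickdraw_pyx.npy "$expected_min" && echo "size looks plausible"
```

A file smaller than this would suggest a truncated or incomplete extraction; passing the check does not by itself prove the contents are intact.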
Additional details
To compress the pyx probabilities file prior to uploading, we used the following command:
tar -I pigz -cvf - quickdraw_pyx.npy | split --bytes=1800M - "quickdraw_pyx.tar.gz-part"
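The two-letter endings on the part files (`aa`, `ab`, and so on) come from the default suffix scheme of `split`, which is why the parts can later be rejoined in order with a `part??` glob. The following small-scale round trip illustrates the pattern with a 5000-byte stand-in file; the file names and sizes are made up for the demonstration and are unrelated to the real archive.

```shell
# Small-scale round trip of the split/cat pattern (illustrative names only).
workdir=$(mktemp -d)
head -c 5000 /dev/urandom > "$workdir/demo.bin"      # stand-in for the archive

# Split into 2000-byte pieces: demo.bin-partaa, -partab, -partac
split -b 2000 "$workdir/demo.bin" "$workdir/demo.bin-part"

# The glob expands in sorted order, so cat rejoins the pieces correctly.
cat "$workdir"/demo.bin-part?? > "$workdir/rejoined.bin"

# cmp exits with status 0 only if the two files are byte-for-byte identical.
cmp "$workdir/demo.bin" "$workdir/rejoined.bin" && echo "round trip OK"
```

The real command differs in that the stream being split is a pigz-compressed tar archive, so the rejoined stream is piped through `unpigz` and `tar` rather than written straight to a file.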