Nature Dataset

This is a curated version of the iNaturalist 2017 dataset, intended for training single-image super-resolution (SISR) models. The original dataset consists of 675'170 images and is 200GB in size.

There is a small version consisting of 3000 images of 512x512px, which can be used to train lightweight networks such as compact or SPAN. The average hyperiqa score of hr_small (3000 images) is 0.767434819261233.

There is also a medium version consisting of 7000 images of 512x512px, which can be used to train medium or heavy networks such as RealPLKSR or RGT/DAT/ATD. The average hyperiqa score of HR_medium (7000 images) is 0.754073106459209.

The HR folder, the LRx2 and LRx4 folders, and a validation folder are provided in the Assets as zip files.

I will list the changes I applied (or simply what I did) below:

For the HR folder, I

moved all images into the same folder
removed all files that were smaller than 300kB -> 240'833 images left (from 675'170)
tiled to 512x512
hyperiqa scored all of them and removed all that were below 0.7 -> 32'499 images left, 18GB in size
checked all images for visual similarity and removed duplicates
removed a lot of human hand photos (too many human hands)
made a small version with 3k images that can be used for training lightweight SISR networks
made a medium version with 7k images that can be used for training medium/heavy SISR networks
normalized filenames
oxipng -o 4 --strip safe --alpha *.png
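
The size filter, tiling and score threshold from the steps above could be sketched roughly as follows. This is only an illustration, not the script that was actually used: the function names are hypothetical, "300kB" is read as 300,000 bytes, partial tiles at the image edges are assumed to be dropped, and the hyperiqa scores are assumed to be already available as a mapping from file name to score.

```python
from pathlib import Path

MIN_BYTES = 300 * 1000  # assumption: "300kB" read as 300,000 bytes
TILE = 512              # tile edge length in pixels
MIN_SCORE = 0.7         # hyperiqa threshold from the list above


def large_enough(paths, min_bytes=MIN_BYTES):
    """Keep only files whose size on disk is at least min_bytes."""
    return [p for p in paths if Path(p).stat().st_size >= min_bytes]


def tile_boxes(width, height, tile=TILE):
    """Return (left, top, right, bottom) boxes for all full tiles.

    Assumption: partial tiles at the right and bottom edges are dropped.
    """
    return [
        (x, y, x + tile, y + tile)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


def keep_high_quality(scores, min_score=MIN_SCORE):
    """Keep the names whose quality score is at least min_score."""
    return sorted(name for name, s in scores.items() if s >= min_score)
```

Each box returned by `tile_boxes` can be passed to an image library's crop function. The scores themselves would come from running hyperiqa on every tile, which is not shown here.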

For the LRx4 folder, I took the HR folder and applied

scaling with randomized down_up (range 0.75, 1.5), linear, cubic_mitchell, lanczos, gauss and box
slight randomized gaussian blurring
randomized jpg compression with quality 75 - 100
oxipng -o 4 --strip safe --alpha *.png
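
As a rough illustration of the randomization described above, the per-image degradation settings could be drawn like this. It is a sketch under assumptions, not the tool that was actually used: `sample_degradation` and the dictionary keys are hypothetical names, down_up is treated as one of the scaling choices with an intermediate factor drawn from the stated range, and the blur sigma range is a guess at what "slight" means.

```python
import random

# Scaling methods named in the list above.
SCALE_KERNELS = ["down_up", "linear", "cubic_mitchell", "lanczos", "gauss", "box"]


def sample_degradation(rng, scale=4):
    """Draw one random set of degradation settings for a single HR image."""
    kernel = rng.choice(SCALE_KERNELS)
    params = {
        "scale": scale,
        "kernel": kernel,
        # Assumption: "slight" blur is taken as sigma in [0, 1].
        "blur_sigma": rng.uniform(0.0, 1.0),
        # JPEG quality range taken from the list above (inclusive).
        "jpeg_quality": rng.randint(75, 100),
    }
    if kernel == "down_up":
        # Assumption: the 0.75 to 1.5 range is the intermediate resize factor.
        params["down_up_factor"] = rng.uniform(0.75, 1.5)
    return params


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run can be reproduced
    for _ in range(3):
        print(sample_degradation(rng))
```

Passing `scale=2` gives settings for an x2 set. Applying the sampled settings to the images is left to whatever image library is used.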

The same approach was used for the LRx2 folder.

The corresponding zip files are in the Assets below. Since GitHub's file size limit is 2GB, HR_medium was split into 2 files.

Example of HR images from the dataset:

Example of bad images removed from the original iNaturalist 2017 dataset:

The small HR folder:
