Nature Dataset

@Phhofm Phhofm released this 13 Aug 09:15
· 20 commits to main since this release
196f42d

This is a curated version of the iNaturalist 2017 dataset for training single-image super-resolution (SISR) models. The original dataset consists of 675'170 images and is 200GB in size.

There is a small version consisting of 3000 images of 512x512px, which can be used to train lightweight networks such as Compact or SPAN. The average hyperiqa score of hr_small (3000 images) is 0.767434819261233.

There is also a medium version consisting of 7000 images of 512x512px, which can be used to train medium or heavy networks such as RealPLKSR or RGT/DAT/ATD. The average hyperiqa score of hr_medium (7000 images) is 0.754073106459209.

The HR folder, the LRx2 and LRx4 folders, and a validation folder are provided in the Assets as zip files.

I will list the changes I applied (or simply what I did) below:

For the HR folder, I

  • moved all images into the same folder
  • removed all files that were smaller than 300kB -> 240'833 images left (from 675'170)
  • tiled to 512x512
  • scored all tiles with hyperiqa and removed those that scored below 0.7 -> 32'499 images left, 18GB in size
  • checked all images for visual similarity and removed duplicates
  • removed a lot of human hand photos (too many human hands)
  • made a small version with 3k images that can be used for training lightweight SISR networks
  • made a medium version with 7k images that can be used for training medium/heavy SISR networks
  • normalized filenames
  • oxipng -o 4 --strip safe --alpha *.png
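
The first filtering steps above can be sketched in Python as follows. This is only a minimal illustration, not the scripts actually used for this release: the function names are hypothetical, "300kB" is read as 300,000 bytes, tiles are assumed to be non-overlapping with partial edge tiles dropped, and the quality scores are assumed to have been computed beforehand by a separate hyperiqa run.

```python
from pathlib import Path

MIN_BYTES = 300 * 1000  # "300kB" read as 300,000 bytes (assumption; could be 300 * 1024)
TILE = 512              # tile edge length in pixels
MIN_SCORE = 0.7         # score threshold from the list above


def large_enough(paths, min_bytes=MIN_BYTES):
    """Keep only files whose size on disk is at least min_bytes."""
    return [p for p in paths if Path(p).stat().st_size >= min_bytes]


def tile_boxes(width, height, tile=TILE):
    """Return (left, upper, right, lower) boxes of non-overlapping tiles.

    Assumption: partial tiles at the right and bottom edges are dropped.
    """
    return [
        (x, y, x + tile, y + tile)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


def keep_by_score(scores, min_score=MIN_SCORE):
    """scores maps tile name -> quality score; drop tiles scoring below min_score."""
    return {name: score for name, score in scores.items() if score >= min_score}
```

Under these assumptions, a 1100x600 image yields two tiles and an image smaller than 512px on either side yields none. The boxes use the (left, upper, right, lower) convention, so they could be passed to an image library's crop function.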

For the LRx4 folder, I took the HR folder and applied

  • scaling with a randomized choice of down_up (range 0.75 to 1.5), linear, cubic_mitchell, lanczos, gauss and box
  • slight randomized gaussian blurring
  • randomized JPEG compression with quality between 75 and 100
  • oxipng -o 4 --strip safe --alpha *.png
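
To make the randomization above concrete, here is a small sketch that only samples the degradation parameters for one image; it does not perform the actual image processing. The function name and dictionary keys are hypothetical, the blur sigma range is an assumption (the list above only says "slight"), and treating the 0.75 to 1.5 range as the intermediate scale factor of down_up is likewise an assumption.

```python
import random

SCALE_ALGORITHMS = ["down_up", "linear", "cubic_mitchell", "lanczos", "gauss", "box"]
DOWN_UP_RANGE = (0.75, 1.5)     # range given in the list above
JPEG_QUALITY_RANGE = (75, 100)  # quality range given in the list above
BLUR_SIGMA_RANGE = (0.0, 0.5)   # assumption: the source only says "slight"


def sample_degradation(rng):
    """Draw one random set of degradation parameters for a single HR image."""
    algorithm = rng.choice(SCALE_ALGORITHMS)
    params = {
        "scale_algorithm": algorithm,
        "blur_sigma": rng.uniform(*BLUR_SIGMA_RANGE),
        "jpeg_quality": rng.randint(*JPEG_QUALITY_RANGE),  # inclusive on both ends
    }
    if algorithm == "down_up":
        # Assumed meaning: intermediate resize factor before the final downscale.
        params["down_up_factor"] = rng.uniform(*DOWN_UP_RANGE)
    return params


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the draws are reproducible
    for _ in range(3):
        print(sample_degradation(rng))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which helps when the same degradations need to be regenerated for both the LRx2 and LRx4 folders.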

The same approach was used for the LRx2 folder.

The corresponding zip files are in the Assets below. Since GitHub's file size limit is 2GB, HR_medium was split into 2 files.
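
One possible way to plan such a split is to group files greedily so that each group stays under the limit. The sketch below is an illustration under assumptions, not a description of how the HR_medium archives were actually produced: the function name is hypothetical, 2GB is read as 2 * 1000^3 bytes, and the uncompressed file sizes are used as a stand-in for the final zip size.

```python
LIMIT_BYTES = 2 * 1000**3  # "2GB" read as 2,000,000,000 bytes (assumption)


def split_into_parts(files, limit=LIMIT_BYTES):
    """Greedily group (name, size_in_bytes) pairs so each group totals <= limit.

    Files are kept in their given order; a new part starts whenever adding the
    next file would exceed the limit.
    """
    parts, current, total = [], [], 0
    for name, size in files:
        if size > limit:
            raise ValueError(f"{name} alone exceeds the limit")
        if current and total + size > limit:
            parts.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        parts.append(current)
    return parts
```

Each resulting group could then be zipped separately. Because PNG files barely compress further inside a zip, the uncompressed total is a reasonable estimate, but leaving some headroom below the limit is safer.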

Example of HR images from the dataset:
Example1
Example2

Example of bad images removed from the original iNaturalist 2017 dataset:
Example_bad

The small HR folder:
smallversion