title |
---|
jpg241 - Author & serve a single progressive JPEG image which can be served as two different qualities - JPG "Two for One" |
Author & serve a single progressive JPEG image which can be served as two different qualities -- "Two for One" 😁
Tracey Jaquith, from Internet Archive, runs their Imagery API. She was looking for a way to encode two downloadable qualities into a single file...
We transform images to a 'steep progressive' JPEG.
Let's say we are sending out 180px width thumbnails.
This technique encodes a JPEG so the first ~6700 bytes (on average) have a 'displayed at 180px wide' reasonable looking image for most cases. Total filesize expected to be ~72kb file on average.
This means the low quality portion of the image, only ~10% on average of the full image file, can be used for low-bandwidth situations where a user wants less bandwidth used.
Thus, you can encode a single file, and simply serve it two different ways.
For the preview image, you can just serve out the start of the file, cutting it off at the 3rd instance of hex pair of bytes FFDA
😁
(This will truncate the file after only sending out the first 2 "scans".)
https://jpg241.dev.archive.org
wedding.jpg | bytes | url |
---|---|---|
preview image | 10,165 | wedding.jpg/preview |
full image | 111,231 | wedding.jpg |
https://github.com/internetarchive/jpg241
We use this scans profile to get the lower-quality image into the first ~10% of the total image:
Example encoding:
# convert to input for `cjpeg` -- you can add args like: `--resize 180x` or `--resize 25%`
convert INPUT INPUT.ppm
# create JPG with wanted scans profile
cjpeg -scans scans.txt INPUT.ppm > encoded.jpg
(see the Dockerfile for more info on the packages for convert
and cjpeg
)
The encoded JPEG file will rely primarily on the first and last of 6 scans for the main imagery data.
If we are sending the full file, but cutoff at the start of the 3rd scan (keeping first two), that's averaging about 6700 bytes (of ~72kb full file) in testing a few hundred random items. We keep one more scan than we think we might need because:
- scans [2..5] are small (deliberately)
- https://archive.org/details/morebooks was missing last few pixel rows w/ just the 1st scan.
Clients / browsers / OS can handle the truncated image and will simply 'paint what it can'.
Progressive JPEG - https://cloudinary.com/blog/progressive_jpegs_and_green_martians