Skip to content
This repository has been archived by the owner on Nov 25, 2024. It is now read-only.

mediaapi: Add thumbnail support #132

Merged
merged 19 commits into from
Jun 6, 2017
Merged

mediaapi: Add thumbnail support #132

merged 19 commits into from
Jun 6, 2017

Conversation

superdump
Copy link
Contributor

No description provided.

superdump added 3 commits June 3, 2017 10:33
bimg is MIT licensed. It depends on the C library libvips which is LGPL
v2.1+ licensed. libvips must be installed separately.
@superdump
Copy link
Contributor Author

From: https://github.com/h2non/bimg#prerequisites

Note: libvips v8.3+ is required for GIF, PDF and SVG support.

Travis is configured to use trusty but xenial wouldn't deliver a new enough libvips for GIF support either:
screen shot 2017-06-03 at 11 43 55

The libvips wiki notes the ppa I've added to get a newer libvips on trusty: http://www.vips.ecs.soton.ac.uk/index.php?title=Supported#Unix

@superdump
Copy link
Contributor Author

@kegsay previously expressed a preference for pure go libraries instead of depending on libraries that have to be installed externally and bridged into go. However, the performance of the available golang image processing libraries (https://github.com/nfnt/resize) is grossly worse than something based on libvips.

The pure golang solution with nfnt/resize does not use any smart use of box filters to downscale large JPEGs (as in resolutions used in all digital cameras today) to within some percentage of the desired output resolution, and then use some traditional interpolation algorithm like bicubic or lanczos3. nfnt/resize can only use interpolation directly which is much much slower. Also, libvips has a smart multi-threaded design which splits the image up and processes those parts in parallel.

In summary, I think trying to do any media processing in pure golang instead of using mature and appropriately-optimised libraries written in other languages like C and using ASM optimisations is currently not a good choice. Perhaps the situation will change in the future but I strongly vote for bimg + libvips today and accept the cost of additional installation complexity. (I think we should ship a docker container anyway and then it's basically just pulling down binary data to execute. 😄)

@kegsay
Copy link
Member

kegsay commented Jun 3, 2017

libvips has a smart multi-threaded design which splits the image up and processes those parts in parallel.

Looks like nfnt/resize already does this based on the number of max procs (defaults to # cores) https://github.com/nfnt/resize/blob/master/resize.go#L612 - so it's a shame that there no obvious room for improvement for the golang version.

@ara4n
Copy link
Member

ara4n commented Jun 5, 2017

i don't see the point in jumping through hoops to avoid native deps - surely we just use the best tool for the job?

@kegsay
Copy link
Member

kegsay commented Jun 5, 2017

I think that it's a bit of a double standard to say "use the best tool for the job" on the one hand, but strive to eradicate Kafka/Zookeeper, to the point of emphasising this on all the Dendrite literature, on the other hand.

@NegativeMjark
Copy link
Contributor

I'm really not happy with adding a system dependency, especially one that requires adding a random ppa.
I'd much rather use a native golang library for resizing. Even though the performance is lower.

We can always ship a libvips version of the media repository in a separate go project if we need the additional performance.

@NegativeMjark
Copy link
Contributor

NegativeMjark commented Jun 5, 2017

Alternatively go supports tags that you could use to make libvips conditional.
So add something like

// +build libvips

to the stop of the libvips specific source files.

Logger: util.GetLogger(req.Context()).WithFields(log.Fields{
"Origin": origin,
"MediaID": mediaID,
}),
}

if r.IsThumbnailRequest {
Copy link
Contributor

@NegativeMjark NegativeMjark Jun 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight bias towards using separate functions for downloading and thumbnailing instead of using a condition to wedge both into a single function. Sharing the common code is nice but usually the same effect can be better handled by lifting the common code into a separate function.

However the reason I dislike having the code for the two cases wedged together is that it makes it harder to follow what is going on in a single case. That doesn't seem to be too bad in this case since it's a single bool condition and is always tested in the same way.

So while this is usually something I'd object to, I think it is acceptable here for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can refactor. It's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I changed my mind. I think this is better. 😄

if err != nil {
r.Logger.WithError(err).Warn("Error generating thumbnails")
}
}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always generate the thumbnails in the background?
What happens if this is a request for thumbnails? Will this request race with the background thumbnail generation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thumbnails being pre-generated according to the configuration of the server are generated in the background. There is a race in time, but it's done safely using the activeThumbnailGeneration synchronised map of thumbnail file name to a sync.Cond and the result of the thumbnail attempt, in a very similar way to the download of remote files.

For example (and I have tested this and specifically made sure that this works), say that we are running our server configured to pre-generate a bunch of thumbnail formats including 800x600 using the scale method. If a thumbnail request is received for an 800x600 scaled version of some image that is hosted on a remote server and we don't yet have that file, the server downloads the file from the remote server and then fires off a separate go routine (the code commented on here) to do the pre-generated thumbnailing. Each thumbnail configuration in turn will lock activeThumbnailGeneration, check if there's an active thumbnail generation job ongoing for that thumbnail, etc and if not, add a sync.Cond to the map and unlock. If there is an active thumbnail generation job ongoing, it waits on the sync.Cond. So it doesn't matter if the pre-generation job gets to generating that thumbnail first, or if the code that continues after this explicitly and synchronously requests to generate that specific thumbnail. First come, first served and the rest will wait or just find the file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if dynamic thumbnail generation is turned off?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's basically all covered here - pay attention to the control flow and hopefully it'll be clear:

if dynamicThumbnails {
thumbnail, resErr = r.generateThumbnail(filePath, r.ThumbnailSize, activeThumbnailGeneration, db)
if resErr != nil {
return nil, nil, resErr
}
} else {
thumbnails, err := db.GetThumbnails(r.MediaMetadata.MediaID, r.MediaMetadata.Origin)
if err != nil {
r.Logger.WithError(err).WithFields(log.Fields{
"Width": r.ThumbnailSize.Width,
"Height": r.ThumbnailSize.Height,
"ResizeMethod": r.ThumbnailSize.ResizeMethod,
}).Error("Error looking up thumbnails")
resErr := jsonerror.InternalServerError()
return nil, nil, &resErr
}
// If we get a thumbnailSize, a pre-generated thumbnail would be best but it is not yet generated.
// If we get a thumbnail, we're done.
var thumbnailSize *types.ThumbnailSize
thumbnail, thumbnailSize = thumbnailer.SelectThumbnail(r.ThumbnailSize, thumbnails, thumbnailSizes)
if thumbnailSize != nil {
r.Logger.WithFields(log.Fields{
"Width": thumbnailSize.Width,
"Height": thumbnailSize.Height,
"ResizeMethod": thumbnailSize.ResizeMethod,
}).Info("Pre-generating thumbnail for immediate response.")
thumbnail, resErr = r.generateThumbnail(filePath, *thumbnailSize, activeThumbnailGeneration, db)
if resErr != nil {
return nil, nil, resErr
}
}
}
if thumbnail == nil {
return nil, nil, nil
}

return 0
}

func compareThumbnailMetrics(a thumbnailMetrics, b thumbnailMetrics, desiredCrop bool) int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe write this as:

// betterThan checks whether the thumbnail metrics are a better fit than the other metrics
func (a thumbnailMetrics) betterThan(b thumbnailMetrics, desiredCrop bool) bool {
...
}

I don't think we are distinguishing between `1` and `0` at the callsites so returning a bool might be neater.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

func newThumbnailMetrics() thumbnailMetrics {
return thumbnailMetrics{
isSmaller: 1,
aspect: float64(16384 * 16384),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use https://golang.org/pkg/math/#Inf for the worst value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually looking at MaxFloat64 and previously I was doing some b.aspect - a.aspect and was concerned that the precision with such a large number would be such that the subtraction would be inconsequential.

As such, I took something more reasonably within the realm of an acceptable max image resolution. However, now I am comparing the numbers directly I don't have the same precision problem.

I knew this would provoke comment but I forgot to comment it to explain. How do you want me to proceed - comment, Inf() or MaxFloat64?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Inf() makes the most sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

logger.WithFields(log.Fields{
"Width": config.Width,
"Height": config.Height,
"ResizeMethod": config.ResizeMethod,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe do the WithFields once and cache the result in a local variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider that but had no strong argument either way - I'll do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Both in a few places in download.go and in thumbnailer.go.

}

// init with worst values
func newThumbnailMetrics() thumbnailMetrics {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a different name than metrics here? I'm going to confuse this with prometheus metrics otherwise.

Maybe thumbnailFitness ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@NegativeMjark NegativeMjark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

This looks awesome 🥇

@superdump
Copy link
Contributor Author

superdump commented Jun 6, 2017

bimg:
thumbnail-96x96-crop

nfnt:
thumbnail-96x96-crop

nfnt is slightly sharper but uses more cpu and takes longer (~845ms for the matrix.org config)
bimg is less sharp, uses less cpu and takes less time (~295ms for the matrix.org config)

The sharpness and performance is partially because nfnt uses Lanczos3 the whole way where bimg/libvips uses a box filter to shrink most of the way for large scale factors and then Lanczos3 for the last bit.

@superdump
Copy link
Contributor Author

I added a comment and a couple of small sample images to the README. Thanks for reviewing and glad you like it. 😄

@superdump superdump merged commit 2d202ce into master Jun 6, 2017
@superdump superdump deleted the rob/media-thumbnail branch June 6, 2017 23:12
brianathere added a commit to HereNotThere/dendrite that referenced this pull request Sep 24, 2022
* Verify that the user ID for registration matches the spec, and the auth data (#10)

* Blacklist some sytest tests that are failing in our environment

* Commenting out test that isn't reliably passing or failing, probably a race

* refresh latest dendrite main

* pull latest from dendrite-fork subtree

* refresh latest dendrite main

* pull dendrite subtree and resolve merge conflicts

* check that userID matches the signed message

* verify that the user ID for registration is CAIP-10 compliant and MXID compliant

* removed space

Co-authored-by: Brian Meek <brian@hntlabs.com>
Co-authored-by: Tak Wai Wong <takwaiw@gmail.com>

* Fix nats.go commit (matrix-org#2540)

Signed-off-by: Jean Lucas <jean@4ray.co>

* Don't return `end` if there are not more messages (matrix-org#2542)

* Be more spec compliant

* Move lazyLoadMembers to own method

* Return an error if trying to invite a malformed user ID (matrix-org#2543)

* Add `evacuateUser` endpoint, use it when deactivating accounts (matrix-org#2545)

* Add `evacuateUser` endpoint, use it when deactivating accounts

* Populate the API

* Clean up user devices when deactivating

* Include invites, delete pushers

* Silence presence logs (matrix-org#2547)

* Takwaiw/fix concurrent registration bug (#12)

* fix concurrent registration bug. Rename decentralizedid

* remove unused module

* add regressed test to blacklist

Co-authored-by: Tak Wai Wong <takwaiw@gmail.com>

Co-authored-by: Brian Meek <brian@hntlabs.com>
Co-authored-by: Tak Wai Wong <takwaiw@gmail.com>
Co-authored-by: Jean Lucas <jean@4ray.co>
Co-authored-by: Till <2353100+S7evinK@users.noreply.github.com>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
tak-hntlabs added a commit to HereNotThere/dendrite that referenced this pull request Oct 25, 2022
* Verify that the user ID for registration matches the spec, and the auth data (#10)

* Blacklist some sytest tests that are failing in our environment

* Commenting out test that isn't reliably passing or failing, probably a race

* refresh latest dendrite main

* pull latest from dendrite-fork subtree

* refresh latest dendrite main

* pull dendrite subtree and resolve merge conflicts

* check that userID matches the signed message

* verify that the user ID for registration is CAIP-10 compliant and MXID compliant

* removed space

Co-authored-by: Brian Meek <brian@hntlabs.com>
Co-authored-by: Tak Wai Wong <takwaiw@gmail.com>

* Fix nats.go commit (matrix-org#2540)

Signed-off-by: Jean Lucas <jean@4ray.co>

* Don't return `end` if there are not more messages (matrix-org#2542)

* Be more spec compliant

* Move lazyLoadMembers to own method

* Return an error if trying to invite a malformed user ID (matrix-org#2543)

* Add `evacuateUser` endpoint, use it when deactivating accounts (matrix-org#2545)

* Add `evacuateUser` endpoint, use it when deactivating accounts

* Populate the API

* Clean up user devices when deactivating

* Include invites, delete pushers

* Silence presence logs (matrix-org#2547)

* Takwaiw/fix concurrent registration bug (#12)

* fix concurrent registration bug. Rename decentralizedid

* remove unused module

* add regressed test to blacklist

Co-authored-by: Tak Wai Wong <takwaiw@gmail.com>

Co-authored-by: Brian Meek <brian@hntlabs.com>
Co-authored-by: Tak Wai Wong <takwaiw@gmail.com>
Co-authored-by: Jean Lucas <jean@4ray.co>
Co-authored-by: Till <2353100+S7evinK@users.noreply.github.com>
Co-authored-by: Neil Alexander <neilalexander@users.noreply.github.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants