Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance #40

Merged
merged 3 commits into from
Jul 25, 2016
Merged

Conversation

janko
Copy link
Contributor

@janko janko commented Jul 23, 2016

First of all, thank you for maintaining a great and very useful library.

I've integrated MimeMagic in the Shrine library for handling file uploads, and wanted to do a couple of performance improvements. I want to say upfront, this isn't only for reducing string allocations, it will actually have a significant impact in my context. In Shrine, in addition to passing local files, you can pass remote files represented as IOs, which downloads only how much is needed for #read to return the requested bytes.

Before sending a PR to update MimeMagic to support this IO-like object (which don't respond to #seek), I wanted to first make performance improvements ensuring everything runs the fastest. This PR consists of three distinct changes:

  1. Small refactoring which removes the duplication between detecting from IO or string, by wrapping the string into a StringIO.
  2. IO#read accepts a "buffer" string as a second argument, which will make that instead of #read returning a newly allocated string, it will always modify the "buffer" string, thus not making any new allocations. I tested the on a JPEG file, and it saved 400+ string allocations, which were 44KB in total.
  3. The MIME type is detected from the generated table from top to bottom. However, the way this table is generated, the algorithm will first check a lot of types which are very uncommon (thus very unlikely to be the MIME type of the input file), before it comes to more common types. I think it is much more efficient to put common MIME types in the beginning of the table, because then most common file types will be detected faster, resulting into an overall faster algorithm. I tested with a JPEG and an EPS image; before the change JPEG detection was 100x slower than EPS (because JPEG was much further down the list), but now after the change JPEG detection is 10x faster then EPS.

It would be great if you could review this, so that we can get it merged, and I can proceed with the second PR which avoids reading the whole content (in my case this means downloading the whole file) for IOs which don't respond to #seek.

janko added 3 commits July 23, 2016 14:03
Previously we had to separate branches which did the same thing, one was
for IOs and other was for strings. However, if we wrap the string into a
StringIO, we can use the same branch for both.
If you speed up detection of common MIME types, you speed up MIME type
detection in general.
@janko
Copy link
Contributor Author

janko commented Jul 24, 2016

Just wanted to clarify more the motivation for change no. 3. Some MIME types require more bytes to be checked, and some less. If the MIME type requires less bytes, in my case that means downloading less bytes.

Previously, even if the input IO is an image, cca 18725 bytes would still have to be downloaded, because that's how much audio/vnd.dts.hd requires, which is checked before any image types.

With this change of putting image MIME types first, we need to download only a couple of bytes because that's how much they require, and on my machine the difference was 0.2 seconds between downloading 18725 bytes and downloading 10 bytes.

@minad
Copy link
Collaborator

minad commented Jul 25, 2016

Hey @janko-m!

Seems like a great PR. It is a good idea to switch everything to IO.

I can also add you as a collaborator if you are interested?

Daniel

@minad minad merged commit 7cd73b3 into mimemagicrb:master Jul 25, 2016
@janko janko deleted the performance-improvements branch July 25, 2016 17:57
@janko
Copy link
Contributor Author

janko commented Jul 25, 2016

@minad Cool, I would like that!

@mimemagicrb mimemagicrb locked as resolved and limited conversation to collaborators Mar 26, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants