Remove Jpeg Huffman Decoder Bottleneck #643

Merged
JimBobSquarePants merged 36 commits into master from js/new-jpeg-scan-decoder
Jul 3, 2018

@JimBobSquarePants JimBobSquarePants commented Jul 1, 2018

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practices as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

Replaces the slow PdfJsScanDecoder implementation with a much faster one based on stb_image, so Huffman scan decoding is no longer a bottleneck. Fixes #601

We're slowly creeping closer to System.Drawing for JPEG decoding now.

New Benchmarks

BenchmarkDotNet=v0.10.14, OS=Windows 10.0.17134
Intel Core i7-6600U CPU 2.60GHz (Skylake), 1 CPU, 4 logical and 2 physical cores
Frequency=2742191 Hz, Resolution=364.6719 ns, Timer=TSC
.NET Core SDK=2.1.300
  [Host]     : .NET Core 2.0.7 (CoreCLR 4.6.26328.01, CoreFX 4.6.26403.03), 64bit RyuJIT
  Job-CBPIPJ : .NET Framework 4.7.1 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3110.0
  Job-FVNPWQ : .NET Core 2.0.7 (CoreCLR 4.6.26328.01, CoreFX 4.6.26403.03), 64bit RyuJIT

LaunchCount=1  TargetCount=3  WarmupCount=3

                           Method | Runtime |                    TestImage |      Mean |     Error |    StdDev | Scaled | ScaledSD |    Gen 0 | Allocated |
--------------------------------- |-------- |----------------------------- |----------:|----------:|----------:|-------:|---------:|---------:|----------:|
   'Decode Jpeg - System.Drawing' |     Clr |  Jpg/baseline/Calliphora.jpg |  6.786 ms |  2.364 ms | 0.1336 ms |   1.00 |     0.00 | 117.1875 | 254.47 KB |
       'Decode Jpeg - ImageSharp' |     Clr |  Jpg/baseline/Calliphora.jpg | 33.662 ms |  4.877 ms | 0.2755 ms |   4.96 |     0.09 |        - |  52.63 KB |
 'Decode Jpeg - ImageSharp PdfJs' |     Clr |  Jpg/baseline/Calliphora.jpg | 24.767 ms |  3.341 ms | 0.1888 ms |   3.65 |     0.06 |        - |  25.25 KB |
                                  |         |                              |           |           |           |        |          |          |           |
   'Decode Jpeg - System.Drawing' |    Core |  Jpg/baseline/Calliphora.jpg |  7.079 ms | 11.763 ms | 0.6646 ms |   1.00 |     0.00 | 117.1875 | 254.11 KB |
       'Decode Jpeg - ImageSharp' |    Core |  Jpg/baseline/Calliphora.jpg | 34.337 ms | 12.214 ms | 0.6901 ms |   4.88 |     0.36 |        - |  47.79 KB |
 'Decode Jpeg - ImageSharp PdfJs' |    Core |  Jpg/baseline/Calliphora.jpg | 26.188 ms |  1.067 ms | 0.0603 ms |   3.72 |     0.27 |        - |  21.71 KB |
                                  |         |                              |           |           |           |        |          |          |           |
   'Decode Jpeg - System.Drawing' |     Clr | Jpg/baseline/jpeg420exif.jpg | 20.881 ms | 21.458 ms | 1.2124 ms |   1.00 |     0.00 | 343.7500 | 757.89 KB |
       'Decode Jpeg - ImageSharp' |     Clr | Jpg/baseline/jpeg420exif.jpg | 80.619 ms |  6.260 ms | 0.3537 ms |   3.87 |     0.18 | 250.0000 | 564.28 KB |
 'Decode Jpeg - ImageSharp PdfJs' |     Clr | Jpg/baseline/jpeg420exif.jpg | 56.485 ms |  7.069 ms | 0.3994 ms |   2.71 |     0.13 | 250.0000 | 535.07 KB |
                                  |         |                              |           |           |           |        |          |          |           |
   'Decode Jpeg - System.Drawing' |    Core | Jpg/baseline/jpeg420exif.jpg | 17.653 ms |  4.603 ms | 0.2601 ms |   1.00 |     0.00 | 343.7500 | 757.04 KB |
       'Decode Jpeg - ImageSharp' |    Core | Jpg/baseline/jpeg420exif.jpg | 80.280 ms | 10.145 ms | 0.5732 ms |   4.55 |     0.06 | 250.0000 | 548.49 KB |
 'Decode Jpeg - ImageSharp PdfJs' |    Core | Jpg/baseline/jpeg420exif.jpg | 57.506 ms |  6.344 ms | 0.3584 ms |   3.26 |     0.04 | 250.0000 | 522.19 KB |
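For reading the table above: the Scaled column appears to be each method's mean divided by the mean of the System.Drawing baseline in the same group. A minimal sketch that recomputes it from the Clr Calliphora rows follows; the class and method names are hypothetical and are not part of BenchmarkDotNet or ImageSharp.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class ScaledColumnSketch
{
    // Ratio of a method's mean time to the baseline's mean time,
    // rounded to two decimals like the Scaled column.
    public static double Scaled(double meanMs, double baselineMeanMs)
        => Math.Round(meanMs / baselineMeanMs, 2);
}
```

With the Clr Calliphora means, `Scaled(33.662, 6.786)` and `Scaled(24.767, 6.786)` reproduce the 4.96 and 3.65 shown in the table.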

Before Performance Trace
[Image: performance tracing details before the change]

After Performance Trace
[Image: performance tracing details after the change]

I'm genuinely amazed I managed to pull this off!

codecov bot commented Jul 1, 2018

Codecov Report

Merging #643 into master will increase coverage by 0.14%.
The diff coverage is 91.63%.

@@            Coverage Diff             @@
##           master     #643      +/-   ##
==========================================
+ Coverage   89.16%   89.31%   +0.14%     
==========================================
  Files         890      893       +3     
  Lines       37946    38020      +74     
  Branches     2661     2667       +6     
==========================================
+ Hits        33835    33956     +121     
+ Misses       3302     3267      -35     
+ Partials      809      797      -12
                                        Impacted Files |            Coverage Δ |
------------------------------------------------------ |---------------------- |
                  tests/ImageSharp.Tests/TestImages.cs |      100% <ø> (ø) ⬆️ |
 ...ts/Jpeg/PdfJsPort/Components/PdfJsHuffmanTables.cs |      100% <ø> (ø) ⬆️ |
        src/ImageSharp/Formats/Jpeg/JpegThrowHelper.cs |          0% <0%> (ø) |
 ...ts/Jpeg/PdfJsPort/Components/FixedInt32Buffer18.cs |      100% <100%> (ø) |
 ...arp/Formats/Jpeg/PdfJsPort/PdfJsJpegDecoderCore.cs | 83.01% <100%> (-1.25%) ⬇️ |
 ...s/Jpeg/PdfJsPort/Components/FixedUInt32Buffer18.cs |      100% <100%> (ø) |
 ...s/Jpeg/PdfJsPort/Components/FixedInt16Buffer257.cs |      100% <100%> (ø) |
 ...ts/Jpeg/PdfJsPort/Components/FixedByteBuffer512.cs |      100% <100%> (ø) |
 ...ats/Jpeg/PdfJsPort/Components/PdfJsHuffmanTable.cs |   100% <100%> (ø) ⬆️ |
 ...s/Jpeg/PdfJsPort/Components/PdfJsFrameComponent.cs | 95.74% <100%> (-2.04%) ⬇️ |
... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d7bd82b...2277896.

@antonfirsov antonfirsov left a comment

Good job, but I think the code is worth another refactor/cleanup round before we move on.

It might also be worth investigating whether we can reduce the code duplication.

@@ -9,14 +9,14 @@ namespace SixLabors.ImageSharp.Formats.Jpeg.PdfJsPort.Components
[StructLayout(LayoutKind.Sequential)]
internal unsafe struct FixedInt64Buffer18
Member

The type name should be renamed accordingly.

Member Author

Good catch!

/// <summary>
/// The collection of lookup tables used for fast AC entropy scan decoding.
/// </summary>
internal sealed class FastACTables : IDisposable
Member

I wonder if we really need this class in its current form. Either we should hide the Table property and encapsulate the actual logic (initialization, retrieval of rows) inside the class, or we should drop it and define a helper method that creates a Buffer2D<short>.

Member Author

@JimBobSquarePants JimBobSquarePants Jul 2, 2018

I like having a class, but I think you're right about the Table property. I'll hide that away and introduce a GetTableSpan() method.
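A minimal sketch of what hiding the table behind a GetTableSpan() method could look like. This is not the class from this pull request: the class name, the table count (4) and the row length (1 << 9, following stb_image's FAST_BITS) are assumptions, and it uses a plain managed array rather than a disposable pooled buffer.

```csharp
using System;

// Hypothetical sketch for illustration; not the FastACTables class in this PR.
public sealed class FastAcTablesSketch
{
    // Assumed sizes, following stb_image: 4 tables of (1 << 9) entries each.
    private const int TableCount = 4;
    private const int FastBits = 9;
    private const int TableLength = 1 << FastBits;

    // The backing storage is private, so callers cannot reach into it directly.
    private readonly short[] tables = new short[TableCount * TableLength];

    // Returns the row for one table as a span over the shared backing array.
    public Span<short> GetTableSpan(int index)
    {
        if ((uint)index >= TableCount)
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }

        return new Span<short>(this.tables, index * TableLength, TableLength);
    }
}
```

Callers then only ever see one row at a time, for example `Span<short> row = tables.GetTableSpan(1);`, so the storage layout can change without touching the decoder.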

private byte marker;
private bool badMarker;
private long markerPosition;
private int todo;
Member

It would be nice to have better names, or some comments explaining the semantics of the less trivial variables.

Member Author

Will do!

PdfJsFrameComponent component = this.components[k];
ref short blockDataRef = ref MemoryMarshal.GetReference(MemoryMarshal.Cast<Block8x8, short>(component.SpectralBlocks.Span));
ref PdfJsHuffmanTable dcHuffmanTable = ref dcHuffmanTables[component.DCHuffmanTableId];
ref PdfJsHuffmanTable acHuffmanTable = ref acHuffmanTables[component.ACHuffmanTableId];
Member

acHuffmanTable and fastAC are unused

Member Author

Sometimes I miss ReSharper.

Member Author

Removed the unused variables. The change is on a different line, though, so GitHub won't pick it up here.

}
}

private void DecodeBlockProgressiveAC(
Member

The method is quite long; it might be worth refactoring the internal branches into separate methods.

Member Author

I'll split out the Refinement part into a separate method.

private int PeekBits() => (int)((this.codeBuffer >> (32 - FastBits)) & ((1 << FastBits) - 1));

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private uint LRot(uint x, int y) => (x << y) | (x >> (32 - y));
Member

The method can be static

Member Author

Good catch!
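A minimal sketch of the suggestion above: LRot touches no instance state, so it can be declared static. PeekBits is shown the same way, taking the code buffer as a parameter. The class name, the uint type of the code buffer and the value of FastBits (9, following stb_image's FAST_BITS) are assumptions, not taken from this pull request.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical sketch for illustration; not the decoder class in this PR.
public static class BitHelpersSketch
{
    // Assumed value, following stb_image's FAST_BITS.
    private const int FastBits = 9;

    // Rotates x left by y bits. C# masks 32-bit shift counts to 5 bits,
    // so y = 0 simply returns x.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static uint LRot(uint x, int y) => (x << y) | (x >> (32 - y));

    // Reads the top FastBits bits of the code buffer without consuming them.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int PeekBits(uint codeBuffer)
        => (int)((codeBuffer >> (32 - FastBits)) & ((1 << FastBits) - 1));
}
```

For example, `LRot(0x80000001u, 1)` wraps the top bit around to give 3, and `PeekBits(0xFF800000u)` yields 511, the nine leading one bits.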

/// Builds a lookup table for fast AC entropy scan decoding.
/// </summary>
/// <param name="index">The table index.</param>
private void BuildFastACTable(int index)
Member

This should probably be a member of the FastACTables class.

@@ -9,14 +9,14 @@ namespace SixLabors.ImageSharp.Formats.Jpeg.PdfJsPort.Components
[StructLayout(LayoutKind.Sequential)]
internal unsafe struct FixedInt16Buffer18
@dlemstra dlemstra Jul 2, 2018

Should we rename the struct, since it no longer holds short/Int16 values?

@antonfirsov antonfirsov left a comment

Everything looks good for a merge now!

@antonfirsov

@JimBobSquarePants I pushed some code simplifications and a benchmark to quickly test ParseStream(). There is no change in performance:

Baseline, Before:

                           Method | Runtime |                    TestImage |     Mean |     Error |    StdDev | Scaled | ScaledSD |    Gen 0 | Allocated |
--------------------------------- |-------- |----------------------------- |---------:|----------:|----------:|-------:|---------:|---------:|----------:|
            'System.Drawing FULL' |     Clr | Jpg/baseline/jpeg420exif.jpg | 18.35 ms |  8.224 ms | 0.4647 ms |   1.00 |     0.00 | 218.7500 | 757.88 KB |
 PdfJsJpegDecoderCore.ParseStream |     Clr | Jpg/baseline/jpeg420exif.jpg | 19.64 ms | 13.984 ms | 0.7901 ms |   1.07 |     0.04 |        - |     15 KB |
                                  |         |                              |          |           |           |        |          |          |           |
            'System.Drawing FULL' |    Core | Jpg/baseline/jpeg420exif.jpg | 18.36 ms |  4.017 ms | 0.2270 ms |   1.00 |     0.00 | 218.7500 | 757.04 KB |
 PdfJsJpegDecoderCore.ParseStream |    Core | Jpg/baseline/jpeg420exif.jpg | 21.21 ms | 10.237 ms | 0.5784 ms |   1.16 |     0.03 |        - |  14.84 KB |

Baseline, After:

                           Method | Runtime |                    TestImage |     Mean |    Error |    StdDev | Scaled | ScaledSD |    Gen 0 | Allocated |
--------------------------------- |-------- |----------------------------- |---------:|---------:|----------:|-------:|---------:|---------:|----------:|
            'System.Drawing FULL' |     Clr | Jpg/baseline/jpeg420exif.jpg | 18.38 ms | 7.007 ms | 0.3959 ms |   1.00 |     0.00 | 218.7500 | 757.88 KB |
 PdfJsJpegDecoderCore.ParseStream |     Clr | Jpg/baseline/jpeg420exif.jpg | 19.71 ms | 1.502 ms | 0.0849 ms |   1.07 |     0.02 |        - |     15 KB |
                                  |         |                              |          |          |           |        |          |          |           |
            'System.Drawing FULL' |    Core | Jpg/baseline/jpeg420exif.jpg | 17.86 ms | 1.077 ms | 0.0609 ms |   1.00 |     0.00 | 218.7500 | 757.04 KB |
 PdfJsJpegDecoderCore.ParseStream |    Core | Jpg/baseline/jpeg420exif.jpg | 20.84 ms | 7.756 ms | 0.4382 ms |   1.17 |     0.02 |        - |  14.88 KB |

@JimBobSquarePants

Nice changes! That class looks very clean now and I like the inline and error helpers!

How the hell does System.Drawing decode the full stream so damn quickly? It's infuriating!

@antonfirsov

Don't ask how, but it's possible to optimize Huffman decoding with SIMD.

@JimBobSquarePants

I'm gonna merge this now to work on namespacing.

@JimBobSquarePants JimBobSquarePants merged commit 68ca7ff into master Jul 3, 2018
@JimBobSquarePants JimBobSquarePants deleted the js/new-jpeg-scan-decoder branch July 3, 2018 01:40
antonfirsov pushed a commit to antonfirsov/ImageSharp that referenced this pull request Nov 11, 2019
Successfully merging this pull request may close these issues.

Faster Jpeg Huffman Scan Decoding
3 participants