Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable & optimized memory management #475

Merged
merged 76 commits into from
Feb 27, 2018
Merged

Conversation

antonfirsov
Copy link
Member

@antonfirsov antonfirsov commented Feb 25, 2018

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

PR summary

Changes

Benchmarks

Load testing benchmarks have been implemented on the load-test branch. (The project should be moved into it's own SixLabors repository later). Link to results

TLDR

This is how (managed/GC) memory consumption changes for 10,000 resize requests of ~3200p Jpegs, new defaults VS pre-MemoryManager (old):
image
The peaks are measured around requests with big outlier images.

Throughput

The throughput visualized for different scenarios & pool settings (measured in MegaPixel/sec):
image

Memory footprint

Avarage managed memory footprint while serving ~600 resize requests (different scenarios & pool settings):
image

Conclusions

  • With the default pooling settings, ImageSharp memory usage is approximately halved. I hope these settings are now good enough for most users.
  • If your environment is memory-constrained:
    • Use ArrayPoolMemoryManager.CreateWithModeratePooling() or .CreateWithMinimalPooling()! For these two: the difference in memory consumption is not as big as the difference in throughput.
    • Do not use SimpleGCMemoryManager ("NullMemoryManager"), it only makes things worse!
  • For servers under high load ArrayPoolMemoryManager.CreateWithAggressivePooling() can increase the throughput with the cost of a higher memory footprint
  • The memory usage is always being stabilized after a warm-up period.
  • The throughput of the pre-MemoryManager ("Old") version is significantly higher. Not sure if the performance regression is caused directly by changes in memory management. We need to investigate this.

/cc @vpenades @GeorgePlotnikov @denisivan0v

rytmis and others added 30 commits January 11, 2018 19:31
- Add a null memory manager that doesn't do any actual memory management
  need it (aside from a few exceptions)
# Conflicts:
#	src/ImageSharp/Formats/Gif/GifDecoderCore.cs
#	src/ImageSharp/Formats/Png/PngEncoderCore.cs
#	src/ImageSharp/Image/ImageFrame{TPixel}.cs
#	src/ImageSharp/Memory/Buffer{T}.cs
#	src/ImageSharp/Memory/PixelDataPool{T}.cs
#	src/ImageSharp/Processing/Processors/Filters/LomographProcessor.cs
#	src/ImageSharp/Processing/Processors/Filters/PolaroidProcessor.cs
#	src/ImageSharp/Processing/Processors/Transforms/ResamplingWeightedProcessor.Weights.cs
#	src/ImageSharp/Processing/Processors/Transforms/ResizeProcessor.cs
#	src/ImageSharp/Processing/Processors/Transforms/RotateProcessor.cs
#	src/ImageSharp/Processing/Processors/Transforms/SkewProcessor.cs
#	tests/ImageSharp.Tests/Memory/PixelDataPoolTests.cs
#	tests/ImageSharp.Tests/Processing/Processors/Transforms/ResizeProfilingBenchmarks.cs
…to feature/memory-manager

# Conflicts:
#	tests/ImageSharp.Tests/Formats/Png/PngEncoderTests.cs
PngEncoder, WuQuantizer, ShapeRegion, ShapePath
@antonfirsov
Copy link
Member Author

antonfirsov commented Feb 25, 2018

Run some resizes, like with the current stress tests.
Stop using ImageSharp, and run a full GC collection, one that also tries to free any large object.
Test result: how much memory was actually freed, and how much is still locked by the array pools?

@vpenades as you wish!
On this chart you can see the effect of calling MemoryManager.ReleaseRetainedResources() + GC.Collect() after ~200 requests:
image

So, to use an alternative memory manager, there's two choices: [...]

Yes, you always need to explicitly pass a memory manager to individual operations. I suggest you to implement an ImagingContext class which owns a MemoryManager Configuraition+MemoryManager for you and encapsulates your commonly used Image factory methods & maybe some operations. As @JimBobSquarePants explained, we did not want such a thing to be part of our core API, because it complicates things, and hardens the learning curve of the library, while not needed by the 90% of our users.

@vpenades
Copy link
Contributor

vpenades commented Feb 26, 2018

@antonfirsov Upon a second review, I've realized the Configuration class essentially handles memory managers and image formats, but Image Formats are handled directly by the configuration class.

Maybe it could be cleaner to have a image formats decoupled from Configuration into it's own class ImageFormatsCollection that would keeps all the functionality related to image formats management, and would let Configuration to be greatly simplified.

That would make life easier when creating multiple configurations while keeping a shared collection of image formats. with the current configuration other than the default, you need to manually scan for all the formats in the default and copy them into your own configuration. If at some point during execution a new format is added to the default configuration, the manually created configurations will not be aware of the change.

Something like this:

class Configuration
{
    public ParallelOptions ParallelOptions {get;set;
    public MemoryManager Manager {get; set;}
    public FormatsCollection Formats {get; set;}
}

This would allow having manually created configurations with their own formats, OR sharing the formats with the default configuration.

Copy link
Member

@JimBobSquarePants JimBobSquarePants left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One hell of a job! 🎉

Only thing blocking a merge that I can see is the naming of GetReferenceToOrigo. Despite it being internal, I'd rather fix it before so we don't forget.

buffer[i + offset] = array[i].X;
}
// TODO: This is a temporal workaround because of the lack of Span<T> API-s on IPath. We should use MemoryManager.Allocate() here!
PointF[] innerBuffer = new PointF[buffer.Length];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get a PR in place against Shapes to add those API's.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do it now, or in a follow up PR pair?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do a follow up once we've done Shapes.

}
}
catch (Exception)
{
// TODO: Why are we catching exceptions here silently ???
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was probably a hangover from the old try..catch..finally pattern from when we were directly using ArrayPool

this.valOffset = Buffer<short>.CreateClean(18);
this.maxcode = Buffer<long>.CreateClean(18);
// TODO: Replace FakeBuffer<T> usages with standard or array orfixed-sized arrays
this.lookahead = memoryManager.AllocateFake<short>(256);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we leaving this TODO for when we PR Shapes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would allow us to get rid of the AllocateFake<T> workaround entirely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

this.Colors[i] = color;
}
this.Colors = source.MemoryManager.Allocate<TPixel>(source.Width);
this.Colors.Span.Fill(color);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know Span had a Fill method. Neat!

@@ -88,6 +88,14 @@ internal static Span<TPixel> GetPixelRowSpan<TPixel>(this Image<TPixel> source,
where TPixel : struct, IPixel<TPixel>
=> source.Frames.RootFrame.GetPixelRowSpan(row);

/// <summary>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When shall we make this and others public?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need Memory<T> for the memory interop stuff, for the rest we need a proper API review. Not sure if stuff like .GetMemoryManager() should be public.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'm looking forward to MS getting that released.

/// Gets a value indicating whether the area refers to the entire <see cref="DestinationBuffer"/>
/// </summary>
public bool IsFullBufferArea => this.Size == this.DestinationBuffer.Size();

/// <summary>
/// Gets or sets a value at the given index.
/// </summary>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diff wouldn't let me highlight where I wanted to

GetReferenceToOrigo I'm assuming this is a typo? GetReferenceToOrigin perhaps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just my Hunglish :D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 Haha... Far better than any language skills I've accumulated.

@@ -221,9 +224,9 @@ private void ReadRle8<TPixel>(PixelAccessor<TPixel> pixels, byte[] colors, int w
var color = default(TPixel);
var rgba = new Rgba32(0, 0, 0, 255);

using (var buffer = Buffer2D<byte>.CreateClean(width, height))
using (var buffer = this.configuration.MemoryManager.Allocate2D<byte>(width, height, true))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AllocateClean2D?

@JimBobSquarePants
Copy link
Member

Maybe it could be cleaner to have a image formats decoupled from Configurationinto it's own class ImageFormatsCollection that would keeps all the functionality related to image formats management, and would let Configuration to be greatly simplified.

@vpenades Looking at Configuration class I'm inclined to agree. A separate PR though I think.

@antonfirsov Re throughput performance? Perhaps it's as simple as the abstraction costs due to using interfaces? I can't see anything that stands out.

@antonfirsov antonfirsov mentioned this pull request Feb 26, 2018
4 tasks
@antonfirsov
Copy link
Member Author

@JimBobSquarePants a quick initial investigation regarding the regression in speed:
Looks like there are several factors behind that, scattered through the whole codebase.
The most significant is probably Span<T> indexer, and the use of IBuffer<T>.Span (accessing elements in a lazy way: buffer.Span[i]). Span is being used now on many places where we've been using locally pooled arrays earlier, it has a high impact therefore.

I've been running my benchmarks on .NET 4.6.1 runtime (was simpler to work with command line), where indexing Span<T> is significantly slower than indexing arrays. The slowdown of the ReadByteStuffedByteUnsafe() is a quite trivial example in the Jpeg decoder:
image

Like it or not, but if we want the library to be fast on the classic .NET, we have to replace indexers with Unsafe.Add(ref base, i) on hot path.

I will continue my investigation and maybe apply fixes on the weakest points.

@JimBobSquarePants
Copy link
Member

@antonfirsov I'm happy for this to be merged now. 👍

Unsafe<T> is fantastically readable really (It's such a great API!) so I have no issue replacing indexer access on hot paths with that. That can be done as small individual PR's though as we benchmark after the beta. (I'm really hoping the final push will be all about optimization with very few bug fixes/changes)

@antonfirsov
Copy link
Member Author

@JimBobSquarePants check out my latest optimization commits. Pushed stuff that would normally deserve a separate explanatory PR.

Maybe I'll add a few more of these (only for stuff that's covered!), and we can merge afterwards.

@JimBobSquarePants
Copy link
Member

WithTemporalBuffer<T> is very nice! I really should get better at reading up on the API's I use, I wasn't aware of the before/after action overloads to Parallel.For

Ok, rock on, I'll leave merging to you.

… pinning from OrigHuffmanTree

(cherry picked from commit 81f6a94)
…ngs worse!

(cherry picked from commit 0bf64a09598ba988318c170259a7aab4d9391cb5)
@antonfirsov
Copy link
Member Author

Okkay, I guess I managed to bring back speed to previous levels with just a few changes, so no one would feel there's a regression.

@JimBobSquarePants
Copy link
Member

JimBobSquarePants commented Feb 27, 2018

That's a neat trick you used to access the fixed Data field in the unfixed indexer expression in FixedInt32Buffer16 and co. I'll have to remember that one.

@@ -58,10 +58,12 @@ internal struct Bytes : IDisposable
/// <returns>The bytes created</returns>
public static Bytes Create(MemoryManager memoryManager)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to pass the parameter here now or to the InputProcessor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing it!

@antonfirsov
Copy link
Member Author

antonfirsov commented Feb 27, 2018

@JimBobSquarePants
Re Parallel.For parameters:
I also discovered them yesterday! I was googling for that feature (I was quite sure it must exist), and it was right here on the Parallel.For, API we just failed to notice it :)

Re indexable fixed structs:
It's quite an unsafe trick, so one should be very careful using it. Strictly speaking, it should be named something like GetItemUnsafe(idx) instead of simply being an indexer, but that naming is soooo verbose.

@vpenades
Copy link
Contributor

@vpenades Looking at Configuration class I'm inclined to agree. A separate PR though I think.

@JimBobSquarePants Yes, please, this PR is a bit useless if creating manual configurations is non trivial.

@antonfirsov
Copy link
Member Author

I'm merging this now, further optimizations and API changes will be added as separate PR-s.

@antonfirsov antonfirsov merged commit 47a8533 into master Feb 27, 2018
@JimBobSquarePants JimBobSquarePants deleted the feature/memory-manager branch September 3, 2019 11:12
antonfirsov added a commit to antonfirsov/ImageSharp that referenced this pull request Nov 11, 2019
Configurable & optimized memory management
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants