readNativeFrames Performance Optimizations #161
This change reduces the number of allocations in `readNativeFrames`. Instead of allocating a slice for each pixel's samples, a single flat buffer slice of size `pixelsPerFrame*samplesPerPixel` is allocated upfront, and ranges of that buffer are then referenced from the larger 2D slice. This means there are only two calls to `make`, leading to significant performance gains. On my machine running `make bench-diff`:

```
name           old time/op  new time/op  delta
Parse/1.dcm-8   2.29ms ± 1%  1.95ms ± 3%  -15.00%  (p=0.008 n=5+5)
Parse/2.dcm-8   2.27ms ± 1%  1.94ms ± 0%  -14.41%  (p=0.008 n=5+5)
Parse/3.dcm-8   2.10ms ± 0%  1.81ms ± 2%  -13.77%  (p=0.008 n=5+5)
Parse/4.dcm-8    240ms ± 1%   196ms ± 4%  -18.27%  (p=0.008 n=5+5)
Parse/5.dcm-8   33.6ms ± 1%  27.9ms ± 0%  -17.00%  (p=0.008 n=5+5)
```

We see similar results in the [GitHub action benchmark](https://github.com/suyashkumar/dicom/pull/157/checks?check_run_id=1574909456#step:5:26). Presumably the percentage gains would be even higher for DICOMs with more Native PixelData (e.g. more frames, pixels per frame, samples per pixel, etc.). This helps address #161.
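For illustration, here is a minimal sketch of the flat-buffer approach the PR describes (the identifier names are mine, not the library's): a single backing allocation is carved into per-pixel sub-slices, so only two `make` calls are needed.

```go
// allocPixelSamples allocates one flat backing buffer and exposes each pixel's
// samples as a sub-slice of it. Only two allocations happen regardless of the
// number of pixels.
func allocPixelSamples(pixelsPerFrame, samplesPerPixel int) [][]int {
	flat := make([]int, pixelsPerFrame*samplesPerPixel) // single backing buffer
	pixels := make([][]int, pixelsPerFrame)             // slice headers only; no per-pixel data allocation
	for i := range pixels {
		pixels[i] = flat[i*samplesPerPixel : (i+1)*samplesPerPixel]
	}
	return pixels
}
```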
#157 led to some significant performance improvements by reducing slice allocations.
Also, on this subject, @kaxap has observed that eliminating the per-pixel calls to `binary.Read` in favor of reusing a single read buffer helps, along the lines of:

```go
buf := make([]byte, bitsAllocated/8)
for pixel := 0; pixel < pixelsPerFrame*samplesPerPixel; pixel++ {
	if _, err := io.ReadFull(d, buf); err != nil {
		// handle the read error
	}
	val := int(binary.LittleEndian.Uint16(buf))
	_ = val // in the real code, val would be stored into the frame's data
}
```

The main advantage here is reusing the buffer slice (since bitsAllocated should be constant), instead of `binary.Read` creating one internally on every call.
It is also possible to lazily load frames from the file, which would result in a smaller memory footprint (some DICOM files can be gigabytes in size).
This is somewhat along the lines of #162, which is about allowing interested parties to not read the pixel data, or perhaps to pause reading before actually parsing the PixelData (which can be done with the Parser API, though at the tag-by-tag level rather than the frame-by-frame level). We could also read in the bytes and do nothing with unpacking them until a user wants the image.Image, which is almost what we do today.

I'm leaning towards something similar to the current behavior, plus an option to not read PixelData (or potentially a blocklist of tags). We also already have a "streaming" option where frames can be consumed by a consumer as they are parsed, instead of having to wait for the full set of frames (or the whole DICOM) to be parsed.

What do you think? Also, I would love to learn more about your use cases if you have any particular ones. Cheers!
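A rough sketch of the streaming option mentioned above, assuming a ParseFile that accepts an optional frame channel which receives frames as they are parsed; exact signatures and paths may differ across library versions, so treat this as illustrative.

```go
package main

import (
	"fmt"

	"github.com/suyashkumar/dicom"
	"github.com/suyashkumar/dicom/pkg/frame"
)

func main() {
	// Frames are delivered on this channel as they are parsed, so a consumer can
	// start working (e.g. rendering a thumbnail) before the whole file is read.
	frameChan := make(chan *frame.Frame)
	go func() {
		for fr := range frameChan {
			_ = fr // process each frame here
		}
	}()

	// "example.dcm" is a placeholder path.
	dataset, err := dicom.ParseFile("example.dcm", frameChan)
	if err != nil {
		panic(err)
	}
	fmt.Println("top-level elements parsed:", len(dataset.Elements))
}
```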
In most cases, all I want to do is get some tags and render a single frame (for a thumbnail or a preview). A lazy mode (where we save file offsets for each frame instead of reading all of them into RAM) looks very appealing in this case.
Regarding memory usage, we've recently had our server (32 GB RAM) run out of memory while attempting to parse a 1 GB zipfile containing DICOMs. Is this high memory usage (15x) expected?
Interesting, thank you for reporting! I don't think this is expected. Out of curiosity, what version of the library are you using? Does it include some of the recent optimizations discussed in this issue? I'll take a closer look tonight and dive deeper this weekend if needed.
Thanks for looking into this! I don't have much experience profiling performance, so I appreciate the help. Our prod server that ran out of memory is using v0.4.5, which obviously does not have the recent optimizations. However, my benchmark is using v1.0.3. We were hoping that upgrading the library would solve the issue.
Thanks! So, something that sticks out to me is that we are packing smaller integers into the possibly larger Go `int` type. This could easily cause large increases in memory compared to the size of a DICOM on disk--for example, an underlying set of 8-bit integers in the DICOM would be read into a slice of normal Go `int`s (64-bit on most systems). This is a situation in which generics in Go would be very helpful, so we could have a NativeFrame whose underlying data slice is parameterized on the integer type.

**Simple option?**

One temporary and somewhat simple solution could be using uint32 in this API instead of int, if I can verify that DICOMs never have 64 bits of data in them for NativeFrames (this saves 32 bits per int on 64-bit wide systems).

**More involved option**

Another option is exploring having a different NativeFrame type per size of integer. While it's not as clean as just being able to access the underlying data directly, I could envision something like the below, where we make NativeFrame an interface and not a struct, and have three structs that implement the interface, wrapping the underlying integer types.

```go
type NativeFrame interface {
	Rows() int
	Cols() int
	BitsPerSample() int
	GetPixel(x, y int) int
}

type nativeFrame8 struct {
	underlyingData []uint8
}
...
type nativeFrame16 struct {
	underlyingData []uint16
}
...
```

In the absence of generics, this will allow our memory representation to be as small as possible, while still exposing a consistent integer API for any entities that want to use the NativeFrame data.

That being said, if a caller is copying the data out of the underlying NativeFrame and grabbing it as ints, we'll run into the same problem of expanding it in memory anyway, just downstream from here (e.g. when turning this into an image later, if it goes through an intermediate int stage)...so we might need to look at a couple of our use cases to see.

Let me know if folks have any thoughts or if you can shed light on any use-case-specific details. Also, this might not explain all of the memory overhead--I will look into other possible options soon too. We should be able to use BenchmarkReadNativeFrames to see the impact of various possible changes on the memory profile, at least in that function.
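Purely as a hypothetical illustration of the generics idea mentioned above (Go did not have generics at the time, and none of these names are the library's API), a frame parameterized on the sample type might look like:

```go
// Hypothetical sketch, not the library's API: the sample type parameter lets the
// in-memory representation match bitsAllocated instead of always widening to int.
type Sample interface {
	~uint8 | ~uint16 | ~uint32
}

type genericFrame[T Sample] struct {
	rows, cols      int
	samplesPerPixel int
	data            []T // flat: rows*cols*samplesPerPixel values
}

// GetSample still returns a plain int, keeping a consistent API for callers.
func (f genericFrame[T]) GetSample(x, y, sample int) int {
	idx := (y*f.cols+x)*f.samplesPerPixel + sample
	return int(f.data[idx])
}
```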
Reading through the DICOM spec, it appears that allocating more than 32 bits is rare. If I'm understanding correctly, 64 bits are only used for Double Float Pixel Data. FWIW, I've looked at ~90 chest CT scans that have been uploaded to our web app by doctors, and they all allocate 16 bits.

Our use case involves determining all viable series and a single "best" series within a CT scan with 1-N series, which is eventually segmented. For each series, we parse the header to immediately filter out series that don't meet our criteria (not enough slices, too large a slice thickness, etc.). After getting a list of viable series, we parse and store the pixel data for every slice in these series (this can take up a lot of memory if the scan has many viable series). After creating a 3D volume from the series' pixel data, we save jpg images as slices through the volume for each anatomical plane (coronal, axial, sagittal). Saving the images doesn't require much memory since we write each pixel sequentially, converting it from the parsed int representation as we go.

We are in the process of refactoring the code for our use case described above so that we're only storing pixel data for one series at a time, which will cut down our memory usage. Obviously, it'd still be nice to reduce the footprint while parsing.

If you don't mind, I'm going to take a stab at implementing the NativeFrame interface. I don't think we should worry too much about downstream memory usage with this implementation. The user can drop bits by converting the returned values to a smaller type downstream if needed.
@bbeckrest That is my understanding as well re: the size of the pixel data. Regarding the NativeFrame interface I proposed in #161 (comment), I think it's reasonable, keeping in mind my performance comment. With the performance comment, what I meant was that calling a GetPixel-style method through the interface for every pixel adds overhead compared to reading the underlying slice directly.

Overall I think my proposal in #161 (comment) is ok, and it should be a good tradeoff in the interim, especially if it helps with your use case. I'm not 100% sure it will solve all of your problems if, during the creation of the 3D volume, you ever hold the whole series in memory as full ints, but you of course have a better sense of that than I do! I think that generics in Go will help a lot here, so we can have a NativeFrame whose underlying data matches the allocated bit size.

Please let me know if you'd like to send a PR for this, and let me know if there are any questions! No worries if you don't have time, I can knock this out at some point soon too. 😄
@suyashkumar Yeah, the proposal in #161 (comment) would be helpful for my use case, since the volume could be stored using 16-bit values instead of full ints.

Yesterday, I attempted to implement it but got stuck and realized I didn't really know what I was doing (here's my attempt). It's a decently large and important change, so I'll probably defer to you here. I was having an issue implementing the NativeFrame interface while minimizing the API change.
Hi @bbeckrest! There is another issue that may somehow be related to this (though I'm not exactly sure how or why). For files that have an unusual bitsAllocated, we ended up not reading the pixel data correctly, which I'm addressing in #210. Another user of the library was experiencing around 1.7 GB of memory being allocated (that wasn't being cleaned up properly by the Go GC), and their DICOM was triggering this condition. It could still very well be an independent issue; I just figured I'd mention it! Anyway, I do think I will circle back to my proposed solution above, and will take a look at your attempt as well!
We have a DICOM (33 MB) with bitsAllocated = 1. I guess the issue comes from the data type representation.
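For a rough sense of scale (an illustration only, assuming each 1-bit sample were widened to a 64-bit Go int): one byte of packed pixel data holds 8 samples, which would then occupy 8 × 8 = 64 bytes in memory, a 64x expansion before any per-pixel slice headers are counted.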
Yep, in the case you describe, that is what's happening (though in the comment above I was referring to another issue). #161 (comment) is where I lay out a possible interface for handling this as cleanly as possible. I can probably also add a getData() interface{} accessor that people can type-assert to the right underlying type if they just want the underlying data. Though now that Go has generics in 1.18, we may be able to adopt those to have a generic NativeFrame; but if we do, folks will need a hard dependency on at least Go 1.18, and we would have generics exposed in a public API (which may not be that bad at this point...)
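A purely illustrative sketch of that type-assertion idea (the interface below and the exported GetData name are assumptions for the example, not the library's actual API):

```go
// Hypothetical sketch: a frame exposes its raw samples as interface{}, and callers
// type-assert (or type-switch) to the concrete slice when they want direct access.
type rawDataFrame interface {
	GetData() interface{} // e.g. []uint8, []uint16, or []uint32 underneath
}

func sumSamples(f rawDataFrame) (int, bool) {
	sum := 0
	switch data := f.GetData().(type) {
	case []uint8:
		for _, v := range data {
			sum += int(v)
		}
	case []uint16:
		for _, v := range data {
			sum += int(v)
		}
	default:
		return 0, false // unexpected underlying type
	}
	return sum, true
}
```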
I have been playing around with some loose hacking on what a multi-Go-integer NativeFrame might look like here: #315. It needs a lot of cleanup and extra testing, but feel free to play with it if you have DICOMs that were previously causing you issues. Looking at the frame APIs again, I do think they need to be revisited in general, but this should get the desired effect at least for now.
So with the exploratory changes in #315, I've learned of another large contributor to this memory ballooning, and interestingly (iiuc) it's not the uint16 -> int64 expansion. Right now, frames store the parsed data in a nested 2D slice, something like `[][]int`, where each inner slice holds just the handful of samples for a single pixel. Each of those tiny inner slices carries its own 24-byte slice header, which can easily exceed the size of the sample data itself.

So, at a minimum, we might want to store a flat underlying representation of the values in the frame, and then offer APIs that help you navigate by pixel easily as needed. (That is, if we even want to process NativePixelData at all during parsing.) Either way, even if we process it later, we'll want to be mindful of this. I haven't verified that this is the issue, but I'm guessing much of the memory is actually going to the slice headers as opposed to the uint16 -> int64 expansion!
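To make the slice-header point concrete (an illustrative calculation, not a measurement from the library): on a 64-bit platform every Go slice header is 24 bytes, so a hypothetical 512x512 single-sample frame stored as `[][]int` carries 262,144 inner slices, i.e. about 6 MB of headers alone, on top of roughly 2 MB of int-widened sample data, for what is only 512 KB of uint16 samples on disk.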
Some updates: in #315 (comment) you'll see an implementation that yields significant memory and CPU improvements for Native PixelData processing. I still need to iterate and clean up some tests and the API. Please let me know if you have any general thoughts here; this will be a pretty significant change to how Native PixelData is represented and interacted with.
…and supporting generic data slices. (#315)

This change refactors the Go NativeFrame representation to vastly reduce the memory footprint (by __90%__ in some cases) and CPU usage (~30% in some cases). It does this by:

- Storing data in a flat slice, instead of a nested one (e.g. `[][]uint16`). This helps because the 24-byte slice header for Go slices is very significant here for the inner small slices (e.g. when storing 1-3 uint16s, which is often the case). More details in my comment here: #161 (comment)
- Using generics so the Go NativeFrame Data slice can use an appropriately sized Go integer based on bitsAllocated (e.g. have the underlying slice be `uint8`, `uint16`, etc. instead of always a full-size `int`). __This is an API breaking change.__

I'm not sure I love the INativeFrame API and naming, but introducing an interface likely makes it easier to interact with the NativeFrame without forcing propagation of generic signatures everywhere. It also still preserves access to the underlying concrete integer slice if needed. Having a more efficient default NativeFrame representation is important, so let's get this in and iterate if needed before the next release.

This fixes #291 and addresses a big part of #161.

## Benchmarks

See the PR GitHub Action [here](https://github.com/suyashkumar/dicom/actions/runs/10294852102/job/28493571486?pr=315#step:5:69), with an example run from one of the runs screenshotted below.

Memory reduction:
<img width="1149" alt="Screenshot 2024-08-07 at 10 24 56 PM" src="https://github.com/user-attachments/assets/625ac3fd-e947-400f-9969-9a42ff1539b3">

CPU reduction:
<img width="1185" alt="Screenshot 2024-08-07 at 10 29 16 PM" src="https://github.com/user-attachments/assets/415f4b55-e069-402b-91d7-200bee0babb9">
This is an issue to track various investigations and changes to optimize the readNativeFrames implementation, which I noticed in pprof profiles seemed to take up a majority of the CPU time when parsing DICOMs with Native pixel data. More comprehensive tests and benchmarks were added in #160 to aid with this.
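For anyone reproducing these observations, profiles like this can be captured with the standard Go tooling (the benchmark name comes from the discussion above; run this from the package that contains it):

```
go test -run '^$' -bench BenchmarkReadNativeFrames -benchmem \
  -cpuprofile cpu.out -memprofile mem.out .
go tool pprof cpu.out   # inspect where CPU time goes
go tool pprof mem.out   # inspect allocations
```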