Optimize DiskV2 Deduplication #878
Conversation
In case we cloned `disk.img` from a local image, check if data at offset has the expected contents already.
if isHoleAligned && chunk == zeroChunk {
if actualContentsOnDisk == chunk {
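The check being discussed could be sketched as follows (a hypothetical helper; `diskAlreadyContains` is an illustrative name, not the PR's actual code):

```swift
import Foundation

// Hypothetical helper (not the PR's actual code): read back the bytes at
// `offset` from the cloned disk image and compare them to the chunk we are
// about to write, so already-correct regions can be skipped.
func diskAlreadyContains(_ chunk: Data, in disk: FileHandle, at offset: UInt64) throws -> Bool {
  try disk.seek(toOffset: offset)
  let actualContentsOnDisk = try disk.read(upToCount: chunk.count)

  return actualContentsOnDisk == chunk
}
```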
This is still a place for optimization. One thought was to not try to optimize holes at the sub-layer level and only try to deduplicate layers; in that case, hole-punch the whole uncompressed layer and then sparse-write into it.
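On macOS, hole-punching a whole layer before sparse-writing could be sketched with the `F_PUNCHHOLE` fcntl (a sketch under the assumption of APFS-style hole punching; error handling is minimal and this is not the PR's implementation):

```swift
import Foundation

#if os(macOS)
/// Punch a hole (deallocate blocks; the range reads back as zeros) over the
/// given byte range of an open file. Illustrative sketch only.
func punchHole(fd: Int32, offset: Int64, length: Int64) throws {
  var hole = fpunchhole_t(fp_flags: 0, reserved: 0,
                          fp_offset: off_t(offset), fp_length: off_t(length))

  if fcntl(fd, F_PUNCHHOLE, &hole) == -1 {
    throw POSIXError(POSIXErrorCode(rawValue: errno) ?? .EIO)
  }
}
#endif
```

Note that `F_PUNCHHOLE` generally expects the offset and length to be aligned to the filesystem's block size.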
if try pullResumed && Digest.hash(diskURL, offset: diskWritingOffset, size: uncompressedLayerSize) == uncompressedLayerContentDigest {
  // Update the progress
  progress.completedUnitCount += Int64(diskLayer.size)
if pullResumed {
I'm not sure if we need this change, because currently short-circuit evaluation will kick in if `pullResumed` is `false`, and the hash won't be calculated.
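The short-circuit behavior under discussion can be illustrated with a standalone sketch (illustrative names, not tart code):

```swift
import Foundation

var hashCalls = 0

// Stand-in for the expensive Digest.hash() computation.
func expensiveHash() -> Bool {
  hashCalls += 1
  return true
}

let pullResumed = false

// `&&` short-circuits: the right-hand side runs only when the left operand
// is true, so expensiveHash() is never invoked here and hashCalls stays 0.
let matched = pullResumed && expensiveHash()
```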
Not according to the profiling I did. Both parts of the expression are getting evaluated regardless.
Is it possible that `pullResumed` was set to `true` when the profiling was done?
Unlikely, I was running pulls to completion between profiles.
I've just done some debugging and I think that we can safely revert this, because the compiler is correctly compiling the original line of code. It's just that there's one more call to `Digest.hash()` that is visible in the Instruments "CPU Profiler", and it is located below:

tart/Sources/tart/OCI/Layerizer/DiskV2.swift, line 143 in 33b5cfe:

if let localLayerCache = localLayerCache, let data = localLayerCache.find(diskLayer.digest), Digest.hash(data) == uncompressedLayerContentDigest {
On Sequoia it's still the case. Maybe it's something to do with the `try`, I don't know.
Sources/tart/Data+Chunks.swift
Outdated
extension Data {
  /*
   * Performant version of splitting a Data into chunks of a given size.
   * It appears that `Data.chunks` is not as performant as chunking the range of the data
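A range-based chunking helper along these lines might look like the following (a sketch with illustrative names, not necessarily the PR's exact implementation):

```swift
import Foundation

extension Data {
  /// Split the data's index range into consecutive subranges of up to `size`
  /// bytes. Iterating ranges and subscripting (`data[range]`) yields slices
  /// that share storage, avoiding the per-chunk copies made by `subdata(in:)`.
  /// (Illustrative sketch, not the PR's exact code.)
  func chunkRanges(ofCount size: Int) -> [Range<Data.Index>] {
    var ranges: [Range<Data.Index>] = []
    var lower = startIndex

    while lower < endIndex {
      let upper = Swift.min(lower + size, endIndex)
      ranges.append(lower..<upper)
      lower = upper
    }

    return ranges
  }
}
```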
How did you measure this? Have you used the "CPU Profiler" in Instruments? Have you run `sudo purge` before the test?

I've replaced the `tart list` command's `run()` function with the following contents:
let disk = try FileHandle(forReadingFrom: URL(fileURLWithPath: "/Users/edi/.tart/vms/macos/disk.img"))
let holeGranularityBytes = 64 * 1024
var count = 0
while true {
guard let data = try disk.read(upToCount: 64 * 1024 * 1024) else {
break
}
for chunk in data.chunks(ofCount: holeGranularityBytes) {
count += chunk.count
}
}
print(count)
Here's a profile when using `chunks()`, which takes ~15 seconds and ~20 Mc to churn through macOS Sonoma's disk image:

Now, by changing `chunks()` to `subdataChunks()`, we do the same in ~27 seconds, using ~110 Mc:
I didn't purge, but this was not the biggest optimization anyway. Let's revert and profile separately. See 773a403.
I ran some Instruments CPU profiles pulling `ghcr.io/cirruslabs/macos-sonoma-base:latest` on top of `ghcr.io/cirruslabs/macos-sonoma-vanilla:latest` and found a few bottlenecks. See commits for details.