
Optimize DiskV2 Deduplication #878

Merged: fkorotkov merged 14 commits into main on Aug 5, 2024
Conversation

fkorotkov (Contributor) commented:

I ran some Instruments CPU profiles while pulling ghcr.io/cirruslabs/macos-sonoma-base:latest on top of ghcr.io/cirruslabs/macos-sonoma-vanilla:latest and found a few bottlenecks. See commits for details.

In case we cloned `disk.img` from a local image, check whether the data at the offset already has the expected contents.

if isHoleAligned && chunk == zeroChunk {
if actualContentsOnDisk == chunk {
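
For context, this check compares the bytes already present in the APFS-cloned disk.img with the chunk about to be written and skips the write when they match, so the block stays shared with the source image. A minimal sketch of the idea (hypothetical helper name, not the PR's exact code):

import Foundation

// Sketch: before writing a chunk at `offset`, read the same range from the
// cloned disk.img; if the bytes already match, skip the write so the block
// keeps being deduplicated by the clone.
func writeChunkIfNeeded(_ chunk: Data, to disk: FileHandle, at offset: UInt64) throws {
  try disk.seek(toOffset: offset)
  let actualContentsOnDisk = try disk.read(upToCount: chunk.count)

  if actualContentsOnDisk == chunk {
    return
  }

  try disk.seek(toOffset: offset)
  try disk.write(contentsOf: chunk)
}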
fkorotkov (Contributor, Author) commented:

This is still a place for optimization. One thought was to not try to optimize holes at the sub-layer level and only deduplicate whole layers: hole-punch the entire uncompressed layer and then sparse-write into it.
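
A rough sketch of that alternative, assuming macOS's F_PUNCHHOLE fcntl(2) command; function names and structure are illustrative, not the PR's code:

import Foundation
import System

// Deallocate (hole-punch) the blocks backing [offset, offset + length) of the file.
func punchHole(fd: Int32, offset: off_t, length: off_t) throws {
  var args = fpunchhole_t(fp_flags: 0, reserved: 0, fp_offset: offset, fp_length: length)
  guard fcntl(fd, F_PUNCHHOLE, &args) == 0 else {
    throw Errno(rawValue: errno)
  }
}

// Hole-punch the entire uncompressed layer region first, then sparse-write
// only the chunks that actually contain non-zero data.
func writeLayerSparsely(_ layer: Data, fd: Int32, layerOffset: off_t,
                        chunkSize: Int, zeroChunk: Data) throws {
  try punchHole(fd: fd, offset: layerOffset, length: off_t(layer.count))

  var lower = layer.startIndex
  while lower < layer.endIndex {
    let upper = layer.index(lower, offsetBy: chunkSize, limitedBy: layer.endIndex) ?? layer.endIndex
    let chunk = layer.subdata(in: lower..<upper)

    if chunk != zeroChunk {
      try chunk.withUnsafeBytes { buf in
        let written = pwrite(fd, buf.baseAddress, buf.count,
                             layerOffset + off_t(lower - layer.startIndex))
        guard written == buf.count else {
          throw Errno(rawValue: errno)
        }
      }
    }

    lower = upper
  }
}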

if try pullResumed && Digest.hash(diskURL, offset: diskWritingOffset, size: uncompressedLayerSize) == uncompressedLayerContentDigest {
// Update the progress
progress.completedUnitCount += Int64(diskLayer.size)
if pullResumed {
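
Read together, the quoted hunk replaces the single combined condition with a nested check so that the hash is only computed on resumed pulls. Roughly, the shape of the change (a reconstruction, not the exact diff):

// Before: one combined condition.
if try pullResumed && Digest.hash(diskURL, offset: diskWritingOffset,
                                  size: uncompressedLayerSize) == uncompressedLayerContentDigest {
  // Update the progress
  progress.completedUnitCount += Int64(diskLayer.size)
}

// After: only reach the hash check when resuming a pull.
if pullResumed {
  if try Digest.hash(diskURL, offset: diskWritingOffset,
                     size: uncompressedLayerSize) == uncompressedLayerContentDigest {
    // Update the progress
    progress.completedUnitCount += Int64(diskLayer.size)
  }
}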
edigaryev (Collaborator) commented:

I'm not sure if we need this change, because currently short-circuit evaluation will kick in if pullResumed is false, and the hash won't be calculated.
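
For reference, Swift's && takes its right-hand operand as an @autoclosure and short-circuits even under try; a tiny standalone check (hypothetical names):

func expensiveHash() throws -> String {
  print("expensiveHash() evaluated")
  return "deadbeef"
}

func check(pullResumed: Bool, expected: String) throws {
  // When pullResumed is false, the right-hand operand is never evaluated,
  // so expensiveHash() prints nothing.
  if try pullResumed && expensiveHash() == expected {
    print("hash matches")
  }
}

try check(pullResumed: false, expected: "deadbeef") // prints nothing
try check(pullResumed: true, expected: "deadbeef")  // prints both lines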

fkorotkov (Contributor, Author) replied:

Not according to the profiling I did. Both parts of the expression are getting evaluated regardless.

edigaryev (Collaborator) replied:

Is it possible that pullResumed was set to true when the profiling was done?

fkorotkov (Contributor, Author) replied:

Unlikely, I was running pulls to completion between profiles.

edigaryev (Collaborator) commented on Aug 5, 2024:

I've just done some debugging and I think that we can safely revert this, because the compiler is compiling the original line of code correctly.

It's just that there's one more call to Digest.hash() visible in Instruments' "CPU Profiler", and it is located below:

if let localLayerCache = localLayerCache, let data = localLayerCache.find(diskLayer.digest), Digest.hash(data) == uncompressedLayerContentDigest {

fkorotkov (Contributor, Author) replied:

On Sequoia it's still the case. Maybe it has something to do with the `try`, I don't know.

extension Data {
  /*
   * Performant version of splitting a Data into chunks of a given size.
   * It appears that `Data.chunks` is not as performant as chunking the range of the data
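
The quoted hunk is truncated here; a helper of that shape, using the subdataChunks(ofCount:) name mentioned below (a sketch, not necessarily the PR's implementation), might look like:

import Foundation

extension Data {
  /// Splits the data into consecutive chunks of at most `count` bytes by
  /// walking the index range and copying each slice with subdata(in:).
  func subdataChunks(ofCount count: Int) -> [Data] {
    var chunks: [Data] = []
    chunks.reserveCapacity((self.count + count - 1) / count)

    var lower = startIndex
    while lower < endIndex {
      let upper = index(lower, offsetBy: count, limitedBy: endIndex) ?? endIndex
      chunks.append(subdata(in: lower..<upper))
      lower = upper
    }

    return chunks
  }
}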
edigaryev (Collaborator) commented on Aug 5, 2024:

How did you measure this? Did you use Instruments' "CPU Profiler"? Did you run `sudo purge` before the test?

I've replaced `tart list`'s run() function with the following contents:

import Foundation
import Algorithms // assuming chunks(ofCount:) below comes from the swift-algorithms package

let disk = try FileHandle(forReadingFrom: URL(fileURLWithPath: "/Users/edi/.tart/vms/macos/disk.img"))

let holeGranularityBytes = 64 * 1024

var count = 0

while true {
  guard let data = try disk.read(upToCount: 64 * 1024 * 1024) else {
    break
  }
  
  for chunk in data.chunks(ofCount: holeGranularityBytes) {
    count += chunk.count
  }
}

print(count)

Here's a profile when using chunks() which takes ~15 seconds and ~20 Mc to churn through macOS Sonoma's disk image:

[Screenshot: Instruments CPU Profiler results for chunks()]

Now, by changing chunks() to subdataChunks(), we do the same in ~27 seconds, using ~110 Mc:

[Screenshot: Instruments CPU Profiler results for subdataChunks()]

fkorotkov (Contributor, Author) replied:

I didn't purge, but this was not the biggest optimization anyway. Let's revert and profile it separately. See 773a403.

fkorotkov merged commit ff928ad into main on Aug 5, 2024
7 checks passed
fkorotkov deleted the fedor-optimize-diskv2-deduplication branch on August 5, 2024 at 16:24