Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize ReconstructSome for leopard 8+16 #272

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

elias-orijtech
Copy link
Collaborator

@elias-orijtech elias-orijtech marked this pull request as ready for review February 25, 2024 12:35
@odeke-em
Copy link

odeke-em commented Feb 25, 2024

/cc @klauspost @liamsi @musalbas; continued work on reconstructSome thanks to Elias!

Copy link
Owner

@klauspost klauspost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Do you have any numbers?

@liamsi
Copy link

liamsi commented Feb 28, 2024

I think this change makes sense as is. But I originally thought that there is a way to only recompute the required shares and I also thought that is what the other implementation was doing. @klauspost can you confirm that the non-leopard implementation operates only on the required shares?

Like it looks like this only operates on required shards:

reedsolomon/reedsolomon.go

Lines 1560 to 1606 in 5b85c72

// Re-create any data shards that were missing.
//
// The input to the coding is all of the shards we actually
// have, and the output is the missing data shards. The computation
// is done using the special decode matrix we just built.
outputs := make([][]byte, r.parityShards)
matrixRows := make([][]byte, r.parityShards)
outputCount := 0
for iShard := 0; iShard < r.dataShards; iShard++ {
if len(shards[iShard]) == 0 && (required == nil || required[iShard]) {
if cap(shards[iShard]) >= shardSize {
shards[iShard] = shards[iShard][0:shardSize]
} else {
shards[iShard] = AllocAligned(1, shardSize)[0]
}
outputs[outputCount] = shards[iShard]
matrixRows[outputCount] = dataDecodeMatrix[iShard]
outputCount++
}
}
r.codeSomeShards(matrixRows, subShards, outputs[:outputCount], shardSize)
if dataOnly {
// Exit out early if we are only interested in the data shards
return nil
}
// Now that we have all of the data shards intact, we can
// compute any of the parity that is missing.
//
// The input to the coding is ALL of the data shards, including
// any that we just calculated. The output is whichever of the
// data shards were missing.
outputCount = 0
for iShard := r.dataShards; iShard < r.totalShards; iShard++ {
if len(shards[iShard]) == 0 && (required == nil || required[iShard]) {
if cap(shards[iShard]) >= shardSize {
shards[iShard] = shards[iShard][0:shardSize]
} else {
shards[iShard] = AllocAligned(1, shardSize)[0]
}
outputs[outputCount] = shards[iShard]
matrixRows[outputCount] = r.parity[iShard-r.dataShards]
outputCount++
}
}

Also, agree with @klauspost that benchmarks/numbers would help understand the impact of this (if any).

@klauspost
Copy link
Owner

Leopard is pyramid encoded, so several layers are needed to reconstruct a bottom shard.

@liamsi
Copy link

liamsi commented Feb 29, 2024

Leopard is pyramid encoded, so several layers are needed to reconstruct a bottom shard.

It's been too long since I looked into leopard myself and I don't fully understand this tbh. Just from a high-level perspective, it should be possible to evaluate a polynomial given enough points (i.e. n+1) in O(n), with n being the degree of the polynomial (without having to fully recompute that polynomial). Are you saying that leopard does work differently/does not allow this and has to do its O(n log n) computations either way?

@liamsi
Copy link

liamsi commented Mar 1, 2024

If you look at the paper, what I mean is that you should be able to save computations by simply not adding missing shard positions to the set E if they are not required:
image

https://arxiv.org/pdf/1404.3458.pdf

In the implementation this should correspond to these:

reedsolomon/leopard.go

Lines 423 to 425 in 162f2ba

// Fill in error locations.
var errorBits errorBitfield
var errLocs [order]ffe

So a simple check if they are required before setting them could do the job. But it might not be as easy as that as other parts of the code might operate under the assumption that if a position is not in E, then there was no error or erasure. So there might be other changes necessary still.

@klauspost
Copy link
Owner

Sorry. I don't read math. I can't even tell if the paper relates to Leopard.

Reading the code it seems like input is divided into "mips" (probably derived from mip-maps from 3D) that are stacked for the final output. (*errorBitfield).prepare() converts missing shards into the mips that are needed.

This makes sense to give the O(N*log(N)) characteristics of the encoding and means that to decode one shard all mip levels above have to be resolved. The isNeeded will resolve if a given mip level is needed for the reconstruction.

There may very well be additional optimization possible here, but I have no idea where to start looking for it.

@liamsi
Copy link

liamsi commented Mar 4, 2024

I think it is important that we have a way to compare performance, e.g. if we are trying to recover a single shard as required. Otherwise we don't know if this PR (or changes to it) improves anything.

@klauspost
Copy link
Owner

klauspost commented Mar 4, 2024

I think it is important that we have a way to compare performance, e.g. if we are trying to recover a single shard as required. Otherwise we don't know if this PR (or changes to it) improves anything.

Yes. Currently BenchmarkDecode1K tests (among others) the time for reconstructing a single shard (leopard-gf16-single). You can see the numbers for that in this chart - "Recover One" column.

This will probably not have changed those numbers, since the only difference from using ReconstructSome is that it will ignore some other missing shards. So a similar benchmark that tests ReconstructSome with some mutations is probably needed. I wouldn't expect it to be faster than "Recover One".

(But do note that 1K is so small the setup time is mostly dominating this particular benchmark)

@liamsi
Copy link

liamsi commented Mar 4, 2024

Let me try if what I wrote above (inspired by the paper that is the basis for leopard) and see if only adding required shards to the error bits, breaks anything and also if it actually has any noticeable performance gains.

@elias-orijtech
Copy link
Collaborator Author

Based on #274, I've included the optimization for the error bits. See 0ae8b02 for details. In short:

  • errLocs must reflect the erasures, but also doesn't have an effect on runtime.
  • From my reading of the original C source code, errBits does seem to allow omission of unwanted shards.
  • I've added a more thorough ReconstructSome test.

The new tests fails in regular non-leopard configuration with

% go test -run ReconstructSome -v -short
Using pure Go
=== RUN   TestReconstructSome
panic: runtime error: slice bounds out of range [:65664] with capacity 0

goroutine 70 [running]:
github.com/klauspost/reedsolomon.(*reedSolomon).codeSomeShardsP.func1(0x0?, 0x10080)
	/Users/a/proj/orijtech/reedsolomon/reedsolomon.go:979 +0x2a4
created by github.com/klauspost/reedsolomon.(*reedSolomon).codeSomeShardsP in goroutine 34
	/Users/a/proj/orijtech/reedsolomon/reedsolomon.go:1011 +0x248
exit status 2
FAIL	github.com/klauspost/reedsolomon	0.622s

but I've so far failed to find an error in the test (and the leopards succeed). @klauspost can you spot my error?

@elias-orijtech elias-orijtech marked this pull request as draft March 7, 2024 18:38
@elias-orijtech
Copy link
Collaborator Author

Another issue with optimizing errBits is that the leopard8 cache is no longer correct:

r.inversion[errorBits.cacheID()] = c
because there is no longer a one-to-one relationship between errBits and errLocs.

@elias-orijtech
Copy link
Collaborator Author

I've fixed the caching issue by no longer including errBits, at the cost of calling errBits.prepare even for cache hits.

@elias-orijtech elias-orijtech requested a review from klauspost March 8, 2024 13:16
@klauspost
Copy link
Owner

Great stuff. I will take a look as soon as I get some extra time.

@odeke-em
Copy link

odeke-em commented May 2, 2024

Kind ping @elias-orijtech @klauspost :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants