Optimize ReconstructSome for leopard 8+16 #272

elias-orijtech · 2024-02-24T22:12:54Z

odeke-em · 2024-02-25T13:00:06Z

/cc @klauspost @liamsi @musalbas; continued work on reconstructSome thanks to Elias!

klauspost

LGTM. Do you have any numbers?

liamsi · 2024-02-28T18:04:30Z

I think this change makes sense as is. But I originally thought that there is a way to only recompute the required shares and I also thought that is what the other implementation was doing. @klauspost can you confirm that the non-leopard implementation operates only on the required shares?

Like it looks like this only operates on required shards:

reedsolomon/reedsolomon.go

Lines 1560 to 1606 in 5b85c72

    
           // Re-create any data shards that were missing. 
        
           // 
        
           // The input to the coding is all of the shards we actually 
        
           // have, and the output is the missing data shards.  The computation 
        
           // is done using the special decode matrix we just built. 
        
           outputs := make([][]byte, r.parityShards) 
        
           matrixRows := make([][]byte, r.parityShards) 
        
           outputCount := 0 
        
           for iShard := 0; iShard < r.dataShards; iShard++ { 
        
           	if len(shards[iShard]) == 0 && (required == nil || required[iShard]) { 
        
           		if cap(shards[iShard]) >= shardSize { 
        
           			shards[iShard] = shards[iShard][0:shardSize] 
        
           		} else { 
        
           			shards[iShard] = AllocAligned(1, shardSize)[0] 
        
           		} 
        
           		outputs[outputCount] = shards[iShard] 
        
           		matrixRows[outputCount] = dataDecodeMatrix[iShard] 
        
           		outputCount++ 
        
           	} 
        
           } 
        
           r.codeSomeShards(matrixRows, subShards, outputs[:outputCount], shardSize) 
        
           if dataOnly { 
        
           	// Exit out early if we are only interested in the data shards 
        
           	return nil 
        
           } 
        
           // Now that we have all of the data shards intact, we can 
        
           // compute any of the parity that is missing. 
        
           // 
        
           // The input to the coding is ALL of the data shards, including 
        
           // any that we just calculated.  The output is whichever of the 
        
           // data shards were missing. 
        
           outputCount = 0 
        
           for iShard := r.dataShards; iShard < r.totalShards; iShard++ { 
        
           	if len(shards[iShard]) == 0 && (required == nil || required[iShard]) { 
        
           		if cap(shards[iShard]) >= shardSize { 
        
           			shards[iShard] = shards[iShard][0:shardSize] 
        
           		} else { 
        
           			shards[iShard] = AllocAligned(1, shardSize)[0] 
        
           		} 
        
           		outputs[outputCount] = shards[iShard] 
        
           		matrixRows[outputCount] = r.parity[iShard-r.dataShards] 
        
           		outputCount++ 
        
           	} 
        
           }

Also, agree with @klauspost that benchmarks/numbers would help understand the impact of this (if any).

klauspost · 2024-02-28T18:30:30Z

Leopard is pyramid encoded, so several layers are needed to reconstruct a bottom shard.

liamsi · 2024-02-29T12:10:34Z

Leopard is pyramid encoded, so several layers are needed to reconstruct a bottom shard.

It's been too long since I looked into leopard myself and I don't fully understand this tbh. Just from a high-level perspective, it should be possible to evaluate a polynomial given enough points (i.e. n+1) in O(n), with n being the degree of the polynomial (without having to fully recompute that polynomial). Are you saying that leopard does work differently/does not allow this and has to do its O(n log n) computations either way?

liamsi · 2024-03-01T07:07:53Z

If you look at the paper, what I mean is that you should be able to save computations by simply not adding missing shard positions to the set E if they are not required:

https://arxiv.org/pdf/1404.3458.pdf

In the implementation this should correspond to these:

reedsolomon/leopard.go

Lines 423 to 425 in 162f2ba

    
           // Fill in error locations. 
        
           var errorBits errorBitfield 
        
           var errLocs [order]ffe

So a simple check if they are required before setting them could do the job. But it might not be as easy as that as other parts of the code might operate under the assumption that if a position is not in E, then there was no error or erasure. So there might be other changes necessary still.

klauspost · 2024-03-01T11:08:57Z

Sorry. I don't read math. I can't even tell if the paper relates to Leopard.

Reading the code it seems like input is divided into "mips" (probably derived from mip-maps from 3D) that are stacked for the final output. (*errorBitfield).prepare() converts missing shards into the mips that are needed.

This makes sense to give the O(N*log(N)) characteristics of the encoding and means that to decode one shard all mip levels above have to be resolved. The isNeeded will resolve if a given mip level is needed for the reconstruction.

There may very well be additional optimization possible here, but I have no idea where to start looking for it.

liamsi · 2024-03-04T11:01:41Z

I think it is important that we have a way to compare performance, e.g. if we are trying to recover a single shard as required. Otherwise we don't know if this PR (or changes to it) improves anything.

klauspost · 2024-03-04T11:15:42Z

I think it is important that we have a way to compare performance, e.g. if we are trying to recover a single shard as required. Otherwise we don't know if this PR (or changes to it) improves anything.

Yes. Currently BenchmarkDecode1K tests (among others) the time for reconstructing a single shard (leopard-gf16-single). You can see the numbers for that in this chart - "Recover One" column.

This will probably not have changed those numbers, since the only difference from using ReconstructSome is that it will ignore some other missing shards. So a similar benchmark that tests ReconstructSome with some mutations is probably needed. I wouldn't expect it to be faster than "Recover One".

(But do note that 1K is so small the setup time is mostly dominating this particular benchmark)

liamsi · 2024-03-04T13:58:41Z

Let me try if what I wrote above (inspired by the paper that is the basis for leopard) and see if only adding required shards to the error bits, breaks anything and also if it actually has any noticeable performance gains.

…`required`

elias-orijtech · 2024-03-07T18:33:39Z

Based on #274, I've included the optimization for the error bits. See 0ae8b02 for details. In short:

errLocs must reflect the erasures, but also doesn't have an effect on runtime.
From my reading of the original C source code, errBits does seem to allow omission of unwanted shards.
I've added a more thorough ReconstructSome test.

The new tests fails in regular non-leopard configuration with

% go test -run ReconstructSome -v -short
Using pure Go
=== RUN   TestReconstructSome
panic: runtime error: slice bounds out of range [:65664] with capacity 0

goroutine 70 [running]:
github.com/klauspost/reedsolomon.(*reedSolomon).codeSomeShardsP.func1(0x0?, 0x10080)
	/Users/a/proj/orijtech/reedsolomon/reedsolomon.go:979 +0x2a4
created by github.com/klauspost/reedsolomon.(*reedSolomon).codeSomeShardsP in goroutine 34
	/Users/a/proj/orijtech/reedsolomon/reedsolomon.go:1011 +0x248
exit status 2
FAIL	github.com/klauspost/reedsolomon	0.622s

but I've so far failed to find an error in the test (and the leopards succeed). @klauspost can you spot my error?

elias-orijtech · 2024-03-07T18:59:10Z

Another issue with optimizing errBits is that the leopard8 cache is no longer correct:

reedsolomon/leopard8.go

Line 531 in 4e91954

r.inversion[errorBits.cacheID()] = c

because there is no longer a one-to-one relationship between errBits and errLocs.

…locations

elias-orijtech · 2024-03-08T12:52:56Z

I've fixed the caching issue by no longer including errBits, at the cost of calling errBits.prepare even for cache hits.

klauspost · 2024-03-08T13:56:35Z

Great stuff. I will take a look as soon as I get some extra time.

odeke-em · 2024-05-02T10:36:30Z

Kind ping @elias-orijtech @klauspost :-)

elias-orijtech marked this pull request as ready for review February 25, 2024 12:35

klauspost approved these changes Feb 26, 2024

View reviewed changes

liamsi mentioned this pull request Mar 4, 2024

Try further optimize ReconstructSome for leopard 8+16 #274

Closed

elias-orijtech and others added 6 commits March 7, 2024 18:26

Optimize ReconstructSome for leopard 8+16

d1a03f1

Try optimization: only add required shards errorBits/errLocs

a6dce07

Fix tests

53144aa

should be investigated properly instead: make errorLocs dependent on …

4e22353

…`required`

fixups

0ae8b02

Add more thorough ReconstructSome test

6aff8aa

elias-orijtech force-pushed the master branch from 813029d to 6aff8aa Compare March 7, 2024 18:26

elias-orijtech marked this pull request as draft March 7, 2024 18:38

Don't cache error bits that are no longer one-to-one mapped to error …

2c7d895

…locations

elias-orijtech requested a review from klauspost March 8, 2024 13:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize ReconstructSome for leopard 8+16 #272

Optimize ReconstructSome for leopard 8+16 #272

elias-orijtech commented Feb 24, 2024

odeke-em commented Feb 25, 2024 •

edited

Loading

klauspost left a comment

liamsi commented Feb 28, 2024 •

edited

Loading

klauspost commented Feb 28, 2024

liamsi commented Feb 29, 2024

liamsi commented Mar 1, 2024 •

edited

Loading

klauspost commented Mar 1, 2024

liamsi commented Mar 4, 2024 •

edited

Loading

klauspost commented Mar 4, 2024 •

edited

Loading

liamsi commented Mar 4, 2024

elias-orijtech commented Mar 7, 2024

elias-orijtech commented Mar 7, 2024

elias-orijtech commented Mar 8, 2024

klauspost commented Mar 8, 2024

odeke-em commented May 2, 2024

Optimize ReconstructSome for leopard 8+16 #272

Are you sure you want to change the base?

Optimize ReconstructSome for leopard 8+16 #272

Conversation

elias-orijtech commented Feb 24, 2024

odeke-em commented Feb 25, 2024 • edited Loading

klauspost left a comment

Choose a reason for hiding this comment

liamsi commented Feb 28, 2024 • edited Loading

klauspost commented Feb 28, 2024

liamsi commented Feb 29, 2024

liamsi commented Mar 1, 2024 • edited Loading

klauspost commented Mar 1, 2024

liamsi commented Mar 4, 2024 • edited Loading

klauspost commented Mar 4, 2024 • edited Loading

liamsi commented Mar 4, 2024

elias-orijtech commented Mar 7, 2024

elias-orijtech commented Mar 7, 2024

elias-orijtech commented Mar 8, 2024

klauspost commented Mar 8, 2024

odeke-em commented May 2, 2024

odeke-em commented Feb 25, 2024 •

edited

Loading

liamsi commented Feb 28, 2024 •

edited

Loading

liamsi commented Mar 1, 2024 •

edited

Loading

liamsi commented Mar 4, 2024 •

edited

Loading

klauspost commented Mar 4, 2024 •

edited

Loading