Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle out of order shares in the data square #191

Closed
vgonkivs opened this issue Jun 22, 2023 · 20 comments · Fixed by #242
Closed

Handle out of order shares in the data square #191

vgonkivs opened this issue Jun 22, 2023 · 20 comments · Fixed by #242
Assignees
Labels
enhancement New feature or request

Comments

@vgonkivs
Copy link
Member

Based on this we may run in a situation when shares will be out of order in the dataSquare. In this case, we will not be able to compute the Merkle root of row or col and reconstruction fails. To handle such error properly we need to get these shares+indexes of Merkel Roots of the opposite axis(to build a Merkle Proof against them). If Merkle Roots for both row/col can't be computed, then we should receive an error that data is not available.

@musalbas
Copy link
Member

Is this not contradictory to this comment from @adlerjohn? Verification logic doesn't happen in rsmt2d.

We don't need a different data structure, we just need additional codepaths in the verification logic to handle the two additional cases described in OP. The data structure of the proof can remain the same.

@staheri14
Copy link
Contributor

@musalbas From what I understand, this issue is not about implementing the verification logic in the rsmt2d, rather it is about adding support for detecting situations where the root construction has failed due to unordered leaves and returning an error with enough information e.g., the index of unordered shares (column and row) to the caller, likely as part of the ErrByzantineData. Then the caller will take care of crafting a proper proof.

@musalbas
Copy link
Member

I don't think that can be handled directly in rsmt2d, because rsmt2d does not assume that the square uses a namespaced Merkle tree.

I also don't know if that's what's being requested, but we should clarify. cc @adlerjohn

@staheri14
Copy link
Contributor

staheri14 commented Jun 26, 2023

I don't think that can be handled directly in rsmt2d, because rsmt2d does not assume that the square uses a namespaced Merkle tree.

As far as I can say, it has access to the errors returned by the underlying tree construction function, based on which can populate the ErrByzantineData. Update: and those functions actually are able to detect unordered leaves.

Below are two approaches that I see to this issue (considering that what I have described in this comment is what this issue is aimed for).

  1. Attempt to compute the root using computeSharesRoot. Based on the returned error in here, which should include relevant information about the unordered shares, populate the ErrByzantineData structure and return it.
  2. Perform a sanity check on the available shares and ensure that their namespaces are sorted before attempting to repair the square and compute the root (i.e., the Repair function). This can be done as part of prerepairSanityCheck. This approach offers multiple benefits:
    a) The check is performed only on the shares for which we have an inclusion proof, rather than on the ones computed by erasure coding and repairing.
    b) It saves computation time for hashing by avoiding the attempt to construct the root.

@musalbas
Copy link
Member

musalbas commented Jun 27, 2023

I don't think it can be done in prerepairSanityCheck, because new unordered shares may be discovered in the middle of the solveCrossword loop when new rows or columns are reconstructed.

As for (1), are you proposing to add a new error field to ErrByzantineData?

@vgonkivs
Copy link
Member Author

Am I correct that in the case of out-of-order shares in the particular row/col, we will not be able to get the subtree roots and build the proof against this particular row/col? If so, then I guess in the node we can add a statement, during BEFP construction, that checks the subtree roots and re-requests the orthogonal axis.

@musalbas
Copy link
Member

musalbas commented Jun 27, 2023

The issue is that you might not be able to get the subtree root not only if the shares are out of order, but if the row that the BEFP for is simply erasure coded incorrectly, and thus does not match the subtree root in the block header. In that case, you need to use the inclusion proofs that the node already downloaded via sampling.

@vgonkivs
Copy link
Member Author

But does not ErrByzantineData store only sampled shares and not the rebuilt ones?

@musalbas
Copy link
Member

musalbas commented Jun 27, 2023

I believe ErrByzantineData returns the shares of the "last known good" square, which can include rebuilt rows and columns. This is because you may need to use shares from rebuilt rows and columns in the BEFP.

Basically, I believe the rule for choosing what proof should be used for a BEFP share should be:

  1. If a share is inside a fully and validly reconstructed row or column, then you can generate an inclusion proof from that row or column.
  2. Otherwise, you need to use Merkle inclusion proofs that the full node received from other nodes via sampling. Example: consider a square where all erasure coding quadrants (quadrants 2, 3, 4) contain garbage data that does not match the actual row or column roots.

@staheri14
Copy link
Contributor

I have read the comments and will get back soon to this thread.

@staheri14
Copy link
Contributor

I think it might be easier if we discuss our ideas using an example:

        c1    c2    c3    c4
      +-----+-----+-----+-----+
row 1 | s1  | s2  | s3' | s4' |
      +-----+-----+-----+-----+
row 2 | s5  | s6' | s7  | s8' |
      +-----+-----+-----+-----+
row 3 | s9  |s10' |s11  |s12' |
      +-----+-----+-----+-----+
row 4 |s13' |s14' |s15' |s16' |
      +-----+-----+-----+-----+

s1-s16 are the shares
s1,s2,s5,s6' constitute the original square and the rest are the extended part
Shares without ' are the ones for which we have valid inclusion proof to their respective rows or columns.
Shares with ' are the shares that are rebuilt
Suppose s5.namespace > s7.namespace (unordered).
We have inclusion proof of s5 to c1, and s7 to c3.

On the rsmt2d side, we want to return an error similar to the ErrByzantineData to encapsulate information about s5 and s7.
So, the questions are:

  • How much information is sufficient to be able to craft a BEFP on the caller side? My answer would be 1) Axis i.e., row 2) the row index e.g., 2, and 3) the index of the unordered shares within that row i.e., 1 and 2 (assuming indices start at 1) (we may need to make some changes to the current ErrByzantineData to accommodate new fields)
  • The other question is do we want to reconstruct s6' and s8' as well, before checking the namespaces order (relevant to this comment of @musalbas)? if we reconstruct s6' and its namespace happens to be larger than s5, do we want to return the index of s5 and s6' (instead of s5 and s7) as faulty shares? my proposal in my previous comment (second option) was to return s5 and s7 since they are associated with a valid inclusion proof to their respective columns, hence we can prove they are part of the square, however, we do not have any inclusion proof for s6' which was rebuilt.

Based on the answer to the above questions, we can then proceed with whether the current fields in ErrByzantineData are sufficient or should be extended.

@musalbas @vgonkivs @adlerjohn Please let me know your thoughts

@musalbas
Copy link
Member

musalbas commented Jun 28, 2023

AFAIK, we do not need to modify any fields in ErrByzantineData, so we don't need to add the index of unordered shares, nor do we need to return the newly repaired shares for an incorrect row/col root.

The only thing needed, is to see that the row root of the recomputed row does not match the row root that was committed to in the block header, because when we recompute the row root with the nmt library, it should return an err symbol as the hash, due to shares being out of order, and err will not match the committed row root in the block header.

See this comment for an explanation of why:

During the reconstruction process, if there are two shares that are out of order, then we should still be able to create an inclusion proof to half of the shares in that row or col using either the row or column roots. If the square is available, then this will always be the case.

Let's think about the end-to-end flow of sampling and reconstruction. Note that shares being out of order is not the only case where you cannot recompute a row or column root; this would also be the case if any reconstructed shares do not match the row or column root. We can conceptually treat an out-of-order share as the same as a recomputed share that does not match the row or column root.

Firstly, it should not be possible to sample a share that has any sibling nodes that are out of order, because the proof NMT verification will fail when it tries to reconstruct the root (due to a mismatch when calculating min and max nid).

Next, we should consider this rule:

Basically, I believe the rule for choosing what proof should be used for a BEFP share should be:

  1. If a share is inside a fully and validly reconstructed row or column, then you can generate an inclusion proof from that row or column.
  2. Otherwise, you need to use Merkle inclusion proofs that the full node received from other nodes via sampling. Example: consider a square where all erasure coding quadrants (quadrants 2, 3, 4) contain garbage data that does not match the actual row or column roots.

From the above, it doesn't seem to me like we need to create inclusion proofs ourselves in rows/cols with shares out of order, due to (2). Does that seem right @adlerjohn @evan-forbes?


Implementation wise, the only thing I'm unsure of is if err is be bubbled up in getRowRoot, and how solveCrossword behaves if there's an error, which should be reviewed. Seems like the tree's err is just returned, when the desired behavior should actually be to return ErrByzantineData, so that should be discussed. Either we change the err returned to be ErrByzantineData (which might not be clean, not sure we can assume that is good behaviour for all trees, or if NMT should actually return nil on unordered shares).

@adlerjohn
Copy link
Member

adlerjohn commented Jun 29, 2023

we do not need to modify any fields in ErrByzantineData

Agreed, the error already contains all the necessary information from rsmt2d that node needs.

Implementation wise, the only thing I'm unsure of is if err is be bubbled up in getRowRoot, and how solveCrossword behaves if there's an error, which should be reviewed. Seems like the tree's err is just returned, when the desired behavior should actually be to return ErrByzantineData, so that should be discussed. Either we change the err returned to be ErrByzantineData (which might not be clean, not sure we can assume that is good behaviour for all trees, or if NMT should actually return nil on unordered shares).

Good catch, and I think that's the missing piece. IMO rsmt2d should return an ErrByzantineData error if the tree construction fails for whatever reason. No need to return the tree's error. This doesn't even need to assume an underlying NMT; any tree construction failing is sufficient for the algorithm to fail.

@adlerjohn
Copy link
Member

I believe ErrByzantineData returns the shares of the "last known good" square, which can include rebuilt rows and columns. This is because you may need to use shares from rebuilt rows and columns in the BEFP.

Basically, I believe the rule for choosing what proof should be used for a BEFP share should be:

1. If a share is inside a fully and validly reconstructed row or column, then you can generate an inclusion proof from that row or column.

2. Otherwise, you need to use Merkle inclusion proofs that the full node received from other nodes via sampling. Example: consider a square where all erasure coding quadrants (quadrants 2, 3, 4) contain garbage data that does not match the actual row or column roots.

This is the correct intuition: either the shares in the ErrByzantineData---when extended---are committed to in the row/col root, or they're not. If they're not then by construction the shares are either on a complete col/row or they were obtained from sampling.

@vgonkivs
Copy link
Member Author

Agree with Mustafa and John that we do not need to modify fields in ErrByzantineData as we have a set of all the necessary information to build the BEFP and the out-of-order shares we will detect during nmt tree creation in BEFP validation.

@staheri14
Copy link
Contributor

Also, agreed with the point that ErrByzantineData is sufficient for the BEFP and doesn't need any change.

Implementation wise, the only thing I'm unsure of is if err is be bubbled up in getRowRoot, and how solveCrossword behaves if there's an error, which should be reviewed. Seems like the tree's err is just returned, when the desired behavior should actually be to return ErrByzantineData, so that should be discussed. Either we change the err returned to be ErrByzantineData (which might not be clean, not sure we can assume that is good behaviour for all trees, or if NMT should actually return nil on unordered shares).

Good catch, and I think that's the missing piece. IMO rsmt2d should return an ErrByzantineData error if the tree construction fails for whatever reason. No need to return the tree's error. This doesn't even need to assume an underlying NMT; any tree construction failing is sufficient for the algorithm to fail.

Also, agree with this.

@staheri14
Copy link
Contributor

staheri14 commented Jun 29, 2023

FYI: I'm moving forward with the implementation of the following solution in rsmt2d: if the tree construction fails for a row or column, during the repair process (this includes any inner function calls as well), an ErrByzantineData should be populated, and returned.

@staheri14 staheri14 added the enhancement New feature or request label Jun 29, 2023
@musalbas
Copy link
Member

Related, potentially the same issue: #178

@staheri14
Copy link
Contributor

Thanks for mentioning the #178.
It is relevant, however, they are addressing different problems. Nevertheless, the solution to the current issue would ultimately need to consume the fix to #178. That is, when preparing ErrByzantineData due to failure in root calculation (i.e., the current issue) for a newly completed orthogonal axis, the ErrByzantineData should also include the newly rebuilt share.

@staheri14
Copy link
Contributor

staheri14 commented Jul 3, 2023

Update: I actually need to also address #178 as part of addressing the current issue. @rootulp Have you started on #178? if not, I can address it as part of this issue.

Update: I think you already have started in #190, so forget about my question @rootulp

staheri14 added a commit that referenced this issue Jul 12, 2023
…hares in tree construction (#242)

## Overview
Closes #191 

## Checklist

- [x] New and updated code has appropriate documentation
- [x] New and updated code has new and/or updated testing
- [x] Required CI checks are passing
- [x] Visual proof for any user facing features like CLI or
documentation updates
- [x] Linked issues closed with keywords
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
4 participants