Collation fetching fairness #4880
In the documentation, you should focus on what this is, not on how it is used. If you focus on how it is used and what to expect from it then you also open up for possibilities of it getting used elsewhere safely, as everybody can use it as long as they are fine with the stated contract.
Here in particular it seems that the name should also be changed, e.g. to `last_fetch`, with documentation explaining what is to be expected: e.g. is this the last successful fetch or the last fetch that got initiated? When will this be `None`?
The name is confusing. I wanted to use this field at some point and was surprised by the behaviour, so I left a comment with my findings. I agree it would have been better to rename the field and leave a better comment. I'll fix it.
Overall I think it's a good idea to optimistically track what claims should be 'satisfied' already.
Once we fetch something it does not necessarily mean we second it. We send it to backing for validation and only then second it, so we might effectively backpedal from our decision.
Consider cq: [a, b, b, b, b, b, b, b, b, ...]
We fetch a candidate for `a`.
cq_state: [1, 0, 0, 0, 0, 0, ...]
So then we effectively stop fetching for `a`.
Turns out the candidate we fetched from `a` was rejected in backing for being invalid. Nevertheless our claim_queue_state stays as is and we heavily de-prioritise fetches for `a` despite having seconded nothing for that slot. A malicious collator could block other incoming collations for that parachain just by sending an invalid collation to all backers.
Is that how the proposed system works or have I missed something? I'm still thinking about how big of an issue it is, but it seems concerning. We might have to clean up the cq_state in such cases (maybe).
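To make the concern concrete, here is a minimal sketch of fetch-based accounting. It is not the PR's code: `ParaId`, `FetchBasedState`, `note_fetched` and `wants_more` are hypothetical names, and the per-para counter is an assumed simplification of the claim queue state.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parachain id.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct ParaId(pub u32);

/// Assumed model: a claim counts as 'satisfied' as soon as a fetch happens.
#[derive(Default)]
pub struct FetchBasedState {
    fetched: HashMap<ParaId, usize>,
}

impl FetchBasedState {
    /// Record a completed fetch. Nothing ever undoes this, which is the
    /// problem: a later rejection in backing leaves the counter untouched.
    pub fn note_fetched(&mut self, para: ParaId) {
        *self.fetched.entry(para).or_insert(0) += 1;
    }

    /// True while `para` has more claims in the queue than recorded fetches.
    pub fn wants_more(&self, claim_queue: &[ParaId], para: ParaId) -> bool {
        let claims = claim_queue.iter().filter(|p| **p == para).count();
        let fetched = self.fetched.get(&para).copied().unwrap_or(0);
        fetched < claims
    }
}
```

With cq = [a, b, b, b], a single fetch for `a` makes `wants_more` return false for `a`, and it stays false even if that candidate later turns out to be invalid.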
I'm afraid that's correct. We run some checks before enqueuing the candidate but of course we don't validate it. I suppose it shouldn't be hard to craft a garbage candidate which will pass the second checks, claim the slot and prevent legitimate validators from pushing candidates.
It will make more sense to base the 'satisfaction' check on what's seconded and keep the fetches as 'pending seconding' (similar to the way I handle pending fetches in this PR). We can track this with `Seconded` and `Invalid` messages and accept (and fetch) advertisements until we have enough seconded to 'satisfy' the claim queue.
A misbehaving collator will still be able to spam the validators, but on the first invalid collation it is supposed to get reported (and disconnected maybe? I don't know how this part of the code works).
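The idea above could look roughly like the following sketch. It is an illustration under assumptions, not the implementation in this PR: `SecondedBasedState` and its methods are hypothetical names, and `note_seconded` / `note_invalid` stand in for handling of the `Seconded` and `Invalid` messages.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parachain id.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct ParaId(pub u32);

/// Assumed model: a claim is only satisfied by a seconded candidate;
/// a fetch merely holds the slot as 'pending seconding'.
#[derive(Default)]
pub struct SecondedBasedState {
    pending: HashMap<ParaId, usize>,
    seconded: HashMap<ParaId, usize>,
}

impl SecondedBasedState {
    /// A fetch was started: the slot is held, but not yet satisfied.
    pub fn note_fetch_started(&mut self, para: ParaId) {
        *self.pending.entry(para).or_insert(0) += 1;
    }

    /// Backing seconded the candidate: pending becomes seconded.
    pub fn note_seconded(&mut self, para: ParaId) {
        self.release_pending(para);
        *self.seconded.entry(para).or_insert(0) += 1;
    }

    /// Backing rejected the candidate: the slot is free again.
    pub fn note_invalid(&mut self, para: ParaId) {
        self.release_pending(para);
    }

    /// Accept an advertisement while seconded plus pending is below the
    /// number of claims `para` has in the claim queue.
    pub fn can_accept(&self, claim_queue: &[ParaId], para: ParaId) -> bool {
        let claims = claim_queue.iter().filter(|p| **p == para).count();
        let pending = self.pending.get(&para).copied().unwrap_or(0);
        let seconded = self.seconded.get(&para).copied().unwrap_or(0);
        pending + seconded < claims
    }

    fn release_pending(&mut self, para: ParaId) {
        if let Some(n) = self.pending.get_mut(&para) {
            *n = n.saturating_sub(1);
        }
    }
}
```

In this model an invalid candidate only occupies the slot until backing rejects it, so a garbage collation delays other collations for that para instead of blocking them permanently.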
I've reworked the PR to track the number of seconded candidates per para. Additionally, I realized that pending items per relay parent can't be more than one, which simplifies the PR a little bit.
Also keep in mind that cores are rotated. In general we have to consider that a claim queue position can be occupied by a backing of another validator in the backing group, but with rotations at boundaries it can even happen that a claim queue position is already occupied by a backing of the previous group.
E.g. one relay parent earlier a previous backing group was assigned and the claim queue looked like this [A,B,C].
Now in our view the claim queue looks like: [B,C,D] ... A already moved out, but B and C are still valid and might have been already provided by the previous backing group.
Let me know how complex it gets to account for that; we might want to make this part of the bigger refactor.
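As a rough illustration of the rotation example, the sketch below finds the positions in the current view that were already visible one relay parent earlier, and so might already have been provided by the previous backing group. The helper name `claims_visible_earlier` is hypothetical, and it assumes the claim queue advances by exactly one entry per relay parent, as in the [A,B,C] to [B,C,D] example.

```rust
/// Returns the indices in `current` whose claim also appeared in `previous`,
/// shifted by one position (assumption: the queue advances one entry per
/// relay parent, so current[i] corresponds to previous[i + 1]).
pub fn claims_visible_earlier<T: PartialEq>(previous: &[T], current: &[T]) -> Vec<usize> {
    current
        .iter()
        .enumerate()
        .filter(|(i, claim)| previous.get(i + 1) == Some(*claim))
        .map(|(i, _)| i)
        .collect()
}
```

For previous = [A,B,C] and current = [B,C,D] this yields positions 0 and 1 (B and C), which are the claims a previous group could already have backed.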