-
Notifications
You must be signed in to change notification settings - Fork 532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Offline: Provide incoming batch's starting CSN to the PendingStateManager #21714
Conversation
Need to compute it properly in CR still.
@@ -2619,7 +2623,11 @@ export class ContainerRuntime | |||
// but will not modify the contents object (likely it will replace it on the message). | |||
const messageCopy = { ...messageArg }; | |||
const savedOp = (messageCopy.metadata as ISavedOpMetadata)?.savedOp; | |||
for (const message of this.remoteMessageProcessor.process(messageCopy)) { | |||
const processResult = this.remoteMessageProcessor.process(messageCopy); | |||
if (processResult === undefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It confuses me that process result can ever be undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's any chunk but the last one, the OpSplitter just holds onto them and doesn't return anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should assert non final chunk is the only case that hits this condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that's necessary, but maybe I'm not seeing it. What/where exactly would you assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer a more explicit* approach. The code is making assumptions about remoteMessageProcessor that could change in any moment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably best to add a comment here explaining the undefined result and its meaning and the early return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add a comment. This whole section of code has massive assumptions undergirding it which I don't like - biggest one is that this is secretly running in the equivalent of a synchronous for loop for all messages in a single batch. So the early return
here is more like a continue
.
As for assumptions about what RemoteMessageProcessor
does, that's a good call-out. Keep in mind too that this a very specific helper class used only in this one place, so there is some expectation that the two sides of this call understand how the other side works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm strongly considering a follow up to make this the contract between the two: RemoteMessageProcessor
only returns something when it reaches the end of a batch, otherwise it returns undefined. That'll be a lot simpler for this code to reason over IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would assert messageCopy is of type ChunkedOp and maybe that is not the last one. Just making sure that we're falling into the expected case and being more explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, I can see the appeal of that. Tactically, it's kind of difficult* and since I have it in mind to make further clarifications as mentioned, I'll pass for now.
* "difficult", more like breaks the abstraction. This bit of code doesn't otherwise deal with digging into the inner type or inspecting that payload (although I do believe all the info you're talking about is all in there). Even the comment I put there doesn't mention chunked ops specifically, because it's not directly relevant - the contract of this function is just that RMP will return 0 or more messages to be processed next, as it ingests the virtualized ops one-by-one.
packages/runtime/container-runtime/src/opLifecycle/opDecompressor.ts
Outdated
Show resolved
Hide resolved
return { | ||
messages: this.opGroupingManager.ungroupOp(message).map(unpack), | ||
batchStartCsn, | ||
}; | ||
} | ||
|
||
// Do a final unpack of runtime messages in case the message was not grouped, compressed, or chunked | ||
unpackRuntimeMessage(message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish this was typesafe, not important for this PR.
type MessageWithContext = { | ||
local: boolean; | ||
savedOp?: boolean; | ||
batchStartCsn: number; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
Regardless of chunking, compression, grouping config.
⯅ @fluid-example/bundle-size-tests: +3.9 KB
Baseline commit: 775079f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description
Fixes AB#8544
On submit, the PendingStateManager stores the batch's starting CSN along with each message. We need the corresponding information for incoming messages too, for two reasons:
Today, two things happening in RemoteMessageProcessor keep the correct info from flowing to the PSM:
Here's the fix - the RemoteMessageProcessor has the full context and can return the batch's starting CSN with each message, irrespective of grouped batching, compression, chunking, etc.
Reviewer Guidance
These changes are mostly internal to RemoteMessageProcessor, plus the signature changes to plumb the CSN down to PSM. To prove correctness, we add an error log in PSM if the batch start CSNs don't match.
Future consideration
A more comprehensive approach which I like, but is more disruptive, would be to have the RemoteMessageProcessor hold onto the batch messages until the last one arrives, and only then release the batch to be returned for processing (similar to how OpSplitter works today). This would make it more obvious that batches are processed all at once, and when processing the batch other parts of the system - e.g. PendingStateManager - could trivially get at the start of the batch (for CSN and in the future the batch ID).