Add support send/recv chunk for checkin protocol #89

Merged
merged 5 commits into elastic:main from chunk-checkin on Dec 18, 2023

Conversation

blakerouse
Contributor

Overview

Adds the ability for observed and expected checkin messages to be larger than the maximum message size; the messages are chunked across the streaming protocol and merged back together on the receiving side.

Requirements

  • Only enabled if the Elastic Agent supports chunking
  • Only enabled when the client also supports chunking

How it works

It divides a message's units across multiple messages, each carrying a units_timestamp, to ensure that the chunks merged back together come from the same original message. If a chunk arrives with a different units_timestamp, only the newer message is used and taken as a new start to the units. This allows an in-flight chunked message to be cancelled and a new one sent even before all of the previous chunks have been delivered.
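A minimal sketch of the receive-side merge described above, using hypothetical types (the real protocol carries protobuf CheckinExpected/CheckinObserved messages; chunk, units, and done here are illustrative stand-ins, not the actual elastic-agent-client API):

```go
package chunking

import "time"

// chunk stands in for one checkin message on the wire.
type chunk struct {
	unitsTimestamp time.Time // identifies the original message the chunk belongs to
	units          []string  // stand-in for the units carried by this chunk
	done           bool      // set on the final chunk of a message
}

// recvMerged reads chunks until a complete message has been assembled.
// A chunk with a newer units_timestamp discards any partially accumulated
// message and restarts, which is what lets a sender cancel an in-flight
// chunked message before all of its chunks have been sent.
func recvMerged(in <-chan chunk) []string {
	var current time.Time
	var units []string
	for c := range in {
		switch {
		case c.unitsTimestamp.After(current):
			// first chunk of a new message, or a newer message that
			// cancels the one being assembled: restart accumulation
			current = c.unitsTimestamp
			units = append(units[:0], c.units...)
		case c.unitsTimestamp.Before(current):
			// stale chunk from a cancelled message: ignore it
			continue
		default:
			// same units_timestamp: the next chunk of the current message
			units = append(units, c.units...)
		}
		if c.done {
			return units
		}
	}
	return nil // stream closed before a complete message arrived
}
```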

@blakerouse blakerouse added the Team:Elastic-Agent Label for the Agent team label Dec 5, 2023
@blakerouse blakerouse self-assigned this Dec 5, 2023

cla-checker-service bot commented Dec 5, 2023

💚 CLA has been signed

@AndersonQ (Member) left a comment:

There are 2 small blockers, small things just to avoid confusion.

Another thing I think would be good is to add at least a test for the case where a new chunked message arrives, to ensure it discards the old one and keeps the new.

Another test I see missing is for when the 1st message fits more than 1 unit.

Review threads (resolved):
  • pkg/utils/utils.go
  • pkg/utils/utils_test.go
@elasticmachine
Collaborator

💚 Build Succeeded


cc @blakerouse

@blakerouse
Contributor Author

> There are 2 small blockers, small things just to avoid confusion.
>
> Another thing I think would be good is to add at least a test for the case where a new chunked message arrives, to ensure it discards the old one and keeps the new.

Updated the tests in adcba18 to do that.

> Another test I see missing is for when the 1st message fits more than 1 unit.

Added tests in adcba18 that test this as well.

@blakerouse blakerouse merged commit 8a1940e into elastic:main Dec 18, 2023
2 checks passed
@blakerouse blakerouse deleted the chunk-checkin branch December 18, 2023 16:01
Comment on lines +120 to +122
if first.UnitsTimestamp.AsTime().After(msg.UnitsTimestamp.AsTime()) {
// not newer so we ignore the message
continue
A Member commented:

Doing this makes sense, but I think we want a way to tell that it happened. If this begins to cause missed checkins, we won't be able to know why when we try to debug it.
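One way to surface it, sketched against the snippet above with the standard library log package (the log call and message wording are illustrative additions, not part of the PR; the real client would likely use its own logger):

```go
if first.UnitsTimestamp.AsTime().After(msg.UnitsTimestamp.AsTime()) {
	// not newer, so we ignore the chunk; log it so discarded chunks
	// can be diagnosed if checkins start going missing
	log.Printf("dropping stale checkin chunk: units_timestamp=%v is older than current %v",
		msg.UnitsTimestamp.AsTime(), first.UnitsTimestamp.AsTime())
	continue
}
```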

}
}

func shallowCopyCheckinExpected(msg *proto.CheckinExpected) *proto.CheckinExpected {
A Member commented:


proto.Clone is a deep copy, but it has the advantage that we don't have to keep this function up to date if we add a top-level message field.

Is there a test we can write that would fail if you forgot to update this function (and the observed version of it)?
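One possible guard, sketched below with reflection over the generated struct; the test name, the hard-coded field count, and the assumption that every exported struct field corresponds to a top-level message field are all hypothetical:

```go
package utils

import (
	"reflect"
	"testing"

	"github.com/elastic/elastic-agent-client/v7/pkg/proto"
)

// TestShallowCopyCheckinExpectedFieldCount fails when a new top-level field
// is added to CheckinExpected, reminding the author to update
// shallowCopyCheckinExpected (and its observed counterpart) to copy it.
func TestShallowCopyCheckinExpectedFieldCount(t *testing.T) {
	typ := reflect.TypeOf(proto.CheckinExpected{})
	exported := 0
	for i := 0; i < typ.NumField(); i++ {
		// generated internal fields (state, sizeCache, ...) are unexported
		if typ.Field(i).IsExported() {
			exported++
		}
	}
	// copiedFields is the number of fields the copy function handles today;
	// the value here is a placeholder to be kept in sync with the function.
	const copiedFields = 4
	if exported != copiedFields {
		t.Fatalf("CheckinExpected has %d exported fields but shallowCopyCheckinExpected copies %d; update the copy function",
			exported, copiedFields)
	}
}
```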
