Fix cooldown feature to properly manage several instances of emissary #4293
The code LGTM.
@LanceEa is release/v2.3 the correct head branch? Should this target master instead?
@knlambert - I will have some time to look at this more later and think about it, but this is just from a quick glance.
@knlambert - Overall it looks good. I have left a few style suggestions and a few test cases we should add coverage for.
Once those are addressed, we can get this merged into master and cherry-picked to the release/v2.3 branch as well.
I understood that part for when we reset the backoff... I was more trying to understand why we initialized it so that the value was in the future. My assumption is that it makes the first set of reported values go out 30s after startup, rather than immediately trying to trigger a report.
lgtm
Let's cherry-pick this onto release/v2.3 as soon as it lands on master. We don't want the changelog to be lying!
Out of curiosity, what's the expected rate of metrics per second?
Would it make sense to set a fixed buffer size and push when it's either full or after 30 seconds?
It depends on the number of running Edge Stack instances. With a 3-node setup, it's usually around 3 requests per second, so we would need to know the number of instances to calculate this buffer size. Moreover, we don't want to push before 30s because it's not necessary and causes a lot of stress on the system-a API. The 30s value is a necessary cooldown period implemented at the agent level, since we can't configure it from Envoy's metrics sink.
LGTM... now that CI is green and the merge conflicts are resolved.
Let's squash the commits and clean up the commit message when we merge, since they are all related.
Signed-off-by: Kévin Lambert <kevin.lambert.ca@gmail.com>
…#4293)
* Fix cooldown feature to properly manage several instances of emissary
  Signed-off-by: Kévin Lambert <kevin.lambert.ca@gmail.com>
* Apply style guidelines
  Signed-off-by: Kévin Lambert <kevin.lambert.ca@gmail.com>
* Add tests for edge cases
  Signed-off-by: Kévin Lambert <kevin.lambert.ca@gmail.com>

…#4293), containing the same three commits (cherry picked from commit a73be1c)

…#4293) (#4343), containing the same three commits (cherry picked from commit a73be1c)
Description
Previously, we added a 30s cooldown step for metrics to avoid sending too many requests to the Ambassador Cloud API.
However, some values were inconsistent, and we found that metrics are sent to the agent as one request per emissary node, not as a single request for all nodes.
This means the previous iterations of the code were wrong for two reasons:
As a workaround with minimal changes, this PR changes the logic so that the metrics relay handler accumulates metrics in a map keyed by node IP and flushes them after the cooldown period.
Testing
Adapted the existing tests and tested the change in a dev cluster.
Checklist
I made sure to update CHANGELOG.md.
Remember, the CHANGELOG needs to mention:
This is unlikely to impact how Ambassador performs at scale.
Remember, things that might have an impact at scale include:
My change is adequately tested.
Remember when considering testing:
I updated DEVELOPING.md with any special dev tricks I had to use to work on this code efficiently.
The changes in this PR have been reviewed for security concerns and adherence to security best practices.