Visibility into LagBased AutoScaler desired task count #16199


adithyachakilam
Contributor

@adithyachakilam adithyachakilam commented Mar 25, 2024

This PR adds visibility into the decisions made by LagBasedAutoScaler. The new metric reports the task count the auto scaler actually wants and how many times a scale action was skipped. This information can later be used to tune the auto scaler config.

Release note

  • Druid operators can compare ingest/autoScaler/lagBased/requiredTasks with task/running/count to see the exact gap between the current and desired task counts.
  • Druid operators can now track how many times a scale action was skipped because:
    • It was triggered sooner than minTriggerScaleActionFrequencyMillis after the previous scale action.
    • The task count was already at max/min and could not be scaled further.
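
The two skip conditions and the required-vs-running gap described in the release note can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class `ScaleActionSketch`, the `SkipReason` enum, the `decide` and `gap` methods, the `taskCountMin`/`taskCountMax` parameter names, and the order in which the conditions are checked are all assumptions made for the example.

```java
public class ScaleActionSketch {

    /** Why a scale action was not carried out (hypothetical names). */
    enum SkipReason { TOO_SOON, ALREADY_AT_MAX, ALREADY_AT_MIN }

    /** Result of one evaluation: the desired count plus an optional skip reason. */
    static final class Decision {
        final int requiredTasks;     // what the scaler wants, reported even when skipped
        final SkipReason skipReason; // null when the action proceeds or no change is needed

        Decision(int requiredTasks, SkipReason skipReason) {
            this.requiredTasks = requiredTasks;
            this.skipReason = skipReason;
        }
    }

    /**
     * Decides whether a scale action should be skipped.
     * Assumption: the frequency check is applied before the bounds checks.
     */
    static Decision decide(
            int currentTasks,
            int desiredTasks,
            int taskCountMin,
            int taskCountMax,
            long millisSinceLastScale,
            long minTriggerScaleActionFrequencyMillis) {
        if (desiredTasks == currentTasks) {
            // Nothing to do, so nothing is skipped either.
            return new Decision(desiredTasks, null);
        }
        if (millisSinceLastScale < minTriggerScaleActionFrequencyMillis) {
            return new Decision(desiredTasks, SkipReason.TOO_SOON);
        }
        if (desiredTasks > currentTasks && currentTasks >= taskCountMax) {
            return new Decision(desiredTasks, SkipReason.ALREADY_AT_MAX);
        }
        if (desiredTasks < currentTasks && currentTasks <= taskCountMin) {
            return new Decision(desiredTasks, SkipReason.ALREADY_AT_MIN);
        }
        return new Decision(desiredTasks, null);
    }

    /** Gap an operator would see when comparing required tasks with running tasks. */
    static int gap(int requiredTasks, int runningTasks) {
        return requiredTasks - runningTasks;
    }

    public static void main(String[] args) {
        // 4 tasks running, scaler wants 8, but only 10s have passed since the last action.
        Decision d = decide(4, 8, 1, 10, 10_000L, 60_000L);
        System.out.println(
            "required=" + d.requiredTasks
            + " skipped=" + d.skipReason
            + " gap=" + gap(d.requiredTasks, 4));
    }
}
```

In this sketch the required count is reported even when the action is skipped, which is what lets an operator see a persistent gap and decide whether to adjust the frequency or task count limits.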

Key changed/added classes in this PR
  • DynamicAllocationTasksNotice
  • LagBasedAutoScaler

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Contributor

@abhishekrb19 abhishekrb19 left a comment

Looks good overall. I left a few naming suggestions and test verification for the new metrics. Thanks, @adithyachakilam!

@adithyachakilam
Contributor Author

@abhishekrb19 Thanks for the review, addressed the comments!

@adithyachakilam adithyachakilam changed the title from "Visibility into skipped scale actions" to "Visibility into LagBased AutoScaler desired task count" Mar 26, 2024
Contributor

@kfaraz kfaraz left a comment

Left some final comments.

{
EasyMock.expect(spec.getSupervisorStateManagerConfig()).andReturn(supervisorConfig).anyTimes();

EasyMock.expect(spec.getDataSchema()).andReturn(getDataSchema()).anyTimes();

Check notice: Code scanning / CodeQL

Deprecated method or constructor invocation (Note, test): Invoking SeekableStreamSupervisorSpec.getDataSchema should be avoided because it has been deprecated.
@georgew5656
Contributor

merging as this looks good to me and all the comments have been addressed

@georgew5656 georgew5656 merged commit a65b2d4 into apache:master Mar 27, 2024
85 checks passed
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
5 participants