supervisor: Emit active/publishing task counts #17268

Draft: wants to merge 4 commits into master

adithyachakilam (Contributor)

Description

Adding this metric would help show how much time a supervisor spends publishing tasks. It is important to keep this time low because autoscaling is skipped during this period, which can cause increased lag.

Release note

Adds new metrics: task/supervisor/active/count and task/supervisor/publishing/count.
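A minimal sketch of how the two counts could be computed and emitted. It assumes the supervisor can list its actively reading task groups and its publishing task groups separately, each group being a list of task ids. `MetricSink`, `SupervisorTaskCountMetrics`, and the `dataSource` dimension are hypothetical stand-ins, not Druid's `ServiceEmitter` API or the code in this PR.

```java
import java.util.List;
import java.util.Map;

// Hypothetical emitter abstraction; not Druid's ServiceEmitter API.
interface MetricSink {
  void emit(String metric, Map<String, String> dimensions, long value);
}

final class SupervisorTaskCountMetrics {
  static final String ACTIVE_COUNT = "task/supervisor/active/count";
  static final String PUBLISHING_COUNT = "task/supervisor/publishing/count";

  private SupervisorTaskCountMetrics() {}

  // Emits one count per phase. Each inner list is a hypothetical task group holding task ids.
  static void emitTaskCounts(
      String dataSource,
      List<List<String>> activeTaskGroups,
      List<List<String>> publishingTaskGroups,
      MetricSink sink
  ) {
    // Assumed dimension; the PR text does not specify the metrics' dimensions.
    Map<String, String> dims = Map.of("dataSource", dataSource);
    sink.emit(ACTIVE_COUNT, dims, countTasks(activeTaskGroups));
    sink.emit(PUBLISHING_COUNT, dims, countTasks(publishingTaskGroups));
  }

  // Counts tasks across all groups; every task id in a group counts once.
  static long countTasks(List<List<String>> taskGroups) {
    return taskGroups.stream().mapToLong(List::size).sum();
  }
}
```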


Key changed/added classes in this PR
  • SeekableStreamSupervisor.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

suneet-s (Contributor) left a comment

Instead of reporting two new metrics, could you add SeekableStreamIndexTaskRunner#status as a dimension to the service/heartbeat metric?

This would give visibility into every state a streaming task can be in, and would also show which specific task is in which state, rather than only the number of tasks in the publishing state.
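To illustrate the suggestion: if every heartbeat carries a per-task status dimension, the publishing count becomes a simple group-by over heartbeat events, and counts for the other states come for free. This sketch models each heartbeat's dimensions as a `Map<String, String>`. The `status` key and values such as `READING` or `PUBLISHING` are assumptions meant to resemble `SeekableStreamIndexTaskRunner#status`, not values taken from Druid; `HeartbeatStatusRollup` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class HeartbeatStatusRollup {
  // Hypothetical dimension name; the review does not fix the key.
  static final String STATUS_DIMENSION = "status";

  private HeartbeatStatusRollup() {}

  // Counts heartbeats per status value. Heartbeats without a status
  // (for example, from non-streaming tasks) are skipped rather than bucketed.
  static Map<String, Long> countByStatus(List<Map<String, String>> heartbeats) {
    return heartbeats.stream()
        .filter(dims -> dims.containsKey(STATUS_DIMENSION))
        .collect(Collectors.groupingBy(
            dims -> dims.get(STATUS_DIMENSION), TreeMap::new, Collectors.counting()));
  }
}
```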

abhishekagarwal87 (Contributor)

There need to be some docs changes. How are you going to infer the time spent in publishing tasks (and what exactly does it mean for a supervisor to publish a task)? And how would you keep that time low, assuming you find that it is high?

adithyachakilam marked this pull request as draft on October 8, 2024 04:01

kfaraz (Contributor) commented on Oct 8, 2024

@adithyachakilam, leaving some suggestions here even though the PR is in draft right now.

how much of time a supervisor is spending to publish tasks

Could you please elaborate? What time are you referring to exactly?
The supervisor is just a thread which wakes up and launches or kills tasks and updates some metadata.

If you want to capture the time a task spends publishing segments, then the correct metric for that would be something like ingest/publish/time (in the same vein as ingest/handoff/time and ingest/merge/time).
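A sketch of timing a publish step for a metric like ingest/publish/time. It assumes the value is reported in milliseconds and that the publish step can be wrapped as a `Supplier`; `PublishTimer` and its parameters are hypothetical. The clock and the emitter are passed in so the timing can be tested deterministically.

```java
import java.util.function.LongSupplier;
import java.util.function.ObjLongConsumer;
import java.util.function.Supplier;

final class PublishTimer {
  // Metric name suggested in the review; the millisecond unit is an assumption.
  static final String PUBLISH_TIME = "ingest/publish/time";

  private PublishTimer() {}

  // Runs the publish step and emits its elapsed time, even if the step throws.
  static <T> T timePublish(Supplier<T> publishStep, LongSupplier nanoClock, ObjLongConsumer<String> emit) {
    long startNanos = nanoClock.getAsLong();
    try {
      return publishStep.get();
    } finally {
      long elapsedMillis = (nanoClock.getAsLong() - startNanos) / 1_000_000L;
      emit.accept(PUBLISH_TIME, elapsedMillis);
    }
  }
}
```

Outside of tests, `System::nanoTime` can be passed as the clock.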

If you want to capture the number of tasks currently in the publishing phase, etc., then, as @suneet-s has suggested, emitting the current phase/state of a streaming task in its heartbeat makes sense.
But it would need some changes from the current approach:

  • The status is not an intrinsic property of a task and must not be a part of the Task interface. You can inject the runner to build up the heartbeat map in the CliPeon.heartbeatDimensions() method.
  • For non-streaming tasks, instead of always emitting UNKNOWN, do not emit any value for this dimension.
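A sketch of the two points above: the status comes from an injected runner rather than from the Task interface, and the dimension is omitted entirely when the runner is not a streaming one. `StreamingStatusReporter` and `PeonHeartbeatDimensions.heartbeatDimensions` are hypothetical names; the real signature of `CliPeon.heartbeatDimensions()` is not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical capability implemented only by streaming task runners.
interface StreamingStatusReporter {
  String currentStatus();
}

final class PeonHeartbeatDimensions {
  static final String STATUS_DIMENSION = "status"; // assumed dimension name

  private PeonHeartbeatDimensions() {}

  // Builds heartbeat dimensions from the task's own dimensions plus the injected runner.
  // The status is read from the runner, so the Task interface does not need to expose it.
  static Map<String, String> heartbeatDimensions(Map<String, String> taskDimensions, Object injectedRunner) {
    Map<String, String> dims = new HashMap<>(taskDimensions);
    if (injectedRunner instanceof StreamingStatusReporter) {
      // Only streaming runners contribute a status; other runners add nothing (no UNKNOWN placeholder).
      dims.put(STATUS_DIMENSION, ((StreamingStatusReporter) injectedRunner).currentStatus());
    }
    return dims;
  }
}
```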
