Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TaskQueueStats to DescribeTaskQueueEnhanced #1553

Merged
merged 14 commits into from
Aug 16, 2024
Merged

Conversation

ShahabT
Copy link
Contributor

@ShahabT ShahabT commented Jul 19, 2024

What was changed

Add TaskQueueStats to DescribeTaskQueueEnhanced to report backlog info and task add/dispatch rate.

Why?

Useful for worker scaling.

Checklist

  1. Closes

  2. How was this tested:

Functional test added.

  1. Any docs updates needed?

Yes, will do in a separate PR.

@ShahabT ShahabT requested a review from a team as a code owner July 19, 2024 07:25
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, just want to wait until it is available to users server side so we can test it like users will

@@ -345,24 +345,25 @@ type (
TaskQueueVersionSelection = internal.TaskQueueVersionSelection

// TaskQueueDescription is the response to [Client.DescribeTaskQueueEnhanced].
// WARNING: Worker versioning is currently experimental.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is being removed because this stuff is now considered GA? Isn't this all part of worker versioning technically the stats are per build ID, or is this RPC and the way it comes back considered GA?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's something I wanted to discuss actually. The versioning stuff are not GA for sure. I'd like to be able to say the Enhanced DecribeTaskQueue API parts that don't have to do with versioning is GA. It's a little tricky because of the versions_info map.

What do you guys think of a change like below to the API? It removes the empty string ugliness, also it allows us to keep the build_id related parts experimental and unversioned part GA.
image
cc @cretz @Quinn-With-Two-Ns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering to rename TaskQueueVersionInfo to just TaskQueueInfo as well. That would fit better with unversioned.

Copy link
Member

@cretz cretz Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That API change works for me, but the empty string does too. I have no strong opinion, though I could understand not wanting to go through a cycle of API changes just to be able to split up the "unstable" comment. Remember, we can't reasonably release something in the SDK until a user can actually use it in the server (we test the same way our users do to make sure things work for us the same way they do for them).

@@ -174,6 +216,7 @@ func (o *DescribeTaskQueueEnhancedOptions) validateAndConvertToProto(namespace s
TaskQueueTypes: taskQueueTypes,
ReportPollers: o.ReportPollers,
ReportTaskReachability: o.ReportTaskReachability,
ReportStats: o.ReportStats,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this file get properly go fmtd?

@@ -174,6 +216,7 @@ func (o *DescribeTaskQueueEnhancedOptions) validateAndConvertToProto(namespace s
TaskQueueTypes: taskQueueTypes,
ReportPollers: o.ReportPollers,
ReportTaskReachability: o.ReportTaskReachability,
ReportStats: o.ReportStats,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this available in the current dev server? We won't want to make this available until users can actually use it and we have tested it the same way they might.

@@ -1608,7 +1608,14 @@ func NewAggregatedWorker(client *WorkflowClient, taskQueue string, options Worke
tagNamespace, client.namespace,
tagTaskQueue, taskQueue,
tagWorkerID, workerParams.Identity,
tagBuildID, workerParams.getBuildID(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you adding tagBuildID twice here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

# Conflicts:
#	contrib/datadog/go.sum
#	contrib/opentracing/go.mod
#	contrib/opentracing/go.sum
#	contrib/tally/go.mod
#	go.mod
#	go.sum
#	internal/cmd/build/go.sum
#	test/go.mod
#	test/go.sum
@ShahabT
Copy link
Contributor Author

ShahabT commented Aug 13, 2024

@cretz @Quinn-With-Two-Ns it seems we are unblocked, dependency-wise. Can we get this PR merged?

ts.NoError(err)

// Give time for the task to be sent to the TQ
time.Sleep(500 * time.Millisecond)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to avoid sleep in tests, it is very flaky, we should wait for the condition to become true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced it with eventually.

Copy link
Contributor

@Quinn-With-Two-Ns Quinn-With-Two-Ns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, good to merge once CI is passing. If you need any help there please reach out

Comment on lines +290 to +294
return &TaskQueueStats{
ApproximateBacklogCount: stats.GetApproximateBacklogCount(),
ApproximateBacklogAge: stats.GetApproximateBacklogAge().AsDuration(),
TasksAddRate: stats.TasksAddRate,
TasksDispatchRate: stats.TasksDispatchRate,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cretz @Quinn-With-Two-Ns, realizing that eager activities are enabled by default, I'm now inclined to remove separate add/dispatch rate fields and replace both with a BacklogIncreaseRate field. Reason being that the individual rates can be very off an easily mislead people because they'd not include eager activities and sticky workflow tasks.
wdyt?

(The API can remain the same, or I can change that in the next release as well by deprecating the separate rates and adding a new backlog_increase_rate filed.)

Suggested change
return &TaskQueueStats{
ApproximateBacklogCount: stats.GetApproximateBacklogCount(),
ApproximateBacklogAge: stats.GetApproximateBacklogAge().AsDuration(),
TasksAddRate: stats.TasksAddRate,
TasksDispatchRate: stats.TasksDispatchRate,
return &TaskQueueStats{
ApproximateBacklogCount: stats.GetApproximateBacklogCount(),
ApproximateBacklogAge: stats.GetApproximateBacklogAge().AsDuration(),
ApproximateBacklogIncreaseRate: stats.TasksAddRate - stats.TasksDispatchRate,

Copy link
Member

@cretz cretz Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should give the same thing that the underlying API gives. If there are concerns about user confusion, they can be documented. If we must also provide helpers or derived fields we can (and we should document how they are derived). The SDK should not deviate so far from the API on these things (many users will use our API directly e.g. via HTTP, if it's not too confusing for them it's not too confusing for SDK API users).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I kept the separate fields with enough documentation and add a derivative field as well.

@ShahabT ShahabT merged commit b5db2b7 into master Aug 16, 2024
14 checks passed
@ShahabT ShahabT deleted the shahab/backlog-stats branch August 16, 2024 00:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants