-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TaskQueueStats to DescribeTaskQueueEnhanced #1553
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM, just want to wait until it is available to users server side so we can test it like users will
@@ -345,24 +345,25 @@ type ( | |||
TaskQueueVersionSelection = internal.TaskQueueVersionSelection | |||
|
|||
// TaskQueueDescription is the response to [Client.DescribeTaskQueueEnhanced]. | |||
// WARNING: Worker versioning is currently experimental. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is being removed because this stuff is now considered GA? Isn't this all part of worker versioning technically the stats are per build ID, or is this RPC and the way it comes back considered GA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's something I wanted to discuss actually. The versioning stuff are not GA for sure. I'd like to be able to say the Enhanced DecribeTaskQueue API parts that don't have to do with versioning is GA. It's a little tricky because of the versions_info
map.
What do you guys think of a change like below to the API? It removes the empty string ugliness, also it allows us to keep the build_id related parts experimental and unversioned part GA.
cc @cretz @Quinn-With-Two-Ns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
considering to rename TaskQueueVersionInfo
to just TaskQueueInfo
as well. That would fit better with unversioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That API change works for me, but the empty string does too. I have no strong opinion, though I could understand not wanting to go through a cycle of API changes just to be able to split up the "unstable" comment. Remember, we can't reasonably release something in the SDK until a user can actually use it in the server (we test the same way our users do to make sure things work for us the same way they do for them).
@@ -174,6 +216,7 @@ func (o *DescribeTaskQueueEnhancedOptions) validateAndConvertToProto(namespace s | |||
TaskQueueTypes: taskQueueTypes, | |||
ReportPollers: o.ReportPollers, | |||
ReportTaskReachability: o.ReportTaskReachability, | |||
ReportStats: o.ReportStats, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this file get properly go fmt
d?
@@ -174,6 +216,7 @@ func (o *DescribeTaskQueueEnhancedOptions) validateAndConvertToProto(namespace s | |||
TaskQueueTypes: taskQueueTypes, | |||
ReportPollers: o.ReportPollers, | |||
ReportTaskReachability: o.ReportTaskReachability, | |||
ReportStats: o.ReportStats, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this available in the current dev server? We won't want to make this available until users can actually use it and we have tested it the same way they might.
internal/internal_worker.go
Outdated
@@ -1608,7 +1608,14 @@ func NewAggregatedWorker(client *WorkflowClient, taskQueue string, options Worke | |||
tagNamespace, client.namespace, | |||
tagTaskQueue, taskQueue, | |||
tagWorkerID, workerParams.Identity, | |||
tagBuildID, workerParams.getBuildID(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you adding tagBuildID
twice here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
# Conflicts: # contrib/datadog/go.sum # contrib/opentracing/go.mod # contrib/opentracing/go.sum # contrib/tally/go.mod # go.mod # go.sum # internal/cmd/build/go.sum # test/go.mod # test/go.sum
@cretz @Quinn-With-Two-Ns it seems we are unblocked, dependency-wise. Can we get this PR merged? |
test/worker_versioning_test.go
Outdated
ts.NoError(err) | ||
|
||
// Give time for the task to be sent to the TQ | ||
time.Sleep(500 * time.Millisecond) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try to avoid sleep in tests, it is very flaky, we should wait for the condition to become true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaced it with eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, good to merge once CI is passing. If you need any help there please reach out
return &TaskQueueStats{ | ||
ApproximateBacklogCount: stats.GetApproximateBacklogCount(), | ||
ApproximateBacklogAge: stats.GetApproximateBacklogAge().AsDuration(), | ||
TasksAddRate: stats.TasksAddRate, | ||
TasksDispatchRate: stats.TasksDispatchRate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cretz @Quinn-With-Two-Ns, realizing that eager activities are enabled by default, I'm now inclined to remove separate add/dispatch rate fields and replace both with a BacklogIncreaseRate
field. Reason being that the individual rates can be very off an easily mislead people because they'd not include eager activities and sticky workflow tasks.
wdyt?
(The API can remain the same, or I can change that in the next release as well by deprecating the separate rates and adding a new backlog_increase_rate
filed.)
return &TaskQueueStats{ | |
ApproximateBacklogCount: stats.GetApproximateBacklogCount(), | |
ApproximateBacklogAge: stats.GetApproximateBacklogAge().AsDuration(), | |
TasksAddRate: stats.TasksAddRate, | |
TasksDispatchRate: stats.TasksDispatchRate, | |
return &TaskQueueStats{ | |
ApproximateBacklogCount: stats.GetApproximateBacklogCount(), | |
ApproximateBacklogAge: stats.GetApproximateBacklogAge().AsDuration(), | |
ApproximateBacklogIncreaseRate: stats.TasksAddRate - stats.TasksDispatchRate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should give the same thing that the underlying API gives. If there are concerns about user confusion, they can be documented. If we must also provide helpers or derived fields we can (and we should document how they are derived). The SDK should not deviate so far from the API on these things (many users will use our API directly e.g. via HTTP, if it's not too confusing for them it's not too confusing for SDK API users).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I kept the separate fields with enough documentation and add a derivative field as well.
What was changed
Add TaskQueueStats to DescribeTaskQueueEnhanced to report backlog info and task add/dispatch rate.
Why?
Useful for worker scaling.
Checklist
Closes
How was this tested:
Functional test added.
Yes, will do in a separate PR.