Nexus #1466
Conversation
Looks great. Mostly minimal stuff.
```
go 1.21

toolchain go1.21.1
```
Curious, is this just an artifact of your tooling or was this change required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just ran `go get ...` and `go mod tidy`, but it may have been required due to the nexus SDK using `slog`.
Just FYI, we should remove this before merging to main; it seems to mess up our CI when trying to test multiple versions of Go.
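For context, the diff above amounts to a `go.mod` with a pinned toolchain directive; a minimal sketch (the module path is hypothetical, versions are the ones from the diff):

```
module example.com/mymodule // hypothetical module path

go 1.21

// Added automatically by `go get` / `go mod tidy` on Go 1.21+;
// candidate for removal before merge if it interferes with CI
// that tests multiple Go versions.
toolchain go1.21.1
```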
```go
// Associate the NexusOperationContext with the context.Context used to invoke operations.
ctx := context.WithValue(context.Background(), nexusOperationContextKey, nctx)

timeoutStr := header.Get(nexus.HeaderRequestTimeout)
```
Hrmm, I might have expected server to handle timeout and send cancellation. If a handler chooses not to respect timeout, what happens? If it is also handled server side, I think it's best to not also do it here except maybe with some considerable leeway to ensure server's cancellation logic is the one always processed.
We don't have sticky execution so there's not a way for the server to send cancelation. The server propagates this from the client request and also has its own context deadline.
I think it's good to cancel work that we know can't complete in time and have the server propagate this timeout.
As for whether the context deadline in the SDK should be shorter/same/longer than the one tracked on the server, fair point, but maybe shorter here is better so the SDK doesn't get a false sense of completion and the metrics we emit can be more accurate.
But what happens in racy cases where SDK and server hit at the same time? If server-side happens first does that look the exact same as if the client-side one hit first and reported this failure back? It's important to have one system be the arbiter of true timeout errors. If you want a just-in-case for the other system, no prob, just make it long enough to never be first, but having two separate systems that race each other to report timeout failure can result in racy inconsistencies.
Keep in mind that clients also set deadlines on the request context.
I think giving the most up-to-date and accurate timeout to all of the processes involved in handling this request is preferable. That's how gRPC does it and this is essentially the SDK handling RPCs.
Also note that on context deadline errors we don't respond to the server, we just drop the task so I'm not as concerned with the racy inconsistency you're talking about.
```go
// Start the worker.
func (w *nexusWorker) Start() error {
	err := verifyNamespaceExist(w.workflowService, w.executionParameters.MetricsHandler, w.executionParameters.Namespace, w.worker.logger)
```
Unrelated to this PR specifically, but I sure wouldn't mind if this moved up to the aggregate worker instead of being in each worker type.
internal/internal_worker.go
```
@@ -953,6 +994,14 @@ func (aw *AggregatedWorker) Start() error {
	}
	proto.Merge(aw.capabilities, capabilities)

	return aw.memoizedStart()
```
I think this whole method should go inside memoized start. No need to repeat stuff above for each call.
Works for me, this was something that @Quinn-With-Two-Ns requested, so just confirming he's also okay with that.
Users may retry starting their worker if it failed; that is why I requested we don't memoize it. I think the likelihood is low, but it's very little effort to not memoize it, so why not just avoid the breaking change?
I changed it to what @cretz suggested, I'm open to changing back.
I don't have a strong opinion here but slightly prefer memoizing the entire thing because it's easier to reason about.
Probably obvious, but let's not merge this until there's a server that works with it.
Rebased and merged into the …
## What was changed

- Added the `temporalnexus` package and implemented the handler side for Nexus, including registering and dispatching Nexus Operations.
- Added the ability to execute Nexus Operations from a workflow.
- Added basic support for running Nexus Operations in the test environment.
- Added memoizing to `worker.Start()` to return consistent errors to callers and avoid rerunning the function unnecessarily.
- Updated the integration test's dev server to run CLI `0.14.0-nexus.0`, which includes server `1.25.0-rc.0`.

See the [proposal](https://github.com/temporalio/proposals/blob/b72c49b0c2278e916265b00a49638006f8fce469/nexus/sdk-go.md) for more information.

Most of this code has been reviewed already in #1466, #1473, and #1475, which are all squashed in the first commit.
What was changed

EDIT: Merged #1473 and #1475 into this PR; it now includes the entire Nexus implementation for the SDK.

- Added the `temporalnexus` package and implemented the handler side for Nexus, including registering and dispatching Nexus Operations.
- Tests only pass with server `main`, so this PR should not be merged until the server is released.
- A future PR will complete the Nexus work, allowing invoking Nexus Operations from a workflow.
- Also now memoizing `worker.Start()` to return consistent errors to callers and avoid rerunning the function unnecessarily.

See the proposal for more information.

Merge Checklist: