Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Unexpected reuse of payload instances in payload codec for specific workflow failure scenarios. #234

Open
robcao opened this issue Apr 26, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@robcao
Copy link
Contributor

robcao commented Apr 26, 2024

What are you really trying to do?

I'm writing a payload codec that attaches metadata to headers. I am noticing that for some specific workflow failure scenarios around ApplicationFailureException with Details objects, that under certain circumstances, the instance of the payload object passed to the payload codec is re-used from a previous payload.

Describe the bug

I have a very simple workflow that only contains an activity.

My sample activity always throws an ApplicationFailureException with a single details payload. My workflow just propagates the same exception and fails.

I believe there should be four payloads here in this scenario:

  1. The workflow input.
  2. The activity input.
  3. The activity output (in this case failure details).
  4. The workflow output (in this case failure details).

I am noticing that (what seems to be the same instance of) the payload that is being passed to the payload codec to encode both the failure from the activity, and the failure from the workflow.

I've created a payload codec implementation in MetadataExaminingPayloadCodec.cs that adds a uuid to the X-My-Metadata header in the payload metadata during encoding only if a value does not already exist on the payload, and during decoding, reads the value of the X-My-Metadata header in the payload metadata.

Minimal Reproduction

https://github.com/robcao/temporal-sdk-dotnet-payload-repro/tree/main

The reproduction runs as a NUnit test case for simplicity, and can be run by using the dotnet test command.

The output of the codec is as below. When trying to encode the payload for the workflow failure details, the payload being passed to the payload codec already has a X-My-Metadata header, indicating some sort of re-use.

  Standard Output Messages:
 Successfully added metadata 229a20b0-9b33-4053-b4b8-f3471b9cdae9...
 Now reading metadata 229a20b0-9b33-4053-b4b8-f3471b9cdae9...
 Successfully added metadata d5566faf-ce58-4917-92e5-86250da83fc8...
 Now reading metadata d5566faf-ce58-4917-92e5-86250da83fc8...
 Successfully added metadata 9fb8cb67-e8ec-41c8-b9b9-71ad6c639b95...
 Now reading metadata 9fb8cb67-e8ec-41c8-b9b9-71ad6c639b95...
 WARN: tried to add metadata, but metadata already exists with a value of 9fb8cb67-e8ec-41c8-b9b9-71ad6c639b95...
 Now reading metadata 9fb8cb67-e8ec-41c8-b9b9-71ad6c639b95...

Environment/Versions

  • OS and processor: Windows 10 Enterprise 10.0.19045 x64
  • Temporal Version: Temporal 1.22.5, sdk-dotnet 1.0.0
  • Are you using Docker or Kubernetes or building Temporal from source? No

Additional context

So far, I've only noticed this happening for this specific scenario around failure details, but I'm not sure if it can happen under any other scenarios.

I looked at the encryption sample, and I see that this implementation is wrapping the original payload as the data of a new payload.

 Task.FromResult<IReadOnlyCollection<Payload>>(payloads.Select(p =>
        {
            return new Payload()
            {
                Metadata =
                {
                    ["encoding"] = EncodingByteString,
                    ["encryption-key-id"] = keyIDByteString,
                },
                // TODO(cretz): Not clear here how to prevent copy
                Data = ByteString.CopyFrom(Encrypt(p.ToByteArray())),
            };
        }).ToList());

However, it's unclear in the documentation if this is actually necessary (or why if so).

This behavior is kind of strange.

If I change the workflow code to the below, the same problem is happening.

		[WorkflowRun]
		public async Task<WorkflowOutput> RunAsync(WorkflowInput input)
		{
			try
			{
				await Temporalio.Workflows.Workflow.ExecuteActivityAsync(
					() => Activities.FailureActivityWithDetails(new ActivityInput()),
					new()
					{
						StartToCloseTimeout = TimeSpan.FromMinutes(5),
					}).ConfigureAwait(true);
			}

			catch (ApplicationFailureException)
			{
				throw new NonRetryableWithDetailsException(new FailureDetails());
			}


			return new WorkflowOutput();
		}

However, if I modify the above code to catch a generic Exception, instead of an ApplicationFailureException, the payload instance no longer appears to be re-used.

		[WorkflowRun]
		public async Task<WorkflowOutput> RunAsync(WorkflowInput input)
		{
			try
			{
				await Temporalio.Workflows.Workflow.ExecuteActivityAsync(
					() => Activities.FailureActivityWithDetails(new ActivityInput()),
					new()
					{
						StartToCloseTimeout = TimeSpan.FromMinutes(5),
					}).ConfigureAwait(true);
			}

			catch (Exception)
			{
				throw new NonRetryableWithDetailsException(new FailureDetails());
			}


			return new WorkflowOutput();
		}
@robcao robcao added the bug Something isn't working label Apr 26, 2024
@robcao robcao changed the title [Bug] Unexpected reuse of payload instances in payload codec for payload scenarios. [Bug] Unexpected reuse of payload instances in payload codec for specific workflow failure scenarios. Apr 26, 2024
@cretz
Copy link
Member

cretz commented Apr 26, 2024

Your payload codec should not mutate the payload being passed in or try to maintain a reference to it in any way. Rather, you should always create a new payload. I will make a note to make this very clear in the codec documentation.

Yes https://github.com/temporalio/samples-dotnet/blob/main/src/Encryption/Codec/EncryptionCodec.cs works by creating a new payload. We don't clone before every invocation because some invocations may do nothing to the object and that would affect performance unnecessarily. If there was an easy way to "freeze" the object just for the life of the call we would.

Arguably we could have wrapped the raw proto payload object in an immutable container/interface, but this is more of an advanced API with certain implementation expectations. We will clarify in the documentation.

@robcao
Copy link
Contributor Author

robcao commented Apr 26, 2024

Got it, thank you.

To be clear, when you say "you should always create a new payload", you do not mean just a deep clone via something like payload.Clone(), but rather setting the original payload (allocated into a new byte array) as the data of an outer payload wrapper?

new Payload()
{
    Data = ByteString.CopyFrom(payload.ToByteArray()),
};

@cretz
Copy link
Member

cretz commented Apr 26, 2024

Literally just always creating a new instance of the payload object. How you "wrap" or "convert" the other one into the payload is up to you. A "clone" is probably fine, though I usually recommend against that because you usually do not want all of the previous payload's metadata as your own. Rather usually you want to serialize/wrap the entire other payload into the data of the new one, and do the inverse on decode. But technically all that matters is that no fields in the parameter are mutated or referenced beyond the life of the method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants