-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add detail for GCP Cloud Run Job execution #3378
Add detail for GCP Cloud Run Job execution #3378
Conversation
1d2542c
to
5b37817
Compare
5b37817
to
3dbd919
Compare
ping @carlosalberto ptal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the linked page https://cloud.google.com/run/docs/managing/job-executions and it seems to confirm what I already suspected from the name "job execution". It does not really fit the version, it sounds like it would better fit the instance or invocation, or maybe cloud run jobs don't fit FaaS conventions at all.
A version should be something more or less permanent , that you can refer to, e.g. when you invoke a function (these conventions are about functions, not jobs), some version gets selected, and a particular version may be launched repeatedly.
E.g. when I read the docs here:
https://cloud.google.com/run/docs/managing/jobs#delete
Deleting a job cancels all running job executions of that job.
A job execution is something running, that's what an instance or invocation is, a version is something that can be run (multiple times).
See also https://cloud.google.com/run/docs/resource-model#job-executions and https://cloud.google.com/run/docs/resource-model#jobs
So I would say that CloudRun jobs don't really fit FaaS conventions well, but if we want to force them into the FaaS conventions, the job execution would match both an invocation_id and instance. I'm leaning slightly towards invocation_id, but not sure
Versions, instances, and invocations are all functionally similar (you could even argue they are the same) in the isolated context of Jobs, due to the inherent nature of Jobs. In effect, each In the broader context of Cloud Run, if you consider a job to be just a short-lived, single-request Service, then the concepts of version, instance, and invocation are equivalent:
We can rule out invocation, because that is not what we want to represent with this attribute. This attribute is meant to represent the Job itself, not the request made to it. Instance maps better to Execution (because each Task runs as its own instance). But there are 2 problems: 1) the Instance ID does not represent the deployment, it represents a single container and 2) we also provide Instance as its own attribute. Mapping Instance to both So with Invocation ruled out, and Instance mapped to Instance, that logically leaves (Job)Execution to map to (Service)Revision. We at the very least need somewhere to expose the Job Execution. And from a Cloud Run user's perspective, it would be expected in the same place as Service Revision. For users on Cloud Run, the same resource detector should be able to fill in the same fields whether for a Cloud Run Service or Cloud Run Job. If only for consistency with the existing Services conventions, we should not fill in fields based on different metadata sources within Cloud Run.
I understand that I'm being very self-referential by using the Service conventions to map to Jobs. I'm assuming that these existing conventions set a precedent for the inclusion of Cloud Run apps as FaaS concepts. So if the conventions for Services are valid, then this proposal for Jobs is also valid. Because you can, in theory, reduce a Service to a Job, and a Job to a Function. |
Actually "invocation" was just recently renamed and used to be named "execution" #3209. I think there is also not really a difference between "the Job itself" and "the request made to it". After all, the same applies to the invocation_id: It identifies the executing invocation, not the request causing the invocation (for example, on AWS Lambda, the request ID is generated by the Lambda backend, not by the requester). Regarding "instance":
What is that own attribute? The term "instance" has many meanings, but a "faas.instance" is an instance of a FaaS execution environment, whatever that encompasses (even multiple "instances" of something else). According to the Google Documentation, a job may invoke multiple containers, so container IDs cannot be used in that attribute anyway (since then a single faas.invocation_id would span multiple faas.instances which is not how these concepts are supposed to be used) In the end, it might also be worth it to introduce a special attribute gcp.job_execution. I just don't see how "version" would fit. |
I don't know the background for this change, but that PR description says it was changed because Is there any discussion on how the term "invocation" was decided, or a definition of it for this context? Josh's #3378 (comment) defined it as the request, so that's what I'm basing this on.
The Job is a resource that exists in the GCP Cloud Run API and handles requests. The request is an event initiated by a user to that resource. I think the fact that execution was renamed to invocation to make the term more specific means we should draw the distinction between the two, which I'm comparing to the definition in this section on the difference between instance and invocation.
faas.instance
A container is the instance of a Cloud Run execution environment. In Cloud Run Services, this container can serve multiple invocations. In Jobs, the container only lives for a single invocation but is technically still the execution environment.
As a span attribute, does invocation_id cross multiple instances? eg, a user calls a function that calls another function, is that the same invocation_id across both of those instances, or is the invocation_id meant to change whenever it hits a new instance? |
The renaming definitely did not have the intention to restrict the meaning of the attribute such that execution and invocation could now be two separate things. At least that is my interpretation. @tylerbenson do you agree? Or should we re-introduce execution (or execution_id) as an additional attribute? I don't think so... But I can see that the current definition of invocation_id is unclear with regard to your use case. You could adapt it though. |
It would usually be different for each function, but that might depend on the cloud provider. The attribute was originally modeled after the AWS Lambda Request ID https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html#runtimes-api-next. |
OK, now I get that I still don't get this aspect, sorry to keep bugging you here, but I think it might be important. So, what do you set as value here? The instance ID of the host that executes the container? Is it then possible that the same Job execution spans multiple instances? |
No problem. We use the
Yes
I don't mean that the renaming meant to split execution and invocation. I mean that it meant to more clearly define invocation as a request, rather than the execution. My argument here is that the execution is already represented by the version in Cloud Run.
That doc convinces me more that invocation is meant to map to the request event. |
I think this is not correct then. You should probably use host.id to store that. faas.instance is meant for the exectution environment of the function call, not part of it. That we even have to consider the difference, is a hint that faas conventions might really not be suited for CloudRun jobs.
You may be right. Regarding the version though, maybe a litmus test if you can use it could be "can you invoke the version multiple times?". If not, then it is probably not a version. |
This is the container ID on Cloud Run. That is the execution environment. The Host is not visible to the user, and is not appropriate for this.
Is this a requirement, or can a version auto-increment with each request? |
If the version auto-increments with every request, it would be something that can be used to uniquely identify the request, which would be usable as as invocation_id... It is not a requirement that is codified in the semantic conventions (yet), but is for the version concept I had in my mind. Regarding host.id, it is defined as:
But there is also container.id here: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/semantic_conventions/container.md |
(Can I suggest adding feedback as inline comments so it is easier to thread the conversation better. This format makes it difficult to follow the conversation and respond to individual items. I also suggest this would be a good topic for the FAAS SIG meeting later today. Perhaps you both could attend or send a representative?) |
Unfortunately that meeting is very late in my timezone, I won't be able to attend. But I'm very likely to accept the outcome reached there, if you discuss it and other FaaS SIG members had opinions on the topic. 😃 |
@Oberon00 understand. Perhaps you could summarize your concerns into a single paragraph that I can paste into the meeting agenda? (Or you're welcome to add it directly yourself.) |
@Oberon00 My read of Specifically:
You cannot just change this wording in your definition. The way I read the description, everything Mike has proposed is inline with the current definition. In fact given the Additionally, the term "execution environment" here is highly suspect. Perhaps in AWS there's some abstraction between your deployment and a running container. In cloud run that abstraction is NOT identified or called out as something you can look at. There's just (deletable) does this function name / job name exist, or there's a running container instance. +1 to @tylerbenson comments. Right now I'd like an ordered list of your concerns with this PR so that they can be discussed and addressed. |
Quick summary from the SIG meeting: |
My concern is simple, and there is only a single one: A job execution ID sounds like a totally different concept from a function version. Using invocation_id or instance were suggestions. I am also OK if you use another more fitting attribute, like |
36c082f
to
5b58946
Compare
Thanks for all the feedback so far. I do appreciate everyone's patience and help with this. I pushed a new commit 5b58946 that reverts the originally proposed Job Run version definition. I added a clarification that the existing version definition only applies to Services. I then added a new cloud provider resource directory and 2 new attributes for Cloud Run based on Josh's suggestion #3378 (comment):
We will need to update our resource detectors to differentiate between Jobs and Services for Cloud Run. These detectors are stable so we can't change how Services are handled, but I think we can add Jobs as a subset without breaking the existing Service workflow. Thanks again for all the thoughtful input on this, hopefully we can reach a resolution soon. |
@tylerbenson @Aneurysm9 @jsuereth What do you think about reverting the renaming / redefinition of faas.execution to faas.invocation_id done in #3209, or adjusting the changes in such a way that they could also be used to store the Cloud Watch Job execution ID, so that we don't need a new attribute? |
@Oberon00 faas.invocation/execution is a span attribute, but what we want is a resource attribute so would that also involve generalizing it to the resource level? |
We actually already have many FaaS attributes that can be set at either the Span or resource level (like faas.id). They are all "usually resource", so this would be the first "usually span" attribute explicitly being allowed in both positions, but I don't see a problem with it. (Note that you will still want at least one FaaS attribute on your FaaS root span that is always on the span, otherwise you can't determine which of your spans represents the FaaS invocation, if everything is on the resource. E.g. |
@Oberon00 I'm still not certain invocation_id / execution_id make sense in my head vs. a trace_id. I think it's worth having that discussion on a new bug/channel. Given this PR now introduces gcp specific attributes, we can always double-write IDs if we define an "exeuction_id" we're happy with. I think fundamentally, the "model" used by the FAAS semconv group should be well documented so we can better understand how to align our solution against it. Happy to continue that conversation. For now, we'd love to see this PR go through so Cloud Run Jobs have valid identities in OTEL in the meantime. |
Since the model is not formally defined in all details, we can also define it in such a way that your concerns about "not fitting" attributes are resolved EDIT: There is a bit of definition language at https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/faas.md#difference-between-invocation-and-instance that should be taken into account/moved/changed when we do this. |
specification/resource/semantic_conventions/cloud_provider/gcp/cloud_run.md
Show resolved
Hide resolved
semantic_conventions/resource/cloud_provider/gcp/cloud_run.yaml
Outdated
Show resolved
Hide resolved
f2d4075
to
92070cc
Compare
The link check failure looks unrelated to this change, is this good to merge? |
FYI @damemi @jsuereth @reyang this is missing a changelog entry, we need to make sure the changelog entry is added in the new https://github.com/open-telemetry/semantic-conventions (to be created this week) repo. Related to #3474 (review). |
@reyang should the new repo have its own CHANGELOG.md just for semantic conventions? |
I think the new repo should have a single CHANGELOG.md file, if there isn't one, it should be created. |
sounds good, opened that PR here open-telemetry/semantic-conventions#18 |
Changes
Add a note for Cloud Run Jobs on how to identify
faas.version
.Related issues
Downstream PR proposing this semconv change: GoogleCloudPlatform/opentelemetry-operations-go#465
Related OTEP(s) #