Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify interaction of format: byte and Content-Transfer-Encoding header definition #3929

Merged
merged 5 commits into from
Jun 29, 2024

Conversation

mikekistler
Copy link
Contributor

This PR clarifies (IMO) the interaction of format: byte in the schema of a multipart/form-data body with the Content-Transfer-Encoding header definition in the corresponding field of an Encoding Object.

I found the original text confusing because it talked about "setting the Content-Transfer-Encoding header", which I could not tell if this meant in the OpenAPI description or in the HTTP flow. I think the new language makes clear that this is in the Encoding Object in the OpenAPI description.

I also made this its own paragraph since I think this is independent of the note about Content-Transfer-Encoding now being deprecated.

@mikekistler
Copy link
Contributor Author

It appears that my editor did an auto-format. I will fix.

@mikekistler
Copy link
Contributor Author

Okay ... fixed silly whitespace diffs. I had to use VIM to make the change as it seemed nothing could convince VSCode to leave the formatting alone.

@handrews
Copy link
Member

I had to use VIM to make the change as it seemed nothing could convince VSCode to leave the formatting alone.

weird- I wouldn't think VIM would do anything that strange (I do use VIM on Mac OS for all of my editing).

Copy link
Member

@handrews handrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor quibbles trying to find the right wording for this confusing area.

I also made this its own paragraph since I think this is independent of the note about Content-Transfer-Encoding now being deprecated.

Honestly I think this is the most important part of the change. Now that I see it separated, your earlier confusion makes a lot more sense. The way I had it made it look like the deprecation and undefined behavior were related, and they're not related at all.

Using `format: byte` for a multipart field is equivalent to setting `Content-Transfer-Encoding: base64`.
If `format: byte` is used along with setting a different `Content-Transfer-Encoding` value with the `headers` field, the result is undefined.

Using `format: byte` for a multipart field is equivalent to specifying an `encoding` object with a `headers` field containing `Content-Transfer-Encoding: enum: ["base64"]`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need schema and some {} around the Content-Transfer-Encoding: {schema: {enum: ["base64"]}} part (my apologies for leaving them out when we were chatting on IM). And I'd probably either quote all the strings (like JSON) or none of them (like YAML).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay ... I think I fixed that.

If `format: byte` is used along with setting a different `Content-Transfer-Encoding` value with the `headers` field, the result is undefined.

Using `format: byte` for a multipart field is equivalent to specifying an `encoding` object with a `headers` field containing `Content-Transfer-Encoding: enum: ["base64"]`.
If `format: byte` is used for a multipart field that has an encoding object with a `headers` field containing `Content-Transfer-Encoding` with a schema that permits values other than "base64", the result is undefined.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is quite right, as if you use format: byte with, say, headers: {Content-Transfer-Encoding: {schema: {enum: [base64, quoted-printable]}}} then it would be kind-of pointless because the only valid option is base64, but it technically wouldn't be a problem. Just confusing. But you can do a lot of things like this in JSON Schema and elsewhere in OpenAPI so we don't want to get into calling out every possible nonsensical combination that the specification syntactically allows.

Here's another attempt, feel free to try something else:

The result of using format: byte and using the Encoding Object's headers field to require a Content-Transfer-Encoding other than base64 is undefined, as this specification does not define a precedence for resolving the conflict.

Another reason this is undefined is that you can't rely on format: byte to actually be checked during JSON Schema validation, so unlike a lot of JSON Schema combinations that don't make sense, you can't rely on an error happening anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't "kind-of pointless" be "undefined"? If it is not undefined, then what exactly does this mean? Does it mean that Content-Transfer-Encoding: quoted-printable is allowed on this field? You seem to say "No", because you say the only valid option is "base64", but that seems to mean that format: byte overrides whatever is specified in the headers of the encoding object. But if that's the case, then why wouldn't that be the case for any schema of a Content-Transfer-Encoding header?

Copy link
Member

@handrews handrews Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikekistler if you set it through headers, you can set Content-Transfer-Encoding to any valid encoding allowed by that header. quoted-printable was the first one that came to mind other than base64.

Since the Header Object describes the valid values for the specific usage with a schema, you can set it to allow multiple values. So a schema of {enum: [base64, quoted-printable]} means that you can do either of those.

If you also set format: byte in the main representation schema, then that is the thing that is equivalent to base64. So now you're putting constraints on the header value in two places. One of those places only allows one value, so the second value in the other place is pointless in this specific usage. However, you might do that if you $ref the Header Object in headers because maybe you use headers a lot and typically allow either encoding, but one particular usage uses format to restrict it to just one.

So "kinda pointless" doesn't mean there's never a valid reason it might happen. And the behavior is completely well-defined. If I tell you "you can use odd numbers" in one place and "you can use even numbers between 0 and 9 inclusive" in another, you know you can use 0, 2, 4, 6, and 8. The interaction made one of these constraints superfluous, but not meaningless.

The problem here is that you're expecting this to make sense. I don't think anyone ever thought through the implications of allowing all of these different ways to do things. The format keyword was not meant for content encoding (which is why it doesn't do that in 3.1). Whether it's format or contentEncoding, JSON Schema was not designed with awareness of any sort of headings, MIME or HTTP. I assume the headers field was added to the Encoding Object for some other reason, and no one thought to consider the overlap with format: byte. Or maybe they did, and they wanted to allow other Content-Transfer-Encoding values and it was just too much troubel to forbid fomat: byte in this specific case. I have no idea.

But this is not going to make sense. It's two completely independent mechanisms that were developed separately that collide, which happens a lot in Parameter/Header/Encoding/Schema Objects. We're just stuck with it in 3.x and this is the best we can do for now.

Copy link
Contributor Author

@mikekistler mikekistler Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that there is a potential conflict between format: byte and the Content-Transfer-Encoding header, and that this was likely not considered when 3.0 was written. I think we can say what the API definition means when there is no conflict. And when there is conflict, I think we should say that the behavior is undefined. That is what I have tried to do with this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikekistler there are tons of things in JSON Schema and OpenAPI that can be in conflict. None of them are undefined. To call this undefined would introduce a major inconsistency in how we treat descriptions that have conflicting or overlapping elements.

@handrews handrews added clarification requests to clarify, but not change, part of the spec media and encoding Issues regarding media type support and how to encode data (outside of query/path params) labels Jun 25, 2024
@handrews handrews added this to the v3.0.4 milestone Jun 25, 2024
versions/3.0.4.md Outdated Show resolved Hide resolved
If `format: byte` is used along with setting a different `Content-Transfer-Encoding` value with the `headers` field, the result is undefined.

Using `format: byte` for a multipart field is equivalent to specifying an `encoding` object with a `headers` field containing `Content-Transfer-Encoding: { schema: { enum: [base64] } }`.
If `format: byte` is used for a multipart field that has an encoding object with a `headers` field containing `Content-Transfer-Encoding` with a schema that permits values other than "base64", the result is undefined.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If `format: byte` is used for a multipart field that has an encoding object with a `headers` field containing `Content-Transfer-Encoding` with a schema that permits values other than "base64", the result is undefined.
If `format: byte` is used for a multipart field that has an encoding object with a `headers` field containing `Content-Transfer-Encoding` with a schema that permits values other than `base64`, the result is undefined.

handrews
handrews previously approved these changes Jun 27, 2024
Copy link
Member

@handrews handrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

miqui
miqui previously approved these changes Jun 27, 2024
@mikekistler mikekistler dismissed stale reviews from miqui and handrews via 37866fb June 27, 2024 19:33
Co-authored-by: Ralf Handl <ralf.handl@sap.com>
@mikekistler mikekistler marked this pull request as draft June 27, 2024 19:41
@mikekistler
Copy link
Contributor Author

Damn VSCode reformatted again! I will fix.

@mikekistler mikekistler marked this pull request as ready for review June 27, 2024 19:45
@mikekistler
Copy link
Contributor Author

Sorry to trouble you to re-review but GitHub dismissed the approvals when I moved the PR to draft temporarily while I fixed the reformatting issue.

Copy link
Member

@handrews handrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks even better now with the latest tweaks!

@ralfhandl ralfhandl requested a review from a team June 28, 2024 08:33
@ralfhandl
Copy link
Contributor

@OAI/tsc One more review please

@miqui miqui merged commit c421081 into OAI:v3.0.4-dev Jun 29, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clarification requests to clarify, but not change, part of the spec media and encoding Issues regarding media type support and how to encode data (outside of query/path params)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants