From e62f218e9719356e6b27b8dbb4696d59ad7d8f54 Mon Sep 17 00:00:00 2001
From: "Henry H. Andrews"
Date: Wed, 22 Mar 2023 15:51:19 -0700
Subject: [PATCH 1/3] Clarify how to model binary data in 3.1

This reorganizes binary data-related guidance into a "Working With Binary Data" section, as has already been done in 3.0.4. This includes more detailed guidance on when various approaches to binary data make sense (e.g. you cannot stuff raw binary into JSON no matter what you put in your Schema Object, and while you can base64-encode entire message bodies, it takes up a lot more space for no clear benefit). Also note that only `multipart` media types with named parts are supported, as they are modeled as an object.
---
 versions/3.1.1.md | 62 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 59 insertions(+), 3 deletions(-)

diff --git a/versions/3.1.1.md b/versions/3.1.1.md
index e4423df5b2..c87b564bb2 100644
--- a/versions/3.1.1.md
+++ b/versions/3.1.1.md
@@ -159,6 +159,40 @@ The formats defined by the OAS are:
 `number` | `double` | |
 `string` | `password` | A hint to obscure the value.
 
+#### Working With Binary Data
+
+The OAS can describe either _raw_ or _encoded_ binary data.
+
+* **raw binary** is used where unencoded binary data is allowed, such as when sending a binary payload as the entire HTTP message body, or as part of a `multipart/*` payload that allows binary parts
+* **encoded binary** is used where binary data is embedded in a text-only format such as `application/json` or `application/x-www-form-urlencoded` (either as a message body or in the URL query string).
+
+In the following table showing how to use Schema Object keywords for binary data, we use `image/png` as an example binary media type. Any binary media type, including `application/octet-stream`, is sufficient to indicate binary content.
+
+Keyword | Raw | Encoded | Comments
+------- | --- | ------- | --------
+`type` | _omit_ | `string` | raw binary is [outside of `type`](https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-00#section-4.2.3)
+`contentMediaType` | `image/png` | `image/png` | can sometimes be omitted if redundant (see below)
+`contentEncoding` | _omit_ | `base64` or `base64url` | other encodings are [allowed](https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-validation-00#section-8.3)
+
+Note that the encoding indicated by `contentEncoding`, which inflates the size of data in order to represent it as 7-bit ASCII text, is unrelated to HTTP's `Content-Encoding` header, which indicates whether and how a message body has been compressed and is applied after all content serialization described in this section has occurred.
+
+Using a `contentEncoding` of `base64url` ensures that URL encoding (as required in the query string and in message bodies of type `application/x-www-form-urlencoded`) does not need to further encode any part of the already-encoded binary data.
+
+The `contentMediaType` keyword is redundant if the media type is already set:
+
+* as the key for a [`MediaType Object`](#mediaTypeObject)
+* in the `contentType` field of an [`Encoding Object`](#encodingObject)
+
+If the Schema Object will be processed by a non-OAS-aware JSON Schema implementation, it may be useful to include `contentMediaType` even if it is redundant. However, if `contentMediaType` contradicts a relevant Media Type Object or Encoding Object, then `contentMediaType` SHALL be ignored.
+
+The following table shows how to migrate from OAS 3.0 binary data descriptions, continuing to use `image/png` as the example binary media type:
+
+OAS < 3.1 | OAS 3.1 | Comments
+--------- | ------- | --------
+`type: string`<br />`format: binary` | `contentMediaType: image/png` | if redundant, can be omitted, often resulting in an empty Schema Object
+`type: string`<br />`format: byte` | `type: string`<br />`contentMediaType: image/png`<br />`contentEncoding: base64` | note that `base64url` can be used to avoid re-encoding the base64 string to be URL-safe
+
+
 ### Rich Text Formatting
 Throughout the specification `description` fields are noted as supporting CommonMark markdown formatting. Where OpenAPI tooling renders rich text it MUST support, at a minimum, markdown syntax as described by [CommonMark 0.27](https://spec.commonmark.org/0.27/). Tooling MAY choose to ignore some CommonMark features to address security concerns.
 
@@ -1447,9 +1481,7 @@ application/json:
 
 In contrast with the 2.0 specification, `file` input/output content in OpenAPI is described with the same semantics as any other schema type.
 
-In contrast with the 3.0 specification, the `format` keyword has no effect on the content-encoding of the schema. JSON Schema offers a `contentEncoding` keyword, which may be used to specify the `Content-Encoding` for the schema. The `contentEncoding` keyword supports all encodings defined in [RFC4648](https://tools.ietf.org/html/rfc4648), including "base64" and "base64url", as well as "quoted-printable" from [RFC2045](https://tools.ietf.org/html/rfc2045#section-6.7). The encoding specified by the `contentEncoding` keyword is independent of an encoding specified by the `Content-Type` header in the request or response or metadata of a multipart body -- when both are present, the encoding specified in the `contentEncoding` is applied first and then the encoding specified in the `Content-Type` header.
-
-JSON Schema also offers a `contentMediaType` keyword. However, when the media type is already specified by the Media Type Object's key, or by the `contentType` field of an [Encoding Object](#encodingObject), the `contentMediaType` keyword SHALL be ignored if present.
+In contrast with the 3.0 specification, the `format` keyword has no effect on the content-encoding of the schema. Instead, JSON Schema's `contentEncoding` and `contentMediaType` keywords are used. See [Working With Binary Data](#binaryData) for how to model various scenarios with these keywords, and how to migrate from the previous `format` usage.
 
 Examples:
 
@@ -1556,6 +1588,8 @@ When passing in `multipart` types, boundaries MAY be used to separate sections o
 
 Per the JSON Schema specification, `contentMediaType` without `contentEncoding` present is treated as if `contentEncoding: identity` were present. While useful for embedding text documents such as `text/html` into JSON strings, it is not useful for a `multipart/form-data` part, as it just causes the document to be treated as `text/plain` instead of its actual media type. Use the Encoding Object without `contentMediaType` if no `contentEncoding` is required.
 
+Note that only `multipart/*` media types with named parts can be described as shown here. Note also that while `multipart/form-data` originally defined a per-part `Content-Transfer-Encoding` header that could indicate base64 encoding (`contentEncoding: base64`), it has been deprecated for use with HTTP as of [RFC7578](https://www.rfc-editor.org/rfc/rfc7578#section-4.7).
+
 Examples:
 
 ```yaml
@@ -1609,6 +1643,8 @@ This object MAY be extended with [Specification Extensions](#specificationExtens
 
 ##### Encoding Object Example
 
+`multipart/form-data` allows for binary parts:
+
 ```yaml
 requestBody:
   content:
@@ -1644,6 +1680,26 @@ requestBody:
                 type: integer
 ```
 
+`application/x-www-form-urlencoded` is a text format, which requires base64-encoding any binary data:
+
+```YAML
+requestBody:
+  content:
+    application/x-www-form-urlencoded:
+      schema:
+        type: object
+        properties:
+          name:
+            type: string
+          icon:
+            # default is text/plain, need to declare an image type only!
+            type: string
+            format: byte
+      encoding:
+        icon:
+          contentType: image/png, image/jpeg
+```
+
 #### Responses Object
 
 A container for the expected responses of an operation.

From def8f41d9f919491fcb8bbefb595fb15b149e0bd Mon Sep 17 00:00:00 2001
From: "Henry H. Andrews"
Date: Sat, 27 Apr 2024 10:41:23 -0700
Subject: [PATCH 2/3] Explain lack of HTTP header for base64 encoding

Also, remove the example that goes against the advice in the updated binary-handling section.
---
 versions/3.1.1.md | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/versions/3.1.1.md b/versions/3.1.1.md
index c87b564bb2..18ec7ddedd 100644
--- a/versions/3.1.1.md
+++ b/versions/3.1.1.md
@@ -174,7 +174,7 @@ Keyword | Raw | Encoded | Comments
 `contentMediaType` | `image/png` | `image/png` | can sometimes be omitted if redundant (see below)
 `contentEncoding` | _omit_ | `base64` or `base64url` | other encodings are [allowed](https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-validation-00#section-8.3)
 
-Note that the encoding indicated by `contentEncoding`, which inflates the size of data in order to represent it as 7-bit ASCII text, is unrelated to HTTP's `Content-Encoding` header, which indicates whether and how a message body has been compressed and is applied after all content serialization described in this section has occurred.
+Note that the encoding indicated by `contentEncoding`, which inflates the size of data in order to represent it as 7-bit ASCII text, is unrelated to HTTP's `Content-Encoding` header, which indicates whether and how a message body has been compressed and is applied after all content serialization described in this section has occurred. Since HTTP allows unencoded binary message bodies, there is no standardized HTTP header for indicating base64 or similar encoding of an entire message body.
 
 Using a `contentEncoding` of `base64url` ensures that URL encoding (as required in the query string and in message bodies of type `application/x-www-form-urlencoded`) does not need to further encode any part of the already-encoded binary data.
 
@@ -1499,19 +1499,6 @@ content:
     application/octet-stream: {}
 ```
 
-Binary content transferred with base64 encoding:
-
-```yaml
-content:
-    image/png:
-        schema:
-            type: string
-            contentMediaType: image/png
-            contentEncoding: base64
-```
-
-Note that the `Content-Type` remains `image/png`, describing the semantics of the payload. The JSON Schema `type` and `contentEncoding` fields explain that the payload is transferred as text. The JSON Schema `contentMediaType` is technically redundant, but can be used by JSON Schema tools that may not be aware of the OpenAPI context.
-
 These examples apply to either input payloads of file uploads or response payloads.
 
 A `requestBody` for submitting a file in a `POST` operation may look like the following example:

From 8de5a93d47361d0043f31c141bb95a93f1503b83 Mon Sep 17 00:00:00 2001
From: "Henry H. Andrews"
Date: Sat, 27 Apr 2024 11:01:43 -0700
Subject: [PATCH 3/3] Fix stray "format: byte" to use contentEncoding

---
 versions/3.1.1.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/versions/3.1.1.md b/versions/3.1.1.md
index 18ec7ddedd..f55aa34356 100644
--- a/versions/3.1.1.md
+++ b/versions/3.1.1.md
@@ -1679,9 +1679,10 @@ requestBody:
           name:
             type: string
           icon:
-            # default is text/plain, need to declare an image type only!
+            # default for type string is text/plain, need to declare
+            # the appropriate contentType in the Encoding Object
             type: string
-            format: byte
+            contentEncoding: base64url
       encoding:
         icon:
           contentType: image/png, image/jpeg
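
Reviewer note (not part of the patch series): the switch to `contentEncoding: base64url` in the `application/x-www-form-urlencoded` example rests on the claim that base64url output needs no further URL encoding. The sketch below is one way to check that claim with the Python standard library. The sample bytes and variable names are illustrative assumptions, not anything taken from the specification.

```python
import base64
from urllib.parse import quote

# Three bytes chosen so that every 6-bit group is 62 or 63, the only two
# alphabet positions where base64 ("+", "/") and base64url ("-", "_") differ.
raw = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

# Percent-encode each string the way a query-string or form value would be.
standard_quoted = quote(standard, safe="")
url_safe_quoted = quote(url_safe, safe="")

print(standard, "->", standard_quoted)  # the standard alphabet gets escaped
print(url_safe, "->", url_safe_quoted)  # the URL-safe alphabet is unchanged
```

The input length here is a multiple of three, so neither encoding emits `=` padding. For other lengths, `urlsafe_b64encode` still appends `=`, which `quote` does percent-encode, so whether to keep or strip padding is a separate decision for the API author.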