Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Appendix on converting data types to strings (3.0.4) #3840

Merged
merged 5 commits into from
Jun 10, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 28 additions & 0 deletions versions/3.0.4.md
Original file line number Diff line number Diff line change
Expand Up @@ -1049,6 +1049,7 @@ There are four possible parameter locations specified by the `in` field:

The rules for serialization of the parameter are specified in one of two ways.
Parameter Objects MUST include either a `content` field or a `schema` field, but not both.
See [Appendix B](#dataTypeConversion) for a discussion of converting values of various types to string representations.

###### Common Fixed Fields

Expand Down Expand Up @@ -1618,6 +1619,7 @@ An `encoding` attribute is introduced to give you control over the serialization
#### <a name="encodingObject"></a>Encoding Object

A single encoding definition applied to a single schema property.
See [Appendix B](#dataTypeConversion) for a discussion of converting values of various types to string representations.

##### Fixed Fields
Field Name | Type | Description
Expand Down Expand Up @@ -3700,6 +3702,32 @@ Version | Date | Notes

## <a name="dataTypeConversion"></a>Appendix B: Data Type Conversion

Serializing typed data to plain text, which can occur in `text/plain` message bodies or `multipart` parts, as well as in the `application/x-www-form-urlencoded` format in either URL query strings or message bodies, involves significant implementation- or application-defined behavior.

Schema Objects validate data based on the [JSON Schema data model](https://datatracker.ietf.org/doc/html/draft-wright-json-schema-00#section-4.2), which only recognizes four primitive data types: strings (which are [only broadly interoperable as UTF-8](https://datatracker.ietf.org/doc/html/rfc7159#section-8.1)), numbers, booleans, and `null`.
Notably, integers are not a distinct type from other numbers, with `type: integer` being a convenience defined mathematically, rather than based on the presence or absence of a decimal point in any string representation.

The [Parameter Object](#parameterObject) and [Encoding Object](#encodingObject) offer features to control how to arrange values from array or object types.
They can also be used to control how strings are further encoded to avoid reserved or illegal characters.
However, there is no general-purpose specification for converting schema-validated non-UTF-8 primitive data types (or entire arrays or objects) to strings.

Two cases do offer standards-based guidance:

* [RFC3987 §3.1](https://datatracker.ietf.org/doc/html/rfc3987#section-3.1) provides guidance for converting non-Unicode strings to UTF-8, particularly in the context of URIs (and by extension, the form media types which use the same encoding rules)
* [RFC6570 §2.3](https://www.rfc-editor.org/rfc/rfc6570#section-2.3) specifies which values, including but not limited to `null`, are considered _undefined_ and therefore treated specially in the expansion process when serializing based on that specification

Implementations of RFC6570 often have their own conventions for converting non-string values, but these are implementation-specific and not defined by the RFC itself.
This is one reason for the OpenAPI Specification to leave these conversions as implementation-defined: It allows using RFC6570 implementations regardless of how they choose to perform the conversions.

To control the serialization of numbers, booleans, and `null` (or other values RFC6570 deems to be undefined) more precisely, schemas can be defined as `type: string` and constrained using `pattern`, `enum`, `format`, and other keywords to communicate how applications must pre-convert their data prior to schema validation.
The resulting strings would not require any further type conversion.

The `format` keyword can assist in serialization.
Some formats (such as `date-time` or `byte`) are unambiguous, while others (such as [`decimal`](https://spec.openapis.org/registry/format/decimal.html) in the [Format Registry](https://spec.openapis.org/registry/format/)) are less clear.
However, care must be taken with `format` to ensure that the specific formats are supported by all relevant tools as unrecognized formats are ignored.

Requiring input as pre-formatted, schema-validated strings also improves round-trip interoperability as not all programming languages and environments support the same data types.

## <a name="usingRFC6570Implementations"></a>Appendix C: Using RFC6570 Implementations

Serialization is defined in terms of RFC6570 URI Templates in two scenarios:
Expand Down