Appendix on converting data types to strings
It's very unclear how numbers, booleans, and other non-UTF-8-string
values are converted to strings, particularly for the form media types.
This adds a brief appendix that acknowledges the lack of standardization,
and points to resources for the few cases that do have specifications.

It highlights concerns with relying on certain JSON Schema keywords
or values for serialization, and suggests defining schemas of
type string and requiring applications to perform the conversion
prior to schema validation as a way to control the results.

This also clarifies that schema validation occurs before serialization.
handrews committed May 22, 2024
1 parent c069212 commit 12d8dcc
Showing 1 changed file with 27 additions and 0 deletions.
27 changes: 27 additions & 0 deletions versions/3.0.4.md
@@ -1042,6 +1042,7 @@ There are four possible parameter locations specified by the `in` field:

The rules for serialization of the parameter are specified in one of two ways.
Parameter Objects MUST include either a `content` field or a `schema` field, but not both.
See [Appendix C](#dataTypeConversion) for a discussion of converting values of various types to string representations.

###### Common Fixed Fields

@@ -1607,6 +1608,7 @@ An `encoding` attribute is introduced to give you control over the serialization
#### <a name="encodingObject"></a>Encoding Object

A single encoding definition applied to a single schema property.
See [Appendix C](#dataTypeConversion) for a discussion of converting values of various types to string representations.

##### Fixed Fields
Field Name | Type | Description
@@ -3505,3 +3507,28 @@ Version | Date | Notes
1.2 | 2014-03-14 | Initial release of the formal document.
1.1 | 2012-08-22 | Release of Swagger 1.1
1.0 | 2011-08-10 | First release of the Swagger Specification

## <a name="dataTypeConversion"></a>Appendix C: Data Type Conversion

Serializing typed data to plain text involves significant implementation- or application-defined behavior.
Such serialization can occur in `text/plain` message bodies or `multipart` parts, as well as in the `application/x-www-form-urlencoded` format in either URL query strings or message bodies.

Schema Objects validate data based on the [JSON Schema data model](https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema-00#section-4.2.1), which only recognizes four primitive data types: strings (which are UTF-8 except in [extremely limited circumstances](https://datatracker.ietf.org/doc/html/rfc8259#section-8.1)), numbers, booleans, and `null`.
Notably, integers are not a distinct type from other numbers, with `type: integer` being a convenience defined mathematically, rather than based on the presence or absence of a decimal point in any string representation.
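
As a rough illustration of this mathematical definition, the following Python sketch treats a number as an integer when its fractional part is zero, regardless of how it was written. The helper name `is_schema_integer` is hypothetical and is not part of any specification or library; actual validator behavior may differ between implementations.

```python
def is_schema_integer(value):
    """Return True if value is a number with a zero fractional part."""
    # bool is a subclass of int in Python, but JSON booleans are not numbers.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    if isinstance(value, float):
        # float.is_integer() is False for infinities and NaN.
        return value.is_integer()
    return False


# 1.0 and 1 are the same integer under this definition; 1.5 is not.
print(is_schema_integer(1.0), is_schema_integer(1), is_schema_integer(1.5))
```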

The Parameter and Encoding Objects offer features to control how to arrange values from array or object types.
They can also be used to control how strings are further encoded to avoid reserved or illegal characters.
However, there is no general-purpose specification for converting schema-validated primitive values other than UTF-8 strings (or entire arrays or objects) to strings.
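
To make the lack of a standard concrete, the sketch below compares two conversions that are both readily available in a single language (Python): the built-in `str()` and JSON serialization via `json.dumps()`. The sample values are chosen only for illustration; other languages and libraries may produce still other results.

```python
import json

# Illustrative values only: a boolean, null, and two numbers.
values = [True, None, 1.0, 1e21]

# Python's native string conversion.
native = [str(v) for v in values]
# JSON text for the same values.
as_json = [json.dumps(v) for v in values]

print(native)   # ['True', 'None', '1.0', '1e+21']
print(as_json)  # ['true', 'null', '1.0', '1e+21']
```

Even within one environment, the boolean and `null` representations differ, so the string that ends up in a query string or form body depends on which conversion a tool happens to choose.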

Two cases do offer standards-based guidance:

* [RFC3987 §3.1](https://datatracker.ietf.org/doc/html/rfc3987#section-3.1) provides guidance for converting non-Unicode strings to UTF-8, particularly in the context of URIs (and by extension, the form media types which use the same encoding rules)
* [RFC6570 §2.3](https://www.rfc-editor.org/rfc/rfc6570#section-2.3) specifies which values, including but not limited to `null`, are considered _undefined_ and therefore treated specially in the expansion process when serializing based on that specification

To control the serialization of numbers, booleans, and `null` (or other values RFC6570 deems to be undefined) more precisely, schemas can be defined as `type: string` and constrained using `pattern`, `enum`, `format`, and other keywords to communicate how applications must pre-convert their data prior to schema validation.
The resulting strings would not require any further type conversion.
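
A minimal sketch of this approach, assuming a schema of `type: string` that declares the hypothetical pattern `^(true|false)$` for a boolean-like value: the application converts its native boolean first, and only the resulting string is checked. The names `BOOLEAN_PATTERN`, `preconvert_boolean`, and `matches_pattern` are invented for this example, and Python's `re` module stands in for a schema validator's regular expression engine.

```python
import re

# Hypothetical pattern a `type: string` schema might declare.
BOOLEAN_PATTERN = re.compile(r"^(true|false)$")


def preconvert_boolean(value):
    """Application-defined conversion, performed before schema validation."""
    return "true" if value else "false"


def matches_pattern(text):
    """Approximate the `pattern` keyword with an unanchored regex search."""
    return BOOLEAN_PATTERN.search(text) is not None


converted = preconvert_boolean(True)
print(converted, matches_pattern(converted))  # true True
print(matches_pattern(str(True)))             # False, since str(True) is "True"
```

Because the value is already a string when it reaches serialization, the tooling does not have to guess between candidates such as `True`, `true`, or `1`.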

The `format` keyword can assist in serialization.
Some formats (such as `date-time` or `byte`) are unambiguous, while others (such as [`decimal`](https://spec.openapis.org/registry/format/decimal.html) in the [Format Registry](https://spec.openapis.org/registry/format/)) are less clear.
However, care must be taken with `format` to ensure that the specific formats are supported by all relevant tools, as unrecognized formats are ignored.

Requiring input as pre-formatted, schema-validated strings also improves round-trip interoperability, as not all programming languages and environments support the same data types.
