Clarify: "type": "string", "format": "binary" in non-entity-body #1544

Closed
pbryan opened this issue Apr 13, 2018 · 12 comments
Labels: clarification (requests to clarify, but not change, part of the spec); media and encoding (issues regarding media type support and how to encode data outside of query/path params)

@pbryan

pbryan commented Apr 13, 2018

It would be good to clarify how implementations should handle "format": "binary" when the value is expressed in a JSON representation (i.e. not encoded directly in the entity-body).

The choices I see:

  1. Interpret as "byte" (i.e. expect it to be base64-encoded).
  2. Prohibit "binary" format in JSON representations.
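
A minimal sketch of the first option, assuming the binary value travels as a base64 string inside a JSON document. The property name `image` and the sample bytes are illustrative only; nothing here is prescribed by the spec.

```python
import base64
import json

# Illustrative payload: the first four bytes of the PNG signature.
raw = bytes([0x89, 0x50, 0x4E, 0x47])

# Option 1: treat the value like "format": "byte" and base64-encode it
# so that it becomes a plain JSON string.
encoded = base64.b64encode(raw).decode("ascii")
document = json.dumps({"image": encoded})  # "image" is a made-up property name

# The receiver reverses both steps to recover the original bytes.
decoded = base64.b64decode(json.loads(document)["image"])
```

Under the second option, the same schema would simply be rejected wherever the value has to live inside JSON.
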
@handrews
Member

In recent versions of JSON Schema, this is handled by "contentMediaType" and "contentEncoding":

https://tools.ietf.org/html/draft-handrews-json-schema-validation-01#section-8

These concepts have been part of JSON Schema since before OpenAPI, but under various names and at times in the Hyper-Schema spec (despite having nothing to do with hyperlinks).

@spacether

spacether commented Sep 7, 2022

How do we describe binary data with a non-empty schema?
Should it be this?

            type: string
            contentMediaType: image/png
            contentEncoding: binary

@handrews
Member

handrews commented Sep 7, 2022

@spacether binary is not a valid contentEncoding value. The encoding keyword is about transferring binary data as non-binary JSON string data. Per the JSON Schema Validation spec:

Possible values indicating base 16, 32, and 64 encodings with several variations are listed in RFC 4648. Additionally, sections 6.7 and 6.8 of RFC 2045 provide encodings used in MIME.

  • RFC 4648 supplies the values base64, base64url, base32, base32hex, base16, hex
  • RFC 2045 supplies the values identity, quoted-printable, and base64

The JSON Schema Validation spec also notes: "As "base64" is defined in both RFCs, the definition from RFC 4648 SHOULD be assumed unless the string is specifically intended for use in a MIME context."

To transfer a binary resource, contentEncoding should be left out. I really need to go clean up that part of the OAS spec. I wrote it for 3.1, and even I find it confusing now.
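
To make the listed encodings concrete, here is a small sketch that uses Python's standard `base64` module to produce three of the RFC 4648 forms for the same two bytes. The sample bytes are chosen only because they make the base64 and base64url alphabets differ; pairing each Python function with a `contentEncoding` name is my reading, not something stated above.

```python
import base64

# Two bytes whose encoding uses the characters that differ between
# the standard and URL-safe base64 alphabets.
data = b"\xfb\xff"

as_base64 = base64.b64encode(data).decode("ascii")             # standard alphabet
as_base64url = base64.urlsafe_b64encode(data).decode("ascii")  # URL-safe alphabet
as_base16 = base64.b16encode(data).decode("ascii")             # uppercase hex
```

Each result is an ordinary text string, which is the point of `contentEncoding`: the binary data is carried as non-binary JSON string data.
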

@handrews
Member

handrews commented Sep 7, 2022

@spacether I'm not sure you need a non-empty schema, btw, as the image/png part should be handled by the content type of the request or response. Unless it's part of a multipart response in which case things are more confusing.

A schema for a binary resource definitely should not have "type": "string" in OAS 3.1. In OAS 3.0 and earlier, there was stuff with "type": "string" and format, but that's not how it works in 3.1.

@spacether

spacether commented Sep 7, 2022

So I am concerned with defining schemas in a location-dependent context in v3.1.0.
When an empty schema is defined as the value of a key in the content map, it means binary is accepted here.

  • Are other types not accepted in this use case? If they are not, then the empty schema does a poor job of describing what is allowed here.

When that schema is under a JSON key, it means that all JSON types are accepted there.

Looking at the content map definition one can do this:

paths:
  /fake/uploadDownloadFile:
    post:
      tags:
        - fake
      summary: uploads a file and downloads a file using application/octet-stream
      description: ''
      operationId: uploadDownloadFile
      responses:
        '200':
          description: successful operation
          content:
            application/octet-stream:
              schema:
                $ref: '#/components/schemas/AnyTypeSchema'
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/AnyTypeSchema'
          application/octet-stream:
            schema:
              $ref: '#/components/schemas/AnyTypeSchema'
components:
  schemas:
    AnyTypeSchema: {}
    NotAnyTypeV1:
      not: {}
    NotAnyTypeV2:
      type: []
    NotAnyTypeV3:
      not:
        type:
          - integer
          - number
          - string
          - object
          - array
          - boolean
          - "null"
    BinaryOnlySchema:
      contentMediaType: application/octet-stream
      type: []

Schemas can $ref other components.
When a $ref points to another location, the schema at that location has no knowledge of the location-specific meaning of an empty schema.
It's problematic that the same schema can mean "binary is the only data stored here" under application/octet-stream and "str/bool/int/float/dict/list/None can be stored here" under application/json.
My goal is to have the schema itself describe that binary is allowed in a BinaryOnlySchema component.
If BinaryOnlySchema only excludes all JSON Schema types, then it is equivalent to NotAnyTypeV1/NotAnyTypeV2. Does that work, or should the presence of contentMediaType hint that binary is allowed here?

This ambiguity makes it unclear how to implement tooling (code generation) for v3.1.0.
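
To illustrate the equivalence question, here is a toy evaluator, written for this comment only, that understands just the `type` and `not` keywords and ignores everything else (including `contentMediaType`). It is not a real JSON Schema implementation and says nothing about whether `type: []` is accepted by a given validator; it only shows how the component schemas above behave against JSON values under that simplification.

```python
def json_type(value):
    """Map a Python value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def accepts(schema, value):
    """Toy evaluator: handles only `type` and `not`; other keywords are ignored."""
    if "type" in schema:
        allowed = schema["type"]
        if isinstance(allowed, str):
            allowed = [allowed]
        actual = json_type(value)
        # An integer instance also satisfies "number".
        if actual not in allowed and not (actual == "integer" and "number" in allowed):
            return False
    if "not" in schema and accepts(schema["not"], value):
        return False
    return True


ALL_TYPES = ["integer", "number", "string", "object", "array", "boolean", "null"]
schemas = {
    "AnyTypeSchema": {},
    "NotAnyTypeV1": {"not": {}},
    "NotAnyTypeV2": {"type": []},
    "NotAnyTypeV3": {"not": {"type": ALL_TYPES}},
    "BinaryOnlySchema": {"contentMediaType": "application/octet-stream", "type": []},
}
samples = [None, True, 1, 1.5, "x", [], {}]

# Count how many of the sample JSON values each schema accepts.
accepted = {name: sum(accepts(s, v) for v in samples) for name, s in schemas.items()}
```

In this sketch `AnyTypeSchema` accepts every sample and the other four accept none, so `BinaryOnlySchema` is indistinguishable from the `NotAnyType*` variants unless a tool gives `contentMediaType` extra meaning. Raw `bytes` are outside the JSON data model entirely, which is why `json_type` refuses to classify them.
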

@handrews
Member

handrews commented Sep 7, 2022

It is not location-sensitive. The comment about multipart responses has to do with OpenAPI's Encoding Object, which is an area of considerable complexity outside the scope of JSON Schema.

@spacether

spacether commented Sep 7, 2022

The context here is OpenAPI. If one location allows ingestion and transmission of binary using AnyType schema and another location allows ingestion of different data with that same schema definition it looks location or maybe media type key sensitive to me.

@handrews
Member

handrews commented Sep 7, 2022

@spacether

If one location allows ingestion and transmission of binary using AnyType schema and another location allows ingestion of different data with that same schema definition it looks location or maybe media type key sensitive to me.

The AnyType schema literally allows everything. There is nothing strange about it being used in different locations that, through other aspects of OpenAPI, further constrain what is allowed. The schema behaves the same everywhere your example uses it.

@spacether

spacether commented Sep 7, 2022

Okay, then by that logic binary content can be stored in any empty schema definition. If one does that and attempts to send that data as application/json, then serialization of that data would fail. Is that what you envision implementors should do?
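
A short sketch of the failure mode described here, using Python's standard `json` module as a stand-in for whatever serializer a generated client might use. The key name `file` and the payload are made up for illustration.

```python
import json

# Binary content held in a field whose schema is the empty schema.
payload = b"\x89PNG"

try:
    json.dumps({"file": payload})  # "file" is a hypothetical key
    serialization_failed = False
except TypeError:
    # The standard library encoder rejects bytes: they are not a JSON type.
    serialization_failed = True
```

Other serializers may behave differently, for example by silently base64-encoding the bytes, which is part of why the expected behavior needs to be pinned down.
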

@spacether

Filed #3024 for discussion at the meeting tomorrow.

@handrews
Member

@pbryan I think we should indeed clarify that "format": "binary" only applies to places where actual binary data is valid (e.g. not within application/json). Tagging this for 3.0.4 – not relevant to 3.1.1 because the content* keywords don't have the same problem.
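
As a rough illustration of what tooling could do with such a clarification, here is a naive lint sketch that flags "format": "binary" inside schemas under JSON media types. The helper name `binary_format_paths`, the `+json` suffix check, and the sample content map are all hypothetical; the walk also visits every nested dict or list, so it does not distinguish subschemas from other nested values. It is not taken from the spec or from any existing tool.

```python
def binary_format_paths(node, path=""):
    """Return paths of every nested dict carrying "format": "binary"."""
    found = []
    if isinstance(node, dict):
        if node.get("format") == "binary":
            found.append(path or "/")
        for key, child in node.items():
            found.extend(binary_format_paths(child, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            found.extend(binary_format_paths(child, f"{path}/{index}"))
    return found


# Hypothetical OAS 3.0-style content map.
content = {
    "application/json": {
        "schema": {
            "type": "object",
            "properties": {"file": {"type": "string", "format": "binary"}},
        }
    },
    "application/octet-stream": {"schema": {"type": "string", "format": "binary"}},
}

# Only JSON media types are checked; raw binary bodies are left alone.
flagged = {
    media_type: binary_format_paths(media.get("schema", {}))
    for media_type, media in content.items()
    if media_type == "application/json" or media_type.endswith("+json")
}
```
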

@handrews
Member

PRs merged for 3.0.4, with analogous PRs merged for 3.1.1 and 3.2.0 - closing!


3 participants