Mapped Data Types' Standards as 1st-Class Citizens. #206

francescoloconte · 2024-06-07T10:47:38Z

I understand from the Orchestra specifications that DataTypes are used to define the “Value Space”, while MappedDataTypes are used to define the “Lexical Space”. You can find it at this link. My understanding is that the Lexical Space indicates how data is encoded (i.e., represented) when serialized. This is also confirmed by this ChatGPT response.

Assuming we agree on the above, the Orchestra schema currently defines the following “Standards” for MappedDataTypes: GPB, ISO11404, JSON, SBE, TAG_VALUE, XML. To me, this indicates that one can specify the encoding of DataTypes when using one of these standards. For example, when encoding an Orchestra model to FIXML, one would use the encoding instructions provided by the “XML” MappedDataTypes.

If we are still in agreement so far, my question is: if I am using Orchestra to describe a binary protocol (e.g., LSEG Millennium IT), and I want to include its specific encoding information using MappedDataTypes, which Standard should I use?

In my opinion, I should be able to create a new Standard, for example “MIT” (or whatever name I wish to use), and use this new value in my MappedDataTypes instructions. If I wanted to support yet another encoding for my protocol, e.g. ASN.1, I would create another value for the Standard, perhaps called “ASN1”, and use it in another set of MappedDataTypes instructions. Ultimately, the MappedDataTypes instructions are to be interpreted in the context of the external standard, such as SBE, GPB, etc.

If you agree with my statement above, then it follows that “Standards” should be first-class citizens in Orchestra, and that one should be able to create new ones. If that were the case, I think each standard should also have an attribute called “Endianness”, to specify whether the standard uses big-endian or little-endian byte order.
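
To illustrate why a per-standard endianness setting matters, here is a minimal Python sketch. The standard names and the `STANDARD_BYTE_ORDER` table are hypothetical and not part of Orchestra; the point is only that the same integer value serializes to different bytes depending on the declared byte order.

```python
import struct

# Hypothetical per-standard configuration (not part of the Orchestra schema).
# ">" and "<" are the struct module's big-endian and little-endian prefixes.
STANDARD_BYTE_ORDER = {
    "MyBigEndianStandard": ">",
    "MyLittleEndianStandard": "<",
}

def encode_uint32(standard: str, value: int) -> bytes:
    """Pack an unsigned 32-bit integer using the byte order declared for the standard."""
    return struct.pack(STANDARD_BYTE_ORDER[standard] + "I", value)

print(encode_uint32("MyBigEndianStandard", 1).hex())     # 00000001
print(encode_uint32("MyLittleEndianStandard", 1).hex())  # 01000000
```

A decoder that assumes the wrong byte order would read the second encoding as 16777216 rather than 1, which is why the byte order has to be known alongside the mapped datatype.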

patricklucas · 2024-06-07T12:13:53Z

I would like to emphasize the distinction between "standard" and "encoding", since Orchestra currently lists "JSON" and "XML" as standards, which could cause confusion.

I see a "standard" as an agreed-upon way to structure Orchestra-specified messages/data using a particular encoding, whereas there could be many disparate standards that use the same encoding.

For example, two different standards that use JSON encoding might encode the same data as:

{
  "MyTimestampField": "2024-06-07T11:59:31+00:00"
}

or

{
  "MyTsFld": 1717761571
}

The FIX JSON Standard is such a standard, and is what I assume is meant in Orchestra when it refers to the JSON standard, but I think it would be good to ensure these concepts are kept distinct, such as calling the standard FIX_JSON.
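
The two JSON renderings above can be reproduced with a short Python sketch. The function names are hypothetical; each one stands for a different "standard" that uses the same JSON encoding for the same logical timestamp.

```python
import json
from datetime import datetime, timezone

# One logical value; both functions below serialize exactly this instant.
ts = datetime(2024, 6, 7, 11, 59, 31, tzinfo=timezone.utc)

def encode_iso(value: datetime) -> str:
    """Hypothetical standard 1: long field name, ISO 8601 string."""
    return json.dumps({"MyTimestampField": value.isoformat()})

def encode_epoch(value: datetime) -> str:
    """Hypothetical standard 2: short field name, seconds since the Unix epoch."""
    return json.dumps({"MyTsFld": int(value.timestamp())})

print(encode_iso(ts))    # {"MyTimestampField": "2024-06-07T11:59:31+00:00"}
print(encode_epoch(ts))  # {"MyTsFld": 1717761571}
```

Both outputs are valid JSON, so knowing only that the encoding is JSON is not enough for a receiver to decode the field.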

As far as the ability to define standards in Orchestra itself, my position is that there should be a clear way to define and "configure" a standard, but not necessarily actually include all that many attributes. That is, the "configuration" here could probably simply be an XML extension, where a particular application/implementation can be given hints.

One route here would be:

Standards, as referenced in mapped datatypes in an Orchestra spec, become simple string names rather than an enumeration
By convention, some standard names can be "reserved" to mean a certain thing, but this is not enforced by the XML schema
It is optional to define a standard in a top-level "standards" block (many standards don't need to be configured; implementations know what to do)
Standards MAY be declared in a top-level block, to indicate their existence, document them, and include an extension to configure a particular application/implementation.

For example:

<fixr:repository>
  <fixr:standards>
    <fixr:standard name="MyStandard">
      <fixr:extension>
        <internal:endianness xmlns:internal="https://..." endianness="BIG_ENDIAN"/>
      </fixr:extension>
      <fixr:annotation>
        <fixr:documentation purpose="SYNOPSIS" contentType="text/markdown">This is my standard</fixr:documentation>
      </fixr:annotation>
    </fixr:standard>
  </fixr:standards>

  <fixr:datatypes>
    <fixr:datatype name="MyTimestamp">
      <fixr:mappedDatatype standard="MyStandard" base="string">
        <fixr:extension>
          <internal:timestamp xmlns:internal="https://..." format="yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"/>
        </fixr:extension>
      </fixr:mappedDatatype>
      <fixr:annotation>
        <fixr:documentation purpose="SYNOPSIS" contentType="text/markdown">Some datatype I care about</fixr:documentation>
      </fixr:annotation>
    </fixr:datatype>
  </fixr:datatypes>
</fixr:repository>

kleihan · 2024-06-07T13:49:35Z

Maintainer

@patricklucas any reason to use just standards and not datatypeStandards? The former sounds too generic to me.

I am not sure about your comment on JSON above. I believe that the actual JSON standard is meant here. The FIX standard for it is called "Encoding FIX using JSON" and is about the choice of attribute names, not about a different JSON syntax. That should be standard JSON.

1 reply

patricklucas Jun 7, 2024

@kleihan I'm not opinionated about the actual XML tags, just wanted to get across the idea.

If it's the case that JSON as a valid value for the standard attribute on a mappedDatatype is really meant to refer to the JSON encoding rather than the FIX JSON standard, then I'll refer to the first part of my comment above—I think it's an important distinction because there are very many ways to encode an Orchestra-specified message using JSON. So, I see "standard" here as meaning an explicit description of how to represent a message in bytes; i.e. pairing an encoding with an opinion of how to represent each particular datatype/structure in that encoding.

Sometimes, the "standard" and encoding come together, such as in a proprietary binary format. Other times, like for JSON/XML/Avro/Protobuf/many more, there are myriad ways a user might choose to use those encodings to represent the same bit of data.

martinswanson · 2024-06-07T14:18:21Z

Thinking about a real-world example, I noticed that the NASDAQ OUCH specification models optional fields and codeset values at the logical level. This is a case where the details of the binary encoding are leaking into the Orchestra layer.

For example, the field BBOWeightIndicator defines a code value called "space", which means "unspecified", i.e. the field is optional.

<fixr:repository>
  <fixr:datatypeStandards>
    <fixr:standard name="NASDAQ_OUCH">
      <fixr:extension>
        <internal:endianness xmlns:internal="https://..." endianness="BIG_ENDIAN"/>
        <internal:padding xmlns:internal="https://..." padding="left"/>
      </fixr:extension>
      <fixr:annotation>
        <fixr:documentation purpose="SYNOPSIS" contentType="text/markdown">The NASDAQ OUCH protocol is a low-latency, point-to-point binary protocol used by traders to submit and manage orders directly with NASDAQ's trading system.</fixr:documentation>
      </fixr:annotation>
    </fixr:standard>
  </fixr:datatypeStandards>

  <fixr:datatypes>
    <fixr:datatype name="Alpha">
      <fixr:mappedDatatype standard="NASDAQ_OUCH" base="string" pattern="[A-Za-z]*"/>
      <fixr:annotation>
        <fixr:documentation purpose="SYNOPSIS" contentType="text/markdown">Alpha fields may contain upper and lowercase characters. All fixed-width alpha fields are left-justified and padded on the right with spaces.</fixr:documentation>
      </fixr:annotation>
    </fixr:datatype>
  </fixr:datatypes>
</fixr:repository>

The above would mean we no longer need to model the "unspecified" name/value in the codeset definition.

Codeset BBOWeightIndicatorCodeSet type Alpha (3)

Synopsis

Name	Value	Id	Sort	Synopsis
GreaterThanZeroPercentAndLessThanPointTwoPercent	0	3001	1	0-0.2%
GreaterThanPointTwoPercentAndLessThanOnePercent	1	3002	2	0.2%-1%
GreaterThanOnePercentAndLessThanTwoPercent	2	3003	3	1%-2%
GreaterThanTwoPercent	3	3004	4	Greater than 2%
SetsQBBOWhileJoiningNBBO	S	3005	5	Sets the QBBO while joining the NBBO
ImprovesNBBOOnEntry	N	3006	6	Improves the NBBO upon entry

This might cross over to the other discussion on supporting binary formats, but I wanted to show how the proposal might solve this problem.
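
As a rough Python sketch of the idea, an implementation could handle "unspecified" at the encoding layer instead of in the codeset. The helper names are hypothetical, and the rule that an all-space field means "absent" is an assumption drawn from the Alpha description and the "space" code value mentioned above, not from the OUCH specification text itself.

```python
from typing import Optional

PAD = b" "  # assumption: space is the pad byte for fixed-width Alpha fields

def encode_alpha(value: Optional[str], width: int) -> bytes:
    """Left-justify an ASCII value and pad on the right; None becomes all spaces."""
    raw = b"" if value is None else value.encode("ascii")
    if len(raw) > width:
        raise ValueError(f"value {value!r} does not fit in {width} bytes")
    return raw.ljust(width, PAD)

def decode_alpha(field: bytes) -> Optional[str]:
    """Strip right padding; an all-space field decodes to None (unspecified)."""
    text = field.decode("ascii").rstrip(" ")
    return text or None

print(encode_alpha("S", 1))   # b'S'
print(encode_alpha(None, 1))  # b' '
print(decode_alpha(b" "))     # None
```

With this convention the codeset only needs to list the meaningful values, and optionality is expressed once, in the encoding rules for the standard.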

1 reply

mkudukin Jun 7, 2024

I believe the recommended way to use mappedDatatype is to refer to a type from a well-known public standard. This helps the implementation:

Find the best language-specific type to represent the Orchestra datatype internally.
Know how to encode and decode the value.

For a binary encoding like NASDAQ OUCH, you might map it to ISO/IEC 11404, like this:

<datatype name="Alpha">
  <mappedDatatype standard="ISO11404" base="array" element="character" parameter="repertoire=US-ASCII" pattern="[A-Za-z]*"/>
</datatype>

However, ISO11404 doesn't cover everything needed for encoding, such as padding or null values. These can be specified with extra attributes in mappedDatatype, proposed to be added in another discussion.

Could you please elaborate on how the base type string is defined in your example using a custom standard? How would the implementation know what internal type it maps to?

<fixr:mappedDatatype standard="NASDAQ_OUCH" base="string" pattern="[A-Za-z]*"/>

One possible solution is to include a "baseStandard" attribute like this:

<fixr:repository>
  <fixr:datatypeStandards>
    <fixr:standard name="NASDAQ_OUCH" baseStandard="ISO11404">
      ...
    </fixr:standard>
  </fixr:datatypeStandards>
</fixr:repository>
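
A hedged Python sketch of how an implementation might use such a "baseStandard" attribute to find an internal type. The registry contents, the lookup key, and the `resolve_type` function are all hypothetical; the sketch only shows the fallback from a custom standard to the public standard it builds on.

```python
from typing import Optional

# Hypothetical registry: custom standard name -> the standard it builds on.
BASE_STANDARD = {"NASDAQ_OUCH": "ISO11404"}

# Hypothetical table of types an implementation already understands,
# keyed by (standard, base, element).
KNOWN_TYPES = {("ISO11404", "array", "character"): str}

def resolve_type(standard: str, base: str, element: Optional[str] = None) -> type:
    """Walk the baseStandard chain until a known type mapping is found."""
    seen = set()
    current: Optional[str] = standard
    while current is not None and current not in seen:
        seen.add(current)  # guards against a cyclic baseStandard declaration
        found = KNOWN_TYPES.get((current, base, element))
        if found is not None:
            return found
        current = BASE_STANDARD.get(current)
    raise LookupError(f"no type mapping for {standard!r} / {base!r} / {element!r}")

print(resolve_type("NASDAQ_OUCH", "array", "character"))  # <class 'str'>
```

A custom standard without a base, or with a base type no standard in the chain defines, ends in a `LookupError`, which corresponds to the question above about how the implementation would know the internal type.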

donmendelson · 2024-06-07T14:54:33Z

Maintainer

@francescoloconte, to answer your original question, the XML schema already allows other values than the ones listed in datatypeStandard_enum. See this type:

<xs:simpleType name="datatypeStandard_t">
  <xs:annotation>
    <xs:documentation>Extensible datatype standards</xs:documentation>
  </xs:annotation>
  <xs:union memberTypes="fixr:datatypeStandard_enum xs:string"/>
</xs:simpleType>

The actual value of datatype standard is a union of the explicit enumeration and xs:string. In short, while the enumeration suggests standard values, you can actually enter any string.
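
The effect of the union can be modeled in a few lines of Python. This is a simplified illustration, not a schema validator: the function names are hypothetical, and the enumerated values are the ones listed at the top of this thread.

```python
# Values listed in the thread as the current datatypeStandard_enum members.
SUGGESTED_STANDARDS = frozenset({"GPB", "ISO11404", "JSON", "SBE", "TAG_VALUE", "XML"})

def is_valid_standard(value: object) -> bool:
    """Model of the union with xs:string: any string is accepted."""
    return isinstance(value, str)

def is_suggested_standard(value: str) -> bool:
    """True only for the values the enumeration suggests."""
    return value in SUGGESTED_STANDARDS

for name in ("SBE", "MIT", "NASDAQ_OUCH"):
    print(name, is_valid_standard(name), is_suggested_standard(name))
# SBE True True
# MIT True False
# NASDAQ_OUCH True False
```

So a custom name such as "MIT" already passes schema validation; what the schema does not provide is a place to declare or configure that name.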

donmendelson · 2024-06-07T15:05:22Z

Maintainer

Discussion here seems to be about whether the type system is strictly at the encoding layer or possibly another layer. Also, some encodings such as JSON and XML are supported by schemas. I believe details about protocol layers are better addressed by the Orchestra Interfaces schema--that was its intention--rather than the Repository schema.

donmendelson · 2024-06-07T15:17:37Z

Maintainer

A suggestion about how to specify a standard using an existing enumeration: See IANA Media Types. Many common protocols such as XML and JSON are listed. We have already registered application/sbe and could register other FIX protocols as well.
