Element Identifiers (names, ids, etc.) [Source: Old Email Conversation] #213

francescoloconte · 2024-07-29T02:50:49Z

[This thread is reported here from an old email conversation]

Francesco Lo Conte - Sep 25, 2022

I think the standard has reached a point where manual editing of the XML spec is no longer an option. If we agree that going forward only tools will be used to handle the specifications, I would like to suggest that we stop identifying elements by name or by id (tag) and switch to UUIDs (which would be generated by tools and be invisible to users). I will add a note to the draft to this effect.

francescoloconte · 2024-07-29T02:52:05Z

[This thread is reported here from an old email conversation]

Donald Mendelson - Sep 26, 2022

Francesco,

Good comments; there's plenty to discuss. I just want to quickly answer your last bullet point.

We had a discussion about this early in the development of Orchestra that perhaps we should revisit. A member of the working group proposed that every element should be assigned an identifier with these characteristics:

- Globally unique: an element's identifier would be unique across all Orchestra users and versions.
- Persistent: every representation of an Orchestra file would use the same ID for an element forever. We currently use XML encoding, but we imagined others, such as OWL/RDF.

UUID would be one way to accomplish this. UUID can either be randomly assigned (statistically unique) or can be generated as a digest of some value. One advantage of random UUID is that it requires no registrar or assignment of namespaces.
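
The two UUID flavours mentioned above can be sketched with Python's standard `uuid` module. This is only an illustration: the namespace URL and the function names are made up for the example and are not part of Orchestra.

```python
import uuid

# Hypothetical namespace for illustration only; not defined by Orchestra.
ORCHESTRA_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/orchestra")


def random_id() -> uuid.UUID:
    """Random (version 4) UUID: no registrar needed, different on every call."""
    return uuid.uuid4()


def digest_id(value: str) -> uuid.UUID:
    """Name-based (version 5) UUID: a SHA-1 digest of namespace + value."""
    return uuid.uuid5(ORCHESTRA_NS, value)


print(random_id())           # changes on every run
print(digest_id("Account"))  # same value on every run for the same input
```

A random UUID has to be stored once it is assigned, whereas a digest-based UUID can be recomputed by anyone who knows the namespace and the input value.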

The proposal was to use Object Identifier (OID), a joint ISO/ITU standard that meets the requirements. Like a URL, it depends on a registrar that allocates high-level namespaces. Thus, each Orchestra user would have a high-level namespace, and they could assign their own element IDs within their namespace. An OID can be represented as either a URI or a dotted decimal format.

We actually had an attribute for OID in the Orchestra schema in one RC. However, we dropped it since we did not have consensus on a plan. We dropped back to just having an integer ID for every element; we already had such a tag for fields, and FIX repository always had a numeric ID for messages and components.

Looking forward to your thoughts on this. Perhaps we want to revive this effort for Orchestra v1.1.

Don

0 replies

francescoloconte · 2024-07-29T02:54:14Z

[This thread is reported here from an old email conversation]

Francesco Lo Conte - Feb 8, 2024

Hi Hanno,

I wanted to bring up again the conversation below that we were having with Don back in 2022, about unique identifiers in Orchestra.

Recently, as part of another conversation, I noticed the plan to support spaces in Orchestra elements’ names. I checked whether names are used as IDs, and they are. Currently, elements in Orchestra are identified by their ID + Name + Scenario. This has 2 shortcomings in my opinion:

- The “Name” can contain spaces and other characters that are usually not allowed in identifiers. In programming languages, for example, variable names cannot contain spaces, which avoids basic mistakes like “Var 1” being confused with “Var  1” (with 2 spaces).
- The other issue is that in Orchestra, elements’ IDs are only unique within elements of the same type, i.e., there can be a field with ID = 100 and a CodeSet with ID = 100. If both the Field and the CodeSet also have the same name (which they can), e.g., “Account”, their identifiers will be identical and indistinguishable from each other:
  - Field identifier: 100 + Account + base
  - CodeSet identifier: 100 + Account + base

I think we should make sure elements are unique within a single Orchestra file. I would not go as far as suggesting that elements should be unique across different Orchestra files or organizations, just within the same Orchestra file. My suggestion is to either use UUIDs, since they can be generated locally (i.e., without a registrar) and easily, or change the identifier to “ID + element_type + scenario”, so that the case above would become:

- Field identifier: 100 + Field + base
- CodeSet identifier: 100 + CodeSet + base
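
To make the collision concrete, here is a small sketch that models the two keys as tuples. The `Element` type and the function names are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple, Tuple


class Element(NamedTuple):
    element_type: str  # e.g. "Field" or "CodeSet"
    elem_id: int
    name: str
    scenario: str


def current_key(e: Element) -> Tuple[int, str, str]:
    # Key as described above: ID + Name + Scenario
    return (e.elem_id, e.name, e.scenario)


def proposed_key(e: Element) -> Tuple[int, str, str]:
    # Suggested alternative: ID + element type + scenario
    return (e.elem_id, e.element_type, e.scenario)


field = Element("Field", 100, "Account", "base")
code_set = Element("CodeSet", 100, "Account", "base")

print(current_key(field) == current_key(code_set))    # True: the keys collide
print(proposed_key(field) == proposed_key(code_set))  # False: the keys differ
```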

What do you think?

I appreciate this is not a simple question but I think it is important to improve it, to avoid significant confusion as adoption of the standard increases and as the size of the Orchestra files increases too.

0 replies

francescoloconte · 2024-07-29T02:55:05Z

[This thread is reported here from an old email conversation]

Donald Mendelson - Feb 8, 2024

Francesco,

A concatenated key for each message element could be encapsulated as a name-based UUID (type 3 or type 5). In the XML schema, it would have the advantage of having a single field as a key reference. It would be analogous to an identity column in a database. In fact, the UUID could replace the assignment of numeric tags to messages and components if it were based on the element name, which doesn't change. You would no longer need to keep track of the next number in sequence to assign to the next new message or component.

However, it would have the disadvantage of requiring a smart editor to generate the unique keys. You couldn't just type an Orchestra file in a plain editor or even an XML editor without help. That might be acceptable to those with sophisticated systems for editing Orchestra files, but would be a burden to other users.

Don
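
A concatenated key hashed into a type 5 UUID might look roughly like the sketch below. The namespace URL, the separator, and the choice of key parts are all assumptions made for the example; nothing here is defined by the Orchestra schema.

```python
import uuid

# Hypothetical namespace for one Orchestra file; the URL is a placeholder.
FILE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/orchestra/my-spec")

# Unit separator: keeps ("ab", "c") distinct from ("a", "bc") after joining.
SEPARATOR = "\x1f"


def key_uuid(*parts: str) -> uuid.UUID:
    """Hash a concatenated key into a name-based (version 5) UUID."""
    return uuid.uuid5(FILE_NAMESPACE, SEPARATOR.join(parts))


field_key = key_uuid("Field", "Account", "base")
code_set_key = key_uuid("CodeSet", "Account", "base")

print(field_key != code_set_key)                          # True
print(field_key == key_uuid("Field", "Account", "base"))  # True
```

Because the result is a pure function of the key parts, a tool can regenerate it at any time, but a plain editor cannot produce it by hand, which is the burden described above.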

0 replies

francescoloconte · 2024-07-29T02:56:47Z

[This thread is reported here from an old email conversation]

Francesco Lo Conte - Feb 8, 2024

Thank you, Don.

And the concatenated key would also need to contain the element type, right? To avoid the issue I pointed out below. Correct?

The key would be something like: type + name + scenario

An example below.

0 replies

francescoloconte · 2024-07-29T02:58:11Z

[This thread is reported here from an old email conversation]

Lisa Taikitsadaporn - Feb 8, 2024

With my BA hat on, if UUID is used it MUST NOT be the only way to reference the object because it is not human-readable. For example, to manage scenarios we need human-readable scenario identifiers to understand what we're working with.

The other danger of a UUID is the perception that it could be globally unique.

Last point: if a scenario name is changed (because it could be), is the expectation that the UUID would also need to change?

I'm just going to play devil's advocate ;-)
Lisa
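
On the renaming question, the answer depends on how the UUID is produced. The sketch below contrasts a UUID that is stored once with one that is derived from the current name; the `Scenario` class and the namespace are hypothetical and only serve the illustration.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical namespace; not defined by Orchestra.
NS = uuid.uuid5(uuid.NAMESPACE_DNS, "orchestra.example.org")


@dataclass
class Scenario:
    name: str
    # Assigned once at creation and stored; unaffected by later renames.
    stored_id: uuid.UUID = field(default_factory=uuid.uuid4)

    @property
    def derived_id(self) -> uuid.UUID:
        # Recomputed from the current name, so it changes on rename.
        return uuid.uuid5(NS, self.name)


s = Scenario("base")
stored_before, derived_before = s.stored_id, s.derived_id
s.name = "baseline"

print(s.stored_id == stored_before)    # True: the stored UUID survives the rename
print(s.derived_id == derived_before)  # False: the name-based UUID changed
```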

0 replies

francescoloconte · 2024-07-29T02:58:48Z

[This thread is reported here from an old email conversation]

Francesco Lo Conte - Feb 8, 2024

Thank you, Lisa, for sharing your point of view.

Yours are all valid points.

As I mentioned in my previous email, my aim is to fix the fact that current element identifiers are not unique, and to express my disagreement with allowing spaces inside element names, because the name is part of the ID. I see no value in allowing spaces but many potential issues.

0 replies

francescoloconte · 2024-07-29T02:59:15Z

[This thread is reported here from an old email conversation]

Lisa Taikitsadaporn - Feb 8, 2024

I think I missed something... what was the rationale for having a space in an element name?

1 reply

francescoloconte Jul 29, 2024
Author

[This thread is reported here from an old email conversation]

Donald Mendelson - Feb 9, 2024

It was specifically requested. See #193

francescoloconte · 2024-07-29T03:01:49Z

[This thread is reported here from an old email conversation]

Francesco Lo Conte - Feb 9, 2024

Thank you, Don.

We identified this restriction when using Orchestra for binary protocols. Here is a link to the technical proposal.

As per our GitHub comment, “for a Message, both msgType and name are used as unique identifiers. This is the reason for the restrictions on Name_t. Would it make sense to have msgType as the only identifier for a message, and relax the name attribute to allow what would be a descriptive name for a message?”

If the restriction on the message name is relaxed to include spaces (as per our request), then the element ID (oiGrp) should not use the name as one of its constituents, which is why the GitHub ticket suggests making msgType the only identifier for a message.

For additional context, this is from Orchestra v1.1:

“Naming rules
Since Orchestra supports both FIX and non-FIX protocols, naming rules are relaxed in the XML schema. FIX and other style rules should be enforced by other means, such as a validator application. The only restriction is that names are of XML schema datatype "token", which trims leading and trailing spaces and disallows some non-printable characters like line feeds and carriage returns. Tokens are limited to 64 characters.”

And this is the XSD in question:

1 reply

mkudukin Jul 30, 2024

This issue was moved from the Binary Protocols Support Proposal to the separate issue #193.

francescoloconte · 2024-07-29T03:02:57Z

[This thread is reported here from an old email conversation]

Hanno Klein - Feb 9, 2024

Hi All,

Let me first share my view on spaces in names. I absolutely agree that this may cause various problems if a name is used for anything other than display purposes. FIX has a convention not to use spaces in the names of messages, groups, components, fields, code sets, and codes. I would also discourage anybody from using spaces in scenario names. Spaces should be limited to annotation elements.

I did some digging and the actual change goes back to #118 where Yuval Cohen requested a relaxation of Name_t to be able to include “.” in the names. We went “all the way” and changed it from a string restricted by the pattern “([A-Z]|[a-z])([0-9]|[A-Z]|[a-z]|_)*” to xs:token, which not only allows punctuation characters but also single spaces in between words. I think Francesco’s feedback from a practitioner’s viewpoint is very valid and that we went too far. That is exactly the reason why we have Release Candidates to get people to start implementing and detecting serious issues with the design. Orchestra v1.1 RC2 does not have to be backward compatible with RC1.
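
The difference between the old pattern and xs:token can be illustrated with a short sketch. The regular expression is the one quoted above; the whitespace handling is only an approximation of what an XML Schema processor does for xs:token, and the function names are made up for the example.

```python
import re

# Pattern quoted above for the former Name_t restriction.
OLD_NAME_T = re.compile(r"([A-Z]|[a-z])([0-9]|[A-Z]|[a-z]|_)*")

# XML whitespace characters: space, tab, line feed, carriage return.
XML_WHITESPACE = re.compile(r"[ \t\n\r]+")


def matches_old_pattern(name: str) -> bool:
    """True if the whole name satisfies the former Name_t pattern."""
    return OLD_NAME_T.fullmatch(name) is not None


def collapse_like_token(value: str) -> str:
    """Approximate xs:token whitespace handling: collapse runs, trim the ends."""
    return XML_WHITESPACE.sub(" ", value).strip(" ")


for candidate in ("HandlInst", "Order.Entry", "New Order"):
    print(candidate, matches_old_pattern(candidate))

print(repr(collapse_like_token("  New   Order\n")))  # 'New Order'
```

Only the first candidate passes the old pattern, while all three are acceptable as xs:token values once whitespace is collapsed.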

There is a datatype xs:NMTOKEN (https://www.oreilly.com/library/view/xml-schema/0596002521/re84.html) that does not allow spaces. It does not require the XML processor to remove line feeds, carriage returns, tabs, leading and trailing spaces, and multiple spaces from the Orchestra XML file. The downside is that it exists in XML Schema only for compatibility with DTDs, the predecessor of XML Schema.

I would hence prefer to keep xs:token and add a pattern to restrict it to what xs:NMTOKEN allows. @don, do you have a view on the options?
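
As a rough idea of what such a restriction would accept, here is a simplified, ASCII-only approximation of the characters xs:NMTOKEN allows. The real definition also covers many non-ASCII letters and digits, so treat the pattern and the function name as assumptions for illustration rather than a proposed schema pattern.

```python
import re

# ASCII-only approximation of XML name characters: letters, digits, ".", "_", ":", "-".
NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:-]+")


def is_nmtoken_like(name: str) -> bool:
    """True if the name is non-empty and uses only the approximated characters."""
    return NMTOKEN_ASCII.fullmatch(name) is not None


for candidate in ("HandlInst", "Order.Entry", "New Order"):
    print(candidate, is_nmtoken_like(candidate))
```

Under this approximation, a name with a "." is accepted, which covers the original request, while a name with a space is rejected.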

The second topic (uniqueness) is more complex and best discussed through calls. The current definition says that the scope of identifier uniqueness is within the same element type and requires additional attributes such as a scenario name in Orchestra v1.0. I understand that you are looking for a simplification of uniqueness to ease technical implementations of Orchestra. Just a general comment that I think it should be possible to change the name of an element but not the ID. I am open to dropping names from uniqueness requirements in Orchestra. FIX would still be more restrictive and never define two messages/groups/components/fields/code sets/codes/scenarios with the same name but different IDs.

One comment on the current process. We still have a backlog of RC1 issues deferred to RC2 and the new concept for scenarios. On top of that we have multiple new issues from Francesco and myself. At least I do not have the bandwidth to work on so many issues in parallel with the proper focus they deserve. I do not mind discussing selected issues in a smaller group via email or GitHub to shape an opinion but they all need to end up in one of the working groups of the Orchestra Subcommittee. Complex issues such as the scenario enhancements require additional deliverables for the working group.

It would be good to have a list of items for discussion by the working group so that we have something like a standing agenda and avoid moving from one issue to another without resolving any. Only the resolutions can be applied to the RC2 XML schema and spec.

Regards,
Hanno.

0 replies

martinswanson · 2024-07-29T12:05:29Z

Regarding the discussion on id schemes in Orchestra, I agree with Lisa that this needs to be human-readable, and to support the notion of semantic equivalence across base and derived specifications (e.g. where you are using a reference spec to maintain customisations). Is it worth considering how namespaces are defined in existing specifications (like RDF/OWL) and following the same approach?

As well as thinking about how we define a globally unique namespace for Orchestra elements, we also need to think about how Orchestra ids map to ids used in the encoding layer.

Currently, it is assumed that the Orchestra id maps 1:1 to the tag number used in tag-value encodings (at least for FIX). Recently, I realised this is not the case for other tag-value encodings. For example, the TagWire encoding used by JSE (see Volume PT02 – Post-trade EMAPI Clearing) uses tag values that are not globally unique, but rather unique within the context of a message / component.

Intuitively, it makes sense that the Orchestra ids are conceptually separate from the tags/ids used in the encoding layer, not least because some encodings use names rather than ids, and schema-less encodings have neither. It also suggests that Tablature should not treat the "Id" and "Tag" table column names as equivalent, but rather as distinct concepts (we could have a convention to equate them for certain encodings like FIX tag-value, if appropriate).
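
One way to picture the separation is a per-encoding lookup table keyed by (context, tag), where a context of None means the tag is unique across the whole encoding. This is a sketch only: the table contents, message names, and the `orchestra_id` function are invented for illustration and do not describe any real encoding.

```python
from typing import Dict, Optional, Tuple

# (context, tag); a context of None means the tag is globally unique.
TagKey = Tuple[Optional[str], int]

# Encoding with globally unique tags (FIX tag-value style).
FIX_TAGVALUE: Dict[TagKey, str] = {
    (None, 21): "HandlInst",
    (None, 73): "NoOrders",
}

# Encoding whose tags are only unique within a message or component.
# All values here are made up.
CONTEXT_SCOPED: Dict[TagKey, str] = {
    ("NewOrder", 1): "Account",
    ("NewOrder", 2): "Side",
    ("CancelOrder", 1): "OrderID",
}


def orchestra_id(mapping: Dict[TagKey, str], tag: int,
                 context: Optional[str] = None) -> Optional[str]:
    """Resolve an encoding-level tag to an Orchestra id, preferring the scoped entry."""
    if (context, tag) in mapping:
        return mapping[(context, tag)]
    return mapping.get((None, tag))


print(orchestra_id(FIX_TAGVALUE, 21))
print(orchestra_id(CONTEXT_SCOPED, 1, "NewOrder"))
print(orchestra_id(CONTEXT_SCOPED, 1, "CancelOrder"))
```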

I think there is a similar requirement for names. For example, JSON encodings use a lowerCamelCase naming convention, so there needs to be a way to map the Orchestra element names to those used in different encodings (abbreviated names for XML encodings are another example of this).
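
A name mapping could combine a default rule with explicit overrides, as in the sketch below. It assumes the default rule is simply lowercasing the first character; the override table and the `json_name` function are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical overrides for names where the default rule reads badly.
JSON_NAME_OVERRIDES: Dict[str, str] = {"IOIID": "ioiId"}


def json_name(orchestra_name: str,
              overrides: Optional[Dict[str, str]] = None) -> str:
    """Map an Orchestra element name to a lowerCamelCase encoding-level name."""
    table = JSON_NAME_OVERRIDES if overrides is None else overrides
    if orchestra_name in table:
        return table[orchestra_name]
    # Default rule (an assumption): lowercase only the first character.
    return orchestra_name[:1].lower() + orchestra_name[1:]


print(json_name("HandlInst"))  # handlInst
print(json_name("IOIID"))      # ioiId
```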

Summary of Requirements

- Globally unique namespaces for Orchestra elements across a base specification and its derivatives
- Mapping of Orchestra ids to encoding-level ids
- Mapping of Orchestra names to encoding-level names

0 replies

mkudukin · 2024-07-30T13:39:06Z

I want to suggest another option of having unique human-readable identifiers in Orchestra by switching to alphanumeric identifiers. This would give users the freedom to choose almost any type of unique identifier:

- Users who prefer sophisticated tools can still use integers or switch to UUIDs.
- Users who prefer simplicity, like using plain XML editors, can choose unique short names, like in OpenAPI. This would offer excellent readability for both humans and machines.

Users could mix various identifier types to include business value or encoding-specific information. For example, in FIX, the id could represent the message type for messages, the tag for fields (like it is now), or a short name for code sets or scenarios. However, this doesn't seem to be a good practice. I agree with Martin that Orchestra identifiers should ideally be independent from the encoding.

This approach could help address some issues we've encountered:

- The name attribute wouldn't need to be unique or restricted and could serve as an element's display name ([repository schema] Some protocols's message names are not supported by Name_t restrictions. #193).
- There would be no need for the name and scenario (scenario name) attributes in references, as the id and scenarioId would be descriptive enough.
  - The scenario inheritance and codeSet/datatype scenario reference proposals introduce new *ScenarioId attributes. To align with what Orchestra has now, these would need *Scenario (name) counterparts. Such convenience attributes wouldn't be needed if the identifiers were convenient by themselves.

Example (FIX)

<fixr:datatype id="char"/>

<fixr:codeSet id="HandlInstCodeSet" type="char"/>

<fixr:field id="21" name="HandlInst" type="HandlInstCodeSet"/>

<fixr:group id="ListOrdGrp" category="ProgramTrading">
	<fixr:numInGroup id="73"/>
	<fixr:fieldRef id="21"/>
</fixr:group>

<fixr:message id="E" name="NewOrderList" category="ProgramTrading">
	<fixr:structure>
		<fixr:groupRef id="ListOrdGrp"/>
	</fixr:structure>
</fixr:message>

Example (other protocol)

<fixr:component id="MessageHeader" name="Message Header">
	<fixr:fieldRef id="MessageLength"/>
	<fixr:fieldRef id="MessageType"/>
	<!-- ... -->
</fixr:component>

<fixr:component id="NewOrderTransaction" name="New Order Message Transaction">
	<!-- ... -->
</fixr:component>

<fixr:message category="OrderEntryMessages" id="1111" name="New Order">
	<fixr:structure>
		<fixr:componentRef id="MessageHeader"/>
		<fixr:componentRef id="NewOrderTransaction"/>
	</fixr:structure>
</fixr:message>
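
With alphanumeric ids, a plain script can check that ids are unique per element type and that every reference resolves. The sketch below does this with Python's standard `xml.etree.ElementTree`. The namespace URI, the `repository` wrapper, and the two `field` definitions are placeholders added so the snippet parses on its own; they are not taken from the Orchestra schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = "urn:example:fixr"  # placeholder namespace URI for this sketch

DEFINITIONS = {"datatype", "codeSet", "field", "group", "component", "message"}
REF_TARGETS = {"fieldRef": "field", "groupRef": "group", "componentRef": "component"}

DOC = f"""<fixr:repository xmlns:fixr="{NS}">
  <fixr:field id="MessageLength"/>
  <fixr:field id="MessageType"/>
  <fixr:component id="MessageHeader" name="Message Header">
    <fixr:fieldRef id="MessageLength"/>
    <fixr:fieldRef id="MessageType"/>
  </fixr:component>
  <fixr:component id="NewOrderTransaction" name="New Order Message Transaction"/>
  <fixr:message category="OrderEntryMessages" id="1111" name="New Order">
    <fixr:structure>
      <fixr:componentRef id="MessageHeader"/>
      <fixr:componentRef id="NewOrderTransaction"/>
    </fixr:structure>
  </fixr:message>
</fixr:repository>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def check_ids(xml_text: str):
    """Return (duplicate definitions, dangling references) as sorted lists."""
    root = ET.fromstring(xml_text)
    defined = Counter()
    refs = []
    for el in root.iter():
        kind, ident = local_name(el.tag), el.get("id")
        if ident is None:
            continue
        if kind in DEFINITIONS:
            defined[(kind, ident)] += 1
        elif kind in REF_TARGETS:
            refs.append((REF_TARGETS[kind], ident))
    duplicates = sorted(key for key, count in defined.items() if count > 1)
    dangling = sorted({ref for ref in refs if ref not in defined})
    return duplicates, dangling


print(check_ids(DOC))  # ([], [])
```

Here uniqueness is checked per (element type, id) pair, which matches the current scope of ids; a stricter file-wide rule would count ids alone.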

0 replies
