Element Identifiers (names, ids, etc.) [Source: Old Email Conversation] #213
Replies: 11 comments 2 replies
-
[This thread is reported here from an old email conversation] Donald Mendelson - Sep 26, 2022 Francesco, Good comments; there's plenty to discuss. I just want to quickly answer your last bullet point. We had a discussion about this early in the development of Orchestra that perhaps we should revisit. A member of the working group proposed that every element should be assigned an identifier with these characteristics:
UUID would be one way to accomplish this. UUID can either be randomly assigned (statistically unique) or can be generated as a digest of some value. One advantage of random UUID is that it requires no registrar or assignment of namespaces. The proposal was to use Object Identifier (OID), a joint ISO/ITU standard that meets the requirements. Like URL, it depends on a registrar that allocates high-level namespaces. Thus, each Orchestra user would have a high-level namespace, and they could assign their own element IDs within their namespace. An OID can be represented as either a URI or a dotted decimal format. We actually had an attribute for OID in the Orchestra schema in one RC. However, we dropped it since we did not have consensus on a plan. We dropped back to just having an integer ID for every element; we already had such a tag for fields, and FIX repository always had a numeric ID for messages and components. Looking forward to your thoughts on this. Perhaps we want to revive this effort for Orchestra v1.1. Don
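To make the difference between the two UUID flavours concrete, here is a minimal Python sketch using only the standard `uuid` module. The namespace value `orchestra.example.org` is a made-up placeholder, not something defined by Orchestra; each organization would choose its own.

```python
import uuid

# Hypothetical namespace for one Orchestra user (assumption, not part of
# any Orchestra specification).
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "orchestra.example.org")


def random_id() -> uuid.UUID:
    """Random (version 4) UUID: no registrar needed, new value on every call."""
    return uuid.uuid4()


def digest_id(value: str) -> uuid.UUID:
    """Name-based (version 5, SHA-1) UUID: the same input always yields the same id."""
    return uuid.uuid5(EXAMPLE_NAMESPACE, value)
```

A random id has to be stored alongside the element, whereas a digest id can be recomputed by any tool that knows the namespace and the input value.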
-
[This thread is reported here from an old email conversation] Francesco Lo Conte - Feb 8, 2024 Hi Hanno, I wanted to bring up again the conversation below that we were having with Don back in 2022, about unique identifiers in Orchestra. Recently, as part of another conversation, I noticed the plan to support spaces in Orchestra elements’ names. I checked whether names are used as IDs, and they are. Currently, elements in Orchestra are identified by their ID + Name + Scenario. This has 2 shortcomings in my opinion:
I think we should make sure elements are unique within a single Orchestra file. I would not go as far as suggesting that elements should be unique across different Orchestra files or organizations, just within the same Orchestra file. My suggestions are to either use UUID, since they can be generated locally (i.e., without a registrar) and easily, or change the identifier to be “ID + element_type + scenario”, such that the cases above would be:
What do you think? I appreciate this is not a simple question, but I think it is important to improve it, to avoid significant confusion as adoption of the standard increases and as the size of the Orchestra files increases too.
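As an illustration of the “ID + element_type + scenario” suggestion, the sketch below scans one file for duplicate keys. Everything in `SAMPLE` is invented for the example, including the `urn:example:orchestra` namespace, and the fallback scenario name `base` is an assumption about how a missing scenario attribute should be treated.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample: a field and a component share id 21 (allowed under the
# proposed key), while one message is defined twice (a real duplicate).
SAMPLE = """
<repository xmlns="urn:example:orchestra">
  <field id="21" name="Foo"/>
  <component id="21" name="Foo"/>
  <message id="21" name="Bar" scenario="base"/>
  <message id="21" name="Bar" scenario="base"/>
</repository>
"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def duplicate_keys(xml_text: str):
    """Return every (element_type, id, scenario) key that occurs more than once."""
    root = ET.fromstring(xml_text)
    keys = Counter(
        (local_name(el.tag), el.get("id"), el.get("scenario", "base"))
        for el in root.iter()
        # Skip references such as fieldRef; they point at ids, they do not define them.
        if el.get("id") is not None and not local_name(el.tag).endswith("Ref")
    )
    return sorted(key for key, count in keys.items() if count > 1)
```

With this key, the field and the component no longer collide even though they share an id and a name; only the repeated message is reported.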
-
[This thread is reported here from an old email conversation] Donald Mendelson - Feb 8, 2024 Francesco, A concatenated key for each message element could be encapsulated as UUID type 3 or type 5. In the XML schema, it would have the advantage of having a single field as a key reference. It would be analogous to an identity column in a database. In fact, the UUID could replace the assignment of numeric tags to messages and components if it were based on the element name, which doesn't change. You would no longer need to keep track of the next number in sequence to assign to the next new message or component. However, it would have the disadvantage of requiring a smart editor to generate the unique keys. You couldn't just type an Orchestra file in a plain editor or even an XML editor without help. That might be acceptable to those with sophisticated systems for editing Orchestra files, but would be a burden to other users. Don
-
[This thread is reported here from an old email conversation] Francesco Lo Conte - Feb 8, 2024 Thank you, Don. And the concatenated key would also need to contain the element type, right? To avoid the issue I pointed out below. Correct? The key would be something like: type + name + scenario An example below.
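One possible reading of “type + name + scenario” as a type 5 UUID is sketched here in Python. The namespace, the separator character, and the default scenario `base` are all assumptions made for illustration; the thread does not fix any of them.

```python
import uuid

# Hypothetical namespace (assumption); see the standard uuid module for uuid5.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "orchestra.example.org")


def element_key(element_type: str, name: str, scenario: str = "base") -> uuid.UUID:
    """Derive a deterministic key from element type, name, and scenario."""
    # The unit separator keeps ("a", "bc") distinct from ("ab", "c").
    composite = "\x1f".join((element_type, name, scenario))
    return uuid.uuid5(EXAMPLE_NAMESPACE, composite)
```

Including the element type means `element_key("field", "Foo")` and `element_key("component", "Foo")` differ. Note the flip side: because the key is a digest of its parts, renaming an element or its scenario produces a different key.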
-
[This thread is reported here from an old email conversation] Lisa Taikitsadaporn - Feb 8, 2024 With my BA hat on, if UUID is used it MUST NOT be the only way to reference the object because it is not humanly readable. For example, to manage scenarios we need to see human readable scenario identifiers to understand what we're working with and to manage the scenarios. The other danger of a UUID is the perception that it could be globally unique. Last point: if a scenario name is changed (because it could) is the expectation that the UUID is going to also need to change? I'm just going to play devil's advocate ;-)
-
[This thread is reported here from an old email conversation] Francesco Lo Conte - Feb 8, 2024 Thank you, Lisa, for sharing your point of view. Yours are all valid points. As I mentioned in my previous email, my suggestion is to try and fix the fact that current element identifiers are not unique, and share my disagreement with spaces inside elements’ names because the name is part of the ID. I see no value in allowing spaces but many potential issues.
-
[This thread is reported here from an old email conversation] Lisa Taikitsadaporn - Feb 8, 2024 I think I missed something... what was the rationale for having a space in an element name?
-
[This thread is reported here from an old email conversation] Francesco Lo Conte - Feb 9, 2024 Thank you, Don. We identified this restriction when using Orchestra for binary protocols. Here is a link to the technical proposal. As from our GitHub comment, “for a Message, both msgType and name are used as unique identifiers. This is the reason for the restrictions on Name_t. Would it make sense to have msgType as the only identifier for a message, and relax the name attribute to allow what would be a descriptive name for a message”. If the restriction on the message name is relaxed to include spaces (as from our request), then the element ID (oiGrp) should not use it as one of its constituents. In the GitHub ticket we suggest: “Would it make sense to have msgType as the only identifier for a message, and relax the name attribute to allow what would be a descriptive name for a message?”. For additional context, this is from Orchestra v1.1: “Naming rules And this is the XSD in question:
-
[This thread is reported here from an old email conversation] Hanno Klein - Feb 9, 2024 Hi All, let me first share my view on spaces in names. I absolutely agree that this may cause various problems if a name is used for anything other than display purposes. FIX has a convention not to use spaces in the names of messages, groups, components, fields, code sets, and codes. I would also discourage anybody from using spaces in scenario names. Spaces should be limited to annotation elements.
I did some digging and the actual change goes back to #118 where Yuval Cohen requested a relaxation of Name_t to be able to include “.” in the names. We went “all the way” and changed it from a string restricted by the pattern “([A-Z]|[a-z])([0-9]|[A-Z]|[a-z]|_)*” to xs:token, which not only allows punctuation characters but also single spaces in between words. I think Francesco’s feedback from a practitioner’s viewpoint is very valid and that we went too far. That is exactly the reason why we have Release Candidates to get people to start implementing and detecting serious issues with the design. Orchestra v1.1 RC2 does not have to be backward compatible with RC1.
There is a datatype xs:NMTOKEN (https://www.oreilly.com/library/view/xml-schema/0596002521/re84.html) that does not allow spaces. It does not require the XML processor to remove line feeds, carriage returns, tabs, leading and trailing spaces, and multiple spaces from the Orchestra XML file. The downside is that it exists in XML Schema only for compatibility with DTDs, the predecessor of XML Schema. I would hence prefer to keep xs:token and add a pattern to restrict it to what xs:NMTOKEN allows. @don, do you have a view on the options?
The second topic (uniqueness) is more complex and best discussed through calls. The current definition says that the scope of identifier uniqueness is within the same element type and requires additional attributes such as a scenario name in Orchestra v1.0.
I understand that you are looking for a simplification of uniqueness to ease technical implementations of Orchestra. Just a general comment that I think it should be possible to change the name of an element but not the ID. I am open to dropping names from uniqueness requirements in Orchestra. FIX would still be more restrictive and never define two messages/groups/components/fields/code sets/codes/scenarios with the same name but different IDs.
One comment on the current process. We still have a backlog of RC1 issues deferred to RC2 and the new concept for scenarios. On top of that we have multiple new issues from Francesco and myself. At least I do not have the bandwidth to work on so many issues in parallel with the proper focus they deserve. I do not mind discussing selected issues in a smaller group via email or GitHub to shape an opinion but they all need to end up in one of the working groups of the Orchestra Subcommittee. Complex issues such as the scenario enhancements require additional deliverables for the working group. It would be good to have a list of items for discussion by the working group so that we have something like a standing agenda and avoid moving from one issue to another without resolving any. Only the resolutions can be applied to the RC2 XML schema and spec. Regards,
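The three naming options mentioned in this comment (the original Name_t pattern, xs:token, and an xs:NMTOKEN-like restriction) can be compared with a small Python sketch. Only the first pattern is taken from the thread. The other two are rough approximations written for this example: `TOKEN_LIKE` models the whitespace rules of xs:token, and `NMTOKEN_ASCII` covers only the ASCII subset of the characters xs:NMTOKEN permits.

```python
import re

# Pattern quoted in the thread for the original Name_t.
OLD_NAME_T = re.compile(r"([A-Z]|[a-z])([0-9]|[A-Z]|[a-z]|_)*")

# Approximation of xs:token: no tabs or line breaks, no leading or trailing
# space, and no runs of two or more spaces.
TOKEN_LIKE = re.compile(r"[^ \t\r\n]+( [^ \t\r\n]+)*")

# ASCII-only approximation of xs:NMTOKEN (the real type also allows many
# non-ASCII name characters).
NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:\-]+")


def classify(name: str) -> dict:
    """Report which of the three naming rules accept the given name."""
    return {
        "old_name_t": OLD_NAME_T.fullmatch(name) is not None,
        "token": TOKEN_LIKE.fullmatch(name) is not None,
        "nmtoken_ascii": NMTOKEN_ASCII.fullmatch(name) is not None,
    }
```

For instance, `classify("Order.Entry")` is rejected by the old pattern but accepted by the other two, while `classify("New Order")` passes only the xs:token approximation, which is the case the thread is worried about.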
-
Regarding the discussion on id schemes in Orchestra, I agree with Lisa that this needs to be human-readable, and to support the notion of semantic equivalence across base and derived specifications (e.g. where you are using a reference spec to maintain customisations). Is it worth considering how namespaces are defined in existing specifications (like RDF/OWL) and following the same approach?
As well as thinking about how we define a globally unique namespace for Orchestra elements, we also need to think about how Orchestra ids map to ids used in the encoding layer. Currently, it is assumed that the Orchestra id maps 1:1 to the tag number used in tag-value encodings (at least for FIX). Recently, I realised this is not the case for other tag-value encodings. For example, the TagWire encoding used by JSE (see Volume PT02 – Post-trade EMAPI Clearing) uses tag values that are not globally unique, but rather unique within the context of a message / component. Intuitively, it makes sense that the Orchestra ids are separate conceptually from the tags/ids used in the encoding layer, not least because some encodings use names rather than ids, and schema-less encodings have neither. It also suggests that Tablature should not treat the "Id" and "Tag" table column names as equivalent, but rather as distinct concepts (we could have a convention to equate them for certain encodings like FIX tag-value, if appropriate).
I think there is a similar requirement for names. For example, JSON encodings use a lowerCamelCase naming convention, so there needs to be a way to map the Orchestra element names to those used in different encodings (abbreviated names for XML encodings are another example of this).
Summary of Requirements
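To illustrate keeping Orchestra ids and names separate from what goes on the wire, as argued in this comment, here is a hedged Python sketch. `EncodingMap` and `lower_camel` are hypothetical helpers invented for this example, not part of Orchestra or Tablature, and the tag numbers used with them are made up apart from FIX tag 21 for HandlInst, which appears elsewhere in this thread.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EncodingMap:
    """Hypothetical per-encoding lookup from Orchestra element ids to wire tags."""

    tags: dict = field(default_factory=dict)

    def set_tag(self, element_id: str, wire_tag: int, context: Optional[str] = None) -> None:
        # context=None means the tag is valid everywhere (FIX tag-value style);
        # a message or component name scopes the tag to that context only.
        self.tags[(context, element_id)] = wire_tag

    def wire_tag(self, element_id: str, context: Optional[str] = None) -> Optional[int]:
        # Prefer a context-specific tag and fall back to a global one.
        if (context, element_id) in self.tags:
            return self.tags[(context, element_id)]
        return self.tags.get((None, element_id))


def lower_camel(name: str) -> str:
    """Naive lowerCamelCase mapping: lower-case the first character only."""
    return name[:1].lower() + name[1:]
```

A FIX-style map would register each tag once without a context, while an encoding whose tags are only unique per message would register the same element id under several contexts with different tags. `lower_camel("HandlInst")` gives `handlInst`; real JSON naming rules may need more than this simple rule.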
-
I want to suggest another option of having unique human-readable identifiers in Orchestra by switching to alphanumeric identifiers. This would give users the freedom to choose almost any type of unique identifier:
Users could mix various identifier types to include business value or encoding-specific information. For example, in FIX, the id could represent the message type for messages, the tag for fields (like it is now), or a short name for code sets or scenarios. However, this doesn't seem to be a good practice. I agree with Martin that Orchestra identifiers should ideally be independent from the encoding. This approach could help address some issues we've encountered:
Example (FIX)
<fixr:datatype id="char"/>
<fixr:codeSet id="HandlInstCodeSet" type="char"/>
<fixr:field id="21" name="HandlInst" type="HandlInstCodeSet"/>
<fixr:group id="ListOrdGrp" category="ProgramTrading">
  <fixr:numInGroup id="73"/>
  <fixr:fieldRef id="21"/>
</fixr:group>
<fixr:message id="E" name="NewOrderList" category="ProgramTrading">
  <fixr:structure>
    <fixr:groupRef id="ListOrdGrp"/>
  </fixr:structure>
</fixr:message>

Example (other protocol)
<fixr:component id="MessageHeader" name="Message Header">
  <fixr:fieldRef id="MessageLength"/>
  <fixr:fieldRef id="MessageType"/>
  <!-- ... -->
</fixr:component>
<fixr:component id="NewOrderTransaction" name="New Order Message Transaction">
  <!-- ... -->
</fixr:component>
<fixr:message category="OrderEntryMessages" id="1111" name="New Order">
  <fixr:structure>
    <fixr:componentRef id="MessageHeader"/>
    <fixr:componentRef id="NewOrderTransaction"/>
  </fixr:structure>
</fixr:message>
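With alphanumeric ids, references are resolved by id alone, so a tool can check them without looking at names. The Python sketch below shows one way to do that under stated assumptions: the `urn:example:orchestra` namespace is a placeholder, `SAMPLE` is a cut-down variant of the second example in which one component is deliberately left undefined, and the `REF_TARGETS` table is a guess at which reference element points at which definition.

```python
import xml.etree.ElementTree as ET

# Cut-down sample with a placeholder namespace; NewOrderTransaction is
# referenced but intentionally not defined.
SAMPLE = """
<fixr:repository xmlns:fixr="urn:example:orchestra">
  <fixr:component id="MessageHeader" name="Message Header"/>
  <fixr:message id="1111" name="New Order">
    <fixr:structure>
      <fixr:componentRef id="MessageHeader"/>
      <fixr:componentRef id="NewOrderTransaction"/>
    </fixr:structure>
  </fixr:message>
</fixr:repository>
"""

# Assumed mapping from reference elements to the element type they target.
REF_TARGETS = {"fieldRef": "field", "groupRef": "group", "componentRef": "component"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def unresolved_refs(xml_text: str):
    """List (target_type, id) pairs that are referenced but never defined."""
    root = ET.fromstring(xml_text)
    defined = {
        (local_name(el.tag), el.get("id"))
        for el in root.iter()
        if el.get("id") is not None
    }
    missing = []
    for el in root.iter():
        target = REF_TARGETS.get(local_name(el.tag))
        if target is not None and (target, el.get("id")) not in defined:
            missing.append((target, el.get("id")))
    return missing
```

Because only the id carries identity here, descriptive names with spaces such as "Message Header" do not affect the lookup.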
-
[This thread is reported here from an old email conversation]
Francesco Lo Conte - Sep 25, 2022
I think the standard has reached a point where manual editing of the XML spec is no longer an option. If we agree that going forward only tools will be used to handle the specifications, I would like to suggest that we switch from identifying elements by name or by id (tag), and instead we switch to UUID (which will be generated by tools and be invisible to users). I will add a note to the draft to this effect.