CDM - Serialization Target State #2180
-
A few thoughts from my side on this... I would version against the model and not down to the entity level. I think trying to go too granular will make life very, very difficult. Also, I don't think we'll ever be in a position where we'd want to deliver pieces of the model separately, e.g. deliver a new version of a single entity on its own.

My suggestion would also be to keep using JSON as the serialization method for the CDM; it would be too large an undertaking to change it now. Having said that, JSON was really designed for use with web applications, so are we using it correctly with the CDM? It fits where we're making API calls (like calling an event function such as Create_BusinessEvent), but does it really fit where we're describing a trade object? So for API use cases JSON works well, such as input to a function or receiving results back from one. For messaging it also kind of works, although these have generally been expressed as XML. As data dictionaries, again, XML is used predominantly. This implies that we need to consider what the actual use cases of the CDM are if we really want to determine what its native serialization format should be. This type of decision is more something people on the CDM Steering Working Group should get involved in, I feel.

One other point on versioning which you mention: conversion/translation between versions. This is an interesting comment. Let's assume that we start using the CDM on version 4.0.0 and we store the JSON created by the CDM from that version. Time moves on, we're now on version 10.0.0, and the JSON has changed considerably. If I have been storing the CDM JSON from version 4.0.0, should I expect backwards compatibility between versions 10.0.0 and 4.0.0? Or will scripts/documentation/functions be provided to facilitate the conversion of one version to another? In this scenario is JSON still our preferred format? Sorry Dan, more questions than answers here I think 😂
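To make that conversion question concrete, one pattern the group could consider is a chain of per-version migration scripts applied in sequence to bring stored JSON up to the current model. A minimal sketch, where the version steps and the field rename are entirely hypothetical:

```python
# Hypothetical chain of per-version migration steps; the version numbers and
# the field rename below are invented purely for illustration.
def v4_to_v5(doc: dict) -> dict:
    out = dict(doc)
    out["tradeDate"] = out.pop("date", None)  # e.g. a field renamed in 5.0.0
    return out

MIGRATIONS = {
    "4.0.0": ("5.0.0", v4_to_v5),
    "5.0.0": ("10.0.0", lambda doc: dict(doc)),  # placeholder no-op step
}

def migrate(doc: dict, version: str, target: str) -> dict:
    """Apply each migration in turn until the stored JSON reaches the target."""
    while version != target:
        version, step = MIGRATIONS[version]
        doc = step(doc)
    return doc

print(migrate({"date": "2014-06-30"}, "4.0.0", "10.0.0"))
```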
-
@iansloyan, @davidshone, @gabriel-ICMA Off the back of these technical discussions I think we need to think about what the target use cases for the CDM actually are. Once we have these agreed, we can determine the best way to support them from a technical perspective. I think the best forum for this would be the CDM Steering WG, which I believe is due to meet soon. Could we add this as an agenda item? To get the ball rolling, the use cases we (ISLA) have been promoting are:

- CDM as a common processing engine: if all parties to a trade call the same functions in the CDM, they should get the same result, reducing the number of fails and the need for reconciliation. This is where the CDM is accessed via API calls, for which JSON would be a good choice of data format.
- CDM as a negotiation/messaging standard: with the work we've been doing on pre-trade, there is a need to transfer data from one party to another. This is where the CDM data would be used as a messaging standard; JSON will work for this, but XML has historically been used for this use case.
- CDM as a data dictionary: we are always promoting the CDM's ability to provide a common representation of a financial object. This is not a particularly good use case for JSON, and again is more in line with how XML is used.

Any other use cases I have missed?
-
**Requirements**

I think a question the SWG needs to answer is whether we want separate implementations of CDM (e.g. at different firms) to be able to communicate using JSON-serialized objects (API calls or messages), or whether we require firms to always first translate to XML (e.g. FpML or another standard) before communicating with another firm. A related question is whether we want JSON-serialized objects saved in an older version of CDM to be retrievable by newer versions of CDM. If we want to allow JSON to be used for communication or persistence like this (and I think we should, given the ubiquity of JSON), we need to define how compatibility will work between separate implementations, because we know that they won't always be upgraded simultaneously. By compatibility I mean the following (I'm open to suggestions on these names):

- Backward-processing compatibility: a newer implementation can read and process JSON generated by an older version of the model.
- Backward-generation compatibility: a newer implementation can generate JSON that an older version can read and process.
- Forward-processing compatibility: an older implementation can read and process JSON generated by a newer version of the model.
We don't necessarily need all of the above. For instance, we could mandate that if two CDM implementations want to interoperate, they need to fall back to the oldest supported JSON format for communication in both directions. That would mean that forward-processing compatibility isn't required. But it may be helpful to have a limited amount of forward-processing compatibility, e.g. by ignoring or flagging new content that doesn't match expectations.

Now, assuming we want any of these things, we'll need to define some kind of domain of support for the compatibility. In other words, how far back (how many versions back) do we guarantee compatibility? And do we want to guarantee that by creating test cases that demonstrate it? These are things that the SWG should decide on, based on feedback from the TAWG on feasibility. Deciding on the above will be hard, and will be based in part on technical feasibility. I suggest that the SWG discuss this, give some kind of steer on direction/requirements, and then task the TAWG with evaluating implementation options and their feasibility and reporting back to the SWG with a recommended direction.

**Implementation**

Assuming we want to support any of the above requirements, we'll need a strategy for doing so. That's something the TAWG should discuss. The first point is that we need a version numbering system for the model that has been serialized into JSON, and we should always include that version in any generated JSON. I think for simplicity it should just be based on the CDM model version, a single number. Then users of the JSON can determine whether that model version supports the versions of each type they need or have. Second, we need some kind of standardized wrapper for the JSON to contain the version number and an entry point (and possibly other things). Unless something has changed recently, serialized JSON doesn't by default contain the name of the root type, just the contents, so you have to know what you're loading. We should have a standardized way of wrapping all of that, so that implementations can just load a lump of CDM JSON and get a useful CDM object without having to know what it is before loading it. Having this standardized versioning will give us a starting point for interoperating with JSON. (Note that we have something similar with FpML, and it is crucial to interoperating across implementations.)
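To make the wrapper idea concrete, here's a minimal sketch in Python (the same shape would apply in any binding); the envelope field names ("cdmVersion", "rootType", "payload") are hypothetical placeholders, not an agreed standard:

```python
import json

def wrap(cdm_object: dict, root_type: str, model_version: str) -> str:
    """Serialize a CDM object inside a self-describing envelope."""
    envelope = {
        "cdmVersion": model_version,  # single model-level version number
        "rootType": root_type,        # e.g. "TradeState": the entry point for loading
        "payload": cdm_object,
    }
    return json.dumps(envelope)

def unwrap(text: str) -> tuple:
    """Load a lump of CDM JSON without knowing its type in advance."""
    envelope = json.loads(text)
    return envelope["cdmVersion"], envelope["rootType"], envelope["payload"]

msg = wrap({"tradeDate": "2023-10-05"}, "TradeState", "4.0.0")
version, root_type, payload = unwrap(msg)
```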
Then, if we want to be able to provide backward-processing compatibility, there is a range of options we could consider, roughly from crudest to most sophisticated, and I assume the implementation options for backward-generation would be similar. For forward-processing, it may be enough to find ways to keep the deserializer from failing when the JSON doesn't exactly match the current model, and to track any resulting exceptions (a minimal sketch of this appears at the end of this comment).

**Commentary**

Developing this kind of mapping between versions of the model seems like it could be a time-consuming effort. However, the alternative is to strongly limit the changes that are allowed to the model, and to force everybody to upgrade whenever changes are allowed. In the case of FpML, this made sense, particularly once the core products were stable. (In practice the FpML IR Swaps model hasn't changed much in nearly 20 years, except for some additions.) In the case of CDM, given the rapid evolution of the model that is likely to continue for a while, and the fact that there is a shared code base for working with the model, I think putting the translation between versions of the model into the shared code base is worth the effort. In practice most changes to the model are likely to be relatively easily mappable, and doing that once in the shared code base instead of in every implementation makes a lot more sense.
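On the forward-processing point above, a minimal sketch of "ignoring or flagging" unrecognized content; the expected attribute names are invented for illustration, and a real implementation would derive them from the generated model classes:

```python
# Invented attribute set standing in for what one model version understands.
EXPECTED_FIELDS = {"tradeDate", "quantity", "product"}

def lenient_parse(payload: dict) -> tuple:
    """Keep the fields this model version understands; flag the rest."""
    unknown = set(payload) - EXPECTED_FIELDS
    known = {k: v for k, v in payload.items() if k in EXPECTED_FIELDS}
    return known, unknown

known, unknown = lenient_parse({"tradeDate": "2023-10-05", "newField": "x"})
if unknown:
    print(f"Ignoring content from a newer model version: {sorted(unknown)}")
```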
-
Of the types of use cases @chrisisla cites above, CDM is currently most frequently used as a data dictionary. There is no doubt that a shared standard industry lexicon is a necessary condition for increasing efficiency, but it is insufficient without the exchange of data, whether by messaging or API, both of which serialization will need to support as the use of CDM expands.

Secondly, the simultaneous in-production use of different versions of CDM mentioned by @brianlynn2 seems inevitable, especially given the emerging usage of the standard. The stated goal of interoperability and collaboration demands that users of supported (i.e., non-deprecated major) versions mutually understand the data they exchange, at least to the minimum extent available between the versions used by each party. The implications are that the model and version must be part of the serialized data, and that later versions of the model must include backward processing and generation. The good news is that the CDM's well-defined modeling and tightly coupled generated code may provide the basis for enabling the understanding of supported earlier versions.

The need to provide the context of the model and version reasonably opens the question of whether to continue to use JSON as a serialization format, shift to something more self-describing such as XML, or support both simultaneously. JSON has historically been used as the input or output of a process where the interaction with an API implies the model and version. As a result, some hold that including the model and version in the data itself is inconsistent with JSON's "ethos" and therefore more suited to a more verbose format such as XML. The argument against using JSON is understandable but a bit impractically purist, and the reasoning for using XML comes with its own suitability questions, since it would require synchronization between the schema and CDM. Said another way, since Rosetta is used to define CDM, tight binding to XML implies that the schema should be the result of a new code-generation process.

Separately, the discussion of version management and lessons learned from FpML led to a question about the best way to include product definition data, which might evolve rapidly. When it is directly embedded, every change implies the need to create a new version of the model. The recommendation is to reference data external to the model itself.
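To illustrate that externalization recommendation, here is a minimal sketch in which a trade carries only an identifier into a product registry maintained outside the model; the class, field, and registry names are all hypothetical:

```python
from dataclasses import dataclass

# Hypothetical names throughout: the point is only that the trade references a
# product definition held outside the model rather than embedding it.
@dataclass
class Trade:
    trade_date: str
    product_ref: str  # identifier into an externally maintained product registry

# External registry: entries can evolve without a new CDM model version.
PRODUCT_REGISTRY = {
    "SEC-LENDING-OPEN-1": {"assetClass": "Equity", "termType": "Open"},
}

def resolve_product(trade: Trade) -> dict:
    """Look up the product definition referenced by the trade."""
    return PRODUCT_REGISTRY[trade.product_ref]

trade = Trade(trade_date="2023-10-19", product_ref="SEC-LENDING-OPEN-1")
print(resolve_product(trade))
```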
-
Recapping the agenda for the first meeting held to discuss how to move this forward:

Overall mission: Define and realize a serialization format for CDM. Adoption of the to-be-proposed format may be one of the most critical issues in the adoption of the new standard.

Meeting goal: Define and develop a plan to realize CDM's target-state serialization format.

Agenda:
-
@dschwartznyc, @brianlynn2, @chrisisla, @eacunaISDA, @iansloyan, @manel-martos I put some simplified use cases into a PowerPoint, as discussed. Take a look and we can discuss.
-
Output from a brainstorming session discussing the functional/technical challenges to consider in designing a solution for serialization.
-
Please make sure that Fragmos-Chain's CTO, Adrian Hutusoru, is definitely involved in the business rationale, use cases, validation process, and any potential release related to this "Serialisation" project. Thanks!
-
Hi Dan
-
Update from Sep 1 through Oct 3 |
-
Potential solutions reviewed 10/3

Objective: determine the recommended approach for serialization of CDM, or present the key alternatives with pros and cons.

Meeting agenda: brainstorm/outline potential solutions and alternatives. The intent is to list out materially different approaches. The following is intended as a starter and is not comprehensive by any means.
-
Recap from the Serialization Task Force meeting on Oct 5, 2023

Attending: Adrian, Chris, Dhruva, Eleonora, Jason, Manuel, Minesh, Tom, Dan

Thank you again to Dhruva for walking us through JPMorgan's approach to exchanging information, including the reasoning behind its decision to partition its reading and writing libraries.

Decisions:
Open questions:
The objective for the next meeting (the weekly on Thursday) is to sort out the open questions listed above.

Issues identified but outside the scope of serialization (directed to the TAWG):
Actions:
-
Subject: Serialization #11 - agenda - 11/9
Java results:
Python compression comparison analysis
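The analysis itself isn't reproduced above, but as a rough illustration of the kind of comparison in question, here is a sketch using Python's standard-library compressors on a stand-in payload; the payload and the resulting sizes are illustrative only, not the task force's actual results:

```python
import bz2, gzip, json, lzma

# Stand-in payload: repetitive JSON similar in spirit to serialized CDM output.
payload = json.dumps(
    [{"tradeDate": "2023-11-09", "quantity": i, "currency": "USD"} for i in range(1000)]
).encode()

print(f"raw: {len(payload)} bytes")
for name, compress in [("gzip", gzip.compress), ("bz2", bz2.compress), ("lzma", lzma.compress)]:
    print(f"{name}: {len(compress(payload))} bytes")
```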
-
Serialization #11 - recap - 11/9

Apologies: David, Marc

Comments and corrections are welcome.
-
Soliciting feedback, suggestions, and challenges, if any, on a proposal put forward in the Technology Architecture WG (TAWG) to address how CDM will be serialized going forward. The proposal was made in the 8th June 2023 TAWG meeting by the Task Force examining CDM's Build and Release Management. Please comment below and join the Task Force.
The proposal:
In principle, the proposal had consensus support but needs to be fleshed out and turned into action.
There are two central questions:
Why is this an issue?
Open questions:
- JSON is a lightweight standard
- Alternatives such as XML more readily accommodate the inclusion of metadata, such as a reference to an entity or version
- How do we balance the flexibility of innovation against the complexity of development and maintenance?
- More granularity implies greater flexibility
- Less granularity implies less complexity
- Would the translation be direct between versions, or against a canonical model representation such as a version of FpML?