Adding metadata (semantic) to find UN/CEFACT data more easily #70

Closed
svanteschubert opened this issue Mar 4, 2022 · 2 comments
Labels
semantics This is an issue inherited from the source CCTS model

Comments

@svanteschubert

First of all, this is just a quick draft of an idea that I mentioned in today's meeting.
It certainly needs some iterations and clean-up. Please have mercy on me if it is hard to read and not easy to understand.
If it is too much to ask, just ping me directly and let's discuss it during a tea break. Things can evolve much faster in a quick dialogue.
This idea comes from endless hours in CEN TC 434 WG1, discussing over and over again whether a new semantic should really be part of the EU core invoice or had better become an extension.

Problem
The UN/CEFACT editor Gerhard Heemskerk explained to me that one of the editor's main tasks is to prevent existing data from being added twice (in different places).
The problem, as I see it, is locating the data easily.
Everyone knows a similar problem from sorting files into directories: you create a tree of directories to find (or place) the correct file, and after some time there are files that might belong in multiple directories.
One way of solving this is to tag the files with all the fields they belong to and to sort them dynamically afterwards.
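
The tagging idea above can be sketched as a small inverted index. This is only an illustration under stated assumptions: the entry names and tags below are made up, and `build_tag_index` and `find` are hypothetical helpers, not part of any UN/CEFACT tooling.

```python
from collections import defaultdict

# Hypothetical entries and tags, for illustration only.
ENTRIES = {
    "CargoInsurance": {"transport", "insurance"},
    "ExampleContainer": {"transport", "container"},
    "ExampleInvoiceLine": {"invoice"},
}


def build_tag_index(entries):
    """Map each tag to the set of entry names that carry it."""
    index = defaultdict(set)
    for name, tags in entries.items():
        for tag in tags:
            index[tag].add(name)
    return index


def find(index, *tags):
    """Return the sorted names of the entries that carry all given tags."""
    if not tags:
        return []
    matches = set.intersection(*(index.get(tag, set()) for tag in tags))
    return sorted(matches)


index = build_tag_index(ENTRIES)
print(find(index, "transport"))
print(find(index, "transport", "insurance"))
```

An entry can carry any number of tags, so it no longer has to live in exactly one "directory"; the grouping is computed at query time.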

Another use case comes from customers of the UN/CEFACT data who would like to find existing data by running a query, for example:
Which transport containers does UN/CEFACT already identify, and what are their volumes?

Suggestion of Solution
Remember the children's game where one child thinks of something and the other tries to guess it by asking questions that can only be answered with yes or no?

  1. Is the thing you are thinking of alive? -> yes
  2. Does this living being live in the ocean? -> yes
  3. ...

Similarly, UN/CEFACT data can be classified by such metadata (a binary bit representing yes/no).
From a software standpoint, the question/answer pairs can be viewed as traversing a binary tree: depending on the answer to each yes/no question, the left or right branch is taken.
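
A minimal sketch of that traversal, assuming a hand-built tree: the `Question` class, the `traverse` function and the leaf values ("fish", "bird", "stone") are hypothetical names chosen for the guessing-game example, not an existing API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Question:
    """Inner node: a yes/no question with one subtree per answer."""
    text: str
    yes: "Node"
    no: "Node"


# A leaf is simply the name of the thing that was guessed.
Node = Union[Question, str]


def traverse(node, answer):
    """Walk from the root to a leaf; answer(text) must return True for yes."""
    while isinstance(node, Question):
        node = node.yes if answer(node.text) else node.no
    return node


# Hypothetical tree for the guessing game.
root = Question(
    "Is it alive?",
    yes=Question("Does it live in the ocean?", yes="fish", no="bird"),
    no="stone",
)

answers = {"Is it alive?": True, "Does it live in the ocean?": True}
print(traverse(root, answers.__getitem__))
```

Each answered question discards one branch, so a balanced tree of n entries needs roughly log2(n) questions to reach a leaf.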

How about adding boolean classifications/types to the UN/CEFACT data that allow domain experts to traverse the data to find what they are looking for, and to check whether data already exists before asking for it to be added?

@svanteschubert
Author

svanteschubert commented Mar 11, 2022

Please allow me to drop some basic questions and draft ideas, which I have not had the time to discuss and distil into a far shorter and more didactic comment.

Some basic questions

  1. What is the vocabulary you have in mind? Either a taxonomy or a folksonomy?
    https://en.m.wikipedia.org/wiki/Folksonomy#Folksonomy_vs._taxonomy
  2. How can this new metadata be used to enhance software?
    From my perspective, there are two main use cases: search and automatic value transformation (e.g. kilos to tons).
    Offering just a new JSON-LD syntax does not solve this problem, right? What information is missing for software developers to implement search and automatic value transformation?
  3. How do you align with other international players:
    a. Peppol (based on the UN/CEFACT fork UBL) claims to provide internationally reusable semantics via BIS (Business Interoperability Specifications), including BIS other than the e-invoice one. To me, it seems their UBL syntax (the written, persistent form of an idea/concept) was mistaken for the semantic (the concept/idea itself). The semantic should be specified separately, as the same semantic might be represented by multiple syntaxes. It is like going to an international restaurant where everyone gets a menu in a different language, but all menus refer to the same food. Notably, the e-invoice BIS is a subset of the European norm EN16931. EN16931, or an official subset of it such as BIS, is mandatory for B2G e-invoicing across Europe; in Italy it is already mandatory for B2G, B2B and B2C (other EU countries are following).
    b. EN16931, written by CEN TC 434, offers a semantic in its part EN16931-1 (EN16931-1 with errata is freely downloadable, and EN16931-3 is a mapping from semantic to syntax). As it is stupid to give software developers a norm with 400 pages of data to copy and paste, I, being an editor of EN16931-1, have extracted the data and provided an editor (a Visual Studio Code extension) to validate the structured mapping data (in progress). Most interestingly, the mapping of the EN16931-1 semantic to UBL XML and CII XML (the syntax binding, as they call EN16931-3) indicates semantic gaps. Therefore, with EN16931 you receive a standard offering three slightly different semantics (UBL, CII and EN16931-1). Suboptimal.
    c. CEN TC 440 is adding its semantic on e-procurement on GitHub.
    d. There are the ISO groups ISO 154, ISO 321 and likely ISO 295, and likely more, with overlaps with the UN/CEFACT semantic.
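
For question 2 above, automatic value transformation needs at least a machine-readable unit and a conversion factor attached to each value. The following is a rough sketch under that assumption; the `UNITS` table, its unit codes and the `convert` function are hypothetical and do not reflect how the UN/CEFACT vocabulary currently models units.

```python
# Hypothetical metadata: each unit names its dimension and its factor
# to the base unit of that dimension (kg for mass, l for volume).
UNITS = {
    "g": ("mass", 0.001),
    "kg": ("mass", 1.0),
    "t": ("mass", 1000.0),
    "l": ("volume", 1.0),
}


def convert(value, source, target):
    """Convert value from the source unit to the target unit."""
    source_dimension, source_factor = UNITS[source]
    target_dimension, target_factor = UNITS[target]
    if source_dimension != target_dimension:
        raise ValueError(f"cannot convert {source} to {target}")
    return value * source_factor / target_factor


print(convert(2500, "kg", "t"))
```

With only a type name and a free-text comment, software cannot tell that two values are comparable at all; the dimension and factor are the kind of information that would have to be added.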

Some suggestions

A language (or semantic) becomes more powerful the more it is used by others!
Our overall goal should be to unite and cooperate with other international groups on sharing semantics (transforming syntaxes) and on specifying semantics in an easy and interoperable way.

When I look at https://service.unece.org/trade/uncefact/vocabulary/uncefact/
the types seem to be autogenerated. The only information software developers can extract/reuse for search (and transformation) is

  1. the tags that are implicit in the type name, e.g. for
    https://service.unece.org/trade/uncefact/vocabulary/uncefact/#CargoInsurance
    the tags are "Cargo" and "Insurance", obtained by splitting the camel-cased string, and
  2. the comment, which is not machine-interpretable:
    "The insurance of goods during their transportation." is not formally described.

Before starting a new URL that no one has used before (and that UBL will never use for political reasons), like
https://service.unece.org/trade/uncefact/vocabulary/uncefact/#CargoInsurance

we might want to consider using existing references for our tags.
For instance, by reusing the following two publicly available URLs as tags (or types) instead:
https://en.wikipedia.org/wiki/Insurance
https://en.wikipedia.org/wiki/Transport
UBL people would likely be far more willing to join the "shared semantic water" with us!
Some might say this is too vague and not under our control, but be honest: this would be far better than what we have now and will ever have! :-)
And these are exactly the URLs I would personally look up and reference if I did not understand a term!
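
As a sketch of how the implicit tags of a type name could be linked to such public references: the code below splits the camel-cased fragment of the vocabulary URL into tags and looks each tag up in a small table. The `REFERENCE_URLS` table (including the choice to map "Cargo" to the Transport article) and both function names are assumptions made for this example.

```python
import re

# Assumed mapping from tags to public reference URLs (illustration only).
REFERENCE_URLS = {
    "Cargo": "https://en.wikipedia.org/wiki/Transport",
    "Insurance": "https://en.wikipedia.org/wiki/Insurance",
}


def tags_from_type_url(url):
    """Split the camel-cased fragment after '#' into its words."""
    name = url.rsplit("#", 1)[-1]
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z0-9]*|[a-z0-9]+", name)


def references_for(url):
    """Pair each tag with its reference URL, or None if no reference is known."""
    return {tag: REFERENCE_URLS.get(tag) for tag in tags_from_type_url(url)}


type_url = "https://service.unece.org/trade/uncefact/vocabulary/uncefact/#CargoInsurance"
print(tags_from_type_url(type_url))
print(references_for(type_url))
```

Tags without a known reference come back as `None`, which makes the gaps in the mapping visible instead of hiding them.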

Finally, how would we ask for "CargoInsurance" at the UN/CEFACT birthday party, playing the yes/no question & answer game?

  • Does this entity have a physical representation? -> No
  • Does this entity have something to do with transport? -> Yes
  • Does this entity represent a monetary value? -> Yes
  • Does this entity have a fixed lifespan? -> Yes
  • ....<there are obviously more and better questions, but finding good questions is difficult and is likely better done in a group>...

You might realize that the semantic identification problem has now been simplified by dividing it into these yes/no questions,
allowing software developers to sort the data easily using data structures like binary trees.
There are follow-up problems and iterations on this, but the results would be far more reusable for software engineers.
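
The yes/no answers above can also be packed into a bit pattern and used as a lookup key, which is one simple way to check whether an entry with the same classification already exists. This is a sketch under assumptions: the four questions are the ones listed above, "CargoInsurance" uses the answers given above, and "ExampleContainer" with its answers is made up.

```python
from collections import defaultdict

QUESTIONS = [
    "Does this entity have a physical representation?",
    "Does this entity have something to do with transport?",
    "Does this entity represent a monetary value?",
    "Does this entity have a fixed lifespan?",
]


def encode(answers):
    """Pack yes/no answers into an int; the first answer is the highest bit."""
    if len(answers) != len(QUESTIONS):
        raise ValueError("one answer per question is required")
    key = 0
    for answer in answers:
        key = (key << 1) | int(bool(answer))
    return key


CLASSIFIED = {
    "CargoInsurance": [False, True, True, True],     # answers from the list above
    "ExampleContainer": [True, True, False, False],  # made-up entry and answers
}


def build_index(classified):
    """Group entry names by their encoded answer pattern."""
    index = defaultdict(list)
    for name, answers in classified.items():
        index[encode(answers)].append(name)
    return index


index = build_index(CLASSIFIED)
candidate = [False, True, True, True]
print(index.get(encode(candidate), []))
```

Several entries may share one pattern as long as the question set is small; each added question splits such groups further.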

@nissimsan
Contributor

@svanteschubert, a very capable search function. Pls give that a go. If you still miss features, don't be shy to re-open or raise another issue. 👍
