New feature: @template #362

Open
pchampin opened this issue Sep 24, 2020 · 12 comments
Labels
defer-future-version Defer this issue until a future version of JSON-LD

Comments

@pchampin
Contributor

pchampin commented Sep 24, 2020

Here is a feature that I discussed with some colleagues, and that we really would like to see in a future version of JSON-LD.

Use cases

Consider the following example JSON, as would be produced by a Web API

{
    "id": 1234,
    "name": "Alice",
    "bday": "1987-04-01",
    "height": 168
}

We know from the API documentation that id is a unique local identifier for this entity, whose corresponding IRI is http://example.org/users/1234. Unfortunately, there are two problems with the current spec:

  • a property mapped to @id accepts only strings;
  • even if 1234 was replaced with "1234", it would be resolved against the @base of the context; setting the base to http://example.org/users/ for the sole purpose of this property would not always be appropriate.

We also know from the API documentation that height is expressed in centimetres. We would like to map it using the cdt:ucum datatype, i.e. into "168 cm"^^cdt:ucum.

Proposed solution

  • add a @template keyword that can appear in a term definition; the value is a string containing a single placeholder {}, for example http://example.org/users/{} or {} cm;
  • during expansion, and before any other step, the value of a property matching this term definition is replaced by the value of the template, replacing the placeholder with the canonical representation of the value;
  • during compaction, a term definition with a @template attribute is usable only if the @id or @value "matches" the template. Its value is then replaced by extracting the substring corresponding to the placeholder.
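
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proposal: the function names are hypothetical, values are restricted to numbers, and "canonical representation" is approximated by Python's plain decimal formatting.

```python
import re
from typing import Optional, Tuple, Union

PLACEHOLDER = "{}"
# Assumption: canonical numbers are plain decimals such as 1234 or 168.5.
_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?")


def split_template(template: str) -> Tuple[str, str]:
    """Split a template into the text before and after its single placeholder."""
    prefix, found, suffix = template.partition(PLACEHOLDER)
    if not found or PLACEHOLDER in suffix:
        raise ValueError("template must contain exactly one {} placeholder")
    return prefix, suffix


def apply_template(template: str, value: Union[int, float]) -> str:
    """Expansion: wrap the number's text form in the template's prefix and suffix."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("this sketch only accepts numbers")
    prefix, suffix = split_template(template)
    return f"{prefix}{value}{suffix}"


def unapply_template(template: str, expanded: str) -> Optional[Union[int, float]]:
    """Compaction: recover the number, or return None if the template does not match."""
    prefix, suffix = split_template(template)
    if len(expanded) < len(prefix) + len(suffix):
        return None
    if not (expanded.startswith(prefix) and expanded.endswith(suffix)):
        return None
    middle = expanded[len(prefix):len(expanded) - len(suffix)]
    if not _NUMBER.fullmatch(middle):
        return None
    number = float(middle) if "." in middle else int(middle)
    # Accept only values that expand back to exactly the same string.
    return number if apply_template(template, number) == expanded else None


print(apply_template("http://example.org/users/{}", 1234))
print(unapply_template("{} cm", "168 cm"))
```

When `unapply_template` returns `None`, a processor built this way would simply not select the templated term during compaction.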

An example context for the use-case above would then look like

{"@context": {
    "id": {
        "@id": "@id",
        "@template": "http://example.org/users/{}"
    },
    "height": {
        "@id": "http://example.org/ont/height",
        "@type": "https://ci.mines-stetienne.fr/lindt/v3/custom_datatypes#ucum",
        "@template": "{} cm"
    }
}}

Remaining issues

  • Is {} the best choice for the placeholder?
  • Do we need a way to escape the placeholder in the template?
  • Do we want to allow some formatting options in the template (minimum number of digits, hexadecimal, ...) or more complex transformation (computation...)? I suggest we keep things simple.
  • Do we reserve the use of @template for number values, or do we also use it with strings? There would possibly be useful use-cases with strings as well, but then compaction would have to decide whether to produce a string or a number (at least when the value can be parsed as a number).

@iherman
Member

iherman commented Sep 24, 2020

Specifically for URLs, there is the URI Template RFC (RFC 6570). I know we referred to it in the CSVW work, but that is something that @gkellogg knows way better. I do not know whether it is too much for what we want, whether it is also usable for non-URL purposes, etc. But it has the advantage of being documented and would avoid reinventing the wheel...

@pietercolpaert

I like the proposal. {} is a good suggestion, but if we would like to support more transformations, we could take inspiration from:

  • printf in python/bash/C
  • string handling in Bash4
  • regular expression variables and their flags

Probably @datatype in your last example is a typo and you meant @type though?

A little bit related: the Hydra explicit and basic variable representations for URI templates: https://www.hydra-cg.com/spec/latest/core/#example-22-the-different-variable-representations

@asbjornu

asbjornu commented Sep 24, 2020

I really like the idea, and agree that for URIs, @iherman's suggestion on using URI Templates is the definitive way to go. I don't think URI Templates can (or should) be used for non-URI data, though. At least not without defining some clear processing rules (perhaps some inspiration can be sourced from RFC 5229).

Something I believe is going to surface as a need almost immediately after @template is released, is the ability to interpolate the value of other properties from the document into the result of the templated strings. So something like {otherProperty} is a good idea to support from the get-go.

…Which begs the question: How should otherProperty be resolved? Are only sibling properties to the templated property allowed? If not, should JSON Pointer or JSON Path be supported? This is a slippery slope unless we pin the syntax against a stable, ratified specification.

@pchampin
Contributor Author

@pietercolpaert

> Probably @datatype in your last example is a typo and you meant @type though?

Yes, thanks for spotting it. That's fixed

@azaroth42
Contributor

azaroth42 commented Sep 24, 2020

I'm not convinced, I'm afraid.

This is even more complex than the frequently requested ability to change datatypes / add classes with @type in a context (#31, #76, etc). Instead this does additional data transformation by introducing new content in a context document, something we have previously decided is out of scope for a context.

The data would not round-trip, as the template would not be reversible. Once the context has been applied, it cannot be unapplied. Even worse, once it was applied, it could then be applied again if it could be used with strings, resulting in (e.g.) "168 cm cm cm cm cm". The template pattern is not idempotent, and there's no way to know when to apply it and when it has already been applied.
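
The repeated-application concern can be illustrated with a small Python sketch. The `naive_apply` helper is hypothetical and deliberately accepts strings without checking whether the template was already applied; it is not taken from the proposal.

```python
def naive_apply(template: str, value: str) -> str:
    """Substitute a string value for the first {} placeholder, with no check on the input."""
    return template.replace("{}", value, 1)


value = "168"
history = []
for _ in range(3):
    # Each pass treats the previous output as fresh input.
    value = naive_apply("{} cm", value)
    history.append(value)

print(history)
```

Nothing in the string itself says whether the template has been applied, so each pass appends another suffix.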

So I'm 👎 on the feature in its current state

@filip26

filip26 commented Sep 24, 2020

I agree with @azaroth42. Besides the many unresolved issues this feature brings, JSON-LD is already quite complex.

There are plenty of syntax description formats (OAS, RAML, etc.) that can be used as a source for a preprocessor to transform an input into hypermedia format before passing it to JSON-LD processor.

@pchampin
Contributor Author

pchampin commented Sep 24, 2020

@iherman @asbjornu
It seems to me that URL templates are both too much (supporting multiple placeholders) and not enough (supporting only URLs).

@pietercolpaert @asbjornu
I really want to keep the system as simple as possible, in particular because the inability to round-trip would be a deal-breaker (see @azaroth42's comment).

@azaroth42
I don't think my initial proposal above "transforms" data much more than "@language": "fr" or "@type": "xsd:date" (used in a context), really. As for round-tripping, I do believe that this proposal supports it. I may have overlooked some edge-cases, but overall I think it is achievable. That is, if we refrain from adding complex pre-processing on the value besides injecting it in the template.
Maybe it will require that templates are only applied to numbers, not strings. But numeric IDs are really pervasive, so that use-case alone would make templates useful, I believe.

@filip26
Yes of course, pre-processing is an option, but then we lose the nice round-tripping feature that JSON-LD offers. And again, I do believe that @template can be made to round-trip.

@azaroth42
Contributor

Okay, if it only works for numeric data (by which I mean r/-?[0-9.]+/) then reversing the template seems easy enough for compaction to handle. And if you don't compact with the same context that you used to expand, then you shouldn't expect to get the same results.

I have use cases for this too, FWIW. We have information systems that naively use an incrementing integer as the core identity for records describing objects, people, places and events. If @id could use a @template, then we would drop that integer in and use the template to expand to the full URI.

For example: http://vocab.getty.edu/aat/300194222 could have: "id": 300194222 which would make people happy.
Or (in my new institution): https://collection.britishart.yale.edu/id/page/object/1084 would have "id": 1084
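
As a rough Python sketch of how those two identifiers might round-trip: the template strings below are assumptions inferred from the example IRIs, the helper names are hypothetical, and the numeric check reuses the pattern quoted above before narrowing to integers.

```python
import re
from typing import Optional

NUMERIC = re.compile(r"-?[0-9.]+")  # the loose numeric pattern from the comment

# Assumed templates, derived from the example IRIs.
AAT = "http://vocab.getty.edu/aat/{}"
YALE = "https://collection.britishart.yale.edu/id/page/object/{}"


def expand_id(template: str, local_id: int) -> str:
    """Drop the integer into the template to get the full IRI."""
    return template.replace("{}", str(local_id), 1)


def compact_id(template: str, iri: str) -> Optional[int]:
    """Recover the integer, or None when the IRI does not fit this template."""
    prefix, _, suffix = template.partition("{}")
    if len(iri) < len(prefix) + len(suffix):
        return None
    if not (iri.startswith(prefix) and iri.endswith(suffix)):
        return None
    middle = iri[len(prefix):len(iri) - len(suffix)]
    if NUMERIC.fullmatch(middle) is None:
        return None
    try:
        return int(middle)  # the pattern also admits "1.2.3", so int() may still fail
    except ValueError:
        return None


getty_iri = expand_id(AAT, 300194222)
print(getty_iri, compact_id(AAT, getty_iri))
# Compacting with a different context's template does not recover the id.
print(compact_id(YALE, getty_iri))
```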

With those caveats to ensure round tripping, I'm a definite +0

@gkellogg
Member

For values of @id, I agree that we should be able to interpret other primitive types, such as number, as suitable for IRI expansion.

We did use the URI templates mechanism for CSVW, and it can do everything we want, though it probably needs some more consideration to see how it adapts to creating URIs from values. Something like @template in a term definition that specifies the URI template to use would be good.

One of the issues we've encountered recently, though, is that URI templates end up escaping non-ASCII data. I believe that you can uniformly decode the value of the template transformation without messing up any legitimate data, which we'd need to be sure to allow for.

As @asbjornu says, a URI template can interpret other variable values, which could potentially come from other properties of the node, but this adds substantial complexity, and I would suggest we constrain ourselves to "the simplest thing which could possibly work", and see where that gets us. In this case, limiting ourselves to only the value of the @id key (when it is a primitive type).

Regarding non-IRI values, CSVW also considered using {{mustache}}, but it is inadequately specified and would add even more complexity, so no good solution there, I'm afraid. I've often thought that a URI Template-like mechanism that could be extended to literal values would be useful, but alas ...

@ajs6f
Member

ajs6f commented Sep 24, 2020

Would this not work for more general alphanumeric identifiers? There are plenty in wide use (e.g. UUIDs). The restriction to a single use of a single value is what seems to me to make it feasible, but perhaps I'm missing something.

@gkellogg
Member

It should work for any URI, and with unescaping, IRI. It is just restricted to generate URIs, not general string literals.

@pchampin
Contributor Author

@ajs6f

> Would this not work for more general alphanumeric identifiers?

The problem is: if you try to compact http://ex.com/id/1234 through template http://ex.com/id/{}, should you produce the string "1234" or the number 1234? If we restrict templates to numbers, there is no ambiguity.

Also, canonical number representations can (I think) be inserted into IRIs without any escaping/encoding, while other characters may need this.
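
Both points can be checked with a short Python sketch. The `fill` helper and the sample name are hypothetical, and `urllib.parse.quote` stands in for whatever escaping a real processor would apply.

```python
from urllib.parse import quote

TEMPLATE = "http://ex.com/id/{}"


def fill(template: str, value) -> str:
    """Insert the value's text form at the first {} placeholder."""
    return template.replace("{}", str(value), 1)


# A number and a string of the same digits produce the same IRI,
# so the IRI alone cannot say which one compaction should emit.
from_number = fill(TEMPLATE, 1234)
from_string = fill(TEMPLATE, "1234")
same_iri = from_number == from_string

# Digits pass through percent-encoding unchanged; other characters do not.
digits_escaped = quote("1234", safe="")
name_escaped = quote("Zoë Smith", safe="")

print(same_iri, digits_escaped, name_escaped)
```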

@gkellogg

> It is just restricted to generate URIs, not general string literals.

Well, the second use case is important for us too...

> Regarding non-IRI values, CSVW also considered using {{mustache}}, but it is inadequately specified and would add even more complexity, so no good solution there, I'm afraid.

This is why I was not suggesting to design a complex formatting mechanism, just dead-simple substitution -- or, more precisely: concatenating something before and after the original value. This also makes "unapplying" the template easy.

Should the template fail to "unapply" during compaction (either because the expanded value does not match the prefix or suffix of the template, or because the substring in between does not parse to a number), the term definition will not apply. This is the same thing that happens when a term definition specifies a "@language" (example in the playground).

@gkellogg added the defer-future-version label on Jan 12, 2021
@pchampin moved this to Future Work in JSON-LD Management on May 15, 2024