UD syntax: underscores in extended relations #27

Closed
matyaskopp opened this issue Jan 30, 2021 · 8 comments

@matyaskopp
Collaborator

I have moved to the annotation part of our project and looked at UD extended syntactic relations again (#5).
@TomazErjavec, I don't want to push too stubbornly, but I think that using underscores (_) in extended relations is not a good solution and can confuse users who are accustomed to colons (:).

As I see it, you are willing to make changes to the ParlaMint schema, so I am trying one more time.

If I understand you correctly, you don't want to use a colon in the taxonomy because of a possible collision with prefixes. So I suggest using @type and @subtype in relational links.
Current:

 <link ana="ud-syn:obl_arg" target="#seg3.1.6 #seg3.1.8"/>

Suggestion:

 <link type="obl" subtype="arg" target="#seg3.1.6 #seg3.1.8"/>

My solution doesn't go against the UD standard; it just splits the relation and its extension. I don't think we have to introduce a new standard for extended syntactic relations...

@TomazErjavec,
Please, can you look at it one more time?
Is it really necessary to create a new "standard" for UD syntax?
Can you think of another, better solution? (Ideally one where the relation and the extension are in a single string: rel:ext.)

@TomazErjavec
Collaborator

OK, you do indeed have a point here. And maybe I am being too cautious in not using colons in the values, as we could assume that the prefix is only the part before the first colon, i.e. the software doing the prefix stripping will be smart enough to take the shortest rather than the longest match.
With this, I will change back to what you originally had, e.g.

 <link ana="ud-syn:obl:arg" target="#seg3.1.6 #seg3.1.8"/>

It still looks a bit funny, but maybe this is the least of all evils.
I will also change the existing V1 .ana. samples so they incorporate this fix.
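The "shortest rather than longest match" idea can be sketched as follows. This is only an illustration in Python, not the project's actual tooling; the function name `split_prefixed` is hypothetical. It splits a prefixed pointer at the first colon, so any later colons stay in the local part.

```python
from typing import Tuple


def split_prefixed(value: str) -> Tuple[str, str]:
    """Split a prefixed pointer at the FIRST colon (shortest prefix match).

    Hypothetical helper for illustration; not part of any TEI library.
    """
    prefix, sep, local = value.partition(":")
    if not sep or not prefix or not local:
        raise ValueError(f"not a prefixed pointer: {value!r}")
    return prefix, local


if __name__ == "__main__":
    # The prefix is "ud-syn"; the UD relation keeps its own colon.
    print(split_prefixed("ud-syn:obl:arg"))
```

With this reading, `ud-syn:obl:arg` yields the prefix `ud-syn` and the local part `obl:arg`, so the extra colon does not collide with the prefix.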

@TomazErjavec
Collaborator

Drat, I have now remembered another reason why I didn't want colons in UD relations: in the TEI they are actually the values of category/@xml:id, and it seems a bad idea to have colons in xsd:ID values, cf. https://stackoverflow.com/questions/6811188/why-does-jing-not-allow-a-colon-in-an-id-attribute.

So, I have to think again about this...
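The xsd:ID constraint can be illustrated with a small check. An xsd:ID must be an NCName, i.e. a name without colons. The sketch below uses a simplified, ASCII-only approximation of NCName (the real production also allows many non-ASCII characters), and the function name is hypothetical.

```python
import re

# Simplified, ASCII-only approximation of the XML NCName production.
# Assumption: real NCNames also permit many Unicode letters, omitted here.
ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(value: str) -> bool:
    """Return True if value looks like a colon-free ASCII NCName."""
    return ASCII_NCNAME.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("obl_arg", "obl.arg", "obl:arg"):
        print(candidate, is_ascii_ncname(candidate))
```

Under this approximation, `obl_arg` and `obl.arg` pass, while `obl:arg` is rejected because of the colon.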

TomazErjavec reopened this Jan 30, 2021
@TomazErjavec
Collaborator

OK, I thought about it and I'm afraid I can't change the current practice, so the underscore will stay.

I don't like your suggestion with using @type and @subtype for several reasons:

  • (apart from UD morphological values) all the other linguistic annotation is done with pointers to IDs defined in taxonomies, and we would be here introducing a completely new mechanism
  • pointing to taxonomies makes sense, as there you can have more info on the particular category, e.g. its gloss, as well as its canonical value. If the categories are given in just the link attributes, you have no such added information.
  • you say that people would have to get used to underscore instead of colon, but then they would also have to get used to having extended relations in two attributes.

If it is any consolation, note that the UD-SYN taxonomy/category/catDesc/term actually contains the real UD category with colons. So, for user-facing applications, don't show them the value of @xml:id but the category/term that the ID points to.
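A minimal sketch of that lookup, in Python with only the standard library. The taxonomy fragment below is a made-up stand-in for the UD-SYN taxonomy (the real file may be structured differently), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment; only the category/catDesc/term shape follows the thread.
SAMPLE = """
<taxonomy xmlns="http://www.tei-c.org/ns/1.0" xml:id="UD-SYN">
  <category xml:id="obl_arg">
    <catDesc><term>obl:arg</term></catDesc>
  </category>
  <category xml:id="nsubj">
    <catDesc><term>nsubj</term></catDesc>
  </category>
</taxonomy>
"""


def terms_by_id(taxonomy):
    """Map each category's xml:id to the text of its catDesc/term."""
    terms = {}
    for cat in taxonomy.iter(TEI + "category"):
        term = cat.find(f"{TEI}catDesc/{TEI}term")
        if term is not None and cat.get(XML_ID):
            terms[cat.get(XML_ID)] = (term.text or "").strip()
    return terms


def display_label(ana, terms):
    """Return the UD term for a pointer such as 'ud-syn:obl_arg'.

    Falls back to the raw local part if the ID is not in the taxonomy.
    """
    local = ana.partition(":")[2]
    return terms.get(local, local)


if __name__ == "__main__":
    terms = terms_by_id(ET.fromstring(SAMPLE))
    print(display_label("ud-syn:obl_arg", terms))
```

A user-facing application would then show the term with the colon, while the encoded pointer keeps the underscore.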

@matyaskopp
Collaborator Author

Yes, I am aware of the fact that the colon is not allowed in xml:id. It is the reason I used this hack:

<prefixDef ident="ud-syn" 
                  matchPattern="(.+)" 
                  replacementPattern="#xpath(//*[@xml:id = replace('$1',':','_')])">
...
</prefixDef>

I am using colons in links:

 <link ana="ud-syn:obl:arg" target="#seg3.1.6 #seg3.1.8"/>

but in taxonomy ids, I use _.

OK, let's keep it as it is. A dot (.) would be better than an underscore (_), as in ud-syn:obl.arg, but that is only a small detail.
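What the prefixDef hack is meant to achieve can be sketched outside XSLT. The Python below is only an illustration under assumptions: the taxonomy fragment is invented, and `resolve` is a hypothetical helper that mimics the effect of replace('$1', ':', '_') followed by an xml:id lookup.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical taxonomy fragment with underscore IDs.
SAMPLE = """
<taxonomy xmlns="http://www.tei-c.org/ns/1.0" xml:id="UD-SYN">
  <category xml:id="obl_arg"/>
  <category xml:id="nsubj"/>
</taxonomy>
"""


def resolve(ana, prefix, root):
    """Resolve a colon-style pointer to the element with the underscore ID.

    Returns None if the prefix does not match or no element carries the ID.
    """
    head, sep, local = ana.partition(":")
    if not sep or head != prefix:
        return None
    wanted = local.replace(":", "_")  # obl:arg -> obl_arg
    for elem in root.iter():
        if elem.get(XML_ID) == wanted:
            return elem
    return None


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    node = resolve("ud-syn:obl:arg", "ud-syn", root)
    print(node.get(XML_ID))
```

The link keeps the UD spelling with colons, and the mapping to a valid ID happens only at lookup time.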

TomazErjavec reopened this Jan 31, 2021
@TomazErjavec
Collaborator

Yes, I am aware of the fact that the colon is not allowed in xml:id. It is the reason I used this hack:

<prefixDef ident="ud-syn" 
                  matchPattern="(.+)" 
                  replacementPattern="#xpath(//*[@xml:id = replace('$1',':','_')])">

Ah, I see! You were miles ahead of me in #5, I didn't even properly look at the above, much less figure out what you meant. Sorry about that, at that time I didn't know I was dealing with a guru...

But you would have to explain to me how the above can be used in practice. I currently just do something like this:

select="replace($id, $prefixDef/@matchPattern, $prefixDef/@replacementPattern)"/>

So, how would you use your #xpath(//*[@xml:id = replace('$1',':','_')]) in XSLT in a relatively elegant way?

Also, given the definition of @replacementPattern in https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.patternReplacement.html is this even still TEI? Or maybe that is why you say it is a hack? In which case, I would still have doubts that this is the way to go...
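The difference between the two kinds of replacement pattern can be sketched as follows. This is an illustrative Python analogue of the replace() call above, not the actual stylesheet; `expand_pointer` is a hypothetical name, and the pattern "#$1" is an assumed example of a plain prefixDef. It assumes the replacement pattern contains no backslashes.

```python
import re


def expand_pointer(value, ident, match_pattern, replacement_pattern):
    """Expand a prefixed pointer by plain regex substitution.

    Pointers without the given prefix are returned unchanged.
    """
    prefix, sep, local = value.partition(":")
    if not sep or prefix != ident:
        return value
    # TEI writes back-references as $1, $2, ...; Python's re.sub wants \g<1>.
    py_repl = re.sub(r"\$(\d)", r"\\g<\1>", replacement_pattern)
    return re.sub(match_pattern, py_repl, local)


if __name__ == "__main__":
    # A plain pattern gives a URI fragment that can be used directly.
    print(expand_pointer("ud-syn:obl_arg", "ud-syn", "(.+)", "#$1"))
    # The #xpath(...) pattern only gives a string that still has to be evaluated.
    print(expand_pointer(
        "ud-syn:obl:arg", "ud-syn", "(.+)",
        "#xpath(//*[@xml:id = replace('$1',':','_')])"))
```

In the first case the substitution alone yields a usable pointer. In the second, the result is still an XPath expression in string form, so a further evaluation step is needed, which is where the question about doing this in XSLT comes from.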

@matyaskopp
Collaborator Author

Using XPath in @replacementPattern in TEI is definitely possible. You can see an example here: https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-cRefPattern.html

As I understand it, #my.id is a shortcut for #xpath(//*[@xml:id = 'my.id']), so in practice it can be used in the same way as simple hash (#) referencing.

But how it can be used in XSLT is a really good question! The problem is that dynamic evaluation of XPath is not widely supported: http://exslt.org/dyn/functions/evaluate/
So only one option comes to my mind. You can generate a new XSLT based on links in XML and evaluate it.

@TomazErjavec
Collaborator

Using XPath in @replacementPattern in TEI is definitely possible. You can see an example here: https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-cRefPattern.html

So it is, interesting.

As I understand it, #my.id is a shortcut for #xpath(//*[@xml:id = 'my.id']), so in practice it can be used in the same way as simple hash (#) referencing.

Hmm, the TEI link above certainly supports your understanding, alas, not mine. I would expect #my.id to be exactly that, i.e. #xpath(//*[@xml:id = 'my.id']/@xml:id). Your XPath would not return the ID, but the XML node with this ID. And the same holds for the TEI examples, e.g. #xpath(//div[@type='book'][@n='$1']/div[@type='chap'][@n='$2']/div[@type='verse'][@n='$3']) gets you the div node. But, at the same time, the definition there says "The result of the substitution may be either an absolute or a relative URI reference.", which was my understanding all the time. So, confusion reigns.

But how it can be used in XSLT is a really good question!

Yes, I was sceptical: as far as I know, XPath is a first-class citizen only in XSLT 3.0, and there is not much software that supports it. Actually, Saxon does, but only the commercial editions, not Saxon-HE, which I use. OK, I guess I could buy it, but I would like my software to be portable (even if nobody does port it...).

You can generate a new XSLT based on links in XML and evaluate it.

That sounds even scarier! Might as well start programming in LISP again :)

I think I need to sleep on this one...

@TomazErjavec
Collaborator

TomazErjavec commented Feb 3, 2021

OK, slept on it, and I'm afraid that the solution to have the colon in ud-syn: IDREF is just too convoluted, so we stay with underscore. But, as mentioned,

the UD-SYN taxonomy/category/catDesc/term actually contains the real UD category with colons. So, for user-facing applications, don't show them the value of @xml:id but the category/term that the ID points to.

Which might be a small consolation.
