Add base direction as a fourth element of literals. #48

gkellogg · 2023-06-19T21:34:43Z

This addresses the conceptual representation of text direction in RDF language-tagged literals, based on discussion in:

Fixes #9.

For discussion:

Separate directional language-tagged string with rdf:dirLangString datatype, or extend the notion of language-tagged string and continue to use rdf:langString.
Bikeshedding terminology for literal element:
text direction – From previous proposals
base direction – From Unicode bidi basics
initial text direction – From Wiki
text directionality – From HTML

(As the infrastructure seems to be having issues, you can also view the document via GitHack.)

For #9.

iherman

(Bike shedding!)

I wonder whether the term "Text Direction" is appropriate. The value of the fourth element does not fully define the text direction, because that may also depend on the specific Unicode characters in the text and may even vary within the text in the bidirectional case.

We used, elsewhere, the term "base direction". The HTML specification uses the term "text directionality".

My personal choice is to align with the HTML terminology.

CC: @r12a

afs · 2023-06-20T13:15:53Z

See https://github.com/w3c/rdf-star-wg/wiki/Text-Direction-Proposal

It uses "initial text direction" - does that capture your point @iherman?

pfps · 2023-06-20T13:30:19Z

If text direction is to be supported in RDF, let's support it in a general way. Proposals for adding an initial text direction do not meet this requirement, as correct rendering of bidirectional text needs internal direction markers.

If there is going to be a partial solution provided in RDF, then I feel that it needs to be backward compatible with existing RDF systems. One way to do this is to use '-x-ltr' and '-x-rtl' at the end of the language tag. This produces valid language tags and is backwards compatible.

iherman · 2023-06-20T13:39:44Z

See https://github.com/w3c/rdf-star-wg/wiki/Text-Direction-Proposal

It uses "initial text direction" - does that capture your point @iherman?

Maybe, but it is not ideal. "Initial" suggests some sort of an ordering in time. Although I realize that I am on a slippery slope in terms of English terminology...

I am a bit worried by getting into unnecessary bike-shedding; the reason I proposed to pick the term used by the HTML spec is to avoid that... Part of the community has already picked that term, I am not sure if it is worth picking our own for something that is essentially the same.

spec/index.html

afs · 2023-06-20T13:36:13Z

+      the two <a>datatype IRIs</a>,
+      the two <a>language tags</a> (if any), and
+      the two <a>text directions</a> (if any)
+      compare equal, character by character.


Aside: for language tags, it's case insensitive.

That's a decision from outside RDF - for text direction we can restrict to lower case.

We previously said that the value space of language tags is lower case; is it redundant to say to use a case-insensitive comparison here?

is it redundant to say to use a case insensitive comparison here

I think it conforms to Postel's Law, will clearly reflect user intent, and will be better than case sensitivity-based errors.

I was jumping to conclusions. Some investigation: it isn't as simple as "case insensitive".

The text above this paragraph says "MAY lower case" the concrete string that is the language tag.

The text here is about term-equality and does not say it is a value-space comparison. (FWIW "value space" for language tags is a bit meaningless - "value spaces" involve datatypes but we are where we are.)

The RDF 1.1 text:

the two language tags (if any) compare equal, character by character

does allow "abc"@en and "abc"@EN as different terms, whether that was intended or not.

The root problem is that RDF has not used the canonical form for language tags. Some users do care about this.

At users' request Jena has options to leave as-is, always lower-case and always canonicalization.

Maybe better to leave as "character by character" because otherwise it is an implementation change.

Hm, yes, I see what you mean. Accidental inconsistency is a rarely vanquished hobgoblin, especially across specs developed separately over years. We have our work cut out for us, in trying to bring consistency to all these docs that we're simultaneously trying to upgrade/update.

"Character by character" is at least clear, and we can include a note that advises deployers of the potential need to enact Jena's options — i.e., keep original langtag casing, make all langtags lower-case, or (whatever you meant by "always canonicalization").

We need something to reconcile the notions of a lower-case value (space) for the language-tag and the fact that it's compared character by character (code point by code point?). Are there systems where "abc"@en and "abc"@EN are not considered the same term?

It might say something like "two language tags (if any) compare equal after normalizing to lower case".

The sentence "The value space of language tags is always in lower case" might be changed to "The value of language tags is always treated as being in lower case".

We should avoid "value" because that is about datatype literals.

"compare equal as if normalized to lower case".

which does not imply they are converted (the earlier text is MAY).

Are there systems where "abc"@en and "abc"@EN are not considered the same term?

Jena keeps them apart but they are the same term (yes, that's contradictory).

It's about meeting the user expectation of round-trip with no change.
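
To make the two readings in this thread concrete, here is a minimal Python sketch. It is not text from the spec and not how Jena is implemented: the `Literal` class, its field names, and both comparison functions are hypothetical and only illustrate "character by character" versus "compare equal as if normalized to lower case".

```python
from dataclasses import dataclass
from typing import Optional

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical four-element literal, for illustration only."""
    lexical_form: str
    datatype: str
    language: Optional[str] = None        # language tag exactly as written
    base_direction: Optional[str] = None  # "ltr", "rtl", or None


def equal_char_by_char(a: Literal, b: Literal) -> bool:
    # RDF 1.1 style wording: every element must match character by character.
    return (a.lexical_form, a.datatype, a.language, a.base_direction) == (
        b.lexical_form, b.datatype, b.language, b.base_direction)


def _lower(tag: Optional[str]) -> Optional[str]:
    return None if tag is None else tag.lower()


def equal_as_if_lowercased(a: Literal, b: Literal) -> bool:
    # The stored tags are left untouched; only the comparison folds case.
    return (a.lexical_form, a.datatype, _lower(a.language), a.base_direction) == (
        b.lexical_form, b.datatype, _lower(b.language), b.base_direction)


lower = Literal("abc", RDF_LANGSTRING, "en")
upper = Literal("abc", RDF_LANGSTRING, "EN")
print(equal_char_by_char(lower, upper))      # False
print(equal_as_if_lowercased(lower, upper))  # True
print(upper.language)                        # EN (original casing survives)
```

The second function keeps the round-trip expectation mentioned above, because nothing is rewritten on input; only equality changes.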

spec/index.html

afs · 2023-06-20T14:38:05Z

It uses "initial text direction" - does that capture your point @iherman?

Maybe, but it is not ideal. "Initial" suggests some sort of an ordering in time. Although I realize that I am on a slippery slope in terms of English terminology...

the reason I proposed to pick the term used by the HTML spec is to avoid that... Part of the community has already picked that term, I am not sure if it is worth picking our own for something that is essentially the same.

Yes, it's tricky to get the right terminology.

When we talked to i18n, "text direction" was pointed out as suggesting "all this text" whereas this is meant as in "default".

In the "פעילות הבינאום, W3C" example there are multiple directions.

HTML also has auto which doesn't seem appropriate for RDF.

I am a bit worried by getting into unnecessary bike-shedding;

Yes :-)

afs · 2023-06-20T18:50:09Z

If there is going to be a partial solution provided in RDF, then I feel that it needs to be backward compatible with existing RDF systems. One way to do this is to use '-x-ltr' and '-x-rtl' at the end of the language tag. This produces valid language tags and is backwards compatible.

@pfps In https://github.com/w3c/rdf-star-wg/wiki/Text-Direction-Proposal the proposed syntax is @en--ltr. Not quite the same, but connected to the language tag.

LANGTAG ::=  "@" [a-zA-Z]+ ("-" [a-zA-Z0-9]+)*

RDF 1.2 gives us the opportunity for a syntax change while previous work has operated within the confines of RDF 1.1.

With the direction kept separate from the language tag:
LANG("foo"@en--ltr) is proposed as "en", otherwise it would be en-x-ltr. Looking for "all literals with en" continues to work. Similarly, the LANGMATCHES function continues to match (the algorithm is from RFC 4647).
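
A rough Python sketch of the difference between the two spellings follows. The regular expression transcribes the LANGTAG production quoted above; `split_direction` and `lang_matches` are hypothetical helpers written for this illustration (the latter follows the basic filtering scheme of RFC 4647), not code from any RDF or SPARQL implementation.

```python
import re
from typing import Optional, Tuple

# Grammar quoted above: LANGTAG ::= "@" [a-zA-Z]+ ("-" [a-zA-Z0-9]+)*
LANGTAG = re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")


def split_direction(suffix: str) -> Tuple[str, Optional[str]]:
    """Split a hypothetical '@lang--dir' suffix into (language, direction)."""
    tag = suffix[1:] if suffix.startswith("@") else suffix
    lang, sep, direction = tag.partition("--")
    return (lang, direction) if sep else (lang, None)


def lang_matches(tag: str, lang_range: str) -> bool:
    """Basic filtering in the style of RFC 4647, case-insensitive."""
    if lang_range == "*":
        return tag != ""
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")


# '-x-ltr' fits the existing grammar; '--ltr' needs a syntax change.
print(LANGTAG.fullmatch("@en-x-ltr") is not None)  # True
print(LANGTAG.fullmatch("@en--ltr") is not None)   # False

# With a separate direction, the language part stays a plain "en".
print(split_direction("@en--ltr"))                 # ('en', 'ltr')

# An exact test for "en" fails on the private-use form, a range match does not.
print("en-x-ltr" == "en")                          # False
print(lang_matches("en-x-ltr", "en"))              # True
```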

gkellogg · 2023-06-20T21:05:28Z

@afs said:

See https://github.com/w3c/rdf-star-wg/wiki/Text-Direction-Proposal

It uses "initial text direction" - does that capture your point @iherman?

Sorry, there were a number of documents on (initial) text direction, and I based the PR off of https://github.com/w3c/rdf-star-wg/blob/main/docs/text-direction.md. We can reconcile the differences. I'm fine with "initial text direction", as that gets to the intent of the element. "Text Directionality" may have a subtly different meaning, as it describes the behavior of a display element, not a property of the text, but we can continue to discuss terminology either in this PR, or subsequently.

@pfps said:

If text direction is to be supported in RDF, let's support it in a general way. Proposals for adding an initial text direction do not meet this requirement, as correct rendering of bidirectional text needs internal direction markers.

That's not my understanding of how bidirectional text works in Unicode. Per Unicode Bidirectional Algorithm basics, each character already has its own directionality encoded; it is only when character classes are mixed that there is no a priori way of knowing how to begin rendering the text. Once the initial direction is set, the Unicode algorithms handle any subsequent change in direction. That document uses the term base direction, so perhaps that would be a better term than "initial text direction" or "text directionality".
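
As a small illustration of "each character already has its own directionality", Python's standard `unicodedata` module exposes the bidirectional class assigned to each code point. The helper name `bidi_classes` and the sample string are made up for this sketch.

```python
import unicodedata
from typing import List, Tuple


def bidi_classes(text: str) -> List[Tuple[str, str]]:
    """Pair every character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Two Hebrew letters (U+05D0), a comma, a space, then "A7".
sample = "\u05d0\u05d0, A7"
print([cls for _, cls in bidi_classes(sample)])
# ['R', 'R', 'CS', 'WS', 'L', 'EN']
```

The letters are strongly typed (R or L), while the comma, space, and digit are weak or neutral, which is where the surrounding direction starts to matter.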

If there is going to be a partial solution provided in RDF, then I feel that it needs to be backward compatible with existing RDF systems. One way to do this is to use '-x-ltr' and '-x-rtl' at the end of the language tag. This produces valid language tags and is backwards compatible.

RDF Literals and Base Directions did explore extending the language tag, but that approach was ultimately rejected. See 2.1.1 Extend language tag for a discussion.

afs · 2023-06-20T21:17:42Z

https://github.com/w3c/rdf-star-wg/wiki/Text-Direction-Proposal is a write up based on issue #9.

spec/index.html

Co-authored-by: Andy Seaborne <andy@apache.org>

gkellogg · 2023-06-20T21:40:05Z

Applied some of @afs's suggestions, leaving the others for now pending further discussion.

iherman · 2023-06-21T05:16:53Z

Within that document base direction is used, so perhaps that would be a better term than "initial text direction" or "text directionality".

I am also fine with "base direction". Actually, when writing up my comment, my initial instinct was to propose that term but (to my surprise) that is not the term used by the HTML standard, and that is why I fell on the "directionality" side. (No idea why that term was chosen for HTML.) Either way is fine with me.

pfps · 2023-06-21T11:45:16Z

@gkellogg In https://www.w3.org/International/articles/inline-bidi-markup/ there are examples of strings that need embedded markup for correct rendering. My takeaway is that a solution that only provides a language tag and a base direction is insufficient. The worst situation, I think, is including identifiers using strong ltr characters in rtl text, as in "[ARABIC TEXT] A7, B8, X" where the order of the identifiers is reversed from its correct order if there is no embedded markup. Note that the language of this text is entirely Arabic - the identifiers are not English or any other language that uses ltr display. I include an example with rtl identifiers inside ltr script.

Here are two identifiers using Hebrew script בבב, אא. The first is בבב the second is אא

afs · 2023-06-21T13:50:12Z

"base direction" works for me.

TallTed · 2023-06-21T21:50:53Z

I am also good with "base direction".

I don't like -x-ltr because it goes down the broken -x road. I am OK with --ltr.

pfps · 2023-06-21T22:09:49Z

How is -x- broken?

spec/index.html

Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>

spec/index.html

afs · 2023-09-06T12:44:09Z

Do we need separate discuss-f2f labels for this PR and the issue? We are also meeting i18n.

gkellogg · 2023-09-06T23:32:24Z

I was going to include it in the first slot proposed for I18N, along with the Unicode cleanup (if necessary) and discussion of BCP47 case sensitivity when it comes to literal equality, and thus if triples differing in case are the same, or not. I'm separately working on slides for this section.

aphillips

Reviewed in the I18N TPAC meeting as prep for tomorrow. Some comments included.

spec/index.html

aphillips · 2023-09-11T14:53:38Z

+      <p class="note">The absence of a <a>base direction</a> does not necessarily imply that
+        the text has no initial text direction;
+        as described in [[[?UAX9]]],
+        strings may be embedded within structures which establish an <em>embedding direction</em>,
+        which determines the default bidirectional orientation of text.</p>


This is slightly misleading. The bidi algorithm determines the base direction in any case. And "embedding" is an overloaded term in the bidi algorithm (strings can be "embedded", but "embedding" in bidi refers to stacking bidirectional states...)

I'm not sure what the note is trying to convey. Are you trying to say "if the direction is not provided as metadata, the string can still be rendered"? Generally, what we say is either (a) when there is no base direction provided for a given string, the auto (first-strong detection) direction should be used; or (b) when the base direction is not provided, the direction of the enclosing document (or content??) is used.
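
For readers unfamiliar with option (a), first-strong detection can be sketched in a few lines of Python. This is a simplification under stated assumptions: it ignores the isolate handling that UAX #9 requires when scanning for the first strong character, and the function names and the `fallback` parameter are hypothetical.

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Return 'ltr' or 'rtl' from the first strong character, else None."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return None  # no strong character found (e.g. digits only)


def resolve_direction(text: str,
                      base_direction: Optional[str] = None,
                      fallback: str = "ltr") -> str:
    """Prefer explicit metadata, then first-strong, then a caller default."""
    if base_direction is not None:
        return base_direction
    return first_strong_direction(text) or fallback


print(first_strong_direction("\u05e4\u05e2\u05d9\u05dc\u05d5\u05ea W3C"))  # rtl
print(first_strong_direction("W3C \u05e4\u05e2\u05d9\u05dc\u05d5\u05ea"))  # ltr
print(first_strong_direction("123"))                                       # None
print(resolve_direction("W3C", base_direction="rtl"))                      # rtl
```

Option (b) corresponds to passing the enclosing document's direction as `fallback` and skipping the first-strong step.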

spec/index.html

gkellogg · 2023-09-12T09:54:08Z

Rebased, after merging in #59.

pchampin · 2023-09-12T13:37:09Z

This was discussed during the TPAC 2023 meeting:
https://www.w3.org/2023/09/12-rdf-star-minutes.html#t03

spec/index.html

* Apply suggestions from I18N review * Unrelated change not on rdf:HTML and rdf:XMLLiteral datatypes being definitions. Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com> Co-authored-by: Addison Phillips <addisonI18N@gmail.com>

spec/index.html

Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>

Add **text direction** as a forth element of literals.

2e96714

For #9.

gkellogg requested review from afs, pchampin and iherman June 19, 2023 21:34

gkellogg changed the title ~~Add **text direction** as a forth element of literals.~~ Add text direction as a forth element of literals. Jun 19, 2023

w3cbot mentioned this pull request Jun 20, 2023

Add text direction as a forth element of literals. w3c/i18n-activity#1732

Closed

iherman reviewed Jun 20, 2023

afs changed the title ~~Add text direction as a forth element of literals.~~ Add text direction as a fourth element of literals. Jun 20, 2023

afs reviewed Jun 20, 2023

gkellogg commented Jun 20, 2023

Apply suggestions from code review

19a6413

Co-authored-by: Andy Seaborne <andy@apache.org>

TallTed suggested changes Jun 21, 2023

gkellogg and others added 2 commits June 21, 2023 15:45

Apply suggestions from code review

f77244d

Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>

Cleanup from changing "text direction" to "base direction".

9e0c5e7

Merge branch 'main' into text-direction

0633807

gkellogg removed needs discussion Proposed for discussion in an upcoming meeting spec:substantive Change in the spec affecting its normative content (class 3) –see also spec:bug, spec:new-feature labels Aug 30, 2023

afs reviewed Aug 31, 2023

gkellogg mentioned this pull request Sep 1, 2023

Improve Unicode terminology and term references. #59

Merged

pchampin added the discuss-f2f Proposed for discussion during the next face-to-face meeting label Sep 5, 2023

aphillips suggested changes Sep 11, 2023

gkellogg commented Sep 11, 2023

TallTed reviewed Sep 11, 2023

gkellogg force-pushed the text-direction branch from e3dc44d to 811812d Compare September 12, 2023 09:52

TallTed suggested changes Sep 15, 2023

TallTed suggested changes Sep 15, 2023

gkellogg force-pushed the text-direction branch from 7dab0b4 to 660dca1 Compare September 22, 2023 23:31

gkellogg added needs discussion Proposed for discussion in an upcoming meeting and removed discuss-f2f Proposed for discussion during the next face-to-face meeting labels Sep 25, 2023

Tpt reviewed Sep 28, 2023

gkellogg added spec:substantive Change in the spec affecting its normative content (class 3) –see also spec:bug, spec:new-feature and removed needs discussion Proposed for discussion in an upcoming meeting labels Oct 5, 2023

gkellogg and others added 3 commits October 12, 2023 14:35

Merge branch 'main' into text-direction

a0ee47a

* Base direction discussion from @aphillips

2d153ec

* Apply suggestions from I18N review * Unrelated change not on rdf:HTML and rdf:XMLLiteral datatypes being definitions. Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com> Co-authored-by: Addison Phillips <addisonI18N@gmail.com>

Use I18N-GLOSSARY instead of i18n-glossary.

f1b884f

gkellogg force-pushed the text-direction branch from 3cdc417 to f1b884f Compare October 12, 2023 21:38

TallTed suggested changes Oct 13, 2023

Apply suggestions from code review

529b6dc

Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>

gkellogg merged commit f299fd0 into main Oct 13, 2023

gkellogg deleted the text-direction branch October 13, 2023 20:15
