Add base direction as a fourth element of literals. #48

Merged
merged 16 commits into from
Oct 13, 2023

gkellogg
Member

@gkellogg gkellogg commented Jun 19, 2023

This addresses the conceptual representation of text direction in RDF language-tagged literals, based on discussion in:

Fixes #9.

For discussion:

  • Separate directional language-tagged string with rdf:dirLangString datatype, or extend the notion of language-tagged string and continue to use rdf:langString.
  • Bikeshedding terminology for literal element:
      • text direction – From previous proposals
      • base direction – From Unicode bidi basics
      • initial text direction – From Wiki
      • text directionality – From HTML

(As the infrastructure seems to be having issues, you can also view the document via GitHack.)



@gkellogg gkellogg added i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. needs discussion Proposed for discussion in an upcoming meeting spec:substantive Change in the spec affecting its normative content (class 3) –see also spec:bug, spec:new-feature labels Jun 19, 2023
@gkellogg gkellogg requested review from afs, pchampin and iherman June 19, 2023 21:34
@gkellogg gkellogg changed the title Add **text direction** as a forth element of literals. Add text direction as a forth element of literals. Jun 19, 2023
Member

@iherman iherman left a comment

(Bike shedding!)

I wonder whether the term "Text Direction" is appropriate. The value of the fourth element does not fully define the full text direction, because that may also depend on the specific Unicode characters in the text and may even vary within the text in the bidirectional case.

We used, elsewhere, the term "base direction". The HTML specification uses the term "text directionality".

My personal choice is to align with the HTML terminology.

CC: @r12a

@afs afs changed the title Add text direction as a forth element of literals. Add text direction as a fourth element of literals. Jun 20, 2023
@afs
Contributor

afs commented Jun 20, 2023

See https://github.com/w3c/rdf-star-wg/wiki/Text-Direction-Proposal

It uses "initial text direction" - does that capture your point @iherman?

@pfps
Contributor

pfps commented Jun 20, 2023

If text direction is to be supported in RDF, let's support it in a general way. Proposals for adding an initial text direction do not meet this requirement, as correct rendering of bidirectional text needs internal direction markers.

If there is going to be a partial solution provided in RDF, then I feel that it needs to be backward compatible with existing RDF systems. One way to do this is to use '-x-ltr' and '-x-rtl' at the end of the language tag. This produces valid language tags and is backwards compatible.

@iherman
Member

iherman commented Jun 20, 2023

See https://github.com/w3c/rdf-star-wg/wiki/Text-Direction-Proposal

It uses "initial text direction" - does that capture your point @iherman?

Maybe, but it is not ideal. "Initial" suggests some sort of an ordering in time. Although I realize that I am on a slippery slope in terms of English terminology...

I am a bit worried by getting into unnecessary bike-shedding; the reason I proposed to pick the term used by the HTML spec is to avoid that... Part of the community has already picked that term, I am not sure if it is worth picking our own for something that is essentially the same.

spec/index.html Outdated
the two <a>datatype IRIs</a>,
the two <a>language tags</a> (if any), and
the two <a>text directions</a> (if any)
compare equal, character by character.
Contributor

Aside: for language tags, it's case insensitive.

That's a decision from outside RDF - for text direction we can restrict to lower case.

Member Author

We previously say that the value space of language tags is lower case; is it redundant to say to use a case-insensitive comparison here?

Member

is it redundant to say to use a case insensitive comparison here

I think it conforms to Postel's Law, will clearly reflect user intent, and will be better than case sensitivity-based errors.

Contributor

I was jumping to conclusions. Some investigation: it isn't as simple as "case insensitive".

The text above this paragraph says "MAY lower case" the concrete string that is the language tag.

The text here is about term-equality and does not say it is a value-space comparison. (FWIW "value space" for language tags is a bit meaningless - "value spaces" involve datatypes but we are where we are.)

The RDF 1.1 text:

the two language tags (if any) compare equal, character by character

does allow "abc"@en and "abc"@EN as different terms, whether that was intended or not.

The root problem is that RDF has not used the canonical form for language tags. Some users do care about this.

At users' request Jena has options to leave as-is, always lower-case and always canonicalization.

Maybe better to leave it as "character by character", because otherwise it is an implementation change.

Member

Hm, yes, I see what you mean. Accidental inconsistency is a rarely vanquished hobgoblin, especially across specs developed separately over years. We have our work cut out for us, in trying to bring consistency to all these docs that we're simultaneously trying to upgrade/update.

"Character by character" is at least clear, and we can include a note that advises deployers of the potential need to enact Jena's options — i.e., keep original langtag casing, make all langtags lower-case, or (whatever you meant by "always canonicalization").

Member Author

We need something to reconcile the notions of a lower-case value (space) for the language tag and the fact that it's compared character by character (code point by code point?). Are there systems where "abc"@en and "abc"@EN are not considered the same term?

It might say something like "two language tags (if any) compare equal after normalizing to lower case".

The sentence "The value space of language tags is always in lower case" might be changed to "The value of language tags is always treated as being in lower case".

Contributor

We should avoid "value" because that is about datatype literals.

"compare equal as if normalized to lower case".

which does not imply they are converted (the earlier text is MAY).

Are there systems where "abc"@en and "abc"@EN

Jena keeps them apart, but they are the same term (yes, that's contradictory).

It's about meeting the user expectation of round-trip with no change.
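
The two readings discussed in this thread can be contrasted with a small sketch. It is illustrative only: the `Literal` class and both function names are hypothetical, the four fields mirror the elements named in the quoted spec text, and the lower-casing variant follows the "compare equal as if normalized to lower case" wording without modifying the stored tag.

```python
from dataclasses import dataclass
from typing import Optional

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical four-element literal, for illustration only."""
    lexical_form: str
    datatype: str
    language: Optional[str] = None
    direction: Optional[str] = None  # "ltr" or "rtl", lower case only


def same_term_strict(a: Literal, b: Literal) -> bool:
    # "Character by character": every element must match exactly,
    # so "en" and "EN" produce different terms.
    return (
        a.lexical_form == b.lexical_form
        and a.datatype == b.datatype
        and a.language == b.language
        and a.direction == b.direction
    )


def _lowered(tag: Optional[str]) -> Optional[str]:
    return tag.lower() if tag is not None else None


def same_term_as_if_lowercased(a: Literal, b: Literal) -> bool:
    # Language tags compare "as if normalized to lower case";
    # the stored tags are left untouched, so round-tripping keeps the case.
    return (
        a.lexical_form == b.lexical_form
        and a.datatype == b.datatype
        and _lowered(a.language) == _lowered(b.language)
        and a.direction == b.direction
    )
```

Under these assumptions, `Literal("abc", RDF_LANGSTRING, "en")` and `Literal("abc", RDF_LANGSTRING, "EN")` are different terms for the strict comparison and the same term for the lower-cased one, while the second literal still reports `"EN"` as its stored tag.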

@afs
Contributor

afs commented Jun 20, 2023

It uses "initial text direction" - does that capture your point @iherman?

Maybe, but it is not ideal. "Initial" suggests some sort of an ordering in time. Although I realize that I am on a slippery slope in terms of English terminology...

the reason I proposed to pick the term used by the HTML spec is to avoid that... Part of the community has already picked that term, I am not sure if it is worth picking our own for something that is essentially the same.

Yes, it's tricky to get the right terminology.

When we talked to i18n, "text direction" was pointed out as suggesting "all this text" whereas this is meant as in "default".

In the "פעילות הבינאום, W3C" example there are multiple directions.

HTML also has auto which doesn't seem appropriate for RDF.

I am a bit worried by getting into unnecessary bike-shedding;

Yes :-)

@afs
Contributor

afs commented Jun 20, 2023

If there is going to be a partial solution provided in RDF, then I feel that it needs to be backward compatible with existing RDF systems. One way to do this is to use '-x-ltr' and '-x-rtl' at the end of the language tag. This produces valid language tags and is backwards compatible.

@pfps In https://github.com/w3c/rdf-star-wg/wiki/Text-Direction-Proposal the proposed syntax is @en--ltr. Not quite the same, but connected to the language tag.

LANGTAG ::=  "@" [a-zA-Z]+ ("-" [a-zA-Z0-9]+)*

RDF 1.2 gives us the opportunity for a syntax change while previous work has operated within the confines of RDF 1.1.

Being separate:
LANG("foo"@en--ltr) is proposed as "en"; with the private-use form it would be "en-x-ltr". Looking for "all literals with en" continues to work. Similarly, the LANGMATCHES function continues to match (the algorithm is from RFC 4647).
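
A rough sketch of the point about keeping the direction separate from the tag. The regular expression is an assumption: it takes the LANGTAG production quoted above and appends an optional `--ltr` / `--rtl` suffix in the style of the wiki proposal; it is not the grammar adopted by any specification. `split_langtag` and `lang_matches` are hypothetical helper names, and `lang_matches` implements only RFC 4647 basic filtering.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: the quoted LANGTAG production plus an optional
# "--ltr" / "--rtl" suffix (hypothetical, following the wiki proposal).
LANGTAG_WITH_DIR = re.compile(r"@([a-zA-Z]+(?:-[a-zA-Z0-9]+)*)(?:--(ltr|rtl))?")


def split_langtag(token: str) -> Tuple[str, Optional[str]]:
    """Split a token such as '@en--ltr' into (language tag, base direction)."""
    match = LANGTAG_WITH_DIR.fullmatch(token)
    if match is None:
        raise ValueError(f"not a language tag token: {token!r}")
    return match.group(1), match.group(2)


def lang_matches(tag: str, lang_range: str) -> bool:
    """RFC 4647 basic filtering: case-insensitive exact or prefix match."""
    if lang_range == "*":
        return tag != ""
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")
```

With this sketch, `split_langtag("@en--ltr")` yields the tag `en` and the direction `ltr`, so an equality test against `"en"` still succeeds. `split_langtag("@en-x-ltr")` yields the tag `en-x-ltr` with no direction, which range matching on `en` still accepts but a plain equality test does not.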

@gkellogg
Member Author

gkellogg commented Jun 20, 2023

@afs said:

See https://github.com/w3c/rdf-star-wg/wiki/Text-Direction-Proposal

It uses "initial text direction" - does that capture your point @iherman?

Sorry, there were a number of documents on (initial) text direction, and I based the PR off of https://github.com/w3c/rdf-star-wg/blob/main/docs/text-direction.md. We can reconcile the differences. I'm fine with "initial text direction", as that gets to the intent of the element. "Text Directionality" may have a subtly different meaning, as it describes the behavior of a display element, not a property of the text, but we can continue to discuss terminology either in this PR, or subsequently.

@pfps said:

If text direction is to be supported in RDF, let's support it in a general way. Proposals for adding an initial text direction do not meet this requirement, as correct rendering of bidirectional text needs internal direction markers.

That's not my understanding of how bidirectional text works in Unicode. According to Unicode Bidirectional Algorithm basics, each character already has its own directionality encoded; it is only where character classes are mixed that there is no a priori way of knowing how to begin rendering the text. Once the initial direction is set, the Unicode algorithm handles any subsequent change in direction. That document uses the term base direction, so perhaps that would be a better term than "initial text direction" or "text directionality".

If there is going to be a partial solution provided in RDF, then I feel that it needs to be backward compatible with existing RDF systems. One way to do this is to use '-x-ltr' and '-x-rtl' at the end of the language tag. This produces valid language tags and is backwards compatible.

RDF Literals and Base Directions did explore extending the language tag, but that approach was ultimately rejected. See 2.1.1 Extend language tag for a discussion.

@afs
Contributor

afs commented Jun 20, 2023

https://github.com/w3c/rdf-star-wg/wiki/Text-Direction-Proposal is a write up based on issue #9.

Co-authored-by: Andy Seaborne <andy@apache.org>
@gkellogg
Member Author

Applied some of @afs's suggestions, leaving the others for now pending further discussion.

@iherman
Member

iherman commented Jun 21, 2023

Within that document base direction is used, so perhaps that would be a better term than "initial text direction" or "text directionality".

I am also fine with "base direction". Actually, when writing up my comment, my initial instinct was to propose that term but (to my surprise) that is not the term used by the HTML standard, and that is why I fell on the "directionality" side. (No idea why that term was chosen for HTML.) Either way is fine with me.

@pfps
Contributor

pfps commented Jun 21, 2023

@gkellogg In https://www.w3.org/International/articles/inline-bidi-markup/ there are examples of strings that need embedded markup for correct rendering. My takeaway is that a solution that only provides a language tag and a base direction is insufficient. The worst situation, I think, is including identifiers using strong ltr characters in rtl text, as in "[ARABIC TEXT] A7, B8, X", where the order of the identifiers is reversed from its correct order if there is no embedded markup. Note that the language of this text is entirely Arabic: the identifiers are not English or any other language that uses ltr display. I include an example with rtl identifiers inside ltr script.

Here are two identifiers using Hebrew script בבב, אא. The first is בבב the second is אא

@afs
Contributor

afs commented Jun 21, 2023

"base direction" works for me.

@TallTed
Member

TallTed commented Jun 21, 2023

I am also good with "base direction".

I don't like -x-ltr because it goes down the broken -x road. I am OK with --ltr.

@pfps
Contributor

pfps commented Jun 21, 2023

How is -x- broken?

gkellogg and others added 2 commits June 21, 2023 15:45
@gkellogg gkellogg removed needs discussion Proposed for discussion in an upcoming meeting spec:substantive Change in the spec affecting its normative content (class 3) –see also spec:bug, spec:new-feature labels Aug 30, 2023
@pchampin pchampin added the discuss-f2f Proposed for discussion during the next face-to-face meeting label Sep 5, 2023
@afs
Contributor

afs commented Sep 6, 2023

Do we need a separate discuss-f2f for this PR and for the issue? We are also meeting i18n.

@gkellogg
Member Author

gkellogg commented Sep 6, 2023

I was going to include it in the first slot proposed for I18N, along with the Unicode cleanup (if necessary) and discussion of BCP47 case sensitivity when it comes to literal equality, and thus if triples differing in case are the same, or not. I'm separately working on slides for this section.

Contributor

@aphillips aphillips left a comment

Reviewed in the I18N TPAC meeting as prep for tomorrow. Some comments included.

spec/index.html Outdated
Comment on lines 796 to 800
<p class="note">The absence of a <a>base direction</a> does not necessarily imply that
the text has no initial text direction;
as described in [[[?UAX9]]],
strings may be embedded within structures which establish an <em>embedding direction</em>,
which determines the default bidirectional orientation of text.</p>
Contributor

This is slightly misleading. The bidi algorithm determines the base direction in any case. And "embedding" is an overloaded term in the bidi algorithm (strings can be "embedded", but "embedding" in bidi refers to stacking bidirectional states...)

I'm not sure what the note is trying to convey. Are you trying to say "if the direction is not provided as metadata, the string can still be rendered"? Generally, what we say is either (a) when there is no base direction provided for a given string, the auto (first-strong detection) direction should be used; or (b) when the base direction is not provided, the direction of the enclosing document (or content??) is used.
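
Option (a), first-strong detection, can be sketched as follows. This is a simplified illustration, not the full UAX #9 rule: it returns the direction of the first character whose bidi class is L, R, or AL, and it does not skip characters inside directional isolates as the complete algorithm does. The function name `first_strong_direction` is hypothetical; `unicodedata.bidirectional` is the Python standard library call that reports a character's bidi class.

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Return 'ltr' or 'rtl' from the first strong character, else None.

    Simplified: ignores directional isolates, which the full
    UAX #9 paragraph-level rule would skip over.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # No strong character found (for example digits and punctuation only);
    # the caller would fall back to the enclosing context, as in option (b).
    return None
```

Applied to the "פעילות הבינאום, W3C" string mentioned earlier in the conversation, this sketch reports `rtl`, because the first strong character is Hebrew even though the string ends in Latin letters.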

@gkellogg
Member Author

Rebased, after merging in #59.

@pchampin
Contributor

This was discussed during the TPAC 2023 meeting:
https://www.w3.org/2023/09/12-rdf-star-minutes.html#t03

@gkellogg gkellogg added needs discussion Proposed for discussion in an upcoming meeting and removed discuss-f2f Proposed for discussion during the next face-to-face meeting labels Sep 25, 2023
@gkellogg gkellogg added spec:substantive Change in the spec affecting its normative content (class 3) –see also spec:bug, spec:new-feature and removed needs discussion Proposed for discussion in an upcoming meeting labels Oct 5, 2023
gkellogg and others added 3 commits October 12, 2023 14:35
* Apply suggestions from I18N review
* Unrelated change: note on rdf:HTML and rdf:XMLLiteral datatypes being definitions.

Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Addison Phillips <addisonI18N@gmail.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
@gkellogg gkellogg merged commit f299fd0 into main Oct 13, 2023
@gkellogg gkellogg deleted the text-direction branch October 13, 2023 20:15
Labels
i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. spec:substantive Change in the spec affecting its normative content (class 3) –see also spec:bug, spec:new-feature
Development

Successfully merging this pull request may close these issues.

base direction
8 participants