base direction #9
Comments
One for the full WG! Datatypes for language tags were discussed quite a lot in RDF 1.1 while discussing One factor considered was that the additional features of The fact that sub/super datatypes (XSD derived types) don't work for these features (derived-datatype is not compatible with parent-datatype if it is written in a different script). To quote RFC3536: "[UNICODE] has a long and incredibly detailed algorithm for displaying bidirectional text." |
There was quite a bit of discussion in the JSON-LD group about this, notably including @dlongley, @iherman, and @r12a. There were actually two "experimental" options suggested, both with tradeoffs: Generally,
In the context of the JSON-LD algorithms, this was addressed by normalizing the language to lower-case, which is not a general solution, and has been somewhat confusing in previous RDF specs and implementations, IMO. We could conceivably allow the use of literals with both an explicit language and datatype, with some semantic restrictions on the datatype, a solution not available to JSON-LD 1.1. If we were to create sub-properties of
@prefix ex: <http://example.org/> .
@prefix i18n: <https://www.w3.org/ns/i18n#> .
# Note that this version preserves the base direction using a non-standard datatype.
[
ex:title "HTML و CSS: تصميم و إنشاء مواقع الويب"@ar-eg^^rdfLangStringRTL;
ex:publisher "مكتبة"^@ar-eg^^rdfLangStringRTL
] . This is already suggested under RDF 1.1 descriptions for literals, if the restriction of being strictly equal to
The restriction would be that if a literal contains both a language tag and a datatype, the data type must be
JSON-LD cites Strings on the Web: Language and Direction Metadata for the more extended discussion of the complexities and limitations of text direction in UNICODE in its informative section on Base Direction. Note that such a change would be backwards compatible with the JSON-LD 1.1 text direction options, but would allow it to be replaced by a normative statement in the future, while noting the previous experimental usage. |
+1 to @gkellogg's proposal. However, I have one concern:
This introduces the concept of subDatatype (not subProperty, I guess) inside of the core RDF interpretation and not only in a datatype-aware interpretation. Implementing this constraint in practice seems hard to me (the parsers would need to get access to a datatype hierarchy...). Or we can see it as a "deductive" constraint i.e. if a datatype is used with a language tag then it is a sub datatype of |
I'd like to see the range of possibilities enumerated. For example - an extension to language tags (which reduces the impact of literals now having a lang tag and a datatype - something that can break toolkits (an issue considered at RDF 1.1)). Concretely - what about the script and variant subtags? The JSON-LD solutions cover transmission of information about text which is the basic and important task. One question to address: does this need to be defined as a conceptual change to RDF? The compound literal approach does not; a common vocabulary ("rdf:") would be useful. A datatype defines a value space which in turn gives value-equality. XSD datatypes have facets - is that the right approach for this issue? (related to compound literals) |
@Tpt said:
It seems that @afs said:
Yes, if we were to say anything it would be that literals with a language tag are more specifically enumerated: A literal in an RDF graph consists of two or three elements:
Yes, these are good questions, as text direction has not been considered in RDF before. Does
I don't think so, or they have exactly the same facets (ordered, bound, cardinality, numeric), or anyway, XSD doesn't describe such a facet. It's really about signaling the direction to be used by viewers. It would have been great if Unicode could have included this, but it is only considered in a limited way. Rather, text direction is an additional property of literals having this datatype, used to signal how viewers should display the result, so in that sense, it is an additional facet. HTML also allows The inference considerations are definitely where this gets to be tricky. I don't think we'll be able to solve this until there's a more concerted discussion involving the I18N group. But, in my interpretation, it's a requirement of specs to consider this now. |
Strings on the Web: Language and Direction Metadata recommends the 4.2 Metadata approach. This is compatible with RDF 1.1 and RDF 1.0.
c.f. adding units to literal values. The section 4.7 Create a new bidi datatype uses only a datatype, not a language tag. (Example 7 shows this. The text isn't completely clear - and it is only considering JSON.) This is compatible with RDF 1.1 and RDF 1.0.
There is a new possibility in the metadata style (addressing the verbosity concern) using RDF-star annotation syntax.
c.f. adding units to literal values. This is not compatible with RDF 1.1 or RDF 1.0. (Beware that the "Strings on the Web" document is not completely consistent in the use of terms like "plain literal", which is from RDF 1.0, and mixes it up with rdf:PlainLiteral, which should not be found in any RDF 1.1 syntax: "typed literals with rdf:PlainLiteral as the datatype are considered by this specification to be not valid in syntaxes for RDF graphs or SPARQL.") See also https://w3c.github.io/i18n-discuss/notes/i18n-action-612.html |
The The Using an annotation to define this is interesting, and doesn't really create issues with JSON-LD 1.1, as the two different mechanisms explored were experimental/non-normative. However, it can over-conflate the use of annotations, where you might want to both say something about the statement, and the particular object representation. The proposed solutions from https://w3c.github.io/i18n-discuss/notes/i18n-action-612.html need some work, but are generally consistent with the mechanism I outlined above. |
My apologies to be a bit on the sideline in the discussions right now; this is due to some personal circumstances. Two comments, though
Personally, the approach proposed in this thread (which is also documented in that draft) seems to be clean and it works. +1 on my side. |
RDF Literals and Base Directions is still relevant and it describes the value space so some questions can be answered. I would add that any solution that encodes information in the lexical form (different from the lexical space) has a backwards compatibility problem. What is the length of the string? How much code out there uses the programming language string and length function, SPARQL example being If a system has to microparse the lexical form, even if it is not a technical change in RDF syntax, then pre- and post- system will come up with different answers on the same data. We do have the option of extending language tag syntax ( Given the spread of RDF, old systems/new data as well as new systems/old data need to be considered. |
Not late - it isn't going to be decided on this issue - it needs to go to the wider WG and public WG mailing list at least, the sooner the better. |
As an aside, it is worth mentioning that in N3, we prefer the |
Piggybacking direction metadata on the language tag is attractive for backward compatibility reasons. Now, @afs points out:
So we might change RDF concepts, by saying "a language tag is a string of the form
The second point is challenging. If Y was defined to be one of Looking at the BCP47 grammar, there are several options to fall out of BCP47 while remaining in Turtle's regexp:
none of this is super user-friendly... Or we might just bite the bullet, and use |
Info: Checking RFC 5646, it seems that the script subtag is discouraged where it is unnecessary (section 4.1). |
To add to the choices: A language is 2*3ALPHA (RFC 5646). A starting element "d-rtl-" can be used to add the information.
agreed - that is a cost with all backwards syntax-compatible solutions. |
It doesn't look to me like Trying to maintain compatibility with 1.1 LANGTAG doesn't really help, other than to keep older parsers from having to change, except it now looks like an odd language with text direction encoded, rather than being separated as a different facet. Do we really expect older systems to do the right thing with text direction? Changing the terminal to properly separate them also helps be sure that older systems will not incorrectly parse data that they can't handle properly, which is part of creating an extension point for such features. JSON-LD 1.1 had a similar problem when introducing new features, as we hadn't considered a versioning system in 1.0. Are we trying to maintain syntactic compatibility with 1.1 languages to shoe-horn in text direction, or taking advantage of the need to revisit the grammars by properly separating the concepts? Considering alternatives:
Original i18n datatype: ex:publisher "مكتبة"^^i18n:ar-eg_rtl
Syntactically separate language tag from text direction: ex:publisher "مكتبة"^@ar-eg^rtl
Combine in language tag: ex:publisher "مكتبة"^@d-rtl-ar-eg |
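To compare two of the alternatives listed above, here is a rough sketch of how a consumer might recover the language and direction from each encoding. It assumes the i18n datatype IRI has the shape `https://www.w3.org/ns/i18n#<language>_<direction>` (matching the `i18n:ar-eg_rtl` example) and that the combined form uses the `d-rtl-` / `d-ltr-` prefix suggested earlier in the thread. The function names are hypothetical.

```python
from typing import Optional, Tuple

I18N_NS = "https://www.w3.org/ns/i18n#"
DIRECTIONS = ("ltr", "rtl")


def parse_i18n_datatype(iri: str) -> Optional[Tuple[Optional[str], Optional[str]]]:
    """Split an i18n datatype IRI into (language, direction).

    Returns None if the IRI is not in the i18n namespace.
    """
    if not iri.startswith(I18N_NS):
        return None
    suffix = iri[len(I18N_NS):]
    language, _, direction = suffix.partition("_")
    if direction and direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    return (language or None, direction or None)


def parse_prefixed_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split a hypothetical 'd-<dir>-<langtag>' tag into (language, direction).

    Tags without the prefix are returned unchanged with no direction.
    """
    parts = tag.split("-")
    if len(parts) >= 3 and parts[0].lower() == "d" and parts[1].lower() in DIRECTIONS:
        return ("-".join(parts[2:]), parts[1].lower())
    return (tag, None)


print(parse_i18n_datatype(I18N_NS + "ar-eg_rtl"))
print(parse_prefixed_tag("d-rtl-ar-eg"))
print(parse_prefixed_tag("en-GB"))
```

In both cases a system that does not know the convention sees either an unfamiliar datatype or an odd-looking language tag, which is the trade-off being debated in this comment.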
If we aren't aiming for compatibility, an extended language tag related extension (new syntax) works better. Use case: |
It's something to consider. The "right thing" may be passing information along which is doable, at some cost, with all syntax-backwards-compatible solutions. e.g. Data published end-to-end (the ends being text-direction capable), through systems such as an RDF 1.1 triplestore or validated by RDF 1.1 tools (SHACL). Consider SHACL Depending on the timescale you expect the transition to new syntax to happen, there may be a significant length of time when clients have evolved, but the data path has not depending on industry domains. It may even accelerate the uptake of direction aware client software; evolving client and server at the same time is hard. The wiki says:
For quoted triples. What about systems uninterested in quoted triples but interested in text direction? (A variant of "weak compliance".) |
I think a note that the However, syntax aside, the abstract syntax description of language-tagged literals would need to be updated to accommodate datatypes other than Alternatively, if we deem that the cost of updating the abstract syntax and related concrete syntaxes to support text direction in the data model, that leaves either the I favor taking the plunge to update the abstract and concrete syntaxes to fold this into the data model. I'm wary of an option-soup to signal compliance, but the fact that non-compliant systems would fail to parse documents, in whatever form, that specify the text direction as part of a literal is a form of signal. Adding some facet in the test suite would help vendors not fully supporting these features to filter through them. In any case, informative notes for alternative ways of encoding text direction, similar to what's in JSON-LD, would remain useful during a transition period. |
It seems to me that the option |
What is "this" as far as concrete syntaxes are concerned? By data model, do you mean the new datatype with value space of pairs or becoming part of section 3.3? I prefer the "lang + direction" approach as separate aspects of a literal as part of 3.3. If For FPWD, I think we should say "we're working on it - see issue 9". |
The "but ... fit RDF 1.1 constraints." didn't make it to the wiki/doc. |
We seem to be having the active discussion here, and I think that once we've reached some consensus we can synchronize that page.
I was thinking of the additional datatypes that would describe language-tagged strings with text direction as having the least impact on other implementations.
By this I presume you mean that a language-tagged string might have a fourth element, in addition to the language tag and the language might be something like the following:
This can indeed work, but may have more impact on triple stores than extending the datatype. In the end, I'm agnostic as to which approach is better.
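As an illustration of the "additional element" idea, the following sketch models a literal with an optional base direction alongside the lexical form, datatype and language tag. The class and helper names are hypothetical, and reusing `rdf:langString` as the datatype of a directional literal is an assumption of this sketch; the thread leaves that question open.

```python
from dataclasses import dataclass
from typing import Optional

RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """A literal whose optional fourth element is the base direction."""
    lexical_form: str
    datatype: str = XSD_STRING
    language: Optional[str] = None
    direction: Optional[str] = None  # "ltr", "rtl", or None

    def __post_init__(self) -> None:
        if self.direction is not None:
            if self.direction not in ("ltr", "rtl"):
                raise ValueError("direction must be 'ltr' or 'rtl'")
            # Assumption: a direction only makes sense on a language-tagged string.
            if self.language is None:
                raise ValueError("a base direction requires a language tag")


def lang_literal(text: str, language: str, direction: Optional[str] = None) -> Literal:
    """Build a language-tagged string, optionally with a base direction."""
    return Literal(text, RDF_LANG_STRING, language, direction)


with_dir = lang_literal("مكتبة", "ar-eg", "rtl")
without_dir = lang_literal("مكتبة", "ar-eg")

# Under this model the two are distinct terms, which is one reason the
# change may affect triple stores and term equality.
print(with_dir == without_dir)
```

Whether literals that differ only in direction should be distinct terms is a design choice; this sketch simply makes the consequence of the four-element model visible.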
We already do, see https://w3c.github.io/rdf-concepts/spec/#issue-container-number-9. |
It is a list of two options. The text above is i18n only. On this issue, Up-issue, For me, i18n datatypes are the least attractive option for a permanent solution because they cut literals with text direction off from current language tags (noting that questions about the effect on LANG and LANGMATCHES have not been responded to). We then need to work on how to make uses like skos:prefLabel work without having users write code for both cases. |
I’ll update the content to just refer to the issue without describing the alternatives in the body. |
Yes, displaying text that has bits with different directional characteristics can be difficult, but RDF language-tagged strings don't allow internal markers. The question is whether having an external direction marker changes the meaning of the string. If not, then direction markers are outside the scope of RDF language-tagged strings. |
Hello, Peter!
I would not be so bold as to use pejorative wording. “logical order” is a technical term which appears 6 times in Unicode Standard Annex #9 – Unicode Bidirectional Algorithm (https://unicode.org/reports/tr9/).
More importantly, I cannot agree with your prediction of what would be displayed for ltr and rtl directions respectively. Such displays would be useless and non-conformant with the Unicode standard; please see the document referred to above. Since this document is quite long and somewhat arduous, introductory text about ltr/rtl considerations may be found in many W3C publications, such as “Unicode Bidirectional Algorithm basics” (https://www.w3.org/International/articles/inline-bidi-markup/uba-basics).
I thus maintain that there are use cases where the direction is not cosmetic sugar (like bold or italic) but can affect the meaning and/or is essential for a coherent reading of the text.
Shalom (Regards), Mati
From: Peter F. Patel-Schneider ***@***.***
Sent: Thursday, February 16, 2023 1:57 PM
To: w3c/rdf-concepts
Cc: matial; Comment
Subject: Re: [w3c/rdf-concepts] text direction (Issue #9)
I don't think that this is the difference between rtl and ltr.
"12345 IS YOUR PHONE NUMBER"@ar^ltr would be displayed as 12345 IS YOUR PHONE NUMBER.
"12345 IS YOUR PHONE NUMBER"@ar^rtl would be displayed as REBMUN ENOHP RUOY SI 54321.
There is no way of getting either of the displays you state using only rtl or ltr, no matter what the language.
In any case there is no notion of '"logical"' order here. Please don't use pejorative wording.
|
I'm not saying that there are not lots of complexities in correctly displaying text. I'm just saying that a simple ltr or rtl around strings that are not parts of larger text are not helpful and likely to be misunderstood. After all, all that they can produce are the two outputs I provided - a left-to-right output and a right-to-left output. As all that a language-tagged string in RDF provides is a string in a single language, there doesn't seem to be any utility in providing an ltr or rtl flag. If one wants to display multi-language text, or even text in a single language that has opposite-direction parts, a more complex mechanism is needed. As far as "logical" goes, I see 'logical order' in the Unicode documents but nowhere do I see '"logical" order'. Enclosing a word in double quotes without any indication of what the quoting means can have negative connotations, so much so that there is even a phrase for the practice - scare quotes. https://en.wikipedia.org/wiki/Scare_quotes |
Largely because it covers many of the questions and arguments raised above, I give you Unicode, Inc.'s Writing Direction and Bidirectional Text FAQ which itself references relevant W3C tutorials and articles. As with many topics that appear simple at first glance, this is actually a very complicated subject, with complicated answers. @pfps is correct in asserting that "a simple ltr or rtl around strings that are not parts of larger text are ... likely to be misunderstood" though I disagree with his unqualified assertion that such "simple ltr or rtl" are "not helpful", as for single words or simple phrases, they can be very helpful. Some "strings" in RDF are such single words or simple phrases, and the LTR vs RTL question has been largely considered irrelevant, in no small part, in my opinion, because people who use RTL languages are already used to being treated as second-class (if that) in the mostly English and Western European LTR Internet "world" of the "World Wide Web". However, many RDF "strings" are actually full or even multiple paragraphs (which is one of the reasons Turtle makes it easy to embed new lines, though there remains a prejudice against "open" spacing, that is, displaying a larger space between paragraphs than between lines, in Web-involved rendering, even though typesetting examples over hundreds of years show that either a first-line indent or such a larger vertical gap are more than common, and help greatly with readability). It does appear that (as suggested by @iherman) |
I've updated the issue description. Hopefully, that will make it into the online draft. |
Drawing on the discussion here, I've put a proposal on the WG wiki. Any work on this area will affect several documents. The idea is to get WG agreement for work on the area, which is easier with a concrete design to focus on, before investing time on text across several documents. https://github.com/w3c/rdf-star-wg/wiki/Text-Direction-Proposal I'll edit the wiki page to keep it current as discussion here happens. |
I'm trying to figure out what the benefits are of adding a fourth element to language-tagged literals. I wasn't coming up with anything so I decided to write down some of my questions and what I came up with as answers. This didn't help me see benefits so I'm putting them into this discussion. TL;DR: I don't see that adding a text direction to language-tagged literals achieves anything significant and there are better ways to represent text direction in RDF.
Q: What problem is adding a direction marker to RDF language-tagged strings trying to solve? Information about bi-directional display of text within RDF.
Q: Who is requiring that this problem be solved? Unknown.
Q: Should RDF solve this problem? Unknown.
Q: Is this problem within the scope of the working group? No. "Adding other improvements or extensions to RDF or SPARQL" is explicitly outside the scope of the working group.
Q: RDF is about meaning. Does a direction marker change the meaning of a string, divorced from any display considerations? No.
Q: Does adding a direction marker to RDF language-tagged strings solve the problem? No. Bi-directional display of strings requires changing direction within a string.
Q: Does RDF already have facilities for solving the problem? Yes. RDF has rdf:HTML and HTML has the 'dir' attribute. The proposal already suggests using rdf:HTML.
Q: Will documents have to change? Many working group documents will have to change: Concepts, syntaxes, Semantics, Query, and more.
Q: Will implementations have to change? Yes. Implementations of almost every part of RDF and SPARQL will have to change, from syntax to semantics to storage to querying to update.
Q: Will applications have to change? Yes. Text direction will affect the results of SPARQL queries.
Q: Are there better ways of solving the problem? Yes. If rdf:HTML is deemed unsuitable it is possible to create a vocabulary for text direction or a datatype for text that includes direction markers. |
@pfps: We are not operating in isolation. The context has changed from RDF 1.1:
If the approach is the option that extends
for more detailed control of displayable content. Not all display is HTML; if the app wants detailed control, it would be output format specific. Initial text direction is carrying information around separate from output format.
Any data that includes text direction information affects application use of SPARQL; there is an impact from the non-syntax approaches as well. With new datatype-based approaches, there would be two different ways to have language information. In some variations, the length of a string is changed. With compound literals, finding blank nodes where a literal term is expected (c.f. SHACL) is an impact on applications, and in SPARQL, following the extra triples means queries need to be modified. So staying within RDF 1.1 can have more impact on applications. |
@pfps said:
Any person or group that regularly deals with presentation of text in different directions. Quite a bit of the world, actually.
RDF is used as a data representation format that is often used for creating user interfaces; probably more with JSON-LD than other formats. Allowing the initial text direction to be added as a literal facet (of some kind) preserves this information that is important when presenting this data back to users.
This is subject to interpretation, and arguably necessary to address reasonable (and long-standing) internationalization considerations that were not obvious or not given enough weight during previous design cycles.
Given that without the proper use of the initial text direction a presentation would be at least confusing, if not harmful, then I think it absolutely changes the meaning of the string.
Unfortunately, BiDi didn't go far enough, and while you can signal text direction change within a string, it does not work properly at the beginning of the string. Things would be so much easier (in retrospect) had Unicode supported this.
JSON-LD introduced the i18n namespace to address this issue, and it seems to have had some uptake in the community. The problem is that it does not work properly with the language facet of a literal as it encodes both language and text direction as a datatype.
Definitely a big consideration.
That could depend on the nature of the change. But, if done as a separate facet, or as a sub-type of
Applications that do not currently consider text direction may have no need to adapt if that is not something important to them.
Considered and rejected previously by the JSON-LD WG for a number of reasons. |
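The point above about direction at the beginning of a string can be illustrated with a simplified first-strong heuristic, loosely following the paragraph-level rule in the Unicode Bidirectional Algorithm (UAX #9). This is a sketch only: it ignores directional isolates and other details of the full algorithm, and the function name is hypothetical.

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Guess a base direction from the first strongly directional character.

    Returns "ltr", "rtl", or None if no strong character is found.
    Simplification: directional isolates and embeddings are not handled.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


# The Arabic book title used earlier in this thread begins with the Latin
# letters "HTML", so the heuristic picks a direction based on those
# letters rather than on the language of the title as a whole.
title = "HTML و CSS: تصميم و إنشاء مواقع الويب"
print(first_strong_direction(title))
print(first_strong_direction("مكتبة"))
```

When the string is mostly in a right-to-left language but starts with a left-to-right run, the guess and the intended base direction can disagree, which is why an explicit base direction carried with the literal is being asked for.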
This can't be the case. If RDF isn't being used by then there is no problem to be solved. I'm now unclear as to what the problem is that is supposed to be solved. Is it providing information about display of text in general, as it appears to be from the answer above? Or is the problem something different? |
I am concerned that the conversation @pfps had with himself is naturally skewed to English users, who are generally over-represented in Internet and Web technologies.
Or perhaps the reason they are not using RDF is that there is a problem which is only surmountable by addressing this issue in a suitably generalized fashion. I think we need more substantial input (optimally, full participation) from someone(s) who use rtl or otherwise non-ltr languages, whether those folks can be recruited from existing i18n groups or elsewhere. |
At the RDF star telecon (2023-05-11) https://www.w3.org/2023/05/11-rdf-star-minutes.html#r02
|
I do, though so far occasionally as a learner. I want to express support for https://w3c.github.io/rdf-dir-literal/#script-subtag. I fail to see how writing direction can be divorced from writing system, which can already be specified as part of the language tag by way of script subtag. Furthermore, I am not sure whether the inclusion of
If I understood that sentence correctly… There are many reasons that make RDF daunting: a graph is not how most people intuitively tend to think about information and knowledge (legitimate barrier), the tooling is lacking (solvable with time and effort), but inability to specify text direction does not strike me personally as one. I am somewhat enthusiastic about adopting RDF/JSON-LD in some tooling I work on. I believe RDF tooling could take into account the script when rendering data and facilitate script selection at the authoring stage, whereas allowing mixed signals (specifying a writing system through language tag and then giving a contradictory direction marker) seems liable to introduce more uncertainties as far as tooling implementation. |
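The script-subtag idea can be sketched as follows, assuming a consumer that looks for a four-letter script subtag in a well-formed BCP47 tag and maps it to a direction. The lookup table is a small illustrative sample, not a complete list, and the function names are hypothetical.

```python
from typing import Optional

# Illustrative sample only; a real implementation would need a complete,
# maintained mapping from ISO 15924 script codes to directions.
SCRIPT_DIRECTION = {
    "Arab": "rtl",
    "Hebr": "rtl",
    "Syrc": "rtl",
    "Thaa": "rtl",
    "Latn": "ltr",
    "Cyrl": "ltr",
    "Grek": "ltr",
}


def script_subtag(tag: str) -> Optional[str]:
    """Return the script subtag of a well-formed BCP47 tag, if present."""
    parts = tag.split("-")
    for part in parts[1:]:
        if len(part) == 1:
            # A singleton starts an extension or private-use sequence.
            break
        if len(part) == 4 and part.isascii() and part.isalpha():
            return part.title()
    return None


def direction_from_script(tag: str) -> Optional[str]:
    """Infer a direction from the script subtag, or None if it cannot be inferred."""
    script = script_subtag(tag)
    if script is None:
        return None
    return SCRIPT_DIRECTION.get(script)


print(direction_from_script("ar-Arab-EG"))
print(direction_from_script("az-Latn"))
print(direction_from_script("ar-eg"))
```

As noted earlier in the thread, RFC 5646 discourages the script subtag where it is unnecessary, so many tags in existing data (such as `ar-eg`) carry no script at all and give this approach nothing to work with.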
* Add **base direction** as a fourth element of literals. For #9.
* Add a note on UAX9 determining a default text direction.
* Define "directional language-tagged string".
* Indicate that a plain literal has no explicit base direction, in addition to having no datatype or language tag.
* Remove suggestion to format language tags based on BCP47 rules and for comparing language tags after normalizing to lower case.
* Apply suggestions from I18N review
* Unrelated change not on rdf:HTML and rdf:XMLLiteral datatypes being definitions.
---------
Co-authored-by: Andy Seaborne <andy@apache.org>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Pierre-Antoine Champin <github-100614@champin.net>
Co-authored-by: Addison Phillips <addisonI18N@gmail.com>
A possible issue for RDF 1.2 is to standardize on a solution for the base direction of strings.
This would possibly include updates to the Abstract Syntax and associated changes to the various Concrete Syntax specifications.
See RDF Literals and Base Directions for possible options.
JSON-LD introduced features for specifying the text direction. These included experimental features compatible with RDF 1.1:
i18n namespace, and rdf:CompoundLiteral.
See the issue for the discussion of further options, and the Working Group page for further discussion.