Constructions around colons #933
Comments
https://universaldependencies.org/u/dep/appos.html explicitly covers key-value pairs, where there is basically structured data rather than linguistic structure. Sports scores are arguably an instance of key-value pairs (like they would be displayed on a scoreboard). I'd probably handle (3)–(6) the same
Agree that (1) headings should be |
I think the cases described there are of the more straightforward kind, where both the key and the value are nominals. This works well for cases like 2., but for things like "throws: right", we have several rather counter-intuitive consequences of using I can certainly apply an
At least for 4, I have a problem with this, since I think the first term is easily omissible and therefore should not be the head. It's really just a modifier explaining the notation system from which the head comes:
So I think it should be ?deprel?(IFN, IATA)
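To make the two head choices concrete, here is a minimal sketch (not GUM's actual annotation) that writes both candidate analyses of the code-plus-expression pattern as simplified CoNLL-U rows. The surface string "IATA : IFN", the helper names `to_conllu` and `root_form`, the placeholder label `dep` for the still undecided relation, and the attachment of the colon are all assumptions made for illustration.

```python
def to_conllu(tokens):
    """Render (form, head, deprel) triples as 10-column CoNLL-U rows.

    Heads are 1-based token indices; 0 marks the (local) root.
    Columns we do not care about here are left as "_".
    """
    rows = []
    for i, (form, head, deprel) in enumerate(tokens, start=1):
        rows.append("\t".join(
            [str(i), form, "_", "_", "_", "_", str(head), deprel, "_", "_"]
        ))
    return "\n".join(rows)


def root_form(tokens):
    """Return the form of the token whose head is 0."""
    return next(form for form, head, _ in tokens if head == 0)


# Analysis A (key-value reading): the code name is the head,
# and the expression is attached to it as appos.
key_as_head = [("IATA", 0, "root"), (":", 3, "punct"), ("IFN", 1, "appos")]

# Analysis B (the ?deprel?(IFN, IATA) proposal): the expression is the head,
# and the code name is an optional modifier. "dep" is only a placeholder.
value_as_head = [("IATA", 3, "dep"), (":", 1, "punct"), ("IFN", 0, "root")]

for name, tree in [("key as head", key_as_head), ("value as head", value_as_head)]:
    print(f"# {name}: root = {root_form(tree)}")
    print(to_conllu(tree))
    print()
```

The only difference between the two trees is the direction of the arc across the colon, which is exactly the point under discussion; the label on that arc is a separate question.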
This was also my gut reaction, but I noticed that these particular cases actually do fulfill the normal |
I think the rule of key-value pairs is a special case that overlaps with but extends beyond the substitutable nominal cases. For example, if a list of business hours had:
That is not really a standard apposition construction, and "closed" is in no way equivalent to "Sunday", but it is a list of key-value pairs connected by
I think there are two readings. One is that it is a key-value pair occurring in a list, in which case see above. Another is as an elliptical sentence, in which case perhaps the colon could be ignored in favor of advmod(bats, left) or similar.
I suppose there is a pattern of the form CODE: EXPRESSION, where by "CODE" I mean the name of a language or other notational system, and the use of the colon is motivated by the key-value nature of this construction. I would say "EXPRESSION" functions as a metalinguistic mention in this pattern. In the standard apposition construction with substitutability it would be a referring expression rather than a metalinguistic mention.
Even if in some cases there is semantic equivalence between the two parts, I think this is a broader heading-subheading construction, so |
OK, I don't find any of that unreasonable per se, but I do want to hear what others think about this. I feel somewhat strongly about the 'code' cases having the content as a head, since it really seems like the language/code name is an optional modifier there. For headings I could be persuaded to do
Yes, this is basically what I'm struggling with in those cases - but then how do we know we are looking at such a case? Some seem more blatant than others. Especially when the items are not tagged as nominals, I find |
I studied this phenomenon on the French-GSD corpus annotated in SUD but the results are easily transposable to UD.
Thanks Guy, those are good examples.
I would not expect this type of colon to be standardly used within an (unquoted) subject, so I don't think it's the same thing as ordinary syntactic adnominal modification.
In my view, one function of the colon is for key-value pairs that are external to a grammatical sentence, typically in a list (or metadata section of a document). Sometimes the colon can be interpreted in multiple ways, but if it's a standalone fragment with no predication (and not a title or foreign phrase), I would go with the key-value interpretation.
Thank you both for the examples and discussion - do I read the sentiment correctly that there is support for:
? If these positions are supported, this leaves open:
Regarding the last proposals of @amir-zeldes, I agree with the first three points, although for pairs (key, value), the alternative in terms of predicative complement is admissible. For example, as @amir-zeldes initially proposed, we can have e-mail/nsubj: turismo@merida.gob.mx/root because we can say that e-mail is turismo@merida.gob.mx. Regarding the last three open questions, I take them point by point:
Maybe my example of "day: hours" combinations is confusing because temporal modifiers can often be expressed without a preposition (or there is an obvious preposition that can be inferred). Suppose instead I am giving a list of team assignments for students:
There could be many possible ways to rephrase these as sentences ("John is in Group 1" or "John belongs to Group 1" or "Group 1 includes John" or "The assignments include John in Group 1" etc.). But those paraphrases are not due to the colon notation—they're from world knowledge. The colon notation merely maps keys to values. Does it make sense to treat these key-value pairs as headed nominal constituents? If so, would the name or the group be the head? It's not obvious to me. These are definitely not |
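The claim that the colon notation merely maps keys to values can be illustrated with a minimal sketch that reads such a list as plain structured data, without deciding anything about predication or headedness. The function name `parse_pairs` and the second list entry ("Mary: Group 2") are hypothetical; only "John" and "Group 1" come from the example above.

```python
def parse_pairs(lines):
    """Read 'key: value' lines into a dict, splitting on the first colon only."""
    pairs = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon on this line, so it is not a key-value pair
        pairs[key.strip()] = value.strip()
    return pairs


# Hypothetical list of team assignments in colon notation.
assignments = parse_pairs(["John: Group 1", "Mary: Group 2"])
print(assignments)
```

Nothing in the resulting mapping says whether "John is in Group 1" or "Group 1 includes John" is the right paraphrase; that information comes from world knowledge, not from the notation.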
In case of lists, such as:
I think we should avoid purely semantic arguments that try to recover a possible paraphrase in words. If we look only at what we have, we have pairs of NPs in a list. This is very similar to what we have in gapping coordination:
I see where you're going here, but I always thought orphan was restricted to certain predicate ellipsis constructions. Here it is just a syntactically loose connection between two elements. There's a discussion of the scope of |
I think For the 'loose connection' interpretation, I think I would also prefer |
Sorry if I insist, but it is our choice to decide what the extension of
I think we have the same construction in these 4 cases and would like to have the same relation.
There is a broader disagreement here: I am of the opinion that there is a divide between syntactic relations and other textual relations that might be signaled with punctuation etc., but are not really grammatical structures as they would be in spoken language. Not sure if the UD guidelines address this directly. I will open up a separate issue to try to articulate this and see what people think.
@sylvainkahane I'm not sure I agree with equating all of those examples since, as Nathan pointed out, we can have a variety of possible paraphrases:
However I do agree that we should ideally distinguish predicative cases from nominals, which we should(?) be able to tell apart. For predications, no matter how they are expressed, I think |
@amir-zeldes It is exactly because many paraphrases are possible and thus we cannot even say which of the two phrases is the predicate and which is the argument that we can only say that the two phrases form a clause together: John and Group1 are associated in a particular way. We need a relation for that. For me, |
Maybe French is more permissive with these constructions than English. The cleanest case of gapping is like:
If the first element of the list is outside of the coordination, it is more marginal (leaning on focus) and might call for commas:
While (1) is a clear case of The answer-to-question example and the comparison example remind me of (2). I would want to transcribe these with a comma, and regard the speaker as being extremely terse—whereas (1) is a well-established construction licensing the predicate omission. All of those are arguably distinct from a (non-coordination) list of colon-separated pairs, which is a convention of writing or calling out items in sequence as opposed to constructing a grammatical sentence. It would be slightly unconventional to use colons in place of the commas in (2), and definitely unconventional in (1). |
Reading all this made me curious: is punctuation used the same way in every language, or differently? This thread has English and French examples with colons. Is a colon used the same way in both languages? I have been annotating several Thai texts for the Thai treebank, and I increasingly see texts and headlines that use a colon or other punctuation. They are quite challenging to annotate in UD because the punctuation in these Thai texts may follow English usage rather than Thai usage. This makes it difficult for me to choose a relation, and even to understand what the punctuation means in context. I checked the principles for punctuation set by the Thai language authority. A colon in Thai is used in 3 ways:
I am not sure whether a colon is used in other languages the same way as in Thai. As a non-native speaker writing English, when I use a colon I need to check how it is used in English and what it means there. So what I am curious about is this: when we annotate a text with a colon or other punctuation, should we also consider how that punctuation is used in the language in question and what it means in context? If so, would this affect the choice of UD relation between the two parts separated by a colon or other punctuation? FYI, punctuation is rarely used in written Thai. Whitespace is not used between words as it is in English and French, and there are no capital letters. My point is that when the use of a colon or other punctuation in one language is influenced by another language, the text becomes harder to read and harder to annotate with confidence, which is what I am experiencing with the Thai texts I have been annotating. I was also wondering which part around a colon should be the head. This is from the statement that @nschneid made above:
His examples are nominal constituents, for which, I agree, the head is not obvious. What if the two parts around a colon are not nominal constituents but a verb phrase and a sentence? Should the part before or after the colon be the head? I have the same difficulty deciding the head in my annotated Thai texts: I cannot decide which part should be the head and which word should be the root.
Just coming extremely late, but (mostly agreeing with the remarks/proposals by @sylvainkahane) I would like to add:
I don't think that's true without exception - sometimes a string is syntactically ambiguous and the colon can point to one analysis (I intuitively parse a "court martial" as an NP, but if it's "court: martial" it becomes something else)
Using a special subtype is indeed an option, but it would be a rather rare subtype, so I'm not enthusiastic about doing it for English (and the string expl may make people think of the expletive label, which is separate).
Yes, it points to something, and it can be a clue to some of the things that are very difficult to represent in writing (e.g. prosody), but I meant that it cannot be a decisive factor, nor be applied mechanically!
From our data it is not rare at all :-)
There are a lot of possible configurations in which a colon separates two parts of an orthographic sentence. Currently there is a wide range of analyses for these, and I'm not sure how many distinct kinds should be recognized or how they should be annotated. Here is a brief overview of a few recurring types and how they are currently annotated in UD_English-GUM (I have simplified some of the examples and reduced the head to 'root' even when embedded in something else, so please understand root to mean 'local root'):
appos; should maybe be parataxis?)
/xcomp : Study/appos in Methodology"
/appos and Passion for Long-term Goals"
appos?), or just a single appositive set apart by colon
/appos and Kwa
/appos
/root : L'instant/appos, Caraibes and Latin Club
/advmod; Throws: Right/advmod (of a baseball player who bats with the left hand but throws with the right)
/nsubj : 1/root 201/flat 944-3737/flat
/nsubj : 0590854959/root
/nsubj : turismo@merida.gob.mx/root
/xcomp (of the name of the Polish city Łódź)
/dep : Αθήνα/root, Athína
/dep : IFN/root (specifying an airport's code in the IATA system)
/nmod : Eusebius/root Sophronius Hieronymus (nmod is probably wrong here, other cases all seem to be dep)
/root : Danny/dep Newman
/root : https://xyz.com/dep
/root : Miharris/dep
/root : 1/dep; Scientology/parataxis : 0/dep
Are there any thoughts on which of these might be distinct/the same, and what the correct analysis should be? They are not 100% consistent in GUM, but most examples of each of these types do follow the pattern above (probably because annotators searched for existing cases and followed majority behaviors). I feel especially torn about ones like 3. above, where the content relation clearly saturates some valency relation (incl. manner adjuncts like 'throws right'), but the marking is unusual (esp. for subject-predicates).
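One way to check how consistent a treebank is on this point is to tabulate which relation crosses each colon and in which direction. The following is a minimal sketch under stated assumptions: it uses a small inline sample rather than the real GUM files, the two sample trees (including the colon attachments and the "_" placeholders) are hypothetical illustrations of the nsubj and appos patterns discussed in this thread, and the function names are made up for this example.

```python
from collections import Counter

# Hypothetical two-sentence sample in CoNLL-U format (tab-separated columns).
SAMPLE = """\
# text = e-mail : turismo@merida.gob.mx
1\te-mail\t_\t_\t_\t_\t3\tnsubj\t_\t_
2\t:\t_\tPUNCT\t_\t_\t1\tpunct\t_\t_
3\tturismo@merida.gob.mx\t_\t_\t_\t_\t0\troot\t_\t_

# text = Sunday : closed
1\tSunday\t_\t_\t_\t_\t0\troot\t_\t_
2\t:\t_\tPUNCT\t_\t_\t3\tpunct\t_\t_
3\tclosed\t_\t_\t_\t_\t1\tappos\t_\t_
"""


def read_sentences(text):
    """Yield sentences as lists of {id, form, head, deprel} dicts."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            continue  # sentence-level comment
        else:
            cols = line.split("\t")
            if not cols[0].isdigit():
                continue  # skip multiword ranges (1-2) and empty nodes (1.1)
            sentence.append({"id": int(cols[0]), "form": cols[1],
                             "head": int(cols[6]), "deprel": cols[7]})
    if sentence:
        yield sentence


def arcs_across_colons(sentence):
    """Return (deprel, direction) for each non-punct arc that spans a colon."""
    found = []
    for colon in (t["id"] for t in sentence if t["form"] == ":"):
        for tok in sentence:
            if tok["head"] == 0 or tok["deprel"] == "punct":
                continue
            low, high = sorted((tok["id"], tok["head"]))
            if low < colon < high:
                side = "head-right" if tok["head"] > colon else "head-left"
                found.append((tok["deprel"], side))
    return found


counts = Counter()
for sent in read_sentences(SAMPLE):
    counts.update(arcs_across_colons(sent))

for (deprel, side), n in counts.most_common():
    print(deprel, side, n)
```

Running the same functions over a full treebank file (read into a string) would give a rough frequency table of the competing analyses; arcs nested inside a longer arc are counted too, so the numbers are an upper bound rather than one count per colon.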