The parent of 'orphan' should normally be 'conj' but it is 'reparandum' #635
Comments
I admit that when adding the validation rule, I assumed that it would spark discussion and possibly it would have to be loosened. Though the example you give is quite beyond my imagination :-) I have moved the issue to the docs repository because it is about precise interpretation of the guidelines (as most validation issues). The validator just tries to make sure that people can make assumptions about the data, if the assumptions follow from the documentation. The immediate impulse to introduce this rule was when I saw an In my understanding, the |
I whitelisted |
Thanks! The idea of promoting the auxiliary makes a lot of sense in a language like English, where the auxiliary is itself a verb. In the Coptic example, the element in question is really just a functional auxiliary, with no chance of being used as a verb, so it seems a little stranger to promote it, rather than a core argument. In terms of parallels elsewhere in the corpus, orphan is often a dependent of the subject, which gets first choice as the argument to promote - for that reason I would be inclined to keep the subject as the head and say that it governs all other dependents of the missing verb - in this case also the tense marker. |
The point is not that the auxiliary can be used as a main verb but that it belongs to the same nucleus (in Tesnière's sense) as the main verb. As long as there is something left of the verb group, we prefer to let this represent the verb group so that other dependents can retain their true dependency relations (rather than "orphan"). It is for the same reason that we, for example, promote determiners to head elliptic noun phrases even though they can never head an ordinary noun phrase. |
Both subject and auxiliary are dependents of the (missing) main verb - in my opinion the question is only which would we rather promote. Either way some information will be lost:
One problem with 1. for Coptic is that auxiliaries aren't present in all tenses, so we would end up with situations in which we have ellipsis and a. we do have orphan, but aux is the head; b. we do have orphan, and nsubj is the head (since there is no aux) and c. where there is no orphan at all. Promoting the subject uniformly seems like a better choice for the data we have. I agree that in languages where the auxiliary is a finite verb (like English) it is more intuitive to promote the auxiliary, but in this case it seems like more information would be discarded, and a very odd government pattern would result ( |
@amir-zeldes : Just a note – |
To come back to the main topic of this issue, non-constituent conjuncts are quite common with reformulations. Examples:
If we use |
@sylvainkahane I think that the three examples you gave would be solved without det(good-4, a-3); det(question, a-5); advmod(good-7, very); amod(question, good-7); reparandum(good-7, good-4) case(the, about-4); case(question, about-6); det(question, my); reparandum(my, the) mark(you-4, that-3); mark(go, that-5); nsubj(go, we); reparandum(we, you) |
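The analyses above are written in an informal `deprel(head, dependent)` notation. As a minimal sketch for working with it, the snippet below parses such a string into edges and picks out the `reparandum` relations. The names `Edge` and `parse_edges` are hypothetical helpers introduced only for illustration; they are not part of any UD tool.

```python
import re
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    deprel: str
    head: str
    head_id: Optional[int]
    dep: str
    dep_id: Optional[int]


# One relation in the informal notation: deprel(head[-id], dependent[-id])
_EDGE = re.compile(r"([\w:]+)\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")


def _split(token):
    """Split 'good-7' into ('good', 7); a bare word gets id None."""
    m = re.fullmatch(r"(.+)-(\d+)", token)
    return (m.group(1), int(m.group(2))) if m else (token, None)


def parse_edges(text):
    """Parse every deprel(head, dependent) item found in text."""
    edges = []
    for deprel, head, dep in _EDGE.findall(text):
        h, hid = _split(head)
        d, did = _split(dep)
        edges.append(Edge(deprel, h, hid, d, did))
    return edges


# First analysis from the comment above, copied as written.
analysis = (
    "det(good-4, a-3); det(question, a-5); advmod(good-7, very); "
    "amod(question, good-7); reparandum(good-7, good-4)"
)
edges = parse_edges(analysis)
repairs = [e for e in edges if e.deprel == "reparandum"]
for e in repairs:
    print(f"{e.dep}-{e.dep_id} is repaired by {e.head}-{e.head_id}")
```

In this analysis the abandoned word (good-4) is the dependent and the repair (good-7) is the head, so the other dependents can attach to the repaired material with their ordinary relations.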
If E.g., in Latvian saying viņš ēd tos ābolus, ko pirms tam [ēda] tārpi ('he eats the same apples, which were [eaten] by worms before that') is rather plausible. |
Sounds good to me. Added |
And what about other subordinate clauses - |
Well, perhaps all deprels that can mark incoming edges to heads of clauses make the heads technically eligible for outgoing Note that I don't doubt that this actually is ellipsis; but most types of ellipsis are annotated in UD without using the |
Yes, this is an interesting data point that we haven't considered so far. I always consider It seems like this case is a little different since the predicate does not necessarily appear anywhere in the preceding discourse (if I understood correctly) but it still fulfills the criterion of a missing predicate with multiple dependents. So in short, yes, I think using |
In Classical Chinese, very few |
This does not look like a case for One possibility would be to simply attach the conjunction to the remaining verb, i.e., to the left: 學― If you know there is a verb missing, the standard way is to pick one of the nodes that would depend on it, and promote it as the substitute head of the clause. The clause is still connected to its parent node with the relation that holds between the two clauses, i.e., |
Thank you for your comment, @dan-zeman , and I've tried |
The constraint on the parent of orphan leaves me with a bit of a problem for cases like this: opgesplitst in een Vlaamse en een Franstalige partij obl(opgesplitst,Vlaamse) Should we allow configurations like this? Any suggestions? |
I only noticed now that @jnivre wrote: we promote determiners to head elliptic noun phrases even though they can never head an ordinary noun phrase. So is that the solution here? |
Reposting this as I assume no one saw it, since I managed to post and then reopen the thread:
|
I agree based on the Latin example that parent of
I think it should be qui collegit multum non abundavit "the one who collected much did not have too much" In that case, "qui" is just being promoted to cover for the missing relative clause subject (at least I assume that's how it would be annotated, but if that is not the case in the Latin guidelines and "qui" is seen as a matrix argument, then the elliptical version is also not a clause). |
A version with a non-elliptic relative clause would be analysed as follows in our current version of the conversion script: root(abundavit) And surely the elliptic clause must go the same way. I'm not sure what the other Latin treebanks do (do you know, @daghaug?), but wanted to check if there is a general UD policy for headless relative clauses. In any case, I think any clausal head type must allow orphan dependents, since ellipsis is in principle always possible, so if nsubj is allowed for headless relative clauses, nsubj must allow orphan dependents, if it must be csubj then csubj must allow orphan dependents etc. |
To make sure I understand, an attempt at an English analogy:
Is this right? It does seem like a valid use case, since we wouldn't normally promote a subject or object as head of a clause when the predicate is missing. Though it is a pity we can't see that there's a free relative construction in the |
This seems a bit strange to me, since collegit is a verb, so I would have expected csubj
The English translation seems basically equivalent, except that in Latin we have a plain relative pronoun "qui", which is basically like "who" rather than "whoever". So this is more like Shakespeare's "who steals my purse steals trash", with "steals" elided in the first conjunct of a coordination. |
Yes, I can see why. But if we do that, the next question is what to do with object relative clauses, which also occur aplenty. Should they be ccomp? Our converter now has them as obj. (They can of course have ellipsis too.) reddite ergo quae Caesaris sunt Caesari |
In the PROIEL annotation, free relative clauses are treated as
syntactically nominal because they distribute exactly like NPs and not
like clauses.
In subject position, the csubj/nsubj distinction is maybe not so
important, but free relative clauses occur in other nominal positions
as well.
When they occur in object position, we would presumably have to label
them ccomp if we take them as clausal. Perhaps not a disaster, but it
would definitely give the impression that some verbs can take
complement clauses when in fact they only take NPs (and free relative
clauses).
Probably the most disturbing case would be the one where the free
relative clause is the complement of a preposition as in
videbunt in quem transfixerunt
'they will look at the one they pierced' (literally 'they will look at
whom they pierced')
This is
obl(videbunt, transfixerunt)
obj(transfixerunt, quem)
case(transfixerunt, in)
If we treat free relative clauses as clausal, I guess it would have to
be advcl? And the preposition would have to be considered mark?
On Wed, 12.04.2023 at 08:24 -0700, Amir Zeldes wrote:
> > nsubj(abundavit, collegit)
> This seems a bit strange to me, since collegit is a verb, so I would
> have expected csubj
> > Is this right?
> The English translation seems basically equivalent, except that in
> Latin we have a plain relative pronoun "qui", which is basically like
> "who" rather than "whoever". So this is more like Shakespeare's "who
> steals my purse steals trash", with "steals" elided in the first
> conjunct of a coordination.
|
Oh, I see the problem. The free relatives are treated as clauses lacking a nominal head, which is different from how we treat them in English: https://universaldependencies.org/en/dep/acl-relcl.html#free-relatives Is it an option to treat the WH-word serving as subject as the head of the clause, and indicate the subject relation in the Enhanced Dependencies? So nsubj(abundavit, qui) |
I think either analysis is possible, and I understand the pros and cons. If this is the normal and only way to do free relatives in Latin, then my gut feeling is that what Nathan is suggesting makes the most sense. We had some similar thoughts in Coptic, but that language is more like English in that most free relatives have an explicit nominalization (something like "the one who"), and the examples with a plain relativizer (something like "who", except it's an indeclinable relativizer) are more rare, so we made those take clausal deprels. But canonically, yes, I would expect free relatives to take nominal deprels, among other things for the reasons Dag outlined above. |
That's right, they are treated differently. The reason is that the case
of the relative pronoun is governed by its function inside the
relative clause. So if it's a downstairs object it would be
accusative, as in "quem vidi, venit" (literally 'whom I saw arrived'),
and it would be strange to take this accusative pronoun as the subject
of "venit" (arrive) rather than the object of "vidi" (saw).
That said, we will preserve the original annotation in our source data,
and we could give it up in the UD conversion for the sake of consistency
if there were clear rules for how to deal with free relatives, but the
web page does not exactly suggest that. Basically we are following the
annotation suggested for Czech in (in the case where the demonstrative
is elided).
On Wed, 12.04.2023 at 12:04 -0700, Nathan Schneider wrote:
> Oh, I see the problem. The free relatives are treated as clauses
> lacking a nominal head, which is different from how we treat them in
> English:
> https://universaldependencies.org/en/dep/acl-relcl.html#free-relatives
> Is it an option to treat the WH-word serving as subject as the head
> of the clause, and indicate the subject relation in the Enhanced
> Dependencies? So
> nsubj(abundavit, qui)
> acl:relcl(qui, collegit)
> E:nsubj(collegit, qui) - enhanced dependency
|
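To make the proposal quoted above concrete (wh-word as head, with the clause-internal subject relation recorded only in the enhanced graph), here is a rough sketch that serialises it as CoNLL-U rows. Only `nsubj(abundavit, qui)`, `acl:relcl(qui, collegit)` and the enhanced `nsubj(collegit, qui)` come from the thread. The attachments of multum and non, the unfilled lemma/tag columns, and the helper names `to_conllu` and `enhanced_only` are assumptions made for illustration.

```python
# Each token: (id, form, basic head, basic deprel, enhanced (head, deprel) pairs)
tokens = [
    (1, "qui", 5, "nsubj", [(5, "nsubj"), (2, "nsubj")]),  # extra enhanced edge
    (2, "collegit", 1, "acl:relcl", [(1, "acl:relcl")]),
    (3, "multum", 2, "obj", [(2, "obj")]),        # assumed attachment
    (4, "non", 5, "advmod", [(5, "advmod")]),     # assumed attachment
    (5, "abundavit", 0, "root", [(0, "root")]),
]


def to_conllu(tokens):
    """Serialise tokens as 10-column CoNLL-U rows (unknown columns left as '_')."""
    lines = []
    for tid, form, head, deprel, deps in tokens:
        # DEPS entries are written as head:deprel, sorted by head index
        deps_col = "|".join(f"{h}:{r}" for h, r in sorted(deps))
        # ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        cols = [str(tid), form, "_", "_", "_", "_", str(head), deprel, deps_col, "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines)


def enhanced_only(tokens):
    """List (deprel, head id, dependent id) for edges present only in the enhanced graph."""
    return [
        (r, h, tid)
        for tid, _form, head, deprel, deps in tokens
        for h, r in deps
        if (h, r) != (head, deprel)
    ]


conllu = to_conllu(tokens)
print(conllu)
print(enhanced_only(tokens))
```

The basic tree stays nominal from the outside (qui is the `nsubj` of abundavit), while the enhanced graph still records that qui is the subject of collegit.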
Ah, yeah there's less of a case argument to be made for English since the who/whom distinction is disappearing, though technically "whoever saw me" vs. "whomever I saw" has the same issue I guess—case is assigned by the relative clause. |
I looked at the validation script now, and the permitted head relations for orphans are currently conj, parataxis, root, csubj, ccomp, advcl, acl and reparandum. If we are to continue treating relative clauses like nominals, which I would prefer for the reasons @daghaug lists, a much wider range of relations would have to be permitted (or at least be exceptions for this type of language). Apart from this we also get real examples of ellipsis at least with xcomp and dislocated. xcomp: We have occasional examples of the type "He wanted (to go) to Jerusalem on foot" where it's clearly not the modal verb that takes the PP argument and adjunct. Old East Slavonic example: xočem na smerdy i pogubiti ě ‘we want (to go) after the peasants and kill them' (where an elliptic xcomp is coordinated with a non-elliptic one) |
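The list of permitted parent relations mentioned above can be expressed as a small check. What follows is a minimal sketch and not the validator's actual code: `check_orphan_parents`, `ALLOWED_PARENT_DEPRELS`, and `row` are hypothetical names, and the toy annotation of the English paraphrase "He wanted (to go) to Jerusalem on foot" (promoting Jerusalem to `xcomp`) is an assumption made only to show the `xcomp` case being flagged.

```python
ALLOWED_PARENT_DEPRELS = frozenset(
    ["conj", "parataxis", "root", "csubj", "ccomp", "advcl", "acl", "reparandum"]
)


def universal(deprel):
    """Strip a language-specific subtype: 'acl:relcl' -> 'acl'."""
    return deprel.split(":")[0]


def check_orphan_parents(conllu, allowed=ALLOWED_PARENT_DEPRELS):
    """Return (token id, parent deprel) for each orphan whose parent's
    incoming relation is not in `allowed`."""
    deprel_of = {}
    orphans = []
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword ranges and empty nodes
            continue
        tid, head = int(cols[0]), int(cols[6])
        deprel_of[tid] = universal(cols[7])
        if deprel_of[tid] == "orphan":
            orphans.append((tid, head))
    return [
        (tid, deprel_of[head])
        for tid, head in orphans
        if head in deprel_of and deprel_of[head] not in allowed
    ]


def row(tid, form, head, deprel):
    """Build one CoNLL-U row with only the columns this sketch needs."""
    return "\t".join([str(tid), form, "_", "_", "_", "_", str(head), deprel, "_", "_"])


# Assumed toy annotation: the elided verb's dependents hang off 'Jerusalem'.
sentence = "\n".join([
    "# text = He wanted to Jerusalem on foot",
    row(1, "He", 2, "nsubj"),
    row(2, "wanted", 0, "root"),
    row(3, "to", 4, "case"),
    row(4, "Jerusalem", 2, "xcomp"),
    row(5, "on", 6, "case"),
    row(6, "foot", 4, "orphan"),
])

flagged = check_orphan_parents(sentence)
relaxed = check_orphan_parents(sentence, ALLOWED_PARENT_DEPRELS | {"xcomp"})
print(flagged, relaxed)
```

Passing a wider `allowed` set shows what extending the list would look like for treebanks that need `xcomp`, `dislocated`, or nominal relations as parents of `orphan`.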
Can I bring this to the attention of @dan-zeman because we need to know how to deal with this in the conversion? The issue is that the validation rules for orphan enforce particular analyses on other constructions. So for the free relative clauses, we can make them nominal, but only if we take the wh-word as the head, or we have to leave the wh-word where it belongs for case reasons, but only if we make them clausal. If there was a UD standard, we'd be happy to go either way, but as long as there isn't, we would really prefer to keep our analysis as is. (I could also give arguments for it, but that's really for somewhere else - I think these are nominalizing constructions, in much the same way as morphology can be nominalizing.) So if the validation rules are not going to change, I think the best solution for us might be to take these sentences out of the converted data set until the status of free relative clauses (and the modal constructions and the correlatives, as mentioned by Hanne) is clarified. But it would be good to know soon what we should do... |
There is currently no UD-wide consensus on free relatives as far as I know, and perhaps they should stay language-specific. As you have noticed, the perspective we take in Czech is different from what people do with the English data.
If we assume that there are two nodes elided in each clause, 1. "the.one", and 2. "gathered", and if we also assume that this does not qualify as (similar enough to) gapping, then qui will be first promoted to the head of the relative clause (thus acquiring the Now getting back to the first option where we did I don't know which of the options outlined above is the best one. But the double ellipsis and double promotion in this example suggests that almost anything can be the head of an |
Thank you, Dan! In the original PROIEL annotation this sentence does of course have empty nodes with argument dependents, that is the point of departure for our conversion. I think it might be nice to reclassify the test as a warning, we certainly found a lot of issues with our ellipsis handling because of those error messages. |
I am sorry to come late here, but I missed that this topic also touches upon some issues in Latin annotation that we addressed in the past months (regarding the IT-TB, LLCT and UDante treebanks). @hanneme @daghaug , I invite you to take a look at the documentation pages that I wrote for free relative clauses in Latin. I think that they were not there yet in April, but they appeared soon after (we had some internal discussion in our group). Basically, following general UD criteria, we are using clausal relation ( So, taking your sample sentence
the annotation will be as follows:
qui is internally promoted as the head in that it is the subject. I think the validator does not complain here, does it? Or does it just issue a warning? I understand the point that these clauses are acting nominally and very much agree that they should be able to take nominal relations, and think that this should be the future direction for UD's guidelines, but for the moment this is a sensible compromise. In your conversion, probably it is easy to convert an |
In Slovenian we seem to have found a case where an orphaned element also exhibits ellipsis of the clausal head. This leads to an orphaned element attaching itself to another orphaned element and triggers the validation warning "The parent of 'orphan' should normally be 'conj' but it is 'orphan'". The example in Slovenian is given below (with an added English equivalent; the verbs in [square brackets] are added in English to mark the words that are not present in the original Slovenian sentence):
Both the main clause of the second conjunct as well as its clausal dependent (the if clause, which would normally be advcl) lack a verb. Thus, we analyze this as orphan(jo-17, jo-12) and orphan(jo-12, dirko) (in English this would correspond to orphan(it-26, it-17) and orphan(it-17, race) with the obj being promoted to the role of clausal head in the former case). Here is a representation of the analyzed structure in Slovenian: There is no other option than to mark the direct object as the promoted clausal head and use the orphan relation, so we believe the validation script should not produce a warning in this case. |
This is an interesting example, maybe we should show it somewhere in the guidelines. I agree with your analysis. That is why what the validator produces is a warning and not an error. (Warnings do not make your treebank invalid.) It probably still makes sense to issue the warnings because cases like this are rare. And the validator can hardly know that this sentence is different from other cases where people attempt to chain two |
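The difference between a warning and an error for chained orphans can be sketched as follows. This is an illustration under stated assumptions, not the validator's implementation: `Message`, `check_orphan_chain`, and `is_valid` are hypothetical names, the node labels follow the English rendering given above, and the incoming relation of it-26 (`conj`) together with its head label are assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    level: str  # "warning" or "error"
    text: str


def check_orphan_chain(edges):
    """edges maps dependent -> (head, deprel). Warn when an orphan's
    parent is itself attached as orphan."""
    messages = []
    for dep, (head, deprel) in edges.items():
        if deprel != "orphan":
            continue
        parent = edges.get(head)
        if parent is not None and parent[1] == "orphan":
            messages.append(Message(
                "warning",
                f"{dep}: The parent of 'orphan' should normally be 'conj' "
                f"but it is 'orphan'",
            ))
    return messages


def is_valid(messages):
    """Only errors invalidate a treebank; warnings do not."""
    return not any(m.level == "error" for m in messages)


edges = {
    "it-26": ("first-conjunct-head", "conj"),  # assumed relation and head label
    "it-17": ("it-26", "orphan"),              # elliptic if-clause, obj promoted
    "race": ("it-17", "orphan"),               # orphan under a promoted orphan
}

messages = check_orphan_chain(edges)
for m in messages:
    print(m.level, m.text)
print("valid:", is_valid(messages))
```

Only the inner edge, where race attaches to it-17, triggers the message; the data still counts as valid because nothing is reported at error level.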
In this specific case, since this is an elliptic adverbial clause inside a main elliptic clause, why can |
Because the guidelines say that |
OK, I see this now. Thanks for pointing me to this, I might have to revise some things... but at the same time, I find there is something problematic about it, though it goes beyond this topic. |
Hi - a recent update to the validator creates the error message in the title, however in the Coptic corpus we have an exception that looks correct to me: a reparandum consisting of two dependents whose head is missing. I'll give the example in English for simplicity:
The alternative of saying that both 'they' and 'have' are reparandum is unappealing, because there is a whole interrupted phrase ("they have") which results in a single repair. The option of treating 'they' as the subject of 'have' is not available in Coptic, since the equivalent of have is a past auxiliary which never takes the subject directly (it is aux). Basically there is a missing verb that would have dominated "they have", so in its absence we've promoted the subject, and treated the auxiliary as an orphan. Any suggestion is appreciated, but if there isn't a good reason to reject orphan I would suggest allowing reparandum as a parent of orphan.