Ул/Бул #13

Open
mansayk opened this issue Dec 9, 2018 · 3 comments
Labels
disambiguation enhancement New feature or request

Comments

@mansayk
Member

mansayk commented Dec 9, 2018

Is "Ул/бул" parsed correctly here:

echo "Ул ташламас сине." | apertium-destxt -n | lt-proc -z -w 'apertium-tat/tat.automorf.bin' | cg-proc -z 'apertium-tat/tat.rlx.bin' | apertium-retxt

^Ул/бул<v><tv><imp><p2><sg>/ул<prn><pers><p3><sg><nom>/бул<v><iv><imp><p2><sg>/ул<prn><dem><nom>$ ^ташламас/ташла<v><tv><neg><gpr_fut>/ташла<v><tv><neg><fut><p3><sg>$ ^сине/син<prn><pers><p2><sg><acc>$^./.<sent>$
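To make the four competing analyses of "Ул" easier to compare, here is a minimal Python sketch that splits one lexical unit of the output above into its surface form and readings. The helper name `parse_lexical_unit` is made up for illustration, and the sketch assumes the unit contains no escaped `/`, `^` or `$` characters, which a real stream parser would have to handle.

```python
import re


def parse_lexical_unit(unit):
    """Split one ^surface/reading/...$ unit into (surface, [(lemma, tags), ...]).

    Assumption: no escaped '/', '^' or '$' inside the unit.
    """
    body = unit.strip()
    if not (body.startswith("^") and body.endswith("$")):
        raise ValueError("not a lexical unit: " + unit)
    # First field is the surface form, the rest are the readings.
    surface, *readings = body[1:-1].split("/")
    parsed = []
    for reading in readings:
        lemma = reading.split("<", 1)[0]          # text before the first tag
        tags = re.findall(r"<([^>]+)>", reading)  # tag names without brackets
        parsed.append((lemma, tags))
    return surface, parsed


unit = ("^Ул/бул<v><tv><imp><p2><sg>/ул<prn><pers><p3><sg><nom>"
        "/бул<v><iv><imp><p2><sg>/ул<prn><dem><nom>$")
surface, readings = parse_lexical_unit(unit)
for lemma, tags in readings:
    print(lemma, " ".join(tags))
```

Run on the first unit, this lists two verbal readings with lemma бул (imperative, second person singular) next to two pronominal readings with lemma ул.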

@IlnarSelimcan
Member

The бул <v><iv> analysis was added to handle the 19th-century corpus texts I've been working on, in which улмак = булмак shows up quite frequently. I think the way to go here is to mark all such archaic words with a flag and prune them at compile time, unless the user passes a compilation flag that keeps them.
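The flag-and-prune idea could look roughly like the following Python sketch, which filters lexicon lines before compilation. Everything specific here is an assumption for illustration only: the marker `! Use/Archaic`, the function `prune_archaic`, the `keep_archaic` switch, and the three sample entries with their continuation-class names are hypothetical and are not taken from apertium-tat.

```python
# Hypothetical comment marker for archaic entries (not a confirmed convention).
ARCHAIC_MARK = "! Use/Archaic"


def prune_archaic(lines, keep_archaic=False):
    """Return lexicon lines, dropping marked ones unless keep_archaic is set."""
    if keep_archaic:
        return list(lines)
    return [line for line in lines if ARCHAIC_MARK not in line]


# Made-up sample entries in a lexc-like "lemma:stem Class ;" layout.
lexicon = [
    "бул:бул V-IV ;",
    "бул:ул V-IV ; ! Use/Archaic",
    "ул:ул PRN-PERS ;",
]

modern = prune_archaic(lexicon)                        # default build
historical = prune_archaic(lexicon, keep_archaic=True)  # 19th-century build
```

With the default setting the marked ул entry for бул is dropped, while `keep_archaic=True` leaves the lexicon unchanged for work on historical texts.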

@mansayk
Member Author

mansayk commented Dec 9, 2018

I understand. Unfortunately, in my case this analysis is even the one that gets chosen after disambiguation.

@mansayk mansayk added the enhancement New feature or request label Jan 16, 2019
@jonorthwash
Member

@IlnarSelimcan, it would probably be fairly straightforward to write a disambiguation rule to deal with some of these.
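As a rough illustration of what such a rule might do, the Python sketch below mimics a REMOVE-style constraint on one cohort. An actual rule would be written in the constraint grammar file compiled into `tat.rlx.bin`, not in Python. The condition used here (drop verbal бул readings of the surface form "ул" whenever a pronoun reading is available) is an untested assumption, and the function name `remove_archaic_bul` is hypothetical.

```python
def remove_archaic_bul(surface, readings):
    """Drop verbal бул readings of surface 'ул' if a pronoun reading exists.

    readings is a list of (lemma, tags) pairs. The condition is an
    illustrative assumption, not a linguistically validated rule.
    """
    if surface.lower() != "ул":
        return readings
    if not any("prn" in tags for _, tags in readings):
        return readings
    kept = [(lemma, tags) for lemma, tags in readings
            if not (lemma == "бул" and "v" in tags)]
    # Never remove the last remaining reading of a cohort.
    return kept or readings


cohort = [
    ("бул", ["v", "tv", "imp", "p2", "sg"]),
    ("ул", ["prn", "pers", "p3", "sg", "nom"]),
    ("бул", ["v", "iv", "imp", "p2", "sg"]),
    ("ул", ["prn", "dem", "nom"]),
]
result = remove_archaic_bul("Ул", cohort)
```

For the cohort from the original report, only the two pronominal readings remain. A cohort that has only verbal readings is returned unchanged.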

Alternatively, sometimes it can make sense to just treat things like бул-/ул- as synonyms, and deal with them as such in later stages for translation.


3 participants