TTS: '8' disambiguated as an ordinal number #14

Open
snomos opened this issue Jan 24, 2022 · 17 comments

@snomos
Member

snomos commented Jan 24, 2022

echo Sáme jahke, man birra dánna giehtoduvvá, le gåjt edesik juoga mij manná \
birra ådåsis ja ådåsis tjadá 8 jábe jagev birra. \
| tools/tts/modes/trace-smj-txt2ipa.mode

Result:

"<8>"
        "gáktsa" Num Sg Nom  "kɑːktsa"phon
                "8" A Arab Ord Attr CLBfinal <W:0.0> SELECT:1385:Arab SELECT:2092 SUBSTITUTE:4019 MAP:1326:>nAttr @>N #20->21 SETPARENT:866:SetModToN SUBSTITUTE:1428:smjRemove "8"phon
;       "8" Num Arab Sg Ela Attr <W:0.0> SELECT:1385:Arab SELECT:2092
;       "8" Num Arab Sg Gen <W:0.0> SELECT:1385:Arab SELECT:2092
;       "8" Num Arab Sg Ill Attr <W:0.0> SELECT:1385:Arab SELECT:2092
;       "8" Num Arab Sg Ine Attr <W:0.0> SELECT:1385:Arab SELECT:2092
;       "8" Num Arab Sg Nom <W:0.0> SELECT:1385:Arab SELECT:2092
;       "8" Num Sem/ID <W:0.0> SELECT:1385:Arab

In other words, the only reading we do not want is the one that is left. Actually, I do not understand why we get the Ord reading at all: there is no period after 8.
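
For anyone reading the trace above: lines starting with `;` are readings the disambiguator removed, and the indented lines without `;` are the ones that survived. The following is a minimal sketch of how the two groups could be separated when inspecting such output. The function name `split_readings` and the abbreviated `SAMPLE` string are illustrative assumptions and not part of the giellalt tools.

```python
def split_readings(trace: str):
    """Split CG-3 trace lines for one cohort into (kept, removed) readings."""
    kept, removed = [], []
    for line in trace.splitlines():
        if line.startswith('"<'):
            # Cohort header (the wordform), e.g. "<8>": not a reading.
            continue
        if line.startswith(";"):
            # A leading ';' marks a reading removed during disambiguation.
            removed.append(line[1:].strip())
        elif line.strip().startswith('"'):
            # Indented readings and sub-readings that survived.
            kept.append(line.strip())
    return kept, removed


# Abbreviated from the trace above (rule tags shortened for illustration).
SAMPLE = '''"<8>"
        "gáktsa" Num Sg Nom  "kɑːktsa"phon
                "8" A Arab Ord Attr CLBfinal <W:0.0> SELECT:1385:Arab
;       "8" Num Arab Sg Gen <W:0.0> SELECT:1385:Arab SELECT:2092
;       "8" Num Arab Sg Nom <W:0.0> SELECT:1385:Arab SELECT:2092
'''

kept, removed = split_readings(SAMPLE)
for reading in kept:
    print("kept:   ", reading)
for reading in removed:
    print("removed:", reading)
```

On this sample the only surviving `"8"` reading is the `A Arab Ord` one, which is the problem reported here.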

@snomos
Member Author

snomos commented Jan 24, 2022

This is the pure morphological analysis:

"<8>"
        "8" A Arab Ord Attr CLBfinal <W:0.0>
        "8" Num Arab Sg Ela Attr <W:0.0>
        "8" Num Arab Sg Gen <W:0.0>
        "8" Num Arab Sg Ill Attr <W:0.0>
        "8" Num Arab Sg Ine Attr <W:0.0>
        "8" Num Arab Sg Nom <W:0.0>
        "8" Num Sem/ID <W:0.0>

@snomos
Member Author

snomos commented Jan 24, 2022

It would be good if you looked at the disambiguation. I will see whether I can find out why the ordinal analysis shows up at all. It should not.

@snomos
Member Author

snomos commented Jan 24, 2022

> I will see whether I can find out why the ordinal analysis shows up at all. It should not.

I have now found the place:

https://github.com/giellalt/shared-smi/blob/882510395e905af4ec96d1c5529bf4eff27aa0c9/src/fst/stems/arabic_roman_digits.lexc#L441-L447

I do not think it is possible to remove this analysis in lexc, so it has to be removed with CG rules instead. I have made an attempt in 610a9c9.

I do not know whether it turned out right. @lynnda-hill, can you take a look?

@snomos
Member Author

snomos commented Feb 8, 2022

@lynnda-hill ?

@snomos
Member Author

snomos commented Sep 13, 2023

@lynnda-hill it looks as if things work in this case, but at the risk of removing ordinal readings that should have been kept. @ilm024, can you look at this too? In any case I am closing the issue; if you find errors, please file new reports 🙂

snomos closed this as completed on Sep 13, 2023
@ilm024
Contributor

ilm024 commented Sep 15, 2023

It should choose "8" Num Arab Sg Gen, because there is a preposition there.

@snomos
Member Author

snomos commented Sep 15, 2023

OK, that is a job for the disambiguation, so others will have to look at it. @ilm024 or @lynnda-hill? I am reopening.

snomos reopened this on Sep 15, 2023
snomos changed the title from "'8' disambiguated as an ordinal number" to "TTS: '8' disambiguated as an ordinal number" on Oct 23, 2023
snomos added the bug (Something isn't working) label on Oct 23, 2023
@lynnda-hill
Contributor

lynnda-hill commented Oct 27, 2023

I need a better command for testing; with the one you used, @snomos, I get something almost unreadable:


"<8>"
Using Phon: gáktsa
looking up text2ipa: gáktsa
        "gáktsa" Num Sg Nom  "kɑːktsa"phon DIVVUN-PHON:TEXT2IPA
Using surf: 8
looking up text2ipa: 8
                "8" Num Arab Sg Gen "8>"MIDTAPE <W:0.0> SELECT:1387:Arab MAP:1245:>P @>P #20->14 SETPARENT:862:ComplToPo "8"phon DIVVUN-PHON:TEXT2IPA
Using Phon: gávtsát
looking up text2ipa: gávtsát
        "gávtsát" A Ord Sg Nom  "kɑːftsɑːht"phon DIVVUN-PHON:TEXT2IPA
Using surf: 8
looking up text2ipa: 8
                "8" Num Arab Sg Gen "8>"MIDTAPE <W:0.0> SELECT:1387:Arab MAP:1245:>P @>P #20->14 SETPARENT:862:ComplToPo "8"phon DIVVUN-PHON:TEXT2IPA
Using Phon: gáktsa
looking up text2ipa: gáktsa
        "gáktsa" Num Sg Nom  "kɑːktsa"phon DIVVUN-PHON:TEXT2IPA
Using surf: 8
looking up text2ipa: 8
                "8" Num Arab Sg Nom "8>"MIDTAPE <W:0.0> SELECT:1387:Arab MAP:1988:Sg<subj @<SUBJ #20->14 SETPARENT:1055:SetSubjToLeftVfin "8"phon DIVVUN-PHON:TEXT2IPA
Using Phon: gávtsát
looking up text2ipa: gávtsát
        "gávtsát" A Ord Sg Nom  "kɑːftsɑːht"phon DIVVUN-PHON:TEXT2IPA
Using surf: 8
looking up text2ipa: 8
                "8" Num Arab Sg Nom "8>"MIDTAPE <W:0.0> SELECT:1387:Arab MAP:1988:Sg<subj @<SUBJ #20->14 SETPARENT:1055:SetSubjToLeftVfin "8"phon DIVVUN-PHON:TEXT2IPA
Skipping traced removed CG line:
;       "8" Num Arab Err/Orth Ess "8>"MIDTAPE <W:0.0> SELECT:1387:Arab REMOVE:4021:errsub
Skipping traced removed CG line:
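
Much of the noise in the output above comes from the debug lines that are interleaved with the readings (`Using Phon:`, `Using surf:`, `looking up text2ipa:`, `Skipping traced removed CG line:`). As a rough sketch, and not an existing giellalt tool, a small filter could drop them before the output is read. The function name `drop_debug_lines` is an assumption, and the prefix list is taken only from the lines visible in this comment.

```python
# Debug prefixes as they appear in the output quoted above; the real tool
# may print others, so treat this list as an assumption.
DEBUG_PREFIXES = (
    "Using Phon:",
    "Using surf:",
    "looking up text2ipa:",
    "Skipping traced removed CG line:",
)


def drop_debug_lines(output: str) -> str:
    """Return the output with lines that start with a known debug prefix removed."""
    return "\n".join(
        line for line in output.splitlines()
        if not line.startswith(DEBUG_PREFIXES)
    )


sample = "\n".join([
    '"<8>"',
    "Using Phon: gáktsa",
    "looking up text2ipa: gáktsa",
    '        "gáktsa" Num Sg Nom  "kɑːktsa"phon DIVVUN-PHON:TEXT2IPA',
    "Using surf: 8",
    "looking up text2ipa: 8",
])

filtered = drop_debug_lines(sample)
print(filtered)
```

Only the cohort header and the reading lines remain after filtering; lines the terminal has wrapped are not rejoined by this sketch.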

@snomos
Member Author

snomos commented Oct 27, 2023

What about this command?

echo Sáme jahke, man birra dánna giehtoduvvá, le gåjt edesik juoga mij manná \
birra ådåsis ja ådåsis tjadá 8 jábe jagev birra. \
| tools/tts/modes/trace-smj-normaliser.mode

Then I get this output:

"<8>"
	"gáktsa" Num Sg Nom "gáktsa"phon
		"8" Num Arab Sg Gen "8>"MIDTAPE <W:0.0> SELECT:1387:Arab MAP:1357:>nNum @>N #20->21 SETPARENT:866:SetModToN SETPARENT:1145:Not>NCoord
	"gávtsát" A Ord Sg Nom "gávtsát"phon
		"8" Num Arab Sg Gen "8>"MIDTAPE <W:0.0> SELECT:1387:Arab MAP:1357:>nNum @>N #20->21 SETPARENT:866:SetModToN SETPARENT:1145:Not>NCoord
	"gáktsa" Num Sg Nom "gáktsa"phon
		"8" Num Arab Sg Nom "8>"MIDTAPE <W:0.0> SELECT:1387:Arab MAP:2007:Sg<subj @<SUBJ #20->21 SETPARENT:1055:SetSubjToLeftVfin
	"gávtsát" A Ord Sg Nom "gávtsát"phon
		"8" Num Arab Sg Nom "8>"MIDTAPE <W:0.0> SELECT:1387:Arab MAP:2007:Sg<subj @<SUBJ #20->21 SETPARENT:1055:SetSubjToLeftVfin
;	"8" Num Arab Err/Orth Ess "8>"MIDTAPE <W:0.0> SELECT:1387:Arab REMOVE:4021:errsub
;	"8" Num Arab Err/Orth Sg Acc "8>"MIDTAPE <W:0.0> SELECT:1387:Arab REMOVE:2711
;	"8" Num Arab Err/Orth Sg Com "8>"MIDTAPE <W:0.0> SELECT:1387:Arab REMOVE:4021:errsub
;	"8" Num Arab Sg Ela Attr "8"MIDTAPE <W:0.0> SELECT:1387:Arab IFF:3194
;	"8" Num Arab Sg Ill Attr "8"MIDTAPE <W:0.0> SELECT:1387:Arab IFF:3194
;	"8" Num Arab Sg Ine Attr "8"MIDTAPE <W:0.0> SELECT:1387:Arab IFF:3194
;	"8" Num Sem/ID "8"MIDTAPE <W:0.0> SELECT:1387:Arab
;	"8" A Arab Ord Attr CLBfinal "8"MIDTAPE <W:0.0> REMOVE:2067:spurious-adj-reading

@lynnda-hill
Contributor

Now we get Nom and Gen. To select only Gen because of the postposition, we need to have a chat, @ilm024. There are 2 nouns between the numeral and the postposition, and the numeral can apply to the whole nominal sequence or only to parts of it. That requires some testing.

"<8>"
        "gáktsa" Num Sg Nom "gáktsa"phon
                "8" Num Arab Sg Gen "8>"MIDTAPE <W:0.0> SELECT:1387:Arab MAP:1245:>P @>P #20->14 SETPARENT:862:ComplToPo
        "gávtsát" A Ord Sg Nom "gávtsát"phon
                "8" Num Arab Sg Gen "8>"MIDTAPE <W:0.0> SELECT:1387:Arab MAP:1245:>P @>P #20->14 SETPARENT:862:ComplToPo
        "gáktsa" Num Sg Nom "gáktsa"phon
                "8" Num Arab Sg Nom "8>"MIDTAPE <W:0.0> SELECT:1387:Arab MAP:1988:Sg<subj @<SUBJ #20->14 SETPARENT:1055:SetSubjToLeftVfin
        "gávtsát" A Ord Sg Nom "gávtsát"phon
                "8" Num Arab Sg Nom "8>"MIDTAPE <W:0.0> SELECT:1387:Arab MAP:1988:Sg<subj @<SUBJ #20->14 SETPARENT:1055:SetSubjToLeftVfin
;       "8" Num Arab Err/Orth Ess "8>"MIDTAPE <W:0.0> SELECT:1387:Arab REMOVE:4021:errsub
;       "8" Num Arab Err/Orth Sg Acc "8>"MIDTAPE <W:0.0> SELECT:1387:Arab REMOVE:2711
;       "8" Num Arab Err/Orth Sg Com "8>"MIDTAPE <W:0.0> SELECT:1387:Arab REMOVE:4021:errsub
;       "8" Num Arab Sg Ela Attr "8"MIDTAPE <W:0.0> SELECT:1387:Arab IFF:3194
;       "8" Num Arab Sg Ill Attr "8"MIDTAPE <W:0.0> SELECT:1387:Arab IFF:3194
;       "8" Num Arab Sg Ine Attr "8"MIDTAPE <W:0.0> SELECT:1387:Arab IFF:3194
;       "8" Num Sem/ID "8"MIDTAPE <W:0.0> SELECT:1387:Arab
;       "8" A Arab Ord Attr CLBfinal "8"MIDTAPE <W:0.0> REMOVE:2067:spurious-adj-reading
: 
"<jábe>"
        "jáhpe" N Sem/Time Sg Gen "jáhpe>Q1"MIDTAPE <W:0.0> SELECT:2523 SUBSTITUTE:4028 MAP:1392:>nTime @>N #21->22 SETPARENT:866:SetModToN SUBSTITUTE:1428:smjRemove
;       "jáhpe" N Sem/Time Pl Nom "jáhpe>Q1"MIDTAPE <W:0.0> SELECT:2523
: 
"<jagev>"
        "jahke" N Sem/Time Sg Acc "jahke>Q1v"MIDTAPE <W:0.0> SUBSTITUTE:4028 MAP:2379:Acc<advl @<ADVL #22->14 SETPARENT:1018:SetAdvlToSubj SETPARENT:1020:SetAdvlToLeftMv SUBSTITUTE:1428:smjRemove
: 
"<birra>"
        "birra" Po "birra>"MIDTAPE <W:0.0> SUBSTITUTE:4033 MAP:2309:V<advl @<ADVL #23->14 SETPARENT:1018:SetAdvlToSubj SETPARENT:1020:SetAdvlToLeftMv SUBSTITUTE:1428:smjRemove
;       "birra" Pr "birra>"MIDTAPE <W:0.0> REMOVE:3123
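
The two runs quoted in this thread differ not only in case but also in attachment: in the normaliser output the `8` readings carry `@>N #20->21`, while the Gen reading here carries `@>P #20->14`. A small sketch for pulling case, syntactic function and dependency arc out of a reading line makes such comparisons easier. The helper name `summarise` and the case list are assumptions made for illustration.

```python
import re

# Case tags seen in the traces in this thread (an assumed, non-exhaustive list).
CASES = {"Nom", "Gen", "Acc", "Ill", "Ine", "Ela", "Com", "Ess"}
# Dependency arc in CG-3 output: "#self->parent".
DEP = re.compile(r"#(\d+)->(\d+)")


def summarise(reading: str):
    """Return (case, syntactic function, (self, parent)) for one reading line."""
    tokens = reading.split()
    case = next((t for t in tokens if t in CASES), None)
    func = next((t for t in tokens if t.startswith("@")), None)
    dep = DEP.search(reading)
    arc = (int(dep.group(1)), int(dep.group(2))) if dep else None
    return case, func, arc


# Readings copied from the two outputs in this thread.
here = '"8" Num Arab Sg Gen "8>"MIDTAPE <W:0.0> SELECT:1387:Arab MAP:1245:>P @>P #20->14 SETPARENT:862:ComplToPo'
normaliser = '"8" Num Arab Sg Gen "8>"MIDTAPE <W:0.0> SELECT:1387:Arab MAP:1357:>nNum @>N #20->21 SETPARENT:866:SetModToN SETPARENT:1145:Not>NCoord'

print(summarise(here))
print(summarise(normaliser))
```

Both readings are Gen, but they attach the numeral to different heads (word 14 versus word 21), which is the kind of difference the contrastive testing would need to look at.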

@snomos
Member Author

snomos commented Oct 27, 2023

> It should choose "8" Num Arab Sg Gen, because there is a preposition there.

@lynnda-hill it should be Gen 🙂

@lynnda-hill
Contributor

It is fine that it should be Gen in this case, but we need to make a contrastive analysis, since it is not clear-cut that every Num-Gen Gen Gen Po sequence requires the first word in the sequence to be Gen (and now I see that jagev is not even Gen but Acc). It may also be that only the last Gen belongs to the Po, and that everything before it is a separate NP.

@snomos
Member Author

snomos commented Oct 30, 2023

According to what @ilm024 writes, it is not a Po but a Pr. Whether that changes the challenges of disambiguating it, I do not know 😊

@ilm024
Contributor

ilm024 commented Nov 10, 2023

Yes, it is "tjadá" that determines that it should be Gen; "jagev" is Acc because of "birra".

@snomos
Member Author

snomos commented Nov 10, 2023

So the relevant parts should be analysed like this?

"<tjadá>"
        "tjadá" Po "tjadá>"MIDTAPE <W:0.0> @<ADVL #19->14
: 
"<8>"
        "gáktsa" Num Sg Gen "gávtse"phon "8"oldlemma
: 
"<jábe>"
        "jáhpe" N Sem/Time Sg Gen "jáhpe>Q1"MIDTAPE <W:0.0> @>N #21->22

and

"<jagev>"
        "jahke" N Sem/Time Sg Acc "jahke>Q1v"MIDTAPE <W:0.0>  @<ADVL #22->14 
: 
"<birra>"
        "birra" Po "birra>"MIDTAPE <W:0.0> @<ADVL #23->14 

Is this correct, @ilm024 and @lynnda-hill?

@ilm024
Contributor

ilm024 commented Nov 13, 2023

Completely correct.

@ilm024
Contributor

ilm024 commented Nov 13, 2023

Or rather, tjadá is a preposition (Pr), not a postposition (Po).
