Allow printing tags without full tracing #90

Open
marcriera opened this issue Feb 2, 2022 · 7 comments

@marcriera

Since commit a9c7675, tags are not printed unless tracing is enabled. However, since CG is now used in many Apertium pairs during generation to handle preferences, tags may be necessary without full tracing.

Tags are especially useful when running a testvoc. If no tags are printed, only the internal lemma with # is shown, which makes debugging difficult. Enabling tracing with -t helps here, but it also adds extra information that the postgenerator does not handle properly.

I suggest adding a new flag to print tags, regardless of tracing, or printing tags by default again (unless it is really preferable not to print tags by default).

Thanks!

@TinoDidriksen
Member

Ping @unhammer

@unhammer
Collaborator

unhammer commented Feb 3, 2022

So you're running a pipeline that uses cg-proc -n -g after lt-proc -g, and the lt-proc steps that used to give you #foo<tag> now give #foo?

@unhammer
Collaborator

unhammer commented Feb 3, 2022

If I understand correctly, the problem is that

$ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin
^NotInGen<np>/@NotInGen<np>$
$ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin | cg-proc -g -n nob-nno.genprefs.rlx.bin
#NotInGen

doesn't give the tags, whereas with -t it does:

$ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin | cg-proc -g -n -t nob-nno.genprefs.rlx.bin
#NotInGen\<np\>

but -t is noisy if a rule actually hit:

$ echo å gafle|apertium -f none -d . nob-nno-dgen |cg-proc -g -n -t nob-nno.genprefs.rlx.bin
å gafla/¬gafle\<v:infa_infe\><REMOVE:26>
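
As a stopgap, the -t output could be post-filtered to drop the removed readings. This is only a minimal sketch: `strip_removed_readings` is a hypothetical helper (not part of cg-proc or Apertium), and it assumes, based solely on the one sample above, that a removed reading is appended as `/¬...` and runs until the next slash or whitespace.

```shell
# Hypothetical helper, not part of cg-proc or Apertium.
# Assumption (from the single sample in this thread): a removed reading is
# appended as "/¬..." and extends to the next slash or whitespace.
strip_removed_readings() {
  sed -E 's#/¬[^[:space:]/]*##g'
}

# Example with the traced output shown above:
printf '%s\n' 'å gafla/¬gafle\<v:infa_infe\><REMOVE:26>' | strip_removed_readings
```

Trace tags on readings that were kept, and removed readings whose lemma contains a space, are not handled by this filter.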


@unhammer
Collaborator

unhammer commented Feb 3, 2022

> or printing tags by default again (unless it is really preferable not to print tags by default)

This is running after the generator, so we do have to get rid of the tags to avoid them ending up in the output shown to the user.

Also, I suppose you only want tags on the stuff we couldn't generate?

@marcriera
Author

> if I understand correctly the problem is
>
> $ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin
> ^NotInGen<np>/@NotInGen<np>$
> $ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin | cg-proc -g -n nob-nno.genprefs.rlx.bin
> #NotInGen
>
> doesn't give the tags, while if we use -t it does give tags
>
> $ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin | cg-proc -g -n -t nob-nno.genprefs.rlx.bin
> #NotInGen\<np\>
>
> but is noisy if a rule actually hit:
>
> $ echo å gafle|apertium -f none -d . nob-nno-dgen |cg-proc -g -n -t nob-nno.genprefs.rlx.bin
> å gafla/¬gafle\<v:infa_infe\><REMOVE:26>

It's exactly this, thanks.

> > or printing tags by default again (unless it is really preferable not to print tags by default)
>
> This is running after the generator, so we do have to get rid of the tags to avoid them ending up in the output shown to the user.
>
> Also, I suppose you only want tags on the stuff we couldn't generate?

Yes, tags should only appear for lexical units that cannot be generated. The generator is running right before this and trying to generate a surface form, so the input to cg-proc -g -n will only contain tags in cohorts if there's a generation error in the previous step of the pipeline.

@unhammer
Collaborator

unhammer commented Feb 3, 2022

> The generator is running right before this and trying to generate a surface form, so the input to cg-proc -g -n will only contain tags in cohorts if there's a generation error in the previous step of the pipeline.

Well, there will also be tags on readings if there are variant tags (in addition to the input tags which are there since we use lt-proc -b on the generator):

$ echo blå | apertium -d . nob-nno-dgen
^blå<adj><sint><pst><un><pl><ind>/blå/blåe<v:blå_blåe>$

(that's the input to cg-proc -g -n)

@marcriera
Author

> > The generator is running right before this and trying to generate a surface form, so the input to cg-proc -g -n will only contain tags in cohorts if there's a generation error in the previous step of the pipeline.
>
> Well, there will also be tags on readings if there are variant tags (in addition to the input tags which are there since we use lt-proc -b on the generator):
>
> $ echo blå | apertium -d . nob-nno-dgen
> ^blå<adj><sint><pst><un><pl><ind>/blå/blåe<v:blå_blåe>$
>
> (that's the input to cg-proc -g -n)

You're right, of course. For some reason I had assumed these were just removed, but they are tags after all.

I suppose we could distinguish between invalid and valid readings by checking whether the reading starts with # or @ in the input. These are added by the generator only if it cannot generate anything. I assume they are also escaped if the generation is valid (there could be a lexical unit beginning with either of these characters).
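
A minimal sketch of that check, written as a shell function rather than as a change to cg-proc itself. `is_ungenerated` is a hypothetical helper name. It assumes one cohort per argument in the `^input/reading/...$` form shown earlier in the thread, that the input part contains no escaped slash, and (as guessed above, not verified) that a legitimate leading # or @ would be backslash-escaped.

```shell
# Hypothetical helper, not part of cg-proc or lt-proc.
# Succeeds (exit status 0) if the first reading of a cohort starts with an
# unescaped '#' or '@', i.e. the generator could not produce a surface form.
# Assumptions: one cohort per argument, "^input/reading/...$" layout, no
# escaped slash inside the input part, literal leading #/@ would be escaped.
is_ungenerated() {
  cohort=${1#\^}          # drop the leading ^
  cohort=${cohort%\$}     # drop the trailing $
  case $cohort in
    */*) first=${cohort#*/} ;;   # text after the first slash
    *)   return 1 ;;             # no readings at all
  esac
  case $first in
    '#'*|'@'*) return 0 ;;
    *)         return 1 ;;
  esac
}

# Examples using the cohorts quoted in this thread:
if is_ungenerated '^NotInGen<np>/@NotInGen<np>$'; then
  echo 'NotInGen: keep tags'
fi
if ! is_ungenerated '^blå<adj><sint><pst><un><pl><ind>/blå/blåe<v:blå_blåe>$'; then
  echo 'blå: strip tags'
fi
```

With this split, tags would be kept only on cohorts the generator failed on, while variant tags such as `<v:blå_blåe>` on successfully generated readings would still be stripped from the final output.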
