Allow printing tags without full tracing #90

marcriera · 2022-02-02T22:29:45Z

Since commit a9c7675, tags are not printed unless tracing is enabled. However, since CG is now used in many Apertium pairs during generation to handle preferences, tags may be necessary without full tracing.

Tags are especially useful when running a testvoc. If no tags are printed, only the internal lemma with # is shown, which is difficult to debug. Enabling tracing with -t helps here, but it also adds excessive information that the postgenerator does not handle properly.

I suggest adding a new flag to print tags, regardless of tracing, or printing tags by default again (unless it is really preferable not to print tags by default).

Thanks!

TinoDidriksen · 2022-02-03T06:04:15Z

Ping @unhammer

unhammer · 2022-02-03T13:42:36Z

So you're running a pipeline that uses cg-proc -n -g after lt-proc -g, and the lt-proc steps which used to give you #foo<tag> now give #foo?

unhammer · 2022-02-03T13:43:00Z

if I understand correctly the problem is

$ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin
^NotInGen<np>/@NotInGen<np>$
$ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin | cg-proc -g -n nob-nno.genprefs.rlx.bin
#NotInGen

doesn't give the tags, while if we use -t it does give tags

$ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin | cg-proc -g -n -t nob-nno.genprefs.rlx.bin
#NotInGen\<np\>

but is noisy if a rule actually hit:

$ echo å gafle|apertium -f none -d . nob-nno-dgen |cg-proc -g -n -t nob-nno.genprefs.rlx.bin
å gafla/¬gafle\<v:infa_infe\><REMOVE:26>

unhammer · 2022-02-03T13:46:34Z

> or printing tags by default again (unless it is really preferable not to print tags by default)

This is running after the generator, so we do have to get rid of the tags to avoid them ending up in the output shown to the user.

Also, I suppose you only want tags on the stuff we couldn't generate?

marcriera · 2022-02-03T16:36:01Z

> if I understand correctly the problem is
>
> $ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin
> ^NotInGen<np>/@NotInGen<np>$
> $ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin | cg-proc -g -n nob-nno.genprefs.rlx.bin
> #NotInGen
>
> doesn't give the tags, while if we use -t it does give tags
>
> $ echo '^NotInGen<np>$' |lt-proc -b nob-nno.autogen.bin | cg-proc -g -n -t nob-nno.genprefs.rlx.bin
> #NotInGen\<np\>
>
> but is noisy if a rule actually hit:
>
> $ echo å gafle|apertium -f none -d . nob-nno-dgen |cg-proc -g -n -t nob-nno.genprefs.rlx.bin
> å gafla/¬gafle\<v:infa_infe\><REMOVE:26>

It's exactly this, thanks.

> > or printing tags by default again (unless it is really preferable not to print tags by default)
>
> This is running after the generator, so we do have to get rid of the tags to avoid them ending up in the output shown to the user.
>
> Also, I suppose you only want tags on the stuff we couldn't generate?

Yes, tags should only appear for lexical units that cannot be generated. The generator is running right before this and trying to generate a surface form, so the input to cg-proc -g -n will only contain tags in cohorts if there's a generation error in the previous step of the pipeline.

unhammer · 2022-02-03T19:11:35Z

> The generator is running right before this and trying to generate a surface form, so the input to cg-proc -g -n will only contain tags in cohorts if there's a generation error in the previous step of the pipeline.

Well, there will also be tags on readings if there are variant tags (in addition to the input tags which are there since we use lt-proc -b on the generator):

$ echo blå | apertium -d . nob-nno-dgen
^blå<adj><sint><pst><un><pl><ind>/blå/blåe<v:blå_blåe>$

(that's the input to cg-proc -g -n)

marcriera · 2022-02-04T16:52:58Z

> > The generator is running right before this and trying to generate a surface form, so the input to cg-proc -g -n will only contain tags in cohorts if there's a generation error in the previous step of the pipeline.
>
> Well, there will also be tags on readings if there are variant tags (in addition to the input tags which are there since we use lt-proc -b on the generator):
>
> $ echo blå | apertium -d . nob-nno-dgen
> ^blå<adj><sint><pst><un><pl><ind>/blå/blåe<v:blå_blåe>$
>
> (that's the input to cg-proc -g -n)

You're right, of course. For some reason I had assumed these were just removed, but they are tags after all.

I suppose we could distinguish between invalid and valid readings by checking if there's a # or @ at the start of the reading in the input. These are added by the generator only if it cannot generate anything. I assume they are also escaped if the generation is valid (there could be a lexical unit beginning with one of these two characters).
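The check suggested here could look roughly like the following Python sketch. It is only an illustration of the idea, not how cg-proc works or would implement it. The function names `is_generation_failure` and `render` are hypothetical, and the code relies on the unconfirmed assumption stated above: a legitimate form beginning with # or @ arrives backslash-escaped.

```python
import re

# Assumption: tags are unescaped <...> groups; \<...\> is literal text.
TAG = re.compile(r"(?<!\\)<[^<>]+>")

def is_generation_failure(reading):
    """True if the reading starts with an unescaped '#' or '@' marker.

    An escaped marker (e.g. a form written as \\#foo) starts with a
    backslash instead, so it is treated as a valid generated form.
    """
    return reading[:1] in ("#", "@")

def render(reading):
    """Keep tags on readings that failed to generate, strip them otherwise."""
    if is_generation_failure(reading):
        return reading
    return TAG.sub("", reading)

print(render("@NotInGen<np>"))     # failed generation: tags kept
print(render("blåe<v:blå_blåe>"))  # valid variant reading: tags stripped
```

With this split, tags would survive only on lexical units the generator could not produce, which is the case that matters for a testvoc, while variant tags on successfully generated readings would still be dropped before reaching the user.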

TinoDidriksen added the request label Feb 3, 2022
