lt-trim: new option --match-section #165

Merged (1 commit) on Sep 30, 2022

@unhammer commented Sep 29, 2022
Member

The option may be given multiple times. Any analyser section whose
name (id@type) matches a given value is trimmed only against bidix
sections with the same name. This is useful for regex sections, which
tend to have a very different structure from regular entries (few
states with many transitions and loops) and therefore slow down
intersection.
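
As a rough illustration of the pairing rule described above, here is a minimal Python sketch. It is not the lt-trim implementation: `trim_targets` is a hypothetical helper, the bidix section names in the usage lines are made up for the example, and the handling of non-matching sections (trimmed against all remaining bidix sections) is an assumption that this description does not spell out.

```python
def trim_targets(analyser_sections, bidix_sections, match_sections):
    """Map each analyser section name to the bidix sections it is trimmed against.

    Hypothetical helper for illustration only; section names are id@type strings.
    """
    matched = set(match_sections)
    targets = {}
    for name in analyser_sections:
        if name in matched:
            # Stated in the description: a matching section is trimmed
            # only against bidix sections with the same name.
            targets[name] = [b for b in bidix_sections if b == name]
        else:
            # ASSUMPTION (not stated in the description): every other
            # section is trimmed against the remaining bidix sections.
            targets[name] = [b for b in bidix_sections if b not in matched]
    return targets


# Analyser section names are taken from the lt-trim output in this PR;
# the bidix section list is invented for the example.
plan = trim_targets(
    ["final@inconditional", "main@standard", "regex@standard"],
    ["main@standard", "regex@standard"],
    ["regex@standard"],
)
for section, against in plan.items():
    print(section, "->", against)
```

The point of the sketch is only that the small, loop-heavy regex section no longer takes part in the large intersection with the main section.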

This gives a 4x speedup (60s → 15s) on nob→nno:

BEFORE:

$ \time lttoolbox/lttoolbox/lt-trim apertium-nob/nob.automorf.bin apertium-nno-nob/nob-nno.autobil.bin /tmp/before.bin
final@inconditional 26 76
main@standard 168643 350041
regex@standard 403 7475
58.73user 0.97system 1:00.45elapsed 98%CPU (0avgtext+0avgdata 2280784maxresident)k
0inputs+3288outputs (0major+574892minor)pagefaults 0swaps

AFTER:

$ \time lttoolbox/lttoolbox/lt-trim --match-section=regex@standard apertium-nob/nob.automorf.bin apertium-nno-nob/nob-nno.autobil.bin /tmp/after.bin
Matched sections regex@standard
final@inconditional 26 76
main@standard 168643 350041
regex@standard 389 7405
14.36user 0.24system 0:14.77elapsed 98%CPU (0avgtext+0avgdata 382136maxresident)k
0inputs+3288outputs (0major+102452minor)pagefaults 0swaps

(timings are the same if lt-comp -j was used to make nob.automorf.bin)
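
For anyone wanting to check the quoted ratio, the following Python sketch recomputes it from the two `time` summary lines above. It assumes the GNU time default output format (elapsed as `[h:]m:s`, `maxresident` in kilobytes); `parse_time_summary` is a hypothetical helper written for this example.

```python
import re

# Summary lines copied from the BEFORE and AFTER runs above.
BEFORE = "58.73user 0.97system 1:00.45elapsed 98%CPU (0avgtext+0avgdata 2280784maxresident)k"
AFTER = "14.36user 0.24system 0:14.77elapsed 98%CPU (0avgtext+0avgdata 382136maxresident)k"


def parse_time_summary(line):
    """Return (elapsed_seconds, max_resident_kb) from one summary line."""
    elapsed = re.search(r"((?:\d+:)?\d+:\d+(?:\.\d+)?)elapsed", line).group(1)
    seconds = 0.0
    for part in elapsed.split(":"):
        # Fold [h:]m:s into seconds, most significant field first.
        seconds = seconds * 60 + float(part)
    max_resident_kb = int(re.search(r"(\d+)maxresident", line).group(1))
    return seconds, max_resident_kb


before_s, before_kb = parse_time_summary(BEFORE)
after_s, after_kb = parse_time_summary(AFTER)

speedup = before_s / after_s
memory_ratio = before_kb / after_kb
print(f"elapsed: {before_s:.2f}s -> {after_s:.2f}s ({speedup:.1f}x faster)")
print(f"max resident: {before_kb} kB -> {after_kb} kB ({memory_ratio:.1f}x smaller)")
```

Besides the roughly 4x wall-clock improvement, the same lines show peak resident memory dropping by a factor of about 6.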

@mr-martian
Contributor

I originally designed cli.h so that situations like this could simply use -s regex1 -s regex2, and I'm not sure whether supporting both repeated arguments and comma separation is good or bad.
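
To make the two styles in this comment concrete, here is a small Python sketch using the standard `argparse` module as a stand-in. It is not lttoolbox's cli.h and says nothing about how cli.h works internally; `parse_sections` and its `split_commas` switch are hypothetical names for this example, and the `-s` short flag is taken from the comment above.

```python
import argparse


def parse_sections(argv, split_commas=False):
    """Collect section names from repeated -s/--match-section options.

    Hypothetical helper; with split_commas=True each value may also
    hold several comma-separated names.
    """
    parser = argparse.ArgumentParser(prog="trim-sketch")
    # action="append" gathers every occurrence of the option into one list.
    parser.add_argument("-s", "--match-section", action="append", default=[])
    values = parser.parse_args(argv).match_section
    if not split_commas:
        return values
    # Optional second style: flatten "a,b" into ["a", "b"].
    return [name for value in values for name in value.split(",") if name]


print(parse_sections(["-s", "regex1", "-s", "regex2"]))
print(parse_sections(["-s", "regex1,regex2", "-s", "regex3"], split_commas=True))
```

Supporting only the repeated form keeps each value unambiguous; accepting commas as well means a section name can never contain a comma.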

@unhammer
Member Author

Oh! I didn't know you could do that, that's much nicer!

@unhammer force-pushed the lt-trim-matching-section-ids branch from 9fd3781 to b3f9e99 on September 29, 2022 18:54
@unhammer changed the title from "lt-trim: new option --match-section-ids" to "lt-trim: new option --match-section" on Sep 29, 2022
@unhammer
Member Author

force-updated with nicer cli

@unhammer merged commit 5fa5a97 into master on Sep 30, 2022