New section type that doesn't minimise #96

Open
ftyers opened this issue Jun 26, 2020 · 9 comments
Labels
enhancement New feature or request

Comments

@ftyers
Member

ftyers commented Jun 26, 2020

At the moment we add regexes in sections, and minimising regexes takes a long time. Perhaps we could have a special type="regex" section that is not minimised; that would speed up compilation of regex-heavy dictionaries.

This will likely break binary compatibility.

@ftyers added the enhancement label on Jun 26, 2020
@mr-martian
Contributor

Or it could be unioned with some other section after that section has been minimized, to avoid having to create a new section in the binary.

@ftyers
Member Author

ftyers commented Jun 26, 2020

@mr-martian that sounds a bit more complicated. Also, it would be cool to be able to give weights to sections, but I'll open another issue for that.

@mr-martian
Contributor

Upon poking around a bit, I've determined that this would not break the binary format, since section types are just encoded as strings and lt-proc already handles multiple sections of the same type. Having lt-comp relabel type="regex" to type="standard" would result in complete backwards compatibility, or lt-proc could just recognize section names ending in @regex and treat them like @standard.

Either way, this should probably be accompanied by a way to mark <pardef>s as non-minimizing for the same reason. regex="yes", perhaps.
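
The relabelling option described above can be sketched in a few lines. This is only an illustration, not lttoolbox code: the function name `relabel_section` is hypothetical, and it assumes section names have the form `<id>@<type>`, which is what the "ending in @regex" wording suggests.

```python
# Hypothetical sketch, not part of lttoolbox.
# Assumption: section names look like "<id>@<type>", e.g. "main@standard".

REGEX_SUFFIX = "@regex"        # proposed, non-minimised section type
STANDARD_SUFFIX = "@standard"  # existing type that lt-proc already handles


def relabel_section(name: str) -> str:
    """Rewrite a name ending in '@regex' so that it ends in '@standard'.

    Any other name is returned unchanged.
    """
    if name.endswith(REGEX_SUFFIX):
        return name[: -len(REGEX_SUFFIX)] + STANDARD_SUFFIX
    return name


if __name__ == "__main__":
    for name in ["main@standard", "numbers@regex", "final@inconditional"]:
        print(name, "->", relabel_section(name))
```

Doing the rewrite at compile time would keep the output readable by an unmodified lt-proc; applying the same suffix check at load time corresponds to the second option, where lt-proc itself treats @regex like @standard.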

@TinoDidriksen
Member

This should be optional. For development it should be fast to compile and test, but for distribution it should heavily optimize to the smallest/fastest output binary.
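
A minimal sketch of what "optional" could mean in practice, assuming a compiler that decides per section whether to minimise. The `Section` class, the `optimise_all` flag and `sections_to_minimise` are hypothetical names used for illustration; none of them exist in lt-comp.

```python
# Hypothetical sketch, not part of lttoolbox.
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    name: str
    type: str  # e.g. "standard", or the proposed "regex"


def sections_to_minimise(sections: List[Section], optimise_all: bool = False) -> List[str]:
    """Return the names of the sections that would be minimised.

    With optimise_all=False (development build), "regex" sections are skipped.
    With optimise_all=True (distribution build), every section is minimised.
    """
    return [s.name for s in sections if optimise_all or s.type != "regex"]


if __name__ == "__main__":
    dictionary = [
        Section("main", "standard"),
        Section("numbers", "regex"),
    ]
    print("development: ", sections_to_minimise(dictionary))
    print("distribution:", sections_to_minimise(dictionary, optimise_all=True))
```

The default here favours fast development builds; a packaging script would pass `optimise_all=True` to get the smallest binary.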

@mr-martian
Contributor

Also, it occurs to me that this is tricky because lt-comp minimizes each pardef separately in addition to each section.

@unhammer
Member

But this is about speed – is minimising each pardef on its own slow? (Last time I checked, the section minimisation at the end was the slow step.)

@mr-martian
Contributor

Another alternative: 0493630 added the ability to compile dictionaries in several pieces, which should alleviate the burden of frequently recompiling the regex sections.

@mr-martian
Contributor

In fact, we could have globally shared regex sections, as proposed in apertium/apertium#161

@unhammer
Member

Minimisation has gotten quite a bit faster lately, but there's a related PR at #165.
