Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split sectioning commands onto new lines #45

Merged
merged 9 commits into from
Oct 31, 2024

Conversation

cdesaintguilhem
Copy link
Contributor

These changes introduce a couple of new regexes to match sectioning commands and capture them in order to split them onto new lines. This PR replaces #36 by introducing the test files together with the code to handle them and addresses this comment from #13.

One change from the test files in #36 is that text that follows a sectioning command is not expected to also be split onto a new line. I realised that this might be tricky to catch with regex if there are more groups of opening and closing brackets within the sectioning command. (Imagine something like \section{The \emph{best} section}.

I will first mark this PR as draft so that we can discuss whether capturing trailing text should also be included in this PR or in a later one.

@WGUNDERWOOD
Copy link
Owner

WGUNDERWOOD commented Oct 22, 2024

Thanks for this. Do you cover non-numbered cases such as \section*{}? I think for now it's ok if we don't split text following the sectioning command.

@cdesaintguilhem
Copy link
Contributor Author

Ah, good catch on non-numbered cases! I've now updated the regexes and expanded the test file to cover these too.

@WGUNDERWOOD WGUNDERWOOD marked this pull request as ready for review October 25, 2024 15:44
@WGUNDERWOOD
Copy link
Owner

Perhaps we can try to formulate this in a way that does not duplicate all the \section, \part... commands? It will otherwise be difficult to maintain. For example, compare with the current implementation of LISTS and LISTS_BEGIN in regexes.rs: each environment is defined once in a const array, then iterated over to construct the objects for matching.

@cdesaintguilhem
Copy link
Contributor Author

Yes that's a good point. I'll look into it this week hopefully.

@cdesaintguilhem
Copy link
Contributor Author

Now all the regexes for splitting sectioning commands are built from arrays which avoids duplicating the name of the commands.

It still feels a bit off that these are treated separately from environments for detecting, but together with them for the actual capturing when the line has to be split. I feel like the separation between environment commands and sectioning commands is a bit artificial now. However changing this is outside the scope of this PR and would require its own.

@WGUNDERWOOD
Copy link
Owner

WGUNDERWOOD commented Oct 31, 2024

Thanks for your help here, I'm going to merge this now so we have the functionality. However I would like to simplify the implementation a little. Could you help me understand whether it will be possible to merge all the logic and regexes for sectioning and environments? It seems to me that they are treated the same.

There is also a significant performance degradation (50ms -> 68ms), which would be good to fix.

@WGUNDERWOOD WGUNDERWOOD merged commit ffa7e38 into WGUNDERWOOD:main Oct 31, 2024
9 checks passed
@cdesaintguilhem cdesaintguilhem deleted the pr-handle-sections branch October 31, 2024 11:01
@cdesaintguilhem
Copy link
Contributor Author

Regarding the implementation: I will open an issue to keep track of this.

Regarding the performance: I measure ~56ms on my machine both before and after the merge, but I'll double-check this.

@WGUNDERWOOD
Copy link
Owner

I have added few extra commits since the merge -- see 8236d5a. Performance is now closer to how it was on my machine, mainly by making Pattern::new() more efficient. I've also merged the implementations of sectioning and environments logic.

@cdesaintguilhem
Copy link
Contributor Author

Thanks for pointing those out! Indeed I now measure 49ms on my machine, much closer to what it was before.

And after taking a quick look at the merge of the implementations, that's pretty much what I had in mind, so I don't think opening the issue is necessary for the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants