-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternative formatting processing #42
Alternative formatting processing #42
Conversation
Thanks, this is very exciting! I will look through this carefully soon; your contributions are greatly appreciated. |
Other than the comment above, this all looks good to me. Perhaps you can rebase it onto |
I've now merged your latest changes to I'm happy to squash all of these commits into one on my end, but I'm wondering whether this is something that you're also able to do on your side. Up to you if you want to decide what the merge commit should say or if you're happy for me come up with something. |
Yes I think it's best if you squash on your side before merging. Also if you ever want to dig deeper into performance issues, you can run the |
Squashed commit of the following: commit ccbc433 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Tue Oct 15 14:13:35 2024 +0200 Fix typo commit 0e4ec8c Merge: 52fe1f7 3f658fd Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Tue Oct 15 14:07:14 2024 +0200 Merge remote-tracking branch 'upstream/main' into pr-refactor-logic commit 52fe1f7 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Mon Oct 14 22:19:16 2024 +0200 Document added regex commit 1418812 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Mon Oct 14 18:15:44 2024 +0200 Fix Clippy warnings commit 6c248db Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Mon Oct 14 18:04:11 2024 +0200 Use `char::width()` for checking wrapping need commit 3f658fd Author: William G Underwood <42812654+WGUNDERWOOD@users.noreply.github.com> Date: Mon Oct 14 15:28:00 2024 +0100 Clippy commit 0dcbc93 Author: William G Underwood <42812654+WGUNDERWOOD@users.noreply.github.com> Date: Mon Oct 14 15:03:15 2024 +0100 Split cli and read files commit 3ca0fd2 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Mon Oct 14 14:09:18 2024 +0200 Remove unused function commit b4e2130 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Mon Oct 14 14:08:45 2024 +0200 Wrap lines based on indent length and only apply indentation after wrapping commit 9bc0a3b Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Sun Oct 13 20:12:44 2024 +0200 Move allocation outside of wrap application commit 76d9ef0 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Sat Oct 12 11:17:43 2024 +0200 Separate indent calculation from application commit eb7971e Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Sat Oct 12 10:46:56 2024 +0200 Extract indent calculation into function commit e41a66d Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Sat Oct 12 10:31:58 2024 +0200 Apply clippy suggestions commit 21bb908 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Fri Oct 11 17:43:14 2024 +0200 Rename ignore check and document commit ac164ad Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Fri Oct 11 17:13:31 2024 +0200 Simplify Option matching commit 8a6e868 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Fri Oct 11 12:58:12 2024 +0200 Remove unnecessary comment check and `Option` in return type when splitting lines commit 1d15b86 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Fri Oct 11 12:57:17 2024 +0200 Change comments around commit e77b629 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Fri Oct 11 12:56:46 2024 +0200 Refactor ignoring check as function commit b10b010 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Fri Oct 11 11:37:18 2024 +0200 Move environment line splitting before indenting This avoids re-adding the start of the split line to the queue. commit 0424d0a Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Fri Oct 11 11:23:01 2024 +0200 Move check for ignore outside of `apply_indent()` commit 70ac614 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Fri Oct 11 09:29:55 2024 +0200 Use new return type for `put_env_new_line` commit fdea16b Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Fri Oct 11 09:28:54 2024 +0200 Fix infinite loop issue when split points are in comments commit 791c7f1 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Fri Oct 11 09:27:23 2024 +0200 Correct to not expect `{` after `\item` but only after `\begin` and `\end` commit 6e5ce6c Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Fri Oct 11 09:26:06 2024 +0200 Match target tex file to source file commit 9562af1 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Thu Oct 10 22:27:53 2024 +0200 Remove debug print statement commit 0f48547 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Thu Oct 10 18:10:00 2024 +0200 Split environment and item lines without allocating This does not compile because the `format` function cannot yet deal with the new return type. commit c49a08c Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Thu Oct 10 13:27:48 2024 +0200 Check indentation return to 0 on `state` instead of `new_text` commit 1b90f21 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Thu Oct 10 13:11:57 2024 +0200 Split condition tree into distinct blocks with `continue` commit e974225 Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Thu Oct 10 11:07:08 2024 +0200 Plan line formatting logic steps commit 4bf802a Author: Cyprien de Saint Guilhem <cyprien.desaintguilhem@3milabs.tech> Date: Thu Oct 10 10:53:42 2024 +0200 Group lines to clean text into one function commit 380380d Author: William G Underwood <42812654+WGUNDERWOOD@users.noreply.github.com> Date: Thu Oct 10 14:33:04 2024 +0100 Release process notes
ccbc433
to
db634ec
Compare
Squash completed. I've kept the messages in the squash commit's message; and I've kept a branch on my fork with the individual commits. |
Could you add an example test of where the old code would fail on zero-width chars? I tried an example with a French accent in |
As a first attempt: the following line (with two spaces before "Aujourd'hui" and where the 70th visual column is between "vérifier" and "is") y̆es Aujourd'hui je me suis baladé déhors. Hier je suis allé vérifier si l'été était arrivé. is wrapped by y̆es Aujourd'hui je me suis baladé déhors. Hier je suis allé
vérifier si l'été était arrivé. So it's wrapped at the 61-st visual place, instead of the expected 70th (because of However, running the current y̆es Aujourd'hui je me suis baladé déhors. Hier je suis allé vérifier
si l'été était arrivé. The issue with what I said in the PR is that French accents are not 0-width characters; indeed letters like If there is a performance loss in using the function from |
I do see a slight performance degradation using |
That's a good point about how LaTeX is meant to offer other ways for the user to express accents, without assuming that the source is UTF-8 encoded. Some of the scripts for which I suspect there could be more 0-width characters would include Mandarin, Hindi, Arabic or maybe Korean or Japanese but I don't know enough about these to check. (I also don't know how much LaTeX is used with these languages.) One thought I had would be to extract the code that calculates a line's length into a function, and have a feature (something like |
A list of characters for which the width needs modifying is given in the unicode_width crate docs. My impression is that these will be rarely encountered, so I'll remove this dependency for now. |
Hi there, as I was tinkering to figure out how to wrap closer to the given wrapping parameter one thing led to another and I arrived at an alternative logic for some of the steps that the formatting loop makes. I thought I'd share the code here in case you would be interested; if there are particular changes you like, I would be happy to cherry-pick them into their own PR and close this one up for the archives.
All the changes in this PR are to the logic only and not to the formatting style, all tests still pass unchanged (save for the addition of a
\begin{env1}
inside a comment to check that it is properly ignored). However, with these changes, I think that wrapping tightly to the wrapping parameter can be achieved by changing thewrapping_boundary
to useargs.wrap
instead ofargs.wrap_min
. I'm still testing this on another branch, but this requires changing all the test file to account for the tighter expected wrap.Summary of changes
.chars()
could cause inconsistent wrapping in non-English languages. This code introduces theunicode-width
dependency to obtain the on-screen width of each character in the string and calculate a wrapping point that visually matches the wrapping parameter.\section{...}
or\[
for example)."% "
in front of comments), and a slice containing the rest of the line.Misc changes
tex-fmt
ignore commands) is extracted into a function, and checked before proceeding to the formatting logic.indent.actual
after exiting the loop instead of accessing the lines of the text.test_source()
function is modified slightly to panic on the first file that formats incorrectly and to print the file name. The computation of the diff was taking a very long time when many files were failing because of many diffs.Performance Report
As show from the output below, these changes bring a (63.0 -> 57.1 = ) 9.3% improvement in performance on my machine.