Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parsing of punctuation in format_links() #3027

Merged
merged 7 commits into from
Nov 5, 2023

Conversation

dato
Copy link
Contributor

@dato dato commented Oct 10, 2023

The main fix is not parsing punctuation inside brackets if it's not preceded by a URL. This fixes #2993, fixes #3049. Additionally, multiple punctuation signs are now supported.

As noted in a comment below, the inner workings of the function are changed, making helper functions perform the stripping of symbols, rather than just returning bool.

This allows to parse `(URL).` correctly, which was not detected
as URL before.
(Test case already part of test_format_links_simple_url.)
@dato
Copy link
Contributor Author

dato commented Oct 10, 2023

Hiding previous comment about dato#15, which was closed and fast-forwarded here

I also noticed that multiple punctuations signs are not supported.

This is a bit trickier to implement, because the current approach asumes only one character at either side is to be stripped.

Can be easily solved with a function that, rather than return booleans, splits the string into a prefix/link/suffix triplet.

I'll submit that as a separate PR if it doesn't seem too much to review. Thanks! (Since it's a chained PR, it's at dato#15 at the moment. You can fast-forward to this PR if you wish to merge/review all together.)

@dato dato changed the title Allow punctuation after parenthesis Fix parsing of punctuation in format_links() Nov 3, 2023
@dato
Copy link
Contributor Author

dato commented Nov 3, 2023

@hughrun Updated this PR per our discussion in #3074.

Also, consolidate all punctuation tests into a single table-driven one.
They're identical here, since re.M is not used, and the better-known
should be used, for readability.
@hughrun
Copy link
Contributor

hughrun commented Nov 5, 2023

This is really niiiice! Well done!

@hughrun hughrun merged commit a93519e into bookwyrm-social:main Nov 5, 2023
10 checks passed
@dato dato deleted the find_links_wrapped_punct branch November 5, 2023 23:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect rendering of (?) Misparsing of [...] elisions
2 participants