Add support for unicode email addresses (RFC 6530 and following) #46

Open
arnt wants to merge 1 commit into base: main

Conversation

@arnt arnt commented Sep 1, 2023

This is as good as it can be using the built-in re module. Using the regex module would permit another improvement.

As it stands, this matches å as a single precomposed character, which is by far the most common way to encode å. It does not match two things:

  • 'a' + combining ring above
  • ZWJ (zero-width joiner), which is required, e.g., for the word 'sri' when written in Sinhala, the most common script in Sri Lanka.

Switching to the regex module would permit detecting both of those, but they're rare cases, and it's not obvious to me that catching some rare cases justifies pulling in another module.
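
For illustration only, here is a minimal sketch of the distinction described above. It uses `\w+` as a hypothetical stand-in for the character class, since the actual pattern in this PR is not shown in this thread, and the helper name `is_word` is invented for the example. The last line applies NFC normalization before matching; that is not part of this PR, only one possible way to handle the 'a' + combining ring above case with the standard library, and it does nothing for ZWJ.

```python
import re
import unicodedata

# Hypothetical stand-in for the pattern in the PR (assumption, not the real regex).
WORD = re.compile(r"\w+")


def is_word(text: str) -> bool:
    """Return True if every character in text is matched by \\w."""
    return WORD.fullmatch(text) is not None


PRECOMPOSED = "\u00e5"  # å as a single code point
DECOMPOSED = "a\u030a"  # 'a' followed by COMBINING RING ABOVE
ZWJ = "\u200d"          # ZERO WIDTH JOINER

print(is_word(PRECOMPOSED))  # True
print(is_word(DECOMPOSED))   # False: the combining mark is not a word character
print(is_word(ZWJ))          # False: ZWJ is a format character

# Assumption: normalizing to NFC first folds 'a' + ring into the precomposed å.
print(is_word(unicodedata.normalize("NFC", DECOMPOSED)))  # True
```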

@tsutsu3
Owner

tsutsu3 commented Oct 4, 2023

I think it's a great idea.
However, since it behaves differently from linkify-it, I will not merge it.

@arnt
Author

arnt commented Oct 4, 2023

I see, that makes sense to me. Please leave this open and I'll submit corresponding PRs to linkify-it and I suppose I might as well do linkify-it-rb too.
