Unicode characters support in regex #233

kunchtler · 2024-07-18T15:18:26Z

Linked to #232.

Changed the order in which tokens are registered in the regex lexer so that the rule recognizing letters is processed last, and broadened that rule to match any non-whitespace character (Python's `re` `\S` class).
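The ordering matters because a lexer that tries rules in registration order (first match wins) would let a catch-all `\S` rule swallow operators like `*` or `|` if it came first. Below is a minimal sketch of that idea, assuming a first-match-wins lexer. The rule names and the `tokenize` helper are hypothetical and are not automata-lib's actual lexer API.

```python
import re
from typing import Sequence, Tuple

# Hypothetical rule table: tried in order, first match at the current position wins.
RULES: Tuple[Tuple[str, str], ...] = (
    ("LEFT_PAREN", r"\("),
    ("RIGHT_PAREN", r"\)"),
    ("UNION", r"\|"),
    ("KLEENE_STAR", r"\*"),
    # Catch-all symbol rule registered LAST: any single non-whitespace
    # character, including non-ASCII ones such as "µ" or "🤖".
    ("SYMBOL", r"\S"),
)


def tokenize(
    regex: str, rules: Sequence[Tuple[str, str]] = RULES
) -> list:
    """Split `regex` into (kind, text) tokens using the ordered `rules`."""
    compiled = [(kind, re.compile(pattern)) for kind, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(regex):
        for kind, pattern in compiled:
            match = pattern.match(regex, pos)
            if match:
                tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:
            # No rule matched (e.g. whitespace, which \S excludes).
            raise ValueError(f"unexpected character {regex[pos]!r} at {pos}")
    return tokens
```

With the symbol rule last, `tokenize("(µ|🤖ù)*")` yields `LEFT_PAREN, SYMBOL, UNION, SYMBOL, SYMBOL, RIGHT_PAREN, KLEENE_STAR`. Moving the `SYMBOL` rule to the front would make `tokenize("a*", ...)` return two `SYMBOL` tokens, losing the star operator.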

Added a test to check support for non-ASCII characters.

This is my very first pull request, so feel free to guide me.

coveralls · 2024-07-18T16:42:16Z

Coverage remained the same at 99.613% when pulling c7b94b1 on kunchtler:unicode-regex into 9ab1a1c on caleb531:develop.

eliotwrobson

@kunchtler thanks for this! One request to make this test a little more robust, but overall I think the change looks good.

eliotwrobson · 2024-07-27T21:43:17Z

tests/test_regex.py

+    def test_validate_unicode_characters(self) -> None:
+        """Should pass validation for regular expressions with unicode characters."""
+        re.validate("(µ|🤖ù)*")
+


Should add a test checking that an NFA converted from this regex has the expected set of input symbols.

Commit c7b94b1: Unicode characters support in regex

caleb531 requested a review from eliotwrobson July 18, 2024 16:39

eliotwrobson linked an issue Jul 27, 2024 that may be closed by this pull request: Unicode characters with regexp ? #232 (Open)

eliotwrobson requested changes Jul 27, 2024
