Commit
WIP: custom Unicode normalization for Julia identifiers (#19464)
* implement custom Julia Unicode normalization for confusable characters in identifiers
* whoops
* separated julia_charmap into its own file to make it easier to update
* normalize fullwidth -> halfwidth in identifiers, ala NFKC
* make \varepsilon complete to ε (u+03b5), fixes #14751
* docs for canonicalization
* normalize fullwidth characters during parsing (fixes #5903)
* typo
* tests
* be more cautious about normalizing chars when parsing, so as not to normalize string literals
* test fullwidth numeric literals and parens
* typo/clarification
* update to utf8proc-2.1
* checksum for utf8proc 2.1
* moved symbol-normalization test from test/core to test/parse
* Revert "be more cautious about normalizing chars when parsing, so as not to normalize string literals" This reverts commit 81033fa.
* Revert "normalize fullwidth characters during parsing (fixes #5903)" This reverts commit cf61972.
* remove more references to fullwidth normalization
* rm fullwidth identifier normalization
Showing 13 changed files with 75 additions and 8 deletions.
1 change: 1 addition & 0 deletions
deps/checksums/utf8proc-40e605959eb5cb90b2587fa88e3b661558fbc55a.tar.gz/md5
@@ -0,0 +1 @@
+f33af304538c3afba3b1d0ebae8e4555
1 change: 1 addition & 0 deletions
deps/checksums/utf8proc-40e605959eb5cb90b2587fa88e3b661558fbc55a.tar.gz/sha512
@@ -0,0 +1 @@
+17a2df079e726a4ae1f10fcf48a7a771c2bcc93c7938f88148e1aa3b6cf9d250eb33cd7a9d8de54f29360e71c71e59b77996ba28dd894676888dc0453d67e9bb
1 change: 0 additions & 1 deletion
deps/checksums/utf8proc-e3a5ed7b8bb5d0c6bb313d3e1f4d072c04113c4b.tar.gz/md5
This file was deleted.
1 change: 0 additions & 1 deletion
deps/checksums/utf8proc-e3a5ed7b8bb5d0c6bb313d3e1f4d072c04113c4b.tar.gz/sha512
This file was deleted.
@@ -1,2 +1,2 @@
-UTF8PROC_BRANCH=v2.0.2
-UTF8PROC_SHA1=e3a5ed7b8bb5d0c6bb313d3e1f4d072c04113c4b
+UTF8PROC_BRANCH=v2.1
+UTF8PROC_SHA1=40e605959eb5cb90b2587fa88e3b661558fbc55a
@@ -0,0 +1,7 @@
+/* Array of {original codepoint, replacement codepoint} normalizations
+   to perform on Julia identifiers, to canonicalize characters that
+   are both easily confused and easily inputted by accident. */
+static const uint32_t charmap[][2] = {
+    { 0x025B, 0x03B5 }, // latin small letter open e -> greek small letter epsilon
+    { 0x00B5, 0x03BC }, // micro sign -> greek small letter mu
+};
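For readers skimming the diff: the code that consumes this table is not visible in this view, so the following is only a minimal sketch of how a {original, replacement} table of this shape could be applied to a single codepoint. The helper name `map_codepoint` is hypothetical and does not come from the commit, and the linear scan is an assumption chosen for brevity; the actual change may look entries up differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Same two entries as the table added in the diff above. */
static const uint32_t charmap[][2] = {
    { 0x025B, 0x03B5 }, /* latin small letter open e -> greek small letter epsilon */
    { 0x00B5, 0x03BC }, /* micro sign -> greek small letter mu */
};

/* Hypothetical helper (not part of the commit): return the canonical
   replacement for codepoint c, or c itself when the table has no entry. */
static uint32_t map_codepoint(uint32_t c)
{
    size_t n = sizeof(charmap) / sizeof(charmap[0]);
    for (size_t i = 0; i < n; i++) {
        if (charmap[i][0] == c)
            return charmap[i][1];
    }
    return c; /* unmapped codepoints pass through unchanged */
}
```

With this sketch, `map_codepoint(0x00B5)` yields `0x03BC`, so an identifier typed with the micro sign and one typed with Greek mu would end up with the same codepoint, while a codepoint such as `0x0061` (`a`) is returned unchanged. A linear scan is adequate for a two-entry table; a larger table would likely want a sorted array or hash lookup.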