Skip to content

Commit

Permalink
doc: add wording about Unicode scalar values
Browse files Browse the repository at this point in the history
This makes it clearer that the regex engine works by *logically*
treating a haystack as a sequence of codepoints. Or more specifically,
Unicode scalar values.

Fixes #854
  • Loading branch information
BurntSushi committed Mar 15, 2023
1 parent cb25619 commit 3a31ecd
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -199,6 +199,8 @@ instead.)
This implementation executes regular expressions **only** on valid UTF-8
while exposing match locations as byte indices into the search string. (To
relax this restriction, use the [`bytes`](bytes/index.html) sub-module.)
Conceptually, the regex engine works by matching a haystack as if it were a
sequence of Unicode scalar values.
Only simple case folding is supported. Namely, when matching
case-insensitively, the characters are first mapped using the "simple" case
Expand Down

0 comments on commit 3a31ecd

Please sign in to comment.