From 0d0cebe917621596b01840b63bd173ba04aea7fe Mon Sep 17 00:00:00 2001 From: Andrew Gallant Date: Mon, 6 Mar 2023 10:00:32 -0500 Subject: [PATCH] doc: add wording about Unicode scalar values This makes it clearer that the regex engine works by *logically* treating a haystack as a sequence of codepoints. Or more specifically, Unicode scalar values. Fixes #854 --- src/lib.rs | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/lib.rs b/src/lib.rs index 042d243f8..af9cea20d 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -199,6 +199,8 @@ instead.) This implementation executes regular expressions **only** on valid UTF-8 while exposing match locations as byte indices into the search string. (To relax this restriction, use the [`bytes`](bytes/index.html) sub-module.) +Conceptually, the regex engine works by matching a haystack as if it were a +sequence of Unicode scalar values. Only simple case folding is supported. Namely, when matching case-insensitively, the characters are first mapped using the "simple" case