From 3a31ecdf1018dcda915563711584d71e4b307f2e Mon Sep 17 00:00:00 2001
From: Andrew Gallant <jamslam@gmail.com>
Date: Mon, 6 Mar 2023 10:00:32 -0500
Subject: [PATCH] doc: add wording about Unicode scalar values

This makes it clearer that the regex engine works by *logically*
treating a haystack as a sequence of codepoints or, more specifically,
Unicode scalar values.

Fixes #854
---
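Note for reviewers: the sketch below illustrates what "a sequence of
Unicode scalar values" means when match locations are still reported
as byte indices. It uses only the standard library and is not how the
regex engine is implemented. The helper name `scalar_value_spans` is
hypothetical and exists only for this illustration.

```rust
/// Hypothetical helper (not part of the regex crate): returns the byte
/// span of every Unicode scalar value in `haystack`. These are the units
/// the engine conceptually steps over, while offsets stay in bytes.
fn scalar_value_spans(haystack: &str) -> Vec<(usize, usize)> {
    haystack
        .char_indices()
        // `char_indices` yields the starting byte offset of each `char`,
        // and a Rust `char` is exactly one Unicode scalar value.
        .map(|(start, ch)| (start, start + ch.len_utf8()))
        .collect()
}

fn main() {
    // In UTF-8, 'a' is 1 byte, 'δ' is 2 bytes, '☃' is 3 bytes and
    // '😀' is 4 bytes, so the four scalar values occupy 10 bytes.
    let haystack = "aδ☃😀";
    for (start, end) in scalar_value_spans(haystack) {
        println!("{:?} at bytes {}..{}", &haystack[start..end], start, end);
    }
}
```

Each span covers one scalar value regardless of how many bytes it takes
to encode, which is the logical view the new doc sentence describes.
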
 src/lib.rs | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/lib.rs b/src/lib.rs
index 042d243f8..af9cea20d 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -199,6 +199,8 @@ instead.)
 This implementation executes regular expressions **only** on valid UTF-8
 while exposing match locations as byte indices into the search string. (To
 relax this restriction, use the [`bytes`](bytes/index.html) sub-module.)
+Conceptually, the regex engine works by matching a haystack as if it were a
+sequence of Unicode scalar values.
 
 Only simple case folding is supported. Namely, when matching
 case-insensitively, the characters are first mapped using the "simple" case