Verify character class still non-empty after converting to byte class
When `[^\x00-\xff]` is treated as a full Unicode character class, it is
not empty. For instance, `≥` would still be matched.

However, when `CharClass::to_byte_class` is called on it (as is done
when using `regex::bytes::Regex::new` rather than `regex::Regex::new`),
it _is_ now empty, since it excludes all possible bytes.

This commit adds a test asserting that `regex::bytes::Regex::new`
returns an error for this case (in accordance with
rust-lang#106) and adds an
`is_empty` check to the result of calling `CharClass::to_byte_class`,
which allows the test to pass.
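
The reasoning above can be illustrated with a small standalone model. This is only a sketch and not the `regex-syntax` implementation: `ScalarClass`, `ByteClass`, `BuildError`, `build_byte_class`, and `negated_full_byte_range` are hypothetical names, and truncating ranges at `0xFF` is an assumption made to mirror the description in this message.

```rust
/// Hypothetical stand-in for a character class: sorted, non-overlapping,
/// inclusive ranges of code points (surrogates are ignored for brevity).
#[derive(Debug, Clone, PartialEq)]
struct ScalarClass {
    ranges: Vec<(u32, u32)>,
}

/// Hypothetical stand-in for a byte class.
#[derive(Debug, Clone, PartialEq)]
struct ByteClass {
    ranges: Vec<(u8, u8)>,
}

#[derive(Debug, PartialEq)]
enum BuildError {
    EmptyClass,
}

const MAX_SCALAR: u32 = 0x10FFFF;

impl ScalarClass {
    /// Complement of the class relative to 0..=MAX_SCALAR.
    fn negate(&self) -> ScalarClass {
        let mut out = Vec::new();
        let mut next = 0u32; // lowest code point not yet accounted for
        for &(lo, hi) in &self.ranges {
            if lo > next {
                out.push((next, lo - 1));
            }
            next = hi.saturating_add(1);
        }
        if next <= MAX_SCALAR {
            out.push((next, MAX_SCALAR));
        }
        ScalarClass { ranges: out }
    }

    fn is_empty(&self) -> bool {
        self.ranges.is_empty()
    }

    fn contains(&self, c: char) -> bool {
        let v = c as u32;
        self.ranges.iter().any(|&(lo, hi)| lo <= v && v <= hi)
    }

    /// Keep only the part of each range that fits in a single byte.
    /// (Assumption of this sketch: anything above 0xFF is dropped.)
    fn to_byte_class(&self) -> ByteClass {
        let ranges: Vec<(u8, u8)> = self
            .ranges
            .iter()
            .filter(|&&(lo, _)| lo <= 0xFF)
            .map(|&(lo, hi)| (lo as u8, hi.min(0xFF) as u8))
            .collect();
        ByteClass { ranges }
    }
}

/// Mirrors the shape of the added check: convert, then reject an empty result.
fn build_byte_class(class: &ScalarClass) -> Result<ByteClass, BuildError> {
    let byte_class = class.to_byte_class();
    if byte_class.ranges.is_empty() {
        return Err(BuildError::EmptyClass);
    }
    Ok(byte_class)
}

/// Model of `[^\x00-\xff]`: negate the range 0x00..=0xFF.
fn negated_full_byte_range() -> ScalarClass {
    ScalarClass { ranges: vec![(0x00, 0xFF)] }.negate()
}
```

In this model, `negated_full_byte_range()` is non-empty and contains `≥`, yet `build_byte_class` rejects it, because no range survives the conversion to bytes.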
scooter-dangle committed Dec 8, 2016
1 parent 54ae5b6 commit 2d1a77d
Showing 2 changed files with 17 additions and 1 deletion.
12 changes: 11 additions & 1 deletion regex-syntax/src/parser.rs
@@ -596,7 +596,17 @@ impl Parser {
         Ok(Build::Expr(if self.flags.unicode {
             Expr::Class(class)
         } else {
-            Expr::ClassBytes(class.to_byte_class())
+            let byte_class = class.to_byte_class();
+
+            // If `class` was only non-empty due to multibyte characters, the
+            // corresponding byte class will now be empty.
+            //
+            // See https://github.com/rust-lang-nursery/regex/issues/303
+            if byte_class.is_empty() {
+                return Err(self.err(ErrorKind::EmptyClass));
+            }
+
+            Expr::ClassBytes(byte_class)
         }))
     }

6 changes: 6 additions & 0 deletions tests/bytes.rs
@@ -53,3 +53,9 @@ matiter!(invalidutf8_anchor3,
     r"^|ddp\xff\xffdddddlQd@\x80",
     R(b"\x8d#;\x1a\xa4s3\x05foobarX\\\x0f0t\xe4\x9b\xa4"),
     (0, 0));
+
+// See https://github.com/rust-lang-nursery/regex/issues/303
+#[test]
+fn negated_full_byte_range() {
+    assert!(::regex::bytes::Regex::new(r#"[^\x00-\xff]"#).is_err());
+}
