add second "capture" lifetime to SubCaptures and SubCapturesNamed #168

defuz · 2016-02-17T16:12:35Z

I think this code should work:

#[test]
fn test_iter_lifetimes() {
    fn get_first_subcapture<'t>(regex: &Regex, text: &'t str) -> Option<&'t str> {
        let captures = regex.captures(text).unwrap();
        captures.iter().next().unwrap()
    }

    let regex = Regex::new("(a+)(b+)").unwrap();
    let subcapture = get_first_subcapture(&regex, "aabbab");

    assert_eq!(subcapture, Some("aabb"));
}

But today it does not compile:

src/re.rs:956:9: 956:17 error: `captures` does not live long enough
src/re.rs:956         captures.iter().next().unwrap()
                      ^~~~~~~~
src/re.rs:954:82: 957:6 note: reference must be valid for the lifetime 't as defined on the block at 954:81...
src/re.rs:954     fn get_first_subcapture<'t>(regex: &Regex, text: &'t str) -> Option<&'t str> {
src/re.rs:955         let captures = regex.captures(text).unwrap();
src/re.rs:956         captures.iter().next().unwrap()
src/re.rs:957     }
src/re.rs:955:54: 957:6 note: ...but borrowed value is only valid for the block suffix following statement 0 at 955:53
src/re.rs:955         let captures = regex.captures(text).unwrap();
src/re.rs:956         captures.iter().next().unwrap()
src/re.rs:957     }

The problem is that subcaptures can not outlive Captures struct, which is wrong. To fix this, we need to add new lifetime:

pub fn iter<'c>(&'c self) -> SubCaptures<'c, 't>

And:

pub struct SubCaptures<'c, 't: 'c> {
    idx: usize,
    caps: &'c Captures<'t>,
}

There is a similar problem with iter_named and SubCapturesNamed. Possibly related to #158

The text was updated successfully, but these errors were encountered:

BurntSushi · 2016-02-17T16:18:49Z

Man, what a bonehead move on my part. The current definition is trivially wrong:

pub struct SubCaptures<'t> {
    idx: usize,
    caps: &'t Captures<'t>,
}

How presumptuous of me to assume that the search text might not live as long as the capture value! In this case, I believe 't is inferred to be the shorter of the two lifetimes, which certainly can't be reconciled with the lifetime in your function declaration.

Nice find!

BurntSushi · 2016-02-17T16:19:52Z

This should also be bundled up into 1.0, since it will also be a breaking change. (But thankfully shouldn't actually result in a huge amount of breakage, since I doubt folks are actually uttering the SubCaptures type.)

defuz · 2016-02-17T17:09:21Z

About iter_pos and SubCapturesPos. I think it should be changed to this:

pub fn iter_pos<'c>(&'c self) -> SubCapturesPos<'c> { ... }

/// `'c` is the lifetime of `Captures` struct.
pub struct SubCapturesPos<'с> {
    idx: usize,
    locs: &'с [Option<usize>], // from Captures::locs
}

Also, I think we can correct documentation before 1.0: 't is not the lifetime of the matched text for SubCaptures, SubCapturesNamed and SubCapturesPos. It is lifetime of Captures borrow.

BurntSushi · 2016-02-17T17:16:55Z

Also, I think we can correct documentation before 1.0:

That'd be fine.

BurntSushi · 2016-03-13T19:02:57Z

Note that this is fixed in the bytes::Regex API.

This corrects a gaffe of mine. In particular, both types contain references to a `Captures` *and* the text that was searched, but only names one lifetime. In practice, this means that the shortest lifetime is used, which can be problematic for when one is trying to extract submatch text. This also fixes the lifetime annotation on `iter_pos`, which should be tied to the Captures and not the text. It was always possible to work around this by using indices. Fixes #168

regex 0.2 0.2.0 ===== This is a new major release of the regex crate, and is an implementation of the [regex 1.0 RFC](https://github.com/rust-lang/rfcs/blob/master/text/1620-regex-1.0.md). We are releasing a `0.2` first, and if there are no major problems, we will release a `1.0` shortly. For `0.2`, the minimum *supported* Rust version is 1.12. There are a number of **breaking changes** in `0.2`. They are split into two types. The first type correspond to breaking changes in regular expression syntax. The second type correspond to breaking changes in the API. Breaking changes for regex syntax: * POSIX character classes now require double bracketing. Previously, the regex `[:upper:]` would parse as the `upper` POSIX character class. Now it parses as the character class containing the characters `:upper:`. The fix to this change is to use `[[:upper:]]` instead. Note that variants like `[[:upper:][:blank:]]` continue to work. * The character `[` must always be escaped inside a character class. * The characters `&`, `-` and `~` must be escaped if any one of them are repeated consecutively. For example, `[&]`, `[\&]`, `[\&\&]`, `[&-&]` are all equivalent while `[&&]` is illegal. (The motivation for this and the prior change is to provide a backwards compatible path for adding character class set notation.) * A `bytes::Regex` now has Unicode mode enabled by default (like the main `Regex` type). This means regexes compiled with `bytes::Regex::new` that don't have the Unicode flag set should add `(?-u)` to recover the original behavior. Breaking changes for the regex API: * `find` and `find_iter` now **return `Match` values instead of `(usize, usize)`.** `Match` values have `start` and `end` methods, which return the match offsets. `Match` values also have an `as_str` method, which returns the text of the match itself. * The `Captures` type now only provides a single iterator over all capturing matches, which should replace uses of `iter` and `iter_pos`. Uses of `iter_named` should use the `capture_names` method on `Regex`. * The `replace` methods now return `Cow` values. The `Cow::Borrowed` variant is returned when no replacements are made. * The `Replacer` trait has been completely overhauled. This should only impact clients that implement this trait explicitly. Standard uses of the `replace` methods should continue to work unchanged. * The `quote` free function has been renamed to `escape`. * The `Regex::with_size_limit` method has been removed. It is replaced by `RegexBuilder::size_limit`. * The `RegexBuilder` type has switched from owned `self` method receivers to `&mut self` method receivers. Most uses will continue to work unchanged, but some code may require naming an intermediate variable to hold the builder. * The free `is_match` function has been removed. It is replaced by compiling a `Regex` and calling its `is_match` method. * The `PartialEq` and `Eq` impls on `Regex` have been dropped. If you relied on these impls, the fix is to define a wrapper type around `Regex`, impl `Deref` on it and provide the necessary impls. * The `is_empty` method on `Captures` has been removed. This always returns `false`, so its use is superfluous. * The `Syntax` variant of the `Error` type now contains a string instead of a `regex_syntax::Error`. If you were examining syntax errors more closely, you'll need to explicitly use the `regex_syntax` crate to re-parse the regex. * The `InvalidSet` variant of the `Error` type has been removed since it is no longer used. * Most of the iterator types have been renamed to match conventions. If you were using these iterator types explicitly, please consult the documentation for its new name. For example, `RegexSplits` has been renamed to `Split`. A number of bugs have been fixed: * [BUG #151](#151): The `Replacer` trait has been changed to permit the caller to control allocation. * [BUG #165](#165): Remove the free `is_match` function. * [BUG #166](#166): Expose more knobs (available in `0.1`) and remove `with_size_limit`. * [BUG #168](#168): Iterators produced by `Captures` now have the correct lifetime parameters. * [BUG #175](#175): Fix a corner case in the parsing of POSIX character classes. * [BUG #178](#178): Drop the `PartialEq` and `Eq` impls on `Regex`. * [BUG #179](#179): Remove `is_empty` from `Captures` since it always returns false. * [BUG #276](#276): Position of named capture can now be retrieved from a `Captures`. * [BUG #296](#296): Remove winapi/kernel32-sys dependency on UNIX. * [BUG #307](#307): Fix error on emscripten.

BurntSushi added the bug label Feb 17, 2016

BurntSushi added this to the 1.0 milestone Feb 17, 2016

defuz added a commit to defuz/regex that referenced this issue Feb 18, 2016

Fix rust-lang#168 and using Arc for named groups

422a4b4

defuz added a commit to defuz/regex that referenced this issue Feb 18, 2016

Fix rust-lang#168 and using Arc for named groups

e4f54de

defuz mentioned this issue Feb 18, 2016

Storing mapping from names to group indices into Regex #158

Merged

defuz added a commit to defuz/regex that referenced this issue Feb 19, 2016

Fix rust-lang#168 and using Arc for named groups

b5f1b79

BurntSushi mentioned this issue Mar 10, 2016

Add regex matching for &[u8]. #183

Merged

BurntSushi changed the title ~~Possible lifetime problem with subcaptures iterators~~ add second "capture" lifetime to SubCaptures and SubCapturesNamed Mar 16, 2016

BurntSushi mentioned this issue Apr 26, 2016

Fix lifetimes for subcapture iterators in str regexes #214

Closed

BurntSushi mentioned this issue Dec 31, 2016

regex 0.2 #310

Merged

bors closed this as completed in d12042b Dec 31, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add second "capture" lifetime to SubCaptures and SubCapturesNamed #168

add second "capture" lifetime to SubCaptures and SubCapturesNamed #168

defuz commented Feb 17, 2016

BurntSushi commented Feb 17, 2016

BurntSushi commented Feb 17, 2016

defuz commented Feb 17, 2016

BurntSushi commented Feb 17, 2016

BurntSushi commented Mar 13, 2016

add second "capture" lifetime to SubCaptures and SubCapturesNamed #168

add second "capture" lifetime to SubCaptures and SubCapturesNamed #168

Comments

defuz commented Feb 17, 2016

BurntSushi commented Feb 17, 2016

BurntSushi commented Feb 17, 2016

defuz commented Feb 17, 2016

BurntSushi commented Feb 17, 2016

BurntSushi commented Mar 13, 2016