Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index out of bounds panic when using shortest_match_at, is_match_at with end-anchored regex #969

Closed
wabain opened this issue Mar 24, 2023 · 2 comments · Fixed by #970
Closed
Labels

Comments

@wabain
Copy link

wabain commented Mar 24, 2023

What version of regex are you using?

1.7.2, 1.7.1

Describe the bug at a high level.

Regex::shortest_match_at panics for certain regex patterns when given seemingly valid inputs. From the stack trace it looks like it might be specific to patterns that generate reversed FSMs.

What are the steps to reproduce the behavior?

(playground)

fn main() {
    let re = regex::Regex::new(r"c.*d\z").unwrap();
    println!("{:?}", re.shortest_match_at("ababcd", 4));
}

The same backtrace occurs if shortest_match_at is replaced with is_match_at.

What is the actual behavior?

Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.96s
     Running `target/debug/playground`
thread 'main' panicked at 'index out of bounds: the len is 2 but the index is 6', /playground/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/regex-1.7.1/src/dfa.rs:1444:54
stack backtrace:
   0: rust_begin_unwind
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/panicking.rs:64:14
   2: core::panicking::panic_bounds_check
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/panicking.rs:159:5
   3: regex::dfa::Fsm::start_flags_reverse
             at ./.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/regex-1.7.1/src/dfa.rs:1444:54
   4: regex::dfa::Fsm::reverse
             at ./.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/regex-1.7.1/src/dfa.rs:495:42
   5: <regex::exec::ExecNoSync as regex::re_trait::RegularExpression>::shortest_match_at
             at ./.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/regex-1.7.1/src/exec.rs:457:23
   6: <regex::exec::ExecNoSyncStr as regex::re_trait::RegularExpression>::shortest_match_at
             at ./.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/regex-1.7.1/src/exec.rs:397:9
   7: regex::re_unicode::Regex::shortest_match_at
             at ./.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/regex-1.7.1/src/re_unicode.rs:629:9
   8: playground::main
             at ./src/main.rs:3:22
   9: core::ops::function::FnOnce::call_once
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

What is the expected behavior?

I believe the output should be Some(6) in this case. Since 4 is the index of a character boundary in the input text I think that it certainly shouldn't panic.

@wabain wabain changed the title Index out of bounds panic when using shortest_match_at with end-anchored regex Index out of bounds panic when using shortest_match_at, is_match_at with end-anchored regex Mar 24, 2023
BurntSushi added a commit that referenced this issue Mar 24, 2023
In turns out that in *some* calls to Fsm::reverse, we were passing an
incorrect start offset. Namely, the haystack we pass is sub-sliced at
`&text[start..]`, but in some places, we were passing `text.len()` as
the start offset of the reverse search. But of course, it should be
`text.len() - start`. This was indeed the case in most places, but it
looks like it needed to be corrected in two additional places.

I've also added this test to regex-automata's set of regression tests
and can confirm that it doesn't happen there. (regex-automata is far
more principled about handling offsets like this.)

Fixes #969
@BurntSushi BurntSushi added the bug label Mar 24, 2023
@BurntSushi
Copy link
Member

Nice find! I've got a PR up in #970. You were right that it was something about the reverse DFA. In some places where it is called, the offsets passed to it were incorrect. This only happened in some calls and only when the start of the search wasn't 0, so it was not well covered by existing tests.

BurntSushi added a commit that referenced this issue Mar 25, 2023
In turns out that in *some* calls to Fsm::reverse, we were passing an
incorrect start offset. Namely, the haystack we pass is sub-sliced at
`&text[start..]`, but in some places, we were passing `text.len()` as
the start offset of the reverse search. But of course, it should be
`text.len() - start`. This was indeed the case in most places, but it
looks like it needed to be corrected in two additional places.

I've also added this test to regex-automata's set of regression tests
and can confirm that it doesn't happen there. (regex-automata is far
more principled about handling offsets like this.)

Fixes #969
@BurntSushi
Copy link
Member

This bug is fixed in regex 1.7.3 on crates.io.

crapStone pushed a commit to Calciumdibromid/CaBr2 that referenced this issue Apr 11, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [regex](https://github.com/rust-lang/regex) | dependencies | patch | `1.7.1` -> `1.7.3` |

---

### Release Notes

<details>
<summary>rust-lang/regex</summary>

### [`v1.7.3`](https://github.com/rust-lang/regex/blob/HEAD/CHANGELOG.md#&#8203;173-2023-03-24)

[Compare Source](rust-lang/regex@1.7.2...1.7.3)

\==================
This is a small release that fixes a bug in `Regex::shortest_match_at` that
could cause it to panic, even when the offset given is valid.

Bug fixes:

-   [BUG #&#8203;969](rust-lang/regex#969):
    Fix a bug in how the reverse DFA was called for `Regex::shortest_match_at`.

### [`v1.7.2`](https://github.com/rust-lang/regex/blob/HEAD/CHANGELOG.md#&#8203;172-2023-03-21)

[Compare Source](rust-lang/regex@1.7.1...1.7.2)

\==================
This is a small release that fixes a failing test on FreeBSD.

Bug fixes:

-   [BUG #&#8203;967](rust-lang/regex#967):
    Fix "no stack overflow" test which can fail due to the small stack size.

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNS4xOS4wIiwidXBkYXRlZEluVmVyIjoiMzUuNDAuMiJ9-->

Co-authored-by: cabr2-bot <cabr2.help@gmail.com>
Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1827
Reviewed-by: crapStone <crapstone@noreply.codeberg.org>
Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants