regex generation tests same case multiple times #70

danburkert · 2018-07-04T17:56:05Z

With the regex pattern [a-z]{1, 5} and the assertion that the output length is even, proptest will repeatedly test the same (original) value over and over as it shrinks/expands:

"saanf"
"aanf"
"saanf"
"sanf"
"saanf"
"sanf"
"saanf"
"saaf"
"saanf"
"saan"
"saanf"
"jaanf"
"eaanf"
"caanf"
"baanf"
"aaanf"
"aaagf"
"aaadf"
"aaabf"
"aaaaf"
"aaaac"
"aaaab"
"aaaaa"

Here's the test for context:

        #[test]
        fn test_string_shrink(s in "[a-z]{1, 5}") {
            use env_logger;
            let _ = env_logger::try_init();
            info!("{:?}");
            assert!(s.len() % 2 != 0);
        }

This also demonstrates the shrinking method used by proptest can get 'stuck' at a local minimum (in this case aaaaa instead of a) because it doesn't try shrinking more than one character at once. This seems like a more difficult issue to fix, though. Perhaps there's some literature on effective string shrinking strategies out there?

The text was updated successfully, but these errors were encountered:

AltSysrq · 2018-07-04T19:29:26Z

will repeatedly test the same (original) value over and over as it shrinks/expands:

This is an inevitable consequence of the design of proptest's shrinking process. In order to explore another simplification path, the shrinker must first return to a failing test case, which in your example means returning to the original input. It would be possible to redifine the complicate() step to avoid that, but IMO it's not worth the complexity to save a little time on failing tests.

the shrinking method used by proptest can get 'stuck' at a local minimum

Indeed. This isn't unique to strings. Shrinking for essentially all types more complex than bool have this property. Finding the simplest failing case in general has exponential complexity, so the shrinkers use heuristic approaches that run in linear time or less. In this case, the shrinker is built with the expectation that problematic strings are caused by individual characters or character sequences rather than the total length of the string, so it proceeds by deleting or simplifying characters one at a time.

Eh2406 · 2018-08-23T15:10:51Z

Some context: I am working on a complicated strategy with slow tests associated with it. The current implementation filters out the parts of the generated input that doesn't make sense. This means that shrinking takes a long time (several hours), and that the same input is fed to the tests many times (hundreds). A better strategy may fix this situation (suggestions welcome), but that is the context I am coming from.

I wonder whether we could have an opt in setting for the proptest macro that when shrinking would cache the inputs and whether the test passed so when an input is regenerated the test could get skipped.

Edit: in my case it is not the regex causing the repeated test behavior. So I intended to suggest this as a more generally applicable fix.

danburkert · 2018-08-23T16:27:29Z

I ended up writing my own String and Vec<u8> shrinkers to work around this, as well as to tailor shrinking to cases where the strings/byte strings are used as lexicographic keys. You can find the code here and here. I'd love to have it upstreamed into proptest proper, but realistically I don't have the time to do so myself.

AltSysrq · 2018-08-27T01:18:37Z

I wonder whether we could have an opt in setting for the proptest macro that when shrinking would cache the inputs

I meant to reply earlier that I really like this idea. It's been added in 0.8.6.

Eh2406 · 2018-08-27T01:38:49Z

0.8.6 is just an amazing fount of features! Everything I needed, everything I asked for, everything I had made an offhand comment about, and some of the things I had only sort of thought might be worth suggesting.

Eh2406 · 2018-10-30T19:56:46Z

Having now looked at a large number of shrink HashSet<String>, I'd love to have an alternative steiker that shrinks strings lexicographic, like how we have bits module. Not that I have any ideas how to implement that.

danburkert · 2018-10-30T20:38:55Z

I’ve implemented it here: https://github.com/danburkert/kudu-rs/blob/master/src/prop/string.rs. Feel free to take that and adapt as necessary. There’s also a version for binary in another module.

…

On Tue, Oct 30, 2018 at 12:56 PM Jacob Finkelman ***@***.***> wrote: Having now looked at a large number of shrink HashSet<String>, I'd love to have an alternative steiker that shrinks strings lexicographic, like how we have bits module <https://altsysrq.github.io/rustdoc/proptest/latest/proptest/bits/index.html>. Not that I have any ideas how to implement that. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#70 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAJx8osj9ehGDzFf-AIF9KLOiv18egqwks5uqK7_gaJpZM4VC5l0> .

Eh2406 · 2018-10-30T20:47:46Z

Yes that is a good start, I'd want it to have the regex interfase of the current string generators.

AltSysrq added the wontfix label Aug 20, 2018

AltSysrq closed this as completed Aug 20, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

regex generation tests same case multiple times #70

regex generation tests same case multiple times #70

danburkert commented Jul 4, 2018

AltSysrq commented Jul 4, 2018

Eh2406 commented Aug 23, 2018 •

edited

Loading

danburkert commented Aug 23, 2018

AltSysrq commented Aug 27, 2018

Eh2406 commented Aug 27, 2018

Eh2406 commented Oct 30, 2018

danburkert commented Oct 30, 2018 via email

Eh2406 commented Oct 30, 2018

regex generation tests same case multiple times #70

regex generation tests same case multiple times #70

Comments

danburkert commented Jul 4, 2018

AltSysrq commented Jul 4, 2018

Eh2406 commented Aug 23, 2018 • edited Loading

danburkert commented Aug 23, 2018

AltSysrq commented Aug 27, 2018

Eh2406 commented Aug 27, 2018

Eh2406 commented Oct 30, 2018

danburkert commented Oct 30, 2018 via email

Eh2406 commented Oct 30, 2018

Eh2406 commented Aug 23, 2018 •

edited

Loading