Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regex generation tests same case multiple times #70

Closed
danburkert opened this issue Jul 4, 2018 · 8 comments
Closed

regex generation tests same case multiple times #70

danburkert opened this issue Jul 4, 2018 · 8 comments
Labels

Comments

@danburkert
Copy link

With the regex pattern [a-z]{1, 5} and the assertion that the output length is even, proptest will repeatedly test the same (original) value over and over as it shrinks/expands:

"saanf"
"aanf"
"saanf"
"sanf"
"saanf"
"sanf"
"saanf"
"saaf"
"saanf"
"saan"
"saanf"
"jaanf"
"eaanf"
"caanf"
"baanf"
"aaanf"
"aaagf"
"aaadf"
"aaabf"
"aaaaf"
"aaaac"
"aaaab"
"aaaaa"

Here's the test for context:

        #[test]
        fn test_string_shrink(s in "[a-z]{1, 5}") {
            use env_logger;
            let _ = env_logger::try_init();
            info!("{:?}");
            assert!(s.len() % 2 != 0);
        }

This also demonstrates the shrinking method used by proptest can get 'stuck' at a local minimum (in this case aaaaa instead of a) because it doesn't try shrinking more than one character at once. This seems like a more difficult issue to fix, though. Perhaps there's some literature on effective string shrinking strategies out there?

@AltSysrq
Copy link
Collaborator

AltSysrq commented Jul 4, 2018

will repeatedly test the same (original) value over and over as it shrinks/expands:

This is an inevitable consequence of the design of proptest's shrinking process. In order to explore another simplification path, the shrinker must first return to a failing test case, which in your example means returning to the original input. It would be possible to redifine the complicate() step to avoid that, but IMO it's not worth the complexity to save a little time on failing tests.

the shrinking method used by proptest can get 'stuck' at a local minimum

Indeed. This isn't unique to strings. Shrinking for essentially all types more complex than bool have this property. Finding the simplest failing case in general has exponential complexity, so the shrinkers use heuristic approaches that run in linear time or less. In this case, the shrinker is built with the expectation that problematic strings are caused by individual characters or character sequences rather than the total length of the string, so it proceeds by deleting or simplifying characters one at a time.

@Eh2406
Copy link

Eh2406 commented Aug 23, 2018

Some context: I am working on a complicated strategy with slow tests associated with it. The current implementation filters out the parts of the generated input that doesn't make sense. This means that shrinking takes a long time (several hours), and that the same input is fed to the tests many times (hundreds). A better strategy may fix this situation (suggestions welcome), but that is the context I am coming from.

I wonder whether we could have an opt in setting for the proptest macro that when shrinking would cache the inputs and whether the test passed so when an input is regenerated the test could get skipped.

Edit: in my case it is not the regex causing the repeated test behavior. So I intended to suggest this as a more generally applicable fix.

@danburkert
Copy link
Author

I ended up writing my own String and Vec<u8> shrinkers to work around this, as well as to tailor shrinking to cases where the strings/byte strings are used as lexicographic keys. You can find the code here and here. I'd love to have it upstreamed into proptest proper, but realistically I don't have the time to do so myself.

@AltSysrq
Copy link
Collaborator

I wonder whether we could have an opt in setting for the proptest macro that when shrinking would cache the inputs

I meant to reply earlier that I really like this idea. It's been added in 0.8.6.

@Eh2406
Copy link

Eh2406 commented Aug 27, 2018

0.8.6 is just an amazing fount of features! Everything I needed, everything I asked for, everything I had made an offhand comment about, and some of the things I had only sort of thought might be worth suggesting.

@Eh2406
Copy link

Eh2406 commented Oct 30, 2018

Having now looked at a large number of shrink HashSet<String>, I'd love to have an alternative steiker that shrinks strings lexicographic, like how we have bits module. Not that I have any ideas how to implement that.

@danburkert
Copy link
Author

danburkert commented Oct 30, 2018 via email

@Eh2406
Copy link

Eh2406 commented Oct 30, 2018

Yes that is a good start, I'd want it to have the regex interfase of the current string generators.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants