
Unify the types of built-in and custom word lists #16

Merged Oct 30, 2023 (2 commits)

Conversation

westonal
Contributor

@westonal westonal commented Oct 29, 2023

Applies the same T: AsRef<str> technique to fix #15.
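As a minimal sketch of that technique (hypothetical function and word lists, not the PR's actual code), a single T: AsRef<str> bound lets one function accept both the built-in &[&str] lists and a custom Vec<String>:

```rust
// Hypothetical sketch of the T: AsRef<str> technique (names are made up):
// one generic function works on both list types without converting either.
fn count_words<T: AsRef<str>>(list: &[T]) -> usize {
    list.iter().filter(|w| !w.as_ref().is_empty()).count()
}

fn main() {
    // Built-in style list: a static slice of string slices.
    const BUILT_IN: &[&str] = &["apple", "banana", "cherry"];
    // Custom style list: owned Strings, e.g. read from a user's file.
    let custom: Vec<String> = vec!["delta".to_string(), "echo".to_string()];

    assert_eq!(count_words(BUILT_IN), 3);
    assert_eq!(count_words(&custom), 2);
}
```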

Original approach:

The core change is to make fetch_list return a Vec<String> using:

     .map(|s| s.to_string()) // makes new Strings from &str
     .collect::<Vec<_>>()

This change would be possible even without my more radical change: removing the custom build script in order to bundle the word lists as plain strings.

@sts10
Owner

sts10 commented Oct 29, 2023

Thank you so much for this work!

However, I've become a bit attached to the build script technique, if for nothing else than the speed boost.

I had been using the include_str! macro to read in the word lists, but when I switched to a build script I got an almost 1000x improvement. (See #2)
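To illustrate the difference (a minimal sketch with a made-up three-word list, not Phraze's actual code): include_str!-style data has to be split into lines at runtime on every fetch, while a build-script-generated static array is already split at compile time.

```rust
// Made-up three-word list standing in for an embedded word-list file.
const RAW: &str = "apple\nbanana\ncherry";

// include_str! style: the split happens at runtime, allocating a Vec per call.
fn fetch_runtime() -> Vec<&'static str> {
    RAW.lines().collect()
}

// Build-script style: the list is already a static array; fetching is free.
static WL_STATIC: [&str; 3] = ["apple", "banana", "cherry"];
fn fetch_static() -> &'static [&'static str] {
    &WL_STATIC
}

fn main() {
    // Both strategies yield the same words; only the per-call cost differs.
    assert_eq!(fetch_runtime(), fetch_static());
}
```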

As proof: if I move the let wordlist = fetch_list(ListChoice::Medium); line into the benchmark function in benches/generate_passphrase.rs and run cargo bench on your branch, I get:

Generate a passphrase/as is
                        time:   [599.40 µs 605.70 µs 612.19 µs]
                        change: [+75936% +77584% +79182%] (p = 0.00 < 0.10)
                        Performance has regressed.

+77584%! (This same benchmark averages about 780 nanoseconds on the same machine.)

A more practical benchmark

We can also use Hyperfine as a more real-world benchmark. On my laptop, hyperfine -N phraze on your PR gives 2.1 ms, while on main it's generally 1.8 ms.

Now, certainly, one question here is "Is ~0.3 milliseconds an appreciable difference?" or, more specifically, "Is ~0.3 milliseconds worth a slightly 'messier' main function?" The pragmatist in me says "no", but Rust has turned me into a bit of a perfectionist, or at least someone who doesn't want to sacrifice performance without a good reason, like security or usability.

This is a long way of saying I want to have my cake and eat it too.

I guess we could try making fetch_list return a Vec<String> while maintaining the build script method and see how that affects this same benchmark? But it still feels a little painful to me to convert all those slices to Strings! (Also, I think I tried this approach at one point and ran into some issues?)

@sts10
Owner

sts10 commented Oct 30, 2023

Just tried the approach of "convert built-in lists to Vec<String>" to see what would happen to the benchmark. My full code is in this branch, but here's the fetch_list function:

pub fn fetch_list(list_choice: ListChoice) -> Vec<String> {
    match list_choice {
        ListChoice::Long => WL_LONG
            .iter()
            .map(|w| w.to_string())
            .collect::<Vec<String>>(),
        ListChoice::Medium => WL_MEDIUM
            .iter()
            .map(|w| w.to_string())
            .collect::<Vec<String>>(),
        ListChoice::Qwerty => WL_QWERTY
            .iter()
            .map(|w| w.to_string())
            .collect::<Vec<String>>(),
        ListChoice::Alpha => WL_ALPHA
            .iter()
            .map(|w| w.to_string())
            .collect::<Vec<String>>(),
        ListChoice::Eff => WL_EFF
            .iter()
            .map(|w| w.to_string())
            .collect::<Vec<String>>(),
        ListChoice::Effshort => WL_EFFSHORT
            .iter()
            .map(|w| w.to_string())
            .collect::<Vec<String>>(),
        ListChoice::Mnemonicode => WL_MNEMONICODE
            .iter()
            .map(|w| w.to_string())
            .collect::<Vec<String>>(),
    }
}

The bummer is that this option also slows the benchmark way down.

Generate a passphrase/as is
                        time:   [434.76 µs 438.59 µs 442.86 µs]
                        change: [+52948% +53921% +54934%] (p = 0.00 < 0.10)
                        Performance has regressed.
Found 105 outliers among 1200 measurements (8.75%)
  41 (3.42%) high mild
  64 (5.33%) high severe

@westonal
Contributor Author

I have completely changed the approach now to remove the duplicated code while keeping your performance as before.
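One hypothetical way to remove that kind of duplication (illustrative names only, not necessarily the approach taken in this PR) is a small macro_rules! macro that expands a single pattern into the repeated match arms:

```rust
// Hypothetical deduplication sketch: two stand-in word lists and a macro
// that generates the repetitive match arms from one compact mapping.
static WL_LONG: &[&str] = &["apple", "banana"];
static WL_MEDIUM: &[&str] = &["cat", "dog", "emu"];

enum ListChoice {
    Long,
    Medium,
}

macro_rules! fetch {
    ($choice:expr, { $($variant:ident => $list:ident),+ $(,)? }) => {
        match $choice {
            $(ListChoice::$variant => $list),+
        }
    };
}

fn fetch_list(choice: ListChoice) -> &'static [&'static str] {
    fetch!(choice, { Long => WL_LONG, Medium => WL_MEDIUM })
}

fn main() {
    assert_eq!(fetch_list(ListChoice::Long).len(), 2);
    assert_eq!(fetch_list(ListChoice::Medium), ["cat", "dog", "emu"]);
}
```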


When it comes to the way you generate code here, you might like to try writing a macro similar to include_str!, but one that does the line splitting at compile time; something that would allow:

pub fn fetch_list(list_choice: ListChoice) -> &'static [&'static str] {
    match list_choice {
        ListChoice::Long => include_lines!("../word-lists/orchard-street-long.txt"),
        ListChoice::Medium => include_lines!("../word-lists/orchard-street-medium.txt"),
        ListChoice::Qwerty => include_lines!("../word-lists/orchard-street-qwerty.txt"),
        ListChoice::Alpha => include_lines!("../word-lists/orchard-street-alpha.txt"),
        ListChoice::Eff => include_lines!("../word-lists/eff-long.txt"),
        ListChoice::Effshort => include_lines!("../word-lists/eff-short-1.txt"),
        ListChoice::Mnemonicode => include_lines!("../word-lists/mnemonicode.txt"),
    }
}

The performance will be the same, but I think you will find it more idiomatic Rust to use macros rather than code generation.

@westonal
Contributor Author

Oh, of course, someone has already done it. Here is a crate providing include_lines!: https://lib.rs/crates/include-lines

@sts10 sts10 merged commit 9187fd6 into sts10:main Oct 30, 2023
@sts10
Owner

sts10 commented Oct 30, 2023

I have completely changed the approach now to remove the duplicated code while keeping your performance as before.

Brilliant! Clever! Merged! Thanks again.

@westonal westonal deleted the unify-types branch October 30, 2023 00:29
Successfully merging this pull request may close these issues.

Clean up main function