
Unify the types of built-in and custom word lists #16

Merged Oct 30, 2023 (2 commits)

Conversation

westonal
Contributor

@westonal westonal commented Oct 29, 2023

Applies the same T: AsRef<str> technique to fix #15.
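As a minimal sketch of that technique (hypothetical function and word lists, not the PR's actual code), a single T: AsRef<str> bound lets one function accept both the built-in &[&str] lists and a custom Vec<String>:

```rust
// Hypothetical sketch of the T: AsRef<str> technique (names are made up):
// one generic function works on both list types without converting either.
fn count_words<T: AsRef<str>>(list: &[T]) -> usize {
    list.iter().filter(|w| !w.as_ref().is_empty()).count()
}

fn main() {
    // Built-in style list: a static slice of string slices.
    const BUILT_IN: &[&str] = &["apple", "banana", "cherry"];
    // Custom style list: owned Strings, e.g. read from a user's file.
    let custom: Vec<String> = vec!["delta".to_string(), "echo".to_string()];

    assert_eq!(count_words(BUILT_IN), 3);
    assert_eq!(count_words(&custom), 2);
}
```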

Original approach:

The core change is to make fetch_list return a Vec<String> using:

     .map(|s| s.to_string()) // makes new Strings from &str
     .collect::<Vec<_>>()

This change would be possible even without my more radical change: removing the custom build script in order to bundle the word lists as plain strings.

@sts10
Owner

sts10 commented Oct 29, 2023

Thank you so much for this work!

However, I've become a bit attached to the build script technique, if for nothing else than the speed boost.

I had been using the include_str! macro to read in the word lists, but when I switched to a build script I got an almost 1000x improvement. (See #2)
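To illustrate the difference (a minimal sketch with a made-up three-word list, not Phraze's actual code): include_str!-style data has to be split into lines at runtime on every fetch, while a build-script-generated static array is already split at compile time.

```rust
// Made-up three-word list standing in for an embedded word-list file.
const RAW: &str = "apple\nbanana\ncherry";

// include_str! style: the split happens at runtime, allocating a Vec per call.
fn fetch_runtime() -> Vec<&'static str> {
    RAW.lines().collect()
}

// Build-script style: the list is already a static array; fetching is free.
static WL_STATIC: [&str; 3] = ["apple", "banana", "cherry"];
fn fetch_static() -> &'static [&'static str] {
    &WL_STATIC
}

fn main() {
    // Both strategies yield the same words; only the per-call cost differs.
    assert_eq!(fetch_runtime(), fetch_static());
}
```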

As proof: if I move the let wordlist = fetch_list(ListChoice::Medium); line into the benchmark function in benches/generate_passphrase.rs and run cargo bench on your branch, I get:

Generate a passphrase/as is
                        time:   [599.40 µs 605.70 µs 612.19 µs]
                        change: [+75936% +77584% +79182%] (p = 0.00 < 0.10)
                        Performance has regressed.

+77584%! (This same benchmark averages about 780 nanoseconds on the same machine.)

A more practical benchmark

We can also use Hyperfine as a more real-world benchmark. On my laptop, hyperfine -N phraze on your PR gives 2.1 ms, while on main it's generally 1.8 ms.

Now, certainly, one question here is "Is ~0.3 milliseconds an appreciable difference?" or, more specifically, "Is ~0.3 milliseconds worth a slightly 'messier' main function?" The pragmatist in me says "no", but Rust has turned me into a bit of a perfectionist, or at least someone who doesn't want to sacrifice performance without a good reason, like security or usability.

This is a long way of saying I want to have my cake and eat it too.

I guess we could try making fetch_list return a Vec<String> while maintaining the build script method and see how that affects this same benchmark? But it still feels a little painful to me to convert all those slices to Strings! (Also, I think I tried this approach at one point and ran into some issues?)

@sts10
Owner

sts10 commented Oct 30, 2023

Just tried the approach of "convert built-in lists to Vec<String>" to see what would happen to the benchmark. My full code is in this branch, but here's the fetch_list function:

pub fn fetch_list(list_choice: ListChoice) -> Vec<String> {
    match list_choice {
        ListChoice::Long => WL_LONG
            .iter()
            .map(|w| w.to_string())
            .collect::<Vec<String>>(),
        ListChoice::Medium => WL_MEDIUM
            .iter()
            .map(|w| w.to_string())
            .collect::<Vec<String>>(),
        ListChoice::Qwerty => WL_QWERTY
            .iter()
            .map(|w| w.to_string())
            .collect::<Vec<String>>(),
        ListChoice::Alpha => WL_ALPHA
            .iter()
            .map(|w| w.to_string())
            .collect::<Vec<String>>(),
        ListChoice::Eff => WL_EFF
            .iter()
            .map(|w| w.to_string())
            .collect::<Vec<String>>(),
        ListChoice::Effshort => WL_EFFSHORT
            .iter()
            .map(|w| w.to_string())
            .collect::<Vec<String>>(),
        ListChoice::Mnemonicode => WL_MNEMONICODE
            .iter()
            .map(|w| w.to_string())
            .collect::<Vec<String>>(),
    }
}

The bummer is that this option also slows the benchmark way down.

Generate a passphrase/as is
                        time:   [434.76 µs 438.59 µs 442.86 µs]
                        change: [+52948% +53921% +54934%] (p = 0.00 < 0.10)
                        Performance has regressed.
Found 105 outliers among 1200 measurements (8.75%)
  41 (3.42%) high mild
  64 (5.33%) high severe

@westonal
Contributor Author

I have completely changed the approach now to remove the duplicated code while keeping your performance as before.
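One hypothetical way to remove that kind of duplication (illustrative names only, not necessarily the approach taken in this PR) is a small macro_rules! macro that expands a single pattern into the repeated match arms:

```rust
// Hypothetical deduplication sketch: two stand-in word lists and a macro
// that generates the repetitive match arms from one compact mapping.
static WL_LONG: &[&str] = &["apple", "banana"];
static WL_MEDIUM: &[&str] = &["cat", "dog", "emu"];

enum ListChoice {
    Long,
    Medium,
}

macro_rules! fetch {
    ($choice:expr, { $($variant:ident => $list:ident),+ $(,)? }) => {
        match $choice {
            $(ListChoice::$variant => $list),+
        }
    };
}

fn fetch_list(choice: ListChoice) -> &'static [&'static str] {
    fetch!(choice, { Long => WL_LONG, Medium => WL_MEDIUM })
}

fn main() {
    assert_eq!(fetch_list(ListChoice::Long).len(), 2);
    assert_eq!(fetch_list(ListChoice::Medium), ["cat", "dog", "emu"]);
}
```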


When it comes to the way you generate code here, you might like to try writing a macro similar to include_str!, but one that does the line splitting at compile time; something that would allow:

pub fn fetch_list(list_choice: ListChoice) -> &'static [&'static str] {
    match list_choice {
        ListChoice::Long => include_lines!("../word-lists/orchard-street-long.txt"),
        ListChoice::Medium => include_lines!("../word-lists/orchard-street-medium.txt"),
        ListChoice::Qwerty => include_lines!("../word-lists/orchard-street-qwerty.txt"),
        ListChoice::Alpha => include_lines!("../word-lists/orchard-street-alpha.txt"),
        ListChoice::Eff => include_lines!("../word-lists/eff-long.txt"),
        ListChoice::Effshort => include_lines!("../word-lists/eff-short-1.txt"),
        ListChoice::Mnemonicode => include_lines!("../word-lists/mnemonicode.txt"),
    }
}

The performance will be the same, but I think you will find it more idiomatic Rust to use macros rather than code generation.

@westonal
Contributor Author

Oh, of course, someone has already done it. Here is a crate providing include_lines!: https://lib.rs/crates/include-lines

@sts10 sts10 merged commit 9187fd6 into sts10:main Oct 30, 2023
@sts10
Owner

sts10 commented Oct 30, 2023

I have completely changed the approach now to remove the duplicated code while keeping your performance as before.

Brilliant! Clever! Merged! Thanks again.

@westonal westonal deleted the unify-types branch October 30, 2023 00:29
Successfully merging this pull request may close these issues.

Clean up main function