Sorting String is much faster than sorting Vec<u8> #8

Open
vigna opened this issue Feb 25, 2023 · 2 comments

Comments

@vigna
vigna commented Feb 25, 2023

I noticed that sorting String values is much faster than sorting the same values represented as Vec<u8>. I expected the second case to be faster. Is there an easy explanation?

@dapper91
Owner

@vigna Hi. Could you provide your test? According to my measurements, u8-typed items are sorted faster than String.
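
For reference, a minimal in-memory baseline for the comparison discussed here could look like the sketch below. It is not taken from either participant's code: the helper names (`make_lines`, `sorted_strings`, `sorted_bytes`) and the generated data are hypothetical, and it uses only the standard library sort, so it says nothing about what happens once a sorting library or disk I/O is involved.

```rust
use std::time::Instant;

// Hypothetical data generator: `n` pseudo-random ASCII lines (xorshift64).
fn make_lines(n: usize) -> Vec<String> {
    let mut state: u64 = 0x2545_F491_4F6C_DD1D;
    (0..n)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            format!("{:016x}\t{}", state, state % 1000)
        })
        .collect()
}

// Sorts a copy of the lines as `String` values.
fn sorted_strings(lines: &[String]) -> Vec<String> {
    let mut v = lines.to_vec();
    v.sort_unstable();
    v
}

// Sorts a copy of the lines as `Vec<u8>` values.
fn sorted_bytes(lines: &[String]) -> Vec<Vec<u8>> {
    let mut v: Vec<Vec<u8>> = lines.iter().map(|s| s.as_bytes().to_vec()).collect();
    v.sort_unstable();
    v
}

fn main() {
    let lines = make_lines(200_000);

    // Note: each timing includes the copy/conversion, not only the sort.
    let t = Instant::now();
    let s = sorted_strings(&lines);
    let string_time = t.elapsed();

    let t = Instant::now();
    let b = sorted_bytes(&lines);
    let bytes_time = t.elapsed();

    // Both types order bytewise-lexicographically, so the results must agree.
    assert!(s.iter().map(|x| x.as_bytes()).eq(b.iter().map(|x| x.as_slice())));
    println!("String: {:?}, Vec<u8>: {:?}", string_time, bytes_time);
}
```

Since `String` and `Vec<u8>` both compare byte by byte, a large gap between the two in a real program is more likely to come from what surrounds the comparison than from the comparison itself; running a baseline like this in release mode is one way to check that assumption.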

@vigna
Author

vigna commented Feb 25, 2023

First, thank you for the prompt answer! You must be a little patient with me because I have a lot of experience in other languages but this is, well, my first Rust program.

It takes a directory containing a number of large TSV files (the motivation was BlockChair files with cryptocurrency transactions), cuts some fields, and sorts the result. There is a u8-vector version (src/bin/u8.rs) and a string version (src/bin/string.rs). The u8 version uses ByteLines to read vectors of bytes directly. https://github.com/vigna/blockchair-sort/

You run it by passing the directory where the files are present and the list of fields. For example,

./target/release/u8 testoutputs 2 5

where testoutputs contains TSVs.
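
To make the "cut some fields and sort" step concrete, here is a rough in-memory sketch of the byte-vector variant. It is not the code from the linked repository: `cut_fields` and `cut_and_sort` are hypothetical names, the field indices are assumed to be 0-based (the thread does not say how the command-line arguments are numbered), and the input is a byte slice rather than files in a directory.

```rust
// Hypothetical helper: keeps the requested tab-separated fields of one line.
// Field indices are assumed to be 0-based; missing fields become empty.
fn cut_fields(line: &[u8], fields: &[usize]) -> Vec<u8> {
    let cols: Vec<&[u8]> = line.split(|&b| b == b'\t').collect();
    let mut out = Vec::new();
    for (i, &f) in fields.iter().enumerate() {
        if i > 0 {
            out.push(b'\t');
        }
        if let Some(col) = cols.get(f) {
            out.extend_from_slice(col);
        }
    }
    out
}

// Hypothetical helper: cuts every non-empty line, then sorts the rows bytewise.
fn cut_and_sort(input: &[u8], fields: &[usize]) -> Vec<Vec<u8>> {
    let mut rows: Vec<Vec<u8>> = input
        .split(|&b| b == b'\n')
        .filter(|l| !l.is_empty())
        .map(|l| cut_fields(l, fields))
        .collect();
    rows.sort_unstable();
    rows
}

fn main() {
    let data = b"b\t2\tx\na\t1\ty\n";
    for row in cut_and_sort(data, &[1, 0]) {
        println!("{}", String::from_utf8_lossy(&row));
    }
}
```

This version holds every row in memory, which is only reasonable for small inputs; for the large TSVs described above, the rows would have to go through an external sort instead.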

The u8 program is eight times slower than the string program on the same data and I'm really puzzled.

I have flamegraphs https://vigna.di.unimi.it/string.svg and https://vigna.di.unimi.it/u8.svg showing what happens on my hardware. I'm not that familiar with the inner workings of Rust but it looks like in the u8 case there is a lot of serialization/deserialization.
