killercup/simd-utf8-check

SIMD UTF8 Validation in Rust

After reading the post Validating UTF-8 strings using as little as 0.7 cycles per byte, I was curious whether this algorithm might be a good fit for Rust's standard library. Because Rust's String type is guaranteed to be valid UTF-8, converting an array of bytes to a String requires either from_utf8, which validates the input, or, if you trust the input, the unsafe fn from_utf8_unchecked. The faster from_utf8 is, the more often people can afford to use the safe version.
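To make the trade-off concrete, here is a minimal sketch of the two conversion paths using only the standard library. The helper names `to_string_checked` and `to_string_trusted` are made up for illustration and are not part of this repository.

```rust
/// Hypothetical helper: validates the bytes and returns `None`
/// if they are not valid UTF-8.
fn to_string_checked(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// Hypothetical helper: skips validation entirely.
///
/// # Safety
/// The caller must guarantee that `bytes` is valid UTF-8;
/// otherwise later string operations may cause undefined behavior.
unsafe fn to_string_trusted(bytes: Vec<u8>) -> String {
    unsafe { String::from_utf8_unchecked(bytes) }
}

fn main() {
    let valid = "héllo".as_bytes().to_vec();
    let invalid = vec![0x66, 0x6f, 0xff]; // 0xff never occurs in UTF-8

    // The safe path pays for validation but rejects bad input.
    assert_eq!(to_string_checked(valid.clone()).as_deref(), Some("héllo"));
    assert!(to_string_checked(invalid).is_none());

    // The unchecked path is only sound because we know `valid` is UTF-8.
    let s = unsafe { to_string_trusted(valid) };
    assert_eq!(s, "héllo");
}
```

The cost of the safe path is a full pass over the input, which is why the speed of that pass matters.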

Of course, I'm not the first person to think of this, and this Rust PR already contains a super fast implementation, albeit one that does not use explicit SIMD intrinsics.

Benchmarks

Results

$ env RUSTFLAGS='-C target-cpu=native' cargo bench --quiet
# ...
$ open target/criterion/report/index.html

You can also find the rendered report here. There are two runs, the first without and the second with the target-cpu=native flag. This was benchmarked on a late 2016 MacBook Pro with an Intel i7 6700HQ CPU.

So far, it looks like the current std impl is a bit faster for inputs that contain mostly ASCII, but the SIMD version gives a significant speedup when dealing with multi-byte codepoints.
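For a rough feel of the two input classes mentioned above, the following std-only sketch builds an ASCII-only buffer and a multi-byte buffer and times `std::str::from_utf8` on each. It is not the criterion benchmark used in this repository: the helper names, buffer size, and round count are assumptions, it measures only the std implementation, and a loop timed with `Instant` is far less reliable than a proper benchmark harness.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: roughly `len` bytes of pure ASCII.
fn ascii_input(len: usize) -> Vec<u8> {
    "a".repeat(len).into_bytes()
}

/// Hypothetical helper: roughly `len` bytes of 3-byte codepoints.
fn multibyte_input(len: usize) -> Vec<u8> {
    // "日本語" is three codepoints of three bytes each (9 bytes).
    "日本語".repeat(len / 9).into_bytes()
}

/// Times `rounds` validations of `input` with the std implementation.
fn time_validation(input: &[u8], rounds: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..rounds {
        // black_box keeps the optimizer from removing the call.
        let ok = std::str::from_utf8(black_box(input)).is_ok();
        black_box(ok);
    }
    start.elapsed()
}

fn main() {
    let ascii = ascii_input(1 << 20);
    let multi = multibyte_input(1 << 20);

    println!("ascii:      {:?}", time_validation(&ascii, 100));
    println!("multi-byte: {:?}", time_validation(&multi, 100));
}
```

Build with `--release`, otherwise the numbers say little about the validation code itself.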

Data