Add :fast_ascii mode to String.valid?/2 #12360

Merged: 5 commits merged into elixir-lang:main on Feb 10, 2023

Conversation

mtrudel
Contributor

mtrudel commented Jan 23, 2023

Following the discussion in #12354, this PR adds an optional :fast_ascii mode to String.valid?/2 (using the bit56 algorithm discussed there). I've confirmed that this implementation yields the same benefits as observed previously:
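
For anyone who hasn't read #12354: the core of the bit56 trick is to walk the binary 56 bits (7 bytes) at a time and test the high bit of every byte with a single mask, falling back to the usual per-codepoint UTF-8 match whenever the mask fails. A minimal standalone sketch of the idea (illustrative only; the module and function names here are made up and this is not the exact code in the patch):

  defmodule FastAsciiSketch do
    # 7 bytes per step: 56 bits keeps the chunk within the BEAM's small-integer
    # range, so the guard below stays a cheap integer test.
    def valid?(<<chunk::56, rest::bits>>) when Bitwise.band(chunk, 0x80808080808080) == 0 do
      valid?(rest)
    end

    # Any chunk containing a byte with the high bit set (or a tail shorter than
    # 7 bytes) falls through to the normal per-codepoint UTF-8 clauses.
    def valid?(<<_::utf8, rest::bits>>), do: valid?(rest)
    def valid?(<<>>), do: true
    def valid?(_), do: false
  end

  FastAsciiSketch.valid?(String.duplicate("a", 10_000))
  #=> true
  FastAsciiSketch.valid?("ascii prefix" <> <<0xFF>>)
  #=> false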

Benchmark (OTP26, ARM)
iex(3)> Benchee.run(
...(3)>   %{
...(3)>     "stock" => fn {valid, input} -> ^valid = String.valid?(input) end,
...(3)>     "fast_ascii" => fn {valid, input} -> ^valid = String.valid?(input, :fast_ascii) end
...(3)>   },
...(3)>   time: 10,
...(3)>   memory_time: 2,
...(3)>   inputs: %{
...(3)>     1 => {false, String.duplicate("a", 0) <> <<128::8>>},
...(3)>     4 => {false, String.duplicate("a", 3) <> <<128::8>>},
...(3)>     8 => {false, String.duplicate("a", 7) <> <<128::8>>},
...(3)>     16 => {false, String.duplicate("a", 15) <> <<128::8>>},
...(3)>     32 => {false, String.duplicate("a", 31) <> <<128::8>>},
...(3)>     64 => {false, String.duplicate("a", 63) <> <<128::8>>},
...(3)>     128 => {false, String.duplicate("a", 127) <> <<128::8>>},
...(3)>     256 => {false, String.duplicate("a", 255) <> <<128::8>>},
...(3)>     512 => {false, String.duplicate("a", 511) <> <<128::8>>},
...(3)>     1024 => {false, String.duplicate("a", 1023) <> <<128::8>>},
...(3)>     2048 => {false, String.duplicate("a", 2047) <> <<128::8>>},
...(3)>     4096 => {false, String.duplicate("a", 4095) <> <<128::8>>}
...(3)>   }
...(3)> )
Operating System: macOS
CPU Information: Apple M1
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.15.0-dev
Erlang 26.0-rc0

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: 1, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096
Estimated total run time: 5.60 min

Benchmarking fast_ascii with input 1 ...
Benchmarking fast_ascii with input 4 ...
Benchmarking fast_ascii with input 8 ...
Benchmarking fast_ascii with input 16 ...
Benchmarking fast_ascii with input 32 ...
Benchmarking fast_ascii with input 64 ...
Benchmarking fast_ascii with input 128 ...
Benchmarking fast_ascii with input 256 ...
Benchmarking fast_ascii with input 512 ...
Benchmarking fast_ascii with input 1024 ...
Benchmarking fast_ascii with input 2048 ...
Benchmarking fast_ascii with input 4096 ...
Benchmarking stock with input 1 ...
Benchmarking stock with input 4 ...
Benchmarking stock with input 8 ...
Benchmarking stock with input 16 ...
Benchmarking stock with input 32 ...
Benchmarking stock with input 64 ...
Benchmarking stock with input 128 ...
Benchmarking stock with input 256 ...
Benchmarking stock with input 512 ...
Benchmarking stock with input 1024 ...
Benchmarking stock with input 2048 ...
Benchmarking stock with input 4096 ...

##### With input 1 #####
Name                 ips        average  deviation         median         99th %
stock             2.33 M      429.50 ns  ±8326.31%         333 ns         500 ns
fast_ascii        1.75 M      571.29 ns  ±7050.58%         417 ns        1375 ns

Comparison:
stock             2.33 M
fast_ascii        1.75 M - 1.33x slower +141.79 ns

Memory usage statistics:

Name          Memory usage
stock              0.95 KB
fast_ascii         1.21 KB - 1.28x memory usage +0.27 KB

**All measurements for memory usage were the same**

##### With input 4 #####
Name                 ips        average  deviation         median         99th %
stock             2.27 M      440.72 ns  ±8391.93%         333 ns         500 ns
fast_ascii        1.70 M      588.49 ns  ±5044.09%         458 ns        1375 ns

Comparison:
stock             2.27 M
fast_ascii        1.70 M - 1.34x slower +147.77 ns

Memory usage statistics:

Name          Memory usage
stock              0.95 KB
fast_ascii         1.21 KB - 1.28x memory usage +0.27 KB

**All measurements for memory usage were the same**

##### With input 8 #####
Name                 ips        average  deviation         median         99th %
stock             2.22 M      449.63 ns  ±8262.54%         333 ns         500 ns
fast_ascii        1.75 M      571.52 ns  ±6966.38%         417 ns        1417 ns

Comparison:
stock             2.22 M
fast_ascii        1.75 M - 1.27x slower +121.89 ns

Memory usage statistics:

Name          Memory usage
stock              0.95 KB
fast_ascii         1.21 KB - 1.28x memory usage +0.27 KB

**All measurements for memory usage were the same**

##### With input 16 #####
Name                 ips        average  deviation         median         99th %
stock             2.12 M      472.24 ns  ±8088.62%         375 ns         542 ns
fast_ascii        1.74 M      575.32 ns  ±6022.48%         417 ns        1375 ns

Comparison:
stock             2.12 M
fast_ascii        1.74 M - 1.22x slower +103.08 ns

Memory usage statistics:

Name          Memory usage
stock              0.95 KB
fast_ascii         1.21 KB - 1.28x memory usage +0.27 KB

**All measurements for memory usage were the same**

##### With input 32 #####
Name                 ips        average  deviation         median         99th %
stock             1.91 M      523.51 ns  ±7420.64%         417 ns         583 ns
fast_ascii        1.69 M      591.28 ns  ±4932.73%         458 ns        1416 ns

Comparison:
stock             1.91 M
fast_ascii        1.69 M - 1.13x slower +67.78 ns

Memory usage statistics:

Name          Memory usage
stock              0.95 KB
fast_ascii         1.21 KB - 1.28x memory usage +0.27 KB

**All measurements for memory usage were the same**

##### With input 64 #####
Name                 ips        average  deviation         median         99th %
fast_ascii        1.74 M      575.27 ns  ±5064.74%         417 ns        1416 ns
stock             1.69 M      591.09 ns  ±4528.66%         500 ns         667 ns

Comparison:
fast_ascii        1.74 M
stock             1.69 M - 1.03x slower +15.82 ns

Memory usage statistics:

Name          Memory usage
fast_ascii         1.21 KB
stock              0.95 KB - 0.78x memory usage -0.26563 KB

**All measurements for memory usage were the same**

##### With input 128 #####
Name                 ips        average  deviation         median         99th %
fast_ascii        1.63 M      612.55 ns  ±4946.12%         459 ns        1333 ns
stock             1.33 M      751.79 ns  ±3756.57%         666 ns         833 ns

Comparison:
fast_ascii        1.63 M
stock             1.33 M - 1.23x slower +139.23 ns

Memory usage statistics:

Name          Memory usage
fast_ascii         1.21 KB
stock              0.95 KB - 0.78x memory usage -0.26563 KB

**All measurements for memory usage were the same**

##### With input 256 #####
Name                 ips        average  deviation         median         99th %
fast_ascii        1.56 M        0.64 μs  ±4380.88%        0.50 μs        0.79 μs
stock             0.93 M        1.07 μs  ±2111.72%        0.96 μs        1.17 μs

Comparison:
fast_ascii        1.56 M
stock             0.93 M - 1.67x slower +0.43 μs

Memory usage statistics:

Name          Memory usage
fast_ascii         1.21 KB
stock              0.95 KB - 0.78x memory usage -0.26563 KB

**All measurements for memory usage were the same**

##### With input 512 #####
Name                 ips        average  deviation         median         99th %
fast_ascii        1.45 M        0.69 μs  ±4132.72%        0.54 μs        0.79 μs
stock             0.57 M        1.76 μs  ±1141.26%        1.63 μs        1.83 μs

Comparison:
fast_ascii        1.45 M
stock             0.57 M - 2.54x slower +1.07 μs

Memory usage statistics:

Name          Memory usage
fast_ascii         1.21 KB
stock              0.95 KB - 0.78x memory usage -0.26563 KB

**All measurements for memory usage were the same**

##### With input 1024 #####
Name                 ips        average  deviation         median         99th %
fast_ascii        1.18 M        0.85 μs  ±3472.48%        0.71 μs        0.88 μs
stock            0.194 M        5.15 μs  ±2568.66%        3.04 μs           5 μs

Comparison:
fast_ascii        1.18 M
stock            0.194 M - 6.09x slower +4.30 μs

Memory usage statistics:

Name          Memory usage
fast_ascii         1.21 KB
stock              0.95 KB - 0.78x memory usage -0.26563 KB

**All measurements for memory usage were the same**

##### With input 2048 #####
Name                 ips        average  deviation         median         99th %
fast_ascii        1.01 M        0.99 μs  ±2266.22%        0.88 μs        1.17 μs
stock            0.169 M        5.93 μs   ±239.39%        5.54 μs       17.62 μs

Comparison:
fast_ascii        1.01 M
stock            0.169 M - 5.96x slower +4.94 μs

Memory usage statistics:

Name          Memory usage
fast_ascii         1.21 KB
stock              0.95 KB - 0.78x memory usage -0.26563 KB

**All measurements for memory usage were the same**

##### With input 4096 #####
Name                 ips        average  deviation         median         99th %
fast_ascii      699.67 K        1.43 μs  ±1280.99%        1.33 μs        1.54 μs
stock            92.49 K       10.81 μs    ±75.20%       10.62 μs       12.16 μs

Comparison:
fast_ascii      699.67 K
stock            92.49 K - 7.56x slower +9.38 μs

Memory usage statistics:

Name          Memory usage
fast_ascii         1.21 KB
stock              0.95 KB - 0.78x memory usage -0.26563 KB

**All measurements for memory usage were the same**

@sabiwara
Contributor

Hi @mtrudel! These optimizations look quite promising 🤩

I was thinking of an alternative where we wouldn't need to introduce an extra mode argument, by just optimistically running the ASCII-only loop first and switching to the slower loop on the first mismatch (basically, ASCII until proven otherwise). The obvious downside is that it won't try to optimize mixed inputs that might still contain a lot of ASCII.
The benefit would be that the user doesn't need to be concerned about it, and it should hopefully pick a reasonable strategy for both ASCII and mixed inputs.

  # optimistic loop, able to process big ASCII-only binaries very fast
  def valid?(<<a::56, rest::bits>>) when Bitwise.band(0x80808080808080, a) == 0 do
    valid?(rest)
  end

  # slower loop for other cases
  def valid?(other) when is_binary(other), do: valid_non_only_ascii?(other)

  defp valid_non_only_ascii?(<<_::utf8, rest::bits>>), do: valid_non_only_ascii?(rest)
  defp valid_non_only_ascii?(<<>>), do: true
  defp valid_non_only_ascii?(_), do: false

These early benchmarks look promising, especially with inlining, but I didn't check with various inputs and haven't installed OTP26.

WDYT?

@mtrudel
Contributor Author

mtrudel commented Jan 24, 2023

Hi @mtrudel! These optimizations look quite promising 🤩

I was thinking of an alternative where we wouldn't need to introduce an extra mode argument, by just optimistically running the ASCII-only loop first and switching to the slower loop on the first mismatch (basically, ASCII until proven otherwise). The obvious downside is that it won't try to optimize mixed inputs that might still contain a lot of ASCII. The benefit would be that the user doesn't need to be concerned about it, and it should hopefully pick a reasonable strategy for both ASCII and mixed inputs.

  # optimistic loop, able to process big ASCII-only binaries very fast
  def valid?(<<a::56, rest::bits>>) when Bitwise.band(0x80808080808080, a) == 0 do
    valid?(rest)
  end

  # slower loop for other cases
  def valid?(other) when is_binary(other), do: valid_non_only_ascii?(other)

  defp valid_non_only_ascii?(<<_::utf8, rest::bits>>), do: valid_non_only_ascii?(rest)
  defp valid_non_only_ascii?(<<>>), do: true
  defp valid_non_only_ascii?(_), do: false

These early benchmarks look promising, especially with inlining, but I didn't check with various inputs and haven't installed OTP26.

WDYT?

That's a really interesting idea! I like it, but there are three aspects of it that may preclude it:

  1. I suspect a lot of strings are almost entirely ASCII, with only a few non-ASCII characters (such as serialized JSON where the only non-ASCII content is a diacritic or two in a 'name' field, or a comment field with a single emoji in it). The approach you outline would bail after encountering the first such character, which would narrow the benefit to truly all-ASCII strings. By way of illustration, passing your comment through this version of the function would have caused validation to switch to the slow path on the first line (this one too: 🤣). See the snippet after this list for a concrete illustration.

  2. There's an aspect of non-determinism to the approach that feels out of place in an 'explicit is better than implicit' worldview (even more so in a standard library function). Users may find that runtimes vary wildly on seemingly similar inputs, and it may be difficult to document clearly which kinds of input hit which path and what runtime to expect.

  3. The runtime of this approach on earlier OTPs is generally worse than that of the existing implementation for all inputs. Without giving users a switch, we'd be degrading performance for a sizeable set of users.
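
To make point 1 concrete, here is a made-up payload (illustrative values only, not from the benchmark above):

  # Mostly ASCII, with a single non-ASCII character near the start.
  payload = "José says: " <> String.duplicate("a", 10_000)

  # :fast_ascii drops to the per-codepoint clause only around the "é" and then
  # resumes the 56-bit chunk loop; the optimistic variant above would validate
  # the remaining ~10,000 bytes one codepoint at a time.
  String.valid?(payload, :fast_ascii)
  #=> true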

Happy to hew either way here, depending on how others feel!

@sabiwara
Contributor

@mtrudel thank you for the detailed answers; these are all great points.

  1. Indeed, I see how it would fail to optimize a lot of legit cases (good job using my own comment to convince me 😂)
  2. Great point; I can see the determinism argument applying to :fast_ascii too, actually. Engineers might assume :fast_ascii makes sense after an early benchmark using English text, and later get degraded performance when non-English speakers start using the software and JSON payloads fill up with large chunks of non-ASCII text (Arabic, Chinese...). Maybe if the input can contain any non-ASCII at all (e.g. user input), we should assume there could be a lot of it?
  3. I see. Maybe this could be addressed by adding a compile-time check on System.otp_release(), I suppose? (rough sketch below)
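
Something along these lines is what I have in mind; a rough, untested sketch with a made-up module name, not part of this patch:

  defmodule MaybeFastValid do
    # Evaluated when the module compiles: only emit the 56-bit fast clause on
    # OTP >= 26, where (per the benchmarks above) it is actually a win.
    if String.to_integer(System.otp_release()) >= 26 do
      def valid?(<<chunk::56, rest::bits>>) when Bitwise.band(chunk, 0x80808080808080) == 0,
        do: valid?(rest)
    end

    # On older OTPs the module compiles down to just the plain per-codepoint loop.
    def valid?(<<_::utf8, rest::bits>>), do: valid?(rest)
    def valid?(<<>>), do: true
    def valid?(_), do: false
  end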

Happy to hew either way here, depending on how others feel!

Same for me, I don't have any strong opinion, just wanted to share the idea.

@josevalim
Member

I think this patch is good to go, thank you! One last concern: if the Erlang/OTP team decides to accept an optimized UTF-8 validation (from the simdutf8 library), then this patch may become pointless. According to the paper, simdutf8 is several times faster than the ASCII check implemented in C. So my suggestion is to keep this PR around until Erlang/OTP 26 is out. :)

@dvic
Contributor

dvic commented Jan 25, 2023

Just dropping my 2 cents regarding the naming of the argument mode: even though the documentation says that the validation result should be the same, mode makes it sound like it changes the behaviour and not the implementation. Not a huge deal, but maybe a better name for the argument is algorithm?

@mtrudel
Contributor Author

mtrudel commented Jan 25, 2023

Not a huge deal, but maybe a better name for the argument is algorithm?

I'd taken the naming from String.downcase/2, though as you mention that actually changes the behaviour, not just the implementation. I'll update it here.

@mtrudel
Contributor Author

mtrudel commented Jan 25, 2023

if the Erlang/OTP team decide to accept an optimized UTF-8 validation (from the simdutf8 library), then this patch may be pointless.

Agreed! The absolute best approach here would be to have a :unicode.valid?/1 function backed by a native implementation such as simdutf8 (or, frankly, even any of the other implementations as a pure-NIF). If / when that happens (I'm still planning on working it up and submitting it upstream, but it's not going to happen 'soon'), we'd still want to keep these versions around as long as Elixir supports Erlang/OTP versions that predate its addition. So I don't think this work is wasted in the meantime.

So my suggestion is to keep this around until Erlang/OTP 26 is out. :)

Sure! Their release milestone doesn't mention anything about it, but seeing as this work is of no benefit on earlier versions, there's no real rush. Whatever you think is easiest!

@josevalim
Member

Oh, I thought you were planning to submit a PR with simdutf8 for Erlang/OTP 26. I typically do a draft PR, only to show the numbers, and if they approve it I tidy everything up with tests, docs, and so on. But no worries.

@mtrudel
Contributor Author

mtrudel commented Jan 25, 2023

Oh, I thought you were planning to submit a PR with simdutf8 for Erlang/OTP 26. I typically do a draft PR, only to show the numbers, and if they approve it I tidy everything up with tests, docs, and so on. But no worries.

I very much do plan to submit such a PR, but between everything I'm trying to get done for ElixirConf EU (big news on the Bandit front!) and my real job 😄, I'm not going to be able to do it on that timeline. I'm hoping, roughly, to get this done May-ish.


Note that the `:fast_ascii` algorithm does not affect correctness, you can expect the output of
`String.valid?/2` to be the same regardless of algorithm. The only difference to be expected is
one of performance, which can be expected to improve roughly quadratically in string length
Contributor

improve roughly quadratically

I'm struggling a bit to understand this one, given that both algorithms are linear in string length.
I would have assumed the ratio to be capped by a maximum constant value?
Sorry if I misunderstood.

Contributor Author

That's a great point! I messed up here by taking an engineering approach ('the graph's fitting curve is quadratic!') rather than a first principles approach (in which both approaches are obviously linear by inspection, as you correctly point out).

My error was basically failing 'chart literacy 101'. Here's the chart that I based my conclusion on:

[chart omitted: the benchmark data plotted against the doubling input sizes, i.e. a non-linear x axis, fit with a quadratic trend line]

Looks quadratic, is fit well by a quadratic trend line, so it must be quadratic. That's as far as my thought process went. But look at the x axis! It's not linear! If I graph the same data as a proper scatter plot (i.e. on a linear x axis), we get:

[chart omitted: the same data replotted as a scatter plot on a linear x axis, with a linear fit]

which fits a linear trend reasonably well (it doesn't look like a great fit visually, but the R value is pretty strong).

So yeah. The improvement is actually linear as you suspect.

I'll update the docs to reflect this. Good catch!

Contributor

Thank you for clarifying ❤️
Based on theory and the new graph, I wonder if we won't see a glass ceiling appear if we add a couple of orders of magnitude to the string length?
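
My rough mental model (back-of-the-envelope, with assumed constants, nothing measured):

$$\frac{T_{\text{default}}(n)}{T_{\text{fast\_ascii}}(n)} \approx \frac{c_d\,n + k_d}{c_f\,n + k_f} \longrightarrow \frac{c_d}{c_f} \quad (n \to \infty)$$

where c_d and c_f are the per-byte costs of the per-codepoint and 56-bit loops and k_d, k_f are fixed per-call overheads. The ratio climbs while the fixed overheads dominate and should flatten out at roughly c_d / c_f once the linear terms take over, so the ~7.6x observed at 4096 bytes may already be on the approach to that plateau on this hardware.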

Co-authored-by: peter madsen <petermm@gmail.com>
@mtrudel
Contributor Author

mtrudel commented Feb 10, 2023

Is this issue deadlocked? Just to be clear, from my perspective this is ready to go. Happy to do more work here if there's something missing...

josevalim merged commit d83c57f into elixir-lang:main on Feb 10, 2023
@josevalim
Member

💚 💙 💜 💛 ❤️

@mtrudel
Contributor Author

mtrudel commented Mar 8, 2023

Note to future other humans interested in this: http://0x80.pl/notesen/2023-03-06-swar-find-any.html

@codeadict

Is adding the simdutf8 algorithm to OTP still desired? I can dedicate some time to this.

@mtrudel
Contributor Author

mtrudel commented Feb 2, 2024

Is adding the simdutf8 algorithm to OTP still desired? I can dedicate some time to this

Yes! That would be extremely welcome!
