Ideas for improving performance #7

raphlinus · 2019-03-17T14:44:30Z

I'm interested in a very high performance representation of locales for skribo. I think what fluent-locale has is a good base, but I have some ideas for making it more performant, in both speed and object size.

The main cost is likely the allocation of the many small String objects in a locale. There are existing tiny string implementations (tendril, inlinable_string, iString), but I think it's possible to do better by specializing to the needs of BCP 47. Most of these types are in the ballpark of 16 bytes each, and much of their cost comes from the need to spill to a heap allocation when the strings get big. In BCP 47, most of the subtags have a small, fixed maximum size.

I've prototyped a "tinystr" that uses a NonZeroU32 as its backing store, and thus takes 4 bytes, even when wrapped in an Option. It also uses SIMD-like math to verify that the bytes are ASCII and contain no NUL. I'm happy to PR that into this repo, or make a separate crate (there are a number of file formats that use 4 byte tags, and this would be good for those). Using this string type would probably not be a huge code change, as it doesn't fundamentally change the architecture, just the representation. There is unsafe code, but I think it should be possible to review it to get good confidence.
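
To make the idea concrete, here is a minimal sketch of what such a type could look like. It is not the prototype itself: the name TinyStr4, the little-endian packing, and the exact masks are assumptions chosen for illustration, and it avoids unsafe by formatting through a byte copy instead of handing out a &str.

```rust
use std::fmt;
use std::num::NonZeroU32;

/// Hypothetical 4-byte ASCII string (illustrative, not the prototype's code).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct TinyStr4(NonZeroU32);

impl TinyStr4 {
    /// Packs 1..=4 ASCII, non-NUL bytes into a little-endian u32.
    pub fn new(s: &str) -> Option<Self> {
        let bytes = s.as_bytes();
        if bytes.is_empty() || bytes.len() > 4 {
            return None;
        }
        let mut buf = [0u8; 4];
        buf[..bytes.len()].copy_from_slice(bytes);
        let word = u32::from_le_bytes(buf);

        // All four lanes at once: a set high bit means a non-ASCII byte.
        if word & 0x8080_8080 != 0 {
            return None;
        }
        // For an ASCII byte b, (b + 0x7f) has its high bit set iff b != 0.
        // Only the lanes that hold real characters are checked; the
        // trailing padding lanes are allowed to be zero.
        let used = 0x8080_8080u32 >> (8 * (4 - bytes.len()));
        if word.wrapping_add(0x7f7f_7f7f) & used != used {
            return None;
        }
        NonZeroU32::new(word).map(TinyStr4)
    }

    /// Number of characters, derived from the zero padding in the top bytes.
    pub fn len(self) -> usize {
        4 - (self.0.get().leading_zeros() as usize) / 8
    }
}

impl fmt::Display for TinyStr4 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let bytes = self.0.get().to_le_bytes();
        // The bytes were validated as ASCII, so this conversion cannot fail.
        let text = std::str::from_utf8(&bytes[..self.len()]).map_err(|_| fmt::Error)?;
        f.write_str(text)
    }
}
```

Because the backing store is a NonZeroU32, Option<TinyStr4> can use the all-zero pattern for None and stays at 4 bytes, and equality is a single integer comparison.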

A more aggressive optimization is to use an enum that switches between a fast-path and a general-case representation. The fast path would be optional 4 byte tiny strings for language, script, and region. The general case would be a boxed struct similar to the current one, but with an 8 byte tiny string for language and variant, and 4 byte tiny strings for the other subtags. This enum is 16 bytes on both 32 and 64 bit platforms.
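
A rough sketch of that layout follows so the size claim can be checked. All names here (Locale, GeneralLocale, TinyStr4, TinyStr8) are placeholders, and the fields of the general-case struct are only a guess at what "similar to the current one" would contain.

```rust
use std::mem::size_of;
use std::num::{NonZeroU32, NonZeroU64};

/// Placeholder 4-byte tiny string (validation omitted in this sketch).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct TinyStr4(NonZeroU32);

/// Placeholder 8-byte tiny string for the longer subtags.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct TinyStr8(NonZeroU64);

/// Hypothetical general case: lives behind a Box, so its own size does
/// not affect the size of the enum below.
pub struct GeneralLocale {
    pub language: Option<TinyStr8>,
    pub script: Option<TinyStr4>,
    pub region: Option<TinyStr4>,
    pub variants: Vec<TinyStr8>,
}

/// Fast path stored inline, everything else boxed.
pub enum Locale {
    Fast {
        language: Option<TinyStr4>,
        script: Option<TinyStr4>,
        region: Option<TinyStr4>,
    },
    General(Box<GeneralLocale>),
}

/// Size of the enum on the current target.
pub fn locale_size() -> usize {
    size_of::<Locale>()
}
```

The fast path holds 12 bytes of inline subtags, and the discriminant plus padding rounds that up, so locale_size() should come out at 16 on common 32 and 64 bit targets.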

I'm posting an issue to get a sense of how welcome these changes are, and also whether tinystr should be its own crate or just a source file in fluent-locale.

jdm · 2019-03-17T16:06:01Z

Your tinystr sounds very similar to tendril.

raphlinus · 2019-03-17T21:01:23Z

Similar goals, but a lot smaller (4 bytes vs 16) and optimized for some operations. For example, after some quick benchmarking, clone and eq are each under 1 ns in tinystr, versus roughly 1 ns and 5 ns respectively in tendril.

emilio · 2019-03-18T04:31:28Z

Tendril is refcounted but tinystr wouldn't be, I suppose? If so, it makes sense to have a separate crate for it...

I've seen some SmallString (SmolString?) somewhere, akin to SmallVec, but I don't recall where... Maybe in rustc? It may be worth looking into as well.

zbraniecki · 2019-03-18T07:22:05Z

Thank you for this proposal! I'd love to see the PR and am open to such change.

I was thinking recently about further specifying this crate as operating on Unicode BCP 47 Locale Identifiers rather than BCP 47 Language Tags, which is what I've been aiming for already without stating it explicitly.

The other thing I wanted to do was to switch to Cow to let us use string slices out of the original string, when possible. So for example:

let loc = Locale::from("en-US");
loc.language; // Cow::Borrowed("en")
loc.region; // Cow::Borrowed("US")

but

let loc = Locale::from("en-us");
loc.language; // Cow::Borrowed("en")
loc.region; // Cow::Owned("US")

Would that work with your tinystr? Would it make sense?
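
A minimal sketch of the borrowing behaviour described above, for a single region subtag. The helper name canonical_region is hypothetical and is not part of fluent-locale; it only shows when a Cow can stay borrowed and when it has to allocate.

```rust
use std::borrow::Cow;

/// Hypothetical helper: returns the region subtag in canonical upper case,
/// borrowing from the input when it is already canonical.
pub fn canonical_region(subtag: &str) -> Cow<'_, str> {
    if subtag.bytes().any(|b| b.is_ascii_lowercase()) {
        // The case has to change, so a new String is allocated.
        Cow::Owned(subtag.to_ascii_uppercase())
    } else {
        // Already canonical: reuse the slice of the original input.
        Cow::Borrowed(subtag)
    }
}
```

With this helper, canonical_region("US") stays borrowed, while canonical_region("us") allocates an owned "US".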

raphlinus · 2019-03-18T14:32:03Z

@zbraniecki Thanks, sounds like it makes sense to at least start working on this.

The motivation for tinystr here rather than something else is that it doesn't need a fallback for longer strings, because the spec mercifully has length limits. This has several advantages over a Cow, or strings borrowed from a serialization (as in rust-language-tags):

- A Cow is 32 bytes, a tinystr is 4 (or 8).
- There's no branch to deref (this is why eq is so fast, it's just a u32).
- Cloning doesn't require allocation (of the serialization string).

Another fun feature of a tinystr is that I will be able to do capitalization using SIMD-like math. If it's known to be alpha only, it's a & 0x5f5f5f5f. In the general case, it's:

a & !(((a + 0x1f1f1f1f) & !(a + 0x05050505) & 0x80808080) >> 2)

As you can see, I love doing this bit-level optimization :)
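
Both expressions above can be written as small functions over four ASCII bytes packed into a u32. This is a sketch with hypothetical function names; it assumes every byte is below 0x80 (already validated as ASCII), which is what keeps the per-byte additions from carrying into the neighbouring byte.

```rust
/// Fast path: valid only when every non-zero byte is an ASCII letter.
/// Clearing bit 0x20 in each byte maps 'a'..='z' to 'A'..='Z'.
pub fn upper_alpha_only(a: u32) -> u32 {
    a & 0x5f5f_5f5f
}

/// General case for four packed ASCII bytes (each byte < 0x80).
pub fn upper_ascii(a: u32) -> u32 {
    // High bit of a lane is set where the byte is >= b'a' (0x61).
    let ge_a = a.wrapping_add(0x1f1f_1f1f);
    // High bit of a lane is set where the byte is > b'z' (0x7a).
    let gt_z = a.wrapping_add(0x0505_0505);
    // 0x80 in exactly the lanes that hold a lowercase letter.
    let lower = ge_a & !gt_z & 0x8080_8080;
    // Shift 0x80 down to 0x20 and clear that bit in those lanes only.
    a & !(lower >> 2)
}
```

Digits, punctuation, and zero padding pass through upper_ascii unchanged, since only lanes in the range 'a'..='z' have their 0x20 bit cleared.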

SimonSapin · 2019-03-18T17:35:32Z

We also have https://github.com/servo/string-cache where Atom contains a u64 (these days we could probably make it NonZeroU64) which is a tag + either: an index into a static table for known strings, inline storage for up to 7 bytes, or a pointer to a heap-allocated atomic-refcounted entry in a global hash table.

raphlinus · 2019-03-18T21:32:14Z

I have a patch that adds the types but doesn't use them apart from tests. Should I upload that, or combine with changes to the Locale struct to use them?

zbraniecki · 2019-03-18T22:56:59Z

Hmm, what's your reason against releasing tinystr as a crate and using it in fluent-locale?

raphlinus · 2019-03-18T23:03:43Z

I'm not opposed, it's just that I think its use case is extremely specialized to bcp-47. I asked on #servo and #rust and people weren't convinced it would be more generally useful. If we do find a use case, or if it's your preference, I'm more than happy to split it out.

zbraniecki · 2019-07-24T18:27:22Z

I'm going to close this issue since we now track it in the unic-locale crate.

raphlinus added a commit to raphlinus/fluent-locale-rs that referenced this issue Mar 18, 2019

Add TinyStr types

acfe366

This commit adds a compact string representation, but doesn't wire them up. Part of the plan for projectfluent#7

raphlinus mentioned this issue Mar 18, 2019

Beginning of "tinystr" optimization #8

Closed

zbraniecki mentioned this issue May 14, 2019

Ideas for better performance zbraniecki/unic-locale#2

Closed

zbraniecki closed this as completed Jul 24, 2019
