Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ideas for improving performance #7

Closed
raphlinus opened this issue Mar 17, 2019 · 10 comments
Closed

Ideas for improving performance #7

raphlinus opened this issue Mar 17, 2019 · 10 comments

Comments

@raphlinus
Copy link

I'm interested in a very high performance representation of locales for skribo. I think what fluent-locale has is a good base, but have some ideas how to make it more performant, both in speed and in object size.

The main cost is likely the allocation of the many small String objects in a locale. There are existing tiny string implementations (tendril, inlinable_string, iString), but I think it's possible to do better by specializing to the needs of bcp47. Most of these strings are in the ballpark of 16 bytes each, and much of the cost is the need to spill to allocation when the strings get big. In bcp47, most of the subtags have a small, fixed maximum size.

I've prototyped a "tinystr" that uses a NonZeroU32 as its backing store, and thus takes 4 bytes, even when used as an option. It also uses SIMD-like math to verify ASCII and no NUL bytes. I'm happy to PR that into this repo, or make a separate crate (there are a number of file formats that use 4 byte tags, and this would be good for those). Use of this string type would probably not be a huge code change, as it doesn't fundamentally change the architecture, just the representation. There is unsafe code, but I think it should be possible to review it to get good confidence.

A more aggressive optimization is to use an enum between a fast-path and a general-case representation. The fast path would be optional 4 byte tiny strings for language, script, and region. The general case would be a boxed struct similar to the current one, but with an 8 byte tiny string for language and variant, and 4 byte tiny strings for the other subtags. This enum is 16 bytes on both 32 and 64 bit platforms.

I'm posting an issue to get a sense of how welcome these changes are, and also whether tinystr should be its own crate or just a source file in fluent-locale.

@jdm
Copy link

jdm commented Mar 17, 2019

Your tinystr sounds very similar to tendril.

@raphlinus
Copy link
Author

Similar goals, but a lot smaller (4 bytes vs 16) and optimized for some operations, for example clone and eq are <1ns in tinystr and (1, 5)ns each in tendril, after quick benchmarking.

@emilio
Copy link
Contributor

emilio commented Mar 18, 2019

Tendril is refcounted but tinystring wouldn't I suppose? If so it makes sense to have a separate crate for it...

I've seen some SmallString (SmolString?) somewhere akin to SmallVec, but I don't recall where... Maybe in rustc? May be worth looking into it as well

@zbraniecki
Copy link
Collaborator

Thank you for this proposal! I'd love to see the PR and am open to such change.

I was thinking recently about further specifying this crate as operating on Unicode BCP 47 Locale Identifier rather than BCP 47 Language Tag, which is what I've been going for already, but didn't specify it.

The other thing I wanted to do was to switch to Cow to let us use string slices out of the original string, when possible. So for example:

let loc = Locale::from("en-US");
loc.language; // Cow::Borrowed("en")
loc.region; // Cow::Borrowed("US")

but

let loc = Locale::from("en-us");
loc.language; // Cow::Borrowed("en")
loc.region; // Cow::Owned("US")

Would that work with your tinystr? Would it make sense?

@raphlinus
Copy link
Author

@zbraniecki Thanks, sounds like it makes sense to at least start working on this.

The motivation for tinystr here rather than something else is that it doesn't need a fallback for longer strings, because the spec mercifully has length limits. This has several advantages over a Cow, or strings borrowed from a serialization (as in rust-language-tags):

  • A Cow is 32 bytes, a tinystr is 4 (or 8).
  • There's no branch to deref (this is why eq is so fast, it's just a u32).
  • Cloning doesn't require allocation (of the serialization string).

Another fun feature of a tinystr is that I will be able to do capitalization using SIMD-like math. If it's known to be alpha only, it's a & 0x5f5f5f5f. In the general case, it's:

a & !(((a + 0x1f1f1f1f) & !(a + 0x05050505) & 0x80808080) >> 2)

As you can see, I love doing this bit-level optimization :)

@SimonSapin
Copy link

We also have https://github.com/servo/string-cache where Atom contains a u64 (these days we could probably make it NonZeroU64) which is a tag + either: an index into a static table for known strings, inline storage for up to 7 bytes, or a pointer to a heap-allocated atomic-refcounted entry in a global hash table.

@raphlinus
Copy link
Author

I have a patch that adds the types but doesn't use them apart from tests. Should I upload that, or combine with changes to the Locale struct to use them?

raphlinus added a commit to raphlinus/fluent-locale-rs that referenced this issue Mar 18, 2019
This commit adds a compact string representation, but doesn't wire them up.

Part of the plan for projectfluent#7
@zbraniecki
Copy link
Collaborator

hmm, what's your reason against releasing tinystr as a crate and using it in fluent-locale?

@raphlinus
Copy link
Author

I'm not opposed, it's just that I think its use case is extremely specialized to bcp-47. I asked on #servo and #rust and people weren't convinced it would be more generally useful. If we do find a use case, or if it's your preference, I'm more than happy to split it out.

@zbraniecki
Copy link
Collaborator

I'm going to close this issue since we now track it in unic-locale crate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants