String API additions and SIMD optimizations #129

Open
2 of 38 tasks
mosra opened this issue Mar 31, 2022 · 0 comments
mosra commented Mar 31, 2022

A meta-issue tracking various ideas for SIMD-optimized string algorithms. One reason we're making our own string APIs is that the C string library is designed for null-terminated strings, which is of little use when the main use case is working on slices of larger strings, such as in parsers. And, of course, the C++ counterparts are too bloated and either impose an implicit allocation (std::string) or require a too-new C++ standard (std::string_view needs C++17).

SIMD in general

Construction

Searching

Comparison

  • Check pointer equality before calling memcmp() in StringView::operator==(); this could save a lot, especially when comparing literals that the compiler may have deduplicated
    • Don't do that in String, though
  • Would we gain anything by implementing memcmp() ourselves?
    • especially for SSO strings, which have a fixed-size buffer, so comparing them could be a single (masked) instruction?
    • by not having to explicitly test for nullptr when size == 0 just to avoid UB, because the standard is stupid and generally disallows passing nullptr to any string/memory function even if the size is zero?
  • Case-insensitive comparison, see http://www.phoronix.com/scan.php?page=news_item&px=Glibc-strcasecmp-AVX2-EVEX
    • Possibly useful for extension comparison in Any* plugins; OTOH, there it's probably faster to normalize the extension first and then do 100 memcmp()s
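A minimal sketch of the pointer-equality and zero-size ideas, assuming a hypothetical pointer + size View type (not Corrade's actual StringView). The pointer check short-circuits deduplicated literals, and empty views are handled before memcmp() is reached, so memcmp() never sees a nullptr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical minimal view: a pointer + size pair that doesn't need to be
// null-terminated. Not Corrade's actual StringView.
struct View {
    const char* data;
    std::size_t size;
};

bool equal(View a, View b) {
    // A non-empty view has to point to valid memory
    assert(a.data || a.size == 0);
    assert(b.data || b.size == 0);

    // Different sizes can never compare equal
    if(a.size != b.size) return false;
    // Same pointer and same size means same contents (e.g. deduplicated
    // literals), so memcmp() can be skipped. This also covers two empty
    // views that are both nullptr.
    if(a.data == b.data) return true;
    // memcmp() with a nullptr is UB even for a zero size, so handle empty
    // views before calling it
    if(a.size == 0) return true;
    return std::memcmp(a.data, b.data, a.size) == 0;
}
```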

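For the case-insensitive comparison, a scalar ASCII-only baseline could look like the following. The function names are hypothetical; a SIMD variant such as the glibc one linked above would vectorize the same per-byte lowercase-and-compare. lowercaseInPlace() illustrates the alternative of normalizing the extension once and then doing plain memcmp()s against already-lowercase candidates.

```cpp
#include <cassert>
#include <cstddef>

// Lowercases a single ASCII character. Bytes outside 'A'-'Z' (including
// UTF-8 multi-byte sequences) are left untouched.
inline char asciiLowercase(char c) {
    return (c >= 'A' && c <= 'Z') ? char(c - 'A' + 'a') : c;
}

// Scalar ASCII case-insensitive equality of two non-null-terminated slices
bool equalCaseInsensitive(const char* a, std::size_t aSize,
                          const char* b, std::size_t bSize) {
    assert(a || aSize == 0);
    assert(b || bSize == 0);
    if(aSize != bSize) return false;
    for(std::size_t i = 0; i != aSize; ++i)
        if(asciiLowercase(a[i]) != asciiLowercase(b[i])) return false;
    return true;
}

// Normalizes a slice in place, after which comparisons against lowercase
// extensions can be plain memcmp() calls
void lowercaseInPlace(char* data, std::size_t size) {
    assert(data || size == 0);
    for(std::size_t i = 0; i != size; ++i)
        data[i] = asciiLowercase(data[i]);
}
```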
Unicode

Number-to-string

For Utility::Debug, Utility::format() etc. The core should be a direct, overhead-free API working on builtin types (writing into a statically-sized char[], for example), with convenience wrappers on top.
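A sketch of that layering, assuming hypothetical formatUnsigned() and formatUnsignedString() names. The core writes digits into a caller-provided char[20] (enough for any 64-bit value) without allocating or null-terminating, and the std::string wrapper is just one possible convenience layer on top. A real implementation would likely generate digits faster (e.g. two per step via a lookup table); this only shows the shape of the API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Core: writes the decimal representation of value into a fixed-size
// buffer, returns the number of characters written. No allocation, no
// null terminator.
std::size_t formatUnsigned(char (&out)[20], std::uint64_t value) {
    // Generate digits least-significant first into a temporary
    char reversed[20];
    std::size_t size = 0;
    do {
        // UINT64_MAX has 20 digits, so this can never overflow
        assert(size < 20);
        reversed[size++] = char('0' + value % 10);
        value /= 10;
    } while(value);

    // Copy them out in the correct order
    for(std::size_t i = 0; i != size; ++i)
        out[i] = reversed[size - i - 1];
    return size;
}

// Convenience wrapper built on top of the core API
std::string formatUnsignedString(std::uint64_t value) {
    char buffer[20];
    return std::string(buffer, formatUnsigned(buffer, value));
}
```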

String-to-number

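Because strto*() has insane usability issues: it requires null-terminated input (so a slice can't be parsed in place), reports overflow through errno, and signals that nothing was parsed only through the end pointer.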
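A sketch of what a friendlier counterpart could look like, assuming a hypothetical parseUnsigned() that takes a pointer + size slice and returns a small result struct instead of relying on errno and an end pointer. Only plain decimal digits are handled; signs, whitespace, other bases and floating-point are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical result: success flag, parsed value, characters consumed
struct ParseResult {
    bool ok;
    std::uint64_t value;
    std::size_t consumed;
};

// Parses leading decimal digits from a slice that doesn't need to be
// null-terminated. Fails if there are no digits or on overflow.
ParseResult parseUnsigned(const char* data, std::size_t size) {
    assert(data || size == 0);
    const std::uint64_t max = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t value = 0;
    std::size_t i = 0;
    for(; i != size && data[i] >= '0' && data[i] <= '9'; ++i) {
        const std::uint64_t digit = std::uint64_t(data[i] - '0');
        // value*10 + digit <= max is equivalent to value <= (max - digit)/10
        if(value > (max - digit)/10)
            return ParseResult{false, 0, i};
        value = value*10 + digit;
    }
    return ParseResult{i != 0, value, i};
}
```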

General printing
