-
-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider precomputing a DAFSA or perfect hash for the list #35
Comments
Thanks for your investigation. The built-in data uses a sorted array, the lookup is a binary search, lookup performance is O(log n). But just have a look at current speed: on a Intel i3 using 1 core, I can do 5-10 million lookups per second (taken the PSL data as input). The libpsl code is reentrant, so you could use any number of cores in parallel - if that matters to you. But of course, a hashmap could boost the lookups by a factor of ~10 (roughly). For dynamic loaded PSL data I keep this in mind - initialization is roughly the same for a hashmap and for a vector. Code complexity is bit higher, but nothing to worry. |
Well, both the gperf and DAFSA code of Chromium share this property, so I don't think it is unique to your solution. The naive solution (which is similar to what you're using now) was roughly an order of magnitude larger than the gperf solution (which traded a one-time cost at build for faster load times), which itself was about 8X larger than what we currently use for the DAFSA. In Chrome, we found that library size (as in, the size of the .so/dll) to understandably contribute to load times and page faults, which is why we reoptimized things. Opera contributed the DAFSA code to make our PSL implementation smaller for mobile - 200K vs 35K is a BIG difference for small apps. My hope is that libpsl will optimize to use the smallest amount of memory possible (that is compatible with your complexity and performance goals), to make it more suitable for use in smaller devices. I filed the issue since it seemed like an easy win, low-hanging fruit, and friendly licensing of existing solutions :) |
I read about DAFSA - very nice concept, indeed (thanks for pointing out). I'll implement that definitely, i am just pretty busy right now. Hope, I can do that within the next weeks. |
DAFSA is in branch 'develop' now, thanks again for your request. Some measurements: Size of the library (stripped) here (amd64) is now 46816 vs 435352 before, a factor of 9.3 !!! |
Released in libpsl-0.12.0 Thanks for pointing out DAFSA ! |
A common strategy used by many PSL implementations is to optimize the PSL for both size and speed, given that it contains a complete set of the IANA Root Zone Database, in addition to public and private registerable domains.
Firefox currently uses a simple hash, Chromium previously used a gperf-generated hash tree, and presently uses a DAFSA, while Guava uses a trie
These solutions each make different trade-offs with respect to size (in memory) and speed, but all offer better performance than the current implementation, and are using relatively friendly licensing.
The text was updated successfully, but these errors were encountered: