-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pnomolos/faster #2
base: master
Are you sure you want to change the base?
Conversation
@joelparkerhenderson Testing in all encodings is turning out to be more of a pain than I expected. What are your thoughts on this? Should |
@joelparkerhenderson I've pulled in your latest changes and updated the benchmark. |
Is it possible to get this PR merged ASAP, addressable gem using this as a dependency and this takes lot of memory. I found this pr when when I was about to raise one with some optimizations which are already done in this PR |
Thanks! I'll go through these today & tomorrow. First glance it all looks fine. |
@pnomolos You're asking great questions. I propose that a reliable way for now is to do just UTF8, and also append that to the gem name i.e. name the gem |
BTW it is better to gitignore coverage files |
@redtachyons Thanks, I'll ignore /coverage/ and /coverage.data, in an independent commit. I'll also change from listing the dependencies in Gemfile to using the "gemspec" line as above, in an independent commit. And also ignore the Gemfile.lock file. |
@joelparkerhenderson We're still using this gem internally - if I rework this commit would you consider pulling it in and cutting a new version of the gem? |
Yes. What version of Ruby are you using? What I suggest, if you want, is I can get the gem up to date with Ruby 3. I have time this weekend. Then could you add your improvements on top of that? (Or do you prefer a different way?) |
Hi Joel, If you have time to do that it would be great, as we're hoping to move to Ruby 3 sometime later this year. Thanks! |
Hi @joelparkerhenderson if you don't have bandwidth to get to this I'd be fine doing two PRs - one for Ruby 3 support and an updated one for this. Would that be OK with you? Thanks, Phil |
@joelparkerhenderson After running way too many benchmarks and trying to come up with far too many possible algorithms I believe I have a winner - results here.
I tested the various algorithms (using benchmark-ipsa) against several sources (wikipedia articles and project gutenberg books). Not included in the benchmarks are also some tests against single characters, and against the entire set of characters. I realized we should probably add tests for various encodings as well, I'll try and add those in soonish.
The libunac implementation also blows everything else away (as I expected) but it needs some cleanup and a few more checks.
As well, per my comment (#1 (comment)) should I change the output of those two characters?
Algorithm three was a very interesting one as you'll notice from the results.
very low memory usage (as I believe it would be a single allocation in C) but speed results were all over the place. I'm curious if there's some sort of cutoff in terms of string length where we could alternate between the two, but I haven't had a chance to investigate that far.