Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce cache to speed things up and cleanup method #171

Merged
merged 1 commit into from
Jul 19, 2021

Conversation

lzap
Copy link
Contributor

@lzap lzap commented Jun 18, 2021

Whatsup folks! A colleague of mine added puppetdb into our tests and I am extremely happy to see more test coverage thanks to this wonderful effort. But our tests are now a minute slower. We can do better, come on! :-)

This patch helps a bit with two problems - CPU and memory.

First, from what I understand every filter string should generate the very same result. So why not to cache both input (filter string) and output? I use thread-local variable for that. It memorizes the last filter seen and if there is a match, it quickly returns the filtered data. This will be also thread-safe.

Now this will only help in cases when facts are requested with same filter, however it looks like that is exactly what we do in our test suite. We construct facterdb and then get facts for several combinations of OSes testing one fact after another. This patch shaves off a lot of time in our case. And if you think our project might be some kind of special case, well, I measured tests of the facterdb itself.

Without this patch:

real	0m12,418s
user	0m11,805s
sys	0m0,508s

With this patch:

real	0m6,395s
user	0m5,799s
sys	0m0,514s

That's a reasonable 50% speedup if I do my math correctly. Not bad for Friday, not bad. I am not stopping there tho.

Second, once facts are loaded into memory and stored in the @database field, there is no way to reclaim the memory back. The files currently weights about 15MB and it will probably only grow. So I added a new method called cleanup which allows Ruby VM to reclaim this memory back. Not problem for this project, but our project has about 4000 other tests to execute after its done with facterdb. Typically users will call this method in global teardown or after(:suite) or similar.

I also have a suggestion which I am not going to work on: I understand that the database is populated into one huge JSON string during runtime. Why this isn't part of the build process? That huge JSON could be distributed in the gem and built via a Rake task so there is no need to load it from individual files. Just a tought.

Cheers!

@@ -11,6 +11,13 @@ def self.database
@database ||= "[#{facterdb_fact_files.map { |f| read_json_file(f) }.join(',')}]\n"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way this should have been probably in a synchronized block, but that's another story.

@ekohl
Copy link
Member

ekohl commented Jun 18, 2021

This may make voxpupuli/rspec-puppet-facts@43bb17e less needed. One thing to note is that there I chose to use marshal as a way to deep clone in order to prevent in place modifications would corrupt the cache.

@ekohl
Copy link
Member

ekohl commented Jun 18, 2021

I also have a suggestion which I am not going to work on: I understand that the database is populated into one huge JSON string during runtime. Why this isn't part of the build process? That huge JSON could be distributed in the gem and built via a Rake task so there is no need to load it from individual files. Just a tought.

That's an interesting thought. Looks like it'll require some fiddling in the gemspec (now it uses git ls-files) and the release step.

This gem was just migrated to Voxpupuli last week and we haven't had a chance to convert from Travis to Github Actions, but the release actions we use should make it quite easy to add another file. I'll look into this.

@lzap
Copy link
Contributor Author

lzap commented Jun 18, 2021

This may make voxpupuli/rspec-puppet-facts@43bb17e less needed

Probably, on the other hand the cleanup method is something that should be probably added into there so users do not need to care.

I'll look into this.

It might not speedup things much, I think the biggest offender is the JGrep but I might be wrong.

If there is a chance to see a release after this is merged, we would be able to rebase our patch and use the cleanup method from the day one. That would be cool.

filter_str = generate_filter_str(filter)
JGrep.jgrep(database, filter_str).map { |hash| Hash[hash.map{ |k, v| [k.to_sym, v] }] }
result = JGrep.jgrep(database, filter_str).map { |hash| Hash[hash.map{ |k, v| [k.to_sym, v] }] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: this is why only the keys are symbols while it should really be something like deep_symbolize or use something like https://api.rubyonrails.org/classes/ActiveSupport/HashWithIndifferentAccess.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand the implementation enough, but my general rule of thumb is - for big dictionaries, stay away from symbolizing/indifferent access it can be quite slow. I suggest to use just strings and convert all input to strings as well if possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that's an API change that's quite big. That's why HashWithIndifferentAccess is probably a cleaner solution.

@bastelfreak
Copy link
Member

@lzap thanks for the PR. Can you please rebase? That should enable CI again

@lzap
Copy link
Contributor Author

lzap commented Jun 18, 2021

Sure, done.

Copy link
Member

@bastelfreak bastelfreak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but it would be awesome if someone like @alexjfisher or @ekohl , that really understand ruby, could take a look as well.

Copy link
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (only the last entry) does match most use cases and where it doesn't, the extra CPU time is neglible. Thanks!

@bastelfreak bastelfreak merged commit 763f6bb into voxpupuli:master Jul 19, 2021
@lzap lzap deleted the introduce-cache branch July 21, 2021 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants