Introduce cache to speed things up and cleanup method #171

lzap · 2021-06-18T08:49:08Z

Whatsup folks! A colleague of mine added puppetdb into our tests and I am extremely happy to see more test coverage thanks to this wonderful effort. But our tests are now a minute slower. We can do better, come on! :-)

This patch helps a bit with two problems - CPU and memory.

First, from what I understand every filter string should generate the very same result. So why not to cache both input (filter string) and output? I use thread-local variable for that. It memorizes the last filter seen and if there is a match, it quickly returns the filtered data. This will be also thread-safe.

Now this will only help in cases when facts are requested with same filter, however it looks like that is exactly what we do in our test suite. We construct facterdb and then get facts for several combinations of OSes testing one fact after another. This patch shaves off a lot of time in our case. And if you think our project might be some kind of special case, well, I measured tests of the facterdb itself.

Without this patch:

real	0m12,418s
user	0m11,805s
sys	0m0,508s

With this patch:

real	0m6,395s
user	0m5,799s
sys	0m0,514s

That's a reasonable 50% speedup if I do my math correctly. Not bad for Friday, not bad. I am not stopping there tho.

Second, once facts are loaded into memory and stored in the @database field, there is no way to reclaim the memory back. The files currently weights about 15MB and it will probably only grow. So I added a new method called cleanup which allows Ruby VM to reclaim this memory back. Not problem for this project, but our project has about 4000 other tests to execute after its done with facterdb. Typically users will call this method in global teardown or after(:suite) or similar.

I also have a suggestion which I am not going to work on: I understand that the database is populated into one huge JSON string during runtime. Why this isn't part of the build process? That huge JSON could be distributed in the gem and built via a Rake task so there is no need to load it from individual files. Just a tought.

Cheers!

lzap · 2021-06-18T08:55:04Z

lib/facterdb.rb

@@ -11,6 +11,13 @@ def self.database
    @database ||= "[#{facterdb_fact_files.map { |f| read_json_file(f) }.join(',')}]\n"


By the way this should have been probably in a synchronized block, but that's another story.

ekohl · 2021-06-18T08:55:30Z

This may make voxpupuli/rspec-puppet-facts@43bb17e less needed. One thing to note is that there I chose to use marshal as a way to deep clone in order to prevent in place modifications would corrupt the cache.

ekohl · 2021-06-18T08:59:54Z

I also have a suggestion which I am not going to work on: I understand that the database is populated into one huge JSON string during runtime. Why this isn't part of the build process? That huge JSON could be distributed in the gem and built via a Rake task so there is no need to load it from individual files. Just a tought.

That's an interesting thought. Looks like it'll require some fiddling in the gemspec (now it uses git ls-files) and the release step.

This gem was just migrated to Voxpupuli last week and we haven't had a chance to convert from Travis to Github Actions, but the release actions we use should make it quite easy to add another file. I'll look into this.

lzap · 2021-06-18T09:16:03Z

This may make voxpupuli/rspec-puppet-facts@43bb17e less needed

Probably, on the other hand the cleanup method is something that should be probably added into there so users do not need to care.

I'll look into this.

It might not speedup things much, I think the biggest offender is the JGrep but I might be wrong.

If there is a chance to see a release after this is merged, we would be able to rebase our patch and use the cleanup method from the day one. That would be cool.

ekohl · 2021-06-18T09:43:17Z

lib/facterdb.rb

    filter_str = generate_filter_str(filter)
-    JGrep.jgrep(database, filter_str).map { |hash| Hash[hash.map{ |k, v| [k.to_sym, v] }] }
+    result = JGrep.jgrep(database, filter_str).map { |hash| Hash[hash.map{ |k, v| [k.to_sym, v] }] }


Note to self: this is why only the keys are symbols while it should really be something like deep_symbolize or use something like https://api.rubyonrails.org/classes/ActiveSupport/HashWithIndifferentAccess.html

I do not understand the implementation enough, but my general rule of thumb is - for big dictionaries, stay away from symbolizing/indifferent access it can be quite slow. I suggest to use just strings and convert all input to strings as well if possible.

Yes, but that's an API change that's quite big. That's why HashWithIndifferentAccess is probably a cleaner solution.

bastelfreak · 2021-06-18T10:40:59Z

@lzap thanks for the PR. Can you please rebase? That should enable CI again

lzap · 2021-06-18T10:48:45Z

Sure, done.

bastelfreak

This looks good to me, but it would be awesome if someone like @alexjfisher or @ekohl , that really understand ruby, could take a look as well.

ekohl

This (only the last entry) does match most use cases and where it doesn't, the extra CPU time is neglible. Thanks!

lzap mentioned this pull request Jun 18, 2021

Fixes #27906 - Always prefer modern facts in Facter theforeman/foreman#8449

Merged

lzap commented Jun 18, 2021

View reviewed changes

lzap force-pushed the introduce-cache branch from bcac887 to e9f5411 Compare June 18, 2021 09:19

ekohl reviewed Jun 18, 2021

View reviewed changes

Introduce cache and cleanup

4da6b1f

lzap force-pushed the introduce-cache branch from e9f5411 to 4da6b1f Compare June 18, 2021 10:48

bastelfreak approved these changes Jul 16, 2021

View reviewed changes

ekohl approved these changes Jul 19, 2021

View reviewed changes

bastelfreak merged commit 763f6bb into voxpupuli:master Jul 19, 2021

bastelfreak added the enhancement label Jul 19, 2021

lzap deleted the introduce-cache branch July 21, 2021 15:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce cache to speed things up and cleanup method #171

Introduce cache to speed things up and cleanup method #171

lzap commented Jun 18, 2021

lzap Jun 18, 2021

ekohl commented Jun 18, 2021

ekohl commented Jun 18, 2021

lzap commented Jun 18, 2021

ekohl Jun 18, 2021

lzap Jun 18, 2021

ekohl Jun 18, 2021

bastelfreak commented Jun 18, 2021

lzap commented Jun 18, 2021

bastelfreak left a comment

ekohl left a comment

		@@ -11,6 +11,13 @@ def self.database
		@database \|\|= "[#{facterdb_fact_files.map { \|f\| read_json_file(f) }.join(',')}]\n"

Introduce cache to speed things up and cleanup method #171

Introduce cache to speed things up and cleanup method #171

Conversation

lzap commented Jun 18, 2021

lzap Jun 18, 2021

Choose a reason for hiding this comment

ekohl commented Jun 18, 2021

ekohl commented Jun 18, 2021

lzap commented Jun 18, 2021

ekohl Jun 18, 2021

Choose a reason for hiding this comment

lzap Jun 18, 2021

Choose a reason for hiding this comment

ekohl Jun 18, 2021

Choose a reason for hiding this comment

bastelfreak commented Jun 18, 2021

lzap commented Jun 18, 2021

bastelfreak left a comment

Choose a reason for hiding this comment

ekohl left a comment

Choose a reason for hiding this comment