Add checksum cache #102

Merged: allenh1 merged 14 commits into master from add-checksum-cache on Dec 22, 2017.
Conversation

@allenh1 (Contributor) commented on Dec 19, 2017:

This is an experiment to speed up the generation of OE recipes.

@tfoote (Member) left a review comment:

I have a few comments.

Does computing the hashes really become the bottleneck for the builds? And to confirm: does the archive name fully bake in the version, etc.?

        md5_cache = pickle.load(md5_file)
        md5_file.close()
    except IOError:
        md5_cache = dict()
@tfoote (Member):

Should this be None so it's the same as if the option wasn't passed?

@allenh1 (Contributor, Author):

I think my line of thought there was: they want to use a cache, so we initialize a dictionary to store that option, and we can check later that it's not None.

Sorry, this branch was far from ready for review... I was just opening this so you could see it.

        sha256_cache = pickle.load(sha256_file)
        sha256_file.close()
    except IOError:
        sha256_cache = dict()
@tfoote (Member):

Same None here as above?

        md5_cache = dict()
    try:
        sha256_file = open('%s/md5_cache.pickle', 'rb')
        sha256_cache = pickle.load(sha256_file)
@tfoote (Member):

Are there matching write/dump calls to update the cache with the new hashes somewhere?

@allenh1 (Contributor, Author):

Not yet -- this is still a WIP. Still deciding where to put those.
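
For context, the matching write-back could look something like this (a minimal sketch; where exactly it runs is the open question above, and tar_dir and the cache dicts are taken from the surrounding quoted code):

```python
import pickle

# Persist the updated caches next to the tar archives so the next
# run can reuse the already-computed checksums.
with open('%s/md5_cache.pickle' % tar_dir, 'wb') as md5_file:
    pickle.dump(md5_cache, md5_file)
with open('%s/sha256_cache.pickle' % tar_dir, 'wb') as sha256_file:
    pickle.dump(sha256_cache, sha256_file)
```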

@allenh1 (Contributor, Author) commented on Dec 19, 2017:

> Does computing the hashes really become the bottleneck for the builds?

I'm not sure. OE recipe generation is significantly slower than ebuild generation, so I'm trying to figure out the source of the slowdown. This is really the only extra step that isn't in the ebuild generator, so I'm trying to speed things up here. It could be that this does nothing.

But this also gives us the ability to remove the tar archives (since only the checksums are needed).

@allenh1 (Contributor, Author) commented on Dec 21, 2017:

@tfoote Ok, this is the full changeset. Let me know what you think. I'm going to do a benchmark to see if this helps at all.

This holds the full path for the file, including the distro and version.

@allenh1 (Contributor, Author) commented on Dec 21, 2017:

Ok, here's the benchmark.

Without cache:

  • real: 9m 3s
  • user: 17.7s
  • sys: 3.304s

With cache:

  • real: 6m 37s
  • user: 15.5s
  • sys: 2.640s

That's somewhat significant, especially in the real time. This also has the added advantage that we can remove the tars after the pickle cache is generated.

@tfoote (Member) left a review comment:

Sorry, this ended up with a lot of suggestions: I've suggested a new approach with a HashCache class to allow code reuse, increase readability, and decrease branching.

Let me know if my suggestion is clear enough.

sha256_cache = None
if args.tar_archive_dir:
    try:
        md5_file = open('%s/md5_cache.pickle' % args.tar_archive_dir, 'rb')
@tfoote (Member):

I recommend using a context manager here to open and close the file automatically.

https://jeffknupp.com/blog/2016/03/07/python-with-context-managers/
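
A minimal sketch of that recommendation, applied to the load above (same path and fallback as the quoted code):

```python
import pickle

md5_cache = dict()
if args.tar_archive_dir:
    try:
        # The with-block closes the file even if pickle.load() raises.
        with open('%s/md5_cache.pickle' % args.tar_archive_dir, 'rb') as md5_file:
            md5_cache = pickle.load(md5_file)
    except IOError:
        md5_cache = dict()
```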

    except IOError:
        md5_cache = dict()
    try:
        sha256_file = open(
@tfoote (Member):

context manager here too

self.downloadArchive()
md5_cache[self.getArchiveName()] = hashlib.md5(
    open(self.getArchiveName(), 'rb').read()).hexdigest()
md5_file = open('%s/md5_cache.pickle' % tar_dir, 'wb')
@tfoote (Member):

context manager here too

if self.getArchiveName() not in sha256_cache:
    sha256_cache[self.getArchiveName()] = hashlib.sha256(
        open(self.getArchiveName(), 'rb').read()).hexdigest()
    sha256_file = open('%s/sha256_cache.pickle' % tar_dir, 'wb')
@tfoote (Member):

context manager here too

    open(self.getArchiveName(), 'rb').read()).hexdigest()
if md5_cache is not None:
    if self.getArchiveName() not in md5_cache:
        self.downloadArchive()
@tfoote (Member):

Ahh, this is likely the performance bottleneck. It has to download the archive before computing the hash.

    open(self.getArchiveName(), 'rb').read()).hexdigest()
if md5_cache is not None:
    if self.getArchiveName() not in md5_cache:
        self.downloadArchive()
@tfoote (Member):

Also, you download the archive in both the if and the else branch, but then don't download it in the sha256 path since it's already there.

Also, if you load the cache at the higher level, you should write it out again at the higher level instead of each time you process a new package: just update the internal (md5)_cache object for each computation, and write the new cache values out on exit.

It might be worth making a dedicated class that loads, holds, and computes the hashes conditionally. It can be parameterized on the hash function. Then you just call md5_cache.get_hash(filename) and it will return the stored hash, or else compute it. If md5_cache is a context manager, you can pass it the cache filename and hash function, and it will load the file going in and dump the cache on the way out automatically.
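
A sketch of that idea (the HashCache name, constructor parameters, and get_hash method are illustrative, not the final implementation):

```python
import hashlib
import pickle


class HashCache(object):
    """Pickle-backed hash cache, parameterized on the hash function
    (e.g. hashlib.md5 or hashlib.sha256)."""

    def __init__(self, cache_file, hash_fn):
        self.cache_file = cache_file
        self.hash_fn = hash_fn
        self.cache = dict()

    def __enter__(self):
        # Load previously computed hashes; start empty if no cache exists.
        try:
            with open(self.cache_file, 'rb') as f:
                self.cache = pickle.load(f)
        except IOError:
            self.cache = dict()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Dump the (possibly updated) cache on the way out.
        with open(self.cache_file, 'wb') as f:
            pickle.dump(self.cache, f)

    def get_hash(self, filename):
        # Return the stored hash, or compute and remember it.
        if filename not in self.cache:
            with open(filename, 'rb') as f:
                self.cache[filename] = self.hash_fn(f.read()).hexdigest()
        return self.cache[filename]
```

Usage would then be, e.g., `with HashCache('md5_cache.pickle', hashlib.md5) as md5_cache: checksum = md5_cache.get_hash(archive_name)`.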

@allenh1 (Contributor, Author):

> Also, if you load the cache at the higher level, you should write it out again at the higher level instead of each time you process a new package: just update the internal (md5)_cache object for each computation, and write the new cache values out on exit.

That definitely sounds like a more pythonic way to do this. I'll get right on that.

@allenh1 (Contributor, Author) commented on Dec 21, 2017:

> Sorry, this ended up with a lot of suggestions: I've suggested a new approach with a HashCache class to allow code reuse, increase readability, and decrease branching.

@tfoote Upon implementing this, it turned into the CacheManager class (which is just a bit more generic).
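
For illustration, a more generic manager might be used roughly like this (a hypothetical sketch; the actual CacheManager API in this changeset may differ, and tar_dir/archive_name stand in for the values used in the quoted code):

```python
import hashlib

# Hypothetical usage: one generic, pickle-backed cache per checksum type,
# loaded on entry and written back on exit by the context manager.
with CacheManager('%s/md5_cache.pickle' % tar_dir) as md5_cache:
    if archive_name not in md5_cache:
        with open(archive_name, 'rb') as f:
            md5_cache[archive_name] = hashlib.md5(f.read()).hexdigest()
```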

@tfoote (Member) left a review comment:

One small change, otherwise LGTM. I'm about to get on the plane, so feel free to merge without re-review.

    open(self.getArchiveName(), 'rb').read()).hexdigest()
self.src_md5 = hashlib.md5(
    open(self.getArchiveName(), 'rb').read()).hexdigest()
if self.getArchiveName() not in md5_cache and \
@tfoote (Member):

This should be an or, in case one cache is preinitialized but the other isn't.
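
That is, the suggested fix would read (a sketch over the quoted lines):

```python
# Download when the archive is missing from either cache, so the file
# is on disk for whichever checksum still has to be computed.
if self.getArchiveName() not in md5_cache or \
        self.getArchiveName() not in sha256_cache:
    self.downloadArchive()
```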

@allenh1 (Contributor, Author):

I'll add this in #105 and merge this, since you're mid-travel.

if self.getArchiveName() not in md5_cache and \
        self.getArchiveName() not in sha256_cache:
    self.downloadArchive()
    md5_cache[self.getArchiveName()] = hashlib.md5(
@tfoote (Member):

This logic could be collapsed into the cache itself if you pass it the hashing function and a lambda (or similar) that would allow it to call self.downloadArchive. But the complexity would be similar, so this is fine.
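
For illustration, extending the HashCache sketch above, that alternative might look like this (hypothetical; not what this PR implements):

```python
    def get_hash(self, filename, fetch=None):
        # Let the cache trigger the download on a miss via a callback,
        # e.g. fetch=lambda: self.downloadArchive() at the call site.
        if filename not in self.cache:
            if fetch is not None:
                fetch()
            with open(filename, 'rb') as f:
                self.cache[filename] = self.hash_fn(f.read()).hexdigest()
        return self.cache[filename]
```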

@allenh1 merged commit 212ac2f into master on Dec 22, 2017.
@allenh1 deleted the add-checksum-cache branch on December 22, 2017 at 22:21.
zffgithub pushed a commit to zffgithub/superflore that referenced this pull request Apr 11, 2023
* Import pickle to allow the tar dictionary to be read in (to speed up recipe generation).

* Added logic to use the cached version of the hash in the yocto files.

* Wrote a CacheManager class to implement a context manager on the cache files.

* Fixed copyright line in TempFileManaager.py