Implement cachePath Option for Preflight to Utilize Pre-cached Container Images #1162

Open
ansvu opened this issue May 9, 2024 · 5 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@ansvu

ansvu commented May 9, 2024

Is your feature request related to a problem? Please describe

Our goal is to identify bottlenecks in the CNF certification process for partners.
Specifically, we're looking at container image certification, where the preflight scanning tool becomes a significant time factor for larger image sets (over 94 images): a full run can take up to 5 hours.

One key issue we found is that preflight currently doesn't leverage local image caching.
This means it rescans the same images repeatedly and pulls them on every scan, contributing to the overall processing time.

crane.Pull: https://github.com/redhat-openshift-ecosystem/openshift-preflight/blob/main/internal/engine/engine.go#L125-L126

Interestingly, we observed that the preflight tool creates a temporary cache directory in /tmp/ during the scan. However, this cache is removed after the scan completes.

$ ls -lrt /tmp/preflight-1377051767/cache/
total 361652
   13824 May  8 16:23 sha256:b59d9bf6e38fd1dbb79287e5baf321b7ea2e9d059dd2ea6c865c2d5b00f8f783
    1536 May  8 16:23 sha256:247bb054bd68b03cd4c8c5d87d7098b405eae5ced4c1ed8252679a2585c170b3
   53760 May  8 16:23 sha256:200948edae012c73ec552b12fd3af34e33136f2f6e5b979542a7601ea4b5e20c
   11776 May  8 16:23 sha256:4b63ca3044336e5a7266ca0248b317331cc3fddadb36cec1591a108402f2c354
    7168 May  8 16:23 sha256:adbf2aca5ac3472bb9f6721ac41ea71e8234dee6925fffa1241ad7911820412a
95157760 May  8 16:23 sha256:d8d8484a6de73a08582e978aa4e31ada8b14874e1530813762eeb6a6c58cd240
56533504 May  8 16:23 sha256:836f2831d809e5131d7e8d1ddb3cd139a44471c7d77fbde6057bd2638261f4d7
18535936 May  8 16:23 sha256:15e504407dcb25e6c40b22d78cc5c852c3485d3240b4d47d7852a963fdbfc195

Created cache: https://github.com/redhat-openshift-ecosystem/openshift-preflight/blob/main/internal/engine/engine.go#L143-L148

Removed tmpdir(cache): https://github.com/redhat-openshift-ecosystem/openshift-preflight/blob/main/internal/engine/engine.go#L138

Describe the solution you'd like.

Implementing an option to use pre-cached images through a cachePath parameter would be a valuable improvement. This would eliminate the need to repeatedly pull container images on every scan, significantly reducing processing time.
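
A minimal sketch of how such an option might behave, written in Go since preflight is a Go project. It is an illustration under stated assumptions, not preflight's code: the `cachePath` parameter, the `resolveCacheDir` and `exists` helpers, and the `PREFLIGHT_CACHE_PATH` variable are all hypothetical names. When a path is given, the directory is kept after the scan; when it is empty, the sketch falls back to a temporary directory that is removed afterwards, mirroring the current behaviour described above. Wiring the directory into the image pull itself is left out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveCacheDir returns the directory to use for cached layers.
// cachePath is a hypothetical option, not an existing preflight flag.
// If it is set, the directory is created when missing and left in place;
// otherwise a temporary directory is created and cleanup removes it.
func resolveCacheDir(cachePath string) (dir string, cleanup func(), err error) {
	if cachePath != "" {
		if err := os.MkdirAll(cachePath, 0o755); err != nil {
			return "", nil, fmt.Errorf("creating cache dir: %w", err)
		}
		return cachePath, func() {}, nil // persistent: nothing to remove
	}
	tmp, err := os.MkdirTemp("", "preflight-*")
	if err != nil {
		return "", nil, fmt.Errorf("creating temp dir: %w", err)
	}
	dir = filepath.Join(tmp, "cache")
	if err := os.Mkdir(dir, 0o755); err != nil {
		os.RemoveAll(tmp)
		return "", nil, fmt.Errorf("creating cache dir: %w", err)
	}
	return dir, func() { os.RemoveAll(tmp) }, nil
}

// exists reports whether path is present on disk.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	// PREFLIGHT_CACHE_PATH is a made-up variable used only for this demo.
	dir, cleanup, err := resolveCacheDir(os.Getenv("PREFLIGHT_CACHE_PATH"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer cleanup()
	fmt.Println("cache directory:", dir, "exists:", exists(dir))
}
```

With a persistent directory, repeated runs on the same host could reuse previously pulled layers; the trade-off is the extra local storage.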

Describe alternatives you've considered.

Based on our manual test results with crane, implementing a cachePath option appears to be the most effective way to reduce preflight scan time.

Additional context.

Testing crane manually by pulling an image with the cache path option (-c):

Before the cache was populated:

$ date;./crane pull quay.certeam.bos2.lab/certeam/cnf/global-amf-ava:v1 ava.tar.gz -c /tmp/test/ ;date
Wed May  8 10:30:28 AM CDT 2024
2024/05/08 10:30:34 Layer sha256:3b7adf049118244599c2f433c32bb40ea46462b457d9ca01ab066462c5f38561 not found (compressed) in cache, getting
2024/05/08 10:30:52 Layer sha256:54c03ed0e49a6b4eb15976084eccc4d6058807c2b91e8fce72c2bcdc3abdea8c not found (compressed) in cache, getting
2024/05/08 10:30:52 Layer sha256:7d4a074f26673168cfd1447dfc73d29013d5e365be465d3e4b2bc69d4b8fc671 not found (compressed) in cache, getting
2024/05/08 10:30:57 Layer sha256:38e6ae2262e6e78a88312873e8c2365251d68ca71e919ba6a93e096976a2eaa4 not found (compressed) in cache, getting
2024/05/08 10:31:34 Layer sha256:16debf60051c86dd6138fc84d811114be0962dcc6081b8261c537e4be6d1464f not found (compressed) in cache, getting
2024/05/08 10:31:49 Layer sha256:2aa791b22e6974cc8a5028c3afd23f09bd2b8dce22b045197c1e8305e4302994 not found (compressed) in cache, getting
2024/05/08 10:32:02 Layer sha256:988097975b456d72f642ffa6d741382a0a778ebba72794d9024881762b3881e3 not found (compressed) in cache, getting
2024/05/08 10:32:02 Layer sha256:eb4bfe7ef1209b47703ae511d2cebd6fd79cfac047ed2e9cb7b5836c03af1fde not found (compressed) in cache, getting
2024/05/08 10:32:11 Layer sha256:5d35394d704805ccd9e22f929a083dee895cba3fa43403f1f68be4cc04d188bc not found (compressed) in cache, getting
2024/05/08 10:32:11 Layer sha256:5398b65826fe66fcb42a2ba2e96de055339718ec2f1c9172773d4e973caca5a7 not found (compressed) in cache, getting
2024/05/08 10:32:11 Layer sha256:8964f5fb1afa75e196f6adf7943183a3b9fdd2280d41229bb497d2a8374f5a46 not found (compressed) in cache, getting
Wed May  8 10:32:33 AM CDT 2024

Total: 125s

Using the pre-cached layers:

$ date;./crane pull quay.certeam.bos2.lab/certeam/cnf/global-amf-ava:v1 ava.tar.gz -c /tmp/test/ ;date
Wed May  8 10:33:41 AM CDT 2024
2024/05/08 10:33:43 Layer sha256:3b7adf049118244599c2f433c32bb40ea46462b457d9ca01ab066462c5f38561 found (compressed) in cache
2024/05/08 10:33:44 Layer sha256:54c03ed0e49a6b4eb15976084eccc4d6058807c2b91e8fce72c2bcdc3abdea8c found (compressed) in cache
2024/05/08 10:33:44 Layer sha256:7d4a074f26673168cfd1447dfc73d29013d5e365be465d3e4b2bc69d4b8fc671 found (compressed) in cache
2024/05/08 10:33:48 Layer sha256:38e6ae2262e6e78a88312873e8c2365251d68ca71e919ba6a93e096976a2eaa4 found (compressed) in cache
2024/05/08 10:33:49 Layer sha256:16debf60051c86dd6138fc84d811114be0962dcc6081b8261c537e4be6d1464f found (compressed) in cache
2024/05/08 10:33:51 Layer sha256:2aa791b22e6974cc8a5028c3afd23f09bd2b8dce22b045197c1e8305e4302994 found (compressed) in cache
2024/05/08 10:33:51 Layer sha256:988097975b456d72f642ffa6d741382a0a778ebba72794d9024881762b3881e3 found (compressed) in cache
2024/05/08 10:33:52 Layer sha256:eb4bfe7ef1209b47703ae511d2cebd6fd79cfac047ed2e9cb7b5836c03af1fde found (compressed) in cache
2024/05/08 10:33:52 Layer sha256:5d35394d704805ccd9e22f929a083dee895cba3fa43403f1f68be4cc04d188bc found (compressed) in cache
2024/05/08 10:33:52 Layer sha256:5398b65826fe66fcb42a2ba2e96de055339718ec2f1c9172773d4e973caca5a7 found (compressed) in cache
2024/05/08 10:33:54 Layer sha256:8964f5fb1afa75e196f6adf7943183a3b9fdd2280d41229bb497d2a8374f5a46 found (compressed) in cache
Wed May  8 10:33:54 AM CDT 2024

Total: 13s

As you can see, the cache saved almost 2 minutes.
We understand that pre-caching images improves scan times but requires additional local storage.
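
The totals can be checked from the `date` stamps printed before and after each run. The sketch below parses those stamps and subtracts them; `elapsedSeconds` and `dateLayout` are names introduced here for illustration only.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// dateLayout matches the date(1) output in the logs above,
// e.g. "Wed May  8 10:30:28 AM CDT 2024".
const dateLayout = "Mon Jan _2 03:04:05 PM MST 2006"

// elapsedSeconds returns the whole seconds between two date(1) stamps.
func elapsedSeconds(start, end string) (int, error) {
	t0, err := time.Parse(dateLayout, start)
	if err != nil {
		return 0, fmt.Errorf("parsing start: %w", err)
	}
	t1, err := time.Parse(dateLayout, end)
	if err != nil {
		return 0, fmt.Errorf("parsing end: %w", err)
	}
	return int(t1.Sub(t0).Seconds()), nil
}

func main() {
	// Stamps copied from the two crane runs above.
	cold, err := elapsedSeconds("Wed May  8 10:30:28 AM CDT 2024", "Wed May  8 10:32:33 AM CDT 2024")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	warm, err := elapsedSeconds("Wed May  8 10:33:41 AM CDT 2024", "Wed May  8 10:33:54 AM CDT 2024")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// prints "cold: 125s, warm: 13s, saved: 112s"
	fmt.Printf("cold: %ds, warm: %ds, saved: %ds\n", cold, warm, cold-warm)
}
```

This covers a single image on one host; how the saving scales across a full set of images was not measured here.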

@ansvu ansvu added the kind/feature Categorizes issue or PR as related to a new feature. label May 9, 2024
@acornett21
Contributor

@ansvu Why are images being tested more than once? Preflight only pulls the images it's asked to pull. I'm not sure what you mean by a cache here, since it would be for a single image, which again should only be tested once.

@wying3

wying3 commented May 9, 2024

Our current certification process is separated into 2 steps:

1. A pre-check of the image, without submission, to identify issues to fix.
2. The formal certification run, which includes preflight testing, submission to the project, and publication.

During the pre-check phase there will be multiple Preflight runs. If we can pre-cache the image in local storage, then each time we pull a corrected image from the repo, only the corrected portion/layer is updated instead of the whole image being pulled again. In step 2 we still need a final run to submit a good report to the portal; this would use only the last good tested pre-cached image without pulling from the repo again. This is our use case.

@acornett21
Contributor

I don't think I'd be willing to trust a cache on disk, since someone could manipulate the files in the cache on disk to circumvent the testing/certification. This is one of many reasons why we require a remote registry during submission, and not a local registry.

@wying3

wying3 commented May 9, 2024

Understood, but my use case is for a specific environment where the jumphost that runs the tests is located in a lab behind a firewall, or requires authentication (e.g., via VPN) to access the cluster and repository, so security is in place. This is the case for most telco partners running the certification tool. Or are you talking about someone changing the cache files to make the test pass? Regarding the remote registry, it is common for many partners to use a local registry instead of a repo outside the company. We have to accommodate this on-premise environment for certification testing.

@acornett21
Contributor

Or are you talking about someone changing the cache files to make the test pass?

Yes, this is exactly what I am talking about, and, as mentioned, it is why we do not trust a local registry. We do not have to adapt this to support on-prem if it opens things up to further manipulation vectors.
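
For readers following the tampering concern, here is a rough sketch of one integrity check that could be run over a cache directory. It assumes, as the `sha256:...` file names in the listing above suggest, that each cache file is named after the SHA-256 of its own contents; that is an assumption about the cache layout, not something confirmed in this thread. The helper names (`hashReader`, `verifyEntry`, `digestMatches`, `verifyCacheDir`) are made up for illustration. A check like this only detects files whose contents no longer match their names; whether that is sufficient for certification remains a policy question for the maintainers.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// hashReader returns the hex-encoded SHA-256 of everything read from r.
func hashReader(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// verifyEntry checks that name has the form "sha256:<hex>" and that <hex>
// equals the SHA-256 of the content read from r.
func verifyEntry(name string, r io.Reader) error {
	if !strings.HasPrefix(name, "sha256:") {
		return fmt.Errorf("%s: not a sha256-named entry", name)
	}
	want := strings.TrimPrefix(name, "sha256:")
	got, err := hashReader(r)
	if err != nil {
		return fmt.Errorf("%s: %w", name, err)
	}
	if got != want {
		return fmt.Errorf("%s: content hashes to sha256:%s", name, got)
	}
	return nil
}

// digestMatches is a convenience wrapper for in-memory content.
func digestMatches(name string, content []byte) bool {
	return verifyEntry(name, bytes.NewReader(content)) == nil
}

// verifyCacheDir checks every regular file in dir and returns the names
// of entries that failed verification.
func verifyCacheDir(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var bad []string
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue
		}
		f, err := os.Open(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		err = verifyEntry(e.Name(), f) // streams the file; large layers are not loaded into memory
		f.Close()
		if err != nil {
			bad = append(bad, e.Name())
		}
	}
	return bad, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: verifycache <cache-dir>")
		os.Exit(2)
	}
	bad, err := verifyCacheDir(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d entries failed verification: %v\n", len(bad), bad)
}
```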
