Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

render: improve olm.bundle.object rendering for bundles #1094

Conversation

joelanford
Copy link
Member

@joelanford joelanford commented May 16, 2023

This pull request is a series of commits that progressively improves FBC parsing. Commits are:

  1. Render olm.bundle.object just for CSVs with opm render <bundleImage>
  2. Render olm.bundle.object just for CSVs with opm render <sqliteImageOrFile> (but only if the bundle has an image reference in the database).
  3. Use $numCPUs goroutines to parse files in an FBC directory concurrently.
  4. Introduce an olm.csv.metadata property, and use it in place of the olm.bundle.object properties in the scenarios covered in commits 1 and 2. When serving the GRPC API, this property is used to generate a synthetic CSV to be served.
  5. Fix a bug in Meta object unmarshaling that was made apparent when testing commit 4 on real-world catalogs

A few notes:

  • Commits 1 and 2 are somewhat overtaken by commit 4 (no code path generates full olm.bundle.object properties for just CSVs after commit 4).
  • Only commit 3 will realize gains for existing FBCs. Commits 1, 2, and 4 require re-rendering or re-migrating from bundles/sqlite catalogs (or manually pruning/converting olm.bundle.object properties)
  • This PR does not attempt to proactively remove objects from non-channel-heads. Despite the inability for existing OLMv0 components to make use of this metadata, it is nonetheless conceivably valuable for it to remain in catalogs to be readable by future versions of OLM.
  • Commit 4's functionality is only activated if a bundle both:
    • has no CSV in any defined olm.bundle.object properties
    • has exactly 1 olm.csv.metadata property.
  • Commit 4 injects the icon defined in olm.package into synthetic CSVs, if defined in the package.
  • Commit 4 injects the description defined in olm.package into synthetic CSVs, if not defined in the olm.csv.metadata object.

Open Questions:

  • How much of the olm.csv.metadata object belongs in the bundle vs. the package (e.g. display name, keywords, maintainers, provider)? In this PR, I'm hesitant to add new root level fields or package properties, but maybe that's an unfounded fear. It would be a further optimization because catalogs wouldn't have to repeat those same things over and over again in each bundle.
  • Should we update the render and migrate commands to do any conversion of olm.bundle.object to olm.csv.metadata for existing FBCs? It seems like this would be pretty useful, so I've got another commit in a separate branch that would do this. I can include it in this PR or make a separate PR for it after we finalize this PR. Thoughts?
  • This does PR does break an invariant we had originally tried to keep. That is "an sqlite db and its migrated FBC equivalent will serve the same GRPC API". Now that opm migrate intentionally loses fidelity, that invariant no longer holds. Are we okay with this? IMO, I think this is okay. SQLite serving has been deprecated for several years, so I think it is okay to start diverging. Perhaps we even consider dropping the SQLite server prior to the SQLite-based catalog building commands?

/hold
There's a bunch here to discuss, evaluate, and manually test. I updated our tests as best I could, and I think I've covered old code paths + new code paths. But this is a pretty significant change, so it would be good to get some detail-oriented eyes on this.

Signed-off-by: Joe Lanford joe.lanford@gmail.com

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /docs
  • Commit messages sensible and descriptive

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 16, 2023
@openshift-ci openshift-ci bot requested review from njhale and theishshah May 16, 2023 04:23
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 16, 2023
@joelanford joelanford force-pushed the optimize-olm-bundle-objects branch 2 times, most recently from d59200a to 0cb93c9 Compare May 16, 2023 12:57
@joelanford
Copy link
Member Author

Looks like a deadlock I introduced in the LoadFS concurrency commit: https://github.com/operator-framework/operator-registry/actions/runs/4992141512/jobs/8940057783?pr=1094#step:4:172

Investigating now.

@joelanford joelanford force-pushed the optimize-olm-bundle-objects branch from 0cb93c9 to 5f2e380 Compare May 16, 2023 14:56
@codecov
Copy link

codecov bot commented May 16, 2023

Codecov Report

Merging #1094 (825b5f7) into master (7629c6f) will decrease coverage by 0.55%.
The diff coverage is 47.68%.

❗ Current head 825b5f7 differs from pull request most recent head 9c782fc. Consider uploading reports for the commit 9c782fc to get more accurate results

@@            Coverage Diff             @@
##           master    #1094      +/-   ##
==========================================
- Coverage   52.93%   52.38%   -0.55%     
==========================================
  Files         107      107              
  Lines        9527     9688     +161     
==========================================
+ Hits         5043     5075      +32     
- Misses       3548     3671     +123     
- Partials      936      942       +6     
Impacted Files Coverage Δ
alpha/declcfg/write.go 67.74% <0.00%> (-7.65%) ⬇️
alpha/property/property.go 79.16% <0.00%> (-13.52%) ⬇️
pkg/api/model_to_api.go 48.33% <5.45%> (-36.75%) ⬇️
pkg/registry/registry_to_model.go 58.33% <25.00%> (-4.01%) ⬇️
pkg/api/api_to_model.go 67.81% <66.66%> (-1.70%) ⬇️
alpha/declcfg/declcfg.go 46.87% <77.77%> (-33.62%) ⬇️
alpha/declcfg/load.go 76.62% <96.36%> (+1.62%) ⬆️
alpha/action/migrate.go 73.33% <100.00%> (+11.17%) ⬆️
alpha/action/render.go 65.02% <100.00%> (+0.88%) ⬆️
alpha/property/scheme.go 100.00% <100.00%> (ø)

@joelanford
Copy link
Member Author

Alright, I think the concurrency deadlock issue is fixed up!

Copy link
Contributor

@everettraven everettraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks reasonable to me, just have a couple questions

alpha/declcfg/declcfg.go Outdated Show resolved Hide resolved
alpha/template/semver/README.md Outdated Show resolved Hide resolved
@joelanford joelanford force-pushed the optimize-olm-bundle-objects branch 4 times, most recently from 2cacf8e to 94eeb61 Compare May 18, 2023 13:58
@joelanford
Copy link
Member Author

I've uncovered and fixed a few more issues:

  • The concurrency logic needed one more fix to prevent deadlocks. Now channels reads and writes can be preempted by the context being canceled
  • In testing an FBC that uses olm.csv.metadata, I found that the csv.metadata.labels needed to be preserved.

With these changes, I:

  • Pulled down the latest operatorhub catalog and migrated it to use olm.csv.metadata instead of olm.bundle.object properties, rebuilt it with an opm binary from this branch, and pushed to a personal quay repo.
  • Created a kind cluster and installed OLM using operator-sdk olm install (this includes a catalog source for the real operatorhubio catalog)
  • Created a catalog source pointing to my personal rebuilt operatorhub catalog
  • Queried for all packagemanifests (once for the real catalog, once for my rebuilt catalog)
  • Filtered out spec.metadata.creationTimestamp and spec.metadata.labels[catalog*] which are not expected to be the same
  • Performed a diff and noticed that the packagemanifests output for the rebuilt catalog contained an extra relatedImage for the bundle image
  • Filtered out status.channels[].currentCSVDesc.relatedImages, performed another diff, which turned nothing up.

The last test I want to do is to put the opm binary from this branch into the existing operatorhubio catalog without changing any other contents, and verify that that image produces the same packagemanifests as the original. Once I verify that, I'll unhold.

@joelanford
Copy link
Member Author

Ok I was able to confirm that the packagemanifest output of the existing operatorhub catalog using the opm binary from this branch is identical. So I think this is ready to go from a back-compat standpoint.

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 18, 2023
Copy link
Member

@stevekuznetsov stevekuznetsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON bits continue to break my brain but the parallel stuff looks mostly good!

alpha/declcfg/load.go Show resolved Hide resolved
alpha/declcfg/load.go Show resolved Hide resolved
alpha/declcfg/load.go Outdated Show resolved Hide resolved
alpha/declcfg/load.go Show resolved Hide resolved
alpha/declcfg/load.go Show resolved Hide resolved
@stevekuznetsov
Copy link
Member

A Go benchmark would be sweet to capture this improvement.

@joelanford
Copy link
Member Author

JSON bits continue to break my brain

I think I understand it much better now that I know that sigs.k8s.io/yaml.Unmarshal into json.RawMessage does this: kubernetes-sigs/yaml#94

@joelanford joelanford force-pushed the optimize-olm-bundle-objects branch from 94eeb61 to 476fb40 Compare May 20, 2023 13:17
When rendering individual bundles, only generate olm.bundle.object
properties for the CSV if there is an image reference for the bundle.

Signed-off-by: Joe Lanford <joe.lanford@gmail.com>
When rendering sqlite-based catalogs, only generate olm.bundle.object
properties for the CSV if there is an image reference for the bundle.

Signed-off-by: Joe Lanford <joe.lanford@gmail.com>
@joelanford joelanford force-pushed the optimize-olm-bundle-objects branch from 476fb40 to fc25695 Compare May 25, 2023 01:39
@joelanford
Copy link
Member Author

I added a go benchmark in 4cfabfe, just before the changes that added concurrency. Then I:

  • ran the benchmark on 4cfabfe
  • ran the same exact benchmark on aef5a6b – the commit that added concurrency
  • updated the benchmark to use olm.csv.metadata properties instead of olm.bundle.object properties in fc25695, and then ran it.

benchstat results below:

goos: darwin
goarch: arm64
pkg: github.com/operator-framework/operator-registry/alpha/declcfg
          │ 00-4cfabfe.txt │           01-aef5a6b.txt            │           02-fc25695.txt           │
          │     sec/op     │    sec/op     vs base               │   sec/op     vs base               │
LoadFS-10    10667.6m ± 0%   1355.1m ± 1%  -87.30% (p=0.002 n=6)   109.4m ± 1%  -98.97% (p=0.002 n=6)

TL;DR

  • Just concurrency gets us ~8x faster on my 8-core machine
  • Using olm.csv.metadata and dropping olm.bundle.object gets another ~12x faster using the generated data
  • So overall improvement with generated data on my machine is ~97x faster

Signed-off-by: Joe Lanford <joe.lanford@gmail.com>
Signed-off-by: Joe Lanford <joe.lanford@gmail.com>
1. When rendering sqlite DBs and bundle images, generate an
   "olm.csv.metadata" property instead of a full CSV (so long as
   there is a bundle image reference associated with the corresponding
   bundle)
2. When serving the GRPC interface and a full CSV is not present in an
   "olm.bundle.object" property, generate a CSV from (a) the
   "olm.csv.metadata" property. Also include the bundle's related
   images, and the package's icon, if defined. If there is no
   description in the CSV metadata, also include the package's
   description in the generated CSV.

Signed-off-by: Joe Lanford <joe.lanford@gmail.com>
return err
}
for key, ptr := range map[string]*string{"schema": &m.Schema, "package": &m.Package, "name": &m.Name} {
if _, ok := blobMap[key]; !ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to consider doing a Fold on key before doing the existence check, here and pre-L121?

import (
  "golang.org/x/text/cases"
)

foldKey := cases.Fold()
if _, ok := blobMap[foldKey]; !ok {
...

That way at least we'll prevent simultaneous schema and Schema keys in the blobs.

@joelanford joelanford force-pushed the optimize-olm-bundle-objects branch 2 times, most recently from 6d00945 to f74583f Compare June 6, 2023 13:34
@joelanford
Copy link
Member Author

Blergh - we're still using an old version of Go that doesn't have errors.Join

It turns out that straight byte-based replacements of unicode escape
characters back to their ascii representations is invalid if the unicode
escape character itself is escaped (e.g. "\u003c" => "\\u003c" => "\<").

To solve this, we will instead unmarshal Meta objects to
map[string]interface{}, extract the expected Meta fields from the map,
and then use a JSON encoder with SetEscapeHTML(false) to re-encode the
map back to JSON to be stored in Meta.Blob.

Signed-off-by: Joe Lanford <joe.lanford@gmail.com>
…m.bundle.object properties

Signed-off-by: Joe Lanford <joe.lanford@gmail.com>
@joelanford joelanford force-pushed the optimize-olm-bundle-objects branch from f74583f to 9c782fc Compare June 6, 2023 13:51
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: everettraven, grokspawn, joelanford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [grokspawn,joelanford]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@grokspawn
Copy link
Contributor

I think we have more conversation to explore the case-insensitivity of schema keys (FWIW, folks filed a bug with go/json about it, and they basically said "too late... Hyrum's law"), but I don't want to hold this heavy PR up because of that discussion. We just need to not forgot about it.

Waiting for the CI to finish to push in.

@grokspawn
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 6, 2023
@openshift-merge-robot openshift-merge-robot merged commit 2ee231b into operator-framework:master Jun 6, 2023
9 of 10 checks passed
@AdrianVasiliu
Copy link

AdrianVasiliu commented Jun 29, 2023

@joelanford Hello, for some reason, for the catalog of our operator, upgrading operator-registry from 1.27.1 to 1.28.0 and testing in OpenShift 4.12.7, I observe the following:

  • The CatalogSource passes in Ready status as expected.
  • But the operator doesn't show up in OperatorHub anymore.

Can this be a side-effect of the changes brought by this PR? We do observe (with opm render) large changes consistent with those involved here - just that we weren't expecting it would break our catalog.
The notes in https://github.com/operator-framework/operator-registry/releases/tag/v1.28.0 don't seem to mention breaking changes.
All in one:

  • Is it a bug, or is it an expected breaking change?
  • If so, what adjustments should we bring to our catalog/bundle build to make it work with operator-registry 1.28.0?

@joelanford
Copy link
Member Author

joelanford commented Jun 29, 2023

The changes require the opm server to be using 1.28 as well. Can you confirm the version of opm used in the catalog image to serve the new FBC?

@joelanford joelanford deleted the optimize-olm-bundle-objects branch June 30, 2023 00:28
@AdrianVasiliu
Copy link

@joelanford Thanks a lot for the feedback.
Our catalog image gets the following opm version from registry.redhat.io/openshift4/ose-operator-registry:v4.13:

$ opm version 
Version: version.Version{OpmVersion:"ce46f5b97", GitCommit:"ce46f5b97e4838c4508f6dc471cabc1199a8f0b9", BuildDate:"2023-06-07T20:53:52Z", GoOs:"linux", GoArch:"amd64"}

So build date June 7, but on https://github.com/openshift/operator-framework-olm/tree/master/staging/operator-registry. , while https://github.com/operator-framework/operator-registry/releases/tag/v1.28.0 is from June 9 and the code base is different.

I don't know what is the policy of keeping both in sync. Anyway, we'll use your 1.28 in our catalog too, and report back the testing results.

@AdrianVasiliu
Copy link

@joelanford I confirm using operator-registry 1.28.0 in the catalog image and at our build time does the trick! All good. Thanks a lot!

@joelanford
Copy link
Member Author

Awesome! Thanks for confirming!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants