Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve resolution efficiency #418

Closed
everettraven opened this issue Sep 19, 2023 · 6 comments · Fixed by #508
Closed

Improve resolution efficiency #418

everettraven opened this issue Sep 19, 2023 · 6 comments · Fixed by #508
Assignees

Comments

@everettraven
Copy link
Contributor

It was brought up in #411 (comment) that multiple calls are made during resolution that results in bundle information being retrieved and processed multiple times.

We should see if we can optimize this by answering the following questions and making improvements as identified by the answers:

  • Why does resolution make multiple calls to fetch bundle information?
  • Can it just make one, and hold on to the data it needs?
@m1kola
Copy link
Member

m1kola commented Sep 20, 2023

Thanks for creating this ticket.

Why does resolution make multiple calls to fetch bundle information?

Resolution currently relies on client in-memory cache. It is not necessary to fetch the catalog metadata multiple times during single resulotion, but it is necessary to be able to read all the bundles multiple times for filtering porpuses.

Can it just make one, and hold on to the data it needs?

I believe in this slack thread decided to:

  • Make client cache catalog metadata on disk
  • Make client to be able to send conditional request with If-Modified-Since to retrieve new data and update cache only if needed

Then we can modify the controller so tha we instantiate the client1 with empty cache/resets cache before every resolution.

With all this in place we get the following benefits:

  • We hold cache on disks between resolutions, but on every resolution we check whether it is still valid
  • We only make one conditional network call during single resolution. In most of the cases when cache is already there we will receive an empty response indicating valid cache
  • We can read and filter data multiple times during single resolution
  • We do not hold everything in memory while the controller is idling

Note:

  • We still will be holding metadata of all catalogs in memory during single resolution.

Footnotes

  1. Or some other unit which hold the cache, haven't looked into latest changes in Use catalogd HTTP Server instead of CatalogMetadata API #411

@m1kola
Copy link
Member

m1kola commented Sep 21, 2023

Re: If-Modified-Since: it looks like #411 moving into the direction of not making a request at all and checking whether cache was made for a specific catalog image.

@everettraven
Copy link
Contributor Author

Re: If-Modified-Since: it looks like #411 moving into the direction of not making a request at all and checking whether cache was made for a specific catalog image.

Instead of using the If-Modified-Since header, we are checking the resolved source of the Catalog and using that to determine if we need to refresh the cache. If the Catalog has not yet been cached or we need to refresh the cache, we send a request for the catalog contents

@stevekuznetsov
Copy link
Member

Something I've encountered in the operator-lifecycle-manager codebase that I think applies here as well - we need to ensure that the input to resolution is static. Today in the v0 codebase the resolver reads cluster state (via listers) and catalog state (via the cache) quite a large number of times during one resolution.

The resolver does not control who mutates cluster state, and when - if any object on the cluster changes while the resolver is running, or if a new catalog is loaded, etc, subsequent calls to fetch data during a resolution run will cause an incoherent view of the world and likely always end up with a failure to satisfy constraints or similar nonsense.

The operator-controller must snapshot the world once and pass that data - statically - to the resolver. It's still possible that if e.g. two listers are used in tandem to snapshot the world, they will be out-of-sync with each other, but the general pattern for controllers is to re-queue objects when the input set changes. Therefore, a well-behaved controller tolerates temporary lister desync issues by eventually converging to the correct end state. It's important not to treat resolution errors as terminal logically - if the set of catalogs, Operators, or BundleDeployments changes, the outcome of resolution will, too.

@m1kola
Copy link
Member

m1kola commented Nov 2, 2023

I think this will be resolved by #501

@m1kola
Copy link
Member

m1kola commented Nov 3, 2023

I decided to split this out of #501 and created #508.

I will either need 4 parts of variable source refactoring to get merged or something similar to #508 so I can progress with #449

@ncdc ncdc closed this as completed in #508 Nov 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants