Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] optimizing package policy upgrade and dry run #126088

Merged
merged 12 commits into from
Feb 23, 2022

Conversation

juliaElastic
Copy link
Contributor

@juliaElastic juliaElastic commented Feb 21, 2022

Summary

Optimizing package policy upgrade and dry run based on comments here.

Original suggestion from @joshdover:

This getUpgradeDryRunDiff function may have some opportunities for other improvements. It currently makes these queries:

Within getUpgradePackagePolicyInfo

  • It fetches the package policy again (we already have it fetched above)
  • It fetches getInstallation again (we already have it fetched above)
  • It fetches getPackageInfo - this has many of the same fields that Installation does
  • It fetches the package from the registry Registry.fetchInfo - the data needed here could be compiled from the package info I believe. Related to [Fleet] Rely on the content of a package when installing it #115032

The actual improvements made:

Within getUpgradeDryRunDiff and upgrade:

  • Passing optional packagePolicy, so the already fetched object can be reused when called from setup.
  • Passing optional pkgVersion, so that getInstallation can be skipped as well when called from setup
  • _compilePackagePolicyInputs and _compilePackageStreams didn't use installablePackage parameter, so removed it

Open questions:

  • Installation doesn't seem to have the policy_template of the package, so I had to keep getPackageInfo. Am I missing something? EDIT: confirmed that getPackageInfo is still needed for now
  • It is not clear whether pkgInfo.elasticsearch has the same value as installablePackage.elasticsearch, can someone confirm? EDIT: confirmed that they have the same value by testing

@juliaElastic juliaElastic added release_note:skip Skip the PR/issue when compiling release notes v8.2.0 labels Feb 21, 2022
@juliaElastic juliaElastic requested a review from a team as a code owner February 21, 2022 14:20
@juliaElastic juliaElastic self-assigned this Feb 21, 2022
@botelastic botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label Feb 21, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@joshdover
Copy link
Contributor

joshdover commented Feb 22, 2022

Installation doesn't seem to have the policy_template of the package, so I had to keep getPackageInfo. Am I missing something?

You're correct, I don't think the policy_template is available on the Installation object directly. I think it's only available via the manifest.yml file from the epm-package-assets objects for the package, so we'd have to reconstruct it from there. Probably makes sense to hold off on this until we tackle #115032 with across the entire project.

@kpollich is this a scenario where lack of registry connectivity would cause Fleet setup to fail?

@kpollich
Copy link
Member

Installation doesn't seem to have the policy_template of the package, so I had to keep getPackageInfo. Am I missing something?

You're correct, I don't think the policy_template is available on the Installation object directly. I think it's only available via the manifest.yml file from the epm-package-assets objects for the package, so we'd have to reconstruct it from there. Probably makes sense to hold off on this until we tackle #115032 with across the entire project.

@kpollich is this a scenario where lack of registry connectivity would cause Fleet setup to fail?

We consider failures to upgrade policies non-fatal here:

const nonFatalErrors = [...preconfiguredPackagesNonFatalErrors, ...packagePolicyUpgradeErrors];

So, an error from the registry connection here shouldn't cause the overall setup process to fail.

if (!packagePolicy) {
({ packagePolicy, packageInfo } = await this.getUpgradePackagePolicyInfo(soClient, id));
} else if (!packageInfo) {
packageInfo = await getPackageInfo({
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing I noticed that getPackageInfo is called in a for loop for each package policy id, this could be optimized to query once for each package only

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getPackageInfo should resolve from Fleet's in-memory cache on subsequent runs, so we might be duplicating memoization efforts here. But, it's worth looking into for sure.

Copy link
Contributor Author

@juliaElastic juliaElastic Feb 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I measured this earlier locally, and it seemed that getPackageInfo takes about 100ms for the first time, and immediate subsequent calls take about 50ms. This still seems much if someone has 10-20 integration policies.
Can you point me to the code where this in-memory caching is done?

EDIT: found the cache here:

const packageInfoCache: Map<SharedKeyString, ArchivePackage | RegistryPackage> = new Map();
const sharedKey = ({ name, version }: SharedKey) => `${name}-${version}`;
export const getPackageInfo = (args: SharedKey) => {
return packageInfoCache.get(sharedKey(args));
};

still it looks strange that it takes that much time for the same package
e.g. just with a few integration policies (e.g. apache), toggling between Integration policies and Settings tab on UI triggers a few calls of getPackageInfo:

[2022-02-22T17:05:59.746+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from ES
apache_1.3.4 not found in cache, took: 107 ms
[2022-02-22T17:05:59.817+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from cache
apache_1.3.4 not found in cache, took: 56 ms
[2022-02-22T17:06:02.220+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from cache
apache_1.3.4 not found in cache, took: 481 ms
[2022-02-22T17:06:02.308+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from cache
apache_1.3.4 not found in cache, took: 60 ms
[2022-02-22T17:06:04.146+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from cache
apache_1.3.4 not found in cache, took: 52 ms
[2022-02-22T17:06:04.214+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from cache
apache_1.3.4 not found in cache, took: 58 ms
[2022-02-22T17:06:06.078+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from cache
apache_1.3.4 not found in cache, took: 171 ms
[2022-02-22T17:06:06.147+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from cache
apache_1.3.4 not found in cache, took: 60 ms

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getPackageFromSource will resolve the package from cache if possible:

const getPackageRes = await getPackageFromSource({
pkgName,
pkgVersion: responsePkgVersion,
savedObjectsClient,
installedPkg: savedObject?.attributes,
});

if (installedPkg && installedPkg.version === pkgVersion) {
const { install_source: pkgInstallSource } = installedPkg;
// check cache
res = getArchivePackage({
name: pkgName,
version: pkgVersion,
});
if (res) {
logger.debug(`retrieved installed package ${pkgName}-${pkgVersion} from cache`);
}

But, we don't cache the installation object or latest package record in getPackageInfo, so that's likely where the added time is coming from

const [savedObject, latestPackage] = await Promise.all([
getInstallationObject({ savedObjectsClient, pkgName }),
Registry.fetchFindLatestPackageOrUndefined(pkgName),
]);

Copy link
Contributor Author

@juliaElastic juliaElastic Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it seems like Registry.fetchFindLatestPackageOrUndefined(pkgName) is the culprit, we are calling this each time even when the packageInfo object is cached, to check whether there is a new version of the package in the registry.
I think this is an overkill, perhaps the cache could be made smarter to avoid calling the registry for some time e.g. 1 day, since I think it's a rare event to find new versions published. I could open another issue for this if the team agrees.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree there. I think caching the actual registry responses is a good idea. We do have this issue to track potentially honoring various cache-control headers on responses from EPR, but I don't know that that applies to server-side requests: #125794

I do think determining a caching strategy for our "latest package" requests is out of scope for this PR, though, so we don't need to come up with a solution in order to land this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, cache-control definitely helps with caching Fleet API calls.
As for server side, I think that depends on node fetch capabilities, which are different than in browser.

Copy link
Member

@kpollich kpollich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks for your work on improving this! 🚀

@juliaElastic juliaElastic added the backport:skip This commit does not require backporting label Feb 23, 2022
@juliaElastic
Copy link
Contributor Author

@elasticmachine merge upstream

@juliaElastic juliaElastic enabled auto-merge (squash) February 23, 2022 13:53
@juliaElastic
Copy link
Contributor Author

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
fleet 1150 1228 +78

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
fleet 10 9 -1
Unknown metric groups

API count

id before after diff
fleet 1266 1344 +78

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @juliaElastic

@juliaElastic juliaElastic merged commit f96af16 into elastic:main Feb 23, 2022
@juliaElastic juliaElastic deleted the dry-run-refactor branch February 23, 2022 17:06
lucasfcosta pushed a commit to lucasfcosta/kibana that referenced this pull request Mar 2, 2022
* optimizing package policy upgrade and dry run

* optimizing package policy upgrade and dry run

* optimizing package policy upgrade and dry run

* missing params

* fixed tests

* removed unused import

* fixed tests

* fixed checks

* reduced duplication, separated validate logic

* reduced duplication, separated validate logic

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@jen-huang jen-huang added auto-backport Deprecated - use backport:version if exact versions are needed v8.1.1 and removed backport:skip This commit does not require backporting labels Mar 16, 2022
@jen-huang
Copy link
Contributor

jen-huang commented Mar 16, 2022

[kicking off a backport into 8.1 in an effort to make backporting #127673 cleaner]

Nevermind, I misread the conflicts 😅

@jen-huang jen-huang added backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes and removed release_note:skip Skip the PR/issue when compiling release notes auto-backport Deprecated - use backport:version if exact versions are needed v8.1.1 labels Mar 16, 2022
@kibanamachine
Copy link
Contributor

⚪ Backport skipped

The pull request was not backported as there were no branches to backport to. If this is a mistake, please apply the desired version labels or run the backport tool manually.

Manual backport

To create the backport manually run:

node scripts/backport --pr 126088

Questions ?

Please refer to the Backport tool documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants