[Fleet] optimizing package policy upgrade and dry run #126088
Conversation
Pinging @elastic/fleet (Team:Fleet)
You're correct, I don't think the `policy_template` is available on the Installation object directly. I think it's only available via the full package info from `getPackageInfo`.

@kpollich is this a scenario where lack of registry connectivity would cause Fleet setup to fail?
We consider failures to upgrade policies non-fatal here:
So, an error from the registry connection here shouldn't cause the overall setup process to fail.
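To illustrate the non-fatal handling being described, here is a minimal sketch of the pattern; the function names below are stand-ins, not the actual Fleet setup code:

```ts
// Sketch only: upgrade failures are collected as non-fatal errors so setup can continue.
declare function upgradePackagePolicy(id: string): Promise<void>; // stand-in for the upgrade call

async function upgradePoliciesNonFatally(policyIds: string[]) {
  const nonFatalErrors: Array<{ id: string; error: Error }> = [];
  for (const id of policyIds) {
    try {
      await upgradePackagePolicy(id);
    } catch (error) {
      // A registry or connectivity error for one policy is recorded, not rethrown.
      nonFatalErrors.push({ id, error: error as Error });
    }
  }
  return nonFatalErrors;
}
```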
```ts
if (!packagePolicy) {
  ({ packagePolicy, packageInfo } = await this.getUpgradePackagePolicyInfo(soClient, id));
} else if (!packageInfo) {
  packageInfo = await getPackageInfo({
```
One thing I noticed is that `getPackageInfo` is called in a for loop for each package policy id; this could be optimized to query once per package only.
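A minimal sketch of that dedup, assuming the loop has access to the package policies and Fleet's `getPackageInfo` service (the policy shape and the declared signature below are stand-ins for illustration):

```ts
// Sketch only: fetch package info once per unique package instead of once per policy.
// The declared signature and policy shape are stand-ins, not Fleet's exact types.
declare function getPackageInfo(args: { pkgName: string; pkgVersion: string }): Promise<unknown>;

type PolicyLike = { id: string; package?: { name: string; version: string } };

async function loadPackageInfoPerPackage(policies: PolicyLike[]) {
  const byPackage = new Map<string, unknown>();
  for (const policy of policies) {
    if (!policy.package) continue;
    const key = `${policy.package.name}-${policy.package.version}`;
    if (!byPackage.has(key)) {
      // Only the first policy referencing a given package triggers a lookup.
      byPackage.set(
        key,
        await getPackageInfo({ pkgName: policy.package.name, pkgVersion: policy.package.version })
      );
    }
  }
  return byPackage; // keyed by `${name}-${version}`, reusable for every policy of that package
}
```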
`getPackageInfo` should resolve from Fleet's in-memory cache on subsequent runs, so we might be duplicating memoization efforts here. But it's worth looking into for sure.
I measured this earlier locally, and it seemed that `getPackageInfo` takes about 100 ms the first time, and immediate subsequent calls take about 50 ms. That still adds up if someone has 10-20 integration policies.

Can you point me to the code where this in-memory caching is done?

EDIT: found the cache here:
kibana/x-pack/plugins/fleet/server/services/epm/archive/cache.ts
Lines 40 to 45 in c94d5fd
```ts
const packageInfoCache: Map<SharedKeyString, ArchivePackage | RegistryPackage> = new Map();
const sharedKey = ({ name, version }: SharedKey) => `${name}-${version}`;

export const getPackageInfo = (args: SharedKey) => {
  return packageInfoCache.get(sharedKey(args));
};
```
Still, it looks strange that it takes that much time for the same package. E.g. with just a few integration policies (e.g. apache), toggling between the Integration policies and Settings tabs in the UI triggers several calls of `getPackageInfo`:
[2022-02-22T17:05:59.746+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from ES
apache_1.3.4 not found in cache, took: 107 ms
[2022-02-22T17:05:59.817+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from cache
apache_1.3.4 not found in cache, took: 56 ms
[2022-02-22T17:06:02.220+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from cache
apache_1.3.4 not found in cache, took: 481 ms
[2022-02-22T17:06:02.308+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from cache
apache_1.3.4 not found in cache, took: 60 ms
[2022-02-22T17:06:04.146+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from cache
apache_1.3.4 not found in cache, took: 52 ms
[2022-02-22T17:06:04.214+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from cache
apache_1.3.4 not found in cache, took: 58 ms
[2022-02-22T17:06:06.078+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from cache
apache_1.3.4 not found in cache, took: 171 ms
[2022-02-22T17:06:06.147+01:00][DEBUG][plugins.fleet] retrieved installed package apache-1.3.4 from cache
apache_1.3.4 not found in cache, took: 60 ms
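(The timings above came from ad-hoc local measurement; a generic helper along these lines, illustrative rather than the actual instrumentation, is enough to reproduce them:)

```ts
// Illustrative only: time an async call such as getPackageInfo and log the duration.
async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    console.log(`${label} took: ${Date.now() - start} ms`);
  }
}

// e.g. await timed('apache_1.3.4', () => getPackageInfo(args));
```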
`getPackageFromSource` will resolve the package from cache if possible:
kibana/x-pack/plugins/fleet/server/services/epm/packages/get.ts
Lines 160 to 165 in c94d5fd
```ts
const getPackageRes = await getPackageFromSource({
  pkgName,
  pkgVersion: responsePkgVersion,
  savedObjectsClient,
  installedPkg: savedObject?.attributes,
});
```
kibana/x-pack/plugins/fleet/server/services/epm/packages/get.ts
Lines 237 to 247 in c94d5fd
```ts
if (installedPkg && installedPkg.version === pkgVersion) {
  const { install_source: pkgInstallSource } = installedPkg;
  // check cache
  res = getArchivePackage({
    name: pkgName,
    version: pkgVersion,
  });
  if (res) {
    logger.debug(`retrieved installed package ${pkgName}-${pkgVersion} from cache`);
  }
```
But we don't cache the installation object or latest package record in `getPackageInfo`, so that's likely where the added time is coming from:
kibana/x-pack/plugins/fleet/server/services/epm/packages/get.ts
Lines 147 to 150 in c94d5fd
```ts
const [savedObject, latestPackage] = await Promise.all([
  getInstallationObject({ savedObjectsClient, pkgName }),
  Registry.fetchFindLatestPackageOrUndefined(pkgName),
]);
```
Yes, it seems like `Registry.fetchFindLatestPackageOrUndefined(pkgName)` is the culprit; we are calling this each time, even when the `packageInfo` object is cached, to check whether there is a new version of the package in the registry.

I think this is overkill. Perhaps the cache could be made smarter to avoid calling the registry for some time, e.g. 1 day, since I think it's a rare event to find new versions published. I could open another issue for this if the team agrees.
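A rough sketch of that idea, as a module-level TTL cache in front of the registry's latest-package lookup; the wrapper name, TTL constant, and cache shape are illustrative, not existing Fleet code:

```ts
// Stand-in declaration for Fleet's Registry client; illustration only.
declare const Registry: {
  fetchFindLatestPackageOrUndefined(pkgName: string): Promise<unknown>;
};

// Sketch only: avoid hitting the registry for the "latest package" check on every call.
const LATEST_PACKAGE_TTL_MS = 24 * 60 * 60 * 1000; // e.g. 1 day, as suggested above
const latestPackageCache = new Map<string, { fetchedAt: number; value: unknown }>();

async function fetchLatestPackageCached(pkgName: string): Promise<unknown> {
  const cached = latestPackageCache.get(pkgName);
  if (cached && Date.now() - cached.fetchedAt < LATEST_PACKAGE_TTL_MS) {
    return cached.value; // registry not contacted within the TTL window
  }
  const value = await Registry.fetchFindLatestPackageOrUndefined(pkgName);
  latestPackageCache.set(pkgName, { fetchedAt: Date.now(), value });
  return value;
}
```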
Yeah, I agree there. I think caching the actual registry responses is a good idea. We do have this issue to track potentially honoring various `cache-control` headers on responses from EPR, but I don't know that that applies to server-side requests: #125794

I do think determining a caching strategy for our "latest package" requests is out of scope for this PR, though, so we don't need to come up with a solution in order to land this.
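For context, server-side fetch (e.g. node-fetch) does not apply browser-style HTTP caching automatically, so honoring EPR's `cache-control` would mean reading the header and deriving a TTL manually. A minimal sketch, not existing Fleet code, handling only `max-age`:

```ts
import fetch from 'node-fetch';

// Illustrative only: derive a cache TTL from the response's Cache-Control header,
// since server-side fetch has no built-in HTTP cache to honor it for us.
async function fetchWithMaxAge(url: string): Promise<{ body: unknown; ttlMs: number }> {
  const response = await fetch(url);
  const cacheControl = response.headers.get('cache-control') ?? '';
  const match = /max-age=(\d+)/.exec(cacheControl);
  const ttlMs = match ? Number(match[1]) * 1000 : 0; // 0 = don't cache
  return { body: await response.json(), ttlMs };
}
```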
Good point, `cache-control` definitely helps with caching Fleet API calls. As for the server side, I think that depends on node-fetch's capabilities, which are different from the browser's.
LGTM - thanks for your work on improving this! 🚀
@elasticmachine merge upstream

@elasticmachine merge upstream
💚 Build Succeeded
* optimizing package policy upgrade and dry run
* optimizing package policy upgrade and dry run
* optimizing package policy upgrade and dry run
* missing params
* fixed tests
* removed unused import
* fixed tests
* fixed checks
* reduced duplication, separated validate logic
* reduced duplication, separated validate logic

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Nevermind, I misread the conflicts 😅
⚪ Backport skipped

The pull request was not backported as there were no branches to backport to. If this is a mistake, please apply the desired version labels or run the backport tool manually.

Manual backport
To create the backport manually run:

Questions? Please refer to the Backport tool documentation.
Summary
Optimizing package policy upgrade and dry run based on comments here.
Original suggestion from @joshdover:
The actual improvements made:
- Within `getUpgradeDryRunDiff` and `upgrade`:
  - an optional `packagePolicy` parameter, so the already fetched object can be reused when called from setup.
  - an optional `pkgVersion` parameter, so that `getInstallation` can be skipped as well when called from setup.
- `_compilePackagePolicyInputs` and `_compilePackageStreams` didn't use the `installablePackage` parameter, so it was removed.

(A rough sketch of this parameter-reuse pattern follows at the end of this description.)

Open questions:
- `Installation` doesn't seem to have the `policy_template` of the package, so I had to keep `getPackageInfo`. Am I missing something? EDIT: confirmed that `getPackageInfo` is still needed for now.
- `pkgInfo.elasticsearch` has the same value as `installablePackage.elasticsearch`, can someone confirm? EDIT: confirmed that they have the same value by testing.
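A minimal sketch of that parameter-reuse pattern, with stand-in types and loaders rather than Fleet's real ones (only the shape of the idea is taken from this PR):

```ts
// Sketch only: when the caller (e.g. Fleet setup) already has the policy and target version,
// it passes them in and the expensive lookups are skipped; other callers omit them.
interface PackagePolicy {
  id: string;
  package?: { name: string; version: string };
}

// Stand-ins for what getUpgradePackagePolicyInfo / getInstallation would provide.
declare function loadPackagePolicy(id: string): Promise<PackagePolicy>;
declare function loadInstalledVersion(pkgName: string): Promise<string>;

async function upgradeWithReuse(
  id: string,
  options: { packagePolicy?: PackagePolicy; pkgVersion?: string } = {}
) {
  const packagePolicy = options.packagePolicy ?? (await loadPackagePolicy(id));
  const pkgVersion =
    options.pkgVersion ??
    (packagePolicy.package ? await loadInstalledVersion(packagePolicy.package.name) : undefined);
  // ...continue with the dry run / upgrade using packagePolicy and pkgVersion...
  return { packagePolicy, pkgVersion };
}
```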