[Fleet] Stream-based programmatic API for installing packages #187646

Closed
Tracked by #174168
banderror opened this issue Jul 5, 2024 · 7 comments
Labels: Feature:Prebuilt Detection Rules, Team:Detection Rule Management, Team:Detections and Resp, Team:Fleet, Team: SecuritySolution

Comments

banderror commented Jul 5, 2024

Epics: https://github.com/elastic/security-team/issues/1974 (internal), #174168

Summary

Recently we had an incident in Serverless where Kibana instances would crash with an OOM because of an installation of the security_detection_engine Fleet package that Security Solution uses to distribute prebuilt detection rules. Fleet loads whole packages into memory before installing their assets, and this package had become too big for that. The incident has been mitigated by temporarily decreasing the number of assets in the package by ~50%. However, this is a short-term measure that we cannot keep for a long time. We need a fundamental solution to this problem in Fleet itself.

Our idea is to introduce a stream-based API for installing Fleet packages:

  • This would be a programmatic API (a method of the PackageClient) available for Security Solution on the server side, and not available to Kibana users via HTTP. Security Solution would wrap this API with its own HTTP API endpoint for installation of the security_detection_engine package.
  • This API would use Node.js streams to avoid loading whole Fleet packages into memory.

We hope this solution would help us prevent spikes in memory usage when installing the security_detection_engine package.
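To illustrate the general idea (this is only a sketch, not Fleet's implementation), assets could be consumed from an object-mode Node.js stream and written in fixed-size batches. The `Asset` type, the `installBatch` callback, and the `installFromStream` function below are hypothetical names introduced for this example; only `Readable`, `Writable`, and `pipeline` are real Node.js APIs.

```typescript
import { Readable, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Hypothetical shape of a package asset; not Fleet's actual type.
interface Asset {
  id: string;
  type: string;
  attributes: Record<string, unknown>;
}

// Hypothetical callback standing in for whatever persists a batch of assets.
type InstallBatch = (batch: Asset[]) => Promise<void>;

/**
 * Consumes assets from an object-mode stream and installs them in batches.
 * The accumulated batch never grows beyond `batchSize`, and the stream is
 * not asked for more data while a batch is being installed (backpressure).
 */
export async function installFromStream(
  assets: Readable,
  installBatch: InstallBatch,
  batchSize = 100
): Promise<number> {
  let batch: Asset[] = [];
  let installed = 0;

  const flush = async (): Promise<void> => {
    if (batch.length === 0) return;
    const current = batch;
    batch = [];
    await installBatch(current);
    installed += current.length;
  };

  const sink = new Writable({
    objectMode: true,
    write(asset: Asset, _encoding, callback) {
      batch.push(asset);
      if (batch.length < batchSize) {
        callback();
        return;
      }
      // Signal completion only after the batch is written.
      flush().then(() => callback(), callback);
    },
    final(callback) {
      // Install whatever is left once the source ends.
      flush().then(() => callback(), callback);
    },
  });

  await pipeline(assets, sink);
  return installed;
}
```

With a source such as `Readable.from(arrayOfAssets)`, peak memory in this function depends on `batchSize` rather than on the total number of assets in the package. In a real implementation the source would have to be a stream that parses the package archive entry by entry, which this sketch does not cover.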

Details

This is where/how Security Solution installs the package on the server side:

// Install the package
await context
  .getInternalFleetServices()
  .packages.ensureInstalledPackage({ pkgName: PREBUILT_RULES_PACKAGE_NAME, pkgVersion });

The corresponding method of the PackageClient is:

  ensureInstalledPackage(options: {
    pkgName: string;
    pkgVersion?: string;
    spaceId?: string;
    force?: boolean;
  }): Promise<Installation>;

We would need a stream-based alternative to the ensureInstalledPackage method.

This could be done by adding an option to the existing method:

  ensureInstalledPackage(options: {
    pkgName: string;
    pkgVersion?: string;
    spaceId?: string;
    force?: boolean;
    stream?: boolean; // <-- NEW OPTION, by default is false
  }): Promise<Installation>;

Or by adding a new method:

  ensureInstalledPackageInStreamMode(options: {
    pkgName: string;
    pkgVersion?: string;
    spaceId?: string;
    force?: boolean;
  }): Promise<Installation>;
@elasticmachine

Pinging @elastic/fleet (Team:Fleet)

@elasticmachine

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine

Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)

@elasticmachine

Pinging @elastic/security-solution (Team: SecuritySolution)

@banderror changed the title from "[Fleet] Stream-based programmatic API for installing the package with prebuilt rules (DRAFT)" to "[Fleet] Stream-based programmatic API for installing packages (DRAFT)" on Jul 5, 2024

@banderror

Hey @kpollich, here's the ticket we promised earlier. @xcrzx is going to switch to it next week (week of July 8). Could we please find someone to actively assist with it from the Fleet side (be available for questions, pair programming, code review, etc)?

@banderror

Update from @kpollich:

We do not have the streams based approach captured in #187646 scheduled for development at the moment. I think the best approach at this point would be to implement this as a one-off for the security detection engine integration (with potential to "allow list" the streams approach for other integrations if the need arises).

I don't think this will be trivial to implement broadly for all integrations, as there are complexities around installing things like ML detections, transforms, and soon SLOs (which essentially "wrap" a bunch of other under-the-hood assets), so I fear having streaming on top of all those more complex asset types will be a massive undertaking. For content-only packages, streaming probably makes sense as the default approach but we've only just landed the package-spec support for these packages and there's still a ways to go before we can start leveraging that broadly: elastic/package-spec#351.

If the generic solution isn't expected to be released by mid to late October, our team will need to start working on an alternative solution as early as mid-September.

Let's just commit to this explicitly today and plan accordingly: there won't be a generic solution available by late October.

Update from @xcrzx:

So I think the plan is for us to start working on an alternative package installation approach in September. I can begin this after returning from my PTO on September 16th. Here's the approach I plan to take:

  1. Introduce a new endpoint in Security Solution for detection rule installation or reuse the existing bootstrap endpoint. The key point is that the implementation will be entirely on the Security Solution side.
  2. Copy the existing package installation logic from Fleet and strip out all code not related to saved object installation.
  3. Rewrite the saved object installation process, switching from savedObject.import to savedObject.bulkCreate for better memory efficiency.
  4. Implement incremental saved object installation without deleting existing objects.
  5. Add stream support.

This is a rough outline. An important note here is that I'll be using the EPR API directly to fetch package information and download package content (or read from disk if it's prebundled). To ensure compatibility with Fleet, I'll reuse the package saved object type, so even if the package is installed through the Security Solution endpoint, it will still be visible in the Integrations UI. The detection rules package will remain installable and upgradeable via Fleet's UI, but this will not be the recommended method. In Security Solution, we'll exclusively use the new installation endpoint.
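As an illustration only, steps 3 and 4 of the outline above could be sketched as batched upserts. The `RuleAsset` type, the `BulkCreateClient` interface, and both function names are hypothetical and introduced for this example; the real Kibana saved objects client has a richer `bulkCreate` signature, and the sketch assumes that an `overwrite` option updates an existing object with the same id in place.

```typescript
// Hypothetical minimal shapes; Kibana's real saved objects types are richer.
interface RuleAsset {
  id: string;
  type: string;
  attributes: Record<string, unknown>;
}

// Hypothetical stand-in for a client exposing a bulkCreate-like operation.
interface BulkCreateClient {
  bulkCreate(objects: RuleAsset[], options: { overwrite: boolean }): Promise<void>;
}

/** Groups items from a sync or async iterable into arrays of at most `size`. */
async function* inBatches<T>(
  source: AsyncIterable<T> | Iterable<T>,
  size: number
): AsyncGenerator<T[]> {
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch;
}

/**
 * Writes assets batch by batch without deleting anything up front:
 * objects already present are overwritten by id, and objects that are
 * not part of the incoming package are left untouched.
 */
export async function installIncrementally(
  assets: AsyncIterable<RuleAsset> | Iterable<RuleAsset>,
  client: BulkCreateClient,
  batchSize = 500
): Promise<number> {
  let written = 0;
  for await (const batch of inBatches(assets, batchSize)) {
    await client.bulkCreate(batch, { overwrite: true });
    written += batch.length;
  }
  return written;
}
```

Because `installIncrementally` accepts any async iterable, a Node.js `Readable` in object mode could be passed directly, which is one way step 5 might build on the same loop.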

Thank you both. With that, I'm removing the 8.16 target from this one. We'll be working on the optimized package installation within a separate ticket #192350.

@banderror

@xcrzx ended up implementing a server-side programmatic API for stream-based package installation in the fleet plugin as part of #192350. I think we can close this one.
