Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

software/versions endpoint is RAM-heavy #23679

Closed
iansltx opened this issue Nov 10, 2024 · 7 comments
Closed

software/versions endpoint is RAM-heavy #23679

iansltx opened this issue Nov 10, 2024 · 7 comments
Assignees
Labels
~backend Backend-related issue. bug Something isn't working as documented #g-endpoint-ops Endpoint ops product group :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. ~released bug This bug was found in a stable release.
Milestone

Comments

@iansltx
Copy link
Member

iansltx commented Nov 10, 2024

Fleet version: 4.58.0


💥  Actual behavior

Running software/versions with a large number of hosts (e.g. 10k) with a typical amont of software per host is a heavy operation on RAM usage, spiking customer pod RAM usage and potentially causing the environment to OOM.

🧑‍💻  Steps to reproduce

  1. Load a bunch of hosts and software into the system
  2. Load the software/versions endpoint

🕯️ More info

Split from #23078, referenced in #22291. This is a lower priority than the hosts list with software listings enabled as the customer's use case can get by with using just that endpoint per current understanding. This endpoint doesn't appear to be DB-heavy, but generates a large response that we need to store and serialize more efficiently if possible.

🛠️ To fix

TBD, but probably involves deduplicating data structures.

@iansltx iansltx added #g-endpoint-ops Endpoint ops product group :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. bug Something isn't working as documented customer-flacourtia ~backend Backend-related issue. labels Nov 10, 2024
@sharon-fdm
Copy link
Collaborator

Timebox 2 points for code review and attempt to make it lighter.

@lukeheath lukeheath added the ~released bug This bug was found in a stable release. label Nov 15, 2024
@sharon-fdm sharon-fdm added :product Product Design department (shows up on 🦢 Drafting board) and removed :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. labels Nov 18, 2024
@sharon-fdm sharon-fdm added :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. and removed customer-flacourtia :product Product Design department (shows up on 🦢 Drafting board) labels Nov 18, 2024
@sharon-fdm sharon-fdm added this to the 4.61.0-tentative milestone Nov 18, 2024
@ksykulev
Copy link
Contributor

ksykulev commented Nov 27, 2024

The performance issue/memory bloat is due to two issues both stemming from the way we fetch and display vulnerabilities.

  1. In mysql/software.go the listSoftwareDB method has a query that joins software to software_cve and cve_meta. Although there is only 20 rows in the software versions table the query returns a variable number of results depending on the number of cves impacting the software version (upwards of hundreds at times). The method iterates through all the results making a copy of each result 904: result := result.

  2. Once the vulnerabilities list has been grouped by software. It is then converted into a json data structure returned to the front end. The front end displays a count of vulnerabilities and up to three cve names in a tooltip. Unfortunately the entire list of cves is returned in the response, bloating the memory and the response json size.

Screenshot 2024-11-27 at 11 31 52 AM

Will open a new issue to deal with number 1.
Will address number 2 as part of this issue. The suggested fix is to truncate the list of cve returned from the api and allow fleet premium users to override this default, similar to what was done in ticket #23710.

rachaelshaw added a commit that referenced this issue Dec 5, 2024
…ns endpoint (#24246)

#23679

In order to improve performance on the software version endpoint the
option `without_vulnerability_details` has been added. It can be set to
true to decrease the size of the response.

On the free tier setting this option to `false` will have no effect. On
Fleet Premium setting it to `false` or omitting will include verbose
vulnerability details.

---------

Co-authored-by: Rachael Shaw <r@rachael.wtf>
ksykulev added a commit that referenced this issue Dec 9, 2024
The software versions endpoint cve details can be truncated using the
`without_vulnerability_details` flag.

#23679

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

<!-- Note that API documentation changes are now addressed by the
product design team. -->

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Committing-Changes.md#changes-files)
for more information.
- [x] Added/updated tests
- [x] Manual QA for all new/changed functionality

---------

Co-authored-by: Ian Littman <iansltx@gmail.com>
ksykulev added a commit that referenced this issue Dec 9, 2024
The software versions endpoint cve details can be truncated using the
`without_vulnerability_details` flag.

#23679

# Checklist for submitter

If some of the following don't apply, delete the relevant line.

<!-- Note that API documentation changes are now addressed by the
product design team. -->

- [x] Changes file added for user-visible changes in `changes/`,
`orbit/changes/` or `ee/fleetd-chrome/changes`.
See [Changes
files](https://github.com/fleetdm/fleet/blob/main/docs/Contributing/Committing-Changes.md#changes-files)
for more information.
- [x] Added/updated tests
- [x] Manual QA for all new/changed functionality

---------

Co-authored-by: Ian Littman <iansltx@gmail.com>
ksykulev added a commit that referenced this issue Dec 9, 2024
#23679

Original PR: #24496

Co-authored-by: Ian Littman <iansltx@gmail.com>
@xpkoala
Copy link
Contributor

xpkoala commented Dec 13, 2024

Tested in a loadtest environment with ~58k hosts enrolled and ~82k software entries. While hitting the software/versions endpoint repeatedly (15 times per minute) I did not see any increase in ram usage. Memory utilization stayed constant at roughly %3 usage of max available RAM.

@jmwatts
Copy link
Member

jmwatts commented Dec 13, 2024

Looks like a new regression: #24765

Unable to filter on CVE severity on software/versions page

@jmwatts
Copy link
Member

jmwatts commented Dec 14, 2024

Copying test plan steps from #24765 that were completed. Will add additional info once this is fully tested.

fleet free
-> no vulnerability filter

  • &without_vulnerability_details=false - should ignore and still show without details
  • &without_vulnerability_details=true - should show without details
    -> vulnerability filter
  • &without_vulnerability_details=false - should ignore and still show without details
  • &without_vulnerability_details=true - should show without details

fleet premium
-> no vulnerability filter

  • &without_vulnerability_details=false - doesn't show details

  • &without_vulnerability_details=true - doesn't show details, is performant
    -> vulnerability filter

  • &without_vulnerability_details=false - shows details, is heavy for large page sizes

  • &without_vulnerability_details=true - doesn't show details, is slightly less heavy for large page sizes

  • Also should test Free with vulnerability filters to confirm that we get license errors there.

  • Verified in UI that only Vulnerable software filter is available on Fleet free, and all available filters are functional on Fleet free and Fleet premium

@jmwatts
Copy link
Member

jmwatts commented Dec 14, 2024

QA Notes:

Per discussion with @ksykulev and @iansltx this fix + changes from #24765 won't necessarily resolve the whole problem, but this fix allows for a performance improvement on the software/versions page when we are not filtering by Severity or Known exploits.

For all scenarios other than when applying Severity or Known exploits filters, the &without_vulnerability_details=true flag will avoid the costly joins on CVE scores for all known vulnerabilities.

After understanding the changes implemented and testing for regressions, this change looks good to release.

@fleet-release
Copy link
Contributor

Lighter memory footprint,
Cities in the clouds, now swift.
More hosts run with ease.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
~backend Backend-related issue. bug Something isn't working as documented #g-endpoint-ops Endpoint ops product group :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. ~released bug This bug was found in a stable release.
Development

No branches or pull requests

8 participants