Improve http output handling #966

Merged

Conversation

@aleksmaus (Contributor) commented Aug 12, 2024

What type of PR is this?

/kind bug
/kind cleanup

Any specific area of the project related to this PR?

/area config
/area outputs

What this PR does / why we need it:

Multiple improvements to HTTP output handling.

It addresses the following areas:

  1. There is no limit on the number of outgoing HTTP requests, which, combined with the lack of HTTP client reuse, creates thousands of connections to the host (the client reuse is addressed in the previous PR Reuse http client #962). The exception is the cases where some auth-related HTTP headers are set under a mutex lock, which limits the requests to one at a time while they compete for the mutex.
    For example, the Elasticsearch output is limited to one request at a time when basic auth is used, but creates an unlimited number of concurrent requests when custom headers are used to pass the basic auth or API key headers: https://github.com/falcosecurity/falcosidekick/blob/master/outputs/elasticsearch.go#L60. This leads to unpredictable runtime characteristics in terms of connection use.
  2. There is a FIXME in the existing HTTP client code around the auth headers handling that should be resolved.
  3. The default settings in config.go are prone to mistakes (looking at previous PRs): the same prefix has to be retyped every time, and some defaults end up being set twice, for example
    v.SetDefault("Grafana.MutualTls", false)

    and
    v.SetDefault("Grafana.MutualTls", false)

    It also requires multiple lines of default definitions whenever a new setting needs to be added for an output.

Here is the list of changes addressing the points listed above:

  1. Add a MaxConcurrentRequests configuration per output in order to limit
    the number of requests/connections. I decided to go with 1 as the default, because requests are already effectively limited to one at a time in multiple cases due to the use of the mutex for the auth headers. This is item number 2 listed in the "short term" section of Improve Outputs throughput handling #963.
    A semaphore is used to limit the number of requests (a sketch is shown after this list). I left one TODO item there: eventually we want to make the HTTP requests and the semaphore cancellable, to make the HTTP client implementation more robust.

  2. Refactor HTTP auth headers handling, addressing the FIXME mentioned above.

    This removes the inconsistency where the number of requests is unexpectedly limited in some cases and unlimited in other cases for the same output.

  3. Extract the common configuration for HTTP and refactor NewClient to avoid
    adding one more parameter for MaxConcurrentRequests.
    The outputs can now update request properties, including headers, before the request is executed, which also allows using the SetBasicAuth method already provided by Go's net/http.

  4. Refactor the default configuration definitions in order to avoid typos with the
    repetitive output name prefix and avoid setting the same default twice (a sketch of the idea is shown after this list). This item is not strictly necessary, but I think it makes it easier to manage and add defaults for new outputs and to avoid errors. Some of the configuration, like AWS, GCP, and other non-output configuration, is still left inlined for now, but could be extracted later if needed.
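
Here is a minimal sketch of the semaphore-based limiting from item 1 above, combined with the per-request auth header handling from items 2 and 3. It assumes golang.org/x/sync/semaphore and a simplified Client type; the names and signatures are illustrative, not the exact ones introduced by this PR.

```go
package example

import (
	"bytes"
	"context"
	"fmt"
	"net/http"

	"golang.org/x/sync/semaphore"
)

// Client is a simplified stand-in for an output's HTTP client.
type Client struct {
	httpClient *http.Client
	// sem caps the number of in-flight requests; its weight comes from
	// the per-output MaxConcurrentRequests setting (default 1).
	sem *semaphore.Weighted
}

func NewClient(maxConcurrentRequests int64) *Client {
	if maxConcurrentRequests <= 0 {
		maxConcurrentRequests = 1 // default: one request at a time
	}
	return &Client{
		httpClient: &http.Client{},
		sem:        semaphore.NewWeighted(maxConcurrentRequests),
	}
}

// Post sends the payload, blocking while MaxConcurrentRequests requests are
// already in flight. TODO: plumb a real context through the outputs so both
// the wait and the request become cancellable.
func (c *Client) Post(ctx context.Context, url string, body []byte, user, password string) error {
	if err := c.sem.Acquire(ctx, 1); err != nil {
		return err
	}
	defer c.sem.Release(1)

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	// Auth is set on the request itself via the standard library,
	// so no shared mutex around header mutation is needed.
	if user != "" {
		req.SetBasicAuth(user, password)
	}

	resp, err := c.httpClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 400 {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}
```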
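
And a minimal sketch of the defaults refactor idea from item 4 above, assuming Viper for configuration; the helper name and the keys shown are illustrative only.

```go
package example

import "github.com/spf13/viper"

// setOutputDefaults registers the defaults for one output, adding the output
// name prefix in a single place so it cannot be mistyped, and making it harder
// to register the same default twice.
func setOutputDefaults(v *viper.Viper, output string, defaults map[string]any) {
	for key, value := range defaults {
		v.SetDefault(output+"."+key, value)
	}
}

func configureDefaults(v *viper.Viper) {
	// Illustrative keys only; the real per-output settings live in config.go.
	setOutputDefaults(v, "Grafana", map[string]any{
		"MutualTls":             false,
		"CheckCert":             true,
		"MaxConcurrentRequests": 1,
	})
}
```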

Example of the throughput improvement for Elasticsearch Output:

As mentioned above, due to the current use of a mutex lock for the basic auth header, requests are executed one after another, one request at a time, when the user authenticates with a username and password, because of this:
https://github.com/falcosecurity/falcosidekick/blob/master/outputs/elasticsearch.go#L60

So the current throughput numbers with an Elasticsearch cloud instance are about 4 docs/sec.

Now, with the addition of the MaxConcurrentRequests configuration, the user can adjust the number of concurrent requests:

| maxconcurrentrequests | throughput |
| --- | --- |
| 1 | 4 docs/sec |
| 2 | 8 docs/sec |
| 10 | 40 docs/sec |

Now, if we combine this PR with the HTTP client reuse from #962, we get about 10 docs/sec with the default of 1 request at a time:

| maxconcurrentrequests | throughput |
| --- | --- |
| 1 | 10 docs/sec |
| 2 | 20 docs/sec |
| 10 | 100 docs/sec |

This throughput could be improved further with the addition of batching, which I'll have shortly.

Which issue(s) this PR fixes:

Related to #963
First steps.

Special notes for your reviewer:

* Add MaxConcurrentRequests configuration per output in order to limit
  the number of requests/connections.
* Refactor HTTP auth headers handling, eliminate mutex on that code
  path.
* Extract common configuration for http, refactor NewClient to avoid
  adding one more parameter.
* Refactor default configuration definitions in order to avoid typos with the
  repetitive output name prefix and avoid setting the same default twice.

Signed-off-by: Aleksandr Maus <aleksandr.maus@elastic.co>
@aleksmaus force-pushed the feature/cleanup_requests_handling branch from 261eb9d to 0f9a5ea on August 21, 2024 11:49
@aleksmaus (Contributor, Author) commented:

Didn't realize that merge commits are not allowed in this repo; redid it with a rebase.

@poiana added the lgtm label Aug 22, 2024
@poiana commented Aug 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aleksmaus, Issif

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana commented Aug 22, 2024

LGTM label has been added.

Git tree hash: 058531bc27dec5657f2c1f1dcfc330c397394126

@poiana merged commit c6e0752 into falcosecurity:master Aug 22, 2024
5 checks passed