
[filebeat][streaming] - Added support for TLS & forward proxy configs for websockets #41934

Merged
merged 9 commits into elastic:main from websocket/tls_proxy
Dec 11, 2024

Conversation

ShourieG
Contributor

@ShourieG ShourieG commented Dec 6, 2024

Type of change

  • Enhancement
  • Docs

Proposed commit message

Added support for functional TLS and forward proxy configs for websockets by implementing a custom transport object and NetDialContext(), leveraging the httpcommon library. Added supporting tests to accompany the changes.

Note:

Tried using in-memory certs and keys, but our current httpcommon library always asks for a file.
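
For reviewers, the sketch below illustrates the general shape of the approach described in the commit message: building a gorilla/websocket dialer from a TLS config and an optional forward proxy via NetDialContext. It is a minimal sketch with illustrative names and plumbing, not the exact code in this PR.

```go
// Sketch only: wiring a TLS config and a forward proxy into a
// gorilla/websocket Dialer. Names and field choices are illustrative.
package wsdial

import (
	"context"
	"crypto/tls"
	"net"
	"net/http"
	"net/url"
	"time"

	"github.com/gorilla/websocket"
)

// newWebSocketDialer builds a websocket.Dialer that honours a TLS config,
// an optional forward proxy, and a dial/handshake timeout.
func newWebSocketDialer(tlsCfg *tls.Config, proxyURL *url.URL, timeout time.Duration) *websocket.Dialer {
	netDialer := &net.Dialer{Timeout: timeout}
	return &websocket.Dialer{
		// TLS settings for wss:// endpoints.
		TLSClientConfig: tlsCfg,
		// Proxy is consulted for every request; returning nil means no proxy.
		Proxy: func(_ *http.Request) (*url.URL, error) { return proxyURL, nil },
		// NetDialContext controls the underlying TCP dial.
		NetDialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			return netDialer.DialContext(ctx, network, addr)
		},
		HandshakeTimeout: timeout,
	}
}
```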

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

No end-user impact, as this is all additive.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@botelastic botelastic bot added the needs_team label Dec 6, 2024
@ShourieG ShourieG self-assigned this Dec 6, 2024
@ShourieG ShourieG added the Team:Security-Service Integrations label Dec 6, 2024
@botelastic botelastic bot removed the needs_team label Dec 6, 2024
Contributor

mergify bot commented Dec 6, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @ShourieG? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Contributor

mergify bot commented Dec 6, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x label Dec 6, 2024
@ShourieG ShourieG added the needs_team and input:streaming labels and removed the backport-8.x label Dec 6, 2024
@botelastic botelastic bot removed the needs_team label Dec 6, 2024
@ShourieG ShourieG added the Filebeat and needs_team labels Dec 6, 2024
@botelastic botelastic bot removed the needs_team label Dec 6, 2024
@ShourieG ShourieG added the enhancement and needs_team labels Dec 6, 2024
@botelastic botelastic bot removed the needs_team label Dec 6, 2024
@botelastic

botelastic bot commented Dec 6, 2024

This pull request doesn't have a Team:<team> label.

Contributor

mergify bot commented Dec 6, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @ShourieG? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Contributor

mergify bot commented Dec 6, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x label Dec 6, 2024
@ShourieG ShourieG marked this pull request as ready for review December 6, 2024 15:52
@ShourieG ShourieG requested a review from a team as a code owner December 6, 2024 15:52
@elasticmachine
Collaborator

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@ShourieG ShourieG requested a review from efd6 December 6, 2024 15:52
Contributor

mergify bot commented Dec 6, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b websocket/tls_proxy upstream/websocket/tls_proxy
git merge upstream/main
git push upstream websocket/tls_proxy

@@ -349,6 +349,32 @@ The minimum time to wait between retries. This ensures that retries are spaced o

The maximum time to wait between retries. This prevents the retry mechanism from becoming too slow, ensuring that the client does not wait indefinitely between retries. This is crucial in systems where timeouts or user experience are critical. For example, `wait_max` might be set to 10 seconds, meaning that even if the calculated backoff is greater than this, the client will wait at most 10 seconds before retrying.

[float]
==== `handshake_timeout`
This specifies the time to wait for the `websocket` handshake and the `http.Upgrade()` operation to complete. This timeout occurs at the application layer and not at the TCP layer. The `default value` is `20` seconds. This setting is specific to the `websocket` streaming input type only.
Contributor

Suggested change
This specifies the time to wait for the `websocket` handshake and the `http.Upgrade()` operation to complete. This timeout occurs at the application layer and not at the TCP layer. The `default value` is `20` seconds. This setting is specific to the `websocket` streaming input type only.
This specifies the time to wait for the WebSocket handshake and protocol upgrade to complete. This timeout occurs at the application layer and not at the TCP layer. The `default value` is `20` seconds. This setting is specific to the `websocket` streaming input type only.

Referring to a call is unhelpful to users; discuss the intent not the mechanism in user-facing documentation. Also, it was incorrectly referring to http.Upgrade which does not exist. It is used by websocket.Upgrader.Upgrade, but we don't call that except in testing, instead using websocket.Dialer.DialContext (should we refer to upgrading at all? I think this is an implementation detail and only tenuously relevant). Note that the lib's default is 45s; do we want to match that or is there a reason to use 20s?

I'm curious, is it meaningful to have a handshake timeout that is less than the TCP timeout? Looking at the implementation of the client, the handshake timeout is used to construct a deadline context that must lose the race to return from DialContext, and this context is passed into the net.Dialer.DialContext call, meaning that obtaining the netconn must fit within this deadline. Where does the timeout below come in?

Contributor Author

@efd6,
What I was going for initially was faster failures at the application layer to achieve better program responsiveness. But then I realised that we might want to wait for TCP retries to overcome any intermittent network issues, and setting a handshake timeout less than the TCP timeout defeats that purpose.

The timeout option below actually controls the net.Dialer.DialContext timeout, which by default is set to 90s, half of the OS default TCP timeout window.

So my question is: should we extend the DialContext and handshake timeouts to mirror the OS's TCP timeout value?

As for the documentation, I'll update it.

Contributor

I suggest taking a look at the gorilla code to see how the timeout is propagated.

Contributor Author

@ShourieG ShourieG Dec 10, 2024

@efd6, I saw that the gorilla code just creates a context deadline from the provided timeout in DialContext. So in this case, having the handshake value less than the dial context timeout doesn't make much sense by default. We can keep them at the same value; if users feel they need to change it, they can do that.
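
For context, a simplified sketch of the behaviour described above, based on a reading of gorilla/websocket's Dialer.DialContext: the handshake timeout becomes a context deadline, and that same context is handed to the net dialer. The function is illustrative, not the library's verbatim code.

```go
// Simplified sketch: HandshakeTimeout is turned into a context deadline,
// and the dial inherits that deadline, so a handshake timeout shorter than
// the dial timeout effectively caps the dial as well.
package wsdial

import (
	"context"
	"net"

	"github.com/gorilla/websocket"
)

func dialWithHandshakeDeadline(ctx context.Context, d *websocket.Dialer, network, addr string) (net.Conn, error) {
	if d.HandshakeTimeout != 0 {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(ctx, d.HandshakeTimeout)
		defer cancel()
	}
	// Assumes d.NetDialContext is set; the real library falls back to a
	// default dialer when it is nil.
	return d.NetDialContext(ctx, network, addr)
}
```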

Contributor

I would not bother adding the handshake timeout since we already provide a mechanism to hand in a dial timeout which achieves the same end.

@ShourieG
Contributor Author

@efd6, I've addressed the PR suggestions. I was wondering, should we backport this to 8.17? I know the FF has already happened, but this might help out a lot of users currently facing issues.

@ShourieG ShourieG added the backport-8.17 label Dec 11, 2024
@ShourieG ShourieG merged commit fd81074 into elastic:main Dec 11, 2024
22 checks passed
mergify bot pushed a commit that referenced this pull request Dec 11, 2024
mergify bot pushed a commit that referenced this pull request Dec 11, 2024
@ShourieG ShourieG deleted the websocket/tls_proxy branch December 11, 2024 02:53
ShourieG added a commit that referenced this pull request Dec 11, 2024
… for websockets (#41934) (#41984)

(cherry picked from commit fd81074)

Co-authored-by: ShourieG <shourie.ganguly@elastic.co>
ShourieG added a commit that referenced this pull request Dec 11, 2024
… for websockets (#41934) (#41985)

(cherry picked from commit fd81074)

Co-authored-by: ShourieG <shourie.ganguly@elastic.co>
@ShourieG ShourieG added the backport-8.16 label Dec 12, 2024
mergify bot pushed a commit that referenced this pull request Dec 12, 2024
ShourieG added a commit that referenced this pull request Dec 12, 2024
… for websockets (#41934) (#42009)

(cherry picked from commit fd81074)

Co-authored-by: ShourieG <shourie.ganguly@elastic.co>
michalpristas pushed a commit to michalpristas/beats that referenced this pull request Dec 13, 2024
Labels
backport-8.x, backport-8.16, backport-8.17, enhancement, Filebeat, input:streaming, Team:Security-Service Integrations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants