[8.14] Fix handling of custom Endpoint when using S3 + SQS #39709

Merged
strawgate merged 14 commits into 8.14 from fix-sqs-endpoint
May 28, 2024

Conversation

Contributor

@strawgate strawgate commented May 24, 2024

Proposed commit message

Fix issues described in #39706 that prevent using a custom endpoint with S3 + SQS.

Users can work around this issue with S3 bucket polling, which still works fine with a custom endpoint; the failure only appears once SQS is added. We need to publish a new version of the AWS integration with the endpoint field exposed on the relevant AWS integrations, which is tracked here

Proposed Fixes for Main: #39722

Fixes for 8.14:

  • Fix saving broken region to the configuration when using a custom endpoint with SQS queue_url. I've fixed here on top of 8.14 but it is separately already fixed on Main. Thanks @faec!
  • Fix handling of default_region. Not fixed on 8.14 but fixed on Main. Thanks @faec!
  • Fix exception when we can parse the URL from the queue_url, there is no region in the config, and there's a region mismatch in the parsing. I've fixed here on top of 8.14 but it is separately already fixed on Main. Thanks @faec!
  • Fix parsing the region name from a custom endpoint
  • Fix failing region parsing if default_region is set but region is not. I've fixed here on top of 8.14 but it is separately already fixed on Main. Thanks @faec!
  • Use the default endpoint resolver if the endpoint begins with s3

Optional for 8.14:

  • Keep the current behavior (overwriting every service to use the Endpoint value) when the endpoint does not begin with s3

Limit the scope of the endpoint resolver:

  1. When users provide an endpoint that begins with s3, do not set an Endpoint Resolver; set only the Endpoint field so the default resolver can generate URLs from the endpoint with a distinct domain for each service (sqs.us-east-1, dynamodb.us-east-1, s3.us-east-1, ...).
  2. When users provide an endpoint that doesn't begin with s3, use that exact URL as-is for every service (sqs endpoint = endpoint, s3 endpoint = endpoint). This keeps the change backwards compatible.
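
The two rules above can be sketched in Go roughly as follows. This is only an illustration of the intended outcome, not the code in this PR and not the AWS SDK's resolver: `endpointFor` is a hypothetical helper, and it assumes the endpoint carries a scheme and that "begins with s3" means the host starts with `s3.`.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// endpointFor is a hypothetical helper (not part of Beats or the AWS SDK).
// It returns the URL a given service would end up using for a configured
// endpoint, following the two rules described above.
func endpointFor(service, endpoint string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", fmt.Errorf("invalid endpoint %q: %w", endpoint, err)
	}
	// Rule 2: the host does not begin with "s3.", so every service
	// uses the configured URL exactly as written.
	if !strings.HasPrefix(u.Host, "s3.") {
		return endpoint, nil
	}
	// Rule 1: the host begins with "s3.", so each service gets its own
	// domain under the same suffix (sqs.<rest>, s3.<rest>, ...).
	u.Host = service + "." + strings.TrimPrefix(u.Host, "s3.")
	return u.String(), nil
}

func main() {
	endpoints := []string{
		"https://s3.us-east-1.amazonaws.com",
		"https://storage.example.internal:9000", // made-up non-s3 endpoint
	}
	for _, ep := range endpoints {
		for _, svc := range []string{"s3", "sqs"} {
			got, err := endpointFor(svc, ep)
			if err != nil {
				fmt.Println("error:", err)
				continue
			}
			fmt.Printf("%s with endpoint %s -> %s\n", svc, ep, got)
		}
	}
}
```

In the real input the first case is left to the SDK's default resolver; the string rewriting here only mimics the per-service domains it would produce.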

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

Hopefully none.

The entire addition of getRegionFromQueueURL to handle custom endpoints could be removed, in which case the user would have to specify a region manually. That would make this PR a bit smaller.

How to test this PR locally

Log in to the AWS CLI and provide the following in a Filebeat config:

filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/123123123123123/queue_path
  number_of_workers: 1
  region: us-east-1
  endpoint: https://s3.us-east-1.amazonaws.com

See that the SQS ReceiveMessage works and you can publish an item to the bucket and get a result

filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/123123123123123/queue_path
  number_of_workers: 1
  endpoint: https://s3.us-east-1.amazonaws.com

See that the SQS ReceiveMessage works as the region is inferred from the queue_url matching the endpoint, and you can publish an item to the bucket and get a result

filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/123123123123123/queue_path
  number_of_workers: 1

See that the SQS ReceiveMessage works as the region is inferred from the queue_url (no endpoint is configured here), and you can publish an item to the bucket and get a result
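
For readers unfamiliar with how a region can be inferred from a queue_url, here is a minimal Go sketch. It is not the PR's getRegionFromQueueURL: `inferRegion` is a hypothetical helper that only assumes the queue host has the shape `sqs.<region>.<domain>`, as in the configs above, and it does no endpoint matching.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// inferRegion is a hypothetical helper, not the function changed in this PR.
// It assumes the queue host looks like sqs.<region>.<domain> and returns the
// second host label as the region.
func inferRegion(queueURL string) (string, error) {
	u, err := url.Parse(queueURL)
	if err != nil {
		return "", fmt.Errorf("parsing queue_url: %w", err)
	}
	host := u.Hostname()
	labels := strings.Split(host, ".")
	if len(labels) < 3 || labels[0] != "sqs" || labels[1] == "" {
		return "", fmt.Errorf("queue_url host %q does not look like sqs.<region>.<domain>", host)
	}
	return labels[1], nil
}

func main() {
	queueURLs := []string{
		"https://sqs.us-east-1.amazonaws.com/123123123123123/queue_path",
		"https://sqs.localtest.abc.xyz/946960629917/billeaston-s3-queue",
	}
	for _, q := range queueURLs {
		region, err := inferRegion(q)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(q, "->", region)
	}
}
```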

See that the following fails:

- type: aws-s3
  queue_url: https://sqs.us-east-1.amazonaws.com/946960629917/billeaston-s3-queue
  number_of_workers: 1
  endpoint: https://us-east-1.amazonaws.com
{"log.level":"warn","@timestamp":"2024-05-23T23:23:17.585-0500","log.logger":"input.aws-s3","log.origin":{"function":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*s3Input).Run","file.name":"awss3/input.go","file.line":132},"message":"configured region disagrees with queue_url region: \"localtest\" != \"amazonaws\": using \"\"","service.name":"filebeat","id":"43D90D58192992F9","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2024-05-23T23:23:25.801-0500","log.logger":"input.aws-s3.sqs","log.origin":{"function":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsReader).Receive","file.name":"awss3/sqs.go","file.line":68},"message":"SQS ReceiveMessage returned an error. Will retry after a short delay.","service.name":"filebeat","id":"43D90D58192992F9","queue_url":"https://sqs.localtest.amazonaws.com/946960629917/billeaston-s3-queue","error":{"message":"sqs ReceiveMessage failed: operation error SQS: ReceiveMessage, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://localtest.amazonaws.com/\": dial tcp: lookup localtest.amazonaws.com: no such host"},"ecs.version":"1.6.0"}

> lookup localtest.amazonaws.com: no such host

See that the following works but fails to connect (no such host)

filebeat.inputs:
- type: aws-s3
  queue_url: https://sqs.localtest.abc.xyz/946960629917/billeaston-s3-queue
  number_of_workers: 1
  region: localtest
  endpoint: https://s3.localtest.abc.xyz
{"log.level":"warn","@timestamp":"2024-05-23T23:24:38.825-0500","log.logger":"input.aws-s3.sqs","log.origin":{"function":"github.com/elastic/beats/v7/x-pack/filebeat/input/awss3.(*sqsReader).Receive","file.name":"awss3/sqs.go","file.line":68},"message":"SQS ReceiveMessage returned an error. Will retry after a short delay.","service.name":"filebeat","id":"56FBB4DE51C84BB9","queue_url":"https://sqs.localtest.abc.xyz/946960629917/billeaston-s3-queue","error":{"message":"sqs ReceiveMessage failed: operation error SQS: ReceiveMessage, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://sqs.localtest.amazonaws.com/\": dial tcp: lookup sqs.localtest.amazonaws.com: no such host"},"ecs.version":"1.6.0"}

Note that the endpoint is s3.localtest.abc.xyz, but the failure message says sqs.localtest.amazonaws.com

Use cases

Allow users who use custom-but-AWS domains to enjoy the benefits of S3 and SQS together.

@strawgate strawgate added the bug label May 24, 2024
@strawgate strawgate requested a review from a team as a code owner May 24, 2024 02:23
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label May 24, 2024
@strawgate strawgate changed the title from "Allow users to specify a custom Endpoint when using S3 + SQS" to "Fix handling of custom Endpoint when using S3 + SQS" May 24, 2024
@ycombinator ycombinator added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label May 24, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label May 24, 2024
@strawgate strawgate changed the title from "Fix handling of custom Endpoint when using S3 + SQS" to "[8.14] Fix handling of custom Endpoint when using S3 + SQS" May 24, 2024
@pierrehilbert pierrehilbert requested review from zmoog and faec May 24, 2024 11:53
@pierrehilbert pierrehilbert added the Team:Cloud-Monitoring Label for the Cloud Monitoring team label May 24, 2024
@pierrehilbert pierrehilbert linked an issue May 24, 2024 that may be closed by this pull request
Member

@andrewkroh andrewkroh left a comment

Mostly some general Go commentary. I didn't look at the getRegionFromQueueURL changes very closely.

x-pack/filebeat/input/awss3/input.go
regionName, err := getRegionFromQueueURL(in.config.QueueURL, in.config.AWSConfig.Endpoint, in.config.RegionName)
if err != nil && in.config.RegionName == "" {
return fmt.Errorf("failed to get AWS region from queue_url: %w", err)
regionName, err := getRegionFromQueueURL(in.config.QueueURL, in.config.AWSConfig.Endpoint, in.config.AWSConfig.DefaultRegion)
Member

It's uncommon to return both an error and a value that the caller should use. Typically these are mutually exclusive: you either get an error OR you get values that you should use. I suggest a small bit of refactoring to keep with those conventions.
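
As an illustration of that convention, here is a minimal Go sketch in which the parser returns either a region or an error, never both, and the caller decides what to do on failure. The names `parseRegion` and `resolveRegion` are hypothetical and the host parsing is deliberately simplified; this is not the refactoring that later landed on main.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errRegionNotFound = errors.New("no region in queue host")

// parseRegion is a hypothetical parser: on failure it returns an empty
// string and an error, so callers never use a value alongside an error.
func parseRegion(host string) (string, error) {
	labels := strings.Split(host, ".")
	if len(labels) < 3 || labels[0] != "sqs" || labels[1] == "" {
		return "", errRegionNotFound
	}
	return labels[1], nil
}

// resolveRegion keeps the fallback decision in the caller: use the parsed
// region if there is one, otherwise the configured region, otherwise fail.
func resolveRegion(host, configured string) (string, error) {
	region, err := parseRegion(host)
	if err == nil {
		return region, nil
	}
	if configured != "" {
		return configured, nil
	}
	return "", fmt.Errorf("failed to get AWS region from queue_url: %w", err)
}

func main() {
	cases := [][2]string{
		{"sqs.us-east-1.amazonaws.com", ""},
		{"localhost", "eu-west-1"},
		{"localhost", ""},
	}
	for _, c := range cases {
		region, err := resolveRegion(c[0], c[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("region:", region)
	}
}
```

Whether the parsed or the configured region should take precedence is a separate design question; the sketch picks one order only to keep the example short.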

Contributor Author

@faec has refactored basically all of this plugin on main, including undoing this, but it's too different to backport.

I made a series of integration tests that cover all the various combinations of settings, but I'm worried refactoring this might not be worth it given it's all going away soon.

@strawgate
Contributor Author

@cmacknz added additional tests

@alexsapran alexsapran added the aws Enable builds in the CI for aws cloud testing label May 27, 2024
@alexsapran
Contributor

@cmacknz added additional tests

To have CI actually trigger the tests you need to add the aws label, otherwise they don't run.

@alexsapran
Contributor

/test

Member

@cmacknz cmacknz left a comment

Looks reasonable to me. Thanks for the additional tests.

@jlind23
Collaborator

jlind23 commented May 28, 2024

@andresrc @zmoog @bturquet can we please get an approval from the obs-cloud-monitoring team?

@strawgate strawgate merged commit 35eccb8 into 8.14 May 28, 2024
18 checks passed
@strawgate strawgate deleted the fix-sqs-endpoint branch May 28, 2024 21:49
faec added a commit that referenced this pull request Nov 6, 2024
Fix custom endpoint selection in the S3/SQS input (#39718) by porting @strawgate's 8.14 fix (#39709) to main.

In addition to the previous fixes, this simplifies the logic for detecting queue region, since the 8.14 version still had some broken cases caused by requiring over-strict endpoint matching, and it was concluded (talking to @strawgate) that there's no advantage to rejecting standard region format from queue URLs just because the endpoint URL is different (if there is a genuine mismatch in the queue and endpoint we'll learn it from the connection attempt, not from `getRegionFromQueueURL`).
mergify bot pushed a commit that referenced this pull request Nov 6, 2024
Fix custom endpoint selection in the S3/SQS input (#39718) by porting @strawgate's 8.14 fix (#39709) to main.

In addition to the previous fixes, this simplifies the logic for detecting queue region, since the 8.14 version still had some broken cases caused by requiring over-strict endpoint matching, and it was concluded (talking to @strawgate) that there's no advantage to rejecting standard region format from queue URLs just because the endpoint URL is different (if there is a genuine mismatch in the queue and endpoint we'll learn it from the connection attempt, not from `getRegionFromQueueURL`).

(cherry picked from commit cf13781)
faec added a commit that referenced this pull request Nov 6, 2024
…#41537)

* Fix handling of custom endpoints in AWS input (#41504)

Fix custom endpoint selection in the S3/SQS input (#39718) by porting @strawgate's 8.14 fix (#39709) to main.

In addition to the previous fixes, this simplifies the logic for detecting queue region, since the 8.14 version still had some broken cases caused by requiring over-strict endpoint matching, and it was concluded (talking to @strawgate) that there's no advantage to rejecting standard region format from queue URLs just because the endpoint URL is different (if there is a genuine mismatch in the queue and endpoint we'll learn it from the connection attempt, not from `getRegionFromQueueURL`).

(cherry picked from commit cf13781)

* Update CHANGELOG.next.asciidoc

auto-merge picked up an unrelated change

---------

Co-authored-by: Fae Charlton <fae.charlton@elastic.co>
faec added a commit that referenced this pull request Nov 6, 2024
Fix custom endpoint selection in the S3/SQS input (#39718) by porting @strawgate's 8.14 fix (#39709) to main.

In addition to the previous fixes, this simplifies the logic for detecting queue region, since the 8.14 version still had some broken cases caused by requiring over-strict endpoint matching, and it was concluded (talking to @strawgate) that there's no advantage to rejecting standard region format from queue URLs just because the endpoint URL is different (if there is a genuine mismatch in the queue and endpoint we'll learn it from the connection attempt, not from `getRegionFromQueueURL`).

(cherry picked from commit cf13781)

Co-authored-by: Fae Charlton <fae.charlton@elastic.co>
Labels
aws Enable builds in the CI for aws cloud testing bug Team:Cloud-Monitoring Label for the Cloud Monitoring team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot use a custom Endpoint with SQS
10 participants