[Filebeat] Enable non-AWS S3 buckets for aws-s3 input #28234

Merged: 9 commits, Oct 26, 2021
1 change: 1 addition & 0 deletions CHANGELOG.next.asciidoc
@@ -782,6 +782,7 @@ for a few releases. Please use other tools provided by Elastic to fetch data fro
- Add latency config option for aws-cloudwatch input. {pull}28509[28509]
- Added proxy support to threatintel/malwarebazaar. {pull}28533[28533]
- Add `text/csv` decoder to `httpjson` input {pull}28564[28564]
- Update `aws-s3` input to connect to non-AWS S3 buckets {issue}28222[28222] {pull}28234[28234]

*Heartbeat*

@@ -90,6 +90,18 @@
# to arrive in the queue before returning.
#sqs.wait_time: 20s

# Bucket ARN used for polling AWS S3 buckets
#bucket_arn: arn:aws:s3:::test-s3-bucket

# Bucket Name used for polling non-AWS S3 buckets
#non_aws_bucket_name: test-s3-bucket

# Configures the AWS S3 API to use path style instead of virtual host style (default)
#path_style: false

# Overrides the `cloud.provider` field for non-AWS S3 buckets. See docs for auto recognized providers.
#provider: minio

#------------------------------ AWS CloudWatch input --------------------------------
# Beta: Config options for AWS CloudWatch input
#- type: aws-cloudwatch
65 changes: 63 additions & 2 deletions x-pack/filebeat/docs/inputs/input-aws-s3.asciidoc
@@ -61,6 +61,28 @@ Listing of the S3 bucket will be polled according the time interval defined by
expand_event_list_from_field: Records
----


The `aws-s3` input can also poll 3rd party S3 compatible services such as a self-hosted MinIO.
Non-AWS S3 compatible buckets require `access_key_id` and `secret_access_key` for authentication.
Specify the S3 bucket name with the `non_aws_bucket_name` config, and set `endpoint` to replace the default API endpoint.
`endpoint` should be either a full URI in the form of `http(s)://<s3 endpoint>`, which is used as the API endpoint of the service, or a single domain.
If a domain is provided, the full endpoint URI is constructed with the region name in the standard form of `https://s3.<region>.<domain>`, which is supported by AWS and several 3rd party providers.
No `endpoint` is needed when using the native AWS S3 service hosted at `amazonaws.com`.
See <<aws-credentials-config,Configuration parameters>> for alternate AWS domains that require a different endpoint.

["source","yaml",subs="attributes"]
----
{beatname_lc}.inputs:
- type: aws-s3
non_aws_bucket_name: test-s3-bucket
number_of_workers: 5
bucket_list_interval: 300s
access_key_id: xxxxxxx
secret_access_key: xxxxxxx
endpoint: https://s3.example.com:9000
expand_event_list_from_field: Records
----
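
Note: the `endpoint` rule described in this hunk (a full URI is used as-is, a bare domain is expanded with the region) can be sketched in Go as follows. This is only an illustration under assumptions: `buildEndpoint` is a hypothetical helper and not the function used by the input, and the scheme check is a simplification.

```go
package main

import (
	"fmt"
	"strings"
)

// buildEndpoint is a hypothetical helper, not the input's actual code.
// An endpoint that already carries a scheme is returned unchanged; a bare
// domain is expanded to the standard form https://s3.<region>.<domain>.
func buildEndpoint(endpoint, region string) string {
	if strings.HasPrefix(endpoint, "http://") || strings.HasPrefix(endpoint, "https://") {
		return endpoint
	}
	return fmt.Sprintf("https://s3.%s.%s", region, endpoint)
}

func main() {
	// Full URI, as in the YAML example above.
	fmt.Println(buildEndpoint("https://s3.example.com:9000", "us-east-1"))
	// Bare domain, expanded with the region name.
	fmt.Println(buildEndpoint("example.com", "us-east-1"))
}
```

The region value `us-east-1` is a placeholder chosen for the example.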

The `aws-s3` input supports the following configuration options plus the
<<{beatname_lc}-input-{type}-common-options>> described later.

@@ -236,7 +258,7 @@ configuring multiline options.
[float]
==== `queue_url`

URL of the AWS SQS queue that messages will be received from. (Required when `bucket_arn` is not set).
URL of the AWS SQS queue that messages will be received from. (Required when `bucket_arn` and `non_aws_bucket_name` are not set).

[float]
==== `visibility_timeout`
Expand Down Expand Up @@ -270,7 +292,12 @@ value is `20s`.
[float]
==== `bucket_arn`

ARN of the AWS S3 bucket that will be polled for list operation. (Required when `queue_url` is not set).
ARN of the AWS S3 bucket that will be polled for list operation. (Required when `queue_url` and `non_aws_bucket_name` are not set).

[float]
==== `non_aws_bucket_name`

Name of the S3 bucket that will be polled for list operation. Required for 3rd party S3 compatible services. (Required when `queue_url` and `bucket_arn` are not set).

[float]
==== `bucket_list_interval`
@@ -288,6 +315,40 @@ Prefix to apply for the list request to the S3 bucket. Default empty.
Number of workers that will process the S3 objects listed. (Required when `bucket_arn` is set).


[float]
==== `provider`

Name of the 3rd party S3 bucket provider, such as `backblaze` or `gcp`. Setting it overrides the `cloud.provider` field.
The following endpoints/providers will be detected automatically:
|===
|Domain |Provider
|amazonaws.com, amazonaws.com.cn, c2s.sgov.gov, c2s.ic.gov |aws
|backblazeb2.com |backblaze
|wasabisys.com |wasabi
|digitaloceanspaces.com |digitalocean
|dream.io |dreamhost
|scw.cloud |scaleway
|googleapis.com |gcp
|cloud.it |arubacloud
|linodeobjects.com |linode
|vultrobjects.com |vultr
|appdomain.cloud |ibm
|aliyuncs.com |alibaba
|oraclecloud.com |oracle
|exo.io |exoscale
|upcloudobjects.com |upcloud
|ilandcloud.com |iland
|zadarazios.com |zadara
|===
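
Note: one way to read the table above is as a suffix match on the endpoint host, with `provider` taking precedence when set. The Go sketch below illustrates that reading only. `detectProvider`, the `providerByDomain` map (a subset of the table), and the `"unknown"` fallback are all assumptions made for the example; the diff shown here does not include the detection code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// providerByDomain copies a few rows of the table above (illustrative subset).
var providerByDomain = map[string]string{
	"amazonaws.com":          "aws",
	"backblazeb2.com":        "backblaze",
	"wasabisys.com":          "wasabi",
	"digitaloceanspaces.com": "digitalocean",
	"googleapis.com":         "gcp",
}

// detectProvider is a hypothetical helper. A non-empty override wins;
// otherwise the endpoint host is matched against the known domains.
// The "unknown" fallback is a placeholder, not the input's behavior.
func detectProvider(endpoint, override string) string {
	if override != "" {
		return override
	}
	u, err := url.Parse(endpoint)
	if err != nil || u.Hostname() == "" {
		return "unknown"
	}
	host := u.Hostname()
	for domain, provider := range providerByDomain {
		if host == domain || strings.HasSuffix(host, "."+domain) {
			return provider
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(detectProvider("https://s3.us-west-002.backblazeb2.com", ""))
	fmt.Println(detectProvider("https://s3.example.com:9000", "minio"))
}
```

The sketch expects a full URI with a scheme; a bare domain would need to be expanded first.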


[float]
==== `path_style`

Enabling this option sets the bucket name as a path in the API call instead of a subdomain. When enabled,
`https://<bucket-name>.s3.<region>.<provider>.com` becomes `https://s3.<region>.<provider>.com/<bucket-name>`.
The input accepts this option only with 3rd party S3 providers; it is rejected when polling AWS buckets.
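
Note: the difference between the two addressing styles can be shown with a small Go sketch. `bucketURL` is a hypothetical helper written for this illustration; the input itself delegates URL construction to the AWS SDK, and the host and bucket names are placeholders.

```go
package main

import "fmt"

// bucketURL is a hypothetical helper that builds the base URL for a bucket.
// With pathStyle the bucket becomes the first path segment; otherwise it is
// prepended to the host as a subdomain (virtual host style).
func bucketURL(endpointHost, bucket string, pathStyle bool) string {
	if pathStyle {
		return fmt.Sprintf("https://%s/%s", endpointHost, bucket)
	}
	return fmt.Sprintf("https://%s.%s", bucket, endpointHost)
}

func main() {
	host := "s3.us-east-1.example.com" // placeholder endpoint host
	fmt.Println(bucketURL(host, "test-s3-bucket", false))
	fmt.Println(bucketURL(host, "test-s3-bucket", true))
}
```

Path style is typically what a self-hosted service needs when per-bucket DNS subdomains are not configured.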


[float]
==== `aws credentials`
12 changes: 12 additions & 0 deletions x-pack/filebeat/filebeat.reference.yml
@@ -3045,6 +3045,18 @@ filebeat.inputs:
# to arrive in the queue before returning.
#sqs.wait_time: 20s

# Bucket ARN used for polling AWS S3 buckets
#bucket_arn: arn:aws:s3:::test-s3-bucket

# Bucket Name used for polling non-AWS S3 buckets
#non_aws_bucket_name: test-s3-bucket

# Configures the AWS S3 API to use path style instead of virtual host style (default)
#path_style: false

# Overrides the `cloud.provider` field for non-AWS S3 buckets. See docs for auto recognized providers.
#provider: minio

#------------------------------ AWS CloudWatch input --------------------------------
# Beta: Config options for AWS CloudWatch input
#- type: aws-cloudwatch
38 changes: 29 additions & 9 deletions x-pack/filebeat/input/awss3/config.go
@@ -5,6 +5,7 @@
package awss3

import (
"errors"
"fmt"
"time"

@@ -28,12 +29,15 @@ type config struct {
MaxNumberOfMessages int `config:"max_number_of_messages"`
QueueURL string `config:"queue_url"`
BucketARN string `config:"bucket_arn"`
NonAWSBucketName string `config:"non_aws_bucket_name"`
BucketListInterval time.Duration `config:"bucket_list_interval"`
BucketListPrefix string `config:"bucket_list_prefix"`
NumberOfWorkers int `config:"number_of_workers"`
AWSConfig awscommon.ConfigAWS `config:",inline"`
FileSelectors []fileSelectorConfig `config:"file_selectors"`
ReaderConfig readerConfig `config:",inline"` // Reader options to apply when no file_selectors are used.
PathStyle bool `config:"path_style"`
ProviderOverride string `config:"provider"`
}

func defaultConfig() config {
@@ -46,27 +50,33 @@ func defaultConfig() config {
SQSMaxReceiveCount: 5,
FIPSEnabled: false,
MaxNumberOfMessages: 5,
PathStyle: false,
}
c.ReaderConfig.InitDefaults()
return c
}

func (c *config) Validate() error {
if c.QueueURL == "" && c.BucketARN == "" {
logp.NewLogger(inputName).Warnf("neither queue_url nor bucket_arn were provided, input %s will stop", inputName)
return nil
configs := []bool{c.QueueURL != "", c.BucketARN != "", c.NonAWSBucketName != ""}
enabled := []bool{}
for i := range configs {
if configs[i] {
enabled = append(enabled, configs[i])
}
}

if c.QueueURL != "" && c.BucketARN != "" {
return fmt.Errorf("queue_url <%v> and bucket_arn <%v> "+
"cannot be set at the same time", c.QueueURL, c.BucketARN)
if len(enabled) == 0 {
logp.NewLogger(inputName).Warnf("none of queue_url, bucket_arn, non_aws_bucket_name were provided, input %s will stop", inputName)
return nil
} else if len(enabled) > 1 {
return fmt.Errorf("queue_url <%v>, bucket_arn <%v>, non_aws_bucket_name <%v> "+
"cannot be set at the same time", c.QueueURL, c.BucketARN, c.NonAWSBucketName)
}

if c.BucketARN != "" && c.BucketListInterval <= 0 {
if (c.BucketARN != "" || c.NonAWSBucketName != "") && c.BucketListInterval <= 0 {
return fmt.Errorf("bucket_list_interval <%v> must be greater than 0", c.BucketListInterval)
}

if c.BucketARN != "" && c.NumberOfWorkers <= 0 {
if (c.BucketARN != "" || c.NonAWSBucketName != "") && c.NumberOfWorkers <= 0 {
return fmt.Errorf("number_of_workers <%v> must be greater than 0", c.NumberOfWorkers)
}

@@ -90,6 +100,16 @@ func (c *config) Validate() error {
c.APITimeout, c.SQSWaitTime)
}

if c.FIPSEnabled && c.NonAWSBucketName != "" {
return errors.New("fips_enabled cannot be used with a non-AWS S3 bucket")
}
if c.PathStyle && c.NonAWSBucketName == "" {
return errors.New("path_style can only be used when polling non-AWS S3 services")
}
if c.ProviderOverride != "" && c.NonAWSBucketName == "" {
return errors.New("provider can only be overridden when polling non-AWS S3 services")
}

return nil
}
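
Note: the mutual-exclusion rule in `Validate` (at most one of `queue_url`, `bucket_arn`, `non_aws_bucket_name`) can also be expressed by counting non-empty values. The standalone Go sketch below is an illustration, not a proposed replacement. The `source` type and its `validate` method are hypothetical names, and the warning that the real code logs when nothing is set is omitted.

```go
package main

import "fmt"

// source is a hypothetical stand-in for the three mutually exclusive options.
type source struct {
	queueURL         string
	bucketARN        string
	nonAWSBucketName string
}

// validate counts the options that are set. As in the diff above, zero set
// options is not an error (the input would only log a warning and stop),
// while more than one is rejected.
func (s source) validate() error {
	set := 0
	for _, v := range []string{s.queueURL, s.bucketARN, s.nonAWSBucketName} {
		if v != "" {
			set++
		}
	}
	if set > 1 {
		return fmt.Errorf("queue_url <%v>, bucket_arn <%v>, non_aws_bucket_name <%v> "+
			"cannot be set at the same time", s.queueURL, s.bucketARN, s.nonAWSBucketName)
	}
	return nil
}

func main() {
	ok := source{nonAWSBucketName: "test-s3-bucket"}
	bad := source{bucketARN: "arn:aws:s3:::test-s3-bucket", nonAWSBucketName: "test-s3-bucket"}
	fmt.Println(ok.validate())
	fmt.Println(bad.validate())
}
```

Counting with an integer avoids building the intermediate `enabled` slice while keeping the same outcome for zero, one, or several set options.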
