Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[awss3 input] unescape characters in s3 file names #38012

Closed
tetianakravchenko opened this issue Feb 14, 2024 · 7 comments · Fixed by #38125
Closed

[awss3 input] unescape characters in s3 file names #38012

tetianakravchenko opened this issue Feb 14, 2024 · 7 comments · Fixed by #38125
Assignees
Labels
Team:Cloud-Monitoring Label for the Cloud Monitoring team

Comments

@tetianakravchenko
Copy link
Contributor

At some point there was introduced this fix #18370 to unescape characters in s3 file names.
Since then it seems that the implementation has changed and now there is similar error:

failed processing S3 event for object key "2024-02-08T08:35:00 00:00.json" in bucket "xxxx": failed to get s3 object (elapsed_time_ns=143737473): s3 GetObject failed: operation error S3: GetObject, https response error StatusCode: 404, RequestID: ...

Filname: 2024-02-08T08:35:00+00:00.json

this error seems to be coming from the GetObject call

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Feb 14, 2024
@tetianakravchenko tetianakravchenko added the Team:Cloud-Monitoring Label for the Cloud Monitoring team label Feb 14, 2024
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Feb 14, 2024
@zmoog
Copy link
Contributor

zmoog commented Feb 15, 2024

The error happens because the poller turns the original key

2024-02-08T08:35:00+00:02.json.gz

Into

2024-02-08T08:35:00 00:02.json.gz

The Poller performs this change at:

// Unescape s3 key name. For example, convert "%3D" back to "=".
filename, err := url.QueryUnescape(*object.Key)

Quick test using the Go Playground:
https://go.dev/play/p/GvQcEQG607C

This only happens when using the S3 input in polling mode; the S3 input in SQS mode can successfully process the same object.

As a workaround, I recommend using the S3 input in SQS mode.

@zmoog
Copy link
Contributor

zmoog commented Feb 15, 2024

It seems the list objects API doesn't escape the keys as the S3 notification did. So we don't need to unescape it.

For example, here's how the list objects API returns the following two objects in S3.

S3 objects:

CleanShot 2024-02-15 at 17 02 58@2x

S3 list objects response:

CleanShot 2024-02-15 at 17 01 50@2x

The filename, err := url.QueryUnescape(*object.Key) was originally introduced with #18370 to unescape object keys in the S3 notifications via SQS messages. However, since it is now used in polling only, it is probably no longer required and can be removed.

@zmoog zmoog self-assigned this Feb 15, 2024
@zmoog
Copy link
Contributor

zmoog commented Feb 15, 2024

@kaiyan-sheng, @aspacca, please let me know what you think.

If the url.QueryUnescape(*object.Key) is no longer required, I can open a PR.

@kaiyan-sheng
Copy link
Contributor

@zmoog Thanks for looking into it! Yes I think we can remove the QueryUnescape in s3 polling!

@zmoog
Copy link
Contributor

zmoog commented Feb 15, 2024

Yes I think we can remove the QueryUnescape in s3 polling!

Great. I'll draft a PR and run a couple of tests then.

@aspacca, let me know what you think 🙇

@aspacca
Copy link

aspacca commented Feb 19, 2024

@zmoog, yes, you're totally on spot!

The bug was already reported #33998 and I came to the same conclusion

As mentioned in my comment on that issue:

there is indeed a bug in the filebeat input, since we should unescape the s3 object key only when the value comes from an s3-sqs notification. in your case, since you are using the direct s3 listing input, the key is not escaped and we should not unescape it.

so, as @kaiyan-sheng also mentioned, we should remove QueryUnescape only when dealing with s3 polling

@aspacca
Copy link

aspacca commented Feb 19, 2024

@zmoog , sorry, please note that #18370 is the old version of the input.

it was refactored as we know, this is the place where QueryUnescape is invoked now here for sqs-s3: https://github.com/elastic/beats/blob/main/x-pack/filebeat/input/awss3/sqs_s3_event.go#L279

We should not remove this call.

But rather this one: https://github.com/elastic/beats/blob/main/x-pack/filebeat/input/awss3/s3.go#L212
That's the one made for s3-listing mode

zmoog added a commit to zmoog/beats that referenced this issue Feb 23, 2024
We introduced [^1] the `url.QueryUnescape()` function to unescape
object keys from S3 notification in SQS messages.

However, the object keys in the S3 list object responses do not
require [^2] unescape.

We must remove the unescape to avoid unintended changes to the S3
object key.

[^1]: elastic#18370
[^2]: elastic#38012 (comment)
zmoog added a commit to zmoog/beats that referenced this issue Feb 23, 2024
We introduced [^1] the `url.QueryUnescape()` function to unescape
object keys from S3 notification in SQS messages.

However, the object keys in the S3 list object responses do not
require [^2] unescape.

We must remove the unescape to avoid unintended changes to the S3
object key.

[^1]: elastic#18370
[^2]: elastic#38012 (comment)
zmoog added a commit to zmoog/beats that referenced this issue Feb 23, 2024
We introduced [^1] the `url.QueryUnescape()` function to unescape
object keys from S3 notification in SQS messages.

However, the object keys in the S3 list object responses do not
require [^2] unescape.

We must remove the unescape to avoid unintended changes to the S3
object key.

[^1]: elastic#18370
[^2]: elastic#38012 (comment)
zmoog added a commit to zmoog/beats that referenced this issue Mar 4, 2024
We introduced [^1] the `url.QueryUnescape()` function to unescape
object keys from S3 notification in SQS messages.

However, the object keys in the S3 list object responses do not
require [^2] unescape.

We must remove the unescape to avoid unintended changes to the S3
object key.

[^1]: elastic#18370
[^2]: elastic#38012 (comment)
zmoog added a commit that referenced this issue Mar 4, 2024
…de (#38125)

* Remove url.QueryUnescape()

We introduced [^1] the `url.QueryUnescape()` function to unescape
object keys from S3 notification in SQS messages.

However, the object keys in the S3 list object responses do not
require [^2] unescape.

We must remove the unescape to avoid unintended changes to the S3
object key.

[^1]: #18370
[^2]: #38012 (comment)

---------

Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
mergify bot pushed a commit that referenced this issue Mar 4, 2024
…de (#38125)

* Remove url.QueryUnescape()

We introduced [^1] the `url.QueryUnescape()` function to unescape
object keys from S3 notification in SQS messages.

However, the object keys in the S3 list object responses do not
require [^2] unescape.

We must remove the unescape to avoid unintended changes to the S3
object key.

[^1]: #18370
[^2]: #38012 (comment)

---------

Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
(cherry picked from commit 5f1e656)
zmoog pushed a commit that referenced this issue Mar 4, 2024
…s-s3 input in polling mode (#38165)

* [AWS] [S3] Remove url.QueryUnescape() from aws-s3 input in polling mode (#38125)

We introduced [^1] the `url.QueryUnescape()` function to unescape
object keys from S3 notification in SQS messages.

However, the object keys in the S3 list object responses do not
require [^2] unescape.

We must remove the unescape to avoid unintended changes to the S3
object key.

[^1]: #18370
[^2]: #38012 (comment)

---------

Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Cloud-Monitoring Label for the Cloud Monitoring team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants