
S3 cost optimisation: Remember the last index value that was used #160

Open
tomfitzhenry opened this issue Jul 29, 2016 · 17 comments

tomfitzhenry commented Jul 29, 2016

The problem

Suppose you use the defaults:

  • s3_object_key_format of %{path}%{time_slice}_%{index}.%{file_extension}
  • time_slice_format of %Y%m%d%H

And suppose you flush every 30 seconds. So 120 files per hour.

The first file will check whether foo-2016010109_1.gz exists via an S3 HEAD request, see it doesn't exist, and then upload to that filename.

The next file will first check whether foo-2016010109_1.gz exists via an S3 HEAD request, see that it exists, increment the index to foo-2016010109_2.gz, check whether that exists via another S3 HEAD request, see that it doesn't exist, and then upload to that filename.

This will continue. When we get to the final file of the hour (the 120th file), we'll first do 119 HEAD requests!

That's 1+2+...+119 = 7140 S3 requests over the hour. And that's per input log file, per instance.

S3 HEAD requests are "$0.004 per 10,000 requests". So the monthly cost of the above, for 5 log files on each of 100 instances, amounts to 7140 × 5 × 100 × 24 × 30 × $0.004 / 10,000 ≈ $1,028.

More generally, 1+2+...+n is O(n^2) and we can reduce this to O(n).

Solutions

(a) The user can modify time_slice_format to include %M. Or the default could include %M.
(b) fluent-plugin-s3 could remember the last index it uploaded to, and so not have to check whether the n-1 earlier files already exist: fluentd would know they do.

If either solution were implemented, we'd have reduced the number of HEAD requests from O(n^2) to O(n). (Technically (a) doesn't reduce the complexity below O(n^2); it just makes our n tiny.)

So rather than 7140 S3 requests per hour per log file per instance, we'd only do 120.

This reduces the monthly cost from $1028 to $17.
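
For illustration, solution (a) amounts to a one-line change in the s3 output config (a sketch only; the match pattern and bucket name are made up, and credentials/region are omitted):

    <match logs.**>
      @type s3
      s3_bucket my-log-bucket        # hypothetical bucket, for illustration only
      path foo-
      # Default-style key format; %{index} is what drives the HEAD probing:
      s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
      # Solution (a): include %M so only one or two chunks share a time slice,
      # keeping the per-slice index (and the number of HEAD probes) tiny.
      time_slice_format %Y%m%d%H%M
    </match>

The trade-off is many more, smaller objects per hour.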

@cnorthwood

The issue with (b) is that if you have multiple instances writing to the same path, you'd still need the collision check.

@tomfitzhenry
Author

Oops. I updated (b) to clarify that the HEAD request would still occur for the index we're trying, but it wouldn't need to make the n-1 earlier requests.

If we're up to index n, we needn't check 1, 2, ..., n-1. But fluent-plugin-s3 seems to: see https://github.com/fluent/fluent-plugin-s3/blob/master/lib/fluent/plugin/out_s3.rb#L190 and line 219.

@tomfitzhenry
Author

I should also say it was @cnorthwood who spotted this issue. :)

@tomfitzhenry
Author

To be less sensationalist: a more common case of 20 instances uploading 5 files every 5 minutes costs about $2.20 per month with the O(n^2) behaviour and would cost about $0.34 with O(n).

@repeatedly
Member

The hard part of (b) is that the s3 plugin has num_threads.
If num_threads is 2 or larger, the "last index" is ambiguous across the flushing threads.

Hmm...

@repeatedly
Member

(a) The user can modify time_slice_format to include %M. Or the default could include %M.

If the default included %M, the flush cycle would be shorter. Most users probably don't expect that, because it puts lots of small files on S3.
That is inefficient for several processing engines, e.g. Hadoop, or for importing files into a DWH.

Some users add %{hex_random} to s3_object_key_format.
This reduces requests and improves performance.

One way to reduce the HEAD cost is to add a skip_existence_check-like option, used together with %{hex_random} / %{uuid_flush}.
But I'm not sure whether this is a good idea or not...
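
For reference, the %{hex_random} variant mentioned above is only a key-format change (a sketch; bucket and match pattern are made up). The existence check still runs, but because each chunk gets its own random prefix, the first index probed is almost always free, so each upload costs one HEAD request rather than n:

    <match logs.**>
      @type s3
      s3_bucket my-log-bucket        # hypothetical bucket, for illustration only
      path foo-
      s3_object_key_format %{path}%{time_slice}_%{hex_random}_%{index}.%{file_extension}
      time_slice_format %Y%m%d%H
    </match>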

@ozonni

ozonni commented Jun 17, 2019

We also encountered the same problem. The main reason for us to use the s3 plugin was to reduce the cost of storing huge log files; we use fluentd+s3 next to a regular ELK-like solution. But now we are actually paying more for S3 than we would pay to grow our main log storage.

caleb15 added a commit to caleb15/fluent-plugin-s3 that referenced this issue Oct 2, 2020
@caleb15
Contributor

caleb15 commented Oct 2, 2020

We ran into this issue as well. There have been months where we received literally billions of HEAD requests. It's hard to say exactly how much it has cost us, but based on our historical usage I'd guess around 5,000 dollars†. That's right, Jeff Bezos has gotten so much money off this bug he could buy a used 2005 Mazda Mazda3:

[image: a used 2005 Mazda3]

(Just kidding, but the point is we should probably warn newcomers or fix the bug unless we want to buy Bezos another car :P )

Hard point of (b) is s3 plugin has num_threads.

You could create a variable to store the last log name and lock it behind a mutex so threads can set & access it safely.
See #3 in https://blog.arkency.com/3-ways-to-make-your-ruby-object-thread-safe/

I'm tempted to try it out but I don't know ruby... hmm.

There's also a slight problem with the approach in that at the start of the program you would still have to go through all the prefixed s3 items to find the last index. If td-agent is restarted frequently or if there is a crazy amount of logs in the bucket this could also lead to high costs.

I think it would be better to make %{hex_random} part of the default s3_object_key_format and keep check_object true. That would require no complicated code changes and would avoid the problem entirely. The downside is that it becomes slightly harder for a user to find the latest log in s3: they would have to look through the S3 last-modified timestamps of all logs under the latest hour/day/month prefix. For most users this is probably not a big deal. The other downside is that this would require a new major release version. I still think it's worth it, assuming no other downsides. Any objections to this approach?

As a temporary workaround we have simplified our td-agent config and upgraded to td-agent v4. I'm not even sure how we ran into this problem: the config is set to flush only once every 24 hours, but somehow we flushed about once a minute. The really odd thing is that the rapid flushing happened in a periodic cycle, where it would continue for weeks and then pause for a long time before starting back up again. I'll be monitoring the # of requests to see if our flushing problem (and thus this problem) is solved for us.

† This is a very rough guess, I won't know for sure until I see how much costs come down by.

@repeatedly
Member

You could create a variable to store the last log name and lock it behind a mutex so threads can set & access it safely.

I think this has a slow-upload issue with multiple threads. If the previous upload takes 20 seconds, the other threads wait 20 seconds or more. That easily causes buffer overflow.

The older s3 plugin doesn't have the check_object parameter, so some old comments above are outdated.
I think a documentation update is needed for this case: add an "S3 cost optimization" section and mention that check_object false with a random placeholder is better for high-volume environments.

worr added a commit to worr/logging-operator that referenced this issue Oct 22, 2020
The s3 plugin uses a default object key that is problematic in a few ways.

1. It makes HEAD requests for each chunk it uploads, starting from 1 each time.
If you have uploaded 2000 log files within the same time slice, it will make
2001 HEAD requests to figure out if it exists.

fluent/fluent-plugin-s3#160

2. The above check is not thread-safe, and two threads can race and decide to
use the same `%{index}` value, with the loser of the race overwriting the chunk
from the winner.

fluent/fluent-plugin-s3#326

This is planned to change for v2, but there's no clear path to v2 right now.
The plugin does warn already if you use multiple threads and don't use either
`%{chunk_id}` or `%{uuid_hash}` in the object key.
worr added a commit to worr/fluent-plugin-s3 that referenced this issue Oct 23, 2020
As mentioned in a warning, as well as fluent#326
and fluent#160, the process of
determining the index added to the default object key is not thread-safe. This
adds some thread-safety until version 2.x is out where chunk_id is used
instead of an index value.

This is not a perfect implementation, since there can still be races between
different workers if workers are enabled in fluentd, or if there are multiple
fluentd instances uploading to the same bucket. This commit is just to resolve
this problem short-term in a way that's backwards compatible.
caleb15 added a commit to caleb15/fluent-plugin-s3 that referenced this issue Oct 29, 2020
see fluent#160

Signed-off-by: caleb15 <caleb@15five.com>
@formanojhr

formanojhr commented Nov 11, 2020

@caleb15 Apart from the warning that was added, is there a recommended solution to avoid the index checking, which in turn would avoid the HEAD requests? We are using the s3_object_key_format %{path}%{hostname}-%{time_slice}_%{index}.%{file_extension} and are seeing a high number of HEAD requests. Does #326 locally cache the last index for the file between threads to avoid the issue? Thanks!

@repeatedly
Member

To avoid the HEAD request calls, set check_object false.
In addition, you need to change s3_object_key_format to use %{uuid_flush} or %{chunk_id} for a unique path.
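
For example (a sketch; bucket, path and match pattern are placeholders), %{index} is dropped because %{uuid_flush} already makes every key unique:

    <match logs.**>
      @type s3
      s3_bucket my-log-bucket        # hypothetical bucket, for illustration only
      path logs/
      # No HEAD probing at all; uniqueness comes from %{uuid_flush} instead of %{index}.
      check_object false
      s3_object_key_format %{path}%{hostname}-%{time_slice}_%{uuid_flush}.%{file_extension}
      time_slice_format %Y%m%d%H
    </match>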

@formanojhr

formanojhr commented Nov 11, 2020

Thanks @repeatedly. What is the implication of avoiding the HEAD request call? Would it overwrite an existing file, or know from a cache which index it needs to write to? I'm guessing %{uuid_flush} is the id that is then picked up from the cache.

@caleb15
Contributor

caleb15 commented Nov 13, 2020

@formanojhr uuid = universally unique identifier. Because the id is effectively unique, overwriting becomes virtually impossible.

https://stackoverflow.com/questions/1155008/how-unique-is-uuid

@formanojhr

@repeatedly Since you are the dev on this, this might be something you can answer. Once I set check_object false and change s3_object_key_format to %{path}%{hostname}-%{uuid_flush}-%{time_slice}_%{index}.%{file_extension}, what we notice is that the HEAD requests drop to zero, but the PUT requests also go down to zero, even though we do see the logs being written. Any idea if a different protocol is being used to write?

@github-actions

github-actions bot commented Jul 6, 2021

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

@github-actions github-actions bot added the stale label Jul 6, 2021
@caleb15
Contributor

caleb15 commented Jul 6, 2021

@cosmo0920 can you remove the stale label and add a bug label please? This is a rare but serious issue.

@kenhys kenhys added bug and removed stale labels Jul 7, 2021
@sabdalla80

sabdalla80 commented Jul 19, 2022

I agree this seems to be a bug. I have also experienced this default behavior, and we only found out about it after a $20K bill at the end of the month due to billions of HEAD requests. I switched to check_object false and used a combination of a UUID and other variables to ensure uniqueness and avoid conflicts; of course the downside is the difficulty of looking up objects sequentially. I do believe this needs to be treated with higher urgency, or at least the default behavior should be switched so that it is not making billions of HEAD calls. Some may argue the suffix could be exposed as a configurable option, with the pros and cons of each choice documented.
