High memory usage from S3::Object#get in aws-sdk-core 3.21.0 #1786

Closed
domcleal opened this issue May 18, 2018 · 5 comments
@domcleal

Issue description

Between aws-sdk-core 3.20.0 and 3.21.0, the memory usage of Aws::S3::Object#get appears to have increased when downloading (streaming) large files, to the point of running out of memory on multi-GB files.

It looks like @raw_stream in Seahorse::Client::Http::Response (571c2d0#diff-b34bc111f15a632c8dc7fe14d71bdd66R71) is continually being appended to, even across the multiple ranged GET requests for the object. Commenting out that append seems to lower the memory usage again.
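
To illustrate the suspected pattern, here is a minimal sketch of a response object that keeps a copy of every chunk while also yielding it to the caller. The class `BufferingResponse` and its `signal_data` method are hypothetical stand-ins written for this example; they are not the SDK's actual code. The point is only that the retained buffer grows to the size of the whole download even though the caller discards each chunk.

```ruby
require 'objspace'

# Hypothetical stand-in for a streaming response; NOT the SDK's actual code.
class BufferingResponse
  attr_reader :raw_stream

  def initialize
    @raw_stream = String.new
  end

  # Called once per chunk: keeps a copy, then hands the chunk to the caller's block.
  def signal_data(chunk)
    @raw_stream << chunk
    yield chunk
  end
end

response = BufferingResponse.new
streamed = 0
chunk = 'x' * (16 * 1024)

# The caller only counts bytes, as in the reproduction script in this issue.
1_000.times { response.signal_data(chunk) { |c| streamed += c.bytesize } }

puts "streamed: #{streamed / 1024} KB"
puts "retained: #{response.raw_stream.bytesize / 1024} KB"
puts "memsize:  #{ObjectSpace.memsize_of(response.raw_stream) / 1024} KB"
```

In this sketch the retained byte count always equals the streamed byte count, which is the kind of growth the numbers in this report would be consistent with.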

Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version

  * aws-eventstream (1.0.0)
  * aws-partitions (1.87.0)
  * aws-sdk-cloudfront (1.2.0)
  * aws-sdk-core (3.21.0)
  * aws-sdk-kms (1.5.0)
  * aws-sdk-s3 (1.11.0)
  * aws-sigv4 (1.0.2)

Version of Ruby, OS environment

Ruby 2.4.3, Linux x86_64.

Code snippets / steps to reproduce

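require 'objspace'
require 'aws-sdk-s3'

# Print the amount downloaded alongside three views of this process's memory use.
def print_stats(size)
  puts "DOWNLOADED: #{size/1024} KB"
  # Resident set size of this process, as reported by ps.
  puts "RSS: #{`ps -eo rss,pid | grep #{Process.pid} | grep -v grep | awk '{ print $1,"KB";}'`}"
  # Rough Ruby heap estimate: heap pages * 408 slots per page * 40 bytes per slot.
  puts "HEAP SIZE: #{(GC.stat[:heap_sorted_length] * 408 * 40)/1024} KB"
  # Total memory consumed by all living Ruby objects.
  puts "SIZE OF ALL OBJECTS: #{ObjectSpace.memsize_of_all/1024} KB"
end

obj = Aws::S3::Resource.new.bucket('redacted').object('redacted-419MB-object')

# Stream the object in chunks, counting bytes and discarding each chunk.
size = 0
obj.get { |chunk| size += chunk.length }
print_stats(size)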

419MB file output

With aws-sdk-core 3.21.0:

DOWNLOADED: 429558 KB
RSS: 664932 KB
HEAP SIZE: 29659 KB
SIZE OF ALL OBJECTS: 862593 KB

With 3.20.0, the RSS is ~400MB lower and the total object size is over 800MB lower:

DOWNLOADED: 429558 KB
RSS: 270204 KB
HEAP SIZE: 29388 KB
SIZE OF ALL OBJECTS: 55454 KB

4GB file output

3.21.0 resulted in an RSS of nearly 5GB:

DOWNLOADED: 4753510 KB
RSS: 5022612 KB
HEAP SIZE: 34966 KB
SIZE OF ALL OBJECTS: 6357457 KB

With 3.20.0, the RSS stays low:

DOWNLOADED: 4753510 KB
RSS: 212544 KB
HEAP SIZE: 29372 KB
SIZE OF ALL OBJECTS: 62262 KB
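
For reference, the differences between the two versions can be worked out directly from the figures above. The sketch below only restates the reported numbers (in KB); the constant `RESULTS` and the helper `deltas` are names made up for this example.

```ruby
# Figures copied from the outputs above (all in KB).
RESULTS = {
  '419MB object' => {
    downloaded: 429_558,
    '3.21.0' => { rss: 664_932, objects: 862_593 },
    '3.20.0' => { rss: 270_204, objects: 55_454 }
  },
  '4GB object' => {
    downloaded: 4_753_510,
    '3.21.0' => { rss: 5_022_612, objects: 6_357_457 },
    '3.20.0' => { rss: 212_544, objects: 62_262 }
  }
}.freeze

# Increase from 3.20.0 to 3.21.0 for one test run, in KB.
def deltas(run)
  after = run['3.21.0']
  before = run['3.20.0']
  { rss: after[:rss] - before[:rss], objects: after[:objects] - before[:objects] }
end

RESULTS.each do |label, run|
  d = deltas(run)
  # How large the 3.21.0 object total is relative to the bytes downloaded.
  ratio = (run['3.21.0'][:objects].to_f / run[:downloaded]).round(2)
  puts "#{label}: RSS +#{d[:rss]} KB, objects +#{d[:objects]} KB, " \
       "3.21.0 object size is #{ratio}x the download"
end
```

In both runs the increase in total object size under 3.21.0 is larger than the download itself, whereas 3.20.0 stays roughly flat regardless of file size.
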

@awood45
Member

awood45 commented May 18, 2018

Thanks for reporting this, I can see where this would be happening. We'll be working on this today.

@awood45
Member

awood45 commented May 18, 2018

The @raw_stream object is only used for event stream responses, so the fix here is going to be to ensure we are not collecting a copy of the streaming response in memory except when we are required to (in the case of an actual eventstream response).
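
The general shape of that approach can be sketched as conditional buffering: keep the raw bytes only when the response is flagged as an event stream, and otherwise just pass each chunk through. Everything named below (`SelectiveResponse`, the `event_stream:` flag, `retained_bytes`) is hypothetical and for illustration only; this is not the code from the actual fix.

```ruby
# Hypothetical sketch of conditional buffering; class and flag names are
# illustrative only and do not come from the SDK.
class SelectiveResponse
  attr_reader :raw_stream

  def initialize(event_stream:)
    @event_stream = event_stream
    @raw_stream = String.new
  end

  # Called once per chunk received.
  def signal_data(chunk)
    # Keep a copy only for event stream responses, which are the
    # only consumers of the raw bytes per the comment above.
    @raw_stream << chunk if @event_stream
    yield chunk
  end
end

# Feed 100 chunks of 1 KB through a response and report what it retained.
def retained_bytes(event_stream:)
  response = SelectiveResponse.new(event_stream: event_stream)
  100.times { response.signal_data('x' * 1024) { |_chunk| } }
  response.raw_stream.bytesize
end

puts "plain download retains: #{retained_bytes(event_stream: false)} bytes"
puts "event stream retains:   #{retained_bytes(event_stream: true)} bytes"
```

With the flag off, nothing accumulates, so memory for an ordinary streamed download no longer scales with object size.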

@awood45
Member

awood45 commented May 18, 2018

We've confirmed that #1787 resolves the object space size explosion, and we're intending to release it today. Should be released as aws-sdk-core 3.21.1.

@awood45
Member

awood45 commented May 18, 2018

New release is out. Feel free to reopen if you continue to see this after upgrading, thanks!
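
For anyone wanting to check whether a given aws-sdk-core version falls in the affected range discussed in this thread, here is a small sketch using RubyGems' version classes. It assumes, based only on this thread, that 3.21.0 is the sole affected release; the constant `AFFECTED` and the helper `affected?` are names invented for this example.

```ruby
require 'rubygems'

# Assumption from this thread: the regression appeared in 3.21.0 and was
# fixed in 3.21.1, so only 3.21.0 is treated as affected here.
AFFECTED = Gem::Requirement.new('= 3.21.0')

def affected?(version_string)
  AFFECTED.satisfied_by?(Gem::Version.new(version_string))
end

%w[3.20.0 3.21.0 3.21.1].each do |version|
  status = affected?(version) ? 'affected' : 'not affected'
  puts "aws-sdk-core #{version}: #{status}"
end

# In a Gemfile, a constraint such as the following would skip the affected release:
#   gem 'aws-sdk-core', '>= 3.21.1'
```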

awood45 closed this as completed May 18, 2018

@domcleal
Author

domcleal commented May 21, 2018

Thanks for the fast response @awood45 and @cjyclaire! Confirmed with a few tests that memory usage is back to normal on 3.21.1.
