Getting thumbnail from S3 is still slow #252

Closed
Gwildor opened this issue Mar 27, 2014 · 14 comments

@Gwildor

Gwildor commented Mar 27, 2014

I'm running the latest version (34e1ffa) and I'm (still) trying to optimize pages with a lot of thumbnails. This time, I'm running into trouble with our Amazon S3 storage. Getting thumbnails stored there can take over a second per thumbnail unless it's cached, which does not always seem to be the case. I hoped the replacement of exists() mentioned in #92 would fix my issue, but alas, it's still taking a very long time. A page with about 15 thumbnails on it can take 10-15 seconds to load because of this. I hope you guys can give me some pointers.

I've used the same line profiling decorator as I mentioned in #232 which clearly adds some loading time (over 2 seconds per thumbnail), but it shows a notable difference between a cached thumbnail and one which has to be retrieved from S3:

Example of a cached thumbnail:

File: sorl/thumbnail/base.py
Function: get_thumbnail at line 61
Total time: 0.00121 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    61                                               @profile_each_line
    62                                               def get_thumbnail(self, file_, geometry_string, **options):
    63                                                   """
    64                                                   Returns thumbnail as an ImageFile instance for file with geometry and
    65                                                   options given. First it will try to get it from the key value store,
    66                                                   secondly it will create it.
    67                                                   """
    68         1            4      4.0      0.3          logger.debug(text_type('Getting thumbnail for file [%s] at [%s]'), file_,
    69         1            9      9.0      0.7                       geometry_string)
    70         1            1      1.0      0.1          if file_:
    71         1           14     14.0      1.2              source = ImageFile(file_)
    72                                                   elif settings.THUMBNAIL_DUMMY:
    73                                                       return DummyImageFile(geometry_string)
    74                                                   else:
    75                                                       return None
    76
    77                                                   #preserve image filetype
    78         1            6      6.0      0.5          if settings.THUMBNAIL_PRESERVE_FORMAT:
    79                                                       options.setdefault('format', self._get_format(file_))
    80
    81        10           13      1.3      1.1          for key, value in self.default_options.items():
    82         9           12      1.3      1.0              options.setdefault(key, value)
    83
    84
    85                                                   # For the future I think it is better to add options only if they
    86                                                   # differ from the default settings as below. This will ensure the same
    87                                                   # filenames being generated for new options at default.
    88         4            6      1.5      0.5          for key, attr in self.extra_options:
    89         3           10      3.3      0.8              value = getattr(settings, attr)
    90         3            4      1.3      0.3              if value != getattr(default_settings, attr):
    91                                                           options.setdefault(key, value)
    92         1          327    327.0     27.0          name = self._get_thumbnail_filename(source, geometry_string, options)
    93         1           13     13.0      1.1          thumbnail = ImageFile(name, default.storage)
    94         1          787    787.0     65.0          cached = default.kvstore.get(thumbnail)
    95         1            3      3.0      0.2          if cached:
    96         1            1      1.0      0.1              return cached

Example of a thumbnail retrieved from S3:

File: sorl/thumbnail/base.py
Function: get_thumbnail at line 61
Total time: 2.30476 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    61                                               @profile_each_line
    62                                               def get_thumbnail(self, file_, geometry_string, **options):
    63                                                   """
    64                                                   Returns thumbnail as an ImageFile instance for file with geometry and
    65                                                   options given. First it will try to get it from the key value store,
    66                                                   secondly it will create it.
    67                                                   """
    68         1            4      4.0      0.0          logger.debug(text_type('Getting thumbnail for file [%s] at [%s]'), file_,
    69         1           14     14.0      0.0                       geometry_string)
    70         1            2      2.0      0.0          if file_:
    71         1           16     16.0      0.0              source = ImageFile(file_)
    72                                                   elif settings.THUMBNAIL_DUMMY:
    73                                                       return DummyImageFile(geometry_string)
    74                                                   else:
    75                                                       return None
    76
    77                                                   #preserve image filetype
    78         1            8      8.0      0.0          if settings.THUMBNAIL_PRESERVE_FORMAT:
    79                                                       options.setdefault('format', self._get_format(file_))
    80
    81        10           16      1.6      0.0          for key, value in self.default_options.items():
    82         9           14      1.6      0.0              options.setdefault(key, value)
    83
    84
    85                                                   # For the future I think it is better to add options only if they
    86                                                   # differ from the default settings as below. This will ensure the same
    87                                                   # filenames being generated for new options at default.
    88         4            7      1.8      0.0          for key, attr in self.extra_options:
    89         3           18      6.0      0.0              value = getattr(settings, attr)
    90         3            6      2.0      0.0              if value != getattr(default_settings, attr):
    91                                                           options.setdefault(key, value)
    92         1          493    493.0      0.0          name = self._get_thumbnail_filename(source, geometry_string, options)
    93         1           20     20.0      0.0          thumbnail = ImageFile(name, default.storage)
    94         1        29609  29609.0      1.3          cached = default.kvstore.get(thumbnail)
    95         1            2      2.0      0.0          if cached:
    96                                                       return cached
    97                                                   else:
    98                                                       # We have to check exists() because the Storage backend does not
    99                                                       # overwrite in some implementations.
   100                                                       # so we make the assumption that if the thumbnail is not cached, it doesn't exist
   101         1           19     19.0      0.0              print 'No cached thumbnail'
   102         1            2      2.0      0.0              try:
   103         1      1078407 1078407.0     46.8                  source_image = default.engine.get_image(source)
   104                                                       except IOError:
   105                                                           if settings.THUMBNAIL_DUMMY:
   106                                                               return DummyImageFile(geometry_string)
   107                                                           else:
   108                                                               # if S3Storage says file doesn't exist remotely, don't try to
   109                                                               # create it and exit early.
   110                                                               # Will return working empty image type; 404'd image
   111                                                               logger.warn(text_type('Remote file [%s] at [%s] does not exist'), file_, geometry_string)
   112                                                               return thumbnail
   113
   114                                                       # We might as well set the size since we have the image in memory
   115         1           15     15.0      0.0              image_info = default.engine.get_image_info(source_image)
   116         1            2      2.0      0.0              options['image_info'] = image_info
   117         1            8      8.0      0.0              size = default.engine.get_image_size(source_image)
   118         1            7      7.0      0.0              source.set_size(size)
   119         1            2      2.0      0.0              try:
   120         1            3      3.0      0.0                  self._create_thumbnail(source_image, geometry_string, options,
   121         1       998498 998498.0     43.3                                         thumbnail)
   122         1           11     11.0      0.0                  self._create_alternative_resolutions(source_image, geometry_string,
   123         1          223    223.0      0.0                                                       options, thumbnail.name)
   124                                                       finally:
   125         1           13     13.0      0.0                  default.engine.cleanup(source_image)
   126
   127                                                   # If the thumbnail exists we don't create it, the other option is
   128                                                   # to delete and write but this could lead to race conditions so I
   129                                                   # will just leave that out for now.
   130         1         1139   1139.0      0.0          default.kvstore.get_or_set(source)
   131         1       196181 196181.0      8.5          default.kvstore.set(thumbnail, source)
   132         1            3      3.0      0.0          return thumbnail
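
As a quick cross-check of the uncached profile above, the sketch below recomputes how much of the total is spent in the three slowest lines. The values are copied from the listing; treating the Time column as microseconds is an assumption (it is consistent with the reported total of 2.30476 s).

```python
# Per-line times copied from the uncached profile listing.
# Assumption: the Time column is in microseconds.
total_us = 2.30476 * 1_000_000  # "Total time: 2.30476 s"

hot_lines = {
    "default.engine.get_image(source)": 1078407,
    "self._create_thumbnail(...)": 998498,
    "default.kvstore.set(thumbnail, source)": 196181,
}

# Share of the total runtime for each hot line, in percent.
shares = {line: 100 * us / total_us for line, us in hot_lines.items()}
for line, pct in shares.items():
    print(f"{pct:5.1f}%  {line}")

combined_pct = sum(shares.values())
print(f"{combined_pct:5.1f}%  combined")
```

Under that assumption, these three calls account for nearly all of the time, which matches the % Time column (46.8, 43.3 and 8.5).
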

@mariocesar
Collaborator

First of all, the features from #96 have been merged.

As for the problem, thanks for such a detailed description. From what I can see, getting the source image is the slowest line:

source_image = default.engine.get_image(source)

class Engine(EngineBase):
    def get_image(self, source):
        buffer = BufferIO(source.read())
        return Image.open(buffer)

So I infer that talking to S3 is what slows down the process; the same applies to the step that creates the thumbnail.
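
One way to check that inference in isolation is to time a single full read of the source, since get_image() reads the whole file into memory before opening it. The following is a minimal sketch only: SlowFakeSource and timed_read are hypothetical names, and the sleep stands in for the network round trip; no real S3 or sorl-thumbnail call is made.

```python
import io
import time


class SlowFakeSource:
    """Hypothetical stand-in for a source file backed by remote storage."""

    def __init__(self, payload, latency_s):
        self._payload = payload
        self._latency_s = latency_s
        self.reads = 0

    def read(self):
        # Simulate the network round trip a remote read would incur.
        time.sleep(self._latency_s)
        self.reads += 1
        return self._payload


def timed_read(source):
    """Return (buffer, elapsed_seconds) for one full read of the source."""
    start = time.perf_counter()
    buffer = io.BytesIO(source.read())
    elapsed = time.perf_counter() - start
    return buffer, elapsed


source = SlowFakeSource(b"not-a-real-image", latency_s=0.05)
buffer, elapsed = timed_read(source)
print(f"read {len(buffer.getvalue())} bytes in {elapsed:.3f} s")
```

Wrapping the real source object in the same way would show whether the read itself, rather than image decoding, dominates the time.
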

What we can do is optimize that backend, but right now I'm not sure how.

On another note, how do you connect to S3? Is your app also on AWS?

@Gwildor
Author

Gwildor commented Mar 27, 2014

Yes, our website runs on an EC2 instance, and our media and static files are on S3.

@sbaechler
Contributor

I have the same issue. Here sorl is creating new thumbnails each time the page loads, on both the Redis and the cached DB backends.
I'm using the S3 boto storage as described here: http://stackoverflow.com/questions/14266950/wrong-url-with-django-sorl-thumbnail-with-amazon-s3
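
To make the symptom concrete: with a working key-value store, a thumbnail should be created once and served from the cache afterwards. The sketch below illustrates that expectation with a dict-backed stand-in; CountingThumbnailCache and its methods are hypothetical and are not sorl-thumbnail's API.

```python
class CountingThumbnailCache:
    """Hypothetical dict-backed stand-in for a thumbnail key-value store."""

    def __init__(self):
        self._store = {}
        self.creations = 0

    def get_thumbnail(self, name, create):
        # Cache hit: return the stored thumbnail without touching storage.
        cached = self._store.get(name)
        if cached is not None:
            return cached
        # Cache miss: create the thumbnail once and remember it.
        self.creations += 1
        thumbnail = create(name)
        self._store[name] = thumbnail
        return thumbnail


cache = CountingThumbnailCache()
for _ in range(3):  # three simulated page loads
    cache.get_thumbnail("photo.jpg:100x100", lambda name: f"thumb-of-{name}")
print(cache.creations)
```

If a counter like this grew on every page load in a real setup, it would suggest that cache lookups are missing, for example because the key changes between requests.
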

@sbaechler
Contributor

Ok, I seem to have gotten it working again. I downgraded sorl and boto.
The problems arose with sorl-thumbnail==11.12.1b and 11.12 in combination with boto==2.32.1.
The combination sorl-thumbnail==11.12 with boto==2.12.0 seems to be working fine.

@Gesias

Gesias commented Sep 28, 2014

@sbaechler I encountered the same thing. Thanks for the tip, I'll try it!

@relekang added this to the Version 12.1 milestone Oct 18, 2014

@webjunkie

I cannot downgrade, because the old version does not work with Django 1.7. Yet I have severe performance issues, because it is accessing S3 all the time. What am I left to do? Will this be fixed?

@sbaechler
Contributor

@webjunkie I think it's fixed in the current master. There are no tests yet, however.
Try pip install git+https://github.com/mariocesar/sorl-thumbnail.git@8174d97e6ffec1174361b6dd65a57d97e8931382

@webjunkie

@sbaechler Yes, you are right. That works, thanks. My performance problem now seems to be somewhere else...

@relekang
Member

@sbaechler @webjunkie does this mean that we can close this one?

@sbaechler
Contributor

Sure. Please do a new beta release soon.

@relekang
Member

I think we may be reaching the goal for version 12 pretty soon, so I hope the next release will be soon :)

@radzhome

Does it work with the latest boto, or does it have to be a specific version? (For pip install git+https://github.com/mariocesar/sorl-thumbnail.git@8174d97e6ffec1174361b6dd65a57d97e8931382)

@cristianocca

Sorry to revive this, but is S3 usable now?

@Gesias

Gesias commented Jul 28, 2017

@cristianocca I have been using it with S3 since 2015 with no issues.
