Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Media file access very slow when user has large number of attachments #495

Closed
jnm opened this issue Nov 20, 2018 · 0 comments · Fixed by #496
Closed

Media file access very slow when user has large number of attachments #495

jnm opened this issue Nov 20, 2018 · 0 comments · Fixed by #496
Assignees

Comments

@jnm
Copy link
Member

jnm commented Nov 20, 2018

The problem

When receiving a request like

GET /attachment/original?media_file=mrbigstuff/attachments/1535292196297691507656-15_3_25.jpg

we intentionally execute a very inefficient database query containing:

WHERE ("auth_user"."username" = 'mrbigstuff' AND "logger_attachment"."media_file"::text LIKE
%/15352039032921459275620-14\_31\_54.jpg')

For most users, the "auth_user"."username" condition reduces the possible result set to a reasonable size before the wildcard on "logger_attachment"."media_file" has to be processed. However, if the user has, say, 2.7 million attachments, this query takes several seconds to complete.

Proposed fix

Add a new, indexed media_file_basename column to logger_attachment containing just the filename of the attachment, without any path, e.g. 15352039032921459275620-14_31_54.jpg. This has to be done in Django, but here's a Postgres-only test run:

Verifying the problem

Since this test database contains logger_attachment only, we cannot filter on auth_user; therefore, these queries represent worst-case performance.

kobotoolbox=# select count(*) from logger_attachment;
  count   
----------
 17695288
(1 row)

Time: 2202.825 ms

kobotoolbox=# select count(*) from logger_attachment where media_file like '%/15352039032921459275620-14\_31\_54.jpg';
 count 
-------
     1
(1 row)

Time: 7192.046 ms
kobotoolbox=# select count(*) from logger_attachment where media_file like '%/15352039032921459275620-14\_31\_54.jpg';
 count 
-------
     1
(1 row)

Time: 7193.225 ms
kobotoolbox=# select count(*) from logger_attachment where media_file like '%/15352039032921459275620-14\_31\_54.jpg';
 count 
-------
     1
(1 row)

Time: 7176.825 ms

Adding the new column and index

kobotoolbox=# alter table logger_attachment add column media_file_basename character varying(380);
ALTER TABLE
Time: 5.167 ms

kobotoolbox=# create index logger_attachment_media_file_basename_jnm_test on logger_attachment (media_file_basename varchar_pattern_ops);
CREATE INDEX
Time: 19264.205 ms

Populating the new column

This takes a long time, but I verified that I could INSERT INTO logger_attachment while the query was running:

kobotoolbox=# update logger_attachment set media_file_basename = substring(media_file from '/([^/]+)$');
UPDATE 17695288
Time: 4264980.958 ms

Improved performance

kobotoolbox=# select count(*) from logger_attachment where media_file_basename = '15352039032921459275620-14_31_54.jpg';
 count 
-------
     1
(1 row)

Time: 0.343 ms
kobotoolbox=# select count(*) from logger_attachment where media_file_basename = '15352039032921459275620-14_31_54.jpg';
 count 
-------
     1
(1 row)

Time: 0.370 ms
kobotoolbox=# select count(*) from logger_attachment where media_file_basename = '15352039032921459275620-14_31_54.jpg';
 count 
-------
     1
(1 row)

Time: 0.333 ms
@jnm jnm self-assigned this Nov 20, 2018
jnm added a commit that referenced this issue Nov 20, 2018
performance when a single `auth_user` has may rows in `logger_attachment`. This
commit lacks a management command to populate the `media_file_basename` column,
but we'll say that it closes #495.
jnm added a commit that referenced this issue Nov 20, 2018
performance when a single `auth_user` has may rows in `logger_attachment`. This
commit lacks a management command to populate the `media_file_basename` column,
but we'll say that it closes #495.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant