You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
GET /attachment/original?media_file=mrbigstuff/attachments/1535292196297691507656-15_3_25.jpg
we intentionally execute a very inefficient database query containing:
WHERE ("auth_user"."username" = 'mrbigstuff' AND "logger_attachment"."media_file"::text LIKE
%/15352039032921459275620-14\_31\_54.jpg')
For most users, the "auth_user"."username" condition reduces the possible result set to a reasonable size before the wildcard on "logger_attachment"."media_file" has to be processed. However, if the user has, say, 2.7 million attachments, this query takes several seconds to complete.
Proposed fix
Add a new, indexed media_file_basename column to logger_attachment containing just the filename of the attachment, without any path, e.g. 15352039032921459275620-14_31_54.jpg. This has to be done in Django, but here's a Postgres-only test run:
Verifying the problem
Since this test database contains logger_attachment only, we cannot filter on auth_user; therefore, these queries represent worst-case performance.
kobotoolbox=# select count(*) from logger_attachment;
count
----------
17695288
(1 row)
Time: 2202.825 ms
kobotoolbox=# select count(*) from logger_attachment where media_file like '%/15352039032921459275620-14\_31\_54.jpg';
count
-------
1
(1 row)
Time: 7192.046 ms
kobotoolbox=# select count(*) from logger_attachment where media_file like '%/15352039032921459275620-14\_31\_54.jpg';
count
-------
1
(1 row)
Time: 7193.225 ms
kobotoolbox=# select count(*) from logger_attachment where media_file like '%/15352039032921459275620-14\_31\_54.jpg';
count
-------
1
(1 row)
Time: 7176.825 ms
Adding the new column and index
kobotoolbox=# alter table logger_attachment add column media_file_basename character varying(380);
ALTER TABLE
Time: 5.167 ms
kobotoolbox=# create index logger_attachment_media_file_basename_jnm_test on logger_attachment (media_file_basename varchar_pattern_ops);
CREATE INDEX
Time: 19264.205 ms
Populating the new column
This takes a long time, but I verified that I could INSERT INTO logger_attachment while the query was running:
kobotoolbox=# update logger_attachment set media_file_basename = substring(media_file from '/([^/]+)$');
UPDATE 17695288
Time: 4264980.958 ms
Improved performance
kobotoolbox=# select count(*) from logger_attachment where media_file_basename = '15352039032921459275620-14_31_54.jpg';
count
-------
1
(1 row)
Time: 0.343 ms
kobotoolbox=# select count(*) from logger_attachment where media_file_basename = '15352039032921459275620-14_31_54.jpg';
count
-------
1
(1 row)
Time: 0.370 ms
kobotoolbox=# select count(*) from logger_attachment where media_file_basename = '15352039032921459275620-14_31_54.jpg';
count
-------
1
(1 row)
Time: 0.333 ms
The text was updated successfully, but these errors were encountered:
performance when a single `auth_user` has may rows in `logger_attachment`. This
commit lacks a management command to populate the `media_file_basename` column,
but we'll say that it closes#495.
performance when a single `auth_user` has may rows in `logger_attachment`. This
commit lacks a management command to populate the `media_file_basename` column,
but we'll say that it closes#495.
The problem
When receiving a request like
we intentionally execute a very inefficient database query containing:
For most users, the
"auth_user"."username"
condition reduces the possible result set to a reasonable size before the wildcard on"logger_attachment"."media_file"
has to be processed. However, if the user has, say, 2.7 million attachments, this query takes several seconds to complete.Proposed fix
Add a new, indexed
media_file_basename
column tologger_attachment
containing just the filename of the attachment, without any path, e.g.15352039032921459275620-14_31_54.jpg
. This has to be done in Django, but here's a Postgres-only test run:Verifying the problem
Since this test database contains
logger_attachment
only, we cannot filter onauth_user
; therefore, these queries represent worst-case performance.Adding the new column and index
Populating the new column
This takes a long time, but I verified that I could
INSERT INTO logger_attachment
while the query was running:Improved performance
The text was updated successfully, but these errors were encountered: