Remove duplicates in the user_ips table and add an index #4370
Codecov Report
@@             Coverage Diff             @@
##           develop    #4370      +/-   ##
===========================================
- Coverage    73.68%   73.64%   -0.05%
===========================================
  Files          300      300
  Lines        29815    29815
  Branches      4895     4897       +2
===========================================
- Hits         21970    21958      -12
- Misses        6407     6419      +12
  Partials      1438     1438
A couple of notes:
The way I'd probably do this is by iterating over the table using
SELECT last_seen FROM user_ips
WHERE last_seen > ?
ORDER BY last_seen
LIMIT 1
OFFSET ?

SELECT user_id, access_token, ip, MAX(last_seen)
FROM (
SELECT user_id, access_token, ip
FROM user_ips
WHERE ? <= last_seen AND last_seen < ?
ORDER BY last_seen
) c
INNER JOIN user_ips USING (user_id, access_token, ip)
GROUP BY user_id, access_token, ip
HAVING count(*) > 1
After we've successfully walked the entire table we probably need to:
Otherwise we risk inserting duplicates while trying to build the unique index :(
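One way the batched walk described in this comment could look, as a minimal sketch rather than the PR's actual code. It assumes an SQLite connection, integer `last_seen` timestamps, and a `user_ips` table with the six columns named elsewhere in this thread. The function name `dedupe_user_ips`, the `batch_size` parameter, and the use of `MAX(device_id)` / `MAX(user_agent)` to pick surviving values are assumptions made for illustration.

```python
import sqlite3

# Finds every (user_id, access_token, ip) that has a row in the given
# last_seen range and occurs more than once in the table overall.
DUPLICATES_SQL = """
    SELECT user_id, access_token, ip,
           MAX(device_id), MAX(user_agent), MAX(last_seen)
    FROM (
        SELECT user_id, access_token, ip
        FROM user_ips
        WHERE ? <= last_seen AND last_seen < ?
        ORDER BY last_seen
    ) c
    INNER JOIN user_ips USING (user_id, access_token, ip)
    GROUP BY user_id, access_token, ip
    HAVING count(*) > 1
"""


def dedupe_user_ips(conn, batch_size=100):
    """Walk user_ips in last_seen order and collapse duplicate keys.

    Returns the number of (user_id, access_token, ip) keys collapsed.
    """
    cur = conn.cursor()
    cur.execute("SELECT MIN(last_seen), MAX(last_seen) FROM user_ips")
    begin, newest = cur.fetchone()
    if begin is None:
        return 0  # empty table, nothing to do

    collapsed = 0
    while begin <= newest:
        # Upper bound of this batch: the last_seen value batch_size rows ahead.
        cur.execute(
            "SELECT last_seen FROM user_ips WHERE last_seen > ? "
            "ORDER BY last_seen LIMIT 1 OFFSET ?",
            (begin, batch_size),
        )
        row = cur.fetchone()
        # No row that far ahead means this is the final batch.
        end = row[0] if row else newest + 1

        cur.execute(DUPLICATES_SQL, (begin, end))
        for user_id, token, ip, device_id, user_agent, last_seen in cur.fetchall():
            # Drop every copy of the key, then put back a single row.
            cur.execute(
                "DELETE FROM user_ips "
                "WHERE user_id = ? AND access_token = ? AND ip = ?",
                (user_id, token, ip),
            )
            cur.execute(
                "INSERT INTO user_ips "
                "(user_id, access_token, ip, device_id, user_agent, last_seen) "
                "VALUES (?, ?, ?, ?, ?, ?)",
                (user_id, token, ip, device_id, user_agent, last_seen),
            )
            collapsed += 1
        begin = end

    conn.commit()
    return collapsed
```

The sketch does nothing about rows written concurrently while the walk is in progress, which is the risk raised above about building the unique index.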
The conversation that led to the indexing changes: https://matrix.to/#/!yZHTGeDKZUeKaqeTeU:matrix.org/$1546981086143427NlIBG:matrix.org?via=matrix.org&via=sw1v.org&via=meshspace.de&via=half-shot.uk
Looks good! It would be good to add a final background update to remove the old index.
VALUES (?, ?, ?, ?, ?, ?)
""",
(user_id, access_token, ip, device_id, user_agent, last_seen)
)
May as well use the _simple_{insert,delete}_txn wrappers here?
yeahhhh but everything else in this function is sql :P
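For readers weighing the two options, here is a rough sketch of what dict-driven insert and delete wrappers of this kind do. The functions `simple_insert_txn` and `simple_delete_txn` below are hypothetical stand-ins written for this illustration against a plain `sqlite3` cursor; Synapse's real `_simple_insert_txn` and `_simple_delete_txn` may differ in signature and behaviour.

```python
import sqlite3


def simple_insert_txn(txn, table, values):
    """Insert one row; `values` maps column name to value.

    Table and column names are interpolated into the SQL, so they must be
    trusted identifiers. Only the values go through bound parameters.
    """
    columns = ", ".join(values)
    placeholders = ", ".join("?" for _ in values)
    txn.execute(
        "INSERT INTO %s (%s) VALUES (%s)" % (table, columns, placeholders),
        list(values.values()),
    )


def simple_delete_txn(txn, table, keyvalues):
    """Delete all rows matching every column/value pair in `keyvalues`.

    Returns the number of rows deleted.
    """
    where = " AND ".join("%s = ?" % (column,) for column in keyvalues)
    txn.execute(
        "DELETE FROM %s WHERE %s" % (table, where),
        list(keyvalues.values()),
    )
    return txn.rowcount
```

The trade-off raised in the reply still stands: wrappers shorten each call, while hand-written SQL keeps the function in one consistent style.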
synapse/storage/client_ips.py
Outdated
- uid, access_token, ip = key
- if uid == user_id:
+ user_id, access_token, ip = key
+ if user_id == user_id:
cough
HMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
yeah need some coffee
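The problem flagged in this thread is that unpacking the key into `user_id` rebinds the name being filtered on, so the comparison is always true. A small self-contained illustration follows; the function names and the shape of `batch` (a dict keyed by `(user_id, access_token, ip)`) are assumptions for this sketch, not the code in `client_ips.py`.

```python
def entries_for_user_buggy(batch, user_id):
    """Intended to return entries for one user, but returns all of them."""
    results = {}
    for key, last_seen in batch.items():
        user_id, access_token, ip = key  # rebinds the user_id argument
        if user_id == user_id:  # compares the name with itself: always true
            results[(access_token, ip)] = last_seen
    return results


def entries_for_user(batch, user_id):
    """Return only the entries whose key belongs to user_id."""
    results = {}
    for key, last_seen in batch.items():
        uid, access_token, ip = key  # distinct name keeps the argument intact
        if uid == user_id:
            results[(access_token, ip)] = last_seen
    return results
```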
synapse/storage/client_ips.py
Outdated
ORDER BY last_seen
) c
INNER JOIN user_ips USING (user_id, access_token, ip)
GROUP BY user_id, access_token, ip;
Spurious trailing ;
synapse/storage/client_ips.py
Outdated
(user_id, access_token, ip, device_id, user_agent, last_seen)
)

self._background_update_progress_txn(txn, {"last_seen": last_seen})
Missing update name param
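To show why the update name matters, here is a hedged sketch of a progress writer keyed by update name. The function `background_update_progress_txn`, the `background_updates` table layout, and the example update names are assumptions for this illustration and are not taken from the PR. Without the name, the helper cannot tell which update's progress row to write.

```python
import json
import sqlite3


def background_update_progress_txn(txn, update_name, progress):
    """Persist `progress` (a JSON-serialisable dict) for one named update."""
    txn.execute(
        "UPDATE background_updates SET progress_json = ? WHERE update_name = ?",
        (json.dumps(progress), update_name),
    )


def read_progress(txn, update_name):
    """Load the stored progress dict for one named update."""
    txn.execute(
        "SELECT progress_json FROM background_updates WHERE update_name = ?",
        (update_name,),
    )
    return json.loads(txn.fetchone()[0])
```

Under these assumptions, the call from the diff would become something like `background_update_progress_txn(txn, "example_update_name", {"last_seen": last_seen})`.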
No description provided.