Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Include user agent in user daily visits table #8503

Merged
merged 11 commits into from
Oct 15, 2020
1 change: 1 addition & 0 deletions changelog.d/8503.misc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add user agent to user_daily_visits table.
6 changes: 4 additions & 2 deletions synapse/storage/databases/main/metrics.py
Original file line number Diff line number Diff line change
Expand Up @@ -282,8 +282,8 @@ def _generate_user_daily_visits(txn):
now = self._clock.time_msec()

sql = """
INSERT INTO user_daily_visits (user_id, device_id, timestamp)
SELECT u.user_id, u.device_id, ?
INSERT INTO user_daily_visits (user_id, device_id, timestamp, user_agent)
SELECT u.user_id, u.device_id, ?, u.user_agent
Copy link
Member

@erikjohnston erikjohnston Oct 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't in the GROUP BY clause, and in general there may technically be multiple user agents associated with a device so we'll need to decide which one to record. In an ideal world I'd probably say we should do one of:

  1. record every UA and make the useragent column an array (but that doesn't work on sqlite, and may make processing harder);
  2. record the UA that we saw the most, but that probably involves complicated SQL; or
  3. take the most recent UA we saw (easy on postgres, hard on sqlite).

All of which sound like faff, so assuming that we don't really care too much which one we pick I'd probably do MAX(u.user_agent).

(Note, this query probably works fine on SQLite as it allows adding columns not in the GROUP BY clause, it'll return an entry from a random row, but postgres refuses to allow it as in general its likely not the behaviour you want).

FROM user_ips AS u
LEFT JOIN (
SELECT user_id, device_id, timestamp FROM user_daily_visits
Expand Down Expand Up @@ -314,11 +314,13 @@ def _generate_user_daily_visits(txn):
today_start,
),
)

self._last_user_visit_update = today_start

txn.execute(
sql, (today_start, today_start, self._last_user_visit_update, now)
)

# Update _last_user_visit_update to now. The reason to do this
# rather just clamping to the beginning of the day is to limit
# the size of the join - meaning that the query can be run more
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
/* Copyright 2020 The Matrix.org Foundation C.I.C.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

-- Add new column to user_daily_visits to track user agent
ALTER TABLE user_daily_visits
ADD COLUMN user_agent TEXT;
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
/* Copyright 2020 The Matrix.org Foundation C.I.C.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

-- Unused
DROP table cache_invalidation_stream;
DROP table account_data_max_stream_id;
clokep marked this conversation as resolved.
Show resolved Hide resolved
5 changes: 1 addition & 4 deletions synapse/storage/prepare_database.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,10 +34,7 @@

# Remember to update this number every time a change is made to database
# schema files, so the users will be informed on server restarts.
# XXX: If you're about to bump this to 59 (or higher) please create an update
# that drops the unused `cache_invalidation_stream` table, as per #7436!
# XXX: Also add an update to drop `account_data_max_stream_id` as per #7656!
SCHEMA_VERSION = 58
SCHEMA_VERSION = 59

dir_path = os.path.abspath(os.path.dirname(__file__))

Expand Down