-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits, but I think this needs to be hooked up so that we write to it, otherwise its hard to see whether this is the right way of doing things.
But broadly looks good so far!
* limitations under the License. | ||
*/ | ||
|
||
-- a table of users who have requested that their details be erased |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LIES AND SLANDER
sql = """ | ||
SELECT COALESCE(count(*), 0) FROM ( | ||
SELECT user_id FROM monthly_active_users | ||
) u |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this just be SELECT COALESCE(count(*), 0) FROM monthly_active_users
?
) | ||
|
||
def clean_out_monthly_active_users(self): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is spurious
|
||
defer.returnValue( | ||
result | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who's jeff?
I'd probably write it/reformat to simply be:
defer.returnValue(bool(user_present))
"timestamp": int(self._clock.time_msec()), | ||
}, | ||
lock=False, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upserts without locks are dangerous as they're racey. Upserts with locks are bad and should be avoided.
I'd quite like to see how this is actually hooked up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have reworked nits, and left upserts until we figure out a better way to solve this
This reverts commit ec716a3.
…o neilj/mau_tracker
synapse/storage/client_ips.py
Outdated
@defer.inlineCallbacks | ||
def _populate_monthly_active_users(self, user_id): | ||
store = self.hs.get_datastore() | ||
print "entering _populate_monthly_active_users" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No prints. Please use logger if you want to have that
synapse/storage/client_ips.py
Outdated
else: | ||
count = yield store.get_monthly_active_count() | ||
print "count is %d" % count | ||
if count < self.hs.config.max_mau_value: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want to return somehow if this condition is false.
monthly_active_users now fully backing log in and register tests. Especially keen to get comment on adding @defer.inlineCallbacks to insert_client_ip |
synapse/storage/client_ips.py
Outdated
user_id(str): the user_id to query | ||
""" | ||
|
||
store = self.hs.get_datastore() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
store is the same as self
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah good point
Cleans out monthly active user table to ensure that no stale | ||
entries exist. | ||
Return: | ||
Defered() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deferred
with two r's. Also, Returns
with an s, and a newline before it. (This is so that IDEs can read it to get type hints)
sql = """ | ||
DELETE FROM monthly_active_users | ||
ORDER BY timestamp desc | ||
LIMIT -1 OFFSET ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to include a LIMIT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't need it for Postgres, but best I can tell I do need it for sqlite https://sqlite.org/lang_select.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, that's true. But it doesn't look like postgres supports negative limits. You can instead probably structure the query as:
DELETE FROM monthly_active_users
WHERE user_id NOT IN (
SELECT user_id FROM monthly_active_users
ORDER BY timestamp DESC
LIMIT ?
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
groans, I guess the table never gets too large
# Have resolved to invalidate the whole cache for now and do | ||
# something about it if and when the perf becomes significant | ||
self.is_user_monthly_active.invalidate_all() | ||
self.get_monthly_active_count.invalidate_all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These invalidations should only happen after the transaction has committed, which can be done by yielding on the self.runInteraction
synapse/api/auth.py
Outdated
such as monthly active user limiting or global disable flag | ||
Args: | ||
error (Error): The error that should be raised if user is to be | ||
blocked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems odd. Either this should raise an AuthError and callers can catch AuthErrors and re-raise as a more appropriate exception, or it should just return a boolean.
Generates current count of monthly active users.abs | ||
Return: | ||
Defered(int): Number of current monthly active users | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know its a bit nitty, but consistency is quite important for readability:
"""Generates current count of monthly active users.abs
Returns:
Defered[int]: Number of current monthly active users
""""
(Indentation, returns and square brackets for the type Deferred returns)
synapse/storage/client_ips.py
Outdated
@@ -93,6 +95,25 @@ def insert_client_ip(self, user_id, access_token, ip, user_agent, device_id, | |||
|
|||
self._batch_row_update[key] = (user_agent, device_id, now) | |||
|
|||
@defer.inlineCallbacks | |||
def _populate_monthly_active_users(self, user_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it should be in monthly_active_users.py
synapse/storage/client_ips.py
Outdated
if self.hs.config.limit_usage_by_mau: | ||
is_user_monthly_active = yield store.is_user_monthly_active(user_id) | ||
if is_user_monthly_active: | ||
yield store.upsert_monthly_active_user(user_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would avoid having to do an update on every request. One way of avoiding this is to have the is_user_monthly_active
return the timestamp and use that to only insert if the difference is greater than, say, a couple of hours. Since that DB function is cached we avoid doing queries most of the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense
lock=False, | ||
) | ||
self.is_user_monthly_active.invalidate((user_id,)) | ||
self.get_monthly_active_count.invalidate(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should only invalidate if we actually added a new user to the table (rather than just updating), helpfully _simple_upsert
returns True on insert and False on update.
I think that's fine, so long as we don't actually perform a query in the common case. |
) | ||
timestamp = None | ||
if len(result) > 0: | ||
timestamp = result[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a _simple_select_one_onecol
ftr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handy, done
if count < self.hs.config.max_mau_value: | ||
yield self.upsert_monthly_active_user(user_id) | ||
elif now - last_seen_timestamp > LAST_SEEN_GRANULARITY: | ||
yield self.upsert_monthly_active_user(user_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you quickly add a comment to say why we're not always upserting?
-- a table of monthly active users, for use where blocking based on mau limits | ||
CREATE TABLE monthly_active_users ( | ||
user_id TEXT NOT NULL, | ||
timestamp BIGINT NOT NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a quick comment saying what timestamp is? In particular, its probably worth calling out that it isn't necessarily accurate.
# If MAU user count still exceeds the MAU threshold, then delete on | ||
# a least recently active basis. | ||
# Note it is not possible to write this query using OFFSET due to | ||
# incompatibilities in how sqlite an postgres support the feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/an/and/
|
||
@cached(num_args=0) | ||
def get_monthly_active_count(self): | ||
"""Generates current count of monthly active users.abs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
abs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, editor autocomplete
user_id TEXT NOT NULL, | ||
-- Last time we saw the user. Not guaranteed to be accurate due to rate limiting | ||
-- on updates, Granularity of updates governed by | ||
-- syanpse.storage.monthly_active_users.LAST_SEEN_GRANULARITY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/syanpse/synapse/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modulo spelling mistakes, thanks @krombel
Features -------- - Add support for the SNI extension to federation TLS connections. Thanks to @vojeroen! ([\#3439](#3439)) - Add /_media/r0/config ([\#3184](#3184)) - speed up /members API and add `at` and `membership` params as per MSC1227 ([\#3568](#3568)) - implement `summary` block in /sync response as per MSC688 ([\#3574](#3574)) - Add lazy-loading support to /messages as per MSC1227 ([\#3589](#3589)) - Add ability to limit number of monthly active users on the server ([\#3633](#3633)) - Support more federation endpoints on workers ([\#3653](#3653)) - Basic support for room versioning ([\#3654](#3654)) - Ability to disable client/server Synapse via conf toggle ([\#3655](#3655)) - Ability to whitelist specific threepids against monthly active user limiting ([\#3662](#3662)) - Add some metrics for the appservice and federation event sending loops ([\#3664](#3664)) - Where server is disabled, block ability for locked out users to read new messages ([\#3670](#3670)) - set admin uri via config, to be used in error messages where the user should contact the administrator ([\#3687](#3687)) - Synapse's presence functionality can now be disabled with the "use_presence" configuration option. ([\#3694](#3694)) - For resource limit blocked users, prevent writing into rooms ([\#3708](#3708)) Bugfixes -------- - Fix occasional glitches in the synapse_event_persisted_position metric ([\#3658](#3658)) - Fix bug on deleting 3pid when using identity servers that don't support unbind API ([\#3661](#3661)) - Make the tests pass on Twisted < 18.7.0 ([\#3676](#3676)) - Don’t ship recaptcha_ajax.js, use it directly from Google ([\#3677](#3677)) - Fixes test_reap_monthly_active_users so it passes under postgres ([\#3681](#3681)) - Fix mau blocking calulation bug on login ([\#3689](#3689)) - Fix missing yield in synapse.storage.monthly_active_users.initialise_reserved_users ([\#3692](#3692)) - Improve HTTP request logging to include all requests ([\#3700](#3700)) - Avoid timing out requests while we are streaming back the response ([\#3701](#3701)) - Support more federation endpoints on workers ([\#3705](#3705), [\#3713](#3713)) - Fix "Starting db txn 'get_all_updated_receipts' from sentinel context" warning ([\#3710](#3710)) - Fix bug where `state_cache` cache factor ignored environment variables ([\#3719](#3719)) - Fix bug in v0.33.3rc1 which caused infinite loops and OOMs ([\#3723](#3723)) - Fix bug introduced in v0.33.3rc1 which made the ToS give a 500 error ([\#3732](#3732)) Deprecations and Removals ------------------------- - The Shared-Secret registration method of the legacy v1/register REST endpoint has been removed. For a replacement, please see [the admin/register API documentation](https://github.com/matrix-org/synapse/blob/master/docs/admin_api/register_api.rst). ([\#3703](#3703)) Internal Changes ---------------- - The test suite now can run under PostgreSQL. ([\#3423](#3423)) - Refactor HTTP replication endpoints to reduce code duplication ([\#3632](#3632)) - Tests now correctly execute on Python 3. ([\#3647](#3647)) - Sytests can now be run inside a Docker container. ([\#3660](#3660)) - Port over enough to Python 3 to allow the sytests to start. ([\#3668](#3668)) - Update docker base image from alpine 3.7 to 3.8. ([\#3669](#3669)) - Rename synapse.util.async to synapse.util.async_helpers to mitigate async becoming a keyword on Python 3.7. ([\#3678](#3678)) - Synapse's tests are now formatted with the black autoformatter. ([\#3679](#3679)) - Implemented a new testing base class to reduce test boilerplate. ([\#3684](#3684)) - Rename MAU prometheus metrics ([\#3690](#3690)) - add new error type ResourceLimit ([\#3707](#3707)) - Logcontexts for replication command handlers ([\#3709](#3709)) - Update admin register API documentation to reference a real user ID. ([\#3712](#3712))
-- on updates, Granularity of updates governed by | ||
-- synapse.storage.monthly_active_users.LAST_SEEN_GRANULARITY | ||
-- Measured in ms since epoch. | ||
timestamp BIGINT NOT NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really surprised this is called timestamp
, which is a reserved keyword in postgres and most SQL (it's a data type), as you can from the syntax highlighting. It also doesn't describe what the column is used for - surely it should be last_active
or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(it's also a problem because you can't rename columns in sqlite, so we really want to get it right the first time).
); | ||
|
||
CREATE UNIQUE INDEX monthly_active_users_users ON monthly_active_users(user_id); | ||
CREATE INDEX monthly_active_users_time_stamp ON monthly_active_users(timestamp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be monthly_active_users_timestamp
to be consistent, as timestamp is one word
since we're in the business of finding problems in PRs after they have landed:
|
We usually create a new delta file after a release? Anyway, do we really re run schema files even if they've been applied before? But agreed should try and remember |
A new delta file, sure. A new directory? Well, I didn't get the memo, and I question the value of doing so vs the problems it presents for anyone trying to roll back.
we do not, but the update code does get upset if either (a) the database schema number is higher than it knows about or (b) it finds records of delta files from version X already having been applied when trying to upgrade from X-1 to X.
gahhhh |
Doesn't actually do anything other than ensure that table purges old values.
Will form basis for extending #3630