-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use read replica for statistics retrieval operations #1396
Conversation
This reverts commit 0e92606.
src/server/modules/statistics/repositories/StatisticsRepository.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of default and custom scopes is fairly defensive which is great.
Could it be even more defensive and also have the code read better, if we were to have a useReplica
custom scope specifying useMaster: false
instead of the current useMasterDb
, and rely on defaultScope
to specify useMaster: true
? Then the default behavior would be to always read from the master instance, which will defensively change API behavior from opt-out to opt-in. I think that should also remove the need to call unscoped()
when reading the link statistics, which is not explicit.
I also looked at extending the types/interfaces but that doesn't seem very straightforward to achieve. It seems like other users of sequelize are also complaining but still proceeding with specifying useMaster
everywhere.
Let me know your thoughts!
FYI, I didn't review to see if any other read queries that required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of default and custom scopes is fairly defensive which is great.
Could it be even more defensive and also have the code read better, if we were to have a
useReplica
custom scope specifyinguseMaster: false
instead of the currentuseMasterDb
, and rely ondefaultScope
to specifyuseMaster: true
? Then the default behavior would be to always read from the master instance, which will defensively change API behavior from opt-out to opt-in. I think that should also remove the need to callunscoped()
when reading the link statistics, which is not explicit.
Approval given predicated on this; so long as the default scopes on the models are consistently set to useMaster: true
, we should be a in relatively safe place. The penalty for neglecting to use the default scope when another is desired is unintentionally hitting the master database... but that should be a relatively small risk.
I still think that we ought to use multiple sequelize instances to represent multiple connections especially given that this application effectively has two classes of reads (ie, transactional and analytical) though, and would have been willing to undertake the risk of refactoring.
Inversion from "opt-in for primary DB usage" to "opt-out".
Specifying useMaster: false does not seem to be creating the intended effect of using replica
Noted on the comments above. I've modified the code and changed the approach from using Sequelize did not provide documentation on forcing the use of the replica; I tried setting Will merge after final checks |
Problem
Features such as analytics CSV download may prompt a large number of rows to be returned by the database. Too many such queries could exhaust database resources, affecting application stability.
Closes #1304
Solution
Configure sequelize to use read replication. Sequelize's default behaviour of using replicas for read operations is problematic for our use-case because we'd like to default to using the primary DB. In this PR, I propose the use of sequelize's
scope
. We specify adefaultScope
that containsuseMaster: true
, which acts as a safeguard when contributors forget to include theuseMasterDb
scope. One would have to explicitly usemodel.unscoped()
to remove the defaultuseMaster: true
behaviour.But this approach is not foolproof, as default scopes also get overwritten when
model.scope()
is called; code reviews and further checks might be necessary moving forward.As a rule of thumb, we are allowing the use of replica for statistics data and those that are usually already cached by redis.
Deploy Notes
New environment variables:
REPLICA_URI
: DB_URI for the read replica (compulsory env var)New dependencies:
pg-connection-string
: Parses the connection string and returns a struct which is compatible with sequelize's input format.