Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use read replica for statistics retrieval operations #1396

Merged
merged 21 commits into from
Apr 20, 2021
Merged

Conversation

yong-jie
Copy link
Member

Problem

Features such as analytics CSV download may prompt a large number of rows to be returned by the database. Too many such queries could exhaust database resources, affecting application stability.

Closes #1304

Solution

Configure sequelize to use read replication. Sequelize's default behaviour of using replicas for read operations is problematic for our use-case because we'd like to default to using the primary DB. In this PR, I propose the use of sequelize's scope. We specify a defaultScope that contains useMaster: true, which acts as a safeguard when contributors forget to include the useMasterDb scope. One would have to explicitly use model.unscoped() to remove the default useMaster: true behaviour.

But this approach is not foolproof, as default scopes also get overwritten when model.scope() is called; code reviews and further checks might be necessary moving forward.

As a rule of thumb, we are allowing the use of replica for statistics data and those that are usually already cached by redis.

Deploy Notes

New environment variables:

  • REPLICA_URI : DB_URI for the read replica (compulsory env var)

New dependencies:

  • pg-connection-string : Parses the connection string and returns a struct which is compatible with sequelize's input format.

Copy link
Contributor

@liangyuanruo liangyuanruo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of default and custom scopes is fairly defensive which is great.

Could it be even more defensive and also have the code read better, if we were to have a useReplica custom scope specifying useMaster: false instead of the current useMasterDb, and rely on defaultScope to specify useMaster: true? Then the default behavior would be to always read from the master instance, which will defensively change API behavior from opt-out to opt-in. I think that should also remove the need to call unscoped() when reading the link statistics, which is not explicit.

I also looked at extending the types/interfaces but that doesn't seem very straightforward to achieve. It seems like other users of sequelize are also complaining but still proceeding with specifying useMaster everywhere.

Let me know your thoughts!

@liangyuanruo
Copy link
Contributor

FYI, I didn't review to see if any other read queries that required useMaster to be configured was missed out. You might want to do another round of checks before merging.

Copy link
Contributor

@LoneRifle LoneRifle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of default and custom scopes is fairly defensive which is great.

Could it be even more defensive and also have the code read better, if we were to have a useReplica custom scope specifying useMaster: false instead of the current useMasterDb, and rely on defaultScope to specify useMaster: true? Then the default behavior would be to always read from the master instance, which will defensively change API behavior from opt-out to opt-in. I think that should also remove the need to call unscoped() when reading the link statistics, which is not explicit.

Approval given predicated on this; so long as the default scopes on the models are consistently set to useMaster: true, we should be a in relatively safe place. The penalty for neglecting to use the default scope when another is desired is unintentionally hitting the master database... but that should be a relatively small risk.

I still think that we ought to use multiple sequelize instances to represent multiple connections especially given that this application effectively has two classes of reads (ie, transactional and analytical) though, and would have been willing to undertake the risk of refactoring.

@yong-jie
Copy link
Member Author

Noted on the comments above. I've modified the code and changed the approach from using useMasterDb and Model.unscoped to defaultScope and useReplica.

Sequelize did not provide documentation on forcing the use of the replica; I tried setting useMaster: false but our replica did not register any inbound connections from the app. I eventually found that setting useMaster: undefined works, and am using that approach now.

Will merge after final checks

@yong-jie yong-jie merged commit d39ba5c into develop Apr 20, 2021
@yong-jie yong-jie deleted the use-replica-csv branch April 20, 2021 05:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use database read replica where possible
3 participants