feat: add per-organization rate limits #3231

Merged
merged 13 commits from mxyng/rate-limit into main on Sep 26, 2022

Conversation

mxyng
Collaborator

@mxyng mxyng commented Sep 16, 2022

Summary

Add limits on how many requests can be sent to the Infra server, on a per-organization, per-minute basis. The limits are only enabled if Redis is configured.
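
For orientation, here is a minimal sketch of the overall shape this takes, assuming github.com/go-redis/redis/v8 and github.com/go-redis/redis_rate/v9 (the library discussed later in this thread). The function name and the 3000 default mirror values that appear later in the thread, but the code is illustrative, not the PR's implementation:

// Sketch only: per-organization, per-minute limit backed by Redis.
package main

import (
	"context"
	"fmt"

	"github.com/go-redis/redis/v8"
	"github.com/go-redis/redis_rate/v9"
)

// orgRateOK allows up to `limit` requests per organization per minute.
// When Redis is not configured (limiter is nil), it allows everything.
func orgRateOK(ctx context.Context, limiter *redis_rate.Limiter, orgID string, limit int) error {
	if limiter == nil {
		return nil // rate limiting disabled without Redis
	}
	res, err := limiter.Allow(ctx, "rate:org:"+orgID, redis_rate.PerMinute(limit))
	if err != nil {
		return err
	}
	if res.Allowed == 0 {
		return fmt.Errorf("rate limit exceeded, retry after %s", res.RetryAfter)
	}
	return nil
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	limiter := redis_rate.NewLimiter(rdb)
	if err := orgRateOK(context.Background(), limiter, "example-org", 3000); err != nil {
		fmt.Println("request rejected:", err)
	}
}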

@mxyng mxyng marked this pull request as draft September 16, 2022 20:50
@mxyng mxyng force-pushed the mxyng/rate-limit branch 3 times, most recently from 430b9bc to d0508fb on September 19, 2022 at 17:45
Contributor

@ssoroka ssoroka left a comment

Looks OK, but I might just call the cache package "redis", considering we don't use it as a cache now, and we'll probably use it for things other than a cache in the future.

internal/server/cache/cache.go (resolved)
Contributor

@dnephin dnephin left a comment

Looks good!

internal/server/cache/cache.go (3 resolved threads)
internal/cmd/server.go (resolved)
@mxyng mxyng force-pushed the mxyng/rate-limit branch 3 times, most recently from 50043fc to 2a446c3 on September 20, 2022 at 19:02
@mxyng mxyng marked this pull request as ready for review September 20, 2022 19:46
@mxyng mxyng force-pushed the mxyng/rate-limit branch 2 times, most recently from 4811d04 to c88fd87 on September 21, 2022 at 00:30
redisOptions.Username = options.Username
redisOptions.Password = options.Password

client = redis.NewClient(redisOptions)
Contributor

Do you need to use the cluster client here? I'm assuming we would want some kind of redundancy.

Collaborator Author

It depends on how the Redis cluster is configured. Managed services like GCP's Memorystore and AWS's ElastiCache can provide redundancy without resorting to Redis Cluster; other types of redundancy require more complex setups, such as Sentinel or a full cluster configuration.

For the time being, we can get away with implementing just the basic configuration.
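
For reference, a rough sketch of how those topologies map onto go-redis constructors (assuming github.com/go-redis/redis/v8; package name and addresses are placeholders):

package redisclient

import "github.com/go-redis/redis/v8"

// newRedisClient is the basic single-node setup this PR targets; managed
// services such as Memorystore or ElastiCache provide redundancy behind a
// single endpoint, so no special client is needed.
func newRedisClient(addr string) *redis.Client {
	return redis.NewClient(&redis.Options{Addr: addr})
}

// Sentinel-based failover would instead use:
//   redis.NewFailoverClient(&redis.FailoverOptions{
//       MasterName:    "mymaster",
//       SentinelAddrs: []string{"sentinel-1:26379", "sentinel-2:26379"},
//   })
//
// Redis Cluster would instead use:
//   redis.NewClusterClient(&redis.ClusterOptions{
//       Addrs: []string{"node-1:6379", "node-2:6379", "node-3:6379"},
//   })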

internal/server/routes.go (resolved)
@mxyng mxyng force-pushed the mxyng/rate-limit branch 9 times, most recently from 72280d3 to f0ba365 on September 23, 2022 at 00:39
Contributor

@dnephin dnephin left a comment

Looks great! A few questions, and a few minor suggestions, but nothing blocking.

It'd be great for us to dig into github.com/go-redis/redis_rate/v9 a bit more as well, and make sure we understand all the implementation trade-offs made in that library, since it seems we are relying on it quite a bit.

I guess our existing API metrics should give us some indication of the number of requests hitting our rate limits, right? It might be interesting to add some more fine-grained metrics about exactly which of the rate limits is rejecting the request, but that seems like it should be a follow-up and not part of this (if we want it).

The limits we have set seem reasonable. We might consider increasing the org limit to 5000 or 6000 to be extra safe.
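
If we do want the follow-up metric, a possible shape (metric name and label are hypothetical, using github.com/prometheus/client_golang/prometheus; not part of this PR):

package metrics

import "github.com/prometheus/client_golang/prometheus"

// rateLimitRejections counts requests rejected by a rate limit, labelled by
// which limit fired (e.g. "organization", "login").
var rateLimitRejections = prometheus.NewCounterVec(prometheus.CounterOpts{
	Name: "rate_limit_rejections_total",
	Help: "Requests rejected by a rate limit, by limit name.",
}, []string{"limit"})

func init() {
	prometheus.MustRegister(rateLimitRejections)
}

// Example call site when the per-organization limit rejects a request:
//   rateLimitRejections.WithLabelValues("organization").Inc()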

internal/errors.go (resolved)
internal/server/errors.go (resolved)
"testing"
"time"

"github.com/alicebob/miniredis/v2"
Contributor

What do you see as the advantage of this in-process pseudo-Redis compared to running a real instance of Redis in a separate process or in a container (like we do with Postgres)?

I imagine this is slightly faster, but we risk the implementation being a bit different from what we use in production. It also seems like we might avoid a few extra dependencies used by miniredis if we went with the out-of-process real Redis.

Not a blocker; we can always change it later if we run into problems. Mostly I'm interested in the rationale for this approach.

Collaborator Author

The main advantage is speed and not needing to start a Redis instance in order to run tests. This makes writing tests easy and fast.

I haven't used miniredis previously, but @pdevine has and he seems happy with it. I've also used Python's redislite for unit tests, and it's a similar idea. I'm not that concerned about compatibility since the Redis API itself is relatively simple and we're not using any advanced features (yet). We can move to an external Redis if it ever becomes a problem.
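
For anyone unfamiliar with the pattern, a minimal sketch of a miniredis-backed test (not the PR's actual test; names are illustrative):

package redis_test

import (
	"context"
	"testing"

	"github.com/alicebob/miniredis/v2"
	"github.com/go-redis/redis/v8"
	"gotest.tools/v3/assert"
)

func TestMiniredisSketch(t *testing.T) {
	// RunT starts an in-process Redis and stops it when the test finishes.
	srv := miniredis.RunT(t)

	client := redis.NewClient(&redis.Options{Addr: srv.Addr()})
	ctx := context.Background()

	assert.NilError(t, client.Set(ctx, "key", "value", 0).Err())

	got, err := client.Get(ctx, "key").Result()
	assert.NilError(t, err)
	assert.Equal(t, got, "value")
}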

"gotest.tools/v3/assert"
)

func TestRedis(t *testing.T) {
Contributor

Awesome test coverage!

internal/server/redis/limit.go (2 resolved threads)
func LoginBad(r *Redis, key string, limit int) {
	if r != nil && r.client != nil {
		ctx := context.TODO()
		rate := r.client.Incr(ctx, loginKey(key)).Val()
Contributor

We are silently ignoring errors here, and in r.client.Pipelined above, and in client.SetArgs below. The return value is a StatusCmd, so the linter doesn't notice that the error is being ignored.

I suspect we should at least log those errors.
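
A sketch of what that could look like for the Incr call above (the log call is a stand-in for whatever logger the server already uses; not the PR's code):

package sketch

import (
	"context"
	"log"

	"github.com/go-redis/redis/v8"
)

// incrWithLogging keeps the IntCmd so its error can be checked and logged
// instead of being silently dropped by calling .Val() directly.
func incrWithLogging(ctx context.Context, client *redis.Client, key string) int64 {
	cmd := client.Incr(ctx, key)
	if err := cmd.Err(); err != nil {
		log.Printf("incrementing %q: %v", key, err) // placeholder for the server's logger
	}
	return cmd.Val()
}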

internal/server/redis/limit_test.go (resolved)
internal/server/handlers.go (resolved)
Comment on lines 227 to 228
// TODO: limit should be a per-organization setting
if err := redis.RateOK(a.server.redis, org.ID.String(), 3000); err != nil {
Contributor

Is this TODO still relevant?

Collaborator Author

Yes, it's still relevant. The rate limit may eventually differ per organization, so this will need to load the value from settings.
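
Something like the following could eventually replace the hard-coded 3000 (the settings type and field here are hypothetical, not part of this PR):

package sketch

// OrganizationSettings is a hypothetical per-organization settings record.
type OrganizationSettings struct {
	RequestLimitPerMinute int // 0 means "use the default"
}

const defaultOrgRequestLimit = 3000

// orgRequestLimit returns the organization's configured limit, falling back
// to the default when none is set.
func orgRequestLimit(settings *OrganizationSettings) int {
	if settings != nil && settings.RequestLimitPerMinute > 0 {
		return settings.RequestLimitPerMinute
	}
	return defaultOrgRequestLimit
}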

Contributor

@ssoroka ssoroka left a comment

lgtm. One thing that's common to run into is the need to scale the limits by org size, though we can tackle that later.

@mxyng
Collaborator Author

mxyng commented Sep 26, 2022

> lgtm. One thing that's common to run into is the need to scale the limits by org size, though we can tackle that later.

Yes, definitely. That's the intent behind the TODO for per-organization limits.

@mxyng mxyng merged commit 866ec67 into main Sep 26, 2022
@mxyng mxyng deleted the mxyng/rate-limit branch September 26, 2022 21:22