Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(logger): rate limit based on peer address #1351

Merged
merged 21 commits into from
Nov 21, 2023

Conversation

oddgrd
Copy link
Contributor

@oddgrd oddgrd commented Oct 27, 2023

Description of change

Add rate limiting by peer address to the logger server. This sets the sustained load LPS limit for projects to 512, but allowing bursts of up to 256 * 6. The rate limiter will regenerate a slot every 0.5s.

How has this been tested? (if applicable)

Tested by running projects locally that went over the limit. I also created an integration test to verify the limits work as expected, and the returned tonic status and headers are as expected when rate limiting is applied.

@oddgrd oddgrd marked this pull request as draft October 27, 2023 12:24
Err(err) => {
debug!(error = %err, "failed to parse logs");

let message = if line.contains("rate limit") {
Copy link
Contributor Author

@oddgrd oddgrd Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too happy about how I handle this case (string matching), but it should be rare (it will only happen when the application is being rate limited and the user tries to request logs, and it won't always fail). Suggestions on how to improve the error handling here are most welcome.

Copy link
Contributor

@Kazy Kazy Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about having the error be an actual JSON as well ? You could deserialize this as an enum (it's either a log line from the app or a control message), and match on that instead.

Copy link
Contributor Author

@oddgrd oddgrd Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I don't think we can make an enum that matches on either log line or control message, since it would break the logs stream of all users who don't upgrade. But just sending a JSON error works. New clients will be able to deserialize the error, older clients will receive an error that suggests upgrading their CLI.

Here is the error they get in the rare case that the logger client is rate limited + their logs request got rate limited:

Error: 429 Too Many Requests
Message: your application is producing too many logs. Interactions with the shuttle logger service will be rate limited

@oddgrd oddgrd marked this pull request as ready for review November 2, 2023 08:39
Copy link
Contributor

@chesedo chesedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some notes 😄

@@ -26,6 +26,8 @@ pub enum Error {
Internal(#[from] anyhow::Error),
#[error("Missing header: {0}")]
MissingHeader(String),
#[error("{0}. Retry the command in a few minutes")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is rate limiting telling the user to retry a command?

Copy link
Contributor Author

@oddgrd oddgrd Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this is returned if the deployer's logger client is being rate limited and the user calls the get_logs or get_logs_stream endpoints. Since this API is also consumed from the console, instructing to run a command is a bit misleading, I can rename it to Retry the request....

deployer/src/handlers/mod.rs Outdated Show resolved Hide resolved
logs.into_inner()
.log_items
.into_iter()
.map(|l| l.to_log_item_with_id(deployment_id))
.collect(),
))
} else {
Err(Error::NotFound("deployment not found".to_string()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the not found case no longer relevant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think it ever was. The https://github.com/shuttle-hq/shuttle/blob/main/logger/src/lib.rs#L89-L100 call can fail in the following ways:

  • can't extract the claim from extensions, this shouldn't happen since we check the claim in the deployer scopedlayers. This would return internal error
  • claim is missing the logs scope, shouldn't happen because we check it in deployer scopedlayer. This would return permission denied, I can add a match for this to the deployer handler, even though it is currently impossible, if we make changes in the deployer that may change.
  • the db query fails, this would return internal error.

Copy link
Contributor Author

@oddgrd oddgrd Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, after thinking about this some more, if the caller has access to this endpoint, but the logger client doesn't have access to get_logs, I think it makes sense to keep this as is, returning an internal error to the user from the deployer.

deployer/src/handlers/mod.rs Outdated Show resolved Hide resolved
Comment on lines 304 to 310
let governor_config = GovernorConfigBuilder::default()
.per_millisecond(500)
.burst_size(6)
.use_headers()
.key_extractor(TonicPeerIpKeyExtractor)
.finish()
.unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory we could make changes to the main code (rate limiter) and not catch it in the test because of this repeat. Maybe there should be a helper function in the main code that returns the governor? Or something that adds it to the server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, I tried to extract the service builder into a utility function, but I couldn't get it to compile with the complicated types. I tried again now to just extract the config creation into a util, but that was also difficult due to some private types. I will add constants for these values, it's not the cleanest solution, but it should do the trick.

proto/src/lib.rs Show resolved Hide resolved
proto/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Member

@jonaro00 jonaro00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

proto/src/lib.rs Show resolved Hide resolved
logger/src/rate_limiting.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@chesedo chesedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@oddgrd oddgrd merged commit 4a99d4a into main Nov 21, 2023
33 checks passed
@oddgrd oddgrd deleted the feature/eng-1609-rate-limits-the-amount-of-logs-send branch November 21, 2023 11:30
oddgrd added a commit that referenced this pull request Nov 27, 2023
jonaro00 pushed a commit that referenced this pull request Nov 28, 2023
* revert: rate limit based on peer address #1351

* revert: keep cargo-shuttle logs error handling
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants