Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: changes rate limiter #549

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

adityathebe
Copy link
Member

@adityathebe adityathebe commented May 16, 2024

resolves: #530

@adityathebe adityathebe force-pushed the feat/changes-rate-limiter branch 2 times, most recently from 9e039f5 to a7358dd Compare May 16, 2024 06:42
@adityathebe adityathebe changed the title Feat/changes rate limiter feat: changes rate limiter May 16, 2024
@adityathebe adityathebe marked this pull request as ready for review May 16, 2024 10:09
@adityathebe adityathebe requested a review from moshloop May 16, 2024 10:30
db/update.go Outdated
@@ -276,7 +276,11 @@ func extractChanges(ctx api.ScrapeContext, result *v1.ScrapeResult, ci *models.C
if changeResult.UpdateExisting {
updates = append(updates, change)
} else {
newOnes = append(newOnes, change)
if ok, err := ctx.TempCache().IsChangePersisted(change.ConfigID, change.ExternalChangeId); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be extremely slow - If we keep the on conflict do nothing, and add a returning config_id we can just increment the rate limit counter after insertion

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the rate limit needs to happen before insertion.

We must know which changes shouldn't account for rate limits.
Maybe a one query to filter out existing changes should do the job.

db/changes.go Outdated
query := `WITH latest_changes AS (
SELECT
DISTINCT ON (config_id) config_id,
change_type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rate limit should be by config id, not config id and type. - We can just doselect config_id, count(*) from config_changes where scraper_id = ? and created_at > now() - window

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rate limit is on config_id only.

https://github.com/flanksource/config-db/pull/549/files#diff-a359b758685582dec24c6f5672655ddff1c9392e2c5dfebe38eb874cc559c1e0R124

This query finds all the configs of the given scraper whose latest change was "TooManyChanges".

db/changes.go Outdated
// filterOutPersistedChanges returns only those changes that weren't seen in the db.
func filterOutPersistedChanges(ctx api.ScrapeContext, changes []*models.ConfigChange) ([]*models.ConfigChange, error) {
// use cache to filter out ones that we've already seen before
changes = lo.Filter(changes, func(c *models.ConfigChange, _ int) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cache would be too big, Why can't we just insert on conflict do nothing returning config_id - That will return the config_id for changes that were inserted ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the max changes count is 10 per window and we insert a batch of 500 changes first, the rate limit would be disregarded totally.

The config_change cache isn't actualy caching the changes though. It's value is empty struct whose size=0.

We don't want to create new config changes with a ON CONFLICT DO NOTHING
clause because now we don't want the same config change to take up
multiple quotas from the rate limiter.

i.e. if an aws scraper is run @every 5m, we'll be trying to insert the
same config changes generated by the cloudtrail scraper again & again on
every run. The same change will take up one more quota from the rate
limiter on every run.

By knowing that the change already exist, we can avoid inserting that
change in the first place and the rate limiter will be happy about it.
@adityathebe adityathebe force-pushed the feat/changes-rate-limiter branch 10 times, most recently from 1957d50 to 284b39f Compare May 27, 2024 03:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

High change rate circuit breaker
2 participants