Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too many MOVED error in # Errorstats section of INFO ALL #4118

Closed
chakaz opened this issue Nov 12, 2024 · 1 comment · Fixed by #4125
Closed

Too many MOVED error in # Errorstats section of INFO ALL #4118

chakaz opened this issue Nov 12, 2024 · 1 comment · Fixed by #4125
Assignees
Labels
bug Something isn't working

Comments

@chakaz
Copy link
Collaborator

chakaz commented Nov 12, 2024

Dragonfly saves count per error here:

void RedisReplyBuilderBase::SendError(std::string_view str, std::string_view type) {
ReplyScope scope(this);
if (type.empty()) {
type = str;
if (type == kSyntaxErr)
type = kSyntaxErrType;
}
tl_facade_stats->reply_stats.err_count[type]++;
last_error_ = str;
if (str[0] != '-')
WritePieces("-ERR ", str, kCRLF);
else
WritePieces(str, kCRLF);
}

The issue is that, we type is empty (which it is for MOVED replies), then we save an entry per str, which could be 16k in a regular situation, or more if we do slot migrations.

Then, all INFO replies fetch this information (from all threads), even if the ERRORSTATS section is not requested.

This is a general problem in the way we do error handling, but hopefully most errors contain constant strings instead of slot id + host.

To fix this only for MOVED (like I propose) we could simply add some special handling in the above code (if starts with MOVED then...)

@chakaz chakaz added the bug Something isn't working label Nov 12, 2024
@chakaz chakaz self-assigned this Nov 12, 2024
@adiholden adiholden added this to the dfly cluster v4 milestone Nov 12, 2024
@romange
Copy link
Collaborator

romange commented Nov 12, 2024

@chakaz type was designed exactly for this.

@chakaz chakaz closed this as completed in 1513134 Nov 14, 2024
romange pushed a commit that referenced this issue Nov 20, 2024
**The problem:**

When in cluster mode, `MOVED` replies (which are arguably not even errors) are aggregated per slot-id + remote host, and displayed in `# Errorstats` as such. For example, in a server that does _not_ own 8k slots, we will aggregate 8k different errors, and their counts (in memory).

This slows down all `INFO` replies, takes a lot of memory, and also makes `INFO` replies very long.

**The fix:**

Use `type` `MOVED` for moved replies, making them all the same under `# Errorstats`

Fixes #4118
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants