Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster Command Parsing Refactor #373

Merged
merged 11 commits into from
May 14, 2024

Conversation

vazois
Copy link
Contributor

@vazois vazois commented May 10, 2024

This PR addresses regressions due to changes in the semantics of DrainCommand and to adhere to the new parsing restrictions.
It refactors ClusterCommands.cs into multiple files similar to how other Network commands are implemented.
The PR contains the following enchantments:

  1. Separation of cluster subcommand parsing and matching from corresponding execution portion.
  2. I removed the lowercase sequences used for matching the individual cluster subcommands. For all cluster commands, except those used for internode communication, the input token will be converted to upper case command matching is performed.
  3. Added explicit checks for the arguments of each cluster command.
  4. Implemented negative tests to validate parsing edge cases (e.g. wrong number of arguments or partially send packets).

Some notes:

  • Performance impact should be minimal since cluster commands are generally in the slow path.
  • I have prioritized checking first APPENDLOG and GOSSIP since these are generally the most frequently executed commands.

@vazois vazois force-pushed the vazois/parsing-regression-testing branch from 29f3969 to 4c07d9b Compare May 10, 2024 22:40
@PaulusParssinen
Copy link
Contributor

PaulusParssinen commented May 10, 2024

I used MemoryExtensions.SequenceEqual because I read somewhere that it offers better performance ReadOnlySpan.SequenceEqual. I could benchmark it or if someone can chime in that will be great.

They're the same method. MemoryExtensions.SequenceEqual(this ReadOnlySpan<T> a, ReadOnlySpan<T> b) is the extension method that gets binded when you do span.SequenceEqual(other).

There is LINQ Enumerable.SequenceEqual which won't get binded to spans, but you can accidentally bind it to anything that is IEnumerable 😅

In general, cluster commands are in the slow path, so I am wondering if we should use MakeUpperCase to avoid having two versions of each command type when parsing param byte sequence.

IMO yes. And in the cases where we want don't want to try help with a fast path (IIRC I think it was OperationDirection or something that had first fast-path check for already uppercase and then MakerUpperCase + SequenceEquals again) we could have a AsciiUtils.EqualsIgnoreCase helper which does this pattern for us.

out of topic, sorta: MakeUpperCase is potentially dangerous. It has no length limit so it'll keep scanning potentially out of bounds and output the client anything that it got to. Current use of it is okay-ish, as long as nothing goes wrong (I haven't tried to craft malicious payload to it, so I think that method should have some limit..) 😅

edit: Found the place where we have "fast-path" for upper case (if we know to prioritize it for some reason)

public OperationDirection GetOperationDirection(ArgSlice input)
{
// Optimize for the common case
if (input.ReadOnlySpan.SequenceEqual("LEFT"u8))
return OperationDirection.Left;
if (input.ReadOnlySpan.SequenceEqual("RIGHT"u8))
return OperationDirection.Right;
// Rare case: try making upper case and retry
MakeUpperCase(input.ptr);
if (input.ReadOnlySpan.SequenceEqual("LEFT"u8))
return OperationDirection.Left;
if (input.ReadOnlySpan.SequenceEqual("RIGHT"u8))
return OperationDirection.Right;
return OperationDirection.Unknown;
}

We could have something like:

namespace Garnet.Common;

public static class AsciiUtils
{
    public static bool EqualsIgnoreCase(byte* start, byte* end /* (or length) */, ReadOnlySpan<byte> other)
    {
        // get the ptrs
        MakeUpperCase(start, end /* so we don't cause a heartbleed 2.0 */); // We expect "other" to be already upper case AND constant because we control it.
        return span.SequenceEquals(other);
    }
}

but I guess in this case where all the comparisons are in the same method, it is worth it to have just one MakeUpperCase and then do comparisons against the constants.

@vazois
Copy link
Contributor Author

vazois commented May 13, 2024

I used MemoryExtensions.SequenceEqual because I read somewhere that it offers better performance ReadOnlySpan.SequenceEqual. I could benchmark it or if someone can chime in that will be great.

They're the same method. MemoryExtensions.SequenceEqual(this ReadOnlySpan<T> a, ReadOnlySpan<T> b) is the extension method that gets binded when you do span.SequenceEqual(other).

There is LINQ Enumerable.SequenceEqual which won't get binded to spans, but you can accidentally bind it to anything that is IEnumerable 😅

Thanks for the clarification. Are you saying span.SequenceEnqual is more readable? I don't have a personal preference. I pushed a new commit where I follow the pattern used in ParseCommand (i.e. MemoryMarshal.Read)

In general, cluster commands are in the slow path, so I am wondering if we should use MakeUpperCase to avoid having two versions of each command type when parsing param byte sequence.

IMO yes. And in the cases where we want don't want to try help with a fast path (IIRC I think it was OperationDirection or something that had first fast-path check for already uppercase and then MakerUpperCase + SequenceEquals again) we could have a AsciiUtils.EqualsIgnoreCase helper which does this pattern for us.

out of topic, sorta: MakeUpperCase is potentially dangerous. It has no length limit so it'll keep scanning potentially out of bounds and output the client anything that it got to. Current use of it is okay-ish, as long as nothing goes wrong (I haven't tried to craft malicious payload to it, so I think that method should have some limit..) 😅

edit: Found the place where we have "fast-path" for upper case (if we know to prioritize it for some reason)

public OperationDirection GetOperationDirection(ArgSlice input)
{
// Optimize for the common case
if (input.ReadOnlySpan.SequenceEqual("LEFT"u8))
return OperationDirection.Left;
if (input.ReadOnlySpan.SequenceEqual("RIGHT"u8))
return OperationDirection.Right;
// Rare case: try making upper case and retry
MakeUpperCase(input.ptr);
if (input.ReadOnlySpan.SequenceEqual("LEFT"u8))
return OperationDirection.Left;
if (input.ReadOnlySpan.SequenceEqual("RIGHT"u8))
return OperationDirection.Right;
return OperationDirection.Unknown;
}

We could have something like:

namespace Garnet.Common;

public static class AsciiUtils
{
    public static bool EqualsIgnoreCase(byte* start, byte* end /* (or length) */, ReadOnlySpan<byte> other)
    {
        // get the ptrs
        MakeUpperCase(start, end /* so we don't cause a heartbleed 2.0 */); // We expect "other" to be already upper case AND constant because we control it.
        return span.SequenceEquals(other);
    }
}

but I guess in this case where all the comparisons are in the same method, it is worth it to have just one MakeUpperCase and then do comparisons against the constants.

I will see if I can address this. For cluster commands, we already pay the cost of extracting a Span from the receive buffer, so it would be easy to have an uppercase bounded by the Span boundaries. I will have to benchmark the case where MakeUpperCase
is used in ProcessMessages.
Also I was thinking, that for internode communication commands, we don't need to call MakeUpperCase because they already formatted like this and external clients are not supposed to call them directly. So maybe I can separate the switch statement into two parts.

@vazois vazois marked this pull request as ready for review May 13, 2024 18:17
@vazois
Copy link
Contributor Author

vazois commented May 13, 2024

I used MemoryExtensions.SequenceEqual because I read somewhere that it offers better performance ReadOnlySpan.SequenceEqual. I could benchmark it or if someone can chime in that will be great.

They're the same method. MemoryExtensions.SequenceEqual(this ReadOnlySpan<T> a, ReadOnlySpan<T> b) is the extension method that gets binded when you do span.SequenceEqual(other).
There is LINQ Enumerable.SequenceEqual which won't get binded to spans, but you can accidentally bind it to anything that is IEnumerable 😅

Thanks for the clarification. Are you saying span.SequenceEnqual is more readable? I don't have a personal preference. I pushed a new commit where I follow the pattern used in ParseCommand (i.e. MemoryMarshal.Read)

In general, cluster commands are in the slow path, so I am wondering if we should use MakeUpperCase to avoid having two versions of each command type when parsing param byte sequence.

IMO yes. And in the cases where we want don't want to try help with a fast path (IIRC I think it was OperationDirection or something that had first fast-path check for already uppercase and then MakerUpperCase + SequenceEquals again) we could have a AsciiUtils.EqualsIgnoreCase helper which does this pattern for us.
out of topic, sorta: MakeUpperCase is potentially dangerous. It has no length limit so it'll keep scanning potentially out of bounds and output the client anything that it got to. Current use of it is okay-ish, as long as nothing goes wrong (I haven't tried to craft malicious payload to it, so I think that method should have some limit..) 😅
edit: Found the place where we have "fast-path" for upper case (if we know to prioritize it for some reason)

public OperationDirection GetOperationDirection(ArgSlice input)
{
// Optimize for the common case
if (input.ReadOnlySpan.SequenceEqual("LEFT"u8))
return OperationDirection.Left;
if (input.ReadOnlySpan.SequenceEqual("RIGHT"u8))
return OperationDirection.Right;
// Rare case: try making upper case and retry
MakeUpperCase(input.ptr);
if (input.ReadOnlySpan.SequenceEqual("LEFT"u8))
return OperationDirection.Left;
if (input.ReadOnlySpan.SequenceEqual("RIGHT"u8))
return OperationDirection.Right;
return OperationDirection.Unknown;
}

We could have something like:

namespace Garnet.Common;

public static class AsciiUtils
{
    public static bool EqualsIgnoreCase(byte* start, byte* end /* (or length) */, ReadOnlySpan<byte> other)
    {
        // get the ptrs
        MakeUpperCase(start, end /* so we don't cause a heartbleed 2.0 */); // We expect "other" to be already upper case AND constant because we control it.
        return span.SequenceEquals(other);
    }
}

but I guess in this case where all the comparisons are in the same method, it is worth it to have just one MakeUpperCase and then do comparisons against the constants.

I will see if I can address this. For cluster commands, we already pay the cost of extracting a Span from the receive buffer, so it would be easy to have an uppercase bounded by the Span boundaries. I will have to benchmark the case where MakeUpperCase is used in ProcessMessages. Also I was thinking, that for internode communication commands, we don't need to call MakeUpperCase because they already formatted like this and external clients are not supposed to call them directly. So maybe I can separate the switch statement into two parts.

I removed MemoryExtensions and reverted back to span.SequenceEqual. Looks much better now 😄

@badrishc
Copy link
Contributor

badrishc commented May 14, 2024

MakeUpperCase is potentially dangerous. It has no length limit so it'll keep scanning potentially out of bounds and output the client anything that it got to

MakeUpperCase does have a length limit of whatever the input buffer contains. It also stops conversion as soon as it sees a protocol character (such as \r and \n), so it should not make its way into user data even within the input buffer.

@vazois vazois force-pushed the vazois/parsing-regression-testing branch from b8a6950 to 62f8648 Compare May 14, 2024 02:55
@vazois vazois force-pushed the vazois/parsing-regression-testing branch from 16d9d25 to 1187128 Compare May 14, 2024 22:16
@vazois vazois merged commit 51667fd into microsoft:main May 14, 2024
23 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 14, 2024
@vazois vazois deleted the vazois/parsing-regression-testing branch September 17, 2024 17:57
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants