Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add command name to count metrics #437

Merged
merged 16 commits into from
May 31, 2023
Merged

Conversation

maheshrajamani
Copy link
Contributor

@maheshrajamani maheshrajamani commented May 23, 2023

What this PR does:
Add command name to count metrics

Which issue(s) this PR fixes:
Fixes #411

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CLA Signed: DataStax CLA

@maheshrajamani maheshrajamani requested a review from a team as a code owner May 23, 2023 18:31
private final MeterRegistry meterRegistry;

/** The tag for error being true, created only once. */
private final Tag errorTag = Tag.of("error", "true");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can probably be static? (unlike injected things like meterRegistry)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add it. The class is marked application scope so only one instance is maintained.

// Add metrics
Tags tags = Tags.of(apiTag, commandTag, statusCodeTag, errorTag);
// record
meterRegistry.counter("http_server_requests_custom_seconds_count", tags).increment();
Copy link
Contributor

@tatu-at-datastax tatu-at-datastax May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there's a good place to define these constants? If not, inlining is fine.
(looking down looks like we have MetricsConstants added so maybe add there)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add it to the Constants

* @param responseContext {@link ContainerResponseContext}
*/
@ServerResponseFilter
public void record(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to use different name even if record is not technically a reserved word in Java?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change it to recordMetric

// reset the stream to fetch from beginning of stream
inputStream.reset();
// get body from requestContext
final Iterator<String> fieldIterator = objectMapper.readTree(inputStream).fieldNames();
Copy link
Contributor

@tatu-at-datastax tatu-at-datastax May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is rather expensive, parsing the whole JSON contents. Is there no other way? It also requires deeper knowledge of payload than should be needed in a filter.

I guess feasibility depends on whether this is called for each and every command: if it was only for errors it'd be less problematic.
But I guess our choice to NOT use URL REST-style is making this way more challenging than it should be...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we can actually avoid most of decoding, although do need to stream through content more than once.

Since we only need to see:

{ "command-name"

it is possible to create JsonParser using ObjectMapper and use nextToken() to look only at first 2 tokens: they must be:

  1. JsonToken.START_OBJECT
  2. JsonToken.FIELD_NAME

and after (2), can call jsonParser.currentName() to get command name we need.
After that JsonParser should be close()d (frees/recycles buffers).

This would remove most of processing overhead, as long as request payload is buffered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will make the change

@ServerResponseFilter
public void record(
ContainerRequestContext requestContext, ContainerResponseContext responseContext) {
String commandName = getCommandName(requestContext.getEntityStream());
Copy link
Contributor

@tatu-at-datastax tatu-at-datastax May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it guaranteed this stream can always reset() reliably? And does it need to be reset after reading or before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done post command is run, so after reset is not needed. I will add a null check, if Stream object present reset won't fail.

Copy link
Contributor

@tatu-at-datastax tatu-at-datastax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns about having to decode JSON for every command additional time, but that is probably necessary due to design of API. So I guess I am ok with that. There are some additional suggestions (wrt constant made static), but more generally I think I'd like someone else (Ivan) to have a look.

Copy link
Contributor

@ivansenic ivansenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to propose different solution, which has pros and cons, but definitely avoids doing that gymnastics in the filter, which imo can have negative performance for the requests..

Proposal

How about we don't count record on the HTTP level, but rather record the performance of the CommandProcessor. There we could easily use MeterRegistry directly and measure time for the command execution. We could include tags that we want and that we can take from the exceptions, context, tenant info provider, etc.

The implementation would be way easier and we would read information directly from the model objects, thus we would have way better performance.

The other libraries often have such metrics as well, and do report on in-flight messages, error rates, etc, so it wouldn't be anything unusual. For example, I know Axon has something similar for their command gateway.

The implementation can be very simple:

  public <T extends Command> Uni<CommandResult> processCommand(
      CommandContext commandContext, T command) {
     Timer timer = meterRegistry.timer(...);

      return commandResolverService
        .resolverForCommand(command)

        // resolver can be null, not handled in CommandResolverService for now
        .flatMap(
            resolver -> {
              // if we have resolver, resolve operation and execute
              Operation operation = resolver.resolveCommand(commandContext, command);
              return operation.execute(queryExecutor);
            })

       .onItemOrFailure((item, ex) -> {
           // capture metrics & handle result or exception 
       });
  }

Pros over current solution

  • can measure time and not only count
  • does not manipulation of the response input stream
  • can include tenant if available (can the current solution as well)
  • can include detailed information on the error, for example error code or error class
  • better performance

Cons over current solution

  • no information on 4xx response as they never reach the command gateway
  • measures only time in command processor (current solution does not measure time, but still we would not get complete HTTP time, but that's fine imo)

I would like to sync about this and thus putting Request changes until we agree on the approach..

@tatu-at-datastax
Copy link
Contributor

@ivansenic I like this idea: access from model does give more information.

Wrt 4xx responses; perhaps there could be different mechanism to deal with those? And if we only processed stream content (... if we must) for 4xx, it'd be less of performance concern as well. Or, alternatively, just have non-command-specific 4xx counts: not optimal but at least could alert.

@ivansenic
Copy link
Contributor

The 4xx would still be captured in the http metrics out of the box, so we would have info about those as well, just without the command name. @maheshrajamani and I agreed to go and record in the command processor and we agreed to pick up error codes, exception classes and even the info if error is caused by user or not.

@tatu-at-datastax
Copy link
Contributor

The 4xx would still be captured in the http metrics out of the box, so we would have info about those as well, just without the command name. @maheshrajamani and I agreed to go and record in the command processor and we agreed to pick up error codes, exception classes and even the info if error is caused by user or not.

Excellent! Sounds like a plan.

Copy link
Contributor

@ivansenic ivansenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I provided a lot of good suggestions, please have a look..

@maheshrajamani maheshrajamani requested a review from ivansenic May 25, 2023 16:37
Copy link
Contributor

@ivansenic ivansenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check what comments do make and adapt.. Approving to remove the need for wait..

@maheshrajamani maheshrajamani merged commit e579f53 into main May 31, 2023
@maheshrajamani maheshrajamani deleted the add-metrics-by-command branch May 31, 2023 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add command name and errors to the server-side HTTP metrics
3 participants