Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update binding params #339

Merged
merged 5 commits into from
Jul 17, 2023
Merged

Update binding params #339

merged 5 commits into from
Jul 17, 2023

Conversation

Benjamintf1
Copy link
Member

@Benjamintf1 Benjamintf1 commented Jun 30, 2023

Description

  • Unify where parsing of binding parameters happen
  • Unify the types of drains that exist. Allow users to egress timers, allow aggregate drains to egress only logs
  • Update the name of the omit-metadata flag.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Testing performed?

  • Unit tests
  • Integration tests
  • Acceptance tests

Checklist:

  • This PR is being made against the main branch, or relevant version branch
  • I have made corresponding changes to the documentation
  • I have added testing for my changes

If you have any questions, or want to get attention for a PR or issue please reach out on the #logging-and-metrics channel in the cloudfoundry slack

@Benjamintf1 Benjamintf1 requested a review from a team as a code owner June 30, 2023 19:12
Copy link
Contributor

@chombium chombium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice code clean up. I only find the drain-type value for aggregate drains not intuitive and confusing.

bindingType = syslog.BINDING_TYPE_METRIC
case "all":
bindingType = syslog.BINDING_TYPE_ALL
case "allWithTimers":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the parameter value confusing. Why is it allowWithTimers and it refers to aggregate drains? Why not simply aggregate as drain type?

Can you please explain what does allow with timers even mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good question.

the short answer of why it's allWithTimers is that...honestly I thought a work in progress name making a pr would get the idea across and I didn't want to wait forever sitting around trying to come up with something better.

Right now "ALL" type doesn't actually allow all envelopes to pass through. So we have multiple types. All let's gauges and counters and logs through. Logs let's logs through, metrics lets gauges and counters through. However, as you may know, there exists more envelope types then that.

The main one in use is

return c.toRFC5424TimerMessage(env, hostname, appID)
timers, which are in use for http start stop events, and I believe some of the newer cpu measures. There's been moves to try and push away from the usage of timers in general but it hasn't gotten that far.

The other one is

return c.toRFC5424EventMessage(env, hostname, appID)
events? I don't think they're so much used, but they exist.

Aggregate drains get them because a lot of the work towards getting aggregate drains to work was for log-cache. Log-cache needed everything, and so did some people's external integrations I presume, so they were allowed to get everything. App drains were I guess restricted to only get the "good" metrics.

So that leaves us in a tough situation if we want to make things more consistant, because anyone using "all" is likely to be surprised if they suddenly start getting timers and events, but what do you call "more then all". all-with-timers is extremly verbose, but technecially accurate without breaking backwards compatibility.

I'd be open and interested if you have ideas for alternate schemes tho.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general

  1. we've seen some people ask for metrics only aggregate drains
    and
  2. it seems foolish to me to enforce the end of timers through inconsistant egress fiat rather then via reworking diego and router to use the new types first.

Copy link
Member Author

@Benjamintf1 Benjamintf1 Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The enum AGGREGATE_DRAIN could/should probubly also be changed. Maybe should have done it. Not super comfy with "all-with-timers", but might be the best we have).

Thanks for calling attention to this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Benjamintf1 Thanks for the clarification. It's good that we are thinking tin he same direction :) I also understood allWithTimers as everything that comes with all + the timers.

I find this change as a must have, but I would do it other way:
Why don't we use the same parameters as it is done in the Log Cache CLI's tail command, two parameters envelope-type and envelope-class? This way we'll have the same API on both places.

The other option that came to my mind if we don't want to change the current drain-type=all behavior, is to add another include parameter and make a union of the current all and timers and/or events.

I'm personally for unification of the API and do the same what is done in the Log Cache CLI.

What do you think about this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you, but let's have timers as a new type and all will return all types, logs, metrics and timers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the more natural division here is metrics(counters and gauges), logs(logs, maybe events but those kinda don't exist), and timers(aka traces).

I like this division! What if

  1. Replace allWithTimers with a timers type, and made the logs type include events.
  2. Then either introduce a breaking change or feature flag to switch all to really send everything.

(1) doesn't seem like a breaking change in my mind given the seeming nonexistence of events, and it allows for users to at least have some way to retrieve timers if they want them. It's less efficient for sure, but you could, in that world, have two different drains one for logs/metrics/all, and another for timers.

(2) would move us toward an all type that makes sense.

Copy link
Member

@ctlong ctlong Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Possibly the simplest answer for now is to remove allWithTimers from this PR and create a GH issue to discuss with how to incorporate timers moving forward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed allWithTimers.

Added new parameter "drain-data". Drain data operates under "new" logs/metrics/traces/all split vs "all/logs-and-metrics for app drains/logs/metrics". Events under logs.

Copy link
Member Author

@Benjamintf1 Benjamintf1 Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably events should be under events, but loggregator isn't a good eventing system. Events should be deprecated and removed. But events arn't used much, so it's reasonable to keep it under logs for what they're used for.

return bindingType
}

func getRemoveMetadataQuery(u *url.URL) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool way to avoid breaking changes in the API and keep backward compatibility. Well done! :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand avoiding breaking changes is important :D

…at complies with logs/metrics/traces/all split
Copy link
Contributor

@chombium chombium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that both drain-type and drain-data control the same thing, what will be forwarded by the syslog drain, which is confusing.


func getBindingType(u *url.URL) syslog.DrainData {
drainData := syslog.LOGS
switch u.Query().Get("drain-type") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Benjamintf1 I know that with this handling of the drain-type and the addition of the drain-data parameter you've kept the things 100% backward compatible, but now we have two parameters which control the same thing: what will be forwarded by the syslog drain. That is confusing. Especially if a user specifies values for both parameters.

I was hoping that you will add traces as drain-type and drain-type=all will return logs + metrics + traces. IMHO this is an enhancement of the current all behavior and not a breaking change. If someone needs some drain type which is not covered by the built in combinations, they can create multiple drains.

The other clean solution would be to remove the drain-type parameter and everything will be controlled the drain-data parameter. The users can define something like drain-data=logs,metrics or drain-data=metrics,traces and we parse the value and forward the envelope types which are specified with the parameter... It will a breaking change, but we could provide a script which will adjust the urls and replace the drain-type with an equivalent value for he drain-data parameter. If the drain-type parameter is missing we will add an drain-data value which is equivalent for the current drain-type behavior.

What do you think about this?

@ctlong Any thoughts on this?

Copy link
Contributor

@chombium chombium Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second thought, after sleeping over it, maybe we can stay backward compatible if we make the parameters mutually exclusive.

We could have something like:

if u.Query().Get("include-metrics-deprecated") != "" {
		return syslog.ALL
}

switch u.Query().Get("drain-data") {
      case "value" :
              return syslog_type
      ...
}

switch u.Query().Get("drain-type") {
      case "value" :
              return syslog_type
      ....
}

This way:

  • if both parameters drain-data and drain-type are specified drain-data will have priority
  • the currently active behavior can be still reached if only the drain-type parameter is specified, so no breaking changes.

I find the usage of both parameters still confusing in this case, but this might be a good compromise if we don't won't to introduce breaking changes. TBH I'm more in favor of breaking changes, and do the things clean with only one parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're already mutually exclusive in the pr? The new drain-data overrides the old drain-type parameter. Also, given we havn't documented drain-type especially well, I believe most of the usage will be the old deprecated drain-cli usage, which shouldn't have mixed flags.

include-metrics-deprecated also I guess overrides both? I didn't have a paticularly good reason to make it override everything, but I believe it should mostly be being used for log-cache, which we can replace with that all type flag at some point.

Copy link
Contributor

@chombium chombium Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Benjamintf1 Thanks for the explanation. I've checked the code once again and saw that everything is as it should be.

Regarding the docs, I will adjust the Streaming app logs to log management services page and add the new parameter.

Should I maybe write that the new drain-data parameter is the way to go and drain-type is used for backward compatibility when someone sets its value to all?

@Benjamintf1 Benjamintf1 merged commit 2d71e20 into main Jul 17, 2023
@Benjamintf1 Benjamintf1 deleted the update-binding-params branch July 17, 2023 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants