Generalize source_lag_time_seconds to component_lag_time_seconds #14379

Open
binarylogic opened this issue Sep 12, 2022 · 4 comments
Labels
domain: observability Anything related to monitoring/observing Vector type: feature A value-adding code addition that introduce new functionality.

Comments

@binarylogic
Contributor

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

  • Calculate upstream lag from any component in Vector
  • Calculate internal latency by subtracting source and sink lag

Attempted Solutions

No response

Proposal

We should generalize source_lag_time_seconds to component_lag_time_seconds. This will allow us to observe upstream lag from any component in the topology. As a side benefit, we can also derive Vector's internal latency. I realize this is an imperfect metric, but it's a simple first step that will eliminate these blind spots until we can implement something more sophisticated.
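To make the proposal concrete, here is a minimal Python sketch. It is not Vector's implementation. It assumes that the lag at a component is the wall-clock time at which the component observes an event minus the event's own timestamp, and that internal latency is approximated as sink lag minus source lag. The function names `component_lag_seconds` and `internal_latency_seconds` and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def component_lag_seconds(event_timestamp: datetime, observed_at: datetime) -> float:
    """Lag seen at one component: observation time minus the event's own timestamp."""
    return (observed_at - event_timestamp).total_seconds()


def internal_latency_seconds(source_lag: float, sink_lag: float) -> float:
    """Time spent inside the pipeline, approximated as sink lag minus source lag."""
    return sink_lag - source_lag


# Hypothetical event stamped upstream at t0, observed by a source 2 s later
# and by a sink 2.5 s later.
t0 = datetime(2022, 9, 12, 12, 0, 0, tzinfo=timezone.utc)
source_lag = component_lag_seconds(t0, t0 + timedelta(seconds=2))
sink_lag = component_lag_seconds(t0, t0 + timedelta(seconds=2.5))
latency = internal_latency_seconds(source_lag, sink_lag)
```

For a single event the event timestamp cancels in the subtraction, so the derived latency depends only on the two observation times. When the subtraction is done on aggregated metrics instead, it is only an approximation.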

References

No response

Version

No response

@binarylogic binarylogic added type: feature A value-adding code addition that introduce new functionality. domain: observability Anything related to monitoring/observing Vector labels Sep 12, 2022
@bruceg
Member

bruceg commented Sep 12, 2022

I agree this is the least-effort way of generating these values, but I do have some concerns.

  1. Can we run into a situation where different logs coming into a component (transform or sink) could have the timestamp in differently named fields due to different source schemas? Presumably we can account for this in the schema metadata.
  2. Will this become tech debt if/when we move to a more reliable time source (i.e., an ingest timestamp not affected by either remote clock synchronization problems or network latencies)?
  3. I think we will want to distinguish these latencies based on the source component, which would require recording a source component identifier in each event. That probably requires a bit of a discussion on how best to store that (dedicated field in Metadata vs something in the dynamic metadata).
  4. Given the above, how are merged events handled, both in terms of the timestamps and combining source identifiers?
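
As a rough illustration of point 3, the sketch below keys lag samples by both the observing component and the originating source. It assumes each event carries a source identifier. The `source_id` field, the `LagRecorder` class, and the component names are all hypothetical, and the sketch takes no position on where Vector would store the identifier or how merged events (point 4) should be handled.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Event:
    timestamp: datetime
    source_id: str  # hypothetical: identifier of the source that ingested the event


class LagRecorder:
    """Collects lag samples keyed by (observing component, originating source)."""

    def __init__(self) -> None:
        self.samples = defaultdict(list)

    def record(self, component_id: str, event: Event, now: datetime) -> None:
        # Lag is the observation time minus the event's own timestamp.
        lag = (now - event.timestamp).total_seconds()
        self.samples[(component_id, event.source_id)].append(lag)


now = datetime(2022, 9, 12, 12, 0, 10, tzinfo=timezone.utc)
recorder = LagRecorder()
recorder.record("my_sink", Event(now - timedelta(seconds=3), "kafka_in"), now)
recorder.record("my_sink", Event(now - timedelta(seconds=7), "file_in"), now)
```

With this shape, a single sink reports separate lag series for each upstream source instead of one blended value.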

@binarylogic
Contributor Author

binarylogic commented Sep 13, 2022

  1. Yes, schemas are the solution for this. Until then we will ignore events that don't use the proper timestamp field name.
  2. No, it'll always be an interesting data point. Granted, it will become more of a secondary signal but still useful.
  3. Sure, this is beyond the scope of this issue though. I think it's fine to have a single data point that's based off of the event's timestamp as one perspective on upstream lag.
  4. Because this is happening in the topology, merging shouldn't be a concern.

To reiterate, this is just one signal/layer on the overall picture of lag and latency. I'm not looking for a single source of truth. As we implement better metrics this will become less important, but it's still useful nonetheless.
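
The interim behaviour described in point 1 could look roughly like the following sketch, which skips events that lack the expected timestamp field. Events are modelled as plain dictionaries. The field name `timestamp` and the helper `lag_or_none` are assumptions made for illustration, not Vector's actual lookup logic.

```python
from datetime import datetime, timezone
from typing import Optional

TIMESTAMP_FIELD = "timestamp"  # assumed name of the "proper" timestamp field


def lag_or_none(event: dict, now: datetime) -> Optional[float]:
    """Return the lag in seconds, or None if the event has no usable timestamp."""
    ts = event.get(TIMESTAMP_FIELD)
    if not isinstance(ts, datetime):
        return None  # event is ignored for this metric
    return (now - ts).total_seconds()


now = datetime(2022, 9, 13, 8, 0, 5, tzinfo=timezone.utc)
events = [
    {"timestamp": datetime(2022, 9, 13, 8, 0, 0, tzinfo=timezone.utc), "message": "a"},
    # Same kind of data, but the timestamp lives under a different field name.
    {"@timestamp": datetime(2022, 9, 13, 8, 0, 1, tzinfo=timezone.utc), "message": "b"},
]

lags = []
for event in events:
    lag = lag_or_none(event, now)
    if lag is not None:
        lags.append(lag)
```

Only the first event contributes a sample here. The second is silently skipped, which is the gap that schema metadata would eventually close.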

@smitthakkar96

@jszwedko, any idea when this will be prioritised?

@jszwedko
Member

@jszwedko, any idea when this will be prioritised?

Unfortunately it isn't on our roadmap at the moment. We'd be happy to help support a contribution for it though.
