From c03d1e1c30399b0acbdebe7512c6f43497eba7ac Mon Sep 17 00:00:00 2001 From: Stephen Wakely Date: Fri, 5 May 2023 11:28:02 +0100 Subject: [PATCH 01/13] Initial draft of RFC Signed-off-by: Stephen Wakely --- rfcs/2023-05-03-data-volume-metrics.md | 146 +++++++++++++++++++++++++ 1 file changed, 146 insertions(+) create mode 100644 rfcs/2023-05-03-data-volume-metrics.md diff --git a/rfcs/2023-05-03-data-volume-metrics.md b/rfcs/2023-05-03-data-volume-metrics.md new file mode 100644 index 0000000000000..a1903c4d12421 --- /dev/null +++ b/rfcs/2023-05-03-data-volume-metrics.md @@ -0,0 +1,146 @@ +# RFC - 2023-05-02 - Data Volume Insights metrics + +Vector needs to be able to emit accurate metrics that can be usefully queried +to give users insights into the volume of data moving through the system. + +## Scope + +### In scope + +- All event metrics within Vector need to emit the estimated JSON size of the + event. With a consistent method for determining the size it will be easier + to accurately compare data in vs data out. +- The metrics sent by each sink needs to be tagged with the source id of the + event so the route an event takes through Vector can be queried. +- Each event needs to be labelled with a `service`. This is a new concept + within Vector and represents the application that generated the log or + metric. +- The service tag and source tag in the metrics needs to be opt in so customers + that don't need the increased cardinality are unaffected. + +### Out of scope + +- A separete metric, `component_sent_bytes_total` that indicates network bytes + sent by Vector is not considered here. + +## Pain + +Currently it is difficult to accurately gauge the volume of data that is moving +through Vector. It is difficult to query where data being sent out has come +from. + +## Proposal + +### User Experience + +Global config optionns will be provided allowing the `service` tag and the +`source` tag to be specified. For example: + +```yaml +metrics: + tags: + service: service + source: input +``` + +The default will be to not emit these tags. + +### Implementation + +#### Metric tags + +*service* - to attach the service, we need to add a new meaning to Vector - *service*. Any sources that + receive data that could potentially be considered a service will need to indicate which field + means `service`. + This work has largely already been done with the LogNamespacing work, so it will be trivial to add + this new field. + +*source* - A new field will be added to the [Event metadata][event_metadata] - `Arc` that will indicate the source + of the event. + +We will need to do an audit of all components to ensure the bytes emitted for the `component_received_event_bytes_total` +and `component_sent_event_bytes_total` metrics are the estimated JSON size of the event. + +#### `component_received_event_bytes_total` + +This metric is emitted by the framework [here][source_sender], so it looks like the only change needed is +to add the service tag. + +#### `component_sent_event_bytes_total` + +For stream based sinks this will typically be the byte value returned by `DriverResponse::events_sent`. + +Despite being in the [Component Spec][component_spec], not all sinks currently conform to this. + +As an example, from a cursory glance over a couple of sinks: + +The Amqp sink currently emits this value as the length of the binary data that is sent. By the time the data has +reached the code where the `component_sent_event_bytes_total` event is emitted, that event has been encoded +and the actual estimated JSON size has been lost. The sink will need to be updated so that when the event is +encoded, the encoded event together with the pre-encoded JSON bytesize will be sent to the service where the +event is emitted. + +The Kafka sink also currently sends the binary size, but it looks like the estimated JSON bytesize is easily +accessible at the point of emitting, so would not need too much of a change. + +To ensure that the correct metric is sent in a typesafe manner, we will wrap the estimated JSON size in a +newtype: + +```rust +pub struct JsonSize(usize); +``` + +The `EventsSent` metric will only accept this type. + +## Rationale + +The ability to visualize data flowing through Vector will allow users to ascertain +the effectiveness of the current use of Vector. This will enable users to +optimise their configurations to make the best use of Vector's features. + +## Drawbacks + +The additional tags being added to the metrics will increase the cardinality of +those metrics if they are enabled. + +## Prior Art + + +## Alternatives + +We could use an alternative metric instead of estimated JSON size. + +- *Network bytes* This provides a more accurate picture of the actual data being received + and sent by Vector, but will regularly produce different sizes for an incoming event + to an outgoing event. +- *In memory size* The size of the event as held in memory. This may be more accurate in + determining the amount of memory Vector will be utilizing at any time, will often be + less accurate compared to the data being sent and recieved which is often JSON. + +## Outstanding Questions + +- Should the default be to not emit tags? Could the default be different for OP users and standard Vector + users? +- What should the source be tagged as for events that go through a transform that could potentially + combine one event from multiple sources - eg. `reduce`? +- What should the source be for events that come from no source (emitted by the `lua` transform)? +- Not all sources will have a `service`. Should we default the service to a value, emit a null service tag + or not include the tag at all if there is no service? + +## Plan Of Attack + +Incremental steps to execute this change. These will be converted to issues after the RFC is approved: + +- [ ] Submit a PR with spike-level code _roughly_ demonstrating the change. +- [ ] Incremental change #1 +- [ ] Incremental change #2 +- [ ] ... + +Note: This can be filled out during the review process. + +## Future Improvements + + +[component_spec]: https://github.com/vectordotdev/vector/blob/master/docs/specs/component.md#componenteventssent +[source_sender]: https://github.com/vectordotdev/vector/blob/master/src/source_sender/mod.rs#L265-L268 +[event_metadata]: https://github.com/vectordotdev/vector/blob/master/lib/vector-core/src/event/metadata.rs#L20-L38 \ No newline at end of file From 89fd5de972faf0840d49ef137f9d6e2636099d27 Mon Sep 17 00:00:00 2001 From: Stephen Wakely Date: Fri, 5 May 2023 11:42:15 +0100 Subject: [PATCH 02/13] Spelling Signed-off-by: Stephen Wakely --- rfcs/2023-05-03-data-volume-metrics.md | 28 +++++++++++++------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/rfcs/2023-05-03-data-volume-metrics.md b/rfcs/2023-05-03-data-volume-metrics.md index a1903c4d12421..a1b77c4b7e799 100644 --- a/rfcs/2023-05-03-data-volume-metrics.md +++ b/rfcs/2023-05-03-data-volume-metrics.md @@ -20,7 +20,7 @@ to give users insights into the volume of data moving through the system. ### Out of scope -- A separete metric, `component_sent_bytes_total` that indicates network bytes +- A separate metric, `component_sent_bytes_total` that indicates network bytes sent by Vector is not considered here. ## Pain @@ -33,7 +33,7 @@ from. ### User Experience -Global config optionns will be provided allowing the `service` tag and the +Global config options will be provided allowing the `service` tag and the `source` tag to be specified. For example: ```yaml @@ -51,7 +51,7 @@ The default will be to not emit these tags. *service* - to attach the service, we need to add a new meaning to Vector - *service*. Any sources that receive data that could potentially be considered a service will need to indicate which field - means `service`. + means `service`. This work has largely already been done with the LogNamespacing work, so it will be trivial to add this new field. @@ -64,17 +64,17 @@ and `component_sent_event_bytes_total` metrics are the estimated JSON size of th #### `component_received_event_bytes_total` This metric is emitted by the framework [here][source_sender], so it looks like the only change needed is -to add the service tag. +to add the service tag. #### `component_sent_event_bytes_total` -For stream based sinks this will typically be the byte value returned by `DriverResponse::events_sent`. +For stream based sinks this will typically be the byte value returned by `DriverResponse::events_sent`. Despite being in the [Component Spec][component_spec], not all sinks currently conform to this. As an example, from a cursory glance over a couple of sinks: -The Amqp sink currently emits this value as the length of the binary data that is sent. By the time the data has +The Amqp sink currently emits this value as the length of the binary data that is sent. By the time the data has reached the code where the `component_sent_event_bytes_total` event is emitted, that event has been encoded and the actual estimated JSON size has been lost. The sink will need to be updated so that when the event is encoded, the encoded event together with the pre-encoded JSON bytesize will be sent to the service where the @@ -83,7 +83,7 @@ event is emitted. The Kafka sink also currently sends the binary size, but it looks like the estimated JSON bytesize is easily accessible at the point of emitting, so would not need too much of a change. -To ensure that the correct metric is sent in a typesafe manner, we will wrap the estimated JSON size in a +To ensure that the correct metric is sent in a type-safe manner, we will wrap the estimated JSON size in a newtype: ```rust @@ -95,12 +95,12 @@ The `EventsSent` metric will only accept this type. ## Rationale The ability to visualize data flowing through Vector will allow users to ascertain -the effectiveness of the current use of Vector. This will enable users to +the effectiveness of the current use of Vector. This will enable users to optimise their configurations to make the best use of Vector's features. ## Drawbacks -The additional tags being added to the metrics will increase the cardinality of +The additional tags being added to the metrics will increase the cardinality of those metrics if they are enabled. ## Prior Art @@ -111,17 +111,17 @@ those metrics if they are enabled. We could use an alternative metric instead of estimated JSON size. - *Network bytes* This provides a more accurate picture of the actual data being received - and sent by Vector, but will regularly produce different sizes for an incoming event + and sent by Vector, but will regularly produce different sizes for an incoming event to an outgoing event. - *In memory size* The size of the event as held in memory. This may be more accurate in - determining the amount of memory Vector will be utilizing at any time, will often be - less accurate compared to the data being sent and recieved which is often JSON. + determining the amount of memory Vector will be utilizing at any time, will often be + less accurate compared to the data being sent and received which is often JSON. ## Outstanding Questions -- Should the default be to not emit tags? Could the default be different for OP users and standard Vector +- Should the default be to not emit tags? Could the default be different for OP users and standard Vector users? -- What should the source be tagged as for events that go through a transform that could potentially +- What should the source be tagged as for events that go through a transform that could potentially combine one event from multiple sources - eg. `reduce`? - What should the source be for events that come from no source (emitted by the `lua` transform)? - Not all sources will have a `service`. Should we default the service to a value, emit a null service tag From 5abf58b1442218721c2d13c47b11fa80d13d07fe Mon Sep 17 00:00:00 2001 From: Stephen Wakely Date: Fri, 5 May 2023 11:43:20 +0100 Subject: [PATCH 03/13] Remove issue num Signed-off-by: Stephen Wakely --- rfcs/2023-05-03-data-volume-metrics.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/2023-05-03-data-volume-metrics.md b/rfcs/2023-05-03-data-volume-metrics.md index a1b77c4b7e799..37100da291494 100644 --- a/rfcs/2023-05-03-data-volume-metrics.md +++ b/rfcs/2023-05-03-data-volume-metrics.md @@ -1,4 +1,4 @@ -# RFC - 2023-05-02 - Data Volume Insights metrics +# RFC 2023-05-02 - Data Volume Insights metrics Vector needs to be able to emit accurate metrics that can be usefully queried to give users insights into the volume of data moving through the system. From 1c7a7be34171030a3d27a4399962002827950469 Mon Sep 17 00:00:00 2001 From: Stephen Wakely Date: Wed, 10 May 2023 11:52:51 +0100 Subject: [PATCH 04/13] Feedback from Jesse Signed-off-by: Stephen Wakely --- rfcs/2023-05-03-data-volume-metrics.md | 64 ++++++++++++++++---------- 1 file changed, 40 insertions(+), 24 deletions(-) diff --git a/rfcs/2023-05-03-data-volume-metrics.md b/rfcs/2023-05-03-data-volume-metrics.md index 37100da291494..cc1029667d1ac 100644 --- a/rfcs/2023-05-03-data-volume-metrics.md +++ b/rfcs/2023-05-03-data-volume-metrics.md @@ -7,21 +7,25 @@ to give users insights into the volume of data moving through the system. ### In scope -- All event metrics within Vector need to emit the estimated JSON size of the - event. With a consistent method for determining the size it will be easier - to accurately compare data in vs data out. +- All volume event metrics within Vector need to emit the estimated JSON size of the + event. With a consistent method for determining the size it will be easier to accurately + compare data in vs data out. + - `component_received_event_bytes_total` + - `component_sent_event_bytes_total` + - `component_received_event_total` + - `component_sent_event_total` - The metrics sent by each sink needs to be tagged with the source id of the event so the route an event takes through Vector can be queried. - Each event needs to be labelled with a `service`. This is a new concept - within Vector and represents the application that generated the log or - metric. + within Vector and represents the application that generated the log, + metric or trace. - The service tag and source tag in the metrics needs to be opt in so customers that don't need the increased cardinality are unaffected. ### Out of scope -- A separate metric, `component_sent_bytes_total` that indicates network bytes - sent by Vector is not considered here. +- Separate metrics, `component_sent_bytes_total` and `component_received_bytes_total` + that indicate network bytes sent by Vector are not considered here. ## Pain @@ -33,14 +37,20 @@ from. ### User Experience -Global config options will be provided allowing the `service` tag and the +Global config options will be provided allowing the name of the `service` tag and the `source` tag to be specified. For example: ```yaml -metrics: +telemetry: tags: - service: service - source: input + service: theservice + source: theinput +``` + +This will cause Vector to emit a metric like (note the last two tags): + +```statds +vector.component_sent_event_bytes_total:123|c|#component_id:out,component_kind:sink,component_name:out,component_type:console,host:machine,theservice:somekindofservice,theinput:stdin ``` The default will be to not emit these tags. @@ -49,18 +59,20 @@ The default will be to not emit these tags. #### Metric tags -*service* - to attach the service, we need to add a new meaning to Vector - *service*. Any sources that - receive data that could potentially be considered a service will need to indicate which field - means `service`. - This work has largely already been done with the LogNamespacing work, so it will be trivial to add - this new field. +**service** - to attach the service, we need to add a new meaning to Vector - *service*. Any sources that + receive data that could potentially be considered a service will need to indicate which field + means `service`. + This work has largely already been done with the LogNamespacing work, so it will be trivial to add + this new field. -*source* - A new field will be added to the [Event metadata][event_metadata] - `Arc` that will indicate the source - of the event. +**source** - A new field will be added to the [Event metadata][event_metadata] - `Arc` that will indicate the source + of the event. We will need to do an audit of all components to ensure the bytes emitted for the `component_received_event_bytes_total` and `component_sent_event_bytes_total` metrics are the estimated JSON size of the event. +These tags will be given the name that was configured in [User Experience](#user-experience). + #### `component_received_event_bytes_total` This metric is emitted by the framework [here][source_sender], so it looks like the only change needed is @@ -103,6 +115,9 @@ optimise their configurations to make the best use of Vector's features. The additional tags being added to the metrics will increase the cardinality of those metrics if they are enabled. +We will lose the ability to preregister the metrics since the tags will need to be +dynamic. This will cause a noticable, but likely negligible performance loss. + ## Prior Art @@ -131,12 +146,13 @@ We could use an alternative metric instead of estimated JSON size. Incremental steps to execute this change. These will be converted to issues after the RFC is approved: -- [ ] Submit a PR with spike-level code _roughly_ demonstrating the change. -- [ ] Incremental change #1 -- [ ] Incremental change #2 -- [ ] ... - -Note: This can be filled out during the review process. +- [ ] Add the `source` field to the Event metadata to indicate the source the event has come from. +- [ ] Update the Volume event metrics to take a `JsonSize` value. Use the compiler to ensure all metrics + emitted use this. +- [ ] Add the Service meaning. Update any sources that potentially create a service to point the meaning + to the relevant field. +- [ ] Update the emitted events to accept the new tags - taking the `telemetry` configuration options + into account. ## Future Improvements From d7e6c6e509be33594843ffda422169ffee8f54aa Mon Sep 17 00:00:00 2001 From: Stephen Wakely Date: Wed, 10 May 2023 12:28:45 +0100 Subject: [PATCH 05/13] Added details on no source or service tags Signed-off-by: Stephen Wakely --- rfcs/2023-05-03-data-volume-metrics.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/rfcs/2023-05-03-data-volume-metrics.md b/rfcs/2023-05-03-data-volume-metrics.md index cc1029667d1ac..e5019785950f8 100644 --- a/rfcs/2023-05-03-data-volume-metrics.md +++ b/rfcs/2023-05-03-data-volume-metrics.md @@ -24,7 +24,7 @@ to give users insights into the volume of data moving through the system. ### Out of scope -- Separate metrics, `component_sent_bytes_total` and `component_received_bytes_total` +- Separate metrics, `component_sent_bytes_total` and `component_received_bytes_total` that indicate network bytes sent by Vector are not considered here. ## Pain @@ -73,6 +73,14 @@ and `component_sent_event_bytes_total` metrics are the estimated JSON size of th These tags will be given the name that was configured in [User Experience](#user-experience). +Transforms `reduce` and `aggregate` combine multiple events together. In this case the `source` and `service` +of the first event will be taken. + +If there is no `source` specified (the event was created by the `lua` trnasform) - a source of `_no_source` will +be emitted. + +If there is no `service` available, a service of `_no_service` will be emitted. + #### `component_received_event_bytes_total` This metric is emitted by the framework [here][source_sender], so it looks like the only change needed is @@ -115,7 +123,7 @@ optimise their configurations to make the best use of Vector's features. The additional tags being added to the metrics will increase the cardinality of those metrics if they are enabled. -We will lose the ability to preregister the metrics since the tags will need to be +We will lose the ability to preregister the metrics since the tags will need to be dynamic. This will cause a noticable, but likely negligible performance loss. ## Prior Art @@ -134,14 +142,6 @@ We could use an alternative metric instead of estimated JSON size. ## Outstanding Questions -- Should the default be to not emit tags? Could the default be different for OP users and standard Vector - users? -- What should the source be tagged as for events that go through a transform that could potentially - combine one event from multiple sources - eg. `reduce`? -- What should the source be for events that come from no source (emitted by the `lua` transform)? -- Not all sources will have a `service`. Should we default the service to a value, emit a null service tag - or not include the tag at all if there is no service? - ## Plan Of Attack Incremental steps to execute this change. These will be converted to issues after the RFC is approved: @@ -151,7 +151,7 @@ Incremental steps to execute this change. These will be converted to issues afte emitted use this. - [ ] Add the Service meaning. Update any sources that potentially create a service to point the meaning to the relevant field. -- [ ] Update the emitted events to accept the new tags - taking the `telemetry` configuration options +- [ ] Update the emitted events to accept the new tags - taking the `telemetry` configuration options into account. ## Future Improvements From c13861bc1586a2d2aa8bff1643d789ff58f72323 Mon Sep 17 00:00:00 2001 From: Stephen Wakely Date: Thu, 11 May 2023 13:15:21 +0100 Subject: [PATCH 06/13] Responding to feedback Signed-off-by: Stephen Wakely --- rfcs/2023-05-03-data-volume-metrics.md | 68 +++++++++++++++----------- 1 file changed, 40 insertions(+), 28 deletions(-) diff --git a/rfcs/2023-05-03-data-volume-metrics.md b/rfcs/2023-05-03-data-volume-metrics.md index e5019785950f8..af89c6a4f8ed9 100644 --- a/rfcs/2023-05-03-data-volume-metrics.md +++ b/rfcs/2023-05-03-data-volume-metrics.md @@ -59,52 +59,64 @@ The default will be to not emit these tags. #### Metric tags -**service** - to attach the service, we need to add a new meaning to Vector - *service*. Any sources that - receive data that could potentially be considered a service will need to indicate which field - means `service`. - This work has largely already been done with the LogNamespacing work, so it will be trivial to add - this new field. +**service** - to attach the service, we need to add a new meaning to Vector - + `service`. Any sources that receive data that could potentially + be considered a service will need to indicate which field means + `service`. This work has largely already been done with the + LogNamespacing work, so it will be trivial to add this new field. -**source** - A new field will be added to the [Event metadata][event_metadata] - `Arc` that will indicate the source - of the event. +**source** - A new field will be added to the [Event metadata][event_metadata] - + `Arc` that will indicate the source of the event. + `OutputId` will need to be serializable so it can be stored in the + disk buffer. If the event is loaded from the buffer and it is pointing + to a source that no longer exists, given that it is just an identifier + it won't cause any issues. -We will need to do an audit of all components to ensure the bytes emitted for the `component_received_event_bytes_total` -and `component_sent_event_bytes_total` metrics are the estimated JSON size of the event. +We will need to do an audit of all components to ensure the +bytes emitted for the `component_received_event_bytes_total` and +`component_sent_event_bytes_total` metrics are the estimated JSON size of the +event. -These tags will be given the name that was configured in [User Experience](#user-experience). +These tags will be given the name that was configured in [User Experience] +(#user-experience). -Transforms `reduce` and `aggregate` combine multiple events together. In this case the `source` and `service` -of the first event will be taken. +Transforms `reduce` and `aggregate` combine multiple events together. In this +case the `source` and `service` of the first event will be taken. -If there is no `source` specified (the event was created by the `lua` trnasform) - a source of `_no_source` will -be emitted. +If there is no `source` specified (the event was created by the `lua` transform) +- a source of `-` will be emitted. -If there is no `service` available, a service of `_no_service` will be emitted. +If there is no `service` available, a service of `-` will be emitted. #### `component_received_event_bytes_total` -This metric is emitted by the framework [here][source_sender], so it looks like the only change needed is -to add the service tag. +This metric is emitted by the framework [here][source_sender], so it looks like +the only change needed is to add the service tag. #### `component_sent_event_bytes_total` -For stream based sinks this will typically be the byte value returned by `DriverResponse::events_sent`. +For stream based sinks this will typically be the byte value returned by +`DriverResponse::events_sent`. -Despite being in the [Component Spec][component_spec], not all sinks currently conform to this. +Despite being in the [Component Spec][component_spec], not all sinks currently +conform to this. As an example, from a cursory glance over a couple of sinks: -The Amqp sink currently emits this value as the length of the binary data that is sent. By the time the data has -reached the code where the `component_sent_event_bytes_total` event is emitted, that event has been encoded -and the actual estimated JSON size has been lost. The sink will need to be updated so that when the event is -encoded, the encoded event together with the pre-encoded JSON bytesize will be sent to the service where the -event is emitted. +The Amqp sink currently emits this value as the length of the binary +data that is sent. By the time the data has reached the code where the +`component_sent_event_bytes_total` event is emitted, that event has been +encoded and the actual estimated JSON size has been lost. The sink will need +to be updated so that when the event is encoded, the encoded event together +with the pre-encoded JSON bytesize will be sent to the service where the event +is emitted. -The Kafka sink also currently sends the binary size, but it looks like the estimated JSON bytesize is easily -accessible at the point of emitting, so would not need too much of a change. +The Kafka sink also currently sends the binary size, but it looks like the +estimated JSON bytesize is easily accessible at the point of emitting, so would +not need too much of a change. -To ensure that the correct metric is sent in a type-safe manner, we will wrap the estimated JSON size in a -newtype: +To ensure that the correct metric is sent in a type-safe manner, we will wrap +the estimated JSON size in a newtype: ```rust pub struct JsonSize(usize); From fdff8f54b431d737955af55f4d2e4275fba8eeb6 Mon Sep 17 00:00:00 2001 From: Stephen Wakely Date: Wed, 17 May 2023 14:47:55 +0100 Subject: [PATCH 07/13] Responding to feedback Signed-off-by: Stephen Wakely --- rfcs/2023-05-03-data-volume-metrics.md | 62 +++++++++++++++++--------- 1 file changed, 40 insertions(+), 22 deletions(-) diff --git a/rfcs/2023-05-03-data-volume-metrics.md b/rfcs/2023-05-03-data-volume-metrics.md index af89c6a4f8ed9..367ac132395e8 100644 --- a/rfcs/2023-05-03-data-volume-metrics.md +++ b/rfcs/2023-05-03-data-volume-metrics.md @@ -37,20 +37,21 @@ from. ### User Experience -Global config options will be provided allowing the name of the `service` tag and the -`source` tag to be specified. For example: +Global config options will be provided to indicate that the `service` tag and the +`source` tag should be sent. For example: ```yaml telemetry: tags: - service: theservice - source: theinput + service: true + source_id: true ``` This will cause Vector to emit a metric like (note the last two tags): -```statds -vector.component_sent_event_bytes_total:123|c|#component_id:out,component_kind:sink,component_name:out,component_type:console,host:machine,theservice:somekindofservice,theinput:stdin +```prometheus +vector_component_sent_event_bytes_total{component_id="out",component_kind="sink",component_name="out",component_type="console" + ,host="machine",service="potato",source_id="stdin"} 123 ``` The default will be to not emit these tags. @@ -59,18 +60,21 @@ The default will be to not emit these tags. #### Metric tags -**service** - to attach the service, we need to add a new meaning to Vector - - `service`. Any sources that receive data that could potentially - be considered a service will need to indicate which field means - `service`. This work has largely already been done with the +**service** - to attach the service, we need to add a new meaning to Vector - + `service`. Any sources that receive data that could potentially + be considered a service will need to indicate which field means + `service`. This work has largely already been done with the LogNamespacing work, so it will be trivial to add this new field. + Not all sources will be able to specify a specific field to + indicate the `service`. In time it will be possible for this to + be accomplished through `VRL`. -**source** - A new field will be added to the [Event metadata][event_metadata] - - `Arc` that will indicate the source of the event. - `OutputId` will need to be serializable so it can be stored in the - disk buffer. If the event is loaded from the buffer and it is pointing - to a source that no longer exists, given that it is just an identifier - it won't cause any issues. +**source_id** - A new field will be added to the [Event metadata][event_metadata] - + `Arc` that will indicate the source of the event. + `OutputId` will need to be serializable so it can be stored in + the disk buffer. Since this field is just an identifier, it can + still be used even if the source no longer exists when the event + is consumed by a sink. We will need to do an audit of all components to ensure the bytes emitted for the `component_received_event_bytes_total` and @@ -83,11 +87,19 @@ These tags will be given the name that was configured in [User Experience] Transforms `reduce` and `aggregate` combine multiple events together. In this case the `source` and `service` of the first event will be taken. -If there is no `source` specified (the event was created by the `lua` transform) -- a source of `-` will be emitted. +If there is no `source` a source of `-` will be emitted. The only way this can +happen is if the event was created by the `lua` transform. If there is no `service` available, a service of `-` will be emitted. +Emitting a `-` rather than not emitting anything at all makes it clear that +there was no value rather than it just having been forgotten and ensures it +is clear that the metric represents no `service` or `source` rather than the +aggregate value across all services. + +The [Component Spec][component_spec] will need updating to indicate these tags +will need including. + #### `component_received_event_bytes_total` This metric is emitted by the framework [here][source_sender], so it looks like @@ -124,6 +136,13 @@ pub struct JsonSize(usize); The `EventsSent` metric will only accept this type. +### Registered metrics + +It is currently not possible to have dynamic tags with preregistered metrics. + +We will need to introduce a registered event caching layer that will register +and cache new events keyed on the tags that are sent to it. + ## Rationale The ability to visualize data flowing through Vector will allow users to ascertain @@ -135,9 +154,6 @@ optimise their configurations to make the best use of Vector's features. The additional tags being added to the metrics will increase the cardinality of those metrics if they are enabled. -We will lose the ability to preregister the metrics since the tags will need to be -dynamic. This will cause a noticable, but likely negligible performance loss. - ## Prior Art @@ -160,14 +176,16 @@ Incremental steps to execute this change. These will be converted to issues afte - [ ] Add the `source` field to the Event metadata to indicate the source the event has come from. - [ ] Update the Volume event metrics to take a `JsonSize` value. Use the compiler to ensure all metrics - emitted use this. + emitted use this. The `EstimatedJsonEncodedSizeOf` trait will be updated return a `JsonSize`. - [ ] Add the Service meaning. Update any sources that potentially create a service to point the meaning to the relevant field. +- [ ] Introduce an event caching layer that caches registered events based on the tags sent to it. - [ ] Update the emitted events to accept the new tags - taking the `telemetry` configuration options into account. ## Future Improvements +- Logs emitted by Vector should also be tagged with `source_id` and `service`. [component_spec]: https://github.com/vectordotdev/vector/blob/master/docs/specs/component.md#componenteventssent [source_sender]: https://github.com/vectordotdev/vector/blob/master/src/source_sender/mod.rs#L265-L268 From 86836d3d4a7bfb1eaae36219d1414fae23fd0fe3 Mon Sep 17 00:00:00 2001 From: Stephen Wakely Date: Wed, 24 May 2023 11:02:40 +0100 Subject: [PATCH 08/13] Added details about the event caching Signed-off-by: Stephen Wakely --- rfcs/2023-05-03-data-volume-metrics.md | 43 ++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/rfcs/2023-05-03-data-volume-metrics.md b/rfcs/2023-05-03-data-volume-metrics.md index 367ac132395e8..5881cf33903b6 100644 --- a/rfcs/2023-05-03-data-volume-metrics.md +++ b/rfcs/2023-05-03-data-volume-metrics.md @@ -100,6 +100,13 @@ aggregate value across all services. The [Component Spec][component_spec] will need updating to indicate these tags will need including. +**Performace** - There is going to be a performance hit when emitting these metrics. +Currently for each batch a simple event is emitted containing the count and size +of the entire batch. With this change it will be necessary to scan the entire +batch to obtain the count of source, service combinations of events before emitting +the counts. This will involve additional allocations to maintain the counts as well +as the O(1) scan. + #### `component_received_event_bytes_total` This metric is emitted by the framework [here][source_sender], so it looks like @@ -140,9 +147,41 @@ The `EventsSent` metric will only accept this type. It is currently not possible to have dynamic tags with preregistered metrics. +Preregistering these metrics are essential to ensure that they don't expire. + +The current mechanism to expire metrics is to check if a handle to the given +metric is being held. If it isn't, and nothing has updated that metric in +the last cycle - the metric is dropped. If a metric is dropped, the next time +that event is emitted with those tags, the count starts at zero again. + We will need to introduce a registered event caching layer that will register and cache new events keyed on the tags that are sent to it. +Currently a registered metrics is stored in a `Registered`. + +We will need a new struct that can wrap this that will be generic over a tuple of +the tags for each event and the event - `Cached<(String, String), EventSent>`. +This struct will maintain a BTreeMap of tags -> `Registered`. In pseudo rust: + +```rust +struct Registered { + cache: BTreemap>, + register: Fn(Tags) -> Registered, +} + +impl Registered { + fn emit(&mut self, tags: Tags, value: Event) -> { + if Some(event) = self.cache.get(tags) { + event.emit(value); + } else { + let event = self.register(tags); + event.emit(value); + self.cache.insert(tags, event); + } + } +} +``` + ## Rationale The ability to visualize data flowing through Vector will allow users to ascertain @@ -186,6 +225,10 @@ Incremental steps to execute this change. These will be converted to issues afte ## Future Improvements - Logs emitted by Vector should also be tagged with `source_id` and `service`. +- This rfc proposes storing the source and service as strings. This incurs a cost since scanning each + event to get the counts of events by source and service will involve multiple string comparisons. A + future optimization could be to hash the combination of these values at the source into a single + integer. [component_spec]: https://github.com/vectordotdev/vector/blob/master/docs/specs/component.md#componenteventssent [source_sender]: https://github.com/vectordotdev/vector/blob/master/src/source_sender/mod.rs#L265-L268 From c59cf870a9b8c426f2194739cd55e841f473b6d1 Mon Sep 17 00:00:00 2001 From: Stephen Wakely Date: Wed, 24 May 2023 11:12:45 +0100 Subject: [PATCH 09/13] Spelling Signed-off-by: Stephen Wakely --- rfcs/2023-05-03-data-volume-metrics.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/2023-05-03-data-volume-metrics.md b/rfcs/2023-05-03-data-volume-metrics.md index 5881cf33903b6..b816832f33656 100644 --- a/rfcs/2023-05-03-data-volume-metrics.md +++ b/rfcs/2023-05-03-data-volume-metrics.md @@ -100,7 +100,7 @@ aggregate value across all services. The [Component Spec][component_spec] will need updating to indicate these tags will need including. -**Performace** - There is going to be a performance hit when emitting these metrics. +**Performance** - There is going to be a performance hit when emitting these metrics. Currently for each batch a simple event is emitted containing the count and size of the entire batch. With this change it will be necessary to scan the entire batch to obtain the count of source, service combinations of events before emitting From dceb4defe8643489dca5c0389c70a9e26da43a3e Mon Sep 17 00:00:00 2001 From: Stephen Wakely Date: Thu, 25 May 2023 10:23:26 +0100 Subject: [PATCH 10/13] Correct name is Cached Signed-off-by: Stephen Wakely --- rfcs/2023-05-03-data-volume-metrics.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/rfcs/2023-05-03-data-volume-metrics.md b/rfcs/2023-05-03-data-volume-metrics.md index b816832f33656..2dd24237b6265 100644 --- a/rfcs/2023-05-03-data-volume-metrics.md +++ b/rfcs/2023-05-03-data-volume-metrics.md @@ -160,16 +160,16 @@ and cache new events keyed on the tags that are sent to it. Currently a registered metrics is stored in a `Registered`. We will need a new struct that can wrap this that will be generic over a tuple of -the tags for each event and the event - `Cached<(String, String), EventSent>`. +the tags for each event and the event - eg. `Cached<(String, String), EventSent>`. This struct will maintain a BTreeMap of tags -> `Registered`. In pseudo rust: ```rust -struct Registered { +struct Cached { cache: BTreemap>, register: Fn(Tags) -> Registered, } -impl Registered { +impl Cached { fn emit(&mut self, tags: Tags, value: Event) -> { if Some(event) = self.cache.get(tags) { event.emit(value); From 554e797cb728db7bcfc47062bc54b8c023e5b045 Mon Sep 17 00:00:00 2001 From: Stephen Wakely Date: Thu, 25 May 2023 20:45:37 +0100 Subject: [PATCH 11/13] Added mention of RwLock for Cache Signed-off-by: Stephen Wakely --- rfcs/2023-05-03-data-volume-metrics.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/rfcs/2023-05-03-data-volume-metrics.md b/rfcs/2023-05-03-data-volume-metrics.md index 2dd24237b6265..87418b9f73a19 100644 --- a/rfcs/2023-05-03-data-volume-metrics.md +++ b/rfcs/2023-05-03-data-volume-metrics.md @@ -161,11 +161,14 @@ Currently a registered metrics is stored in a `Registered`. We will need a new struct that can wrap this that will be generic over a tuple of the tags for each event and the event - eg. `Cached<(String, String), EventSent>`. -This struct will maintain a BTreeMap of tags -> `Registered`. In pseudo rust: +This struct will maintain a BTreeMap of tags -> `Registered`. Since this will +need to be shared across threads, the cache will need to be stored in an `RwLock`. + +In pseudo rust: ```rust struct Cached { - cache: BTreemap>, + cache: Arc>>, register: Fn(Tags) -> Registered, } From d723c9519162dec4d62049adb66419837ca52e1b Mon Sep 17 00:00:00 2001 From: Stephen Wakely Date: Fri, 26 May 2023 09:31:45 +0100 Subject: [PATCH 12/13] Add benchmarking to the list of tasks Signed-off-by: Stephen Wakely --- rfcs/2023-05-03-data-volume-metrics.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/rfcs/2023-05-03-data-volume-metrics.md b/rfcs/2023-05-03-data-volume-metrics.md index 87418b9f73a19..cf1c42a34aeef 100644 --- a/rfcs/2023-05-03-data-volume-metrics.md +++ b/rfcs/2023-05-03-data-volume-metrics.md @@ -224,6 +224,8 @@ Incremental steps to execute this change. These will be converted to issues afte - [ ] Introduce an event caching layer that caches registered events based on the tags sent to it. - [ ] Update the emitted events to accept the new tags - taking the `telemetry` configuration options into account. +- [ ] There is going to be a hit on performance with these changes. Add benchmarking to help us understand + how much the impact will be. ## Future Improvements From 71058bb3400ef25754a59a58fb9cc676f45bac06 Mon Sep 17 00:00:00 2001 From: Stephen Wakely Date: Fri, 26 May 2023 16:54:21 +0100 Subject: [PATCH 13/13] Trailing newline Signed-off-by: Stephen Wakely --- rfcs/2023-05-03-data-volume-metrics.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/2023-05-03-data-volume-metrics.md b/rfcs/2023-05-03-data-volume-metrics.md index cf1c42a34aeef..368fe874cd420 100644 --- a/rfcs/2023-05-03-data-volume-metrics.md +++ b/rfcs/2023-05-03-data-volume-metrics.md @@ -237,4 +237,4 @@ Incremental steps to execute this change. These will be converted to issues afte [component_spec]: https://github.com/vectordotdev/vector/blob/master/docs/specs/component.md#componenteventssent [source_sender]: https://github.com/vectordotdev/vector/blob/master/src/source_sender/mod.rs#L265-L268 -[event_metadata]: https://github.com/vectordotdev/vector/blob/master/lib/vector-core/src/event/metadata.rs#L20-L38 \ No newline at end of file +[event_metadata]: https://github.com/vectordotdev/vector/blob/master/lib/vector-core/src/event/metadata.rs#L20-L38