docs: Document the ability to use prefix in dynamic sampler FieldList (#1396)

## Short description of the changes

- This PR documents the feature from #1275, showing our users how to specify the `root.` prefix in FieldLists everywhere
- It also sprinkles the use of this feature around the `rules_complete.yaml` example

Signed-off-by: Irving Popovetsky <irving@honeycomb.io>
honeycombio · Oct 23, 2024 · ce34790
1 parent 8829dec

Showing 4 changed files with 42 additions and 3 deletions.
diff --git a/config/metadata/rulesMeta.yaml b/config/metadata/rulesMeta.yaml
@@ -102,6 +102,14 @@ groups:
           all endpoints under normal traffic and call out when there is
           failing traffic to any endpoint.
 
+          As of Refinery 2.8.0, the `root.` prefix can be used to limit the
+          field value to that of the root span. For example,
+          `root.http.response.status_code` will only consider the
+          `http.response.status_code` field from the root span rather than a
+          combination of all the spans in the trace.  This is useful when you
+          want to sample based on the root span's properties rather than the
+          entire trace, and helps to reduce the cardinality of the sampler key.
+
           In contrast, for example, consider as a bad set of fields: a
           combination of `HTTP endpoint`, `status code`, and `pod id`, since it
           would result in keys that are all unique, and therefore result in

diff --git a/refinery_rules.md b/refinery_rules.md
@@ -97,6 +97,9 @@ Using fields with very high cardinality, like `k8s.pod.id`, is a bad choice.
 If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything.
 If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.
 As an example, consider as a good set of fields: the combination of `HTTP endpoint` (high-frequency and boring), `HTTP method`, and `status code` (normally boring but can become interesting when indicating an error) since it will allowing proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+As of Refinery 2.8.0, the `root.` prefix can be used to limit the field value to that of the root span.
+For example, `root.http.response.status_code` will only consider the `http.response.status_code` field from the root span rather than a combination of all the spans in the trace.
+This is useful when you want to sample based on the root span's properties rather than the entire trace, and helps to reduce the cardinality of the sampler key.
 In contrast, for example, consider as a bad set of fields: a combination of `HTTP endpoint`, `status code`, and `pod id`, since it would result in keys that are all unique, and therefore result in sampling 100% of traces.
 For example, rather than a set of fields, using only the `HTTP endpoint` field is a **bad** choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
 Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.
@@ -199,6 +202,9 @@ Using fields with very high cardinality, like `k8s.pod.id`, is a bad choice.
 If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything.
 If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.
 As an example, consider as a good set of fields: the combination of `HTTP endpoint` (high-frequency and boring), `HTTP method`, and `status code` (normally boring but can become interesting when indicating an error) since it will allowing proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+As of Refinery 2.8.0, the `root.` prefix can be used to limit the field value to that of the root span.
+For example, `root.http.response.status_code` will only consider the `http.response.status_code` field from the root span rather than a combination of all the spans in the trace.
+This is useful when you want to sample based on the root span's properties rather than the entire trace, and helps to reduce the cardinality of the sampler key.
 In contrast, for example, consider as a bad set of fields: a combination of `HTTP endpoint`, `status code`, and `pod id`, since it would result in keys that are all unique, and therefore result in sampling 100% of traces.
 For example, rather than a set of fields, using only the `HTTP endpoint` field is a **bad** choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
 Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.
@@ -313,6 +319,9 @@ Using fields with very high cardinality, like `k8s.pod.id`, is a bad choice.
 If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything.
 If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.
 As an example, consider as a good set of fields: the combination of `HTTP endpoint` (high-frequency and boring), `HTTP method`, and `status code` (normally boring but can become interesting when indicating an error) since it will allowing proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+As of Refinery 2.8.0, the `root.` prefix can be used to limit the field value to that of the root span.
+For example, `root.http.response.status_code` will only consider the `http.response.status_code` field from the root span rather than a combination of all the spans in the trace.
+This is useful when you want to sample based on the root span's properties rather than the entire trace, and helps to reduce the cardinality of the sampler key.
 In contrast, for example, consider as a bad set of fields: a combination of `HTTP endpoint`, `status code`, and `pod id`, since it would result in keys that are all unique, and therefore result in sampling 100% of traces.
 For example, rather than a set of fields, using only the `HTTP endpoint` field is a **bad** choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
 Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.
@@ -398,6 +407,9 @@ Using fields with very high cardinality, like `k8s.pod.id`, is a bad choice.
 If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything.
 If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.
 As an example, consider as a good set of fields: the combination of `HTTP endpoint` (high-frequency and boring), `HTTP method`, and `status code` (normally boring but can become interesting when indicating an error) since it will allowing proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+As of Refinery 2.8.0, the `root.` prefix can be used to limit the field value to that of the root span.
+For example, `root.http.response.status_code` will only consider the `http.response.status_code` field from the root span rather than a combination of all the spans in the trace.
+This is useful when you want to sample based on the root span's properties rather than the entire trace, and helps to reduce the cardinality of the sampler key.
 In contrast, for example, consider as a bad set of fields: a combination of `HTTP endpoint`, `status code`, and `pod id`, since it would result in keys that are all unique, and therefore result in sampling 100% of traces.
 For example, rather than a set of fields, using only the `HTTP endpoint` field is a **bad** choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
 Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.
@@ -608,6 +620,9 @@ Using fields with very high cardinality, like `k8s.pod.id`, is a bad choice.
 If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything.
 If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.
 As an example, consider as a good set of fields: the combination of `HTTP endpoint` (high-frequency and boring), `HTTP method`, and `status code` (normally boring but can become interesting when indicating an error) since it will allowing proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+As of Refinery 2.8.0, the `root.` prefix can be used to limit the field value to that of the root span.
+For example, `root.http.response.status_code` will only consider the `http.response.status_code` field from the root span rather than a combination of all the spans in the trace.
+This is useful when you want to sample based on the root span's properties rather than the entire trace, and helps to reduce the cardinality of the sampler key.
 In contrast, for example, consider as a bad set of fields: a combination of `HTTP endpoint`, `status code`, and `pod id`, since it would result in keys that are all unique, and therefore result in sampling 100% of traces.
 For example, rather than a set of fields, using only the `HTTP endpoint` field is a **bad** choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
 Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.

diff --git a/rules.md b/rules.md
@@ -1,7 +1,7 @@
 # Honeycomb Refinery Rules Documentation
 
 This is the documentation for the rules configuration for Honeycomb's Refinery.
-It was automatically generated on 2024-10-11 at 16:33:02 UTC.
+It was automatically generated on 2024-10-22 at 22:51:47 UTC.
 
 ## The Rules file
 
@@ -118,6 +118,9 @@ Using fields with very high cardinality, like `k8s.pod.id`, is a bad choice.
 If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything.
 If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.
 As an example, consider as a good set of fields: the combination of `HTTP endpoint` (high-frequency and boring), `HTTP method`, and `status code` (normally boring but can become interesting when indicating an error) since it will allowing proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+As of Refinery 2.8.0, the `root.` prefix can be used to limit the field value to that of the root span.
+For example, `root.http.response.status_code` will only consider the `http.response.status_code` field from the root span rather than a combination of all the spans in the trace.
+This is useful when you want to sample based on the root span's properties rather than the entire trace, and helps to reduce the cardinality of the sampler key.
 In contrast, for example, consider as a bad set of fields: a combination of `HTTP endpoint`, `status code`, and `pod id`, since it would result in keys that are all unique, and therefore result in sampling 100% of traces.
 For example, rather than a set of fields, using only the `HTTP endpoint` field is a **bad** choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
 Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.
@@ -223,6 +226,9 @@ Using fields with very high cardinality, like `k8s.pod.id`, is a bad choice.
 If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything.
 If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.
 As an example, consider as a good set of fields: the combination of `HTTP endpoint` (high-frequency and boring), `HTTP method`, and `status code` (normally boring but can become interesting when indicating an error) since it will allowing proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+As of Refinery 2.8.0, the `root.` prefix can be used to limit the field value to that of the root span.
+For example, `root.http.response.status_code` will only consider the `http.response.status_code` field from the root span rather than a combination of all the spans in the trace.
+This is useful when you want to sample based on the root span's properties rather than the entire trace, and helps to reduce the cardinality of the sampler key.
 In contrast, for example, consider as a bad set of fields: a combination of `HTTP endpoint`, `status code`, and `pod id`, since it would result in keys that are all unique, and therefore result in sampling 100% of traces.
 For example, rather than a set of fields, using only the `HTTP endpoint` field is a **bad** choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
 Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.
@@ -340,6 +346,9 @@ Using fields with very high cardinality, like `k8s.pod.id`, is a bad choice.
 If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything.
 If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.
 As an example, consider as a good set of fields: the combination of `HTTP endpoint` (high-frequency and boring), `HTTP method`, and `status code` (normally boring but can become interesting when indicating an error) since it will allowing proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+As of Refinery 2.8.0, the `root.` prefix can be used to limit the field value to that of the root span.
+For example, `root.http.response.status_code` will only consider the `http.response.status_code` field from the root span rather than a combination of all the spans in the trace.
+This is useful when you want to sample based on the root span's properties rather than the entire trace, and helps to reduce the cardinality of the sampler key.
 In contrast, for example, consider as a bad set of fields: a combination of `HTTP endpoint`, `status code`, and `pod id`, since it would result in keys that are all unique, and therefore result in sampling 100% of traces.
 For example, rather than a set of fields, using only the `HTTP endpoint` field is a **bad** choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
 Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.
@@ -428,6 +437,9 @@ Using fields with very high cardinality, like `k8s.pod.id`, is a bad choice.
 If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything.
 If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.
 As an example, consider as a good set of fields: the combination of `HTTP endpoint` (high-frequency and boring), `HTTP method`, and `status code` (normally boring but can become interesting when indicating an error) since it will allowing proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+As of Refinery 2.8.0, the `root.` prefix can be used to limit the field value to that of the root span.
+For example, `root.http.response.status_code` will only consider the `http.response.status_code` field from the root span rather than a combination of all the spans in the trace.
+This is useful when you want to sample based on the root span's properties rather than the entire trace, and helps to reduce the cardinality of the sampler key.
 In contrast, for example, consider as a bad set of fields: a combination of `HTTP endpoint`, `status code`, and `pod id`, since it would result in keys that are all unique, and therefore result in sampling 100% of traces.
 For example, rather than a set of fields, using only the `HTTP endpoint` field is a **bad** choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
 Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.
@@ -651,6 +663,9 @@ Using fields with very high cardinality, like `k8s.pod.id`, is a bad choice.
 If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything.
 If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.
 As an example, consider as a good set of fields: the combination of `HTTP endpoint` (high-frequency and boring), `HTTP method`, and `status code` (normally boring but can become interesting when indicating an error) since it will allowing proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+As of Refinery 2.8.0, the `root.` prefix can be used to limit the field value to that of the root span.
+For example, `root.http.response.status_code` will only consider the `http.response.status_code` field from the root span rather than a combination of all the spans in the trace.
+This is useful when you want to sample based on the root span's properties rather than the entire trace, and helps to reduce the cardinality of the sampler key.
 In contrast, for example, consider as a bad set of fields: a combination of `HTTP endpoint`, `status code`, and `pod id`, since it would result in keys that are all unique, and therefore result in sampling 100% of traces.
 For example, rather than a set of fields, using only the `HTTP endpoint` field is a **bad** choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
 Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.

diff --git a/rules_complete.yaml b/rules_complete.yaml
@@ -34,7 +34,7 @@ Samplers:
             ClearFrequency: 1m0s
             FieldList:
                 - request.method
-                - http.target
+                - root.http.target
                 - response.status_code
             UseTraceLength: true
     env2:
@@ -47,7 +47,7 @@ Samplers:
             BurstDetectionDelay: 3
             FieldList:
                 - request.method
-                - http.target
+                - root.http.target
                 - response.status_code
             UseTraceLength: true
     env3:
@@ -134,3 +134,4 @@ Samplers:
             GoalThroughputPerSec: 100
             FieldList:
                 - request.method
+                - root.http.target
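
To illustrate the behavior this commit documents, here is a minimal Python sketch of how a sampler key could differ with and without the `root.` prefix. It is not Refinery's implementation: the helper `build_sampler_key`, the key format (values joined with `,` and fields with `|`), and the use of a missing `trace.parent_id` to identify the root span are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any

ROOT_PREFIX = "root."


def build_sampler_key(spans: list[dict[str, Any]], field_list: list[str]) -> str:
    """Hypothetical key builder, for illustration only.

    Unprefixed fields collect every unique value found on any span.
    Fields prefixed with `root.` read only the root span's value.
    """
    # Assumption: the root span is the one without a parent ID.
    root = next((s for s in spans if s.get("trace.parent_id") is None), None)

    parts = []
    for field in field_list:
        if field.startswith(ROOT_PREFIX):
            name = field[len(ROOT_PREFIX):]
            present = root is not None and name in root
            values = [str(root[name])] if present else []
        else:
            # Deduplicate and sort so the key is stable across span order.
            values = sorted({str(s[field]) for s in spans if field in s})
        parts.append(",".join(values))
    return "|".join(parts)


# A made-up trace: one root span and two children.
trace = [
    {"http.target": "/checkout", "http.response.status_code": 200},
    {"trace.parent_id": "a1", "http.target": "/inventory",
     "http.response.status_code": 200},
    {"trace.parent_id": "a1", "http.target": "/payment",
     "http.response.status_code": 500},
]

print(build_sampler_key(trace, ["http.target"]))       # /checkout,/inventory,/payment
print(build_sampler_key(trace, ["root.http.target"]))  # /checkout
```

In this sketch the unprefixed field produces a key that grows with every distinct child-span value, while the `root.` form yields a single value per trace, which is the cardinality reduction the added documentation describes.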