diff --git a/config.md b/config.md index b4f9e0cb17..521ad62b16 100644 --- a/config.md +++ b/config.md @@ -1,7 +1,7 @@ # Honeycomb Refinery Configuration Documentation This is the documentation for the configuration file for Honeycomb's Refinery. -It was automatically generated on 2023-06-12 at 15:58:06 UTC. +It was automatically generated on 2023-06-12 at 18:45:41 UTC. ## The Config file @@ -23,8 +23,7 @@ OTelMetrics: APIKey: SetThisToAHoneycombKey ``` -The remainder of this document describes the sections within the file and the -fields in each. +The remainder of this document describes the sections within the file and the fields in each. ## Table of Contents - [General Configuration](#general-configuration) @@ -53,58 +52,49 @@ fields in each. ### Section Name: `General` -Contains general configuration options that apply to the entire -refinery process. +Contains general configuration options that apply to the entire refinery process. + ### `ConfigurationVersion` -ConfigurationVersion is the file format of this particular -configuration file. +ConfigurationVersion is the file format of this particular configuration file. This file is version 2. -This field is required. It exists to allow the configuration system to -adapt to future changes in the configuration file format. +This field is required. +It exists to allow the configuration system to adapt to future changes in the configuration file format. - Not eligible for live reload. - Type: `int` - Default: `2` + ### `MinRefineryVersion` -MinRefineryVersion is the minimum version of Refinery that can load -this configuration file. +MinRefineryVersion is the minimum version of Refinery that can load this configuration file. -This specifies the lowest Refinery version capable of loading all of -the features used in this file. If this value is present, Refinery -will refuse to start if its version is less than this. +This specifies the lowest Refinery version capable of loading all of the features used in this file. +If this value is present, Refinery will refuse to start if its version is less than this. - Not eligible for live reload. - Type: `string` - Default: `v2.0` + ### `DatasetPrefix` -DatasetPrefix is a prefix that can be used to distinguish a dataset -from an environment in the rules. +DatasetPrefix is a prefix that can be used to distinguish a dataset from an environment in the rules. -If telemetry is being sent to both a classic dataset and a new -environment called the same thing (eg `production`), this parameter -can be used to distinguish these cases. When Refinery receives -telemetry using an API key associated with a classic dataset, it will -then use the prefix in the form `{prefix}.{dataset}` when trying to -resolve the rules definition. +If telemetry is being sent to both a classic dataset and a new environment called the same thing (eg `production`), this parameter can be used to distinguish these cases. +When Refinery receives telemetry using an API key associated with a classic dataset, it will then use the prefix in the form `{prefix}.{dataset}` when trying to resolve the rules definition. - Not eligible for live reload. - Type: `string` + ### `ConfigReloadInterval` -ConfigReloadInterval is the average interval between attempts at -reloading the configuration file. +ConfigReloadInterval is the average interval between attempts at reloading the configuration file. -A single instance of Refinery will attempt to read its configuration -and check for changes at approximately this interval. 
This time is -varied by a random amount to avoid all instances refreshing together. -Within a cluster, Refinery will gossip information about new -configuration so that all instances can reload at close to the same -time. +A single instance of Refinery will attempt to read its configuration and check for changes at approximately this interval. +This time is varied by a random amount to avoid all instances refreshing together. +Within a cluster, Refinery will gossip information about new configuration so that all instances can reload at close to the same time. - Not eligible for live reload. - Type: `duration` @@ -116,38 +106,34 @@ time. Contains network configuration options. + ### `ListenAddr` ListenAddr is the address refinery listens to for incoming requests. This is the IP and port on which to listen for incoming HTTP requests. -These requests include traffic formatted as Honeycomb events, proxied -requests to the Honeycomb API, and Open Telemetry data using the http -protocol. Incoming traffic is expected to be HTTP, so if SSL is a -requirement, put something like nginx in front to do the decryption. +These requests include traffic formatted as Honeycomb events, proxied requests to the Honeycomb API, and Open Telemetry data using the http protocol. +Incoming traffic is expected to be HTTP, so if SSL is a requirement, put something like nginx in front to do the decryption. - Not eligible for live reload. - Type: `hostport` - Default: `0.0.0.0:8080` + ### `PeerListenAddr` -PeerListenAddr is the IP and port on which to listen for traffic being -rerouted from a peer. +PeerListenAddr is the IP and port on which to listen for traffic being rerouted from a peer. -Incoming traffic is expected to be HTTP, so if using SSL use something -like nginx or a load balancer to do the decryption. +Incoming traffic is expected to be HTTP, so if using SSL use something like nginx or a load balancer to do the decryption. - Not eligible for live reload. - Type: `hostport` - Default: `0.0.0.0:8081` + ### `HoneycombAPI` -HoneycombAPI is the URL of the Honeycomb API to which data will be -sent. +HoneycombAPI is the URL of the Honeycomb API to which data will be sent. -HoneycombAPI is the URL for the upstream Honeycomb API; this is the -destination to which refinery sends all events that it decides to -keep. +HoneycombAPI is the URL for the upstream Honeycomb API; this is the destination to which refinery sends all events that it decides to keep. - Eligible for live reload. - Type: `url` @@ -157,31 +143,28 @@ keep. ### Section Name: `AccessKeys` -Contains access keys -- API keys that the proxy will treat specially, -and other flags that control how the proxy handles API keys. +Contains access keys -- API keys that the proxy will treat specially, and other flags that control how the proxy handles API keys. + ### `ReceiveKeys` -ReceiveKeys is a set of Honeycomb API keys that the proxy will treat -specially. +ReceiveKeys is a set of Honeycomb API keys that the proxy will treat specially. -This list only applies to span traffic - other Honeycomb API actions -will be proxied through to the upstream API directly without modifying -keys. +This list only applies to span traffic - other Honeycomb API actions will be proxied through to the upstream API directly without modifying keys. - Not eligible for live reload. 
- Type: `stringarray` - Example: `your-key-goes-here` + ### `AcceptOnlyListedKeys` -AcceptOnlyListedKeys is a boolean flag that causes events arriving -with API keys not in the ReceiveKeys list to be rejected. +AcceptOnlyListedKeys is a boolean flag that causes events arriving with API keys not in the ReceiveKeys list to be rejected. If true, only traffic using the keys listed in APIKeys is accepted. -Events arriving with API keys not in the ReceiveKeys list will be -rejected with an HTTP 401 error. If false, all traffic is accepted and -ReceiveKeys is ignored. Must be specified if APIKeys is specified. +Events arriving with API keys not in the ReceiveKeys list will be rejected with an HTTP 401 error. +If false, all traffic is accepted and ReceiveKeys is ignored. +Must be specified if APIKeys is specified. - Eligible for live reload. - Type: `bool` @@ -190,48 +173,39 @@ ReceiveKeys is ignored. Must be specified if APIKeys is specified. ### Section Name: `RefineryTelemetry` -Configuration info for the telemetry that Refinery uses to record its -own operation. +Configuration info for the telemetry that Refinery uses to record its own operation. + ### `AddRuleReasonToTrace` -AddRuleReasonToTrace controls whether to decorate traces with refinery -rule evaluation results. +AddRuleReasonToTrace controls whether to decorate traces with refinery rule evaluation results. -This causes traces that are sent to Honeycomb to include the field -`meta.refinery.reason`. This field contains text indicating which rule -was evaluated that caused the trace to be included. We recommend -enabling this field whenever a rules-based sampler is in use, as it is -useful for debugging and understanding the behavior of your refinery -installation. +This causes traces that are sent to Honeycomb to include the field `meta.refinery.reason`. +This field contains text indicating which rule was evaluated that caused the trace to be included. +We recommend enabling this field whenever a rules-based sampler is in use, as it is useful for debugging and understanding the behavior of your refinery installation. - Eligible for live reload. - Type: `bool` - Example: `true` + ### `AddSpanCountToRoot` -AddSpanCountToRoot controls whether to add a metadata field to root -spans indicating the number of child spans. +AddSpanCountToRoot controls whether to add a metadata field to root spans indicating the number of child spans. -Adds a new metadata field, `meta.span_count` to root spans to indicate -the number of child spans on the trace at the time the sampling -decision was made. This value is available to the rules-based sampler, -making it possible to write rules that are dependent upon the number -of spans in the trace. If true, Refinery will add meta.span_count to -the root span. +Adds a new metadata field, `meta.span_count` to root spans to indicate the number of child spans on the trace at the time the sampling decision was made. +This value is available to the rules-based sampler, making it possible to write rules that are dependent upon the number of spans in the trace. +If true, Refinery will add meta.span_count to the root span. - Eligible for live reload. - Type: `bool` - Default: `true` + ### `AddHostMetadataToTrace` -AddHostMetadataToTrace specifies whether to add host metadata to -traces. +AddHostMetadataToTrace specifies whether to add host metadata to traces. -AddHostMetadataToTrace specifies whether to add host metadata to -traces. 
If true, Refinery will add the following tags to all traces: -

meta.refinery.local_hostname: the hostname of the Refinery node (we
-should consider adding more metadata here, like IP address, etc)
+AddHostMetadataToTrace specifies whether to add host metadata to traces.
+If true, Refinery will add the following tags to all traces:
+- meta.refinery.local_hostname: the hostname of the Refinery node (we should consider adding more metadata here, like IP address, etc)

- Eligible for live reload.
- Type: `bool`
@@ -243,68 +217,64 @@ should consider adding more metadata here, like IP address, etc)

## Traces

### Section Name: `Traces`

Configuration for how traces are managed.

+
### `SendDelay`

SendDelay is the duration to wait before sending a trace.

This is a short timer that will be triggered when a trace is complete.
Refinery will wait this duration before actually sending the trace.
-The reason for this short delay is to allow for small network delays
-or clock jitters to elapse and any final spans to arrive before
-actually sending the trace. Set to 0 for immediate sends.
+The reason for this short delay is to allow for small network delays or clock jitters to elapse and any final spans to arrive before actually sending the trace.
+Set to 0 for immediate sends.

- Eligible for live reload.
- Type: `duration`
- Default: `2s`

+
### `BatchTimeout`

BatchTimeout is how frequently Refinery sends unfulfilled batches.

-Dictates how frequently to send unfulfilled batches. By default this
-will use the DefaultBatchTimeout in libhoney as its value, which is
-100ms.
+Dictates how frequently to send unfulfilled batches.
+By default this will use the DefaultBatchTimeout in libhoney as its value, which is 100ms.

- Eligible for live reload.
- Type: `duration`
- Example: `500ms`

+
### `TraceTimeout`

-TraceTimeout is the duration to wait before making the trace decision
-on an incomplete trace.
+TraceTimeout is the duration to wait before making the trace decision on an incomplete trace.

-A long timer; it represents the outside boundary of how long to wait
-before making the trace decision about an incomplete trace. Normally
-trace decisions (send or drop) are made when the root span arrives.
-Sometimes the root span never arrives (due to crashes or whatever),
-and this timer will send a trace even without having received the root
-span. If you have particularly long-lived traces you should increase
-this timer. Note that this will also increase the memory requirements
-for refinery.
+A long timer; it represents the outside boundary of how long to wait before making the trace decision about an incomplete trace.
+Normally trace decisions (send or drop) are made when the root span arrives.
+Sometimes the root span never arrives (due to crashes or whatever), and this timer will send a trace even without having received the root span.
+If you have particularly long-lived traces you should increase this timer.
+Note that this will also increase the memory requirements for refinery.

- Eligible for live reload.
- Type: `duration`
- Default: `60s`

+
### `MaxBatchSize`

-MaxBatchSize is the maximum number of events to be included in each
-batch for sending.
+MaxBatchSize is the maximum number of events to be included in each batch for sending.

-This value is used to set the BatchSize field in the libhoney library
-used to send data to Honeycomb. If you have particularly large traces
-you should increase this value. Note that this will also increase the
-memory requirements for refinery.
+This value is used to set the BatchSize field in the libhoney library used to send data to Honeycomb. +If you have particularly large traces you should increase this value. +Note that this will also increase the memory requirements for refinery. - Eligible for live reload. - Type: `int` - Default: `500` + ### `SendTicker` SendTicker is the interval between checks for traces to send. -A short timer that determines the duration between trace cache review -runs to send. Increasing this will spend more time processing incoming -events to reduce incoming_ or peer_router_dropped spikes. Decreasing -this will check the trace cache for timeouts more frequently. +A short timer that determines the duration between trace cache review runs to send. +Increasing this will spend more time processing incoming events to reduce incoming_ or peer_router_dropped spikes. +Decreasing this will check the trace cache for timeouts more frequently. - Eligible for live reload. - Type: `duration` @@ -316,60 +286,52 @@ this will check the trace cache for timeouts more frequently. Configuration values used when setting up and debugging Refinery. + ### `DebugServiceAddr` DebugServiceAddr is the IP and port the debug service will run on. -Sets the IP and port for the debug service. The debug service is -generally only used when debugging Refinery itself, and will only run -if the command line flag -d is specified. If this value is not -specified, the debug service runs on the first open port between -localhost:6060 and :6069. +Sets the IP and port for the debug service. +The debug service is generally only used when debugging Refinery itself, and will only run if the command line flag -d is specified. +If this value is not specified, the debug service runs on the first open port between localhost:6060 and :6069. - Not eligible for live reload. - Type: `hostport` - Example: `localhost:6060` + ### `QueryAuthToken` -QueryAuthToken is the token that must be specified to access the -/query endpoint. +QueryAuthToken is the token that must be specified to access the /query endpoint. -Provides a token that must be specified with the header -"X-Honeycomb-Refinery-Query" in order for a /query request to succeed. -These /query requests are intended for debugging refinery during setup -and are not typically needed in normal operation. If not specified, -the /query endpoints are inaccessible. +Provides a token that must be specified with the header "X-Honeycomb-Refinery-Query" in order for a /query request to succeed. +These /query requests are intended for debugging refinery during setup and are not typically needed in normal operation. +If not specified, the /query endpoints are inaccessible. - Not eligible for live reload. - Type: `string` - Example: `some-private-value` + ### `AdditionalErrorFields` -AdditionalErrorFields is a list of span fields to include when logging -errors. +AdditionalErrorFields is a list of span fields to include when logging errors. -A list of span fields that should be included when logging errors that -happen during ingestion of events (for example, the span too large -error). This is primarily useful in trying to track down misbehaving -senders in a large installation. The fields `dataset`, `apihost`, and -`environment` are always included. If a field is not present in the -span, it will not be present in the error log. +A list of span fields that should be included when logging errors that happen during ingestion of events (for example, the span too large error). 
+This is primarily useful in trying to track down misbehaving senders in a large installation. +The fields `dataset`, `apihost`, and `environment` are always included. +If a field is not present in the span, it will not be present in the error log. - Eligible for live reload. - Type: `stringarray` - Example: `trace.span_id` + ### `DryRun` DryRun controls whether sampling is applied to incoming traces. -If enabled, marks the traces that would be dropped given the current -sampling rules, and sends all traces regardless of the sampling -decision. This is useful for evaluating sampling rules. In DryRun -mode, traces will be decorated with meta.refinery.dryrun.kept set to -true or false based on whether the trace would be kept or dropped. In -addition, SampleRate will be set to the incoming rate for all traces, -and the field meta.refinery.dryrun.sample_rate will be set to the -sample rate that would have been used. +If enabled, marks the traces that would be dropped given the current sampling rules, and sends all traces regardless of the sampling decision. +This is useful for evaluating sampling rules. +In DryRun mode, traces will be decorated with meta.refinery.dryrun.kept set to true or false based on whether the trace would be kept or dropped. +In addition, SampleRate will be set to the incoming rate for all traces, and the field meta.refinery.dryrun.sample_rate will be set to the sample rate that would have been used. - Eligible for live reload. - Type: `bool` @@ -381,26 +343,28 @@ sample rate that would have been used. Configuration for logging. + ### `Type` Type is the type of logger to use. -Specifies where (and if) refinery sends logs. `none` means that logs -are discarded. `honeycomb` means that logs will be forwarded to -honeycomb as events according to the settings below. `stdout` means -that logs will be written to stdout. +Specifies where (and if) refinery sends logs. +`none` means that logs are discarded. +`honeycomb` means that logs will be forwarded to honeycomb as events according to the settings below. +`stdout` means that logs will be written to stdout. - Not eligible for live reload. - Type: `string` - Default: `stdout` - Options: `stdout honeycomb none` + ### `Level` Level is the logging level above which refinery should send a log. -Sets the logging level above which refinery should send logs to the -logger. `debug` is very verbose, and should not be used in production -environments. `warn` is the recommended level for production. +Sets the logging level above which refinery should send logs to the logger. +`debug` is very verbose, and should not be used in production environments. +`warn` is the recommended level for production. - Not eligible for live reload. - Type: `string` @@ -411,30 +375,31 @@ environments. `warn` is the recommended level for production. ### Section Name: `HoneycombLogger` -Configuration for logging to Honeycomb. Only used if Logger.Type is -"honeycomb". +Configuration for logging to Honeycomb. +Only used if Logger.Type is "honeycomb". + ### `APIHost` APIHost is the URL of the Honeycomb API to which logs will be sent. -Sets the upstream Honeycomb API for logs; this is the destination to -which refinery sends its own logs. +Sets the upstream Honeycomb API for logs; this is the destination to which refinery sends its own logs. - Not eligible for live reload. - Type: `url` - Default: `https://api.honeycomb.io` + ### `APIKey` APIKey is the API key to use when sending logs to Honeycomb. 
-This is the API key to use for Refinery's logs when sending them to
-Honeycomb. It is recommended that you create a separate team and key
-for Refinery logs.
+This is the API key to use for Refinery's logs when sending them to Honeycomb.
+It is recommended that you create a separate team and key for Refinery logs.

- Not eligible for live reload.
- Type: `string`
- Example: `SetThisToAHoneycombKey`

+
### `Dataset`

Dataset is the dataset to which logs will be sent.

@@ -444,26 +409,25 @@ Specifies the Honeycomb dataset to which logs will be sent.

- Not eligible for live reload.
- Type: `string`
- Default: `Refinery Logs`
+
### `SamplerEnabled`

SamplerEnabled controls whether to sample logs.

-Controls whether logs are sampled before sending to Honeycomb. The
-sample rate is controlled by the SamplerThroughput setting.
+Controls whether logs are sampled before sending to Honeycomb.
+The sample rate is controlled by the SamplerThroughput setting.

- Not eligible for live reload.
- Type: `bool`
- Default: `true`

+
### `SamplerThroughput`

-SamplerThroughput is the sampling throughput for logs in events per
-second.
+SamplerThroughput is the sampling throughput for logs in events per second.

-SamplerThroughput is the sampling throughput for logs measured in
-events per second. The sampling algorithm attempts to make sure that
-the average throughput approximates this value, while also ensuring
-that all unique logs arrive at Honeycomb at least once per sampling
-period. TODO: THROUGHPUT FOR THE CLUSTER
+SamplerThroughput is the sampling throughput for logs measured in events per second.
+The sampling algorithm attempts to make sure that the average throughput approximates this value, while also ensuring that all unique logs arrive at Honeycomb at least once per sampling period.
+TODO: THROUGHPUT FOR THE CLUSTER

- Not eligible for live reload.
- Type: `float`
@@ -474,15 +438,15 @@ period. TODO: THROUGHPUT FOR THE CLUSTER

### Section Name: `StdoutLogger`

-Configuration for logging to stdout. Only used if Logger.Type is
-"stdout".
+Configuration for logging to stdout.
+Only used if Logger.Type is "stdout".

+
### `Structured`

Structured controls whether to use structured logging.

-Specifies whether the stdout logger generates structured logs (JSON)
-or not (plain text).
+Specifies whether the stdout logger generates structured logs (JSON) or not (plain text).

- Not eligible for live reload.
- Type: `bool`
@@ -492,27 +456,25 @@ or not (plain text).

### Section Name: `PrometheusMetrics`

-Configuration for Refinery's internally-generated metrics as made
-available through Prometheus.
+Configuration for Refinery's internally-generated metrics as made available through Prometheus.

+
### `Enabled`

-Enabled controls whether to expose refinery metrics over
-PromethusListenAddr

-The flag specifies whether Refinery should expose its own metrics over
-the PrometheusListenAddr port.
+Enabled controls whether to expose refinery metrics over PrometheusListenAddr.

+The flag specifies whether Refinery should expose its own metrics over the PrometheusListenAddr port.

- Not eligible for live reload.
- Type: `bool`

+
### `ListenAddr`

-ListenAddr is the IP and port the prometheus metrics server will run
-on.
+ListenAddr is the IP and port the prometheus metrics server will run on.

-Determines the interface and port on which Prometheus will listen for
-requests for /metrics. Must be different from the main Refinery
-listener. Only used if "Enabled" is true in PrometheusMetrics.
+Determines the interface and port on which Prometheus will listen for requests for /metrics. +Must be different from the main Refinery listener. +Only used if "Enabled" is true in PrometheusMetrics. - Not eligible for live reload. - Type: `hostport` @@ -522,10 +484,11 @@ listener. Only used if "Enabled" is true in PrometheusMetrics. ### Section Name: `LegacyMetrics` -Configuration for Refinery's legacy metrics. Version 1.x of Refinery -used this format for sending Metrics to Honeycomb. The metrics -generated that way are nonstandard and will be deprecated in a future -release. New installations should prefer OTelMetrics. +Configuration for Refinery's legacy metrics. +Version 1.x of Refinery used this format for sending Metrics to Honeycomb. +The metrics generated that way are nonstandard and will be deprecated in a future release. +New installations should prefer OTelMetrics. + ### `Enabled` @@ -536,6 +499,7 @@ This controls whether to send legacy-formatted metrics to Honeycomb. - Not eligible for live reload. - Type: `bool` + ### `APIHost` APIHost is the URL of the Honeycomb API to which metrics will be sent. @@ -545,17 +509,18 @@ Specifies the URL for the upstream Honeycomb API for legacy metrics. - Not eligible for live reload. - Type: `url` - Default: `https://api.honeycomb.io` + ### `APIKey` APIKey is the API key used to send Honeycomb metrics. -Specifies the API key used when refinery sends its own metrics. It is -recommended that you create a separate team and key for Refinery -metrics. +Specifies the API key used when refinery sends its own metrics. +It is recommended that you create a separate team and key for Refinery metrics. - Not eligible for live reload. - Type: `string` - Example: `SetThisToAHoneycombKey` + ### `Dataset` Dataset is the Honeycomb dataset to which metrics will be sent. @@ -565,13 +530,13 @@ Specifies the dataset to which refinery sends its own metrics. - Not eligible for live reload. - Type: `string` - Default: `Refinery Metrics` + ### `ReportingInterval` -ReportingInterval is the interval between sending legacy metrics to -Honeycomb. +ReportingInterval is the interval between sending legacy metrics to Honeycomb. -The interval between sending metrics to Honeycomb. Between 1 and 60 -seconds is typical. +The interval between sending metrics to Honeycomb. +Between 1 and 60 seconds is typical. - Not eligible for live reload. - Type: `duration` @@ -581,9 +546,10 @@ seconds is typical. ### Section Name: `OTelMetrics` -Configuration for Refinery's OpenTelemetry metrics. This is the -preferred way to send metrics to Honeycomb. New installations should -prefer OTelMetrics. +Configuration for Refinery's OpenTelemetry metrics. +This is the preferred way to send metrics to Honeycomb. +New installations should prefer OTelMetrics. + ### `Enabled` @@ -594,29 +560,29 @@ Enabled controls whether to send OpenTelemetry metrics to Honeycomb. - Not eligible for live reload. - Type: `bool` + ### `APIHost` APIHost is the URL of the OTel API to which metrics will be sent. -Specifies a URL for the upstream API to receive refinery's own OTel -metrics. +Specifies a URL for the upstream API to receive refinery's own OTel metrics. - Not eligible for live reload. - Type: `url` - Default: `https://api.honeycomb.io` + ### `APIKey` APIKey is the API key used to send Honeycomb metrics via OTel. -Specifies the API key used when refinery sends its own metrics. It is -recommended that you create a separate team and key for Refinery -metrics. 
If this is blank, Refinery will not set the
-Honeycomb-specific headers for OTel, and your APIHost must be set to a
-valid OTel endpoint.
+Specifies the API key used when refinery sends its own metrics.
+It is recommended that you create a separate team and key for Refinery metrics.
+If this is blank, Refinery will not set the Honeycomb-specific headers for OTel, and your APIHost must be set to a valid OTel endpoint.

- Not eligible for live reload.
- Type: `string`
- Example: `SetThisToAHoneycombKey`

+
### `Dataset`

Dataset is the Honeycomb dataset to which OTel metrics will be sent.

@@ -627,26 +593,25 @@ Only used if APIKey is specified.

- Not eligible for live reload.
- Type: `string`
- Default: `Refinery Metrics`
+
### `ReportingInterval`

-ReportingInterval is the interval between sending OTel metrics to
-Honeycomb.
+ReportingInterval is the interval between sending OTel metrics to Honeycomb.

-The interval between sending metrics to Honeycomb. Between 1 and 60
-seconds is typical.
+The interval between sending metrics to Honeycomb.
+Between 1 and 60 seconds is typical.

- Not eligible for live reload.
- Type: `duration`
- Default: `30s`

+
### `Compression`

-Compression is the compression algorithm to use when sending OTel
-metrics.
+Compression is the compression algorithm to use when sending OTel metrics.

The compression algorithm to use when sending metrics to Honeycomb.
-`gzip` is the default and recommended value. In rare circumstances,
-compression costs may outweigh the benefits, in which case `none` may
-be used.
+`gzip` is the default and recommended value.
+In rare circumstances, compression costs may outweigh the benefits, in which case `none` may be used.

- Not eligible for live reload.
- Type: `string`
@@ -659,69 +624,61 @@ be used.

## Peer Management

### Section Name: `PeerManagement`

Controls how the Refinery cluster communicates between peers.

+
### `Type`

Type is the type of peer management to use.

-Sets the type of peer management (the mechanism by which Refinery
-locates its peers). `file` means that Refinery gets its peer list from
-the Peers list in this config file. `redis` means that refinery
-self-registers with a redis instance and gets its peer list from
-there.
+Sets the type of peer management (the mechanism by which Refinery locates its peers).
+`file` means that Refinery gets its peer list from the Peers list in this config file.
+`redis` means that refinery self-registers with a redis instance and gets its peer list from there.

- Not eligible for live reload.
- Type: `string`
- Default: `redis`
- Options: `redis file`

+
### `Identifier`

-Identifier specifies the identifier to use when registering itself
-with peers.
+Identifier specifies the identifier to use when registering itself with peers.

-By default, when using a peer registry, Refinery will use the local
-hostname to identify itself to other peers. If your environment
-requires something else, (for example, if peers can't resolve each
-other by name), you can specify the exact identifier (IP address, etc)
-to use here. Overrides IdentifierInterfaceName, if both are set.
+By default, when using a peer registry, Refinery will use the local hostname to identify itself to other peers.
+If your environment requires something else (for example, if peers can't resolve each other by name), you can specify the exact identifier (IP address, etc.) to use here.
+Overrides IdentifierInterfaceName, if both are set.

- Not eligible for live reload.
- Type: `string`
- Example: `192.168.1.1`
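
+As an illustrative sketch (added by the editor, not generated from the source; the values simply reuse the default and example shown above), these fields fit into the config file like this:
+
+```yaml
+PeerManagement:
+  Type: redis
+  Identifier: 192.168.1.1
+```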

+
### `IdentifierInterfaceName`

-IdentifierInterfaceName specifies a network interface to use when
-finding a local hostname.
+IdentifierInterfaceName specifies a network interface to use when finding a local hostname.

-By default, when using a peer registry, Refinery will use the local
-hostname to identify itself to other peers. If your environment
-requires that you use IPs as identifiers (for example, if peers can't
-resolve eachother by name), you can specify the network interface that
-Refinery is listening on here. Refinery will use the first unicast
-address that it finds on the specified network interface as its
-identifier.
+By default, when using a peer registry, Refinery will use the local hostname to identify itself to other peers.
+If your environment requires that you use IPs as identifiers (for example, if peers can't resolve each other by name), you can specify the network interface that Refinery is listening on here.
+Refinery will use the first unicast address that it finds on the specified network interface as its identifier.

- Not eligible for live reload.
- Type: `string`
- Example: `eth0`

+
### `UseIPV6Identifier`

-UseIPV6Identifier specifies that Refinery should use an IPV6 address
-as its identifier.
+UseIPV6Identifier specifies that Refinery should use an IPV6 address as its identifier.

-If using IdentifierInterfaceName, Refinery will default to the first
-IPv4 unicast address it finds for the specified interface. If this
-value is specified, Refinery will use the first IPV6 unicast address
-found.
+If using IdentifierInterfaceName, Refinery will default to the first IPv4 unicast address it finds for the specified interface.
+If this value is specified, Refinery will use the first IPV6 unicast address found.

- Not eligible for live reload.
- Type: `bool`

+
### `Peers`

Peers is the list of peers to use when Type is "file".

Sets the list of peers to use when Type is "file", excluding self.
-This list is ignored when Type is "redis". The format is a list of
-strings of the form "host:port".
+This list is ignored when Type is "redis".
+The format is a list of strings of the form "host:port".

- Not eligible for live reload.
- Type: `stringarray`
@@ -731,89 +688,86 @@ strings of the form "host:port".

## Redis Peer Management

### Section Name: `RedisPeerManagement`

-Controls how the Refinery cluster communicates between peers when
-using Redis. Only applies when PeerManagement.Type is "redis".
+Controls how the Refinery cluster communicates between peers when using Redis.
+Only applies when PeerManagement.Type is "redis".

+
### `Host`

Host is the host and port of the redis instance to use.

-Sets the host and port of the redis instance to use for peer cluster
-membership management.
+Sets the host and port of the redis instance to use for peer cluster membership management.

- Not eligible for live reload.
- Type: `hostport`
- Example: `localhost:6379`

+
### `Username`

Username is the username used to connect to redis.

-The username used to connect to redis for peer cluster membership
-management.
+The username used to connect to redis for peer cluster membership management.

- Not eligible for live reload.
- Type: `string`

+
### `Password`

Password is the password used to connect to redis.

-Sets the password used to connect to redis for peer cluster membership
-management.
+Sets the password used to connect to redis for peer cluster membership management.

- Not eligible for live reload.
- Type: `string` + ### `Prefix` Prefix is a string used as a prefix for the keys in redis. -Specifies a string to be used as a prefix for the keys in redis while -storing the peer membership. It might be useful to override this in -any situation where multiple refinery clusters or multiple -applications want to share a single Redis instance. It may not be -blank. +Specifies a string to be used as a prefix for the keys in redis while storing the peer membership. +It might be useful to override this in any situation where multiple refinery clusters or multiple applications want to share a single Redis instance. +It may not be blank. - Not eligible for live reload. - Type: `string` - Default: `refinery` - Example: `customPrefix` + ### `Database` -Database is the database number to use for the Redis instance storing -the peer membership. +Database is the database number to use for the Redis instance storing the peer membership. -An integer from 0-15 indicating the database number to use for the -Redis instance storing the peer membership. It might be useful to set -this in any situation where multiple refinery clusters or multiple -applications want to share a single Redis instance. +An integer from 0-15 indicating the database number to use for the Redis instance storing the peer membership. +It might be useful to set this in any situation where multiple refinery clusters or multiple applications want to share a single Redis instance. - Not eligible for live reload. - Type: `int` - Example: `1` + ### `UseTLS` UseTLS enables TLS when connecting to redis. -Enables TLS when connecting to redis for peer cluster membership -management, and sets the MinVersion in the TLS configuration to 1.2. +Enables TLS when connecting to redis for peer cluster membership management, and sets the MinVersion in the TLS configuration to 1.2. - Not eligible for live reload. - Type: `bool` + ### `UseTLSInsecure` UseTLSInsecure disables certificate checks when connecting to redis. -Disables certificate checks when connecting to redis for peer cluster -membership management. +Disables certificate checks when connecting to redis for peer cluster membership management. - Not eligible for live reload. - Type: `bool` + ### `Timeout` Timeout is the timeout to use when communicating with Redis. -Refinery will timeout after this duration when communicating with -Redis. +Refinery will timeout after this duration when communicating with Redis. - Not eligible for live reload. - Type: `duration` @@ -823,53 +777,47 @@ Redis. ### Section Name: `Collection` -Brings together the settings that are relevant to collecting spans -together to make traces. +Brings together the settings that are relevant to collecting spans together to make traces. + ### `CacheCapacity` -CacheCapacity is the number of traces to keep in the cache's circular -buffer. +CacheCapacity is the number of traces to keep in the cache's circular buffer. -The collection cache is used to collect all spans into a trace as well -as remember the sampling decision for any spans that might come in -after the trace has been marked "complete" (either by timing out or -seeing the root span). The number of traces in the cache should be -many multiples (100x to 1000x) of the total number of concurrently -active traces (trace throughput * trace duration). +The collection cache is used to collect all spans into a trace as well as remember the sampling decision for any spans that might come in after the trace has been marked "complete" (either by timing out or seeing the root span). 
+The number of traces in the cache should be many multiples (100x to 1000x) of the total number of concurrently active traces (trace throughput * trace duration). - Eligible for live reload. - Type: `int` - Default: `10000` + ### `MaxMemory` -MaxMemory is the maximum percentage of memory that should be allocated -by the span collector. - -If nonzero, it must be an integer value between 1 and 100, -representing the target maximum percentage of memory that should be -allocated by the span collector. If set to a non-zero value, once per -tick (see SendTicker) the collector will compare total allocated bytes -to this calculated value. If allocation is too high, traces will be -ejected from the cache early to reduce memory. Useful values for this -setting are generally in the range of 70-90. Depending on deployment -details, system memory information may not be available. If it is not, -a warning will be logged and the value of MaxAlloc will be used. If -this value is 0, MaxAlloc will be used. Requires MaxAlloc to be -nonzero. TODO: NOT YET IMPLEMENTED +MaxMemory is the maximum percentage of memory that should be allocated by the span collector. + +If nonzero, it must be an integer value between 1 and 100, representing the target maximum percentage of memory that should be allocated by the span collector. +If set to a non-zero value, once per tick (see SendTicker) the collector will compare total allocated bytes to this calculated value. +If allocation is too high, traces will be ejected from the cache early to reduce memory. +Useful values for this setting are generally in the range of 70-90. +Depending on deployment details, system memory information may not be available. +If it is not, a warning will be logged and the value of MaxAlloc will be used. +If this value is 0, MaxAlloc will be used. +Requires MaxAlloc to be nonzero. +TODO: NOT YET IMPLEMENTED - Eligible for live reload. - Type: `percentage` - Default: `75` - Example: `75` + ### `MaxAlloc` -MaxAlloc is the maximum number of bytes that should be allocated by -the collector. +MaxAlloc is the maximum number of bytes that should be allocated by the collector. -If set, it must be an integer >= 0. 64-bit values are supported. See -MaxMemory for more details. +If set, it must be an integer >= 0. +64-bit values are supported. +See MaxMemory for more details. - Eligible for live reload. - Type: `int` @@ -878,32 +826,29 @@ MaxMemory for more details. ### Section Name: `BufferSizes` -Brings together the settings that are relevant to the sizes of -communications buffers. +Brings together the settings that are relevant to the sizes of communications buffers. + ### `UpstreamBufferSize` -UpstreamBufferSize is the size of the queue used to buffer spans to -send to the upstream API. +UpstreamBufferSize is the size of the queue used to buffer spans to send to the upstream API. -Sets the size of the buffer (measured in spans) used to send spans to -the upstream collector. If the buffer fills up, performance will -degrade because Refinery will block while waiting for space to become -available. If this happens, you should increase the buffer size. +Sets the size of the buffer (measured in spans) used to send spans to the upstream collector. +If the buffer fills up, performance will degrade because Refinery will block while waiting for space to become available. +If this happens, you should increase the buffer size. - Eligible for live reload. 
- Type: `int` - Default: `10000` + ### `PeerBufferSize` -PeerBufferSize is the size of the queue used to buffer spans to send -to peer nodes. +PeerBufferSize is the size of the queue used to buffer spans to send to peer nodes. -Sets the size of the buffer (measured in spans) used to send spans to -peer nodes. If the buffer fills up, performance will degrade because -Refinery will block while waiting for space to become available. If -this happens, you should increase this buffer size. +Sets the size of the buffer (measured in spans) used to send spans to peer nodes. +If the buffer fills up, performance will degrade because Refinery will block while waiting for space to become available. +If this happens, you should increase this buffer size. - Eligible for live reload. - Type: `int` @@ -915,42 +860,38 @@ this happens, you should increase this buffer size. Special-purpose configuration options that are not typically needed. + ### `EnvironmentCacheTTL` -EnvironmentCacheTTL is the duration for which environment information -is cached. +EnvironmentCacheTTL is the duration for which environment information is cached. -This is the amount of time for which refinery caches environment -information, which it looks up from Honeycomb for each different -APIKey. This information is used when making sampling decisions. If -you have a very large number of environments, you may want to increase -this value. +This is the amount of time for which refinery caches environment information, which it looks up from Honeycomb for each different APIKey. +This information is used when making sampling decisions. +If you have a very large number of environments, you may want to increase this value. - Eligible for live reload. - Type: `duration` - Default: `1h` + ### `CompressPeerCommunication` -CompressPeerCommunication determines whether refinery will compress -span data it forwards to peers. +CompressPeerCommunication determines whether refinery will compress span data it forwards to peers. If it costs money to transmit data between refinery instances (e.g. -they're spread across AWS availability zones), then you almost -certainly want compression enabled to reduce your bill. The option to -disable it is provided as an escape hatch for deployments that value -lower CPU utilization over data transfer costs. +they're spread across AWS availability zones), then you almost certainly want compression enabled to reduce your bill. +The option to disable it is provided as an escape hatch for deployments that value lower CPU utilization over data transfer costs. - Not eligible for live reload. - Type: `bool` - Default: `true` + ### `AdditionalAttributes` -AdditionalAttributes is a map that can be used for injecting -user-defined attributes. +AdditionalAttributes is a map that can be used for injecting user-defined attributes. -A map that can be used for injecting user-defined attributes into -every span. For example, it could be used for naming a refinery -cluster. Both keys and values must be strings. +A map that can be used for injecting user-defined attributes into every span. +For example, it could be used for naming a refinery cluster. +Both keys and values must be strings. - Eligible for live reload. - Type: `map` @@ -960,29 +901,30 @@ cluster. Both keys and values must be strings. ### Section Name: `IDFields` -Controls the field names to use for the event ID fields. These fields -are used to identify events that are part of the same trace. +Controls the field names to use for the event ID fields. 
+These fields are used to identify events that are part of the same trace.

+
### `TraceNames`

TraceNames is the list of field names to use for the trace ID.

-The list of field names to use for the trace ID. The first field in
-the list that is present in an incoming span will be used as the trace
-ID. If none of the fields are present, refinery treats the span as not
-being part of a trace and forwards it immediately to Honeycomb.
+The list of field names to use for the trace ID.
+The first field in the list that is present in an incoming span will be used as the trace ID.
+If none of the fields are present, refinery treats the span as not being part of a trace and forwards it immediately to Honeycomb.

- Eligible for live reload.
- Type: `stringarray`
- Example: `trace.trace_id,traceId`

+
### `ParentNames`

ParentNames is the list of field names to use for the parent ID.

-The list of field names to use for the parent ID. The first field in
-the list that is present in an event will be used as the parent ID. A
-trace without a parent_id is assumed to be a root span.
+The list of field names to use for the parent ID.
+The first field in the list that is present in an event will be used as the parent ID.
+A span without a parent_id is assumed to be a root span.

- Eligible for live reload.
- Type: `stringarray`
@@ -992,88 +934,82 @@ trace without a parent_id is assumed to be a root span.

## gRPC Server Parameters

### Section Name: `GRPCServerParameters`

-Controls the parameters of the gRPC server used to receive Open
-Telemetry data in gRPC format.
+Controls the parameters of the gRPC server used to receive Open Telemetry data in gRPC format.

+
### `Enabled`

Enabled specifies whether the gRPC server is enabled.

-Specifies whether the gRPC server is enabled. If false, the gRPC
-server is not started and no gRPC traffic is accepted. TODO: WE NEED
-TO DEFAULT THIS TO TRUE IF PREVIOUS CONFIG HAS A LISTEN ADDRESS
+Specifies whether the gRPC server is enabled.
+If false, the gRPC server is not started and no gRPC traffic is accepted.
+TODO: WE NEED TO DEFAULT THIS TO TRUE IF PREVIOUS CONFIG HAS A LISTEN ADDRESS

- Not eligible for live reload.
- Type: `bool`

+
### `ListenAddr`

-ListenAddr is the address refinery listens to for incoming GRPC Open
-Telemetry events.
+ListenAddr is the address refinery listens to for incoming GRPC Open Telemetry events.

-Incoming traffic is expected to be unencrypted, so if using SSL put
-something like nginx in front to do the decryption.
+Incoming traffic is expected to be unencrypted, so if using SSL put something like nginx in front to do the decryption.

- Not eligible for live reload.
- Type: `hostport`

+
### `MaxConnectionIdle`

MaxConnectionIdle is the amount of time to permit an idle connection.

-A duration for the amount of time after which an idle connection will
-be closed by sending a GoAway. "Idle" means that there are no active
-RPCs. 0s sets duration to infinity, but this is not recommended for
-refinery deployments behind a load balancer, because it will prevent
-the load balancer from distributing load evenly among peers.
+A duration for the amount of time after which an idle connection will be closed by sending a GoAway.
+"Idle" means that there are no active RPCs.
+0s sets duration to infinity, but this is not recommended for refinery deployments behind a load balancer, because it will prevent the load balancer from distributing load evenly among peers.

- Not eligible for live reload.
- Type: `duration` - Default: `0s` - Example: `1m` + ### `MaxConnectionAge` -MaxConnectionAge is the maximum amount of time a gRPC connection may -exist. +MaxConnectionAge is the maximum amount of time a gRPC connection may exist. -Sets a duration for the maximum amount of time a connection may exist -before it will be closed by sending a GoAway. A random jitter of -+/-10% will be added to MaxConnectionAge to spread out connection -storms. 0s sets duration to infinity; a value measured in low minutes -will help load balancers to distribute load among peers more evenly. +Sets a duration for the maximum amount of time a connection may exist before it will be closed by sending a GoAway. +A random jitter of +/-10% will be added to MaxConnectionAge to spread out connection storms. +0s sets duration to infinity; a value measured in low minutes will help load balancers to distribute load among peers more evenly. - Not eligible for live reload. - Type: `duration` - Default: `3m` + ### `MaxConnectionAgeGrace` -MaxConnectionAgeGrace is the duration beyond MaxConnectionAge after -which the connection will be forcibly closed. +MaxConnectionAgeGrace is the duration beyond MaxConnectionAge after which the connection will be forcibly closed. -This is an additive period after MaxConnectionAge after which the -connection will be forcibly closed (in case the upstream node ignores -the GoAway request). 0s sets duration to infinity. +This is an additive period after MaxConnectionAge after which the connection will be forcibly closed (in case the upstream node ignores the GoAway request). +0s sets duration to infinity. - Not eligible for live reload. - Type: `duration` - Default: `60s` + ### `KeepAlive` KeepAlive is the duration between keep-alive pings. -Sets a duration for the amount of time after which if the client -doesn't see any activity it pings the server to see if the transport -is still alive. 0s sets duration to 2 hours. +Sets a duration for the amount of time after which if the client doesn't see any activity it pings the server to see if the transport is still alive. +0s sets duration to 2 hours. - Not eligible for live reload. - Type: `duration` - Default: `1m` + ### `KeepAliveTimeout` -KeepAliveTimeout is the duration the server waits for activity on the -connection. +KeepAliveTimeout is the duration the server waits for activity on the connection. -This is the amount of time after which if the server doesn't see any -activity, it pings the client to see if the transport is still alive. +This is the amount of time after which if the server doesn't see any activity, it pings the client to see if the transport is still alive. 0s sets duration to 20 seconds. - Not eligible for live reload. @@ -1084,48 +1020,42 @@ activity, it pings the client to see if the transport is still alive. ### Section Name: `SampleCache` -Controls the sample cache used to retain information about trace -status after the sampling decision has been made. +Controls the sample cache used to retain information about trace status after the sampling decision has been made. + ### `KeptSize` -KeptSize is the number of traces preserved in the cuckoo kept traces -cache. +KeptSize is the number of traces preserved in the cuckoo kept traces cache. -Controls the number of traces preserved in the cuckoo kept traces -cache. Refinery keeps a record of each trace that was kept and sent to -Honeycomb, along with some statistical information. 
This is most -useful in cases where the trace was sent before sending the root span, -so that the root span can be decorated with accurate metadata. Default -is 10_000 traces (each trace in this cache consumes roughly 200 -bytes). +Controls the number of traces preserved in the cuckoo kept traces cache. +Refinery keeps a record of each trace that was kept and sent to Honeycomb, along with some statistical information. +This is most useful in cases where the trace was sent before sending the root span, so that the root span can be decorated with accurate metadata. +Default is 10_000 traces (each trace in this cache consumes roughly 200 bytes). - Eligible for live reload. - Type: `int` - Default: `10000` + ### `DroppedSize` DroppedSize is the size of the cuckoo dropped traces cache. -Controls the size of the cuckoo dropped traces cache. This cache -consumes 4-6 bytes per trace at a scale of millions of traces. -Changing its size with live reload sets a future limit, but does not -have an immediate effect. +Controls the size of the cuckoo dropped traces cache. +This cache consumes 4-6 bytes per trace at a scale of millions of traces. +Changing its size with live reload sets a future limit, but does not have an immediate effect. - Eligible for live reload. - Type: `int` - Default: `1000000` + ### `SizeCheckInterval` -SizeCheckInterval controls how often the cuckoo cache re-evaluates its -capacity. +SizeCheckInterval controls how often the cuckoo cache re-evaluates its capacity. -Controls the duration the cuckoo cache uses to determine how often it -re-evaluates the remaining capacity of its dropped traces cache and -possibly cycles it. This cache is quite resilient so it doesn't need -to happen very often, but the operation is also inexpensive. Default -is 10 seconds. +Controls the duration the cuckoo cache uses to determine how often it re-evaluates the remaining capacity of its dropped traces cache and possibly cycles it. +This cache is quite resilient so it doesn't need to happen very often, but the operation is also inexpensive. +Default is 10 seconds. - Eligible for live reload. - Type: `duration` @@ -1135,112 +1065,86 @@ is 10 seconds. ### Section Name: `StressRelief` -Controls the stress relief mechanism, which is used to prevent -Refinery from being overwhelmed by a large number of traces. -There is a metric called stress_level that is emitted as part of -refinery metrics. It is a measure of refinery's throughput rate -relative to its processing rate, combined with the amount of room in -its internal queues, and ranges from 0 to 100. It is generally -expected to be 0 except under heavy load. When stress levels reach -100, there is an increased chance that refinery will become unstable. -To avoid this problem, the Stress Relief system can do deterministic -sampling on new trace traffic based solely on TraceID, without having -to store traces in the cache or take the time processing sampling -rules. Existing traces in flight will be processed normally, but when -Stress Relief is active, trace decisions are made deterministically on -a per-span basis; all spans will be sampled according to the -SamplingRate specified here. -Once Stress Relief activates (by exceeding the ActivationLevel), it -will not deactivate until stress_level falls below the -DeactivationLevel. When it deactivates, normal trace decisions are -made -- and any additional spans that arrive for traces that were -active during Stress Relief will respect the decisions made during -that time. 
-The measurement of stress is a lagging indicator and is highly
-dependent on Refinery configuration and scaling. Other configuration
-values should be well tuned first, before adjusting the Stress Relief
-Activation parameters.
-Stress Relief is not a substitute for proper configuration and
-scaling, but it can be used as a safety valve to prevent Refinery from
-becoming unstable under heavy load.
+Controls the stress relief mechanism, which is used to prevent Refinery from being overwhelmed by a large number of traces.
+There is a metric called stress_level that is emitted as part of refinery metrics.
+It is a measure of refinery's throughput rate relative to its processing rate, combined with the amount of room in its internal queues, and ranges from 0 to 100.
+It is generally expected to be 0 except under heavy load.
+When stress levels reach 100, there is an increased chance that refinery will become unstable.
+To avoid this problem, the Stress Relief system can do deterministic sampling on new trace traffic based solely on TraceID, without having to store traces in the cache or take the time processing sampling rules.
+Existing traces in flight will be processed normally, but when Stress Relief is active, trace decisions are made deterministically on a per-span basis; all spans will be sampled according to the SamplingRate specified here.
+Once Stress Relief activates (by exceeding the ActivationLevel), it will not deactivate until stress_level falls below the DeactivationLevel.
+When it deactivates, normal trace decisions are made -- and any additional spans that arrive for traces that were active during Stress Relief will respect the decisions made during that time.
+The measurement of stress is a lagging indicator and is highly dependent on Refinery configuration and scaling.
+Other configuration values should be well tuned first, before adjusting the Stress Relief Activation parameters.
+Stress Relief is not a substitute for proper configuration and scaling, but it can be used as a safety valve to prevent Refinery from becoming unstable under heavy load.

+
### `Mode`

Mode is a string indicating how to use Stress Relief.

-Sets the stress relief mode. "never" means that Stress Relief will
-never activate "monitor" is the recommended setting, and means that
-Stress Relief will monitor the status of refinery and activate
-according to the levels set below. "always" means that Stress Relief
-is always on, which may be useful in an emergency situation.
+Sets the stress relief mode.
+"never" means that Stress Relief will never activate.
+"monitor" is the recommended setting, and means that Stress Relief will monitor the status of refinery and activate according to the levels set below.
+"always" means that Stress Relief is always on, which may be useful in an emergency situation.

- Eligible for live reload.
- Type: `string`
- Default: `never`

+
### `ActivationLevel`

-ActivationLevel is the stress_level (from 0-100) at which Stress
-Relief is triggered.
+ActivationLevel is the stress_level (from 0-100) at which Stress Relief is triggered.

-Sets the stress_level (from 0-100) at which Stress Relief is
-triggered.
+Sets the stress_level (from 0-100) at which Stress Relief is triggered.

- Eligible for live reload.
- Type: `percentage`
- Default: `90`

+
### `DeactivationLevel`

-DeactivationLevel is the stress_level (from 0-100) at which Stress
-Relief is turned off.
+DeactivationLevel is the stress_level (from 0-100) at which Stress Relief is turned off.

-Sets the stress_level (from 0-100) at which Stress Relief is turned
-off (subject to MinimumActivationDuration). It must be less than
-ActivationLevel.
+Sets the stress_level (from 0-100) at which Stress Relief is turned off (subject to MinimumActivationDuration).
+It must be less than ActivationLevel.

- Eligible for live reload.
- Type: `percentage`
- Default: `70`
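
+To tie these fields together, here is an illustrative sketch (added by the editor, not generated from the source; it assumes the recommended `monitor` mode and the default levels documented above):
+
+```yaml
+StressRelief:
+  Mode: monitor
+  ActivationLevel: 90
+  DeactivationLevel: 70
+```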
-Sets the stress_level (from 0-100) at which Stress Relief is turned
-off (subject to MinimumActivationDuration). It must be less than
-ActivationLevel.
+Sets the stress_level (from 0-100) at which Stress Relief is turned off (subject to MinimumActivationDuration).
+It must be less than ActivationLevel.

- Eligible for live reload.
- Type: `percentage`
- Default: `70`

+
### `SamplingRate`

-SamplingRate is the sampling rate to use when Stress Relief is
-activated.
+SamplingRate is the sampling rate to use when Stress Relief is activated.

-Controls the sampling rate to use when Stress Relief is activated. All
-new traces will be deterministically sampled at this rate based only
-on the traceID. It should be chosen to be a rate that sends fewer
-samples than the average sampling rate Refinery is expected to
-generate. For example, if Refinery is configured to normally sample at
-a rate of 1 in 10, then Stress Relief should be configured to sample
-at a rate of at least 1 in 30.
+Controls the sampling rate to use when Stress Relief is activated.
+All new traces will be deterministically sampled at this rate based only on the traceID.
+It should be chosen to be a rate that sends fewer samples than the average sampling rate Refinery is expected to generate.
+For example, if Refinery is configured to normally sample at a rate of 1 in 10, then Stress Relief should be configured to sample at a rate of at least 1 in 30.

- Eligible for live reload.
- Type: `int`
- Default: `100`

+
### `MinimumActivationDuration`

-MinimumActivationDuration is the minimum time that stress relief will
-stay enabled.
+MinimumActivationDuration is the minimum time that stress relief will stay enabled.

-Sets the minimum time that stress relief will stay enabled, once
-activated. This helps to prevent oscillations.
+Sets the minimum time that stress relief will stay enabled, once activated.
+This helps to prevent oscillations.

- Eligible for live reload.
- Type: `duration`
- Default: `10s`

+
### `MinimumStartupDuration`

-MinimumStartupDuration is the minimum time that stress relief will
-stay enabled.
-
-Used when switching into Monitor mode. When stress monitoring is
-enabled, it will start up in stressed mode for a at least this amount
-of time to try to make sure that Refinery can handle the load before
-it begins processing it in earnest. This is to help address the
-problem of trying to bring a new node into an already-overloaded
-cluster. If this duration is 0, Refinery will not start in stressed
-mode, which will provide faster startup at the possible cost of
-startup instability.
+MinimumStartupDuration is the minimum time that stress relief will stay enabled when Refinery starts up.
+
+Used when switching into Monitor mode.
+When stress monitoring is enabled, it will start up in stressed mode for at least this amount of time to try to make sure that Refinery can handle the load before it begins processing it in earnest.
+This is to help address the problem of trying to bring a new node into an already-overloaded cluster.
+If this duration is 0, Refinery will not start in stressed mode, which will provide faster startup at the possible cost of startup instability.

- Eligible for live reload.
- Type: `duration`
diff --git a/rules.md b/rules.md
index aa54fe49af..9520b3fc21 100644
--- a/rules.md
+++ b/rules.md
@@ -1,7 +1,7 @@
# Honeycomb Refinery Rules Documentation

This is the documentation for the rules configuration for Honeycomb's Refinery.
-It was automatically generated on 2023-06-12 at 15:58:06 UTC.
+It was automatically generated on 2023-06-12 at 18:45:41 UTC.

## The Rules file

@@ -26,24 +26,21 @@ Samplers:

Name: `RulesVersion`

-This is a required parameter used to verify the version of
-the rules file. It must be set to 2.
+This is a required parameter used to verify the version of the rules file.
+It must be set to 2.

Name: `Samplers`

-Samplers is a mapping of targets to samplers. Each target is a
-Honeycomb environment (or, for classic keys, a dataset). The value is
-the sampler to use for that target. The target called `__default__`
-will be used for any target that is not explicitly listed. A
-`__default__` target is required.
-The targets are determined by examining the API key used to send the
-trace. If the API key is a 'classic' key (which is a 32-character
-hexadecimal value), the specified dataset name is used as the target.
-If the API key is a new-style key (20-23 alphanumeric characters), the
-key's environment name is used as the target.
+Samplers is a mapping of targets to samplers.
+Each target is a Honeycomb environment (or, for classic keys, a dataset).
+The value is the sampler to use for that target.
+The target called `__default__` will be used for any target that is not explicitly listed.
+A `__default__` target is required.
+The targets are determined by examining the API key used to send the trace.
+If the API key is a 'classic' key (which is a 32-character hexadecimal value), the specified dataset name is used as the target.
+If the API key is a new-style key (20-23 alphanumeric characters), the key's environment name is used as the target.

-The remainder of this document describes the samplers that can be used within
-the `Samplers` section and the fields that control their behavior.
+The remainder of this document describes the samplers that can be used within the `Samplers` section and the fields that control their behavior.

## Table of Contents
- [Deterministic Sampler](#deterministic-sampler)
@@ -62,22 +59,19 @@ the `Samplers` section and the fields that control their behavior.

### Name: `DeterministicSampler`

-The deterministic sampler uses a fixed sample rate to sample traces
-based on their trace ID. This is the simplest sampling algorithm - it
-is a static sample rate, choosing traces randomly to either keep or
-send (at the appropriate rate). It is not influenced by the contents
-of the trace other than the trace ID.
+The deterministic sampler uses a fixed sample rate to sample traces based on their trace ID.
+This is the simplest sampling algorithm - it is a static sample rate, choosing traces randomly to either keep or send (at the appropriate rate).
+It is not influenced by the contents of the trace other than the trace ID.

-### SampleRate
+### `SampleRate`

-The sample rate to use. It indicates a ratio, where one sample trace
-is kept for every N traces seen. For example, a SampleRate of 30 will
-keep 1 out of every 30 traces. The choice on whether to keep any
-specific trace is random, so the rate is approximate.
-The sample rate is calculated from the trace ID, so all spans with the
-same trace ID will be sampled or not sampled together.
+The sample rate to use.
+It indicates a ratio, where one sample trace is kept for every N traces seen.
+For example, a SampleRate of 30 will keep 1 out of every 30 traces.
+The choice of whether to keep any specific trace is random, so the rate is approximate.
+The sample rate is calculated from the trace ID, so all spans with the same trace ID will be sampled or not sampled together.

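To make the target mapping concrete, a minimal rules file with the required `__default__` target might look like this sketch (the `production` environment name and the rates are illustrative):

```
RulesVersion: 2
Samplers:
  __default__:
    DeterministicSampler:
      SampleRate: 1
  production:
    DeterministicSampler:
      SampleRate: 30
```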
Type: `int`

@@ -90,26 +84,22 @@ Type: `int`

### Name: `DynamicSampler`

-DynamicSampler is the basic Dynamic Sampler implementation. Most
-installations will find the EMA Dynamic Sampler to be a better choice.
-This sampler collects the values of a number of fields from a trace
-and uses them to form a key. This key is handed to the standard
-dynamic sampler algorithm which generates a sample rate based on the
-frequency with which that key has appeared during the previous
-ClearFrequency. See https://github.com/honeycombio/dynsampler-go for
-more detail on the mechanics of the dynamic sampler. This sampler
-uses the AvgSampleRate algorithm from that package.
+DynamicSampler is the basic Dynamic Sampler implementation.
+Most installations will find the EMA Dynamic Sampler to be a better choice.
+This sampler collects the values of a number of fields from a trace and uses them to form a key.
+This key is handed to the standard dynamic sampler algorithm which generates a sample rate based on the frequency with which that key has appeared during the previous ClearFrequency.
+See https://github.com/honeycombio/dynsampler-go for more detail on the mechanics of the dynamic sampler.
+This sampler uses the AvgSampleRate algorithm from that package.

-### SampleRate
+### `SampleRate`

-The sample rate to use. It indicates a ratio, where one sample trace
-is kept for every N traces seen. For example, a SampleRate of 30 will
-keep 1 out of every 30 traces. The choice on whether to keep any
-specific trace is random, so the rate is approximate.
-The sample rate is calculated from the trace ID, so all spans with the
-same trace ID will be sampled or not sampled together.
+The sample rate to use.
+It indicates a ratio, where one sample trace is kept for every N traces seen.
+For example, a SampleRate of 30 will keep 1 out of every 30 traces.
+The choice of whether to keep any specific trace is random, so the rate is approximate.
+The sample rate is calculated from the trace ID, so all spans with the same trace ID will be sampled or not sampled together.

Type: `int`

@@ -117,11 +107,10 @@ Type: `int`

-### ClearFrequency
+### `ClearFrequency`

-The duration after which the dynamic sampler should reset its internal
-counters. It should be specified as a duration string, e.g. "30s" or
-"1m".
+The duration after which the dynamic sampler should reset its internal counters.
+It should be specified as a duration string, e.g. "30s" or "1m".

Type: `duration`

@@ -129,32 +119,19 @@ Type: `duration`

-### FieldList
-
-A list of all the field names to use to form the key that will be
-handed to the dynamic sampler. The combination of values from all of
-these fields should reflect how interesting the trace is compared to
-another. A good field selection has consistent values for
-high-frequency, boring traffic, and unique values for outliers and
-interesting traffic. Including an error field (or something like HTTP
-status code) is an excellent choice. Using fields with very high
-cardinality (like `k8s.pod.id`), is a bad choice. If the combination
-of fields essentially makes them unique, the dynamic sampler will
-sample everything. If the combination of fields is not unique enough,
-you will not be guaranteed samples of the most interesting traces. As
-an example, consider a combination of HTTP endpoint (high-frequency
-and boring), HTTP method, and status code (normally boring but can
-become interesting when indicating an error) as a good set of fields
-since it will allowing proper sampling of all endpoints under normal
-traffic and call out when there is failing traffic to any endpoint.
-For example, in contrast, consider a combination of HTTP endpoint,
-status code, and pod id as a bad set of fields, since it would result
-in keys that are all unique, and therefore results in sampling 100% of
-traces. Using only the HTTP endpoint field would be a **bad** choice,
-as it is not unique enough and therefore interesting traces, like
-traces that experienced a `500`, might not be sampled. Field names may
-come from any span in the trace; if they occur on multiple spans, all
-unique values will be included in the key.
+### `FieldList`
+
+A list of all the field names to use to form the key that will be handed to the dynamic sampler.
+The combination of values from all of these fields should reflect how interesting the trace is compared to another.
+A good field selection has consistent values for high-frequency, boring traffic, and unique values for outliers and interesting traffic.
+Including an error field (or something like HTTP status code) is an excellent choice.
+Using fields with very high cardinality (like `k8s.pod.id`) is a bad choice.
+If the combination of fields essentially makes them unique, the dynamic sampler will sample everything.
+If the combination of fields is not unique enough, you will not be guaranteed samples of the most interesting traces.
+As an example, consider a combination of HTTP endpoint (high-frequency and boring), HTTP method, and status code (normally boring but can become interesting when indicating an error) as a good set of fields since it will allow proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+In contrast, consider a combination of HTTP endpoint, status code, and pod id as a bad set of fields, since it would result in keys that are all unique, and therefore in sampling 100% of traces.
+Using only the HTTP endpoint field would be a **bad** choice, as it is not unique enough and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
+Field names may come from any span in the trace; if they occur on multiple spans, all unique values will be included in the key.

Type: `stringarray`

@@ -162,14 +139,12 @@ Type: `stringarray`

-### MaxKeys
+### `MaxKeys`

-Limits the number of distinct keys tracked by the sampler. Once
-MaxKeys is reached, new keys will not be included in the sample rate
-map, but existing keys will continue to be be counted. You can use
-this to keep the sample rate map size under control. Defaults to 500;
-dynamic samplers will rarely achieve their goals with more keys than
-this.
+Limits the number of distinct keys tracked by the sampler.
+Once MaxKeys is reached, new keys will not be included in the sample rate map, but existing keys will continue to be counted.
+You can use this to keep the sample rate map size under control.
+Defaults to 500; dynamic samplers will rarely achieve their goals with more keys than this.

Type: `int`

@@ -177,14 +152,11 @@ Type: `int`

-### UseTraceLength
+### `UseTraceLength`

-Indicates whether to include the trace length (number of spans in the
-trace) as part of the key. The number of spans is exact, so if there
-are normally small variations in trace length you may want to leave
-this off. If traces are consistent lengths and changes in trace length
-is a useful indicator of traces you'd like to see in Honeycomb, set
-this to true.
+Indicates whether to include the trace length (number of spans in the trace) as part of the key.
+The number of spans is exact, so if there are normally small variations in trace length you may want to leave this off.
+If traces are consistent lengths and a change in trace length is a useful indicator of traces you'd like to see in Honeycomb, set this to true.

Type: `bool`

@@ -197,36 +169,24 @@ Type: `bool`

### Name: `EMADynamicSampler`

-The Exponential Moving Average (EMA) Dynamic Sampler attempts to
-average a given sample rate, weighting rare traffic and frequent
-traffic differently so as to end up with the correct average.
-EMADynamicSampler is an improvement upon the simple DynamicSampler and
-is recommended for many use cases. Based on the DynamicSampler,
-EMADynamicSampler differs in that rather than compute rate based on a
-periodic sample of traffic, it maintains an Exponential Moving Average
-of counts seen per key, and adjusts this average at regular intervals.
-The weight applied to more recent intervals is defined by `weight`, a
-number between (0, 1) - larger values weight the average more toward
-recent observations. In other words, a larger weight will cause sample
-rates more quickly adapt to traffic patterns, while a smaller weight
-will result in sample rates that are less sensitive to bursts or drops
-in traffic and thus more consistent over time.
-Keys that are not already present in the EMA will always have a sample
-rate of 1. Keys that occur more frequently will be sampled on a
-logarithmic curve. Every key will be represented at least once in any
-given window and more frequent keys will have their sample rate
-increased proportionally to trend towards the goal sample rate.
-
-
-
-### GoalSampleRate
-
-The sample rate to use. It indicates a ratio, where one sample trace
-is kept for every N traces seen. For example, a SampleRate of 30 will
-keep 1 out of every 30 traces. The choice on whether to keep any
-specific trace is random, so the rate is approximate.
-The sample rate is calculated from the trace ID, so all spans with the
-same trace ID will be sampled or not sampled together.
+The Exponential Moving Average (EMA) Dynamic Sampler attempts to average a given sample rate, weighting rare traffic and frequent traffic differently so as to end up with the correct average.
+EMADynamicSampler is an improvement upon the simple DynamicSampler and is recommended for many use cases.
+Based on the DynamicSampler, EMADynamicSampler differs in that rather than compute rate based on a periodic sample of traffic, it maintains an Exponential Moving Average of counts seen per key, and adjusts this average at regular intervals.
+The weight applied to more recent intervals is defined by `weight`, a number between (0, 1) - larger values weight the average more toward recent observations.
+In other words, a larger weight will cause sample rates to adapt more quickly to traffic patterns, while a smaller weight will result in sample rates that are less sensitive to bursts or drops in traffic and thus more consistent over time.
+Keys that are not already present in the EMA will always have a sample rate of 1.
+Keys that occur more frequently will be sampled on a logarithmic curve.
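Pulling the preceding `DynamicSampler` fields together, a sketch keyed on endpoint, method, and status code (the field names are illustrative of common instrumentation, not prescribed):

```
DynamicSampler:
  SampleRate: 10
  ClearFrequency: 30s
  FieldList:
    - http.route
    - http.method
    - http.status_code
  MaxKeys: 500
```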
+Every key will be represented at least once in any given window and more frequent keys will have their sample rate increased proportionally to trend towards the goal sample rate.
+
+
+
+### `GoalSampleRate`
+
+The sample rate to use.
+It indicates a ratio, where one sample trace is kept for every N traces seen.
+For example, a SampleRate of 30 will keep 1 out of every 30 traces.
+The choice of whether to keep any specific trace is random, so the rate is approximate.
+The sample rate is calculated from the trace ID, so all spans with the same trace ID will be sampled or not sampled together.

Type: `int`

@@ -234,11 +194,10 @@ Type: `int`

-### AdjustmentInterval
+### `AdjustmentInterval`

-The duration after which the EMA dynamic sampler should recalculate
-its internal counters. It should be specified as a duration string,
-e.g. "30s" or "1m".
+The duration after which the EMA dynamic sampler should recalculate its internal counters.
+It should be specified as a duration string, e.g. "30s" or "1m".

Type: `duration`

@@ -246,14 +206,12 @@ Type: `duration`

-### Weight
+### `Weight`

-The weight to use when calculating the EMA. It should be a number
-between 0 and 1. Larger values weight the average more toward recent
-observations. In other words, a larger weight will cause sample rates
-more quickly adapt to traffic patterns, while a smaller weight will
-result in sample rates that are less sensitive to bursts or drops in
-traffic and thus more consistent over time.
+The weight to use when calculating the EMA.
+It should be a number between 0 and 1.
+Larger values weight the average more toward recent observations.
+In other words, a larger weight will cause sample rates to adapt more quickly to traffic patterns, while a smaller weight will result in sample rates that are less sensitive to bursts or drops in traffic and thus more consistent over time.

Type: `float`

@@ -261,16 +219,13 @@ Type: `float`

-### AgeOutValue
+### `AgeOutValue`

-Indicates the threshold for removing keys from the EMA. The EMA of any
-key will approach 0 if it is not repeatedly observed, but will never
-truly reach it, so we have to decide what constitutes "zero". Keys
-with averages below this threshold will be removed from the EMA.
-Default is the same as Weight, as this prevents a key with the
-smallest integer value (1) from being aged out immediately. This value
-should generally be <= Weight, unless you have very specific reasons
-to set it higher.
+Indicates the threshold for removing keys from the EMA.
+The EMA of any key will approach 0 if it is not repeatedly observed, but will never truly reach it, so we have to decide what constitutes "zero".
+Keys with averages below this threshold will be removed from the EMA.
+Default is the same as Weight, as this prevents a key with the smallest integer value (1) from being aged out immediately.
+This value should generally be <= Weight, unless you have very specific reasons to set it higher.

Type: `float`

@@ -278,14 +233,12 @@ Type: `float`

-### BurstMultiple
+### `BurstMultiple`

-If set, this value is multiplied by the sum of the running average of
-counts to define the burst detection threshold. If total counts
-observed for a given interval exceed this threshold, EMA is updated
-immediately, rather than waiting on the AdjustmentInterval. Defaults
-to 2; a negative value disables. With the default of 2, if your
-traffic suddenly doubles, burst detection will kick in.
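Taken together, an `EMADynamicSampler` entry using the fields described above might be sketched as follows (all values are illustrative, not recommendations; `AgeOutValue` mirrors `Weight` per its documented default):

```
EMADynamicSampler:
  GoalSampleRate: 10
  AdjustmentInterval: 15s
  Weight: 0.5
  AgeOutValue: 0.5
  FieldList:
    - http.route
    - http.status_code
```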
+If set, this value is multiplied by the sum of the running average of counts to define the burst detection threshold.
+If total counts observed for a given interval exceed this threshold, EMA is updated immediately, rather than waiting on the AdjustmentInterval.
+Defaults to 2; a negative value disables burst detection.
+With the default of 2, if your traffic suddenly doubles, burst detection will kick in.

Type: `float`

@@ -293,10 +246,10 @@ Type: `float`

-### BurstDetectionDelay
+### `BurstDetectionDelay`

-Indicates the number of intervals to run before burst detection kicks
-in. Defaults to 3.
+Indicates the number of intervals to run before burst detection kicks in.
+Defaults to 3.

Type: `int`

@@ -304,32 +257,19 @@ Type: `int`

-### FieldList
-
-A list of all the field names to use to form the key that will be
-handed to the dynamic sampler. The combination of values from all of
-these fields should reflect how interesting the trace is compared to
-another. A good field selection has consistent values for
-high-frequency, boring traffic, and unique values for outliers and
-interesting traffic. Including an error field (or something like HTTP
-status code) is an excellent choice. Using fields with very high
-cardinality (like `k8s.pod.id`), is a bad choice. If the combination
-of fields essentially makes them unique, the dynamic sampler will
-sample everything. If the combination of fields is not unique enough,
-you will not be guaranteed samples of the most interesting traces. As
-an example, consider a combination of HTTP endpoint (high-frequency
-and boring), HTTP method, and status code (normally boring but can
-become interesting when indicating an error) as a good set of fields
-since it will allowing proper sampling of all endpoints under normal
-traffic and call out when there is failing traffic to any endpoint.
-For example, in contrast, consider a combination of HTTP endpoint,
-status code, and pod id as a bad set of fields, since it would result
-in keys that are all unique, and therefore results in sampling 100% of
-traces. Using only the HTTP endpoint field would be a **bad** choice,
-as it is not unique enough and therefore interesting traces, like
-traces that experienced a `500`, might not be sampled. Field names may
-come from any span in the trace; if they occur on multiple spans, all
-unique values will be included in the key.
+### `FieldList`
+
+A list of all the field names to use to form the key that will be handed to the dynamic sampler.
+The combination of values from all of these fields should reflect how interesting the trace is compared to another.
+A good field selection has consistent values for high-frequency, boring traffic, and unique values for outliers and interesting traffic.
+Including an error field (or something like HTTP status code) is an excellent choice.
+Using fields with very high cardinality (like `k8s.pod.id`) is a bad choice.
+If the combination of fields essentially makes them unique, the dynamic sampler will sample everything.
+If the combination of fields is not unique enough, you will not be guaranteed samples of the most interesting traces.
+As an example, consider a combination of HTTP endpoint (high-frequency and boring), HTTP method, and status code (normally boring but can become interesting when indicating an error) as a good set of fields since it will allow proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+In contrast, consider a combination of HTTP endpoint, status code, and pod id as a bad set of fields, since it would result in keys that are all unique, and therefore in sampling 100% of traces.
+Using only the HTTP endpoint field would be a **bad** choice, as it is not unique enough and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
+Field names may come from any span in the trace; if they occur on multiple spans, all unique values will be included in the key.

Type: `stringarray`

@@ -337,14 +277,12 @@ Type: `stringarray`

-### MaxKeys
+### `MaxKeys`

-Limits the number of distinct keys tracked by the sampler. Once
-MaxKeys is reached, new keys will not be included in the sample rate
-map, but existing keys will continue to be be counted. You can use
-this to keep the sample rate map size under control. Defaults to 500;
-dynamic samplers will rarely achieve their goals with more keys than
-this.
+Limits the number of distinct keys tracked by the sampler.
+Once MaxKeys is reached, new keys will not be included in the sample rate map, but existing keys will continue to be counted.
+You can use this to keep the sample rate map size under control.
+Defaults to 500; dynamic samplers will rarely achieve their goals with more keys than this.

Type: `int`

@@ -352,14 +290,11 @@ Type: `int`

-### UseTraceLength
+### `UseTraceLength`

-Indicates whether to include the trace length (number of spans in the
-trace) as part of the key. The number of spans is exact, so if there
-are normally small variations in trace length you may want to leave
-this off. If traces are consistent lengths and changes in trace length
-is a useful indicator of traces you'd like to see in Honeycomb, set
-this to true.
+Indicates whether to include the trace length (number of spans in the trace) as part of the key.
+The number of spans is exact, so if there are normally small variations in trace length you may want to leave this off.
+If traces are consistent lengths and a change in trace length is a useful indicator of traces you'd like to see in Honeycomb, set this to true.

Type: `bool`

@@ -372,37 +307,22 @@ Type: `bool`

### Name: `EMAThroughputSampler`

-The Exponential Moving Average (EMA) Throughput Sampler attempts to
-achieve a given throughput -- number of spans per second -- weighting
-rare traffic and frequent traffic differently so as to end up with the
-correct rate.
-The EMAThroughputSampler is an improvement upon the TotalThroughput
-Sampler and is recommended for most throughput-based use cases,
-because it like the EMADynamicSampler, it maintains an Exponential
-Moving Average of counts seen per key, and adjusts this average at
-regular intervals. The weight applied to more recent intervals is
-defined by `weight`, a number between (0, 1) - larger values weight
-the average more toward recent observations. In other words, a larger
-weight will cause sample rates more quickly adapt to traffic patterns,
-while a smaller weight will result in sample rates that are less
-sensitive to bursts or drops in traffic and thus more consistent over
-time.
-New keys that are not already present in the EMA will always have a
-sample rate of 1. Keys that occur more frequently will be sampled on a
-logarithmic curve. Every key will be represented at least once in any
-given window and more frequent keys will have their sample rate
-increased proportionally to trend towards the goal throughput.
-
-
-
-### GoalThroughputPerSec
-
-The desired throughput **per second**. This is the number of events
-per second you want to send to Honeycomb. The sampler will adjust
-sample rates to try to achieve this desired throughput. This value is
-calculated for the individual instance, not for the cluster; if your
-cluster has multiple instances, you will need to divide your total
-desired sample rate by the number of instances to get this value.
+The Exponential Moving Average (EMA) Throughput Sampler attempts to achieve a given throughput -- number of spans per second -- weighting rare traffic and frequent traffic differently so as to end up with the correct rate.
+The EMAThroughputSampler is an improvement upon the TotalThroughput Sampler and is recommended for most throughput-based use cases because, like the EMADynamicSampler, it maintains an Exponential Moving Average of counts seen per key and adjusts this average at regular intervals.
+The weight applied to more recent intervals is defined by `weight`, a number between (0, 1) - larger values weight the average more toward recent observations.
+In other words, a larger weight will cause sample rates to adapt more quickly to traffic patterns, while a smaller weight will result in sample rates that are less sensitive to bursts or drops in traffic and thus more consistent over time.
+New keys that are not already present in the EMA will always have a sample rate of 1.
+Keys that occur more frequently will be sampled on a logarithmic curve.
+Every key will be represented at least once in any given window and more frequent keys will have their sample rate increased proportionally to trend towards the goal throughput.
+
+
+
+### `GoalThroughputPerSec`
+
+The desired throughput **per second**.
+This is the number of events per second you want to send to Honeycomb.
+The sampler will adjust sample rates to try to achieve this desired throughput.
+This value is calculated for the individual instance, not for the cluster; if your cluster has multiple instances, you will need to divide your total desired sample rate by the number of instances to get this value.

Type: `int`

@@ -410,12 +330,10 @@ Type: `int`

-### InitialSampleRate
+### `InitialSampleRate`

-InitialSampleRate is the sample rate to use during startup, before the
-sampler has accumulated enough data to calculate a reasonable
-throughput. This is mainly useful in situations where unsampled
-throughput is high enough to cause problems.
+InitialSampleRate is the sample rate to use during startup, before the sampler has accumulated enough data to calculate a reasonable throughput.
+This is mainly useful in situations where unsampled throughput is high enough to cause problems.

Type: `int`

@@ -423,11 +341,10 @@ Type: `int`

-### AdjustmentInterval
+### `AdjustmentInterval`

-The duration after which the EMA dynamic sampler should recalculate
-its internal counters. It should be specified as a duration string,
-e.g. "30s" or "1m".
+The duration after which the EMA dynamic sampler should recalculate its internal counters.
+It should be specified as a duration string, e.g. "30s" or "1m".

Type: `duration`

@@ -435,14 +353,12 @@ Type: `duration`

-### Weight
+### `Weight`

-The weight to use when calculating the EMA. It should be a number
-between 0 and 1. Larger values weight the average more toward recent
-observations. In other words, a larger weight will cause sample rates
-more quickly adapt to traffic patterns, while a smaller weight will
-result in sample rates that are less sensitive to bursts or drops in
-traffic and thus more consistent over time.
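For illustration, an `EMAThroughputSampler` entry combining the fields covered so far might be sketched as follows (values are illustrative; `FieldList`, documented below for this sampler as well, is assumed):

```
EMAThroughputSampler:
  GoalThroughputPerSec: 100
  InitialSampleRate: 10
  AdjustmentInterval: 15s
  Weight: 0.5
  FieldList:
    - http.route
    - http.status_code
```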
+The weight to use when calculating the EMA.
+It should be a number between 0 and 1.
+Larger values weight the average more toward recent observations.
+In other words, a larger weight will cause sample rates to adapt more quickly to traffic patterns, while a smaller weight will result in sample rates that are less sensitive to bursts or drops in traffic and thus more consistent over time.

Type: `float`

@@ -450,16 +366,13 @@ Type: `float`

-### AgeOutValue
+### `AgeOutValue`

-Indicates the threshold for removing keys from the EMA. The EMA of any
-key will approach 0 if it is not repeatedly observed, but will never
-truly reach it, so we have to decide what constitutes "zero". Keys
-with averages below this threshold will be removed from the EMA.
-Default is the same as Weight, as this prevents a key with the
-smallest integer value (1) from being aged out immediately. This value
-should generally be <= Weight, unless you have very specific reasons
-to set it higher.
+Indicates the threshold for removing keys from the EMA.
+The EMA of any key will approach 0 if it is not repeatedly observed, but will never truly reach it, so we have to decide what constitutes "zero".
+Keys with averages below this threshold will be removed from the EMA.
+Default is the same as Weight, as this prevents a key with the smallest integer value (1) from being aged out immediately.
+This value should generally be <= Weight, unless you have very specific reasons to set it higher.

Type: `float`

@@ -467,14 +380,12 @@ Type: `float`

-### BurstMultiple
+### `BurstMultiple`

-If set, this value is multiplied by the sum of the running average of
-counts to define the burst detection threshold. If total counts
-observed for a given interval exceed this threshold, EMA is updated
-immediately, rather than waiting on the AdjustmentInterval. Defaults
-to 2; a negative value disables. With the default of 2, if your
-traffic suddenly doubles, burst detection will kick in.
+If set, this value is multiplied by the sum of the running average of counts to define the burst detection threshold.
+If total counts observed for a given interval exceed this threshold, EMA is updated immediately, rather than waiting on the AdjustmentInterval.
+Defaults to 2; a negative value disables burst detection.
+With the default of 2, if your traffic suddenly doubles, burst detection will kick in.

Type: `float`

@@ -482,10 +393,10 @@ Type: `float`

-### BurstDetectionDelay
+### `BurstDetectionDelay`

-Indicates the number of intervals to run before burst detection kicks
-in. Defaults to 3.
+Indicates the number of intervals to run before burst detection kicks in.
+Defaults to 3.

Type: `int`

@@ -493,32 +404,19 @@ Type: `int`

-### FieldList
-
-A list of all the field names to use to form the key that will be
-handed to the dynamic sampler. The combination of values from all of
-these fields should reflect how interesting the trace is compared to
-another. A good field selection has consistent values for
-high-frequency, boring traffic, and unique values for outliers and
-interesting traffic. Including an error field (or something like HTTP
-status code) is an excellent choice. Using fields with very high
-cardinality (like `k8s.pod.id`), is a bad choice. If the combination
-of fields essentially makes them unique, the dynamic sampler will
-sample everything. If the combination of fields is not unique enough,
-you will not be guaranteed samples of the most interesting traces. As
-an example, consider a combination of HTTP endpoint (high-frequency
-and boring), HTTP method, and status code (normally boring but can
-become interesting when indicating an error) as a good set of fields
-since it will allowing proper sampling of all endpoints under normal
-traffic and call out when there is failing traffic to any endpoint.
-For example, in contrast, consider a combination of HTTP endpoint,
-status code, and pod id as a bad set of fields, since it would result
-in keys that are all unique, and therefore results in sampling 100% of
-traces. Using only the HTTP endpoint field would be a **bad** choice,
-as it is not unique enough and therefore interesting traces, like
-traces that experienced a `500`, might not be sampled. Field names may
-come from any span in the trace; if they occur on multiple spans, all
-unique values will be included in the key.
+### `FieldList`
+
+A list of all the field names to use to form the key that will be handed to the dynamic sampler.
+The combination of values from all of these fields should reflect how interesting the trace is compared to another.
+A good field selection has consistent values for high-frequency, boring traffic, and unique values for outliers and interesting traffic.
+Including an error field (or something like HTTP status code) is an excellent choice.
+Using fields with very high cardinality (like `k8s.pod.id`) is a bad choice.
+If the combination of fields essentially makes them unique, the dynamic sampler will sample everything.
+If the combination of fields is not unique enough, you will not be guaranteed samples of the most interesting traces.
+As an example, consider a combination of HTTP endpoint (high-frequency and boring), HTTP method, and status code (normally boring but can become interesting when indicating an error) as a good set of fields since it will allow proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+In contrast, consider a combination of HTTP endpoint, status code, and pod id as a bad set of fields, since it would result in keys that are all unique, and therefore in sampling 100% of traces.
+Using only the HTTP endpoint field would be a **bad** choice, as it is not unique enough and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
+Field names may come from any span in the trace; if they occur on multiple spans, all unique values will be included in the key.

Type: `stringarray`

@@ -526,14 +424,12 @@ Type: `stringarray`

-### MaxKeys
+### `MaxKeys`

-Limits the number of distinct keys tracked by the sampler. Once
-MaxKeys is reached, new keys will not be included in the sample rate
-map, but existing keys will continue to be be counted. You can use
-this to keep the sample rate map size under control. Defaults to 500;
-dynamic samplers will rarely achieve their goals with more keys than
-this.
+Limits the number of distinct keys tracked by the sampler.
+Once MaxKeys is reached, new keys will not be included in the sample rate map, but existing keys will continue to be counted.
+You can use this to keep the sample rate map size under control.
+Defaults to 500; dynamic samplers will rarely achieve their goals with more keys than this.

Type: `int`

@@ -541,14 +437,11 @@ Type: `int`

-### UseTraceLength
+### `UseTraceLength`

-Indicates whether to include the trace length (number of spans in the
-trace) as part of the key. The number of spans is exact, so if there
-are normally small variations in trace length you may want to leave
-this off. If traces are consistent lengths and changes in trace length
-is a useful indicator of traces you'd like to see in Honeycomb, set
-this to true.
+Indicates whether to include the trace length (number of spans in the trace) as part of the key.
+The number of spans is exact, so if there are normally small variations in trace length you may want to leave this off.
+If traces are consistent lengths and a change in trace length is a useful indicator of traces you'd like to see in Honeycomb, set this to true.

Type: `bool`

@@ -561,42 +454,29 @@ Type: `bool`

### Name: `WindowedThroughputSampler`

-Windowed Throughput sampling is an enhanced version of total
-throughput sampling. Just like the TotalThroughput sampler, it
-attempts to meet the goal of fixed number of events per second sent to
-Honeycomb.
-The original throughput sampler updates the sampling rate every
-"ClearFrequency" seconds. While this parameter is configurable, it
-suffers from the following tradeoff:
-- Decreasing it is more responsive to load spikes, but with the
-cost of making the sampling decision on less data.
-- Increasing it is less responsive to load spikes, but sample rates
-will be more
-stable because they are made with more data.
-
-The windowed throughput sampler resolves this by introducing two
-different, tunable parameters:
-- UpdateFrequency: how often the sampling rate is recomputed
-- LookbackFrequency: how much total time is considered when
-recomputing sampling rate.
+Windowed Throughput sampling is an enhanced version of total throughput sampling.
+Just like the TotalThroughput sampler, it attempts to meet the goal of a fixed number of events per second sent to Honeycomb.
+The original throughput sampler updates the sampling rate every "ClearFrequency" seconds.
+While this parameter is configurable, it suffers from the following tradeoff:
+ - Decreasing it is more responsive to load spikes, but with the
+   cost of making the sampling decision on less data.
+ - Increasing it is less responsive to load spikes, but sample rates will be more
+   stable because they are made with more data.
+The windowed throughput sampler resolves this by introducing two different, tunable parameters:
+ - UpdateFrequency: how often the sampling rate is recomputed
+ - LookbackFrequency: how much total time is considered when recomputing sampling rate.
+A standard configuration would be to set UpdateFrequency to 1s and LookbackFrequency to 30s.
+In this configuration, every second, we look back at the last 30s of data in order to compute the new sampling rate.
+The actual sampling rate computation is nearly identical to the original throughput sampler, but this variant has better support for floating point numbers.

-A standard configuration would be to set UpdateFrequency to 1s and
-LookbackFrequency to 30s. In this configuration, every second, we
-lookback at the last 30s of data in order to compute the new sampling
-rate. The actual sampling rate computation is nearly identical to the
-original throughput sampler, but this variant has better support for
-floating point numbers.

+### `GoalThroughputPerSec`

-### GoalThroughputPerSec
-
-The desired throughput **per second**. This is the number of events
-per second you want to send to Honeycomb. The sampler will adjust
-sample rates to try to achieve this desired throughput. This value is
-calculated for the individual instance, not for the cluster; if your
-cluster has multiple instances, you will need to divide your total
-desired sample rate by the number of instances to get this value.
+The desired throughput **per second**.
+This is the number of events per second you want to send to Honeycomb.
+The sampler will adjust sample rates to try to achieve this desired throughput.
+This value is calculated for the individual instance, not for the cluster; if your cluster has multiple instances, you will need to divide your total desired sample rate by the number of instances to get this value.

Type: `int`

@@ -604,10 +484,10 @@ Type: `int`

-### UpdateFrequency
+### `UpdateFrequency`

-The duration between sampling rate computations. It should be
-specified as a duration string, e.g. "30s" or "1m".
+The duration between sampling rate computations.
+It should be specified as a duration string, e.g. "30s" or "1m".

Type: `duration`

@@ -615,11 +496,11 @@ Type: `duration`

-### LookbackFrequency
+### `LookbackFrequency`

-This controls how far back in time to lookback to dynamically adjust
-the sampling rate. Default is 30 * UpdateFrequencyDuration. This is
-forced to be an _integer multiple_ of UpdateFrequencyDuration.
+This controls how far back in time to look back to dynamically adjust the sampling rate.
+Default is 30 * UpdateFrequencyDuration.
+This is forced to be an _integer multiple_ of UpdateFrequencyDuration.

Type: `duration`

@@ -627,32 +508,19 @@ Type: `duration`

-### FieldList
-
-A list of all the field names to use to form the key that will be
-handed to the dynamic sampler. The combination of values from all of
-these fields should reflect how interesting the trace is compared to
-another. A good field selection has consistent values for
-high-frequency, boring traffic, and unique values for outliers and
-interesting traffic. Including an error field (or something like HTTP
-status code) is an excellent choice. Using fields with very high
-cardinality (like `k8s.pod.id`), is a bad choice. If the combination
-of fields essentially makes them unique, the dynamic sampler will
-sample everything. If the combination of fields is not unique enough,
-you will not be guaranteed samples of the most interesting traces. As
-an example, consider a combination of HTTP endpoint (high-frequency
-and boring), HTTP method, and status code (normally boring but can
-become interesting when indicating an error) as a good set of fields
-since it will allowing proper sampling of all endpoints under normal
-traffic and call out when there is failing traffic to any endpoint.
-For example, in contrast, consider a combination of HTTP endpoint,
-status code, and pod id as a bad set of fields, since it would result
-in keys that are all unique, and therefore results in sampling 100% of
-traces. Using only the HTTP endpoint field would be a **bad** choice,
-as it is not unique enough and therefore interesting traces, like
-traces that experienced a `500`, might not be sampled. Field names may
-come from any span in the trace; if they occur on multiple spans, all
-unique values will be included in the key.
+### `FieldList`
+
+A list of all the field names to use to form the key that will be handed to the dynamic sampler.
+The combination of values from all of these fields should reflect how interesting the trace is compared to another.
+A good field selection has consistent values for high-frequency, boring traffic, and unique values for outliers and interesting traffic.
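The "standard configuration" described above can be written out as a sketch; the throughput goal is illustrative and, as noted, applies per instance:

```
WindowedThroughputSampler:
  GoalThroughputPerSec: 100
  UpdateFrequency: 1s
  LookbackFrequency: 30s
  FieldList:
    - http.route
    - http.status_code
```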
+Including an error field (or something like HTTP status code) is an excellent choice.
+Using fields with very high cardinality (like `k8s.pod.id`) is a bad choice.
+If the combination of fields essentially makes them unique, the dynamic sampler will sample everything.
+If the combination of fields is not unique enough, you will not be guaranteed samples of the most interesting traces.
+As an example, consider a combination of HTTP endpoint (high-frequency and boring), HTTP method, and status code (normally boring but can become interesting when indicating an error) as a good set of fields since it will allow proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+In contrast, consider a combination of HTTP endpoint, status code, and pod id as a bad set of fields, since it would result in keys that are all unique, and therefore in sampling 100% of traces.
+Using only the HTTP endpoint field would be a **bad** choice, as it is not unique enough and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
+Field names may come from any span in the trace; if they occur on multiple spans, all unique values will be included in the key.

Type: `stringarray`

@@ -660,14 +528,12 @@ Type: `stringarray`

-### MaxKeys
+### `MaxKeys`

-Limits the number of distinct keys tracked by the sampler. Once
-MaxKeys is reached, new keys will not be included in the sample rate
-map, but existing keys will continue to be be counted. You can use
-this to keep the sample rate map size under control. Defaults to 500;
-dynamic samplers will rarely achieve their goals with more keys than
-this.
+Limits the number of distinct keys tracked by the sampler.
+Once MaxKeys is reached, new keys will not be included in the sample rate map, but existing keys will continue to be counted.
+You can use this to keep the sample rate map size under control.
+Defaults to 500; dynamic samplers will rarely achieve their goals with more keys than this.

Type: `int`

@@ -675,14 +541,11 @@ Type: `int`

-### UseTraceLength
+### `UseTraceLength`

-Indicates whether to include the trace length (number of spans in the
-trace) as part of the key. The number of spans is exact, so if there
-are normally small variations in trace length you may want to leave
-this off. If traces are consistent lengths and changes in trace length
-is a useful indicator of traces you'd like to see in Honeycomb, set
-this to true.
+Indicates whether to include the trace length (number of spans in the trace) as part of the key.
+The number of spans is exact, so if there are normally small variations in trace length you may want to leave this off.
+If traces are consistent lengths and a change in trace length is a useful indicator of traces you'd like to see in Honeycomb, set this to true.

Type: `bool`

@@ -695,18 +558,14 @@ Type: `bool`

### Name: `RulesBasedSampler`

-The Rules-based sampler allows you to specify a set of rules that will
-determine whether a trace should be sampled or not. Rules are
-evaluated in order, and the first rule that matches will be used to
-determine the sample rate. If no rules match, the SampleRate will be 1
-(i.e. all traces will be kept).
-Rules-based samplers will usually be configured to have the last rule
-be a default rule with no conditions that uses a downstream dynamic
-sampler to keep overall sample rate under control.
+The Rules-based sampler allows you to specify a set of rules that will determine whether a trace should be sampled or not.
+Rules are evaluated in order, and the first rule that matches will be used to determine the sample rate.
+If no rules match, the SampleRate will be 1 (i.e. all traces will be kept).
+Rules-based samplers will usually be configured to have the last rule be a default rule with no conditions that uses a downstream dynamic sampler to keep overall sample rate under control.



-### Rules
+### `Rules`

Rules is a list of rules to use to determine the sample rate.

@@ -716,17 +576,13 @@

Type: `objectarray`

-### CheckNestedFields
+### `CheckNestedFields`

-Indicates whether to expand nested JSON when evaluating rules. If
-false, nested JSON will be treated as a string. If true, nested JSON
-will be expanded into a map[string]interface{} and the value of the
-field will be the value of the nested field. For example, if you have
-a field called `http.request.headers` and you want to check the value
-of the `User-Agent` header, you would set this to true and use
-`http.request.headers.User-Agent` as the field name in your rule. This
-is a computationally expensive option and may cause performance
-problems if you have a large number of spans with nested JSON.
+Indicates whether to expand nested JSON when evaluating rules.
+If false, nested JSON will be treated as a string.
+If true, nested JSON will be expanded into a map[string]interface{} and the value of the field will be the value of the nested field.
+For example, if you have a field called `http.request.headers` and you want to check the value of the `User-Agent` header, you would set this to true and use `http.request.headers.User-Agent` as the field name in your rule.
+This is a computationally expensive option and may cause performance problems if you have a large number of spans with nested JSON.

Type: `bool`

@@ -739,21 +595,16 @@ Type: `bool`

### Name: `Rules`

-Rules are evaluated in order, and the first rule that matches will be
-used to determine the sample rate. If no rules match, the SampleRate
-will be 1 (i.e. all traces will be kept).
-If a rule matches, one of three things happens, and they are evaluated
-in this order: a) if the rule specifies a downstream Sampler, that
-sampler is used to determine the sample rate; b) if the rule has the
-Drop flag set to true, the trace is dropped; c) the rule's sample rate
-is used.
+Rules are evaluated in order, and the first rule that matches will be used to determine the sample rate.
+If no rules match, the SampleRate will be 1 (i.e. all traces will be kept).
+If a rule matches, one of three things happens, and they are evaluated in this order: a) if the rule specifies a downstream Sampler, that sampler is used to determine the sample rate; b) if the rule has the Drop flag set to true, the trace is dropped; c) the rule's sample rate is used.



-### Name
+### `Name`

-The name of the rule. This is used for debugging and will appear in
-the trace metadata if AddRuleReasonToTrace is set to true.
+The name of the rule.
+This is used for debugging and will appear in the trace metadata if AddRuleReasonToTrace is set to true.

Type: `string`

@@ -761,12 +613,11 @@ Type: `string`

-### Sampler
+### `Sampler`

-The sampler to use if the rule matches. If this is set, the sample
-rate will be determined by this downstream sampler. If this is not
-set, the sample rate will be determined by the Drop flag or the
-SampleRate field.
+The sampler to use if the rule matches.
+If this is set, the sample rate will be determined by this downstream sampler. +If this is not set, the sample rate will be determined by the Drop flag or the SampleRate field. Type: `object` @@ -774,10 +625,11 @@ Type: `object` -### Drop +### `Drop` -Indicates whether to drop the trace if it matches this rule. If true, -the trace will be dropped. If false, the trace will be kept. +Indicates whether to drop the trace if it matches this rule. +If true, the trace will be dropped. +If false, the trace will be kept. Type: `bool` @@ -785,10 +637,9 @@ Type: `bool` -### SampleRate +### `SampleRate` -If the rule is matched, there is no Sampler specified, and the Drop -flag is false, then this is the sample rate to use. +If the rule is matched, there is no Sampler specified, and the Drop flag is false, then this is the sample rate to use. Type: `int` @@ -796,12 +647,11 @@ Type: `int` -### Conditions +### `Conditions` -Conditions is a list of conditions to use to determine whether the -rule matches. All conditions must be met for the rule to match. If -there are no conditions, the rule will always match (this is typically -done for the last rule to provide a default behavior). +Conditions is a list of conditions to use to determine whether the rule matches. +All conditions must be met for the rule to match. +If there are no conditions, the rule will always match (this is typically done for the last rule to provide a default behavior). Type: `objectarray` @@ -809,13 +659,11 @@ Type: `objectarray` -### Scope +### `Scope` -Controls the scope of the rule evaluation. If set to "trace" (the -default), each condition can apply to any span in the trace -independently. If set to "span", all of the conditions in the rule -will be evaluated against each span in the trace and the rule only -succeeds if all of the conditions match on a single span together. +Controls the scope of the rule evaluation. +If set to "trace" (the default), each condition can apply to any span in the trace independently. +If set to "span", all of the conditions in the rule will be evaluated against each span in the trace and the rule only succeeds if all of the conditions match on a single span together. Type: `string` @@ -828,18 +676,18 @@ Type: `string` ### Name: `Conditions` -Conditions are evaluated in order, and the first condition that does -not match will cause the rule to not match. If all conditions match, -the rule will match. If there are no conditions, the rule will always -match. +Conditions are evaluated in order, and the first condition that does not match will cause the rule to not match. +If all conditions match, the rule will match. +If there are no conditions, the rule will always match. -### Field +### `Field` -The field to check. This can be any field in the trace. If the field -is not present, the condition will not match. The comparison is -case-sensitive. +The field to check. +This can be any field in the trace. +If the field is not present, the condition will not match. +The comparison is case-sensitive. Type: `string` @@ -847,9 +695,10 @@ Type: `string` -### Operator +### `Operator` -The comparison operator to use. String comparisons are case-sensitive. +The comparison operator to use. +String comparisons are case-sensitive. Type: `string` @@ -857,10 +706,10 @@ Type: `string` -### Value +### `Value` -The value to compare against. If Datatype is not specified, then the -value and the field will be compared based on the type of the field. +The value to compare against. 
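Combining these fields, a hedged sketch of a rules-based configuration with a drop rule, a keep rule, and a default rule that hands off to a downstream sampler (field names, operators, and rates are illustrative assumptions, not a canonical example):

```
RulesBasedSampler:
  Rules:
    - Name: drop health checks
      Drop: true
      Conditions:
        - Field: http.route
          Operator: "="
          Value: /health
    - Name: keep server errors
      SampleRate: 1
      Conditions:
        - Field: http.status_code
          Operator: ">="
          Value: 500
          Datatype: int
    - Name: default
      Sampler:
        EMADynamicSampler:
          GoalSampleRate: 10
          FieldList:
            - http.route
```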
+If Datatype is not specified, then the value and the field will be compared based on the type of the field.

Type: `anyscalar`

@@ -868,14 +717,12 @@

-### Datatype
+### `Datatype`

-The datatype to use when comparing the value and the field. If
-Datatype is specified, then both values will be converted
-(best-effort) to that type and compared. Errors in conversion will
-result in the comparison evaluating to false. This is especially
-useful when a field like http status code may be rendered as strings
-by some environments and as numbers or booleans by others.
+The datatype to use when comparing the value and the field.
+If Datatype is specified, then both values will be converted (best-effort) to that type and compared.
+Errors in conversion will result in the comparison evaluating to false.
+This is especially useful when a field like HTTP status code may be rendered as strings by some environments and as numbers or booleans by others.

Type: `string`

@@ -888,29 +735,22 @@ Type: `string`

### Name: `TotalThroughputSampler`

-TotalThroughput attempts to meet a goal of a fixed number of events
-per second sent to Honeycomb. This sampler is deprecated and present
-mainly for compatibility. Most installations will want to use either
-EMAThroughput or WindowedThroughput instead.
-If your key space is sharded across different servers, this is a good
-method for making sure each server sends roughly the same volume of
-content to Honeycomb. It performs poorly when the active keyspace is
-very large.
-GoalThroughputPerSec * ClearFrequency defines the upper limit of the
-number of keys that can be reported and stay under the goal, but with
-that many keys, you'll only get one event per key per
-ClearFrequencySec, which is very coarse. You should aim for at least 1
-event per key per sec to 1 event per key per 10sec to get reasonable
-data. In other words, the number of active keys should be less than
-10*GoalThroughputPerSec.
+TotalThroughput attempts to meet a goal of a fixed number of events per second sent to Honeycomb.
+This sampler is deprecated and present mainly for compatibility.
+Most installations will want to use either EMAThroughput or WindowedThroughput instead.
+If your key space is sharded across different servers, this is a good method for making sure each server sends roughly the same volume of content to Honeycomb.
+It performs poorly when the active keyspace is very large.
+GoalThroughputPerSec * ClearFrequency defines the upper limit of the number of keys that can be reported and stay under the goal, but with that many keys, you'll only get one event per key per ClearFrequency, which is very coarse.
+You should aim for between 1 event per key per second and 1 event per key per 10 seconds to get reasonable data.
+In other words, the number of active keys should be less than 10*GoalThroughputPerSec.

-### GoalThroughputPerSec
+### `GoalThroughputPerSec`

-The desired throughput per second of events sent to Honeycomb. This is
-the number of events per second you want to send. This is not the same
-as the sample rate.
+The desired throughput per second of events sent to Honeycomb.
+This is the number of events per second you want to send.
+This is not the same as the sample rate.

Type: `int`

@@ -918,11 +758,10 @@ Type: `int`

-### ClearFrequency
+### `ClearFrequency`

-The duration after which the dynamic sampler should reset its internal
-counters. It should be specified as a duration string, e.g. "30s" or
-"1m".
 
-### GoalThroughputPerSec
+### `GoalThroughputPerSec`
 
-The desired throughput per second of events sent to Honeycomb. This is
-the number of events per second you want to send. This is not the same
-as the sample rate.
+The desired throughput per second of events sent to Honeycomb.
+This is the number of events per second you want to send.
+This is not the same as the sample rate.
 
 Type: `int`
 
@@ -918,11 +758,11 @@ Type: `int`
 
-### ClearFrequency
+### `ClearFrequency`
 
-The duration after which the dynamic sampler should reset its internal
-counters. It should be specified as a duration string, e.g. "30s" or
-"1m".
+The duration after which the dynamic sampler should reset its internal counters.
+It should be specified as a duration string, e.g. "30s" or "1m".
 
 Type: `duration`
 
@@ -930,32 +770,19 @@ Type: `duration`
 
-### FieldList
-
-A list of all the field names to use to form the key that will be
-handed to the dynamic sampler. The combination of values from all of
-these fields should reflect how interesting the trace is compared to
-another. A good field selection has consistent values for
-high-frequency, boring traffic, and unique values for outliers and
-interesting traffic. Including an error field (or something like HTTP
-status code) is an excellent choice. Using fields with very high
-cardinality (like `k8s.pod.id`), is a bad choice. If the combination
-of fields essentially makes them unique, the dynamic sampler will
-sample everything. If the combination of fields is not unique enough,
-you will not be guaranteed samples of the most interesting traces. As
-an example, consider a combination of HTTP endpoint (high-frequency
-and boring), HTTP method, and status code (normally boring but can
-become interesting when indicating an error) as a good set of fields
-since it will allowing proper sampling of all endpoints under normal
-traffic and call out when there is failing traffic to any endpoint.
-For example, in contrast, consider a combination of HTTP endpoint,
-status code, and pod id as a bad set of fields, since it would result
-in keys that are all unique, and therefore results in sampling 100% of
-traces. Using only the HTTP endpoint field would be a **bad** choice,
-as it is not unique enough and therefore interesting traces, like
-traces that experienced a `500`, might not be sampled. Field names may
-come from any span in the trace; if they occur on multiple spans, all
-unique values will be included in the key.
+### `FieldList`
+
+A list of all the field names to use to form the key that will be handed to the dynamic sampler.
+The combination of values from all of these fields should reflect how interesting the trace is compared to another.
+A good field selection has consistent values for high-frequency, boring traffic, and unique values for outliers and interesting traffic.
+Including an error field (or something like HTTP status code) is an excellent choice.
+Using fields with very high cardinality (like `k8s.pod.id`) is a bad choice.
+If the combination of fields essentially makes them unique, the dynamic sampler will sample everything.
+If the combination of fields is not unique enough, you will not be guaranteed samples of the most interesting traces.
+As an example, consider a combination of HTTP endpoint (high-frequency and boring), HTTP method, and status code (normally boring but can become interesting when indicating an error) as a good set of fields, since it will allow proper sampling of all endpoints under normal traffic and call out when there is failing traffic to any endpoint.
+In contrast, consider a combination of HTTP endpoint, status code, and pod id as a bad set of fields, since it would result in keys that are all unique, and therefore result in sampling 100% of traces.
+Using only the HTTP endpoint field would also be a **bad** choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a `500`, might not be sampled.
+Field names may come from any span in the trace; if they occur on multiple spans, all unique values will be included in the key.
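+
+As a sketch of that guidance (the span field names are illustrative):
+
+```
+# Good: low-cardinality fields that become interesting on errors
+FieldList:
+  - http.route
+  - http.method
+  - http.status_code
+
+# Bad: a pod id makes nearly every key unique, so everything is sampled
+# FieldList:
+#   - http.route
+#   - http.status_code
+#   - k8s.pod.id
+```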
 
 Type: `stringarray`
 
@@ -963,14 +790,12 @@ Type: `stringarray`
 
-### MaxKeys
+### `MaxKeys`
 
-Limits the number of distinct keys tracked by the sampler. Once
-MaxKeys is reached, new keys will not be included in the sample rate
-map, but existing keys will continue to be be counted. You can use
-this to keep the sample rate map size under control. Defaults to 500;
-dynamic samplers will rarely achieve their goals with more keys than
-this.
+Limits the number of distinct keys tracked by the sampler.
+Once MaxKeys is reached, new keys will not be included in the sample rate map, but existing keys will continue to be counted.
+You can use this to keep the sample rate map size under control.
+Defaults to 500; dynamic samplers will rarely achieve their goals with more keys than this.
 
 Type: `int`
 
@@ -978,14 +803,11 @@ Type: `int`
 
-### UseTraceLength
+### `UseTraceLength`
 
-Indicates whether to include the trace length (number of spans in the
-trace) as part of the key. The number of spans is exact, so if there
-are normally small variations in trace length you may want to leave
-this off. If traces are consistent lengths and changes in trace length
-is a useful indicator of traces you'd like to see in Honeycomb, set
-this to true.
+Indicates whether to include the trace length (number of spans in the trace) as part of the key.
+The number of spans is exact, so if there are normally small variations in trace length, you may want to leave this off.
+If traces are of consistent length and a change in trace length is a useful indicator of traces you'd like to see in Honeycomb, set this to true.
 
 Type: `bool`
 
diff --git a/tools/convert/helpers.go b/tools/convert/helpers.go
index 146501284e..7507ce4313 100644
--- a/tools/convert/helpers.go
+++ b/tools/convert/helpers.go
@@ -43,6 +43,7 @@ func helpers() template.FuncMap {
 		"split": split,
 		"wci": wci,
 		"wordwrap": wordwrap,
+		"wrapForDocs": wrapForDocs,
 		"yamlf": yamlf,
 	}
 }
@@ -375,6 +376,14 @@ func wordwrap(s string) string {
 	return strings.Join(output, "\n")
 }
 
+// wrapForDocs reflows description text into sentence-per-line form: each
+// sentence ends a line, and paragraph breaks are reduced to a single line
+// break. Markdown joins adjacent lines into one paragraph, so the sentence
+// splits do not affect rendering, while per-sentence edits produce small
+// diffs. Note: the sentence-end pattern also fires on abbreviations such
+// as "e.g.", which can split a sentence across two lines.
+func wrapForDocs(s string) string {
+	// Protect paragraph breaks from the sentence-splitting pass below.
+	paragraphBreak := regexp.MustCompile(`\n\s*\n`)
+	s = paragraphBreak.ReplaceAllString(s, "__PARAGRAPH_BREAK__")
+	// A protected paragraph break becomes a bare newline ($1 is empty).
+	sentenceEnd := regexp.MustCompile(`([.?!])\s+|__PARAGRAPH_BREAK__`)
+	s = sentenceEnd.ReplaceAllString(s, "$1\n")
+	return s
+}
+
 // simplistic YAML formatting of a value
 func yamlf(a any) string {
 	switch v := a.(type) {
diff --git a/tools/convert/templates/cfg_docfield.tmpl b/tools/convert/templates/cfg_docfield.tmpl
index 62e0093ec6..168b1a2f6e 100644
--- a/tools/convert/templates/cfg_docfield.tmpl
+++ b/tools/convert/templates/cfg_docfield.tmpl
@@ -1,11 +1,12 @@
 {{- $field := . -}}
+{{- println -}}
 ### `{{ $field.Name }}`
 
-{{ printf "%s %s" $field.Name $field.Summary | wordwrap }}
+{{ printf "%s %s" $field.Name $field.Summary }}
 
 {{ if $field.Description -}}
-{{ $field.Description | wordwrap }}
+{{ $field.Description | wrapForDocs }}
 {{- end -}}
 {{- println -}}
diff --git a/tools/convert/templates/cfg_docfile.tmpl b/tools/convert/templates/cfg_docfile.tmpl
index 589fe1f91a..b2d4f227b7 100644
--- a/tools/convert/templates/cfg_docfile.tmpl
+++ b/tools/convert/templates/cfg_docfile.tmpl
@@ -24,8 +24,7 @@ OTelMetrics:
   APIKey: SetThisToAHoneycombKey
 ```
 
-The remainder of this document describes the sections within the file and the
-fields in each.
+The remainder of this document describes the sections within the file and the fields in each.
 
 ## Table of Contents
 {{ range $file.Groups -}}
diff --git a/tools/convert/templates/cfg_docgroup.tmpl b/tools/convert/templates/cfg_docgroup.tmpl
index a6ea3b9836..262452444e 100644
--- a/tools/convert/templates/cfg_docgroup.tmpl
+++ b/tools/convert/templates/cfg_docgroup.tmpl
@@ -4,7 +4,7 @@
 
 ### Section Name: `{{ $group.Name }}`
 
-{{ $group.Description | wordwrap }}
+{{ $group.Description | wrapForDocs }}
 {{- println -}}
 {{- println -}}
diff --git a/tools/convert/templates/rules_docfield.tmpl b/tools/convert/templates/rules_docfield.tmpl
index 3a9db3d648..b3239bb43d 100644
--- a/tools/convert/templates/rules_docfield.tmpl
+++ b/tools/convert/templates/rules_docfield.tmpl
@@ -1,12 +1,12 @@
 {{- $field := . -}}
 
-### {{ $field.Name }}
+### `{{ $field.Name }}`
 
 {{ if $field.Description -}}
-{{ $field.Description | wordwrap }}
+{{ $field.Description | wrapForDocs }}
 {{- else -}}
-{{ printf "%s %s" $field.Name $field.Summary | wordwrap }}
+{{ printf "%s %s" $field.Name $field.Summary }}
 {{- end }}
diff --git a/tools/convert/templates/rules_docfile.tmpl b/tools/convert/templates/rules_docfile.tmpl
index 5bde4bef13..eea67412cf 100644
--- a/tools/convert/templates/rules_docfile.tmpl
+++ b/tools/convert/templates/rules_docfile.tmpl
@@ -27,24 +27,21 @@ Samplers:
 
 Name: `RulesVersion`
 
-This is a required parameter used to verify the version of
-the rules file. It must be set to 2.
+This is a required parameter used to verify the version of the rules file.
+It must be set to 2.
 
 Name: `Samplers`
 
-Samplers is a mapping of targets to samplers. Each target is a
-Honeycomb environment (or, for classic keys, a dataset). The value is
-the sampler to use for that target. The target called `__default__`
-will be used for any target that is not explicitly listed. A
-`__default__` target is required.
-The targets are determined by examining the API key used to send the
-trace. If the API key is a 'classic' key (which is a 32-character
-hexadecimal value), the specified dataset name is used as the target.
-If the API key is a new-style key (20-23 alphanumeric characters), the
-key's environment name is used as the target.
+Samplers is a mapping of targets to samplers.
+Each target is a Honeycomb environment (or, for classic keys, a dataset).
+The value is the sampler to use for that target.
+The target called `__default__` will be used for any target that is not explicitly listed.
+A `__default__` target is required.
+The targets are determined by examining the API key used to send the trace.
+If the API key is a 'classic' key (which is a 32-character hexadecimal value), the specified dataset name is used as the target.
+If the API key is a new-style key (20-23 alphanumeric characters), the key's environment name is used as the target.
 
-The remainder of this document describes the samplers that can be used within
-the `Samplers` section and the fields that control their behavior.
+The remainder of this document describes the samplers that can be used within the `Samplers` section and the fields that control their behavior.
 
 ## Table of Contents
 {{ range $file.Groups -}}
diff --git a/tools/convert/templates/rules_docgroup.tmpl b/tools/convert/templates/rules_docgroup.tmpl
index 12749e8055..174a616a2f 100644
--- a/tools/convert/templates/rules_docgroup.tmpl
+++ b/tools/convert/templates/rules_docgroup.tmpl
@@ -4,7 +4,7 @@
 
 ### Name: `{{ $group.Name }}`
 
-{{ $group.Description | wordwrap }}
+{{ $group.Description | wrapForDocs }}
 {{- println -}}
 {{- println -}}