# Add AvgSampleRateWithMin #123

Labels: type: enhancement (New feature or request)

Comments
TylerHelmuth added a commit that referenced this issue on Mar 22, 2023

## Which problem is this PR solving?
- Adds labels to the built images

## Short description of the changes
- Adds 3 new labels to the built images
- Tested these additions in honeycombio/honeycomb-kubernetes-agent#354. I don't really want to run the same steps for Refinery since they are risky, and Refinery's build is more complicated, so there is more risk of publishing an incorrect image.
TylerHelmuth added a commit that referenced this issue on Apr 6, 2023

## Which problem is this PR solving?
- The telemetry team is still getting tagged on Refinery dependabot PRs

Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
TylerHelmuth added a commit that referenced this issue on Jul 11, 2023

## Which problem is this PR solving?
- Related to #776

## Short description of the changes
- Updates release notes to explicitly call out the breaking metric change.
TylerHelmuth added a commit that referenced this issue on Jul 14, 2023

## Which problem is this PR solving?
- The build-docker.sh script has started including the branch name as a tag, but that breaks CI if the branch name includes a slash (`/`), which is not allowed in tags
- Unable to run CI against external PRs

## Short description of the changes
- Updated the script to replace `/` with `-` in branch names.
TylerHelmuth added a commit that referenced this issue on Jul 14, 2023

## Which problem is this PR solving?
- External PRs try to publish to ECR but fail because they don't have the secret

## Short description of the changes
- Updated ECR publishing to exclude external PRs
TylerHelmuth added a commit that referenced this issue on Jul 14, 2023

## Which problem is this PR solving?
This week, we've bumped our number of Refinery pods from 250 to 300. After that change, we've noticed a lot more timeouts from Redis.

```
time="2023-07-13T17:49:28Z" level=error msg="redis scan encountered an error" err="redis scan timeout" keys_returned=92 timeoutSec=5s
time="2023-07-13T18:39:41Z" level=error msg="get members failed during watch" error="context deadline exceeded" name="http://host:8081/" oldPeers="[list_of_the_300_hosts:8081]" timeout=5s
```

After digging into the code, we noticed that peers are stored in Redis and retrieved in batches of 10. Because there are 300 nodes, each node makes 30 requests to Redis to get all the peers, and does so twice, so 60 total. This happens every 5 seconds, so our Redis instance is handling 18,000 requests every 5 seconds.

## Short description of the changes
After a little napkin math: with 300 nodes and an average payload size of 26 bytes (that's what we see reported by `oldPeers` in our logs, anyway), Redis holds about 7.8 kB of data. That should be easily retrieved in a single request and doesn't need to be broken into batches of 10. This PR proposes bumping the batch size to 1000, which gives a total payload of about 26 kB. For 1000 nodes, that would also yield only 2000 requests to Redis every 5 seconds (with a total of 72 MB bandwidth for those 5 seconds). Leaving it as is would require 200,000 requests to Redis over 5 seconds for 1000 nodes. Getting a little extreme: for 1M nodes, the total payload would be about 26 MB and each node would make 2000 requests, so 2B requests to Redis every 5 seconds, for a total of 72 GB of data transferred (1/1000 of the total 26 MB payload, 2000 times per node). This might overwhelm Redis, but the data itself is still "pretty small".

Fixes #793

Co-authored-by: Kent Quirk <kentquirk@honeycomb.io>
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
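The request-count arithmetic above can be sketched directly. This is a minimal illustration of the napkin math, not Refinery code, and the function name is made up:

```go
package main

import "fmt"

// redisRequestsPerInterval estimates how many Redis requests a cluster
// makes per peer-refresh interval: each node scans the full peer list in
// batches of batchSize, and does so twice per interval.
func redisRequestsPerInterval(nodes, batchSize int) int {
	batchesPerScan := (nodes + batchSize - 1) / batchSize // ceiling division
	return nodes * batchesPerScan * 2
}

func main() {
	// 300 nodes, batch size 10: 30 batches per scan, 60 per node, 18000 total.
	fmt.Println(redisRequestsPerInterval(300, 10)) // 18000
	// With a batch size of 1000, each scan collapses to a single request.
	fmt.Println(redisRequestsPerInterval(1000, 1000)) // 2000
}
```

This makes the quadratic blow-up visible: request count grows with nodes × batches, so raising the batch size until one scan is one request turns it linear.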
kentquirk pushed a commit that referenced this issue on Jul 31, 2023

…` validation (#806)

## Which problem is this PR solving?
Fixes #803 by ensuring that `AvailableMemory` and `MaxAlloc` cannot both be set at the same time. `MaxMemoryPercentage` may still be set regardless, as it has a default value defined.

While making these changes I noticed that when only `AvailableMemory` was defined, I ran into a `requireWith` validation error. The validation only fails on the first pass, before defaults are loaded in from the config struct. If the failure is intentional then we should probably remove the default -- the only breaking change I could see is if someone was running with validations explicitly turned off.

```bash
$ docker-compose up
[+] Running 1/0
 ✔ Container refinery-refinery-1  Recreated  0.0s
Attaching to refinery-refinery-1
refinery-refinery-1  | 2023/07/24 07:57:40 maxprocs: Leaving GOMAXPROCS=6: CPU quota undefined
refinery-refinery-1  | Validation failed for config file /etc/refinery/config.yaml:
refinery-refinery-1  |   the group Collection includes AvailableMemory, which also requires MaxMemoryPercentage
refinery-refinery-1 exited with code 1
$ cat config.yaml
General:
  ConfigurationVersion: 2
  MinRefineryVersion: v2.0
Collection:
  AvailableMemory: 2Gb
```

## Short description of the changes
- Adds a `conflictsWith` validation type, acting in the opposite way to `requireWith`
- Modified `requireWith` to check whether or not the required field has a default value

Fixes #803.
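The shape of a `conflictsWith` check is simple to sketch. This is a hypothetical helper, not Refinery's actual validation code; modeling "is this field set" as a map is purely illustrative:

```go
package main

import "fmt"

// conflictsWith returns an error when two mutually exclusive config fields
// are both set. setFields maps a field name to whether the user set it.
// Hypothetical sketch of the validation described in the PR.
func conflictsWith(setFields map[string]bool, a, b string) error {
	if setFields[a] && setFields[b] {
		return fmt.Errorf("%s conflicts with %s; only one may be set", a, b)
	}
	return nil
}

func main() {
	set := map[string]bool{"AvailableMemory": true, "MaxAlloc": true}
	if err := conflictsWith(set, "AvailableMemory", "MaxAlloc"); err != nil {
		fmt.Println(err)
	}
}
```

The key contrast with `requireWith`: one rule fails when a companion field is missing, the other fails when it is present.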
kentquirk added a commit that referenced this issue on Dec 13, 2023

## Short description of the changes
Starting from Go 1.17, we can use `t.Setenv` to set environment variables in tests. The environment variable is automatically restored to its original value when the test and all its subtests complete. This ensures that each test does not start with leftover environment variables from previously completed tests.

Reference: https://pkg.go.dev/testing#T.Setenv

```go
func TestFoo(t *testing.T) {
	// before
	os.Setenv(key, "new value")
	defer os.Unsetenv(key)

	// after
	t.Setenv(key, "new value")
}
```

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Co-authored-by: Kent Quirk <kentquirk@honeycomb.io>
VinozzZ added a commit that referenced this issue on Feb 28, 2024

…1014)

## Which problem is this PR solving?
- Keep state
  - This PR makes sure that the keep decision is stored as part of the state transition for a given trace
  - It also adds logic to remove span information for a given trace when a decision is made
- Timestamp
  - We need to record the current time whenever a state change happens
  - The precision used before is not sufficient to determine the order of state change events
  - This PR changes the timestamp precision from seconds to nanoseconds

## Short description of the changes
- Modified `KeepTrace`
- Return traceIDs that have successfully changed their states from the `toNextState` function
- Added more tests
VinozzZ added a commit that referenced this issue on Feb 29, 2024

## Which problem is this PR solving?
This PR enables the simulator to run with a Redis backend.

## Short description of the changes
- Added opts.HnyEndpoint to enable sending traces to other Honeycomb environments
- Redis returns the status field as `float64`; handle that scenario in the decider logic.
VinozzZ added a commit that referenced this issue on Mar 1, 2024

## Which problem is this PR solving?
Before this PR, decisions made for traces were not stored in Redis. They also weren't part of the state machine when transitioning a trace from awaiting decision to either decision keep or drop.

## Short description of the changes
- Introduced DecisionDrop and DecisionKeep into the state machine
- Check the in-memory cache before fetching trace status from Redis
- Run smartwrapper tests against Redis in CI
- Removed some unnecessary code
VinozzZ added a commit that referenced this issue on Mar 4, 2024

## Which problem is this PR solving?
- Allow running the decider in parallel
- Tracing support for redis_store
- One round trip per trace state transition

## Short description of the changes
- Parallelize the `GetTrace` call
- Pass a tracer into the Redis store
- Run one Lua script per trace state transition

[Resulting Honeycomb query](https://ui.honeycomb.io/island-of-misfit-toys/environments/dynamic-scaling-refinery/datasets/refinery-store-test/result/AGisWg8rn49)
VinozzZ added a commit that referenced this issue on Mar 11, 2024

## Which problem is this PR solving?
We are using `url.Parse` for the peer address list, but `host:port` is not a valid URL.

## Short description of the changes
- Change the validation type for `PeerManagement.Peers` to `url` instead of `hostport`
- After this change, the convert tool will always convert `host:port` to `http://host:port`

Fixes #1044
kentquirk pushed a commit that referenced this issue on Mar 15, 2024
kentquirk pushed a commit that referenced this issue on Mar 15, 2024
kentquirk pushed a commit that referenced this issue on Mar 15, 2024
kentquirk pushed a commit that referenced this issue on Mar 15, 2024
tdarwin pushed a commit to tdarwin/refinery that referenced this issue on Jun 26, 2024

## Which problem is this PR solving?
GetTrace was costing us a round trip to Redis for each trace. It's not as performant as we want it to be.

## Short description of the changes
- Use pipelining to send all GetTrace requests in a single batch
- Change the encoding method from JSON to snappy for trace metadata and span data
tdarwin pushed a commit to tdarwin/refinery that referenced this issue on Jun 26, 2024

Optimize Redis query performance by using pipelining, as suggested by the Redis docs: https://redis.io/docs/latest/develop/use/pipelining/

- Implement pipelining for the `HGETALL` command
- Implement `FanoutChunksToMap` to allow using the fan-out pattern for batch processing

Co-authored-by: Kent Quirk <kentquirk@honeycomb.io>
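The batching half of that fan-out pattern boils down to splitting a key list into fixed-size chunks, where each chunk then becomes one pipelined round trip. This is an illustrative helper only; the real `FanoutChunksToMap` also handles worker fan-out and result merging:

```go
package main

import "fmt"

// chunk splits keys into batches of at most n items, so each batch can be
// sent to Redis as a single pipelined round trip. Illustrative sketch,
// not Refinery's actual implementation.
func chunk(keys []string, n int) [][]string {
	var out [][]string
	for len(keys) > 0 {
		end := n
		if len(keys) < n {
			end = len(keys)
		}
		out = append(out, keys[:end])
		keys = keys[end:]
	}
	return out
}

func main() {
	keys := []string{"t1", "t2", "t3", "t4", "t5"}
	for _, batch := range chunk(keys, 2) {
		// In the real code, each batch would become one pipelined set of
		// HGETALL commands followed by a single flush.
		fmt.Println(batch)
	}
}
```

Pipelining pays off because the cost being amortized is network round trips, not command execution, so fewer, larger batches win as long as each batch stays small enough not to stall the server.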
tdarwin pushed a commit to tdarwin/refinery that referenced this issue on Jun 26, 2024

…oneycombio#1188)

## Which problem is this PR solving?
We found the keep-trace Lua script in Redis' slow query log. It turns out we don't actually need to store those fields separately, since they don't change once they are set. By storing them together, we can eliminate the need for the Lua script.

## Short description of the changes
- Define a `KeepRecord` field in the data structure we use to store trace status in Redis
- Encode `centralTraceStatusReason` into one JSON blob
- Change the Redis code to allow easier pipelining of commands

TODO: Since we have changed most of our redis_store code to use pipelining when communicating with Redis, I will make another PR to clean up the redis.go code and enable an easier interface for using pipelining.
tdarwin pushed a commit to tdarwin/refinery that referenced this issue on Jun 26, 2024

## Which problem is this PR solving?
We have converted a lot of code to use the pipelining pattern for communication with Redis. This PR makes it easier for callers of the redis package to use the pattern.

## Short description of the changes
- Return a command object from each Redis call, so the caller controls whether to buffer the command or send the query to Redis
- Remove unused code
VinozzZ added a commit that referenced this issue on Jul 2, 2024

## Which problem is this PR solving?
Enable OTel tracing for debugging Refinery internal operations.

## Short description of the changes
- Add OTel tracing configurations
- Inject a tracer on Refinery startup
- Update `github.com/honeycombio/opentelemetry-proto-go/otlp` to v1.3.1

Co-authored-by: Kent Quirk <kentquirk@honeycomb.io>
VinozzZ added a commit that referenced this issue on Jul 3, 2024

## Which problem is this PR solving?
`OtelTracing.APIKey` is specifically for setting a Honeycomb API key and header when setting up a tracer. When it's not set and `OtelTracing.Enabled` is true, Refinery should still set up a tracer that can be used to send Refinery traces to a non-Honeycomb backend.

## Short description of the changes
- Only set APIKey headers when APIKey is not empty
- Add some basic instrumentation
VinozzZ added a commit that referenced this issue on Jul 17, 2024

…1228)

## Which problem is this PR solving?
This Dockerfile is used only for local development, and most of us actively working on this project are now on ARM machines. Let's make it easier to test Refinery locally.

## Short description of the changes
- Change the targeted architecture to ARM
VinozzZ added a commit that referenced this issue on Jul 18, 2024

## Which problem is this PR solving?
Add instrumentation for GoRedisPubSub to gain more insight into how the Redis pub/sub system works.

## Short description of the changes
- Add a tracer to `GoRedisPubSub`
- Fix tests
- Define a type for the pub/sub subscription callback function
VinozzZ added a commit that referenced this issue on Jul 18, 2024

## Which problem is this PR solving?
Refinery panics on start.

## Short description of the changes
- Inject `Health` as a pointer
VinozzZ added a commit that referenced this issue on Jul 22, 2024

## Which problem is this PR solving?
The maintenance effort required to keep semantic conventions in sync between `otel-config-go` and `otel` has led us to remove `otel-config-go` as a dependency.

## Short description of the changes
- Configure OTel using the vanilla OTel SDK
VinozzZ added a commit that referenced this issue on Jul 23, 2024

## Which problem is this PR solving?
Enable stress relief management for the Refinery cluster as a whole, rather than only for individual instances.

## Short description of the changes
- Broadcast individual stress levels through pub/sub to other peers
- Calculate the cluster stress level when an individual stress level changes, or on a timer
- During stress relief, deterministically sample a portion of traces based on the current cluster stress level
- Add tests
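Sampling "a portion of traces based on the current cluster stress level" deterministically can be sketched by hashing the trace ID against a threshold derived from the stress level. This is a minimal illustration of the idea, not Refinery's exact hashing scheme:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shouldDeterministicallySample decides whether a trace is kept during
// stress relief: the higher the cluster stress level (0-100), the larger
// the share of trace IDs that fall under the threshold. Every node makes
// the same decision for the same trace ID, with no coordination needed.
// Illustrative sketch only.
func shouldDeterministicallySample(traceID string, stressLevel uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(traceID))
	// Map the hash onto 0-99 and compare against the stress level.
	return h.Sum32()%100 < stressLevel
}

func main() {
	// At stress level 0, nothing is deterministically sampled...
	fmt.Println(shouldDeterministicallySample("trace-abc", 0)) // false
	// ...and at 100, everything is.
	fmt.Println(shouldDeterministicallySample("trace-abc", 100)) // true
}
```

The deterministic hash is what lets every peer agree on a trace's fate without exchanging per-trace decisions; only the scalar stress level travels over pub/sub.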
VinozzZ added a commit that referenced this issue on Jul 23, 2024

## Which problem is this PR solving?
Make sure the subscriber and publisher use the same channel for stress relief.

## Short description of the changes
- Define a constant for the stress relief pub/sub topic
- Use the constant in tests and the stress relief code
kentquirk pushed a commit that referenced this issue on Jul 24, 2024

## Which problem is this PR solving?
Config and PubSub need injection tags for the injection framework.

## Short description of the changes
- Add injection tags
VinozzZ
added a commit
that referenced
this issue
Jul 24, 2024
## Which problem is this PR solving?

We don't need to publish the stress level if it hasn't changed. This reduces the number of pub/sub messages transmitted in the cluster.

## Short description of the changes

- Made the `stressReliefMessage` struct more memory efficient
- Added a counter to track how long it has been since the last stress level publish
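The publish-on-change idea can be sketched like this: skip broadcasts for unchanged levels, but force one after a few consecutive skips so peers still hear from the node periodically. Names and the re-publish policy are assumptions, not Refinery's actual code.

```go
package main

import "fmt"

// publisher skips broadcasting a stress level that hasn't changed, but
// force-publishes after maxSkips consecutive skips so other peers still
// receive a heartbeat of this node's level. Hypothetical sketch.
type publisher struct {
	last     uint8
	skipped  int
	maxSkips int
}

func (p *publisher) shouldPublish(level uint8) bool {
	if level != p.last || p.skipped >= p.maxSkips {
		p.last = level
		p.skipped = 0
		return true
	}
	p.skipped++
	return false
}

func main() {
	p := &publisher{maxSkips: 3}
	for _, lvl := range []uint8{10, 10, 10, 10, 10} {
		fmt.Println(lvl, p.shouldPublish(lvl))
	}
}
```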
VinozzZ
added a commit
that referenced
this issue
Jul 24, 2024
## Which problem is this PR solving?

Update the docs with the latest config changes.

Co-authored-by: Kent Quirk <kentquirk@honeycomb.io>
VinozzZ
added a commit
that referenced
this issue
Jul 25, 2024
## Which problem is this PR solving?

`DeterministicSharder` wasn't updated to use the new peer identification system to determine which peer a given trace belongs to. As a result, Refinery couldn't identify itself in the peer list within the sharding system.

## Short description of the changes

- Consolidate and unify the code for retrieving peer identity
- Replace `:` with `|` as the delimiter in the stress relief message
- Make sure `filepeer` returns `PeerListenAddr` as its public address
VinozzZ
added a commit
that referenced
this issue
Jul 25, 2024
## Which problem is this PR solving?

A print line was accidentally left in from the previous PR.

## Short description of the changes

- Remove the `fmt.Println`
VinozzZ
added a commit
that referenced
this issue
Jul 26, 2024
…icant load increase (#1256)

## Which problem is this PR solving?

Without resolving the trace locality issue, a single peer can receive a large trace that raises its stress level significantly above the rest of the cluster. To address this, we allow individual refineries to go into stress relief mode when their own stress is too high, even if the cluster's isn't.

## Short description of the changes

- Use the max of the individual stress level and the cluster stress level as the overall stress level when calculating stress relief activation and deactivation
- Record the stress level that determined stress relief activation as the `stress_level` metric
- Add tests
VinozzZ
added a commit
that referenced
this issue
Aug 1, 2024
## Which problem is this PR solving?

In order to recalculate new destinations for traces as soon as possible during shutdown, we need to announce peer unregistration first thing.

## Short description of the changes

- Use the `done` channel from catching the termination signal in main.go in peer management

Co-authored-by: Yingrong Zhao <yingrongzhao@yingrongzhao.attlocal.net>
VinozzZ
added a commit
that referenced
this issue
Aug 9, 2024
## Which problem is this PR solving?

When a node leaves the cluster, gossip the news, then calculate new destinations and forward the traces that node was holding on to. This makes removing a node from the Refinery cluster much less disruptive than it is today.

## Short description of the changes

- During shutdown, the collector first stops accepting new traces by exiting the `collect()` function.
- Traces are distributed to three Go channels based on their state:
  - late spans
  - traces that have already received their root span or have expired past their TraceTimeout
  - traces whose decisions can't be made yet
- A goroutine grabs items off the three channels. Late spans, expired traces, and traces with a root span are sent to Honeycomb; traces without decisions are forwarded to their new home (another Refinery peer).

Also added unit tests for the distribution logic, along with a counter to monitor the number of traces that go through each state.

Addresses #1204

Co-authored-by: Yingrong Zhao <yingrongzhao@yingrongzhao.attlocal.net>
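The send-versus-forward split described above can be sketched as a small classifier. Field and queue names are illustrative, and the separate late-span queue mentioned in the PR is omitted for brevity; this is not Refinery's actual shutdown code.

```go
package main

import "fmt"

// trace holds the minimal state needed to route an in-flight trace at
// shutdown. Field names here are assumptions for illustration.
type trace struct {
	hasRootSpan bool
	expired     bool // past its TraceTimeout
}

// classify routes a trace to a shutdown queue: traces whose sampling
// decision can be made locally (root span arrived, or timeout expired)
// are sent to Honeycomb; the rest are forwarded to the peer that now
// owns them after the membership change.
func classify(t trace) string {
	if t.hasRootSpan || t.expired {
		return "send"
	}
	return "forward"
}

func main() {
	fmt.Println(classify(trace{hasRootSpan: true}))
	fmt.Println(classify(trace{}))
}
```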
VinozzZ
added a commit
that referenced
this issue
Aug 14, 2024
## Which problem is this PR solving?

Draining the incoming and peer queues on peer membership changes (#1258).

## Short description of the changes

This PR introduces a `RedistributeNotifier` in the collector:

- It's initialized on Refinery startup
- When a peer membership change is broadcast, the notifier immediately signals the `collect()` goroutine to start the redistribution process
- When multiple peer membership changes happen in a row, it ignores the signal if a redistribution process is already running
VinozzZ
added a commit
that referenced
this issue
Aug 14, 2024
## Which problem is this PR solving?

The sample rate is not recorded in the trace decision cache during stress relief. This could cause odd sample rate data for spans that arrive after Refinery has exited stress relief mode.

Fixes #1271

## Short description of the changes

- Record the sample rate in the decision cache
- Add tests
- Make sure the collector uses clockwork.Clock in all places
VinozzZ
added a commit
that referenced
this issue
Aug 15, 2024
## Which problem is this PR solving?

To reduce the cardinality of sampler keys for dynamic samplers in Refinery, it's helpful to be able to select fields only on root spans.

Fixes #1130

## Short description of the changes

- Parse the field list on dynamic sampler startup to see if any fields have a `root.` prefix
- Store root-span-only fields separately from the fields that apply to all spans in `traceKey`
- Loop through both field lists to make sure keys are correctly populated
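The `root.` prefix parsing step can be sketched as below; the function name is hypothetical, not the actual `traceKey` code.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFields separates sampler key fields into those that should be read
// only from the root span (prefixed with "root.", prefix stripped) and
// those that apply to every span. Minimal sketch of the parsing described
// above, not Refinery's implementation.
func splitFields(fields []string) (rootOnly, allSpans []string) {
	for _, f := range fields {
		if name, ok := strings.CutPrefix(f, "root."); ok {
			rootOnly = append(rootOnly, name)
		} else {
			allSpans = append(allSpans, f)
		}
	}
	return rootOnly, allSpans
}

func main() {
	rootOnly, allSpans := splitFields([]string{"root.http.status_code", "service.name"})
	fmt.Println(rootOnly, allSpans)
}
```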
VinozzZ
added a commit
that referenced
this issue
Aug 15, 2024
## Which problem is this PR solving?

When running in peer mode, the load may be low enough for just one Refinery instance. We should allow a scale-down event to happen even if there's only one Refinery left.

## Short description of the changes

- Change the condition for running the redistribution logic to require only one peer in the peer list
- Add a `trace_redistribution_count` metric to track how often a redistribution event is triggered
VinozzZ
added a commit
that referenced
this issue
Aug 16, 2024
## Which problem is this PR solving?

Currently, we are not setting the TLS config for go-redis even though we have a `UseTLS` configuration option.

## Short description of the changes

- Log pub/sub publish errors
- Configure TLS on the Redis client
VinozzZ
added a commit
that referenced
this issue
Aug 19, 2024
## Which problem is this PR solving?

We should not set the send reason to span limit if the user didn't specify the span limit config.

## Short description of the changes

- Only set the send reason to span limit if the span limit config is non-zero
VinozzZ
added a commit
that referenced
this issue
Aug 19, 2024
Currently, we are not setting the TLS config for go-redis even though we have a `UseTLS` configuration option.

- Log pub/sub publish errors
- Configure TLS on the Redis client
VinozzZ
added a commit
that referenced
this issue
Aug 19, 2024
Currently, we are not setting the TLS config for go-redis even though we have a `UseTLS` configuration option.

- Log pub/sub publish errors
- Configure TLS on the Redis client
VinozzZ
added a commit
that referenced
this issue
Aug 22, 2024
## Which problem is this PR solving?

To support Redis instances in cluster mode, we need to provide a way for users to configure Refinery accordingly.

Fixes #1286

## Short description of the changes

- Add a new config option, `ClusterHosts`, under `RedisPeerManagement`
- Consolidate the `RedisPeerManagementConfig` getters into one
- Add support for `env-delim:","` in `CmdEnv`
- Add tests
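The `env-delim:","` behavior amounts to splitting one environment variable into a list. A minimal sketch, assuming a hypothetical helper name (this is not the actual CmdEnv implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// parseClusterHosts splits a comma-delimited environment variable value
// into individual host entries, trimming incidental whitespace. This
// mirrors what an `env-delim:","` struct tag does for a []string field.
func parseClusterHosts(raw string) []string {
	if raw == "" {
		return nil
	}
	parts := strings.Split(raw, ",")
	for i, p := range parts {
		parts[i] = strings.TrimSpace(p)
	}
	return parts
}

func main() {
	fmt.Println(parseClusterHosts("redis-0:6379, redis-1:6379"))
}
```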
VinozzZ
added a commit
that referenced
this issue
Aug 22, 2024
## Which problem is this PR solving?

Refinery converts SpanEvents and SpanLinks to individual events, just like regular spans. We should use DescendantCount, rather than only the span count, to check against SpanLimit.

## Short description of the changes

- Use DescendantCount to check against SpanLimit
- Modify a test to explicitly check for the span limit trace send reason
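Combined with the earlier fix that ignores a zero span limit, the check can be sketched as below; the function name is illustrative, not Refinery's actual code.

```go
package main

import "fmt"

// exceedsSpanLimit checks a trace's DescendantCount (spans plus span
// events and span links) against SpanLimit, treating a zero limit as
// "no limit configured" per the earlier PR. Illustrative sketch.
func exceedsSpanLimit(descendantCount, spanLimit int) bool {
	return spanLimit > 0 && descendantCount > spanLimit
}

func main() {
	fmt.Println(exceedsSpanLimit(150, 100), exceedsSpanLimit(150, 0))
}
```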
I was looking to use this sampler from https://github.com/honeycombio/dynsampler-go -- however, it doesn't seem to be currently recognized by samproxy.