Add AvgSampleRateWithMin #123

Closed
lewisf opened this issue Sep 1, 2020 · 0 comments
Labels
type: enhancement New feature or request

Comments

lewisf commented Sep 1, 2020

I was looking to use this sampler from https://github.com/honeycombio/dynsampler-go -- however, it doesn't seem to be currently recognized by samproxy.

@MikeGoldsmith MikeGoldsmith added the type: enhancement New feature or request label Sep 23, 2021
TylerHelmuth added a commit that referenced this issue Mar 22, 2023

## Which problem is this PR solving?

- Adds labels to the built images

## Short description of the changes

- adds 3 new labels to the built images
- tested these additions in honeycombio/honeycomb-kubernetes-agent#354. I don't want to run the same steps for Refinery: they are risky, and Refinery's build is more complicated, so there is more risk of publishing an incorrect image.
TylerHelmuth added a commit that referenced this issue Apr 6, 2023

## Which problem is this PR solving?

- the telemetry team is still getting tagged on refinery dependabot PRs

Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
TylerHelmuth added a commit that referenced this issue Jul 11, 2023

## Which problem is this PR solving?

- Related to #776

## Short description of the changes

- Updates release notes to explicitly call out the breaking metric change.
TylerHelmuth added a commit that referenced this issue Jul 14, 2023

## Which problem is this PR solving?

- the build-docker.sh script has started including the branch name as a tag, but that breaks CI if the branch name includes a slash (`/`), which is not allowed in tags
- CI cannot be run against external PRs

## Short description of the changes

- Updated the script to replace `/` with `-` in branch names.
TylerHelmuth added a commit that referenced this issue Jul 14, 2023

## Which problem is this PR solving?

- external PRs try to publish to ECR but fail because they don't have the secret

## Short description of the changes

- updated ECR publishing to exclude external PRs
TylerHelmuth added a commit that referenced this issue Jul 14, 2023

## Which problem is this PR solving?

This week, we've bumped up our number of refinery pods from 250 to 300.
After that change, we've noticed a lot more timeouts from redis.

```
time="2023-07-13T17:49:28Z" level=error msg="redis scan encountered an error" err="redis scan timeout" keys_returned=92 timeoutSec=5s
time="2023-07-13T18:39:41Z" level=error msg="get members failed during watch" error="context deadline exceeded" name="http://host:8081/" oldPeers="[list_of_the_300_hosts:8081]" timeout=5s
```

After digging into some of the code, we've noticed that peers are stored in redis and are retrieved in batches of 10. Because there are 300 nodes, each node makes 30 requests to redis to get all the peers, and does it twice, so 60 total. This happens every 5 seconds, so our redis instance is handling 18,000 requests every 5 seconds.

## Short description of the changes

After a little bit of napkin math, with 300 nodes and an average payload size of 26 bytes (that's what we see reported by `oldPeers` in our logs, anyway), redis holds about 7.8kb of data. That should be very easily retrieved in a single request and doesn't need to be broken into batches of 10.

This PR proposes bumping the batch size up to 1000. That would give us a total payload of about 26kb. For 1000 nodes, that would also only yield 2000 requests to redis every 5 seconds (total 72mb bandwidth for those 5 seconds, too). Leaving it as is would require 200,000 requests to redis over 5 seconds for 1000 nodes.

Getting a little extreme, for 1M nodes, that would give a total payload of about 26MB and yield 2000 requests per node, so 2B requests to redis every 5 seconds, for a total of 72GB data transferred (1/1000 of the total payload 26MB, 2000 times). This might kill redis, but the data is still "pretty small".

Fixes #793

---------

Co-authored-by: Kent Quirk <kentquirk@honeycomb.io>
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
kentquirk pushed a commit that referenced this issue Jul 31, 2023
…` validation (#806)


## Which problem is this PR solving?

Fixes #803 by ensuring that `AvailableMemory` and `MaxAlloc` cannot both be set at the same time. `MaxMemoryPercentage` may still be set regardless, as it has a default value defined.

While making these changes, I noticed that when only `AvailableMemory` was defined, I ran into a validation `requireWith` error. The validation only fails on the first pass, before defaults are loaded in from the config struct. If the failure is intentional then we should probably remove the default -- the only breaking change I could see is if someone was running with validations turned off explicitly.

```bash
 $ docker-compose up
[+] Running 1/0
 ✔ Container refinery-refinery-1  Recreated                                                                   0.0s
Attaching to refinery-refinery-1
refinery-refinery-1  | 2023/07/24 07:57:40 maxprocs: Leaving GOMAXPROCS=6: CPU quota undefined
refinery-refinery-1  | Validation failed for config file /etc/refinery/config.yaml:
refinery-refinery-1  |   the group Collection includes AvailableMemory, which also requires MaxMemoryPercentage
refinery-refinery-1  |
refinery-refinery-1  |
refinery-refinery-1 exited with code 1
$ cat config.yaml 
General:
  ConfigurationVersion: 2
  MinRefineryVersion: v2.0


Collection:
  AvailableMemory: 2Gb

```



## Short description of the changes

- Adds a `conflictsWith` validation type acting in the opposite way to `requireWith` (sketched below)
- Modified `requireWith` to check whether or not the required field has a default value

Fixes #803.
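
A minimal sketch of the idea behind `conflictsWith`, not the actual validator implementation (Refinery's validation runs over its config metadata; the map-based form and field names here are illustrative):

```go
package main

import "fmt"

// conflictsWith fails when both fields are present at once, mirroring
// requireWith, which fails when a field is present without its companion.
func conflictsWith(settings map[string]any, field, conflicting string) error {
	_, hasField := settings[field]
	_, hasConflicting := settings[conflicting]
	if hasField && hasConflicting {
		return fmt.Errorf("%s and %s cannot both be set at the same time", field, conflicting)
	}
	return nil
}

func main() {
	collection := map[string]any{"AvailableMemory": "2Gb", "MaxAlloc": 1 << 30}
	fmt.Println(conflictsWith(collection, "AvailableMemory", "MaxAlloc"))
}
```
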
kentquirk added a commit that referenced this issue Dec 13, 2023

## Which problem is this PR solving?

## Short description of the changes

Starting from Go 1.17, we can use `t.Setenv` to set an environment variable in a test. The environment variable is automatically restored to its original value when the test and all its subtests complete. This ensures that each test does not start with leftover environment variables from previously completed tests.

Reference: https://pkg.go.dev/testing#T.Setenv

```go
func TestFoo(t *testing.T) {
	// before
	os.Setenv(key, "new value")
	defer os.Unsetenv(key)
	
	// after
	t.Setenv(key, "new value")
}
```

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Co-authored-by: Kent Quirk <kentquirk@honeycomb.io>
VinozzZ added a commit that referenced this issue Feb 28, 2024
…1014)


## Which problem is this PR solving?
- Keep state
    - this PR makes sure that the keep decision is stored as part of the state transition for a given trace
    - it also adds logic to remove span information for a given trace when a decision is made
- Timestamp
    - we need to record the current time whenever a state change happens
    - the precision used before is not sufficient to determine the order of state change events
    - this PR changes the timestamp precision from second to nanosecond (see the sketch at the end of this message)

## Short description of the changes

- Modified `KeepTrace`
- return traceIDs that have successfully changed their states from the `toNextState` function
- added more tests
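
A small illustration with the standard library of why the precision change matters: two state changes inside the same second collapse to one Unix-seconds timestamp, while nanoseconds keep them ordered.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	first := time.Date(2024, 2, 28, 12, 0, 0, 0, time.UTC)
	second := first.Add(300 * time.Microsecond) // a later state change in the same second

	fmt.Println(first.Unix() == second.Unix())        // true: second precision loses the ordering
	fmt.Println(first.UnixNano() < second.UnixNano()) // true: nanosecond precision preserves it
}
```
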
VinozzZ added a commit that referenced this issue Feb 29, 2024

## Which problem is this PR solving?

This PR enables the simulator to run with a redis backend.

## Short description of the changes

- added opts.HnyEndpoint to enable sending traces to other honeycomb
environments
- redis returns the status field as `float64`. Handle that scenario in
the decider logic.
VinozzZ added a commit that referenced this issue Mar 1, 2024

## Which problem is this PR solving?

Before this PR, decisions made for traces were not stored in redis, and they weren't part of the state machine when transitioning a trace from awaiting decision to either decision keep or drop.

## Short description of the changes

- Introduced DecisionDrop and DecisionKeep into the state machine
- check the in-memory cache before fetching trace status from Redis
- run smartwrapper tests against redis in CI
- removed some unnecessary code
VinozzZ added a commit that referenced this issue Mar 4, 2024

## Which problem is this PR solving?

- allow running decider in parallel
- tracing support for redis_store
- one round trip per trace state transition

## Short description of the changes

- parallelize `GetTrace` call
- pass tracer into redis store
- running one lua script for a trace state transition

[Result honeycomb query](https://ui.honeycomb.io/island-of-misfit-toys/environments/dynamic-scaling-refinery/datasets/refinery-store-test/result/AGisWg8rn49)
VinozzZ added a commit that referenced this issue Mar 11, 2024

## Which problem is this PR solving?

We are using `url.Parse` for the peer address list, and `host:port` is not a valid URL.

## Short description of the changes

- Change validation type for `PeerManagement.Peers` to `url` instead of
`hostport`
- After this change, the convert tool will always convert `host:port` to
`http://host:port`

fix: #1044
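
A quick demonstration of the underlying `net/url` behavior: in the `host:port` form, everything before the first colon parses as a scheme, so the host comes back empty.

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	u, err := url.Parse("myhost:8081")
	fmt.Printf("scheme=%q host=%q err=%v\n", u.Scheme, u.Host, err) // scheme="myhost" host="" err=<nil>

	u, err = url.Parse("http://myhost:8081")
	fmt.Printf("scheme=%q host=%q err=%v\n", u.Scheme, u.Host, err) // scheme="http" host="myhost:8081" err=<nil>
}
```
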
kentquirk pushed a commit that referenced this issue Mar 15, 2024
…1014)


## Which problem is this PR solving?
- Keep state
    - this PR makes sure that the keep decision is stored as part of the state transition for a given trace
    - it also adds logic to remove span information for a given trace when a decision is made
- Timestamp
    - we need to record the current time whenever a state change happens
    - the precision used before is not sufficient to determine the order of state change events
    - this PR changes the timestamp precision from second to nanosecond

## Short description of the changes

- Modified `KeepTrace`
- return traceIDs that have successfully changed their states from the `toNextState` function
- added more tests
kentquirk pushed a commit that referenced this issue Mar 15, 2024

## Which problem is this PR solving?

This PR enables the simulator to run with a redis backend.

## Short description of the changes

- added opts.HnyEndpoint to enable sending traces to other honeycomb
environments
- redis returns the status field as `float64`. Handle that scenario in
the decider logic.
kentquirk pushed a commit that referenced this issue Mar 15, 2024

## Which problem is this PR solving?

Before this PR, decisions made for traces were not stored in redis, and they weren't part of the state machine when transitioning a trace from awaiting decision to either decision keep or drop.

## Short description of the changes

- Introduced DecisionDrop and DecisionKeep into the state machine
- check the in-memory cache before fetching trace status from Redis
- run smartwrapper tests against redis in CI
- removed some unnecessary code
kentquirk pushed a commit that referenced this issue Mar 15, 2024

## Which problem is this PR solving?

- allow running decider in parallel
- tracing support for redis_store
- one round trip per trace state transition

## Short description of the changes

- parallelize `GetTrace` call
- pass tracer into redis store
- running one lua script for a trace state transition

[Result honeycomb query](https://ui.honeycomb.io/island-of-misfit-toys/environments/dynamic-scaling-refinery/datasets/refinery-store-test/result/AGisWg8rn49)
tdarwin pushed a commit to tdarwin/refinery that referenced this issue Jun 26, 2024

## Which problem is this PR solving?

GetTrace was costing us a round trip to redis for each trace. It's not as performant as we want it to be.

## Short description of the changes

- Use pipelining to send all GetTrace requests in a single batch (see the sketch below)
- change encoding method from json to snappy for trace metadata and span data
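
A minimal sketch of the pattern, assuming the redigo client and `github.com/golang/snappy`; the key layout is illustrative, not Refinery's actual schema.

```go
package redisstore

import (
	"github.com/golang/snappy"
	"github.com/gomodule/redigo/redis"
)

// getTraces queues one read per trace, flushes them all in a single round
// trip, then decodes the snappy-compressed values in order.
func getTraces(conn redis.Conn, traceIDs []string) ([][]byte, error) {
	for _, id := range traceIDs {
		if err := conn.Send("GET", "trace:"+id); err != nil {
			return nil, err
		}
	}
	if err := conn.Flush(); err != nil {
		return nil, err
	}

	results := make([][]byte, 0, len(traceIDs))
	for range traceIDs {
		compressed, err := redis.Bytes(conn.Receive())
		if err != nil {
			return nil, err
		}
		decoded, err := snappy.Decode(nil, compressed) // values are stored snappy-encoded, not json
		if err != nil {
			return nil, err
		}
		results = append(results, decoded)
	}
	return results, nil
}
```
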
tdarwin pushed a commit to tdarwin/refinery that referenced this issue Jun 26, 2024

Optimize redis query performance by using pipelining, as suggested by the redis docs: https://redis.io/docs/latest/develop/use/pipelining/

- Implement pipelining for `HGETALL` command
- Implement `FanoutChunksToMap` to allow using fanout pattern for batch
processing

---------

Co-authored-by: Kent Quirk <kentquirk@honeycomb.io>
tdarwin pushed a commit to tdarwin/refinery that referenced this issue Jun 26, 2024
…oneycombio#1188)


## Which problem is this PR solving?

We found the keep trace lua script in redis' slow query log. It turns out we don't actually need to store those fields separately, since they don't change once they are set. By storing them together, we can eliminate the need for a lua script.

## Short description of the changes

- Define a `KeepRecord` field in the data structure we use to store trace status in redis
- encode `centralTraceStatusReason` into one json blob
- change redis to allow easier pipelining of commands

TODO:
Since we have changed most of our redis_store code to use pipelining when communicating with redis, I will make another PR to clean up the redis.go code to enable an easier interface for using pipelining.
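
A sketch of the `KeepRecord` idea (field and key names here are illustrative, not the actual struct): once a keep decision is made, its fields never change, so they can be marshaled into one JSON blob and written with a single HSET, with no Lua script needed to update fields atomically.

```go
package redisstore

import (
	"encoding/json"

	"github.com/gomodule/redigo/redis"
)

// keepRecord bundles the decision fields that are written once and never
// mutated afterwards.
type keepRecord struct {
	Rate       uint   `json:"rate"`
	KeepReason string `json:"keep_reason"`
	SendReason string `json:"send_reason"`
}

func storeKeepRecord(conn redis.Conn, traceID string, rec keepRecord) error {
	blob, err := json.Marshal(rec)
	if err != nil {
		return err
	}
	// One immutable hash field instead of several independently updated ones.
	_, err = conn.Do("HSET", "trace:"+traceID, "keep_record", blob)
	return err
}
```
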
tdarwin pushed a commit to tdarwin/refinery that referenced this issue Jun 26, 2024

## Which problem is this PR solving?

We have converted a lot of code to use the pipelining pattern for communication with Redis. This PR is to allow callers of the redis package to have an easier time using the pattern.

## Short description of the changes

- return a command object from each redis call so that the caller has control over whether to buffer or send a query to redis
- remove unused code
VinozzZ added a commit that referenced this issue Jul 2, 2024

## Which problem is this PR solving?

Enable otel tracing for debugging Refinery internal operations.

## Short description of the changes

- Add otel tracing configurations
- inject a tracer on Refinery startup
- Update `github.com/honeycombio/opentelemetry-proto-go/otlp` to v1.3.1

---------

Co-authored-by: Kent Quirk <kentquirk@honeycomb.io>
VinozzZ added a commit that referenced this issue Jul 3, 2024

## Which problem is this PR solving?
`OtelTracing.APIKey` is specifically for setting a honeycomb api key and header when setting up a tracer. When it's not set and `OtelTracing.Enabled` is set to true, Refinery should still set up a tracer that can be used to send Refinery traces to a non-honeycomb backend.

## Short description of the changes

- Only set APIKey headers when APIKey is not empty
- add some basic instrumentation
VinozzZ added a commit that referenced this issue Jul 17, 2024
…1228)


## Which problem is this PR solving?

This Dockerfile is used only for local development. Most of us who are actively working on this project are now on ARM machines, so let's make it easier to test refinery locally.

## Short description of the changes

- Change targeted architecture to ARM
VinozzZ added a commit that referenced this issue Jul 18, 2024

## Which problem is this PR solving?

Add instrumentation for GoRedisPubSub to gain more insight into how the redis pubsub system works.

## Short description of the changes

- add tracer to `GoRedisPubSub`
- fix tests
- define a type for the pubsub subscription callback function
VinozzZ added a commit that referenced this issue Jul 18, 2024

## Which problem is this PR solving?

Refinery panics on start
 
## Short description of the changes

- inject `Health` as a pointer
VinozzZ added a commit that referenced this issue Jul 22, 2024

## Which problem is this PR solving?

The maintenance effort required to keep semantic conventions in sync between `otel-config-go` and `otel` has led us to remove `otel-config-go` as a dependency.

## Short description of the changes

- configure otel using vanilla otel sdk
VinozzZ added a commit that referenced this issue Jul 23, 2024

## Which problem is this PR solving?

Enable stress relief management for the Refinery cluster as a whole rather than only for individual instances.

## Short description of the changes

- broadcast individual stress levels through pub/sub to other peers
- calculate the cluster stress level when an individual stress level changes or on a timer
- during stress relief, deterministically sample a portion of traces based on the current cluster stress level (see the sketch below)
- add tests
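
A minimal sketch of deterministic sampling from a stress level (the hash choice and percentage mapping are illustrative assumptions, not Refinery's actual code): because the decision is a pure function of the trace ID, every peer reaches the same keep/drop answer without coordinating.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// keepUnderStress deterministically keeps a shrinking portion of traces as
// clusterStressLevel (0-100) rises; the same trace ID always maps to the
// same decision on every peer.
func keepUnderStress(traceID string, clusterStressLevel uint32) bool {
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return h.Sum32()%100 >= clusterStressLevel
}

func main() {
	fmt.Println(keepUnderStress("trace-abc123", 80)) // same answer on every node
}
```
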
VinozzZ added a commit that referenced this issue Jul 23, 2024

## Which problem is this PR solving?

Make sure the subscriber and publisher use the same channel for stress relief.

## Short description of the changes

- define a constant for stress relief pub/sub topic
- use the constant in test and stress relief code
kentquirk pushed a commit that referenced this issue Jul 24, 2024

## Which problem is this PR solving?

Config and PubSub need injection tags for the injection framework.

## Short description of the changes

- add injection tag
VinozzZ added a commit that referenced this issue Jul 24, 2024

## Which problem is this PR solving?

We don't need to publish the stress level if it hasn't changed. This will help reduce the number of pub/sub messages being transmitted in the cluster.

## Short description of the changes

- made the `stressReliefMessage` struct more memory efficient
- add a counter for tracking how long it has been since the last stress level publish
VinozzZ added a commit that referenced this issue Jul 24, 2024

## Which problem is this PR solving?

update doc with the latest config changes

## Short description of the changes

-

---------

Co-authored-by: Kent Quirk <kentquirk@honeycomb.io>
VinozzZ added a commit that referenced this issue Jul 25, 2024

## Which problem is this PR solving?

`DeterministicSharder` wasn't updated to use the new peer identification system to determine which peer a given trace belongs to. As a result, Refinery couldn't identify itself in the peer list within the sharding system.

## Short description of the changes

- Consolidate and unify code for retrieving peer identity
- replace `:` with `|` as the delimiter in the stress relief message
- make sure `filepeer` returns `PeerListenAddr` as its public address
VinozzZ added a commit that referenced this issue Jul 25, 2024

## Which problem is this PR solving?

Accidentally left a print line from the previous PR

## Short description of the changes

remove `fmt.Println`
VinozzZ added a commit that referenced this issue Jul 26, 2024
…icant load increase (#1256)


## Which problem is this PR solving?

Without resolving the trace locality issue, a single peer can receive a large trace that raises its stress level significantly above the rest of the cluster's. To address this issue, we can allow individual refineries to go into stress relief mode if their own stress is too high, even if the cluster's isn't.

## Short description of the changes

- use the max of the individual stress level and the cluster stress
level as the overall stress level when calculating stress relief
activation and deactivation
- record the stress level that determined stress relief activation as
`stress_level` metric
- add tests
VinozzZ added a commit that referenced this issue Aug 1, 2024

## Which problem is this PR solving?

In order to recalculate new destinations for traces as soon as possible during shutdown, we need to announce peer unregistration first thing.

## Short description of the changes

- in peer management, use the `done` channel that catches the termination signal in main.go

---------

Co-authored-by: Yingrong Zhao <yingrongzhao@yingrongzhao.attlocal.net>
VinozzZ added a commit that referenced this issue Aug 9, 2024

## Which problem is this PR solving?

When a node leaves the cluster, gossip the news and then calculate the
new destination and forward traces that node was holding on to. This
means that removing a node from the refinery cluster will be much less
disruptive than it is today.

## Short description of the changes

- During shutdown, the collector first stops accepting new traces by exiting from the `collect()` function.
- Traces are distributed to three go channels based on their state:
      - late spans
      - traces that have already received their root span or have expired past their TraceTimeout
      - traces that can't have their decisions made yet
- A goroutine is created to grab items off of the three go channels. For the late spans, expired traces, or traces with a root span, the collector will send them to honeycomb. For traces that don't have decisions yet, the collector will forward them to their new home (another refinery peer). A condensed sketch of this drain loop follows below.

I also added unit tests for the distribution logic as well as a counter to monitor the number of traces that go through each state.

address #1204
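
A condensed sketch of the drain loop described above; the `Trace` type, channel names, and the two send helpers are illustrative stand-ins, not the actual collector code.

```go
package collect

type Trace struct{ ID string }

func sendToHoneycomb(*Trace) {} // stand-in for the real transmission path
func forwardToPeer(*Trace)   {} // stand-in for forwarding to the trace's new home

// drain empties the three state channels until shutdown completes.
func drain(late, decided, undecided <-chan *Trace, done <-chan struct{}) {
	go func() {
		for {
			select {
			case t := <-late:
				sendToHoneycomb(t) // late span: a decision already exists
			case t := <-decided:
				sendToHoneycomb(t) // root span arrived or TraceTimeout expired
			case t := <-undecided:
				forwardToPeer(t) // no decision yet: another peer takes ownership
			case <-done:
				return
			}
		}
	}()
}
```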

---------

Co-authored-by: Yingrong Zhao <yingrongzhao@yingrongzhao.attlocal.net>
VinozzZ added a commit that referenced this issue Aug 14, 2024

## Which problem is this PR solving?

Draining incoming and peer queues on peer membership changes (#1258)

## Short description of the changes

A `RedistributeNotifier` is introduced in the collector in this PR (see the sketch below).
- It's initialized on refinery start up
- When a peer membership change is broadcast, the notifier immediately signals the `collect()` goroutine to start the redistribution process
- When multiple peer membership changes happen in a row, it ignores the signal if there's already a redistribution process running
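
A minimal sketch of such a coalescing trigger (names are illustrative, not the actual implementation): a buffered channel of size one means any membership changes that arrive while a redistribution is already pending collapse into a single signal.

```go
package collect

// RedistributeNotifier coalesces bursts of peer membership changes into
// at most one pending redistribution signal.
type RedistributeNotifier struct {
	trigger chan struct{}
}

func NewRedistributeNotifier() *RedistributeNotifier {
	return &RedistributeNotifier{trigger: make(chan struct{}, 1)}
}

// Notify is called on every peer membership broadcast.
func (r *RedistributeNotifier) Notify() {
	select {
	case r.trigger <- struct{}{}: // arm a redistribution run
	default: // one is already pending; drop the duplicate signal
	}
}

// C is the channel the collect() goroutine selects on.
func (r *RedistributeNotifier) C() <-chan struct{} {
	return r.trigger
}
```
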
VinozzZ added a commit that referenced this issue Aug 14, 2024

## Which problem is this PR solving?

The sample rate is not recorded in the trace decision cache during stress relief. This could cause weird sample rate data for spans that arrive after refinery has exited stress relief mode.
fix: #1271

## Short description of the changes

- record sample rate in decision cache
- add tests
- make sure collector uses clockwork.Clock in all places
VinozzZ added a commit that referenced this issue Aug 15, 2024

## Which problem is this PR solving?

To reduce the cardinality in sampler keys for the dynamic sampler in Refinery, it's helpful to be able to select fields on root spans only.

fixes #1130 

## Short description of the changes

- parse the field list on dynamic sampler startup to see if there's a `root.` prefix in the field list (see the sketch below)
- store fields for the root span only separately in `traceKey` from fields that apply to all spans
- loop through both field lists to make sure keys are correctly populated
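
A minimal sketch of the field-list split (function and variable names are illustrative, not the actual `traceKey` code): fields carrying the `root.` prefix are collected separately so they are only read from the root span when the sampler key is built.

```go
package sample

import "strings"

// splitFields separates `root.`-prefixed fields, which should only be
// read from the root span, from fields that apply to every span.
func splitFields(fields []string) (rootFields, allFields []string) {
	for _, f := range fields {
		if name, ok := strings.CutPrefix(f, "root."); ok {
			rootFields = append(rootFields, name)
		} else {
			allFields = append(allFields, f)
		}
	}
	return rootFields, allFields
}
```
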
VinozzZ added a commit that referenced this issue Aug 15, 2024

## Which problem is this PR solving?

When in peer mode, there could be a situation where the load is low enough for just one refinery instance. We should allow a scale-down event to happen even if there's only one refinery left.

## Short description of the changes

- change the condition for running the redistribution logic so it runs as long as there's at least 1 peer in the peer list
- add a `trace_redistribution_count` metric to track how often a redistribution event has been triggered
VinozzZ added a commit that referenced this issue Aug 16, 2024

## Which problem is this PR solving?

Currently, we are not setting the tls config for go-redis even though we do have a `UseTLS` configuration option.


## Short description of the changes

- log the pubsub publish error
- configure redis client tls
VinozzZ added a commit that referenced this issue Aug 19, 2024

## Which problem is this PR solving?

We should not set the send reason to span limit if users didn't specify the span limit config.

## Short description of the changes

- only set send reason to span limit if span limit config is non-zero
VinozzZ added a commit that referenced this issue Aug 19, 2024

Currently, we are not setting the tls config for go-redis even though we do have a `UseTLS` configuration option.

- log the pubsub publish error
- configure redis client tls
VinozzZ added a commit that referenced this issue Aug 19, 2024

Currently, we are not setting the tls config for go-redis even though we do have a `UseTLS` configuration option.

- log the pubsub publish error
- configure redis client tls
VinozzZ added a commit that referenced this issue Aug 22, 2024

## Which problem is this PR solving?

To support Redis instances in cluster mode, we need to provide a way for
users to configure Refinery accordingly.
fixes #1286 

## Short description of the changes

- Add a new config option, `ClusterHosts`, under `RedisPeerManagement` 
- Consolidate `RedisPeerManagementConfig` getters into one
- Add support for `env-delim:","` in `CmdEnv` 
- Add tests
VinozzZ added a commit that referenced this issue Aug 22, 2024

## Which problem is this PR solving?

Refinery converts SpanEvents and SpanLinks to individual events, just like regular spans. We should use DescendantCount to check against the SpanLimit instead of counting only spans.

## Short description of the changes

- use DescendantCount to check against SpanLimit
- modify test to explicitly check for span limit trace send reason