rel: prep 2.7 release #1255

Merged 12 commits, Jul 29, 2024
58 changes: 57 additions & 1 deletion RELEASE_NOTES.md
@@ -2,9 +2,65 @@

While [CHANGELOG.md](./CHANGELOG.md) contains detailed documentation and links to all the source code changes in a given release, this document is intended to be a more comprehensible summary of each release from the point of view of users of Refinery.

## Version 2.7.0

This release is a transitional release, laying the groundwork for substantial future changes to Refinery.

### Publish/Subscribe on Redis
In this release, Redis is no longer a database for storing a list of peers.
Instead, it is used as a more general publish/subscribe framework for rapidly sharing information between nodes in the cluster.
The following information is shared over this connection:

- Peer membership
- Stress levels
- News of configuration changes

Because of this mechanism, Refinery will now react more quickly to changes in any of these factors.
When one node detects a configuration change, all of its peers will be told about it immediately.
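
As a rough illustration of the publish/subscribe pattern described above, here is a minimal in-memory sketch. It does not use Redis or any Refinery code: `InMemoryPubSub`, `ChangePublisher`, and the `"stress"` topic name are hypothetical names invented for this example. It also assumes a node publishes a value only when it differs from the last one sent, which the review discussion suggests is mostly the case.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


class InMemoryPubSub:
    """Hypothetical stand-in for the shared pub/sub connection."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        # Deliver the message to every subscriber of this topic.
        for callback in self._subscribers[topic]:
            callback(message)


class ChangePublisher:
    """Publishes a value only when it differs from the last value sent."""

    def __init__(self, bus: InMemoryPubSub, topic: str) -> None:
        self._bus = bus
        self._topic = topic
        self._last: Optional[str] = None

    def update(self, value: str) -> bool:
        if value == self._last:
            return False  # unchanged, so nothing is sent
        self._last = value
        self._bus.publish(self._topic, value)
        return True


bus = InMemoryPubSub()
received: List[str] = []
bus.subscribe("stress", received.append)

publisher = ChangePublisher(bus, "stress")
for level in ["10", "10", "25", "25", "10"]:
    publisher.update(level)
```

In this sketch the subscriber sees three messages for five updates, because repeated values are suppressed before they reach the bus.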

In addition, Refinery now publishes individual stress levels between peers.
Nodes calculate a cluster stress level as a weighted average (with nodes that are more stressed getting more weight).
If an individual node is stressed, it can enter stress relief individually.
This may happen, for example, when a single giant trace is concentrated on one node.
If the cluster as a whole is being stressed by a general burst in traffic, the entire cluster should now enter or leave stress relief at approximately the same time.
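
To make the weighted average concrete, here is a small sketch. The release notes say only that more stressed nodes get more weight; using each node's own stress level as its weight is an assumption made for this example, and `cluster_stress_level` is a hypothetical function name, not Refinery's implementation.

```python
from typing import Sequence


def cluster_stress_level(levels: Sequence[float]) -> float:
    """Weighted average of per-node stress levels.

    Assumption: each node's weight is its own stress level, so that
    more stressed nodes count for more.
    """
    total_weight = sum(levels)
    if total_weight == 0:
        return 0.0  # no nodes, or no node reports any stress
    return sum(level * level for level in levels) / total_weight


# Three calm nodes and one heavily stressed node.
levels = [10, 10, 10, 90]
plain_mean = sum(levels) / len(levels)
weighted = cluster_stress_level(levels)
```

With these inputs the plain mean is 30, while the weighted value is 70, so a single overloaded node pulls the cluster figure up much more than it would in a simple average.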
Review comment (Member):

Does this mean that you could expect to see a rise in cross-node network traffic, or do we expect it to be marginal?

Reply (Contributor Author):

You might see the impact of pub/sub traffic to and from Redis, but it's pretty small; we mostly only publish new values when they're different.


### Health checks now include both liveness and readiness

Until now, Refinery had only a liveness check on `/alive`, which always returned ok.

Starting with this release, Refinery now supports both `/alive` and `/ready`, which are based on internal status reporting.

The liveness check returns alive whenever Refinery is awake and internal systems are functional.
It will return a failure if any of the monitored systems fail to report in time.

The readiness check returns ready whenever the monitored systems indicate readiness.
It will return a failure if any internal system returns not ready.
This is usually used to indicate to a load balancer that no new traffic should go to this node.
In this release, this will only happen when a Refinery node is shutting down.
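
The difference between the two checks can be sketched as follows. This is an assumed model based only on the description above, not Refinery's code: `HealthTracker`, its methods, the `"collector"` subsystem name, and the timeout values are all hypothetical. Liveness fails when a registered subsystem has not reported within its timeout; readiness fails when any subsystem reports that it is not ready.

```python
import time
from typing import Callable, Dict


class HealthTracker:
    """Hypothetical tracker for subsystem liveness and readiness."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._timeouts: Dict[str, float] = {}
        self._last_report: Dict[str, float] = {}
        self._ready: Dict[str, bool] = {}

    def register(self, name: str, timeout_seconds: float) -> None:
        # A subsystem starts out not ready and counts as having just reported.
        self._timeouts[name] = timeout_seconds
        self._last_report[name] = self._clock()
        self._ready[name] = False

    def report(self, name: str, ready: bool) -> None:
        self._last_report[name] = self._clock()
        self._ready[name] = ready

    def is_alive(self) -> bool:
        # Alive only if every subsystem has reported within its timeout.
        now = self._clock()
        return all(
            now - self._last_report[name] <= timeout
            for name, timeout in self._timeouts.items()
        )

    def is_ready(self) -> bool:
        # Ready only if every subsystem says it is ready.
        return all(self._ready.values())


# Drive the tracker with a fake clock so the example is deterministic.
now = [0.0]
tracker = HealthTracker(clock=lambda: now[0])
tracker.register("collector", timeout_seconds=5.0)
tracker.report("collector", ready=True)
alive_at_start = tracker.is_alive()

now[0] = 10.0  # the subsystem stays silent past its timeout
alive_after_silence = tracker.is_alive()
```

Injecting the clock keeps the sketch testable; an HTTP handler for `/alive` or `/ready` would simply map these booleans to a success or failure response.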

### Metrics changes
There have also been some minor changes to metrics in this release:

We have two new metrics called `individual_stress_level` (the stress level as seen by a single node) and `cluster_stress_level` (the aggregated cluster level).
The `stress_level` metric indicates the maximum of the two values; it is this value which is used to determine whether an individual node activates stress relief.
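
The relationship between the three stress metrics can be sketched in a few lines. The function names and the `activation_level` threshold parameter are hypothetical and exist only for this example; the notes state only that `stress_level` is the maximum of the other two values and that it drives the stress relief decision.

```python
def effective_stress_level(individual: float, cluster: float) -> float:
    """The reported stress_level: the larger of the two inputs."""
    return max(individual, cluster)


def should_activate_stress_relief(
    individual: float, cluster: float, activation_level: float
) -> bool:
    # activation_level is an assumed threshold used for illustration.
    return effective_stress_level(individual, cluster) >= activation_level


# A single overloaded node in an otherwise calm cluster.
hot_node = should_activate_stress_relief(95, 40, activation_level=90)
# A calm node in the same cluster.
calm_node = should_activate_stress_relief(20, 40, activation_level=90)
```

Taking the maximum means a node reacts to whichever signal is worse: its own load or the load of the cluster as a whole.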

There is also a new pair of metrics, `config_hash` and `rule_config_hash`.
These are numeric Gauge metrics that are set to the numeric value of the last 4 hex digits of the hash of the current config files.
These can be used to verify that all Refinery nodes are using the same configuration files.
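
As an illustration of how such a gauge value could be derived, here is a short sketch. The notes do not say which hash algorithm is used, so MD5 is an assumption made for this example, and `config_hash_gauge` is a hypothetical helper name.

```python
import hashlib


def config_hash_gauge(config_bytes: bytes) -> int:
    """Numeric value of the last 4 hex digits of the config's hash.

    Assumption: MD5 is used here only for illustration.
    """
    digest = hashlib.md5(config_bytes).hexdigest()
    return int(digest[-4:], 16)  # always in the range 0..65535


node_a = config_hash_gauge(b"Sampler: deterministic\nSampleRate: 10\n")
node_b = config_hash_gauge(b"Sampler: deterministic\nSampleRate: 10\n")
same_config = node_a == node_b
```

Because only 16 bits of the hash are kept, two different configurations can occasionally produce the same gauge value; matching values are a strong hint rather than proof that the files are identical.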

### Disabling Redis and using a static list of peers
Specifying `PeerManagement.Type=file` will cause Refinery to use the fixed list of peers found in the configuration.
In this mode, Refinery operates as it did in previous releases, without sharing changes to peers, stress, or configuration.

### Config Change notifications
When deploying a cluster in Kubernetes, it is often the case that configurations are managed as a ConfigMap.
In the default setup, ConfigMaps are eventually consistent.
This may mean that one Refinery node will detect a configuration change and broadcast news of it, but a different node that receives the news will attempt to read the data and get the previous configuration.
In this situation, the change will still be detected by all Refineries within the `ConfigReloadInterval`.

## Version 2.6.1

This is a bug fix release.
In the log handling logic newly introduced in v2.6.0, Refinery would incorrectly consider log events to be root spans in a trace.
After this fix, log events can never be root spans.
This release is recommended for everyone who wants to use the new log handling capabilities.