Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloud support #521

Merged
merged 8 commits into from
Oct 6, 2022
Merged

Cloud support #521

merged 8 commits into from
Oct 6, 2022

Conversation

Baliedge
Copy link
Contributor

@Baliedge Baliedge commented Sep 19, 2022

Which problem is this PR solving?

When hosting Refinery in a cloud environment, certain network assumptions are no longer true.
Issues arise from these assumptions while hosting in our cloud environment using Nomad as container orchestration.

Terms, as used in Nomad:

  • Task: Single deployed container.
  • Group: Group of tasks running as a scalable service (e.g. Refinery).

This assumption is not necessarily true under Nomad:

  • The IP assigned to a task's eth0 can be accessed by other tasks in the group.

Redis peering expects peers to talk to each other using their eth0 IP. This behavior can be overridden by specifying RedisIdentifier in config. This value will be advertised as the peer's IP in the peer list.

However, on startup, peering uses its eth0 IP to cross-reference in the peers list found in Redis. Since the internal eth0 IP is not the true public IP (set by RedisIdentifier), this check will always fail and the container exits in error.

Short description of the changes

A workaround is to treat RedisIdentifier as the true public IP address of the task. Thus, it should be one of the "self" IPs when checking the peers list.

This behavior is reinforced in existing code in internal/peer/redis.go where RedisIdentifier is used to create a URL with the value being the host.

Copy link
Contributor

@kentquirk kentquirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still a draft, I know, but I thought I'd weigh in since you posted on Pollinators.

sharder/deterministic.go Outdated Show resolved Hide resolved
sharder/deterministic.go Outdated Show resolved Hide resolved
sharder/deterministic.go Outdated Show resolved Hide resolved
@Baliedge Baliedge marked this pull request as ready for review September 21, 2022 14:39
@Baliedge Baliedge requested review from a team and robbkidd September 21, 2022 14:39
@kentquirk kentquirk self-assigned this Sep 21, 2022
Copy link
Contributor

@kentquirk kentquirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there.

sharder/deterministic.go Show resolved Hide resolved
sharder/deterministic.go Show resolved Hide resolved
@MikeGoldsmith MikeGoldsmith added type: enhancement New feature or request version: bump minor A PR that adds behavior, but is backwards-compatible. labels Sep 23, 2022
Copy link
Contributor

@kentquirk kentquirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for the delay. I was out sick for several days.

This looks good! Thank you.

@MikeGoldsmith
Copy link
Contributor

@Baliedge does this supersede the the etcd PR?

@Baliedge
Copy link
Contributor Author

@Baliedge does this supersede the the etcd PR?

No. I would say that's a separate effort. We decided to stick to Redis mode for now. Moving to etcd was determined to be a nice to have. We may decide to develop that later if necessary.

Copy link
Contributor

@TylerHelmuth TylerHelmuth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kentquirk kentquirk merged commit ed646e5 into honeycombio:main Oct 6, 2022
LokeshOpsramp pushed a commit to opsramp/tracing-proxy that referenced this pull request Jul 5, 2024
## Which problem is this PR solving?
When hosting Refinery in a cloud environment, certain network
assumptions are no longer true.
Issues arise from these assumptions while hosting in our cloud
environment using [Nomad](https://www.nomadproject.io/) as container
orchestration.

Terms, as used in Nomad:
- Task: Single deployed container.
- Group: Group of tasks running as a scalable service (e.g. Refinery).

This assumption is not necessarily true under Nomad:
- The IP assigned to a task's `eth0` can be accessed by other tasks in
the group.

Redis peering expects peers to talk to each other using their `eth0` IP.
This behavior can be overridden by specifying `RedisIdentifier` in
config. This value will be advertised as the peer's IP in the peer list.

However, on startup, peering uses its `eth0` IP to cross-reference in
the peers list found in Redis. Since the internal `eth0` IP is not the
true public IP (set by `RedisIdentifier`), this check will always fail
and the container exits in error.

## Short description of the changes
A workaround is to treat `RedisIdentifier` as the true public IP address
of the task. Thus, it should be one of the "self" IPs when checking the
peers list.

This behavior is reinforced in existing code in
[`internal/peer/redis.go`](https://github.com/honeycombio/refinery/blob/main/internal/peer/redis.go#L291-L298)
where `RedisIdentifier` is used to create a URL with the value being the
host.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement New feature or request version: bump minor A PR that adds behavior, but is backwards-compatible.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants