add High Availability research #685

Merged
merged 5 commits into from
Oct 24, 2023

@giubacc giubacc commented Aug 30, 2023

High Availability research

This is a first attempt to define the direction we want to take on HA for s3gw.
Feedback, comments, requests, and other considerations are all welcome at this stage.

Related to: https://github.com/aquarist-labs/s3gw/issues/361
Signed-off-by: Giuseppe Baccini <giuseppe.baccini@suse.com>

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • CHANGELOG.md has been updated should there be relevant changes in this PR.

@giubacc giubacc added the kind/research, area/kubernetes, and area/rgw-sfs labels Aug 30, 2023
@jhmarina jhmarina mentioned this pull request Aug 30, 2023
4 tasks
@giubacc giubacc self-assigned this Aug 30, 2023
Contributor

@m-ildefons m-ildefons left a comment

Nice research. I very much like that you put some thought into what kind of failure scenarios are even within our scope and that there is a comprehensive overview of the various possible configurations with the components at hand.
Here are some comments that occurred to me while reading; I hope you find them useful.

(17 inline review comments on docs/research/ha/RATIONALE.md, all resolved and now outdated)
@l-mb

l-mb commented Aug 31, 2023

A high-level comment: touch base with the LH team about their work on the NFS share manager and their HA plans and ideas. While they can't take advantage of an ingress, they share several similar concerns: node/pod failure detection, recovery, etc.

Perhaps there's overlap and tech we can leverage jointly.

@jhmarina jhmarina added this to the v.0.21.0 milestone Sep 4, 2023
@james-munson

One item we (Longhorn) would like to know about the S3GW HA is whether it will assume the volume beneath the object store must be RWX, or whether RWO would suffice. If the gateway is active/active enough that both sides need simultaneous write access in order to transfer the work fast enough, that will make a difference.

From what I gather in the discussion here, it is unacceptable to have to start a pod on the new owner as part of the failover, but should be acceptable to defer attaching the backing volume until then. If so, then a simple RWO volume would suffice.

If not, the RWX volume would itself be layered on NFS, and any HA transfer would be gated by the NFS HA transfer, which currently requires significant time to clear locks, wait for grace periods, and all that. (I have the ticket to try to improve its performance, if possible.)

@giubacc
Author

giubacc commented Sep 7, 2023

> One item we (Longhorn) would like to know about the S3GW HA is whether it will assume the volume beneath the object store must be RWX, or whether RWO would suffice. If the gateway is active/active enough that both sides need simultaneous write access in order to transfer the work fast enough, that will make a difference.
>
> From what I gather in the discussion here, it is unacceptable to have to start a pod on the new owner as part of the failover, but should be acceptable to defer attaching the backing volume until then. If so, then a simple RWO volume would suffice.
>
> If not, the RWX volume would itself be layered on NFS, and any HA transfer would be gated by the NFS HA transfer, which currently requires significant time to clear locks, wait for grace periods, and all that. (I have the ticket to try to improve its performance, if possible.)

Our current idea is to propose an "active/standby" HA model.
In Kubernetes terms, this translates to a Deployment with an "immutable" replicas: 1 spec.
In case of failure, Kubernetes would start a new s3gw pod that takes "ownership" of the LH volume without too many complications; we assume that attaching to the existing LH PVC (previously mounted by the failed pod) is an operation that cannot fail for the new s3gw pod.
With this approach ("active/standby"), we avoid keeping other "cold" instances of s3gw around; an "active/passive" approach would require us to implement potentially non-trivial complexity on top of the Kubernetes primitives.
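
As a rough illustration of the single-replica idea described above, the following Python sketch builds a Deployment pinned to `replicas: 1` and a `ReadWriteOnce` PVC as plain dictionaries and prints them as JSON. Everything specific in it is an assumption made for the example and is not taken from this PR or from the s3gw chart: the resource names, the image reference, the `/data` mount path, the `longhorn` storage class name, the volume size, and the choice of the `Recreate` strategy (picked here so that a rollout never runs two pods against the same volume).

```python
import json


def build_pvc(name: str, size: str = "10Gi") -> dict:
    """Return a PVC manifest requesting a single-writer (RWO) volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            # Assumption: the cluster exposes a storage class named "longhorn".
            "storageClassName": "longhorn",
            "resources": {"requests": {"storage": size}},
        },
    }


def build_deployment(name: str, pvc_name: str, image: str) -> dict:
    """Return a Deployment manifest with exactly one replica bound to the PVC."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # Single active instance; Kubernetes replaces the pod if it fails.
            "replicas": 1,
            # Assumption: stop the old pod before starting a new one on rollout.
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Hypothetical mount path for the backing store.
                            "volumeMounts": [
                                {"name": "store", "mountPath": "/data"}
                            ],
                        }
                    ],
                    "volumes": [
                        {
                            "name": "store",
                            "persistentVolumeClaim": {"claimName": pvc_name},
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder names and image reference, for illustration only.
    manifests = [
        build_pvc("s3gw-pvc"),
        build_deployment("s3gw", "s3gw-pvc", "example.invalid/s3gw:latest"),
    ]
    print(json.dumps(manifests, indent=2))
```

The sketch only describes the desired state; how quickly a replacement pod is scheduled and the volume reattached after a node failure is outside what these manifests control.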

@l-mb

l-mb commented Sep 7, 2023

> One item we (Longhorn) would like to know about the S3GW HA is whether it will assume the volume beneath the object store must be RWX, or whether RWO would suffice. If the gateway is active/active enough that both sides need simultaneous write access in order to transfer the work fast enough, that will make a difference.

RWO is, in fact, the only supported mode.

We need features from XFS that NFS would no longer expose; and we will not support multiple s3gw instances on the same store (which conceptually also doesn't really make sense from a performance PoV, and not really from an availability point of view either, since it still all depends on a single node).

We've got no plans to support RWX.

(At that point, s3gw would be slowly implementing a distributed K/V object store as a backend, and ... that'd be called RADOS/Ceph :-D )

@giubacc giubacc force-pushed the HA-research branch 3 times, most recently from ad8d872 to 9fbea68 Compare September 13, 2023 08:07
@giubacc giubacc force-pushed the HA-research branch 5 times, most recently from c3e5fdb to 4af657c Compare September 28, 2023 08:55
@giubacc giubacc marked this pull request as ready for review September 28, 2023 09:00
@giubacc giubacc modified the milestones: v.0.21.0, v0.22.0 Sep 28, 2023
@jecluis
Contributor

jecluis commented Oct 2, 2023

@l-mb are we good to merge this?

@jecluis
Contributor

jecluis commented Oct 13, 2023

@giubacc there are conflicts with this PR, mind addressing them?

Giuseppe Baccini added 5 commits October 16, 2023 14:24
- add research/ha/RATIONALE.md

Related to: https://github.com/aquarist-labs/s3gw/issues/361
Signed-off-by: Giuseppe Baccini <giuseppe.baccini@suse.com>
regular-localhost-incremental-fill-5k
regular_localhost_load_fio_64_write
regular_localhost_zeroload_400_800Kdb
regular_localhost_zeroload_emptydb
segfault_localhost_zeroload_emptydb

Related to: https://github.com/aquarist-labs/s3gw/issues/361
Signed-off-by: Giuseppe Baccini <giuseppe.baccini@suse.com>
- scale_deployment_0_1-k3s3nodes-zeroload-emptydb
- s3wl-putobj-100ms-clusterip
- s3wl-putobj-100ms-ingress

Related to: https://github.com/aquarist-labs/s3gw/issues/361
Signed-off-by: Giuseppe Baccini <giuseppe.baccini@suse.com>
@giubacc
Author

giubacc commented Oct 16, 2023

rebased on latest main

@giubacc
Author

giubacc commented Oct 24, 2023

@l-mb @jecluis @vmoutoussamy
Can we merge this first HA research?
I'd rather handle the ongoing medik8s activity separately, in its dedicated LH issue.

@jecluis jecluis merged commit ee686da into s3gw-tech:main Oct 24, 2023
2 checks passed