[openshift-4.3]: bump etcd to v3.3.17 #20

Merged
merged 125 commits into from
Oct 24, 2019

Conversation

hexfusion

@hexfusion hexfusion commented Oct 23, 2019

Kubernetes 1.16 bumped [1] the etcd client to v3.3.15, which brought in the major changes outlined below. This PR bumps etcd from v3.3.10 to v3.3.17 for OpenShift 4.3 to pair the etcd server with the apiserver's client.

  • rewrite of client balancer [2]
  • bump gRPC to 1.23.0 [3]
    ** The gRPC bump resolved a few outstanding bugs, including the gRPC panic "send on closed channel" [4], which could result in catastrophic member failure.
  • bump bbolt to v1.3.3

After v3.3.15 was released, a few major bugs were exposed; they were patched through v3.3.17:

  • "etcd client does not parse IPv6 addresses correctly when members are joining" [5] fixed via [6]
  • "kube-apiserver: failover on multi-member etcd cluster fails certificate check on DNS mismatch" [7] fixed via [8]
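
As background for the IPv6 item above, here is a minimal Go sketch of why bracketed IPv6 endpoints need care when splitting host from port. It is not the upstream fix in [6]; `naiveHost` and `hostOf` are hypothetical helpers and the address is a documentation-range example.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// naiveHost is a hypothetical helper that splits on the first colon.
// It is shown only to illustrate the pitfall with IPv6 literals.
func naiveHost(hostport string) string {
	return strings.SplitN(hostport, ":", 2)[0]
}

// hostOf is a hypothetical helper built on net.SplitHostPort, which
// understands the bracketed form used for IPv6 literals.
func hostOf(hostport string) (string, error) {
	host, _, err := net.SplitHostPort(hostport)
	return host, err
}

func main() {
	endpoint := "[2001:db8::1]:2380" // example address, not from the PR

	// Splitting on the first colon cuts the address in the middle.
	fmt.Println("naive:", naiveHost(endpoint))

	// The stdlib helper strips the brackets and keeps the address whole.
	host, err := hostOf(endpoint)
	fmt.Println("stdlib:", host, err)
}
```

The same bracket handling applies when parsing full URLs with `net/url`, where `Hostname()` returns the address without brackets.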

NOTE: we won't gain the benefits of [8] in the client until [10] merges in k8s 1.16.3.

etcd CHANGELOG-3-3 [9]:

[1] kubernetes/kubernetes#82199
[2] etcd-io#9860
[3] etcd-io#10911
[4] etcd-io#9956
[5] kubernetes/kubernetes#83550
[6] etcd-io#11211
[7] kubernetes/kubernetes#83028
[8] etcd-io#11184
[9] https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.3.md
[10] kubernetes/kubernetes#83968

Closes: https://jira.coreos.com/browse/ETCD-50

gyuho and others added 30 commits October 10, 2018 13:30
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
use set instead of slice as interval value

fixes etcd-io#10326
[Cherry pick 3.3] grpcproxy: fix memory leak
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Signed-off-by: Iskander Sharipov <quasilyte@gmail.com>
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
When using the embed package to embed etcd, sometimes the storage prefix
being used might be full. In this case, this code path triggers, causing
an: `etcdserver: create wal error: no space left on device` error, which
causes a fatal. A fatal differs from a panic in that it also calls
os.Exit(1). In this situation, the calling program that embeds the etcd
server will be abruptly killed, which prevents it from cleaning up
safely, and giving a proper error message. Depending on what the calling
program is, this can cause corruption and data loss.

This patch switches the fatal to a panic. Ideally this would be a
regular error which would get propagated upwards to the StartEtcd
command, but in the meantime at least this can be caught with recover().

This fixes the most common fatal that I've experienced, but there are
surely more that need looking into. If possible, the errors should be
threaded down into the code path so that embedding etcd can be more
robust.

Fixes: etcd-io#10588

This is a cherry-picked version of upstream: 368f70a
etcdserver: Use panic instead of fatal on no space left error
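
The difference described in this commit message can be sketched as follows: a panic can be turned back into an ordinary error by the embedding program, whereas a fatal that calls os.Exit(1) cannot be intercepted. This is a minimal illustration only; `startEmbedded` and `safeStart` are hypothetical stand-ins and do not use the real embed package API.

```go
package main

import (
	"errors"
	"fmt"
)

// startEmbedded is a hypothetical stand-in for starting an embedded
// server. It panics when the disk is full, mimicking the behavior the
// patch introduces in place of a fatal.
func startEmbedded(diskFull bool) {
	if diskFull {
		panic(errors.New("create wal error: no space left on device"))
	}
}

// safeStart wraps the call and converts a panic into a returned error,
// so the caller gets a chance to clean up and report the failure.
func safeStart(diskFull bool) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("embedded server failed to start: %v", r)
		}
	}()
	startEmbedded(diskFull)
	return nil
}

func main() {
	if err := safeStart(true); err != nil {
		// Cleanup and a proper error message can happen here.
		fmt.Println("cleaning up after:", err)
	}
}
```

Note that recover() only works in a deferred function on the goroutine that panicked, so a wrapper like this helps only when the panic is raised on the caller's own goroutine.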
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
This PR resolves an issue where the `/metrics` endpoints exposed by the proxy returned the metrics of the proxy itself rather than those of the etcd member servers.

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
Signed-off-by: Shreyas Rao <shreyas.sriganesh.rao@sap.com>
Signed-off-by: Shreyas Rao <shreyas.sriganesh.rao@sap.com>
We need to use the stdlib-compatible one that is case-sensitive, etc

Change-Id: Id0df573a70e09967ac7d8c0a63d99d6a49ce82f1
Change-Id: Iac4601443bcad71920fd96b97bfe21c16116577a
Change-Id: I1f3fc00f95efadd6da9b4c248156f8460ae0ff97
Change-Id: Ibfa24e28cacd58388f7606a945c8ac35e1c34580
…ugorji

Using lessons learned from k8s changes:
kubernetes/kubernetes#65034

Change-Id: Ia17a8f94ae6ed00c5af2595c2b48d3c9a0344427
Change-Id: I53b30e9317de6cd058833d743bc88c46686cea20
* Update Documentation folder

Signed-off-by: lucperkins <lucperkins@gmail.com>

* Re-add README file

Signed-off-by: lucperkins <lucperkins@gmail.com>
Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
From etcd-io#10595.

Signed-off-by: Gyuho Lee <leegyuho@amazon.com>
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
make build performs a sanity test on the binary image, which causes problems for unsupported architectures. Because we run full CI tests against the image, this check is not necessary; removing it allows images to be built regardless of architecture.

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 23, 2019
@hexfusion
Author

/retest

@hexfusion hexfusion changed the title WIP: *: bump etcd to v3.3.17 [openshift-4.3]: bump etcd to v3.3.17 Oct 23, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 23, 2019
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
@hexfusion
Author

/retest

@smarterclayton

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 24, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hexfusion, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit e30a18f into openshift-4.3 Oct 24, 2019
@hexfusion hexfusion deleted the bump-3.3.17 branch April 7, 2021 18:39
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.