Skip to content
This repository has been archived by the owner on Mar 28, 2020. It is now read-only.

operator: Add option to act as cluster wide #1777

Merged
merged 14 commits into from
Feb 13, 2018

Conversation

guilhem
Copy link
Contributor

@guilhem guilhem commented Dec 14, 2017

for a cluster to be managed widely, user have to add annotation:
etcd.database.coreos.com/scope: clusterwide
And admin have to run an operator with option -cluster-wide.

Current implementation have a lack of locking if many operators with cluster-wide run on different namespaces.

@etcd-bot
Copy link
Collaborator

Can one of the admins verify this patch?

1 similar comment
@etcd-bot
Copy link
Collaborator

Can one of the admins verify this patch?

@guilhem
Copy link
Contributor Author

guilhem commented Dec 15, 2017

JFYI: a docker image exist with this patch guilhem/etcd-operator

@xiang90
Copy link
Collaborator

xiang90 commented Dec 15, 2017

The idea seems fine.

But we need to add tests and related documentation to move this forward.

@guilhem
Copy link
Contributor Author

guilhem commented Dec 15, 2017

Good to know! :)
That's the news I waited before putting more effort on tests and documentation.
I will do this this weekend or next week :)

@xiang90
Copy link
Collaborator

xiang90 commented Dec 15, 2017

@guilhem

Out of curiosity, what is your use case of having a cluster wide etcd operator? Why a per namespace one will not work for you?

@guilhem
Copy link
Contributor Author

guilhem commented Dec 17, 2017

@xiang90

I want to save resources (pod/memory/watchers in API) and operations when having a lot of namespaces.

@guilhem
Copy link
Contributor Author

guilhem commented Dec 17, 2017

My main concern is were to put configuration:

  1. as I do in annotation
  2. in Spec part
  3. nowhere, can't mix etcd-operators

@raoofm
Copy link

raoofm commented Dec 18, 2017

@xiang90 @guilhem Can you please expand on the usecase? I didn't get it.

@guilhem
Copy link
Contributor Author

guilhem commented Dec 18, 2017

@raoofm just want to avoid having to manage many etcdoperators
My operator need etcdoperator and I don't want to manage another operator in it.
I just want to install it as external dependency :)

@xiang90
Copy link
Collaborator

xiang90 commented Jan 17, 2018

@raoofm

more details here: #859

@guilhem guilhem force-pushed the clusterwide branch 2 times, most recently from ad657e7 to dde9475 Compare January 18, 2018 14:01
@guilhem
Copy link
Contributor Author

guilhem commented Jan 18, 2018

@xiang90 I added some tests and documentation.
Don't hesitate to comment, or better, correct any mistake I made ;)

@raoofm
Copy link

raoofm commented Jan 18, 2018

@xiang90 awesome, makes sense. Thanks @guilhem

@hongchaodeng
Copy link
Member

@etcd-bot ok to test

@guilhem
Copy link
Contributor Author

guilhem commented Jan 31, 2018

I know news are quite huge...
But it's there anything I can do to make this PR merged?

@guilhem guilhem changed the title [WIP] operator: Add option to act as cluster wide operator: Add option to act as cluster wide Jan 31, 2018
@xiang90
Copy link
Collaborator

xiang90 commented Feb 1, 2018

The idea looks good for me. Defer to @hongchaodeng or @fanminshi or @hasbro17 for a actual code review.

image: quay.io/coreos/etcd-operator:v0.8.1
command:
- etcd-operator
- -all-namespaces
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster-wide?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done :)

EnvOperatorPodNamespace = "MY_POD_NAMESPACE"
EnvOperatorPodName = "MY_POD_NAME"
EnvOperatorPodNamespace = "MY_POD_NAMESPACE"
EnvRestoreOperatorServiceName = "SERVICE_ADDR"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this EnvRestoreOperatorServiceName

Copy link
Contributor Author

@guilhem guilhem Feb 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, bad merge

@guilhem guilhem force-pushed the clusterwide branch 2 times, most recently from 6065fd2 to d06117a Compare February 3, 2018 16:19
Copy link
Member

@hongchaodeng hongchaodeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the review delay.
When you have change, please ping me on github.


## Install etcd operator

etcd operator have to run with `-all-namespaces` arg option.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flag should be "cluster-wide"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to be like kubectl option --all-namespaces=false.
But I wasn't really happy with this solution and will change it to cluster-wide :)


etcd operator have to run with `-all-namespaces` arg option.

For example you can run:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to mention the command and install guide here.
Just give a reference to the file.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: etcd-operator-cluster-wide
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is too much duplicate and it is just hard to maintain for every upgrade.
I would recommend to add a comment to existing deployment file: https://github.com/coreos/etcd-operator/blob/master/example/deployment.yaml

@@ -64,6 +65,10 @@ func New(cfg Config) *Controller {
func (c *Controller) handleClusterEvent(event *Event) error {
clus := event.Object

if !c.managed(clus) {
return fmt.Errorf("cluster (%s) isn't managed", clus.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will introduce unnecessary errors/warnings in the log for unmanaged EtcdCluster CRs. Should just ignore it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this func return (ignored bool, err error) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hongchaodeng tell me if my last commit is ok for you.
I will rebase if ok.

@hongchaodeng
Copy link
Member

Test failed:

--- FAIL: TestHandleClusterEventClusterwideIgnored (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x16cc8bc]

goroutine 56 [running]:
testing.tRunner.func1(0xc4201605a0)
	/usr/local/go/src/testing/testing.go:711 +0x5d9
panic(0x17fab80, 0x23564a0)
	/usr/local/go/src/runtime/panic.go:491 +0x2a2
github.com/coreos/etcd-operator/pkg/controller.TestHandleClusterEventClusterwideIgnored(0xc4201605a0)
	/go/src/github.com/coreos/etcd-operator/pkg/controller/controller_test.go:112 +0x1ac
testing.tRunner(0xc4201605a0, 0x19ccc58)
	/usr/local/go/src/testing/testing.go:746 +0x16d
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:789 +0x569
FAIL	github.com/coreos/etcd-operator/pkg/controller	0.553s

kind: "EtcdCluster"
metadata:
name: "example-etcd-cluster"
annotations:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the comment in example file for this annotation instead of here?
We have plans to show the code files in the doc later: #1904

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -61,28 +62,32 @@ func New(cfg Config) *Controller {
}
}

func (c *Controller) handleClusterEvent(event *Event) error {
func (c *Controller) handleClusterEvent(event *Event) (bool, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment for the returned value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use named values instead (as you previously suggest :) )

@hongchaodeng
Copy link
Member

Can you also rebase against master? Your commit is too old.

@NatsuSir
Copy link

NatsuSir commented Feb 5, 2018

I don't know what you say😂😂

@hongchaodeng
Copy link
Member

Which part you don't understand?

Copy link
Member

@hongchaodeng hongchaodeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failed due to unknown issue:

denied: Access denied.

name: "example-etcd-cluster"
# Adding this annotation make this cluster managed by clusterwide operators
# namespaced operators ignore it
annotations:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we combine this to example-etcd-cluster.yaml and add a comment for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops my bad, was what I wanted to do

@@ -61,28 +62,37 @@ func New(cfg Config) *Controller {
}
}

func (c *Controller) handleClusterEvent(event *Event) error {
func (c *Controller) handleClusterEvent(event *Event) (ignored bool, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment for this func what returned values are

clus := event.Object

if !c.managed(clus) {
ignored = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer explicit return value:

return true, nil

@@ -30,4 +30,7 @@ const (

EnvOperatorPodName = "MY_POD_NAME"
EnvOperatorPodNamespace = "MY_POD_NAMESPACE"

AnnotationScope = "etcd.database.coreos.com/scope"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also move these const into k8sutil:
https://github.com/coreos/etcd-operator/blob/master/pkg/util/k8sutil/k8sutil.go#L46

We want to get rid of this file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove AnnotationScope?

@guilhem
Copy link
Contributor Author

guilhem commented Feb 13, 2018

@hongchaodeng any news about why tests are failing?
I don't see anything on my side to do to fix it :/

@fanminshi
Copy link
Contributor

via https://jenkins-etcd-public.prod.coreos.systems/job/etcd-operator/3228/

17:48:25 Short-lived access for ['gcr.io', 'us.gcr.io', 'eu.gcr.io', 'asia.gcr.io', 'b.gcr.io', 'bucket.gcr.io', 'l.gcr.io', 'launcher.gcr.io', 'appengine.gcr.io', 'us-mirror.gcr.io', 'eu-mirror.gcr.io', 'asia-mirror.gcr.io', 'mirror.gcr.io'] configured.
17:48:25 building test-pod container...
17:48:39 unable to find image "sha256:c8c15988d8f837cf1318568c8eaf4a20915b1ae7db32158505f48a7adb670396"
17:48:40 namespace "e2e-etcd-operator-3228" deleted
17:48:40 Build step 'Execute shell' marked build as failure

@fanminshi
Copy link
Contributor

@etcd-bot re-test this please reason: build image not found might due to jenkin flake.

@fanminshi
Copy link
Contributor

@guilhem have you rebase this pr on top of the latest master?

@guilhem
Copy link
Contributor Author

guilhem commented Feb 13, 2018

@fanminshi yes :/

@fanminshi
Copy link
Contributor

@guilhem test passed. :)

README.md Outdated

- The etcd operator only manages the etcd cluster created in the same namespace. Users need to create multiple operators in different namespaces to manage etcd clusters in different namespaces.
### Limitations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably no need to have Limitations section if it is blank.

@@ -66,6 +68,7 @@ func init() {
flag.BoolVar(&printVersion, "version", false, "Show version and quit")
flag.BoolVar(&createCRD, "create-crd", true, "The operator will not create the EtcdCluster CRD when this flag is set to false.")
flag.DurationVar(&gcInterval, "gc-interval", 10*time.Minute, "GC interval")
flag.BoolVar(&clusterWide, "cluster-wide", false, "The operator will watch clusters in all namespaces")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operator will watch clusters in all namespaces => Enable operator to watch clusters in all namespaces?

Avoid the word Will ref: https://developers.google.com/style/tense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this link :)

@@ -30,4 +30,7 @@ const (

EnvOperatorPodName = "MY_POD_NAME"
EnvOperatorPodNamespace = "MY_POD_NAMESPACE"

AnnotationScope = "etcd.database.coreos.com/scope"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove AnnotationScope?

@@ -30,4 +30,7 @@ const (

EnvOperatorPodName = "MY_POD_NAME"
EnvOperatorPodNamespace = "MY_POD_NAMESPACE"

AnnotationScope = "etcd.database.coreos.com/scope"
AnnotationClusterWide = "clusterwide"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove AnnotationClusterWide?


More information in [install guide](doc/user/install_guide.md).

## [Cluster resource example](example/example-etcd-cluster-clusterwide.yaml)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not use a header?
Add a sentence:

See the example in ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guilhem fix this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fanminshi hehe was in another batch ;)

@@ -61,28 +62,37 @@ func New(cfg Config) *Controller {
}
}

func (c *Controller) handleClusterEvent(event *Event) error {
// handleClusterEvent returns ignored true if cluster isn't managed by this instance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returns ignored true if cluster is ignored (not managed) by this instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guilhem fix this?

@@ -61,28 +62,37 @@ func New(cfg Config) *Controller {
}
}

func (c *Controller) handleClusterEvent(event *Event) error {
// handleClusterEvent returns ignored true if cluster isn't managed by this instance.
func (c *Controller) handleClusterEvent(event *Event) (ignored bool, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I wasn't clear before.
Prefer ALL returns to be explicit about return values.

handleClusterEvent(event *Event) (bool, error) {
  ...
  return false, nil
  ...
  return false, fmt.Errorf(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hehe no problem :)

@hongchaodeng
Copy link
Member

@guilhem Thanks for your patience and persistence. Only a few minor comments. Overall it looks good.

@guilhem
Copy link
Contributor Author

guilhem commented Feb 13, 2018

I "think" I fixed your comments :)
But no problem for all the time this PR take me, I know how it's hard to accept PR with new feature from community.
And it is my first commit in your project... I also have a lot to learn.
Thank you for your patience.

If all is ok, I will squash all my PR to have something clean ;)

@hongchaodeng
Copy link
Member

I know how it's hard to accept PR with new feature from community.

Thanks again for your patience.
We try to embrace the community so there are a lot of things for us to learn. We will be more open and let us know what we did not do well.

If all is ok, I will squash all my PR to have something clean ;)

No worries. We will do squash when we merge.

@hongchaodeng
Copy link
Member

LGTM


More information in [install guide](doc/user/install_guide.md).

See the example in [example/example-etcd-cluster.yaml](example/example-etcd-cluster.yaml)
Copy link
Contributor

@fanminshi fanminshi Feb 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the link in [example/example-etcd-cluster.yaml](example/example-etcd-cluster.yaml) doesn't work. you might want to try ~~~[example/example-etcd-cluster.yaml](../../blob/master/example/example-etcd-cluster.yaml)~~~

edit: use [example/example-etcd-cluster.yaml](../../example/example-etcd-cluster.yaml)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Should be

[example/example-etcd-cluster.yaml](../../example/example-etcd-cluster.yaml)

Copy link
Contributor

@fanminshi fanminshi Feb 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hongchaodeng ~~~this [example/example-etcd-cluster.yaml](../../example/example-etcd-cluster.yaml) didn't work for me.~~~

edit: use [example/example-etcd-cluster.yaml](../../example/example-etcd-cluster.yaml)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying with ../../

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, ../../ is the right approach.

@fanminshi
Copy link
Contributor

lgtm

@guilhem
Copy link
Contributor Author

guilhem commented Feb 13, 2018

Last thing you may add is a "BETA" mention in cmd help.

But you can do it directly on my own branch, feel free ;)

@hongchaodeng hongchaodeng merged commit 0a4c2c1 into coreos:master Feb 13, 2018
@guilhem
Copy link
Contributor Author

guilhem commented Feb 13, 2018

🎉 👍

Thank you all :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants