
oc cluster join -- how to use? #13336

Closed
debianmaster opened this issue Mar 10, 2017 · 27 comments

@debianmaster

It would be great if a guide / README were provided on how to use the oc cluster join command.

Version

oc v1.5.0-rc.0+49a4a7a
kubernetes v1.5.2+43a9be4
features: Basic-Auth

Server https://127.0.0.1:8443
openshift v1.5.0-alpha.3+cf7e336
kubernetes v1.5.2+43a9be4

Steps To Reproduce

oc cluster up
oc cluster join

Current Result

Prompts for
Please paste the contents of your secret here and hit ENTER:
and does not accept any input; I have to press Ctrl+C to exit

Expected Result

join another cluster

@debianmaster
Author

Let me know if you need help with this task; I will try my best.

@csrwng
Contributor

csrwng commented Mar 16, 2017

@debianmaster sorry I haven't had time to look into this. But, if you want to try it out, the secret that it's expecting is the contents of admin.kubeconfig from the master's config. You need to make sure that the address specified inside that config is accessible to the node you're trying to add.
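
For anyone trying this, a minimal sketch of what that looks like against a cluster started with oc cluster up (the admin.kubeconfig path assumes the default /var/lib/origin layout seen later in this thread, and MASTER is a placeholder for your master's address):

# Grab the admin kubeconfig generated by `oc cluster up` on the master
cat /var/lib/origin/openshift.local.config/master/admin.kubeconfig
# Check which API address that kubeconfig points at
grep 'server:' /var/lib/origin/openshift.local.config/master/admin.kubeconfig
# From the machine that will join, verify the address is reachable (replace MASTER)
curl -k https://MASTER:8443/healthz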

@debianmaster
Author

I tried pasting the contents of admin.kubeconfig and hit ENTER.
The prompt never moves to the next step; it's stuck asking for the secret contents.

@debianmaster
Author

@csrwng can you help me with this? I'm trying to automate platform scaling and I do not want to do it in a non-standard way.

@csrwng
Contributor

csrwng commented Mar 20, 2017

@debianmaster cluster join is at a very early, experimental stage. Using it to scale the platform is definitely not standard.

@LorbusChris
Member

I am interested in this as well, although I haven't had time to test it yet.

@debianmaster I think the intended use case is a bit different from yours, as stated in #9547:

Add a new command oc cluster join which launches a container that acts as a node.

As I understand it, you'd be bootstrapping that container on your machine as a node host for a cluster that already exists somewhere else. The question is: are you supposed to run cluster up on that machine prior to cluster join? Have you tried it without?

I am imagining a use case where, on one reasonably available bare-metal host, I want to run an all-in-one setup (like cluster up) that provides the ability to scale out into a federated HA setup. Hence, my next question:

Can anybody give an indication whether joining clusters to federations is something that will likely be supported within oc cluster in the future? (>v1.6)

@debianmaster
Author

😢 waiting.....

@ghost

ghost commented May 22, 2017

Just a heads up to those following this thread: the paste is waiting for an EOF on the file, so paste it in and then hit Ctrl-D.
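
If the interactive paste still gives you trouble, you can skip the prompt entirely and hand the file to the command via the --secret flag (the same form used further down in this thread):

oc cluster join --secret="$(cat admin.kubeconfig)"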

@debianmaster
Author

Did that work for you? What config file did you use? kubeconfig? @joshprismon

@pilhuhn

pilhuhn commented May 30, 2017

@smarterclayton Do you have a 2 liner on how to use this?

@xiaoping378
Contributor

oc cluster join doesn't work.

[root@c7 ~]# oc cluster join
Please paste the contents of your secret here and hit ENTER:
123456-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... 
   Deleted existing OpenShift container
-- Checking for openshift/origin:v3.6.0-alpha.2 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ... 
   Using nsenter mounter for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ... 
   Using 127.0.0.1 as the server IP
-- Joining OpenShift cluster ... 
   Starting OpenShift Node using container 'origin'
FAIL
   Error: could not start OpenShift container "origin"
   Details:
     Last 10 lines of "origin" container log:
     error: --kubeconfig must be set to provide API server connection information

@csrwng csrwng assigned smarterclayton and unassigned csrwng Jun 27, 2017
@debianmaster
Author

:) waiting......

@wydwww

wydwww commented Oct 22, 2017

@debianmaster Hi, can you please tell me what the "secret" is? I checked admin.kubeconfig but there was no "secret" attribute.
Thanks!

@debianmaster
Author

@wydwww Still no luck. There is no good documentation.

@adelton
Contributor

adelton commented Oct 31, 2017

@xiaoping378: after it fails, try to run docker logs origin to get the full logs of the failed origin container. For me it says

# docker logs origin
Error: unknown flag: --bootstrap
Usage:
  openshift start node [options]
Options:
      --bootstrap-config-name='': On startup, the node will request a client cert from the master and get its config from this config map in the openshift-node namespace (experimental).
      --config='': Location of the node configuration file to run from. When running from a configuration file, all other command-line arguments are ignored.
      --disable='': The set of node components to disable
      --enable='dns,kubelet,plugins,proxy': The set of node components to enable
      --expire-days=730: Validity of the certificates in days (defaults to 2 years). WARNING: extending this above default value is highly discouraged.
      --hostname='node.example.com': The hostname to identify this node with the master.
      --images='registry.access.redhat.com/openshift3/ose-${component}:${version}': When fetching images used by the cluster for important components, use this format on both master and nodes. The latest release will be used by default.
      --kubeconfig='': Path to the kubeconfig file to use for requests to the Kubernetes API.
      --kubernetes='https://localhost:8443': removed in favor of --kubeconfig
      --latest-images=false: If true, attempt to use the latest images for the cluster instead of the latest release.
      --listen='https://0.0.0.0:8443': The address to listen for connections on (scheme://host:port).
      --network-plugin='': The network plugin to be called for configuring networking for pods.
      --recursive-resolv-conf='': An optional upstream resolv.conf that will override the DNS config.
      --volume-dir='openshift.local.volumes': The volume storage directory.
Use "openshift options" for a list of global command-line options (applies to all commands).

so it looks like there is some issue with passing the correct arguments around. But I'm trying it with OSE 3.7-to-be.

@adelton
Contributor

adelton commented Oct 31, 2017

Of course, that Using 127.0.0.1 as the server IP also looks suspicious -- I would hope to see the IP address of the master there.

@adelton
Contributor

adelton commented Oct 31, 2017

I've now retried with OpenShift Origin v3.6.1, on RHEL.

On the master:

# curl -LO https://github.com/openshift/origin/releases/download/v3.6.1/openshift-origin-client-tools-v3.6.1-008f2d5-linux-64bit.tar.gz
# tar xvzf openshift-origin-client-tools-v3.6.1-008f2d5-linux-64bit.tar.gz
# cp openshift-origin-client-tools-v3.6.1-008f2d5-linux-64bit/oc /usr/bin/oc
# yum install -y docker
# cat <<EOF >> /etc/containers/registries.conf

insecure_registries:
 - 172.30.0.0/16

EOF
# systemctl restart docker
# oc cluster up --public-hostname=$(hostname) --use-existing-config=true --host-data-dir=/var/lib/origin/openshift.local.etcd
# oc serviceaccounts create-kubeconfig -n openshift-infra node-bootstrapper > node-bootstrapper

The node-bootstrapper file contains:

apiVersion: v1
clusters:
- cluster:
    api-version: v1
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM2akNDQWRLZ0F3SUJBZ0lCQVRBTkJna3Foa2lHOXcwQkFRc0ZBREFtTVNRd0lnWURWUVFEREJ0dmNHVnUKYzJocFpuUXRjMmxuYm1WeVFERTFNRGswTmpBek9EY3dIaGNOTVRjeE1ETXhNVFF6TXpBM1doY05Nakl4TURNdwpNVFF6TXpBNFdqQW1NU1F3SWdZRFZRUUREQnR2Y0dWdWMyaHBablF0YzJsbmJtVnlRREUxTURrME5qQXpPRGN3CmdnRWlNQTBHQ1NxR1NJYjNEUUVCQVFVQUE0SUJEd0F3Z2dFS0FvSUJBUUROY29LYU1yN1owbTB4OThTSGxuTHIKdUtHWXExMWhtTEN6NVBHR0JCcXFwTGxmQzk3T1ZyVnJRMElDTXl2NTFmRTZRd01tdko1b3FMMW5FT3YzdFEyYgpCM0xVQUl3ZFBwL1ZTSy93QVhlUUxtblVLVFJOQU9DZTArcmhiSGpsTXoxKzdtbGkrZnlHVXpUQk5mdlNhMmhDCnhlcEt4b3RvcDUrbVIvVSt4N3JUTFV5eVZFRkVQckJqd1VjR2dtNmIrQmdJaXRGd3A5cU1Xb1JkTXkvZEZIc0wKai91TVVHVGtEOFJDeTZIMzhFQWFrblF4bG9BWEljOEFUR0N2bXo3U1lzSnVqazFQcmhnU3lnQnc3Uk05cE1vNQozeFdWSytXeTFQc1pmYXBQZVRad2RPdTZDTGlWZUp0Ym5lY1J4TWhibmdBZUwwYk5uU2lxbzJrVTV1NE9wV0loCkFnTUJBQUdqSXpBaE1BNEdBMVVkRHdFQi93UUVBd0lDcERBUEJnTlZIUk1CQWY4RUJUQURBUUgvTUEwR0NTcUcKU0liM0RRRUJDd1VBQTRJQkFRRERPQnBoR2R4ODlnYnA3K1pGUUpvbmhuM1gzUmhYMDN3UG8ySGlRL01iVTlQWgpWMFpCVE0xeDJoU3AvTlpRRDBQWm1Bdk94ZVZlTEdSN3FHWjk0Unp3Z093ekhpS3VGdG1DSVRCOW8wL2sxbHA1CnpScjFLbFZ1elhxSzRCUGJRa1grSEozSzRiVnN5SExrS2lTZHFjTEYwbk5iU0ptTlM1NW1mMStqVFpEanR2NUQKQWU1ekk5OTZwck9XRjBTSFRGMEJBQ0dnVU1KdHJTR2ZxNklEQU84L2lSdkthRGhQeUZzUnpTbVpwMjRCYUo3bQpYaWdzNG5paVJLNTNVZzJFYzV2U2JScldzWkZoT2VSTUNvVCtBVGJIWW5kSndSWHdBQ0NlNVNiY1Q5cVorOEo4CkZCdlRCWDFYMHBjVDBrd0ZNWGk5bjFGNy9lUTVrREZBU2pOdWQvanUKLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
    server: https://127.0.0.1:8443
  name: 127-0-0-1:8443
contexts:
- context:
    cluster: 127-0-0-1:8443
    namespace: openshift-infra
    user: node-bootstrapper
  name: node-bootstrapper
current-context: node-bootstrapper
kind: Config
preferences: {}
users:
- name: node-bootstrapper
  user:
    token: eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtaW5mcmEiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoibm9kZS1ib290c3RyYXBwZXItdG9rZW4tMnd4NDYiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibm9kZS1ib290c3RyYXBwZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI3ZmEwZTRlZC1iZTQ4LTExZTctOWJhNC0wMDEzMjBmOWJjMzkiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6b3BlbnNoaWZ0LWluZnJhOm5vZGUtYm9vdHN0cmFwcGVyIn0.NAhoU4Sk7psrwWtjM5-wPNkN6CX2iKbsdPMMotE2UJoJn8xhVG3PdJi1OoJzAcEl7mT5OrUzNJ_NviTjkeHPC8MzjMjdcSBEdxav9AVdbnnrFWsxyTyM6TnEMvehmOVbwSN9RgWMT2QeB_gbTuetsz3G14tgEsDtJ8QiNRT-toLtLtDwiqnRiRMoy1o9PC6mkM6NGZRkH7tRSrrbfZwL5KHRfZNH4Icuy2yATcOyxdHl_kYdP8nBNjZkgWB1-m9P2SX0Fmy2WS4S9WP2Ljehld6ROe1rtibnkqt4jeADC9eFwZFMi1GRj1h2LrW74rd7n2tKAS9hjClE1S5hR2kU4g

I've copied that file to the second RHEL machine on which I plan to run oc cluster join.

On the second machine:

# curl -LO https://github.com/openshift/origin/releases/download/v3.6.1/openshift-origin-client-tools-v3.6.1-008f2d5-linux-64bit.tar.gz
# tar xvzf openshift-origin-client-tools-v3.6.1-008f2d5-linux-64bit.tar.gz
# cp openshift-origin-client-tools-v3.6.1-008f2d5-linux-64bit/oc /usr/bin/oc
# yum install -y docker
# cat <<EOF >> /etc/containers/registries.conf

insecure_registries:
 - 172.30.0.0/16

EOF
# systemctl restart docker
# oc cluster join --secret="$(cat node-bootstrapper)"
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v3.6.1 image ... 
   Pulling image openshift/origin:v3.6.1
   Pulled 1/4 layers, 26% complete
   Pulled 2/4 layers, 78% complete
   Pulled 3/4 layers, 97% complete
   Pulled 4/4 layers, 100% complete
   Extracting
   Image pull complete
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ... 
   Using nsenter mounter for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ... 
   Using 127.0.0.1 as the server IP
-- Joining OpenShift cluster ... 
   Starting OpenShift Node using container 'origin'
FAIL
   Error: could not start OpenShift container "origin"
   Details:
     No log available from "origin" container

# docker logs origin
I1031 14:45:44.455576   30088 bootstrap_node.go:266] Bootstrapping from API server https://127.0.0.1:8443 (experimental)
F1031 14:45:44.846586   30088 start_node.go:140] Post https://127.0.0.1:8443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: dial tcp 127.0.0.1:8443: getsockopt: connection refused

So I believe the problem is that oc cluster join is using the 127.0.0.1 IP address, instead of connecting to the master.

I have even tried to use

oc cluster join --secret="$(cat node-bootstrapper)" --public-hostname=$MASTER

or replace the IP address in node-bootstrapper's line

    server: https://127.0.0.1:8443

with the master, to no avail.
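
(For illustration, that replacement amounts to something like the following sed invocation, with $MASTER set to the master's hostname or IP; a sketch of what I tried, not something that fixed it:)

sed -i "s|https://127.0.0.1:8443|https://$MASTER:8443|" node-bootstrapper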

@smarterclayton, how is the oc cluster join supposed to figure out the hostname / IP address of the master to which it should be joining?

@ikus060

ikus060 commented Nov 17, 2017

I don't want to create a new ticket for this issue, so I will comment on this one.
The oc cluster join command is broken in v3.6.1; here is how:

--host-data-dir doesn't work
When executing oc cluster join --use-existing-config --host-data-dir=/var/lib/origin/openshift.local.volumes --secret="$(cat kubeconfig)", the generated docker container doesn't have the proper volume created.
--volume="/var/lib/origin/openshift.local.config:/var/lib/origin/openshift.local.config:z" is missing.

Volume /var/lib/origin/openshift.local.config not defined
Even without --host-data-dir, the docker container doesn't have any volume for /var/lib/origin/openshift.local.config. Basically, the certs will be lost the next time we run oc cluster join again. You will get something like this:

bootstrap client certificate does not match private key, you may need to delete the client CSR: tls: private key does not match public key
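
A quick way to see which volumes the generated container actually got is to inspect it (a sketch against the origin container name that oc cluster join creates):

# List the host paths mounted into the origin container and where they land
docker inspect --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{"\n"}}{{end}}' origin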

wrong cgroup driver used
Making reference to my own ticket #17190
The node configuration file /var/lib/origin/openshift.local.config/node/node-config.yaml is missing the following lines:

kubeletArguments:
  cgroup-driver:
  - cgroupfs

Otherwise kubelet refuses to start with the following error message:

failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
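
For comparison, you can check which cgroup driver the docker daemon on the node actually uses:

docker info 2>/dev/null | grep -i 'cgroup driver'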

For adventurous people: I managed to get the node started by running something similar to this:

docker run --name=origin --hostname=sylve --env="HOME=/root" --env="OPENSHIFT_CONTAINERIZED=true" --env="KUBECONFIG=/var/lib/origin/openshift.local.config/master/admin.kubeconfig" --env="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" --volume="/var/log:/var/log:rw" --volume="/var/run:/var/run:rw" --volume="/sys:/sys:rw" --volume="/sys/fs/cgroup:/sys/fs/cgroup:rw" --volume="/dev:/dev" --volume="/:/rootfs:ro" --volume="/var/lib/origin/openshift.local.volumes:/var/lib/origin/openshift.local.volumes:rslave" --volume="/var/lib/origin/openshift.local.config:/var/lib/origin/openshift.local.config:z" --network=host --privileged --restart= --label io.openshift.tags="openshift,core" --label license="GPLv2" --label io.k8s.description="OpenShift Origin is a platform for developing, building, and deploying containerized applications." --label build-date="20170911" --label name="CentOS Base Image" --label io.k8s.display-name="OpenShift Origin Application Platform" --label vendor="CentOS" --detach=true openshift/origin:v3.6.1 start node --bootstrap --kubeconfig=/var/lib/origin/openshift.local.config/node/node-bootstrap.kubeconfig

@smarterclayton
Contributor

smarterclayton commented Nov 17, 2017 via email

@ikus060

ikus060 commented Nov 17, 2017

@smarterclayton yep, I've read your PR #16571.
While openshift start node might work, it doesn't get called properly when trying to use oc cluster join ...
But did you have a look at #17331? I've raised that problem there. It seems related to wrong arguments getting passed to the container.

@smarterclayton
Contributor

smarterclayton commented Nov 17, 2017 via email

@ikus060

ikus060 commented Nov 17, 2017

@smarterclayton I'm not sure this is the right place to continue the discussion. If you have a better place, please tell me.
With 3.7.0-rc0, I gave your command a try:

origin start node \
    --bootstrap-config-name=node-config \
    --kubeconfig=/var/lib/origin/openshift.local.config/node/node-bootstrap.kubeconfig \
    --config=/var/lib/origin/openshift.local.config/node/node-config.yml \
    --enable=kubelet \
    --loglevel=4

In the logs, I see:

I1117 20:29:51.302904       1 start_node.go:274] Bootstrapping from master configuration
I1117 20:29:51.303009       1 bootstrap.go:58] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
I1117 20:29:51.550320       1 csr.go:104] csr for this node already exists, reusing
I1117 20:29:51.552579       1 csr.go:112] csr for this node is still valid

Then nothing else happens. From the master server, I only received one cert, probably the client certificate. I didn't get any serving cert to be approved.
Also, my node is not showing in oc get nodes.
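
In case it helps the next person, the signing requests can be listed from the master and a pending one approved by hand (a sketch; I'm assuming the oc adm certificate subcommand available in recent origin releases):

# On the master: list certificate signing requests from bootstrapping nodes
oc get csr
# Approve a pending request (NAME is a placeholder for the CSR name shown above)
oc adm certificate approve NAME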

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 25, 2018
@hoogenm

hoogenm commented Feb 25, 2018

/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 25, 2018
@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 26, 2018
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 26, 2018
@debianmaster
Author

/remove-lifecycle rotten

@openshift-ci-robot openshift-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 26, 2018