This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Deploy kubernetes cluster with OpenPAI hangs forever #2497

Closed
edenbuaa opened this issue Apr 3, 2019 · 4 comments

@edenbuaa
Contributor

edenbuaa commented Apr 3, 2019

How to reproduce it:

  • Pull the dev-box Docker image.
  • Generate the default configuration.
  • Boot up Kubernetes: python paictl.py cluster k8s-bootup -p ~/pai-config

OpenPAI Environment:

  • OpenPAI version: v0.10.1
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): Ubuntu 16.04
  • Kernel (e.g. uname -a):
  • Hardware (e.g. core number, memory size, storage size, GPU type etc.): 2x V100
  • Others:

Anything else we need to know:
There are 3 machines. They can SSH to each other, and they share the same username and password.

Where is the problem? Thanks a lot.

@edenbuaa
Contributor Author

edenbuaa commented Apr 3, 2019

The log output just before the hang is as follows:

2019-04-03 03:24:35,864 [INFO] - deployment.k8sPaiLibrary.maintainlib.deploy : Begin to deploy k8s on host 10.10.47.5, the node role is [ master ]
2019-04-03 03:24:35,922 [INFO] - paramiko.transport : Connected (version 2.0, client OpenSSH_7.2p2)
/usr/lib/python2.7/dist-packages/Crypto/Cipher/blockalgo.py:141: FutureWarning: CTR mode needs counter parameter, not IV
self._cipher = factory.new(key, *args, **kwargs)
2019-04-03 03:24:36,061 [INFO] - paramiko.transport : Authentication (password) successful!
2019-04-03 03:24:36,248 [INFO] - deployment.k8sPaiLibrary.maintainlib.common : Executing the command on host [10.10.47.5]: getent passwd openpai | cut -d: -f6
/home/openpai
2019-04-03 03:24:36,262 [INFO] - paramiko.transport : Connected (version 2.0, client OpenSSH_7.2p2)
2019-04-03 03:24:36,403 [INFO] - paramiko.transport : Authentication (password) successful!
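
To narrow down where a deployment like this stalls, the timestamps in the log can be parsed to find the last step that was recorded before the hang. The following is a minimal sketch, assuming the log line format shown above; `last_logged_step` and `LOG_LINE` are hypothetical names for illustration and are not part of OpenPAI or paictl.

```python
import re
from datetime import datetime

# Assumed line format, taken from the log excerpt in this thread:
# 2019-04-03 03:24:36,403 [INFO] - paramiko.transport : Authentication (password) successful!
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<level>[A-Z]+)\] - (?P<logger>\S+) : (?P<msg>.*)$"
)


def last_logged_step(log_text):
    """Return (timestamp, logger, message) for the last parseable log line, or None.

    Hypothetical helper: lines that do not match the assumed format
    (command output, Python warnings) are skipped.
    """
    last = None
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            last = (ts, match.group("logger"), match.group("msg"))
    return last


# Sample input copied from the log excerpt above.
sample = """\
2019-04-03 03:24:36,248 [INFO] - deployment.k8sPaiLibrary.maintainlib.common : Executing the command on host [10.10.47.5]: getent passwd openpai | cut -d: -f6
/home/openpai
2019-04-03 03:24:36,403 [INFO] - paramiko.transport : Authentication (password) successful!
"""

step = last_logged_step(sample)
print(step)
```

For the excerpt above, the last recorded step is a successful paramiko authentication, which suggests the stall happens in whatever runs over that SSH session next rather than in the connection itself.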

@scarlett2018
Member

scarlett2018 commented Apr 3, 2019

@mzmssg - is this the known issue you made the hotfix for?

Known Issue: There is a known issue #2433 in the v0.10.1 upgrade, and we have provided a hotfix #2441 for it. If your organization has no urgent need to upgrade to v0.10.1 by the end of March 2019, you can postpone the upgrade for a week, when we will release v0.11.0 #2307, in which the known issue is officially fixed.

@mzmssg
Member

mzmssg commented Apr 3, 2019

@scarlett2018
Yes

@edenbuaa
You are welcome to use v0.11.0 instead.

@edenbuaa
Contributor Author

edenbuaa commented Apr 3, 2019

cool!

@mzmssg mzmssg closed this as completed Apr 3, 2019