
0.8 upgrade to 1.0 failed - no node process (1 out of 20 nodes) #2688

Closed
tamireran opened this issue Feb 9, 2017 · 10 comments · Fixed by #2698
Labels
Severity 1 Unable to use product / product is unable to operate / product caused other critical sw to fail

Comments

@tamireran
Contributor

Environment info

  • Version: 0.8.0-8ac4edc to 1.0.0-92796da
  • Deployment: AZURE
  • Customer: noobaa online sandbox

Actual behavior

  1. Upgraded the setup from 0.8.0-8ac4edc to 1.0.0-92796da. One of the 20 nodes did not come back up;
    no node process is running on it.

Expected behavior

  1. A smooth upgrade of all nodes.

Steps to reproduce

  1. Can't tell. You can SSH into the machine and gather information:

ssh notadmin@13.92.235.117
You know the password

http://noobaaonline.eastus.cloudapp.azure.com:8080/
Username: cleverball@noobaa.com
Password: XlkJvFXz

Screenshots or Logs or other output that would be helpful

(If large, please upload as attachment)

/var/log/setup.out

Verifying archive integrity... 100% All good.
Uncompressing 0.8.0-8ac4edc 100%
OpenSSL 1.0.1f 6 Jan 2014
Creating directory /usr/local/noobaa
Verifying archive integrity... 100% All good.
Uncompressing 0.8.0-8ac4edc 100%
installing NooBaa
Tue Jan 24 21:34:48 UTC 2017
mkdir: cannot create directory '/usr/local/noobaa/logs': File exists
Terminated
Signal caught, cleaning up
Verifying archive integrity... 100% All good.
Uncompressing 1.0.0-92796da 100%
OpenSSL 1.0.1f 6 Jan 2014
Creating directory /usr/local/noobaa
Verifying archive integrity... 100% All good.
Uncompressing 1.0.0-92796da 100%
installing NooBaa
Thu Feb 9 20:52:16 UTC 2017
mkdir: cannot create directory '/usr/local/noobaa/logs': File exists
Terminated
Signal caught, cleaning up

@tamireran tamireran added Priority 1 Critical Severity 1 Unable to use product / product is unable to operate / product caused other critical sw to fail labels Feb 9, 2017
@guymguym
Member

This mkdir seems to be the last failure:
https://github.com/noobaa/noobaa-core/blob/master/src/deploy/Linux/noobaa_service_installer.sh#L23

Perhaps use mkdir -p.
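A minimal sketch of why -p helps, assuming the installer simply runs a plain mkdir on /usr/local/noobaa/logs (line 23 of noobaa_service_installer.sh is not quoted in this thread). Plain mkdir exits non-zero when the directory already exists, for example when an earlier install left it behind, while mkdir -p treats an existing directory as success. The demo uses a temporary directory in place of the real path, and whether the mkdir error is what led to the "Terminated" lines in setup.out is not established by the log alone.

```shell
#!/usr/bin/env bash
# Compare plain mkdir with mkdir -p on a directory that already exists.
# LOG_DIR is a stand-in for /usr/local/noobaa/logs; a temp dir keeps the demo harmless.
set -u

BASE_DIR="$(mktemp -d)"
LOG_DIR="$BASE_DIR/logs"

# Simulate a directory left over from a previous (partial) install.
mkdir "$LOG_DIR"

# Plain mkdir fails because the directory is already there.
mkdir "$LOG_DIR" 2>/dev/null
plain_status=$?

# mkdir -p succeeds if the directory exists (and would also create missing parents).
mkdir -p "$LOG_DIR"
p_status=$?

echo "plain mkdir exit status: $plain_status"
echo "mkdir -p exit status: $p_status"

rm -rf "$BASE_DIR"
```

The -p form makes that step idempotent, so rerunning the installer over a half-finished install does not trip on the existing directory.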

@nimrod-becker
Contributor

No problem with adding -p.

@tamireran
Contributor Author

According to the log, it happened in the previous upgrade as well.

Why didn't this problem occur on the other 19 nodes?

I think this requires some deeper digging.

@nimrod-becker
Copy link
Contributor

I'll take a second look at the logs.

@tamireran
Copy link
Contributor Author

Please do, thanks.

@nimrod-becker
Copy link
Contributor

nimrod-becker commented Feb 15, 2017

@tamireran
I see a gap in the agent logs (usr/local/noobaa/log) between:

noobaa2.log.gz: Nov-20 19:54:12.382 [Agent/2019] [L0]  core.agent.agent_cli:: e ...
noobaa1.log.gz: Feb-7 17:19:18.622 [Agent/1323] [L0]  core.rpc.rpc_n2n_agent:: ...

and then

noobaa1.log.gz: Feb-8 18:10:12.268 [Agent/1323] [L0]  core.agent.agent::  ...
noobaa.log.gz: Feb-9 19:00:18.456 [Agent/1323] [L0]  core.rpc.ice::  ...

Was the agent down during these periods?

We also see that, for some reason, the forever service still prints output during the installation. Continuing to check.
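Gaps like these can be located mechanically. The hypothetical helper below (not part of NooBaa) is a sketch that reads log lines oldest first and reports consecutive entries more than a threshold apart. It assumes each line contains a "Mon-DD HH:MM:SS" timestamp as in the excerpts above; since no year is logged, it treats a month going backwards as a year rollover and ignores leap days.

```shell
#!/usr/bin/env bash
# find_log_gaps THRESHOLD_SECONDS
# Reads log lines on stdin (oldest first) and prints each pair of consecutive
# timestamped lines that are more than THRESHOLD_SECONDS apart.
find_log_gaps() {
  local threshold="${1:-86400}"   # default: 24 hours
  awk -v threshold="$threshold" '
    BEGIN {
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= 12; i++) month[names[i]] = i
      # Days before the start of each month in a non-leap year.
      split("0 31 59 90 120 151 181 212 243 273 304 334", cum, " ")
      year = 0; prev_month = 0; have_prev = 0
    }
    {
      # Find the "Mon-DD HH:MM:SS" timestamp; skip lines without one.
      if (!match($0, /[A-Z][a-z][a-z]-[0-9]+ [0-9]+:[0-9]+:[0-9]+/)) next
      ts = substr($0, RSTART, RLENGTH)
      split(ts, p, /[- :]/)          # p[1]=Mon p[2]=DD p[3]=HH p[4]=MM p[5]=SS
      m = month[p[1]]
      if (m == 0) next
      # No year in the log: a month going backwards means the year rolled over.
      if (have_prev && m < prev_month) year++
      prev_month = m
      t = (year * 365 + cum[m] + p[2] - 1) * 86400 + p[3] * 3600 + p[4] * 60 + p[5]
      if (have_prev && t - prev_t > threshold)
        printf "gap of %.1f hours between:\n  %s\n  %s\n", (t - prev_t) / 3600, prev_line, $0
      prev_t = t; prev_line = $0; have_prev = 1
    }
  '
}
```

For example, assuming the rotated files are numbered so that a higher number is older (as the excerpts suggest): zcat noobaa2.log.gz noobaa1.log.gz noobaa.log.gz | find_log_gaps 86400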

@NimrodGeva
Copy link

It looks like forever was still running the noobaa local service.
The problem actually occurred during the upgrade TO 0.8 (from 0.5.2), not the upgrade FROM 0.8.
We need to determine whether this situation is still relevant, and I need to understand whether it can happen in the field.

@tamireran
Copy link
Contributor Author

tamireran commented Feb 15, 2017 via email

@nimrod-becker
Copy link
Contributor

@NimrodGeva we need to make sure the upgrade cleans up this state if a previous upgrade left it behind.

@NimrodGeva
Copy link

We tried upgrading from 0.5.3, 0.5.0 and 0.8, all with agents, straight to 1.1, and the problem did not reproduce.
I believe it's related to a bug we fixed where forever-service was not removed correctly. It won't happen to customers upgrading to 1.1.

Situations where an agent does not respond to an upgrade should not happen in the field. They remain possible, though, and unfortunately some can only be resolved with a hotfix or workaround.
