"allocation lifecycle permissions" when trying to restart with a stopped allocation #7875

Closed
neclimdul opened this issue May 6, 2020 · 15 comments · Fixed by #9909

@neclimdul

Nomad version

Nomad v0.11.1 (b434570)

Operating system and Environment details

Linux X 5.4.0-28-generic #32-Ubuntu SMP Wed Apr 22 17:40:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Docker Engine - Community
Version: 19.03.8
Ubuntu 20.04

Issue

When trying to restart my allocation or my job I get the following error.
Your ACL token does not grant allocation lifecycle permissions.
even when using a management token.

When I click restart I see this in my logs:

 2020-05-05T23:12:20.126-0500 [ERROR] http: request failed: method=PUT path=/v1/client/allocation/31b5957a-7893-5766-83e8-f697e41b6d7d/restart error="Task not running" code=500

So this seems to be caused by trying to restart an allocation that failed to start or is otherwise not running.

A quick search for the error message in the code suggests it is shown for any exception. That is consistent with the token having the right permissions and with the error in the log, so I'm assuming that's the cause.
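One way to separate a token problem from a task-state problem is to issue the same request outside the UI and read the raw response. The sketch below only builds the request; it assumes the restart endpoint from the log line above, the standard `X-Nomad-Token` header, and a `TaskName` payload field. The function name `build_restart_request` and the address used in the usage note are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_restart_request(
    addr: str,
    alloc_id: str,
    token: str,
    task_name: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a PUT to the allocation restart endpoint.

    Assumption: the payload field for a single task is "TaskName";
    an empty object asks for the whole allocation to be restarted.
    """
    payload = {"TaskName": task_name} if task_name else {}
    return urllib.request.Request(
        url=f"{addr.rstrip('/')}/v1/client/allocation/{alloc_id}/restart",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"X-Nomad-Token": token, "Content-Type": "application/json"},
    )
```

Sending the result with `urllib.request.urlopen(req)` against a local agent (for example `http://127.0.0.1:4646`) raises `urllib.error.HTTPError` on a non-2xx response; its `code` and `read()` show the status and message the server actually returned, independent of what the UI displays.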

Reproduction steps

I think you can reproduce this by starting a job, killing the allocation until it stops respawning, and then trying to restart it. I haven't nailed down the steps yet.

Job file (if appropriate)

Nomad Client logs (if appropriate)

Above

@neclimdul
Author

Now I'm having real trouble recreating this. I was dead in the water the other night and had to stop and start the job to get things working again, which means I no longer have a working example. I'll keep trying to figure out what sort of weird edge case I was in.

@tgross
Member

tgross commented Jun 22, 2020

Hi @neclimdul! Thanks for opening this issue!

"When trying to restart my allocation or my job I get the following error."

Was this with the CLI or the web UI? You mentioned "click", so I'm wondering if you ran into some kind of problem with the storage of the token in the web UI.

@neclimdul
Author

It was in the Web UI. The rest of the UI was working; only restarting wasn't. Restarting a running service seemed to work as well.

I'm not sure if this was clear in my report: I was trying to start a dead task using the restart button in the UI. That's why I think you might be able to reproduce this by killing the task in Docker directly until it stalls and then trying to restart it. I haven't been able to sit down and figure that out though.

@tgross
Member

tgross commented Jun 22, 2020

Thanks @neclimdul, I think that detail will help narrow things down.

@cgbaker
Contributor

cgbaker commented Nov 29, 2020

I just ran into this issue; there are a few things going on. One is a UI issue; as noted above, the UI assumes that any failure to restart is a permissions issue:
https://github.com/hashicorp/nomad/blob/v1.0.0-beta3/ui/app/controllers/allocations/allocation/index.js#L86

In fact, there are a number of errors that can happen on restart. For example, my use case involved a post-start task that isn't running anymore.

  1. The UI calls /restart without a task name in the payload
  2. The Nomad API will not allow the alloc to be restarted without a task name and returns a 500: Task not running. This is potentially a bug, which I have filed separately ("Alloc restart does not restart one-shot lifecycle tasks", #9464).
  3. The UI reports the (incorrect) cause:

[Screenshot: Screen Shot 2020-11-29 at 10 37 22 AM]
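The mismatch in step 3 comes from treating every failed restart as a permissions failure. A minimal sketch of the alternative, in Python rather than the UI's own JavaScript, is to branch on the HTTP status and otherwise surface the server's message. The function name `restart_error_message` is hypothetical, and the use of 403 for ACL denials is an assumption about the API rather than something stated in this thread.

```python
def restart_error_message(status: int, body: str) -> str:
    """Map a failed restart response to a user-facing message.

    Assumption: a 403 status signals an ACL/permissions failure.
    Anything else is reported using the server's own error text.
    """
    detail = body.strip()
    if status == 403:
        return "Your ACL token does not grant allocation lifecycle permissions."
    if detail:
        return f"Could not restart allocation: {detail}"
    return f"Could not restart allocation (HTTP {status})."
```

With the response from the original report, `restart_error_message(500, "Task not running")` yields a message that names the real cause instead of blaming the token.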

Job spec:

job "repro7875" {
  type = "service"
  datacenters = ["dc1"]
  group "repro" {
    task "main" {
      driver = "exec"
      config {
        command = "sleep"
        args = ["3600"]
      }  
    }
    task "poststart" {
      driver = "exec"
      config {
        command = "env"
      }
      lifecycle {
        hook = "poststart"
      }
    }
  }
}

@pySilver

Is this feature still non-functional?

@tgross
Member

tgross commented Jan 25, 2021

It's not in the changelog so I'm not sure when, but it looks like the button was removed from the UI in a later version of Nomad, which is why we have this feature request open: #9881

@DingoEatingFuzz maybe we should close this one, as the bug is fixed by removing the button, in favor of #9881?

@pySilver

@tgross I'm on the latest Nomad and this button is here :)

@scyd-cb

scyd-cb commented Jan 25, 2021

@pySilver did you verify with the latest Nomad version, 1.0.2?
I have updated to 1.0.2 and I still don't see the button when the alloc is in a failed state :)

@pySilver

@SCYD here it is https://take.ms/YzCOT

@scyd-cb

scyd-cb commented Jan 25, 2021

@pySilver yes, it is present only when the allocation is running; it is missing when the allocation is in a failed state.
[Screenshot: nomad failed]

@pySilver

I am on the latest version, yes

@pySilver

I'm sorry. You are probably right. My issue is that I'm getting this "allocation lifecycle permissions" error when I'm trying to restart a running service, which is also odd.

@DingoEatingFuzz
Contributor

If I'm reading all of this correctly, there are two issues being discussed.

  1. Tasks that are dead, or allocs that are terminal (and therefore have dead tasks) cannot be restarted. This is also tracked as "UI does not support ability to Start/Restart failed Allocation and tasks" (#9881) (thank you @SCYD)
  2. When a task or alloc fails to restart, the UI always reports the error as a permissions issue even though this isn't true.

I want to leave this issue open to track the bad error message and keep the restarting dead tasks discussion in #9881.
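Both points could be addressed on the client side by checking, before a restart is attempted, why it would be rejected. The sketch below is one hedged way to do that; `restart_blockers` is a hypothetical helper, and the status strings (`complete`, `failed`, `lost` for allocations, `running` for tasks) are assumptions about the values the API reports.

```python
from typing import Dict, List

# Assumption: these client statuses mean the allocation is terminal.
TERMINAL_CLIENT_STATUSES = {"complete", "failed", "lost"}


def restart_blockers(client_status: str, task_states: Dict[str, str]) -> List[str]:
    """Return human-readable reasons a whole-allocation restart may fail.

    An empty list means no obvious blocker was found.
    """
    reasons = []
    if client_status in TERMINAL_CLIENT_STATUSES:
        reasons.append(f"allocation is terminal ({client_status})")
    # Sort for stable output regardless of dict ordering.
    for name, state in sorted(task_states.items()):
        if state != "running":
            reasons.append(f"task {name!r} is {state}")
    return reasons
```

For the job spec posted earlier in this thread, a running allocation whose `poststart` task has already exited would report that task as the blocker, which is more useful to the user than a permissions message.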

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2022