
erroneous "cpu exhausted" message using qemu #302

Closed
ghost opened this issue Oct 19, 2015 · 7 comments
ghost commented Oct 19, 2015

Using current master, any qemu or docker instance I attempt to schedule comes up unable to find a node for placement. This may or may not be because I'm attempting to run something paravirtualized (or I may simply not have the Nomad file written properly); if so, it would be helpful if the allocator told me this. Tested on a 4-core node with 4 GB of memory. I've run Docker images on this setup without issue before; however, they are also now failing. Command output:

AC02MK0LSFD58:~ rvm2015$ nomad run example.nomad
==> Monitoring evaluation "0ec1d56f-b31d-c1c7-aeb5-2ab272516b32"
    Evaluation triggered by job "qemu_centos7"
    Scheduling error for group "qemu_test" (failed to find a node for placement)
    Allocation "b4a83386-3d8c-a145-b079-8690a23b8612" status "failed" (0/1 nodes filtered)
      * Resources exhausted on 1 nodes
      * Dimension "cpu exhausted" exhausted on 1 nodes
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "0ec1d56f-b31d-c1c7-aeb5-2ab272516b32" finished with status "complete"

example.nomad

# There can only be a single job definition per file.
# Create a job with ID and Name 'example'
job "qemu_centos7" {
    # Run the job in the global region, which is the default.
    # region = "global"

    # Specify the datacenters within the region this job can run in.
    datacenters = ["dc1"]

    # Service type jobs optimize for long-lived services. This is
    # the default but we can change to batch for short-lived tasks.
    # type = "service"

    # Priority controls our access to resources and scheduling priority.
    # This can be 1 to 100, inclusively, and defaults to 50.
    # priority = 50

    # Restrict our job to only linux. We can specify multiple
    # constraints as needed.
    constraint {
        attribute = "$attr.kernel.name"
        value     = "linux"
    }

    # Configure the job to do rolling updates
    update {
        # Stagger updates every 10 seconds
        stagger = "10s"

        # Update a single task at a time
        max_parallel = 1
    }

    # Create a 'qemu_test' group. Each task in the group will be
    # scheduled onto the same machine.
    group "qemu_test" {
        # Control the number of instances of this group.
        # Defaults to 1
        # count = 1

        # Define a task to run
        task "qemu_task" {
            # Use the qemu driver to run the task.
            driver = "qemu"

            image_source = "http://core.example.org/centos7.qcow2"
            checksum     = "443ca3ac203fa0f90bbd739119b57384"

            # We must specify the resources required for
            # this task to ensure it runs on a machine with
            # enough capacity.
            resources {
                cpu    = 500 # 500 MHz
                memory = 256 # 256 MB
                network {
                    mbits = 10
                }
            }
        }
    }
}

ghost commented Oct 19, 2015

Rolling back to Nomad 0.1.2 fixes Docker and partially fixes qemu. Qemu now fails with "failed to start: Missing source image Qemu driver"; however, this appears to be a check on the source address of the qemu image in qemu.go (https://github.com/hashicorp/nomad/blob/7ab84c2862d8f8de75e9ac64ee71b8a0cd05c798/client/driver/qemu.go), which is correct for my environment.

AC02MK0LSFD58:~ rvm2015$ nomad alloc-status 9439010a-b52c-ad2a-c35d-ecf9560d10b0
ID                = 9439010a-b52c-ad2a-c35d-ecf9560d10b0
EvalID            = 580d761f-8ba8-1b73-e265-b7ea6987e539
Name              = qemu_centos7.qemu_test[0]
NodeID            = baaf41ac-074f-ff4e-eb0a-eed11d8bc246
JobID             = qemu_centos7
ClientStatus      = failed
ClientDescription = {"qemu_task":{"Status":"failed","Description":"failed to start: Missing source image Qemu driver"}}
NodesEvaluated    = 1
NodesFiltered     = 0
NodesExhausted    = 0
AllocationTime    = 27.002µs
CoalescedFailures = 0

==> Status
Allocation "9439010a-b52c-ad2a-c35d-ecf9560d10b0" status "failed" (0/1 nodes filtered)
   * Score "baaf41ac-074f-ff4e-eb0a-eed11d8bc246.binpack" = 4.229620


ghost commented Oct 19, 2015

I suspect the original issue is that something changed in a dependent library since 0.1.2 was released (most likely shirou/gopsutil). If I check out v0.1.2 and rebuild it locally, I get the same error about exhausted CPUs.


ghost commented Oct 19, 2015

Yeah, it looks like the current master of gopsutil is broken; I get a nil object when testing the library, and I've opened a bug report with the maintainer. It would not be a terrible idea to adopt godep in the future to prevent these kinds of issues.
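The failure mode described above, a dependency returning nil CPU stats, which the node fingerprinter then reports as zero schedulable MHz, can be guarded against defensively. The following is a minimal sketch of that pattern, not Nomad's actual fingerprint code; the `cpuStat` type and field names are illustrative stand-ins for what a library such as gopsutil returns:

```go
package main

import (
	"errors"
	"fmt"
)

// cpuStat mirrors the kind of per-CPU record a stats library returns.
// The field names here are illustrative, not gopsutil's actual API.
type cpuStat struct {
	Cores int32
	Mhz   float64
}

// totalCPU computes the node's schedulable CPU in MHz, refusing to
// fingerprint a node when the stats are missing or report zero capacity.
// Failing loudly here surfaces the real problem instead of the
// misleading "cpu exhausted" placement failure seen in this issue.
func totalCPU(stats []cpuStat) (float64, error) {
	if len(stats) == 0 {
		return 0, errors.New("no CPU stats returned; refusing to fingerprint node")
	}
	var total float64
	for _, s := range stats {
		total += float64(s.Cores) * s.Mhz
	}
	if total <= 0 {
		return 0, errors.New("CPU stats report zero capacity; upstream library likely broken")
	}
	return total, nil
}

func main() {
	// A healthy 4-core, 2600 MHz node fingerprints normally.
	total, err := totalCPU([]cpuStat{{Cores: 4, Mhz: 2600}})
	fmt.Println(total, err)

	// A broken library returning nil is rejected with a clear error.
	_, err = totalCPU(nil)
	fmt.Println(err)
}
```

With a check like this, a broken gopsutil build would fail node fingerprinting with an explicit error rather than registering the node with zero CPU capacity.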


achanda commented Oct 19, 2015

The fix to gopsutil has been merged. You should be able to pull those changes manually and rebuild Nomad. But yes, a way to reproduce builds deterministically is absolutely essential.
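At the time of this issue, the usual way to pin Go dependencies was godep, the tool suggested above. A sketch of that workflow, based on godep's documented usage rather than anything in this repository (Go modules have since replaced it):

```shell
# Install the godep tool into $GOPATH/bin.
go get github.com/tools/godep

# From the repository root, record the exact revision of every
# dependency (including shirou/gopsutil) and vendor it under Godeps/.
godep save ./...

# Subsequent builds compile against the pinned copies rather than
# whatever happens to be on each dependency's master branch.
godep go build
```

Pinning this way would have prevented a breaking change on gopsutil's master from silently landing in Nomad builds.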

@cbednarski

@rvm2015 Thanks for the detailed report. I think you're correct that we will need godep to prevent this type of issue. We're discussing this internally.


dadgar commented Jan 6, 2016

Closing as this was from an upstream bug.

dadgar closed this as completed Jan 6, 2016
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Dec 27, 2022