Invalid config format is silently ignored #735

Closed · valentinbud opened this issue Feb 2, 2016 · 5 comments
@valentinbud

There is a bug in the config parser: the config below mixes HCL and JSON syntax, and rather than emitting an error, Nomad appears to silently drop some of the config information.

Nomad should complain that it cannot parse the config.

-- cbednarski


Like most people around here, once I found the tools you make, I completely fell in love with them :).

In the future I would like to deploy a cluster built with Nomad, Vault, Consul, and Consul Template on top of Docker.

So I started playing around with them, and I thought I would take one of our projects and construct a development environment on local machines with the help of boot2docker, Nomad, and Consul, slowly integrating the other tools.

My first approach was to create custom Docker images for Consul and Nomad and run them with --net=host. Said and done. But then I found out about #150, namely that I can't mount volumes in containers. I need volumes to mount code from the local machine inside the container and to mount a data directory for the database.

Reading around, I came to the conclusion that I can use the raw_exec driver with command=docker and args=["-v ...",].

So I have created a custom boot2docker.iso with nomad inside.

I have the following agent configuration:

cat /usr/local/etc/nomad/agent.json
{
        "name": "dev",
        "log_level": "DEBUG",
        "enable_debug": true,
        "region": "localhost",
        "datacenter": "boot2docker",
        "data_dir": "/var/lib/boot2docker/nomad",
        "server": {
                "enabled": true,
                "bootstrap_expect": 1
        },
        "client": {
                "enabled": true,
                "network_interface": "eth1"
                "options": {
                        "driver.raw_exec.enable" = "1"
                }
        }
}
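
As a side note, any strict JSON parser flags this file immediately. For example, with Python's built-in json.tool (a sketch; the exact error wording and reported position vary by Python version):

$ python -m json.tool /usr/local/etc/nomad/agent.json
Expecting ',' delimiter: ...   # the missing ',' after "eth1"; the '=' lines would fail too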

Nomad logs to /var/lib/boot2docker/nomad.log. I can see the following when I start Nomad:

------------------------
/usr/local/bin/nomad agent -config /usr/local/etc/nomad/agent.json --bind=192.168.200.102 --servers=192.168.200.102:4647 >> "/var/lib/boot2docker/nomad.log"
==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
    Loaded configuration from /usr/local/etc/nomad/agent.json
==> Starting Nomad agent...
==> Nomad agent configuration:

                 Atlas: <disabled>
                Client: true
             Log Level: DEBUG
                Region: localhost (DC: boot2docker)
                Server: true

==> Nomad agent started! Log data will stream in below:

    2016/02/02 06:55:36 [INFO] raft: Node at 192.168.200.102:4647 [Follower] entering Follower state
    2016/02/02 06:55:36 [INFO] serf: EventMemberJoin: dev.localhost 192.168.200.102
    2016/02/02 06:55:36 [INFO] nomad: starting 4 scheduling worker(s) for [system service batch _core]
    2016/02/02 06:55:36 [INFO] client: using state directory /var/lib/boot2docker/nomad/client
    2016/02/02 06:55:36 [INFO] client: using alloc directory /var/lib/boot2docker/nomad/alloc
    2016/02/02 06:55:36 [INFO] nomad: adding server dev.localhost (Addr: 192.168.200.102:4647) (DC: boot2docker)
    2016/02/02 06:55:36 [DEBUG] client: periodically fingerprinting consul at duration 15s
    2016/02/02 06:55:37 [WARN] raft: Heartbeat timeout reached, starting election
    2016/02/02 06:55:37 [INFO] raft: Node at 192.168.200.102:4647 [Candidate] entering Candidate state
    2016/02/02 06:55:37 [DEBUG] raft: Votes needed: 1
    2016/02/02 06:55:37 [DEBUG] raft: Vote granted. Tally: 1
    2016/02/02 06:55:37 [INFO] raft: Election won. Tally: 1
    2016/02/02 06:55:37 [INFO] raft: Node at 192.168.200.102:4647 [Leader] entering Leader state
    2016/02/02 06:55:37 [INFO] nomad: cluster leadership acquired
    2016/02/02 06:55:37 [INFO] raft: Disabling EnableSingleNode (bootstrap)
    2016/02/02 06:55:37 [DEBUG] raft: Node 192.168.200.102:4647 updated peer set (2): [192.168.200.102:4647]
    2016/02/02 06:55:38 [DEBUG] fingerprint.env_aws: Error querying AWS Metadata URL, skipping
    2016/02/02 06:55:38 [DEBUG] fingerprint.env_gce: Error querying GCE Metadata URL, skipping
    2016/02/02 06:55:38 [DEBUG] fingerprint.network: Detected interface eth1  with IP 192.168.200.102 during fingerprinting
    2016/02/02 06:55:38 [DEBUG] client: applied fingerprints [arch cpu host memory network storage]
    2016/02/02 06:55:38 [DEBUG] driver.docker: using client connection initialized from environment
    2016/02/02 06:55:38 [DEBUG] driver.docker: privileged containers are disabled
    2016/02/02 06:55:38 [DEBUG] client: available drivers [docker exec]
    2016/02/02 06:55:38 [DEBUG] client: node registration complete
    2016/02/02 06:55:38 [DEBUG] client: updated allocations at index 1 (0 allocs)
    2016/02/02 06:55:38 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 0)
    2016/02/02 06:55:41 [DEBUG] client: state updated to ready

As per the log above, the raw_exec driver is not loaded, only docker and exec.
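
One way to double-check which drivers a node has fingerprinted is to query the node's attributes over the HTTP API (a sketch; substitute the node ID listed by nomad node-status, and adjust the address to wherever the agent's HTTP API is bound):

$ nomad node-status                    # list nodes and their IDs
$ curl -s http://192.168.200.102:4646/v1/node/<node-id> | python -m json.tool | grep driver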

I have a dev.nomad job file that looks like:

# There can only be a single job definition per file.
# Create a job with ID and Name 'mobum'
job "mobum" {
    # Run the job in our custom 'localhost' region (the default is 'global').
    region = "localhost"

    # Specify the datacenters within the region this job can run in.
    datacenters = ["boot2docker"]

    # Service type jobs optimize for long-lived services. This is
    # the default but we can change to batch for short-lived tasks.
    type = "service"

    # Priority controls our access to resources and scheduling priority.
    # This can be 1 to 100, inclusively, and defaults to 50.
    # priority = 50

    # Restrict our job to only linux. We can specify multiple
    # constraints as needed.
    constraint {
        attribute = "$attr.kernel.name"
        value = "linux"
    }

    # Configure the job to do rolling updates
    update {
        # Stagger updates every 10 seconds
        stagger = "10s"

        # Update a single task at a time
        max_parallel = 1
    }

    # Each task defined directly at the job level is placed
    # into its own implicit group of the same name.
    task "mobum-postgres" {
        # Use Docker to run the task.
        driver = "docker"
        # Configure Docker driver with the image
        config {
            image = "postgres"
            port_map {
                postgres = 5432
            }
        }
        service {
            name = "mobum-postgres"
            tags = ["mobum", "postgres"]
            port = "postgres"
            check {
                name = "alive"
                type = "tcp"
                interval = "10s"
                timeout = "2s"
            }
        }
        # We must specify the resources required for
        # this task to ensure it runs on a machine with
        # enough capacity.
        resources {
            cpu = 500 # 500 MHz
            memory = 256 # 256 MB
            network {
                mbits = 10
                port "postgres" {
                }
            }
        }
    }
    task "mobum-django-app" {
        # Use raw_exec to run the docker CLI directly.
        driver = "raw_exec"
        # Configure raw_exec with the command and args
        config {
            command = "/usr/local/bin/docker"
            args = [ "run", "-d", "--hostname=mobum-django-app", "-p 8080:8080", "-v /Users:/Users", "--name=${BASE}", "mobum:latest" ]
        }
        service {
            name = "mobum-django-app"
            tags = ["mobum", "django-app"]
            port = "8080"
            check {
                name = "alive"
                type = "tcp"
                interval = "10s"
                timeout = "2s"
            }
        }
        # We must specify the resources required for
        # this task to ensure it runs on a machine with
        # enough capacity.
        resources {
            cpu = 500 # 500 MHz
            memory = 256 # 256 MB
            network {
                mbits = 10
                port "8080" {
            static = 8080
                }
            }
        }
    }
}

Running it with nomad run dev.nomad gives the following output on the client:

$ nomad run dev.nomad
==> Monitoring evaluation "9ea8a959-2dff-18e5-33eb-a7474c60eddd"
    Evaluation triggered by job "mobum"
    Scheduling error for group "mobum-django-app" (failed to find a node for placement)
    Allocation "9f961362-cda5-1f1c-f226-85fd8b6e0d85" status "failed" (1/1 nodes filtered)
      * Constraint "missing drivers" filtered 1 nodes
    Allocation "ba22a437-c3f3-f314-19cd-287913da2f1a" created: node "627af627-4b96-8ac2-31d1-0707ab6bfe2d", group "mobum-postgres"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "9ea8a959-2dff-18e5-33eb-a7474c60eddd" finished with status "complete"

mobum-django-app failed because it couldn't find a node with the raw_exec driver. As expected, I would say :).
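
To dig into a failed placement like this one, the allocation can be inspected directly (a sketch; output format varies by Nomad version):

$ nomad alloc-status 9f961362-cda5-1f1c-f226-85fd8b6e0d85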

I didn't stop there, so I switched raw_exec to exec for the mobum-django-app task, and now I have the following output on the client:

$ nomad run dev.nomad
==> Monitoring evaluation "e0f7386b-6b4f-250b-2867-626b391ebd77"
    Evaluation triggered by job "mobum"
    Allocation "f2341149-a8c7-db3f-3483-21fa1bbe12b4" created: node "627af627-4b96-8ac2-31d1-0707ab6bfe2d", group "mobum-django-app"
    Allocation "301fa2a4-2ef5-8876-ea9e-e5cbc209f0cf" created: node "627af627-4b96-8ac2-31d1-0707ab6bfe2d", group "mobum-postgres"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "e0f7386b-6b4f-250b-2867-626b391ebd77" finished with status "complete"

But the mobum-django-app task fails to configure its task directory, as per the Nomad logs:

    2016/02/02 07:02:59 [DEBUG] client: starting runner for alloc '301fa2a4-2ef5-8876-ea9e-e5cbc209f0cf'
    2016/02/02 07:02:59 [DEBUG] worker: updated evaluation <Eval 'e0f7386b-6b4f-250b-2867-626b391ebd77' JobID: 'mobum'>
    2016/02/02 07:02:59 [DEBUG] worker: ack for evaluation e0f7386b-6b4f-250b-2867-626b391ebd77
    2016/02/02 07:02:59 [DEBUG] client: starting task context for 'mobum-postgres' (alloc '301fa2a4-2ef5-8876-ea9e-e5cbc209f0cf')
    2016/02/02 07:02:59 [DEBUG] client: starting runner for alloc 'f2341149-a8c7-db3f-3483-21fa1bbe12b4'
    2016/02/02 07:02:59 [DEBUG] client: starting task context for 'mobum-django-app' (alloc 'f2341149-a8c7-db3f-3483-21fa1bbe12b4')
    2016/02/02 07:02:59 [ERR] client: failed to start task 'mobum-django-app' for alloc 'f2341149-a8c7-db3f-3483-21fa1bbe12b4': failed to configure task directory: Couldn't mount /dev to /var/lib/boot2docker/nomad/alloc/f2341149-a8c7-db3f-3483-21fa1bbe12b4/mobum-django-app/dev: no such device
    2016/02/02 07:03:00 [DEBUG] client: updated allocations at index 40 (5 allocs)
    2016/02/02 07:03:00 [DEBUG] client: allocs: (added 0) (removed 0) (updated 4) (ignore 1)
    2016/02/02 07:03:00 [ERR] client: dropping update to alloc '9750e8b0-2bee-4c4e-3357-4ab842e05e13'
    2016/02/02 07:03:00 [ERR] client: dropping update to alloc '58578772-a831-0771-3d07-8828c43dcbcb'
    2016/02/02 07:03:00 [DEBUG] http: Request /v1/evaluation/e0f7386b-6b4f-250b-2867-626b391ebd77 (208.922µs)
    2016/02/02 07:03:00 [DEBUG] http: Request /v1/evaluation/e0f7386b-6b4f-250b-2867-626b391ebd77/allocations (83.324µs)
    2016/02/02 07:03:30 [DEBUG] driver.docker: docker pull postgres:latest succeeded
    2016/02/02 07:03:30 [DEBUG] driver.docker: identified image postgres as 54fa18d9f3b6c5b350ec4588bdb4f4e90df21e3fd6c767094a1f77c13cd5b453
    2016/02/02 07:03:30 [DEBUG] driver.docker: using 268435456 bytes memory for postgres
    2016/02/02 07:03:30 [DEBUG] driver.docker: using 500 cpu shares for postgres
    2016/02/02 07:03:30 [DEBUG] driver.docker: binding directories []string{"/var/lib/boot2docker/nomad/alloc/301fa2a4-2ef5-8876-ea9e-e5cbc209f0cf/alloc:/alloc:rw,z", "/var/lib/boot2docker/nomad/alloc/301fa2a4-2ef5-8876-ea9e-e5cbc209f0cf/mobum-postgres:/local:rw,Z"} for postgres
    2016/02/02 07:03:30 [DEBUG] driver.docker: networking mode not specified; defaulting to bridge
    2016/02/02 07:03:30 [DEBUG] driver.docker: allocated port 192.168.200.102:35979 -> 5432 (mapped)
    2016/02/02 07:03:30 [DEBUG] driver.docker: exposed port 5432
    2016/02/02 07:03:30 [DEBUG] driver.docker: setting container name to: mobum-postgres-301fa2a4-2ef5-8876-ea9e-e5cbc209f0cf
    2016/02/02 07:03:30 [INFO] driver.docker: created container b0627794c6d181699cd87828f5b34b5c9e207a58268d704a65abfe5bf67b4279
    2016/02/02 07:03:30 [INFO] driver.docker: started container b0627794c6d181699cd87828f5b34b5c9e207a58268d704a65abfe5bf67b4279
    2016/02/02 07:03:30 [INFO] consul: registering service mobum-postgres with consul.
    2016/02/02 07:03:30 [DEBUG] client: updated allocations at index 41 (5 allocs)
    2016/02/02 07:03:30 [DEBUG] client: allocs: (added 0) (removed 0) (updated 5) (ignore 0)
    2016/02/02 07:03:30 [ERR] client: dropping update to alloc '58578772-a831-0771-3d07-8828c43dcbcb'
    2016/02/02 07:03:30 [ERR] client: dropping update to alloc '9750e8b0-2bee-4c4e-3357-4ab842e05e13'

Would the situation change if I loaded raw_exec? How can I enable it for testing purposes? Thanks.

@cbednarski
Contributor

I'm not sure if this is the problem, but your config is not valid JSON. It looks like you have mixed HCL syntax with JSON syntax, and I'm not sure how this will be parsed.

"driver.raw_exec.enable" = "1"
"network_interface": "eth1"

should be

"network_interface": "eth1",
"driver.raw_exec.enable": "1"

Note the , after "eth1" and the : rather than =. I'm curious why you didn't get an error message from the config parser about this.

@valentinbud
Author

@cbednarski thank you for spotting the error.

Should driver.raw_exec.enable be under client: {options: {}} or directly under client: {}? For now it's the former, and the raw_exec driver gets loaded.

Should I close this issue and open a new one for the config parser? Thanks!
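
For reference, the placement that works here (the former), sketched as a full client block in the same JSON format as agent.json:

"client": {
        "enabled": true,
        "network_interface": "eth1",
        "options": {
                "driver.raw_exec.enable": "1"
        }
}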

@cbednarski
Contributor

All the info is here; I will just change the title.

@cbednarski cbednarski changed the title Nomad raw_exec driver not enabled Invalid config format is silently ignored Feb 2, 2016
@diptanu
Contributor

diptanu commented Mar 16, 2016

Fixed via #910

@diptanu diptanu closed this as completed Mar 16, 2016