Invalid config format is silently ignored #735

Closed · valentinbud opened this issue Feb 2, 2016 · 5 comments
@valentinbud

There is a bug in the config parser: the config below mixes HCL and JSON syntax, and rather than emitting an error, Nomad appears to silently drop some of the config information.

Nomad should complain that it cannot parse the config.

-- cbednarski


Like most people around here, once I found the tools you make, I completely fell in love with them :).

In the future I would like to deploy a cluster built with Nomad, Vault, Consul, and Consul Template on top of Docker.

So I started playing around with them, and I thought I would take one of our projects and construct a development environment on local machines with the help of boot2docker, Nomad, and Consul, slowly integrating the other tools.

My first approach was to create custom Docker images for Consul and Nomad and run them with --net=host. Said and done. But then I found out about #150, namely that I can't mount volumes in containers. I need volumes to mount code from the local machine inside the container and to mount a data directory for the database.

Reading around, I came to the conclusion that I can use the raw_exec driver with command=docker and args=["-v ...",].

So I have created a custom boot2docker.iso with nomad inside.

I have the following agent configuration:

cat /usr/local/etc/nomad/agent.json
{
        "name": "dev",
        "log_level": "DEBUG",
        "enable_debug": true,
        "region": "localhost",
        "datacenter": "boot2docker",
        "data_dir": "/var/lib/boot2docker/nomad",
        "server": {
                "enabled": true,
                "bootstrap_expect": 1
        },
        "client": {
                "enabled": true,
                "network_interface": "eth1"
                "options": {
                        "driver.raw_exec.enable" = "1"
                }
        }
}
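
As a side note, any strict JSON parser flags this file immediately. For example, with Python's built-in json.tool (a sketch; the exact error wording and reported position vary by Python version):

$ python -m json.tool /usr/local/etc/nomad/agent.json
Expecting ',' delimiter: ...   # the missing ',' after "eth1"; the '=' lines would fail too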

Nomad logs to /var/lib/boot2docker/nomad.log. I can see the following when I start Nomad:

------------------------
/usr/local/bin/nomad agent -config /usr/local/etc/nomad/agent.json --bind=192.168.200.102 --servers=192.168.200.102:4647 >> "/var/lib/boot2docker/nomad.log"
==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
    Loaded configuration from /usr/local/etc/nomad/agent.json
==> Starting Nomad agent...
==> Nomad agent configuration:

                 Atlas: <disabled>
                Client: true
             Log Level: DEBUG
                Region: localhost (DC: boot2docker)
                Server: true

==> Nomad agent started! Log data will stream in below:

    2016/02/02 06:55:36 [INFO] raft: Node at 192.168.200.102:4647 [Follower] entering Follower state
    2016/02/02 06:55:36 [INFO] serf: EventMemberJoin: dev.localhost 192.168.200.102
    2016/02/02 06:55:36 [INFO] nomad: starting 4 scheduling worker(s) for [system service batch _core]
    2016/02/02 06:55:36 [INFO] client: using state directory /var/lib/boot2docker/nomad/client
    2016/02/02 06:55:36 [INFO] client: using alloc directory /var/lib/boot2docker/nomad/alloc
    2016/02/02 06:55:36 [INFO] nomad: adding server dev.localhost (Addr: 192.168.200.102:4647) (DC: boot2docker)
    2016/02/02 06:55:36 [DEBUG] client: periodically fingerprinting consul at duration 15s
    2016/02/02 06:55:37 [WARN] raft: Heartbeat timeout reached, starting election
    2016/02/02 06:55:37 [INFO] raft: Node at 192.168.200.102:4647 [Candidate] entering Candidate state
    2016/02/02 06:55:37 [DEBUG] raft: Votes needed: 1
    2016/02/02 06:55:37 [DEBUG] raft: Vote granted. Tally: 1
    2016/02/02 06:55:37 [INFO] raft: Election won. Tally: 1
    2016/02/02 06:55:37 [INFO] raft: Node at 192.168.200.102:4647 [Leader] entering Leader state
    2016/02/02 06:55:37 [INFO] nomad: cluster leadership acquired
    2016/02/02 06:55:37 [INFO] raft: Disabling EnableSingleNode (bootstrap)
    2016/02/02 06:55:37 [DEBUG] raft: Node 192.168.200.102:4647 updated peer set (2): [192.168.200.102:4647]
    2016/02/02 06:55:38 [DEBUG] fingerprint.env_aws: Error querying AWS Metadata URL, skipping
    2016/02/02 06:55:38 [DEBUG] fingerprint.env_gce: Error querying GCE Metadata URL, skipping
    2016/02/02 06:55:38 [DEBUG] fingerprint.network: Detected interface eth1  with IP 192.168.200.102 during fingerprinting
    2016/02/02 06:55:38 [DEBUG] client: applied fingerprints [arch cpu host memory network storage]
    2016/02/02 06:55:38 [DEBUG] driver.docker: using client connection initialized from environment
    2016/02/02 06:55:38 [DEBUG] driver.docker: privileged containers are disabled
    2016/02/02 06:55:38 [DEBUG] client: available drivers [docker exec]
    2016/02/02 06:55:38 [DEBUG] client: node registration complete
    2016/02/02 06:55:38 [DEBUG] client: updated allocations at index 1 (0 allocs)
    2016/02/02 06:55:38 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 0)
    2016/02/02 06:55:41 [DEBUG] client: state updated to ready

As per the log above, the raw_exec driver is not loaded, only docker and exec.
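
One way to double-check which drivers a node has fingerprinted is to query the node's attributes over the HTTP API (a sketch; substitute the node ID listed by nomad node-status, and adjust the address to wherever the agent's HTTP API is bound):

$ nomad node-status                    # list nodes and their IDs
$ curl -s http://192.168.200.102:4646/v1/node/<node-id> | python -m json.tool | grep driver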

I have a dev.nomad job file that looks like:

# There can only be a single job definition per file.
# Create a job with ID and Name 'mobum'
job "mobum" {
    # Run the job in our custom 'localhost' region (the default is 'global').
    region = "localhost"

    # Specify the datacenters within the region this job can run in.
    datacenters = ["boot2docker"]

    # Service type jobs optimize for long-lived services. This is
    # the default but we can change to batch for short-lived tasks.
    type = "service"

    # Priority controls our access to resources and scheduling priority.
    # This can be 1 to 100, inclusively, and defaults to 50.
    # priority = 50

    # Restrict our job to only linux. We can specify multiple
    # constraints as needed.
    constraint {
        attribute = "$attr.kernel.name"
        value = "linux"
    }

    # Configure the job to do rolling updates
    update {
        # Stagger updates every 10 seconds
        stagger = "10s"

        # Update a single task at a time
        max_parallel = 1
    }

    # Each task defined directly at the job level is placed
    # into its own implicit group of the same name.
    task "mobum-postgres" {
        # Use Docker to run the task.
        driver = "docker"
        # Configure Docker driver with the image
        config {
            image = "postgres"
            port_map {
                postgres = 5432
            }
        }
        service {
            name = "mobum-postgres"
            tags = ["mobum", "postgres"]
            port = "postgres"
            check {
                name = "alive"
                type = "tcp"
                interval = "10s"
                timeout = "2s"
            }
        }
        # We must specify the resources required for
        # this task to ensure it runs on a machine with
        # enough capacity.
        resources {
            cpu = 500 # 500 MHz
            memory = 256 # 256 MB
            network {
                mbits = 10
                port "postgres" {
                }
            }
        }
    }
    task "mobum-django-app" {
        # Use raw_exec to run the docker CLI directly.
        driver = "raw_exec"
        # Configure raw_exec with the command and args
        config {
            command = "/usr/local/bin/docker"
            args = [ "run", "-d", "--hostname=mobum-django-app", "-p 8080:8080", "-v /Users:/Users", "--name=${BASE}", "mobum:latest" ]
        }
        service {
            name = "mobum-django-app"
            tags = ["mobum", "django-app"]
            port = "8080"
            check {
                name = "alive"
                type = "tcp"
                interval = "10s"
                timeout = "2s"
            }
        }
        # We must specify the resources required for
        # this task to ensure it runs on a machine with
        # enough capacity.
        resources {
            cpu = 500 # 500 MHz
            memory = 256 # 256 MB
            network {
                mbits = 10
                port "8080" {
            static = 8080
                }
            }
        }
    }
}

Running it with nomad run dev.nomad gives the following output on the client:

$ nomad run dev.nomad
==> Monitoring evaluation "9ea8a959-2dff-18e5-33eb-a7474c60eddd"
    Evaluation triggered by job "mobum"
    Scheduling error for group "mobum-django-app" (failed to find a node for placement)
    Allocation "9f961362-cda5-1f1c-f226-85fd8b6e0d85" status "failed" (1/1 nodes filtered)
      * Constraint "missing drivers" filtered 1 nodes
    Allocation "ba22a437-c3f3-f314-19cd-287913da2f1a" created: node "627af627-4b96-8ac2-31d1-0707ab6bfe2d", group "mobum-postgres"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "9ea8a959-2dff-18e5-33eb-a7474c60eddd" finished with status "complete"

mobum-django-app failed because it couldn't find a node with the raw_exec driver. As expected, I would say :).
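
To dig into a failed placement like this one, the allocation can be inspected directly (a sketch; output format varies by Nomad version):

$ nomad alloc-status 9f961362-cda5-1f1c-f226-85fd8b6e0d85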

I didn't stop there, so I switched raw_exec to exec for the mobum-django-app task, and now I have the following output on the client:

$ nomad run dev.nomad
==> Monitoring evaluation "e0f7386b-6b4f-250b-2867-626b391ebd77"
    Evaluation triggered by job "mobum"
    Allocation "f2341149-a8c7-db3f-3483-21fa1bbe12b4" created: node "627af627-4b96-8ac2-31d1-0707ab6bfe2d", group "mobum-django-app"
    Allocation "301fa2a4-2ef5-8876-ea9e-e5cbc209f0cf" created: node "627af627-4b96-8ac2-31d1-0707ab6bfe2d", group "mobum-postgres"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "e0f7386b-6b4f-250b-2867-626b391ebd77" finished with status "complete"

But the mobum-django-app task fails to configure its task directory, as per the Nomad logs:

    2016/02/02 07:02:59 [DEBUG] client: starting runner for alloc '301fa2a4-2ef5-8876-ea9e-e5cbc209f0cf'
    2016/02/02 07:02:59 [DEBUG] worker: updated evaluation <Eval 'e0f7386b-6b4f-250b-2867-626b391ebd77' JobID: 'mobum'>
    2016/02/02 07:02:59 [DEBUG] worker: ack for evaluation e0f7386b-6b4f-250b-2867-626b391ebd77
    2016/02/02 07:02:59 [DEBUG] client: starting task context for 'mobum-postgres' (alloc '301fa2a4-2ef5-8876-ea9e-e5cbc209f0cf')
    2016/02/02 07:02:59 [DEBUG] client: starting runner for alloc 'f2341149-a8c7-db3f-3483-21fa1bbe12b4'
    2016/02/02 07:02:59 [DEBUG] client: starting task context for 'mobum-django-app' (alloc 'f2341149-a8c7-db3f-3483-21fa1bbe12b4')
    2016/02/02 07:02:59 [ERR] client: failed to start task 'mobum-django-app' for alloc 'f2341149-a8c7-db3f-3483-21fa1bbe12b4': failed to configure task directory: Couldn't mount /dev to /var/lib/boot2docker/nomad/alloc/f2341149-a8c7-db3f-3483-21fa1bbe12b4/mobum-django-app/dev: no such device
    2016/02/02 07:03:00 [DEBUG] client: updated allocations at index 40 (5 allocs)
    2016/02/02 07:03:00 [DEBUG] client: allocs: (added 0) (removed 0) (updated 4) (ignore 1)
    2016/02/02 07:03:00 [ERR] client: dropping update to alloc '9750e8b0-2bee-4c4e-3357-4ab842e05e13'
    2016/02/02 07:03:00 [ERR] client: dropping update to alloc '58578772-a831-0771-3d07-8828c43dcbcb'
    2016/02/02 07:03:00 [DEBUG] http: Request /v1/evaluation/e0f7386b-6b4f-250b-2867-626b391ebd77 (208.922µs)
    2016/02/02 07:03:00 [DEBUG] http: Request /v1/evaluation/e0f7386b-6b4f-250b-2867-626b391ebd77/allocations (83.324µs)
    2016/02/02 07:03:30 [DEBUG] driver.docker: docker pull postgres:latest succeeded
    2016/02/02 07:03:30 [DEBUG] driver.docker: identified image postgres as 54fa18d9f3b6c5b350ec4588bdb4f4e90df21e3fd6c767094a1f77c13cd5b453
    2016/02/02 07:03:30 [DEBUG] driver.docker: using 268435456 bytes memory for postgres
    2016/02/02 07:03:30 [DEBUG] driver.docker: using 500 cpu shares for postgres
    2016/02/02 07:03:30 [DEBUG] driver.docker: binding directories []string{"/var/lib/boot2docker/nomad/alloc/301fa2a4-2ef5-8876-ea9e-e5cbc209f0cf/alloc:/alloc:rw,z", "/var/lib/boot2docker/nomad/alloc/301fa2a4-2ef5-8876-ea9e-e5cbc209f0cf/mobum-postgres:/local:rw,Z"} for postgres
    2016/02/02 07:03:30 [DEBUG] driver.docker: networking mode not specified; defaulting to bridge
    2016/02/02 07:03:30 [DEBUG] driver.docker: allocated port 192.168.200.102:35979 -> 5432 (mapped)
    2016/02/02 07:03:30 [DEBUG] driver.docker: exposed port 5432
    2016/02/02 07:03:30 [DEBUG] driver.docker: setting container name to: mobum-postgres-301fa2a4-2ef5-8876-ea9e-e5cbc209f0cf
    2016/02/02 07:03:30 [INFO] driver.docker: created container b0627794c6d181699cd87828f5b34b5c9e207a58268d704a65abfe5bf67b4279
    2016/02/02 07:03:30 [INFO] driver.docker: started container b0627794c6d181699cd87828f5b34b5c9e207a58268d704a65abfe5bf67b4279
    2016/02/02 07:03:30 [INFO] consul: registering service mobum-postgres with consul.
    2016/02/02 07:03:30 [DEBUG] client: updated allocations at index 41 (5 allocs)
    2016/02/02 07:03:30 [DEBUG] client: allocs: (added 0) (removed 0) (updated 5) (ignore 0)
    2016/02/02 07:03:30 [ERR] client: dropping update to alloc '58578772-a831-0771-3d07-8828c43dcbcb'
    2016/02/02 07:03:30 [ERR] client: dropping update to alloc '9750e8b0-2bee-4c4e-3357-4ab842e05e13'

Would the situation change if I loaded raw_exec? How can I enable it for testing purposes? Thanks.

@cbednarski
Contributor

I'm not sure if this is the problem, but your config is not valid JSON. It looks like you have mixed HCL syntax with JSON syntax, and I'm not sure how this will be parsed.

"driver.raw_exec.enable" = "1"
"network_interface": "eth1"

should be

"network_interface": "eth1",
"driver.raw_exec.enable": "1"

Note the , after "eth1" and the : rather than =. I'm curious why you didn't get an error message from the config parser about this.

@valentinbud
Author

@cbednarski thank you for spotting the error.

Should driver.raw_exec.enable be under client: {options: {}} or directly under client: {}? For now it's the former, and the raw_exec driver gets loaded.

Should I close this issue and open a new one for the config parser? Thanks!
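
For reference, the placement that works here (the former), sketched as a full client block in the same JSON format as agent.json:

"client": {
        "enabled": true,
        "network_interface": "eth1",
        "options": {
                "driver.raw_exec.enable": "1"
        }
}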

@cbednarski
Contributor

All the info is here; I will just change the title.

@cbednarski cbednarski changed the title Nomad raw_exec driver not enabled Invalid config format is silently ignored Feb 2, 2016
@diptanu
Contributor

diptanu commented Mar 16, 2016

Fixed via #910

@diptanu diptanu closed this as completed Mar 16, 2016