Can't resolve ID of some jobs placed in another region #3679

Closed
tantra35 opened this issue Dec 20, 2017 · 14 comments
Labels
theme/api HTTP API and SDK issues type/bug

Comments

tantra35 commented Dec 20, 2017

Nomad version

Nomad v0.7.1 (0b295d3)

If we run this CLI command, we can see all jobs placed in a given region:

nomad status -region=atf01
ID                    Type     Priority  Status   Submit Date
appsexternalproxy     service  50        running  12/20/17 13:27:36 MSK
appsinternalproxy     service  70        running  12/20/17 15:38:13 MSK
fbassassin            service  50        running  12/20/17 15:42:00 MSK
fbassassin-nsqlookup  service  50        running  12/13/17 00:09:39 MSK
ldap                  service  80        running  12/07/17 17:42:31 MSK
redis                 service  50        running  12/20/17 15:14:21 MSK
smtprelay             service  70        running  11/28/17 18:56:05 MSK
tdagent-local         service  70        running  12/08/17 18:06:21 MSK
zabbixfrontend        service  70        running  12/20/17 16:13:06 MSK
zabbixproxy           service  50        running  12/04/17 14:10:08 MSK
zabbixserver          service  70        running  12/04/17 14:12:11 MSK

But when we ask for the status of a particular job, for some of them we get the following:

nomad status -region=atf01 zabbixserver
Unable to resolve ID: "zabbixserver"

But nomad status for zabbixproxy, for example, works perfectly:

nomad status -region=atf01 zabbixproxy
ID            = zabbixproxy
Name          = zabbixproxy
Submit Date   = 12/04/17 14:10:08 MSK
Type          = service
Priority      = 50
Datacenters   = test
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group   Queued  Starting  Running  Failed  Complete  Lost
zabbixproxy  0       0         1        0       0         0

Latest Deployment
ID          = a3025182
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group   Desired  Placed  Healthy  Unhealthy
zabbixproxy  1        1       1        0

Allocations
ID        Node ID   Task Group   Version  Desired  Status   Created    Modified
f088dae5  bcd00076  zabbixproxy  2        run      running  16d7h ago  <none>

We saw this in 0.6.3 too, but thought it was GH-3203. However, upgrading from 0.6.3 to 0.7.1 didn't solve the problem.

dadgar commented Dec 20, 2017

@tantra35 Are both regions running 0.7.1?

dadgar added the theme/api (HTTP API and SDK issues) and type/bug labels Dec 20, 2017
tantra35 commented Dec 20, 2017

@dadgar Yes, of course, we upgraded all regions.

tantra35 commented Dec 20, 2017

And this does not depend on job name length, as we thought before: fbassassin gets 'Unable to resolve ID' but zabbixproxy does not, and fbassassin is shorter than zabbixproxy.

dadgar commented Dec 20, 2017

@tantra35 Can you run curl http://127.0.0.1:4646/v1/job/zabbixserver?region=atf01? I want to see whether this is a CLI problem or whether it also fails via the API.

tantra35 commented Dec 20, 2017

curl $NOMAD_ADDR/v1/job/zabbixserver?region=atf01

{
   "Stop":false,
   "Region":"atf01",
   "Namespace":"default",
   "ID":"zabbixserver",
   "ParentID":"",
   "Name":"zabbixserver",
   "Type":"service",
   "Priority":70,
   "AllAtOnce":false,
   "Datacenters":[
      "test"
   ],
   "Constraints":[
      {
         "LTarget":"${attr.kernel.name}",
         "RTarget":"linux",
         "Operand":"="
      }
   ],
   "TaskGroups":[
      {
         "Name":"zabbixserver",
         "Count":1,
         "Update":{
            "Stagger":10000000000,
            "MaxParallel":1,
            "HealthCheck":"checks",
            "MinHealthyTime":10000000000,
            "HealthyDeadline":300000000000,
            "AutoRevert":false,
            "Canary":0
         },
         "Constraints":null,
         "RestartPolicy":{
            "Attempts":2,
            "Interval":60000000000,
            "Delay":15000000000,
            "Mode":"delay"
         },
         "Tasks":[
            {
               "Name":"zabbixserver",
               "Driver":"docker",
               "User":"",
               "Config":{
                  "image":"https://675869518239.dkr.ecr.eu-central-1.amazonaws.com/zabbix-server-mysql:3.4.4",
                  "network_mode":"host",
                  "args":[
                     "-c",
                     "/usr/bin/runsvdir -P /etc/service/"
                  ],
                  "command":"/sbin/init_plrx",
                  "dns_servers":[
                     "172.17.0.1"
                  ]
               },
               "Env":{
                  "ZBXMYSQLSERVER":"master.mysql.service.consul"
               },
               "Services":[
                  {
                     "Name":"rootzabbixserver",
                     "PortLabel":"",
                     "AddressMode":"driver",
                     "Tags":null,
                     "Checks":null
                  }
               ],
               "Vault":null,
               "Templates":null,
               "Constraints":null,
               "Resources":{
                  "CPU":1500,
                  "MemoryMB":700,
                  "DiskMB":0,
                  "IOPS":0,
                  "Networks":[
                     {
                        "Device":"",
                        "CIDR":"",
                        "IP":"",
                        "MBits":10,
                        "ReservedPorts":[
                           {
                              "Label":"appport",
                              "Value":10051
                           }
                        ],
                        "DynamicPorts":null
                     }
                  ]
               },
               "DispatchPayload":null,
               "Meta":null,
               "KillTimeout":5000000000,
               "LogConfig":{
                  "MaxFiles":3,
                  "MaxFileSizeMB":10
               },
               "Artifacts":null,
               "Leader":false,
               "ShutdownDelay":0,
               "KillSignal":""
            },
            {
               "Name":"zabbixserverattacheni",
               "Driver":"docker",
               "User":"",
               "Config":{
                  "image":"https://675869518239.dkr.ecr.eu-central-1.amazonaws.com/attacheni:po01",
                  "network_mode":"host",
                  "command":"/opt/eni.py"
               },
               "Env":{
                  "AWSREGION":"eu-central-1",
                  "FILTERS":"tag:Name=zabbixserver"
               },
               "Services":null,
               "Vault":null,
               "Templates":null,
               "Constraints":null,
               "Resources":{
                  "CPU":50,
                  "MemoryMB":60,
                  "DiskMB":0,
                  "IOPS":0,
                  "Networks":null
               },
               "DispatchPayload":null,
               "Meta":null,
               "KillTimeout":5000000000,
               "LogConfig":{
                  "MaxFiles":3,
                  "MaxFileSizeMB":10
               },
               "Artifacts":null,
               "Leader":false,
               "ShutdownDelay":0,
               "KillSignal":""
            }
         ],
         "EphemeralDisk":{
            "Sticky":false,
            "SizeMB":300,
            "Migrate":false
         },
         "Meta":null
      }
   ],
   "Update":{
      "Stagger":10000000000,
      "MaxParallel":1,
      "HealthCheck":"",
      "MinHealthyTime":0,
      "HealthyDeadline":0,
      "AutoRevert":false,
      "Canary":0
   },
   "Periodic":null,
   "ParameterizedJob":null,
   "Payload":null,
   "Meta":null,
   "VaultToken":"",
   "Status":"running",
   "StatusDescription":"",
   "Stable":true,
   "Version":4,
   "SubmitTime":1512385931203546180,
   "CreateIndex":1185,
   "ModifyIndex":10417,
   "JobModifyIndex":10405
}
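
The direct lookup above can also be scripted. The following is only a minimal sketch, assuming the agent address comes from NOMAD_ADDR as in the curl call; the helper names job_url and fetch_job are hypothetical and not part of any Nomad SDK. It uses the same /v1/job/<id>?region=<region> endpoint shown in this thread.

```python
import json
import urllib.parse
import urllib.request


def job_url(addr: str, job_id: str, region: str) -> str:
    """Build the URL for a direct job lookup that is forwarded to `region`."""
    # Percent-encode the job ID so unusual characters cannot alter the path.
    path = "/v1/job/" + urllib.parse.quote(job_id, safe="")
    query = urllib.parse.urlencode({"region": region})
    return addr.rstrip("/") + path + "?" + query


def fetch_job(addr: str, job_id: str, region: str) -> dict:
    """GET the job definition as a dict.

    urllib raises urllib.error.HTTPError when the server answers with an
    error status, so a failed lookup is not silently ignored.
    """
    with urllib.request.urlopen(job_url(addr, job_id, region), timeout=10) as resp:
        return json.load(resp)
```

Calling fetch_job(addr, "zabbixserver", "atf01") with addr set to the value of NOMAD_ADDR should return the same document as the curl call above. Since that lookup succeeds, it suggests the exact-ID lookup is not what fails here.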

dadgar commented Dec 20, 2017

@tantra35 Can you also try this:

  1. Put this in a file named payload.json:
{
  "Prefix": "zabbixserver",
  "Context": ""
}
  2. Run curl --request POST --data @payload.json $NOMAD_ADDR/v1/search?region=atf01

And to be clear, were the results from the other API call obtained by querying through another region's servers?

tantra35 commented Dec 20, 2017

@dadgar I think you made a mistake in Context. I got the following error:

* RPC failed to server 172.16.9.35:4647: rpc error: context must be one of [allocs jobs nodes evals deployment] or 'all' for all contexts; got ""
* RPC failed to server 172.16.9.89:4647: rpc error: context must be one of [allocs jobs nodes evals deployment] or 'all' for all contexts; got ""
* RPC failed to server 172.16.9.87:4647: rpc error: context must be one of [allocs jobs nodes evals deployment] or 'all' for all contexts; got ""

So I took the liberty of modifying the payload:

root@social:/home/ruslan# cat ./payload.zabbixserver.json
{
  "Prefix": "zabbixserver",
  "Context": "all"
}

For zabbixserver (not working):

root@social:/home/ruslan# curl -XPOST --data @payload.zabbixserver.json $NOMAD_ADDR/v1/search?region=atf01
{
  "Matches":{
    "jobs":null
  },
  "Truncations":{
    "jobs":false
  },
  "Index":1658878,
  "LastContact":55450241,
  "KnownLeader":true
}

For ldap (working):

root@social:/home/ruslan# cat ./payload.ldap.json
{
  "Prefix": "ldap",
  "Context": "all"
}

root@social:/home/ruslan# curl -XPOST --data @payload.ldap.json $NOMAD_ADDR/v1/search?region=atf01
{
  "Matches":{
    "jobs":[
      "ldap"
    ]
  },
  "Truncations":{
    "jobs":false
  },
  "Index":1658886,
  "LastContact":0,
  "KnownLeader":true
}
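
The two search calls above can be compared programmatically. This is a rough sketch rather than an official client: the function names are hypothetical, the list of valid contexts is copied from the RPC error message earlier in this thread, and the response shape is taken from the two outputs above.

```python
import json
import urllib.parse
import urllib.request

# Context values as listed in the RPC error message quoted in this thread.
VALID_CONTEXTS = {"allocs", "jobs", "nodes", "evals", "deployment", "all"}


def search_payload(prefix: str, context: str = "all") -> bytes:
    """Build the JSON body for POST /v1/search, rejecting an invalid context early."""
    if context not in VALID_CONTEXTS:
        raise ValueError(f"context must be one of {sorted(VALID_CONTEXTS)}; got {context!r}")
    return json.dumps({"Prefix": prefix, "Context": context}).encode("utf-8")


def job_matched(response: dict, job_id: str) -> bool:
    """Return True if job_id appears in Matches.jobs.

    A null list, as returned for zabbixserver above, counts as no match.
    """
    jobs = (response.get("Matches") or {}).get("jobs") or []
    return job_id in jobs


def search(addr: str, prefix: str, region: str, context: str = "all") -> dict:
    """POST the prefix search to the agent at `addr`, forwarded to `region`."""
    url = addr.rstrip("/") + "/v1/search?" + urllib.parse.urlencode({"region": region})
    req = urllib.request.Request(url, data=search_payload(prefix, context), method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

With the responses shown above, job_matched would be False for zabbixserver and True for ldap, which is the discrepancy being reported.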

dadgar commented Dec 20, 2017

@tantra35 Thanks for all the info, it will really help with debugging! In the meantime, I believe nomad job status should work.

tantra35 commented Dec 20, 2017

@dadgar Sorry but no :-(

root@social:/home/ruslan# nomad job status zabbixserver
No job(s) with prefix or id "zabbixserver" found

And as far as I can tell, when we search by the zabbixserver prefix, the API doesn't return valid results (from my previous post):

  "Matches":{
    "jobs":null
  },

dadgar commented Dec 20, 2017

@tantra35 What are the results of the search request if you send it directly to that region's servers?

tantra35 commented Dec 20, 2017

@dadgar In that case everything works as expected, without any problems.

For example, from one of the servers in region atf01:

root@consulnomad-01:/home/ruslan# nomad status
ID                    Type     Priority  Status   Submit Date
appsexternalproxy     service  50        running  12/20/17 13:27:36 MSK
appsinternalproxy     service  70        running  12/20/17 15:38:13 MSK
fbassassin            service  50        running  12/20/17 15:42:00 MSK
fbassassin-nsqlookup  service  50        running  12/13/17 00:09:39 MSK
ldap                  service  80        running  12/07/17 17:42:31 MSK
redis                 service  50        running  12/20/17 15:14:21 MSK
smtprelay             service  70        running  11/28/17 18:56:05 MSK
tdagent-local         service  70        running  12/08/17 18:06:21 MSK
zabbixfrontend        service  70        running  12/20/17 16:13:06 MSK
zabbixproxy           service  50        running  12/04/17 14:10:08 MSK
zabbixserver          service  70        running  12/04/17 14:12:11 MSK

Then, on the same server:

root@consulnomad-01:/home/ruslan# nomad status zabbixserver
ID            = zabbixserver
Name          = zabbixserver
Submit Date   = 12/04/17 14:12:11 MSK
Type          = service
Priority      = 70
Datacenters   = test
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group    Queued  Starting  Running  Failed  Complete  Lost
zabbixserver  0       0         1        0       0         0

Latest Deployment
ID          = 847c90ff
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group    Desired  Placed  Healthy  Unhealthy
zabbixserver  1        1       1        0

Allocations
ID        Node ID   Task Group    Version  Desired  Status   Created    Modified
a9060860  839ac64a  zabbixserver  4        run      running  16d8h ago  <none>

tantra35 commented Dec 20, 2017

@dadgar I'm sorry for misleading you: in nomad job status zabbixserver I didn't provide the region. With the region specified, nomad job status works as expected.
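
Because the earlier failure came from omitting the region, a small wrapper that refuses to run without one can prevent the same slip. This is a sketch only: the helper names are hypothetical, and it assumes the nomad binary is on PATH and accepts -region as in the commands earlier in this thread.

```python
import subprocess
from typing import List


def job_status_cmd(job_id: str, region: str) -> List[str]:
    """Build the argv for `nomad job status`, always with an explicit region."""
    if not region:
        # Fail loudly instead of silently querying the local region.
        raise ValueError("region must be given explicitly")
    return ["nomad", "job", "status", f"-region={region}", job_id]


def job_status(job_id: str, region: str) -> str:
    """Run the command and return its stdout.

    check=True makes subprocess raise CalledProcessError on a non-zero exit.
    """
    result = subprocess.run(
        job_status_cmd(job_id, region),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For the case in this thread, job_status("zabbixserver", "atf01") corresponds to running nomad job status -region=atf01 zabbixserver.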

tantra35 commented Dec 20, 2017

So the valid way to get a job's status is now nomad job status?

chelseakomlo added a commit that referenced this issue Dec 20, 2017
code review fixups; add changelog
github-actions bot commented Dec 5, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Dec 5, 2022