Error querying node allocations: invalid character 'I' in numeric literal #11100
Comments
Thanks for the report @watsonian. Does the error still happen if the CLI is connected to the server in the datacenter where the allocation is running? For example, if you set NOMAD_ADDR:
$ NOMAD_ADDR=http://dc1.my.nomad.cluster.com:4646 nomad alloc status 59eeb575
@lgfa29 In my tests, all DCs were in the same region and I was running the CLI request against the server in the DC where the job was allocated.
This behavior is seen in 1.4.3 as well.
Repro Steps
Setup
server1
server2
worker1
worker2
Re-run of the same steps, with spread stanza removed (error_no_spread.nomad)
server1
server2
worker1
worker2
Jobspecs
error.nomad
error_no_spread.nomad
@ron-savoia do you have the specific API request from the server debug logs that's returning the error? That'd help narrow things down to something more easily reproducible. Your example adds a bunch of new quirks to the problem that @watsonian specifically said weren't in play, like multi-region federation.
@tgross the odd thing is that I am not seeing an error for a particular API call in the logs. I re-ran the job and the repro steps while capturing a debug and have added it to the GH issue. Hopefully you will see what I am not.
I forgot to add the last re-run info...
You need to get the logs from the agent that your Nomad CLI has as the address. Just look at the journal for that host, don't bother with all the mess of the debug bundle -- that's all noise here.
This has been repro'd in a single-region cluster with two worker nodes (ver. 1.4.3):
Trace-level logging as well as the journalctl logs have been collected and attached. No errors were seen in the logs when the error occurred. The same jobspec that @watsonian initially provided was used for the repro.
Thanks @ron-savoia! I'll take a look.
Ok, I was able to reproduce this on v1.4.3. I patched in a
(See the second block under If we look at the Go object before it's serialized, we see
So what's happening here is that the command line is deserializing from the JSON to a native object, and that's throwing the error. That makes this issue a duplicate of #8863 and #11130.
A known workaround is to not have a
I'm going to close this issue as a duplicate. The underlying fix isn't currently on the roadmap, but if you follow those issues and/or upvote them with a 👍 reaction, it helps surface the problem when we're triaging the next batch of work.
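For anyone trying to see the deserialization failure in isolation, here is a minimal sketch in Go. It assumes the response body contains a bare -Inf token where a number is expected. That is an assumption inferred from the error text, not something confirmed by the output in this thread. The scoreMeta type, its NormScore field, and decodeScore are hypothetical stand-ins, not Nomad's actual structs or API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scoreMeta is a hypothetical stand-in for whatever object the CLI decodes;
// it is not Nomad's real type.
type scoreMeta struct {
	NormScore float64
}

// decodeScore decodes a JSON payload into scoreMeta using the standard
// library decoder and returns any decode error.
func decodeScore(payload string) error {
	var m scoreMeta
	return json.Unmarshal([]byte(payload), &m)
}

func main() {
	// A finite number decodes without error.
	fmt.Println(decodeScore(`{"NormScore": 0.5}`))

	// Assumed payload shape: a bare -Inf is not valid JSON. The decoder reads
	// '-' as the start of a number and then rejects the 'I' that follows.
	fmt.Println(decodeScore(`{"NormScore": -Inf}`))
}
```

With the standard library's encoding/json, the second call returns the error "invalid character 'I' in numeric literal", which matches the message in the issue title. A client that prints the raw response body without parsing it would not hit this error.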
Nomad version
Operating system and Environment details
Issue
With a two-datacenter deployment, when using the spread stanza for a job, the allocation returns an error when requesting the status:
If you perform the request via curl, it comes back fine:
Reproduction steps
Two datacenters (dc1, dc2). Each datacenter should have a single Nomad server and a single Nomad client (this isn't necessarily important to the reproduction, but is how I reproduced the issue). Configs supplied below.
Run nomad alloc status $ALLOC_ID for one of the allocs of the job. The error should appear. Commenting out the spread stanza and running the job again causes nomad alloc status to return the expected result.
Expected Result
The allocation status should be printed.
Actual Result
Job file (if appropriate)
Nomad Server Config
Nomad Client Config