Description
While testing the master cluster work from #64936, I realised that the jobs runner doesn't work as expected when using the default local master_job_cache. This may be expected, and it may be that a shared external master_job_cache is required; if so, that should be documented.
Suppose you have master1, master2 and master3, and you initiate a job from master1 that targets minions across all three masters.
When querying for jobs that have run with jobs.list_jobs:
On master1, jobs.list_jobs will list the job.
On master2 and master3, jobs.list_jobs will not list the job (even though minions attached to those masters were targeted).
When querying a specific JID using jobs.list_job <JID>:
On master1, jobs.list_job <JID> will return details about the job and returns, but only from directly connected minions.
On master2 and master3, jobs.list_job <JID> will show an error in the job section and show returns, but only from directly connected minions.
This is to be expected if no changes have been made to how the master_job_cache works: data will only be stored for jobs initiated on a master and for returns from locally connected minions. However, it's not behaviour that an end user would expect.
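The behaviour described above can be illustrated with a toy model. This sketch is not Salt code: the `Master` class and the `publish` function are hypothetical, and it assumes (as this issue suggests) that job metadata is stored only on the master that initiated the job, while each return is stored only on the master the returning minion is connected to.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical stand-in for one master with its own local job cache."""

    name: str
    minions: set
    jobs: dict = field(default_factory=dict)     # jid -> job metadata
    returns: dict = field(default_factory=dict)  # jid -> {minion: return data}

    def list_jobs(self):
        # Only jobs whose metadata is in this master's cache are listed.
        return sorted(self.jobs)

    def list_job(self, jid):
        out = {"jid": jid, "Result": dict(self.returns.get(jid, {}))}
        if jid in self.jobs:
            out.update(self.jobs[jid])
        else:
            out["Error"] = "Cannot contact returner or no job with this jid"
        return out


def publish(origin, cluster, jid, targets):
    """Assumed behaviour: metadata stays on the origin, returns stay local."""
    origin.jobs[jid] = {"Minions": sorted(targets)}
    for master in cluster:
        for minion in sorted(master.minions & targets):
            master.returns.setdefault(jid, {})[minion] = {"return": True, "retcode": 0}


jid = "20230912153555367783"
master1 = Master("master1", {"minion1"})
master2 = Master("master2", {"minion2"})
master3 = Master("master3", {"minion3"})
cluster = [master1, master2, master3]

publish(master1, cluster, jid, {"minion1", "minion2", "minion3"})

print(master1.list_jobs())    # job is listed on the originating master
print(master2.list_jobs())    # nothing listed on the other masters
print(master2.list_job(jid))  # error in the job section, plus minion2's return
```

In this model each master answers queries purely from its own cache, which matches the symptoms reported here: the job is listed only on master1, and every master sees only the returns of its directly connected minions.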
When listing the job with jobs.list_job on master1, we see the job and can see it was targeted at all three minions, but we only see the return for minion1 (the directly connected minion).
On the other masters we get an error and the return for the directly connected minion:
~/git/salt/local_cluster_test/master2
barney@test:$ salt-run jobs.list_job 20230912153555367783
Error:
    Cannot contact returner or no job with this jid
Result:
    ----------
    minion2:
        ----------
        retcode:
            0
        return:
            True
        success:
            True
StartTime:
    2023, Sep 12 15:35:55.367783
jid:
    20230912153555367783
Expected behavior
Running jobs.list_jobs should return all jobs across the cluster.
Running jobs.list_job <JID> should work from all masters in the cluster and return the correct job data and all returns.
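As an illustration of the expected result only (not a proposed implementation), the sketch below merges the per-master views of one job into the single answer every master should give. The `merge_job_views` helper and the shape of the input dictionaries are assumptions, loosely modelled on the jobs.list_job output shown in this issue.

```python
def merge_job_views(views):
    """Combine per-master job views into one cluster-wide view (hypothetical helper)."""
    merged = {"Result": {}}
    for view in views:
        # Union of all returns, regardless of which master stored them.
        merged["Result"].update(view.get("Result", {}))
        # Keep job metadata from whichever master has it.
        for key, value in view.items():
            if key not in ("Result", "Error"):
                merged.setdefault(key, value)
    # Report an error only if no master knew about the job at all.
    if views and all("Error" in view for view in views):
        merged["Error"] = views[0]["Error"]
    return merged


jid = "20230912153555367783"
no_job = "Cannot contact returner or no job with this jid"
views = [
    {"jid": jid, "Minions": ["minion1", "minion2", "minion3"],
     "Result": {"minion1": {"return": True, "retcode": 0}}},
    {"jid": jid, "Error": no_job,
     "Result": {"minion2": {"return": True, "retcode": 0}}},
    {"jid": jid, "Error": no_job,
     "Result": {"minion3": {"return": True, "retcode": 0}}},
]

merged = merge_job_views(views)
print(sorted(merged["Result"]))
print("Error" in merged)
```

A shared job cache would give the same outcome without any merging, since every master would read and write the same data.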
Versions Report
Salt: 3006.1+1136.gcaa5e39303 - current git master
Salt Version:
    Salt: 3006.1+1136.gcaa5e39303
Python Version:
    Python: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]
Dependency Versions:
    cffi: 1.15.1
    cherrypy: Not Installed
    dateutil: 2.8.2
    docker-py: Not Installed
    gitdb: Not Installed
    gitpython: Not Installed
    Jinja2: 3.1.2
    libgit2: Not Installed
    looseversion: 1.1.2
    M2Crypto: Not Installed
    Mako: Not Installed
    msgpack: 1.0.5
    msgpack-pure: Not Installed
    mysql-python: Not Installed
    packaging: 23.0
    pycparser: 2.21
    pycrypto: Not Installed
    pycryptodome: 3.17
    pygit2: Not Installed
    python-gnupg: Not Installed
    PyYAML: 6.0
    PyZMQ: 25.0.2
    relenv: 0.13.0
    smmap: Not Installed
    timelib: Not Installed
    Tornado: 6.3.3
    ZMQ: 4.3.4
Salt Package Information:
    Package Type: pip
System Versions:
    dist: ubuntu 22.04.3 jammy
    locale: utf-8
    machine: x86_64
    release: 6.2.0-32-generic
    system: Linux
    version: Ubuntu 22.04.3 jammy
@barneysowood is the local master job cache using the cachedir config to decide where to store the cache? I do have a note in the WIP docs about needing to have the cachedir shared.
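To make the question concrete, here is a rough sketch of how the on-disk location of a job could depend on the cache directory. It is an approximation based on my understanding of the default local_cache returner (a `jobs` directory under `cache_dir`, with a subdirectory derived from a hash of the JID) and should be checked against the Salt source; `job_cache_path` and the example directories are hypothetical.

```python
import hashlib
import os


def job_cache_path(cache_dir, jid, hash_type="sha256"):
    """Approximate where a local job cache would store a JID (assumed layout)."""
    job_dir = os.path.join(cache_dir, "jobs")
    jhash = hashlib.new(hash_type, str(jid).encode()).hexdigest()
    return os.path.join(job_dir, jhash[:2], jhash[2:])


jid = "20230912153555367783"

# Hypothetical per-master cache directories: each master gets its own path.
separate = [
    job_cache_path(f"/srv/cluster/master{n}/cache", jid) for n in (1, 2, 3)
]

# Hypothetical shared cache directory: every master resolves the same path.
shared = [job_cache_path("/srv/cluster/shared/cache", jid) for _ in range(3)]

print(len(set(separate)))
print(len(set(shared)))
```

Under these assumptions, masters with separate cache directories each look in a different place for the same JID, while masters sharing one cache directory would all read and write the same job data.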
Setup
3 masters (salt-cluster-master[1-3])
3 minions (minion[1-3])
minion1 -> salt-cluster-master1
minion2 -> salt-cluster-master2
minion3 -> salt-cluster-master3
Steps to Reproduce the behavior
Run a job against minion1 from salt-cluster-master1.
The job runs as expected; we can see it in the list of jobs from the jobs runner and query the job return.
If we try to query that job on the other masters, they know about the JID but don't have the return in their job caches.
Run a job against minion[1-3] from salt-cluster-master1.
Running jobs.list_jobs on salt-cluster-master1 lists the job, but doing that on the other masters returns nothing.
If we list the job using jobs.list_job on salt-cluster-master1, we see the job and can see it was targeted at all three minions, but we only see the return for minion1 (the directly connected minion).
On the other masters we get an error and the return for the directly connected minion.
Additional context
Master cluster SEP PR - saltstack/salt-enhancement-proposals#72