Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: check daily job limit on each job, not just when choosing AV #3089

Merged
merged 2 commits into from
Apr 18, 2019

Conversation

davidpanderson
Copy link
Contributor

Fixes #3073

@lfield
Copy link
Contributor

lfield commented Apr 9, 2019

I have spent some time testing this and there are a few issues. The first two seem to be issues with the previous PR #3024. I don't know why I didn't find them when I was testing previously.

  1. There is a missing "where" in the clause given to the lookup function.
  2. The function hav.update_validator(hav_orig) should be used instread of hav.update()

In this PR the call to daily_quota_exceeded will always pass as the the number of tasks provided by the for loop are not taken into account. Either the DB needs to be updated on each iteration or a variable is required to store this value and taken into consideration by the function.

I was doing the check on each call to get_app_version().
But it turns out this gets called during the job scan,
before calling add_result_to_reply(),
which is where n_jobs_today gets incremented.

Solution:move the check to the loop (in sched_score.cpp)
where add_result_reply() is called.

Also: fix bad logic in work_needed() where a global check
(on job limit in user project prefs)
was being done inside a loop over resources.
@davidpanderson
Copy link
Contributor Author

Fixed now (I think) in this branch and the new dpa_punitive2

@lfield
Copy link
Contributor

lfield commented Apr 10, 2019

Still doesn't work. In the DB, max_jobs_per_day=1 and n_jobs_today=0 so I would expect 1 job but get 4. Here are relevant lines from the scheduler log.

2019-04-10 10:08:09.2451 [PID=3297865] Request: [USER#2] [HOST#3641] [IP 128.141.150.54] client 7.12.0
2019-04-10 10:08:09.2464 [PID=3297865] [quota] effective ncpus 1 ngpus 0
2019-04-10 10:08:09.2464 [PID=3297865] [quota] max jobs per RPC: 50
2019-04-10 10:08:09.2464 [PID=3297865] [quota] Overall limits on jobs in progress:
2019-04-10 10:08:09.2464 [PID=3297865] [quota] total: base 50 scaled 50 njobs 0
2019-04-10 10:08:09.2464 [PID=3297865] [quota] CPU: base 50 scaled 50 njobs 0
2019-04-10 10:08:09.3737 [PID=3297865] [quota] reached limit on CPU jobs in progress
2019-04-10 10:08:09.3737 [PID=3297865] [quota] Overall limits on jobs in progress:
2019-04-10 10:08:09.3737 [PID=3297865] [quota] total: base 50 scaled 50 njobs 4
2019-04-10 10:08:09.3738 [PID=3297865] [quota] CPU: base 50 scaled 50 njobs 4
2019-04-10 10:08:09.3738 [PID=3297865] [quota] reached limit on AMD/ATI GPU jobs in progress
2019-04-10 10:08:09.3738 [PID=3297865] [quota] Overall limits on jobs in progress:
2019-04-10 10:08:09.3738 [PID=3297865] [quota] total: base 50 scaled 50 njobs 4
2019-04-10 10:08:09.3738 [PID=3297865] [quota] CPU: base 50 scaled 50 njobs 4
2019-04-10 10:08:09.3882 [PID=3297865] Sending reply to [HOST#3641]: 4 results, delay req 61.00

Please could you do a pull request for dpa_punitive2, just click here. Please delete this comment first and I can merge immediately.

@davidpanderson
Copy link
Contributor Author

I did a PR for dpa_punitive2, leaving the valid comment.

@davidpanderson
Copy link
Contributor Author

In my testing this seemed to work. Remember that the daily limits are per (host, app version). Maybe it's sending jobs for other app versions. Set debug_version_select and debug_send.

@lfield
Copy link
Contributor

lfield commented Apr 17, 2019

I rebuilt the scheduler from a fresh codebase and it now seems to work as advertised. I just want to leave it running on the dev project overnight before merging.

@lfield
Copy link
Contributor

lfield commented Apr 18, 2019

Looks good.

@lfield lfield merged commit b4c0616 into master Apr 18, 2019
@AenBleidd AenBleidd deleted the dpa_job_limit branch May 21, 2019 20:58
@AenBleidd AenBleidd added this to the Server Release 1.2.0 milestone Aug 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The daily quota is not fullly respected if close to limit.
3 participants