cache R in the job manager #5472

garlick · 2023-09-26T15:34:07Z

Problem: jobtap plugins might need convenient access to R per #5471

Per discussion, this prototypes keeping a copy of R in the job manager so that it could potentially be offered to jobtap plugins. Some notes/caveats:

- R is not deleted after the job becomes inactive; deleting it could be an optimization to conserve memory on JGF systems
- jobtap interfaces are unchanged (I thought @grondo might already have a good idea of how he wants to do that)
- if this is acceptable, RFC 27 will need updates to the sched.alloc and job-manager.sched-hello responses
- this is set up so that a job manager restart could replay resource-update events on top of the original R loaded from the KVS
- R could be passed to job-exec.start to save a lookup (would require an update to RFC 32). (Edit: oops, I included a couple of commits that started this. Maybe I'll just finish it and tack it on.)

garlick · 2023-09-26T16:03:18Z

Oops one place where job->R won't be populated is with alloc-bypass.

I made some sloppy errors caught by CI and will force push with fixes in a sec as soon as tests pass locally.

grondo · 2023-09-26T16:07:24Z

Great!

> R is not deleted after the job becomes inactive, which could be a potential optimization to conserve memory on JGF systems

A quick thought here - should the scheduling key of R be redacted in a similar vein to the environment attribute of jobspec? I don't see a use to keeping scheduler opaque data in memory of the job manager...

> jobtap interfaces are unchanged (I thought @grondo might already have a good idea of how he wants to do that)

All I was thinking was to add "R" to the callback input "args" when it is available (i.e. callbacks after the alloc event)
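
A minimal sketch of that idea, not the job manager's actual code: the input args are modeled as a plain dictionary, and the function name and the `id`/`state` keys are hypothetical. The point is only that "R" is omitted, rather than set to null, until an allocation exists.

```python
def build_callback_args(job):
    """Assemble plugin callback input args (hypothetical layout)."""
    args = {"id": job["id"], "state": job["state"]}
    # R exists only once resources have been allocated, so callbacks
    # that run before the alloc event see no "R" key at all.
    if job.get("R") is not None:
        args["R"] = job["R"]
    return args


pending = {"id": 1, "state": "SCHED", "R": None}
running = {"id": 2, "state": "RUN", "R": {"version": 1, "execution": {}}}
```

A plugin written against this shape would test for the key's presence before using it.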

src/modules/job-manager/start.c

@@ -177,8 +182,10 @@
    }
 }

-static void start_response_cb (flux_t *h, flux_msg_handler_t *mh,
-                               const flux_msg_t *msg, void *arg)
+static void start_response_cb (flux_t *h,


grondo · 2023-09-26T16:20:37Z

> Oops one place where job->R won't be populated is with alloc-bypass.

I didn't check the code here yet, does the scheduler still commit R to the KVS and send it back in the alloc response in parallel, or does the job-manager now commit R? The alloc-bypass plugin has to commit the provided R to the KVS itself, but if the job-manager already does this on behalf of the scheduler, perhaps we just need a way to allow a plugin to assign R and then we can drop this step and the job manager will have the copy of R in this case. (Oh, wait, won't this be necessary anyway if R is sent to the job-exec module with the start request?)

garlick · 2023-09-26T16:29:15Z

> A quick thought here - should the scheduling key of R be redacted in a similar vein to the environment attribute of jobspec? I don't see a use to keeping scheduler opaque data in memory of the job manager...

That was true before, but after this PR, the job-manager.sched-hello streaming responses include a copy of R from memory. My thought was if R has been modified, then we had better give the scheduler the updated copy. For example, if the expiration has changed, the scheduler needs to re-allocate the resources for the updated time. Plus it saves a KVS lookup.

I'm not sure if fluxion relies on this. It might, since R is passed to queue->reconstruct():
https://github.com/flux-framework/flux-sched/blob/master/qmanager/modules/qmanager_callbacks.cpp#L159

garlick · 2023-09-26T16:33:01Z

> does the scheduler still commit R to the KVS and send it back in the alloc response in parallel

Yes. That seemed the safest since I think we've advertised the invariant that once the alloc event is posted to the eventlog, R can be read from the KVS.

> The alloc-bypass plugin has to commit the provided R to the KVS itself, but if the job-manager already does this on behalf of the scheduler, perhaps we just need a way to allow a plugin to assign R and then we can drop this step and the job manager will have the copy of R in this case. (Oh, wait, won't this be necessary anyway if R is sent to the job-exec module with the start request?)

It does not already do it so alloc-bypass would need a way to assign R, and would need to retain the KVS commit also.

Edit: well actually that invariant could be maintained if we committed R from the job-manager, so that's not necessarily a reason to do it this way if we need to reconsider.

grondo · 2023-09-26T16:55:19Z

> That was true before, but after this PR, the job-manager.sched-hello streaming responses include a copy of R from memory. My thought was if R has been modified, then we had better give the scheduler the updated copy. For example, if the expiration has changed, the scheduler needs to re-allocate the resources for the updated time. Plus it saves a KVS lookup.

All very true! However, it saves a KVS lookup at restart only, with the tradeoff of keeping JGF in memory for every job (including inactive jobs), and we currently don't even support restart with running jobs (though I guess we do support reloading the scheduler). If the scheduler continued to look up R in the KVS, but instead used the new job-info lookup which incorporates resource-update events, then this would just work and R could be redacted. I'm not saying that's the right solution, but it should be considered?
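
To make "incorporates resource-update events" concrete, here is a rough sketch of replaying an eventlog on top of the original R. It assumes R keeps its expiration under `execution.expiration` and that a resource-update event carries the new value in an `expiration` context key; both are assumptions for illustration, and the function name is hypothetical.

```python
import copy


def apply_resource_updates(R, eventlog):
    """Return a copy of R with resource-update events applied in order."""
    updated = copy.deepcopy(R)  # leave the original R untouched
    for event in eventlog:
        if event.get("name") != "resource-update":
            continue
        context = event.get("context", {})
        # Assumption: only an expiration change is handled in this sketch.
        if "expiration" in context:
            updated["execution"]["expiration"] = context["expiration"]
    return updated


R = {"version": 1, "execution": {"starttime": 0.0, "expiration": 100.0}}
eventlog = [
    {"name": "alloc", "context": {}},
    {"name": "resource-update", "context": {"expiration": 200.0}},
]
```

Whichever side applies the updates (client or job-info), the original R in the KVS stays as written by the scheduler.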

> Yes. That seemed the safest since I think we've advertised the invariant that once the alloc event is posted to the eventlog, R can be read from the KVS.

True. Though it would presumably be easy to delay the alloc event in the job manager until the commit of R is complete. It actually seems fine to me to do it this way for now though (probably faster too).

> It does not already do it so alloc-bypass would need a way to assign R, and would need to retain the KVS commit also.

Yes, that is what I was proposing, a way for a plugin to assign R. Not sure what that would look like. Sounds like if we send R in the start request this will be necessary to keep alloc-bypass functionality though?

garlick · 2023-09-26T17:31:35Z

> it saves a KVS lookup at restart only, with a tradeoff of keeping JGF for every job

True!

> If the scheduler continued to lookup R in the KVS, but instead used the new job-info lookup which incorporates resource-update events, then this would just work and R could be redacted.

Good point, let's do it that way!

Regarding the interface to R for jobtap plugins, I have a commit ready to push that adds an "R" key to the jobtap input args. I was going to try also adding "R" to the output args just for alloc-bypass, then if it's found there and job->R is unset, letting it be set. Hopefully that's not too big of a foot-gun, but I figured I'd try the simple thing first.
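
The guard described here could look roughly like the sketch below. The job and the output args are plain dictionaries, and the function name is hypothetical; the restriction to the job.state.sched callback comes from the commit message later in this PR.

```python
def accept_plugin_R(job, topic, out_args):
    """Set job["R"] from plugin output args only when it is safe to do so."""
    R = out_args.get("R")
    if R is None:
        return False  # the plugin did not offer an R
    # Guard against the foot-gun: only the job.state.sched callback may
    # assign R, and never when the job already has one.
    if topic != "job.state.sched" or job.get("R") is not None:
        return False
    job["R"] = R
    return True


job = {"id": 1, "R": None}
offered = {"R": {"version": 1, "execution": {}}}
```

With this shape, a second plugin that also returns "R" is ignored instead of overwriting the first assignment.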

garlick · 2023-09-26T22:43:46Z

Re-pushed with the following changes:

- added support for R in jobtap IN and OUT args
- updated alloc-bypass to set R in OUT args
- included R in the job-exec.start request
- dropped the schedutil hello changes (R is fetched from the KVS, which we can easily change to job-info when it provides the modified R)
- rebased on current master

garlick · 2023-09-26T22:44:41Z

> instead used the new job-info lookup which incorporates resource-update events

That's not available yet, correct?

grondo · 2023-09-26T23:14:25Z

> That's not available yet, correct?

PR posted in #5467 - unfortunately I just realized that the job-info lookup RPC still returns the original R because the application of the resource-update events is done on the client side in flux-job. I wonder now if that was the right solution: though it does move processing out of the job-info module itself, it means users of the API have to apply the updates themselves. Maybe @chu11 can comment here.

Edit: In a way it would be nice if job-info.lookup just magically returned the updated R and jobspec.. however I did not look into this as deeply as @chu11 so not sure if there are roadblocks to doing it that way.

chu11 · 2023-09-27T00:26:53Z

> PR posted in #5467 - unfortunately I just realized that job-info lookup RPC still returns the original R because the application of the resource-update events is done on the client side in flux-job. I wonder now if that was the right solution (though it does move processing out of the job-info module itself, it means users of the API have to apply the updates themselves? Maybe @chu11 can comment here.

Apologies, there are actually two PRs, so it may be confusing:

#5467 - an update-watch service that will get the currently up to date "R" (w/ resource-update events applied) and then stream any new R changes that may occur in the future.

#5464 - flux job info <jobid> R gets the updated R, but does this on the client side. This mirrors flux job info <jobid> jobspec, to keep the burden off of the job-info module. This PR could be updated to use the service in #5467, but I elected to keep it out of the broker for the time being because these are likely "one off" lookups, whereas the streaming updates are needed by various broker services. (And admittedly this PR was in development before issue #5451 was written.)

> Edit: In a way it would be nice if job-info.lookup just magically returned the updated R and jobspec.. however I did not look into this as deeply as @chu11 so not sure if there are roadblocks to doing it that way.

There is no roadblock that I can see. The entirety of why it was done client side in flux job info (and python equivalent code) is to keep the burden out of the job-info module. Some of my pro-con brainstorming here: #5411 (comment).

grondo · 2023-10-24T15:09:15Z

This PR would be useful in development of duration update for running jobs. The job-manager would not have to fetch R from the job-info service to apply a duration update. A cursory examination of this PR indicates it is in pretty good shape. @garlick, what remains to do here to remove the WIP? (Did we ever decide if the cached R should be freed for inactive jobs? Given that we have users wanting to run 1M jobs, perhaps that would be a good idea...)

garlick · 2023-10-24T18:27:22Z

This has been sitting for a while. A brief review didn't turn up any big gaps, that I caught anyway. Just force pushed with the following changes:

- drop the scheduling key from R when the job transitions to INACTIVE state to save memory
- drop the scheduling key from R when including it in the exec.start request
- rebase on current master

garlick · 2023-10-24T21:46:25Z

Per offline discussion with @grondo, repushed with the R scheduling key redacted upon receipt. It's still in the KVS of course, and schedutil still looks it up there when the scheduler is reloaded.
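
As an illustration of redaction upon receipt (a sketch, not the code in this PR), the cached copy simply omits the top-level scheduling key while the object received from the scheduler is left as is. The function name and the sample R contents are hypothetical.

```python
def redact_R(R):
    """Return a shallow copy of R without the scheduler-opaque data."""
    return {key: value for key, value in R.items() if key != "scheduling"}


received = {
    "version": 1,
    "execution": {"starttime": 0.0, "expiration": 100.0},
    "scheduling": {"graph": {"nodes": [], "edges": []}},  # e.g. JGF
}
cached = redact_R(received)
```

Since the scheduling key can be large on JGF systems, dropping it once at receipt bounds the per-job memory cost for active and inactive jobs alike.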

grondo

LGTM!

Problem: libschedutil code sometimes breaks long function parameter lists in blocks instead of one per line, which is inconsistent with the modern flux code base. Break function parameters one per line when the list cannot fit on one line (except json pack/unpack key-value pairs).

Problem: job_create_from_eventlog() does not accept R, but this will be required once R is tracked by the job-manager. Add R argument to job_create_from_eventlog(). This initializes job->R before the eventlog is replayed, for the eventual support of resource-update events. Update unit test.

Problem: job-manager/start.c sometimes breaks long function parameter lists in blocks instead of one per line, which is inconsistent with the modern flux code base. Break function parameters one per line when the list cannot fit on one line (except json pack/unpack key-value pairs).

Problem: alloc-bypass needs to set job->R. Add an "R" key to the output args. If it is present in the output args and the preconditions are met (the callback is job.state.sched and R is not already set in the job struct), then set job->R_redacted to the argument value.

Problem: alloc-bypass represents R internally as a string for storage to the KVS, but soon we will need it in object form for adding to the plugin output args. Represent it internally as an object and use KVS interfaces that accept an object.

codecov · 2023-10-25T02:10:54Z

Codecov Report

Merging #5472 (4d7ee26) into master (fce7220) will decrease coverage by 0.06%.
The diff coverage is 73.29%.

@@            Coverage Diff             @@
##           master    #5472      +/-   ##
==========================================
- Coverage   83.47%   83.42%   -0.06%     
==========================================
  Files         487      487              
  Lines       81918    81990      +72     
==========================================
+ Hits        68381    68398      +17     
- Misses      13537    13592      +55

Files	Coverage Δ
src/common/libschedutil/ops.c	`73.17% <100.00%> (ø)`
src/common/libschedutil/ready.c	`63.33% <100.00%> (ø)`
src/modules/job-manager/job.c	`89.16% <100.00%> (+0.20%)`	⬆️
src/common/libschedutil/hello.c	`70.00% <66.66%> (ø)`
src/modules/job-manager/plugins/alloc-bypass.c	`68.18% <83.33%> (ø)`
src/modules/job-manager/restart.c	`83.02% <96.15%> (+0.67%)`	⬆️
src/modules/job-manager/start.c	`72.78% <80.00%> (ø)`
src/modules/job-exec/rset.c	`88.34% <76.92%> (-1.87%)`	⬇️
src/modules/job-manager/alloc.c	`77.59% <62.50%> (-0.36%)`	⬇️
src/modules/job-manager/getattr.c	`65.15% <42.85%> (-0.96%)`	⬇️
... and 3 more

... and 9 files with indirect coverage changes

garlick force-pushed the issue#5471 branch from 8957423 to 27bc94c Compare September 26, 2023 16:05

github-advanced-security bot found potential problems Sep 26, 2023

garlick force-pushed the issue#5471 branch from 27bc94c to c4def6a Compare September 26, 2023 22:37

garlick mentioned this pull request Sep 27, 2023

job-info: support new update-lookup and update-watch service #5467

Merged

garlick force-pushed the issue#5471 branch from c4def6a to 74bee95 Compare October 24, 2023 18:25

garlick changed the title ~~WIP: cache R in the job manager~~ cache R in the job manager Oct 24, 2023

garlick force-pushed the issue#5471 branch from 74bee95 to 84897c3 Compare October 24, 2023 21:45

grondo approved these changes Oct 24, 2023

garlick added the merge-when-passing label Oct 24, 2023

garlick added 7 commits October 25, 2023 01:32

libschedutil: refactor alloc

f702424

Problem: alloc responses use a common function that is not suited to adding one more optional parameter. Refactor alloc.c to use a more general "pack" style utility function.

libschedutil: add R to alloc response

dd07975

Problem: schedutil_alloc_respond_success_pack() writes R to the KVS then responds to the job-manager without R as required by RFC 27, but in emerging cases, it may be handy to give jobtap plugins access to R. Include R in the alloc response.

job-exec: fix unchecked calloc() return value

2543a2a

Problem: resource_set_create uses a calloc() buffer without checking for NULL. Add check.

job-exec: add json functions to rset

f5da1b9

Problem: when R is passed to job-exec as a json object, it will be convenient to have some rset interfaces to handle R as a json object. Add resource_set_create_fromjson (). Add resource_set_get_json ().

job-manager: add R to job object

dd07ce6

Problem: there is no place to store R in the job manager job object. Add R to struct and call json_decref() from job destructor.

garlick added 14 commits October 25, 2023 01:32

job-manager: accept R in alloc response

a6728ae

Problem: the scheduler now returns R in the sched.alloc response, but the job manager ignores it. Accept R and make it part of the job object.

job-manager: refactor lookup_job() in restart

906f361

Problem: lookup_job() repeats a block of code for each KVS key it looks up, and adding one more raises the annoyance level. Create local job data lookup helpers to reduce code duplication.

job-manager: lookup R during replay

229333e

Problem: R is expected to be part of the job-manager job object after allocation, but this is not the case for jobs loaded from the KVS after a job manager restart. Load R from the KVS, if available, and make it part of the job object.

job-manager: allow R to be fetched via getattr

0e4d4eb

Problem: there is not a simple way to access the job manager's in-memory copy of R for testing. Allow R to be fetched with the job-manager.getattr RPC.

testsuite: add test for job manager copy of R

09d3977

Problem: there are no tests that show the job manager holds a copy of R. Add a sharness test. In the future it can cover resource-update changes to R.

jobtap: fix shadowed local variable

496fabc

Problem: jobtap_call() defines 'rc' both at function scope and within a local block. Rename the variable in the local block so they don't confuse anyone.

jobtap: add R to plugin input args, if available

4158ae2

Problem: jobtap plugins may need access to R. Add an "R" key to the input args if R has been allocated.

alloc-bypass: set R in output arguments

23ac7f3

Problem: alloc-bypass now needs to set R in the output args so that the job manager's in-memory copy of R is set. Set "R" in the output args of the job.state.sched callback.

job-manager: add R to exec.start request

993b890

Problem: job-exec fetches R from the KVS but we have it now in the job manager so it could be included in the exec.start request. Add R to the exec.start request.

job-exec: get R from exec.start request

2bb2330

Problem: R is now present in the exec.start so job-exec no longer needs to look it up. Initialize job->R and critical ranks in the start request handler rather than later in a continuation.

testsuite: drop test of invalid R

4d7ee26

Problem: a job exec test that fakes an invalid R by placing it in the KVS no longer works because R comes directly from the job manager. Remove the test for now.

grondo force-pushed the issue#5471 branch from 84897c3 to 4d7ee26 Compare October 25, 2023 01:32

mergify bot merged commit 9528de4 into flux-framework:master Oct 25, 2023
31 of 32 checks passed

grondo mentioned this pull request Nov 7, 2023

Keep a copy of R in job-manager for use in jobtap plugin callbacks #5471

Closed

garlick deleted the issue#5471 branch March 1, 2024 14:32

cmoussa1 mentioned this pull request Mar 28, 2024

limits: add max-nodes limit support flux-framework/flux-accounting#349

Open
