flux-job: get updated version of R #5464
f59f269 to 9664379
Would it work to write a test jobtap plugin that simply emits a |
We could definitely do that in the short term. The main reason I didn't do that initially is because of the python tests. But looking at the implementation of |
9664379 to 5b62e89
re-pushed adding a I have comments that say "you may want to update this to something smarter down the road", but the tests do the job for the time being. So I removed WIP. |
5b62e89 to d31d435
Looks good!
Seems like there was a miscommunication about the `resource-update` event -- it is not similar to `jobspec-update` in that only an `expiration` key is currently allowed, which sets a new `execution.expiration`. (Commented inline as I noted spots that need to be updated.)
src/cmd/flux-job.c
Outdated
json_object_foreach (context, path, value) {
    if (jpath_set (R, path, value) < 0)
        log_err_exit ("Failed to update R");
}
RFC 21 specifies that a `resource-update` event may only have a key of `expiration`, which updates the resource set expiration. I don't think this code will work here since the expiration is located in `execution.expiration`.
In the future other keys may indicate grow or shrink (or those may be different events) and those would likely not apply with `jpath_set()` either, so probably better to specifically handle `expiration` here and generate an error for any other key in the `resource-update` event context.
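For illustration, the handling suggested in this comment could look roughly like the Python sketch below. It is not code from this PR: `apply_resource_update` is a hypothetical helper, and it assumes R and the event context are already parsed into plain dicts. The only behavior taken from the discussion is that an `expiration` key maps to `execution.expiration` and any other key is an error.

```python
import copy


def apply_resource_update(R, context):
    """Return a copy of R with one resource-update context applied.

    Hypothetical helper for illustration only. Assumes R is the parsed
    R object (a dict) and context is the event's context dict.
    """
    updated = copy.deepcopy(R)
    for key, value in context.items():
        # Only "expiration" is handled; reject anything else outright
        # instead of treating the key as a path into R.
        if key != "expiration":
            raise ValueError(f"unsupported resource-update key: {key!r}")
        updated.setdefault("execution", {})["expiration"] = value
    return updated


R = {"version": 1, "execution": {"starttime": 0.0, "expiration": 0.0}}
new_R = apply_resource_update(R, {"expiration": 1000.0})
print(new_R["execution"]["expiration"])
```

Rejecting unknown keys explicitly means a future grow or shrink key fails loudly rather than being written into R at an arbitrary path.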
t/t2230-job-info-lookup.t
Outdated
echo $jobid > updated_R.id &&
flux job info $jobid R | jq -e ".execution.expiration == 0.0" &&
kvspath=`flux job id --to=kvs ${jobid}` &&
flux kvs eventlog append ${kvspath}.eventlog resource-update "{\"execution.expiration\": 1000.0}" &&
See previous comment. The only allowed key in the `resource-update` event context is `expiration`.
if event.name == "resource-update":
    for key, value in event.context.items():
        set_treedict(R, key, value)
As above, need to handle `expiration` correctly here.
id,
"resource-update",
"{s:f}",
"execution.expiration", newexp) < 0) {
As noted elsewhere, RFC 21 specifies `expiration` as the only allowed `resource-update` context key.
d31d435 to b8340b7
Doh! I had just completely misread the RFC! re-pushed with changes, mostly
Given the comments in #5467 I wonder if we want to consider having the |
I don't think it should be too much work, I think the big questions are:
I'm not sure.
Problem: In a helper function, an extra parameter was mistakenly passed to AssertNotIn. Remove that extra parameter.
Problem: In several check helper functions in python/t0014-job-kvslookup.py, we pass in a jobid but forget to use it. Check that the jobid exists within the looked up data within those helper functions.
Problem: A few tests in python/t0014-job-kvslookup.py were named / numbered inconsistently. Although this did not affect these specific tests, it could in the future as some tests do have dependencies on other tests being run first. Solution: Fix the names / numbering.
Problem: In python/t0014-job-kvslookup.py there is a single function to check both R and J lookups. In future tests, it would be convenient to only check R or J but not both. Split up the function into an R check and a J check and update callers accordingly.
Problem: Internally within flux, R can be updated via resource-update events. But when the user runs "flux job info" to get R, it gets the version stored in the KVS, which may not be representative of the "viewed" R. The result may not be consistent with other tools such as "flux jobs". Solution: When retrieving R, also retrieve the job eventlog. Update R through "resource-update" events and output that as the final result. Get the stored version of R only if the --base option is specified. This change breaks several tests in t2232-job-info-security.t that created a dummy value of R. Instead of reading a dummy value of R, create a "dummy" key and read that. Fixes flux-framework#5425
Problem: There are no tests that exercise handling of `flux job info` and resource-update events in the job eventlog and operation of the `flux job info --base` option. Add coverage in t2230-job-info-lookup.t.
Problem: The "flux job info" command will present an updated view of "R" when it is looked up, or a "base" view when --base is specified. This is supported in the python job/kvslookup.py module with jobspec, but not with R. Support updated and base lookups of R as well as jobspec.
Problem: There is no coverage for the 'base' option and the "R" key in job kvs lookup via Python. Add some coverage in python/t0014-job-kvslookup.py. Use a jobtap plugin to generate resource-update event for tests.
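As a rough sketch of the lookup flow these commit messages describe (fetch R plus the job eventlog, replay `resource-update` events, or return the stored R when the base view is requested), the following Python works on raw strings. It is not the code in this PR: `view_R` is a hypothetical name, and the sketch assumes the eventlog is one JSON object per line with `name` and optional `context` fields.

```python
import json


def view_R(base_R_text, eventlog_text, base=False):
    """Return R as a dict, updated by resource-update events unless base=True.

    Hypothetical helper for illustration only. Assumes eventlog_text holds
    one JSON object per line with "name" and optional "context" keys.
    """
    R = json.loads(base_R_text)
    if base:
        # Base view: the stored R, with no events applied.
        return R
    for line in eventlog_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("name") != "resource-update":
            continue
        for key, value in event.get("context", {}).items():
            # Only the "expiration" key is handled, per the review discussion.
            if key != "expiration":
                raise ValueError(f"unsupported resource-update key: {key!r}")
            R["execution"]["expiration"] = value
    return R


base_text = json.dumps({"version": 1, "execution": {"expiration": 0.0}})
eventlog = "\n".join(
    [
        json.dumps({"timestamp": 1.0, "name": "submit", "context": {"userid": 1000}}),
        json.dumps(
            {"timestamp": 2.0, "name": "resource-update", "context": {"expiration": 1000.0}}
        ),
    ]
)
print(view_R(base_text, eventlog)["execution"]["expiration"])
print(view_R(base_text, eventlog, base=True)["execution"]["expiration"])
```

Events are applied in eventlog order, so if several `resource-update` events are present the last expiration wins.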
b8340b7 to ef87fab
LGTM. Rebased and fixed up the test jobtap plugin to use R from job-manager.
@@ -285,20 +285,21 @@ test_expect_success 'flux job wait-event guest.exec.eventlog fails via -p (live
 # for these tests we need to create a fake job eventlog in the KVS
-# value of R irrelevant here, just need to lookup something
+# value of "dummy" irrelevant here, just need to lookup something
IMO, these test changes should have come in a separate commit before the breaking changes with an explanation. However, it isn't critical and is just my preference, so this works for me.
/* tests expecting an expiration greater than original. Instead
 * of getting R and calculating future time, just add 72 hours to
 * the current time.
 */
R is now available in jobtap plugin args so this can be simplified. Will fix.
Codecov Report
@@ Coverage Diff @@
## master #5464 +/- ##
==========================================
+ Coverage 83.43% 83.47% +0.04%
==========================================
Files 487 487
Lines 81990 82061 +71
==========================================
+ Hits 68406 68504 +98
+ Misses 13584 13557 -27
This looks like it gets the job done and I'm fine with merging.
We should come back around and replace the one-off jobspec/R update code with calls to the new `job-info.update-lookup` RPC at some point though.
Similar to #5428, this supports `resource-update` events in `flux job info` and the equivalent Python bindings to look up the "viewed" R and not the "base" one stored in the KVS.
Just like #5463, the commits with new tests do not work and have XXX tagged to be filled in later when an actual way to update the R of a job exists.
I should also note that this does not use any solution from #5451 as A) I implemented this first :P, B) copied the implementation from the jobspec side, and C) I wanted to keep this processing out of broker. We could of course change the approach.