Idea: job-info: add streaming RPC to "watch" R #5451
I think this is a great idea.
My first thought was we can just outright watch … Edit: As I re-read the above, what I was really thinking of was a watch service on any key or keys of the job (jobspec, R, etc.). It'd be dumb for just R. Generally speaking, I was thinking of watching the key directly vs. watching the eventlog. Dunno if you were thinking of watching the eventlog b/c you want extra smarts in there, i.e., wait if the job is not running, end the stream when the job ends, etc.
No, I had thought watching the eventlog is required to process
Not to say we couldn't update R in the KVS when changes occur... Note: I don't think the job manager currently even fetches a copy of R. I'll have to go review what our current plan was here.
Agh, I'm dumb. Yeah, we're not updating R in the KVS ;-) One thought: when I implemented …, I specifically avoided putting it into … But perhaps this discussion pushes the update back into … Do you imagine it will need smarts to wait for a job to be assigned resources? To end the stream when the job is finished?
I imagine it should end the stream when the job is finished, but I think it could return an error if
Yeah, that is what I was thinking. Though we should carefully consider before moving forward so as not to unnecessarily undo any of your previous work. I was focused on R here because we may have many components that need to watch for updates, vs. just getting the updated R at any given time, which is more the use case for jobspec. (However, that is a use case for R as well: if a user fetches R, it should reflect updates.) We should also consider whether updating R in the KVS is indeed the right approach, since that could simplify some of this...
Just spitballing here, but is there a reason not to update jobspec/R in the KVS? I assume it's a raciness issue / potential for different folks to have different views? Or perhaps who "owns" jobspec/R? Or a desire to keep the "original"? Or, now that I think about it while typing this, perhaps all of the above. And probably some other reasons I'm not thinking of yet.
Yeah, I'd second a vote for keeping the original R/jobspec in the KVS and building the "current" R/jobspec by applying job events, replaying them on a restart. The "job schema" just makes more sense to me this way. If R were updated in place, then the update events in the eventlog would not have a context to be meaningful.
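A minimal sketch of the "keep the original, replay the events" approach, assuming (per the #4175 proposal) that a `resource-update` event's context may carry a new `expiration` that replaces `execution.expiration` in R. The function name `apply_resource_updates` and the simplified event dicts are hypothetical illustrations, not the job-info implementation.

```python
import copy
from typing import Any, Dict, Iterable


def apply_resource_updates(original_r: Dict[str, Any],
                           events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the "current" R: a copy of the original with updates applied.

    Assumed event shape: {"name": ..., "context": {...}}. A resource-update
    context may carry "expiration", which replaces R["execution"]["expiration"].
    The original R is left untouched, so the events always have a base to
    be meaningful against.
    """
    current = copy.deepcopy(original_r)
    for event in events:
        if event.get("name") != "resource-update":
            continue  # every other event leaves R unchanged
        context = event.get("context", {})
        if "expiration" in context:
            current.setdefault("execution", {})["expiration"] = context["expiration"]
    return current


# Example: replaying the eventlog (e.g. on restart) rebuilds the same current R.
R = {"version": 1, "execution": {"R_lite": [], "expiration": 100.0}}
eventlog = [
    {"name": "alloc", "context": {}},
    {"name": "resource-update", "context": {"expiration": 200.0}},
]
print(apply_resource_updates(R, eventlog)["execution"]["expiration"])  # 200.0
print(R["execution"]["expiration"])  # 100.0, original kept
```

Because the function is pure, the same eventlog always yields the same current R, which is what makes replay on restart safe.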
Question: should the very first reply from the service send the original R? Or should it reply with an updated R based on all
Ooh, good question. I think your intuition is correct. If we want to show "history" at some point, maybe we could add a flag.
Note for a possible future flag: a UNIQ option. I think it's rare enough not to waste time doing big JSON diffs in the normal case.
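One possible reading of the UNIQ idea, sketched under the assumption that the flag would simply suppress a response whose R equals the previously sent one. The flag and the `stream_r` helper are hypothetical. The default path does no comparison at all, matching the point that deep JSON comparisons are not worth paying for in the normal case.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def stream_r(updates: Iterable[Dict[str, Any]], uniq: bool = False) -> Iterator[str]:
    """Yield each successive R as a JSON response payload.

    uniq=False (default): send every R, no comparison at all.
    uniq=True (hypothetical flag): skip an R equal to the one last sent,
    at the cost of a deep comparison per update.
    """
    last_sent = None
    for r in updates:
        if uniq and r == last_sent:
            continue  # identical to previous response: suppress it
        last_sent = r
        yield json.dumps(r, sort_keys=True)


updates = [{"expiration": 100.0}, {"expiration": 100.0}, {"expiration": 200.0}]
print(len(list(stream_r(updates))))        # 3
print(list(stream_r(updates, uniq=True)))  # ['{"expiration": 100.0}', '{"expiration": 200.0}']
```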
Problem: In the future, several services will need to know a job's resources and know the updates that would apply to them. This would currently require users to read R, watch the eventlog, and then apply `resource-update` events as they happen. It would be nice if a service did this as there will be multiple users. Solution: Support a new job-info.update-watch streaming service. It currently supports only the key "R", but can be extended to other keys in the future. The service will read R and the eventlog for a job and apply all resource-update changes as needed. This initial R will be sent back to the caller. If the job has completed, the RPC streaming service ends. If not, the eventlog will be watched for future resource-update events. On each new resource-update event, a new R will be streamed back to the caller. This continues until the job ends or the caller cancels the stream. Fixes flux-framework#5451
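The stream described in this commit message can be modeled as a Python generator: one initial response with all existing updates applied, an immediate end if the job is already complete, and otherwise one new R per future `resource-update`. This is a sketch under stated assumptions: the `clean` event is taken to mark the end of the job, `resource-update` contexts are assumed to carry `expiration`, and plain iterables stand in for the KVS read and the eventlog watch. `update_watch` and `_apply_update` are hypothetical names.

```python
import copy
from typing import Any, Dict, Iterable, Iterator

Event = Dict[str, Any]


def _apply_update(r: Dict[str, Any], event: Event) -> bool:
    """Apply a resource-update to r in place; return True if one was applied."""
    if event.get("name") != "resource-update":
        return False
    context = event.get("context", {})
    if "expiration" not in context:
        return False
    r.setdefault("execution", {})["expiration"] = context["expiration"]
    return True


def update_watch(original_r: Dict[str, Any],
                 past_events: Iterable[Event],
                 future_events: Iterable[Event]) -> Iterator[Dict[str, Any]]:
    """Stream R: one initial response, then one per new resource-update.

    Assumption: the "clean" event marks the end of the job.
    """
    r = copy.deepcopy(original_r)
    completed = False
    for event in past_events:
        _apply_update(r, event)
        if event.get("name") == "clean":
            completed = True
    yield copy.deepcopy(r)  # initial R with all updates so far
    if completed:
        return  # job already finished: end the stream
    for event in future_events:  # stands in for the eventlog watch
        if event.get("name") == "clean":
            return  # job ended: end the stream
        if _apply_update(r, event):
            yield copy.deepcopy(r)


R = {"execution": {"expiration": 100.0}}
past = [{"name": "alloc", "context": {}},
        {"name": "resource-update", "context": {"expiration": 150.0}}]
future = [{"name": "start", "context": {}},
          {"name": "resource-update", "context": {"expiration": 300.0}},
          {"name": "clean", "context": {}}]
for update in update_watch(R, past, future):
    print(update["execution"]["expiration"])  # 150.0, then 300.0
```

In this model, a caller cancelling the stream corresponds to closing the generator (`gen.close()`) or simply no longer iterating.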
Problem: In the future, several services will need to know a job's resources and know the updates that would apply to them. This would currently require users to read R, read the eventlog, and then apply `resource-update` events to R. Some other users would also need to know when there are changes to R, necessitating watching the eventlog for future resource-update changes. It would be nice if a service did this as there will be multiple users. Solution: Support a new job-info.update-lookup service and job-info.update-watch streaming service. They currently support only the key "R", but can be extended to other keys in the future. The job-info.update-lookup service will read R and the eventlog for a job. It then applies all resource-update changes to R and returns the result. The job-info.update-watch service does the same as the above, but if the job is not completed, it will continue to watch the eventlog for future resource-update events. On each new resource-update event, a new R will be streamed back to the caller. This continues until the job ends or the caller cancels the stream. Fixes flux-framework#5451
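The non-streaming half of this design can be sketched as a single-response handler that validates the requested key (only "R" for now) and returns the value with all updates applied. This is a hypothetical model: `update_lookup` and `SUPPORTED_KEYS` are illustrative names, a Python `ValueError` stands in for whatever error response the real service returns, and `resource-update` contexts are again assumed to carry `expiration`.

```python
import copy
from typing import Any, Dict, Iterable

SUPPORTED_KEYS = {"R"}  # only "R" for now; more keys could be added later


def update_lookup(key: str,
                  original: Dict[str, Any],
                  events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Single-response lookup: the value of key with all updates applied.

    Hypothetical handler. Raises ValueError for keys not yet supported.
    Assumes resource-update contexts carry "expiration" (see R["execution"]).
    """
    if key not in SUPPORTED_KEYS:
        raise ValueError(f"update-lookup: unsupported key {key!r}")
    value = copy.deepcopy(original)
    for event in events:
        context = event.get("context", {})
        if event.get("name") == "resource-update" and "expiration" in context:
            value.setdefault("execution", {})["expiration"] = context["expiration"]
    return value


R = {"execution": {"expiration": 100.0}}
events = [{"name": "resource-update", "context": {"expiration": 400.0}}]
print(update_lookup("R", R, events)["execution"]["expiration"])  # 400.0
try:
    update_lookup("jobspec", R, events)
except ValueError as err:
    print(err)  # update-lookup: unsupported key 'jobspec'
```

Under this model, the watch variant would send exactly this value as its first response and then keep the stream open only if the job has not completed.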
As part of the proposed solution for #4175, `resource-update` events will be posted to the job eventlog to indicate a change in a job's expiration. Interested parties can then monitor the job eventlog to be notified of these updates and adjust accordingly. However, multiple components will require a new eventlog watch RPC, including the job execution service, the job shell, and any scheduler of a Flux subinstance. That is 2-3 new watches on the job eventlog per job, not to mention the amount of code that would need to be copied to each of these use cases, the processing of every eventlog event to check for the rare `resource-update` event, etc.
Each of these entities already fetches R from the job-info service. It might be convenient if the job-info service offered an RPC to start watching R instead of just fetching it. This could replace the existing lookup RPC and, in the common case, would have just one response. The job-info module would then watch the eventlog of the job and notify the client of updates by sending a new R. Code would be simplified for each use case, and there would be only one new eventlog watch instead of (up to) 3. Plus, I imagine the job-info module already has code for monitoring and processing the job eventlog.
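To make the "one eventlog watch instead of (up to) 3" argument concrete, here is a small model of a per-job fan-out inside job-info: every R watcher gets the current R immediately (the common, single-response case), the eventlog is processed once regardless of how many watchers exist, and watchers hear back only on the rare `resource-update`. `RWatchHub` and its methods are hypothetical, and the `expiration` context key is an assumption carried over from the #4175 proposal.

```python
import copy
from typing import Any, Callable, Dict, List

Callback = Callable[[Dict[str, Any]], None]


class RWatchHub:
    """Hypothetical per-job fan-out inside job-info.

    Many R watchers (job-exec, job shell, subinstance scheduler) share a
    single eventlog watch instead of each opening their own.
    """

    def __init__(self, original_r: Dict[str, Any]) -> None:
        self.r = copy.deepcopy(original_r)
        self.watchers: List[Callback] = []
        self.eventlog_watches = 0

    def add_watcher(self, callback: Callback) -> None:
        if self.eventlog_watches == 0:
            self.eventlog_watches = 1  # first watcher starts the one eventlog watch
        self.watchers.append(callback)
        callback(copy.deepcopy(self.r))  # common case: this is the only response

    def on_eventlog_event(self, event: Dict[str, Any]) -> None:
        """Called once per eventlog event, no matter how many watchers."""
        context = event.get("context", {})
        if event.get("name") != "resource-update" or "expiration" not in context:
            return  # the rare resource-update is the only event of interest
        self.r.setdefault("execution", {})["expiration"] = context["expiration"]
        for callback in self.watchers:
            callback(copy.deepcopy(self.r))  # send each watcher a new R


received = {"job-exec": [], "shell": [], "sched": []}
hub = RWatchHub({"execution": {"expiration": 100.0}})
for responses in received.values():
    hub.add_watcher(responses.append)
hub.on_eventlog_event({"name": "start", "context": {}})
hub.on_eventlog_event({"name": "resource-update", "context": {"expiration": 250.0}})
print(hub.eventlog_watches)  # 1
print([r["execution"]["expiration"] for r in received["shell"]])  # [100.0, 250.0]
```

The design choice being illustrated is that the per-event filtering cost is paid once per job rather than once per interested component.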
Thoughts?
BTW, as an experiment I added an eventlog watch to the job-exec module for every job. It seemed to impact throughput by 5-8%.