This repository has been archived by the owner on Apr 24, 2023. It is now read-only.

Optimize the /list fetching code to not round-trip to UUID. #848

Closed
wants to merge 3 commits into from

Conversation

scrosby
Member

@scrosby scrosby commented May 14, 2018

Changes proposed in this PR

  • Don't round-trip jobs to UUIDs and then back to job entities in the /list path. Get the entities and populate them directly.

Why are we making these changes?

  • We think it'll be a modest performance increase; should avoid an extra random IO per job in the result set.

@scrosby scrosby added the wip label May 14, 2018
@scrosby scrosby added wip and removed wip labels May 21, 2018
@scrosby scrosby removed the wip label May 29, 2018
@scrosby scrosby requested a review from dposada May 29, 2018 15:58
(histograms/update! list-response-job-count (count job-ents))
job-ents)))

(defn list-jobs
Contributor

After this change, does anything call list-jobs?

Member Author

Yes; Line 1300, jobs-list-exists?

Contributor

Shouldn't that path also switch to using list-jobents?

Member Author

@scrosby scrosby May 30, 2018

It's a lot messier and riskier, because the result of list-jobs is just stuffed into ::jobs in the context, and since the context is effectively global, we'd have to trace it through all of the code paths; in effect, refactor the entire /jobs endpoint dataflow (individual UUIDs, /job search parameters, etc.). Right now, the dataflow is entirely wired to generating and consuming UUIDs. If we push entities further down, we'd have to rewire the whole /jobs set of endpoints to entities, or clone chunks of it.

Contributor

@dposada dposada May 30, 2018

Sure, but /list is the deprecated endpoint, and /jobs is the "new" endpoint. It seems like if we're choosing to improve one but not the other, we should choose /jobs.

@DaoWen
Contributor

DaoWen commented May 29, 2018

This PR partially addresses issue #617.

@scrosby scrosby self-assigned this May 30, 2018
@scrosby
Member Author

scrosby commented May 31, 2018

This PR addresses #790. The additional work on the /jobs endpoint involves the liberator context and addresses part of #617.

@scrosby
Member Author

scrosby commented May 31, 2018

@dposada Is there a unit test for /jobs/#UUID that exercises this codepath for https://github.com/twosigma/Cook/pull/848/files#diff-6611b39b65ff49438a4d940b51c85cc0R2549 ?
I didn't find one in a search, nor in your commit d236ec3 (#756).
I'd like to validate that this works correctly.

@dposada
Contributor

dposada commented May 31, 2018

@scrosby The util.load_job function:

def load_job(cook_url, job_uuid, assert_response=True):
    """Loads a job by UUID using GET /jobs/UUID"""
    return load_resource(cook_url, 'jobs', job_uuid, assert_response)
which is used by several integration tests, exercises /jobs/UUID; let me know if that's what you're looking for.

Scott Crosby added 2 commits June 4, 2018 14:10
…ext for /jobs endpoint.

Avoids the same from-uuid to-uuid round-trip in the /jobs endpoint.
Stored in ::jobs-entities.
@scrosby scrosby changed the title from "WIP: Optimize the /list fetching code to not round-trip to UUID." to "Optimize the /list fetching code to not round-trip to UUID." Jun 5, 2018
@@ -891,6 +890,10 @@
progress-regex-string (assoc :progress-regex-string progress-regex-string)
pool (assoc :pool (:pool/name pool))))))

(defn fetch-job-map
  [db framework-id job-uuid]
  (fetch-job-map-from-entity db framework-id (d/entity db [:job/uuid job-uuid])))
Contributor

Does anything still use fetch-job-map after this change?

Member Author

Over a dozen uses.

Contributor

What are they? Should they switch over to using the new function?

Member Author

render-jobs-for-response-deprecated -- /rawscheduler handler
render-instances-for-response
11 times as a convenience function in unit tests.

Contributor

Should we have render-jobs-for-response-deprecated and render-instances-for-response use the new way?

Member Author

@scrosby scrosby Jun 5, 2018

That's a scope question.

If we do that, then we're effectively rescoping this PR to do all of #617. Do we want to do that?

Contributor

I would vote to either do that in this PR or do it immediately after this in a follow-up PR. Basically, I wouldn't want the two functions to stay around for very long.

@@ -1116,22 +1119,31 @@
(mapv (partial fetch-job-map (db conn) framework-id) (::jobs ctx)))

(defn render-jobs-for-response
  "Renders jobs for response. Fills in :group UUIDs and names, and map-ifies jobs.
  It examines the ctx for ::jobs (containing UUIDs) or ::job-entities (containing
  Datomic job entities) and returns the merged set."
Contributor

Why do we need both ::jobs and ::job-entities?

Member Author

::jobs has UUIDs (coming from the /job/ and from /job&uuid=XXXX&uuid=XXXX) and ::job-entities has entities. The two follow different logic as they pass through read-jobs-handler, so they are kept separate until the very end.

Contributor

Is it possible to have them work the same way so that we don't need both entries in the context map?

Member Author

@scrosby scrosby Jun 5, 2018

Yes, if we refactor the /jobs API endpoint implementation. It'll take time for me to get up to speed on liberator's semantics, because I'm going to have to face ::params, ::query-params, etc. All of the filtering and other functions in the &uuid=XXXX path are wired to UUIDs. And if we're going to do that, we could greatly simplify the implementation and semantics with a small API change. IMO, the current code is fighting with liberator, which is a sign that the API we're implementing isn't an appropriate match for it. Thus the proposal for a change.

Do we want to rescope this PR that much?

Contributor

Thus the proposal for a change.

It's not clear to me what the change is that you're proposing.

@dposada
Contributor

dposada commented Jun 20, 2018

@scrosby Do we want to keep this PR open?

@dposada
Contributor

dposada commented Jun 26, 2018

@scrosby Can we close this PR?

@pschorf pschorf closed this Jun 27, 2018
@scrosby scrosby deleted the outgoing/faster_list branch November 12, 2018 21:44