-
Notifications
You must be signed in to change notification settings - Fork 63
Conversation
97c29ed
to
773c9ea
Compare
773c9ea
to
432ffa0
Compare
432ffa0
to
b2e2b2e
Compare
2625cdc
to
c643468
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't made it all the way through yet, but here are my comments so far.
scheduler/src/cook/mesos.clj
Outdated
@@ -178,6 +178,7 @@ | |||
datomic-report-chan (async/chan (async/sliding-buffer 4096)) | |||
mesos-heartbeat-chan (async/chan (async/buffer 4096)) | |||
current-driver (atom nil) | |||
pool-name->optimizer-schedule-job-ids-atom (atom {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is very confusing. We should try to come up with something simpler.
It looks like it's a mapping from pool names onto atoms.
Are the mapped-to elements schedules, or just a collection of job ids, or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed pool-name->optimizer-schedule-job-ids-atom
to pool-name->optimizer-suggested-job-ids-atom
(get @pool-name->pending-jobs-atom pool-name)) | ||
(fn pool-name->running [pool-name] | ||
(->> (util/get-running-task-ents (d/db mesos-datomic-conn)) | ||
(filter #(= pool-name (-> % :job/_instance util/job->pool-name))))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to move this filtering logic into the query? E.g., we could add an optional :pool-name
argument to the get-running-task-ents
function, when then gets optionally added to the query.
If we decide to update get-running-task-ents
, that should probably go into its own PR, and we could rebase this on top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had this discussion before about using pool name in the query. However, the decision then was to filter outside the query. Here is another snippet that does this: cook.mesos.scheduler/generate-user-usage-map
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we favor filtering outside the query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me
[(-> (t/now) | ||
(t/plus (t/millis (task->runtime-ms task)))) | ||
[(t/plus (t/now) | ||
(-> task task->runtime-ms (* runtime-multiplier) t/millis)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move (t/plus (t/now))
to the end of the threading macro instead of having it outside?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
[clojure.tools.logging :as log] | ||
[cook.util :refer [lazy-load-var PosNum PosInt NonNegInt]] | ||
[cook.util :refer [lazy-load-var NonNegInt PosNum PosInt]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙄
You didn't move lazy-load-var
to the end of the list???
(sort ["lazy" "Non"])
→ ("Non" "lazy")
And you didn't move cook.util
below cook.mesos.util
??????
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ;)
{\"count\" 4 | ||
\"cpus\" 8 | ||
\"instance-type\" \"basic\" | ||
\"mem\" 240000}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example use string keys, but the actual definition uses keyword keys.
Using keyword keys in the example would let us get rid of all those nasty backslashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
(merge-with + user->mem-usage)))) | ||
;; Running must be before waiting here because optimizer determines batch order from job order | ||
opt-jobs (concat running-opt-jobs waiting-opt-jobs) | ||
;; TODO mem-share should be computed per user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same note as other TODOs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, created #485
(->> (pc/map-vals (partial * alpha) ema-usage) | ||
(merge-with + user->mem-usage)))) | ||
;; Running must be before waiting here because optimizer determines batch order from job order | ||
opt-jobs (concat running-opt-jobs waiting-opt-jobs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this also imply that the intra-sequence order in running-opt-jobs
and waiting-opt-jobs
affects the results? Are those sorted in some meaningful way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, their orders affect the results. Waiting jobs are sorted by the ranker. Running jobs are not sorted as we expect all running batches to fit in resource constraints and be looked into by the optimizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
(let [host-group 0 | ||
host-name->host-group (constantly host-group) | ||
host-infos [host-info] | ||
user->ema-mem-usage (atom {})] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a comment somewhere saying what "ema" stands for. Maybe here, maybe elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
"now" (.getTime now) | ||
"opt_jobs" opt-jobs | ||
"opt_params" opt-params | ||
"seed" (.getTime now) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why seed with the current time?
Why do we even need a seed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It allows us to mimic results if the optimizer decides to use a random number generator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't using a constant value as the seed make it easier to mimic the results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to being able to control the randomness in the optimizer from the scheduler. The optimizer prints the seed in its logs and if we decide to run the optimizer independently we can by using the seed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, recovering the seed from the logs works too.
(fn [pool-name->optimizer-schedule-job-ids] | ||
(->> (pool-name->optimizer-schedule-job-ids pool-name) | ||
(deliver optimizer-schedule-job-ids-promise)) | ||
;; TODO should we be clearing out optimizer data in one use? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same note as other TODOs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This stays, we can discuss during the review if we agree with the current strategy or we should experiment and evaluate other strategies.
ea5a5d1
to
76f7df2
Compare
(filter-based-on-quota user->quota user->usage) | ||
(filter (fn [job] (util/job-allowed-to-start? db job))) | ||
(take num-considerable))) | ||
(let [optimizer-schedule-job-ids-promise (promise)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just talked and agreed that optimizer-schedule-job-ids-promise
should be an atom rather than a promise. The atom won't cause an error if it gets set twice in the case that the following swap!
does a retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, should this logic be dependent on whether or not the optimizer is enabled? Do we really want to create the atom, do the swap, etc if there's no optimizer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the promise to an atom.
I also wrapped the logic to reset the atom to include a when
clause which is triggered only when the content of pool-name->optimizer-suggested-job-ids-atom
is non-empty.
(swap! pool-name->optimizer-suggested-job-ids-atom | ||
(fn [pool-name->optimizer-schedule-job-ids] | ||
(->> (pool-name->optimizer-schedule-job-ids pool-name) | ||
(deliver optimizer-schedule-job-ids-promise)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will now be (reset! optimizer-schedule-job-ids-promise)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
;; TODO determine whether we should be clearing out optimizer data after one use? | ||
(assoc pool-name->optimizer-schedule-job-ids pool-name []))) | ||
(->> (if-let [optimizer-schedule-job-ids (seq @optimizer-schedule-job-ids-promise)] | ||
(let [optimizer-schedule-job-ids-set (into #{} optimizer-schedule-job-ids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: (set xs)
vs (into #{} xs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
(->> (pool-name->optimizer-schedule-job-ids pool-name) | ||
(deliver optimizer-schedule-job-ids-promise)) | ||
;; TODO determine whether we should be clearing out optimizer data after one use? | ||
(assoc pool-name->optimizer-schedule-job-ids pool-name []))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably keep the data around for more than one cycle, but not indefinitely.
Since Fenzo isn't guaranteed to see enough resources to match all of the optimizer-recommended jobs, it seems like a terrible waste to pass them to Fenzo just once and then throw them away.
But conversely, the optimizer runs on a different order of magnitude of frequency from the fenzo-matching loop, so we really don't want a bunch of "old news" from the optimizer to stick around until it spits out a new recommended schedule.
We'll need to find a way to keep them around for a while, and then "expire" them or something. Maybe we can put a timestamp in the scheduler output, and ignore it if it's too old?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also the issue that we need to remove entries from the optimizer schedule once they get matched by Fenzo, otherwise they continue to take valuable slots from the num-considerable
limit even after they're running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise they continue to take valuable slots from the num-considerable limit even after they're running
The util/job-allowed-to-start?
takes care of that scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like expiring the contents as they age, We should discuss further the strategy we like before implementing one.
76f7df2
to
a853201
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you've added the comment about the expected run-time model with Math/abs
, then this looks fine to me. But I think you mentioned wanting to get at least one other review, and I think that's a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add integration tests for this?
:state "waiting" | ||
:submit_time (-> job :job/submit-time .getTime) | ||
:user (:job/user job) | ||
:uuid (:job/uuid job)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these keys named with underscores instead of dashes?
@@ -560,15 +592,16 @@ | |||
keywordize-keys | |||
;; This is needed because we want the roles to be strings | |||
(transform [ALL :resources MAP-VALS MAP-KEYS] name)) | |||
_ (println "config file:" config-file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use log
instead of println
?
oversubscribe-factor 1.75 | ||
duration-ms (* 1000 60 60 5) | ||
resources (* oversubscribe-factor num-hosts num-cpus-per-host num-mem-per-host duration-ms) | ||
_ (println "resources:" resources) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use log
instead of println
?
group-size-weights (map-from-keys | ||
(fn [k] (-> max-group-size (Math/pow 2) (/ k) (int))) | ||
(map inc (range max-group-size))) | ||
_ (println "group-size-weights:" group-size-weights) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use log
instead of println
?
|
||
(comment create-jobs-and-hosts | ||
(create-hosts) | ||
(create-jobs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we delete the comment
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the comment in there as I do not want the functions to be marked as unused by the IDE (Curisve).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should definitely add a meta-comment to explain this comment.
…mizer-suggested-job-ids-atom
- uses threading macro for entire expression in new-time-task-id-pairs - orders requires - uses defschema to clarify intent - improves documentation
trigger the optimizer-schedule-job-ids-atom atom population only when we have contents from the optimizer.
a853201
to
9391cb0
Compare
Changes proposed in this PR
Why are we making these changes?
Adds support for the group (batch) scheduling optimizer server. The server can act as a 'slow' brain offering hints to the ranker on how to rank pending jobs.