This repository has been archived by the owner on Apr 24, 2023. It is now read-only.

[WIP] Optimizer server integration #992

Closed
wants to merge 14 commits into from

@shamsimam shamsimam commented Oct 12, 2018

Changes proposed in this PR

  • integrate optimizer output into the ranker
  • update the simulator to retrieve results from the optimizer server
  • include example code for the optimizer server and configuration

Why are we making these changes?

Adds support for the group (batch) scheduling optimizer server. The server can act as a 'slow' brain offering hints to the ranker on how to rank pending jobs.

@shamsimam shamsimam added the wip label Oct 12, 2018
@shamsimam shamsimam self-assigned this Oct 12, 2018
@shamsimam shamsimam force-pushed the optimizer-integration branch 2 times, most recently from 97c29ed to 773c9ea Compare October 18, 2018 18:05
@shamsimam shamsimam requested review from DaoWen and removed request for dposada October 24, 2018 20:05
@DaoWen DaoWen left a comment

I haven't made it all the way through yet, but here are my comments so far.

@@ -178,6 +178,7 @@
datomic-report-chan (async/chan (async/sliding-buffer 4096))
mesos-heartbeat-chan (async/chan (async/buffer 4096))
current-driver (atom nil)
pool-name->optimizer-schedule-job-ids-atom (atom {})
Contributor

This name is very confusing. We should try to come up with something simpler.
It looks like it's a mapping from pool names onto atoms.
Are the mapped-to elements schedules, or just a collection of job ids, or something else?

Contributor Author

Renamed pool-name->optimizer-schedule-job-ids-atom to pool-name->optimizer-suggested-job-ids-atom

(get @pool-name->pending-jobs-atom pool-name))
(fn pool-name->running [pool-name]
(->> (util/get-running-task-ents (d/db mesos-datomic-conn))
(filter #(= pool-name (-> % :job/_instance util/job->pool-name)))))
Contributor

Would it make sense to move this filtering logic into the query? E.g., we could add an optional :pool-name argument to the get-running-task-ents function, which then gets optionally added to the query.

If we decide to update get-running-task-ents, that should probably go into its own PR, and we could rebase this on top.
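
A minimal sketch of the optional-argument shape suggested above, using plain maps in place of Datomic entities. The `get-running-tasks` name and the `:task/status` and `:task/pool` keys are hypothetical; the real get-running-task-ents runs a Datomic query, where the pool restriction would become an extra clause in the query itself.

```clojure
;; Hypothetical stand-in for util/get-running-task-ents: tasks are plain maps
;; here, not Datomic entities.
(defn get-running-tasks
  "Returns the running tasks, restricted to pool-name when one is given."
  ([tasks] (get-running-tasks tasks nil))
  ([tasks pool-name]
   (cond->> (filter #(= :running (:task/status %)) tasks)
     ;; The pool filter is only applied when the caller passes a pool name.
     pool-name (filter #(= pool-name (:task/pool %))))))
```

Callers that do not care about pools keep using the one-argument arity, so existing call sites would not need to change.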

Contributor Author

We had this discussion before about using pool name in the query. However, the decision then was to filter outside the query. Here is another snippet that does this: cook.mesos.scheduler/generate-user-usage-map.

Contributor

Why did we favor filtering outside the query?

Contributor Author

I spoke with @pschorf and @dposada and they are okay with moving the pool inside the query. I created #1002

Contributor

Works for me

[(-> (t/now)
(t/plus (t/millis (task->runtime-ms task))))
[(t/plus (t/now)
(-> task task->runtime-ms (* runtime-multiplier) t/millis))
Contributor

Can we move (t/plus (t/now)) to the end of the threading macro instead of having it outside?

Contributor Author

Done.
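
A sketch of what the fully threaded form could look like. The PR code uses clj-time (`t/now`, `t/plus`, `t/millis`); `java.time` is swapped in here only so the example runs without extra dependencies, and `expected-end-time` is a hypothetical name.

```clojure
(import '(java.time Duration Instant))

(defn expected-end-time
  "Returns now plus the task runtime scaled by runtime-multiplier."
  [^Instant now runtime-ms runtime-multiplier]
  (-> runtime-ms
      (* runtime-multiplier)
      long
      Duration/ofMillis
      ;; Switch to thread-last so the duration becomes the final argument.
      (->> (.plus now))))
```

The nested `->>` at the end is what lets the "add to now" step join the pipeline instead of wrapping it from the outside.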

[clojure.tools.logging :as log]
[cook.util :refer [lazy-load-var PosNum PosInt NonNegInt]]
[cook.util :refer [lazy-load-var NonNegInt PosNum PosInt]]
Contributor

🙄

You didn't move lazy-load-var to the end of the list???
(sort ["lazy" "Non"]) ;=> ("Non" "lazy")

And you didn't move cook.util below cook.mesos.util??????

Contributor Author

Done ;)

{\"count\" 4
\"cpus\" 8
\"instance-type\" \"basic\"
\"mem\" 240000}"
Contributor

This example uses string keys, but the actual definition uses keyword keys.
Using keyword keys in the example would let us get rid of all those nasty backslashes.

Contributor Author

Done.

(merge-with + user->mem-usage))))
;; Running must be before waiting here because optimizer determines batch order from job order
opt-jobs (concat running-opt-jobs waiting-opt-jobs)
;; TODO mem-share should be computed per user.
Contributor

Same note as other TODOs.

Contributor Author

Done, created #485

(->> (pc/map-vals (partial * alpha) ema-usage)
(merge-with + user->mem-usage))))
;; Running must be before waiting here because optimizer determines batch order from job order
opt-jobs (concat running-opt-jobs waiting-opt-jobs)
Contributor

Does this also imply that the intra-sequence order in running-opt-jobs and waiting-opt-jobs affects the results? Are those sorted in some meaningful way?

Contributor Author

Yes, their orders affect the results. Waiting jobs are sorted by the ranker. Running jobs are not sorted as we expect all running batches to fit in resource constraints and be looked into by the optimizer.

Contributor

ok

(let [host-group 0
host-name->host-group (constantly host-group)
host-infos [host-info]
user->ema-mem-usage (atom {})]
Contributor

I think we need a comment somewhere saying what "ema" stands for. Maybe here, maybe elsewhere.

Contributor Author

Done.

"now" (.getTime now)
"opt_jobs" opt-jobs
"opt_params" opt-params
"seed" (.getTime now)
Contributor

Why seed with the current time?
Why do we even need a seed?

Contributor Author

It allows us to mimic results if the optimizer decides to use a random number generator.

Contributor

Wouldn't using a constant value as the seed make it easier to mimic the results?

Contributor Author

I was referring to being able to control the randomness in the optimizer from the scheduler. The optimizer prints the seed in its logs and if we decide to run the optimizer independently we can by using the seed.

Contributor

OK, recovering the seed from the logs works too.
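
To illustrate the point of this thread: if the caller supplies the seed and the seed is logged, a randomized step can be replayed later with identical results. The sketch below is an assumption about how that works in general, not the optimizer's code; `shuffle-with-seed` is a hypothetical stand-in for any randomized step.

```clojure
(import '(java.util ArrayList Collections Random))

(defn shuffle-with-seed
  "Shuffles coll deterministically: the same seed always yields the same order."
  [seed coll]
  (let [items (ArrayList. ^java.util.Collection coll)]
    ;; All randomness comes from a generator built from the caller's seed.
    (Collections/shuffle items (Random. seed))
    (vec items)))

;; The scheduler would pick a fresh seed per request (e.g. the current time),
;; and the same seed, recovered from the logs, replays the run.
(let [seed (System/currentTimeMillis)]
  (= (shuffle-with-seed seed (range 10))
     (shuffle-with-seed seed (range 10))))
```

A time-based seed varies between requests, while logging it keeps any single request reproducible.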

(fn [pool-name->optimizer-schedule-job-ids]
(->> (pool-name->optimizer-schedule-job-ids pool-name)
(deliver optimizer-schedule-job-ids-promise))
;; TODO should we be clearing out optimizer data in one use?
Contributor

Same note as other TODOs.

Contributor Author

This stays, we can discuss during the review if we agree with the current strategy or we should experiment and evaluate other strategies.

@shamsimam shamsimam force-pushed the optimizer-integration branch 6 times, most recently from ea5a5d1 to 76f7df2 Compare October 26, 2018 18:25
(filter-based-on-quota user->quota user->usage)
(filter (fn [job] (util/job-allowed-to-start? db job)))
(take num-considerable)))
(let [optimizer-schedule-job-ids-promise (promise)]
Contributor

We just talked and agreed that optimizer-schedule-job-ids-promise should be an atom rather than a promise. The atom won't cause an error if it gets set twice in the case that the following swap! does a retry.

Contributor

Also, should this logic be dependent on whether or not the optimizer is enabled? Do we really want to create the atom, do the swap, etc if there's no optimizer?

Contributor Author

I changed the promise to an atom.

I also wrapped the logic to reset the atom to include a when clause which is triggered only when the content of pool-name->optimizer-suggested-job-ids-atom is non-empty.
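
A minimal sketch of the pattern described in this thread, under the assumption that the suggestions live in an atom holding a map of pool name to job ids. The function passed to `swap!` may run more than once under contention, so the side effect inside it must be safe to repeat; `reset!` on a local atom simply keeps the value from the last attempt. The name `take-suggested-job-ids!` is hypothetical.

```clojure
(defn take-suggested-job-ids!
  "Returns the suggested job ids for pool-name (or nil) and clears them."
  [pool-name->suggested-job-ids-atom pool-name]
  (let [taken (atom nil)]
    ;; Skip the swap entirely when the optimizer has supplied nothing.
    (when (seq @pool-name->suggested-job-ids-atom)
      (swap! pool-name->suggested-job-ids-atom
             (fn [pool-name->job-ids]
               ;; Safe on retry: a later attempt just overwrites the value.
               (reset! taken (get pool-name->job-ids pool-name))
               (assoc pool-name->job-ids pool-name []))))
    (seq @taken)))
```

Returning `(seq ...)` lets callers use the result directly in an `if-let`, as the code under review does.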

(swap! pool-name->optimizer-suggested-job-ids-atom
(fn [pool-name->optimizer-schedule-job-ids]
(->> (pool-name->optimizer-schedule-job-ids pool-name)
(deliver optimizer-schedule-job-ids-promise))
Contributor

This will now be (reset! optimizer-schedule-job-ids-promise).

Contributor Author

Done.

;; TODO determine whether we should be clearing out optimizer data after one use?
(assoc pool-name->optimizer-schedule-job-ids pool-name [])))
(->> (if-let [optimizer-schedule-job-ids (seq @optimizer-schedule-job-ids-promise)]
(let [optimizer-schedule-job-ids-set (into #{} optimizer-schedule-job-ids)
Contributor

Nitpick: (set xs) vs (into #{} xs)

Contributor Author

Done.

(->> (pool-name->optimizer-schedule-job-ids pool-name)
(deliver optimizer-schedule-job-ids-promise))
;; TODO determine whether we should be clearing out optimizer data after one use?
(assoc pool-name->optimizer-schedule-job-ids pool-name [])))
Contributor

We should probably keep the data around for more than one cycle, but not indefinitely.

Since Fenzo isn't guaranteed to see enough resources to match all of the optimizer-recommended jobs, it seems like a terrible waste to pass them to Fenzo just once and then throw them away.

But conversely, the optimizer runs on a different order of magnitude of frequency from the fenzo-matching loop, so we really don't want a bunch of "old news" from the optimizer to stick around until it spits out a new recommended schedule.

We'll need to find a way to keep them around for a while, and then "expire" them or something. Maybe we can put a timestamp in the scheduler output, and ignore it if it's too old?
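
One way the timestamp idea could be sketched, assuming each suggestion is stored as a map with the job ids and the time it was produced. The `:job-ids` and `:created-at-ms` keys, the `fresh-suggested-job-ids` name, and the `max-age-ms` parameter are all hypothetical; the thread leaves the strategy open.

```clojure
(defn fresh-suggested-job-ids
  "Returns the suggested job ids, or nil when the suggestion is missing
   or older than max-age-ms."
  [{:keys [job-ids created-at-ms]} now-ms max-age-ms]
  (when (and created-at-ms
             (<= (- now-ms created-at-ms) max-age-ms))
    job-ids))

;; A suggestion produced at t=1000 ms is still usable at t=4000 ms with a
;; 5-second limit, and ignored at t=7000 ms.
(let [suggestion {:job-ids [1 2 3] :created-at-ms 1000}]
  [(fresh-suggested-job-ids suggestion 4000 5000)
   (fresh-suggested-job-ids suggestion 7000 5000)])
```

With this shape the data is never deleted eagerly; stale suggestions are just skipped until the optimizer replaces them.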

Contributor

There's also the issue that we need to remove entries from the optimizer schedule once they get matched by Fenzo, otherwise they continue to take valuable slots from the num-considerable limit even after they're running.

Contributor Author

> otherwise they continue to take valuable slots from the num-considerable limit even after they're running

The util/job-allowed-to-start? check takes care of that scenario.

Contributor Author

I like expiring the contents as they age. We should discuss which strategy we prefer before implementing one.

@DaoWen DaoWen left a comment

Once you've added the comment about the expected run-time model with Math/abs, then this looks fine to me. But I think you mentioned wanting to get at least one other review, and I think that's a good idea.

@dposada dposada left a comment

Can we add integration tests for this?

scheduler/src/cook/mesos/optimizer.clj (outdated, resolved)
:state "waiting"
:submit_time (-> job :job/submit-time .getTime)
:user (:job/user job)
:uuid (:job/uuid job)}
Contributor

Why are these keys named with underscores instead of dashes?

scheduler/test/cook/test/mesos/scheduler.clj (resolved)
@@ -560,15 +592,16 @@
keywordize-keys
;; This is needed because we want the roles to be strings
(transform [ALL :resources MAP-VALS MAP-KEYS] name))
_ (println "config file:" config-file)
Contributor

Should this use log instead of println?

oversubscribe-factor 1.75
duration-ms (* 1000 60 60 5)
resources (* oversubscribe-factor num-hosts num-cpus-per-host num-mem-per-host duration-ms)
_ (println "resources:" resources)
Contributor

Should this use log instead of println?

group-size-weights (map-from-keys
(fn [k] (-> max-group-size (Math/pow 2) (/ k) (int)))
(map inc (range max-group-size)))
_ (println "group-size-weights:" group-size-weights)
Contributor

Should this use log instead of println?


(comment create-jobs-and-hosts
(create-hosts)
(create-jobs))
Contributor

Can we delete the comment?

Contributor Author

I have the comment in there because I do not want the functions to be marked as unused by the IDE (Cursive).

Contributor

You should definitely add a meta-comment to explain this comment.
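
A sketch of what such a meta-comment might look like. The function bodies are placeholders invented for illustration; only the names `create-hosts` and `create-jobs` come from the code under review.

```clojure
;; Placeholder bodies, for illustration only.
(defn create-hosts [] [{:host "host-0"}])
(defn create-jobs [] [{:job "job-0"}])

;; Not dead code: the calls below exist so that the IDE (Cursive) does not
;; flag the REPL-only helpers above as unused. The comment form is never
;; evaluated; it always yields nil.
(comment
  (create-hosts)
  (create-jobs))
```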

- uses threading macro for entire expression in new-time-task-id-pairs
- orders requires
- uses defschema to clarify intent
- improves documentation
trigger population of the optimizer-schedule-job-ids-atom only when we have contents from the optimizer.
@pschorf pschorf closed this Mar 7, 2019
@shamsimam shamsimam deleted the optimizer-integration branch March 7, 2019 18:48
4 participants