[WIP] Optimizer server integration #992

shamsimam · 2018-10-12T16:56:46Z

Changes proposed in this PR

- integrate optimizer output into the ranker
- update the simulator to retrieve results from the optimizer server
- include example code for the optimizer server and configuration

Why are we making these changes?

Adds support for the group (batch) scheduling optimizer server. The server can act as a 'slow' brain offering hints to the ranker on how to rank pending jobs.

DaoWen

I haven't made it all the way through yet, but here are my comments so far.

DaoWen · 2018-10-25T04:01:08Z

scheduler/src/cook/mesos.clj

@@ -178,6 +178,7 @@
        datomic-report-chan (async/chan (async/sliding-buffer 4096))
        mesos-heartbeat-chan (async/chan (async/buffer 4096))
        current-driver (atom nil)
+        pool-name->optimizer-schedule-job-ids-atom (atom {})


This name is very confusing. We should try to come up with something simpler.
It looks like it's a mapping from pool names onto atoms.
Are the mapped-to elements schedules, or just a collection of job ids, or something else?

Renamed pool-name->optimizer-schedule-job-ids-atom to pool-name->optimizer-suggested-job-ids-atom

DaoWen · 2018-10-25T04:05:12Z

scheduler/src/cook/mesos.clj

+                                          (get @pool-name->pending-jobs-atom pool-name))
+                                        (fn pool-name->running [pool-name]
+                                          (->> (util/get-running-task-ents (d/db mesos-datomic-conn))
+                                               (filter #(= pool-name (-> % :job/_instance util/job->pool-name)))))


Would it make sense to move this filtering logic into the query? E.g., we could add an optional :pool-name argument to the get-running-task-ents function, which then gets optionally added to the query.

If we decide to update get-running-task-ents, that should probably go into its own PR, and we could rebase this on top.

We had this discussion before about using pool name in the query. However, the decision then was to filter outside the query. Here is another snippet that does this: cook.mesos.scheduler/generate-user-usage-map.

Why did we favor filtering outside the query?

I spoke with @pschorf and @dposada and they are okay with moving the pool inside the query. I created #1002

Works for me
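
The optional-argument idea discussed in this thread could look roughly like the sketch below. It filters an in-memory collection of task maps instead of running a Datomic query, and the `:status` and `:pool-name` keys on each task are assumptions made for illustration; the real get-running-task-ents takes a database value.

```clojure
;; Hypothetical stand-in for util/get-running-task-ents: works on plain maps,
;; not on Datomic entities.
(defn get-running-task-ents
  "Returns running tasks, optionally restricted to pool-name."
  ([tasks]
   (filter #(= :running (:status %)) tasks))
  ([tasks pool-name]
   (filter #(= pool-name (:pool-name %))
           (get-running-task-ents tasks))))

(def tasks
  [{:id 1 :status :running :pool-name "default"}
   {:id 2 :status :running :pool-name "gpu"}
   {:id 3 :status :waiting :pool-name "default"}])

(map :id (get-running-task-ents tasks "default")) ;=> (1)
```

With the pool as an argument, callers no longer need their own filter step, which is the duplication the comment above points at.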

DaoWen · 2018-10-25T04:10:00Z

scheduler/src/cook/mesos/mesos_mock.clj

-                                        [(-> (t/now) 
-                                             (t/plus (t/millis (task->runtime-ms task)))) 
+                                        [(t/plus (t/now)
+                                                 (-> task task->runtime-ms (* runtime-multiplier) t/millis))


Can we move (t/plus (t/now)) to the end of the threading macro instead of having it outside?
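
A minimal sketch of the single-pipeline form being requested. To stay self-contained it uses java.time instead of clj-time, and task->runtime-ms is a hypothetical stand-in for the mock's function. With clj-time the same shape would presumably read `(->> task task->runtime-ms (* runtime-multiplier) t/millis (t/plus (t/now)))`.

```clojure
(import '(java.time Duration Instant))

;; Hypothetical stand-in for the mock's task->runtime-ms.
(defn task->runtime-ms [task] (:runtime-ms task))

(defn task-end-time
  "Computes now + (runtime-ms * runtime-multiplier) in one ->> pipeline."
  [^Instant now runtime-multiplier task]
  (->> task
       task->runtime-ms
       (* runtime-multiplier)
       long
       Duration/ofMillis
       (.plus now)))

(.toEpochMilli (task-end-time (Instant/ofEpochMilli 0) 2 {:runtime-ms 1500}))
;=> 3000
```

Because `->>` threads into the last position, the "add to now" step can sit at the end of the pipeline rather than wrapping it.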

DaoWen · 2018-10-25T04:10:38Z

scheduler/src/cook/mesos/optimizer.clj

            [clojure.tools.logging :as log]
-            [cook.util :refer [lazy-load-var PosNum PosInt NonNegInt]]
+            [cook.util :refer [lazy-load-var NonNegInt PosNum PosInt]]


🙄

You didn't move lazy-load-var to the end of the list???
(sort ["lazy" "Non"]) → ("Non" "lazy")

And you didn't move cook.util below cook.mesos.util??????

DaoWen · 2018-10-25T04:18:10Z

scheduler/src/cook/mesos/optimizer.clj

+   {\"count\" 4
+    \"cpus\" 8
+    \"instance-type\" \"basic\"
+    \"mem\" 240000}"


This example uses string keys, but the actual definition uses keyword keys.
Using keyword keys in the example would let us get rid of all those nasty backslashes.
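
To illustrate the difference, here is a small sketch with two docstring-style strings (the var names are made up for this example). Only the string-keyed version needs escaped quotes around its keys.

```clojure
(require '[clojure.edn :as edn])

;; String keys must be escaped when the example lives inside a string literal.
(def string-key-example "{\"count\" 4, \"cpus\" 8, \"mem\" 240000}")

;; Keyword keys need no escaping.
(def keyword-key-example "{:count 4, :cpus 8, :mem 240000}")

;; Reading the keyword version back yields a map with keyword keys.
(edn/read-string keyword-key-example) ;=> {:count 4, :cpus 8, :mem 240000}
```

String values such as "basic" would still need escaping inside a docstring, so keyword keys remove most of the backslashes rather than all of them.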

DaoWen · 2018-10-25T04:42:28Z

scheduler/src/cook/mesos/optimizer.clj

+                                  (merge-with + user->mem-usage))))
+                  ;; Running must be before waiting here because optimizer determines batch order from job order
+                  opt-jobs (concat running-opt-jobs waiting-opt-jobs)
+                  ;; TODO mem-share should be computed per user.


Same note as other TODOs.

Done, created #485

DaoWen · 2018-10-25T04:44:02Z

scheduler/src/cook/mesos/optimizer.clj

+                             (->> (pc/map-vals (partial * alpha) ema-usage)
+                                  (merge-with + user->mem-usage))))
+                  ;; Running must be before waiting here because optimizer determines batch order from job order
+                  opt-jobs (concat running-opt-jobs waiting-opt-jobs)


Does this also imply that the intra-sequence order in running-opt-jobs and waiting-opt-jobs affects the results? Are those sorted in some meaningful way?

Yes, their order affects the results. Waiting jobs are sorted by the ranker. Running jobs are not sorted, as we expect all running batches to fit within the resource constraints and to be considered by the optimizer.
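
A tiny sketch of the ordering being described, using made-up job maps (only `:uuid` and `:state` are shown; the real optimizer jobs carry more fields):

```clojure
;; Hypothetical data: running jobs in arbitrary order, waiting jobs in ranker order.
(def running-opt-jobs [{:uuid "r1" :state "running"} {:uuid "r2" :state "running"}])
(def waiting-opt-jobs [{:uuid "w1" :state "waiting"} {:uuid "w2" :state "waiting"}])

;; concat preserves the order within each sequence and puts running jobs first.
(def opt-jobs (concat running-opt-jobs waiting-opt-jobs))

(map :uuid opt-jobs) ;=> ("r1" "r2" "w1" "w2")
```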

DaoWen · 2018-10-25T04:45:24Z

scheduler/src/cook/mesos/optimizer.clj

+  (let [host-group 0
+        host-name->host-group (constantly host-group)
+        host-infos [host-info]
+        user->ema-mem-usage (atom {})]


I think we need a comment somewhere saying what "ema" stands for. Maybe here, maybe elsewhere.
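
Assuming "ema" stands for exponential moving average (the thread does not confirm this), the update shown in the earlier hunk scales the previous per-user value by alpha and adds the current usage. A dependency-free sketch of that step, with a hypothetical function name:

```clojure
(defn update-ema-usage
  "Scales each user's previous value by alpha, then adds the current usage.
   Mirrors the (merge-with + ...) step in the hunk above without plumbing."
  [alpha user->ema-usage user->mem-usage]
  (let [scaled (into {}
                     (map (fn [[user usage]] [user (* alpha usage)]))
                     user->ema-usage)]
    (merge-with + user->mem-usage scaled)))

(update-ema-usage 0.5 {"alice" 100.0} {"alice" 20.0 "bob" 10.0})
;=> {"alice" 70.0, "bob" 10.0}
```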

DaoWen · 2018-10-25T04:48:57Z

scheduler/src/cook/mesos/optimizer.clj

+                                    "now" (.getTime now)
+                                    "opt_jobs" opt-jobs
+                                    "opt_params" opt-params
+                                    "seed" (.getTime now)


Why seed with the current time?
Why do we even need a seed?

It allows us to mimic results if the optimizer decides to use a random number generator.

Wouldn't using a constant value as the seed make it easier to mimic the results?

I was referring to being able to control the randomness in the optimizer from the scheduler. The optimizer prints the seed in its logs, so if we decide to run the optimizer independently we can reproduce a run by reusing that seed.

OK, recovering the seed from the logs works too.
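
The reproducibility argument can be sketched with java.util.Random. The function below is illustrative only and is not part of the optimizer; it shows that replaying a logged seed yields the same pseudo-random sequence.

```clojure
(import 'java.util.Random)

(defn sample-with-seed
  "Returns n pseudo-random ints in [0, 100) drawn from a generator seeded with seed."
  [seed n]
  (let [rng (Random. (long seed))]
    (vec (repeatedly n #(.nextInt rng 100)))))

;; A time-based seed still allows replay, as long as the seed is logged.
(def logged-seed (System/currentTimeMillis))

(= (sample-with-seed logged-seed 5)
   (sample-with-seed logged-seed 5)) ;=> true
```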

DaoWen · 2018-10-25T04:52:21Z

scheduler/src/cook/mesos/scheduler.clj

+           (fn [pool-name->optimizer-schedule-job-ids]
+             (->> (pool-name->optimizer-schedule-job-ids pool-name)
+                  (deliver optimizer-schedule-job-ids-promise))
+             ;; TODO should we be clearing out optimizer data in one use?


Same note as other TODOs.

This stays, we can discuss during the review if we agree with the current strategy or we should experiment and evaluate other strategies.

DaoWen · 2018-10-29T20:21:29Z

scheduler/src/cook/mesos/scheduler.clj

-       (filter-based-on-quota user->quota user->usage)
-       (filter (fn [job] (util/job-allowed-to-start? db job)))
-       (take num-considerable)))
+  (let [optimizer-schedule-job-ids-promise (promise)]


We just talked and agreed that optimizer-schedule-job-ids-promise should be an atom rather than a promise. The atom won't cause an error if it gets set twice in the case that the following swap! does a retry.

Also, should this logic be dependent on whether or not the optimizer is enabled? Do we really want to create the atom, do the swap, etc if there's no optimizer?

I changed the promise to an atom.

I also wrapped the logic that resets the atom in a when clause, so it runs only when the content of pool-name->optimizer-suggested-job-ids-atom is non-empty.
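
A minimal sketch of the pattern agreed on here, under the assumption that the shared state is a map from pool name to a vector of job ids. The function and local names are hypothetical. Using reset! on a local atom inside the swap! function is safe under retries, since a retry simply overwrites the same value.

```clojure
(defn take-suggested-job-ids!
  "Returns the suggested job ids for pool-name and clears that pool's entry.
   Skips the swap! entirely when the optimizer has produced nothing."
  [pool-name->suggested-job-ids-atom pool-name]
  (let [job-ids-atom (atom [])]
    (when (seq @pool-name->suggested-job-ids-atom)
      (swap! pool-name->suggested-job-ids-atom
             (fn [pool-name->job-ids]
               ;; May run more than once if swap! retries; reset! tolerates that.
               (reset! job-ids-atom (get pool-name->job-ids pool-name []))
               (assoc pool-name->job-ids pool-name []))))
    @job-ids-atom))

(def suggestions (atom {"default" [1 2 3]}))
(take-suggested-job-ids! suggestions "default") ;=> [1 2 3]
@suggestions                                    ;=> {"default" []}
```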

DaoWen · 2018-10-29T20:22:12Z

scheduler/src/cook/mesos/scheduler.clj

+    (swap! pool-name->optimizer-suggested-job-ids-atom
+           (fn [pool-name->optimizer-schedule-job-ids]
+             (->> (pool-name->optimizer-schedule-job-ids pool-name)
+                  (deliver optimizer-schedule-job-ids-promise))


This will now be (reset! optimizer-schedule-job-ids-promise).

DaoWen · 2018-10-29T21:27:10Z

scheduler/src/cook/mesos/scheduler.clj

+             ;; TODO determine whether we should be clearing out optimizer data after one use?
+             (assoc pool-name->optimizer-schedule-job-ids pool-name [])))
+    (->> (if-let [optimizer-schedule-job-ids (seq @optimizer-schedule-job-ids-promise)]
+           (let [optimizer-schedule-job-ids-set (into #{} optimizer-schedule-job-ids)


Nitpick: (set xs) vs (into #{} xs)

DaoWen · 2018-10-29T21:37:12Z

scheduler/src/cook/mesos/scheduler.clj

+             (->> (pool-name->optimizer-schedule-job-ids pool-name)
+                  (deliver optimizer-schedule-job-ids-promise))
+             ;; TODO determine whether we should be clearing out optimizer data after one use?
+             (assoc pool-name->optimizer-schedule-job-ids pool-name [])))


We should probably keep the data around for more than one cycle, but not indefinitely.

Since Fenzo isn't guaranteed to see enough resources to match all of the optimizer-recommended jobs, it seems like a terrible waste to pass them to Fenzo just once and then throw them away.

But conversely, the optimizer runs on a different order of magnitude of frequency from the fenzo-matching loop, so we really don't want a bunch of "old news" from the optimizer to stick around until it spits out a new recommended schedule.

We'll need to find a way to keep them around for a while, and then "expire" them or something. Maybe we can put a timestamp in the scheduler output, and ignore it if it's too old?

There's also the issue that we need to remove entries from the optimizer schedule once they get matched by Fenzo, otherwise they continue to take valuable slots from the num-considerable limit even after they're running.

> otherwise they continue to take valuable slots from the num-considerable limit even after they're running

The util/job-allowed-to-start? takes care of that scenario.

I like expiring the contents as they age. We should discuss which strategy we prefer before implementing one.
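
One possible shape for the timestamp-based expiry floated above, as a sketch only. The entry layout (`:job-ids` plus `:timestamp-ms`) and the function name are assumptions, and the thread leaves the actual strategy open.

```clojure
(defn fresh-suggested-job-ids
  "Returns the entry's job ids if the entry is at most max-age-ms old,
   otherwise an empty vector."
  [{:keys [job-ids timestamp-ms]} now-ms max-age-ms]
  (if (and timestamp-ms
           (<= (- now-ms timestamp-ms) max-age-ms))
    (vec job-ids)
    []))

(def entry {:job-ids [11 12] :timestamp-ms 1000})

(fresh-suggested-job-ids entry 1500 1000) ;=> [11 12]
(fresh-suggested-job-ids entry 5000 1000) ;=> []
```

Under this scheme the suggestions survive several matching cycles but stop being used once they are older than the chosen maximum age.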

DaoWen

Once you've added the comment about the expected run-time model with Math/abs, then this looks fine to me. But I think you mentioned wanting to get at least one other review, and I think that's a good idea.

dposada

Can we add integration tests for this?

dposada · 2018-10-30T21:00:28Z

scheduler/src/cook/mesos/optimizer.clj

+            :state "waiting"
+            :submit_time (-> job :job/submit-time .getTime)
+            :user (:job/user job)
+            :uuid (:job/uuid job)}


Why are these keys named with underscores instead of dashes?
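
The thread does not record an answer. If the underscores exist only to match field names expected by an external consumer (an assumption), the conversion could be kept at the boundary so that internal maps stay dash-separated. A sketch with a hypothetical helper:

```clojure
(require '[clojure.string :as str])

(defn underscore-keys
  "Returns m with dashes in its keyword keys replaced by underscores."
  [m]
  (into {}
        (map (fn [[k v]] [(keyword (str/replace (name k) "-" "_")) v]))
        m))

(underscore-keys {:submit-time 1540933228000 :user "alice"})
;=> {:submit_time 1540933228000, :user "alice"}
```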

dposada · 2018-10-30T21:29:30Z

scheduler/test/cook/test/simulator.clj

@@ -560,15 +592,16 @@
                     keywordize-keys
                     ;; This is needed because we want the roles to be strings
                     (transform [ALL :resources MAP-VALS MAP-KEYS] name))
+          _ (println "config file:" config-file)


Should this use log instead of println?

dposada · 2018-10-30T21:30:35Z

scheduler/test/cook/test/simulator.clj

+        oversubscribe-factor 1.75
+        duration-ms (* 1000 60 60 5)
+        resources (* oversubscribe-factor num-hosts num-cpus-per-host num-mem-per-host duration-ms)
+        _ (println "resources:" resources)


Should this use log instead of println?

dposada · 2018-10-30T21:30:46Z

scheduler/test/cook/test/simulator.clj

+        group-size-weights (map-from-keys
+                             (fn [k] (-> max-group-size (Math/pow 2) (/ k) (int)))
+                             (map inc (range max-group-size)))
+        _ (println "group-size-weights:" group-size-weights)


Should this use log instead of println?

dposada · 2018-10-30T21:31:25Z

scheduler/test/cook/test/simulator.clj

+
+(comment create-jobs-and-hosts
+  (create-hosts)
+  (create-jobs))


Can we delete the comment?

I have the comment in there because I do not want the functions to be marked as unused by the IDE (Cursive).

You should definitely add a meta-comment to explain this comment.
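
For instance, the explanation could sit directly above the form. The wording below is only a suggestion:

```clojure
;; NOTE: this (comment ...) form is intentional. It references create-hosts and
;; create-jobs so that the IDE (Cursive) does not flag them as unused.
;; Nothing inside it is evaluated.
(comment create-jobs-and-hosts
  (create-hosts)
  (create-jobs))
```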

shamsimam added the wip label Oct 12, 2018

shamsimam self-assigned this Oct 12, 2018

shamsimam force-pushed the optimizer-integration branch 2 times, most recently from 97c29ed to 773c9ea Compare October 18, 2018 18:05

shamsimam force-pushed the optimizer-integration branch from 773c9ea to 432ffa0 Compare October 24, 2018 05:27

shamsimam requested a review from dposada October 24, 2018 15:53

shamsimam force-pushed the optimizer-integration branch from 432ffa0 to b2e2b2e Compare October 24, 2018 16:26

shamsimam requested review from DaoWen and removed request for dposada October 24, 2018 20:05

shamsimam force-pushed the optimizer-integration branch from 2625cdc to c643468 Compare October 25, 2018 01:06

DaoWen reviewed Oct 25, 2018

shamsimam force-pushed the optimizer-integration branch 6 times, most recently from ea5a5d1 to 76f7df2 Compare October 26, 2018 18:25

shamsimam added 3 commits October 29, 2018 16:39

integrates the optimizer with the mesos scheduler

9b22d08

adds support for optimizer server

integrates optimizer output to produce considerable jobs for matching

050d48a

uses order of jobs returned by the optimizer

6a475af

DaoWen suggested changes Oct 29, 2018

shamsimam added 3 commits October 29, 2018 16:57

adds support for optimizer config in simulator

a9d116e

adds configuration files and server for optimizer simulation

f6fcb7f

adds support for generating hosts and jobs for the simulator

4150c7b

shamsimam force-pushed the optimizer-integration branch from 76f7df2 to a853201 Compare October 29, 2018 22:23

DaoWen approved these changes Oct 30, 2018

shamsimam requested a review from dposada October 30, 2018 20:39

dposada reviewed Oct 30, 2018

shamsimam added 2 commits October 30, 2018 19:15

fixes logic in determining running tasks by pool

8130587

renames pool-name->optimizer-schedule-job-ids-atom to pool-name->optimizer-suggested-job-ids-atom

9d0100e

shamsimam added 6 commits October 30, 2018 19:15

addresses feedback from DaoWen

701db1c

- uses threading macro for entire expression in new-time-task-id-pairs - orders requires - uses defschema to clarify intent - improves documentation

uses promise instead of atom to avoid issues with retries in swap!

aa0f0dd

trigger the optimizer-schedule-job-ids-atom atom population only when we have contents from the optimizer.

renames HostInfo to HostGroupInfo

9fba52a

updates docstring

839a262

adds destructuring to jobs and instances

d970cfd

adds new lines between testing clauses

9391cb0

shamsimam force-pushed the optimizer-integration branch from a853201 to 9391cb0 Compare November 1, 2018 18:19

pschorf closed this Mar 7, 2019

shamsimam deleted the optimizer-integration branch March 7, 2019 18:48
