(Deprecated) Development Notes

Robert Bartel edited this page Oct 11, 2023 · 1 revision

June 9, 2020

A discussion with the development team revolved around scheduling. Currently, scheduling is a single complex service that manages resources, resource allocation, job state transitions, and job launching.

Resources are managed via the ResourceManager abstract interface.

Jobs are managed via the JobManager interface.

The SchedulerService is responsible for listening for request messages as well as job state update messages. It also has an added async_task for manage_job_processing, attached from the service's constructed JobManager.
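The arrangement described above (a message listener plus a background manage_job_processing task) can be sketched with asyncio. This is only an illustration under stated assumptions: SketchSchedulerService, its in-memory queue, the None shutdown sentinel, and the passes counter are all hypothetical stand-ins, not the real SchedulerService or its transport.

```python
import asyncio


class SketchSchedulerService:
    """Hypothetical stand-in for the service; not the real SchedulerService."""

    def __init__(self):
        # Assumption: an in-memory queue stands in for the real message transport.
        self.requests = asyncio.Queue()
        self.processed = []
        self.passes = 0

    async def listen(self):
        # Handle request and job state update messages until a None sentinel arrives.
        while True:
            message = await self.requests.get()
            if message is None:
                break
            self.processed.append(message)
            # Yield control so the background task gets a chance to run.
            await asyncio.sleep(0)

    async def manage_job_processing(self):
        # Placeholder for the periodic pass over known jobs.
        while True:
            self.passes += 1
            await asyncio.sleep(0.01)

    async def run(self):
        # Attach the job processing loop as an added task next to the listener.
        background = asyncio.create_task(self.manage_job_processing())
        try:
            await self.listen()
        finally:
            background.cancel()
            await asyncio.gather(background, return_exceptions=True)


async def demo():
    service = SketchSchedulerService()
    for message in ("job-request", "job-state-update"):
        service.requests.put_nowait(message)
    service.requests.put_nowait(None)  # hypothetical shutdown signal
    await service.run()
    return service
```

Calling `asyncio.run(demo())` runs both coroutines on one event loop. That single-loop coupling is part of what prompts the scalability question raised in these notes.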

A valid SchedulerRequestMessage creates a RequestedJob object. At some point the manage_job_processing async task will execute, and the new job will be recognized as ready to allocate. manage_job_processing implements the semantics of a simple priority queue with starvation prevention using dynamic priority migration. Once an active job is eligible for allocation, the ResourceManager is asked to allocate the virtual resources. Once the resources are successfully allocated, the JobManager's request_scheduling is responsible for launching the job and, if the launch succeeds, marks the job state as SCHEDULED and saves the job to the backing store.
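As a rough illustration of "a simple priority queue with starvation prevention using dynamic priority migration", the sketch below promotes a waiting job one priority level after it has been passed over a fixed number of times. Everything here (Job, AgingQueue, promote_after, the three priority levels) is a hypothetical stand-in chosen for the example; the actual manage_job_processing logic may differ.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical priority levels; a lower number means higher priority.
HIGH, MEDIUM, LOW = 0, 1, 2


@dataclass
class Job:
    job_id: str
    priority: int
    waited_rounds: int = 0  # selections missed at the current priority


class AgingQueue:
    """Priority queue where waiting jobs migrate toward higher priority."""

    def __init__(self, promote_after: int = 2):
        self.promote_after = promote_after
        self.jobs: List[Job] = []

    def add(self, job: Job) -> None:
        self.jobs.append(job)

    def next_job(self) -> Optional[Job]:
        if not self.jobs:
            return None
        # Pick the highest priority job; ties go to the earliest submitted.
        index = min(range(len(self.jobs)), key=lambda i: self.jobs[i].priority)
        chosen = self.jobs.pop(index)
        # Age every job left behind and promote those that waited long enough.
        for job in self.jobs:
            job.waited_rounds += 1
            if job.waited_rounds >= self.promote_after and job.priority > HIGH:
                job.priority -= 1
                job.waited_rounds = 0
        return chosen


def drain(queue: AgingQueue) -> List[str]:
    """Return job ids in the order the queue would hand them out."""
    order = []
    job = queue.next_job()
    while job is not None:
        order.append(job.job_id)
        job = queue.next_job()
    return order
```

With promote_after=2, a LOW job submitted ahead of five HIGH jobs is promoted twice while it waits and is selected before the last HIGH job, instead of waiting behind every one of them.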

This service definition raises a question of scalability, as well as questions about separation of concerns. Advanced job scheduling semantics were also discussed. In particular, if we want to include the AsyncIO implementation of the Advanced Python Scheduler (APScheduler), what would that look like, and how would we maintain consistent job states in the backing store, as well as any potential communication between the job manager and the scheduler?