Student: Jianbo Ma(majb2114@zju.edu.cn)
Mentors: Harry Zhang (@resouer), Kai Zhang (@wsxiaozhang), Jian He (@jian-he)
Being able to participate in GSoC has been a fortunate experience for me. Over the past three months, I have improved my engineering skills with the help of my mentors, and I am very grateful for that. Now that GSoC 2019 is nearly over, this is my summary of the work at this stage.
GPUSharing is an open-source project that enables GPU sharing by leveraging Kubernetes scheduling and Device Plugin extensibility.
Arena is a command-line interface that lets data scientists run and monitor machine learning training jobs and check their results in an easy way. In the backend it is built on Kubernetes, Helm, and Kubeflow, but data scientists need very little knowledge of Kubernetes to use it. Its goal is to make data scientists feel as if they are working on a single machine while actually having the power of a GPU cluster behind them.
- Integrate arena with GPUSharing in the tensorflow-serving scenario.
- Integrate Nvidia MPS as an option for isolation.
- Finish an end-to-end tf-serving task using GPUShare.
- Check the GPUMemory resource of the Kubernetes cluster.
- Finish a user guide for tf-serving with GPUShare.
Per_process_gpu_memory_fraction is the fraction of the GPU memory space that each process occupies. The value is between 0.0 and 1.0 (with 0.0 as the default).
If it is 1.0, the server allocates all the GPU memory when it starts.
If it is 0.0, TensorFlow automatically selects a value.
For example, if we want the serving job to occupy half of the GPU resources, we can set per_process_gpu_memory_fraction to 0.5.
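As a quick sanity check of the arithmetic above, here is a minimal shell sketch; the 4 GiB / 8 GiB numbers are assumptions chosen for illustration:

```shell
# Assumed example: a serving job that needs 4 GiB on an 8 GiB GPU
# should get per_process_gpu_memory_fraction = 4 / 8 = 0.5.
required_gib=4
total_gib=8
fraction=$(awk -v r="$required_gib" -v t="$total_gib" 'BEGIN { printf "%.2f", r / t }')
echo "per_process_gpu_memory_fraction=$fraction"
```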
Goal: after a user submits a serving task, we need to calculate the correct per_process_gpu_memory_fraction and pass it as a parameter to the serving task.
per_process_gpu_memory_fraction = (required GPUMemory) / (total GPUMemory of the allocated GPU card)
- The GPUMemory that the serving task requires is transformed into spec.container.resource.limits.aliyun.com/gpu-mem.
- After the GPUShare scheduler-extender and device-plugin have run, environment variables are generated in the container.
- The required GPUMemory equals ALIYUN_COM_GPU_MEM_CONTAINER, and the total GPUMemory of the allocated GPU card equals ALIYUN_COM_GPU_MEM_DEV.
- per_process_gpu_memory_fraction=$ALIYUN_COM_GPU_MEM_CONTAINER/$ALIYUN_COM_GPU_MEM_DEV
- In the GPUShare situation, convert per_process_gpu_memory_fraction in the task.
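The conversion steps above can be sketched in shell. The environment variable names come from the GPUShare device plugin, while the example values (2 GiB of an 8 GiB card) and the exported variables are assumptions for illustration:

```shell
# Assumed values; in a real pod the GPUShare device plugin injects these.
export ALIYUN_COM_GPU_MEM_CONTAINER=2   # GiB requested by this container
export ALIYUN_COM_GPU_MEM_DEV=8         # GiB total on the allocated GPU

# Compute the fraction that the serving process may occupy.
fraction=$(awk -v c="$ALIYUN_COM_GPU_MEM_CONTAINER" -v d="$ALIYUN_COM_GPU_MEM_DEV" \
  'BEGIN { printf "%.3f", c / d }')
echo "per_process_gpu_memory_fraction=$fraction"
```

The resulting value would then be passed as a parameter to the serving task, e.g. as a TensorFlow Serving command-line argument.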
- Investigate how to use MPS.
- Test the capabilities of MPS.
- Integrate MPS with GPUShare and simplify user operations.
Test whether GPU threads are controlled by MPS.
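For reference, a minimal sketch of how MPS thread control is typically exercised. The daemon commands require an NVIDIA GPU, so they are left as comments here; the 50% value is an assumed example:

```shell
# The MPS control daemon itself needs an NVIDIA GPU, so it is not run here:
#   nvidia-cuda-mps-control -d           # start the MPS control daemon
#   echo quit | nvidia-cuda-mps-control  # stop it later

# Cap the portion of GPU threads (SMs) an MPS client may use (assumed: 50%).
export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50
echo "CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=$CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"
```

Checking GPU utilization (e.g. with nvidia-smi) while a client runs under this cap is one way to verify that MPS is actually limiting GPU threads.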