Separate requester compute #513
- name: Install Nvidia Container Toolkit
  become: yes
  ansible.builtin.apt:
    pkg:
      - nvidia-docker2
  notify:
    - Restart docker
  when: gpu

- name: Ensure Nvidia persistence daemon is started
  ansible.builtin.systemd:
    name: nvidia-persistenced
  when: gpu
Not a blocker: the two `when: gpu` conditions can be combined using an Ansible block.
Happy to combine them, but I kept them separate to decouple the Restart docker hook as much as possible. My thinking is that since most of the jobs compute is running are Docker-based, we want to avoid restarting Docker unnecessarily. Probably a minor optimization in this case, though.
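For reference, a minimal sketch of the combined form suggested above, assuming the two tasks from the diff excerpt are the only ones being grouped. The wrapper task name is hypothetical, and `state: started` is an assumption because the excerpt does not show a `state` for the systemd task.

```yaml
# Hypothetical wrapper name; the inner tasks are taken from the diff excerpt.
- name: Configure Nvidia container runtime
  when: gpu          # applied to every task inside the block
  block:
    - name: Install Nvidia Container Toolkit
      become: yes
      ansible.builtin.apt:
        pkg:
          - nvidia-docker2
      notify:
        - Restart docker   # still attached to this task only

    - name: Ensure Nvidia persistence daemon is started
      ansible.builtin.systemd:
        name: nvidia-persistenced
        state: started     # assumption: not visible in the diff excerpt
```

Because `notify` stays on the install task, the Restart docker handler would still fire only when that task reports a change, so grouping under a block should not by itself cause extra Docker restarts.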
In that case, we can use Docker's live-restore option: https://docs.docker.com/config/containers/live-restore/
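A rough sketch of how live-restore could be enabled from the same playbook. This is not part of the PR. It assumes `/etc/docker/daemon.json` holds no other settings; if the file already exists (for example with runtime configuration), the key should be merged into it rather than overwritten. The task and handler names are hypothetical.

```yaml
# Tasks file (sketch). WARNING: this overwrites daemon.json; merge instead
# if the file already contains other settings.
- name: Enable Docker live-restore
  become: yes
  ansible.builtin.copy:
    dest: /etc/docker/daemon.json
    content: |
      {
        "live-restore": true
      }
    mode: "0644"
  notify:
    - Reload docker

# Handlers file (sketch). Hypothetical handler; a reload re-reads the
# daemon configuration without stopping the daemon.
- name: Reload docker
  become: yes
  ansible.builtin.systemd:
    name: docker
    state: reloaded
```

With live-restore enabled, running containers are intended to stay up while the daemon itself is stopped or restarted, which addresses the concern about interrupting Docker-based jobs.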
# Nvidia
- name: Get Nvidia drivers apt key
  ansible.builtin.get_url:
    url: https://developer.download.nvidia.com/compute/cuda/repos/{{ nvidia_distribution }}/x86_64/cuda-keyring_1.0-1_all.deb
    dest: /tmp/cuda-keyring.deb
  when: gpu

- name: Add Nvidia Keyring
  become: yes
  ansible.builtin.apt:
    deb: /tmp/cuda-keyring.deb
  when: gpu

- name: Get Nvidia Container Toolkit GPG key
  become: yes
  ansible.builtin.shell:
    cmd: curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --yes --dearmor -o {{ nvidia_container_toolkit_key_path }}
    creates: "{{ nvidia_container_toolkit_key_path }}"
  when: gpu

- name: Add Nvidia Container Toolkit Repository
  become: yes
  ansible.builtin.apt_repository:
    repo: deb [signed-by={{ nvidia_container_toolkit_key_path }}] https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
    state: present
  when: gpu

- name: Install required system packages for gpu build
  become: yes
  ansible.builtin.apt:
    pkg:
      - cuda-drivers
    state: latest
    update_cache: true
  when: gpu
Not a blocker: these could also be combined using a block with a single `when`.
just a general note, again not a blocker. Just good to delete anything that gets downloaded under
since my comments aren't blocking, gonna approve this to move it along.
👍 💯 agree. I'm gonna merge as is. We can make /tmp/ clean up (and probably disk space monitoring in general) a separate project.
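As a starting point for that follow-up, a minimal sketch of a cleanup task, assuming the only downloaded artifact is the `/tmp/cuda-keyring.deb` file from the diff above. The task name is hypothetical, and the task would need to run after the keyring package has been installed.

```yaml
# Hypothetical cleanup task; place it after the "Add Nvidia Keyring" task.
- name: Remove downloaded CUDA keyring package
  ansible.builtin.file:
    path: /tmp/cuda-keyring.deb
    state: absent
  when: gpu
```

One trade-off to weigh: once the file is removed, the `get_url` task would download it again on the next playbook run, so those tasks would report a change on every run unless they are guarded by an additional condition.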
Separates the requester and compute nodes onto separate EC2 instances. Currently there is one requester and one compute instance.
Factors out the IPFS install steps into a tasks file.
This PR creates a new Requester/Compute node, essentially a new Plex instance, running at ec2-18-208-163-46.compute-1.amazonaws.com. I think if we're happy with how this is working, we can bounce the private IP to this node, then decommission the old compute node in a separate PR.