Suggest having a single folder for all GPU instances: rather than introducing a new folder for G4, consolidate the existing two folders into a single one.
Sure, makes sense!
Why is it necessary at all? You can create an image on one machine type and just run it on all of them.
G4 needs a more recent driver, so the AMI requires an update. Yes, there should be only a single AMI.
Sure, but why not simply update the existing setup script instead of introducing a new one?
Yes, that's what I suggested in the first comment above: #20 (comment)
@marcoabreu @leezu Updated with a single folder for all GPU instances.
…r by leveraging docker-compose
tools/jenkins-slave-creation-unix/conf-ubuntu-gpu/infrastructure.tfvars
@ChaiBapchya After our testing on Friday, I think we should also disable automatic Ubuntu updates. We know there is some fragility around the NVIDIA driver (if gcc is updated, for example, the driver stops working on the Ubuntu-based DLAMI).
https://askubuntu.com/questions/1059971/disable-updates-from-command-line-in-ubuntu-16-04 Adding
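The askubuntu thread linked above disables automatic updates by turning off APT's periodic jobs. A minimal sketch of doing that from an AMI setup script follows, assuming the standard Ubuntu file `/etc/apt/apt.conf.d/20auto-upgrades` and the two `APT::Periodic` keys shown; the helper names `render_auto_upgrades` and `disable_auto_upgrades` are hypothetical, not part of the existing setup script.

```python
from pathlib import Path
from typing import Dict

# Assumed settings: on Ubuntu, APT's periodic job reads these keys from
# /etc/apt/apt.conf.d/20auto-upgrades; a value of "0" disables that task.
AUTO_UPGRADE_SETTINGS: Dict[str, str] = {
    "APT::Periodic::Update-Package-Lists": "0",
    "APT::Periodic::Unattended-Upgrade": "0",
}


def render_auto_upgrades(settings: Dict[str, str]) -> str:
    """Render settings in apt.conf syntax: one `Key "value";` per line."""
    return "".join(f'{key} "{value}";\n' for key, value in settings.items())


def disable_auto_upgrades(conf_dir: Path = Path("/etc/apt/apt.conf.d")) -> Path:
    """Overwrite 20auto-upgrades so apt stops refreshing and upgrading on its own.

    Needs root when conf_dir is the real system directory.
    Returns the path of the file that was written.
    """
    target = conf_dir / "20auto-upgrades"
    target.write_text(render_auto_upgrades(AUTO_UPGRADE_SETTINGS))
    return target
```

Writing the whole file (rather than patching single keys) keeps the result deterministic across AMI rebuilds, so gcc or kernel packages are not silently upgraded underneath the NVIDIA driver.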
This reverts commit 0e8fde6.
@leezu @josephevans Please help review/merge.
Config files for G4 instances on MXNet CI [unix-gpu slaves]
UNIX AMI Creation changes
G4 instances have NVIDIA Tesla T4 GPUs, which need the Tesla T4 drivers.
G4 instances require driver version 418.87 or later.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html
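To catch an AMI that ships an older driver, the AMI build could verify the 418.87 minimum after installation. The sketch below is an assumption about how such a check might look, not part of this PR: it queries `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and compares versions numerically per component. The helper names are hypothetical.

```python
import subprocess
from typing import Optional, Tuple

# Minimum driver version for G4 instances, per the AWS note above.
MIN_G4_DRIVER = "418.87"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted driver version such as '440.33.01' into (440, 33, 1)."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_supports_g4(version: str, minimum: str = MIN_G4_DRIVER) -> bool:
    """Return True if `version` >= `minimum`, comparing each component as an int."""
    return parse_version(version) >= parse_version(minimum)


def installed_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the installed driver version; None if it cannot run."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    # nvidia-smi prints one line per GPU; all GPUs on a host share one driver.
    lines = [line.strip() for line in out.splitlines() if line.strip()]
    return lines[0] if lines else None
```

Comparing integer tuples instead of strings matters: as strings, `"418.100" < "418.87"`, while numerically 100 > 87. A build step could fail when `installed_driver_version()` returns `None` or `driver_supports_g4(...)` returns `False`.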
Autoscaling Lambda function changes
This resolves the `cant connect to linux-cpu` error by reducing the number of parallel jobs per instance.
Change the state `starting` to `pending` [as `starting` is an incorrect state].
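EC2 reports a booting instance as `pending` (the instance states are `pending`, `running`, `shutting-down`, `terminated`, `stopping` and `stopped`); there is no `starting` state, so a filter on `starting` never matches. The sketch below shows how both Lambda changes could affect scale-up, assuming the Lambda reads `describe_instances`-style `Reservations` data and treats booting instances as capacity for the queued jobs. `count_instances_in_state`, `instances_to_launch` and that scale-up rule are hypothetical, not the actual MXNet CI Lambda code.

```python
import math
from typing import Any, Dict, Iterable


def count_instances_in_state(reservations: Iterable[Dict[str, Any]], state: str) -> int:
    """Count instances whose EC2 state name equals `state`.

    `reservations` mirrors the 'Reservations' list returned by EC2
    DescribeInstances: each reservation holds a list of 'Instances'.
    """
    return sum(
        1
        for reservation in reservations
        for instance in reservation.get("Instances", [])
        if instance["State"]["Name"] == state
    )


def instances_to_launch(queued_jobs: int, executors_per_instance: int,
                        booting_instances: int) -> int:
    """Hypothetical rule: launch enough instances for every queued job,
    minus instances already booting that will pick those jobs up."""
    if executors_per_instance <= 0:
        raise ValueError("executors_per_instance must be positive")
    needed = math.ceil(queued_jobs / executors_per_instance)
    return max(0, needed - booting_instances)
```

With `"starting"` the booting count is always 0, so every Lambda run would launch instances again for the same queue. Fewer executors per instance means more instances for the same queue: 5 queued jobs need 3 instances at 2 executors each, but only 2 at 4 executors each.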