
GPU detection: Not only check for runtime, but also number of GPUs #10307

Open
chrmarti opened this issue Sep 24, 2024 · 4 comments
Labels: bug (Issue identified by VS Code Team member as probable bug), containers (Issue in vscode-remote containers)

@chrmarti (Contributor) commented:

DevContainers v0.386.0 (pre-release)

Hello,

It seems that this feature is still broken (v0.386.0). If I create a remote machine (GCP) with a GPU and the full NVIDIA stack installed, I can build and run the devcontainer using

```json
"hostRequirements": {
    "gpu": "optional"
},
```

But if I remove the GPU from my remote machine, I can no longer start the Docker container, because the tooling claims to have detected a GPU even though none is attached.

Output of the devcontainer console:

```
[21551 ms] Start: Run: docker info -f {{.Runtimes.nvidia}}
[21755 ms] GPU support found, add GPU flags to docker call.
...
```

If I run the command used in your ts-scripts on the machine (no GPU attached anymore), I get:

```
{nvidia-container-runtime [] <nil>}
```

I think you are only checking whether the nvidia-container-runtime is available, not whether an actual GPU is attached:

```ts
const runtimeFound = result.stdout.includes('nvidia-container-runtime');
```

So,

```ts
export async function extraRunArgs(common: ResolverParameters, params: DockerResolverParameters, config: DevContainerFromDockerfileConfig | DevContainerFromImageConfig) {
	const extraArguments: string[] = [];
	if (config.hostRequirements?.gpu) {
		if (await checkDockerSupportForGPU(params)) {
			common.output.write(`GPU support found, add GPU flags to docker call.`);
			extraArguments.push('--gpus', 'all');
		} else {
			if (config.hostRequirements?.gpu !== 'optional') {
				common.output.write('No GPU support found yet a GPU was required - consider marking it as "optional"', LogLevel.Warning);
			}
		}
	}
	return extraArguments;
}
```

This will add `--gpus all` whenever the runtime is available, even if no GPU is attached. Unfortunately, the container won't start if `--gpus all` is given but no GPU is attached to the machine. Am I missing something here?

Originally posted by @maro-otto in #9385
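One possible shape for a fix, sketched here as a pure decision function (the names `decideGpuArgs` and `countGpus` are hypothetical, not from the extension's code): only add `--gpus all` when the nvidia runtime is registered *and* at least one GPU was actually found, e.g. by counting lines of `nvidia-smi -L` output.

```typescript
// Hypothetical sketch: decide which extra docker args to pass, given
// (a) whether the nvidia runtime is registered in `docker info` and
// (b) how many GPUs a probe such as `nvidia-smi -L` actually reported.
// The warning is returned rather than written to an output channel so
// the function stays pure and easy to test.
function decideGpuArgs(
	runtimeFound: boolean,
	attachedGpuCount: number,
	gpuRequirement: 'optional' | true,
): { args: string[]; warning?: string } {
	if (runtimeFound && attachedGpuCount > 0) {
		// Runtime present AND at least one physical GPU: safe to request GPUs.
		return { args: ['--gpus', 'all'] };
	}
	if (gpuRequirement !== 'optional') {
		return {
			args: [],
			warning: 'No usable GPU found yet a GPU was required - consider marking it as "optional"',
		};
	}
	// Runtime-only (no GPU attached) or nothing found: start without GPU flags.
	return { args: [] };
}

// The GPU count could come from `nvidia-smi -L`, whose output lists one
// line per device, e.g. "GPU 0: Tesla T4 (UUID: GPU-...)".
function countGpus(nvidiaSmiListOutput: string): number {
	return nvidiaSmiListOutput
		.split('\n')
		.filter(line => /^GPU \d+:/.test(line.trim()))
		.length;
}
```

On the reporter's machine after removing the GPU (runtime still registered), `decideGpuArgs(true, 0, 'optional')` would return no flags, so the container could start.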

@chrmarti chrmarti self-assigned this Sep 24, 2024
@chrmarti chrmarti added bug Issue identified by VS Code Team member as probable bug containers Issue in vscode-remote containers labels Sep 24, 2024
@chrmarti chrmarti added this to the September 2024 milestone Sep 24, 2024
@chrmarti (Contributor, Author) commented:
@maro-otto Could you share the output of `docker info --format '{{json .}}'` when you have a GPU installed? I think we might additionally have to check what the default runtime is.
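For reference, the check described above could look roughly like this (a sketch; the field names `Runtimes` and `DefaultRuntime` match the JSON emitted by recent Docker versions, but the helper `nvidiaRuntimeStatus` is hypothetical):

```typescript
// Hypothetical sketch: inspect parsed `docker info --format '{{json .}}'`
// output for the nvidia runtime and for the default runtime.
interface DockerInfo {
	Runtimes?: Record<string, unknown>;
	DefaultRuntime?: string;
}

function nvidiaRuntimeStatus(infoJson: string): { hasNvidiaRuntime: boolean; isDefault: boolean } {
	const info: DockerInfo = JSON.parse(infoJson);
	const runtimes = info.Runtimes ?? {};
	const hasNvidiaRuntime = 'nvidia' in runtimes;
	return {
		hasNvidiaRuntime,
		// If nvidia were the default runtime, `--gpus` flags might be redundant.
		isDefault: hasNvidiaRuntime && info.DefaultRuntime === 'nvidia',
	};
}
```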

@chrmarti chrmarti modified the milestones: September 2024, October 2024 Sep 26, 2024
@maro-otto commented:
@chrmarti
`docker info --format '{{json .}}'` gives me (no GPU attached):

```
{nvidia-container-runtime [] <nil>}
```

@chrmarti (Contributor, Author) commented:
@maro-otto This looks like the output from `docker info -f {{.Runtimes.nvidia}}` - could you also run `docker info --format '{{json .}}'` with the GPU present?

@maro-otto commented Oct 9, 2024:

@chrmarti Sorry for the late reply.
With the GPU attached I get a similar result:

```
$ docker info -f {{.Runtimes.nvidia}}
{nvidia-container-runtime [] <nil>}
```

Additionally, nvidia-smi gives me:

```
$ nvidia-smi
Wed Oct  9 06:39:57 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       On  |   00000000:00:04.0 Off |                    0 |
| N/A   38C    P8             9W /   70W  |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```

@chrmarti chrmarti modified the milestones: October 2024, November 2024 Oct 24, 2024
@chrmarti chrmarti modified the milestones: November 2024, Backlog Dec 6, 2024