
[UX/Catalog] Add DEVICE_MEM info to GCP GPUs. #3375

Merged · 2 commits · Mar 29, 2024

Conversation

concretevitamin (Member)
Previously, DEVICE_MEM was missing from sky show-gpus results for GCP, because GCP's APIs do not explicitly return that information.

Now:

» sky show-gpus L4:1 --cloud gcp
GPU  QTY  CLOUD  INSTANCE_TYPE  DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
L4   1    GCP    g2-standard-4  24GB        4      16GB      $ 0.705       $ 0.248            us-east4

» sky show-gpus A100:1 --cloud gcp
GPU   QTY  CLOUD  INSTANCE_TYPE  DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
A100  1    GCP    a2-highgpu-1g  40GB        12     85GB      $ 3.673       $ 1.469            us-central1

GPU        QTY  CLOUD  INSTANCE_TYPE   DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION   
A100-80GB  1    GCP    a2-ultragpu-1g  80GB        12     170GB     $ 5.028       $ 2.011            us-central1
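Since GCP's APIs do not report accelerator memory, the catalog fills DEVICE_MEM from a static name-to-memory table. A minimal sketch of the idea, using the `name_to_gpu_memory_in_mib` dict this PR adds (the values below are taken from the PR's diff and the tables above; the helper function and its name are hypothetical):

```python
# Sketch: static GPU-name -> device-memory mapping, in MiB.
# Values grounded in this PR; the lookup helper is illustrative only.
name_to_gpu_memory_in_mib = {
    'A100': 40 * 1024,
    'A100-80GB': 80 * 1024,
    'H100': 80 * 1024,
    'L4': 24 * 1024,
    'P4': 8 * 1024,
    'P100': 16 * 1024,
    'T4': 16 * 1024,
    'V100': 16 * 1024,
}

def device_memory_gb(gpu_name: str):
    """Return device memory in GB, or None if the GPU is unknown."""
    mib = name_to_gpu_memory_in_mib.get(gpu_name)
    return None if mib is None else mib / 1024

print(device_memory_gb('L4'))       # 24.0
print(device_memory_gb('UNKNOWN'))  # None (rendered as '-' in the table)
```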

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@romilbhardwaj (Collaborator) left a comment
Excerpt from name_to_gpu_memory_in_mib:

    'H100': 80 * 1024,
    'P4': 8 * 1024,
    'T4': 16 * 1024,
    'V100': 16 * 1024,
Collaborator:
Should we add P100 too?

Collaborator:
BTW, generating a catalog from this branch removed the GCP P100 entries from my sky show-gpus P100 output:

# Master catalog
(base) ➜  ~ sky show-gpus P100
GPU   QTY  CLOUD  INSTANCE_TYPE       DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
P100  1    Azure  Standard_NC6s_v2    -           6      112GB     $ 2.070       $ 0.207            eastus
P100  2    Azure  Standard_NC12s_v2   -           12     224GB     $ 4.140       $ 0.414            eastus
P100  4    Azure  Standard_NC24rs_v2  -           24     448GB     $ 9.108       $ 0.911            eastus
P100  4    Azure  Standard_NC24s_v2   -           24     448GB     $ 8.280       $ 0.828            eastus
P100  1    GCP    n1-highmem-8        -           8      52GB      $ 1.933       $ 0.679            us-central1
P100  2    GCP    n1-highmem-16       -           16     104GB     $ 3.866       $ 1.357            us-central1
P100  4    GCP    n1-highmem-32       -           32     208GB     $ 7.733       $ 2.714            us-central1
P100  1    OCI    VM.GPU2.1           16GB        24     72GB      $ 1.275       -                  eu-amsterdam-1
P100  2    OCI    BM.GPU2.2           16GB        56     256GB     $ 2.550       -                  eu-amsterdam-1

# Catalog from this branch
(base) ➜  ~ sky show-gpus P100
GPU   QTY  CLOUD  INSTANCE_TYPE       DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
P100  1    Azure  Standard_NC6s_v2    -           6      112GB     $ 2.070       $ 0.207            eastus
P100  2    Azure  Standard_NC12s_v2   -           12     224GB     $ 4.140       $ 0.414            eastus
P100  4    Azure  Standard_NC24rs_v2  -           24     448GB     $ 9.108       $ 0.911            eastus
P100  4    Azure  Standard_NC24s_v2   -           24     448GB     $ 8.280       $ 0.828            eastus
P100  1    OCI    VM.GPU2.1           16GB        24     72GB      $ 1.275       -                  eu-amsterdam-1
P100  2    OCI    BM.GPU2.2           16GB        56     256GB     $ 2.550       -                  eu-amsterdam-1

More generally, does this change handle any new GPUs that may get added but are not updated in the name_to_gpu_memory_in_mib dict?

Here's the catalog that got generated for me for reference.

concretevitamin (Member, Author):
Great catch! Fixed both the P100 regression and the missing-GPU-info-not-shown issue (tested with this repro).
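A defensive way to handle GPUs that are absent from the dict (the reviewer's second concern) is to keep their catalog rows and render the unknown memory as '-', matching the Azure rows above. A minimal sketch under that assumption, with catalog rows modeled as dicts (all names here are hypothetical except `name_to_gpu_memory_in_mib`):

```python
# Sketch: annotate catalog rows without dropping GPUs that are
# missing from the static memory mapping.
name_to_gpu_memory_in_mib = {'P100': 16 * 1024, 'T4': 16 * 1024}

def annotate(rows):
    out = []
    for row in rows:
        mib = name_to_gpu_memory_in_mib.get(row['GPU'])
        # Keep the row either way; only fill DEVICE_MEM when known.
        out.append(dict(row, DEVICE_MEM=f'{mib // 1024}GB' if mib else '-'))
    return out

rows = annotate([{'GPU': 'P100'}, {'GPU': 'NEW-GPU'}])
print(rows[0]['DEVICE_MEM'])  # 16GB
print(rows[1]['DEVICE_MEM'])  # -
```

The key design point is that the memory lookup is best-effort metadata: a lookup miss degrades the display, never the set of offerings.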

@romilbhardwaj (Collaborator) left a comment
Thanks @concretevitamin! LGTM.

@concretevitamin concretevitamin merged commit a946ed7 into master Mar 29, 2024
20 checks passed
@concretevitamin concretevitamin deleted the gcp-fetch-memory-info branch March 29, 2024 01:12