Support claiming resource requests & limits for multiple device types #1121

Closed
lizhiboo opened this issue Aug 28, 2024 · 0 comments


Motivation:
Arena uses NVIDIA GPUs by default and does not yet support devices from other chip vendors such as AMD, Ascend, and Hygon.

Design:
Add a --device parameter that sets the device request and limit in the Pod's resources, as below:

      resources:
        limits:
          cpu: "10"
          memory: 32Gi
          hygon.com/dcu: 1
        requests:
          cpu: "10"
          memory: 32Gi
          hygon.com/dcu: 1
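
The mapping from a --device value to the resources block above can be sketched as follows. This is only an illustration in Python, not Arena's actual implementation: the helper names parse_device and build_resources are hypothetical, and the sketch assumes the flag value has the form <resource-name>=<count> with a positive integer count. It writes the same value into both requests and limits, since Kubernetes requires them to be equal for extended resources when both are set.

```python
def parse_device(spec: str) -> tuple[str, int]:
    """Split '<resource-name>=<count>' into (resource_name, count).

    Hypothetical helper: the format is assumed from the examples in this issue.
    """
    name, sep, count = spec.rpartition("=")
    if not sep or not name or not count.isdecimal() or int(count) < 1:
        raise ValueError(f"invalid device spec: {spec!r}")
    return name, int(count)


def build_resources(device_specs: list[str], cpu: str, memory: str) -> dict:
    """Build a Pod 'resources' mapping with identical requests and limits."""
    base = {"cpu": cpu, "memory": memory}
    for spec in device_specs:
        name, count = parse_device(spec)
        base[name] = count
    # Copy the mapping so requests and limits are independent dicts.
    return {"limits": dict(base), "requests": dict(base)}


resources = build_resources(["hygon.com/dcu=1"], cpu="10", memory="32Gi")
print(resources)
```

Accepting a list of specs lets the same helper cover a job that claims more than one device type, for example ["hygon.com/dcu=1", "huawei.com/Ascend910=1"].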

Usage:

arena submit tfjob \
    --name=tfjobtest \
    --working-dir=/root \
    --ps-gpus=1 \
    --ps=1 \
    --workers=1 \
    --device=hygon.com/dcu=1 \
    --data-dir=/usr/local/hg-lib:/usr/local/hg-lib \
    --image=xxx:ascend_tensorflow_test \
    'sh -c train.sh'


arena serve custom \
    --name=cstest \
    --replicas=1 \
    --port=80 \
    --device=huawei.com/Ascend910=1 \
    --data-dir=/usr/local/ascend910-driver:/usr/local/ascend910-driver \
    --image=xxx:ascend-test \
    --command="sh train.sh"
google-oss-prow bot pushed a commit that referenced this issue Sep 3, 2024
Signed-off-by: lizhiboo <lizhiboo@yeah.net>