
Docker build failing for amd - LLAMA 2 70B #401

Open
anandhu-eng opened this issue Oct 21, 2024 · 3 comments

@anandhu-eng (Contributor):

INFO:root:* cm run script "run docker container"
Traceback (most recent call last):
  File "/home/anandhu/.local/bin/cm", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/cli.py", line 37, in run
    r = cm.access(argv, out='con')
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
        ^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 212, in run
    r = self._run(i)
        ^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 1474, in _run
    r = customize_code.preprocess(ii)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/run-mlperf-inference-app/customize.py", line 243, in preprocess
    r = cm.access(ii)
        ^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 758, in access
    return cm.access(i)
           ^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
        ^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 4093, in docker
    return utils.call_internal_module(self, __file__, 'module_misc', 'docker', i)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/utils.py", line 1631, in call_internal_module
    return getattr(tmp_module, module_func)(i)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module_misc.py", line 2095, in docker
    r = self_module.cmind.access(cm_docker_input)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
        ^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 212, in run
    r = self._run(i)
        ^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 1474, in _run
    r = customize_code.preprocess(ii)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/run-docker-container/customize.py", line 43, in preprocess
    DOCKER_CONTAINER = docker_image_repo + "/" + docker_image_name + ":" + docker_image_tag
                       ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
TypeError: can only concatenate str (not "NoneType") to str
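The traceback bottoms out in a plain string concatenation where `docker_image_tag` is `None`. A minimal sketch of the failure mode and a defensive guard (the helper and its error message are hypothetical illustrations, not the project's actual patch):

```python
# Minimal reproduction of the failure in run-docker-container/customize.py:
# docker_image_tag arrives as None, and str + None raises TypeError.
# build_container_ref is a hypothetical helper for illustration only.

def build_container_ref(docker_image_repo, docker_image_name, docker_image_tag):
    parts = {
        "repo": docker_image_repo,
        "name": docker_image_name,
        "tag": docker_image_tag,
    }
    missing = [k for k, v in parts.items() if not isinstance(v, str) or not v]
    if missing:
        # Fail with a readable message instead of a bare TypeError.
        raise ValueError(f"incomplete image reference, missing: {missing}")
    return docker_image_repo + "/" + docker_image_name + ":" + docker_image_tag
```

With a guard like this, an unset tag surfaces as a clear configuration error rather than a `TypeError` deep inside the script.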
@arjunsuresh (Contributor):

@anandhu-eng this is fixed now, right?

@anandhu-eng (Contributor, Author):

Hi @arjunsuresh, I tried to run it now but got the following error:

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1    --model=llama2-70b-99    --implementation=amd    --framework=pytorch    --category=datacenter    --scenario=Offline    --execution_mode=test    --device=cuda     --docker --quiet    --test_query_count=50 --env.CM_MLPERF_MODEL_LLAMA2_70B_DOWNLOAD_TO_HOST='yes'
INFO:root:* cm run script "run-mlperf inference _find-performance _full _r4.1"
INFO:root:  * cm run script "get mlcommons inference src"
INFO:root:       ! load /home/anandhu/CM/repos/local/cache/9efb0b6eb31d4e7e/cm-cached-state.json
INFO:root:  * cm run script "install pip-package for-cmind-python _package.tabulate"
INFO:root:       ! load /home/anandhu/CM/repos/local/cache/a2fa268cf0da4e5f/cm-cached-state.json
INFO:root:  * cm run script "get mlperf inference utils"
INFO:root:    * cm run script "get mlperf inference src"
INFO:root:         ! load /home/anandhu/CM/repos/local/cache/9efb0b6eb31d4e7e/cm-cached-state.json
INFO:root:         ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-utils/customize.py
Using MLCommons Inference source from /home/anandhu/CM/repos/local/cache/c451e090cdb24951/inference

Running loadgen scenario: Offline and mode: performance
INFO:root:* cm run script "build dockerfile"
cm pull repo && cm run script --tags=app,mlperf,inference,generic,_amd,_llama2-70b-99,_pytorch,_cuda,_test,_r4.1_default,_offline --quiet=true --env.CM_MLPERF_MODEL_LLAMA2_70B_DOWNLOAD_TO_HOST=yes --env.CM_QUIET=yes --env.CM_MLPERF_IMPLEMENTATION=amd --env.CM_MLPERF_MODEL=llama2-70b-99 --env.CM_MLPERF_RUN_STYLE=test --env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=False --env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter --env.CM_MLPERF_DEVICE=cuda --env.CM_MLPERF_USE_DOCKER=True --env.CM_MLPERF_BACKEND=pytorch --env.CM_MLPERF_LOADGEN_SCENARIO=Offline --env.CM_TEST_QUERY_COUNT=50 --env.CM_MLPERF_FIND_PERFORMANCE_MODE=yes --env.CM_MLPERF_LOADGEN_ALL_MODES=no --env.CM_MLPERF_LOADGEN_MODE=performance --env.CM_MLPERF_RESULT_PUSH_TO_GITHUB=False --env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full --env.CM_MLPERF_INFERENCE_VERSION=4.1 --env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1_default --env.CM_MLPERF_LAST_RELEASE=v4.1 --env.CM_MODEL=llama2-70b-99 --env.CM_MLPERF_LOADGEN_COMPLIANCE=no --env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= --env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline --env.CM_MLPERF_LOADGEN_MODES,=performance --env.OUTPUT_BASE_DIR=/home/anandhu/CM/repos --env.CM_OUTPUT_FOLDER_NAME=test_results --add_deps_recursive.coco2014-original.tags=_full --add_deps_recursive.coco2014-preprocessed.tags=_full --add_deps_recursive.imagenet-original.tags=_full --add_deps_recursive.imagenet-preprocessed.tags=_full --add_deps_recursive.openimages-original.tags=_full --add_deps_recursive.openimages-preprocessed.tags=_full --add_deps_recursive.openorca-original.tags=_full --add_deps_recursive.openorca-preprocessed.tags=_full --add_deps_recursive.get-mlperf-inference-results-dir.tags=_version.r4_1 --add_deps_recursive.get-mlperf-inference-submission-dir.tags=_version.r4_1 --add_deps_recursive.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1 --v=False --print_env=False --print_deps=False --dump_version_info=True --quiet
Dockerfile written at /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/app-mlperf-inference/dockerfiles/pytorch:rocm6.1.2_ubuntu20.04_py3.9_pytorch_staging.Dockerfile

Dockerfile generated at /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/app-mlperf-inference/dockerfiles/pytorch:rocm6.1.2_ubuntu20.04_py3.9_pytorch_staging.Dockerfile
INFO:root:* cm run script "get docker"
INFO:root:     ! load /home/anandhu/CM/repos/local/cache/c28f6fb1b7884706/cm-cached-state.json
INFO:root:* cm run script "get mlperf inference submission dir local _version.r4_1"
INFO:root:     ! load /home/anandhu/CM/repos/local/cache/09af3aede75c4983/cm-cached-state.json
INFO:root:* cm run script "get nvidia-docker"
INFO:root:     ! load /home/anandhu/CM/repos/local/cache/3ba5606de4784db5/cm-cached-state.json

CM command line regenerated to be used inside Docker:

cm run script --tags=app,mlperf,inference,generic,_amd,_llama2-70b-99,_pytorch,_cuda,_test,_r4.1_default,_offline --quiet=true --env.CM_MLPERF_MODEL_LLAMA2_70B_DOWNLOAD_TO_HOST=yes --env.CM_QUIET=yes --env.CM_MLPERF_IMPLEMENTATION=amd --env.CM_MLPERF_MODEL=llama2-70b-99 --env.CM_MLPERF_RUN_STYLE=test --env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=False --env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter --env.CM_MLPERF_DEVICE=cuda --env.CM_MLPERF_USE_DOCKER=True --env.CM_MLPERF_BACKEND=pytorch --env.CM_MLPERF_LOADGEN_SCENARIO=Offline --env.CM_TEST_QUERY_COUNT=50 --env.CM_MLPERF_FIND_PERFORMANCE_MODE=yes --env.CM_MLPERF_LOADGEN_ALL_MODES=no --env.CM_MLPERF_LOADGEN_MODE=performance --env.CM_MLPERF_RESULT_PUSH_TO_GITHUB=False --env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full --env.CM_MLPERF_INFERENCE_VERSION=4.1 --env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1_default --env.CM_MLPERF_LAST_RELEASE=v4.1 --env.CM_TMP_CURRENT_PATH=/home/anandhu/CM/repos --env.CM_TMP_PIP_VERSION_STRING= --env.CM_MODEL=llama2-70b-99 --env.CM_MLPERF_LOADGEN_COMPLIANCE=no --env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= --env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline --env.CM_MLPERF_LOADGEN_MODES,=performance --env.OUTPUT_BASE_DIR=/home/anandhu/CM/repos --env.CM_OUTPUT_FOLDER_NAME=test_results --add_deps_recursive.coco2014-original.tags=_full --add_deps_recursive.coco2014-preprocessed.tags=_full --add_deps_recursive.imagenet-original.tags=_full --add_deps_recursive.imagenet-preprocessed.tags=_full --add_deps_recursive.openimages-original.tags=_full --add_deps_recursive.openimages-preprocessed.tags=_full --add_deps_recursive.openorca-original.tags=_full --add_deps_recursive.openorca-preprocessed.tags=_full --add_deps_recursive.get-mlperf-inference-results-dir.tags=_version.r4_1 --add_deps_recursive.get-mlperf-inference-submission-dir.tags=_version.r4_1 --add_deps_recursive.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1 --v=False --print_env=False --print_deps=False --dump_version_info=True  
--env.OUTPUT_BASE_DIR=/cm-mount/home/anandhu/CM/repos  --env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/home/cmuser/CM/repos/local/cache/09af3aede75c4983/mlperf-inference-submission  --docker_run_deps 


INFO:root:* cm run script "run docker container"

Checking existing Docker container:

  docker ps --filter "ancestor=local/cm-script-app-mlperf-inference-generic--amd--llama2-70b-99--pytorch--cuda--test--r4.1-default--offline:rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest"  2> /dev/null


Checking Docker images:

  docker images -q local/cm-script-app-mlperf-inference-generic--amd--llama2-70b-99--pytorch--cuda--test--r4.1-default--offline:rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest 2> /dev/null

INFO:root:  * cm run script "build docker image"
/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/build-docker-image/customize.py:57: SyntaxWarning: invalid escape sequence '\$'
  dockerfile_path = "\${CM_DOCKERFILE_WITH_PATH}"
================================================
CM generated the following Docker build command:

docker build  --build-arg GID=\" $(id -g $USER) \" --build-arg UID=\" $(id -u $USER) \" -f "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/app-mlperf-inference/dockerfiles/pytorch:rocm6.1.2_ubuntu20.04_py3.9_pytorch_staging.Dockerfile" -t "local/cm-script-app-mlperf-inference-generic--amd--llama2-70b-99--pytorch--cuda--test--r4.1-default--offline:rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest" .

INFO:root:         ! cd /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/app-mlperf-inference/dockerfiles
INFO:root:         ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/build-docker-image/run.sh from tmp-run.sh
[+] Building 0.0s (0/0)                                          docker:default
ERROR: invalid tag "local/cm-script-app-mlperf-inference-generic--amd--llama2-70b-99--pytorch--cuda--test--r4.1-default--offline:rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest": invalid reference format

CM error: Portable CM script failed (name = build-docker-image, return code = 256)


^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script
wrapped and unified by this CM script (automation recipe). Please re-run
this script with --repro flag and report this issue with the original
command line, cm-repro directory and full log here:

https://github.com/mlcommons/cm4mlops/issues

The CM concept is to collaboratively fix such issues inside portable CM scripts
to make existing tools and native scripts more portable, interoperable
and deterministic. Thank you!
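The `invalid reference format` error comes from the generated image reference putting a `/` in the tag portion (`:rocm/pytorch-...`): Docker allows `/` only in the repository part, while a tag must match `[A-Za-z0-9_][A-Za-z0-9._-]{0,127}`. A hedged sketch of a tag sanitizer (a hypothetical helper, not the CM fix):

```python
import re

# Docker's reference grammar allows '/' only in the repository part;
# the tag after ':' must match [A-Za-z0-9_][A-Za-z0-9._-]{0,127}.
# sanitize_tag is a hypothetical helper for illustration, not the CM fix.
TAG_RE = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9._-]{0,127}$")

def sanitize_tag(tag: str) -> str:
    # Replace characters invalid in a tag (such as '/') with '-',
    # strip leading separators, and cap the length at 128 characters.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "-", tag).lstrip(".-")
    return cleaned[:128]

# The tag from the failing build command above:
bad_tag = "rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest"
```

Running the failing tag through such a sanitizer would turn the embedded `/` into `-` and produce a reference Docker accepts.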

@arjunsuresh (Contributor):

@anandhu-eng Is it still happening now?
