
Docker build failing for amd - LLAMA 2 70B #401

Open
anandhu-eng opened this issue Oct 21, 2024 · 3 comments

@anandhu-eng (Contributor):

INFO:root:* cm run script "run docker container"
Traceback (most recent call last):
  File "/home/anandhu/.local/bin/cm", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/cli.py", line 37, in run
    r = cm.access(argv, out='con')
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
        ^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 212, in run
    r = self._run(i)
        ^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 1474, in _run
    r = customize_code.preprocess(ii)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/run-mlperf-inference-app/customize.py", line 243, in preprocess
    r = cm.access(ii)
        ^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 758, in access
    return cm.access(i)
           ^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
        ^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 4093, in docker
    return utils.call_internal_module(self, __file__, 'module_misc', 'docker', i)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/utils.py", line 1631, in call_internal_module
    return getattr(tmp_module, module_func)(i)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module_misc.py", line 2095, in docker
    r = self_module.cmind.access(cm_docker_input)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
        ^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 212, in run
    r = self._run(i)
        ^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 1474, in _run
    r = customize_code.preprocess(ii)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/run-docker-container/customize.py", line 43, in preprocess
    DOCKER_CONTAINER = docker_image_repo + "/" + docker_image_name + ":" + docker_image_tag
                       ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
TypeError: can only concatenate str (not "NoneType") to str
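The traceback bottoms out in a plain string concatenation where `docker_image_tag` is `None`. A minimal sketch of the failure mode and a defensive guard (the helper and its error message are hypothetical illustrations, not the project's actual patch):

```python
# Minimal reproduction of the failure in run-docker-container/customize.py:
# docker_image_tag arrives as None, and str + None raises TypeError.
# build_container_ref is a hypothetical helper for illustration only.

def build_container_ref(docker_image_repo, docker_image_name, docker_image_tag):
    parts = {
        "repo": docker_image_repo,
        "name": docker_image_name,
        "tag": docker_image_tag,
    }
    missing = [k for k, v in parts.items() if not isinstance(v, str) or not v]
    if missing:
        # Fail with a readable message instead of a bare TypeError.
        raise ValueError(f"incomplete image reference, missing: {missing}")
    return docker_image_repo + "/" + docker_image_name + ":" + docker_image_tag
```

With a guard like this, an unset tag surfaces as a clear configuration error rather than a `TypeError` deep inside the script.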
@arjunsuresh (Contributor):

@anandhu-eng this is fixed now, right?

@anandhu-eng (Contributor, Author):

Hi @arjunsuresh, I tried to run it now but got the following error:

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1    --model=llama2-70b-99    --implementation=amd    --framework=pytorch    --category=datacenter    --scenario=Offline    --execution_mode=test    --device=cuda     --docker --quiet    --test_query_count=50 --env.CM_MLPERF_MODEL_LLAMA2_70B_DOWNLOAD_TO_HOST='yes'
INFO:root:* cm run script "run-mlperf inference _find-performance _full _r4.1"
INFO:root:  * cm run script "get mlcommons inference src"
INFO:root:       ! load /home/anandhu/CM/repos/local/cache/9efb0b6eb31d4e7e/cm-cached-state.json
INFO:root:  * cm run script "install pip-package for-cmind-python _package.tabulate"
INFO:root:       ! load /home/anandhu/CM/repos/local/cache/a2fa268cf0da4e5f/cm-cached-state.json
INFO:root:  * cm run script "get mlperf inference utils"
INFO:root:    * cm run script "get mlperf inference src"
INFO:root:         ! load /home/anandhu/CM/repos/local/cache/9efb0b6eb31d4e7e/cm-cached-state.json
INFO:root:         ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-utils/customize.py
Using MLCommons Inference source from /home/anandhu/CM/repos/local/cache/c451e090cdb24951/inference

Running loadgen scenario: Offline and mode: performance
INFO:root:* cm run script "build dockerfile"
cm pull repo && cm run script --tags=app,mlperf,inference,generic,_amd,_llama2-70b-99,_pytorch,_cuda,_test,_r4.1_default,_offline --quiet=true --env.CM_MLPERF_MODEL_LLAMA2_70B_DOWNLOAD_TO_HOST=yes --env.CM_QUIET=yes --env.CM_MLPERF_IMPLEMENTATION=amd --env.CM_MLPERF_MODEL=llama2-70b-99 --env.CM_MLPERF_RUN_STYLE=test --env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=False --env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter --env.CM_MLPERF_DEVICE=cuda --env.CM_MLPERF_USE_DOCKER=True --env.CM_MLPERF_BACKEND=pytorch --env.CM_MLPERF_LOADGEN_SCENARIO=Offline --env.CM_TEST_QUERY_COUNT=50 --env.CM_MLPERF_FIND_PERFORMANCE_MODE=yes --env.CM_MLPERF_LOADGEN_ALL_MODES=no --env.CM_MLPERF_LOADGEN_MODE=performance --env.CM_MLPERF_RESULT_PUSH_TO_GITHUB=False --env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full --env.CM_MLPERF_INFERENCE_VERSION=4.1 --env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1_default --env.CM_MLPERF_LAST_RELEASE=v4.1 --env.CM_MODEL=llama2-70b-99 --env.CM_MLPERF_LOADGEN_COMPLIANCE=no --env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= --env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline --env.CM_MLPERF_LOADGEN_MODES,=performance --env.OUTPUT_BASE_DIR=/home/anandhu/CM/repos --env.CM_OUTPUT_FOLDER_NAME=test_results --add_deps_recursive.coco2014-original.tags=_full --add_deps_recursive.coco2014-preprocessed.tags=_full --add_deps_recursive.imagenet-original.tags=_full --add_deps_recursive.imagenet-preprocessed.tags=_full --add_deps_recursive.openimages-original.tags=_full --add_deps_recursive.openimages-preprocessed.tags=_full --add_deps_recursive.openorca-original.tags=_full --add_deps_recursive.openorca-preprocessed.tags=_full --add_deps_recursive.get-mlperf-inference-results-dir.tags=_version.r4_1 --add_deps_recursive.get-mlperf-inference-submission-dir.tags=_version.r4_1 --add_deps_recursive.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1 --v=False --print_env=False --print_deps=False --dump_version_info=True --quiet
Dockerfile written at /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/app-mlperf-inference/dockerfiles/pytorch:rocm6.1.2_ubuntu20.04_py3.9_pytorch_staging.Dockerfile

Dockerfile generated at /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/app-mlperf-inference/dockerfiles/pytorch:rocm6.1.2_ubuntu20.04_py3.9_pytorch_staging.Dockerfile
INFO:root:* cm run script "get docker"
INFO:root:     ! load /home/anandhu/CM/repos/local/cache/c28f6fb1b7884706/cm-cached-state.json
INFO:root:* cm run script "get mlperf inference submission dir local _version.r4_1"
INFO:root:     ! load /home/anandhu/CM/repos/local/cache/09af3aede75c4983/cm-cached-state.json
INFO:root:* cm run script "get nvidia-docker"
INFO:root:     ! load /home/anandhu/CM/repos/local/cache/3ba5606de4784db5/cm-cached-state.json

CM command line regenerated to be used inside Docker:

cm run script --tags=app,mlperf,inference,generic,_amd,_llama2-70b-99,_pytorch,_cuda,_test,_r4.1_default,_offline --quiet=true --env.CM_MLPERF_MODEL_LLAMA2_70B_DOWNLOAD_TO_HOST=yes --env.CM_QUIET=yes --env.CM_MLPERF_IMPLEMENTATION=amd --env.CM_MLPERF_MODEL=llama2-70b-99 --env.CM_MLPERF_RUN_STYLE=test --env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=False --env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter --env.CM_MLPERF_DEVICE=cuda --env.CM_MLPERF_USE_DOCKER=True --env.CM_MLPERF_BACKEND=pytorch --env.CM_MLPERF_LOADGEN_SCENARIO=Offline --env.CM_TEST_QUERY_COUNT=50 --env.CM_MLPERF_FIND_PERFORMANCE_MODE=yes --env.CM_MLPERF_LOADGEN_ALL_MODES=no --env.CM_MLPERF_LOADGEN_MODE=performance --env.CM_MLPERF_RESULT_PUSH_TO_GITHUB=False --env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full --env.CM_MLPERF_INFERENCE_VERSION=4.1 --env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1_default --env.CM_MLPERF_LAST_RELEASE=v4.1 --env.CM_TMP_CURRENT_PATH=/home/anandhu/CM/repos --env.CM_TMP_PIP_VERSION_STRING= --env.CM_MODEL=llama2-70b-99 --env.CM_MLPERF_LOADGEN_COMPLIANCE=no --env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= --env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline --env.CM_MLPERF_LOADGEN_MODES,=performance --env.OUTPUT_BASE_DIR=/home/anandhu/CM/repos --env.CM_OUTPUT_FOLDER_NAME=test_results --add_deps_recursive.coco2014-original.tags=_full --add_deps_recursive.coco2014-preprocessed.tags=_full --add_deps_recursive.imagenet-original.tags=_full --add_deps_recursive.imagenet-preprocessed.tags=_full --add_deps_recursive.openimages-original.tags=_full --add_deps_recursive.openimages-preprocessed.tags=_full --add_deps_recursive.openorca-original.tags=_full --add_deps_recursive.openorca-preprocessed.tags=_full --add_deps_recursive.get-mlperf-inference-results-dir.tags=_version.r4_1 --add_deps_recursive.get-mlperf-inference-submission-dir.tags=_version.r4_1 --add_deps_recursive.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1 --v=False --print_env=False --print_deps=False --dump_version_info=True  
--env.OUTPUT_BASE_DIR=/cm-mount/home/anandhu/CM/repos  --env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/home/cmuser/CM/repos/local/cache/09af3aede75c4983/mlperf-inference-submission  --docker_run_deps 


INFO:root:* cm run script "run docker container"

Checking existing Docker container:

  docker ps --filter "ancestor=local/cm-script-app-mlperf-inference-generic--amd--llama2-70b-99--pytorch--cuda--test--r4.1-default--offline:rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest"  2> /dev/null


Checking Docker images:

  docker images -q local/cm-script-app-mlperf-inference-generic--amd--llama2-70b-99--pytorch--cuda--test--r4.1-default--offline:rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest 2> /dev/null

INFO:root:  * cm run script "build docker image"
/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/build-docker-image/customize.py:57: SyntaxWarning: invalid escape sequence '\$'
  dockerfile_path = "\${CM_DOCKERFILE_WITH_PATH}"
================================================
CM generated the following Docker build command:

docker build  --build-arg GID=\" $(id -g $USER) \" --build-arg UID=\" $(id -u $USER) \" -f "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/app-mlperf-inference/dockerfiles/pytorch:rocm6.1.2_ubuntu20.04_py3.9_pytorch_staging.Dockerfile" -t "local/cm-script-app-mlperf-inference-generic--amd--llama2-70b-99--pytorch--cuda--test--r4.1-default--offline:rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest" .

INFO:root:         ! cd /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/app-mlperf-inference/dockerfiles
INFO:root:         ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/build-docker-image/run.sh from tmp-run.sh
[+] Building 0.0s (0/0)                                          docker:default
ERROR: invalid tag "local/cm-script-app-mlperf-inference-generic--amd--llama2-70b-99--pytorch--cuda--test--r4.1-default--offline:rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest": invalid reference format

CM error: Portable CM script failed (name = build-docker-image, return code = 256)


^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script
wrapped and unified by this CM script (automation recipe). Please re-run
this script with --repro flag and report this issue with the original
command line, cm-repro directory and full log here:

https://github.com/mlcommons/cm4mlops/issues

The CM concept is to collaboratively fix such issues inside portable CM scripts
to make existing tools and native scripts more portable, interoperable
and deterministic. Thank you!
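The `invalid reference format` error comes from the generated image reference putting a `/` in the tag portion (`:rocm/pytorch-...`): Docker allows `/` only in the repository part, while a tag must match `[A-Za-z0-9_][A-Za-z0-9._-]{0,127}`. A hedged sketch of a tag sanitizer (a hypothetical helper, not the CM fix):

```python
import re

# Docker's reference grammar allows '/' only in the repository part;
# the tag after ':' must match [A-Za-z0-9_][A-Za-z0-9._-]{0,127}.
# sanitize_tag is a hypothetical helper for illustration, not the CM fix.
TAG_RE = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9._-]{0,127}$")

def sanitize_tag(tag: str) -> str:
    # Replace characters invalid in a tag (such as '/') with '-',
    # strip leading separators, and cap the length at 128 characters.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "-", tag).lstrip(".-")
    return cleaned[:128]

# The tag from the failing build command above:
bad_tag = "rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest"
```

Running the failing tag through such a sanitizer would turn the embedded `/` into `-` and produce a reference Docker accepts.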

@arjunsuresh (Contributor):

@anandhu-eng Is it still happening now?
