fatal: unable to access 'https://github.com/mlcommons/ck/': GnuTLS recv error (-9): Error decoding the received TLS packet. #1177

Closed
KingICCrab opened this issue Mar 17, 2024 · 8 comments

Comments

@KingICCrab

I want to reproduce the NVIDIA BERT benchmark by following https://github.com/mlcommons/ck/blob/master/docs/mlperf/inference/bert/README_nvidia.md#build-nvidia-docker-container-from-31-inference-round
When I run "cm docker script --tags=build,nvidia,inference,server", the Docker build fails at step 10/12:
=> ERROR [10/12] RUN cm pull repo mlcommons@ck 104.6s

[10/12] RUN cm pull repo mlcommons@ck:
0.255 Cloning into 'mlcommons@ck'...
104.5 error: RPC failed; curl 92 HTTP/2 stream 0 was not closed cleanly: CANCEL (err 8)
104.5 fatal: the remote end hung up unexpectedly
104.5 fatal: early EOF
104.5 fatal: index-pack failed
104.5 Warning: CM index is used for the first time. CM will reindex all artifacts now - it may take some time ...
104.5 =======================================================
104.5 Alias: mlcommons@ck
104.5 URL: https://github.com/mlcommons/ck
104.5
104.5 Local path: /home/cmuser/CM/repos/mlcommons@ck
104.5
104.5 git clone https://github.com/mlcommons/ck mlcommons@ck
104.5
104.5
104.5 CM error: repository was not cloned!


mlperf-inference:mlpinf-v3.1-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-l4-public.Dockerfile:32

30 |
31 | # Download CM repo for scripts
32 | >>> RUN cm pull repo mlcommons@ck
33 |
34 | # Install all system dependencies

ERROR: failed to solve: process "/bin/bash -c cm pull repo mlcommons@ck" did not complete successfully: exit code: 1

CM error: Portable CM script failed (name = build-docker-image, return code = 256)

@gfursin
Contributor

gfursin commented Mar 17, 2024

I think the problem is that GitHub was down or that you don't have access to it.
Can you please run git clone https://github.com/mlcommons/ck mlcommons@ck in a temporary directory to check whether it works, and then rerun the cm command once it does? Please tell us if it helps! Thanks!
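
For anyone who wants to automate that check, here is a minimal Python sketch. It assumes git is on the PATH; the helper name clone_with_retries, the retry count, and the delays are illustrative choices and not part of CM. It makes a shallow clone into a throwaway directory and retries a few times, since errors like the one above are often transient.

```python
import subprocess
import tempfile
import time


def clone_with_retries(url, attempts=3, delay=5.0,
                       run=subprocess.run, sleep=time.sleep):
    """Try a shallow clone of `url` into a throwaway directory.

    Returns True as soon as one attempt succeeds, False if all attempts fail.
    `run` and `sleep` are parameters only so the helper can be tested
    without touching the network (hypothetical design, not CM's API).
    """
    for attempt in range(1, attempts + 1):
        # Each attempt gets a fresh, empty directory that is removed afterwards.
        with tempfile.TemporaryDirectory() as tmp:
            result = run(["git", "clone", "--depth", "1", url, tmp])
        if result.returncode == 0:
            return True
        if attempt < attempts:
            # Wait a little longer after each failed attempt.
            sleep(delay * attempt)
    return False


# Example call (needs network access):
#   clone_with_retries("https://github.com/mlcommons/ck")
```

If this returns True, the network path to GitHub is working and rerunning the cm command is worth trying. If it keeps returning False, the problem is probably on the network side (proxy, firewall, or an unstable connection) rather than in CM.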

@gfursin
Contributor

gfursin commented Mar 19, 2024

@KingICCrab - did you try again to see if it works? I believe it's a network issue - it happens with GitHub from time to time ;) ...

@KingICCrab
Author

Thank you for your consideration!
I'm sorry, but I am giving up on reproducing it for now because I know little about Docker.

@gfursin
Contributor

gfursin commented Mar 19, 2024

Thank you for your consideration! I‘m sorry. I temporarily give up reproducing it, because I know about docker little.

No problem. What I meant was: could you please retry the same CM command and see if it works now:

cm docker script --tags=build,nvidia,inference,server

When there is a network issue, CM should resume building the Docker container from the step where it failed ...
Thanks!

@KingICCrab
Author

KingICCrab commented Mar 19, 2024

After I ran the command, I got the following error (printed in red):
Cloning into 'repo'...
error: RPC failed; curl 28 Failed to connect to github.com port 443: Connection timed out
fatal: the remote end hung up unexpectedly
Traceback (most recent call last):
File "/home/cmuser/.local/bin/cm", line 8, in <module>
sys.exit(run())
File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/cli.py", line 35, in run
r = cm.access(argv, out='con')
File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 587, in access
r = action_addr(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 193, in run
r = self._run(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 1281, in _run
r = self._call_run_deps(deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 2699, in _call_run_deps
r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 2854, in _run_deps
r = self.cmind.access(ii)
File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 587, in access
r = action_addr(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 193, in run
r = self._run(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 1281, in _run
r = self._call_run_deps(deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 2699, in _call_run_deps
r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 2854, in _run_deps
r = self.cmind.access(ii)
File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 587, in access
r = action_addr(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 193, in run
r = self._run(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 1454, in _run
r = self._call_run_deps(prehook_deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 2699, in _call_run_deps
r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 2854, in _run_deps
r = self.cmind.access(ii)
File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 587, in access
r = action_addr(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 193, in run
r = self._run(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 1596, in _run
if dependent_cached_path != '' and not os.path.samefile(cached_path, dependent_cached_path):
File "/usr/lib/python3.8/genericpath.py", line 101, in samefile
s2 = os.stat(f2)
FileNotFoundError: [Errno 2] No such file or directory: '/home/cmuser/CM/repos/local/cache/9d809940ee024b38/repo'
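
The traceback ends in os.path.samefile, which calls os.stat on both paths and raises FileNotFoundError when either one is missing. Here the cached 'repo' directory does not exist, presumably because the clone just above timed out. Below is a minimal sketch of a defensive comparison. The helper name same_existing_path is hypothetical, and this is not necessarily how CM itself handles the case.

```python
import os


def same_existing_path(path_a, path_b):
    """Return True only if both paths exist and refer to the same file or directory.

    Unlike a bare os.path.samefile call, a missing path yields False
    instead of raising FileNotFoundError.
    """
    if not path_a or not path_b:
        return False
    if not (os.path.exists(path_a) and os.path.exists(path_b)):
        return False
    return os.path.samefile(path_a, path_b)
```

With a guard like this, a cache entry left behind by a failed clone can be detected and reported or rebuilt, rather than crashing the dependency resolution.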

@gfursin
Contributor

gfursin commented Mar 19, 2024

Interesting. Thank you very much again for your feedback @KingICCrab - we haven't encountered such a case before and will need to improve CM to handle it better! I will keep this ticket open and look into it when we have time ... Thanks again!

@gfursin
Contributor

gfursin commented Mar 20, 2024

I improved the handling of broken CM repositories (for example, when a clone from GitHub fails): c39caa3. It should be available in the next CM release v2.0.3 ...

pgmpablo157321 pushed a commit that referenced this issue Mar 20, 2024
   - added support to handle broken CM repositories: #1177
   - added "cm checkout repo mlcommons@ck --branch=dev" to make it easier to switch branches
   - added "cm import repo" to import repository in the current directory
arjunsuresh pushed a commit to mlcommons/cm4mlops that referenced this issue May 1, 2024
   - added support to handle broken CM repositories: mlcommons/ck#1177
   - added "cm checkout repo mlcommons@ck --branch=dev" to make it easier to switch branches
   - added "cm import repo" to import repository in the current directory
@gfursin
Contributor

gfursin commented Nov 10, 2024

I believe it's fixed.

@gfursin gfursin closed this as completed Nov 10, 2024