Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix docker-wait-any script crash issue #19009

Merged
merged 2 commits into from
May 22, 2024

Conversation

liuh-80
Copy link
Contributor

@liuh-80 liuh-80 commented May 20, 2024

Fix docker-wait-any script crash issue.

Why I did it

docker-wait-any script will create a waiting thread for multiple containers.
When any container thread exit, g_thread_exit_event will set, and main thread will exit.
However when this happen, some thread may still waiting container with following code:
docker_client.wait(container_name)
Because docker_client will be destroyed when main thread exist, some time wait method will throw TypeError, and this will cause swss.sh crash then swss container can't start:

<30>May 13 07:11:22 DEVICE_NAME swss.sh[13603]: Traceback (most recent call last):
<30>May 13 07:11:22 DEVICE_NAME swss.sh[13603]: Exception in thread Thread-1:
...
<30>May 13 07:11:22 DEVICE_NAME swss.sh[13603]: while docker_client.inspect_container(container_name)['State']['Status'] != "running":
...
<30>May 13 07:11:23 DEVICE_NAME swss.sh[13603]: TypeError: 'NoneType' object is not callable
<13>May 13 07:11:23 DEVICE_NAME root: Stopping swss service...

This PR is based on analyze result of following PR: #10812

Work item tracking
  • Microsoft ADO: 28052815

How I did it

Ignore TypeError and exit current wait thread when g_thread_exit_event set.

How to verify it

Pass all UT.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Fix docker-wait-any script crash issue.

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@liuh-80 liuh-80 marked this pull request as ready for review May 20, 2024 08:28
@liuh-80 liuh-80 requested a review from lguohan as a code owner May 20, 2024 08:28
@liuh-80 liuh-80 requested review from qiluo-msft and prsunny May 20, 2024 08:29
@prsunny prsunny requested a review from prabhataravind May 20, 2024 16:04
@prsunny
Copy link
Contributor

prsunny commented May 20, 2024

lgtm, do we need #10812 ? @vaibhavhd with this PR?

@liuh-80
Copy link
Contributor Author

liuh-80 commented May 21, 2024

lgtm, do we need #10812 ? @vaibhavhd with this PR?

We don't need #10812, there already similar change in current code:

    # Signal the main thread to exit
    g_thread_exit_event.set()
    return <== similar with #10812, exit current thread after set exit event.

Copy link
Contributor

@prabhataravind prabhataravind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@lguohan lguohan merged commit 9c18b27 into sonic-net:master May 22, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants