podman network crash on container restart #985

Open
gpb88 opened this issue May 8, 2024 · 4 comments

@gpb88

gpb88 commented May 8, 2024

podman version 4.4.1, netavark 1.5.1-3 on RHEL 9.2

Hello, I'm trying to run a Docker registry as a systemd service, using the podman generate systemd command on the container specified below:

podman run -d \
  --restart=always \
  --name undercloud_registry \
  -v /var/lib/registry:/var/lib/registry:z \
  -e REGISTRY_HTTP_ADDR=0.0.0.0:443 \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=my.crt \
  -e REGISTRY_HTTP_TLS_KEY=my.key \
  -p 5001:443 \
  registry:2.8

After creating and restarting the service, the container becomes unreachable and the following stack trace is generated (with the RUST_BACKTRACE=full environment variable set):

May 08 09:59:45 undercloud.localdomain podman[50105]: thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/network/core_util>
May 08 09:59:45 undercloud.localdomain podman[50105]: stack backtrace:
May 08 09:59:45 undercloud.localdomain podman[50105]:    0:     0x55e7c0bd5330 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>>
May 08 09:59:45 undercloud.localdomain podman[50105]:    1:     0x55e7c0bfb5de - core::fmt::write::hf73517e03618b68a
May 08 09:59:45 undercloud.localdomain podman[50105]:    2:     0x55e7c0bcede5 - std::io::Write::write_fmt::ha59b6aaf3044415f
May 08 09:59:45 undercloud.localdomain podman[50105]:    3:     0x55e7c0bd50f5 - std::sys_common::backtrace::print::h70f969e3dcb4e070
May 08 09:59:45 undercloud.localdomain podman[50105]:    4:     0x55e7c0bd6b4f - std::panicking::default_hook::{{closure}}::h9c26e5b40ab0b31e
May 08 09:59:45 undercloud.localdomain podman[50105]:    5:     0x55e7c0bd688a - std::panicking::default_hook::h138a2f5c3510240a
May 08 09:59:45 undercloud.localdomain podman[50105]:    6:     0x55e7c0bd7248 - std::panicking::rust_panic_with_hook::he5ad11b5d9aa0674
May 08 09:59:45 undercloud.localdomain podman[50105]:    7:     0x55e7c0bd6fe7 - std::panicking::begin_panic_handler::{{closure}}::h98480563b41f54f7
May 08 09:59:45 undercloud.localdomain podman[50105]:    8:     0x55e7c0bd57dc - std::sys_common::backtrace::__rust_end_short_backtrace::hb17f2b0009103e03
May 08 09:59:45 undercloud.localdomain podman[50105]:    9:     0x55e7c0bd6d02 - rust_begin_unwind
May 08 09:59:45 undercloud.localdomain podman[50105]:   10:     0x55e7c06c6cf3 - core::panicking::panic_fmt::h44aac608d1d1dad0
May 08 09:59:45 undercloud.localdomain podman[50105]:   11:     0x55e7c06c6e42 - core::panicking::panic_bounds_check::hcd2db91c8637a207
May 08 09:59:45 undercloud.localdomain podman[50105]:   12:     0x55e7c086029c - netavark::network::core_utils::get_ipam_addresses::h1bbd2be8882fa79c
May 08 09:59:45 undercloud.localdomain podman[50105]:   13:     0x55e7c07ec434 - <netavark::network::bridge::Bridge as netavark::network::driver::NetworkDriv>
May 08 09:59:45 undercloud.localdomain podman[50105]:   14:     0x55e7c073e4c7 - netavark::commands::teardown::Teardown::exec::h064104012377eee2
May 08 09:59:45 undercloud.localdomain podman[50105]:   15:     0x55e7c06cff32 - netavark::main::h44ca175015cd74e8
May 08 09:59:45 undercloud.localdomain podman[50105]:   16:     0x55e7c06d3523 - std::sys_common::backtrace::__rust_begin_short_backtrace::hf3589772bfe0db25
May 08 09:59:45 undercloud.localdomain podman[50105]:   17:     0x55e7c06d3539 - std::rt::lang_start::{{closure}}::h9decd9269fd8b204
May 08 09:59:45 undercloud.localdomain podman[50105]:   18:     0x55e7c0bc940b - std::rt::lang_start_internal::hec2052e3116d99c3
May 08 09:59:45 undercloud.localdomain podman[50105]:   19:     0x55e7c06d0865 - main
May 08 09:59:45 undercloud.localdomain podman[50105]:   20:     0x7f97b583feb0 - __libc_start_call_main
May 08 09:59:45 undercloud.localdomain podman[50105]:   21:     0x7f97b583ff60 - __libc_start_main_alias_1
May 08 09:59:45 undercloud.localdomain podman[50105]:   22:     0x55e7c06c7145 - _start
May 08 09:59:45 undercloud.localdomain podman[50105]:   23:                0x0 - <unknown>
May 08 09:59:45 undercloud.localdomain podman[50089]: time="2024-05-08T09:59:45Z" level=error msg="IPAM error: failed to get subnet bucket for network podman"
May 08 09:59:45 undercloud.localdomain podman[50089]: time="2024-05-08T09:59:45Z" level=error msg="Unable to clean up network for container a30cb98efc467bbb0>
May 08 09:59:45 undercloud.localdomain podman[50089]: 2024-05-08 09:59:45.150703382 +0000 UTC m=+0.183420734 container cleanup a30cb98efc467bbb022f37faf5bfa9>
May 08 09:59:45 undercloud.localdomain podman[50089]: a30cb98efc467bbb022f37faf5bfa94990f74aab583c87ce49dd62fedd7ffabf
May 08 09:59:45 undercloud.localdomain systemd[1]: undercloud_registry.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

Subsequent service restarts do not seem to generate the same error; however, the container is still unreachable.

podman network details

[
     {
          "name": "podman",
          "id": "2f259bab93aaaaa2542ba43ef33eb990d0999ee1b9924b557b7be53c0b7a1bb9",
          "driver": "bridge",
          "network_interface": "podman0",
          "created": "2024-05-08T09:42:52.031751645Z",
          "subnets": [
               {
                    "subnet": "10.255.255.0/24",
                    "gateway": "10.255.255.1"
               },
               {
                    "subnet": "fc00:2222:3333::/64",
                    "gateway": "fc00:2222:3333::1"
               }
          ],
          "ipv6_enabled": true,
          "internal": false,
          "dns_enabled": false,
          "ipam_options": {
               "driver": "host-local"
          }
     }
]
@Luap99
Member

Luap99 commented May 8, 2024

First, keep in mind that we only support the latest version upstream, so for issues in RHEL you should contact Red Hat support.

That said, this is most likely caused by your use of --restart always. Using it within a systemd unit is not recommended at all and will break many things. Instead, set the restart policy on the systemd unit (podman generate systemd --restart-policy).

@baude
Member

baude commented Jun 24, 2024

Can we consider this closed?

@Luap99
Member

Luap99 commented Jun 24, 2024

The place where it panics should still be fixed. If the config is not valid, netavark should return a proper error or, on teardown, try to clean up as much as possible.
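
As a rough illustration of returning an error instead of panicking, here is a minimal Rust sketch. It is not netavark's actual code: `ConfigError`, `pair_addresses`, and the idea that addresses are matched to subnets by position are all assumptions made for the example. It only shows the general pattern of replacing a panicking `subnets[i]` lookup with `subnets.get(i)`.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not netavark's real error type.
#[derive(Debug, PartialEq)]
struct ConfigError(String);

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid network config: {}", self.0)
    }
}

/// Pair each container address with the subnet at the same position.
/// Indexing with `subnets[i]` would panic when the lists differ in
/// length; `get(i)` lets the caller receive an error instead.
fn pair_addresses<'a>(
    addresses: &'a [String],
    subnets: &'a [String],
) -> Result<Vec<(&'a str, &'a str)>, ConfigError> {
    addresses
        .iter()
        .enumerate()
        .map(|(i, addr)| {
            subnets
                .get(i)
                .map(|subnet| (addr.as_str(), subnet.as_str()))
                .ok_or_else(|| {
                    ConfigError(format!(
                        "address {} (index {}) has no matching subnet; network defines {} subnet(s)",
                        addr,
                        i,
                        subnets.len()
                    ))
                })
        })
        .collect()
}
```

With two addresses and a single subnet, the function returns an `Err` describing the mismatch rather than aborting, so a teardown path could log it and carry on cleaning up whatever it still can.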

@gpb88
Author

gpb88 commented Jul 11, 2024

I wanted to give an update on the issue. In the end the issue came from something unrelated to the --restart always option.

The container in question was started using the default podman network. However, later on during our deployment, the default network's subnet was changed while the container was still running. Since the container still had the old address, netavark did not know how to clean up the network after the container was stopped.

While it is clearly a user error on our part, I still believe it should be handled, as right now it completely bricks the port. An error message informing the user that the address is no longer within the network's subnet would probably have made it a lot easier to debug.
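
The kind of check suggested above could look roughly like the following Rust sketch, which uses only the standard library. It is an illustration under assumptions, not netavark code: `subnet_contains` and `check_address` are hypothetical helper names, and the error wording is made up for the example.

```rust
use std::net::IpAddr;

/// Returns Ok(true) if `addr` lies inside `cidr` (e.g. "10.255.255.0/24").
fn subnet_contains(cidr: &str, addr: &str) -> Result<bool, String> {
    let (net_str, prefix_str) = cidr
        .split_once('/')
        .ok_or_else(|| format!("subnet {cidr} has no prefix length"))?;
    let net: IpAddr = net_str
        .parse()
        .map_err(|e| format!("bad subnet {cidr}: {e}"))?;
    let prefix: u32 = prefix_str
        .parse()
        .map_err(|e| format!("bad prefix in {cidr}: {e}"))?;
    let ip: IpAddr = addr
        .parse()
        .map_err(|e| format!("bad address {addr}: {e}"))?;

    // Widen both addresses to u128 so one comparison covers IPv4 and IPv6.
    let (net_bits, ip_bits, width) = match (net, ip) {
        (IpAddr::V4(n), IpAddr::V4(a)) => (u32::from(n) as u128, u32::from(a) as u128, 32u32),
        (IpAddr::V6(n), IpAddr::V6(a)) => (u128::from(n), u128::from(a), 128u32),
        _ => return Ok(false), // different address families never match
    };
    if prefix > width {
        return Err(format!("prefix /{prefix} is too long for {cidr}"));
    }

    // Drop the host bits and compare what remains (the network part).
    // checked_shr returns None for a 128-bit shift, which means prefix 0.
    let shift = width - prefix;
    let network_part = |bits: u128| bits.checked_shr(shift).unwrap_or(0);
    Ok(network_part(net_bits) == network_part(ip_bits))
}

/// Return a descriptive error when `addr` is in none of the given subnets.
fn check_address(addr: &str, subnets: &[&str]) -> Result<(), String> {
    for cidr in subnets {
        if subnet_contains(cidr, addr)? {
            return Ok(());
        }
    }
    Err(format!(
        "address {addr} is no longer within any subnet of the network ({})",
        subnets.join(", ")
    ))
}
```

For example, calling `check_address` with a hypothetical old address such as 10.88.0.5 against the subnets from the network details above (10.255.255.0/24 and fc00:2222:3333::/64) yields an error naming the address and the current subnets, which is the information that was missing while debugging.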
