
Supervisor hangs when OpAMP server backend is restarted #33799

Closed
acrmp opened this issue Jun 28, 2024 · 1 comment · Fixed by #34159
Labels: bug · cmd/opampsupervisor · priority:p1 (High)


acrmp commented Jun 28, 2024

Component(s)

cmd/opampsupervisor

What happened?

Description

The supervisor appears to hang while reconnecting to the OpAMP server backend after the server is restarted.

Steps to Reproduce

  1. Start the example OpAMP server
  2. Start the supervisor and see that it successfully starts
  3. See that the agent is visible in the example server UI
  4. Stop the example server (CTRL-C)
  5. The supervisor reports that the connection is closed and that it will retry the connection
  6. Start the example server again

Expected Result

Supervisor logs that it has reconnected to the server.

Actual Result

  • The supervisor does not log a reconnection.
  • The agent is not visible in the OpAMP server UI.

Collector version

7c573a9

Environment information

Environment

OpenTelemetry Collector configuration

No response

Log output

2024-06-28T01:47:10.752Z        ERROR   supervisor/logger.go:26 Connection failed (dial tcp 127.0.0.1:4320: connect: connection refused), will retry.
github.com/open-telemetry/opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor.(*opAMPLogger).Errorf
        /home/pivotal/workspace/opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor/logger.go:26
github.com/open-telemetry/opamp-go/client.(*wsClient).ensureConnected
        /home/pivotal/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.15.0/client/wsclient.go:207
github.com/open-telemetry/opamp-go/client.(*wsClient).runOneCycle
        /home/pivotal/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.15.0/client/wsclient.go:245
github.com/open-telemetry/opamp-go/client.(*wsClient).runUntilStopped
        /home/pivotal/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.15.0/client/wsclient.go:330
github.com/open-telemetry/opamp-go/client/internal.(*ClientCommon).StartConnectAndRun.func1
        /home/pivotal/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.15.0/client/internal/clientcommon.go:197

Additional context

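It looks like the supervisor is blocked in the OnConnectFunc callback, sending on the unbuffered connectedToOpAMPServer channel. Since nothing appears to receive from that channel after the initial connection, the send never completes and the opamp-go client goroutine stays stuck in tryConnectOnce instead of finishing the reconnect. Sending SIGQUIT (signal 3) makes the Go runtime dump all goroutine stacks, which shows the goroutine parked in a chan send: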

$ killall -3 opampsupervisor
...
goroutine 28 gp=0xc000102fc0 m=nil [chan send]:
runtime.gopark(0xa82340?, 0xc000025380?, 0x0?, 0x70?, 0x55?)
        /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0002abad0 sp=0xc0002abab0 pc=0x4402ae
runtime.chansend(0xc00010e300, 0xc0002abba7, 0x1, 0xc0002abb90?)
        /usr/local/go/src/runtime/chan.go:259 +0x38d fp=0xc0002abb40 sp=0xc0002abad0 pc=0x40b3cd
runtime.chansend1(0x18?, 0xc000035f80?)
        /usr/local/go/src/runtime/chan.go:145 +0x17 fp=0xc0002abb70 sp=0xc0002abb40 pc=0x40b037
github.com/open-telemetry/opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor.(*Supervisor).startOpAMPClient.func1({0x98fda0?, 0x0?})
        /home/pivotal/workspace/opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor/supervisor.go:390 +0x28 fp=0xc0002abbc0 sp=0xc0002abb70 pc=0x8ade48
github.com/open-telemetry/opamp-go/client/types.CallbacksStruct.OnConnect(...)
        /home/pivotal/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.15.0/client/types/callbacks.go:140
github.com/open-telemetry/opamp-go/client/types.(*CallbacksStruct).OnConnect(0xc00025c748?, {0xa86098?, 0xc00028c2d0?})
        <autogenerated>:1 +0x5e fp=0xc0002abc20 sp=0xc0002abbc0 pc=0x7f487e
github.com/open-telemetry/opamp-go/client.(*wsClient).tryConnectOnce(0xc00025c600, {0xa86098, 0xc00028c2d0})
        /home/pivotal/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.15.0/client/wsclient.go:178 +0x53f fp=0xc0002abce0 sp=0xc0002abc20 pc=0x89edff
github.com/open-telemetry/opamp-go/client.(*wsClient).ensureConnected(0xc00025c600, {0xa86098, 0xc00028c2d0})
        /home/pivotal/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.15.0/client/wsclient.go:201 +0x10c fp=0xc0002abd90 sp=0xc0002abce0 pc=0x89ef4c
github.com/open-telemetry/opamp-go/client.(*wsClient).runOneCycle(0xc00025c600, {0xa86098, 0xc00028c2d0})
        /home/pivotal/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.15.0/client/wsclient.go:245 +0x51 fp=0xc0002abf50 sp=0xc0002abd90 pc=0x89f1f1
github.com/open-telemetry/opamp-go/client.(*wsClient).runUntilStopped(0xc00025c600, {0xa86098, 0xc00028c2d0})
        /home/pivotal/go/pkg/mod/github.com/open-telemetry/opamp-go@v0.15.0/client/wsclient.go:330 +0x33 fp=0xc0002abf78 sp=0xc0002abf50 pc=0x89fab3
github.com/open-telemetry/opamp-go/client.(*wsClient).runUntilStopped-fm({0xa86098?, 0xc00028c2d0?})
        <autogenerated>:1 +0x33 fp=0xc0002abfa0 sp=0xc0002abf78 pc=0x89fc73
github.com/open-telemetry/opamp-go/client/internal.(*ClientCommon).StartConnectAndRun.func1()
...
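The following is a minimal, self-contained Go model of the pattern described above, not the supervisor's or opamp-go's actual code. It assumes, as the goroutine dump suggests, that the OnConnect callback runs on every successful connection (including reconnects) and that the supervisor reads the channel only once, during startup. The names `simulateConnects`, `blockingCallback`, `nonBlockingCallback` and `run` are hypothetical. The non-blocking variant (capacity-1 channel plus `select` with `default`) is one common way to keep such a callback from stalling the client; it is not necessarily how #34159 fixed the issue.

```go
package main

import (
	"fmt"
	"time"
)

// simulateConnects models a client goroutine that calls onConnect once per
// successful connection (initial connect plus reconnects). It returns true if
// all n calls return before the timeout, false if one of them is stuck.
func simulateConnects(onConnect func(), n int, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		for i := 0; i < n; i++ {
			onConnect()
		}
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// blockingCallback mirrors the reported pattern (hypothetical name): the
// callback sends on an unbuffered channel, so each send waits for a receive.
func blockingCallback() (func(), chan struct{}) {
	ch := make(chan struct{})
	return func() { ch <- struct{}{} }, ch
}

// nonBlockingCallback (hypothetical) uses a channel with capacity 1 and a
// select with a default case, so the callback never waits: if a notification
// is already pending, the new one is dropped.
func nonBlockingCallback() (func(), chan struct{}) {
	ch := make(chan struct{}, 1)
	return func() {
		select {
		case ch <- struct{}{}:
		default:
		}
	}, ch
}

// run attaches a consumer that, like a startup routine waiting for the first
// connection, receives exactly one notification and then stops listening.
// It then simulates one connect followed by one reconnect.
func run(onConnect func(), ch chan struct{}) bool {
	go func() { <-ch }()
	return simulateConnects(onConnect, 2, 100*time.Millisecond)
}

func main() {
	fmt.Println("unbuffered send, reconnect completes:", run(blockingCallback()))
	fmt.Println("non-blocking send, reconnect completes:", run(nonBlockingCallback()))
}
```

Running it should print `false` for the unbuffered variant, because the second callback (the reconnect) never returns, and `true` for the non-blocking variant. Dropping a pending notification is only acceptable if the consumer just needs to know that at least one connection happened.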
acrmp added the bug and needs triage labels on Jun 28, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.
