Describe the bug
@ericxtang and I were running a big load test. After about 400 streams, something broke that we're still investigating. The working theory is that we ran out of bandwidth in the transcoding cluster.
Afterward, many of the broadcasters (Bs) in the cluster believed they still had active streams. No data was flowing into the Bs, but `GET /status` still listed many streams, and the B logs showed them repeatedly performing a watchdog reset:

Full unredacted logs and `/status` output are in the internal Livepeer Discord here.

It seems there may be a failure condition we're not handling that leaves the session open.
To Reproduce
I don't yet have a repro beyond stressing one of our clusters in a bandwidth-constrained setting.
Expected behavior
The streams should time out, and the stream count reported by `/status` should return to 0.