
sidecar sequencer last seen in liveness heist #1907

Open
jclulow opened this issue Oct 18, 2024 · 2 comments

@jclulow
Contributor

jclulow commented Oct 18, 2024

Sidecar/C 0XV2:913-0000006:010:BRM23230002 was just installed in the switch 0 position of the new dublin environment in Emeryville. This is my best recollection of the sequence of steps that led to the issue:

  • brought all four Gimlets and both Sidecars to A2 using pilot sp off
  • brought both Scrimlets to A0 to get new host OS bits
  • once they had rebooted into the new host bits, I watched some logs on the Scrimlets
  • took both Sidecars to A0 using pilot sp on
  • one Sidecar came up fine, but BRM23230002 did not!
  • the fans on BRM23230002 have been running full tilt since the problem started

The Scrimlet for the busted Sidecar is Gimlet/C BRM42220026.

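The SP for the Sidecar appears to be broadcasting and visible, but it will not accept control plane agent commands. I was able to take a dump, then took another one 17 minutes later. It seems like everything has come to rest waiting on the sequencer task: control_plane_agent, monorail, thermal, and power are all blocked in a send to sequencer in both dumps, while sequencer itself shows only notif: bit31 with its timer due almost immediately: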

 $ diff -u <(humility -d hubris.core.8 tasks ) <(humility -d hubris.core.9 tasks)
humility: attached to dump
humility: attached to dump
--- /dev/fd/63  Fri Oct 18 23:26:35 2024
+++ /dev/fd/62  Fri Oct 18 23:26:35 2024
@@ -1,28 +1,28 @@
-system time = 82544321
+system time = 83571110
 ID TASK                       GEN PRI STATE
- 0 jefe                         0   0 recv, notif: fault timer(T+79)
+ 0 jefe                         0   0 recv, notif: fault timer(T+90)
  1 sys                          0   1 recv, notif: exti-wildcard-irq(irq6/irq7/irq8/irq9/irq10/irq23/irq40)
  2 rng_driver                   0   6 recv
  3 update_server                0   3 recv
  4 auxflash                     0   3 recv
- 5 net                          0   5 recv, notif: eth-irq(irq61) wake-timer(T+48)
+ 5 net                          0   5 recv, notif: eth-irq(irq61) wake-timer(T+259)
  6 control_plane_agent          0   7 wait: send to sequencer/gen0
  7 sprot                        0   4 notif: rot-irq timer(T+992)
  8 udpecho                      0   6 notif: socket
- 9 udpbroadcast                 0   6 notif: bit31(T+100)
+ 9 udpbroadcast                 0   6 notif: bit31(T+152)
 10 monorail                     0   6 wait: send to sequencer/gen0
 11 i2c_driver                   0   2 recv
-12 hiffy                        0   5 notif: bit31(T+124)
+12 hiffy                        0   5 notif: bit31(T+176)
 13 sensor                       0   4 recv
 14 ecp5_mainboard               0   3 recv
 15 ecp5_front_io                0   3 recv
-16 transceivers                 0   6 recv, notif: socket timer(T+93)
+16 transceivers                 0   6 recv, notif: socket timer(T+4)
 17 packrat                      0   3 recv
-18 sequencer                    0   4 notif: bit31(T+1)
+18 sequencer                    0   4 notif: bit31(T+2)
 19 thermal                      0   5 wait: send to sequencer/gen0
 20 power                        0   6 wait: send to sequencer/gen0
 21 validate                     0   5 recv
-22 ignition                     0   5 recv, notif: timer(T+275)
+22 ignition                     0   5 recv, notif: timer(T+486)
 23 vpd                          0   3 recv
 24 dump_agent                   0   6 wait: reply from sprot/gen0
 25 idle                         0   8 RUNNING
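To pick out the tasks stuck behind another task in a listing like the one above, a small filter may help. This is a minimal sketch, assuming the humility tasks row format shown above (task ID, name, generation, priority, state); the function name blocked_on is hypothetical and is not part of humility.

```shell
#!/bin/sh
# blocked_on: hypothetical helper (not part of humility) that reads the
# output of `humility tasks` on stdin and prints the name of every task
# whose state is "wait: send to <target>/...", i.e. tasks blocked in a
# send to the named task. Assumes rows look like:
#   " 6 control_plane_agent   0   7 wait: send to sequencer/gen0"
# where field 1 is the task ID and field 2 is the task name.
blocked_on() {
    target="${1:-sequencer}"
    # index() does a literal substring match, so the target name is not
    # interpreted as a regular expression.
    awk -v t="$target" 'index($0, "wait: send to " t "/") > 0 { print $2 }'
}
```

For example, humility -d hubris.core.9 tasks | blocked_on sequencer would, given the listing above, print control_plane_agent, monorail, thermal, and power, one per line. A literal substring match is used instead of a regex so that task names containing metacharacters behave predictably.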
@jclulow
Contributor Author

jclulow commented Oct 18, 2024

Dumps and SP archive in use at the time:

$ ls -lh /staff/core/dublin/hubris-1907
total 28067
-rw-rw-r--+  1 jclulow  staff      7.24M Oct 18 23:08 hubris.core.8
-rw-rw-r--+  1 jclulow  staff      7.24M Oct 18 23:25 hubris.core.9
-rw-rw-r--+  1 jclulow  staff      6.17M Oct 18 23:29 sp.zip

@jclulow
Contributor Author

jclulow commented Oct 18, 2024

I removed power from the unit and reconnected it, and it has since come online as expected.
