dump-agent gets stuck in send to net #1797
`humility dump -l` can leave `hiffy` in a bad state after timing out
not just
eliza@niles ~ $ pfexec humility -t gimlet-c jefe -f hiffy
humility: attached to 0483:3754:000D00344741500820383733 via ST-Link V3
humility: successfully changed disposition for hiffy
eliza@niles ~ $ pfexec humility -t gimlet-c jefe -r hiffy
humility: attached to 0483:3754:000D00344741500820383733 via ST-Link V3
humility: successfully changed disposition for hiffy
eliza@niles ~ $ pfexec humility -t gimlet-c dump --all
humility dump failed: Probe could not be created
Caused by:
Probe was not found.
eliza@niles ~ $ pfexec humility -t gimlet-c dump --all
humility: attached to 0483:3754:000D00344741500820383733 via ST-Link V3
humility: using hiffy dump agent
humility dump failed: operation timed out
eliza@niles ~ $
Here's the core, for posterity. The power sequencer is hung attempting to program the FPGA, which isn't responding by pulling CDONE low. This is blocking the net task from making progress, because it can't interact with its devices until power sequencing is complete. The dump-agent in turn is patiently waiting for that, for better or worse. Here's the sequencer log:
And based on its interactions with the SPI driver, it's locking and unlocking without sending data:
...which means, on review of the code, that it's failing this check:

    // At this point, the iCE40 is _supposed_ to be chilling in programming mode
    // listening for a bitstream. If this is the case it will be asserting
    // (holding low) CDONE. Let's check!
    if sys.gpio_read(config.cdone) != 0 {
        // Welp, that sure didn't work.
        return Err(Ice40Error::ChipNotListening);
    }
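For anyone reading along, here is a minimal standalone sketch of why that check fails on a board with no FPGA. Only the comparison and `Ice40Error::ChipNotListening` come from the snippet above. The `GpioRead` trait, `FakeBoard`, and the pin number are hypothetical stand-ins, and the sketch assumes an undriven CDONE net reads high.

```rust
// Hypothetical stand-ins for the real driver types. Only the error variant
// and the shape of the check are taken from the snippet in this thread.
#[derive(Debug, PartialEq)]
enum Ice40Error {
    ChipNotListening,
}

/// Minimal GPIO abstraction (an assumption made for this sketch).
trait GpioRead {
    /// Returns 0 if the pin is low, nonzero if it is high.
    fn gpio_read(&self, pin: u8) -> u16;
}

/// Simulated board: `cdone_level` is the level the CDONE net reads as.
struct FakeBoard {
    cdone_level: u16,
}

impl GpioRead for FakeBoard {
    fn gpio_read(&self, _pin: u8) -> u16 {
        self.cdone_level
    }
}

/// Same comparison as the check above: CDONE must read low before
/// the bitstream is sent.
fn check_listening(sys: &impl GpioRead, cdone: u8) -> Result<(), Ice40Error> {
    if sys.gpio_read(cdone) != 0 {
        return Err(Ice40Error::ChipNotListening);
    }
    Ok(())
}

fn main() {
    // FPGA fitted and in programming mode: CDONE is held low.
    let with_fpga = FakeBoard { cdone_level: 0 };
    // No FPGA fitted: nothing pulls the net low (assumed to read high here).
    let without_fpga = FakeBoard { cdone_level: 1 };

    println!("{:?}", check_listening(&with_fpga, 3));
    println!("{:?}", check_listening(&without_fpga, 3));
}
```

If the caller retries this check indefinitely, a permanently absent chip turns into the hang described above.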
...so it turns out somebody flashed a Gimlet SP image onto a PSC SP board, and that's what we're observing here. The PSC doesn't have a sequencer FPGA, so this is getting stuck waiting for an acknowledge from a missing chip. That part is working roughly as expected. We don't currently have a consistent "board ID pins" scheme between boards that could catch this problem.
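To make the "board ID pins" idea concrete, here is a rough sketch of what an early-boot check could look like. Everything in it is hypothetical: no such scheme exists today, and the two-bit encoding, the `BoardId` values, and the function names are invented for illustration only.

```rust
/// Hypothetical board identifiers. The real boards have no agreed ID scheme.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BoardId {
    Gimlet,
    Psc,
}

#[derive(Debug, PartialEq)]
enum BoardCheckError {
    /// The strapping pins decode to a different board than the image expects.
    Mismatch { expected: BoardId, found: BoardId },
    /// The strapping pins hold a value that is not assigned to any board.
    Unknown(u8),
}

/// Decode two strapping pins into a board ID (invented encoding).
fn decode_board_id(pins: u8) -> Option<BoardId> {
    match pins & 0b11 {
        0b01 => Some(BoardId::Gimlet),
        0b10 => Some(BoardId::Psc),
        _ => None,
    }
}

/// Compare the ID the image was built for against what the pins report.
fn check_board(expected: BoardId, pins: u8) -> Result<(), BoardCheckError> {
    match decode_board_id(pins) {
        Some(found) if found == expected => Ok(()),
        Some(found) => Err(BoardCheckError::Mismatch { expected, found }),
        None => Err(BoardCheckError::Unknown(pins & 0b11)),
    }
}

fn main() {
    // A Gimlet image booting on a board whose pins read as PSC.
    println!("{:?}", check_board(BoardId::Gimlet, 0b10));
    // A Gimlet image on a Gimlet board.
    println!("{:?}", check_board(BoardId::Gimlet, 0b01));
}
```

A check like this, run before power sequencing starts, would let the image fail fast with a clear error instead of waiting on a chip that isn't there.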
On reflection, this machine will have been asserting the SP boot failure ignition fault pin, which is working basically as designed. FWIW.
On a system with a whole lot of dumps (e.g. the gimlet from #1779, which has a `thermal` task that's crashed 7483 times), running `humility dump -l` can time out. When this occurs, I've seen the `hiffy` task left in a bad state where no other commands that use `hiffy` can succeed. For instance:

Manually restarting the `hiffy` task using `humility jefe --fault hiffy` and `humility jefe --release hiffy` seems to unstick it, until the next time that you run `humility dump -l`.
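The manual restart above could be scripted roughly like this. It is a sketch, not a supported tool: the `restart_steps` helper and the `--run` switch are made up for this example, and the target name `gimlet-c` is taken from the transcript in this thread. By default it only prints the two commands. With `--run` it tries to execute them and assumes `humility` is on `PATH`.

```rust
use std::process::Command;

/// Build the argument lists for the two-step restart described above:
/// fault the task, then release it. (Hypothetical helper.)
fn restart_steps(target: &str, task: &str) -> Vec<Vec<String>> {
    ["--fault", "--release"]
        .iter()
        .map(|&flag| {
            vec!["-t", target, "jefe", flag, task]
                .into_iter()
                .map(String::from)
                .collect::<Vec<String>>()
        })
        .collect()
}

fn main() {
    // Only execute the commands when explicitly asked to.
    let run = std::env::args().any(|a| a == "--run");

    for args in restart_steps("gimlet-c", "hiffy") {
        println!("humility {}", args.join(" "));
        if run {
            match Command::new("humility").args(&args).status() {
                Ok(status) => println!("  exited with {}", status),
                Err(e) => eprintln!("  failed to start humility: {}", e),
            }
        }
    }
}
```

This only automates the workaround. It does nothing about `hiffy` getting wedged in the first place.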