Hesitation/Reboot with USB plugged in but no software connection open #48

Closed
mulcmu opened this issue Dec 30, 2020 · 10 comments

@mulcmu

mulcmu commented Dec 30, 2020

I have encountered hesitation in the printer's motion when printing from the SD card whenever the USB cable is physically attached to a laptop and no software on the laptop has the port open. Normal operation resumes when the cable is unplugged or the serial connection is opened on the laptop. While I was recording a video to post, a reboot occurred, which may be similar to #47. This behavior was repeatable.

Troubleshooting consisted of trying different USB cables, a different port on the laptop, and different computers.

There is an easy workaround for this issue, but this is not the expected behavior.

mpmd_marlin_1.1.x.48.Encoded.mp4
@aegean-odyssey
Owner

Oh my. Before disappointment sets in on me completely, let me thank you for your efforts and assessment. Alas, I think we've a serious bug here.

The timer, USB, serial (LCD display), and endstop interrupts have priority over the stepper and temperature interrupts. When active, the USB transmits on a 1ms tick and flushes up to 64 bytes at a shot. On the buffer-filling side, if the machine fills the 128-byte transmit buffer, it backs off for 10ms when trying to add the 129th byte. So, it really shouldn't ever get "stuck".

The fault seems to be a watchdog timeout. With no red LED, it means that the timeout occurred while the CPU was happily executing code -- something of the infinite loop variety.
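
To make the numbers above concrete, here is a minimal host-side sketch of that transmit path. It is a model only, not the firmware's code: the names `tx_ring_t`, `tx_tick`, and `tx_put` are hypothetical, and the `host_draining` flag is an assumption standing in for whatever the real USB stack does when the host is (or is not) reading. Dropping the byte after the back-off follows the behavior described later in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TX_BUF_SIZE   128u  /* transmit buffer size quoted above */
#define TX_PACKET_MAX 64u   /* most bytes flushed per 1 ms tick */
#define TX_BACKOFF_MS 10u   /* how long a writer waits on a full buffer */

typedef struct {
    uint8_t data[TX_BUF_SIZE];
    size_t  head;   /* next write position */
    size_t  tail;   /* next read position */
    size_t  count;  /* bytes currently queued */
} tx_ring_t;

/* Hypothetical stand-in for the 1 ms tick: copy up to 64 queued bytes into
 * `out` if the host is draining the port. Returns the number of bytes sent. */
size_t tx_tick(tx_ring_t *r, bool host_draining, uint8_t *out)
{
    if (!host_draining)
        return 0;
    size_t n = (r->count < TX_PACKET_MAX) ? r->count : TX_PACKET_MAX;
    for (size_t i = 0; i < n; i++) {
        out[i] = r->data[r->tail];
        r->tail = (r->tail + 1u) % TX_BUF_SIZE;
    }
    r->count -= n;
    return n;
}

/* Queue one byte. If the buffer is full, simulate up to 10 ticks of waiting.
 * Returns false if the buffer was still full and the byte was dropped. */
bool tx_put(tx_ring_t *r, uint8_t b, bool host_draining)
{
    uint8_t scratch[TX_PACKET_MAX];
    for (unsigned ms = 0; r->count == TX_BUF_SIZE && ms < TX_BACKOFF_MS; ms++)
        (void)tx_tick(r, host_draining, scratch);  /* one simulated 1 ms tick */
    if (r->count == TX_BUF_SIZE)
        return false;
    r->data[r->head] = b;
    r->head = (r->head + 1u) % TX_BUF_SIZE;
    r->count++;
    return true;
}
```

In this model, with nothing draining the port, every byte after the 128th costs the full 10 ms back-off before it is dropped, which would show up as hesitation if a lot of output is being generated.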

The watchdog timeout is set at about 4 seconds which is plenty of time for the machine to do what it needs to do. So, I'm thinking that the problem is some kind of excessive interrupt activity.

I tracked down a similar crasher to an overrun error flag in the serial port for the LCD display. All indications pointed to a problem with the USB code, but it was the serial port interrupt that was in a tight loop.

This time I'm thinking that there is a problem with how the USB code connects and disconnects -- perhaps the code does not detect a dropped connection (it's supposed to). I'm not up on my USB protocols, but there seems to be a bit of negotiating and handshaking going on whenever USB establishes a connection -- I don't know what's supposed to happen when a connection suddenly goes missing.

Anyway, I'll see what I can find.

@mulcmu
Author

mulcmu commented Dec 31, 2020

A few other observations that might be relevant:

  • A .gcode file with 25 M503 commands and nothing else causes the reboot as well when printed from SD card.
  • The hesitation/reboot occurs when cable is plugged into a PC without power. So nothing on host side would have attempted to start establishing a connection.
  • Tested on 119r06, same reboot behavior observed.
  • I could not recreate M503 reboot on Marlin4MPMD - v1.3.3
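
For anyone wanting to reproduce the first observation above, a test file like that can be generated with a few lines of C. This is only a sketch: the function name `write_m503_flood` and the output file name are made up for illustration, and the file still has to be copied to the SD card and printed by hand.

```c
#include <stdio.h>

/* Write `count` lines of "M503" to `path`.
 * Returns 0 on success, -1 if the file could not be written. */
int write_m503_flood(const char *path, int count)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    for (int i = 0; i < count; i++) {
        if (fputs("M503\n", f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    return (fclose(f) == 0) ? 0 : -1;
}
```

Calling `write_m503_flood("m503_flood.gcode", 25)` would produce the 25-command file described above.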

@aegean-odyssey
Owner

Thanks. Does seem like it's related to a full transmit (outgoing from printer) buffer. As I recall, Marlin4MPMD uses larger buffers, and a (what seemed at the time, heavy handed) throttling mechanism. And there were comments concerning USB frustration there, too. Long standing issue #5 also may be related. I'm pretty sure it's getting stuck or bogged down in a loop somehow. Now to find it.

Also, the USB interrupt stays in a tight loop until all pending (USB) interrupts are serviced -- very much not the way I would do things, but my "don't loop in the interrupt" rewrite could never establish a connection. So, perhaps the CPU can't respond fast enough otherwise.

Twenty-five (25) M503s -- generates a ton of output with very little on the receive side, and little in the way of a stepper interrupt. Good test. Was a USB cable attached for this test?

@mulcmu
Author

mulcmu commented Dec 31, 2020

The above testing with the M503 output flood was performed in 3 different states: USB unplugged from the printer; USB connected to the printer with the Linux laptop's port closed; and USB connected to the printer with the Linux laptop's port opened. Behavior was normal with the USB unplugged, or connected with the port opened. The reboot was consistently encountered when the cable was connected with the port closed.

I did some more testing with a Windows 10 laptop, and it seems to support your suggestion that something is awry in the negotiating and handshaking going on whenever USB establishes a connection. This was the first time this laptop was connected to the printer, so the STM32 Virtual COM port drivers were not installed.
[Screenshot: win 10, no drivers loaded]
The reboot was not encountered in this condition with the M503 output flood. (The USB cable was connected to the Windows 10 laptop and the printer, the port was not opened, and the proper Windows drivers were not installed.) After installing the STM32 drivers from here, the reboot behavior occurred on this laptop when the cable was connected but the port was not opened in software.

@aegean-odyssey
Owner

Thanks. Does seem like it's related to a full transmit (outgoing from printer) buffer, but I'm not sure. As I recall, Marlin4MPMD uses larger buffers, and a (what seemed at the time, heavy handed) throttling mechanism. I'll look at Marlin4MPMD for some insight. STM's USB driver and HAL in general are a bit of a mess (mostly because it's trying to be "all things to all people"), so it's where most of my suspicions lie.

In your Windows test, the string, "STM32 Virtual ComPort in FS Mode" (Other Devices), is sent by the firmware, I believe, so at least there's some initial handshaking on the USB port. I think things are pointing to the low-level USB driver getting stuck, constantly interrupting until the watchdog timer reboots the machine. I suspect what sets up the situation is queuing up a USB packet to go out when the USB port is not really open, but I've not pinpointed the scenario in the code, yet.

@mulcmu
Author

mulcmu commented Dec 31, 2020

Looks like this stackoverflow discussion might be a good lead.

@aegean-odyssey
Owner

I think you're right.

The firmware already implements some of the strategies discussed and uses a later version of the STM USB library with supposed fixes, but I'm a little leery about the "correctness" of STM's TxState flag. I added a test of it some time ago and it seemed to work, but since the test resides in the transmit (SysTick) interrupt, it can interrupt the USB interrupt, so TxState may not be valid when tested. I may need to follow the locking mechanism the HAL uses in its code. Or check the ep0_state instead.

I'm thinking, too, that I should add code at a higher level that closes and re-initializes the connection when it detects a problem. That point would be when sending a byte: if the TX buffer is full, the firmware waits for 10ms then tries again. If the buffer is still full, the byte is dropped. I could try to re-initialize the connection at this point. Only thing is, it may be too late. The machine is already "stuck".

aegean-odyssey added a commit that referenced this issue Jan 3, 2021
Specifically, if a full transmit buffer does not clear in 10ms, the
firmware clears the buffer and disables its USB output until the USB
host re-establishes a connection. This fix may resolve crashing
issues #47 and #48, and possibly issue #5.
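
The behavior described in this commit message can be sketched as a small state machine. This is a simplified host-side model, not the committed code: it tracks only how many bytes are queued (the payload is omitted), and the names `usb_tx_state_t`, `usb_tx_tick`, `usb_tx_put`, and `usb_tx_on_host_connect`, along with the `host_reading` flag, are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define TX_BUF_SIZE   128u  /* transmit buffer size from the discussion */
#define TX_PACKET_MAX 64u   /* bytes the host can take per 1 ms tick */
#define TX_TIMEOUT_MS 10u   /* how long a full buffer is given to clear */

typedef struct {
    size_t queued;   /* bytes waiting in the transmit buffer */
    bool   enabled;  /* false: output is discarded until the host reconnects */
} usb_tx_state_t;

/* One simulated 1 ms tick: the host takes up to 64 bytes if it is reading. */
void usb_tx_tick(usb_tx_state_t *tx, bool host_reading)
{
    if (host_reading)
        tx->queued -= (tx->queued < TX_PACKET_MAX) ? tx->queued : TX_PACKET_MAX;
}

/* Try to queue one byte; returns true if it was accepted. */
bool usb_tx_put(usb_tx_state_t *tx, bool host_reading)
{
    if (!tx->enabled)
        return false;                 /* output disabled: drop without waiting */
    for (unsigned ms = 0; tx->queued == TX_BUF_SIZE && ms < TX_TIMEOUT_MS; ms++)
        usb_tx_tick(tx, host_reading);
    if (tx->queued == TX_BUF_SIZE) {  /* did not clear within 10 ms */
        tx->queued = 0;               /* clear the buffer */
        tx->enabled = false;          /* stop sending until the host reconnects */
        return false;
    }
    tx->queued++;
    return true;
}

/* Call when the host re-establishes the connection. */
void usb_tx_on_host_connect(usb_tx_state_t *tx)
{
    tx->queued = 0;
    tx->enabled = true;
}
```

The point of the `enabled` flag in this sketch is that the 10ms wait is paid at most once per disconnect, so a flood of output to a closed port no longer stalls on every byte.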
@aegean-odyssey
Owner

@mulcmu , I was able to reproduce the problem here using your "many M503s" test.

The sequence:

  • connect the printer with a terminal program (picoterm)
  • run print job of 50 M503 commands: success
  • disconnect (exit the terminal program)
  • run print job of 50 M503 commands: crash, reboot

Quite reproducible.

With the latest changes, I can repeatedly run the "print job" whether or not the terminal program is connected.

I wish I could identify a cause of the problem more specifically. It is particularly nagging, since the Marlin4MPMD firmware seems to avoid the issue. So as much as I'd like to blame STM's HAL/USB libraries, it's certainly not the entire story. I've been going over the Marlin4MPMD code and the USB implementations are very, very similar.

A few of the differences, Marlin4MPMD:

  • uses an earlier HAL/USB library (but there seems to be no functional difference in the USB code);
  • flushes its transmit buffer if the buffer does not empty completely in 1 second;
  • may be a bit slower, as it does quite a bit of looping and testing;
  • handles transmission and reception together, which may inadvertently synchronize the two;
  • does not implement a "port open" (e.g. DTR like signal) on the connection.

I think the main issue is that mpmd_marlin_1.1.x would pound away with output to a closed USB connection -- expecting USB's error handling protocols to accommodate. And from a little reading over the past few days, that was not a good design choice on my part.
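
As an illustration of the "port open" idea mentioned in the list above, a CDC device can track the host's DTR bit and refuse to queue output while the port is closed. The request code (0x22, SET_CONTROL_LINE_STATE) and the DTR bit position come from the USB CDC PSTN subclass specification; everything else here, including the names `cdc_port_t`, `cdc_on_control_request`, `cdc_on_disconnect`, and `cdc_may_transmit`, is a hypothetical sketch rather than this firmware's code. It also assumes the host's serial driver sets DTR when an application opens the port, which is common but not guaranteed for every terminal program.

```c
#include <stdbool.h>
#include <stdint.h>

#define CDC_SET_CONTROL_LINE_STATE 0x22u  /* CDC PSTN class request code */
#define CDC_LINE_STATE_DTR         0x01u  /* wValue bit 0: DTR asserted */

typedef struct {
    bool port_open;  /* true while the host holds DTR asserted */
} cdc_port_t;

/* Hypothetical hook: call for each class-specific control request,
 * passing the request code (bRequest) and its wValue field. */
void cdc_on_control_request(cdc_port_t *p, uint8_t request, uint16_t w_value)
{
    if (request == CDC_SET_CONTROL_LINE_STATE)
        p->port_open = (w_value & CDC_LINE_STATE_DTR) != 0u;
}

/* Hypothetical hook: call on bus reset or cable disconnect. */
void cdc_on_disconnect(cdc_port_t *p)
{
    p->port_open = false;
}

/* Gate for the transmit path: only queue output while the port is open. */
bool cdc_may_transmit(const cdc_port_t *p)
{
    return p->port_open;
}
```

A transmit routine that checks `cdc_may_transmit()` first would discard output instead of pushing it at a host that has the cable attached but no application listening.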

Hopefully, the problem's fixed.

@mulcmu
Author

mulcmu commented Jan 15, 2021

The changes in the 119r15 release have resolved the observed hesitation and reboots. No other USB-related side effects have been observed after a week of usage.

@mulcmu mulcmu closed this as completed Jan 15, 2021
@aegean-odyssey
Owner

Good to hear. Thanks for your help -- instrumental in resolving this one.
