Adafruit qtpy rp2040 target starts CDC/ACM USB only once - suspect flash timing #401
@lurch the QT Py 2040 uses the winbond 25Q64JVXGIQ chip recommended by the pi foundation |
ItsyBitsy RP2040 enumerates on my Linux host PC with a clock divisor of 2. |
We have one other user who has trouble getting a QT Py RP2040 running CircuitPython to respond to the reset button, but the Itsy works. This sounds similar: https://forums.adafruit.com/viewtopic.php?f=60&t=178640. We will investigate that. Are you running a pico-sdk C program? |
"This ran fine until I hit the reset button. Then I had to hit it 6 times to get it running again." @dhalbert Yeah, it does sound like there might be some marginal timing somewhere that works with the flash chip on some QT Py RP2040 boards but not with the flash chip on other QT Py RP2040 boards? 🤷 |
Hi @dhalbert - Yes, in a pico-sdk C program. The issue manifests in pico-examples 'hello_world', but I usually test in CamelForth. Since it happens with pico-examples hello world (serial USB), I didn't investigate further wrt the code being run. The same code runs fine on Adafruit Feather RP2040, ItsyBitsy RP2040, and Raspberry Pi Pico RP2040. Host PC: Dell Optiplex, about 10 years old, Debian Linux amd64. I suspect it may have slight trouble enumerating USB, as a second device doesn't get enumerated if two CDC/ACM devices are inserted into the USB jack array on the PC chassis. Only /dev/ttyACM0 presents (not /dev/ttyACM1, for example). I don't remember this being a problem in the past with (for example) the Arduino IDE. No issue this morning with enumerating the QT Py RP2040 (/dev/ttyACM0) running a pico-sdk C program, concurrent with enumeration of (and conversation with) a CP2104 Friend (/dev/ttyUSB0), which is talking to an STM32F40x Black Pill on the Black Pill's USART at the same time. Both conversations are CDC/ACM based (text interpreters running Forth). |
This prebuilt UF2 exhibits the issue (does not survive removal of power; runs fine upon UF2 upload, one time only): |
The same flash chip (maybe even from the same reel) is being used on the QT Py and the ItsyBitsy. I don't see any board configuration build differences, so this may be electrical, but it sounds like we might need to adjust some clock speed. |
See Errata RP2040-E5 in the RP2040 datasheet. Easiest fix is to plug the two different RP2040 devices into separate USB hubs. |
Thanks! The content below is optional reading; tangential to this PR. ;) I was able to and flash to both targets (Adafruit Feather RP2040, QTPy RP2040). There doesn't seem to be a preference for one over the other. Both will enumerate, and I'll have both /dev/ttyACM0 and /dev/ttyACM1 available for interactive sessions (minicom, seyon, hyperterm &c.). The one thing I can't do is 'claim' the interface (by invoking minicom, seyon, hyperterm &c.) and then try to enumerate the other target board (by plugging in its USB cable). Both cables must be plugged in, and enumeration verified, before proceeding further. Under those circumstances, I can have concurrent sessions on the two /dev/ttyACM devices without a special USB hub (I don't own any external USB hubs). |
@hathach has been maintaining these board defs. I found 4 was a reliable divisor for CircuitPython. It needs to account for all of the different command speed limits (not just the read speed), and 62.5 MHz can end up over the common limit of 60 MHz for commands. |
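To make the divisor arithmetic concrete, here is a minimal sketch, assuming the default 125 MHz system clock (which is what makes a divisor of 2 come out at the 62.5 MHz quoted above). The helper names `flash_sck_hz` and `flash_sck_within_limit` are hypothetical and not part of pico-sdk; the even-divisor check reflects my understanding that the flash clock divider must be even.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a pico-sdk function): flash SCK frequency
 * for a given system clock and flash clock divisor. */
uint32_t flash_sck_hz(uint32_t clk_sys_hz, uint32_t clkdiv)
{
    /* Assumption: the divisor must be a non-zero even number. */
    assert(clkdiv >= 2u && (clkdiv % 2u) == 0u);
    return clk_sys_hz / clkdiv;
}

/* Returns 1 if the resulting SCK is at or below the given command limit. */
int flash_sck_within_limit(uint32_t clk_sys_hz, uint32_t clkdiv,
                           uint32_t limit_hz)
{
    return flash_sck_hz(clk_sys_hz, clkdiv) <= limit_hz;
}
```

With a 125 MHz system clock, `flash_sck_hz(125000000u, 2u)` gives 62500000 (over a 60 MHz limit) and `flash_sck_hz(125000000u, 4u)` gives 31250000 (well under it).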
Yeah, I also noticed this issue as well when using qtpy as picoprobe. I have also tried to change the
Currently I have no idea why it failed to run; further investigation is needed. |
Something to look into is the default output driver strength. The flash part on my QT Py RP2040 is a Winbond Q64JVXGIQ, which according to Winbond's Rev K datasheet defaults to 25% driver strength on read operations (see pg. 17, section 7.1.6). I'm going to give kicking it up to 50, 75, and 100% a try and will report back. |
Something else to note: I can get my QT Py RP2040 to rock solid operation by switching to boot2_generic. That's not ideal as there's a pretty big performance hit, but it does point to a problem in boot2_w25q080. Since I can't bond a probe to SWCLK/SWDIO (those are some small pins), I'm resorting to using the LED to find where it's hanging up. |
Well that's weird. I was able to get it to work reliably with boot2_generic_03, but after attempting to set read driver strength I seem to have bricked the flash. Guess I'll try bonding to those tiny pins... |
There are a lot of moving parts between changing the SPI speed and USB operations failing. It could be some flash signal integrity issue, or it could be the different cache miss delay bumping against some hidden timing issue in the USB stack or elsewhere. Are you able to reproduce this with any simpler applications (like |
Like @tannewt said, 62.5 MHz is quite high for some flash operations (particularly |
A closer reading of the Winbond datasheet reveals that certain SR bits are marked somewhat cryptically "Volatile / Non-Volatile Writable". What this means in practice is that on reset, these bits are copied from flash into the SR flip-flops. Using two different instructions, Write Enable (06h) or Write Enable for Volatile SR (50h), the programmer can permanently alter the SR bit in flash or temporarily (until the next reset) alter the bit in the flip-flop it's been copied into, respectively. |
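The difference between the two write-enable opcodes can be sketched as below. This only builds the command bytes and does not touch any hardware. The opcodes 06h and 50h come from the comment above; the Write Status Register-3 opcode (11h) is my assumption from the Winbond W25Q-series command set, and `build_sr3_write` is a hypothetical helper name. Check the opcode against the datasheet for your exact part before sending anything to a real chip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_WRITE_ENABLE          0x06u /* persistent: SR write lands in flash  */
#define CMD_WRITE_ENABLE_VOLATILE 0x50u /* temporary: SR write lasts to reset   */
#define CMD_WRITE_SR3             0x11u /* assumption: W25Q-series Write SR-3   */

/* Hypothetical helper: fill out[] with the bytes for a Status Register 3
 * update. out[0] is sent as its own one-byte transaction (chip select
 * released afterwards); out[1] and out[2] form the second transaction.
 * Returns the number of bytes written to out[]. */
size_t build_sr3_write(int persistent, uint8_t sr3_value, uint8_t out[3])
{
    assert(out != NULL);
    out[0] = persistent ? CMD_WRITE_ENABLE : CMD_WRITE_ENABLE_VOLATILE;
    out[1] = CMD_WRITE_SR3;
    out[2] = sr3_value;
    return 3;
}
```

Using the volatile form first is the safer experiment: a bad value disappears at the next reset instead of being stored in the flash.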
I don't know who 'you' is. ;) hello_world in pico-examples exhibits the behavior - that's how I knew not to tear my own code apart looking for a flaw. i.e. pico-examples/hello_world/usb/hello_usb.c |
Sweet success! By setting flash read driver strength to 75% in non-volatile Status Register 3 bits |
Here is the utility I cobbled together to update flash read driver strength: |
We have been seeing some problems with long crystal oscillator startup time on a few samples of Qt Py RP2040. I'm not sure that's related to the problem you're seeing, but try changing this line:
to xosc_hw->startup = startup_delay * 32; // or even * 64 If you are willing to set at least one Qt Py RP2040 that was acting up back to the stock drive strength, and then trying the above, that would be an interesting test. But I don't know why the drive strength should have anything to do with flaky clocking. |
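For a sense of scale, the sketch below estimates how long the crystal oscillator is given to settle for a given multiplier. It assumes a 12 MHz crystal, a startup counter that counts in units of 256 crystal cycles, and a 14-bit counter field; these are my reading of the RP2040 datasheet, and the base formula mirrors what I understand the SDK to compute for `startup_delay`. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define XOSC_STARTUP_MAX 0x3fffu /* assumption: 14-bit startup delay field */

/* Hypothetical helper: startup counter value for a ~1 ms base delay
 * scaled by `multiplier` (1 = stock, 32 or 64 as suggested above). */
uint32_t xosc_startup_count(uint32_t xosc_hz, uint32_t multiplier)
{
    /* Cycles in 1 ms, rounded to the nearest multiple of 256 cycles. */
    uint32_t base = ((xosc_hz / 1000u) + 128u) / 256u;
    uint32_t count = base * multiplier;
    assert(count <= XOSC_STARTUP_MAX); /* must fit in the register field */
    return count;
}

/* Hypothetical helper: resulting delay in microseconds (truncated). */
uint32_t xosc_startup_us(uint32_t xosc_hz, uint32_t count)
{
    return (uint32_t)(((uint64_t)count * 256u * 1000000u) / xosc_hz);
}
```

Under these assumptions the stock setting waits about 1 ms, `* 32` about 32 ms, and `* 64` about 64 ms, all of which still fit in the counter field.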
Will do.
Could be we've got more than one problem in play. What I'm seeing looks more like a signal integrity problem in XIP mode, so adding drive (25% -> 75%) makes sense as a remedy. |
What we actually saw on some Saleae traces was that the SCK signal to the flash was irregular and too fast after the xosc was started and used. So we surmised that the crystal oscillator was having trouble starting up and lengthened the startup delay experimentally. We haven't yet looked at analog traces. |
Stranger and stranger. The QT Py that was failing reliably for me now works with flash read drive at 25% and no additional delay in |
Hmm! I hadn't looked in the bootrom code and (EDIT) Maybe it's simply the extra delay that's being added in We are using a clock divider of 4, and we have our own stage2 boot, templatized and written in C: I'd be very interested in a simple program that just dumped all the NVM parameters in the flash chip. I have several boards that work fine with one date code on the Winbond chip, and one that does not with another date code. I wonder if they have different factory settings. (I think there is also a difference in the datecodes of the RP2040 chips, but they are both B1) Date code on QT Py and other boards without problems: Date code on QT Py with problems: |
Fascinating info on the date codes. I have QT Py's stuffed with 2048 and 2051 date coded flash on hand.
I'll get on it later today. |
So just the unique id is different - oh well! Thanks for checking, in any case! |
I've gone over bootrom and SDK initialization code and have found that initial Also significant is that by the time we're running |
Regarding lack of initialization of |
@Wren6991 you may have comments on initial clocking |
One more data point: The lot code on the Winbond memory is similar to the lot code specified above as 'good' (so these lot codes may be a red herring): When I adjusted the CLK_DIV to 4 as suggested by @wa1tnr, I was able to consistently reset the board once power was applied, but initial startup from USB power was inconsistent. After applying the startup_delay * 32 change suggested by @dhalbert, the initial power-on seems to have been corrected as well. Let me know if there is any more information I can help provide. |
Did you find this out by reading the register? (Since it's not in the datasheet.) |
This makes sense given the symptoms. Our Saleae traces on the SCK of the flash chip show it clocked quite slowly for a while, I assume while stage2 is being read. Then the xosc is turned on in the pre- If instead the bootloader goes into USB mode, either because the boot button was pressed or there was no valid stage, then xosc is turned on and the execution remains in the boot ROM. Any xosc startup issue would probably just delay USB operation briefly. |
See datasheet section 2.16.3, "The 1ms default is sufficient for the RP2040 reference design..." which I verified by setting up a GPIO as a trigger and measuring it using my trusty Rigol. An interesting characteristic of I've brought The way that clock initialization works between
Yes, I believe that helps drain more mystery out of this bug. The bottom line is that some QT Py's have a defect. It may be a wonky crystal or a layout problem. Your fix (increase |
One more thing, I'm convinced that the Winbond flash part is not part of the problem. What's coming out of the PLL is such a mess that it's unreasonable to expect anything using it as a clock to work. |
pico-sdk/src/rp2_common/boot_stage2/boot2_is25lp080.S Lines 100 to 102 in afc10f3
|
@eightycc Figure 117 |
@daveythacher Looked at your references, but I'm not grokking what you're driving at? Figure 117 of the RP2040 datasheet shows a 2:1 ratio ( In the Infineon reference, the pertinent reference for the problem at hand is section 6.3.1 Parasitic RC or LC Oscillation, which matches the problem I've observed with the failing QT Py board closely enough that Adafruit engineers should take note. |
When I sent in the PR, I just figured it was early in development - and I got very lucky on a wild guess.
@tannewt mentioned [the subject matter of] what you just wrote as well: Thanks! |
#457 modifies all of the Adafruit board configs - should this PR do the same? 🤷 |
@daveythacher Mea culpa. After carefully examining the SSI initialization code, I can see that the SSI baud rate will be properly handled through a transition from SDK code to bootrom code and back again. So, why was I seeing a failure with a divider of 2 vs. 4? Bad test hygiene on my part, i.e., building with a locally patched copy of pico-sdk that was picking up the generic stage 2 boot. Going back to a clean checkout of pico-sdk, I can confirm that the QT Py RP2040 works with (1) #457, (2) Humbly, I conclude that this pull is not necessary. Apologies to everyone. |
Changed:
Reference system Linux host PC will not enumerate /dev/ttyACM0
(or related device names) except upon UF2 firmware upload.
UF2 upload is fine and program will run once, until power is removed.
Then, never again.
This patch is meant to allow the (end-user authored) firmware/program
to run repeatedly, including cycling power to the target board.
First (SPI clock divisor) value tried was '4' (from '2').
No other experiments done, in search of the optimal clock divisor.
Not at all sure what the clock divisor 'does'.
Simple guess:
Going from 2 to 4 would 'halve' the clock frequency (divide it
by 2).
Halving the frequency would cause the subsystem to evolve
all events more slowly, which apparently helps in enumeration
of the CDC/ACM device to the host PC.