Raspberry Pi Pico hanging when writing microcontroller.nvm #4867

DigitalMakerCIC · 2021-06-07T14:11:00Z

Firmware

Adafruit CircuitPython 6.3.0 on 2021-06-01; Raspberry Pi Pico with rp2040

REPL

>>> import microcontroller
>>> microcontroller.nvm[0:3]
bytearray(b'\xff\xff\xff')
>>> microcontroller.nvm[0:3] = b'\x00\x01\x02'

Behavior

The system hangs, no error messages

Description

When it hangs, have to unplug the Pico and plug it back in.

DavePutz · 2021-06-07T20:11:08Z

I tried this test on CircuitPython 7.0.0-alpha.2 and it worked correctly.

Neradoc · 2021-06-07T21:04:13Z

I have done a few tests on this, I can definitely reproduce, but it's very inconsistent.
On a raspberry pico freshly erased it seems to happen after a soft reset or a reload or I modify the drive, maybe more than once, maybe while in the REPL, maybe when writing to the same byte. Something like this seems to cause the issue regularly right now on 7:

plug the board
connect to the REPL
press enter (get the >>> prompt)
open, modify, save code.py
type in the REPL:
>>> import microcontroller
>>> microcontroller.nvm[0] = 0
>>> microcontroller.nvm[0] = 0
if it works ok, repeat from step 4

On a (not-freshly-erased) QT PY 2040 I get this without a code.py, simply connecting to the REPL and typing.

> tio /dev/cu.usbmodem144443121
[22:57:28] tio v1.32
[22:57:28] Press ctrl-a q to quit
[22:57:28] Connected
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.

Code done running.

Press any key to enter the REPL. Use CTRL-D to reload.

Adafruit CircuitPython 7.0.0-alpha.3-18-gc4f685aab on 2021-06-07; Adafruit QT Py RP2040 with rp2040
>>> import microcontroller
>>> microcontroller.nvm[0] = 0

(hangs)

eightycc · 2021-06-08T01:07:02Z

This looks like this bug: raspberrypi/pico-sdk#457 that @dhalbert squashed.

This pull fixes a clock startup glitch specific to a subset of QT Py RP2040s. Flash access takes a trip back through SDK initialization code to get XIP going again (it needs to leave XIP to do the flash write) which also restarts xosc, so that's where it's probably happening.

dhalbert · 2021-06-08T13:30:10Z

The xosc fix I did is included in 7.0.0-alpha.3 and later, so I'm not sure if raspberrypi/pico-sdk#457 is still the reason. @Neradoc, was this a local build of yours? If you git submodule update --initd after pulling, then it would be included.

You could try changing the multiplier from 32 to 64 in ports/raspberrypi/sdk/src/rp2_common/hardware_xosc/xosc.c, in xosc_init(). 32 seemed to be more than adequate, but in the fix I sent to raspberrypi/pico-sdk, I made it 64 for safety.

eightycc · 2021-06-08T17:18:12Z

Here are my results testing with 7.0.0-alpha.3 (verified that raspberrypi/pico-sdk#457 was in place):

Board	Result
Feather RP2040	passed
QtPY RP2040 with clock bug	failed
QtPY RP2040 without clock bug	failed
Raspberry Pi Pico	passed

So, raspberrypi/pico-sdk#457 does not fix this bug.

rbrian · 2021-08-02T00:41:40Z

I can confirm that something similar happens on the macropad with alpha 5

FoamyGuy · 2021-11-27T01:05:26Z

I got a backtrace of this issue from a Raspberry Pi Pico using JLink and GDB. I'm not entirely sure what it means or what to do with the information I found but here is a paste of the backtrace in-case some else a little bit more knowledgeable knows what is going on "under the hood here".

Upon crashing the first backtrace I saw was this:

Program received signal SIGTRAP, Trace/breakpoint trap.
data_cpy () at sdk/src/rp2_common/pico_standard_link/crt0.S:275
275         blo data_cpy_loop
(gdb) bt
#0  data_cpy () at sdk/src/rp2_common/pico_standard_link/crt0.S:275
#1  <signal handler called>
#2  0x10029ae8 in supervisor_tick () at ../../supervisor/shared/tick.c:97
#3  0x1003b3c4 in hardware_alarm_irq_handler () at sdk/src/rp2_common/hardware_timer/timer.c:140
#4  <signal handler called>
#5  0x00001786 in ?? ()
#6  0x0000184e in ?? ()
--Type <RET> for more, q to quit, c to continue without paging--
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

From there I used GDB command to make it continue and received a second backtrace pointing to the next line down inside of crt0.S:

0x10000250 in data_cpy_table () at sdk/src/rp2_common/pico_standard_link/crt0.S:276
276         bx lr

(gdb) bt
#0  0x10000250 in data_cpy_table () at sdk/src/rp2_common/pico_standard_link/crt0.S:276
#1  <signal handler called>
#2  0x10029ae8 in supervisor_tick () at ../../supervisor/shared/tick.c:97
#3  0x1003b3c4 in hardware_alarm_irq_handler () at sdk/src/rp2_common/hardware_timer/timer.c:140
#4  <signal handler called>
#5  0x00001786 in ?? ()
#6  0x0000184e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

From there I again used GDB command to make it continue and ctrl-C again, and ended up a third different backtrace:

_entry_point () at sdk/src/rp2_common/pico_standard_link/crt0.S:210
210         bx r2
(gdb) bt
#0  _entry_point () at sdk/src/rp2_common/pico_standard_link/crt0.S:210

I repeated GDB continue and ctrl-c a few more times and got a few more stack traces until eventually finding one that doesn't point inside of crt0.S any more. (However I'm not entirely sure if these ones are as relevant to the problem. This is my first time using GDB debugging so I am wandering around a bit in the dark so to speak).

(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
hold_non_core0_in_bootrom () at sdk/src/rp2_common/pico_standard_link/crt0.S:317
317         bl rom_func_lookup
(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
_reset_handler () at sdk/src/rp2_common/pico_standard_link/crt0.S:243
243         movs r0, #0
(gdb) bt
#0  _reset_handler () at sdk/src/rp2_common/pico_standard_link/crt0.S:243
#1  <signal handler called>
#2  0x10029ae8 in supervisor_tick () at ../../supervisor/shared/tick.c:97
#3  0x1003b3c4 in hardware_alarm_irq_handler () at sdk/src/rp2_common/hardware_timer/timer.c:140
#4  <signal handler called>
#5  0x00001786 in ?? ()
#6  0x0000184e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
platform_entry () at sdk/src/rp2_common/pico_standard_link/crt0.S:259
259         blx r1
(gdb) bt
#0  platform_entry () at sdk/src/rp2_common/pico_standard_link/crt0.S:259
#1  <signal handler called>
#2  0x10029ae8 in supervisor_tick () at ../../supervisor/shared/tick.c:97
#3  0x1003b3c4 in hardware_alarm_irq_handler () at sdk/src/rp2_common/hardware_timer/timer.c:140
#4  <signal handler called>
#5  0x00001786 in ?? ()
#6  0x0000184e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
0x100001e0 in binary_info_header ()
(gdb) bt
#0  0x100001e0 in binary_info_header ()
#1  <signal handler called>
#2  0x10029ae8 in supervisor_tick () at ../../supervisor/shared/tick.c:97
#3  0x1003b3c4 in hardware_alarm_irq_handler () at sdk/src/rp2_common/hardware_timer/timer.c:140
#4  <signal handler called>
#5  0x00001786 in ?? ()
#6  0x0000184e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

I found the file that it's pointing to inside the circuitpython repo at:

<repo root>/ports/raspberrypi/sdk/src/rp2_common/pico_standard_link/crt0.S

tannewt · 2021-11-29T23:16:15Z

I think the issue may be that we shouldn't load any code when writing to flash. So, interrupts should be off during the write. These signal handlers show that this isn't the case.

Since the stack is corrupt (5 and 6 being ??) I'd suggest adding printfs around the hang to determine where the flash write code gets stuck.

FoamyGuy · 2021-11-30T02:45:59Z

I added some print statements inside shared-binding/nvm/ByteArray.c: https://github.com/FoamyGuy/circuitpython/blob/21eb5758e794a7833db37ca8c0819dd06a94c090/shared-bindings/nvm/ByteArray.c#L100-L144

It doesn't always crash on the same line. Here is the output from several test runs showing where the last print that occurred before the crash.

inside type if 119
after big if 139
after check binary_get_size 135
after set src_items 131
after check binary_get_size 135
after check binary_get_size 135
after set src_items 131
after set src_items 131
after check binary_get_size 135
after check binary_get_size 135

Each line represents a different test. I was using the sample code from the nvm docs for the tests: https://circuitpython.readthedocs.io/en/latest/shared-bindings/nvm/index.html?highlight=nvm#nvm.ByteArray

It also does not fail every time. Sometimes the write succeeds and all of my print statements came out. @Neradoc observation about editing code.py does seem consistent. A few times I had several successes in a row and every one it failed on the next attempt to write after saving a change to code.py. In my case code.py is empty except for a single comment I was modifying.

From these results and Scotts ideas I think maybe we need to wrap this section with disabling the interrupts and then re-enabling after:

circuitpython/shared-bindings/nvm/ByteArray.c

Lines 111 to 131 in f8493db

    
           if (mp_obj_is_type(value, &mp_type_array) || 
        
               mp_obj_is_type(value, &mp_type_bytearray) || 
        
               mp_obj_is_type(value, &mp_type_memoryview) || 
        
               mp_obj_is_type(value, &mp_type_bytes)) { 
        
               mp_buffer_info_t bufinfo; 
        
               mp_get_buffer_raise(value, &bufinfo, MP_BUFFER_READ); 
        
               if (bufinfo.len != src_len) { 
        
                   mp_raise_ValueError(translate("Slice and value different lengths.")); 
        
               } 
        
               src_len = bufinfo.len; 
        
               src_items = bufinfo.buf; 
        
               if (1 != mp_binary_get_size('@', bufinfo.typecode, NULL)) { 
        
                   mp_raise_ValueError(translate("Array values should be single bytes.")); 
        
               } 
        
           } else { 
        
               mp_raise_NotImplementedError(translate("array/bytes required on right side")); 
        
           } 
        
           if (!common_hal_nvm_bytearray_set_bytes(self, slice.start, src_items, src_len)) { 
        
               mp_raise_RuntimeError(translate("Unable to write to nvm.")); 
        
           }

Is there an example somewhere else in the core where the interrupts are disabled and re-enabled? I can try it out if I have some code that does this to reference.

tannewt · 2021-11-30T18:46:52Z

Check this out:

circuitpython/ports/raspberrypi/supervisor/internal_flash.c

Lines 134 to 138 in f8493db

    
           // Make sure we don't have an interrupt while we do flash operations. 
        
           common_hal_mcu_disable_interrupts(); 
        
           flash_range_erase(RESERVED_FLASH + sector_offset, SECTOR_SIZE); 
        
           flash_range_program(RESERVED_FLASH + sector_offset, _cache, SECTOR_SIZE); 
        
           common_hal_mcu_enable_interrupts();

You'll want to modify the port specific code in the common-hal folder, not the code in shared-bindings.

dhalbert · 2021-12-05T14:23:22Z

Fixed by #5663. Please try, and let us know. Note this fix is not in "Absolute Newest", because of git branches. Try the artifacts from https://github.com/adafruit/circuitpython/actions/runs/1540091742#artifacts.

DigitalMakerCIC added the bug label Jun 7, 2021

tannewt added the rp2040 Raspberry Pi RP2040 label Jun 7, 2021

tannewt modified the milestones: 6.x.x - Bug Fixes, 7.0.0 Jun 7, 2021

tannewt modified the milestones: 7.0.0, 7.x.x Jul 15, 2021

FoamyGuy mentioned this issue Dec 4, 2021

disable interrupts inside of ports raspberrypi common hal #5663

Merged

dhalbert closed this as completed Dec 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Raspberry Pi Pico hanging when writing microcontroller.nvm #4867

Raspberry Pi Pico hanging when writing microcontroller.nvm #4867

DigitalMakerCIC commented Jun 7, 2021

DavePutz commented Jun 7, 2021

Neradoc commented Jun 7, 2021

eightycc commented Jun 8, 2021 •

edited

Loading

dhalbert commented Jun 8, 2021 •

edited

Loading

eightycc commented Jun 8, 2021

rbrian commented Aug 2, 2021

FoamyGuy commented Nov 27, 2021

tannewt commented Nov 29, 2021

FoamyGuy commented Nov 30, 2021

tannewt commented Nov 30, 2021

dhalbert commented Dec 5, 2021

Raspberry Pi Pico hanging when writing microcontroller.nvm #4867

Raspberry Pi Pico hanging when writing microcontroller.nvm #4867

Comments

DigitalMakerCIC commented Jun 7, 2021

DavePutz commented Jun 7, 2021

Neradoc commented Jun 7, 2021

eightycc commented Jun 8, 2021 • edited Loading

dhalbert commented Jun 8, 2021 • edited Loading

eightycc commented Jun 8, 2021

rbrian commented Aug 2, 2021

FoamyGuy commented Nov 27, 2021

tannewt commented Nov 29, 2021

FoamyGuy commented Nov 30, 2021

tannewt commented Nov 30, 2021

dhalbert commented Dec 5, 2021

eightycc commented Jun 8, 2021 •

edited

Loading

dhalbert commented Jun 8, 2021 •

edited

Loading