Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raspberry Pi Pico hanging when writing microcontroller.nvm #4867

Closed
DigitalMakerCIC opened this issue Jun 7, 2021 · 11 comments
Closed

Raspberry Pi Pico hanging when writing microcontroller.nvm #4867

DigitalMakerCIC opened this issue Jun 7, 2021 · 11 comments
Labels
bug rp2040 Raspberry Pi RP2040
Milestone

Comments

@DigitalMakerCIC
Copy link

Firmware

Adafruit CircuitPython 6.3.0 on 2021-06-01; Raspberry Pi Pico with rp2040

REPL

>>> import microcontroller
>>> microcontroller.nvm[0:3]
bytearray(b'\xff\xff\xff')
>>> microcontroller.nvm[0:3] = b'\x00\x01\x02'

Behavior

The system hangs, no error messages

Description

When it hangs, have to unplug the Pico and plug it back in.

@DavePutz
Copy link
Collaborator

DavePutz commented Jun 7, 2021

I tried this test on CircuitPython 7.0.0-alpha.2 and it worked correctly.

@Neradoc
Copy link

Neradoc commented Jun 7, 2021

I have done a few tests on this, I can definitely reproduce, but it's very inconsistent.
On a raspberry pico freshly erased it seems to happen after a soft reset or a reload or I modify the drive, maybe more than once, maybe while in the REPL, maybe when writing to the same byte. Something like this seems to cause the issue regularly right now on 7:

  1. plug the board
  2. connect to the REPL
  3. press enter (get the >>> prompt)
  4. open, modify, save code.py
  5. type in the REPL:
  6. >>> import microcontroller
  7. >>> microcontroller.nvm[0] = 0
  8. >>> microcontroller.nvm[0] = 0
  9. if it works ok, repeat from step 4

On a (not-freshly-erased) QT PY 2040 I get this without a code.py, simply connecting to the REPL and typing.

> tio /dev/cu.usbmodem144443121
[22:57:28] tio v1.32
[22:57:28] Press ctrl-a q to quit
[22:57:28] Connected
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.

Code done running.

Press any key to enter the REPL. Use CTRL-D to reload.

Adafruit CircuitPython 7.0.0-alpha.3-18-gc4f685aab on 2021-06-07; Adafruit QT Py RP2040 with rp2040
>>> import microcontroller
>>> microcontroller.nvm[0] = 0

(hangs)

@tannewt tannewt added the rp2040 Raspberry Pi RP2040 label Jun 7, 2021
@tannewt tannewt modified the milestones: 6.x.x - Bug Fixes, 7.0.0 Jun 7, 2021
@eightycc
Copy link

eightycc commented Jun 8, 2021

This looks like this bug: raspberrypi/pico-sdk#457 that @dhalbert squashed.

This pull fixes a clock startup glitch specific to a subset of QT Py RP2040s. Flash access takes a trip back through SDK initialization code to get XIP going again (it needs to leave XIP to do the flash write) which also restarts xosc, so that's where it's probably happening.

@dhalbert
Copy link
Collaborator

dhalbert commented Jun 8, 2021

The xosc fix I did is included in 7.0.0-alpha.3 and later, so I'm not sure if raspberrypi/pico-sdk#457 is still the reason. @Neradoc, was this a local build of yours? If you git submodule update --initd after pulling, then it would be included.

You could try changing the multiplier from 32 to 64 in ports/raspberrypi/sdk/src/rp2_common/hardware_xosc/xosc.c, in xosc_init(). 32 seemed to be more than adequate, but in the fix I sent to raspberrypi/pico-sdk, I made it 64 for safety.

@eightycc
Copy link

eightycc commented Jun 8, 2021

Here are my results testing with 7.0.0-alpha.3 (verified that raspberrypi/pico-sdk#457 was in place):

Board Result
Feather RP2040 passed
QtPY RP2040 with clock bug failed
QtPY RP2040 without clock bug failed
Raspberry Pi Pico passed

So, raspberrypi/pico-sdk#457 does not fix this bug.

@tannewt tannewt modified the milestones: 7.0.0, 7.x.x Jul 15, 2021
@rbrian
Copy link

rbrian commented Aug 2, 2021

I can confirm that something similar happens on the macropad with alpha 5

@FoamyGuy
Copy link
Collaborator

I got a backtrace of this issue from a Raspberry Pi Pico using JLink and GDB. I'm not entirely sure what it means or what to do with the information I found but here is a paste of the backtrace in-case some else a little bit more knowledgeable knows what is going on "under the hood here".

Upon crashing the first backtrace I saw was this:

Program received signal SIGTRAP, Trace/breakpoint trap.
data_cpy () at sdk/src/rp2_common/pico_standard_link/crt0.S:275
275         blo data_cpy_loop
(gdb) bt
#0  data_cpy () at sdk/src/rp2_common/pico_standard_link/crt0.S:275
#1  <signal handler called>
#2  0x10029ae8 in supervisor_tick () at ../../supervisor/shared/tick.c:97
#3  0x1003b3c4 in hardware_alarm_irq_handler () at sdk/src/rp2_common/hardware_timer/timer.c:140
#4  <signal handler called>
#5  0x00001786 in ?? ()
#6  0x0000184e in ?? ()
--Type <RET> for more, q to quit, c to continue without paging--
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

From there I used GDB command to make it continue and received a second backtrace pointing to the next line down inside of crt0.S:

0x10000250 in data_cpy_table () at sdk/src/rp2_common/pico_standard_link/crt0.S:276
276         bx lr

(gdb) bt
#0  0x10000250 in data_cpy_table () at sdk/src/rp2_common/pico_standard_link/crt0.S:276
#1  <signal handler called>
#2  0x10029ae8 in supervisor_tick () at ../../supervisor/shared/tick.c:97
#3  0x1003b3c4 in hardware_alarm_irq_handler () at sdk/src/rp2_common/hardware_timer/timer.c:140
#4  <signal handler called>
#5  0x00001786 in ?? ()
#6  0x0000184e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

From there I again used GDB command to make it continue and ctrl-C again, and ended up a third different backtrace:

_entry_point () at sdk/src/rp2_common/pico_standard_link/crt0.S:210
210         bx r2
(gdb) bt
#0  _entry_point () at sdk/src/rp2_common/pico_standard_link/crt0.S:210

I repeated GDB continue and ctrl-c a few more times and got a few more stack traces until eventually finding one that doesn't point inside of crt0.S any more. (However I'm not entirely sure if these ones are as relevant to the problem. This is my first time using GDB debugging so I am wandering around a bit in the dark so to speak).

(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
hold_non_core0_in_bootrom () at sdk/src/rp2_common/pico_standard_link/crt0.S:317
317         bl rom_func_lookup
(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
_reset_handler () at sdk/src/rp2_common/pico_standard_link/crt0.S:243
243         movs r0, #0
(gdb) bt
#0  _reset_handler () at sdk/src/rp2_common/pico_standard_link/crt0.S:243
#1  <signal handler called>
#2  0x10029ae8 in supervisor_tick () at ../../supervisor/shared/tick.c:97
#3  0x1003b3c4 in hardware_alarm_irq_handler () at sdk/src/rp2_common/hardware_timer/timer.c:140
#4  <signal handler called>
#5  0x00001786 in ?? ()
#6  0x0000184e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
platform_entry () at sdk/src/rp2_common/pico_standard_link/crt0.S:259
259         blx r1
(gdb) bt
#0  platform_entry () at sdk/src/rp2_common/pico_standard_link/crt0.S:259
#1  <signal handler called>
#2  0x10029ae8 in supervisor_tick () at ../../supervisor/shared/tick.c:97
#3  0x1003b3c4 in hardware_alarm_irq_handler () at sdk/src/rp2_common/hardware_timer/timer.c:140
#4  <signal handler called>
#5  0x00001786 in ?? ()
#6  0x0000184e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
0x100001e0 in binary_info_header ()
(gdb) bt
#0  0x100001e0 in binary_info_header ()
#1  <signal handler called>
#2  0x10029ae8 in supervisor_tick () at ../../supervisor/shared/tick.c:97
#3  0x1003b3c4 in hardware_alarm_irq_handler () at sdk/src/rp2_common/hardware_timer/timer.c:140
#4  <signal handler called>
#5  0x00001786 in ?? ()
#6  0x0000184e in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

I found the file that it's pointing to inside the circuitpython repo at:

<repo root>/ports/raspberrypi/sdk/src/rp2_common/pico_standard_link/crt0.S

@tannewt
Copy link
Member

tannewt commented Nov 29, 2021

I think the issue may be that we shouldn't load any code when writing to flash. So, interrupts should be off during the write. These signal handlers show that this isn't the case.

Since the stack is corrupt (5 and 6 being ??) I'd suggest adding printfs around the hang to determine where the flash write code gets stuck.

@FoamyGuy
Copy link
Collaborator

I added some print statements inside shared-binding/nvm/ByteArray.c: https://github.com/FoamyGuy/circuitpython/blob/21eb5758e794a7833db37ca8c0819dd06a94c090/shared-bindings/nvm/ByteArray.c#L100-L144

It doesn't always crash on the same line. Here is the output from several test runs showing where the last print that occurred before the crash.

inside type if 119
after big if 139
after check binary_get_size 135
after set src_items 131
after check binary_get_size 135
after check binary_get_size 135
after set src_items 131
after set src_items 131
after check binary_get_size 135
after check binary_get_size 135

Each line represents a different test. I was using the sample code from the nvm docs for the tests: https://circuitpython.readthedocs.io/en/latest/shared-bindings/nvm/index.html?highlight=nvm#nvm.ByteArray

It also does not fail every time. Sometimes the write succeeds and all of my print statements came out. @Neradoc observation about editing code.py does seem consistent. A few times I had several successes in a row and every one it failed on the next attempt to write after saving a change to code.py. In my case code.py is empty except for a single comment I was modifying.

From these results and Scotts ideas I think maybe we need to wrap this section with disabling the interrupts and then re-enabling after:

if (mp_obj_is_type(value, &mp_type_array) ||
mp_obj_is_type(value, &mp_type_bytearray) ||
mp_obj_is_type(value, &mp_type_memoryview) ||
mp_obj_is_type(value, &mp_type_bytes)) {
mp_buffer_info_t bufinfo;
mp_get_buffer_raise(value, &bufinfo, MP_BUFFER_READ);
if (bufinfo.len != src_len) {
mp_raise_ValueError(translate("Slice and value different lengths."));
}
src_len = bufinfo.len;
src_items = bufinfo.buf;
if (1 != mp_binary_get_size('@', bufinfo.typecode, NULL)) {
mp_raise_ValueError(translate("Array values should be single bytes."));
}
} else {
mp_raise_NotImplementedError(translate("array/bytes required on right side"));
}
if (!common_hal_nvm_bytearray_set_bytes(self, slice.start, src_items, src_len)) {
mp_raise_RuntimeError(translate("Unable to write to nvm."));
}

Is there an example somewhere else in the core where the interrupts are disabled and re-enabled? I can try it out if I have some code that does this to reference.

@tannewt
Copy link
Member

tannewt commented Nov 30, 2021

Check this out:

// Make sure we don't have an interrupt while we do flash operations.
common_hal_mcu_disable_interrupts();
flash_range_erase(RESERVED_FLASH + sector_offset, SECTOR_SIZE);
flash_range_program(RESERVED_FLASH + sector_offset, _cache, SECTOR_SIZE);
common_hal_mcu_enable_interrupts();

You'll want to modify the port specific code in the common-hal folder, not the code in shared-bindings.

@dhalbert
Copy link
Collaborator

dhalbert commented Dec 5, 2021

Fixed by #5663. Please try, and let us know. Note this fix is not in "Absolute Newest", because of git branches. Try the artifacts from https://github.com/adafruit/circuitpython/actions/runs/1540091742#artifacts.

@dhalbert dhalbert closed this as completed Dec 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug rp2040 Raspberry Pi RP2040
Projects
None yet
Development

No branches or pull requests

8 participants