-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
usb: Fix USB device library transfer thread safety #896
usb: Fix USB device library transfer thread safety #896
Conversation
f9a85d8
to
3e421b7
Compare
Previous version of this PR had a commit which added locking in the write path, but this doesn't actually solve the problem of writing to a Buffer from multiple threads concurrently. Added a note in the README about this, instead. (@mischif this part isn't related to the bug you submitted, as you're only writing from one producer. Just something I noticed in testing.) |
@mischif are you able to test this fix? If you get a recent MicroPython nightly that includes the other fix, then you should be able to install the usb-device package directly from this branch with:
|
(For the record, while the code in the issue replicates the bug it's not exactly my use case; I do want to write from both threads but I want to write to ttyACM1 on core 0 and ttyACM0 on core 1) I've updated my Pico to run 1.24.0 preview 103, the above usb-device package and the usb-device-cdc code from #886 and can confirm I no longer see either of the errors I included in #874, most importantly the AssertionError that was very common. I have seen this error during testing a couple times, but it doesn't seem to affect my code:
|
@projectgus OK, the initial bug which led me to create the issue seems pretty much solved now, but now I have what I think is a related issue. My code looks something like this: async def api_hdlr():
[...]
elif SCRATCH[2] == CFG_READ:
with open(CONFIG.path, "rb") as f:
SCRATCH[:256] = f.read(256)
print(SCRATCH_DATA[:256], file=SERIAL_TWO, end="\r\n")
[...] The function is run in a loop on a secondary thread; I want to read the whole file but the issue shows with the first 256 bytes. The print statement seems to trigger an exception at various lines but the result is the same:
Let me know if you want me to split this into a new issue, but it may not be as easy to recreate this bug standalone. |
This one is hopefully simple: the stack size of the secondary thread probably isn't big enough. The default stack size for the secondary thread on rp2 is 4096 bytes. You can increase it by calling The transfer callback pushes onto the stack, and async coroutines may push more data on the stack, so when a callback is triggered while a coro is running then the total stack usage is peaking. |
I haven't seen a RuntimeError since increasing the stack size so I think that was the issue; now I have another issue that may or may not be related: async def api_hdlr():
[...]
elif SCRATCH[2] == CFG_READ:
chunk_size = 256
with open(CONFIG.path, "rb") as f:
cfg_left = f.seek(0, 2)
f.seek(0)
while cfg_left:
SCRATCH[:chunk_size] = f.read(chunk_size)
SERIAL_TWO.write(SCRATCH[:chunk_size])
cfg_left -= chunk_size
if cfg_left < chunk_size:
chunk_size = cfg_left
await sleep_ms(20)
[...] This only writes ~70% of the file out over serial, I get 55034 bytes of a 77341 byte file; reducing the chunk size to as low as 64 or increasing the sleep time to 750ms so the buffer has more time to flush doesn't change the amount written, and calling Is there some other way to flush the serial buffer, or figure out what's going on? |
@mischif Not sure about this one, unfortunately. One thing I'd note is that if the timeout is 0 then you need to check the result of |
@projectgus Turns out the incomplete writes were because of an interaction with my testing code; after fixing that everything works great so I think this issue is good to close. |
if callable(cb): | ||
cb(ep_addr, result, xferred_bytes) | ||
try: | ||
if callable(cb): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check for callable
really necessary? Would a check against None
be sufficient? That would be more efficient.
Then maybe even:
if cb:
self._cb_thread = get_ident()
self._cb_ep = ep_addr # Track while callback is running
try:
cb(...)
finally:
self._cb_ep = None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value of cb
should be True
if a transfer is pending but no callback is set (so it can be tested for truthiness elsewhere), or will be the callback function otherwise.
Have put in a comment to make this clearer.
(As an aside, this is the kind of thing that typing annotations really help with!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't the cb
be None
in some cases, due to the get(ep_addr, None)
above?
I was also thinking that the try/finally (which is expensive) only needs to be done if the cb is going to be called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't the cb be None in some cases, due to the get(ep_addr, None) above?
If it's None here then it's some kind of semi-error case because TinyUSB is giving us a callback for a transfer we didn't initiate. It might happen if the device configuration is being changed, but it's not something which happens normally..
I was also thinking that the try/finally (which is expensive) only needs to be done if the cb is going to be called.
Oh,right. I guess there's a version of this which is:
if callable(cb):
try:
...
I think 90% of cases (and especially performance critical cases) there will be a callback here, though.
I'll update the comment again, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Updated, realised moving the if
outside the try...finally
block is fiddly because we always want the code in the finally to run.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you could also do if not callable(cb): return
near the start of this function, to avoid most of the work if there's no callback. But if the callback is there in a large proportion of cases, then the way it is now is fine.
The USB pending transfer flag was cleared before calling the completion callback, to allow the callback code to call submit_xfer() again. Unfortunately this isn't safe in a multi-threaded environment, as another thread may see the endpoint is available before the callback is done executing and submit a new transfer. Rather than adding extra locking, specifically treat the transfer as still pending if checked from another thread while the callback is executing. Closes micropython#874 Signed-off-by: Angus Gratton <angus@redyak.com.au>
This is to replace a commit which added locking here but caused some other problems. The idea behind the Buffer class is that a single producer can call pend_write() more than once and it's idempotent, however this is very complex to extend across multiple threads. Signed-off-by: Angus Gratton <angus@redyak.com.au>
f3f4349
to
73c7487
Compare
Signed-off-by: Angus Gratton <angus@redyak.com.au>
73c7487
to
c61ca51
Compare
The USB pending transfer flag was cleared before calling the completion callback, to allow the callback code to call submit_xfer() again.
Unfortunately this isn't safe in a multi-threaded environment, as another thread may see the endpoint is available before the callback is done executing and submit a new transfer.
Rather than adding extra locking, specifically treat the transfer as still pending if checked from another thread while the callback is executing.
Together with micropython/micropython#15264 this closes #874.
Tested using the sample code from the linked issue.
This work was funded through GitHub Sponsors.