kernel panic - Spinlock issue / "BUG: scheduling while atomic" #251

Open
HinTak opened this issue Oct 14, 2020 · 40 comments

@HinTak
Contributor

HinTak commented Oct 14, 2020

First reported in #246 (comment); hasn't been tackled in #249.

@Pillar1989
Contributor

@HinTak This does sacrifice some performance, but there is no better way. Do you have any suggestions?

@HinTak
Contributor Author

HinTak commented Oct 14, 2020

@Pillar1989 the injector octo driver has an option to run the clock continuously (except in power-saving mode under the kernel's power management), rather than starting and stopping it with each stream. This might work around the spinlock issue, and has the added advantage of fixing channel sync (where the 8 channels shift by two). The disadvantage is possibly higher power consumption.

Anyway, that code just looks wrong - read chapter 5 of the linux device driver book. It is freely available online.

cc @j1nx

@HinTak
Contributor Author

HinTak commented Oct 14, 2020

@Pillar1989 it is also NOT merely a performance issue. Every time the kernel's exception mechanism is triggered, there is no guarantee that the internal state of the driver is consistent.

@turmary
Contributor

turmary commented Oct 14, 2020

@HinTak the injector octo sound card has 4 GPIOs to control the sample rate, and also the clock start (gpios != 0b0000) / stop (gpios = 0b0000).
I think it will also sometimes get the channel order wrong with the option non_stop_clocks set to 1.

We don't have the same hardware design,
so we have to start/stop the clock in function XXX_trigger via I2C access to sync the channel order.
I know I2C access is too long a path, which causes the problem "BUG: scheduling while atomic".

Do you have any idea about dealing with these limitations?
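
One common way around this constraint, sketched here as a userspace Python model rather than driver code: the trigger callback only records the request while the lock is held, and the slow bus access runs later from a context that is allowed to sleep. All names (`FakeCodec`, `trigger_direct`, `trigger_deferred`, `run_pending`) are hypothetical and not taken from the respeaker driver; in a real driver the deferred part would typically be a workqueue item.

```python
from collections import deque


class SchedulingWhileAtomic(RuntimeError):
    """Stands in for the kernel's "BUG: scheduling while atomic" report."""


class FakeCodec:
    """Toy model of a codec whose clock is switched over a slow bus."""

    def __init__(self):
        self.in_atomic = False      # True while the simulated spinlock is held
        self.clock_running = False
        self.pending = deque()      # requests recorded by trigger_deferred()

    def i2c_set_clock(self, on):
        # A real I2C transfer may sleep, which is not allowed in atomic context.
        if self.in_atomic:
            raise SchedulingWhileAtomic("slow bus access with lock held")
        self.clock_running = on

    def trigger_direct(self, start):
        # Problematic pattern: bus access directly inside the atomic section.
        self.in_atomic = True
        try:
            self.i2c_set_clock(start)
        finally:
            self.in_atomic = False

    def trigger_deferred(self, start):
        # Deferred pattern: only queue the request while the lock is held.
        self.in_atomic = True
        try:
            self.pending.append(start)
        finally:
            self.in_atomic = False

    def run_pending(self):
        # Runs outside the atomic section, so the bus access is allowed.
        while self.pending:
            self.i2c_set_clock(self.pending.popleft())
```

The cost of deferring is that the clock changes slightly after the trigger returns, which may matter for channel sync; whether that is acceptable on this hardware is not established in this thread.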

@HinTak
Contributor Author

HinTak commented Oct 14, 2020

@turmary - I think the respeaker driver may be starting / stopping the clock too often. The "BUG: scheduling while atomic" is partly due to that, so changing it to work more like the Octo card can help. There is also the question of why the respeaker driver needs to hammer on the I2S so often, and then has usleep() in a few places to slow it down! (Instead, the driver can cache responses from the hardware and shield the hardware from too-frequent access from alsa, without using usleep().)
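
The caching idea can be illustrated with a small userspace Python model (not driver code). `FakeBus` and `RegisterCache` are made-up names for this sketch; it assumes that most registers only change when the driver writes them, and that the few which change on their own are listed as volatile. In the kernel, the regmap layer provides register caching along these lines.

```python
class FakeBus:
    """Stand-in for a slow hardware bus; counts every access."""

    def __init__(self):
        self.regs = {}
        self.reads = 0
        self.writes = 0

    def read(self, reg):
        self.reads += 1
        return self.regs.get(reg, 0)

    def write(self, reg, value):
        self.writes += 1
        self.regs[reg] = value


class RegisterCache:
    """Serves repeated reads from memory and skips redundant writes."""

    def __init__(self, bus, volatile=()):
        self.bus = bus
        self.volatile = set(volatile)   # registers the hardware may change itself
        self.cache = {}

    def read(self, reg):
        if reg in self.volatile:
            return self.bus.read(reg)   # always ask the hardware
        if reg not in self.cache:
            self.cache[reg] = self.bus.read(reg)
        return self.cache[reg]

    def write(self, reg, value):
        if reg not in self.volatile and self.cache.get(reg) == value:
            return                      # hardware already holds this value
        self.bus.write(reg, value)
        if reg not in self.volatile:
            self.cache[reg] = value
```

With this in place, repeated queries from the sound layer hit the bus once per register instead of once per query, so there is nothing left to slow down with sleeps.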

Yes, that octo card option is mainly for channel +2/-2 shifts, which the respeaker also suffers from, I think, in one of the closed-without-resolving bugs.

@HinTak
Contributor Author

HinTak commented Oct 20, 2020

Just remembered this: raspberrypi/linux#3580, likely related as both concern scheduling.

@HinTak HinTak changed the title Spinlock issue Spinlock issue / "BUG: scheduling while atomic" Oct 31, 2020
@HinTak
Contributor Author

HinTak commented Oct 31, 2020

This seems to make the mic-array quite unusable - I can get my Pi (headless Ubuntu 20.04.1) to crash by ssh'ing into it. I guess the sshd daemon is sufficiently high-priority that it definitely causes context switches, and so whenever I ssh into the Pi, it immediately dumps a whole lot of critical kernel logs to my console window and crashes.

@Pillar1989
Contributor

@HinTak We hope to solve this problem internally, but solving it will introduce other problems. In addition, we have learned from the supplier that the AC108 is also facing EOL. We will focus more on the V2 version, and we will consider using TI's multi-channel Audio ADC. I hope to have a perfect, or at least satisfying, solution in V2.

@Pillar1989
Contributor

In addition, I also hope you can make suggestions on the chip selection for the new design. The V2 version will be completely open-source in both hardware and software. @HinTak I hope that the design of the V2 version will be a community choice, with Seeed just helping to make it happen.

@HinTak
Contributor Author

HinTak commented Nov 17, 2020

@Pillar1989 FWIW, if you can't/won't fix problems with V1, from the customers' point of view, it is hard to justify buying V2. To put it bluntly: I may not even want to spend time on it, even if you send V2 to me free!

@Pillar1989
Contributor

We will definitely fix this before V2 is released, because V2 has this problem too: our hardware will have it whenever we use 8-channel PCM signals. @HinTak

@HinTak
Contributor Author

HinTak commented Dec 4, 2020

@Pillar1989 by the way, what's your estimate of the timescale for arrival of v2 (prototype or shipped product)? 3 months, 6 months, 1 year, 2 years?

@HinTak HinTak changed the title Spinlock issue / "BUG: scheduling while atomic" kernel panic - Spinlock issue / "BUG: scheduling while atomic" Feb 6, 2021
@HinTak
Contributor Author

HinTak commented Feb 6, 2021

Thanks to one of you side-tracking me onto berryboot, I managed to track down the specific kernel config which interacts badly with the respeaker driver to cause kernel panics and crashes. The bad news is that, unfortunately, it is the default for Ubuntu (both 32-bit and 64-bit) and for 64-bit Raspbian. So, out of the 4 combinations of ubuntu / raspbian x 32-bit / 64-bit, the only way to avoid the crash without using a non-distro kernel is to stay with raspbian 32-bit.
cc @j1nx @younes-professor @Daenara @tomh05 @joshuajaharwood @h4de5 @faaafo @Tom-Lu @lxne

On the other hand, I really want to have a non-crashing ubuntu 64-bit instance, so I made a largely-compatible set of ubuntu kernel packages for the most recent ubuntu focal (20.04); instructions and downloads at https://github.com/HinTak/RaspberryPi-Dev/releases/tag/Ubuntu-raspi-5.4.0-1028.31 . I am likely to upgrade to Ubuntu groovy (20.10) 64-bit soon, so a Ubuntu groovy 64-bit set of kernel packages will likely be made at some point, but I am only going to keep two SD cards ("current" ubuntu 64-bit and raspbian 32-bit). Those of you who want to use ubuntu 32-bit or raspbian 64-bit without being crashed by the seeed respeaker driver are a bit out of luck. I could be persuaded to do a set of ubuntu 32-bit kernel packages, especially if you click donate at https://hintak.github.io ... but even the raspbian people admit that trying to build the raspbian kernel package their way (with small tweaks) is hard, so I'd need a lot of persuasion to attempt to build "largely compatible" 64-bit raspbian kernel packages...

@turmary @Pillar1989 How's Seeed Studio staff getting on with fixing this bug? And the progress with the V2 hardware and driver? This basically confirms that the respeaker driver causes crashes on everything except raspbian 32-bit. That's quite limiting, especially given that people want to use the more recent / powerful Pis in 64-bit mode...

@HinTak HinTak mentioned this issue Feb 6, 2021
@HinTak
Contributor Author

HinTak commented Feb 6, 2021

The earlier tagged release was just the previous attempt - https://github.com/HinTak/RaspberryPi-Dev/tree/Ubuntu-raspi-5.4.0-1026.29 - enough time had passed between my attempts that the Ubuntu people had released another kernel. The first one took over 5 hours of build time with two interruptions; the 1028.31 one took over 4 hours in one continuous run. So I'm not likely to do it too often (except a one-off after I upgrade to groovy).

@HinTak
Contributor Author

HinTak commented Feb 9, 2021

@Daenara okay... I have had about 25 years with linux on the desktop/laptop, so for me the pi is mostly just smaller and different hardware; if I make a mistake or otherwise want to try dangerous things, I just take the SD card out, insert it into my laptop, look at the logs, make some changes and put it back into the pi. I can imagine things are a bit different if you don't/can't fix things by playing with the SD card's content elsewhere. But if you do want to downgrade to have a more functional system, let me know. Hopefully the new card arrives soon and/or seeed staffers shape up...

@HinTak
Contributor Author

HinTak commented Feb 10, 2021

@Daenara forgot to ask: while you are recording (I mean comparing before and after), is there anything unusual in dmesg? You likely have a few "i2s errors" too. I am also interested in how often they happen, so the timestamps in front of them are useful too, say, while you are recording for perhaps 30s / a minute.
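
A small Python sketch for collecting those timestamps from saved `dmesg` output. It assumes the default `[seconds.microseconds]` line prefix and simply matches lines containing "i2s" (case-insensitive); the exact wording of the error message is an assumption, and the function names are made up for this example.

```python
import re

# Matches the default dmesg prefix, e.g. "[  123.456789] some message".
_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def i2s_error_times(lines, needle="i2s"):
    """Return timestamps (seconds since boot) of lines mentioning `needle`."""
    times = []
    for line in lines:
        match = _LINE.match(line)
        if match and needle.lower() in match.group(2).lower():
            times.append(float(match.group(1)))
    return times


def gaps(times):
    """Return the spacing in seconds between consecutive timestamps."""
    return [round(later - earlier, 6) for earlier, later in zip(times, times[1:])]
```

For example, after saving the log with `dmesg > log.txt`, calling `i2s_error_times(open("log.txt"))` gives the error timestamps, and passing that list to `gaps()` shows whether the errors arrive in bursts or are spread out over the recording.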

@Daenara

Daenara commented Feb 12, 2021

@HinTak I just got around to testing this evening; with only one channel working I got exactly 1 sync error per recording. I did 3 recordings of roughly 15 min each (needed to record stuff for voice AI training anyway) and one of roughly a minute. When the error happens seems to be random: for one file it was at the start, for another it was shortly before I stopped the recording, and the short one had it in the middle. Here is that part of the log:
image

I did not test with all channels working, if you think that can help debug then I can do that this weekend also.

@HinTak
Contributor Author

HinTak commented Feb 13, 2021

@Daenara thanks. That's interesting - on the 6-mics, I get a lot more i2s errors, and they always come in groups of 5-6, not just from recording but also from running aplay -l/arecord -l to query the devices. I don't think a new recording with all channels is needed - you only turned down the volume, right? It is curious, as the data is just a packed group of 4 (i.e. you don't have access to individual mics).

My order of a 2-mics and 4-mics was placed just over 2 months ago... Until/unless it arrives, I wonder if I can use just half of the 6-mics device with the 4-mics driver? Cc @turmary @Pillar1989 regarding the order and the possibility of driving half of the 6-mics device with the 4-mics driver.

@Daenara

Daenara commented Feb 13, 2021

Yes, I just turned the volume down. But even one channel still seems to be hit and miss: I just trained a new model, wanted to test it, and now I have noise on my one channel as well. Let's see how many reboots it needs before it works again.

@HinTak
Contributor Author

HinTak commented Feb 18, 2021

I mentioned earlier that I'd try to make a work-around kernel for Ubuntu groovy (20.10), so here it is: https://github.com/HinTak/RaspberryPi-Dev/releases/tag/Ubuntu-raspi-5.8.0-1013.16 . Sorry, I know 1015.18 / 1016.19 are already out...

cc @j1nx @younes-professor @Daenara @tomh05 @joshuajaharwood @h4de5 @faaafo @Tom-Lu @lxne

I'll write about how it was built at some point. Known differences against the genuine 5.8.0-1013.16 from Ubuntu (besides not being crashed / kernel-panicked by this bug in the respeaker driver) are: the CEC GPIO driver, ZFS and the experimental BPF-based packet filtering framework. They are either incompatible with the work-around (CEC GPIO and BPF), or out-of-tree (ZFS). The CEC driver seems to be for doing GPIO on special HDMI hardware and most people don't need it; the BPF packet filter is supposed to be still experimental, so the real loss is ZFS... but I'm a lot happier to be operational on ubuntu and not be crashed by the respeaker driver. I can be persuaded to try to make ZFS work, if some of you need it...

@j1nx

j1nx commented Feb 18, 2021

@HinTak Thx for the ping. Have been busy with other stuff, but I will see if I can dig into this soon. You are doing great work, so you deserve the feedback and the donation (which I have still forgotten).

Without having a deep look as of yet, are the kernel patches in there or do I have to take a look at your repo?

@HinTak
Contributor Author

HinTak commented Feb 18, 2021

@j1nx the full story is complicated. The seeed studio repo is up to date to v5.4. However, raspbian moved from v5.4 to v5.10 at the beginning of February (2-3 weeks ago). Ubuntu 20.04 LTS is v5.4 based, but ubuntu 20.10 is v5.8 based. So you still need my repo: the v5.8 branch for v5.8, and the v5.9 branch for v5.9/v5.10 kernels, for up-to-date distributions. Ubuntu 20.04 LTS is still "current" as it is an LTS (long term support) release, so it looks like that might be the best choice for not worrying about misc routine upgrades breaking things.
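
The mapping described above, written out as a small Python helper. It encodes only what this comment states (seeed studio repo up to v5.4, the v5.8 branch for v5.8, the v5.9 branch for v5.9/v5.10) and returns `None` for anything else; the function name and the returned labels are made up for this sketch, and treating kernels older than v5.4 as covered by the seeed studio repo is an assumption.

```python
def driver_source_for(kernel_release):
    """Map a release string in "X.Y.Z..." form (e.g. from `uname -r`) to a driver source."""
    major, minor = (int(part) for part in kernel_release.split(".")[:2])
    version = (major, minor)
    if version <= (5, 4):
        # Assumption: "up to date to v5.4" also covers older kernels.
        return "seeed studio repo"
    if version == (5, 8):
        return "HinTak repo, v5.8 branch"
    if version in ((5, 9), (5, 10)):
        return "HinTak repo, v5.9 branch"
    return None  # not covered by the comment above
```

For instance, `driver_source_for("5.8.0-1013-raspi")` picks the v5.8 branch, while a v5.6 kernel yields `None` because the comment does not say which source fits it.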

The ubuntu kernel builds I posted earlier above are for working around this rather serious bug; the actual fix would involve a serious rewrite of part of the driver's logic, which I don't have the time for, and I also lack chip/IC documentation (I don't want to register at their supplier's to get those, as a matter of principle...). @Daenara, on the 4-mics device, seems to have a rather serious issue with newer kernels too - I am supposed to have such a device real soon (I put in the order just over 2 months ago), but until/unless it arrives, there is not much I can do.

@HinTak
Contributor Author

HinTak commented Jul 4, 2021

@AIWintermuteAI btw, you probably should know that the 4-mics/6-mics respeaker devices crash 64-bit Raspberry Pi OS.

I have a work-around for 64-bit ubuntu, but it requires rebuilding the kernel:
https://github.com/HinTak/RaspberryPi-Dev/releases/tag/Ubuntu-raspi-5.8.0-1013.16 .

Rebuilding the 64-bit kernel package in a way compatible with Raspberry Pi OS is hard / undocumented. (You can build the kernel alright, but it is very difficult to build the Raspberry Pi OS kernel deb packages the official way.)

@HinTak
Contributor Author

HinTak commented Jul 4, 2021

@AIWintermuteAI I am commenting on this since you updated the Readme to say "64-bit Raspberry Pi OS" is officially supported :(

@ghost

ghost commented Jul 23, 2021

@HinTak ,
This action was performed automatically.
Please describe the issue according to bug template - if the issue was resolved, ignore this message. The issue will be marked as closed in 7 days if inactive.

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Platform
What platform are you running the code on.

  • Device: [e.g. Raspberry Pi 4]
  • OS: [e.g. Raspbian OS 32bit kernel version ...]
  • Version/commit number [e.g. d1816f5]

Relevant log output
Please copy and paste any relevant log output.

@HinTak
Contributor Author

HinTak commented Jul 27, 2021

@AIWintermuteAI I already pinged you on this - the driver code will cause 64-bit raspbian to crash (and 32-bit / 64-bit ubuntu too). Workaround further up. Just try it on 64-bit raspbian with anything except the 2-mics device and you'll see.

@AIWintermuteAI
Contributor

Hello @HinTak !
Yes, thank you for the reminder. Currently we're working through the issues starting from the freshest ones and then going down to the staler ones. We will get to the 64-bit issue in time as well. Currently this is how resources are allocated to tech support on the reSpeaker project.

@HinTak
Contributor Author

HinTak commented Jul 28, 2021

@AIWintermuteAI btw, many of the issues (including quite specifically this one) are related to the X-power chips / code, so you should test with the 4-mics/6-mics devices.

As noted in a few other reports, the 2-mics device has a different vendor and is even supposed to work with the latest upstream kernel out-of-the-box, without needing to compile and install anything from here.

@hellow554

hellow554 commented Feb 3, 2022

Sadly this is still an issue on the 64-bit version of raspberry pi OS along with the respeaker 4-mic array.

@AIWintermuteAI you removed your assignment on this, does it mean this isn't likely to be fixed? :/

@HinTak
Contributor Author

HinTak commented Feb 3, 2022 via email

@thetravellor

The 6.5 branch installs and works with the current 64 bit bullseye kernel for raspbian 11, using 2 mic pi hat.

Linux openvpn 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux
