WARNING: CPU: 0 PID: 142 at kernel/dma/mapping.c:148 dma_map_page_attrs #198
Comments
https://github.com/shenki/linux/blob/dev-5.9/kernel/dma/mapping.c#L148
So either there is no dma mask because it doesn't point to the correct dma device, or the dma mask now needs to be set up explicitly in the platform for the device because it's no longer the default.
@geissonator saw this with v5.10. When attempting to debug this the issue goes away, so there is some aspect of timing or build configuration that affects reproducibility.
69c9e82 fixes a similar-looking warning. I suspect we're setting this on the wrong device. Reproducing it reliably is still difficult, which makes it hard to test.
@shenki I observed the USB DMA issue on a 5.10-based kernel and was able to recreate it with the AST2600 eval kit: whenever I connect the USB cable to my host setup (laptop/server), the dma warnings are triggered through the USB interrupt, which in turn leads to a crash. The error logs observed in my setup are shown below:
We have exactly the same issue on 6687842 on g220a.
Also encountering this on my e3c246d4i port (100% reproducible so far).
I'm seeing a similar backtrace when moving to kernel 5.10, and I have applied the 1-line patch below in my tree to fix it. So far it works for Facebook AST2400 and AST2500 platforms, but I didn't get a chance to try it on AST2600. facebook/openbmc-linux@83153bd#diff-d44e373511270207f6d3dc1f1081834d0b7cd91ffa086c86a999a3dd753cc744
Thanks @tao-ren. I think this indicates the kernel is using the wrong device pointer somewhere along the line. I've been reading through the code trying to work out where this is. Can someone boot using CONFIG_DMA_API_DEBUG and reproduce the issue?
Full dmesg with
My feeling was: the "dummy" device (one for each port, created in "ast_vhub_init_dev") was not fully initialized (no dma mask). The driver used to work, but it fails after commit f959dcd ("dma-direct: Fix potential NULL pointer dereference").
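To make the hypothesis above concrete, here is a minimal user-space sketch of the failure mode: a child device created without a DMA mask is rejected by the mapping call, and borrowing the parent's mask makes the mapping succeed. This is a toy model, not kernel code and not the patch from this thread; `toy_device`, `toy_dma_map`, `toy_inherit_dma_mask` and `TOY_DMA_MAPPING_ERROR` are hypothetical names, and it assumes the warning comes from a check that the device has a DMA mask set.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct device: only the
 * fields relevant to this discussion are modelled. */
struct toy_device {
    const char *name;
    struct toy_device *parent;
    uint64_t *dma_mask;            /* NULL means "no DMA mask configured" */
};

/* Hypothetical error value returned when a mapping is refused. */
#define TOY_DMA_MAPPING_ERROR (~(uint64_t)0)

/* Models the assumed guard: refuse to map when the device has no DMA
 * mask, or when the address does not fit within the mask. */
uint64_t toy_dma_map(const struct toy_device *dev, uint64_t phys_addr)
{
    if (dev->dma_mask == NULL)
        return TOY_DMA_MAPPING_ERROR;   /* this is where a warning would fire */
    if (phys_addr > *dev->dma_mask)
        return TOY_DMA_MAPPING_ERROR;
    return phys_addr;                   /* identity mapping in this toy model */
}

/* Models the kind of fix discussed: the per-port "dummy" device
 * borrows its parent's DMA mask pointer. */
void toy_inherit_dma_mask(struct toy_device *child)
{
    if (child->parent != NULL)
        child->dma_mask = child->parent->dma_mask;
}
```

In this model a port device whose `dma_mask` is NULL always gets `TOY_DMA_MAPPING_ERROR` back, no matter how valid the buffer address is. That matches the symptom described: the parent is configured correctly, but the mapping is attempted against the child.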
Inspecting the code, this is our call stack:
The dma buffer is mapped with
So we need to know, does
Following Tao's thread, the fake dummy device has its parent set to
@tao-ren Can you send your patch to the upstream mailing list? cc the people in this thread, and those from the dma-direct patch you linked to. If it's not the right thing to do I'm sure we will be told!
I can't speak to whether or not it's entirely the right fix, but experimentally I've just tested @tao-ren's patch, and with it applied I no longer see the aspeed-vhub WARN() and the openbmc iKVM's K and M components are functional. (I still get the ftgmac100
I wanted to learn more before sending out the patch but didn't get a chance to do so (struggling with ast2600 these days). Sure, I will send the patch to the upstream list soon.
Tried the patch on g220a and it works great!
FYI I just sent the patch to the upstream list. I couldn't cc all the people in the thread because I don't have all the email addresses, but I added the openbmc email alias, and you can search for "[PATCH] usb: gadget: aspeed: set port_dev dma mask". Cheers,
[ Upstream commit 74be987 ]

KASAN + DEBUG_KOBJECT_RELEASE reports a potential use-after-free in cxl_decoder_release() where it goes to reference its parent, a cxl_port, to free its id back to port->decoder_ida.

    BUG: KASAN: use-after-free in to_cxl_port+0x18/0x90 [cxl_core]
    Read of size 8 at addr ffff888119270908 by task kworker/35:2/379
    CPU: 35 PID: 379 Comm: kworker/35:2 Tainted: G OE 5.17.0-rc2+ #198
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
    Workqueue: events kobject_delayed_cleanup
    Call Trace:
    <TASK>
    dump_stack_lvl+0x59/0x73
    print_address_description.constprop.0+0x1f/0x150
    ? to_cxl_port+0x18/0x90 [cxl_core]
    kasan_report.cold+0x83/0xdf
    ? to_cxl_port+0x18/0x90 [cxl_core]
    to_cxl_port+0x18/0x90 [cxl_core]
    cxl_decoder_release+0x2a/0x60 [cxl_core]
    device_release+0x5f/0x100
    kobject_cleanup+0x80/0x1c0

The device core only guarantees parent lifetime until all children are unregistered. If a child needs a parent to complete its ->release() callback that child needs to hold a reference to extend the lifetime of the parent.

Fixes: 40ba17a ("cxl/acpi: Introduce cxl_decoder objects")
Reported-by: Ben Widawsky <ben.widawsky@intel.com>
Tested-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Ben Widawsky <ben.widawsky@intel.com>
Link: https://lore.kernel.org/r/164505751190.4175768.13324905271463416712.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
dev-5.9 (wip) on witherspoon: