-
Notifications
You must be signed in to change notification settings - Fork 430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
binder benchmark causes kernel BUG
on Raspberry Pi
#365
Comments
Thanks Sven! Could you try it on the C version of binder? (Either way we should not allow a user space process to affect the kernel this way, but as a data point, it would be good to know if the ping client/server work on the C version of binder; I believe this is the first time they run on a 32-bit arch.) |
First time I ran the client, I get "illegal instruction"?
Afterwards, I do get some kind of failure:
Maybe I need to switch on selftests in |
When selftest is activated in
|
I apologize for running this on 32-bit architecture (which is not that important in the grand scheme of things). However, the real-world devices I have and am most familiar with, are 32-bit arm. |
Finding and reporting bugs is never something to apologize for! :) |
Yes, but I think this (edited: I am now going to try on a 64-bit arm Raspberry Pi 3) |
I tried on a 64-bit arm Raspberry Pi 3:
I get the same result with |
If it helps, here are the results of the C binder selftest on arm64:
|
Cause of
EDIT: replacing the subtraction with |
C binder has same self tests that run in the kernel and they seem to be failing. I think it's better to disable them for now. |
I have seen this before, in my case it had to do with different alignment when compiling structs in kernel and user spaces (due to different compiler settings/configs). Since the size of structs is encoded as part of the ioctl number, it would result in ioctl numbers that are unknown to the kernel. Let me take a look. |
I can confirm that nothing changes on arm or arm64 after applying #366 :( |
Commit wedsonaf@5baa6d7 fixes an underflow bug.(I haven't yet spent any time trying to understand why it doesn't trigger on x86-64.) With it I can run the benchmark on aarch64 qemu (emulated), on both versions of Binder. https://github.com/wedsonaf/linux/tree/ping has other commits: static building (to run on a stripped-down busybox build) and specifying the number of iterations (given it's emulated, 1M iterations was taking too long). @TheSven73 would be able to try this? If it still fails (and it seems like it should, given that you were getting a different error), would you be able to run your image on qemu to see if it works there? I'd like to understand if real hardware is behaving differently from qemu and exposing another bug. |
@wedsonaf it's working now for arm 32 and 64 bit. The 32-bit "Illegal Instruction" was an unrelated issue - the defaults of my arm toolchain are too "advanced" for the Rasperry Pi Zero's arm1176 cpu. Once I tweaked that, the ping test came to life. On arm64, I was getting So all good 💯 thanks so much for the awesome fix 👍 I've probably ran out of time for the benchmarks, will defer that until tomorrow. (ETA: it's probably a bug that a 32-bit userspace binder client can't interface with a 64-bit binder kernel, but I'm guessing no-one at Google is using, or planning to use that combo. so all good.) |
Thanks for the update, Sven! Great to hear they seem to be working now.
No rush.
I do care :) I think it's related to the alignment/size of structs used in ioctls. Even on 32-bit builds, Binder expects "pointers" ( @nbdd0121 whenever you find the time, I'd appreciate if you could try this latest version too. (Given the variety of archs you all have, I'm eager to finish my functional test conversion so that we get more coverage :)) |
It works now. |
Thanks for trying it out! Much appreciated. |
So now the On my end, I will benchmark arm 32-bit - and check if the mutex |
FYI, the reason I wasn't seeing this on x86-64 was that I didn't have After I enabled it, I could reproduce the issue even on x86-64. |
Perhaps this provides an excellent opportunity to consider the question of what would happen if we are in mainline, a Rust driver has an inadvertent overflow, and mushroom-clouds the whole kernel. Especially in light of "call BUG() when not absolutely necessary, and the fiery wrath of Linus Torvalds himself will descend on you" :) (ETA of course that should go into #283, and is a discussion for another day) |
In production I would assume that |
Kees Cook indicated that the production default should be saturation followed by a call to |
Saturation is not an option, since that's not what Rust itself -- the
choices are two's complement overflow or a panic.
…On Thu, Jun 10, 2021 at 10:28 AM Sven Van Asbroeck ***@***.***> wrote:
Kees Cook indicated that the production default should be saturation
followed by a call to WARN(). Except if overridden explicitly.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#365 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAAGBEYCYKDA54LIY4DPDLTSDDZLANCNFSM46KIB7JA>
.
--
All that is necessary for evil to succeed is for good people to do nothing.
|
I was wondering about that -- good to know it works as expected! |
I think an option to call a certain function on overflow and then return the two's complement overflow anyway would make sense to have in rust. This option will likely need to be unsafe though as the function may be called in the middle of unsafe code that temporarily broke a safety invariant. If the overflow function behaves nicely this option is indistinguishable from wrapping around on overflow from the perspective of the code that caused the overflow and as such shouldn't be a breaking change. |
Yeah, it should be doable even if Rust does not provide it right now. The option could even be two orthogonal ones:
Or even let the function called return the value to use on overflow. |
That would break existing code that assumes that overflows either panic or wrap as specified by RFC 560. |
Yeah, it would break existing code in the sense that one cannot blindly use the new mode, but it would not break existing projects, i.e. the ones that already use the existing modes. |
The delete set command does not rely on the transaction object for element removal, therefore, a combination of delete element + delete set from the abort path could result in restoring twice the refcount of the mapping. Check for inactive element in the next generation for the delete element command in the abort path, skip restoring state if next generation bit has been already cleared. This is similar to the activate logic using the set walk iterator. [ 6170.286929] ------------[ cut here ]------------ [ 6170.286939] WARNING: CPU: 6 PID: 790302 at net/netfilter/nf_tables_api.c:2086 nf_tables_chain_destroy+0x1f7/0x220 [nf_tables] [ 6170.287071] Modules linked in: [...] [ 6170.287633] CPU: 6 PID: 790302 Comm: kworker/6:2 Not tainted 6.9.0-rc3+ #365 [ 6170.287768] RIP: 0010:nf_tables_chain_destroy+0x1f7/0x220 [nf_tables] [ 6170.287886] Code: df 48 8d 7d 58 e8 69 2e 3b df 48 8b 7d 58 e8 80 1b 37 df 48 8d 7d 68 e8 57 2e 3b df 48 8b 7d 68 e8 6e 1b 37 df 48 89 ef eb c4 <0f> 0b 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 0f [ 6170.287895] RSP: 0018:ffff888134b8fd08 EFLAGS: 00010202 [ 6170.287904] RAX: 0000000000000001 RBX: ffff888125bffb28 RCX: dffffc0000000000 [ 6170.287912] RDX: 0000000000000003 RSI: ffffffffa20298ab RDI: ffff88811ebe4750 [ 6170.287919] RBP: ffff88811ebe4700 R08: ffff88838e812650 R09: fffffbfff0623a55 [ 6170.287926] R10: ffffffff8311d2af R11: 0000000000000001 R12: ffff888125bffb10 [ 6170.287933] R13: ffff888125bffb10 R14: dead000000000122 R15: dead000000000100 [ 6170.287940] FS: 0000000000000000(0000) GS:ffff888390b00000(0000) knlGS:0000000000000000 [ 6170.287948] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6170.287955] CR2: 00007fd31fc00710 CR3: 0000000133f60004 CR4: 00000000001706f0 [ 6170.287962] Call Trace: [ 6170.287967] <TASK> [ 6170.287973] ? __warn+0x9f/0x1a0 [ 6170.287986] ? nf_tables_chain_destroy+0x1f7/0x220 [nf_tables] [ 6170.288092] ? report_bug+0x1b1/0x1e0 [ 6170.287986] ? nf_tables_chain_destroy+0x1f7/0x220 [nf_tables] [ 6170.288092] ? report_bug+0x1b1/0x1e0 [ 6170.288104] ? handle_bug+0x3c/0x70 [ 6170.288112] ? exc_invalid_op+0x17/0x40 [ 6170.288120] ? asm_exc_invalid_op+0x1a/0x20 [ 6170.288132] ? nf_tables_chain_destroy+0x2b/0x220 [nf_tables] [ 6170.288243] ? nf_tables_chain_destroy+0x1f7/0x220 [nf_tables] [ 6170.288366] ? nf_tables_chain_destroy+0x2b/0x220 [nf_tables] [ 6170.288483] nf_tables_trans_destroy_work+0x588/0x590 [nf_tables] Fixes: 5910544 ("netfilter: nf_tables: revisit chain/object refcounting from elements") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
…ent path Check for table dormant flag otherwise netdev release event path tries to unregister an already unregistered hook. [524854.857999] ------------[ cut here ]------------ [524854.858010] WARNING: CPU: 0 PID: 3386599 at net/netfilter/core.c:501 __nf_unregister_net_hook+0x21a/0x260 [...] [524854.858848] CPU: 0 PID: 3386599 Comm: kworker/u32:2 Not tainted 6.9.0-rc3+ #365 [524854.858869] Workqueue: netns cleanup_net [524854.858886] RIP: 0010:__nf_unregister_net_hook+0x21a/0x260 [524854.858903] Code: 24 e8 aa 73 83 ff 48 63 43 1c 83 f8 01 0f 85 3d ff ff ff e8 98 d1 f0 ff 48 8b 3c 24 e8 8f 73 83 ff 48 63 43 1c e9 26 ff ff ff <0f> 0b 48 83 c4 18 48 c7 c7 00 68 e9 82 5b 5d 41 5c 41 5d 41 5e 41 [524854.858914] RSP: 0018:ffff8881e36d79e0 EFLAGS: 00010246 [524854.858926] RAX: 0000000000000000 RBX: ffff8881339ae790 RCX: ffffffff81ba524a [524854.858936] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881c8a16438 [524854.858945] RBP: ffff8881c8a16438 R08: 0000000000000001 R09: ffffed103c6daf34 [524854.858954] R10: ffff8881e36d79a7 R11: 0000000000000000 R12: 0000000000000005 [524854.858962] R13: ffff8881c8a16000 R14: 0000000000000000 R15: ffff8881351b5a00 [524854.858971] FS: 0000000000000000(0000) GS:ffff888390800000(0000) knlGS:0000000000000000 [524854.858982] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [524854.858991] CR2: 00007fc9be0f16f4 CR3: 00000001437cc004 CR4: 00000000001706f0 [524854.859000] Call Trace: [524854.859006] <TASK> [524854.859013] ? __warn+0x9f/0x1a0 [524854.859027] ? __nf_unregister_net_hook+0x21a/0x260 [524854.859044] ? report_bug+0x1b1/0x1e0 [524854.859060] ? handle_bug+0x3c/0x70 [524854.859071] ? exc_invalid_op+0x17/0x40 [524854.859083] ? asm_exc_invalid_op+0x1a/0x20 [524854.859100] ? __nf_unregister_net_hook+0x6a/0x260 [524854.859116] ? __nf_unregister_net_hook+0x21a/0x260 [524854.859135] nf_tables_netdev_event+0x337/0x390 [nf_tables] [524854.859304] ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables] [524854.859461] ? packet_notifier+0xb3/0x360 [524854.859476] ? _raw_spin_unlock_irqrestore+0x11/0x40 [524854.859489] ? dcbnl_netdevice_event+0x35/0x140 [524854.859507] ? __pfx_nf_tables_netdev_event+0x10/0x10 [nf_tables] [524854.859661] notifier_call_chain+0x7d/0x140 [524854.859677] unregister_netdevice_many_notify+0x5e1/0xae0 Fixes: d54725c ("netfilter: nf_tables: support for multiple devices per netdev hook") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
@wedsonaf
(this Issue doesn't really belong here, but I can't open an Issue on wedson's fork)
#ifdef
in Error.rs file #346 (comment)$ make bcm2835_rust_defconfig
then switch on binder indefconfig
./ping_server /dev/rust_binder &
works great!./ping_client /dev/rust_binder
oopses:The text was updated successfully, but these errors were encountered: