KASAN: slab-out-of-bounds in ncsi_rsp_handler_sma #146

Closed
shenki opened this issue Mar 22, 2018 · 3 comments

shenki commented Mar 22, 2018

e156398
v4.16-rc6-119-ge156398bfcad

Seen on Joel's experimental 4.16 tree, on a QEMU romulus machine. It also reproduces on Romulus hardware.

[   32.662953] ftgmac100 1e660000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19
[   38.111190] ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
[   38.117326] ftgmac100 1e660000.ethernet eth0: no vlan ids left to set
[   38.131543] ==================================================================
[   38.153464] BUG: KASAN: slab-out-of-bounds in ncsi_rsp_handler_sma+0x15c/0x270
[   38.155622] Write of size 6 at addr 97ff0628 by task kworker/0:1/213
[   38.156769] 
[   38.158546] CPU: 0 PID: 213 Comm: kworker/0:1 Not tainted 4.16.0-rc6-00118-g671c39af8e7d-dirty #269
[   38.159874] Hardware name: Generic DT based system
[   38.161708] Workqueue: events ncsi_dev_work
[   38.164177] [<80016978>] (unwind_backtrace) from [<80012af8>] (show_stack+0x20/0x24)
[   38.164859] [<80012af8>] (show_stack) from [<80929cfc>] (dump_stack+0x20/0x28)
[   38.165588] [<80929cfc>] (dump_stack) from [<8022cfd0>] (print_address_description+0x5c/0x32c)
[   38.166441] [<8022cfd0>] (print_address_description) from [<8022d590>] (kasan_report+0x14c/0x3a4)
[   38.167355] [<8022d590>] (kasan_report) from [<8022b724>] (check_memory_region+0xa0/0x19c)
[   38.168122] [<8022b724>] (check_memory_region) from [<8022bc28>] (memcpy+0x44/0x58)
[   38.168806] [<8022bc28>] (memcpy) from [<8091cc30>] (ncsi_rsp_handler_sma+0x15c/0x270)
[   38.169420] [<8091cc30>] (ncsi_rsp_handler_sma) from [<8091d8f0>] (ncsi_rcv_rsp+0x294/0x48c)
[   38.170098] [<8091d8f0>] (ncsi_rcv_rsp) from [<806ebe64>] (__netif_receive_skb_core+0xc44/0x1368)
[   38.170849] [<806ebe64>] (__netif_receive_skb_core) from [<806ed4ec>] (__netif_receive_skb+0x28/0x148)
[   38.171541] [<806ed4ec>] (__netif_receive_skb) from [<806f582c>] (netif_receive_skb_internal+0x38/0x130)
[   38.172252] [<806f582c>] (netif_receive_skb_internal) from [<806f7020>] (netif_receive_skb+0x34/0x104)
[   38.173039] [<806f7020>] (netif_receive_skb) from [<805b41b0>] (ftgmac100_poll+0x734/0xb88)
[   38.173703] [<805b41b0>] (ftgmac100_poll) from [<806f85a4>] (net_rx_action+0x210/0x7c4)
[   38.174394] [<806f85a4>] (net_rx_action) from [<8000a488>] (__do_softirq+0x178/0x744)
[   38.174984] [<8000a488>] (__do_softirq) from [<80034178>] (do_softirq.part.6+0x5c/0x6c)
[   38.175577] [<80034178>] (do_softirq.part.6) from [<80034298>] (__local_bh_enable_ip+0x110/0x1c0)
[   38.176280] [<80034298>] (__local_bh_enable_ip) from [<806f40d8>] (__dev_queue_xmit+0x364/0xc80)
[   38.177039] [<806f40d8>] (__dev_queue_xmit) from [<806f4a10>] (dev_queue_xmit+0x1c/0x20)
[   38.177763] [<806f4a10>] (dev_queue_xmit) from [<80919f2c>] (ncsi_xmit_cmd+0x380/0x518)
[   38.178431] [<80919f2c>] (ncsi_xmit_cmd) from [<809217e0>] (ncsi_configure_channel+0x530/0xc84)
[   38.179066] [<809217e0>] (ncsi_configure_channel) from [<80922d10>] (ncsi_dev_work+0xe4/0x964)
[   38.179677] [<80922d10>] (ncsi_dev_work) from [<8005a67c>] (process_one_work+0x3a4/0xa54)
[   38.180398] [<8005a67c>] (process_one_work) from [<8005add8>] (worker_thread+0xac/0xb50)
[   38.181069] [<8005add8>] (worker_thread) from [<800668c0>] (kthread+0x24c/0x35c)
[   38.181783] [<800668c0>] (kthread) from [<800090f0>] (ret_from_fork+0x14/0x24)
[   38.182420] Exception stack(0x976abfb0 to 0x976abff8)
[   38.183141] bfa0:                                     00000000 00000000 00000000 00000000
[   38.183933] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   38.184657] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   38.185205] 
[   38.185541] Allocated by task 7:
[   38.186115]  kasan_kmalloc+0xd4/0x174
[   38.186452]  __kmalloc+0xe4/0x250
[   38.186757]  ncsi_rsp_handler_gc+0x1fc/0x37c
[   38.187105]  ncsi_rcv_rsp+0x294/0x48c
[   38.187426]  __netif_receive_skb_core+0xc44/0x1368
[   38.187809]  __netif_receive_skb+0x28/0x148
[   38.188146]  netif_receive_skb_internal+0x38/0x130
[   38.188518]  netif_receive_skb+0x34/0x104
[   38.188847]  ftgmac100_poll+0x734/0xb88
[   38.189161]  net_rx_action+0x210/0x7c4
[   38.189475]  __do_softirq+0x178/0x744
[   38.189781] 
[   38.189945] Freed by task 1:
[   38.190214]  __kasan_slab_free+0x110/0x1ec
[   38.190546]  kasan_slab_free+0x14/0x18
[   38.190854]  kfree+0x7c/0x180
[   38.191120]  do_copy+0x70/0x160
[   38.191396]  write_buffer+0x84/0xa4
[   38.191688]  flush_buffer+0x40/0xcc
[   38.191992]  unxz+0x1ec/0x34c
[   38.192260]  unpack_to_rootfs+0x258/0x4ec
[   38.192590]  populate_rootfs+0x68/0x118
[   38.192943]  do_one_initcall+0x15c/0x260
[   38.193262]  kernel_init_freeable+0x2a4/0x388
[   38.193610]  kernel_init+0x1c/0x124
[   38.193901]  ret_from_fork+0x14/0x24
[   38.194378]    (null)
[   38.194581] 
[   38.194769] The buggy address belongs to the object at 97ff0600
[   38.194769]  which belongs to the cache kmalloc-32 of size 32
[   38.195612] The buggy address is located 8 bytes to the right of
[   38.195612]  32-byte region [97ff0600, 97ff0620)
[   38.196294] The buggy address belongs to the page:
[   38.196788] page:9fefae00 count:1 mapcount:0 mapping:97ff0000 index:0x97ff0fc1
[   38.197474] flags: 0x100(slab)
[   38.198200] raw: 00000100 97ff0000 97ff0fc1 00000032 00000001 9fef86d4 9fee7bb4 97c00620
[   38.198846] page dumped because: kasan: bad access detected
[   38.199271] 
[   38.199456] Memory state around the buggy address:
[   38.200084]  97ff0500: 00 04 fc fc fc fc fc fc 00 00 00 04 fc fc fc fc
[   38.200597]  97ff0580: 00 00 00 04 fc fc fc fc 00 00 00 fc fc fc fc fc
[   38.201094] >97ff0600: 00 00 00 04 fc fc fc fc 03 fc fc fc fc fc fc fc
[   38.201631]                           ^
[   38.201957]  97ff0680: 00 00 00 04 fc fc fc fc 00 00 04 fc fc fc fc fc
[   38.202431]  97ff0700: 05 fc fc fc fc fc fc fc 03 fc fc fc fc fc fc fc
[   38.202924] ==================================================================
[   38.203465] Disabling lock debugging due to kernel taint
[   39.243568] ftgmac100 1e660000.ethernet eth0: NCSI: channel 0 config done
[   39.243917] ftgmac100 1e660000.ethernet eth0: NCSI: No more channels to process
[   39.244139] ftgmac100 1e660000.ethernet eth0: NCSI interface up
#0  kasan_report (addr=5, size=1912594629, is_write=198, ip=1) at mm/kasan/report.c:398
No locals.
#1  0x8022b724 in check_memory_region_inline (ret_ip=<optimized out>, write=<optimized out>, size=<optimized out>, addr=<optimized out>)
    at mm/kasan/kasan.c:260
No locals.
#2  check_memory_region (addr=2550072872, size=6, write=true, ret_ip=2157038640) at mm/kasan/kasan.c:274
No locals.
#3  0x8022bc28 in memcpy (dest=0x97ff0628, src=0x9769da80, len=6) at mm/kasan/kasan.c:310
No locals.
#4  0x8091cc30 in ncsi_rsp_handler_sma (nr=0x9769da86) at net/ncsi/ncsi-rsp.c:459
        ndp = 0x97ff0610
        nc = 0x97441b00
        ncf = 0x97ff0610
        bitmap = 0x6
#5  0x8091d8f0 in ncsi_rcv_rsp (skb=0x932ac700, dev=0x6, pt=0x1, orig_dev=0x8091cc30 <ncsi_rsp_handler_sma+348>) at net/ncsi/ncsi-rsp.c:1040
        nd = 0x97738020
        nr = 0x97738310
        payload = 6
        i = -1754037536
        ret = 0
#6  0x806ebe64 in __netif_receive_skb_core (skb=0x932ac700, pfmemalloc=6) at net/core/dev.c:4554
        pt_prev = 0x9773ac78
        orig_dev = 0x97763180
        ret = 1
#7  0x806ed4ec in __netif_receive_skb (skb=0x932ac700) at net/core/dev.c:4619
        ret = -1825913088
#8  0x806f582c in netif_receive_skb_internal (skb=0x932ac700) at net/core/dev.c:4693
        ret = -1744894424
#9  0x806f7020 in netif_receive_skb (skb=0x932ac700) at net/core/dev.c:4717
No locals.
#10 0x805b41b0 in ftgmac100_rx_packet (processed=<optimized out>, priv=<optimized out>) at drivers/net/ethernet/faraday/ftgmac100.c:575
        pointer = 63624
        rxdes = 0xa09ba0e0
        map = 2446730530
#11 ftgmac100_poll (napi=0x97763658, budget=6) at drivers/net/ethernet/faraday/ftgmac100.c:1328
        work_done = 0
#12 0x806f85a4 in napi_poll (repoll=<optimized out>, n=<optimized out>) at net/core/dev.c:5697
        work = 1
        weight = 64
        __warned = false
        __print_once = false
#13 net_rx_action (h=0x97ff0628) at net/core/dev.c:5763
        list = {next = 0x976abba0, prev = 0x976abba0}
        repoll = {next = 0x976abba8, prev = 0x976abba8}
#14 0x8000a488 in __do_softirq () at kernel/softirq.c:285
        vec_nr = 3
        pending = 8
#15 0x80034178 in do_softirq_own_stack () at ./include/linux/interrupt.h:499
No locals.
#16 do_softirq () at kernel/softirq.c:329
No locals.
#17 0x80034298 in do_softirq () at kernel/softirq.c:321
No locals.
#18 __local_bh_enable_ip (ip=2550072872, cnt=2096896) at kernel/softirq.c:182
No locals.
#19 0x806f40d8 in local_bh_enable () at ./include/linux/bottom_half.h:32
No locals.
#20 rcu_read_unlock_bh () at ./include/linux/rcupdate.h:726
No locals.
#21 __dev_queue_xmit (skb=0x0, accel_priv=0x6) at net/core/dev.c:3576
        dev = 0x97763180
        txq = 0x93611240
        rc = 0
#22 0x806f4a10 in dev_queue_xmit (skb=0x97ff0628) at net/core/dev.c:3582
No locals.
#23 0x80919f2c in ncsi_xmit_cmd (nca=0x9769da62) at net/ncsi/ncsi-cmd.c:348
        eh = 0x9769da62
        i = -1754037476
#24 0x809217e0 in ncsi_configure_channel (ndp=0x97738020) at net/ncsi/ncsi-manage.c:904
        np = 0x68954245
        nc = 0x976abdd2
        hot_nc = 0x97763180
        nca = <incomplete type>
#25 0x80922d10 in ncsi_dev_work (work=0x9773ac68) at net/ncsi/ncsi-manage.c:1288
No locals.
#26 0x8005a67c in process_one_work (worker=0x9764a400, work=0x9773ac68) at kernel/workqueue.c:2113
        pool = 0x80bab0fc <cpu_worker_pools>
#27 0x8005add8 in worker_thread (__worker=0x9764a400) at kernel/workqueue.c:2247
        pool = 0x80bab0fc <cpu_worker_pools>
#28 0x800668c0 in kthread (_create=0x9764b560) at kernel/kthread.c:238
        threadfn = 0x8005ad2c <worker_thread>
        data = 0x9764a400
        ret = -1744975904

sammj commented Apr 5, 2018

It looks like there are two triggers here: handling an SMA response and handling an SVF response. I've only triggered the SVF case so far, but I suspect both have the same root cause. NCSI has a generic concept of filters, which it uses for both MAC addresses and VLAN IDs, storing either kind of data in a u32 data[] buffer. It stores differently sized types in this buffer and allocates it accordingly; it looks like it gets the size calculation slightly wrong and writes into unallocated memory just past the end of the buffer.
I've used this as a chance to finish off a refactor of the filtering code that I already had going, and so far it appears to avoid this error. Patches to come.

sammj commented Jun 21, 2018

shenki pushed a commit that referenced this issue Jul 30, 2018
[ Upstream commit ff907a1 ]

syzbot caught a NULL deref [1], caused by skb_segment()

skb_segment() has many "goto err;" that assume the @err variable
contains -ENOMEM.

A successful call to __skb_linearize() should not clear @err,
otherwise a subsequent memory allocation error could return NULL.

While we are at it, we might use -EINVAL instead of -ENOMEM when
MAX_SKB_FRAGS limit is reached.

[1]
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
CPU: 0 PID: 13285 Comm: syz-executor3 Not tainted 4.18.0-rc4+ #146
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:tcp_gso_segment+0x3dc/0x1780 net/ipv4/tcp_offload.c:106
Code: f0 ff ff 0f 87 1c fd ff ff e8 00 88 0b fb 48 8b 75 d0 48 b9 00 00 00 00 00 fc ff df 48 8d be 90 00 00 00 48 89 f8 48 c1 e8 03 <0f> b6 14 08 48 8d 86 94 00 00 00 48 89 c6 83 e0 07 48 c1 ee 03 0f
RSP: 0018:ffff88019b7fd060 EFLAGS: 00010206
RAX: 0000000000000012 RBX: 0000000000000020 RCX: dffffc0000000000
RDX: 0000000000040000 RSI: 0000000000000000 RDI: 0000000000000090
RBP: ffff88019b7fd0f0 R08: ffff88019510e0c0 R09: ffffed003b5c46d6
R10: ffffed003b5c46d6 R11: ffff8801dae236b3 R12: 0000000000000001
R13: ffff8801d6c581f4 R14: 0000000000000000 R15: ffff8801d6c58128
FS:  00007fcae64d6700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004e8664 CR3: 00000001b669b000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 tcp4_gso_segment+0x1c3/0x440 net/ipv4/tcp_offload.c:54
 inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342
 inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342
 skb_mac_gso_segment+0x3b5/0x740 net/core/dev.c:2792
 __skb_gso_segment+0x3c3/0x880 net/core/dev.c:2865
 skb_gso_segment include/linux/netdevice.h:4099 [inline]
 validate_xmit_skb+0x640/0xf30 net/core/dev.c:3104
 __dev_queue_xmit+0xc14/0x3910 net/core/dev.c:3561
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
 neigh_hh_output include/net/neighbour.h:473 [inline]
 neigh_output include/net/neighbour.h:481 [inline]
 ip_finish_output2+0x1063/0x1860 net/ipv4/ip_output.c:229
 ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317
 NF_HOOK_COND include/linux/netfilter.h:276 [inline]
 ip_output+0x223/0x880 net/ipv4/ip_output.c:405
 dst_output include/net/dst.h:444 [inline]
 ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
 iptunnel_xmit+0x567/0x850 net/ipv4/ip_tunnel_core.c:91
 ip_tunnel_xmit+0x1598/0x3af1 net/ipv4/ip_tunnel.c:778
 ipip_tunnel_xmit+0x264/0x2c0 net/ipv4/ipip.c:308
 __netdev_start_xmit include/linux/netdevice.h:4148 [inline]
 netdev_start_xmit include/linux/netdevice.h:4157 [inline]
 xmit_one net/core/dev.c:3034 [inline]
 dev_hard_start_xmit+0x26c/0xc30 net/core/dev.c:3050
 __dev_queue_xmit+0x29ef/0x3910 net/core/dev.c:3569
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
 neigh_direct_output+0x15/0x20 net/core/neighbour.c:1403
 neigh_output include/net/neighbour.h:483 [inline]
 ip_finish_output2+0xa67/0x1860 net/ipv4/ip_output.c:229
 ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317
 NF_HOOK_COND include/linux/netfilter.h:276 [inline]
 ip_output+0x223/0x880 net/ipv4/ip_output.c:405
 dst_output include/net/dst.h:444 [inline]
 ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
 ip_queue_xmit+0x9df/0x1f80 net/ipv4/ip_output.c:504
 tcp_transmit_skb+0x1bf9/0x3f10 net/ipv4/tcp_output.c:1168
 tcp_write_xmit+0x1641/0x5c20 net/ipv4/tcp_output.c:2363
 __tcp_push_pending_frames+0xb2/0x290 net/ipv4/tcp_output.c:2536
 tcp_push+0x638/0x8c0 net/ipv4/tcp.c:735
 tcp_sendmsg_locked+0x2ec5/0x3f00 net/ipv4/tcp.c:1410
 tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1447
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:641 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:651
 __sys_sendto+0x3d7/0x670 net/socket.c:1797
 __do_sys_sendto net/socket.c:1809 [inline]
 __se_sys_sendto net/socket.c:1805 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1805
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455ab9
Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fcae64d5c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007fcae64d66d4 RCX: 0000000000455ab9
RDX: 0000000000000001 RSI: 0000000020000200 RDI: 0000000000000013
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
R13: 00000000004c1145 R14: 00000000004d1818 R15: 0000000000000006
Modules linked in:
Dumping ftrace buffer:
   (ftrace buffer empty)

Fixes: ddff00d ("net: Move skb_has_shared_frag check out of GRE code and into segmentation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

shenki commented Jul 30, 2018

Thanks @sammj !

@shenki shenki closed this as completed Jul 30, 2018
shenki pushed a commit that referenced this issue Apr 4, 2019
[ Upstream commit 398f013 ]

Since commit fc62814 ("net/packet: fix 4gb buffer limit due to overflow check")
one can now allocate packet ring buffers >= UINT_MAX. However, syzkaller
found that that triggers a warning:

[   21.100000] WARNING: CPU: 2 PID: 2075 at mm/page_alloc.c:4584 __alloc_pages_nod0
[   21.101490] Modules linked in:
[   21.101921] CPU: 2 PID: 2075 Comm: syz-executor.0 Not tainted 5.0.0 #146
[   21.102784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
[   21.103887] RIP: 0010:__alloc_pages_nodemask+0x2a0/0x630
[   21.104640] Code: fe ff ff 65 48 8b 04 25 c0 de 01 00 48 05 90 0f 00 00 41 bd 01 00 00 00 48 89 44 24 48 e9 9c fe 3
[   21.107121] RSP: 0018:ffff88805e1cf920 EFLAGS: 00010246
[   21.107819] RAX: 0000000000000000 RBX: ffffffff85a488a0 RCX: 0000000000000000
[   21.108753] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000000
[   21.109699] RBP: 1ffff1100bc39f28 R08: ffffed100bcefb67 R09: ffffed100bcefb67
[   21.110646] R10: 0000000000000001 R11: ffffed100bcefb66 R12: 000000000000000d
[   21.111623] R13: 0000000000000000 R14: ffff88805e77d888 R15: 000000000000000d
[   21.112552] FS:  00007f7c7de05700(0000) GS:ffff88806d100000(0000) knlGS:0000000000000000
[   21.113612] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   21.114405] CR2: 000000000065c000 CR3: 000000005e58e006 CR4: 00000000001606e0
[   21.115367] Call Trace:
[   21.115705]  ? __alloc_pages_slowpath+0x21c0/0x21c0
[   21.116362]  alloc_pages_current+0xac/0x1e0
[   21.116923]  kmalloc_order+0x18/0x70
[   21.117393]  kmalloc_order_trace+0x18/0x110
[   21.117949]  packet_set_ring+0x9d5/0x1770
[   21.118524]  ? packet_rcv_spkt+0x440/0x440
[   21.119094]  ? lock_downgrade+0x620/0x620
[   21.119646]  ? __might_fault+0x177/0x1b0
[   21.120177]  packet_setsockopt+0x981/0x2940
[   21.120753]  ? __fget+0x2fb/0x4b0
[   21.121209]  ? packet_release+0xab0/0xab0
[   21.121740]  ? sock_has_perm+0x1cd/0x260
[   21.122297]  ? selinux_secmark_relabel_packet+0xd0/0xd0
[   21.123013]  ? __fget+0x324/0x4b0
[   21.123451]  ? selinux_netlbl_socket_setsockopt+0x101/0x320
[   21.124186]  ? selinux_netlbl_sock_rcv_skb+0x3a0/0x3a0
[   21.124908]  ? __lock_acquire+0x529/0x3200
[   21.125453]  ? selinux_socket_setsockopt+0x5d/0x70
[   21.126075]  ? __sys_setsockopt+0x131/0x210
[   21.126533]  ? packet_release+0xab0/0xab0
[   21.127004]  __sys_setsockopt+0x131/0x210
[   21.127449]  ? kernel_accept+0x2f0/0x2f0
[   21.127911]  ? ret_from_fork+0x8/0x50
[   21.128313]  ? do_raw_spin_lock+0x11b/0x280
[   21.128800]  __x64_sys_setsockopt+0xba/0x150
[   21.129271]  ? lockdep_hardirqs_on+0x37f/0x560
[   21.129769]  do_syscall_64+0x9f/0x450
[   21.130182]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

We should allocate with __GFP_NOWARN to handle this.

Cc: Kal Conley <kal.conley@dectris.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Fixes: fc62814 ("net/packet: fix 4gb buffer limit due to overflow check")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
shenki pushed a commit that referenced this issue Jan 14, 2020
[ Upstream commit 8897c1b ]

syzbot found the following crash:

  BUG: KASAN: use-after-free in perf_trace_lock_acquire+0x401/0x530 include/trace/events/lock.h:13
  Read of size 8 at addr ffff8880a5cf2c50 by task syz-executor.0/26173

  CPU: 0 PID: 26173 Comm: syz-executor.0 Not tainted 5.3.0-rc6 #146
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
     perf_trace_lock_acquire+0x401/0x530 include/trace/events/lock.h:13
     trace_lock_acquire include/trace/events/lock.h:13 [inline]
     lock_acquire+0x2de/0x410 kernel/locking/lockdep.c:4411
     __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
     _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
     spin_lock include/linux/spinlock.h:338 [inline]
     shmem_fault+0x5ec/0x7b0 mm/shmem.c:2034
     __do_fault+0x111/0x540 mm/memory.c:3083
     do_shared_fault mm/memory.c:3535 [inline]
     do_fault mm/memory.c:3613 [inline]
     handle_pte_fault mm/memory.c:3840 [inline]
     __handle_mm_fault+0x2adf/0x3f20 mm/memory.c:3964
     handle_mm_fault+0x1b5/0x6b0 mm/memory.c:4001
     do_user_addr_fault arch/x86/mm/fault.c:1441 [inline]
     __do_page_fault+0x536/0xdd0 arch/x86/mm/fault.c:1506
     do_page_fault+0x38/0x590 arch/x86/mm/fault.c:1530
     page_fault+0x39/0x40 arch/x86/entry/entry_64.S:1202

It happens if the VMA got unmapped under us while we dropped mmap_sem
and inode got freed.

Pinning the file if we drop mmap_sem fixes the issue.

Link: http://lkml.kernel.org/r/20190927083908.rhifa4mmaxefc24r@box
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: syzbot+03ee87124ee05af991bd@syzkaller.appspotmail.com
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
amboar pushed a commit to amboar/linux that referenced this issue Sep 22, 2024
Commit 823430c ("memory tier: consolidate the initialization of
memory tiers") introduced a locking change that uses guard(mutex)
instead of mutex_lock/unlock() for memory_tier_lock.  It unexpectedly
expanded the locked region to include hotplug_memory_notifier(); as a
result, lockdep reports a possible ABBA deadlock (circular locking
dependency).  Exclude hotplug_memory_notifier() from the locked region
to fix it.

The deadlock scenario is as follows: when a memory online event occurs,
the memory notifier chain runs under the read lock of
memory_chain.rwsem.  Meanwhile, the registration of the memory notifier
in memory_tier_init() acquires the write lock of memory_chain.rwsem
while holding memory_tier_lock.  The memory online event then goes on
to invoke the memory hotplug callback registered by memory_tier_init().
Since this callback tries to acquire memory_tier_lock, a deadlock
occurs.

In practice, this deadlock cannot happen, because memory_tier_init()
always executes before any memory online event: subsys_initcall() has a
higher priority than module_init().

[  133.491106] WARNING: possible circular locking dependency detected
[  133.493656] 6.11.0-rc2+ #146 Tainted: G           O     N
[  133.504290] ------------------------------------------------------
[  133.515194] (udev-worker)/1133 is trying to acquire lock:
[  133.525715] ffffffff87044e28 (memory_tier_lock){+.+.}-{3:3}, at: memtier_hotplug_callback+0x383/0x4b0
[  133.536449]
[  133.536449] but task is already holding lock:
[  133.549847] ffffffff875d3310 ((memory_chain).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x60/0xb0
[  133.556781]
[  133.556781] which lock already depends on the new lock.
[  133.556781]
[  133.569957]
[  133.569957] the existing dependency chain (in reverse order) is:
[  133.577618]
[  133.577618] -> #1 ((memory_chain).rwsem){++++}-{3:3}:
[  133.584997]        down_write+0x97/0x210
[  133.588647]        blocking_notifier_chain_register+0x71/0xd0
[  133.592537]        register_memory_notifier+0x26/0x30
[  133.596314]        memory_tier_init+0x187/0x300
[  133.599864]        do_one_initcall+0x117/0x5d0
[  133.603399]        kernel_init_freeable+0xab0/0xeb0
[  133.606986]        kernel_init+0x28/0x2f0
[  133.610312]        ret_from_fork+0x59/0x90
[  133.613652]        ret_from_fork_asm+0x1a/0x30
[  133.617012]
[  133.617012] -> #0 (memory_tier_lock){+.+.}-{3:3}:
[  133.623390]        __lock_acquire+0x2efd/0x5c60
[  133.626730]        lock_acquire+0x1ce/0x580
[  133.629757]        __mutex_lock+0x15c/0x1490
[  133.632731]        mutex_lock_nested+0x1f/0x30
[  133.635717]        memtier_hotplug_callback+0x383/0x4b0
[  133.638748]        notifier_call_chain+0xbf/0x370
[  133.641647]        blocking_notifier_call_chain+0x76/0xb0
[  133.644636]        memory_notify+0x2e/0x40
[  133.647427]        online_pages+0x597/0x720
[  133.650246]        memory_subsys_online+0x4f6/0x7f0
[  133.653107]        device_online+0x141/0x1d0
[  133.655831]        online_memory_block+0x4d/0x60
[  133.658616]        walk_memory_blocks+0xc0/0x120
[  133.661419]        add_memory_resource+0x51d/0x6c0
[  133.664202]        add_memory_driver_managed+0xf5/0x180
[  133.667060]        dev_dax_kmem_probe+0x7f7/0xb40 [kmem]
[  133.669949]        dax_bus_probe+0x147/0x230
[  133.672687]        really_probe+0x27f/0xac0
[  133.675463]        __driver_probe_device+0x1f3/0x460
[  133.678493]        driver_probe_device+0x56/0x1b0
[  133.681366]        __driver_attach+0x277/0x570
[  133.684149]        bus_for_each_dev+0x145/0x1e0
[  133.686937]        driver_attach+0x49/0x60
[  133.689673]        bus_add_driver+0x2f3/0x6b0
[  133.692421]        driver_register+0x170/0x4b0
[  133.695118]        __dax_driver_register+0x141/0x1b0
[  133.697910]        dax_kmem_init+0x54/0xff0 [kmem]
[  133.700794]        do_one_initcall+0x117/0x5d0
[  133.703455]        do_init_module+0x277/0x750
[  133.706054]        load_module+0x5d1d/0x74f0
[  133.708602]        init_module_from_file+0x12c/0x1a0
[  133.711234]        idempotent_init_module+0x3f1/0x690
[  133.713937]        __x64_sys_finit_module+0x10e/0x1a0
[  133.716492]        x64_sys_call+0x184d/0x20d0
[  133.719053]        do_syscall_64+0x6d/0x140
[  133.721537]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  133.724239]
[  133.724239] other info that might help us debug this:
[  133.724239]
[  133.730832]  Possible unsafe locking scenario:
[  133.730832]
[  133.735298]        CPU0                    CPU1
[  133.737759]        ----                    ----
[  133.740165]   rlock((memory_chain).rwsem);
[  133.742623]                                lock(memory_tier_lock);
[  133.745357]                                lock((memory_chain).rwsem);
[  133.748141]   lock(memory_tier_lock);
[  133.750489]
[  133.750489]  *** DEADLOCK ***
[  133.750489]
[  133.756742] 6 locks held by (udev-worker)/1133:
[  133.759179]  #0: ffff888207be6158 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x26c/0x570
[  133.762299]  #1: ffffffff875b5868 (device_hotplug_lock){+.+.}-{3:3}, at: lock_device_hotplug+0x20/0x30
[  133.765565]  #2: ffff88820cf6a108 (&dev->mutex){....}-{3:3}, at: device_online+0x2f/0x1d0
[  133.768978]  #3: ffffffff86d08ff0 (cpu_hotplug_lock){++++}-{0:0}, at: mem_hotplug_begin+0x17/0x30
[  133.772312]  #4: ffffffff8702dfb0 (mem_hotplug_lock){++++}-{0:0}, at: mem_hotplug_begin+0x23/0x30
[  133.775544]  #5: ffffffff875d3310 ((memory_chain).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x60/0xb0
[  133.779113]
[  133.779113] stack backtrace:
[  133.783728] CPU: 5 UID: 0 PID: 1133 Comm: (udev-worker) Tainted: G           O     N 6.11.0-rc2+ #146
[  133.787220] Tainted: [O]=OOT_MODULE, [N]=TEST
[  133.789948] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[  133.793291] Call Trace:
[  133.795826]  <TASK>
[  133.798284]  dump_stack_lvl+0xea/0x150
[  133.801025]  dump_stack+0x19/0x20
[  133.803609]  print_circular_bug+0x477/0x740
[  133.806341]  check_noncircular+0x2f4/0x3e0
[  133.809056]  ? __pfx_check_noncircular+0x10/0x10
[  133.811866]  ? __pfx_lockdep_lock+0x10/0x10
[  133.814670]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[  133.817610]  __lock_acquire+0x2efd/0x5c60
[  133.820339]  ? __pfx___lock_acquire+0x10/0x10
[  133.823128]  ? __dax_driver_register+0x141/0x1b0
[  133.825926]  ? do_one_initcall+0x117/0x5d0
[  133.828648]  lock_acquire+0x1ce/0x580
[  133.831349]  ? memtier_hotplug_callback+0x383/0x4b0
[  133.834293]  ? __pfx_lock_acquire+0x10/0x10
[  133.837134]  __mutex_lock+0x15c/0x1490
[  133.839829]  ? memtier_hotplug_callback+0x383/0x4b0
[  133.842753]  ? memtier_hotplug_callback+0x383/0x4b0
[  133.845602]  ? __this_cpu_preempt_check+0x21/0x30
[  133.848438]  ? __pfx___mutex_lock+0x10/0x10
[  133.851200]  ? __pfx_lock_acquire+0x10/0x10
[  133.853935]  ? global_dirty_limits+0xc0/0x160
[  133.856699]  ? __sanitizer_cov_trace_switch+0x58/0xa0
[  133.859564]  mutex_lock_nested+0x1f/0x30
[  133.862251]  ? mutex_lock_nested+0x1f/0x30
[  133.864964]  memtier_hotplug_callback+0x383/0x4b0
[  133.867752]  notifier_call_chain+0xbf/0x370
[  133.870550]  ? writeback_set_ratelimit+0xe8/0x160
[  133.873372]  blocking_notifier_call_chain+0x76/0xb0
[  133.876311]  memory_notify+0x2e/0x40
[  133.879013]  online_pages+0x597/0x720
[  133.881686]  ? irqentry_exit+0x3e/0xa0
[  133.884397]  ? __pfx_online_pages+0x10/0x10
[  133.887244]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[  133.890299]  ? mhp_init_memmap_on_memory+0x7a/0x1c0
[  133.893203]  memory_subsys_online+0x4f6/0x7f0
[  133.896099]  ? __pfx_memory_subsys_online+0x10/0x10
[  133.899039]  ? xa_load+0x16d/0x2e0
[  133.901667]  ? __pfx_xa_load+0x10/0x10
[  133.904366]  ? __pfx_memory_subsys_online+0x10/0x10
[  133.907218]  device_online+0x141/0x1d0
[  133.909845]  online_memory_block+0x4d/0x60
[  133.912494]  walk_memory_blocks+0xc0/0x120
[  133.915104]  ? __pfx_online_memory_block+0x10/0x10
[  133.917776]  add_memory_resource+0x51d/0x6c0
[  133.920404]  ? __pfx_add_memory_resource+0x10/0x10
[  133.923104]  ? _raw_write_unlock+0x31/0x60
[  133.925781]  ? register_memory_resource+0x119/0x180
[  133.928450]  add_memory_driver_managed+0xf5/0x180
[  133.931036]  dev_dax_kmem_probe+0x7f7/0xb40 [kmem]
[  133.933665]  ? __pfx_dev_dax_kmem_probe+0x10/0x10 [kmem]
[  133.936332]  ? __pfx___up_read+0x10/0x10
[  133.938878]  dax_bus_probe+0x147/0x230
[  133.941332]  ? __pfx_dax_bus_probe+0x10/0x10
[  133.943954]  really_probe+0x27f/0xac0
[  133.946387]  ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
[  133.949106]  __driver_probe_device+0x1f3/0x460
[  133.951704]  ? parse_option_str+0x149/0x190
[  133.954241]  driver_probe_device+0x56/0x1b0
[  133.956749]  __driver_attach+0x277/0x570
[  133.959228]  ? __pfx___driver_attach+0x10/0x10
[  133.961776]  bus_for_each_dev+0x145/0x1e0
[  133.964367]  ? __pfx_bus_for_each_dev+0x10/0x10
[  133.967019]  ? __kasan_check_read+0x15/0x20
[  133.969543]  ? _raw_spin_unlock+0x31/0x60
[  133.972132]  driver_attach+0x49/0x60
[  133.974536]  bus_add_driver+0x2f3/0x6b0
[  133.977044]  driver_register+0x170/0x4b0
[  133.979480]  __dax_driver_register+0x141/0x1b0
[  133.982126]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
[  133.984724]  dax_kmem_init+0x54/0xff0 [kmem]
[  133.987284]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
[  133.989965]  do_one_initcall+0x117/0x5d0
[  133.992506]  ? __pfx_do_one_initcall+0x10/0x10
[  133.995185]  ? __kasan_kmalloc+0x88/0xa0
[  133.997748]  ? kasan_poison+0x3e/0x60
[  134.000288]  ? kasan_unpoison+0x2c/0x60
[  134.002762]  ? kasan_poison+0x3e/0x60
[  134.005202]  ? __asan_register_globals+0x62/0x80
[  134.007753]  ? __pfx_dax_kmem_init+0x10/0x10 [kmem]
[  134.010439]  do_init_module+0x277/0x750
[  134.012953]  load_module+0x5d1d/0x74f0
[  134.015406]  ? __pfx_load_module+0x10/0x10
[  134.017887]  ? __pfx_ima_post_read_file+0x10/0x10
[  134.020470]  ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
[  134.023127]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[  134.025767]  ? security_kernel_post_read_file+0xa2/0xd0
[  134.028429]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[  134.031162]  ? kernel_read_file+0x503/0x820
[  134.033645]  ? __pfx_kernel_read_file+0x10/0x10
[  134.036232]  ? __pfx___lock_acquire+0x10/0x10
[  134.038766]  init_module_from_file+0x12c/0x1a0
[  134.041291]  ? init_module_from_file+0x12c/0x1a0
[  134.043936]  ? __pfx_init_module_from_file+0x10/0x10
[  134.046516]  ? __this_cpu_preempt_check+0x21/0x30
[  134.049091]  ? __kasan_check_read+0x15/0x20
[  134.051551]  ? do_raw_spin_unlock+0x60/0x210
[  134.054077]  idempotent_init_module+0x3f1/0x690
[  134.056643]  ? __pfx_idempotent_init_module+0x10/0x10
[  134.059318]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[  134.061995]  ? __fget_light+0x17d/0x210
[  134.064428]  __x64_sys_finit_module+0x10e/0x1a0
[  134.066976]  x64_sys_call+0x184d/0x20d0
[  134.069405]  do_syscall_64+0x6d/0x140
[  134.071926]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

[yanfei.xu@intel.com: add mutex_lock/unlock() pair back]
  Link: https://lkml.kernel.org/r/20240830102447.1445296-1-yanfei.xu@intel.com
Link: https://lkml.kernel.org/r/20240827113614.1343049-1-yanfei.xu@intel.com
Fixes: 823430c ("memory tier: consolidate the initialization of memory tiers")
Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Ho-Ren (Jack) Chuang <horen.chuang@linux.dev>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>