
Bluetooth userspace api broken in 3.0.8 #10

Open
jcowgill opened this issue Oct 6, 2014 · 1 comment

Comments

@jcowgill

jcowgill commented Oct 6, 2014

Commit c172849 broke the userspace Bluetooth API by increasing the size of the hci_conn_info struct. This causes buffer overflows in userspace programs (notably, blueman crashes when doing almost anything).

The patch was submitted upstream but was NAKed for this reason, as discussed here:
https://lkml.org/lkml/2014/6/6/147

@jcowgill
Author

jcowgill commented Oct 6, 2014

Updated: I accidentally submitted the issue a bit early.

chrisdearman pushed a commit that referenced this issue Mar 22, 2017
Previously, only the standalone adb, mtp and ptp configurations
worked when selected. This patch introduces proper behaviour
when the combined mtp,adb or ptp,adb configurations are selected.

During boot-up, the following warning is no longer shown:

[    2.879328] ------------[ cut here ]------------
[    2.883983] WARNING: CPU: 0 PID: 1 at drivers/usb/dwc2/gadget.c:212 s3c_hsotg_init_fifo+0x168/0x1d0()
[    2.893204] insufficient fifo memory
[    2.896602] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W      3.18.3+ #10
[    2.904004] Stack : 00000000 800919a0 00000000 00000004 00000006 800913f4 00000000 00000000
          00000000 00000000 80f75a12 00000042 80f75a12 00000042 00000006 00000000
          80e42767 80d7c2e 00000001 00000000 80f73574 8bc90418 80ea0000 01000d00
          80f06704 80b24c00 00000000 80035388 00000006 00000000 80d834a4 8bc99b04
          8bc99b04 80e40000 00000000 00000000 00000000 00000000 00000000 00000000
          ...
[    2.939709] Call Trace:
[    2.942174] [<8001bab0>] show_stack+0xd4/0xf0
[    2.946528] [<80b26c40>] dump_stack+0x70/0xbc
[    2.950880] [<800356bc>] warn_slowpath_common+0x90/0xe8
[    2.956116] [<80035808>] warn_slowpath_fmt+0x3c/0x48
[    2.961075] [<8069b824>] s3c_hsotg_init_fifo+0x168/0x1d0
[    2.966398] [<8069d8fc>] s3c_hsotg_init+0x50/0x9c
[    2.971095] [<806a0388>] dwc2_gadget_init+0x430/0x8c0
[    2.976158] [<806a0df0>] dwc2_driver_probe+0x218/0x2a8
[    2.981291] [<805b935c>] platform_drv_probe+0x64/0x120
[    2.986440] [<805b783c>] really_probe+0xa0/0x278
[    2.991050] [<805b7c78>] driver_probe_device+0x48/0x78
[    2.996197] [<805b7d74>] __driver_attach+0xcc/0xd4
[    3.000980] [<805b5b7c>] bus_for_each_dev+0x7c/0xc4
[    3.005874] [<805b64f8>] bus_add_driver+0x180/0x240
[    3.010743] [<805b8428>] driver_register+0xac/0x154
[    3.015633] [<80ea9e04>] do_one_initcall+0x150/0x1f4
[    3.020589] [<80eaa080>] kernel_init_freeable+0x1d8/0x298
[    3.025998] [<80b23c5c>] kernel_init+0x28/0x158
[    3.030522] [<800153ec>] ret_from_kernel_thread+0x14/0x1c
[    3.035926]
[    3.037412] ---[ end trace cb88537fdc8fa201 ]---

And during configuration transitions (e.g. adb -> mtp,adb)
the following warning is no longer shown:

[ 311.726159] -----------[ cut here ]-----------
[ 311.730817] WARNING: CPU: 0 PID: 0 at drivers/usb/dwc2/gadget.c:1475 s3c_hsotg_rx_data+0x130/0x13c()
[ 311.739931] Modules linked in:
[ 311.742993] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.18.3+ #45
[ 311.750199] Stack : 00000000 80080370 00000000 00000004 00000006 00000000 00000000 00000000
00000000 00000000 80f05b02 00000042 80d61010 80e18e20 80d60000 8b408010
80e18927 80d0df6c 00000000 00000000 80f03614 80e18e20 80d60000 8b408010
00250182 80a54f54 80e20cc4 80e20cc8 00000000 00000000 80d14ab8 80dfbacc
80dfbacc 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...
[ 311.785841] Call Trace:
[ 311.788292] [<8001ac28>] show_stack+0xc4/0xe0
[ 311.792650] [<80a56e58>] dump_stack+0x70/0xbc
[ 311.797008] [<80033c14>] warn_slowpath_common+0x88/0xb8
[ 311.802224] [<80033cc8>] warn_slowpath_null+0x18/0x24
[ 311.807266] [<80606a3c>] s3c_hsotg_rx_data+0x130/0x13c
[ 311.812397] [<8060afa4>] s3c_hsotg_irq+0x3b4/0x5e8
[ 311.817183] [<80082ab8>] handle_irq_event_percpu+0x90/0x2d0
[ 311.822745] [<80082d4c>] handle_irq_event+0x54/0x98
[ 311.827617] [<80086390>] handle_level_irq+0xe0/0x1c0
[ 311.832572] [<800820bc>] generic_handle_irq+0x3c/0x54
[ 311.837622] [<804bb680>] jz4740_cascade+0x78/0xac
[ 311.842317] [<80082ab8>] handle_irq_event_percpu+0x90/0x2d0
[ 311.847881] [<80086d18>] handle_percpu_irq+0x8c/0xbc
[ 311.852835] [<800820bc>] generic_handle_irq+0x3c/0x54
[ 311.857878] [<80016c8c>] do_IRQ+0x18/0x2c
[ 311.861879] [<80014c40>] ret_from_irq+0x0/0x4
[ 311.866227] [<80016b20>] mips_cpuidle_wait_enter+0x14/0x34
[ 311.871713] [<806d37b0>] cpuidle_enter_state+0x88/0x2c0
[ 311.876934] [<80074308>] cpu_startup_entry+0x36c/0x484
[ 311.882074] [<80e7dc04>] start_kernel+0x4b8/0x4e0
[ 311.886767]
[ 311.888253] --[ end trace dd7a60dcc5530db3 ]--

Change-Id: Ic8ac37a28913d4314371de0cd446f8a7cc45864d
Signed-off-by: Dragan Cecavac <dragan.cecavac@imgtec.com>
pcercuei referenced this issue in OpenDingux/linux May 6, 2017
If a given CPU is not in cpu_present and CPU hotplug
is disabled, the arch code can skip setting up its cpu_dev.

The arch cpuidle driver should pass the correct CPU mask
for registration, but if the driver fails to do so, the error
propagates and causes a crash like this:

[   30.076045] Unable to handle kernel paging request for data at address 0x00000048
[   30.076100] Faulting instruction address: 0xc0000000007b2f30
cpu 0x4d: Vector: 300 (Data Access) at [c000003feb18b670]
    pc: c0000000007b2f30: kobject_get+0x20/0x70
    lr: c0000000007b3c94: kobject_add_internal+0x54/0x3f0
    sp: c000003feb18b8f0
   msr: 9000000000009033
   dar: 48
 dsisr: 40000000
  current = 0xc000003fd2ed8300
  paca    = 0xc00000000fbab500   softe: 0        irq_happened: 0x01
    pid   = 1, comm = swapper/0
Linux version 4.11.0-rc2-svaidy+ (sv@sagarika) (gcc version 6.2.0
20161005 (Ubuntu 6.2.0-5ubuntu12) ) #10 SMP Sun Mar 19 00:08:09 IST 2017
enter ? for help
[c000003feb18b960] c0000000007b3c94 kobject_add_internal+0x54/0x3f0
[c000003feb18b9f0] c0000000007b43a4 kobject_init_and_add+0x64/0xa0
[c000003feb18ba70] c000000000e284f4 cpuidle_add_sysfs+0xb4/0x130
[c000003feb18baf0] c000000000e26038 cpuidle_register_device+0x118/0x1c0
[c000003feb18bb30] c000000000e26c48 cpuidle_register+0x78/0x120
[c000003feb18bbc0] c00000000168fd9c powernv_processor_idle_init+0x110/0x1c4
[c000003feb18bc40] c00000000000cff8 do_one_initcall+0x68/0x1d0
[c000003feb18bd00] c0000000016242f4 kernel_init_freeable+0x280/0x360
[c000003feb18bdc0] c00000000000d864 kernel_init+0x24/0x160
[c000003feb18be30] c00000000000b4e8 ret_from_kernel_thread+0x5c/0x74

Validating cpu_dev fixes the crash and reports a correct error message instead:

[   30.163506] Failed to register cpuidle device for cpu136
[   30.173329] Registration of powernv driver failed.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
[ rjw: Comment massage ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
pcercuei referenced this issue in OpenDingux/linux May 6, 2017
mipsxx_pmu_handle_shared_irq() calls irq_work_run() while holding the
pmuint_rwlock for read.  irq_work_run() can, via perf_pending_event(),
call try_to_wake_up() which can try to take rq->lock.

However, perf can also call perf_pmu_enable() (and thus take the
pmuint_rwlock for write) while holding the rq->lock, from
finish_task_switch() via perf_event_context_sched_in().

This leads to an ABBA deadlock:

 PID: 3855   TASK: 8f7ce288  CPU: 2   COMMAND: "process"
  #0 [89c39ac8] __delay at 803b5be4
  #1 [89c39ac8] do_raw_spin_lock at 8008fdcc
  #2 [89c39af8] try_to_wake_up at 8006e47c
  #3 [89c39b38] pollwake at 8018eab0
  #4 [89c39b68] __wake_up_common at 800879f4
  #5 [89c39b98] __wake_up at 800880e4
  #6 [89c39bc8] perf_event_wakeup at 8012109c
  #7 [89c39be8] perf_pending_event at 80121184
  #8 [89c39c08] irq_work_run_list at 801151f0
  #9 [89c39c38] irq_work_run at 80115274
 #10 [89c39c50] mipsxx_pmu_handle_shared_irq at 8002cc7c

 PID: 1481   TASK: 8eaac6a8  CPU: 3   COMMAND: "process"
  #0 [8de7f900] do_raw_write_lock at 800900e0
  #1 [8de7f918] perf_event_context_sched_in at 80122310
  #2 [8de7f938] __perf_event_task_sched_in at 80122608
  #3 [8de7f958] finish_task_switch at 8006b8a4
  #4 [8de7f998] __schedule at 805e4dc4
  #5 [8de7f9f8] schedule at 805e5558
  #6 [8de7fa10] schedule_hrtimeout_range_clock at 805e9984
  #7 [8de7fa70] poll_schedule_timeout at 8018e8f8
  #8 [8de7fa88] do_select at 8018f338
  #9 [8de7fd88] core_sys_select at 8018f5cc
 #10 [8de7fee0] sys_select at 8018f854
 #11 [8de7ff28] syscall_common at 80028fc8

The lock seems to be there to protect the hardware counters, so there is
no need to hold it across irq_work_run().

Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
miodragdinic pushed a commit to miodragdinic/CI20_linux that referenced this issue May 24, 2017
Before this patch, multiple active endpoints could not be used
together; they would actually cancel each other out.

The issue was discovered on Android when combining the adb, mtp and ptp
configurations. This patch introduces proper behaviour for
these cases.

Also, during boot-up the following warning is no longer shown:

[    2.879328] ------------[ cut here ]------------
[    2.883983] WARNING: CPU: 0 PID: 1 at drivers/usb/dwc2/gadget.c:212 s3c_hsotg_init_fifo+0x168/0x1d0()
[    2.893204] insufficient fifo memory
[    2.896602] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W      3.18.3+ #10
[    2.904004] Stack : 00000000 800919a0 00000000 00000004 00000006 800913f4 00000000 00000000
          00000000 00000000 80f75a12 00000042 80f75a12 00000042 00000006 00000000
          80e42767 80d7c2e 00000001 00000000 80f73574 8bc90418 80ea0000 01000d00
          80f06704 80b24c00 00000000 80035388 00000006 00000000 80d834a4 8bc99b04
          8bc99b04 80e40000 00000000 00000000 00000000 00000000 00000000 00000000
          ...
[    2.939709] Call Trace:
[    2.942174] [<8001bab0>] show_stack+0xd4/0xf0
[    2.946528] [<80b26c40>] dump_stack+0x70/0xbc
[    2.950880] [<800356bc>] warn_slowpath_common+0x90/0xe8
[    2.956116] [<80035808>] warn_slowpath_fmt+0x3c/0x48
[    2.961075] [<8069b824>] s3c_hsotg_init_fifo+0x168/0x1d0
[    2.966398] [<8069d8fc>] s3c_hsotg_init+0x50/0x9c
[    2.971095] [<806a0388>] dwc2_gadget_init+0x430/0x8c0
[    2.976158] [<806a0df0>] dwc2_driver_probe+0x218/0x2a8
[    2.981291] [<805b935c>] platform_drv_probe+0x64/0x120
[    2.986440] [<805b783c>] really_probe+0xa0/0x278
[    2.991050] [<805b7c78>] driver_probe_device+0x48/0x78
[    2.996197] [<805b7d74>] __driver_attach+0xcc/0xd4
[    3.000980] [<805b5b7c>] bus_for_each_dev+0x7c/0xc4
[    3.005874] [<805b64f8>] bus_add_driver+0x180/0x240
[    3.010743] [<805b8428>] driver_register+0xac/0x154
[    3.015633] [<80ea9e04>] do_one_initcall+0x150/0x1f4
[    3.020589] [<80eaa080>] kernel_init_freeable+0x1d8/0x298
[    3.025998] [<80b23c5c>] kernel_init+0x28/0x158
[    3.030522] [<800153ec>] ret_from_kernel_thread+0x14/0x1c
[    3.035926]
[    3.037412] ---[ end trace cb88537fdc8fa201 ]---

And during configuration transitions (e.g. adb -> mtp,adb)
the following warning is no longer shown:

[ 311.726159] -----------[ cut here ]-----------
[ 311.730817] WARNING: CPU: 0 PID: 0 at drivers/usb/dwc2/gadget.c:1475 s3c_hsotg_rx_data+0x130/0x13c()
[ 311.739931] Modules linked in:
[ 311.742993] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.18.3+ #45
[ 311.750199] Stack : 00000000 80080370 00000000 00000004 00000006 00000000 00000000 00000000
00000000 00000000 80f05b02 00000042 80d61010 80e18e20 80d60000 8b408010
80e18927 80d0df6c 00000000 00000000 80f03614 80e18e20 80d60000 8b408010
00250182 80a54f54 80e20cc4 80e20cc8 00000000 00000000 80d14ab8 80dfbacc
80dfbacc 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...
[ 311.785841] Call Trace:
[ 311.788292] [<8001ac28>] show_stack+0xc4/0xe0
[ 311.792650] [<80a56e58>] dump_stack+0x70/0xbc
[ 311.797008] [<80033c14>] warn_slowpath_common+0x88/0xb8
[ 311.802224] [<80033cc8>] warn_slowpath_null+0x18/0x24
[ 311.807266] [<80606a3c>] s3c_hsotg_rx_data+0x130/0x13c
[ 311.812397] [<8060afa4>] s3c_hsotg_irq+0x3b4/0x5e8
[ 311.817183] [<80082ab8>] handle_irq_event_percpu+0x90/0x2d0
[ 311.822745] [<80082d4c>] handle_irq_event+0x54/0x98
[ 311.827617] [<80086390>] handle_level_irq+0xe0/0x1c0
[ 311.832572] [<800820bc>] generic_handle_irq+0x3c/0x54
[ 311.837622] [<804bb680>] jz4740_cascade+0x78/0xac
[ 311.842317] [<80082ab8>] handle_irq_event_percpu+0x90/0x2d0
[ 311.847881] [<80086d18>] handle_percpu_irq+0x8c/0xbc
[ 311.852835] [<800820bc>] generic_handle_irq+0x3c/0x54
[ 311.857878] [<80016c8c>] do_IRQ+0x18/0x2c
[ 311.861879] [<80014c40>] ret_from_irq+0x0/0x4
[ 311.866227] [<80016b20>] mips_cpuidle_wait_enter+0x14/0x34
[ 311.871713] [<806d37b0>] cpuidle_enter_state+0x88/0x2c0
[ 311.876934] [<80074308>] cpu_startup_entry+0x36c/0x484
[ 311.882074] [<80e7dc04>] start_kernel+0x4b8/0x4e0
[ 311.886767]
[ 311.888253] --[ end trace dd7a60dcc5530db3 ]--

Change-Id: Ic8ac37a28913d4314371de0cd446f8a7cc45864d
Signed-off-by: Dragan Cecavac <dragan.cecavac@imgtec.com>
pcercuei referenced this issue in OpenDingux/linux May 16, 2018
syzbot caught an infinite recursion in nsh_gso_segment().

The problem here is that we need to make sure the NSH header has a
reasonable length.

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48  max: 48!
48 locks held by syz-executor0/10189:
 #0:         (ptrval) (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x30f/0x34c0 net/core/dev.c:3517
 #1:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #1:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #2:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #2:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #3:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #3:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #4:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #4:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #5:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #5:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #6:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #6:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #7:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #7:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #8:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #8:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #9:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #9:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #10:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #10:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #11:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #11:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #12:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #12:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #13:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #13:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #14:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #14:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #15:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #15:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #16:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #16:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #17:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #17:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #18:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #18:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #19:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #19:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #20:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #20:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #21:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #21:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #22:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #22:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #23:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #23:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #24:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #24:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #25:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #25:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #26:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #26:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #27:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #27:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #28:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #28:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #29:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #29:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #30:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #30:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #31:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #31:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
dccp_close: ABORT with 65423 bytes unread
 #32:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #32:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #33:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #33:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #34:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #34:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #35:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #35:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #36:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #36:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #37:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #37:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #38:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #38:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #39:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #39:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #40:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #40:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #41:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #41:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #42:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #42:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #43:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #43:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #44:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #44:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #45:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #45:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #46:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #46:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #47:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #47:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
INFO: lockdep is turned off.
CPU: 1 PID: 10189 Comm: syz-executor0 Not tainted 4.17.0-rc2+ #26
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 __lock_acquire+0x1788/0x5140 kernel/locking/lockdep.c:3449
 lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920
 rcu_lock_acquire include/linux/rcupdate.h:246 [inline]
 rcu_read_lock include/linux/rcupdate.h:632 [inline]
 skb_mac_gso_segment+0x25b/0x720 net/core/dev.c:2789
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 __skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865
 skb_gso_segment include/linux/netdevice.h:4025 [inline]
 validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3118
 validate_xmit_skb_list+0xbf/0x120 net/core/dev.c:3168
 sch_direct_xmit+0x354/0x11e0 net/sched/sch_generic.c:312
 qdisc_restart net/sched/sch_generic.c:399 [inline]
 __qdisc_run+0x741/0x1af0 net/sched/sch_generic.c:410
 __dev_xmit_skb net/core/dev.c:3243 [inline]
 __dev_queue_xmit+0x28ea/0x34c0 net/core/dev.c:3551
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3616
 packet_snd net/packet/af_packet.c:2951 [inline]
 packet_sendmsg+0x40f8/0x6070 net/packet/af_packet.c:2976
 sock_sendmsg_nosec net/socket.c:629 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:639
 __sys_sendto+0x3d7/0x670 net/socket.c:1789
 __do_sys_sendto net/socket.c:1801 [inline]
 __se_sys_sendto net/socket.c:1797 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: c411ed8 ("nsh: add GSO support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Benc <jbenc@redhat.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nemunaire pushed a commit to nemunaire/CI20_linux that referenced this issue Jun 6, 2018
[ Upstream commit ad0a45f ]

If a given CPU is not in cpu_present and CPU hotplug
is disabled, the arch code can skip setting up its cpu_dev.

The arch cpuidle driver should pass the correct CPU mask
for registration, but if the driver fails to do so, the error
propagates and causes a crash like this:

[   30.076045] Unable to handle kernel paging request for data at address 0x00000048
[   30.076100] Faulting instruction address: 0xc0000000007b2f30
cpu 0x4d: Vector: 300 (Data Access) at [c000003feb18b670]
    pc: c0000000007b2f30: kobject_get+0x20/0x70
    lr: c0000000007b3c94: kobject_add_internal+0x54/0x3f0
    sp: c000003feb18b8f0
   msr: 9000000000009033
   dar: 48
 dsisr: 40000000
  current = 0xc000003fd2ed8300
  paca    = 0xc00000000fbab500   softe: 0        irq_happened: 0x01
    pid   = 1, comm = swapper/0
Linux version 4.11.0-rc2-svaidy+ (sv@sagarika) (gcc version 6.2.0
20161005 (Ubuntu 6.2.0-5ubuntu12) ) #10 SMP Sun Mar 19 00:08:09 IST 2017
enter ? for help
[c000003feb18b960] c0000000007b3c94 kobject_add_internal+0x54/0x3f0
[c000003feb18b9f0] c0000000007b43a4 kobject_init_and_add+0x64/0xa0
[c000003feb18ba70] c000000000e284f4 cpuidle_add_sysfs+0xb4/0x130
[c000003feb18baf0] c000000000e26038 cpuidle_register_device+0x118/0x1c0
[c000003feb18bb30] c000000000e26c48 cpuidle_register+0x78/0x120
[c000003feb18bbc0] c00000000168fd9c powernv_processor_idle_init+0x110/0x1c4
[c000003feb18bc40] c00000000000cff8 do_one_initcall+0x68/0x1d0
[c000003feb18bd00] c0000000016242f4 kernel_init_freeable+0x280/0x360
[c000003feb18bdc0] c00000000000d864 kernel_init+0x24/0x160
[c000003feb18be30] c00000000000b4e8 ret_from_kernel_thread+0x5c/0x74

Validating cpu_dev fixes the crash and reports a correct error message instead:

[   30.163506] Failed to register cpuidle device for cpu136
[   30.173329] Registration of powernv driver failed.

Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
[ rjw: Comment massage ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nemunaire pushed a commit to nemunaire/CI20_linux that referenced this issue Jun 6, 2018
[ Upstream commit 2bbea6e ]

When mounting an ISO filesystem, the system sometimes (very rarely)
hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 #8 [ffff880078447da8] mount_bdev at ffffffff81202570
 #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
#10 [ffff880078447e28] mount_fs at ffffffff81202d09
#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
#12 [ffff880078447ea8] do_mount at ffffffff81220fee
#13 [ffff880078447f28] sys_mount at ffffffff812218d6
#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 #1 [ffff880078417900] schedule at ffffffff8168dc59
 #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 #4 [ffff8800784179d0] down_read at ffffffff8168cde0
 #5 [ffff8800784179e8] get_super at ffffffff81201cc7
 #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 #7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
#16 [ffff880078417d00] do_last at ffffffff8120d53d
#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
#20 [ffff880078417f70] sys_open at ffffffff811fde4e
#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device; the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex lock;
the second task acquires the sr_mutex lock and then the s_umount lock,
so each ends up waiting for a lock the other holds (an ABBA deadlock).

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and letting the caller take care of it.
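The fix works by imposing a single lock order: after the change, the open path performs the media-change check (which needs s_umount) before it takes sr_mutex, so it never holds sr_mutex while waiting for s_umount. A minimal userspace model with POSIX threads is sketched below, assuming both kernel locks can be represented as plain mutexes (s_umount is really an rw_semaphore); `mount_task()`, `open_task()` and `run_race()` are hypothetical names. Only the fixed ordering is modeled, because the original A->B / B->A ordering can hang.

```c
#include <pthread.h>
#include <stddef.h>

/* Simplified stand-ins: s_umount is really an rw_semaphore in the kernel. */
static pthread_mutex_t s_umount = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sr_mutex = PTHREAD_MUTEX_INITIALIZER;
static int ioctls;       /* updated only by mount_task, under sr_mutex */
static int media_checks; /* updated only by open_task, under s_umount */

/* "mount": holds s_umount, then takes sr_mutex for the ioctl (order A -> B). */
static void *mount_task(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&s_umount);
    pthread_mutex_lock(&sr_mutex);
    ioctls++;
    pthread_mutex_unlock(&sr_mutex);
    pthread_mutex_unlock(&s_umount);
    return NULL;
}

/* Fixed "open": the media-change check runs first and drops s_umount before
 * sr_mutex is taken, so this path never holds B while waiting for A. */
static void *open_task(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&s_umount);   /* models check_disk_change() flushing */
    media_checks++;
    pthread_mutex_unlock(&s_umount);

    pthread_mutex_lock(&sr_mutex);   /* models cdrom_open() under sr_mutex */
    pthread_mutex_unlock(&sr_mutex);
    return NULL;
}

/* Run both paths concurrently; returns 0 once every iteration has finished. */
static int run_race(int iterations)
{
    for (int i = 0; i < iterations; i++) {
        pthread_t a, b;

        if (pthread_create(&a, NULL, mount_task, NULL) != 0)
            return -1;
        if (pthread_create(&b, NULL, open_task, NULL) != 0) {
            pthread_join(a, NULL);
            return -1;
        }
        pthread_join(a, NULL);
        pthread_join(b, NULL);
    }
    return 0;
}
```

Because neither thread ever waits for s_umount while holding sr_mutex, the wait-for graph cannot form a cycle and every iteration completes.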

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pcercuei referenced this issue in OpenDingux/linux Jul 24, 2018
The crash dump shows the following backtrace:

crash> bt
PID: 0      TASK: ffffffffbe412480  CPU: 0   COMMAND: "swapper/0"
 #0 [ffff891ee0003868] machine_kexec at ffffffffbd063ef1
 #1 [ffff891ee00038c8] __crash_kexec at ffffffffbd12b6f2
 #2 [ffff891ee0003998] crash_kexec at ffffffffbd12c84c
 #3 [ffff891ee00039b8] oops_end at ffffffffbd030f0a
 #4 [ffff891ee00039e0] no_context at ffffffffbd074643
 #5 [ffff891ee0003a40] __bad_area_nosemaphore at ffffffffbd07496e
 #6 [ffff891ee0003a90] bad_area_nosemaphore at ffffffffbd074a64
 #7 [ffff891ee0003aa0] __do_page_fault at ffffffffbd074b0a
 #8 [ffff891ee0003b18] do_page_fault at ffffffffbd074fc8
 #9 [ffff891ee0003b50] page_fault at ffffffffbda01925
    [exception RIP: qlt_schedule_sess_for_deletion+15]
    RIP: ffffffffc02e526f  RSP: ffff891ee0003c08  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: ffffffffc0307847
    RDX: 00000000000020e6  RSI: ffff891edbc377c8  RDI: 0000000000000000
    RBP: ffff891ee0003c18   R8: ffffffffc02f0b20   R9: 0000000000000250
    R10: 0000000000000258  R11: 000000000000b780  R12: ffff891ed9b43000
    R13: 00000000000000f0  R14: 0000000000000006  R15: ffff891edbc377c8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ffff891ee0003c20] qla2x00_fcport_event_handler at ffffffffc02853d3 [qla2xxx]
 #11 [ffff891ee0003cf0] __dta_qla24xx_async_gnl_sp_done_333 at ffffffffc0285a1d [qla2xxx]
 #12 [ffff891ee0003de8] qla24xx_process_response_queue at ffffffffc02a2eb5 [qla2xxx]
 #13 [ffff891ee0003e88] qla24xx_msix_rsp_q at ffffffffc02a5403 [qla2xxx]
 #14 [ffff891ee0003ec0] __handle_irq_event_percpu at ffffffffbd0f4c59
 #15 [ffff891ee0003f10] handle_irq_event_percpu at ffffffffbd0f4e02
 #16 [ffff891ee0003f40] handle_irq_event at ffffffffbd0f4e90
 #17 [ffff891ee0003f68] handle_edge_irq at ffffffffbd0f8984
 #18 [ffff891ee0003f88] handle_irq at ffffffffbd0305d5
 #19 [ffff891ee0003fb8] do_IRQ at ffffffffbda02a18
 --- <IRQ stack> ---
 #20 [ffffffffbe403d30] ret_from_intr at ffffffffbda0094e
    [exception RIP: unknown or invalid address]
    RIP: 000000000000001f  RSP: 0000000000000000  RFLAGS: fff3b8c2091ebb3f
    RAX: ffffbba5a0000200  RBX: 0000be8cdfa8f9fa  RCX: 0000000000000018
    RDX: 0000000000000101  RSI: 000000000000015d  RDI: 0000000000000193
    RBP: 0000000000000083   R8: ffffffffbe403e38   R9: 0000000000000002
    R10: 0000000000000000  R11: ffffffffbe56b820  R12: ffff891ee001cf00
    R13: ffffffffbd11c0a4  R14: ffffffffbe403d60  R15: 0000000000000001
    ORIG_RAX: ffff891ee0022ac0  CS: 0000  SS: ffffffffffffffb9
 bt: WARNING: possibly bogus exception frame
 #21 [ffffffffbe403dd8] cpuidle_enter_state at ffffffffbd67c6fd
 #22 [ffffffffbe403e40] cpuidle_enter at ffffffffbd67c907
 #23 [ffffffffbe403e50] call_cpuidle at ffffffffbd0d98f3
 #24 [ffffffffbe403e60] do_idle at ffffffffbd0d9b42
 #25 [ffffffffbe403e98] cpu_startup_entry at ffffffffbd0d9da3
 #26 [ffffffffbe403ec0] rest_init at ffffffffbd81d4aa
 #27 [ffffffffbe403ed0] start_kernel at ffffffffbe67d2ca
 #28 [ffffffffbe403f28] x86_64_start_reservations at ffffffffbe67c675
 #29 [ffffffffbe403f38] x86_64_start_kernel at ffffffffbe67c6eb
 #30 [ffffffffbe403f50] secondary_startup_64 at ffffffffbd0000d5

Fixes: 040036b ("scsi: qla2xxx: Delay loop id allocation at login")
Cc: <stable@vger.kernel.org> # v4.17+
Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
pcercuei referenced this issue in OpenDingux/linux Aug 4, 2018
In code paths where only the rcu read lock is held, e.g. the route
lookup path, it is not safe to call fib6_info_hold() directly,
because the fib6_info may already have been deleted but still exist
during the rcu grace period. Taking a reference to it could cause a
double free and crash the kernel.

This patch adds a new function, fib6_info_hold_safe(), and uses it in
place of fib6_info_hold() in all necessary places.
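The difference between the two helpers is the classic "increment only if not zero" pattern: an unconditional increment can resurrect an object whose count has already dropped to zero, while a safe hold refuses and lets the caller treat the entry as gone. The sketch below is a userspace model using C11 atomics; in the kernel this is presumably expressed with refcount_inc_not_zero(), and `struct fake_f6i`, `hold_safe()` and `release()` are hypothetical names, not the kernel's fib6 API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct fib6_info; only the refcount matters here. */
struct fake_f6i {
    atomic_int ref;
};

/* Take a reference only while the count is still non-zero. */
static bool hold_safe(struct fake_f6i *f)
{
    int old = atomic_load(&f->ref);

    while (old != 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&f->ref, &old, old + 1))
            return true;
    }
    return false; /* already being freed: caller must not use the entry */
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool release(struct fake_f6i *f)
{
    return atomic_fetch_sub(&f->ref, 1) == 1;
}
```

An RCU reader that loses the race with the final release() gets `false` from hold_safe() and can fall back to another lookup, instead of bumping a count of zero back to one and triggering the double free described above.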

Syzbot reported 3 crash traces because of this. One of them is:
8021q: adding VLAN 0 to HW filter on device team0
IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready
dst_release: dst:(____ptrval____) refcnt:-1
dst_release: dst:(____ptrval____) refcnt:-2
WARNING: CPU: 1 PID: 4845 at include/net/dst.h:239 dst_hold include/net/dst.h:239 [inline]
WARNING: CPU: 1 PID: 4845 at include/net/dst.h:239 ip6_setup_cork+0xd66/0x1830 net/ipv6/ip6_output.c:1204
dst_release: dst:(____ptrval____) refcnt:-1
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 4845 Comm: syz-executor493 Not tainted 4.18.0-rc3+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 panic+0x238/0x4e7 kernel/panic.c:184
dst_release: dst:(____ptrval____) refcnt:-2
dst_release: dst:(____ptrval____) refcnt:-3
 __warn.cold.8+0x163/0x1ba kernel/panic.c:536
dst_release: dst:(____ptrval____) refcnt:-4
 report_bug+0x252/0x2d0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
dst_release: dst:(____ptrval____) refcnt:-5
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
RIP: 0010:dst_hold include/net/dst.h:239 [inline]
RIP: 0010:ip6_setup_cork+0xd66/0x1830 net/ipv6/ip6_output.c:1204
Code: c1 ed 03 89 9d 18 ff ff ff 48 b8 00 00 00 00 00 fc ff df 41 c6 44 05 00 f8 e9 2d 01 00 00 4c 8b a5 c8 fe ff ff e8 1a f6 e6 fa <0f> 0b e9 6a fc ff ff e8 0e f6 e6 fa 48 8b 85 d0 fe ff ff 48 8d 78
RSP: 0018:ffff8801a8fcf178 EFLAGS: 00010293
RAX: ffff8801a8eba5c0 RBX: 0000000000000000 RCX: ffffffff869511e6
RDX: 0000000000000000 RSI: ffffffff869515b6 RDI: 0000000000000005
RBP: ffff8801a8fcf2c8 R08: ffff8801a8eba5c0 R09: ffffed0035ac8338
R10: ffffed0035ac8338 R11: ffff8801ad6419c3 R12: ffff8801a8fcf720
R13: ffff8801a8fcf6a0 R14: ffff8801ad6419c0 R15: ffff8801ad641980
 ip6_make_skb+0x2c8/0x600 net/ipv6/ip6_output.c:1768
 udpv6_sendmsg+0x2c90/0x35f0 net/ipv6/udp.c:1376
 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:641 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:651
 ___sys_sendmsg+0x51d/0x930 net/socket.c:2125
 __sys_sendmmsg+0x240/0x6f0 net/socket.c:2220
 __do_sys_sendmmsg net/socket.c:2249 [inline]
 __se_sys_sendmmsg net/socket.c:2246 [inline]
 __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2246
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446ba9
Code: e8 cc bb 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fb39a469da8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00000000006dcc54 RCX: 0000000000446ba9
RDX: 00000000000000b8 RSI: 0000000020001b00 RDI: 0000000000000003
RBP: 00000000006dcc50 R08: 00007fb39a46a700 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 45c828efc7a64843
R13: e6eeb815b9d8a477 R14: 5068caf6f713c6fc R15: 0000000000000001
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..

Fixes: 93531c6 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: syzbot+902e2a1bcd4f7808cef5@syzkaller.appspotmail.com
Reported-by: syzbot+8ae62d67f647abeeceb9@syzkaller.appspotmail.com
Reported-by: syzbot+3f08feb14086930677d0@syzkaller.appspotmail.com
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nemunaire pushed a commit to nemunaire/CI20_linux that referenced this issue Aug 17, 2018
Before this patch, multiple active endpoints could not be used
together; they would actually cancel each other out.

The issue was discovered on Android when combining the adb, mtp and
ptp configurations. This patch introduces proper behaviour for
these cases.

Also, during the boot-up the following warning is no longer shown:

[    2.879328] ------------[ cut here ]------------
[    2.883983] WARNING: CPU: 0 PID: 1 at drivers/usb/dwc2/gadget.c:212 s3c_hsotg_init_fifo+0x168/0x1d0()
[    2.893204] insufficient fifo memory
[    2.896602] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W      3.18.3+ #10
[    2.904004] Stack : 00000000 800919a0 00000000 00000004 00000006 800913f4 00000000 00000000
          00000000 00000000 80f75a12 00000042 80f75a12 00000042 00000006 00000000
          80e42767 80d7c2e 00000001 00000000 80f73574 8bc90418 80ea0000 01000d00
          80f06704 80b24c00 00000000 80035388 00000006 00000000 80d834a4 8bc99b04
          8bc99b04 80e40000 00000000 00000000 00000000 00000000 00000000 00000000
          ...
[    2.939709] Call Trace:
[    2.942174] [<8001bab0>] show_stack+0xd4/0xf0
[    2.946528] [<80b26c40>] dump_stack+0x70/0xbc
[    2.950880] [<800356bc>] warn_slowpath_common+0x90/0xe8
[    2.956116] [<80035808>] warn_slowpath_fmt+0x3c/0x48
[    2.961075] [<8069b824>] s3c_hsotg_init_fifo+0x168/0x1d0
[    2.966398] [<8069d8fc>] s3c_hsotg_init+0x50/0x9c
[    2.971095] [<806a0388>] dwc2_gadget_init+0x430/0x8c0
[    2.976158] [<806a0df0>] dwc2_driver_probe+0x218/0x2a8
[    2.981291] [<805b935c>] platform_drv_probe+0x64/0x120
[    2.986440] [<805b783c>] really_probe+0xa0/0x278
[    2.991050] [<805b7c78>] driver_probe_device+0x48/0x78
[    2.996197] [<805b7d74>] __driver_attach+0xcc/0xd4
[    3.000980] [<805b5b7c>] bus_for_each_dev+0x7c/0xc4
[    3.005874] [<805b64f8>] bus_add_driver+0x180/0x240
[    3.010743] [<805b8428>] driver_register+0xac/0x154
[    3.015633] [<80ea9e04>] do_one_initcall+0x150/0x1f4
[    3.020589] [<80eaa080>] kernel_init_freeable+0x1d8/0x298
[    3.025998] [<80b23c5c>] kernel_init+0x28/0x158
[    3.030522] [<800153ec>] ret_from_kernel_thread+0x14/0x1c
[    3.035926]
[    3.037412] ---[ end trace cb88537fdc8fa201 ]---

And during configuration transitions (e.g. adb -> mtp,adb)
the following warning is no longer shown:

[ 311.726159] -----------[ cut here ]-----------
[ 311.730817] WARNING: CPU: 0 PID: 0 at drivers/usb/dwc2/gadget.c:1475 s3c_hsotg_rx_data+0x130/0x13c()
[ 311.739931] Modules linked in:
[ 311.742993] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.18.3+ #45
[ 311.750199] Stack : 00000000 80080370 00000000 00000004 00000006 00000000 00000000 00000000
00000000 00000000 80f05b02 00000042 80d61010 80e18e20 80d60000 8b408010
80e18927 80d0df6c 00000000 00000000 80f03614 80e18e20 80d60000 8b408010
00250182 80a54f54 80e20cc4 80e20cc8 00000000 00000000 80d14ab8 80dfbacc
80dfbacc 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...
[ 311.785841] Call Trace:
[ 311.788292] [<8001ac28>] show_stack+0xc4/0xe0
[ 311.792650] [<80a56e58>] dump_stack+0x70/0xbc
[ 311.797008] [<80033c14>] warn_slowpath_common+0x88/0xb8
[ 311.802224] [<80033cc8>] warn_slowpath_null+0x18/0x24
[ 311.807266] [<80606a3c>] s3c_hsotg_rx_data+0x130/0x13c
[ 311.812397] [<8060afa4>] s3c_hsotg_irq+0x3b4/0x5e8
[ 311.817183] [<80082ab8>] handle_irq_event_percpu+0x90/0x2d0
[ 311.822745] [<80082d4c>] handle_irq_event+0x54/0x98
[ 311.827617] [<80086390>] handle_level_irq+0xe0/0x1c0
[ 311.832572] [<800820bc>] generic_handle_irq+0x3c/0x54
[ 311.837622] [<804bb680>] jz4740_cascade+0x78/0xac
[ 311.842317] [<80082ab8>] handle_irq_event_percpu+0x90/0x2d0
[ 311.847881] [<80086d18>] handle_percpu_irq+0x8c/0xbc
[ 311.852835] [<800820bc>] generic_handle_irq+0x3c/0x54
[ 311.857878] [<80016c8c>] do_IRQ+0x18/0x2c
[ 311.861879] [<80014c40>] ret_from_irq+0x0/0x4
[ 311.866227] [<80016b20>] mips_cpuidle_wait_enter+0x14/0x34
[ 311.871713] [<806d37b0>] cpuidle_enter_state+0x88/0x2c0
[ 311.876934] [<80074308>] cpu_startup_entry+0x36c/0x484
[ 311.882074] [<80e7dc04>] start_kernel+0x4b8/0x4e0
[ 311.886767]
[ 311.888253] --[ end trace dd7a60dcc5530db3 ]--

Change-Id: Ic8ac37a28913d4314371de0cd446f8a7cc45864d
Signed-off-by: Dragan Cecavac <dragan.cecavac@imgtec.com>
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Sep 11, 2018
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Sep 26, 2018
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Oct 14, 2018
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Nov 23, 2018
[ Upstream commit 4fde620 ]

f_audio_free_inst tries to access struct gaudio *card, which has already
been freed in f_audio_free. This causes the oops below if the audio
device is not present (unloading the module may trigger the same
problem). gaudio_cleanup is related to the function rather than the
instance, so it is better to move it to f_audio_free.

root@freescale ~$ modprobe g_audio
[  751.968931] g_audio gadget: unable to open sound control device file: /dev/snd/controlC0
[  751.977134] g_audio gadget: we need at least one control device
[  751.988633] Unable to handle kernel paging request at virtual address 455f448e
[  751.995963] pgd = bd42c00
[  751.998681] [455f448e] *pgd=00000000
[  752.002383] Internal error: Oops: 5 [#1] SMP ARM
[  752.007008] Modules linked in: usb_f_uac1 g_audio(+) usb_f_mass_storage libcomposite configfs [last unloaded: g_mass_storage]
[  752.018427] CPU: 0 PID: 692 Comm: modprobe Not tainted 3.18.0-rc4-00345-g842f57b #10
[  752.026176] task: bdb3ba80 ti: bd41a000 task.ti: bd41a000
[  752.031590] PC is at filp_close+0xc/0x84
[  752.035530] LR is at gaudio_cleanup+0x28/0x54 [usb_f_uac1]
[  752.041023] pc : [<800ec94c>]    lr : [<7f03c63c>]    psr: 20000013
[  752.041023] sp : bd41bcc8  ip : bd41bce8  fp : bd41bce4
[  752.052504] r10: 7f036234  r9 : 7f036220  r8 : 7f036500
[  752.057732] r7 : bd456480  r6 : 7f036500  r5 : 7f03626c  r4 : bd441000
[  752.064264] r3 : 7f03b3dc  r2 : 7f03cab0  r1 : 00000000  r0 : 455f4456
[  752.070798] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  752.077938] Control: 10c5387d  Table: bd42c04a  DAC: 00000015
[  752.083688] Process modprobe (pid: 692, stack limit = 0xbd41a240)
[  752.089786] Stack: (0xbd41bcc8 to 0xbd41c000)
[  752.094152] bcc0:                   7f03b3dc bd441000 7f03626c 7f036500 bd41bcfc bd41bce8
[  752.102337] bce0: 7f03c63c 800ec94c 7f03b3dc bdaa6b00 bd41bd14 bd41bd00 7f03b3f4 7f03c620
[  752.110521] bd00: 7f03b3dc 7f03cbd4 bd41bd2c bd41bd18 7f00f88c 7f03b3e8 00000000 fffffffe
[  752.118705] bd20: bd41bd5c bd41bd30 7f0380d8 7f00f874 7f038000 bd456480 7f036364 be392240
[  752.126889] bd40: 00000000 7f00f620 7f00f638 bd41a008 bd41bd94 bd41bd60 7f00f6d4 7f03800c
[  752.135073] bd60: 00000001 00000000 8047438c be3a4000 7f036364 7f036364 7f00db28 7f00f620
[  752.143257] bd80: 7f00f638 bd41a008 bd41bdb4 bd41bd98 804742ac 7f00f644 00000000 809adde0
[  752.151442] bda0: 7f036364 7f036364 bd41bdcc bd41bdb8 804743c8 80474284 7f03633c 7f036200
[  752.159626] bdc0: bd41bdf4 bd41bdd0 7f00d5b4 8047435c bd41a000 80974060 7f038158 00000000
[  752.167811] bde0: 80974060 bdaa9940 bd41be04 bd41bdf8 7f03816c 7f00d518 bd41be8c bd41be08
[  752.175995] be00: 80008a5c 7f038164 be001f00 7f0363c4 bd41bf48 00000000 bd41be54 bd41be28
[  752.184179] be20: 800e9498 800e8e74 00000002 00000003 bd4129c0 c0a07000 00000001 7f0363c4
[  752.192363] be40: bd41bf48 00000000 bd41be74 bd41be58 800de780 800e9320 bd41a000 7f0363d0
[  752.200547] be60: 00000000 bd41a000 7f0363d0 00000000 bd41beec 7f0363c4 bd41bf48 00000000
[  752.208731] be80: bd41bf44 bd41be90 80093e5 800089e0 ffff8000 00007fff 80091390 0000065f
[  752.216915] bea0: 00000000 c0a0834c bd41bf7c 00000086 bd41bf50 00000000 7f03651c 00000086
[  752.225099] bec0: bd41a010 00c28758 800ddcc4 800ddae0 000000d2 bd412a00 bd41bf24 00000000
[  752.233283] bee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  752.241467] bf00: 00000000 00000000 00000000 00000000 00000000 00000000 bd41bf44 000025b0
[  752.249651] bf20: 00c28a08 00c28758 00000080 8000edc4 bd41a000 00000000 bd41bfa4 bd41bf48
[  752.257835] bf40: 800943e4 800932ec c0a07000 000025b0 c0a07f8c c0a07ea4 c0a08e5c 0000051c
[  752.266019] bf60: 0000088c 00000000 00000000 00000000 00000018 00000019 00000010 0000000b
[  752.274203] bf80: 00000009 00000000 00000000 000025b0 00000000 00c28758 00000000 bd41bfa8
[  752.282387] bfa0: 8000ec00 8009430c 000025b0 00000000 00c28a08 000025b0 00c28758 00c28980
[  752.290571] bfc0: 000025b0 00000000 00c28758 00000080 000a6a78 00000007 00c28718 00c28980
[  752.298756] bfe0: 7ebc1af0 7ebc1ae0 0001a32c 76e9c490 60000010 00c28a08 22013510 ecebffff
[  752.306933] Backtrace:
[  752.309414] [<800ec940>] (filp_close) from [<7f03c63c>] (gaudio_cleanup+0x28/0x54 [usb_f_uac1])
[  752.318115]  r6:7f036500 r5:7f03626c r4:bd441000 r3:7f03b3dc
[  752.323851] [<7f03c614>] (gaudio_cleanup [usb_f_uac1]) from [<7f03b3f4>] (f_audio_free_inst+0x18/0x68 [usb_f_uac1])
[  752.334288]  r4:bdaa6b00 r3:7f03b3dc
[  752.337931] [<7f03b3dc>] (f_audio_free_inst [usb_f_uac1]) from [<7f00f88c>] (usb_put_function_instance+0x24/0x30 [libcomposite])
[  752.349498]  r4:7f03cbd4 r3:7f03b3dc
[  752.353127] [<7f00f868>] (usb_put_function_instance [libcomposite]) from [<7f0380d8>] (audio_bind+0xd8/0xfc [g_audio])
[  752.363824]  r4:fffffffe r3:00000000
[  752.367456] [<7f038000>] (audio_bind [g_audio]) from [<7f00f6d4>] (composite_bind+0x9c/0x1e8 [libcomposite])
[  752.377284]  r10:bd41a008 r9:7f00f638 r8:7f00f620 r7:00000000 r6:be392240 r5:7f036364
[  752.385193]  r4:bd456480 r3:7f038000
[  752.388825] [<7f00f638>] (composite_bind [libcomposite]) from [<804742ac>] (udc_bind_to_driver+0x34/0xd8)
[  752.398394]  r10:bd41a008 r9:7f00f638 r8:7f00f620 r7:7f00db28 r6:7f036364 r5:7f036364
[  752.406302]  r4:be3a4000
[  752.408860] [<80474278>] (udc_bind_to_driver) from [<804743c8>] (usb_gadget_probe_driver+0x78/0xa8)
[  752.417908]  r6:7f036364 r5:7f036364 r4:809adde0 r3:00000000
[  752.423649] [<80474350>] (usb_gadget_probe_driver) from [<7f00d5b4>] (usb_composite_probe+0xa8/0xd4 [libcomposite])
[  752.434086]  r5:7f036200 r4:7f03633c
[  752.437713] [<7f00d50c>] (usb_composite_probe [libcomposite]) from [<7f03816c>] (audio_driver_init+0x14/0x1c [g_audio])
[  752.448498]  r9:bdaa9940 r8:80974060 r7:00000000 r6:7f038158 r5:80974060 r4:bd41a000
[  752.456330] [<7f038158>] (audio_driver_init [g_audio]) from [<80008a5c>] (do_one_initcall+0x88/0x1d4)
[  752.465564] [<800089d4>] (do_one_initcall) from [<80093e54>] (load_module+0xb74/0x1020)
[  752.473571]  r10:00000000 r9:bd41bf48 r8:7f0363c4 r7:bd41beec r6:00000000 r5:7f0363d0
[  752.481478]  r4:bd41a000
[  752.484037] [<800932e0>] (load_module) from [<800943e4>] (SyS_init_module+0xe4/0xf8)
[  752.491781]  r10:00000000 r9:bd41a000 r8:8000edc4 r7:00000080 r6:00c28758 r5:00c28a08
[  752.499689]  r4:000025b0
[  752.502252] [<80094300>] (SyS_init_module) from [<8000ec00>] (ret_fast_syscall+0x0/0x48)
[  752.510345]  r6:00c28758 r5:00000000 r4:000025b0
[  752.515013] Code: 808475b4 e1a0c00d e92dd878 e24cb004 (e5904038)
[  752.521223] ---[ end trace 70babe34de4ab99b ]---
Segmentation fault

Signed-off-by: Peter Chen <peter.chen@freescale.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Nov 23, 2018
Before this patch, multiple endpoints could not be active at the same
time; they would actually cancel each other out.

The issue was discovered on Android when combining the adb, mtp and ptp
configurations. This patch introduces proper behaviour for these cases.

Also, during the boot-up the following warning is no longer shown:

[    2.879328] ------------[ cut here ]------------
[    2.883983] WARNING: CPU: 0 PID: 1 at drivers/usb/dwc2/gadget.c:212 s3c_hsotg_init_fifo+0x168/0x1d0()
[    2.893204] insufficient fifo memory
[    2.896602] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W      3.18.3+ #10
[    2.904004] Stack : 00000000 800919a0 00000000 00000004 00000006 800913f4 00000000 00000000
          00000000 00000000 80f75a12 00000042 80f75a12 00000042 00000006 00000000
          80e42767 80d7c2e 00000001 00000000 80f73574 8bc90418 80ea0000 01000d00
          80f06704 80b24c00 00000000 80035388 00000006 00000000 80d834a4 8bc99b04
          8bc99b04 80e40000 00000000 00000000 00000000 00000000 00000000 00000000
          ...
[    2.939709] Call Trace:
[    2.942174] [<8001bab0>] show_stack+0xd4/0xf0
[    2.946528] [<80b26c40>] dump_stack+0x70/0xbc
[    2.950880] [<800356bc>] warn_slowpath_common+0x90/0xe8
[    2.956116] [<80035808>] warn_slowpath_fmt+0x3c/0x48
[    2.961075] [<8069b824>] s3c_hsotg_init_fifo+0x168/0x1d0
[    2.966398] [<8069d8fc>] s3c_hsotg_init+0x50/0x9c
[    2.971095] [<806a0388>] dwc2_gadget_init+0x430/0x8c0
[    2.976158] [<806a0df0>] dwc2_driver_probe+0x218/0x2a8
[    2.981291] [<805b935c>] platform_drv_probe+0x64/0x120
[    2.986440] [<805b783c>] really_probe+0xa0/0x278
[    2.991050] [<805b7c78>] driver_probe_device+0x48/0x78
[    2.996197] [<805b7d74>] __driver_attach+0xcc/0xd4
[    3.000980] [<805b5b7c>] bus_for_each_dev+0x7c/0xc4
[    3.005874] [<805b64f8>] bus_add_driver+0x180/0x240
[    3.010743] [<805b8428>] driver_register+0xac/0x154
[    3.015633] [<80ea9e04>] do_one_initcall+0x150/0x1f4
[    3.020589] [<80eaa080>] kernel_init_freeable+0x1d8/0x298
[    3.025998] [<80b23c5c>] kernel_init+0x28/0x158
[    3.030522] [<800153ec>] ret_from_kernel_thread+0x14/0x1c
[    3.035926]
[    3.037412] ---[ end trace cb88537fdc8fa201 ]---

And during configuration transitions (e.g. adb -> mtp,adb)
the following warning is no longer shown:

[ 311.726159] -----------[ cut here ]-----------
[ 311.730817] WARNING: CPU: 0 PID: 0 at drivers/usb/dwc2/gadget.c:1475 s3c_hsotg_rx_data+0x130/0x13c()
[ 311.739931] Modules linked in:
[ 311.742993] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.18.3+ #45
[ 311.750199] Stack : 00000000 80080370 00000000 00000004 00000006 00000000 00000000 00000000
00000000 00000000 80f05b02 00000042 80d61010 80e18e20 80d60000 8b408010
80e18927 80d0df6c 00000000 00000000 80f03614 80e18e20 80d60000 8b408010
00250182 80a54f54 80e20cc4 80e20cc8 00000000 00000000 80d14ab8 80dfbacc
80dfbacc 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...
[ 311.785841] Call Trace:
[ 311.788292] [<8001ac28>] show_stack+0xc4/0xe0
[ 311.792650] [<80a56e58>] dump_stack+0x70/0xbc
[ 311.797008] [<80033c14>] warn_slowpath_common+0x88/0xb8
[ 311.802224] [<80033cc8>] warn_slowpath_null+0x18/0x24
[ 311.807266] [<80606a3c>] s3c_hsotg_rx_data+0x130/0x13c
[ 311.812397] [<8060afa4>] s3c_hsotg_irq+0x3b4/0x5e8
[ 311.817183] [<80082ab8>] handle_irq_event_percpu+0x90/0x2d0
[ 311.822745] [<80082d4c>] handle_irq_event+0x54/0x98
[ 311.827617] [<80086390>] handle_level_irq+0xe0/0x1c0
[ 311.832572] [<800820bc>] generic_handle_irq+0x3c/0x54
[ 311.837622] [<804bb680>] jz4740_cascade+0x78/0xac
[ 311.842317] [<80082ab8>] handle_irq_event_percpu+0x90/0x2d0
[ 311.847881] [<80086d18>] handle_percpu_irq+0x8c/0xbc
[ 311.852835] [<800820bc>] generic_handle_irq+0x3c/0x54
[ 311.857878] [<80016c8c>] do_IRQ+0x18/0x2c
[ 311.861879] [<80014c40>] ret_from_irq+0x0/0x4
[ 311.866227] [<80016b20>] mips_cpuidle_wait_enter+0x14/0x34
[ 311.871713] [<806d37b0>] cpuidle_enter_state+0x88/0x2c0
[ 311.876934] [<80074308>] cpu_startup_entry+0x36c/0x484
[ 311.882074] [<80e7dc04>] start_kernel+0x4b8/0x4e0
[ 311.886767]
[ 311.888253] --[ end trace dd7a60dcc5530db3 ]--

Change-Id: Ic8ac37a28913d4314371de0cd446f8a7cc45864d
Signed-off-by: Dragan Cecavac <dragan.cecavac@imgtec.com>
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Nov 28, 2018
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Dec 11, 2018
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Jan 1, 2019
[ Upstream commit c5a94f4 ]

It was observed that a process blocked indefinitely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().

At this time, ->backing_objects was empty, which would normally prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page() was entered.

When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared.  This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared.

There is some uncertainty in this analysis, but it seems to fit the
observations.  Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.
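The race described above is a lost wake-up. The sketch below models it with POSIX threads rather than the kernel's wait-on-bit API. `struct cookie`, `read_or_alloc()` and `drop_object()` are hypothetical, and the `looking_up` flag and `backing_objects` counter only stand in for FSCACHE_COOKIE_LOOKING_UP and ->backing_objects. The drop path clears the backing objects and wakes every waiter. The reader re-checks for backing objects after it wakes, which is the behaviour the commit message relies on.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a cache cookie; not the fscache API. */
struct cookie {
    pthread_mutex_t lock;
    pthread_cond_t  wq;
    bool looking_up;        /* models FSCACHE_COOKIE_LOOKING_UP */
    int  backing_objects;   /* models the length of ->backing_objects */
};

/* Reader: bail out early if there is nothing to read from, otherwise wait
 * for the lookup to finish and re-check afterwards. Returns -1 if no
 * backing object is left, 0 otherwise. */
static int read_or_alloc(struct cookie *c)
{
    int ret;

    pthread_mutex_lock(&c->lock);
    if (c->backing_objects == 0) {
        pthread_mutex_unlock(&c->lock);
        return -1;
    }
    while (c->looking_up && c->backing_objects > 0)
        pthread_cond_wait(&c->wq, &c->lock);
    /* Re-check after waiting: the object may have been dropped meanwhile. */
    ret = (c->backing_objects == 0) ? -1 : 0;
    pthread_mutex_unlock(&c->lock);
    return ret;
}

/* Drop path: clearing the backing objects must also wake any waiter,
 * otherwise a reader that started waiting in the window sleeps forever. */
static void drop_object(struct cookie *c)
{
    pthread_mutex_lock(&c->lock);
    c->backing_objects = 0;
    pthread_cond_broadcast(&c->wq);   /* models the wake-up the patch adds */
    pthread_mutex_unlock(&c->lock);
}
```

If a second thread is blocked in read_or_alloc() with `looking_up` still set, drop_object() ends its wait and the call returns -1. Without the broadcast, nothing would ever signal the condition variable.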

The customer who reported the hang also reports that it can no longer
be reproduced with this fix.

The backtrace for the blocked process looked like:

PID: 29360  TASK: ffff881ff2ac0f80  CPU: 3   COMMAND: "zsh"
 #0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
 #1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
 #2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
 #3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
 #4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
 #5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
 #6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
 #7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
 #8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
 #9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
#12 [ffff881ff43eff18] sys_read at ffffffff811fda62
#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Jan 1, 2019
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Jan 13, 2019
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Feb 1, 2019
pcercuei referenced this issue in OpenDingux/linux Feb 3, 2019
syzbot found that ax25 routes were not properly protected
against concurrent use [1].

In this particular report the bug happened while
copying ax25->digipeat.

Fix this problem by making sure we call ax25_get_route()
while ax25_route_lock is held, so that no modification
could happen while using the route.

The current two ax25_get_route() callers do not sleep,
so this change should be fine.

Once we do that, ax25_get_route() no longer needs to
grab a reference on the found route.
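A minimal userspace sketch of the locking pattern described above, assuming a pthread rwlock as a stand-in for the kernel's ax25_route_lock. struct route, get_route_locked(), autobind_copy_digi() and add_route() are hypothetical simplifications, not the actual net/ax25 code. The lookup returns a borrowed pointer, and the digipeater path is copied before the lock is released, so no per-route reference is needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an ax25 route entry. */
struct route {
    int dev_id;              /* device the route belongs to */
    size_t ndigi;            /* number of digipeater bytes in use */
    unsigned char digi[8];   /* digipeater path copied by callers */
    struct route *next;
};

static struct route *route_list;
static pthread_rwlock_t route_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Caller must hold route_lock. Returns a borrowed pointer: no reference
 * is taken, so it is only valid until the lock is dropped. */
static struct route *get_route_locked(int dev_id)
{
    for (struct route *r = route_list; r != NULL; r = r->next)
        if (r->dev_id == dev_id)
            return r;
    return NULL;
}

/* Look up the route and copy its digipeater path while still holding the
 * read lock, so a concurrent writer cannot free it between lookup and copy.
 * `out` must hold at least 8 bytes. */
static int autobind_copy_digi(int dev_id, unsigned char *out, size_t *n)
{
    int ret = -1;

    pthread_rwlock_rdlock(&route_lock);
    struct route *r = get_route_locked(dev_id);
    if (r != NULL) {
        memcpy(out, r->digi, r->ndigi);
        *n = r->ndigi;
        ret = 0;
    }
    pthread_rwlock_unlock(&route_lock);
    return ret;
}

/* Writers modify the list only under the write lock. */
static void add_route(struct route *r)
{
    assert(r->ndigi <= sizeof r->digi);
    pthread_rwlock_wrlock(&route_lock);
    r->next = route_list;
    route_list = r;
    pthread_rwlock_unlock(&route_lock);
}
```

In the kernel the equivalent is holding the lock across both ax25_get_route() and the copy of the digipeater list; the commit notes that both callers do not sleep, which is what makes holding the lock that long acceptable.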

[1]
ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
BUG: KASAN: use-after-free in memcpy include/linux/string.h:352 [inline]
BUG: KASAN: use-after-free in kmemdup+0x42/0x60 mm/util.c:113
Read of size 66 at addr ffff888066641a80 by task syz-executor2/531

ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
CPU: 1 PID: 531 Comm: syz-executor2 Not tainted 5.0.0-rc2+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 check_memory_region_inline mm/kasan/generic.c:185 [inline]
 check_memory_region+0x123/0x190 mm/kasan/generic.c:191
 memcpy+0x24/0x50 mm/kasan/common.c:130
 memcpy include/linux/string.h:352 [inline]
 kmemdup+0x42/0x60 mm/util.c:113
 kmemdup include/linux/string.h:425 [inline]
 ax25_rt_autobind+0x25d/0x750 net/ax25/ax25_route.c:424
 ax25_connect.cold+0x30/0xa4 net/ax25/af_ax25.c:1224
 __sys_connect+0x357/0x490 net/socket.c:1664
 __do_sys_connect net/socket.c:1675 [inline]
 __se_sys_connect net/socket.c:1672 [inline]
 __x64_sys_connect+0x73/0xb0 net/socket.c:1672
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458099
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f870ee22c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458099
RDX: 0000000000000048 RSI: 0000000020000080 RDI: 0000000000000005
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f870ee236d4
R13: 00000000004be48e R14: 00000000004ce9a8 R15: 00000000ffffffff

Allocated by task 526:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_kmalloc mm/kasan/common.c:496 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:469
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:504
ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
 kmem_cache_alloc_trace+0x151/0x760 mm/slab.c:3609
 kmalloc include/linux/slab.h:545 [inline]
 ax25_rt_add net/ax25/ax25_route.c:95 [inline]
 ax25_rt_ioctl+0x3b9/0x1270 net/ax25/ax25_route.c:233
 ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
 sock_do_ioctl+0xe2/0x400 net/socket.c:950
 sock_ioctl+0x32f/0x6c0 net/socket.c:1074
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
Freed by task 550:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:458
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:466
 __cache_free mm/slab.c:3487 [inline]
 kfree+0xcf/0x230 mm/slab.c:3806
 ax25_rt_add net/ax25/ax25_route.c:92 [inline]
 ax25_rt_ioctl+0x304/0x1270 net/ax25/ax25_route.c:233
 ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
 sock_do_ioctl+0xe2/0x400 net/socket.c:950
 sock_ioctl+0x32f/0x6c0 net/socket.c:1074
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888066641a80
 which belongs to the cache kmalloc-96 of size 96
The buggy address is located 0 bytes inside of
 96-byte region [ffff888066641a80, ffff888066641ae0)
The buggy address belongs to the page:
page:ffffea0001999040 count:1 mapcount:0 mapping:ffff88812c3f04c0 index:0x0
flags: 0x1fffc0000000200(slab)
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
raw: 01fffc0000000200 ffffea0001817948 ffffea0002341dc8 ffff88812c3f04c0
raw: 0000000000000000 ffff888066641000 0000000100000020 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888066641980: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641a00: 00 00 00 00 00 00 00 00 02 fc fc fc fc fc fc fc
>ffff888066641a80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                   ^
 ffff888066641b00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641b80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Feb 12, 2019
Before this patch, using multiple active endpoints was not
possible: the endpoints would actually cancel each other out.

The issue was discovered on Android when combining adb, mtp and ptp
configurations together. This patch introduces proper behaviour for
these cases.

Also, during the boot-up the following warning is no longer shown:

[    2.879328] ------------[ cut here ]------------
[    2.883983] WARNING: CPU: 0 PID: 1 at drivers/usb/dwc2/gadget.c:212 s3c_hsotg_init_fifo+0x168/0x1d0()
[    2.893204] insufficient fifo memory
[    2.896602] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W      3.18.3+ #10
[    2.904004] Stack : 00000000 800919a0 00000000 00000004 00000006 800913f4 00000000 00000000
          00000000 00000000 80f75a12 00000042 80f75a12 00000042 00000006 00000000
          80e42767 80d7c2e 00000001 00000000 80f73574 8bc90418 80ea0000 01000d00
          80f06704 80b24c00 00000000 80035388 00000006 00000000 80d834a4 8bc99b04
          8bc99b04 80e40000 00000000 00000000 00000000 00000000 00000000 00000000
          ...
[    2.939709] Call Trace:
[    2.942174] [<8001bab0>] show_stack+0xd4/0xf0
[    2.946528] [<80b26c40>] dump_stack+0x70/0xbc
[    2.950880] [<800356bc>] warn_slowpath_common+0x90/0xe8
[    2.956116] [<80035808>] warn_slowpath_fmt+0x3c/0x48
[    2.961075] [<8069b824>] s3c_hsotg_init_fifo+0x168/0x1d0
[    2.966398] [<8069d8fc>] s3c_hsotg_init+0x50/0x9c
[    2.971095] [<806a0388>] dwc2_gadget_init+0x430/0x8c0
[    2.976158] [<806a0df0>] dwc2_driver_probe+0x218/0x2a8
[    2.981291] [<805b935c>] platform_drv_probe+0x64/0x120
[    2.986440] [<805b783c>] really_probe+0xa0/0x278
[    2.991050] [<805b7c78>] driver_probe_device+0x48/0x78
[    2.996197] [<805b7d74>] __driver_attach+0xcc/0xd4
[    3.000980] [<805b5b7c>] bus_for_each_dev+0x7c/0xc4
[    3.005874] [<805b64f8>] bus_add_driver+0x180/0x240
[    3.010743] [<805b8428>] driver_register+0xac/0x154
[    3.015633] [<80ea9e04>] do_one_initcall+0x150/0x1f4
[    3.020589] [<80eaa080>] kernel_init_freeable+0x1d8/0x298
[    3.025998] [<80b23c5c>] kernel_init+0x28/0x158
[    3.030522] [<800153ec>] ret_from_kernel_thread+0x14/0x1c
[    3.035926]
[    3.037412] ---[ end trace cb88537fdc8fa201 ]---

And during configuration transitions (e.g. adb -> mtp,adb)
the following warning is no longer shown:

[ 311.726159] -----------[ cut here ]-----------
[ 311.730817] WARNING: CPU: 0 PID: 0 at drivers/usb/dwc2/gadget.c:1475 s3c_hsotg_rx_data+0x130/0x13c()
[ 311.739931] Modules linked in:
[ 311.742993] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.18.3+ #45
[ 311.750199] Stack : 00000000 80080370 00000000 00000004 00000006 00000000 00000000 00000000
00000000 00000000 80f05b02 00000042 80d61010 80e18e20 80d60000 8b408010
80e18927 80d0df6c 00000000 00000000 80f03614 80e18e20 80d60000 8b408010
00250182 80a54f54 80e20cc4 80e20cc8 00000000 00000000 80d14ab8 80dfbacc
80dfbacc 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...
[ 311.785841] Call Trace:
[ 311.788292] [<8001ac28>] show_stack+0xc4/0xe0
[ 311.792650] [<80a56e58>] dump_stack+0x70/0xbc
[ 311.797008] [<80033c14>] warn_slowpath_common+0x88/0xb8
[ 311.802224] [<80033cc8>] warn_slowpath_null+0x18/0x24
[ 311.807266] [<80606a3c>] s3c_hsotg_rx_data+0x130/0x13c
[ 311.812397] [<8060afa4>] s3c_hsotg_irq+0x3b4/0x5e8
[ 311.817183] [<80082ab8>] handle_irq_event_percpu+0x90/0x2d0
[ 311.822745] [<80082d4c>] handle_irq_event+0x54/0x98
[ 311.827617] [<80086390>] handle_level_irq+0xe0/0x1c0
[ 311.832572] [<800820bc>] generic_handle_irq+0x3c/0x54
[ 311.837622] [<804bb680>] jz4740_cascade+0x78/0xac
[ 311.842317] [<80082ab8>] handle_irq_event_percpu+0x90/0x2d0
[ 311.847881] [<80086d18>] handle_percpu_irq+0x8c/0xbc
[ 311.852835] [<800820bc>] generic_handle_irq+0x3c/0x54
[ 311.857878] [<80016c8c>] do_IRQ+0x18/0x2c
[ 311.861879] [<80014c40>] ret_from_irq+0x0/0x4
[ 311.866227] [<80016b20>] mips_cpuidle_wait_enter+0x14/0x34
[ 311.871713] [<806d37b0>] cpuidle_enter_state+0x88/0x2c0
[ 311.876934] [<80074308>] cpu_startup_entry+0x36c/0x484
[ 311.882074] [<80e7dc04>] start_kernel+0x4b8/0x4e0
[ 311.886767]
[ 311.888253] --[ end trace dd7a60dcc5530db3 ]--

Change-Id: Ic8ac37a28913d4314371de0cd446f8a7cc45864d
Signed-off-by: Dragan Cecavac <dragan.cecavac@imgtec.com>
pcercuei referenced this issue in OpenDingux/linux Feb 25, 2019
commit 63530ab upstream.

syzbot found that ax25 routes were not properly protected
against concurrent use [1].

In this particular report the bug happened while
copying ax25->digipeat.

Fix this problem by making sure we call ax25_get_route()
while ax25_route_lock is held, so that no modification
could happen while using the route.

The current two ax25_get_route() callers do not sleep,
so this change should be fine.

Once we do that, ax25_get_route() no longer needs to
grab a reference on the found route.

[1]
ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
BUG: KASAN: use-after-free in memcpy include/linux/string.h:352 [inline]
BUG: KASAN: use-after-free in kmemdup+0x42/0x60 mm/util.c:113
Read of size 66 at addr ffff888066641a80 by task syz-executor2/531

ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
CPU: 1 PID: 531 Comm: syz-executor2 Not tainted 5.0.0-rc2+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 check_memory_region_inline mm/kasan/generic.c:185 [inline]
 check_memory_region+0x123/0x190 mm/kasan/generic.c:191
 memcpy+0x24/0x50 mm/kasan/common.c:130
 memcpy include/linux/string.h:352 [inline]
 kmemdup+0x42/0x60 mm/util.c:113
 kmemdup include/linux/string.h:425 [inline]
 ax25_rt_autobind+0x25d/0x750 net/ax25/ax25_route.c:424
 ax25_connect.cold+0x30/0xa4 net/ax25/af_ax25.c:1224
 __sys_connect+0x357/0x490 net/socket.c:1664
 __do_sys_connect net/socket.c:1675 [inline]
 __se_sys_connect net/socket.c:1672 [inline]
 __x64_sys_connect+0x73/0xb0 net/socket.c:1672
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458099
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f870ee22c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458099
RDX: 0000000000000048 RSI: 0000000020000080 RDI: 0000000000000005
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f870ee236d4
R13: 00000000004be48e R14: 00000000004ce9a8 R15: 00000000ffffffff

Allocated by task 526:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_kmalloc mm/kasan/common.c:496 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:469
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:504
ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
 kmem_cache_alloc_trace+0x151/0x760 mm/slab.c:3609
 kmalloc include/linux/slab.h:545 [inline]
 ax25_rt_add net/ax25/ax25_route.c:95 [inline]
 ax25_rt_ioctl+0x3b9/0x1270 net/ax25/ax25_route.c:233
 ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
 sock_do_ioctl+0xe2/0x400 net/socket.c:950
 sock_ioctl+0x32f/0x6c0 net/socket.c:1074
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
Freed by task 550:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:458
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:466
 __cache_free mm/slab.c:3487 [inline]
 kfree+0xcf/0x230 mm/slab.c:3806
 ax25_rt_add net/ax25/ax25_route.c:92 [inline]
 ax25_rt_ioctl+0x304/0x1270 net/ax25/ax25_route.c:233
 ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
 sock_do_ioctl+0xe2/0x400 net/socket.c:950
 sock_ioctl+0x32f/0x6c0 net/socket.c:1074
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888066641a80
 which belongs to the cache kmalloc-96 of size 96
The buggy address is located 0 bytes inside of
 96-byte region [ffff888066641a80, ffff888066641ae0)
The buggy address belongs to the page:
page:ffffea0001999040 count:1 mapcount:0 mapping:ffff88812c3f04c0 index:0x0
flags: 0x1fffc0000000200(slab)
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
raw: 01fffc0000000200 ffffea0001817948 ffffea0002341dc8 ffff88812c3f04c0
raw: 0000000000000000 ffff888066641000 0000000100000020 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888066641980: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641a00: 00 00 00 00 00 00 00 00 02 fc fc fc fc fc fc fc
>ffff888066641a80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                   ^
 ffff888066641b00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641b80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
pcercuei referenced this issue in OpenDingux/linux May 6, 2019
There is a KASAN slab-out-of-bounds:
BUG: KASAN: slab-out-of-bounds in _copy_from_iter_full+0x783/0xaa0
Read of size 80 at addr ffff88810c35e180 by task mount.cifs/539

CPU: 1 PID: 539 Comm: mount.cifs Not tainted 4.19 #10
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
            rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
Call Trace:
 dump_stack+0xdd/0x12a
 print_address_description+0xa7/0x540
 kasan_report+0x1ff/0x550
 check_memory_region+0x2f1/0x310
 memcpy+0x2f/0x80
 _copy_from_iter_full+0x783/0xaa0
 tcp_sendmsg_locked+0x1840/0x4140
 tcp_sendmsg+0x37/0x60
 inet_sendmsg+0x18c/0x490
 sock_sendmsg+0xae/0x130
 smb_send_kvec+0x29c/0x520
 __smb_send_rqst+0x3ef/0xc60
 smb_send_rqst+0x25a/0x2e0
 compound_send_recv+0x9e8/0x2af0
 cifs_send_recv+0x24/0x30
 SMB2_open+0x35e/0x1620
 open_shroot+0x27b/0x490
 smb2_open_op_close+0x4e1/0x590
 smb2_query_path_info+0x2ac/0x650
 cifs_get_inode_info+0x1058/0x28f0
 cifs_root_iget+0x3bb/0xf80
 cifs_smb3_do_mount+0xe00/0x14c0
 cifs_do_mount+0x15/0x20
 mount_fs+0x5e/0x290
 vfs_kern_mount+0x88/0x460
 do_mount+0x398/0x31e0
 ksys_mount+0xc6/0x150
 __x64_sys_mount+0xea/0x190
 do_syscall_64+0x122/0x590
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

It can be reproduced with the following steps:
  1. samba configured with: server max protocol = SMB2_10
  2. mount -o vers=default

When parsing the mount version parameter, 'ops' and 'vals' were
set to smb30. If the negotiation result is smb21, only 'ops' is
updated to smb21, while 'vals' is still smb30. When the lease
context is added, iov_base is allocated using the smb21 ops, but
iov_len is initialized with the smb30 value. Because iov_len is
longer than the buffer at iov_base, sending the message copies
past the end of the array.

We need to keep 'ops' and 'vals' consistent.
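A minimal sketch of the invariant, with hypothetical types and illustrative sizes (the real smb2/smb3 tables and lease-context structures differ): if the negotiated dialect selects both tables in one place, the declared length can never exceed the allocated buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-dialect tables; sizes are illustrative only. */
struct dialect_ops  { const char *name; size_t lease_ctx_alloc; };
struct dialect_vals { const char *name; size_t lease_ctx_len;   };

static const struct dialect_ops  smb21_ops  = { "2.1", 32 };
static const struct dialect_vals smb21_vals = { "2.1", 32 };
static const struct dialect_ops  smb30_ops  = { "3.0", 52 };
static const struct dialect_vals smb30_vals = { "3.0", 52 };

struct server {
    const struct dialect_ops  *ops;   /* used to allocate iov_base */
    const struct dialect_vals *vals;  /* used to set iov_len */
};

/* Select ops and vals together from the negotiated dialect so they can
 * never disagree. 0x0210 is the SMB 2.1 dialect revision; anything else
 * falls back to 3.0 in this sketch. */
static void set_negotiated_dialect(struct server *s, unsigned dialect)
{
    assert(s != NULL);
    if (dialect == 0x0210) {
        s->ops = &smb21_ops;
        s->vals = &smb21_vals;
    } else {
        s->ops = &smb30_ops;
        s->vals = &smb30_vals;
    }
}

/* Sending is safe only if the declared length fits the allocation. */
static int lease_ctx_fits(const struct server *s)
{
    return s->vals->lease_ctx_len <= s->ops->lease_ctx_alloc;
}
```

The mixed state from the bug (smb21 ops with smb30 vals) is exactly the case where lease_ctx_fits() returns false.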

Fixes: 9764c02 ("SMB3: Add support for multidialect negotiate (SMB2.1 and later)")
Fixes: d5c7076 ("smb3: add smb3.1.1 to default dialect list")

Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
pcercuei referenced this issue in OpenDingux/linux May 6, 2019
By calling maps__insert() we assume we get 2 references on the map,
which we release within the maps__remove() call.

However, if there's already a map with the same name, we currently don't
bump the reference and can crash, like this:

  Program received signal SIGABRT, Aborted.
  0x00007ffff75e60f5 in raise () from /lib64/libc.so.6

  (gdb) bt
  #0  0x00007ffff75e60f5 in raise () from /lib64/libc.so.6
  #1  0x00007ffff75d0895 in abort () from /lib64/libc.so.6
  #2  0x00007ffff75d0769 in __assert_fail_base.cold () from /lib64/libc.so.6
  #3  0x00007ffff75de596 in __assert_fail () from /lib64/libc.so.6
  #4  0x00000000004fc006 in refcount_sub_and_test (i=1, r=0x1224e88) at tools/include/linux/refcount.h:131
  #5  refcount_dec_and_test (r=0x1224e88) at tools/include/linux/refcount.h:148
  #6  map__put (map=0x1224df0) at util/map.c:299
  #7  0x00000000004fdb95 in __maps__remove (map=0x1224df0, maps=0xb17d80) at util/map.c:953
  #8  maps__remove (maps=0xb17d80, map=0x1224df0) at util/map.c:959
  #9  0x00000000004f7d8a in map_groups__remove (map=<optimized out>, mg=<optimized out>) at util/map_groups.h:65
  #10 machine__process_ksymbol_unregister (sample=<optimized out>, event=0x7ffff7279670, machine=<optimized out>) at util/machine.c:728
  #11 machine__process_ksymbol (machine=<optimized out>, event=0x7ffff7279670, sample=<optimized out>) at util/machine.c:741
  #12 0x00000000004fffbb in perf_session__deliver_event (session=0xb11390, event=0x7ffff7279670, tool=0x7fffffffc7b0, file_offset=13936) at util/session.c:1362
  #13 0x00000000005039bb in do_flush (show_progress=false, oe=0xb17e80) at util/ordered-events.c:243
  #14 __ordered_events__flush (oe=0xb17e80, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:322
  #15 0x00000000005005e4 in perf_session__process_user_event (session=session@entry=0xb11390, event=event@entry=0x7ffff72a4af8,
  ...

Add the map to the list and take the reference even if we find a
map with the same name.
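A minimal sketch of the refcounting rule, with hypothetical names (the real perf code keeps maps in rb-trees and uses refcount_t): every insert takes a reference unconditionally, so the matching remove always has a reference to drop, whether or not another map with the same name is already present. The assert() in map_put() plays the role of the refcount assertion in the backtrace above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified map object with a plain reference count. */
struct map {
    char name[32];
    int refcnt;
    struct map *next;
};

struct maps { struct map *head; };

static struct map *map_new(const char *name)
{
    struct map *m = calloc(1, sizeof *m);
    if (m == NULL)
        return NULL;
    strncpy(m->name, name, sizeof m->name - 1);  /* calloc keeps it NUL-terminated */
    m->refcnt = 1;                               /* the creator's reference */
    return m;
}

static void map_get(struct map *m) { m->refcnt++; }

static void map_put(struct map *m)
{
    assert(m->refcnt > 0);   /* an unbalanced put trips this */
    if (--m->refcnt == 0)
        free(m);
}

/* Always link the map and take a reference, even if a map with the same
 * name is already present. Skipping map_get() for a duplicate name is the
 * bug: maps_remove() would then drop a reference that was never taken. */
static void maps_insert(struct maps *ms, struct map *m)
{
    map_get(m);
    m->next = ms->head;
    ms->head = m;
}

static void maps_remove(struct maps *ms, struct map *m)
{
    for (struct map **pp = &ms->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == m) {
            *pp = m->next;
            map_put(m);   /* drop the reference taken by maps_insert() */
            return;
        }
    }
}
```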

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Fixes: 1e62856 ("perf symbols: Fix slowness due to -ffunction-section")
Link: http://lkml.kernel.org/r/20190416160127.30203-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
nemunaire pushed a commit to nemunaire/CI20_linux that referenced this issue Jun 16, 2019
commit 63530ab upstream.

syzbot found that ax25 routes were not properly protected
against concurrent use [1].

In this particular report the bug happened while
copying ax25->digipeat.

Fix this problem by making sure we call ax25_get_route()
while ax25_route_lock is held, so that no modification
could happen while using the route.

The current two ax25_get_route() callers do not sleep,
so this change should be fine.

Once we do that, ax25_get_route() no longer needs to
grab a reference on the found route.

[1]
ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
BUG: KASAN: use-after-free in memcpy include/linux/string.h:352 [inline]
BUG: KASAN: use-after-free in kmemdup+0x42/0x60 mm/util.c:113
Read of size 66 at addr ffff888066641a80 by task syz-executor2/531

ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
CPU: 1 PID: 531 Comm: syz-executor2 Not tainted 5.0.0-rc2+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 check_memory_region_inline mm/kasan/generic.c:185 [inline]
 check_memory_region+0x123/0x190 mm/kasan/generic.c:191
 memcpy+0x24/0x50 mm/kasan/common.c:130
 memcpy include/linux/string.h:352 [inline]
 kmemdup+0x42/0x60 mm/util.c:113
 kmemdup include/linux/string.h:425 [inline]
 ax25_rt_autobind+0x25d/0x750 net/ax25/ax25_route.c:424
 ax25_connect.cold+0x30/0xa4 net/ax25/af_ax25.c:1224
 __sys_connect+0x357/0x490 net/socket.c:1664
 __do_sys_connect net/socket.c:1675 [inline]
 __se_sys_connect net/socket.c:1672 [inline]
 __x64_sys_connect+0x73/0xb0 net/socket.c:1672
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458099
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f870ee22c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458099
RDX: 0000000000000048 RSI: 0000000020000080 RDI: 0000000000000005
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f870ee236d4
R13: 00000000004be48e R14: 00000000004ce9a8 R15: 00000000ffffffff

Allocated by task 526:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_kmalloc mm/kasan/common.c:496 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:469
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:504
ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
 kmem_cache_alloc_trace+0x151/0x760 mm/slab.c:3609
 kmalloc include/linux/slab.h:545 [inline]
 ax25_rt_add net/ax25/ax25_route.c:95 [inline]
 ax25_rt_ioctl+0x3b9/0x1270 net/ax25/ax25_route.c:233
 ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
 sock_do_ioctl+0xe2/0x400 net/socket.c:950
 sock_ioctl+0x32f/0x6c0 net/socket.c:1074
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
Freed by task 550:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:458
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:466
 __cache_free mm/slab.c:3487 [inline]
 kfree+0xcf/0x230 mm/slab.c:3806
 ax25_rt_add net/ax25/ax25_route.c:92 [inline]
 ax25_rt_ioctl+0x304/0x1270 net/ax25/ax25_route.c:233
 ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
 sock_do_ioctl+0xe2/0x400 net/socket.c:950
 sock_ioctl+0x32f/0x6c0 net/socket.c:1074
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888066641a80
 which belongs to the cache kmalloc-96 of size 96
The buggy address is located 0 bytes inside of
 96-byte region [ffff888066641a80, ffff888066641ae0)
The buggy address belongs to the page:
page:ffffea0001999040 count:1 mapcount:0 mapping:ffff88812c3f04c0 index:0x0
flags: 0x1fffc0000000200(slab)
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
raw: 01fffc0000000200 ffffea0001817948 ffffea0002341dc8 ffff88812c3f04c0
raw: 0000000000000000 ffff888066641000 0000000100000020 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888066641980: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641a00: 00 00 00 00 00 00 00 00 02 fc fc fc fc fc fc fc
>ffff888066641a80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                   ^
 ffff888066641b00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641b80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nemunaire pushed a commit to nemunaire/CI20_linux that referenced this issue Jun 16, 2019
commit fa30dde upstream.

We see the following NULL pointer dereference while running xfstests
generic/475:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10
RIP: 0010:ext4_do_update_inode+0x4ec/0x760
...
Call Trace:
? jbd2_journal_get_write_access+0x42/0x50
? __ext4_journal_get_write_access+0x2c/0x70
? ext4_truncate+0x186/0x3f0
ext4_mark_iloc_dirty+0x61/0x80
ext4_mark_inode_dirty+0x62/0x1b0
ext4_truncate+0x186/0x3f0
? unmap_mapping_pages+0x56/0x100
ext4_setattr+0x817/0x8b0
notify_change+0x1df/0x430
do_truncate+0x5e/0x90
? generic_permission+0x12b/0x1a0

This is triggered because the NULL pointer handle->h_transaction was
dereferenced in function ext4_update_inode_fsync_trans().
I found that h_transaction was set to NULL in jbd2__journal_restart(),
but the handle then failed to attach to a new transaction because the
journal was aborted.

Fix this by checking the handle before updating the inode.
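A minimal sketch of the check, with hypothetical names modeled loosely on the functions mentioned above (the actual fix may test the handle differently): a handle whose h_transaction was cleared by a failed restart is treated as aborted, and the inode update is skipped rather than dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for jbd2/ext4 structures. */
struct transaction { unsigned tid; };
struct handle {
    struct transaction *h_transaction;  /* NULL after a failed restart */
    int h_aborted;
};
struct inode_info { unsigned sync_tid; };

/* Treat a handle with no transaction as aborted. */
static int handle_is_aborted(const struct handle *h)
{
    return h->h_aborted || h->h_transaction == NULL;
}

/* Record the transaction id only if the handle is still usable; otherwise
 * leave the inode untouched instead of dereferencing a NULL transaction. */
static void update_inode_fsync_trans(const struct handle *h,
                                     struct inode_info *ei)
{
    assert(h != NULL && ei != NULL);
    if (handle_is_aborted(h))
        return;
    ei->sync_tid = h->h_transaction->tid;
}
```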

Fixes: b436b9b ("ext4: Wait for proper transaction commit on fsync")
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nemunaire pushed a commit to nemunaire/CI20_linux that referenced this issue Jun 16, 2019
[ Upstream commit d982b33 ]

  =================================================================
  ==20875==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 1160 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc84138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x55bd50005599 in zalloc util/util.h:23
      #2 0x55bd500068f5 in perf_evsel__newtp_idx util/evsel.c:327
      #3 0x55bd4ff810fc in perf_evsel__newtp /home/work/linux/tools/perf/util/evsel.h:216
      #4 0x55bd4ff81608 in test__perf_evsel__tp_sched_test tests/evsel-tp-sched.c:69
      #5 0x55bd4ff528e6 in run_test tests/builtin-test.c:358
      #6 0x55bd4ff52baf in test_and_print tests/builtin-test.c:388
      #7 0x55bd4ff543fe in __cmd_test tests/builtin-test.c:583
      #8 0x55bd4ff5572f in cmd_test tests/builtin-test.c:722
      #9 0x55bd4ffc4087 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #10 0x55bd4ffc45c6 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #11 0x55bd4ffc49ca in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #12 0x55bd4ffc5138 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #13 0x7f1b6e34809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 19 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc83f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30)
      #1 0x7f1b6e3ac30f in vasprintf (/lib/x86_64-linux-gnu/libc.so.6+0x8830f)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 6a6cd11 ("perf test: Add test for the sched tracepoint format fields")
Link: http://lkml.kernel.org/r/20190316080556.3075-17-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
nemunaire pushed a commit to nemunaire/CI20_linux that referenced this issue Jun 16, 2019
Before this patch, multiple active endpoints could not be used at the
same time; they would actually cancel each other out.

The issue was discovered on Android when combining adb, mtp and ptp
configurations together. This patch introduces proper behaviour for
these cases.

Also, during the boot-up the following warning is no longer shown:

[    2.879328] ------------[ cut here ]------------
[    2.883983] WARNING: CPU: 0 PID: 1 at drivers/usb/dwc2/gadget.c:212 s3c_hsotg_init_fifo+0x168/0x1d0()
[    2.893204] insufficient fifo memory
[    2.896602] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W      3.18.3+ #10
[    2.904004] Stack : 00000000 800919a0 00000000 00000004 00000006 800913f4 00000000 00000000
          00000000 00000000 80f75a12 00000042 80f75a12 00000042 00000006 00000000
          80e42767 80d7c2e 00000001 00000000 80f73574 8bc90418 80ea0000 01000d00
          80f06704 80b24c00 00000000 80035388 00000006 00000000 80d834a4 8bc99b04
          8bc99b04 80e40000 00000000 00000000 00000000 00000000 00000000 00000000
          ...
[    2.939709] Call Trace:
[    2.942174] [<8001bab0>] show_stack+0xd4/0xf0
[    2.946528] [<80b26c40>] dump_stack+0x70/0xbc
[    2.950880] [<800356bc>] warn_slowpath_common+0x90/0xe8
[    2.956116] [<80035808>] warn_slowpath_fmt+0x3c/0x48
[    2.961075] [<8069b824>] s3c_hsotg_init_fifo+0x168/0x1d0
[    2.966398] [<8069d8fc>] s3c_hsotg_init+0x50/0x9c
[    2.971095] [<806a0388>] dwc2_gadget_init+0x430/0x8c0
[    2.976158] [<806a0df0>] dwc2_driver_probe+0x218/0x2a8
[    2.981291] [<805b935c>] platform_drv_probe+0x64/0x120
[    2.986440] [<805b783c>] really_probe+0xa0/0x278
[    2.991050] [<805b7c78>] driver_probe_device+0x48/0x78
[    2.996197] [<805b7d74>] __driver_attach+0xcc/0xd4
[    3.000980] [<805b5b7c>] bus_for_each_dev+0x7c/0xc4
[    3.005874] [<805b64f8>] bus_add_driver+0x180/0x240
[    3.010743] [<805b8428>] driver_register+0xac/0x154
[    3.015633] [<80ea9e04>] do_one_initcall+0x150/0x1f4
[    3.020589] [<80eaa080>] kernel_init_freeable+0x1d8/0x298
[    3.025998] [<80b23c5c>] kernel_init+0x28/0x158
[    3.030522] [<800153ec>] ret_from_kernel_thread+0x14/0x1c
[    3.035926]
[    3.037412] ---[ end trace cb88537fdc8fa201 ]---

And during configuration transitions (e.g. adb -> mtp,adb)
the following warning is no longer shown:

[ 311.726159] -----------[ cut here ]-----------
[ 311.730817] WARNING: CPU: 0 PID: 0 at drivers/usb/dwc2/gadget.c:1475 s3c_hsotg_rx_data+0x130/0x13c()
[ 311.739931] Modules linked in:
[ 311.742993] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.18.3+ #45
[ 311.750199] Stack : 00000000 80080370 00000000 00000004 00000006 00000000 00000000 00000000
00000000 00000000 80f05b02 00000042 80d61010 80e18e20 80d60000 8b408010
80e18927 80d0df6c 00000000 00000000 80f03614 80e18e20 80d60000 8b408010
00250182 80a54f54 80e20cc4 80e20cc8 00000000 00000000 80d14ab8 80dfbacc
80dfbacc 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...
[ 311.785841] Call Trace:
[ 311.788292] [<8001ac28>] show_stack+0xc4/0xe0
[ 311.792650] [<80a56e58>] dump_stack+0x70/0xbc
[ 311.797008] [<80033c14>] warn_slowpath_common+0x88/0xb8
[ 311.802224] [<80033cc8>] warn_slowpath_null+0x18/0x24
[ 311.807266] [<80606a3c>] s3c_hsotg_rx_data+0x130/0x13c
[ 311.812397] [<8060afa4>] s3c_hsotg_irq+0x3b4/0x5e8
[ 311.817183] [<80082ab8>] handle_irq_event_percpu+0x90/0x2d0
[ 311.822745] [<80082d4c>] handle_irq_event+0x54/0x98
[ 311.827617] [<80086390>] handle_level_irq+0xe0/0x1c0
[ 311.832572] [<800820bc>] generic_handle_irq+0x3c/0x54
[ 311.837622] [<804bb680>] jz4740_cascade+0x78/0xac
[ 311.842317] [<80082ab8>] handle_irq_event_percpu+0x90/0x2d0
[ 311.847881] [<80086d18>] handle_percpu_irq+0x8c/0xbc
[ 311.852835] [<800820bc>] generic_handle_irq+0x3c/0x54
[ 311.857878] [<80016c8c>] do_IRQ+0x18/0x2c
[ 311.861879] [<80014c40>] ret_from_irq+0x0/0x4
[ 311.866227] [<80016b20>] mips_cpuidle_wait_enter+0x14/0x34
[ 311.871713] [<806d37b0>] cpuidle_enter_state+0x88/0x2c0
[ 311.876934] [<80074308>] cpu_startup_entry+0x36c/0x484
[ 311.882074] [<80e7dc04>] start_kernel+0x4b8/0x4e0
[ 311.886767]
[ 311.888253] --[ end trace dd7a60dcc5530db3 ]--

Change-Id: Ic8ac37a28913d4314371de0cd446f8a7cc45864d
Signed-off-by: Dragan Cecavac <dragan.cecavac@imgtec.com>
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Jan 17, 2020
commit 63530ab upstream.

syzbot found that ax25 routes were not properly protected
against concurrent use [1].

In this particular report the bug happened while
copying ax25->digipeat.

Fix this problem by making sure we call ax25_get_route()
while ax25_route_lock is held, so that no modification
could happen while using the route.

The two current ax25_get_route() callers do not sleep,
so this change should be fine.

Once we do that, ax25_get_route() no longer needs to
grab a reference on the found route.
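The locking rule described above can be sketched as a userspace analogue. This is a minimal model, assuming a pthread rwlock in place of ax25_route_lock and a hypothetical fixed-size route table; route_table, route_update() and route_lookup_copy() are illustrative names, not kernel APIs. The lookup and the copy of the digipeater list happen inside one read-side critical section, so a concurrent writer cannot modify the entry in between, and the reader needs no reference count.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_ROUTES 8
#define MAX_DIGIS  8

/* Hypothetical, simplified route entry (not the kernel's ax25_route). */
struct route {
    bool in_use;
    char dest[8];
    int  ndigi;
    char digis[MAX_DIGIS][8];
};

static struct route route_table[MAX_ROUTES];
static pthread_rwlock_t route_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Writer: modifies a route only while holding the write lock. */
void route_update(int slot, const char *dest, int ndigi)
{
    assert(slot >= 0 && slot < MAX_ROUTES);
    assert(ndigi >= 0 && ndigi <= MAX_DIGIS);
    pthread_rwlock_wrlock(&route_lock);
    route_table[slot].in_use = true;
    snprintf(route_table[slot].dest, sizeof(route_table[slot].dest), "%s", dest);
    route_table[slot].ndigi = ndigi;
    pthread_rwlock_unlock(&route_lock);
}

/*
 * Reader: look up the route AND copy what we need while still holding
 * the read lock. Returns the number of digipeaters, or -1 if not found.
 */
int route_lookup_copy(const char *dest, char out[MAX_DIGIS][8])
{
    int n = -1;

    pthread_rwlock_rdlock(&route_lock);
    for (int i = 0; i < MAX_ROUTES; i++) {
        const struct route *r = &route_table[i];

        if (r->in_use && strcmp(r->dest, dest) == 0) {
            n = r->ndigi;
            memcpy(out, r->digis, sizeof(r->digis));
            break;
        }
    }
    pthread_rwlock_unlock(&route_lock);
    return n;
}
```

A caller declares `char digis[MAX_DIGIS][8]`, calls route_lookup_copy() once, and works only on its private copy after the lock is dropped.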

[1]
ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
BUG: KASAN: use-after-free in memcpy include/linux/string.h:352 [inline]
BUG: KASAN: use-after-free in kmemdup+0x42/0x60 mm/util.c:113
Read of size 66 at addr ffff888066641a80 by task syz-executor2/531

ax25_connect(): syz-executor0 uses autobind, please contact jreuter@yaina.de
CPU: 1 PID: 531 Comm: syz-executor2 Not tainted 5.0.0-rc2+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 check_memory_region_inline mm/kasan/generic.c:185 [inline]
 check_memory_region+0x123/0x190 mm/kasan/generic.c:191
 memcpy+0x24/0x50 mm/kasan/common.c:130
 memcpy include/linux/string.h:352 [inline]
 kmemdup+0x42/0x60 mm/util.c:113
 kmemdup include/linux/string.h:425 [inline]
 ax25_rt_autobind+0x25d/0x750 net/ax25/ax25_route.c:424
 ax25_connect.cold+0x30/0xa4 net/ax25/af_ax25.c:1224
 __sys_connect+0x357/0x490 net/socket.c:1664
 __do_sys_connect net/socket.c:1675 [inline]
 __se_sys_connect net/socket.c:1672 [inline]
 __x64_sys_connect+0x73/0xb0 net/socket.c:1672
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458099
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f870ee22c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458099
RDX: 0000000000000048 RSI: 0000000020000080 RDI: 0000000000000005
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f870ee236d4
R13: 00000000004be48e R14: 00000000004ce9a8 R15: 00000000ffffffff

Allocated by task 526:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_kmalloc mm/kasan/common.c:496 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:469
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:504
ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
 kmem_cache_alloc_trace+0x151/0x760 mm/slab.c:3609
 kmalloc include/linux/slab.h:545 [inline]
 ax25_rt_add net/ax25/ax25_route.c:95 [inline]
 ax25_rt_ioctl+0x3b9/0x1270 net/ax25/ax25_route.c:233
 ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
 sock_do_ioctl+0xe2/0x400 net/socket.c:950
 sock_ioctl+0x32f/0x6c0 net/socket.c:1074
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

ax25_connect(): syz-executor5 uses autobind, please contact jreuter@yaina.de
Freed by task 550:
 save_stack+0x45/0xd0 mm/kasan/common.c:73
 set_track mm/kasan/common.c:85 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:458
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:466
 __cache_free mm/slab.c:3487 [inline]
 kfree+0xcf/0x230 mm/slab.c:3806
 ax25_rt_add net/ax25/ax25_route.c:92 [inline]
 ax25_rt_ioctl+0x304/0x1270 net/ax25/ax25_route.c:233
 ax25_ioctl+0x322/0x10b0 net/ax25/af_ax25.c:1763
 sock_do_ioctl+0xe2/0x400 net/socket.c:950
 sock_ioctl+0x32f/0x6c0 net/socket.c:1074
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x107b/0x17d0 fs/ioctl.c:696
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff888066641a80
 which belongs to the cache kmalloc-96 of size 96
The buggy address is located 0 bytes inside of
 96-byte region [ffff888066641a80, ffff888066641ae0)
The buggy address belongs to the page:
page:ffffea0001999040 count:1 mapcount:0 mapping:ffff88812c3f04c0 index:0x0
flags: 0x1fffc0000000200(slab)
ax25_connect(): syz-executor4 uses autobind, please contact jreuter@yaina.de
raw: 01fffc0000000200 ffffea0001817948 ffffea0002341dc8 ffff88812c3f04c0
raw: 0000000000000000 ffff888066641000 0000000100000020 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888066641980: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641a00: 00 00 00 00 00 00 00 00 02 fc fc fc fc fc fc fc
>ffff888066641a80: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                   ^
 ffff888066641b00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888066641b80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Jan 17, 2020
commit fa30dde upstream.

We see the following NULL pointer dereference while running xfstests
generic/475:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10
RIP: 0010:ext4_do_update_inode+0x4ec/0x760
...
Call Trace:
? jbd2_journal_get_write_access+0x42/0x50
? __ext4_journal_get_write_access+0x2c/0x70
? ext4_truncate+0x186/0x3f0
ext4_mark_iloc_dirty+0x61/0x80
ext4_mark_inode_dirty+0x62/0x1b0
ext4_truncate+0x186/0x3f0
? unmap_mapping_pages+0x56/0x100
ext4_setattr+0x817/0x8b0
notify_change+0x1df/0x430
do_truncate+0x5e/0x90
? generic_permission+0x12b/0x1a0

This is triggered because the NULL pointer handle->h_transaction was
dereferenced in ext4_update_inode_fsync_trans().
I found that jbd2__journal_restart() set h_transaction to NULL but
failed to attach the handle to a new transaction because the journal
was aborted.

Fix this by checking the handle before updating the inode.
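A minimal userspace model of that guard, assuming simplified stand-ins for the jbd2 handle and transaction: struct handle, struct transaction, handle_usable() and update_inode_fsync_trans() are hypothetical names, not ext4's definitions. When a restart on an aborted journal has left h_transaction NULL, the helper skips recording the transaction ID instead of dereferencing NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the jbd2 transaction and handle. */
struct transaction { unsigned int t_tid; };
struct handle {
    struct transaction *h_transaction; /* NULL after a failed restart */
    bool aborted;
};
struct inode_info {
    unsigned int i_sync_tid;
    unsigned int i_datasync_tid;
};

/* Usable only if the handle is still attached to a live transaction. */
static bool handle_usable(const struct handle *h)
{
    return h != NULL && !h->aborted && h->h_transaction != NULL;
}

/* Record the transaction ID that fsync must wait for, if there is one. */
void update_inode_fsync_trans(const struct handle *h, struct inode_info *ei,
                              bool datasync)
{
    assert(ei != NULL);
    if (!handle_usable(h))
        return; /* journal aborted: nothing to record, no NULL dereference */
    ei->i_sync_tid = h->h_transaction->t_tid;
    if (datasync)
        ei->i_datasync_tid = h->h_transaction->t_tid;
}
```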

Fixes: b436b9b ("ext4: Wait for proper transaction commit on fsync")
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this issue Jan 17, 2020
[ Upstream commit d982b33 ]

  =================================================================
  ==20875==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 1160 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc84138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
      #1 0x55bd50005599 in zalloc util/util.h:23
      #2 0x55bd500068f5 in perf_evsel__newtp_idx util/evsel.c:327
      #3 0x55bd4ff810fc in perf_evsel__newtp /home/work/linux/tools/perf/util/evsel.h:216
      #4 0x55bd4ff81608 in test__perf_evsel__tp_sched_test tests/evsel-tp-sched.c:69
      #5 0x55bd4ff528e6 in run_test tests/builtin-test.c:358
      #6 0x55bd4ff52baf in test_and_print tests/builtin-test.c:388
      #7 0x55bd4ff543fe in __cmd_test tests/builtin-test.c:583
      #8 0x55bd4ff5572f in cmd_test tests/builtin-test.c:722
      #9 0x55bd4ffc4087 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
      #10 0x55bd4ffc45c6 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
      #11 0x55bd4ffc49ca in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
      #12 0x55bd4ffc5138 in main /home/changbin/work/linux/tools/perf/perf.c:520
      #13 0x7f1b6e34809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)

  Indirect leak of 19 byte(s) in 1 object(s) allocated from:
      #0 0x7f1b6fc83f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30)
      #1 0x7f1b6e3ac30f in vasprintf (/lib/x86_64-linux-gnu/libc.so.6+0x8830f)

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 6a6cd11 ("perf test: Add test for the sched tracepoint format fields")
Link: http://lkml.kernel.org/r/20190316080556.3075-17-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
pcercuei referenced this issue in OpenDingux/linux Mar 31, 2020
When experimenting with bpf_send_signal() helper in our production
environment (5.2 based), we experienced a deadlock in NMI mode:
   #5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24
   #6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012
   #7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd
   #8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55
   #9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602
  #10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a
  #11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227
  #12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140
  #13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf
  #14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09
  #15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47
  #16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d
  #17 [ffffc9002219fc90] schedule at ffffffff81a3e219
  #18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9
  #19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529
  #20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc
  #21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c
  #22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602
  #23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068

The above call stack is actually very similar to an issue
reported by commit eac9153 ("bpf/stackmap: Fix deadlock with
rq_lock in bpf_get_stack()") by Song Liu. The only difference is
that the bpf_send_signal() helper is involved instead of the
bpf_get_stack() helper.

The above deadlock was triggered by a perf_sw_event.
Similar to commit eac9153, the almost identical reproducer below
uses the sched/sched_switch tracepoint so that the issue can be
caught easily.
  /* stress_test.c */
  #define _DEFAULT_SOURCE         /* for usleep() */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/mman.h>
  #include <pthread.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <time.h>               /* nanosleep(), struct timespec */
  #include <unistd.h>             /* usleep(), close() */

  #define THREAD_COUNT 1000
  char *filename;

  /* Each worker churns mmap/munmap and malloc/free with short sleeps,
   * forcing frequent context switches (sched_switch events). */
  void *worker(void *p)
  {
        void *ptr;
        int fd;
        char *pptr;

        (void)p;
        fd = open(filename, O_RDONLY);
        if (fd < 0)
                return NULL;
        while (1) {
                struct timespec ts = {0, 1000 + rand() % 2000};

                ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                usleep(1);
                if (ptr == MAP_FAILED) {
                        printf("failed to mmap\n");
                        break;
                }
                munmap(ptr, 4096 * 64);
                usleep(1);
                pptr = malloc(1);
                if (!pptr)
                        break;
                usleep(1);
                pptr[0] = 1;
                usleep(1);
                free(pptr);
                usleep(1);
                nanosleep(&ts, NULL);
        }
        close(fd);
        return NULL;
  }

  int main(int argc, char *argv[])
  {
        int i;
        pthread_t threads[THREAD_COUNT];

        if (argc < 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }

        filename = argv[1];

        for (i = 0; i < THREAD_COUNT; i++) {
                if (pthread_create(threads + i, NULL, worker, NULL)) {
                        fprintf(stderr, "Error creating thread\n");
                        return 1;
                }
        }

        for (i = 0; i < THREAD_COUNT; i++)
                pthread_join(threads[i], NULL);
        return 0;
  }
and the following commands:
  1. run `stress_test /bin/ls` in one window
  2. hack bcc trace.py with the following change:
     --- a/tools/trace.py
     +++ b/tools/trace.py
     @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s);
              __data.tgid = __tgid;
              __data.pid = __pid;
              bpf_get_current_comm(&__data.comm, sizeof(__data.comm));
     +        bpf_send_signal(10);
      %s
      %s
              %s.perf_submit(%s, &__data, sizeof(__data));
  3. in a different window run
     ./trace.py -p $(pidof stress_test) t:sched:sched_switch

The deadlock can be reproduced in our production system.

Similar to Song's fix, the fix is to delay sending the signal if
irqs are disabled, to avoid deadlocks involving rq_lock.
With this change, the above stress test no longer causes a
deadlock in our production system.
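The deferral pattern can be sketched as a single-threaded userspace model. Everything here is an assumption for illustration: irqs_off stands in for irqs_disabled(), the pending_sig slot stands in for the kernel's deferred-work mechanism, and send_signal_now() is a hypothetical placeholder for the delivery path that takes sighand->siglock. The point is only that no signal lock is taken while "irqs are off"; delivery happens later from the deferred callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these are kernel APIs. */
static bool irqs_off;      /* stands in for irqs_disabled() */
static int  pending_sig;   /* 0 means nothing is deferred */
static int  delivered_sig; /* last signal actually delivered */

/* Placeholder for the real delivery path that takes sighand->siglock. */
static void send_signal_now(int sig)
{
    delivered_sig = sig;
}

/* Helper body: deliver immediately only when it is safe to take locks. */
int model_send_signal(int sig)
{
    assert(sig > 0);
    if (irqs_off) {
        if (pending_sig)
            return -1;     /* busy: only one deferred signal at a time */
        pending_sig = sig; /* queue it; no lock is taken here */
        return 0;
    }
    send_signal_now(sig);
    return 0;
}

/* Deferred-work callback: runs once interrupts are enabled again. */
void model_run_deferred(void)
{
    if (pending_sig) {
        send_signal_now(pending_sig);
        pending_sig = 0;
    }
}
```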

I also implemented a scaled-down version of the reproducer in the
selftest (a subsequent commit). With the latest bpf-next,
it reports the following potential deadlock.
  [   32.832450] -> #1 (&p->pi_lock){-.-.}:
  [   32.833100]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.833696]        task_rq_lock+0x2c/0xa0
  [   32.834182]        task_sched_runtime+0x59/0xd0
  [   32.834721]        thread_group_cputime+0x250/0x270
  [   32.835304]        thread_group_cputime_adjusted+0x2e/0x70
  [   32.835959]        do_task_stat+0x8a7/0xb80
  [   32.836461]        proc_single_show+0x51/0xb0
  ...
  [   32.839512] -> #0 (&(&sighand->siglock)->rlock){....}:
  [   32.840275]        __lock_acquire+0x1358/0x1a20
  [   32.840826]        lock_acquire+0xc7/0x1d0
  [   32.841309]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.841916]        __lock_task_sighand+0x79/0x160
  [   32.842465]        do_send_sig_info+0x35/0x90
  [   32.842977]        bpf_send_signal+0xa/0x10
  [   32.843464]        bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000
  [   32.844301]        trace_call_bpf+0x115/0x270
  [   32.844809]        perf_trace_run_bpf_submit+0x4a/0xc0
  [   32.845411]        perf_trace_sched_switch+0x10f/0x180
  [   32.846014]        __schedule+0x45d/0x880
  [   32.846483]        schedule+0x5f/0xd0
  ...

  [   32.853148] Chain exists of:
  [   32.853148]   &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock
  [   32.853148]
  [   32.854451]  Possible unsafe locking scenario:
  [   32.854451]
  [   32.855173]        CPU0                    CPU1
  [   32.855745]        ----                    ----
  [   32.856278]   lock(&rq->lock);
  [   32.856671]                                lock(&p->pi_lock);
  [   32.857332]                                lock(&rq->lock);
  [   32.857999]   lock(&(&sighand->siglock)->rlock);

  The deadlock happens on CPU0 when it tries to acquire &sighand->siglock
  while that lock is held by CPU1, and CPU1 tries to grab &rq->lock
  but cannot get it.

  This is not exactly the call stack seen in our production environment,
  but the symptom is similar: both locks are acquired with
  spin_lock_irqsave(), and both scenarios involve rq_lock. Delaying the
  signal while irqs are disabled also fixes this issue.
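The "Chain exists of" report can be checked by hand with a toy lock-order graph. This is not how lockdep is implemented; it is a minimal sketch, assuming three locks and one directed edge per "held A while acquiring B" observation. The existing chain gives siglock -> pi_lock -> rq->lock, and the new bpf path (rq->lock held in __schedule() while acquiring siglock) closes the cycle, which a depth-first search detects as a back edge.

```c
#include <assert.h>
#include <stdbool.h>

enum lock_id { SIGLOCK, PI_LOCK, RQ_LOCK, NLOCKS };

/* order[a][b] == true: some path held lock a while acquiring lock b. */
static bool order[NLOCKS][NLOCKS];

void record_order(enum lock_id held, enum lock_id acquired)
{
    assert(held < NLOCKS && acquired < NLOCKS);
    order[held][acquired] = true;
}

/* Depth-first search; state: 0 = unvisited, 1 = on stack, 2 = done. */
static bool dfs(int n, int state[NLOCKS])
{
    state[n] = 1;
    for (int m = 0; m < NLOCKS; m++) {
        if (!order[n][m])
            continue;
        if (state[m] == 1)
            return true; /* back edge: lock-order cycle, possible deadlock */
        if (state[m] == 0 && dfs(m, state))
            return true;
    }
    state[n] = 2;
    return false;
}

bool has_lock_cycle(void)
{
    int state[NLOCKS] = {0};

    for (int n = 0; n < NLOCKS; n++)
        if (state[n] == 0 && dfs(n, state))
            return true;
    return false;
}
```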

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com
pcercuei referenced this issue in OpenDingux/linux Sep 17, 2020
…s metrics" test

Linux 5.9 introduced perf test case "Parse and process metrics" and
on s390 this test case always dumps core:

  [root@t35lp67 perf]# ./perf test -vvvv -F 67
  67: Parse and process metrics                             :
  --- start ---
  metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC
  parsing metric: inst_retired.any / cpu_clk_unhalted.thread
  Segmentation fault (core dumped)
  [root@t35lp67 perf]#

I debugged this core dump and gdb shows this call chain:

  (gdb) where
   #0  0x000003ffabc3192a in __strnlen_c_1 () from /lib64/libc.so.6
   #1  0x000003ffabc293de in strcasestr () from /lib64/libc.so.6
   #2  0x0000000001102ba2 in match_metric(list=0x1e6ea20 "inst_retired.any",
            n=<optimized out>)
       at util/metricgroup.c:368
   #3  find_metric (map=<optimized out>, map=<optimized out>,
           metric=0x1e6ea20 "inst_retired.any")
      at util/metricgroup.c:765
   #4  __resolve_metric (ids=0x0, map=<optimized out>, metric_list=0x0,
           metric_no_group=<optimized out>, m=<optimized out>)
      at util/metricgroup.c:844
   #5  resolve_metric (ids=0x0, map=0x0, metric_list=0x0,
          metric_no_group=<optimized out>)
      at util/metricgroup.c:881
   #6  metricgroup__add_metric (metric=<optimized out>,
        metric_no_group=metric_no_group@entry=false, events=<optimized out>,
        events@entry=0x3ffd84fb878, metric_list=0x0,
        metric_list@entry=0x3ffd84fb868, map=0x0)
      at util/metricgroup.c:943
   #7  0x00000000011034ae in metricgroup__add_metric_list (map=0x13f9828 <map>,
        metric_list=0x3ffd84fb868, events=0x3ffd84fb878,
        metric_no_group=<optimized out>, list=<optimized out>)
      at util/metricgroup.c:988
   #8  parse_groups (perf_evlist=perf_evlist@entry=0x1e70260,
          str=str@entry=0x12f34b2 "IPC", metric_no_group=<optimized out>,
          metric_no_merge=<optimized out>,
          fake_pmu=fake_pmu@entry=0x1462f18 <perf_pmu.fake>,
          metric_events=0x3ffd84fba58, map=0x1)
      at util/metricgroup.c:1040
   #9  0x0000000001103eb2 in metricgroup__parse_groups_test(
  	evlist=evlist@entry=0x1e70260, map=map@entry=0x13f9828 <map>,
  	str=str@entry=0x12f34b2 "IPC",
  	metric_no_group=metric_no_group@entry=false,
  	metric_no_merge=metric_no_merge@entry=false,
  	metric_events=0x3ffd84fba58)
      at util/metricgroup.c:1082
   #10 0x00000000010c84d8 in __compute_metric (ratio2=0x0, name2=0x0,
          ratio1=<synthetic pointer>, name1=0x12f34b2 "IPC",
  	vals=0x3ffd84fbad8, name=0x12f34b2 "IPC")
      at tests/parse-metric.c:159
   #11 compute_metric (ratio=<synthetic pointer>, vals=0x3ffd84fbad8,
  	name=0x12f34b2 "IPC")
      at tests/parse-metric.c:189
   #12 test_ipc () at tests/parse-metric.c:208
.....
..... omitted many more lines

This test case was added with
commit 218ca91 ("perf tests: Add parse metric test for frontend metric").

When I compile with make DEBUG=y it works fine and I do not get a core dump.

It turned out that the function call chain listed above works on a struct
pmu_event array which requires a trailing all-zero element, and that element
was missing. The macro map_for_each_event() loops over that array and tests
whether the members metric_expr/metric_name/metric_group are non-NULL, so
without the terminating element it reads past the end of the array. Adding
this element fixes the issue.

Output after:

  [root@t35lp46 perf]# ./perf test 67
  67: Parse and process metrics                             : Ok
  [root@t35lp46 perf]#

Committer notes:

As Ian remarks, this is not s390 specific:

<quote Ian>
  This also shows up with address sanitizer on all architectures
  (perhaps change the patch title) and perhaps add a "Fixes: <commit>"
  tag.

  =================================================================
  ==4718==ERROR: AddressSanitizer: global-buffer-overflow on address
  0x55c93b4d59e8 at pc 0x55c93a1541e2 bp 0x7ffd24327c60 sp
  0x7ffd24327c58
  READ of size 8 at 0x55c93b4d59e8 thread T0
      #0 0x55c93a1541e1 in find_metric tools/perf/util/metricgroup.c:764:2
      #1 0x55c93a153e6c in __resolve_metric tools/perf/util/metricgroup.c:844:9
      #2 0x55c93a152f18 in resolve_metric tools/perf/util/metricgroup.c:881:9
      #3 0x55c93a1528db in metricgroup__add_metric
  tools/perf/util/metricgroup.c:943:9
      #4 0x55c93a151996 in metricgroup__add_metric_list
  tools/perf/util/metricgroup.c:988:9
      #5 0x55c93a1511b9 in parse_groups tools/perf/util/metricgroup.c:1040:8
      #6 0x55c93a1513e1 in metricgroup__parse_groups_test
  tools/perf/util/metricgroup.c:1082:9
      #7 0x55c93a0108ae in __compute_metric tools/perf/tests/parse-metric.c:159:8
      #8 0x55c93a010744 in compute_metric tools/perf/tests/parse-metric.c:189:9
      #9 0x55c93a00f5ee in test_ipc tools/perf/tests/parse-metric.c:208:2
      #10 0x55c93a00f1e8 in test__parse_metric
  tools/perf/tests/parse-metric.c:345:2
      #11 0x55c939fd7202 in run_test tools/perf/tests/builtin-test.c:410:9
      #12 0x55c939fd6736 in test_and_print tools/perf/tests/builtin-test.c:440:9
      #13 0x55c939fd58c3 in __cmd_test tools/perf/tests/builtin-test.c:661:4
      #14 0x55c939fd4e02 in cmd_test tools/perf/tests/builtin-test.c:807:9
      #15 0x55c939e4763d in run_builtin tools/perf/perf.c:313:11
      #16 0x55c939e46475 in handle_internal_command tools/perf/perf.c:365:8
      #17 0x55c939e4737e in run_argv tools/perf/perf.c:409:2
      #18 0x55c939e45f7e in main tools/perf/perf.c:539:3

  0x55c93b4d59e8 is located 0 bytes to the right of global variable
  'pme_test' defined in 'tools/perf/tests/parse-metric.c:17:25'
  (0x55c93b4d54a0) of size 1352
  SUMMARY: AddressSanitizer: global-buffer-overflow
  tools/perf/util/metricgroup.c:764:2 in find_metric
  Shadow bytes around the buggy address:
    0x0ab9a7692ae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692af0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  =>0x0ab9a7692b30: 00 00 00 00 00 00 00 00 00 00 00 00 00[f9]f9 f9
    0x0ab9a7692b40: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
    0x0ab9a7692b50: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
    0x0ab9a7692b60: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
    0x0ab9a7692b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0ab9a7692b80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone:       fa
    Freed heap region:       fd
    Stack left redzone:      f1
    Stack mid redzone:       f2
    Stack right redzone:     f3
    Stack after return:      f5
    Stack use after scope:   f8
    Global redzone:          f9
    Global init order:       f6
    Poisoned by user:        f7
    Container overflow:      fc
    Array cookie:            ac
    Intra object redzone:    bb
    ASan internal:           fe
    Left alloca redzone:     ca
    Right alloca redzone:    cb
    Shadow gap:              cc
</quote>

I'm also adding the missing "Fixes" tag and setting just .name to NULL,
as doing it that way is more compact (the compiler will zero out
everything else) and the table iterators look for .name being NULL as
the sentinel marking the end of the table.
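The sentinel convention can be illustrated with a simplified table. This sketch assumes a cut-down, hypothetical struct pmu_event with only three string members and a hypothetical for_each_event() macro that stops at the first entry whose .name is NULL; it is not perf's actual map_for_each_event() definition, and lookup_metric() is an illustrative name. Dropping the final `{ .name = NULL }` entry would make the loop read past the end of the array, the same kind of global-buffer-overflow ASan reported above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Cut-down, hypothetical version of perf's struct pmu_event. */
struct pmu_event {
    const char *name;
    const char *metric_expr;
    const char *metric_name;
};

/* Hypothetical iterator: walks until the sentinel entry (.name == NULL). */
#define for_each_event(pe, table) \
    for ((pe) = (table); (pe)->name != NULL; (pe)++)

static const struct pmu_event test_events[] = {
    { .name = "inst_retired.any" },
    { .name = "cpu_clk_unhalted.thread" },
    { .name = "IPC",
      .metric_expr = "inst_retired.any / cpu_clk_unhalted.thread",
      .metric_name = "IPC" },
    { .name = NULL }, /* sentinel: the compiler zeroes the other members */
};

/* Return the entry whose metric_name matches, or NULL if none does. */
const struct pmu_event *lookup_metric(const char *metric)
{
    const struct pmu_event *pe;

    assert(metric != NULL);
    for_each_event(pe, test_events)
        if (pe->metric_name && strcmp(pe->metric_name, metric) == 0)
            return pe;
    return NULL;
}
```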

Fixes: 0a507af ("perf tests: Add parse metric test for ipc metric")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20200825071211.16959-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
pcercuei referenced this issue in OpenDingux/linux Sep 24, 2020
The aliases were never released causing the following leaks:

  Indirect leak of 1224 byte(s) in 9 object(s) allocated from:
    #0 0x7feefb830628 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x107628)
    #1 0x56332c8f1b62 in __perf_pmu__new_alias util/pmu.c:322
    #2 0x56332c8f401f in pmu_add_cpu_aliases_map util/pmu.c:778
    #3 0x56332c792ce9 in __test__pmu_event_aliases tests/pmu-events.c:295
    #4 0x56332c792ce9 in test_aliases tests/pmu-events.c:367
    #5 0x56332c76a09b in run_test tests/builtin-test.c:410
    #6 0x56332c76a09b in test_and_print tests/builtin-test.c:440
    #7 0x56332c76ce69 in __cmd_test tests/builtin-test.c:695
    #8 0x56332c76ce69 in cmd_test tests/builtin-test.c:807
    #9 0x56332c7d2214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #10 0x56332c6701a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #11 0x56332c6701a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #12 0x56332c6701a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #13 0x7feefb359cc9 in __libc_start_main ../csu/libc-start.c:308
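The leak is the familiar case of building a list and never tearing it down. A minimal sketch, assuming a hypothetical singly linked alias list in which each node owns its name string (perf uses its own list and alias-freeing helpers, which this does not reproduce): the teardown walks the list, saves the next pointer, then frees both the string and the node.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical alias node; owns its name string. */
struct alias {
    char *name;
    struct alias *next;
};

/* Push a copy of name onto *head. Returns 0 on success, -1 on ENOMEM. */
int alias_add(struct alias **head, const char *name)
{
    struct alias *a;

    assert(head != NULL && name != NULL);
    a = malloc(sizeof(*a));
    if (!a)
        return -1;
    a->name = strdup(name);
    if (!a->name) {
        free(a);
        return -1;
    }
    a->next = *head;
    *head = a;
    return 0;
}

/* Free every node and the string it owns; returns the number freed. */
size_t alias_free_all(struct alias *head)
{
    size_t n = 0;

    while (head) {
        struct alias *next = head->next; /* save before freeing */

        free(head->name);
        free(head);
        head = next;
        n++;
    }
    return n;
}
```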

Fixes: 956a783 ("perf test: Test pmu-events aliases")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
pcercuei referenced this issue in OpenDingux/linux Sep 24, 2020
evsel->unit borrows a pointer from the pmu event or alias instead of
owning a string.  But the tool event (duration_time) passes the result
of strdup(), which caused a leak.
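The ownership contract can be shown in isolation. This sketch assumes a cut-down, hypothetical struct evsel in which name is owned (freed on teardown) and unit is borrowed (never freed); it illustrates the contract rather than reproducing the upstream patch. A string literal has static storage, so it satisfies a borrowed field, while strdup("ns") allocates memory that nothing releases. The ASan report below (210 bytes in 70 objects, 3 bytes each) is consistent with a duplicated "ns" plus its terminating NUL.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down evsel used only to show the ownership rules. */
struct evsel {
    char *name;       /* owned: freed in evsel_exit() */
    const char *unit; /* borrowed: never freed by the evsel */
};

int evsel_init_tool(struct evsel *evsel)
{
    assert(evsel != NULL);
    evsel->name = strdup("duration_time");
    if (!evsel->name)
        return -1;
    evsel->unit = "ns"; /* literal with static storage; strdup("ns") would leak */
    return 0;
}

void evsel_exit(struct evsel *evsel)
{
    free(evsel->name);  /* release what the evsel owns ... */
    evsel->name = NULL;
    evsel->unit = NULL; /* ... and merely drop the borrowed pointer */
}
```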

It was found by ASAN during metric test:

  Direct leak of 210 byte(s) in 70 object(s) allocated from:
    #0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
    #1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414
    #2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414
    #3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439
    #4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096
    #5 0x559fbbcc95da in __parse_events util/parse-events.c:2141
    #6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406
    #7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393
    #8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415
    #9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498
    #10 0x559fbbc0109b in run_test tests/builtin-test.c:410
    #11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440
    #12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695
    #13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807
    #14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: f0fbb11 ("perf stat: Implement duration_time as a proper event")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
pcercuei referenced this issue in OpenDingux/linux Sep 24, 2020
test_generic_metric() fails to release the entries in pctx.  ASan
reported the following leak (and more):

  Direct leak of 128 byte(s) in 1 object(s) allocated from:
    #0 0x7f4c9396980e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
    #1 0x55f7e748cc14 in hashmap_grow (/home/namhyung/project/linux/tools/perf/perf+0x90cc14)
    #2 0x55f7e748d497 in hashmap__insert (/home/namhyung/project/linux/tools/perf/perf+0x90d497)
    #3 0x55f7e7341667 in hashmap__set /home/namhyung/project/linux/tools/perf/util/hashmap.h:111
    #4 0x55f7e7341667 in expr__add_ref util/expr.c:120
    #5 0x55f7e7292436 in prepare_metric util/stat-shadow.c:783
    #6 0x55f7e729556d in test_generic_metric util/stat-shadow.c:858
    #7 0x55f7e712390b in compute_single tests/parse-metric.c:128
    #8 0x55f7e712390b in __compute_metric tests/parse-metric.c:180
    #9 0x55f7e712446d in compute_metric tests/parse-metric.c:196
    #10 0x55f7e712446d in test_dcache_l2 tests/parse-metric.c:295
    #11 0x55f7e712446d in test__parse_metric tests/parse-metric.c:355
    #12 0x55f7e70be09b in run_test tests/builtin-test.c:410
    #13 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
    #14 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
    #15 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
    #16 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #17 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #18 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #19 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #20 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: 6d432c4 ("perf tools: Add test_generic_metric function")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
pcercuei referenced this issue in OpenDingux/linux Sep 24, 2020
metricgroup__add_metric() can find multiple matches for a metric group,
and adding any of them may fail.  It can also fail midway, e.g. in
resolve_metric(), even for a single metric.

In those cases, the intermediate list and ids are leaked, as in:

  Direct leak of 3 byte(s) in 1 object(s) allocated from:
    #0 0x7f4c938f40b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
    #1 0x55f7e71c1bef in __add_metric util/metricgroup.c:683
    #2 0x55f7e71c31d0 in add_metric util/metricgroup.c:906
    #3 0x55f7e71c3844 in metricgroup__add_metric util/metricgroup.c:940
    #4 0x55f7e71c488d in metricgroup__add_metric_list util/metricgroup.c:993
    #5 0x55f7e71c488d in parse_groups util/metricgroup.c:1045
    #6 0x55f7e71c60a4 in metricgroup__parse_groups_test util/metricgroup.c:1087
    #7 0x55f7e71235ae in __compute_metric tests/parse-metric.c:164
    #8 0x55f7e7124650 in compute_metric tests/parse-metric.c:196
    #9 0x55f7e7124650 in test_recursion_fail tests/parse-metric.c:318
    #10 0x55f7e7124650 in test__parse_metric tests/parse-metric.c:356
    #11 0x55f7e70be09b in run_test tests/builtin-test.c:410
    #12 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
    #13 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
    #14 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
    #15 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #16 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #17 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #18 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #19 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: 83de0b7 ("perf metric: Collect referenced metrics in struct metric_ref_node")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
pcercuei referenced this issue in OpenDingux/linux Sep 24, 2020
The following leaks were detected by ASAN:

  Indirect leak of 360 byte(s) in 9 object(s) allocated from:
    #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
    #1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333
    #2 0x560578f752fc in perf_pmu_parse util/pmu.y:59
    #3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73
    #4 0x560578e07045 in test__pmu tests/pmu.c:155
    #5 0x560578de109b in run_test tests/builtin-test.c:410
    #6 0x560578de109b in test_and_print tests/builtin-test.c:440
    #7 0x560578de401a in __cmd_test tests/builtin-test.c:661
    #8 0x560578de401a in cmd_test tests/builtin-test.c:807
    #9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: cff7f95 ("perf tests: Move pmu tests into separate object")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
pcercuei referenced this issue in OpenDingux/linux Nov 3, 2020
Very sporadically, test case btrfs/069 from fstests hung for me (this has
been happening for years; it is not a recent regression), with the
following traces in dmesg/syslog:

  [162301.160628] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg started
  [162301.181196] BTRFS info (device sdc): scrub: finished on devid 4 with status: 0
  [162301.287162] BTRFS info (device sdc): dev_replace from /dev/sdd (devid 2) to /dev/sdg finished
  [162513.513792] INFO: task btrfs-transacti:1356167 blocked for more than 120 seconds.
  [162513.514318]       Not tainted 5.9.0-rc6-btrfs-next-69 #1
  [162513.514522] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [162513.514747] task:btrfs-transacti state:D stack:    0 pid:1356167 ppid:     2 flags:0x00004000
  [162513.514751] Call Trace:
  [162513.514761]  __schedule+0x5ce/0xd00
  [162513.514765]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  [162513.514771]  schedule+0x46/0xf0
  [162513.514844]  wait_current_trans+0xde/0x140 [btrfs]
  [162513.514850]  ? finish_wait+0x90/0x90
  [162513.514864]  start_transaction+0x37c/0x5f0 [btrfs]
  [162513.514879]  transaction_kthread+0xa4/0x170 [btrfs]
  [162513.514891]  ? btrfs_cleanup_transaction+0x660/0x660 [btrfs]
  [162513.514894]  kthread+0x153/0x170
  [162513.514897]  ? kthread_stop+0x2c0/0x2c0
  [162513.514902]  ret_from_fork+0x22/0x30
  [162513.514916] INFO: task fsstress:1356184 blocked for more than 120 seconds.
  [162513.515192]       Not tainted 5.9.0-rc6-btrfs-next-69 #1
  [162513.515431] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [162513.515680] task:fsstress        state:D stack:    0 pid:1356184 ppid:1356177 flags:0x00004000
  [162513.515682] Call Trace:
  [162513.515688]  __schedule+0x5ce/0xd00
  [162513.515691]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  [162513.515697]  schedule+0x46/0xf0
  [162513.515712]  wait_current_trans+0xde/0x140 [btrfs]
  [162513.515716]  ? finish_wait+0x90/0x90
  [162513.515729]  start_transaction+0x37c/0x5f0 [btrfs]
  [162513.515743]  btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs]
  [162513.515753]  btrfs_sync_fs+0x61/0x1c0 [btrfs]
  [162513.515758]  ? __ia32_sys_fdatasync+0x20/0x20
  [162513.515761]  iterate_supers+0x87/0xf0
  [162513.515765]  ksys_sync+0x60/0xb0
  [162513.515768]  __do_sys_sync+0xa/0x10
  [162513.515771]  do_syscall_64+0x33/0x80
  [162513.515774]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [162513.515781] RIP: 0033:0x7f5238f50bd7
  [162513.515782] Code: Bad RIP value.
  [162513.515784] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
  [162513.515786] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7
  [162513.515788] RDX: 00000000ffffffff RSI: 000000000daf0e74 RDI: 000000000000003a
  [162513.515789] RBP: 0000000000000032 R08: 000000000000000a R09: 00007f5239019be0
  [162513.515791] R10: fffffffffffff24f R11: 0000000000000206 R12: 000000000000003a
  [162513.515792] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340
  [162513.515804] INFO: task fsstress:1356185 blocked for more than 120 seconds.
  [162513.516064]       Not tainted 5.9.0-rc6-btrfs-next-69 #1
  [162513.516329] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [162513.516617] task:fsstress        state:D stack:    0 pid:1356185 ppid:1356177 flags:0x00000000
  [162513.516620] Call Trace:
  [162513.516625]  __schedule+0x5ce/0xd00
  [162513.516628]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  [162513.516634]  schedule+0x46/0xf0
  [162513.516647]  wait_current_trans+0xde/0x140 [btrfs]
  [162513.516650]  ? finish_wait+0x90/0x90
  [162513.516662]  start_transaction+0x4d7/0x5f0 [btrfs]
  [162513.516679]  btrfs_setxattr_trans+0x3c/0x100 [btrfs]
  [162513.516686]  __vfs_setxattr+0x66/0x80
  [162513.516691]  __vfs_setxattr_noperm+0x70/0x200
  [162513.516697]  vfs_setxattr+0x6b/0x120
  [162513.516703]  setxattr+0x125/0x240
  [162513.516709]  ? lock_acquire+0xb1/0x480
  [162513.516712]  ? mnt_want_write+0x20/0x50
  [162513.516721]  ? rcu_read_lock_any_held+0x8e/0xb0
  [162513.516723]  ? preempt_count_add+0x49/0xa0
  [162513.516725]  ? __sb_start_write+0x19b/0x290
  [162513.516727]  ? preempt_count_add+0x49/0xa0
  [162513.516732]  path_setxattr+0xba/0xd0
  [162513.516739]  __x64_sys_setxattr+0x27/0x30
  [162513.516741]  do_syscall_64+0x33/0x80
  [162513.516743]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [162513.516745] RIP: 0033:0x7f5238f56d5a
  [162513.516746] Code: Bad RIP value.
  [162513.516748] RSP: 002b:00007fff67b97868 EFLAGS: 00000202 ORIG_RAX: 00000000000000bc
  [162513.516750] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f5238f56d5a
  [162513.516751] RDX: 000055b1fbb0d5a0 RSI: 00007fff67b978a0 RDI: 000055b1fbb0d470
  [162513.516753] RBP: 000055b1fbb0d5a0 R08: 0000000000000001 R09: 00007fff67b97700
  [162513.516754] R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000004
  [162513.516756] R13: 0000000000000024 R14: 0000000000000001 R15: 00007fff67b978a0
  [162513.516767] INFO: task fsstress:1356196 blocked for more than 120 seconds.
  [162513.517064]       Not tainted 5.9.0-rc6-btrfs-next-69 #1
  [162513.517365] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [162513.517763] task:fsstress        state:D stack:    0 pid:1356196 ppid:1356177 flags:0x00004000
  [162513.517780] Call Trace:
  [162513.517786]  __schedule+0x5ce/0xd00
  [162513.517789]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  [162513.517796]  schedule+0x46/0xf0
  [162513.517810]  wait_current_trans+0xde/0x140 [btrfs]
  [162513.517814]  ? finish_wait+0x90/0x90
  [162513.517829]  start_transaction+0x37c/0x5f0 [btrfs]
  [162513.517845]  btrfs_attach_transaction_barrier+0x1f/0x50 [btrfs]
  [162513.517857]  btrfs_sync_fs+0x61/0x1c0 [btrfs]
  [162513.517862]  ? __ia32_sys_fdatasync+0x20/0x20
  [162513.517865]  iterate_supers+0x87/0xf0
  [162513.517869]  ksys_sync+0x60/0xb0
  [162513.517872]  __do_sys_sync+0xa/0x10
  [162513.517875]  do_syscall_64+0x33/0x80
  [162513.517878]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [162513.517881] RIP: 0033:0x7f5238f50bd7
  [162513.517883] Code: Bad RIP value.
  [162513.517885] RSP: 002b:00007fff67b978e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
  [162513.517887] RAX: ffffffffffffffda RBX: 000055b1fad2c560 RCX: 00007f5238f50bd7
  [162513.517889] RDX: 0000000000000000 RSI: 000000007660add2 RDI: 0000000000000053
  [162513.517891] RBP: 0000000000000032 R08: 0000000000000067 R09: 00007f5239019be0
  [162513.517893] R10: fffffffffffff24f R11: 0000000000000206 R12: 0000000000000053
  [162513.517895] R13: 00007fff67b97950 R14: 00007fff67b97906 R15: 000055b1fad1a340
  [162513.517908] INFO: task fsstress:1356197 blocked for more than 120 seconds.
  [162513.518298]       Not tainted 5.9.0-rc6-btrfs-next-69 #1
  [162513.518672] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [162513.519157] task:fsstress        state:D stack:    0 pid:1356197 ppid:1356177 flags:0x00000000
  [162513.519160] Call Trace:
  [162513.519165]  __schedule+0x5ce/0xd00
  [162513.519168]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  [162513.519174]  schedule+0x46/0xf0
  [162513.519190]  wait_current_trans+0xde/0x140 [btrfs]
  [162513.519193]  ? finish_wait+0x90/0x90
  [162513.519206]  start_transaction+0x4d7/0x5f0 [btrfs]
  [162513.519222]  btrfs_create+0x57/0x200 [btrfs]
  [162513.519230]  lookup_open+0x522/0x650
  [162513.519246]  path_openat+0x2b8/0xa50
  [162513.519270]  do_filp_open+0x91/0x100
  [162513.519275]  ? find_held_lock+0x32/0x90
  [162513.519280]  ? lock_acquired+0x33b/0x470
  [162513.519285]  ? do_raw_spin_unlock+0x4b/0xc0
  [162513.519287]  ? _raw_spin_unlock+0x29/0x40
  [162513.519295]  do_sys_openat2+0x20d/0x2d0
  [162513.519300]  do_sys_open+0x44/0x80
  [162513.519304]  do_syscall_64+0x33/0x80
  [162513.519307]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [162513.519309] RIP: 0033:0x7f5238f4a903
  [162513.519310] Code: Bad RIP value.
  [162513.519312] RSP: 002b:00007fff67b97758 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
  [162513.519314] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f5238f4a903
  [162513.519316] RDX: 0000000000000000 RSI: 00000000000001b6 RDI: 000055b1fbb0d470
  [162513.519317] RBP: 00007fff67b978c0 R08: 0000000000000001 R09: 0000000000000002
  [162513.519319] R10: 00007fff67b974f7 R11: 0000000000000246 R12: 0000000000000013
  [162513.519320] R13: 00000000000001b6 R14: 00007fff67b97906 R15: 000055b1fad1c620
  [162513.519332] INFO: task btrfs:1356211 blocked for more than 120 seconds.
  [162513.519727]       Not tainted 5.9.0-rc6-btrfs-next-69 #1
  [162513.520115] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [162513.520508] task:btrfs           state:D stack:    0 pid:1356211 ppid:1356178 flags:0x00004002
  [162513.520511] Call Trace:
  [162513.520516]  __schedule+0x5ce/0xd00
  [162513.520519]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  [162513.520525]  schedule+0x46/0xf0
  [162513.520544]  btrfs_scrub_pause+0x11f/0x180 [btrfs]
  [162513.520548]  ? finish_wait+0x90/0x90
  [162513.520562]  btrfs_commit_transaction+0x45a/0xc30 [btrfs]
  [162513.520574]  ? start_transaction+0xe0/0x5f0 [btrfs]
  [162513.520596]  btrfs_dev_replace_finishing+0x6d8/0x711 [btrfs]
  [162513.520619]  btrfs_dev_replace_by_ioctl.cold+0x1cc/0x1fd [btrfs]
  [162513.520639]  btrfs_ioctl+0x2a25/0x36f0 [btrfs]
  [162513.520643]  ? do_sigaction+0xf3/0x240
  [162513.520645]  ? find_held_lock+0x32/0x90
  [162513.520648]  ? do_sigaction+0xf3/0x240
  [162513.520651]  ? lock_acquired+0x33b/0x470
  [162513.520655]  ? _raw_spin_unlock_irq+0x24/0x50
  [162513.520657]  ? lockdep_hardirqs_on+0x7d/0x100
  [162513.520660]  ? _raw_spin_unlock_irq+0x35/0x50
  [162513.520662]  ? do_sigaction+0xf3/0x240
  [162513.520671]  ? __x64_sys_ioctl+0x83/0xb0
  [162513.520672]  __x64_sys_ioctl+0x83/0xb0
  [162513.520677]  do_syscall_64+0x33/0x80
  [162513.520679]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [162513.520681] RIP: 0033:0x7fc3cd307d87
  [162513.520682] Code: Bad RIP value.
  [162513.520684] RSP: 002b:00007ffe30a56bb8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
  [162513.520686] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc3cd307d87
  [162513.520687] RDX: 00007ffe30a57a30 RSI: 00000000ca289435 RDI: 0000000000000003
  [162513.520689] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  [162513.520690] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000003
  [162513.520692] R13: 0000557323a212e0 R14: 00007ffe30a5a520 R15: 0000000000000001
  [162513.520703] Showing all locks held in the system:
  [162513.520712] 1 lock held by khungtaskd/54:
  [162513.520713]  #0: ffffffffb40a91a0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x15/0x197
  [162513.520728] 1 lock held by in:imklog/596:
  [162513.520729]  #0: ffff8f3f0d781400 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4d/0x60
  [162513.520782] 1 lock held by btrfs-transacti/1356167:
  [162513.520784]  #0: ffff8f3d810cc848 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0x4a/0x170 [btrfs]
  [162513.520798] 1 lock held by btrfs/1356190:
  [162513.520800]  #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write_file+0x22/0x60
  [162513.520805] 1 lock held by fsstress/1356184:
  [162513.520806]  #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0
  [162513.520811] 3 locks held by fsstress/1356185:
  [162513.520812]  #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50
  [162513.520815]  #1: ffff8f3d80a650b8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: vfs_setxattr+0x50/0x120
  [162513.520820]  #2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs]
  [162513.520833] 1 lock held by fsstress/1356196:
  [162513.520834]  #0: ffff8f3d576440e8 (&type->s_umount_key#62){++++}-{3:3}, at: iterate_supers+0x6f/0xf0
  [162513.520838] 3 locks held by fsstress/1356197:
  [162513.520839]  #0: ffff8f3d57644470 (sb_writers#15){.+.+}-{0:0}, at: mnt_want_write+0x20/0x50
  [162513.520843]  #1: ffff8f3d506465e8 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: path_openat+0x2a7/0xa50
  [162513.520846]  #2: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs]
  [162513.520858] 2 locks held by btrfs/1356211:
  [162513.520859]  #0: ffff8f3d810cde30 (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+.}-{3:3}, at: btrfs_dev_replace_finishing+0x52/0x711 [btrfs]
  [162513.520877]  #1: ffff8f3d57644690 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x40e/0x5f0 [btrfs]

This was weird because the stack traces show that a transaction commit,
triggered by a device replace operation, is blocked trying to pause any
running scrubs, yet there are no stack traces of blocked tasks doing a
scrub.

After poking around with drgn, I noticed there was a scrub task that was
constantly running and blocking for short periods of time:

  >>> t = find_task(prog, 1356190)
  >>> prog.stack_trace(t)
  #0  __schedule+0x5ce/0xcfc
  #1  schedule+0x46/0xe4
  #2  schedule_timeout+0x1df/0x475
  #3  btrfs_reada_wait+0xda/0x132
  #4  scrub_stripe+0x2a8/0x112f
  #5  scrub_chunk+0xcd/0x134
  #6  scrub_enumerate_chunks+0x29e/0x5ee
  #7  btrfs_scrub_dev+0x2d5/0x91b
  #8  btrfs_ioctl+0x7f5/0x36e7
  #9  __x64_sys_ioctl+0x83/0xb0
  #10 do_syscall_64+0x33/0x77
  #11 entry_SYSCALL_64+0x7c/0x156

Which corresponds to:

int btrfs_reada_wait(void *handle)
{
    struct reada_control *rc = handle;
    struct btrfs_fs_info *fs_info = rc->fs_info;

    while (atomic_read(&rc->elems)) {
        if (!atomic_read(&fs_info->reada_works_cnt))
            reada_start_machine(fs_info);
        wait_event_timeout(rc->wait, atomic_read(&rc->elems) == 0,
                          (HZ + 9) / 10);
    }
(...)

So the counter "rc->elems" was set to 1 and never decreased to 0, causing
the scrub task to loop forever in that function. Then I used the following
script for drgn to check the readahead requests:

  $ cat dump_reada.py
  import sys
  import drgn
  from drgn import NULL, Object, cast, container_of, execscript, \
      reinterpret, sizeof
  from drgn.helpers.linux import *

  mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1"

  mnt = None
  for mnt in for_each_mount(prog, dst = mnt_path):
      pass

  if mnt is None:
      sys.stderr.write(f'Error: mount point {mnt_path} not found\n')
      sys.exit(1)

  fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info)

  def dump_re(re):
      nzones = re.nzones.value_()
      print(f're at {hex(re.value_())}')
      print(f'\t logical {re.logical.value_()}')
      print(f'\t refcnt {re.refcnt.value_()}')
      print(f'\t nzones {nzones}')
      for i in range(nzones):
          dev = re.zones[i].device
          name = dev.name.str.string_()
          print(f'\t\t dev id {dev.devid.value_()} name {name}')
      print()

  for _, e in radix_tree_for_each(fs_info.reada_tree):
      re = cast('struct reada_extent *', e)
      dump_re(re)

  $ drgn dump_reada.py
  re at 0xffff8f3da9d25ad8
          logical 38928384
          refcnt 1
          nzones 1
                 dev id 0 name b'/dev/sdd'
  $

So there was one readahead extent with a single zone, corresponding to the
source device of the last device replace operation logged in dmesg/syslog.
Also, the ID of that zone's device was 0, which is a special value assigned
to the source device of a device replace operation when the operation
finishes (the constant BTRFS_DEV_REPLACE_DEVID, set at
btrfs_dev_replace_finishing()), confirming again that device /dev/sdd was
the source of a device replace operation.

Normally there should be as many zones in the readahead extent as there are
devices, and I wasn't expecting the extent to be in a block group with a
'single' profile, so I went and confirmed with the following drgn script
that there weren't any single profile block groups:

  $ cat dump_block_groups.py
  import sys
  import drgn
  from drgn import NULL, Object, cast, container_of, execscript, \
      reinterpret, sizeof
  from drgn.helpers.linux import *

  mnt_path = b"/home/fdmanana/btrfs-tests/scratch_1"

  mnt = None
  for mnt in for_each_mount(prog, dst = mnt_path):
      pass

  if mnt is None:
      sys.stderr.write(f'Error: mount point {mnt_path} not found\n')
      sys.exit(1)

  fs_info = cast('struct btrfs_fs_info *', mnt.mnt.mnt_sb.s_fs_info)

  BTRFS_BLOCK_GROUP_DATA = (1 << 0)
  BTRFS_BLOCK_GROUP_SYSTEM = (1 << 1)
  BTRFS_BLOCK_GROUP_METADATA = (1 << 2)
  BTRFS_BLOCK_GROUP_RAID0 = (1 << 3)
  BTRFS_BLOCK_GROUP_RAID1 = (1 << 4)
  BTRFS_BLOCK_GROUP_DUP = (1 << 5)
  BTRFS_BLOCK_GROUP_RAID10 = (1 << 6)
  BTRFS_BLOCK_GROUP_RAID5 = (1 << 7)
  BTRFS_BLOCK_GROUP_RAID6 = (1 << 8)
  BTRFS_BLOCK_GROUP_RAID1C3 = (1 << 9)
  BTRFS_BLOCK_GROUP_RAID1C4 = (1 << 10)

  def bg_flags_string(bg):
      flags = bg.flags.value_()
      ret = ''
      if flags & BTRFS_BLOCK_GROUP_DATA:
          ret = 'data'
      if flags & BTRFS_BLOCK_GROUP_METADATA:
          if len(ret) > 0:
              ret += '|'
          ret += 'meta'
      if flags & BTRFS_BLOCK_GROUP_SYSTEM:
          if len(ret) > 0:
              ret += '|'
          ret += 'system'
      if flags & BTRFS_BLOCK_GROUP_RAID0:
          ret += ' raid0'
      elif flags & BTRFS_BLOCK_GROUP_RAID1:
          ret += ' raid1'
      elif flags & BTRFS_BLOCK_GROUP_DUP:
          ret += ' dup'
      elif flags & BTRFS_BLOCK_GROUP_RAID10:
          ret += ' raid10'
      elif flags & BTRFS_BLOCK_GROUP_RAID5:
          ret += ' raid5'
      elif flags & BTRFS_BLOCK_GROUP_RAID6:
          ret += ' raid6'
      elif flags & BTRFS_BLOCK_GROUP_RAID1C3:
          ret += ' raid1c3'
      elif flags & BTRFS_BLOCK_GROUP_RAID1C4:
          ret += ' raid1c4'
      else:
          ret += ' single'

      return ret

  def dump_bg(bg):
      print()
      print(f'block group at {hex(bg.value_())}')
      print(f'\t start {bg.start.value_()} length {bg.length.value_()}')
      print(f'\t flags {bg.flags.value_()} - {bg_flags_string(bg)}')

  bg_root = fs_info.block_group_cache_tree.address_of_()
  for bg in rbtree_inorder_for_each_entry('struct btrfs_block_group', bg_root, 'cache_node'):
      dump_bg(bg)

  $ drgn dump_block_groups.py

  block group at 0xffff8f3d673b0400
         start 22020096 length 16777216
         flags 258 - system raid6

  block group at 0xffff8f3d53ddb400
         start 38797312 length 536870912
         flags 260 - meta raid6

  block group at 0xffff8f3d5f4d9c00
         start 575668224 length 2147483648
         flags 257 - data raid6

  block group at 0xffff8f3d08189000
         start 2723151872 length 67108864
         flags 258 - system raid6

  block group at 0xffff8f3db70ff000
         start 2790260736 length 1073741824
         flags 260 - meta raid6

  block group at 0xffff8f3d5f4dd800
         start 3864002560 length 67108864
         flags 258 - system raid6

  block group at 0xffff8f3d67037000
         start 3931111424 length 2147483648
         flags 257 - data raid6
  $

So there were only 2 reasons left for having a readahead extent with a
single zone: reada_find_zone(), called when creating a readahead extent,
returned NULL either because we failed to find the corresponding block
group or because a memory allocation failed. With some additional custom
tracing I figured out that on every further occurrence of the problem the
block group had just been deleted while we were looping to create the
zones for the readahead extent (at reada_find_extent()), so we ended up
with only one zone in the readahead extent, corresponding to a device that
ends up getting replaced.

So after figuring that out it became obvious why the hang happens:

1) Task A starts a scrub on any device of the filesystem, except for
   device /dev/sdd;

2) Task B starts a device replace with /dev/sdd as the source device;

3) Task A calls btrfs_reada_add() from scrub_stripe() and it is currently
   starting to scrub a stripe from block group X. This call to
   btrfs_reada_add() is the one for the extent tree. When btrfs_reada_add()
   calls reada_add_block(), it passes the logical address of the extent
   tree's root node as its 'logical' argument - a value of 38928384;

4) Task A then enters reada_find_extent(), called from reada_add_block().
   It finds there isn't any existing readahead extent for the logical
   address 38928384, so it proceeds to the path of creating a new one.

   It calls btrfs_map_block() to find out which stripes exist for the block
   group X. On the first iteration of the for loop that iterates over the
   stripes, it finds the stripe for device /dev/sdd, so it creates one
   zone for that device and adds it to the readahead extent. Before getting
   into the second iteration of the loop, the cleanup kthread deletes block
   group X because it was empty. So in the iterations for the remaining
   stripes it does not add more zones to the readahead extent, because the
   calls to reada_find_zone() return NULL, as they can no longer find
   block group X.

   As a result the new readahead extent has a single zone, corresponding to
   the device /dev/sdd;

5) Before task A returns to btrfs_reada_add() and queues the readahead job
   for the readahead work queue, task B finishes the device replace and at
   btrfs_dev_replace_finishing() swaps the device /dev/sdd with the new
   device /dev/sdg;

6) Task A returns to reada_add_block(), which increments the counter
   "->elems" of the reada_control structure allocated at btrfs_reada_add().

   Then it returns back to btrfs_reada_add() and calls
   reada_start_machine(). This queues a job in the readahead work queue to
   run the function reada_start_machine_worker(), which calls
   __reada_start_machine().

   At __reada_start_machine() we take the device list mutex and for each
   device found in the current device list, we call
   reada_start_machine_dev() to start the readahead work. However at this
   point the device /dev/sdd was already freed and is not in the device
   list anymore.

   This means the corresponding readahead for the extent at 38928384 is
   never started, and therefore the "->elems" counter of the reada_control
   structure allocated at btrfs_reada_add() never goes down to 0, causing
   the call to btrfs_reada_wait(), done by the scrub task, to wait forever.
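The sequence above can be reproduced deterministically in a small model.
This is a Python sketch, not btrfs code: `Device`, `ReadaExtent`,
`create_extent()` and `start_machine()` are hypothetical stand-ins for
struct btrfs_device, struct reada_extent, the zone loop of
reada_find_extent() and __reada_start_machine(), and the deletion of block
group X is hard-coded to happen right after the first loop iteration. Only
the control flow follows the description.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Device:
    """Hypothetical stand-in for struct btrfs_device (compared by identity)."""
    name: str


@dataclass(eq=False)
class ReadaExtent:
    """Hypothetical stand-in for struct reada_extent."""
    logical: int
    zones: list = field(default_factory=list)  # one device per zone


def create_extent(logical, stripes, block_groups):
    """Mimics the zone-creation loop of reada_find_extent().

    Model assumption: the cleaner kthread deletes the empty block group "X"
    right after the first iteration, so every later zone lookup
    (reada_find_zone() in the kernel) fails and adds nothing.
    """
    ext = ReadaExtent(logical)
    for i, dev in enumerate(stripes):
        if "X" in block_groups:  # zone lookup succeeds only while X exists
            ext.zones.append(dev)
        if i == 0:
            block_groups.discard("X")  # block group X deleted mid-loop
    return ext


def start_machine(device_list, pending):
    """Mimics __reada_start_machine(): readahead is only started for extents
    with a zone on a device still in the device list. Returns how many
    pending extents were serviced."""
    serviced = [e for e in pending if any(d in device_list for d in e.zones)]
    for e in serviced:
        pending.remove(e)
    return len(serviced)


sdd, sde, sdf, sdg = (
    Device(n) for n in ("/dev/sdd", "/dev/sde", "/dev/sdf", "/dev/sdg")
)
device_list = [sdd, sde, sdf]
block_groups = {"X"}

# The new extent ends up with a single zone, on /dev/sdd.
ext = create_extent(38928384, [sdd, sde, sdf], block_groups)
# The device replace finishes and swaps /dev/sdd for /dev/sdg.
device_list[device_list.index(sdd)] = sdg
# reada_add_block() bumps ->elems, then the worker runs.
elems = 1
pending = [ext]

# Bounded stand-in for the btrfs_reada_wait() loop, which never exits.
for _ in range(5):
    elems -= start_machine(device_list, pending)
    if elems == 0:
        break
print(len(ext.zones), elems)  # 1 1
```

Because the only zone points at a device that is no longer in the device
list, no pass of the worker ever services the extent and the counter stays
at 1; the kernel's wait loop has no iteration bound, so it spins forever.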

Note that the readahead request can be made either after the device
replace started or before it started. In practice, however, it is very
unlikely that a device replace can start after a readahead request is made
and complete before that readahead request completes (perhaps only on a
very small and nearly empty filesystem).

This hang, however, is not the only problem we can have with readahead and
device removals. When the readahead extent has zones other than the one
corresponding to the device that is being removed (either by a device
replace or a device remove operation), we risk a use-after-free on the
device when dropping the last reference of the readahead extent.

For example if we create a readahead extent with two zones, one for the
device /dev/sdd and one for the device /dev/sde:

1) Before the readahead worker starts, the device /dev/sdd is removed,
   and the corresponding btrfs_device structure is freed. However the
   readahead extent still has the zone pointing to the device structure;

2) When the readahead worker starts, it only finds device /dev/sde in the
   current device list of the filesystem;

3) It starts the readahead work, at reada_start_machine_dev(), using the
   device /dev/sde;

4) Then when it finishes reading the extent from device /dev/sde, it calls
   __readahead_hook() which ends up dropping the last reference on the
   readahead extent through the last call to reada_extent_put();

5) At reada_extent_put() it iterates over each zone of the readahead extent
   and attempts to delete an element from the device's 'reada_extents'
   radix tree, resulting in a use-after-free, as the device pointer of the
   zone for /dev/sdd is now stale. We can also access the device after
   dropping the last reference of a zone, through reada_zone_release(),
   also called by reada_extent_put().
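The use-after-free can be modeled the same way. In this hypothetical Python
sketch, a `freed` flag stands in for freeing the btrfs_device and `put()`
mimics reada_extent_put() walking every zone. A real use-after-free in C is
undefined behavior rather than a clean exception, so the model only makes
the stale access visible.

```python
class UseAfterFree(RuntimeError):
    """Raised when the model touches a device that was already freed."""


class Device:
    """Hypothetical stand-in for struct btrfs_device."""

    def __init__(self, name):
        self.name = name
        self.freed = False
        self.reada_extents = {}  # stands in for the 'reada_extents' radix tree

    def delete_extent(self, logical):
        if self.freed:
            raise UseAfterFree(f"{self.name} accessed after free")
        self.reada_extents.pop(logical, None)


class ReadaExtent:
    """Hypothetical stand-in for struct reada_extent."""

    def __init__(self, logical, devices):
        self.logical = logical
        self.zones = list(devices)  # one zone per device
        self.refcnt = 1
        for dev in self.zones:
            dev.reada_extents[logical] = self

    def put(self):
        """Mimics reada_extent_put(): dropping the last reference walks every
        zone, including zones whose device has since been removed."""
        self.refcnt -= 1
        if self.refcnt == 0:
            for dev in self.zones:
                dev.delete_extent(self.logical)


sdd, sde = Device("/dev/sdd"), Device("/dev/sde")
ext = ReadaExtent(38928384, [sdd, sde])

sdd.freed = True  # /dev/sdd removed before the readahead worker starts
# The worker reads the extent from /dev/sde only, then drops the last
# reference (in the kernel, via __readahead_hook()).
try:
    ext.put()
    outcome = "no stale access"
except UseAfterFree as exc:
    outcome = str(exc)
print(outcome)  # /dev/sdd accessed after free
```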

A device remove suffers from the same problem. However, since it shrinks
the device size down to zero before removing the device, it is very
unlikely to still have uncompleted readahead requests by the time we free
the device; the only possibility is if the device has very little space
allocated.

While the hang problem is exclusive to scrub, since it is currently the
only user of btrfs_reada_add() and btrfs_reada_wait(), the use-after-free
problem affects any path that triggers readahead, which includes, for
example, btree_readahead_hook() and __readahead_hook() (a readahead worker
can trigger readahead for the children of a node). Any path that ends up
calling reada_add_block() can trigger the use-after-free after a device
is removed.

So fix this by waiting for all readahead requests for a device to complete
before removing the device, while ensuring that no new requests can be
made during that wait.
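The fix follows a drain-before-free pattern. A minimal, generic sketch of that pattern with POSIX threads, assuming a hypothetical struct dev that carries an in-flight request counter and a "removing" flag; dev_alloc(), reada_get(), reada_put(), dev_remove() and the demo helpers are illustrative names, not the btrfs functions touched by this patch:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device; not the real struct btrfs_device. */
struct dev {
    pthread_mutex_t lock;
    pthread_cond_t drained;
    int inflight;   /* readahead requests started but not completed */
    bool removing;  /* once set, new readahead requests are refused */
};

static struct dev *dev_alloc(void)
{
    struct dev *d = calloc(1, sizeof(*d));

    if (d) {
        pthread_mutex_init(&d->lock, NULL);
        pthread_cond_init(&d->drained, NULL);
    }
    return d;
}

/* Start a readahead request; fails once removal of the device has begun. */
static bool reada_get(struct dev *d)
{
    bool ok;

    pthread_mutex_lock(&d->lock);
    ok = !d->removing;
    if (ok)
        d->inflight++;
    pthread_mutex_unlock(&d->lock);
    return ok;
}

/* Complete a readahead request; the last one wakes a waiting remover. */
static void reada_put(struct dev *d)
{
    pthread_mutex_lock(&d->lock);
    assert(d->inflight > 0);
    if (--d->inflight == 0)
        pthread_cond_broadcast(&d->drained);
    pthread_mutex_unlock(&d->lock);
}

/* Refuse new requests, wait for the existing ones, and only then free. */
static void dev_remove(struct dev *d)
{
    pthread_mutex_lock(&d->lock);
    d->removing = true;
    while (d->inflight > 0)
        pthread_cond_wait(&d->drained, &d->lock);
    pthread_mutex_unlock(&d->lock);
    /* No request can reference d any more, so freeing it is safe. */
    pthread_cond_destroy(&d->drained);
    pthread_mutex_destroy(&d->lock);
    free(d);
}

/* Usage sketch: a request is still in flight when removal starts. */
struct demo_ctx {
    struct dev *d;
    bool late_request_refused;
};

static void *late_worker(void *arg)
{
    struct demo_ctx *c = arg;
    bool removing = false;

    while (!removing) {             /* wait until dev_remove() has begun */
        pthread_mutex_lock(&c->d->lock);
        removing = c->d->removing;
        pthread_mutex_unlock(&c->d->lock);
        sched_yield();
    }
    c->late_request_refused = !reada_get(c->d);  /* must be refused */
    reada_put(c->d);                /* finish the original request */
    return NULL;                    /* d may be freed now; never touched */
}

static bool demo_remove_with_inflight(void)
{
    struct demo_ctx c = { dev_alloc(), false };
    pthread_t t;

    if (c.d == NULL || !reada_get(c.d))
        return false;
    if (pthread_create(&t, NULL, late_worker, &c) != 0) {
        reada_put(c.d);
        dev_remove(c.d);
        return false;
    }
    dev_remove(c.d);    /* blocks until late_worker calls reada_put() */
    pthread_join(t, NULL);
    return c.late_request_refused;
}
```

Setting the flag and checking the counter under the same mutex is what closes the window in which a new request could start after the remover has decided the device can be freed.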

This problem has been around for a very long time: the readahead code was
added in 2011, device remove has existed since 2008 and device replace was
introduced in 2013, so it is hard to pick a specific commit for a git
Fixes tag.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
pcercuei referenced this issue in OpenDingux/linux Nov 28, 2020
This fix is for a failure that occurred in the DWARF unwind perf test.

Stack unwinders may probe memory when looking for frames.

Memory sanitizer will poison and track uninitialized memory on the
stack, and on the heap if the value is copied to the heap.

This can lead to false memory sanitizer failures for the use of an
uninitialized value.

Avoid this problem by removing the poison on the copied stack.
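A minimal sketch of the unpoisoning idea, assuming a Clang build with -fsanitize=memory: after the raw stack bytes are copied, MemorySanitizer's __msan_unpoison() from <sanitizer/msan_interface.h> marks the copy as initialized, so the unwinder can read it without false reports. The helper copy_stack_snapshot() and the UNPOISON macro are hypothetical names, not the code of this patch; without MSan the macro compiles to nothing.

```c
#include <stddef.h>
#include <string.h>

/* Use the MSan interface only when building with -fsanitize=memory. */
#if defined(__has_feature)
#  if __has_feature(memory_sanitizer)
#    include <sanitizer/msan_interface.h>
#    define UNPOISON(p, n) __msan_unpoison((p), (n))
#  endif
#endif
#ifndef UNPOISON
#  define UNPOISON(p, n) ((void)(p), (void)(n))  /* no-op without MSan */
#endif

/*
 * Copy a raw snapshot of a stack region for a later DWARF unwind.
 * The region may legitimately contain bytes that were never initialized
 * (padding, dead slots); the unwinder may probe any of them, so mark the
 * copy as initialized instead of letting MSan report them.
 */
static void copy_stack_snapshot(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    UNPOISON(dst, len);
}
```

Unpoisoning only the copy keeps MSan's tracking intact for the rest of the program; only the snapshot handed to the unwinder is exempted.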

The full msan failure with track origins looks like:

==2168==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8
    #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #23 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22
    #1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13
    #2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #24 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9
    #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #23 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10
    #1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13
    #2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18
    #3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4
    #4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7
    #5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10
    #6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17
    #7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17
    #8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14
    #9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10
    #10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8
    #11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8
    #12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #25 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was stored to memory at
    #0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3
    #1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2
    #2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9
    #3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6
    #4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26
    #5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0)
    #6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2
    #7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9
    #8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9
    #9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8
    #10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9
    #11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9
    #12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4
    #13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9
    #14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11
    #15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8
    #16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2
    #17 0x559cea95fbce in main tools/perf/perf.c:539:3

  Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events'
    #0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445

SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: clang-built-linux@googlegroups.com
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandeep Dasgupta <sdasgup@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201113182053.754625-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
pcercuei referenced this issue in OpenDingux/linux Dec 17, 2020
We hit this issue in our internal test.  When enabling generic kasan, a
kfree()'d object is put into the per-cpu quarantine first.  If the cpu goes
offline, the object still remains in the per-cpu quarantine.  If we call
kmem_cache_destroy() now, slub will report an "Objects remaining" error.

  =============================================================================
  BUG test_module_slab (Not tainted): Objects remaining in test_module_slab on __kmem_cache_shutdown()
  -----------------------------------------------------------------------------

  Disabling lock debugging due to kernel taint
  INFO: Slab 0x(____ptrval____) objects=34 used=1 fp=0x(____ptrval____) flags=0x2ffff00000010200
  CPU: 3 PID: 176 Comm: cat Tainted: G    B             5.10.0-rc1-00007-g4525c8781ec0-dirty #10
  Hardware name: linux,dummy-virt (DT)
  Call trace:
     dump_backtrace+0x0/0x2b0
     show_stack+0x18/0x68
     dump_stack+0xfc/0x168
     slab_err+0xac/0xd4
     __kmem_cache_shutdown+0x1e4/0x3c8
     kmem_cache_destroy+0x68/0x130
     test_version_show+0x84/0xf0
     module_attr_show+0x40/0x60
     sysfs_kf_seq_show+0x128/0x1c0
     kernfs_seq_show+0xa0/0xb8
     seq_read+0x1f0/0x7e8
     kernfs_fop_read+0x70/0x338
     vfs_read+0xe4/0x250
     ksys_read+0xc8/0x180
     __arm64_sys_read+0x44/0x58
     el0_svc_common.constprop.0+0xac/0x228
     do_el0_svc+0x38/0xa0
     el0_sync_handler+0x170/0x178
     el0_sync+0x174/0x180
  INFO: Object 0x(____ptrval____) @offset=15848
  INFO: Allocated in test_version_show+0x98/0xf0 age=8188 cpu=6 pid=172
     stack_trace_save+0x9c/0xd0
     set_track+0x64/0xf0
     alloc_debug_processing+0x104/0x1a0
     ___slab_alloc+0x628/0x648
     __slab_alloc.isra.0+0x2c/0x58
     kmem_cache_alloc+0x560/0x588
     test_version_show+0x98/0xf0
     module_attr_show+0x40/0x60
     sysfs_kf_seq_show+0x128/0x1c0
     kernfs_seq_show+0xa0/0xb8
     seq_read+0x1f0/0x7e8
     kernfs_fop_read+0x70/0x338
     vfs_read+0xe4/0x250
     ksys_read+0xc8/0x180
     __arm64_sys_read+0x44/0x58
     el0_svc_common.constprop.0+0xac/0x228
  kmem_cache_destroy test_module_slab: Slab cache still has objects

Register a cpu hotplug function to remove all objects in the offline
per-cpu quarantine when a cpu is going offline.  Set a per-cpu variable to
indicate that this cpu is offline.
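A toy userspace model of that approach, assuming a fixed number of cpus and a small array per cpu standing in for the quarantine; all names (struct cpu_quarantine, quarantine_put(), quarantine_cpu_offline(), cache_destroy_ok()) are hypothetical, and the kernel itself registers such callbacks through its cpuhp_setup_state() hotplug machinery rather than calling them directly. For brevity the model frees an object outright when a quarantine is full, which real KASAN does not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS 4
#define QCAP    8   /* hypothetical per-cpu quarantine capacity */

/* Toy per-cpu quarantine; not the kernel's quarantine data structures. */
struct cpu_quarantine {
    void *objs[QCAP];
    int count;
    bool offline;   /* set by the "going offline" hotplug callback */
};

static struct cpu_quarantine quarantine[NR_CPUS];
static int live_objects;   /* allocated and not yet really freed */

static void *cache_alloc(void)
{
    void *p = malloc(32);

    if (p)
        live_objects++;
    return p;
}

static void really_free(void *p)
{
    free(p);
    live_objects--;
}

/* kfree() path: park the object in this cpu's quarantine if possible. */
static void quarantine_put(int cpu, void *obj)
{
    struct cpu_quarantine *q;

    assert(cpu >= 0 && cpu < NR_CPUS);
    q = &quarantine[cpu];
    if (q->offline || q->count == QCAP) {
        really_free(obj);   /* offline cpu (or full): free immediately */
        return;
    }
    q->objs[q->count++] = obj;
}

/* Hotplug callback: mark the cpu offline first, then drain its quarantine. */
static void quarantine_cpu_offline(int cpu)
{
    struct cpu_quarantine *q;

    assert(cpu >= 0 && cpu < NR_CPUS);
    q = &quarantine[cpu];
    q->offline = true;
    while (q->count > 0)
        really_free(q->objs[--q->count]);
}

static void quarantine_cpu_online(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    quarantine[cpu].offline = false;
}

/* Stand-in for the "Objects remaining" check in kmem_cache_destroy(). */
static bool cache_destroy_ok(void)
{
    return live_objects == 0;
}
```

Without the offline callback, objects parked on cpu 3 would keep cache_destroy_ok() false indefinitely, which mirrors the "Objects remaining" report; the offline flag also makes later frees on that cpu bypass the quarantine.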

[qiang.zhang@windriver.com: fix slab double free when cpu-hotplug]
  Link: https://lkml.kernel.org/r/20201204102206.20237-1-qiang.zhang@windriver.com

Link: https://lkml.kernel.org/r/1606895585-17382-2-git-send-email-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Signed-off-by: Zqiang <qiang.zhang@windriver.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Guangye Yang <guangye.yang@mediatek.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Nicholas Tang <nicholas.tang@mediatek.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Qian Cai <qcai@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>