VC4 kernel framework driver - memory leak - 2 #123

Closed
nkichukov opened this issue Dec 2, 2017 · 7 comments

@nkichukov

Hi all,

I have found a second vc4 memory leak on the Raspberry Pi 3B, after #122 got fixed.

Details follow below:

kernel: 4.14.0-v7+
compiler: gcc-6.4
Raspberry Pi running in 32-bit mode
CMA=256MB
OS: Gentoo Linux
Kernel configuration is attached, see below.

Whenever Kodi (v17.6) is playing video, /proc/slabinfo shows the number of kmalloc-128 objects constantly increasing and never being released:
kmalloc-128 51027 58863 384 21 2 : tunables 0 0 0 : slabdata 2803 2803 0
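
For anyone who wants to watch this counter over time, here is a minimal sketch that pulls one cache's numbers out of /proc/slabinfo. It assumes the usual slabinfo 2.1 column order (name, active_objs, num_objs, objsize); the names `SlabStat`, `parse_slabinfo_line` and `read_slab` are made up for this example.

```python
from typing import NamedTuple, Optional


class SlabStat(NamedTuple):
    name: str
    active_objs: int
    num_objs: int
    objsize: int


def parse_slabinfo_line(line: str) -> Optional[SlabStat]:
    """Parse one data row of /proc/slabinfo; return None for non-data rows."""
    fields = line.split()
    # Skip the version banner, the '# name ...' header and short rows.
    if len(fields) < 4 or fields[0] in ("slabinfo", "#"):
        return None
    try:
        return SlabStat(fields[0], int(fields[1]), int(fields[2]), int(fields[3]))
    except ValueError:
        return None


def read_slab(name: str, path: str = "/proc/slabinfo") -> Optional[SlabStat]:
    """Return the stats for the named cache, or None if it is not listed."""
    with open(path) as f:
        for line in f:
            stat = parse_slabinfo_line(line)
            if stat is not None and stat.name == name:
                return stat
    return None
```

Calling `read_slab("kmalloc-128")` before and after a few minutes of playback and comparing `active_objs` should show the growth described above. Reading /proc/slabinfo usually requires root.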

The memleak pattern is like the one below (from /sys/kernel/debug/kmemleak):

unreferenced object 0x842aa340 (size 128):
comm "X", pid 2009, jiffies 7034334 (age 301.527s)
hex dump (first 32 bytes):
50 3f eb b6 01 00 00 00 00 00 00 00 00 00 00 00 P?..............
50 a3 2a 84 50 a3 2a 84 00 00 00 00 00 00 00 00 P.*.P.*.........
backtrace:
[<802827c8>] kmem_cache_alloc_trace+0x234/0x2f8
[<7f167ae8>] drm_atomic_helper_setup_commit+0x1cc/0x3b0 [drm_kms_helper]
[<7f1e2aa0>] vc4_atomic_commit+0x30/0x130 [vc4]
[<7f0f44a0>] drm_atomic_commit+0x5c/0x68 [drm]
[<7f0f5700>] drm_atomic_connector_commit_dpms+0xf8/0x108 [drm]
[<7f0fae64>] drm_mode_obj_set_property_ioctl+0x1bc/0x2b4 [drm]
[<7f0f9820>] drm_mode_connector_property_set_ioctl+0x48/0x50 [drm]
[<7f0e296c>] drm_ioctl_kernel+0x78/0xb8 [drm]
[<7f0e2cd4>] drm_ioctl+0x1b4/0x37c [drm]
[<802ac090>] do_vfs_ioctl+0xb0/0x8b4
[<802ac8d8>] SyS_ioctl+0x44/0x6c
[<80108140>] ret_fast_syscall+0x0/0x28
[<ffffffff>] 0xffffffff
unreferenced object 0x83ee2340 (size 128):
comm "X", pid 2009, jiffies 7044631 (age 291.296s)
hex dump (first 32 bytes):
50 3f eb b6 01 00 00 00 00 00 00 00 00 00 00 00 P?..............
50 23 ee 83 50 23 ee 83 00 00 00 00 00 00 00 00 P#..P#..........
backtrace:
[<802827c8>] kmem_cache_alloc_trace+0x234/0x2f8
[<7f167ae8>] drm_atomic_helper_setup_commit+0x1cc/0x3b0 [drm_kms_helper]
[<7f1e2aa0>] vc4_atomic_commit+0x30/0x130 [vc4]
[<7f0f44a0>] drm_atomic_commit+0x5c/0x68 [drm]
[<7f0f5700>] drm_atomic_connector_commit_dpms+0xf8/0x108 [drm]
[<7f0fae64>] drm_mode_obj_set_property_ioctl+0x1bc/0x2b4 [drm]
[<7f0f9820>] drm_mode_connector_property_set_ioctl+0x48/0x50 [drm]
[<7f0e296c>] drm_ioctl_kernel+0x78/0xb8 [drm]
[<7f0e2cd4>] drm_ioctl+0x1b4/0x37c [drm]
[<802ac090>] do_vfs_ioctl+0xb0/0x8b4
[<802ac8d8>] SyS_ioctl+0x44/0x6c
[<80108140>] ret_fast_syscall+0x0/0x28
[<ffffffff>] 0xffffffff
unreferenced object 0x84bbe040 (size 128):
comm "X", pid 2009, jiffies 7049751 (age 286.242s)
hex dump (first 32 bytes):
50 3f eb b6 01 00 00 00 00 00 00 00 00 00 00 00 P?..............
50 e0 bb 84 50 e0 bb 84 00 00 00 00 00 00 00 00 P...P...........
backtrace:
[<802827c8>] kmem_cache_alloc_trace+0x234/0x2f8
[<7f167ae8>] drm_atomic_helper_setup_commit+0x1cc/0x3b0 [drm_kms_helper]
[<7f1e2aa0>] vc4_atomic_commit+0x30/0x130 [vc4]
[<7f0f44a0>] drm_atomic_commit+0x5c/0x68 [drm]
[<7f0f5700>] drm_atomic_connector_commit_dpms+0xf8/0x108 [drm]
[<7f0fae64>] drm_mode_obj_set_property_ioctl+0x1bc/0x2b4 [drm]
[<7f0f9820>] drm_mode_connector_property_set_ioctl+0x48/0x50 [drm]
[<7f0e296c>] drm_ioctl_kernel+0x78/0xb8 [drm]
[<7f0e2cd4>] drm_ioctl+0x1b4/0x37c [drm]
[<802ac090>] do_vfs_ioctl+0xb0/0x8b4
[<802ac8d8>] SyS_ioctl+0x44/0x6c
[<80108140>] ret_fast_syscall+0x0/0x28
[<ffffffff>] 0xffffffff
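
Since every record above carries the same backtrace, it can help to collapse a long kmemleak dump into one count per call chain. The following is only a rough sketch: it assumes the record layout shown in this report ("unreferenced object" header followed by `[<addr>] symbol+0xoff/0xsize` frames), and `group_leaks` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches a frame such as "[<7f1e2aa0>] vc4_atomic_commit+0x30/0x130 [vc4]"
# and captures only the function name.
FRAME_RE = re.compile(r"\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def group_leaks(report: str) -> Counter:
    """Count kmemleak records by backtrace (tuple of function names)."""
    counts = Counter()
    frames = None  # frames of the record currently being read
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("unreferenced object"):
            if frames is not None:
                counts[tuple(frames)] += 1
            frames = []
        elif frames is not None:
            match = FRAME_RE.search(line)
            if match:
                frames.append(match.group(1))
    if frames is not None:
        counts[tuple(frames)] += 1
    return counts
```

Feeding it the contents of /sys/kernel/debug/kmemleak and printing `counts.most_common()` would list the most frequent leaking call chains first; for the dump above there would be a single entry going through drm_atomic_helper_setup_commit.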

Let me know if additional information is required to track this issue down.

rpi-4.14.y-kernel_configuration_file.gz

Thank you,
-Nikolay

@anholt
Owner

anholt commented Jan 5, 2018

@nkichukov
Author

Indeed, looks like it! I will apply the patch and see if it fixes it. Will report back as soon as I collect the results.

Thank you!
-N

@nkichukov
Author

The patch would not apply on 4.14.y kernels, so I had to upgrade to raspberrypi linux 4.15-rc6:

patching file drivers/gpu/drm/drm_atomic_helper.c
Hunk #1 succeeded at 3327 (offset -94 lines).

However, this is what I get when the vc4 kernel module loads at boot time:

Jan  1 01:00:28 grpi kernel: [   12.153431] vc4_hdmi 3f902000.hdmi: vc4-hdmi-hifi <-> 3f902000.hdmi mapping ok
Jan  1 01:00:28 grpi kernel: [   12.154337] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
Jan  1 01:00:28 grpi kernel: [   12.154427] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
Jan  1 01:00:28 grpi kernel: [   12.154598] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
Jan  1 01:00:28 grpi kernel: [   12.154705] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
Jan  1 01:00:28 grpi kernel: [   12.154811] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
Jan  1 01:00:28 grpi kernel: [   12.170853] ------------[ cut here ]------------
Jan  1 01:00:28 grpi kernel: [   12.170883] WARNING: CPU: 1 PID: 980 at kernel/irq/chip.c:244 __irq_startup+0xb4/0xb8
Jan  1 01:00:28 grpi kernel: [   12.170887] Modules linked in: vc4(+) snd_soc_core snd_pcm_dmaengine drm_kms_helper drm evdev cec snd_bcm2835(C) fb font smsc95xx snd_pcm usbnet mii snd_timer snd i2c_bcm2835 fixed
Jan  1 01:00:28 grpi kernel: [   12.170951] CPU: 1 PID: 980 Comm: systemd-udevd Tainted: G         C       4.15.0-rc6-v7+ #1
Jan  1 01:00:28 grpi kernel: [   12.170954] Hardware name: BCM2835
Jan  1 01:00:28 grpi kernel: [   12.170977] [<801108fc>] (unwind_backtrace) from [<8010ca9c>] (show_stack+0x20/0x24)
Jan  1 01:00:28 grpi kernel: [   12.170987] [<8010ca9c>] (show_stack) from [<8065ded8>] (dump_stack+0xc8/0x10c)
Jan  1 01:00:28 grpi kernel: [   12.170997] [<8065ded8>] (dump_stack) from [<8011e48c>] (__warn+0x104/0x11c)
Jan  1 01:00:28 grpi kernel: [   12.171007] [<8011e48c>] (__warn) from [<8011e594>] (warn_slowpath_null+0x50/0x58)
Jan  1 01:00:28 grpi kernel: [   12.171015] [<8011e594>] (warn_slowpath_null) from [<8017d288>] (__irq_startup+0xb4/0xb8)
Jan  1 01:00:28 grpi kernel: [   12.171026] [<8017d288>] (__irq_startup) from [<8017d2ec>] (irq_startup+0x60/0x128)
Jan  1 01:00:28 grpi kernel: [   12.171036] [<8017d2ec>] (irq_startup) from [<8017aca4>] (__enable_irq+0x78/0x7c)
Jan  1 01:00:28 grpi kernel: [   12.171044] [<8017aca4>] (__enable_irq) from [<8017acec>] (enable_irq+0x44/0x7c)
Jan  1 01:00:28 grpi kernel: [   12.171118] [<8017acec>] (enable_irq) from [<7f1f097c>] (vc4_irq_postinstall+0x24/0x40 [vc4])
Jan  1 01:00:28 grpi kernel: [   12.171345] [<7f1f097c>] (vc4_irq_postinstall [vc4]) from [<7f0e4184>] (drm_irq_install+0xe0/0x120 [drm])
Jan  1 01:00:28 grpi kernel: [   12.171520] [<7f0e4184>] (drm_irq_install [drm]) from [<7f1f3b00>] (vc4_v3d_bind+0x138/0x230 [vc4])
Jan  1 01:00:28 grpi kernel: [   12.171564] [<7f1f3b00>] (vc4_v3d_bind [vc4]) from [<804684b8>] (component_bind_all+0x12c/0x24c)
Jan  1 01:00:28 grpi kernel: [   12.171604] [<804684b8>] (component_bind_all) from [<7f1e54f8>] (vc4_drm_bind+0xa4/0x14c [vc4])
Jan  1 01:00:28 grpi kernel: [   12.171646] [<7f1e54f8>] (vc4_drm_bind [vc4]) from [<8046896c>] (try_to_bring_up_master+0x180/0x1bc)
Jan  1 01:00:28 grpi kernel: [   12.171654] [<8046896c>] (try_to_bring_up_master) from [<80468c38>] (component_master_add_with_match+0x9c/0xd0)
Jan  1 01:00:28 grpi kernel: [   12.171692] [<80468c38>] (component_master_add_with_match) from [<7f1e5664>] (vc4_platform_drm_probe+0xc4/0xd4 [vc4])
Jan  1 01:00:28 grpi kernel: [   12.171751] [<7f1e5664>] (vc4_platform_drm_probe [vc4]) from [<80470570>] (platform_drv_probe+0x60/0xc0)
Jan  1 01:00:28 grpi kernel: [   12.171763] [<80470570>] (platform_drv_probe) from [<8046eacc>] (driver_probe_device+0x25c/0x338)
Jan  1 01:00:28 grpi kernel: [   12.171774] [<8046eacc>] (driver_probe_device) from [<8046ec70>] (__driver_attach+0xc8/0xcc)
Jan  1 01:00:28 grpi kernel: [   12.171782] [<8046ec70>] (__driver_attach) from [<8046cc40>] (bus_for_each_dev+0x78/0xac)
Jan  1 01:00:28 grpi kernel: [   12.171791] [<8046cc40>] (bus_for_each_dev) from [<8046e3d0>] (driver_attach+0x2c/0x30)
Jan  1 01:00:28 grpi kernel: [   12.171800] [<8046e3d0>] (driver_attach) from [<8046de1c>] (bus_add_driver+0x114/0x220)
Jan  1 01:00:28 grpi kernel: [   12.171807] [<8046de1c>] (bus_add_driver) from [<8046f41c>] (driver_register+0x88/0x104)
Jan  1 01:00:28 grpi kernel: [   12.171814] [<8046f41c>] (driver_register) from [<804704bc>] (__platform_driver_register+0x50/0x58)
Jan  1 01:00:28 grpi kernel: [   12.171853] [<804704bc>] (__platform_driver_register) from [<7f209040>] (vc4_drm_register+0x40/0x4c [vc4])
Jan  1 01:00:28 grpi kernel: [   12.171910] [<7f209040>] (vc4_drm_register [vc4]) from [<80101c5c>] (do_one_initcall+0x54/0x17c)
Jan  1 01:00:28 grpi kernel: [   12.171922] [<80101c5c>] (do_one_initcall) from [<801af7dc>] (do_init_module+0x74/0x224)
Jan  1 01:00:28 grpi kernel: [   12.171931] [<801af7dc>] (do_init_module) from [<801ae888>] (load_module+0x1f04/0x25a8)
Jan  1 01:00:28 grpi kernel: [   12.171940] [<801ae888>] (load_module) from [<801af160>] (SyS_finit_module+0xb8/0xc8)
Jan  1 01:00:28 grpi kernel: [   12.171948] [<801af160>] (SyS_finit_module) from [<80108140>] (ret_fast_syscall+0x0/0x28)
Jan  1 01:00:28 grpi kernel: [   12.171955] ---[ end trace 35939b95472d70af ]---
Jan  1 01:00:28 grpi kernel: [   12.172031] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
Jan  1 01:00:28 grpi kernel: [   12.172613] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0

Later, these errors were printed in the messages log:

Jan  6 00:58:48 grpi kernel: [  497.632306] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:53:crtc-2] flip_done timed out
Jan  6 00:59:11 grpi kernel: [  521.184305] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:52:plane-20] flip_done timed out

The system continues to function as normal and video rendering is not impacted; Kodi plays everything just fine. At this stage I cannot say whether the DRM changes introduced in 4.15.y, and their interaction with the vc4 module, have any side effects.

At the moment I do not see the memory leak originally reported in this issue, but I need to monitor for a while longer and will let you know whether it is resolved.

Cheers,
-Nik

@nkichukov
Author

Hello Eric,
The patch fixes the memory leak reported here.

If you believe there is something you can do about the stack trace when the module loads and the two errors that show up, let me know and I can open another issue to track it separately. If not, just close this case, as the suggested patch worked and I no longer see the memory leak.

Thank you,
-Nik

@stschake

stschake commented Jan 9, 2018

@nkichukov
Author

Thanks Stefan,
I see this is not merged into rpi-4.15-rc7 yet, but I will apply the patches manually for now. I plan on moving to rc7 because it has the proposed KAISER/Meltdown patches from mainline, which seem to affect the Cortex-A53 ARM CPUs too.

I will let you know if the warning is gone once I have the patched kernel booted up.

Cheers,
-N

@nkichukov
Author

rpi-4.15-rc7 with both patches applied resolves all of the issues described above. I hope those get merged into mainline soon so that I no longer have to patch manually for the next kernel upgrade.

Thanks for your support, and keep up the good work making free software better and better!
-Nikolay

anholt pushed a commit that referenced this issue Jan 17, 2018
[ Upstream commit 1106638 ]

When slub_debug=O is set, it is possible to clear debug flags for an
"unmergeable" slab cache in kmem_cache_open().  This makes the "unmergeable"
cache become "mergeable" in sysfs_slab_add().

These caches will generate their "unique IDs" by create_unique_id(), but
it is possible to create identical unique IDs.  In my experiment,
sgpool-128, names_cache, biovec-256 generate the same ID ":Ft-0004096" and
the kernel reports "sysfs: cannot create duplicate filename
'/kernel/slab/:Ft-0004096'".

To repeat my experiment, set disable_higher_order_debug=1,
CONFIG_SLUB_DEBUG_ON=y in kernel-4.14.

Fix this issue by setting unmergeable=1 if slub_debug=O and the
default slub_debug contains any no-merge flags.

call path:
kmem_cache_create()
  __kmem_cache_alias()	-> we set SLAB_NEVER_MERGE flags here
  create_cache()
    __kmem_cache_create()
      kmem_cache_open()	-> clear DEBUG_METADATA_FLAGS
      sysfs_slab_add()	-> the slab cache is mergeable now

  sysfs: cannot create duplicate filename '/kernel/slab/:Ft-0004096'
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x7c
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W       4.14.0-rc7ajb-00131-gd4c2e9f-dirty #123
  Hardware name: linux,dummy-virt (DT)
  task: ffffffc07d4e0080 task.stack: ffffff8008008000
  PC is at sysfs_warn_dup+0x60/0x7c
  LR is at sysfs_warn_dup+0x60/0x7c
  pc :  lr :  pstate: 60000145
  Call trace:
   sysfs_warn_dup+0x60/0x7c
   sysfs_create_dir_ns+0x98/0xa0
   kobject_add_internal+0xa0/0x294
   kobject_init_and_add+0x90/0xb4
   sysfs_slab_add+0x90/0x200
   __kmem_cache_create+0x26c/0x438
   kmem_cache_create+0x164/0x1f4
   sg_pool_init+0x60/0x100
   do_one_initcall+0x38/0x12c
   kernel_init_freeable+0x138/0x1d4
   kernel_init+0x10/0xfc
   ret_from_fork+0x10/0x18

Link: http://lkml.kernel.org/r/1510365805-5155-1-git-send-email-miles.chen@mediatek.com
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>