z_wr_int can be interrupted when CONFIG_PREEMPT_NONE=y is set #746
Comments
Here is another failure that occurred: [ 5103.739075] z_wr_int/11: page allocation failure: order:0, mode:0x4020
This is a failed atomic allocation in the network layer (__alloc_skb), likely because we've consumed too much of the reserve. It will result in a packet drop, which is harmless since it will trigger a retransmit, but it does show we're still pushing the VM pretty hard.
I have only observed these failed allocations in the context of a z_wr_int thread, never in any other situation, which is strange. I suspect that the use of PF_MEMALLOC had an unforeseen side effect.
This issue is resolved by pull request #883.
When stress testing the patches in #726, the following appeared in dmesg:
[ 1773.498133] z_wr_int/11: page allocation failure: order:0, mode:0x4020
[ 1773.498140] Pid: 3846, comm: z_wr_int/11 Tainted: P O 3.2.17 #1
[ 1773.498143] Call Trace:
[ 1773.498146] [] ? warn_alloc_failed+0xfc/0x150
[ 1773.498162] [] ? __alloc_pages_nodemask+0x580/0x7f0
[ 1773.498169] [] ? new_slab+0x209/0x220
[ 1773.498174] [] ? __slab_alloc.clone.64+0x204/0x290
[ 1773.498178] [] ? __netdev_alloc_skb+0x17/0x40
[ 1773.498184] [] ? __wake_up_common+0x50/0x80
[ 1773.498188] [] ? __kmalloc_track_caller+0x10b/0x160
[ 1773.498191] [] ? __alloc_skb+0x6e/0x170
[ 1773.498195] [] ? __netdev_alloc_skb+0x17/0x40
[ 1773.498208] [] ? e1000_alloc_rx_buffers+0x1fa/0x2e0 [e1000e]
[ 1773.498216] [] ? e1000_clean_rx_irq+0x28a/0x420 [e1000e]
[ 1773.498223] [] ? e1000_clean+0xb0/0x2a0 [e1000e]
[ 1773.498228] [] ? net_rx_action+0x79/0x110
[ 1773.498233] [] ? __do_softirq+0x90/0x110
[ 1773.498239] [] ? call_softirq+0x1c/0x30
[ 1773.498243] [] ? do_softirq+0x4d/0x80
[ 1773.498247] [] ? irq_exit+0x96/0xb0
[ 1773.498250] [] ? do_IRQ+0x5c/0xd0
[ 1773.498254] [] ? common_interrupt+0x6b/0x6b
[ 1773.498256] [] ? __mutex_lock_slowpath+0x1da/0x2a0
[ 1773.498282] [] ? dsl_dir_transfer_space+0x67/0x100 [zfs]
[ 1773.498286] [] ? mutex_lock+0x9/0x20
[ 1773.498297] [] ? arc_released+0x28/0x80 [zfs]
[ 1773.498309] [] ? dbuf_rele_and_unlock+0x46e/0x680 [zfs]
[ 1773.498319] [] ? arc_space_return+0x17c8/0x2de0 [zfs]
[ 1773.498331] [] ? zio_worst_error+0x290/0x1040 [zfs]
[ 1773.498338] [] ? kmem_free_debug+0x41/0x220 [spl]
[ 1773.498348] [] ? zio_worst_error+0x730/0x1040 [zfs]
[ 1773.498354] [] ? kmem_free_debug+0x41/0x220 [spl]
[ 1773.498369] [] ? vdev_queue_io_done+0x638/0x3760 [zfs]
[ 1773.498379] [] ? zio_worst_error+0x730/0x1040 [zfs]
[ 1773.498390] [] ? zio_execute+0x95/0x2a0 [zfs]
[ 1773.498404] [] ? vdev_queue_offset_compare+0x30b/0x5c0 [zfs]
[ 1773.498408] [] ? __mutex_lock_slowpath+0x1e2/0x2a0
[ 1773.498423] [] ? vdev_queue_io_done+0xa7/0x3760 [zfs]
[ 1773.498434] [] ? zio_buf_free+0x418/0xa90 [zfs]
[ 1773.498444] [] ? zio_execute+0x95/0x2a0 [zfs]
[ 1773.498450] [] ? __taskq_dispatch+0x7e0/0xb30 [spl]
[ 1773.498454] [] ? __schedule+0x29e/0x740
[ 1773.498460] [] ? try_to_wake_up+0x270/0x270
[ 1773.498465] [] ? __taskq_dispatch+0x580/0xb30 [spl]
[ 1773.498471] [] ? __taskq_dispatch+0x580/0xb30 [spl]
[ 1773.498477] [] ? kthread+0x96/0xa0
[ 1773.498481] [] ? kernel_thread_helper+0x4/0x10
[ 1773.498486] [] ? kthread_worker_fn+0x180/0x180
[ 1773.498490] [] ? gs_change+0xb/0xb
[ 1773.498492] Mem-Info:
[ 1773.498495] DMA per-cpu:
[ 1773.498498] CPU 0: hi: 0, btch: 1 usd: 0
[ 1773.498500] CPU 1: hi: 0, btch: 1 usd: 0
[ 1773.498503] CPU 2: hi: 0, btch: 1 usd: 0
[ 1773.498505] CPU 3: hi: 0, btch: 1 usd: 0
[ 1773.498508] CPU 4: hi: 0, btch: 1 usd: 0
[ 1773.498510] CPU 5: hi: 0, btch: 1 usd: 0
[ 1773.498512] DMA32 per-cpu:
[ 1773.498515] CPU 0: hi: 186, btch: 31 usd: 13
[ 1773.498517] CPU 1: hi: 186, btch: 31 usd: 24
[ 1773.498520] CPU 2: hi: 186, btch: 31 usd: 19
[ 1773.498522] CPU 3: hi: 186, btch: 31 usd: 14
[ 1773.498525] CPU 4: hi: 186, btch: 31 usd: 18
[ 1773.498527] CPU 5: hi: 186, btch: 31 usd: 0
[ 1773.498529] Normal per-cpu:
[ 1773.498531] CPU 0: hi: 186, btch: 31 usd: 0
[ 1773.498533] CPU 1: hi: 186, btch: 31 usd: 0
[ 1773.498535] CPU 2: hi: 186, btch: 31 usd: 0
[ 1773.498538] CPU 3: hi: 186, btch: 31 usd: 4
[ 1773.498540] CPU 4: hi: 186, btch: 31 usd: 2
[ 1773.498542] CPU 5: hi: 186, btch: 31 usd: 3
[ 1773.498548] active_anon:1831687 inactive_anon:213435 isolated_anon:0
[ 1773.498549] active_file:113 inactive_file:914 isolated_file:0
[ 1773.498551] unevictable:547 dirty:0 writeback:0 unstable:0
[ 1773.498552] free:14153 slab_reclaimable:17162 slab_unreclaimable:59568
[ 1773.498553] mapped:910 shmem:2 pagetables:5903 bounce:0
[ 1773.498560] DMA free:15896kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15640kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 1773.498567] lowmem_reserve[]: 0 3118 15214 15214
[ 1773.498576] DMA32 free:40564kB min:13836kB low:17292kB high:20752kB active_anon:557436kB inactive_anon:156204kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3193664kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:28556kB slab_unreclaimable:93192kB kernel_stack:144kB pagetables:3212kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2 all_unreclaimable? no
[ 1773.498583] lowmem_reserve[]: 0 0 12096 12096
[ 1773.498592] Normal free:152kB min:53676kB low:67092kB high:80512kB active_anon:6769312kB inactive_anon:697536kB active_file:452kB inactive_file:3656kB unevictable:2188kB isolated(anon):0kB isolated(file):0kB present:12386304kB mlocked:2188kB dirty:0kB writeback:0kB mapped:3636kB shmem:8kB slab_reclaimable:40092kB slab_unreclaimable:145080kB kernel_stack:2712kB pagetables:20400kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:42 all_unreclaimable? no
[ 1773.498599] lowmem_reserve[]: 0 0 0 0
[ 1773.498602] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15896kB
[ 1773.498611] DMA32: 413*4kB 267*8kB 151*16kB 73*32kB 32*64kB 8*128kB 13*256kB 18*512kB 11*1024kB 0*2048kB 1*4096kB = 39516kB
[ 1773.498620] Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 1773.498629] 23224 total pagecache pages
[ 1773.498631] 21711 pages in swap cache
[ 1773.498633] Swap cache stats: add 406690, delete 384979, find 25632/30547
[ 1773.498635] Free swap = 15335732kB
[ 1773.498637] Total swap = 16777212kB
[ 1773.562577] 3964912 pages RAM
[ 1773.562577] 102929 pages reserved
[ 1773.562577] 29333 pages shared
[ 1773.562577] 3318044 pages non-shared
[ 1773.562577] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[ 1773.562577] cache: kmalloc-2048, object size: 2048, buffer size: 2048, default order: 3, min order: 0
[ 1773.562577] node 0: slabs: 131, objs: 2096, free: 0
This is Linux 3.2.17 with CONFIG_PREEMPT_NONE=y set. I checked /proc/config.gz to be certain that preemption should be disabled, so this should not be able to occur. There also appears to be a bug in e1000e, as there is no reason why its page allocation should fail, but at least it gave us a backtrace so we can investigate why interrupts are on.