pull from torvalds #1

e-mailky · 2014-01-21T07:55:31Z

No description provided.

The only valid use of preempt_enable_no_resched() is if the very next line is schedule() or if we know preemption cannot actually be enabled by that statement due to known more preempt_count 'refs'. This busy_poll stuff looks to be completely and utterly broken, sched_clock() can return utter garbage with interrupts enabled (rare but still) and it can drift unbounded between CPUs. This means that if you get preempted/migrated and your new CPU is years behind on the previous CPU we get to busy spin for a _very_ long time. There is a _REASON_ sched_clock() warns about preemptability - papering over it with a preempt_disable()/preempt_enable_no_resched() is just terminal brain damage on so many levels. Replace sched_clock() usage with local_clock() which has a bounded drift between CPUs (<2 jiffies). There is a further problem with the entire busy wait poll thing in that the spin time is additive to the syscall timeout, not inclusive. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: rui.zhang@intel.com Cc: jacob.jun.pan@linux.intel.com Cc: Mike Galbraith <bitbucket@online.de> Cc: hpa@zytor.com Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: lenb@kernel.org Cc: rjw@rjwysocki.net Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>

The only valid use of preempt_enable_no_resched() is if the very next line is schedule() or if we know preemption cannot actually be enabled by that statement due to known more preempt_count 'refs'. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: rjw@rjwysocki.net Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: rui.zhang@intel.com Cc: jacob.jun.pan@linux.intel.com Cc: Mike Galbraith <bitbucket@online.de> Cc: hpa@zytor.com Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: lenb@kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-zcfvacdlvlr63qmnn5i58vuj@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>

On a freshly installed system, after libelf-dev is installed we get: CC /tmp/build/perf/util/probe-event.o util/probe-event.c: In function ‘try_to_find_probe_trace_events’: util/probe-event.c:753:46: error: unused parameter ‘target’ [-Werror=unused-parameter] int max_tevs __maybe_unused, const char *target) ^ CC /tmp/build/perf/util/cgroup.o util/probe-event.c: At top level: util/probe-event.c:193:12: error: ‘get_text_start_address’ defined but not used [-Werror=unused-function] static int get_text_start_address(const char *exec, unsigned long *address) ^ cc1: all warnings being treated as errors make[1]: *** [/tmp/build/perf/util/probe-event.o] Error 1 make[1]: *** Waiting for unfinished jobs.... make: *** [install] Error 2 Fix it by enclosing functions only used when those libraries are installed under the suitable preprocessor define and using __maybe_unused to a function that is only built when DWARF support is disabled. Problem introduced in this changeset: commit fb7345b Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Date: Thu Dec 26 05:41:53 2013 +0000 perf probe: Support basic dwarf-based operations on uprobe events Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-73kc2fopt81517hrdgdra18o@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

The ramdisk can possibly get relocated if the whole image is not mapped. And since we're going over it in the microcode loader and fishing out the relevant microcode patches, we want access it at its new location. Thus, export it. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>

We want to use those in AMD's early loading path too. Also, add a native_wrmsrl variant. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>

The original idea to use the microcode cache for the APs doesn't pan out because we do memory allocation there very early and with IRQs disabled and we don't want to involve GFP_ATOMIC allocations. Not if it can be helped. Thus, extend the caching of the BSP patch approach to the APs and iterate over the ucode in the initrd instead of using the cache. We still save the relevant patches to it but later, right before we jettison the initrd. While at it, fix early ucode loading on 32-bit too. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>

We've grown a bunch of microcode loader files all prefixed with "microcode_". They should be under cpu/ because this is strictly CPU-related functionality so do that and drop the prefix since they're in their own directory now which gives that prefix. :) While at it, drop MICROCODE_INTEL_LIB config item and stash the functionality under CONFIG_MICROCODE_INTEL as it was its only user. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>

I don't know how large "tp->vlan_shift" is but static checkers worry about shift wrapping bugs here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Dimitris Michailidis <dm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>

Since commit 01287e2, test-volatile-register-var.c is no more built as part of the automatic feature check. This patch remove the unneeded file. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/339d86ad76741ed929defd18541f774b404003b4.1389461371.git.ydroneaud@opteya.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

This reverts commit 88533f9. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…lback_owner()" This reverts commit d1ba277. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…e_callback()" This reverts commit bdbb0a1. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: linux-s390@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…e_callback()" This reverts commit de1dee7. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: linux-scsi@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…_callback()" This reverts commit 6716d28. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…d its wrappers" This reverts commit 1ae0681. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

This reverts commit 9f010c2. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…ated but not removed" This reverts commit 895a068. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

This reverts commit 99177a3. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…_remove()" This reverts commit f601f9a. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…turn" This reverts commit 45a140e. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

This reverts commit ae34372. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

This reverts commit a69d001. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…eactivate_waitq" This reverts commit ea1c472. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

The DWC2 driver should now be in good enough shape to move out of staging. I have stress tested it overnight on RPI running mass storage and Ethernet transfers in parallel, and for several days on our proprietary PCI-based platform. Signed-off-by: Paul Zimmerman <paulz@synopsys.com> Cc: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

This reverts commit d92d2e6. Tejun writes: I'm sorry but can you please revert the whole series? get_active() waiting while a node is deactivated has potential to lead to deadlock and that deactivate/reactivate interface is something fundamentally flawed and that cgroup will have to work with the remove_self() like everybody else. IOW, I think the first posting was correct. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

The TTY_PRINTK option is bool, and hence this code is either present or absent. It will never be modular, so using module_init as an alias for __initcall is rather misleading. Fix this up now, so that we can relocate module_init from init.h into module.h in the future. If we don't do this, we'd have to add module.h to obviously non-modular code, and that would be a worse thing. Note that direct use of __initcall is discouraged, vs. one of the priority categorized subgroups. As __initcall gets mapped onto device_initcall, our use of device_initcall directly in this change means that the runtime impact is zero -- it will remain at level 6 in initcall ordering. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

1. MEI_DEV_RESETTING device state spans only hardware reset flow while starting dev state is saved into a local variable for further reference, this let us to reduce big if statements in case we are trying to avoid nested resets 2. During initializations if the reset ended in MEI_DEV_DISABLED device state we bail out with -ENODEV 3. Remove redundant interrupts_enabled parameter as this can be deduced from the starting dev_state 4. mei_reset propagates error code to the caller 5. Add mei_restart function to wrap the pci resume Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

give up reseting after 3 unsuccessful tries Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

This patch adds Comedi driver for Humusoft MF634 (PCIe) and MF624 (PCI) data acquisition cards. The legacy card Humusoft MF614 is not supported. More info about the cards may be found at http://humusoft.cz/produkty/datacq/ The driver was tested with both cards. Everything seems to work properly. Just the basic functionality of the card (DIO, ADC, DAC) is supported by this driver. Signed-off-by: Rostislav Lisovy <lisovy@gmail.com> Reviewed-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…mon.c. This patch for ni_mio_common.c removes many unneccesary braces to fix checkpatch.pl warnings. Signed-off-by: Chase Southwood <chase.southwood@yahoo.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sdata->u.ap.request_smps_work can’t be flushed synchronously under wdev_lock(wdev) since ieee80211_request_smps_ap_work itself locks the same lock. While at it, reset the driver_smps_mode when the ap is stopped to its default: OFF. This solves: ====================================================== [ INFO: possible circular locking dependency detected ] 3.12.0-ipeer+ #2 Tainted: G O ------------------------------------------------------- rmmod/2867 is trying to acquire lock: ((&sdata->u.ap.request_smps_work)){+.+...}, at: [<c105b8d0>] flush_work+0x0/0x90 but task is already holding lock: (&wdev->mtx){+.+.+.}, at: [<f9b32626>] cfg80211_stop_ap+0x26/0x230 [cfg80211] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&wdev->mtx){+.+.+.}: [<c10aefa9>] lock_acquire+0x79/0xe0 [<c1607a1a>] mutex_lock_nested+0x4a/0x360 [<fb06288b>] ieee80211_request_smps_ap_work+0x2b/0x50 [mac80211] [<c105cdd8>] process_one_work+0x198/0x450 [<c105d469>] worker_thread+0xf9/0x320 [<c10669ff>] kthread+0x9f/0xb0 [<c1613397>] ret_from_kernel_thread+0x1b/0x28 -> #0 ((&sdata->u.ap.request_smps_work)){+.+...}: [<c10ae9df>] __lock_acquire+0x183f/0x1910 [<c10aefa9>] lock_acquire+0x79/0xe0 [<c105b917>] flush_work+0x47/0x90 [<c105d867>] __cancel_work_timer+0x67/0xe0 [<c105d90f>] cancel_work_sync+0xf/0x20 [<fb0765cc>] ieee80211_stop_ap+0x8c/0x340 [mac80211] [<f9b3268c>] cfg80211_stop_ap+0x8c/0x230 [cfg80211] [<f9b0d8f9>] cfg80211_leave+0x79/0x100 [cfg80211] [<f9b0da72>] cfg80211_netdev_notifier_call+0xf2/0x4f0 [cfg80211] [<c160f2c9>] notifier_call_chain+0x59/0x130 [<c106c6de>] __raw_notifier_call_chain+0x1e/0x30 [<c106c70f>] raw_notifier_call_chain+0x1f/0x30 [<c14f8213>] call_netdevice_notifiers_info+0x33/0x70 [<c14f8263>] call_netdevice_notifiers+0x13/0x20 [<c14f82a4>] __dev_close_many+0x34/0xb0 [<c14f83fe>] dev_close_many+0x6e/0xc0 [<c14f9c77>] rollback_registered_many+0xa7/0x1f0 [<c14f9dd4>] unregister_netdevice_many+0x14/0x60 [<fb06f4d9>] ieee80211_remove_interfaces+0xe9/0x170 [mac80211] [<fb055116>] ieee80211_unregister_hw+0x56/0x110 [mac80211] [<fa3e9396>] iwl_op_mode_mvm_stop+0x26/0xe0 [iwlmvm] [<f9b9d8ca>] _iwl_op_mode_stop+0x3a/0x70 [iwlwifi] [<f9b9d96f>] iwl_opmode_deregister+0x6f/0x90 [iwlwifi] [<fa405179>] __exit_compat+0xd/0x19 [iwlmvm] [<c10b8bf9>] SyS_delete_module+0x179/0x2b0 [<c1613421>] sysenter_do_call+0x12/0x32 Fixes: 687da13 ("mac80211: implement SMPS for AP") Cc: <stable@vger.kernel.org> [3.13] Reported-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Don't set read callback to NULL during reset as this leads to memory leak of both cb and its buffer. The memory is correctly freed during mei_release. The memory leak is detectable by kmemleak if application has open read call while system is going through suspend/resume. unreferenced object 0xecead780 (size 64): comm "AsyncTask #1", pid 1018, jiffies 4294949621 (age 152.440s) hex dump (first 32 bytes): 00 01 10 00 00 02 20 00 00 bf 30 f1 00 00 00 00 ...... ...0..... 00 00 00 00 00 00 00 00 36 01 00 00 00 70 da e2 ........6....p.. backtrace: [<c1a60aec>] kmemleak_alloc+0x3c/0xa0 [<c131ed56>] kmem_cache_alloc_trace+0xc6/0x190 [<c16243c9>] mei_io_cb_init+0x29/0x50 [<c1625722>] mei_cl_read_start+0x102/0x360 [<c16268f3>] mei_read+0x103/0x4e0 [<c1324b09>] vfs_read+0x89/0x160 [<c1324d5f>] SyS_read+0x4f/0x80 [<c1a7b318>] syscall_call+0x7/0xb [<ffffffff>] 0xffffffff unreferenced object 0xe2da7000 (size 512): comm "AsyncTask #1", pid 1018, jiffies 4294949621 (age 152.440s) hex dump (first 32 bytes): 00 6c da e2 7c 00 00 00 00 00 00 00 c0 eb 0c 59 .l..|..........Y 1b 00 00 00 01 00 00 00 02 10 00 00 01 00 00 00 ................ backtrace: [<c1a60aec>] kmemleak_alloc+0x3c/0xa0 [<c131f127>] __kmalloc+0xe7/0x1d0 [<c162447e>] mei_io_cb_alloc_resp_buf+0x2e/0x60 [<c162574c>] mei_cl_read_start+0x12c/0x360 [<c16268f3>] mei_read+0x103/0x4e0 [<c1324b09>] vfs_read+0x89/0x160 [<c1324d5f>] SyS_read+0x4f/0x80 [<c1a7b318>] syscall_call+0x7/0xb [<ffffffff>] 0xffffffff Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Steven Noonan forwarded a users report where they had a problem starting vsftpd on a Xen paravirtualized guest, with this in dmesg: BUG: Bad page map in process vsftpd pte:8000000493b88165 pmd:e9cc01067 page:ffffea00124ee200 count:0 mapcount:-1 mapping: (null) index:0x0 page flags: 0x2ffc0000000014(referenced|dirty) addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping: (null) index:7f97eea74 CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1 Call Trace: dump_stack+0x45/0x56 print_bad_pte+0x22e/0x250 unmap_single_vma+0x583/0x890 unmap_vmas+0x65/0x90 exit_mmap+0xc5/0x170 mmput+0x65/0x100 do_exit+0x393/0x9e0 do_group_exit+0xcc/0x140 SyS_exit_group+0x14/0x20 system_call_fastpath+0x1a/0x1f Disabling lock debugging due to kernel taint BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1 BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1 The issue could not be reproduced under an HVM instance with the same kernel, so it appears to be exclusive to paravirtual Xen guests. He bisected the problem to commit 1667918 ("mm: numa: clear numa hinting information on mprotect") that was also included in 3.12-stable. The problem was related to how xen translates ptes because it was not accounting for the _PAGE_NUMA bit. This patch splits pte_present to add a pteval_present helper for use by xen so both bare metal and xen use the same code when checking if a PTE is present. [mgorman@suse.de: wrote changelog, proposed minor modifications] [akpm@linux-foundation.org: fix typo in comment] Reported-by: Steven Noonan <steven@uplinklabs.net> Tested-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

If you do echo 0 > /sys/module/edac_core/parameters/edac_mc_poll_msec the following stack trace is output because the edac module is not designed to poll with a timeout of zero. WARNING: CPU: 12 PID: 0 at lib/list_debug.c:33 __list_add+0xac/0xc0() list_add corruption. prev->next should be next (ffff8808291dd1b8), but was (null). (prev=ffff8808286fe3f8). Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Call Trace: <IRQ> __list_add+0xac/0xc0 __internal_add_timer+0xab/0x130 internal_add_timer+0x17/0x40 mod_timer_pinned+0xca/0x170 intel_pstate_timer_func+0x28a/0x380 call_timer_fn+0x36/0x100 run_timer_softirq+0x1ff/0x2f0 __do_softirq+0xf5/0x2e0 irq_exit+0x10d/0x120 smp_apic_timer_interrupt+0x45/0x60 apic_timer_interrupt+0x6d/0x80 <EOI> cpuidle_idle_call+0xb9/0x1f0 arch_cpu_idle+0xe/0x30 cpu_startup_entry+0x9e/0x240 start_secondary+0x1e4/0x290 kernel BUG at kernel/timer.c:1084! invalid opcode: 0000 [#1] SMP Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 12 PID: 0 Comm: swapper/12 Tainted: G W 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Call Trace: <IRQ> run_timer_softirq+0x245/0x2f0 __do_softirq+0xf5/0x2e0 irq_exit+0x10d/0x120 smp_apic_timer_interrupt+0x45/0x60 apic_timer_interrupt+0x6d/0x80 <EOI> cpuidle_idle_call+0xb9/0x1f0 arch_cpu_idle+0xe/0x30 cpu_startup_entry+0x9e/0x240 start_secondary+0x1e4/0x290 RIP cascade+0x93/0xa0 WARNING: CPU: 36 PID: 1154 at kernel/workqueue.c:1461 __queue_delayed_work+0xed/0x1a0() Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 36 PID: 1154 Comm: kworker/u481:3 Tainted: G W 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Workqueue: edac-poller edac_mc_workq_function [edac_core] Call Trace: dump_stack+0x45/0x56 warn_slowpath_common+0x7d/0xa0 warn_slowpath_null+0x1a/0x20 __queue_delayed_work+0xed/0x1a0 queue_delayed_work_on+0x27/0x50 edac_mc_workq_function+0x72/0xa0 [edac_core] process_one_work+0x17b/0x460 worker_thread+0x11b/0x400 kthread+0xd2/0xf0 ret_from_fork+0x7c/0xb0 This patch adds a range check in the edac_mc_poll_msec code to check for 0. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Vladimir reported the following issue: Commit c65c187 ("slub: use lockdep_assert_held") requires remove_partial() to be called with n->list_lock held, but free_partial() called from kmem_cache_close() on cache destruction does not follow this rule, leading to a warning: WARNING: CPU: 0 PID: 2787 at mm/slub.c:1536 __kmem_cache_shutdown+0x1b2/0x1f0() Modules linked in: CPU: 0 PID: 2787 Comm: modprobe Tainted: G W 3.14.0-rc1-mm1+ #1 Hardware name: 0000000000000600 ffff88003ae1dde8 ffffffff816d9583 0000000000000600 0000000000000000 ffff88003ae1de28 ffffffff8107c107 0000000000000000 ffff880037ab2b00 ffff88007c240d30 ffffea0001ee5280 ffffea0001ee52a0 Call Trace: __kmem_cache_shutdown+0x1b2/0x1f0 kmem_cache_destroy+0x43/0xf0 xfs_destroy_zones+0x103/0x110 [xfs] exit_xfs_fs+0x38/0x4e4 [xfs] SyS_delete_module+0x19a/0x1f0 system_call_fastpath+0x16/0x1b His solution was to add a spinlock in order to quiet lockdep. Although there would be no contention to adding the lock, that lock also requires disabling of interrupts which will have a larger impact on the system. Instead of adding a spinlock to a location where it is not needed for lockdep, make a __remove_partial() function that does not test if the list_lock is held, as no one should have it due to it being freed. Also added a __add_partial() function that does not do the lock validation either, as it is not needed for the creation of the cache. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Vladimir Davydov <vdavydov@parallels.com> Suggested-by: David Rientjes <rientjes@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

If napi is left enabled after a failed attempt to bring the interface up, we BUG: fec 2188000.ethernet eth0: no PHY, assuming direct connection to switch libphy: PHY fixed-0:00 not found fec 2188000.ethernet eth0: could not attach to PHY ------------[ cut here ]------------ kernel BUG at include/linux/netdevice.h:502! Internal error: Oops - BUG: 0 [#1] SMP ARM ... PC is at fec_enet_open+0x4d0/0x500 LR is at __dev_open+0xa4/0xfc Only enable napi after we are past all the failure paths. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>

If we cannot calibrate TSC via MSR based calibration try_msr_calibrate_tsc() stores zero to fast_calibrate and returns that to the caller. This value gets then propagated further to clockevents code resulting division by zero oops like the one below: divide error: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 3.13.0+ torvalds#47 task: ffff880075508000 ti: ffff880075506000 task.ti: ffff880075506000 RIP: 0010:[<ffffffff810aec14>] [<ffffffff810aec14>] clockevents_config.part.3+0x24/0xa0 RSP: 0000:ffff880075507e58 EFLAGS: 00010246 RAX: ffffffffffffffff RBX: ffff880079c0cd80 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffff RBP: ffff880075507e70 R08: 0000000000000001 R09: 00000000000000be R10: 00000000000000bd R11: 0000000000000003 R12: 000000000000b008 R13: 0000000000000008 R14: 000000000000b010 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff880079c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff880079fff000 CR3: 0000000001c0b000 CR4: 00000000001006f0 Stack: ffff880079c0cd80 000000000000b008 0000000000000008 ffff880075507e88 ffffffff810aecb0 ffff880079c0cd80 ffff880075507e98 ffffffff81030168 ffff880075507ed8 ffffffff81d1104f 00000000000000c3 0000000000000000 Call Trace: [<ffffffff810aecb0>] clockevents_config_and_register+0x20/0x30 [<ffffffff81030168>] setup_APIC_timer+0xc8/0xd0 [<ffffffff81d1104f>] setup_boot_APIC_clock+0x4cc/0x4d8 [<ffffffff81d0f5de>] native_smp_prepare_cpus+0x3dd/0x3f0 [<ffffffff81d02ee9>] kernel_init_freeable+0xc3/0x205 [<ffffffff8177c910>] ? rest_init+0x90/0x90 [<ffffffff8177c91e>] kernel_init+0xe/0x120 [<ffffffff8178deec>] ret_from_fork+0x7c/0xb0 [<ffffffff8177c910>] ? rest_init+0x90/0x90 Prevent this from happening by: 1) Modifying try_msr_calibrate_tsc() to return calibration value or zero if it fails. 2) Check this return value in native_calibrate_tsc() and in case of zero fallback to use normal non-MSR based calibration. [mw: Added subject and changelog] Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bin Gao <bin.gao@linux.intel.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1392810750-18660-1-git-send-email-mika.westerberg@linux.intel.com Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

There is a race between the TX path and the STA wakeup: while a station is sleeping, mac80211 buffers frames until it wakes up, then the frames are transmitted. However, the RX and TX path are concurrent, so the packet indicating wakeup can be processed while a packet is being transmitted. This can lead to a situation where the buffered frames list is emptied on the one side, while a frame is being added on the other side, as the station is still seen as sleeping in the TX path. As a result, the newly added frame will not be send anytime soon. It might be sent much later (and out of order) when the station goes to sleep and wakes up the next time. Additionally, it can lead to the crash below. Fix all this by synchronising both paths with a new lock. Both path are not fastpath since they handle PS situations. In a later patch we'll remove the extra skb queue locks to reduce locking overhead. BUG: unable to handle kernel NULL pointer dereference at 000000b0 IP: [<ff6f1791>] ieee80211_report_used_skb+0x11/0x3e0 [mac80211] *pde = 00000000 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC EIP: 0060:[<ff6f1791>] EFLAGS: 00210282 CPU: 1 EIP is at ieee80211_report_used_skb+0x11/0x3e0 [mac80211] EAX: e5900da0 EBX: 00000000 ECX: 00000001 EDX: 00000000 ESI: e41d00c0 EDI: e5900da0 EBP: ebe458e4 ESP: ebe458b0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 000000b0 CR3: 25a78000 CR4: 000407d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process iperf (pid: 3934, ti=ebe44000 task=e757c0b0 task.ti=ebe44000) iwlwifi 0000:02:00.0: I iwl_pcie_enqueue_hcmd Sending command LQ_CMD (#4e), seq: 0x0903, 92 bytes at 3[3]:9 Stack: e403b32c ebe458c4 00200002 00200286 e403b338 ebe458cc c10960bb e5900da0 ff76a6ec ebe458d8 00000000 e41d00c0 e5900da0 ebe458f0 ff6f1b75 e403b210 ebe4598c ff723dc1 00000000 ff76a6ec e597c978 e403b758 00000002 00000002 Call Trace: [<ff6f1b75>] ieee80211_free_txskb+0x15/0x20 [mac80211] [<ff723dc1>] invoke_tx_handlers+0x1661/0x1780 [mac80211] [<ff7248a5>] ieee80211_tx+0x75/0x100 [mac80211] [<ff7249bf>] ieee80211_xmit+0x8f/0xc0 [mac80211] [<ff72550e>] ieee80211_subif_start_xmit+0x4fe/0xe20 [mac80211] [<c149ef70>] dev_hard_start_xmit+0x450/0x950 [<c14b9aa9>] sch_direct_xmit+0xa9/0x250 [<c14b9c9b>] __qdisc_run+0x4b/0x150 [<c149f732>] dev_queue_xmit+0x2c2/0xca0 Cc: stable@vger.kernel.org Reported-by: Yaara Rozenblum <yaara.rozenblum@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com> [reword commit log, use a separate lock] Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Boris reports he's seeing: > [ 9.195943] INFO: trying to register non-static key. > [ 9.196031] the code is fine but needs lockdep annotation. > [ 9.196031] turning off the locking correctness validator. > [ 9.196031] CPU: 1 PID: 933 Comm: modprobe Not tainted 3.14.0-rc4+ #1 with the r8169 driver. These are occuring because the seqcount embedded in u64_stats_sync on 32-bit SMP is uninitialized which is making lockdep unhappy. Signed-off-by: Kyle McMartin <kyle@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>

Kirill has reported the following: Task in /test killed as a result of limit of /test memory: usage 10240kB, limit 10240kB, failcnt 51 memory+swap: usage 10240kB, limit 10240kB, failcnt 0 kmem: usage 0kB, limit 18014398509481983kB, failcnt 0 Memory cgroup stats for /test: BUG: sleeping function called from invalid context at kernel/cpu.c:68 in_atomic(): 1, irqs_disabled(): 0, pid: 66, name: memcg_test 2 locks held by memcg_test/66: #0: (memcg_oom_lock#2){+.+...}, at: [<ffffffff81131014>] pagefault_out_of_memory+0x14/0x90 #1: (oom_info_lock){+.+...}, at: [<ffffffff81197b2a>] mem_cgroup_print_oom_info+0x2a/0x390 CPU: 2 PID: 66 Comm: memcg_test Not tainted 3.14.0-rc1-dirty torvalds#745 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011 Call Trace: __might_sleep+0x16a/0x210 get_online_cpus+0x1c/0x60 mem_cgroup_read_stat+0x27/0xb0 mem_cgroup_print_oom_info+0x260/0x390 dump_header+0x88/0x251 ? trace_hardirqs_on+0xd/0x10 oom_kill_process+0x258/0x3d0 mem_cgroup_oom_synchronize+0x656/0x6c0 ? mem_cgroup_charge_common+0xd0/0xd0 pagefault_out_of_memory+0x14/0x90 mm_fault_error+0x91/0x189 __do_page_fault+0x48e/0x580 do_page_fault+0xe/0x10 page_fault+0x22/0x30 which complains that mem_cgroup_read_stat cannot be called from an atomic context but mem_cgroup_print_oom_info takes a spinlock. Change oom_info_lock to a mutex. This was introduced by 947b3dd ("memcg, oom: lock mem_cgroup_print_oom_info"). Signed-off-by: Michal Hocko <mhocko@suse.cz> Reported-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

…king Daniel Borkmann reported a VM_BUG_ON assertion failing: ------------[ cut here ]------------ kernel BUG at mm/mlock.c:528! invalid opcode: 0000 [#1] SMP Modules linked in: ccm arc4 iwldvm [...] video CPU: 3 PID: 2266 Comm: netsniff-ng Not tainted 3.14.0-rc2+ torvalds#8 Hardware name: LENOVO 2429BP3/2429BP3, BIOS G4ET37WW (1.12 ) 05/29/2012 task: ffff8801f87f9820 ti: ffff88002cb44000 task.ti: ffff88002cb44000 RIP: 0010:[<ffffffff81171ad0>] [<ffffffff81171ad0>] munlock_vma_pages_range+0x2e0/0x2f0 Call Trace: do_munmap+0x18f/0x3b0 vm_munmap+0x41/0x60 SyS_munmap+0x22/0x30 system_call_fastpath+0x1a/0x1f RIP munlock_vma_pages_range+0x2e0/0x2f0 ---[ end trace a0088dcf07ae10f2 ]--- because munlock_vma_pages_range() thinks it's unexpectedly in the middle of a THP page. This can be reproduced with default config since 3.11 kernels. A reproducer can be found in the kernel's selftest directory for networking by running ./psock_tpacket. The problem is that an order=2 compound page (allocated by alloc_one_pg_vec_page() is part of the munlocked VM_MIXEDMAP vma (mapped by packet_mmap()) and mistaken for a THP page and assumed to be order=9. The checks for THP in munlock came with commit ff6a6da ("mm: accelerate munlock() treatment of THP pages"), i.e. since 3.9, but did not trigger a bug. It just makes munlock_vma_pages_range() skip such compound pages until the next 512-pages-aligned page, when it encounters a head page. This is however not a problem for vma's where mlocking has no effect anyway, but it can distort the accounting. Since commit 7225522 ("mm: munlock: batch non-THP page isolation and munlock+putback using pagevec") this can trigger a VM_BUG_ON in PageTransHuge() check. This patch fixes the issue by adding VM_MIXEDMAP flag to VM_SPECIAL, a list of flags that make vma's non-mlockable and non-mergeable. The reasoning is that VM_MIXEDMAP vma's are similar to VM_PFNMAP, which is already on the VM_SPECIAL list, and both are intended for non-LRU pages where mlocking makes no sense anyway. Related Lkml discussion can be found in [2]. [1] tools/testing/selftests/net/psock_tpacket [2] https://lkml.org/lkml/2014/1/10/427 Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Reported-by: Daniel Borkmann <dborkman@redhat.com> Tested-by: Daniel Borkmann <dborkman@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: John David Anglin <dave.anglin@bell.net> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Jared Hulbert <jaredeh@gmail.com> Tested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [3.11.x+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Alex reported hitting the following BUG after the EFI 1:1 virtual mapping work was merged, kernel BUG at arch/x86/mm/init_64.c:351! invalid opcode: 0000 [#1] SMP Call Trace: [<ffffffff818aa71d>] init_extra_mapping_uc+0x13/0x15 [<ffffffff818a5e20>] uv_system_init+0x22b/0x124b [<ffffffff8108b886>] ? clockevents_register_device+0x138/0x13d [<ffffffff81028dbb>] ? setup_APIC_timer+0xc5/0xc7 [<ffffffff8108b620>] ? clockevent_delta2ns+0xb/0xd [<ffffffff818a3a92>] ? setup_boot_APIC_clock+0x4a8/0x4b7 [<ffffffff8153d955>] ? printk+0x72/0x74 [<ffffffff818a1757>] native_smp_prepare_cpus+0x389/0x3d6 [<ffffffff818957bc>] kernel_init_freeable+0xb7/0x1fb [<ffffffff81535530>] ? rest_init+0x74/0x74 [<ffffffff81535539>] kernel_init+0x9/0xff [<ffffffff81541dfc>] ret_from_fork+0x7c/0xb0 [<ffffffff81535530>] ? rest_init+0x74/0x74 Getting this thing to work with the new mapping scheme would need more work, so automatically switch to the old memmap layout for SGI UV. Acked-by: Russ Anderson <rja@sgi.com> Cc: Alex Thorlton <athorlton@sgi.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Matt Fleming <matt.fleming@intel.com>

commit 655dada causes kernel panic, this patch fixes it. [ 1.197816] [ffffffee] *pgd=0d7fd821, *pte=00000000, *ppte=00000000 [ 1.204070] Internal error: Oops: 17 [#1] PREEMPT SMP ARM [ 1.209447] Modules linked in: [ 1.212490] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc1 #3 [ 1.218737] task: cd03c000 ti: cd040000 task.ti: cd040000 [ 1.224127] PC is at gpiod_lock_as_irq+0xc/0x64 [ 1.228634] LR is at sirfsoc_gpio_irq_startup+0x18/0x44 [ 1.233842] pc : [<c01d3990>] lr : [<c01d1c38>] psr: a0000193 [ 1.233842] sp : cd041d30 ip : 00000000 fp : 00000000 [ 1.245296] r10: 00000000 r9 : cd023db4 r8 : 60000113 [ 1.250505] r7 : 0000003e r6 : cd023dd4 r5 : c06bfa54 r4 : cd023d80 [ 1.257014] r3 : 00000020 r2 : 00000000 r1 : ffffffea r0 : ffffffea [ 1.263526] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel [ 1.270903] Control: 10c53c7d Table: 00004059 DAC: 00000015 [ 1.276631] Process swapper/0 (pid: 1, stack limit = 0xcd040240) [ 1.282620] Stack: (0xcd041d30 to 0xcd042000) [ 1.286963] 1d20: cd023d80 c01d1c38 c01d1c20 cd023d80 [ 1.295124] 1d40: 00000001 c0068438 cd023d80 ccb6d880 cd023dd4 c0067044 0000718e c006719c [ 1.286963] 1d20: cd023d80 c01d1c38 c01d1c20 cd023d80 [ 1.295124] 1d40: 00000001 c0068438 cd023d80 ccb6d880 cd023dd4 c0067044 0000718e c006719c [ 1.295124] 1d40: 00000001 c0068438 cd023d80 ccb6d880 cd023dd4 c0067044 0000718e c006719c [ 1.303283] 1d60: 00000800 00000083 ccb6d880 cd023d80 c02b41d8 00000083 0000003e ccb7c41 [ 1.311442] 1d80: 00000000 c00671dc 00000083 0000003e c02b41d8 cd3dd5c0 0000003e ccb7c634 [ 1.319601] 1da0: cd040030 c00672a8 cd3dd5c0 ccb7c41 ccb6d340 ccb7c41 ccb6d340 cd3dd400 [ 1.327760] 1dc0: cd3dd410 c02b4434 ccb7c41 c01265a8 00000001 cd3dd410 c0687108 00000000 [ 1.335919] 1de0: c0687108 00000000 00000000 c0240170 c0240158 cd3dd410 c06c30d0 c023e8bc [ 1.344079] 1e00: c023e9d4 00000000 cd3dd410 c023e9d4 c0682150 c023cf88 cd003e98 cd2d50c4 [ 1.352238] 1e20: cd3dd410 cd3dd444 c06822f0 c023e768 cd3dd418 cd3dd410 c06822f0 c023de14 [ 1.360397] 1e40: cd3dd418 00000000 cd3dd410 c023c398 cd041e78 cd041ea8 cd3dd400 cd3dd410 [ 1.368556] 1e60: 00000083 00000000 cd3dd400 cd3dd410 00000083 000000c8 c04e00c8 c023fee8 [ 1.376715] 1e80: 00000000 cd041ea8 cd3dd400 00000001 00000083 c024048c c0435ef8 c0434dec [ 1.384874] 1ea0: c068da58 c04c6d04 c0682150 c0435ef8 ffffffff 00000000 00000000 c068da58 [ 1.393033] 1ec0: 00000020 00000000 00000000 00000000 c05dabb8 00000007 c068d640 c068d640 [ 1.401193] 1ee0: c04c247c c04c249c 00000000 c00088e8 cd004c00 c043bbb8 cd029180 c03812a0 [ 1.409352] 1f00: 00000000 00000000 60000113 c0673728 60000113 c0673728 00000000 00000000 [ 1.417511] 1f20: cd7fce01 c0390a54 00000065 c003a81c c049e8bc 00000007 cd7fce0e 00000007 [ 1.425670] 1f40: 00000000 c05dabb8 00000007 c068d640 c068d640 c04c050c c04e00c8 00000065 [ 1.433829] 1f60: c04e00c0 c04c0c54 00000007 00000007 c04c050c c037d8fc cd03c000 c004322c [ 1.441988] 1f80: c0662b40 0000d640 c03737c0 00000000 00000000 00000000 00000000 00000000 [ 1.450147] 1fa0: 00000000 c03737cc 00000000 c000e478 00000000 00000000 00000000 00000000 [ 1.458307] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 1.466467] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 0002d481 05014092 [ 1.474640] [<c01d3990>] (gpiod_lock_as_irq) from [<c01d1c38>] (sirfsoc_gpio_irq_startup+0x18/0x44) [ 1.483661] [<c01d1c38>] (sirfsoc_gpio_irq_startup) from [<c0068438>] (irq_startup+0x34/0x6c) [ 1.492163] [<c0068438>] (irq_startup) from [<c0067044>] (__setup_irq+0x450/0x4b8) [ 1.499714] [<c0067044>] (__setup_irq) from [<c00671dc>] (request_threaded_irq+0xa8/0x128) [ 1.507960] [<c00671dc>] (request_threaded_irq) from [<c00672a8>] (request_any_context_irq+0x4c/0x7c) [ 1.517164] [<c00672a8>] (request_any_context_irq) from [<c02b4434>] (gpio_extcon_probe+0x144/0x1d4) [ 1.526279] [<c02b4434>] (gpio_extcon_probe) from [<c0240170>] (platform_drv_probe+0x18/0x48) [ 1.534783] [<c0240170>] (platform_drv_probe) from [<c023e8bc>] (driver_probe_device+0x120/0x238) [ 1.543641] [<c023e8bc>] (driver_probe_device) from [<c023cf88>] (bus_for_each_drv+0x58/0x8c) [ 1.552143] [<c023cf88>] (bus_for_each_drv) from [<c023e768>] (device_attach+0x74/0x88) [ 1.560126] [<c023e768>] (device_attach) from [<c023de14>] (bus_probe_device+0x84/0xa8) [ 1.568113] [<c023de14>] (bus_probe_device) from [<c023c398>] (device_add+0x440/0x520) [ 1.576012] [<c023c398>] (device_add) from [<c023fee8>] (platform_device_add+0xb4/0x214) [ 1.584084] [<c023fee8>] (platform_device_add) from [<c024048c>] (platform_device_register_full+0xb8/0xdc) [ 1.593719] [<c024048c>] (platform_device_register_full) from [<c04c6d04>] (sirfsoc_init_late+0xec/0xf4) [ 1.603185] [<c04c6d04>] (sirfsoc_init_late) from [<c04c249c>] (init_machine_late+0x20/0x28) [ 1.611603] [<c04c249c>] (init_machine_late) from [<c00088e8>] (do_one_initcall+0xf8/0x144) [ 1.619934] [<c00088e8>] (do_one_initcall) from [<c04c0c54>] (kernel_init_freeable+0x13c/0x1dc) [ 1.628620] [<c04c0c54>] (kernel_init_freeable) from [<c03737cc>] (kernel_init+0xc/0x118) Signed-off-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

policy->rwsem is used to lock access to all parts of code modifying struct cpufreq_policy, but it's not used on a new policy created by __cpufreq_add_dev(). Because of that, if cpufreq_update_policy() is called in a tight loop on one CPU in parallel with offline/online of another CPU, then the following crash can be triggered: Unable to handle kernel NULL pointer dereference at virtual address 00000020 pgd = c0003000 [00000020] *pgd=80000000004003, *pmd=00000000 Internal error: Oops: 206 [#1] PREEMPT SMP ARM PC is at __cpufreq_governor+0x10/0x1ac LR is at cpufreq_update_policy+0x114/0x150 ---[ end trace f23a8defea6cd706 ]--- Kernel panic - not syncing: Fatal exception CPU0: stopping CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W 3.10.0-gd727407-00074-g979ede8 torvalds#396 [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68) [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44) [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc) [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78) [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74) [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c) [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180) [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68) [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30) Fix that by taking locks at appropriate places in __cpufreq_add_dev() as well. Reported-by: Saravana Kannan <skannan@codeaurora.org> Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Eric Hugne says: ==================== tipc: refcount and memory leak fixes v3: Remove error logging from data path completely. Rebased on top of latest net merge. v2: Drop specific -ENOMEM logging in patch #1 (tipc: allow connection shutdown callback to be invoked in advance) And add a general error message if an internal server tries to send a message on a closed/nonexisting connection. In addition to the fix for refcount leak and memory leak during module removal, we also fix a problem where the topology server listening socket where unexpectedly closed. We also eliminate an unnecessary context switch during accept()/recvmsg() for nonblocking sockets. It might be good to include this patchset in stable aswell. After the v3 rebase on latest merge from net all patches apply cleanly on that tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>

When tracking sunrpc_task events in nfs client, the clnt pointer may be NULL. [ 139.269266] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 [ 139.269915] IP: [<ffffffffa026f216>] ftrace_raw_event_rpc_task_running+0x86/0xf0 [sunrpc] [ 139.269915] PGD 1d293067 PUD 1d294067 PMD 0 [ 139.269915] Oops: 0000 [#1] SMP [ 139.269915] Modules linked in: nfsv4 dns_resolver nfs lockd sunrpc fscache sg ppdev e1000 serio_raw pcspkr parport_pc parport i2c_piix4 i2c_core microcode xfs libcrc32c sd_mod sr_mod cdrom ata_generic crc_t10dif crct10dif_common pata_acpi ahci libahci ata_piix libata dm_mirror dm_region_hash dm_log dm_mod [ 139.269915] CPU: 0 PID: 59 Comm: kworker/0:2 Not tainted 3.10.0-84.el7.x86_64 #1 [ 139.269915] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 139.269915] Workqueue: rpciod rpc_async_schedule [sunrpc] [ 139.269915] task: ffff88001b598000 ti: ffff88001b632000 task.ti: ffff88001b632000 [ 139.269915] RIP: 0010:[<ffffffffa026f216>] [<ffffffffa026f216>] ftrace_raw_event_rpc_task_running+0x86/0xf0 [sunrpc] [ 139.269915] RSP: 0018:ffff88001b633d70 EFLAGS: 00010206 [ 139.269915] RAX: ffff88001dfc5338 RBX: ffff88001cc37a00 RCX: ffff88001dfc5334 [ 139.269915] RDX: ffff88001dfc5338 RSI: 0000000000000000 RDI: ffff88001dfc533c [ 139.269915] RBP: ffff88001b633db0 R08: 000000000000002c R09: 000000000000000a [ 139.269915] R10: 0000000000062180 R11: 00000020759fb9dc R12: ffffffffa0292c20 [ 139.269915] R13: ffff88001dfc5334 R14: 0000000000000000 R15: 0000000000000000 [ 139.269915] FS: 0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000 [ 139.269915] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 139.269915] CR2: 0000000000000004 CR3: 000000001d290000 CR4: 00000000000006f0 [ 139.269915] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 139.269915] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 139.269915] Stack: [ 139.269915] 000000001b633d98 0000000000000246 ffff88001df1dc00 ffff88001cc37a00 [ 139.269915] ffff88001bc35e60 0000000000000000 ffff88001ffa0a48 ffff88001bc35ee0 [ 139.269915] ffff88001b633e08 ffffffffa02704b5 0000000000010000 ffff88001cc37a70 [ 139.269915] Call Trace: [ 139.269915] [<ffffffffa02704b5>] __rpc_execute+0x1d5/0x400 [sunrpc] [ 139.269915] [<ffffffffa0270706>] rpc_async_schedule+0x26/0x30 [sunrpc] [ 139.269915] [<ffffffff8107867b>] process_one_work+0x17b/0x460 [ 139.269915] [<ffffffff8107942b>] worker_thread+0x11b/0x400 [ 139.269915] [<ffffffff81079310>] ? rescuer_thread+0x3e0/0x3e0 [ 139.269915] [<ffffffff8107fc80>] kthread+0xc0/0xd0 [ 139.269915] [<ffffffff8107fbc0>] ? kthread_create_on_node+0x110/0x110 [ 139.269915] [<ffffffff815d122c>] ret_from_fork+0x7c/0xb0 [ 139.269915] [<ffffffff8107fbc0>] ? kthread_create_on_node+0x110/0x110 [ 139.269915] Code: 4c 8b 45 c8 48 8d 7d d0 89 4d c4 41 89 c9 b9 28 00 00 00 e8 9d b4 e9 e0 48 85 c0 49 89 c5 74 a2 48 89 c7 e8 9d 3f e9 e0 48 89 c2 <41> 8b 46 04 48 8b 7d d0 4c 89 e9 4c 89 e6 89 42 0c 0f b7 83 d4 [ 139.269915] RIP [<ffffffffa026f216>] ftrace_raw_event_rpc_task_running+0x86/0xf0 [sunrpc] [ 139.269915] RSP <ffff88001b633d70> [ 139.269915] CR2: 0000000000000004 [ 140.946406] ---[ end trace ba486328b98d7622 ]--- Signed-off-by: Ditang Chen <chendt.fnst@cn.fujitsu.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

Commit 1874198 ("blk-mq: rework flush sequencing logic") switched ->flush_rq from being an embedded member of the request_queue structure to being dynamically allocated in blk_init_queue_node(). Request-based DM multipath doesn't use blk_init_queue_node(), instead it uses blk_alloc_queue_node() + blk_init_allocated_queue(). Because commit 1874198 placed the dynamic allocation of ->flush_rq in blk_init_queue_node() any flush issued to a dm-mpath device would crash with a NULL pointer, e.g.: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8125037e>] blk_rq_init+0x1e/0xb0 PGD bb3c7067 PUD bb01d067 PMD 0 Oops: 0002 [#1] SMP ... CPU: 5 PID: 5028 Comm: dt Tainted: G W O 3.14.0-rc3.snitm+ torvalds#10 ... task: ffff88032fb270e0 ti: ffff880079564000 task.ti: ffff880079564000 RIP: 0010:[<ffffffff8125037e>] [<ffffffff8125037e>] blk_rq_init+0x1e/0xb0 RSP: 0018:ffff880079565c98 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000030 RDX: ffff880260c74048 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff880079565ca8 R08: ffff880260aa1e98 R09: 0000000000000001 R10: ffff88032fa78500 R11: 0000000000000246 R12: 0000000000000000 R13: ffff880260aa1de8 R14: 0000000000000650 R15: 0000000000000000 FS: 00007f8d36a2a700(0000) GS:ffff88033fca0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000079b36000 CR4: 00000000000007e0 Stack: 0000000000000000 ffff880260c74048 ffff880079565cd8 ffffffff81257a47 ffff880260aa1de8 ffff880260c74048 0000000000000001 0000000000000000 ffff880079565d08 ffffffff81257c2d 0000000000000000 ffff880260aa1de8 Call Trace: [<ffffffff81257a47>] blk_flush_complete_seq+0x2d7/0x2e0 [<ffffffff81257c2d>] blk_insert_flush+0x1dd/0x210 [<ffffffff8124ec59>] __elv_add_request+0x1f9/0x320 [<ffffffff81250681>] ? blk_account_io_start+0x111/0x190 [<ffffffff81253a4b>] blk_queue_bio+0x25b/0x330 [<ffffffffa0020bf5>] dm_request+0x35/0x40 [dm_mod] [<ffffffff812530c0>] generic_make_request+0xc0/0x100 [<ffffffff81253173>] submit_bio+0x73/0x140 [<ffffffff811becdd>] submit_bio_wait+0x5d/0x80 [<ffffffff81257528>] blkdev_issue_flush+0x78/0xa0 [<ffffffff811c1f6f>] blkdev_fsync+0x3f/0x60 [<ffffffff811b7fde>] vfs_fsync_range+0x1e/0x20 [<ffffffff811b7ffc>] vfs_fsync+0x1c/0x20 [<ffffffff811b81f1>] do_fsync+0x41/0x80 [<ffffffff8118874e>] ? SyS_lseek+0x7e/0x80 [<ffffffff811b8260>] SyS_fsync+0x10/0x20 [<ffffffff8154c2d2>] system_call_fastpath+0x16/0x1b Fix this by moving the ->flush_rq allocation from blk_init_queue_node() to blk_init_allocated_queue(). blk_init_queue_node() also calls blk_init_allocated_queue() so this change is functionality equivalent for all blk_init_queue_node() callers. Reported-by: Hannes Reinecke <hare@suse.de> Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>

@s

…me name This fixes CVE-2014-0102. The following command sequence produces an oops: keyctl new_session i=`keyctl newring _ses @s` keyctl link @s $i The problem is that search_nested_keyrings() sees two keyrings that have matching type and description, so keyring_compare_object() returns true. s_n_k() then passes the key to the iterator function - keyring_detect_cycle_iterator() - which *should* check to see whether this is the keyring of interest, not just one with the same name. Because assoc_array_find() will return one and only one match, I assumed that the iterator function would only see an exact match or never be called - but the iterator isn't only called from assoc_array_find()... The oops looks something like this: kernel BUG at /data/fs/linux-2.6-fscache/security/keys/keyring.c:1003! invalid opcode: 0000 [#1] SMP ... RIP: keyring_detect_cycle_iterator+0xe/0x1f ... Call Trace: search_nested_keyrings+0x76/0x2aa __key_link_check_live_key+0x50/0x5f key_link+0x4e/0x85 keyctl_keyring_link+0x60/0x81 SyS_keyctl+0x65/0xe4 tracesys+0xdd/0xe2 The fix is to make keyring_detect_cycle_iterator() check that the key it has is the key it was actually looking for rather than calling BUG_ON(). A testcase has been included in the keyutils testsuite for this: http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=891f3365d07f1996778ade0e3428f01878a1790b Reported-by: Tommi Rantala <tt.rantala@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

In the first place, the loop 'for' in the macro 'for_each_isci_host' (drivers/scsi/isci/host.h:314) is incorrect, because it accesses the 3rd element of 2 element array. After the 2nd iteration it executes the instruction: ihost = to_pci_info(pdev)->hosts[2] (while the size of the 'hosts' array equals 2) and reads an out of range element. In the second place, this loop is incorrectly optimized by GCC v4.8 (see http://marc.info/?l=linux-kernel&m=138998871911336&w=2). As a result, on platforms with two SCU controllers, the loop is executed more times than it can be (for i=0,1 and 2). It causes kernel panic during entering the S3 state and the following oops after 'rmmod isci': BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8131360b>] __list_add+0x1b/0xc0 Oops: 0000 [#1] SMP RIP: 0010:[<ffffffff8131360b>] [<ffffffff8131360b>] __list_add+0x1b/0xc0 Call Trace: [<ffffffff81661b84>] __mutex_lock_slowpath+0x114/0x1b0 [<ffffffff81661c3f>] mutex_lock+0x1f/0x30 [<ffffffffa03e97cb>] sas_disable_events+0x1b/0x50 [libsas] [<ffffffffa03e9818>] sas_unregister_ha+0x18/0x60 [libsas] [<ffffffffa040316e>] isci_unregister+0x1e/0x40 [isci] [<ffffffffa0403efd>] isci_pci_remove+0x5d/0x100 [isci] [<ffffffff813391cb>] pci_device_remove+0x3b/0xb0 [<ffffffff813fbf7f>] __device_release_driver+0x7f/0xf0 [<ffffffff813fc8f8>] driver_detach+0xa8/0xb0 [<ffffffff813fbb8b>] bus_remove_driver+0x9b/0x120 [<ffffffff813fcf2c>] driver_unregister+0x2c/0x50 [<ffffffff813381f3>] pci_unregister_driver+0x23/0x80 [<ffffffffa04152f8>] isci_exit+0x10/0x1e [isci] [<ffffffff810d199b>] SyS_delete_module+0x16b/0x2d0 [<ffffffff81012a21>] ? do_notify_resume+0x61/0xa0 [<ffffffff8166ce29>] system_call_fastpath+0x16/0x1b The loop has been corrected. This patch fixes kernel panic during entering the S3 state and the above oops. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com> Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Tested-by: Lukasz Dorau <lukasz.dorau@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>

vmxnet3's netpoll driver is incorrectly coded. It directly calls vmxnet3_do_poll, which is the driver internal napi poll routine. As the netpoll controller method doesn't block real napi polls in any way, there is a potential for race conditions in which the netpoll controller method and the napi poll method run concurrently. The result is data corruption causing panics such as this one recently observed: PID: 1371 TASK: ffff88023762caa0 CPU: 1 COMMAND: "rs:main Q:Reg" #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92 #2 [ffff88023abd58b0] oops_end at ffffffff8152b570 #3 [ffff88023abd58e0] die at ffffffff81010e0b #4 [ffff88023abd5910] do_trap at ffffffff8152add4 #5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95 #6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b [exception RIP: vmxnet3_rq_rx_complete+1968] RIP: ffffffffa00f1e80 RSP: ffff88023abd5ac8 RFLAGS: 00010086 RAX: 0000000000000000 RBX: ffff88023b5dcee0 RCX: 00000000000000c0 RDX: 0000000000000000 RSI: 00000000000005f2 RDI: ffff88023b5dcee0 RBP: ffff88023abd5b48 R8: 0000000000000000 R9: ffff88023a3b6048 R10: 0000000000000000 R11: 0000000000000002 R12: ffff8802398d4cd8 R13: ffff88023af35140 R14: ffff88023b60c890 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 torvalds#7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3] torvalds#8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3] torvalds#9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7 The fix is to do as other drivers do, and have the poll controller call the top half interrupt handler, which schedules a napi poll properly to recieve frames Tested by myself, successfully. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Shreyas Bhatewara <sbhatewara@vmware.com> CC: "VMware, Inc." <pv-drivers@vmware.com> CC: "David S. Miller" <davem@davemloft.net> CC: stable@vger.kernel.org Reviewed-by: Shreyas N Bhatewara <sbhatewara@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>

A few of the simpler TTM drivers (cirrus, ast, mgag200) do not implement this function. Yet can end up somehow with an evicted bo: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 16e761067 PUD 16e6cf067 PMD 0 Oops: 0010 [#1] SMP Modules linked in: bnep bluetooth rfkill fuse ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sg btrfs zlib_deflate raid6_pq xor dm_queue_length iTCO_wdt iTCO_vendor_support coretemp kvm dcdbas dm_service_time microcode serio_raw pcspkr lpc_ich mfd_core i7core_edac edac_core ses enclosure ipmi_si ipmi_msghandler shpchp acpi_power_meter mperf nfsd auth_rpcgss nfs_acl lockd uinput sunrpc dm_multipath xfs libcrc32c ata_generic pata_acpi sr_mod cdrom sd_mod usb_storage mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit lpfc drm_kms_helper ttm crc32c_intel ata_piix bfa drm ixgbe libata i2c_core mdio crc_t10dif ptp crct10dif_common pps_core scsi_transport_fc dca scsi_tgt megaraid_sas bnx2 dm_mirror dm_region_hash dm_log dm_mod CPU: 16 PID: 2572 Comm: X Not tainted 3.10.0-86.el7.x86_64 #1 Hardware name: Dell Inc. PowerEdge R810/0H235N, BIOS 0.3.0 11/14/2009 task: ffff8801799dabc0 ti: ffff88016c884000 task.ti: ffff88016c884000 RIP: 0010:[<0000000000000000>] [< (null)>] (null) RSP: 0018:ffff88016c885ad8 EFLAGS: 00010202 RAX: ffffffffa04e94c0 RBX: ffff880178937a20 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000240004 RDI: ffff880178937a00 RBP: ffff88016c885b60 R08: 00000000000171a0 R09: ffff88007cf171a0 R10: ffffea0005842540 R11: ffffffff810487b9 R12: ffff880178937b30 R13: ffff880178937a00 R14: ffff88016c885b78 R15: ffff880179929400 FS: 00007f81ba2ef980(0000) GS:ffff88007cf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000016e763000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffffffffa0306fae ffff8801799295c0 0000000000260004 0000000000000001 ffff88016c885b60 ffffffffa0307669 00ff88007cf17738 ffff88017cf17700 ffff880178937a00 ffff880100000000 ffff880100000000 0000000079929400 Call Trace: [<ffffffffa0306fae>] ? ttm_bo_handle_move_mem+0x54e/0x5b0 [ttm] [<ffffffffa0307669>] ? ttm_bo_mem_space+0x169/0x340 [ttm] [<ffffffffa0307bd7>] ttm_bo_move_buffer+0x117/0x130 [ttm] [<ffffffff81130001>] ? perf_event_init_context+0x141/0x220 [<ffffffffa0307cb1>] ttm_bo_validate+0xc1/0x130 [ttm] [<ffffffffa04e7377>] mgag200_bo_pin+0x87/0xc0 [mgag200] [<ffffffffa04e56c4>] mga_crtc_cursor_set+0x474/0xbb0 [mgag200] [<ffffffff811971d2>] ? __mem_cgroup_commit_charge+0x152/0x3b0 [<ffffffff815c4182>] ? mutex_lock+0x12/0x2f [<ffffffffa0201433>] drm_mode_cursor_common+0x123/0x170 [drm] [<ffffffffa0205231>] drm_mode_cursor_ioctl+0x41/0x50 [drm] [<ffffffffa01f5ca2>] drm_ioctl+0x502/0x630 [drm] [<ffffffff815cbab4>] ? __do_page_fault+0x1f4/0x510 [<ffffffff8101cb68>] ? __restore_xstate_sig+0x218/0x4f0 [<ffffffff811b4445>] do_vfs_ioctl+0x2e5/0x4d0 [<ffffffff8124488e>] ? file_has_perm+0x8e/0xa0 [<ffffffff811b46b1>] SyS_ioctl+0x81/0xa0 [<ffffffff815d05d9>] system_call_fastpath+0x16/0x1b Code: Bad RIP value. RIP [< (null)>] (null) RSP <ffff88016c885ad8> CR2: 0000000000000000 Signed-off-by: Rob Clark <rclark@redhat.com> Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Cc: stable@vger.kernel.org

There are two problematic situations. A deadlock can happen when is_percpu is false because it can get interrupted while holding the spinlock. Then it executes ovs_flow_stats_update() in softirq context which tries to get the same lock. The second sitation is that when is_percpu is true, the code correctly disables BH but only for the local CPU, so the following can happen when locking the remote CPU without disabling BH: CPU#0 CPU#1 ovs_flow_stats_get() stats_read() +->spin_lock remote CPU#1 ovs_flow_stats_get() | <interrupted> stats_read() | ... +--> spin_lock remote CPU#0 | | <interrupted> | ovs_flow_stats_update() | ... | spin_lock local CPU#0 <--+ ovs_flow_stats_update() +---------------------------------- spin_lock local CPU#1 This patch disables BH for both cases fixing the deadlocks. Acked-by: Jesse Gross <jesse@nicira.com> ================================= [ INFO: inconsistent lock state ] 3.14.0-rc8-00007-g632b06a #1 Tainted: G I --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. swapper/0/0 [HC0[0]:SC1[5]:HE1:SE0] takes: (&(&cpu_stats->lock)->rlock){+.?...}, at: [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch] {SOFTIRQ-ON-W} state was registered at: [<ffffffff810f973f>] __lock_acquire+0x68f/0x1c40 [<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0 [<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80 [<ffffffffa05dd9e4>] ovs_flow_stats_get+0xc4/0x1e0 [openvswitch] [<ffffffffa05da855>] ovs_flow_cmd_fill_info+0x185/0x360 [openvswitch] [<ffffffffa05daf05>] ovs_flow_cmd_build_info.constprop.27+0x55/0x90 [openvswitch] [<ffffffffa05db41d>] ovs_flow_cmd_new_or_set+0x4dd/0x570 [openvswitch] [<ffffffff816c245d>] genl_family_rcv_msg+0x1cd/0x3f0 [<ffffffff816c270e>] genl_rcv_msg+0x8e/0xd0 [<ffffffff816c0239>] netlink_rcv_skb+0xa9/0xc0 [<ffffffff816c0798>] genl_rcv+0x28/0x40 [<ffffffff816bf830>] netlink_unicast+0x100/0x1e0 [<ffffffff816bfc57>] netlink_sendmsg+0x347/0x770 [<ffffffff81668e9c>] sock_sendmsg+0x9c/0xe0 [<ffffffff816692d9>] ___sys_sendmsg+0x3a9/0x3c0 [<ffffffff8166a911>] __sys_sendmsg+0x51/0x90 [<ffffffff8166a962>] SyS_sendmsg+0x12/0x20 [<ffffffff817e3ce9>] system_call_fastpath+0x16/0x1b irq event stamp: 1740726 hardirqs last enabled at (1740726): [<ffffffff8175d5e0>] ip6_finish_output2+0x4f0/0x840 hardirqs last disabled at (1740725): [<ffffffff8175d59b>] ip6_finish_output2+0x4ab/0x840 softirqs last enabled at (1740674): [<ffffffff8109be12>] _local_bh_enable+0x22/0x50 softirqs last disabled at (1740675): [<ffffffff8109db05>] irq_exit+0xc5/0xd0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&cpu_stats->lock)->rlock); <Interrupt> lock(&(&cpu_stats->lock)->rlock); *** DEADLOCK *** 5 locks held by swapper/0/0: #0: (((&ifa->dad_timer))){+.-...}, at: [<ffffffff810a7155>] call_timer_fn+0x5/0x320 #1: (rcu_read_lock){.+.+..}, at: [<ffffffff81788a55>] mld_sendpack+0x5/0x4a0 #2: (rcu_read_lock_bh){.+....}, at: [<ffffffff8175d149>] ip6_finish_output2+0x59/0x840 #3: (rcu_read_lock_bh){.+....}, at: [<ffffffff8168ba75>] __dev_queue_xmit+0x5/0x9b0 #4: (rcu_read_lock){.+.+..}, at: [<ffffffffa05e41b5>] internal_dev_xmit+0x5/0x110 [openvswitch] stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G I 3.14.0-rc8-00007-g632b06a #1 Hardware name: /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012 0000000000000000 0fcf20709903df0c ffff88042d603808 ffffffff817cfe3c ffffffff81c134c0 ffff88042d603858 ffffffff817cb6da 0000000000000005 ffffffff00000001 ffff880400000000 0000000000000006 ffffffff81c134c0 Call Trace: <IRQ> [<ffffffff817cfe3c>] dump_stack+0x4d/0x66 [<ffffffff817cb6da>] print_usage_bug+0x1f4/0x205 [<ffffffff810f7f10>] ? check_usage_backwards+0x180/0x180 [<ffffffff810f8963>] mark_lock+0x223/0x2b0 [<ffffffff810f96d3>] __lock_acquire+0x623/0x1c40 [<ffffffff810f5707>] ? __lock_is_held+0x57/0x80 [<ffffffffa05e26c6>] ? masked_flow_lookup+0x236/0x250 [openvswitch] [<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch] [<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch] [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch] [<ffffffffa05dcc64>] ovs_dp_process_received_packet+0x84/0x120 [openvswitch] [<ffffffff810f93f7>] ? __lock_acquire+0x347/0x1c40 [<ffffffffa05e3bea>] ovs_vport_receive+0x2a/0x30 [openvswitch] [<ffffffffa05e4218>] internal_dev_xmit+0x68/0x110 [openvswitch] [<ffffffffa05e41b5>] ? internal_dev_xmit+0x5/0x110 [openvswitch] [<ffffffff8168b4a6>] dev_hard_start_xmit+0x2e6/0x8b0 [<ffffffff8168be87>] __dev_queue_xmit+0x417/0x9b0 [<ffffffff8168ba75>] ? __dev_queue_xmit+0x5/0x9b0 [<ffffffff8175d5e0>] ? ip6_finish_output2+0x4f0/0x840 [<ffffffff8168c430>] dev_queue_xmit+0x10/0x20 [<ffffffff8175d641>] ip6_finish_output2+0x551/0x840 [<ffffffff8176128a>] ? ip6_finish_output+0x9a/0x220 [<ffffffff8176128a>] ip6_finish_output+0x9a/0x220 [<ffffffff8176145f>] ip6_output+0x4f/0x1f0 [<ffffffff81788c29>] mld_sendpack+0x1d9/0x4a0 [<ffffffff817895b8>] mld_send_initial_cr.part.32+0x88/0xa0 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220 [<ffffffff8178e301>] ipv6_mc_dad_complete+0x31/0x50 [<ffffffff817690d7>] addrconf_dad_completed+0x147/0x220 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220 [<ffffffff8176934f>] addrconf_dad_timer+0x19f/0x1c0 [<ffffffff810a71e9>] call_timer_fn+0x99/0x320 [<ffffffff810a7155>] ? call_timer_fn+0x5/0x320 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220 [<ffffffff810a76c4>] run_timer_softirq+0x254/0x3b0 [<ffffffff8109d47d>] __do_softirq+0x12d/0x480 Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>

addrconf_join_solict and addrconf_join_anycast may cause actions which need rtnl locked, especially on first address creation. A new DAD state is introduced which defers processing of the initial DAD processing into a workqueue. To get rtnl lock we need to push the code paths which depend on those calls up to workqueues, specifically addrconf_verify and the DAD processing. (v2) addrconf_dad_failure needs to be queued up to the workqueue, too. This patch introduces a new DAD state and stop the DAD processing in the workqueue (this is because of the possible ipv6_del_addr processing which removes the solicited multicast address from the device). addrconf_verify_lock is removed, too. After the transition it is not needed any more. As we are not processing in bottom half anymore we need to be a bit more careful about disabling bottom half out when we lock spin_locks which are also used in bh. Relevant backtrace: [ 541.030090] RTNL: assertion failed at net/core/dev.c (4496) [ 541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.10.33-1-amd64-vyatta #1 [ 541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 541.031146] ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8 [ 541.031148] 0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18 [ 541.031150] 0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000 [ 541.031152] Call Trace: [ 541.031153] <IRQ> [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17 [ 541.031180] [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180 [ 541.031183] [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0 [ 541.031185] [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0 [ 541.031189] [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90 [ 541.031198] [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6] [ 541.031208] [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0 [ 541.031212] [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6] [ 541.031216] [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6] [ 541.031219] [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6] [ 541.031223] [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6] [ 541.031226] [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6] [ 541.031229] [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6] [ 541.031233] [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6] [ 541.031241] [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50 [ 541.031244] [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6] [ 541.031247] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6] [ 541.031255] [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90 [ 541.031258] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6] Hunks and backtrace stolen from a patch by Stephen Hemminger. Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>

…memory range When limiting memory by mem= and ACPI DSDT table has PNP0C80, firmware_map_entrys of same memory range are allocated and memmap X sysfses which have same memory range are created as follows: # cat /sys/firmware/memmap/0/* 0x407ffffffff 0x40000000000 System RAM # cat /sys/firmware/memmap/33/* 0x407ffffffff 0x40000000000 System RAM # cat /sys/firmware/memmap/35/* 0x407ffffffff 0x40000000000 System RAM In this case, when hot-removing memory, kernel panic occurs, showing following call trace: BUG: unable to handle kernel paging request at 00000001003e000b IP: sysfs_open_file+0x46/0x2b0 PGD 203a89fe067 PUD 0 Oops: 0000 [#1] SMP ... Call Trace: do_dentry_open+0x1ef/0x2a0 finish_open+0x31/0x40 do_last+0x57c/0x1220 path_openat+0xc2/0x4c0 do_filp_open+0x4b/0xb0 do_sys_open+0xf3/0x1f0 SyS_open+0x1e/0x20 system_call_fastpath+0x16/0x1b The problem occurs as follows: When calling e820_reserve_resources(), firmware_map_entrys of all e820 memory map are allocated. And all firmware_map_entrys is added map_entries list as follows: map_entries -> +--- entry A --------+ -> ... | start 0x407ffffffff| | end 0x40000000000| | type System RAM | +--------------------+ After that, if ACPI DSDT table has PNP0C80 and the memory range is limited by mem=, the PNP0C80 is hot-added. Then firmware_map_entry of PNP0C80 is allocated and added map_entries list as follows: map_entries -> +--- entry A --------+ -> ... -> +--- entry B --------+ | start 0x407ffffffff| | start 0x407ffffffff| | end 0x40000000000| | end 0x40000000000| | type System RAM | | type System RAM | +--------------------+ +--------------------+ Then memmap 0 sysfs for entry B is created. After that, firmware_memmap_init() creates memmap sysfses of all firmware_map_entrys in map_entries list. As a result, memmap 33 sysfs for entry A and memmap 35 sysfs for entry B are created. But kobject of entry B has been used by memmap 0 sysfs. So when creating memmap 35 sysfs, the kobject is broken. If hot-removing memory, memmap 0 sysfs is destroyed and kobject of memmap 0 sysfs is freed. But the kobject can be accessed via memmap 35 sysfs. So when open memmap 35 sysfs, kernel panic occurs. This patch checks whether there is firmware_map_entry of same memory range in map_entries list and don't allocate firmware_map_entry of same memroy range. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

When ACPI_HOTPLUG_MEMORY is not configured, memory_device_handler.attach is not set. In acpi_scan_attach_handler(), the acpi_device->handler will not be initialized. In acpi_scan_hot_remove(), it doesn't check if acpi_device->handler is NULL. If we do memory hot-remove without ACPI_HOTPLUG_MEMORY configured, the kernel will panic. BUG: unable to handle kernel NULL pointer dereference at 0000000000000088 IP: [<ffffffff813e318f>] acpi_device_hotplug+0x1d7/0x4c4 PGD 0 Oops: 0000 [#1] SMP Modules linked in: sd_mod(E) sr_mod(E) cdrom(E) crc_t10dif(E) crct10dif_common(E) ata_piix(E) libata(E) CPU: 0 PID: 41 Comm: kworker/u2:1 Tainted: G E 3.16.0-rc7--3.16-rc7-tangchen+ torvalds#20 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Workqueue: kacpi_hotplug acpi_hotplug_work_fn task: ffff8800182436c0 ti: ffff880018254000 task.ti: ffff880018254000 RIP: 0010:[<ffffffff813e318f>] [<ffffffff813e318f>] acpi_device_hotplug+0x1d7/0x4c4 RSP: 0000:ffff880018257da8 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff88001cd8d800 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88001e40e6f8 RDI: 0000000000000246 RBP: ffff880018257df0 R08: 0000000000000096 R09: 00000000000011a0 R10: 63735f6970636120 R11: 725f746f685f6e61 R12: 0000000000000003 R13: ffff88001cc1c400 R14: ffff88001e062028 R15: 0000000000000040 FS: 0000000000000000(0000) GS:ffff88001e400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000088 CR3: 000000001a9a2000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000 Stack: 00000000523cab58 ffff88001cd8d9f8 ffff88001852d480 00000000523cab58 ffff88001852d480 ffff880018221e40 ffff88001cc1c400 ffff88001cce2d00 0000000000000040 ffff880018257e08 ffffffff813dc31d ffff88001852d480 Call Trace: [<ffffffff813dc31d>] acpi_hotplug_work_fn+0x1e/0x29 [<ffffffff8108eefb>] process_one_work+0x17b/0x460 [<ffffffff8108f69d>] worker_thread+0x11d/0x5b0 [<ffffffff8108f580>] ? rescuer_thread+0x3a0/0x3a0 [<ffffffff81096811>] kthread+0xe1/0x100 [<ffffffff81096730>] ? kthread_create_on_node+0x1a0/0x1a0 [<ffffffff816cc6bc>] ret_from_fork+0x7c/0xb0 [<ffffffff81096730>] ? kthread_create_on_node+0x1a0/0x1a0 This patch fixes this problem by checking if acpi_device->handler is NULL in acpi_scan_hot_remove(). Fixes: d22ddcb (ACPI / hotplug: Add demand_offline hotplug profile flag) Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ [rjw: Subject] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

This patch integrates creation of sysfs groups and attributes into NILFS file system driver. It was found the issue with nilfs_sysfs_{create/delete}_snapshot_group functions by Michael L Semon <mlsemon35@gmail.com> in the first version of the patch: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:579 in_atomic(): 1, irqs_disabled(): 0, pid: 32676, name: umount.nilfs2 2 locks held by umount.nilfs2/32676: #0: (&type->s_umount_key#21){++++..}, at: [<790c18e2>] deactivate_super+0x37/0x58 #1: (&(&nilfs->ns_cptree_lock)->rlock){+.+...}, at: [<791bf659>] nilfs_put_root+0x23/0x5a Preemption disabled at:[<791bf659>] nilfs_put_root+0x23/0x5a CPU: 0 PID: 32676 Comm: umount.nilfs2 Not tainted 3.14.0+ #2 Hardware name: Dell Computer Corporation Dimension 2350/07W080, BIOS A01 12/17/2002 Call Trace: dump_stack+0x4b/0x75 __might_sleep+0x111/0x16f mutex_lock_nested+0x1e/0x3ad kernfs_remove+0x12/0x26 sysfs_remove_dir+0x3d/0x62 kobject_del+0x13/0x38 nilfs_sysfs_delete_snapshot_group+0xb/0xd nilfs_put_root+0x2a/0x5a nilfs_detach_log_writer+0x1ab/0x2c1 nilfs_put_super+0x13/0x68 generic_shutdown_super+0x60/0xd1 kill_block_super+0x1d/0x60 deactivate_locked_super+0x22/0x3f deactivate_super+0x3e/0x58 mntput_no_expire+0xe2/0x141 SyS_oldumount+0x70/0xa5 syscall_call+0x7/0xb The reason of the issue was placement of nilfs_sysfs_{create/delete}_snapshot_group() call under nilfs->ns_cptree_lock protection. But this protection is unnecessary and wrong solution. The second version of the patch fixes this issue. [fengguang.wu@intel.com: nilfs_sysfs_create_mounted_snapshots_group can be static] Reported-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Vyacheslav Dubeyko <Vyacheslav.Dubeyko@hgst.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Function remove_ddw() could be called in of_reconfig_notifier and we potentially remove the dynamic DMA window property, which invokes of_reconfig_notifier again. Eventually, it leads to the deadlock as following backtrace shows. The patch fixes the above issue by deferring releasing the dynamic DMA window property while releasing the device node. ============================================= [ INFO: possible recursive locking detected ] 3.16.0+ torvalds#428 Tainted: G W --------------------------------------------- drmgr/2273 is trying to acquire lock: ((of_reconfig_chain).rwsem){.+.+..}, at: [<c000000000091890>] \ .__blocking_notifier_call_chain+0x40/0x78 but task is already holding lock: ((of_reconfig_chain).rwsem){.+.+..}, at: [<c000000000091890>] \ .__blocking_notifier_call_chain+0x40/0x78 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock((of_reconfig_chain).rwsem); lock((of_reconfig_chain).rwsem); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by drmgr/2273: #0: (sb_writers#4){.+.+.+}, at: [<c0000000001cbe70>] \ .vfs_write+0xb0/0x1f8 #1: ((of_reconfig_chain).rwsem){.+.+..}, at: [<c000000000091890>] \ .__blocking_notifier_call_chain+0x40/0x78 stack backtrace: CPU: 17 PID: 2273 Comm: drmgr Tainted: G W 3.16.0+ torvalds#428 Call Trace: [c0000000137e7000] [c000000000013d9c] .show_stack+0x88/0x148 (unreliable) [c0000000137e70b0] [c00000000083cd34] .dump_stack+0x7c/0x9c [c0000000137e7130] [c0000000000b8afc] .__lock_acquire+0x128c/0x1c68 [c0000000137e7280] [c0000000000b9a4c] .lock_acquire+0xe8/0x104 [c0000000137e7350] [c00000000083588c] .down_read+0x4c/0x90 [c0000000137e73e0] [c000000000091890] .__blocking_notifier_call_chain+0x40/0x78 [c0000000137e7490] [c000000000091900] .blocking_notifier_call_chain+0x38/0x48 [c0000000137e7520] [c000000000682a28] .of_reconfig_notify+0x34/0x5c [c0000000137e75b0] [c000000000682a9c] .of_property_notify+0x4c/0x54 [c0000000137e7650] [c000000000682bf0] .of_remove_property+0x30/0xd4 [c0000000137e76f0] [c000000000052a44] .remove_ddw+0x144/0x168 [c0000000137e7790] [c000000000053204] .iommu_reconfig_notifier+0x30/0xe0 [c0000000137e7820] [c00000000009137c] .notifier_call_chain+0x6c/0xb4 [c0000000137e78c0] [c0000000000918ac] .__blocking_notifier_call_chain+0x5c/0x78 [c0000000137e7970] [c000000000091900] .blocking_notifier_call_chain+0x38/0x48 [c0000000137e7a00] [c000000000682a28] .of_reconfig_notify+0x34/0x5c [c0000000137e7a90] [c000000000682e14] .of_detach_node+0x44/0x1fc [c0000000137e7b40] [c0000000000518e4] .ofdt_write+0x3ac/0x688 [c0000000137e7c20] [c000000000238430] .proc_reg_write+0xb8/0xd4 [c0000000137e7cd0] [c0000000001cbeac] .vfs_write+0xec/0x1f8 [c0000000137e7d70] [c0000000001cc3b0] .SyS_write+0x58/0xa0 [c0000000137e7e30] [c00000000000a064] syscall_exit+0x0/0x98 Cc: stable@vger.kernel.org Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

This reverts commit a188a54. It causes crashes ==================== [ 80.643286] BUG: unable to handle kernel NULL pointer dereference at 0000000000000878 [ 80.670103] IP: [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0 [ 80.691289] PGD 22c102067 PUD 235bf0067 PMD 0 [ 80.706611] Oops: 0002 [#1] SMP [ 80.717836] Modules linked in: macvlan nfsd lockd nfs_acl exportfs auth_rpcgss sunrpc oid_registry ioatdma ixgbe(-) mdio igb dca [ 80.757935] CPU: 37 PID: 6724 Comm: rmmod Not tainted 3.16.0-net-next-08-12-2014-FCoE+ #1 [ 80.785688] Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014 [ 80.820310] task: ffff880235a9eae0 ti: ffff88022e844000 task.ti: ffff88022e844000 [ 80.845770] RIP: 0010:[<ffffffff810832e4>] [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0 [ 80.875326] RSP: 0018:ffff88022e847b28 EFLAGS: 00010046 [ 80.893251] RAX: 0000000000037a6a RBX: 0000000000000878 RCX: 0000000000000000 [ 80.917187] RDX: ffff880235a9eae0 RSI: 0000000000000001 RDI: ffffffff810832db [ 80.941125] RBP: ffff88022e847b58 R08: 0000000000000000 R09: 0000000000000000 [ 80.965056] R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022e847b70 [ 80.988994] R13: 0000000000000000 R14: ffff88022e847be8 R15: ffffffff81ebe440 [ 81.012929] FS: 00007fab90b07700(0000) GS:ffff88043f7a0000(0000) knlGS:0000000000000000 [ 81.040400] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 81.059757] CR2: 0000000000000878 CR3: 0000000235a42000 CR4: 00000000001407e0 [ 81.083689] Stack: [ 81.090739] ffff880235a9eae0 0000000000000878 ffff88022e847b70 0000000000000000 [ 81.116253] ffff88022e847be8 ffffffff81ebe440 ffff88022e847b98 ffffffff810847f1 [ 81.141766] ffff88022e847b78 0000000000000286 ffff880234200000 0000000000000000 [ 81.167282] Call Trace: [ 81.175768] [<ffffffff810847f1>] __cancel_work_timer+0x31/0x170 [ 81.195985] [<ffffffff8108494b>] cancel_work_sync+0xb/0x10 [ 81.214769] [<ffffffffa015ae68>] macvlan_port_destroy+0x28/0x60 [macvlan] [ 81.237844] [<ffffffffa015b930>] macvlan_uninit+0x40/0x50 [macvlan] [ 81.259209] [<ffffffff816bf6e2>] rollback_registered_many+0x1a2/0x2c0 [ 81.281140] [<ffffffff816bf81a>] unregister_netdevice_many+0x1a/0xb0 [ 81.302786] [<ffffffffa015a4ff>] macvlan_device_event+0x1ef/0x240 [macvlan] [ 81.326439] [<ffffffff8108a13d>] notifier_call_chain+0x4d/0x70 [ 81.346366] [<ffffffff8108a201>] raw_notifier_call_chain+0x11/0x20 [ 81.367439] [<ffffffff816bf25b>] call_netdevice_notifiers_info+0x3b/0x70 [ 81.390228] [<ffffffff816bf2a1>] call_netdevice_notifiers+0x11/0x20 [ 81.411587] [<ffffffff816bf6bd>] rollback_registered_many+0x17d/0x2c0 [ 81.433518] [<ffffffff816bf925>] unregister_netdevice_queue+0x75/0x110 [ 81.455735] [<ffffffff816bfb2b>] unregister_netdev+0x1b/0x30 [ 81.475094] [<ffffffffa0039b50>] ixgbe_remove+0x170/0x1d0 [ixgbe] [ 81.495886] [<ffffffff813512a2>] pci_device_remove+0x32/0x60 [ 81.515246] [<ffffffff814c75c4>] __device_release_driver+0x64/0xd0 [ 81.536321] [<ffffffff814c76f8>] driver_detach+0xc8/0xd0 [ 81.554530] [<ffffffff814c656e>] bus_remove_driver+0x4e/0xa0 [ 81.573888] [<ffffffff814c828b>] driver_unregister+0x2b/0x60 [ 81.593246] [<ffffffff8135143e>] pci_unregister_driver+0x1e/0xa0 [ 81.613749] [<ffffffffa005db18>] ixgbe_exit_module+0x1c/0x2e [ixgbe] [ 81.635401] [<ffffffff810e738b>] SyS_delete_module+0x15b/0x1e0 [ 81.655334] [<ffffffff8187a395>] ? sysret_check+0x22/0x5d [ 81.673833] [<ffffffff810abd2d>] ? trace_hardirqs_on_caller+0x11d/0x1e0 [ 81.696339] [<ffffffff8132bfde>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 81.717985] [<ffffffff8187a369>] system_call_fastpath+0x16/0x1b [ 81.738199] Code: 00 48 83 3d 6e bb da 00 00 48 89 c2 0f 84 67 01 00 00 fa 66 0f 1f 44 00 00 49 89 14 24 e8 b5 4b 02 00 45 84 ed 0f 85 ac 00 00 00 <f0> 0f ba 2b 00 72 1d 31 c0 48 8b 5d d8 4c 8b 65 e0 4c 8b 6d e8 [ 81.807026] RIP [<ffffffff810832e4>] try_to_grab_pending+0x64/0x1f0 [ 81.828468] RSP <ffff88022e847b28> [ 81.840384] CR2: 0000000000000878 [ 81.851731] ---[ end trace 9f6c7232e3464e11 ]--- ==================== This bug could be triggered by these steps: modprobe ixgbe ; modprobe macvlan ip link add link p96p1 address 00:1B:21:6E:06:00 macvlan0 type macvlan ip link add link p96p1 address 00:1B:21:6E:06:01 macvlan1 type macvlan ip link add link p96p1 address 00:1B:21:6E:06:02 macvlan2 type macvlan ip link add link p96p1 address 00:1B:21:6E:06:03 macvlan3 type macvlan rmmod ixgbe Reported-by: "Keller, Jacob E" <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>

We don't know right timestamp for repaired skb-s. Wrong RTT estimations isn't good, because some congestion modules heavily depends on it. This patch adds the TCPCB_REPAIRED flag, which is included in TCPCB_RETRANS. Thanks to Eric for the advice how to fix this issue. This patch fixes the warning: [ 879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380() [ 879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1 [ 879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 879.568177] 0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2 [ 879.568776] 0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80 [ 879.569386] ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc [ 879.569982] Call Trace: [ 879.570264] [<ffffffff817aa2d2>] dump_stack+0x4d/0x66 [ 879.570599] [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0 [ 879.570935] [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20 [ 879.571292] [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380 [ 879.571614] [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710 [ 879.571958] [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370 [ 879.572315] [<ffffffff81657459>] release_sock+0x89/0x1d0 [ 879.572642] [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860 [ 879.573000] [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80 [ 879.573352] [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40 [ 879.573678] [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20 [ 879.574031] [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0 [ 879.574393] [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b [ 879.574730] ---[ end trace a17cbc38eb8c5c00 ]--- v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit, where it's set for other skb-s. Fixes: 431a912 ("tcp: timestamp SYN+DATA messages") Fixes: 740b0f1 ("tcp: switch rtt estimations to usec resolution") Cc: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>

We've got bug reports that btrfs crashes when quota is enabled on 32bit kernel, typically with the Oops like below: BUG: unable to handle kernel NULL pointer dereference at 00000004 IP: [<f9234590>] find_parent_nodes+0x360/0x1380 [btrfs] *pde = 00000000 Oops: 0000 [#1] SMP CPU: 0 PID: 151 Comm: kworker/u8:2 Tainted: G S W 3.15.2-1.gd43d97e-default #1 Workqueue: btrfs-qgroup-rescan normal_work_helper [btrfs] task: f1478130 ti: f147c000 task.ti: f147c000 EIP: 0060:[<f9234590>] EFLAGS: 00010213 CPU: 0 EIP is at find_parent_nodes+0x360/0x1380 [btrfs] EAX: f147dda8 EBX: f147ddb0 ECX: 00000011 EDX: 00000000 ESI: 00000000 EDI: f147dda4 EBP: f147ddf8 ESP: f147dd38 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 00000004 CR3: 00bf3000 CR4: 00000690 Stack: 00000000 00000000 f147dda4 00000050 00000001 00000000 00000001 00000050 00000001 00000000 d3059000 00000001 00000022 000000a8 00000000 00000000 00000000 000000a1 00000000 00000000 00000001 00000000 00000000 11800000 Call Trace: [<f923564d>] __btrfs_find_all_roots+0x9d/0xf0 [btrfs] [<f9237bb1>] btrfs_qgroup_rescan_worker+0x401/0x760 [btrfs] [<f9206148>] normal_work_helper+0xc8/0x270 [btrfs] [<c025e38b>] process_one_work+0x11b/0x390 [<c025eea1>] worker_thread+0x101/0x340 [<c026432b>] kthread+0x9b/0xb0 [<c0712a71>] ret_from_kernel_thread+0x21/0x30 [<c0264290>] kthread_create_on_node+0x110/0x110 This indicates a NULL corruption in prefs_delayed list. The further investigation and bisection pointed that the call of ulist_add_merge() results in the corruption. ulist_add_merge() takes u64 as aux and writes a 64bit value into old_aux. The callers of this function in backref.c, however, pass a pointer of a pointer to old_aux. That is, the function overwrites 64bit value on 32bit pointer. This caused a NULL in the adjacent variable, in this case, prefs_delayed. Here is a quick attempt to band-aid over this: a new function, ulist_add_merge_ptr() is introduced to pass/store properly a pointer value instead of u64. There are still ugly void ** cast remaining in the callers because void ** cannot be taken implicitly. But, it's safer than explicit cast to u64, anyway. Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=887046 Cc: <stable@vger.kernel.org> [v3.11+] Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Chris Mason <clm@fb.com>

Peter Zijlstra and others added 30 commits January 13, 2014 17:39

x86, microcode: Share native MSR accessing variants

e1b43e3

We want to use those in AMD's early loading path too. Also, add a native_wrmsrl variant. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>

mei: limit the number of consecutive resets

6adb8ef

give up reseting after 3 unsuccessful tries Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pull from torvalds #1

pull from torvalds #1

e-mailky commented Jan 21, 2014

pull from torvalds #1

pull from torvalds #1

Conversation

e-mailky commented Jan 21, 2014