8 years agorcu: Add more information to the wrong-idle-task complaint
Paul E. McKenney [Tue, 1 Nov 2011 15:57:21 +0000]
rcu: Add more information to the wrong-idle-task complaint

The current code just complains if the current task is not the idle task.
This commit therefore adds printing of the identity of the idle task.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: Deconfuse dynticks entry-exit tracing
Paul E. McKenney [Mon, 31 Oct 2011 22:01:54 +0000]
rcu: Deconfuse dynticks entry-exit tracing

The trace_rcu_dyntick() trace event did not print both the old and
the new value of the nesting level, and furthermore printed only
the low-order 32 bits of it.  This could result in some confusion
when interpreting trace-event dumps, so this commit prints both
the old and the new value, prints the full 64 bits, and also selects
the process-entry/exit increment to print nicely in hexadecimal.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: Add documentation for raw SRCU read-side primitives
Paul E. McKenney [Thu, 3 Nov 2011 20:43:24 +0000]
rcu: Add documentation for raw SRCU read-side primitives

Update various files in Documentation/RCU to reflect srcu_read_lock_raw()
and srcu_read_unlock_raw().  Credit to Peter Zijlstra for suggesting
use of the existing _raw suffix instead of the earlier bulkref names.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

8 years agorcu: Introduce raw SRCU read-side primitives
Paul E. McKenney [Sun, 9 Oct 2011 22:13:11 +0000]
rcu: Introduce raw SRCU read-side primitives

The RCU implementations, including SRCU, are designed to be used in a
lock-like fashion, so that the read-side lock and unlock primitives must
execute in the same context for any given read-side critical section.
This constraint is enforced by lockdep-RCU.  However, there is a need
to enter an SRCU read-side critical section within the context of an
exception and then exit in the context of the task that encountered the
exception.  The cost of this capability is that the read-side operations
incur the overhead of disabling interrupts.

Note that although the current implementation allows a given read-side
critical section to be entered by one task and then exited by another, all
known possible implementations that allow this have scalability problems.
Therefore, a given read-side critical section must be exited by the same
task that entered it, though perhaps from an interrupt or exception
handler running within that task's context.  But if you are thinking
in terms of interrupt handlers, make sure that you have considered the
possibility of threaded interrupt handlers.

Credit goes to Peter Zijlstra for suggesting use of the existing _raw
suffix to indicate disabling lockdep over the earlier "bulkref" names.

Requested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

8 years agopowerpc: Tell RCU about idle after hcall tracing
Paul E. McKenney [Fri, 14 Oct 2011 02:05:12 +0000]
powerpc: Tell RCU about idle after hcall tracing

The PowerPC pSeries platform (CONFIG_PPC_PSERIES=y) enables
hypervisor-call tracing for CONFIG_TRACEPOINTS=y kernels.  One of the
hypervisor calls that is traced is the H_CEDE call in the idle loop
that tells the hypervisor that this OS instance no longer needs the
current CPU.  However, tracing uses RCU, so this combination of kernel
configuration variables needs to avoid telling RCU about the current CPU's
idleness until after the H_CEDE-entry tracing completes on the one hand,
and must tell RCU that the the current CPU is no longer idle before the
H_CEDE-exit tracing starts.

In all other cases, it suffices to inform RCU of CPU idleness upon
idle-loop entry and exit.

This commit makes the required adjustments.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: Fix early call to rcu_idle_enter()
Frederic Weisbecker [Fri, 7 Oct 2011 23:31:02 +0000]
rcu: Fix early call to rcu_idle_enter()

On the irq exit path, tick_nohz_irq_exit()
may raise a softirq, which action leads to the wake up
path and select_task_rq_fair() that makes use of rcu
to iterate the domains.

This is an illegal use of RCU because we may be in RCU
extended quiescent state if we interrupted an RCU-idle
window in the idle loop:

[  132.978883] ===============================
[  132.978883] [ INFO: suspicious RCU usage. ]
[  132.978883] -------------------------------
[  132.978883] kernel/sched_fair.c:1707 suspicious rcu_dereference_check() usage!
[  132.978883]
[  132.978883] other info that might help us debug this:
[  132.978883]
[  132.978883]
[  132.978883] rcu_scheduler_active = 1, debug_locks = 0
[  132.978883] RCU used illegally from extended quiescent state!
[  132.978883] 2 locks held by swapper/0:
[  132.978883]  #0:  (&p->pi_lock){-.-.-.}, at: [<ffffffff8105a729>] try_to_wake_up+0x39/0x2f0
[  132.978883]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8105556a>] select_task_rq_fair+0x6a/0xec0
[  132.978883]
[  132.978883] stack backtrace:
[  132.978883] Pid: 0, comm: swapper Tainted: G        W   3.0.0+ #178
[  132.978883] Call Trace:
[  132.978883]  <IRQ>  [<ffffffff810a01f6>] lockdep_rcu_suspicious+0xe6/0x100
[  132.978883]  [<ffffffff81055c49>] select_task_rq_fair+0x749/0xec0
[  132.978883]  [<ffffffff8105556a>] ? select_task_rq_fair+0x6a/0xec0
[  132.978883]  [<ffffffff812fe494>] ? do_raw_spin_lock+0x54/0x150
[  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
[  132.978883]  [<ffffffff8105a7c3>] try_to_wake_up+0xd3/0x2f0
[  132.978883]  [<ffffffff81094f98>] ? ktime_get+0x68/0xf0
[  132.978883]  [<ffffffff8105aa35>] wake_up_process+0x15/0x20
[  132.978883]  [<ffffffff81069dd5>] raise_softirq_irqoff+0x65/0x110
[  132.978883]  [<ffffffff8108eb65>] __hrtimer_start_range_ns+0x415/0x5a0
[  132.978883]  [<ffffffff812fe3ee>] ? do_raw_spin_unlock+0x5e/0xb0
[  132.978883]  [<ffffffff8108ed08>] hrtimer_start+0x18/0x20
[  132.978883]  [<ffffffff8109c9c3>] tick_nohz_stop_sched_tick+0x393/0x450
[  132.978883]  [<ffffffff810694f2>] irq_exit+0xd2/0x100
[  132.978883]  [<ffffffff81829e96>] do_IRQ+0x66/0xe0
[  132.978883]  [<ffffffff81820d53>] common_interrupt+0x13/0x13
[  132.978883]  <EOI>  [<ffffffff8103434b>] ? native_safe_halt+0xb/0x10
[  132.978883]  [<ffffffff810a1f2d>] ? trace_hardirqs_on+0xd/0x10
[  132.978883]  [<ffffffff810144ea>] default_idle+0xba/0x370
[  132.978883]  [<ffffffff810147fe>] amd_e400_idle+0x5e/0x130
[  132.978883]  [<ffffffff8100a9f6>] cpu_idle+0xb6/0x120
[  132.978883]  [<ffffffff817f217f>] rest_init+0xef/0x150
[  132.978883]  [<ffffffff817f20e2>] ? rest_init+0x52/0x150
[  132.978883]  [<ffffffff81ed9cf3>] start_kernel+0x3da/0x3e5
[  132.978883]  [<ffffffff81ed9346>] x86_64_start_reservations+0x131/0x135
[  132.978883]  [<ffffffff81ed944d>] x86_64_start_kernel+0x103/0x112

Fix this by calling rcu_idle_enter() after tick_nohz_irq_exit().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agox86: Call idle notifier after irq_enter()
Frederic Weisbecker [Fri, 7 Oct 2011 16:22:09 +0000]
x86: Call idle notifier after irq_enter()

Interrupts notify the idle exit state before calling irq_enter().
But the notifier code calls rcu_read_lock() and this is not
allowed while rcu is in an extended quiescent state. We need
to wait for irq_enter() -> rcu_idle_exit() to be called before
doing so otherwise this results in a grumpy RCU:

[    0.099991] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[    0.099991] Hardware name: AMD690VM-FMH
[    0.099991] Modules linked in:
[    0.099991] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #255
[    0.099991] Call Trace:
[    0.099991]  <IRQ>  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[    0.099991]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[    0.099991]  [<ffffffff817d6fa2>] __atomic_notifier_call_chain+0xd2/0x110
[    0.099991]  [<ffffffff817d6ff1>] atomic_notifier_call_chain+0x11/0x20
[    0.099991]  [<ffffffff81001873>] exit_idle+0x43/0x50
[    0.099991]  [<ffffffff81020439>] smp_apic_timer_interrupt+0x39/0xa0
[    0.099991]  [<ffffffff817da253>] apic_timer_interrupt+0x13/0x20
[    0.099991]  <EOI>  [<ffffffff8100ae67>] ? default_idle+0xa7/0x350
[    0.099991]  [<ffffffff8100ae65>] ? default_idle+0xa5/0x350
[    0.099991]  [<ffffffff8100b19b>] amd_e400_idle+0x8b/0x110
[    0.099991]  [<ffffffff810cb01f>] ? rcu_enter_nohz+0x8f/0x160
[    0.099991]  [<ffffffff810019a0>] cpu_idle+0xb0/0x110
[    0.099991]  [<ffffffff817a7505>] rest_init+0xe5/0x140
[    0.099991]  [<ffffffff817a7468>] ? rest_init+0x48/0x140
[    0.099991]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[    0.099991]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[    0.099991]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Henroid <andrew.d.henroid@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agox86: Enter rcu extended qs after idle notifier call
Frederic Weisbecker [Fri, 7 Oct 2011 16:22:08 +0000]
x86: Enter rcu extended qs after idle notifier call

The idle notifier, called by enter_idle(), enters into rcu read
side critical section but at that time we already switched into
the RCU-idle window (rcu_idle_enter() has been called). And it's
illegal to use rcu_read_lock() in that state.

This results in rcu reporting its bad mood:

[    1.275635] WARNING: at include/linux/rcupdate.h:194 __atomic_notifier_call_chain+0xd2/0x110()
[    1.275635] Hardware name: AMD690VM-FMH
[    1.275635] Modules linked in:
[    1.275635] Pid: 0, comm: swapper Not tainted 3.0.0-rc6+ #252
[    1.275635] Call Trace:
[    1.275635]  [<ffffffff81051c8a>] warn_slowpath_common+0x7a/0xb0
[    1.275635]  [<ffffffff81051cd5>] warn_slowpath_null+0x15/0x20
[    1.275635]  [<ffffffff817d6f22>] __atomic_notifier_call_chain+0xd2/0x110
[    1.275635]  [<ffffffff817d6f71>] atomic_notifier_call_chain+0x11/0x20
[    1.275635]  [<ffffffff810018a0>] enter_idle+0x20/0x30
[    1.275635]  [<ffffffff81001995>] cpu_idle+0xa5/0x110
[    1.275635]  [<ffffffff817a7465>] rest_init+0xe5/0x140
[    1.275635]  [<ffffffff817a73c8>] ? rest_init+0x48/0x140
[    1.275635]  [<ffffffff81cc5ca3>] start_kernel+0x3d1/0x3dc
[    1.275635]  [<ffffffff81cc5321>] x86_64_start_reservations+0x131/0x135
[    1.275635]  [<ffffffff81cc5412>] x86_64_start_kernel+0xed/0xf4
[    1.275635] ---[ end trace a22d306b065d4a66 ]---

Fix this by entering rcu extended quiescent state later, just before
the CPU goes to sleep.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agonohz: Allow rcu extended quiescent state handling seperately from tick stop
Frederic Weisbecker [Sat, 8 Oct 2011 14:01:00 +0000]
nohz: Allow rcu extended quiescent state handling seperately from tick stop

It is assumed that rcu won't be used once we switch to tickless
mode and until we restart the tick. However this is not always
true, as in x86-64 where we dereference the idle notifiers after
the tick is stopped.

To prepare for fixing this, add two new APIs:
tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().

If no use of RCU is made in the idle loop between
tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
must instead call the new *_norcu() version such that the arch doesn't
need to call rcu_idle_enter() and rcu_idle_exit().

Otherwise the arch must call tick_nohz_enter_idle() and
tick_nohz_exit_idle() and also call explicitly:

- rcu_idle_enter() after its last use of RCU before the CPU is put
to sleep.
- rcu_idle_exit() before the first use of RCU after the CPU is woken

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

8 years agonohz: Separate out irq exit and idle loop dyntick logic
Frederic Weisbecker [Fri, 7 Oct 2011 16:22:06 +0000]
nohz: Separate out irq exit and idle loop dyntick logic

The tick_nohz_stop_sched_tick() function, which tries to delay
the next timer tick as long as possible, can be called from two

- From the idle loop to start the dytick idle mode
- From interrupt exit if we have interrupted the dyntick
idle mode, so that we reprogram the next tick event in
case the irq changed some internal state that requires this

There are only few minor differences between both that
are handled by that function, driven by the ts->inidle
cpu variable and the inidle parameter. The whole guarantees
that we only update the dyntick mode on irq exit if we actually
interrupted the dyntick idle mode, and that we enter in RCU extended
quiescent state from idle loop entry only.

Split this function into:

- tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
dynticks idle mode unconditionally if it can, and enters into RCU
extended quiescent state.

- tick_nohz_irq_exit() which only updates the dynticks idle mode
when ts->inidle is set (ie: if tick_nohz_idle_enter() has been called).

To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
into tick_nohz_idle_exit().

This simplifies the code and micro-optimize the irq exit path (no need
for local_irq_save there). This also prepares for the split between
dynticks and rcu extended quiescent state logics. We'll need this split to
further fix illegal uses of RCU in extended quiescent states in the idle

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: Make srcu_read_lock_held() call common lockdep-enabled function
Paul E. McKenney [Fri, 7 Oct 2011 16:22:05 +0000]
rcu: Make srcu_read_lock_held() call common lockdep-enabled function

A common debug_lockdep_rcu_enabled() function is used to check whether
RCU lockdep splats should be reported, but srcu_read_lock() does not
use it.  This commit therefore brings srcu_read_lock_held() up to date.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: Warn when srcu_read_lock() is used in an extended quiescent state
Paul E. McKenney [Fri, 7 Oct 2011 16:22:04 +0000]
rcu: Warn when srcu_read_lock() is used in an extended quiescent state

Catch SRCU up to the other variants of RCU by making PROVE_RCU
complain if either srcu_read_lock() or srcu_read_lock_held() are
used from within RCU-idle mode.

Frederic reworked this to allow for the new versions of his patches
that check for extended quiescent states.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: Remove one layer of abstraction from PROVE_RCU checking
Paul E. McKenney [Fri, 7 Oct 2011 16:22:03 +0000]
rcu: Remove one layer of abstraction from PROVE_RCU checking

Simplify things a bit by substituting the definitions of the single-line
rcu_read_acquire(), rcu_read_release(), rcu_read_acquire_bh(),
rcu_read_release_bh(), rcu_read_acquire_sched(), and
rcu_read_release_sched() functions at their call points.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: Warn when rcu_read_lock() is used in extended quiescent state
Frederic Weisbecker [Fri, 7 Oct 2011 16:22:02 +0000]
rcu: Warn when rcu_read_lock() is used in extended quiescent state

We are currently able to detect uses of rcu_dereference_check() inside
extended quiescent states (such as the RCU-free window in idle).
But rcu_read_lock() and friends can be used without rcu_dereference(),
so that the earlier commit checking for use of rcu_dereference() and
friends while in RCU idle mode miss some error conditions.  This commit
therefore adds extended quiescent state checking to rcu_read_lock() and

Uses of RCU from within RCU-idle mode are totally ignored by
RCU, hence the importance of these checks.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: Inform the user about extended quiescent state on PROVE_RCU warning
Frederic Weisbecker [Fri, 7 Oct 2011 16:22:01 +0000]
rcu: Inform the user about extended quiescent state on PROVE_RCU warning

Inform the user if an RCU usage error is detected by lockdep while in
an extended quiescent state (in this case, the RCU-free window in idle).
This is accomplished by adding a line to the RCU lockdep splat indicating
whether or not the splat occurred in extended quiescent state.

Uses of RCU from within extended quiescent state mode are totally ignored
by RCU, hence the importance of this diagnostic.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: Detect illegal rcu dereference in extended quiescent state
Frederic Weisbecker [Fri, 7 Oct 2011 23:25:18 +0000]
rcu: Detect illegal rcu dereference in extended quiescent state

Report that none of the rcu read lock maps are held while in an RCU
extended quiescent state (the section between rcu_idle_enter()
and rcu_idle_exit()). This helps detect any use of rcu_dereference()
and friends from within the section in idle where RCU is not allowed.

This way we can guarantee an extended quiescent window where the CPU
can be put in dyntick idle mode or can simply aoid to be part of any
global grace period completion while in the idle loop.

Uses of RCU from such mode are totally ignored by RCU, hence the
importance of these checks.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: Remove redundant return from rcu_report_exp_rnp()
Thomas Gleixner [Thu, 3 Nov 2011 19:08:17 +0000]
rcu: Remove redundant return from rcu_report_exp_rnp()

Empty void functions do not need "return", so this commit removes it
from rcu_report_exp_rnp().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

8 years agorcu: Omit self-awaken when setting up expedited grace period
Thomas Gleixner [Sat, 22 Oct 2011 14:12:34 +0000]
rcu: Omit self-awaken when setting up expedited grace period

When setting up an expedited grace period, if there were no readers, the
task will awaken itself.  This commit removes this useless self-awakening.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

8 years agorcu: Disable preemption in rcu_is_cpu_idle()
Paul E. McKenney [Mon, 3 Oct 2011 18:38:52 +0000]
rcu: Disable preemption in rcu_is_cpu_idle()

Because rcu_is_cpu_idle() is to be used to check for extended quiescent
states in RCU-preempt read-side critical sections, it cannot assume that
preemption is disabled.  And preemption must be disabled when accessing
the dyntick-idle state, because otherwise the following sequence of events
could occur:

1. Task A on CPU 1 enters rcu_is_cpu_idle() and picks up the pointer
to CPU 1's per-CPU variables.

2. Task B preempts Task A and starts running on CPU 1.

3. Task A migrates to CPU 2.

4. Task B blocks, leaving CPU 1 idle.

5. Task A continues execution on CPU 2, accessing CPU 1's dyntick-idle
information using the pointer fetched in step 1 above, and finds
that CPU 1 is idle.

6. Task A therefore incorrectly concludes that it is executing in
an extended quiescent state, possibly issuing a spurious splat.

Therefore, this commit disables preemption within the rcu_is_cpu_idle()

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: Document failing tick as cause of RCU CPU stall warning
Paul E. McKenney [Mon, 3 Oct 2011 00:21:18 +0000]
rcu: Document failing tick as cause of RCU CPU stall warning

One of lclaudio's systems was seeing RCU CPU stall warnings from idle.
These turned out to be caused by a bug that stopped scheduling-clock
tick interrupts from being sent to a given CPU for several hundred seconds.
This commit therefore updates the documentation to call this out as a
possible cause for RCU CPU stall warnings.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: Add failure tracing to rcutorture
Paul E. McKenney [Sun, 2 Oct 2011 14:44:32 +0000]
rcu: Add failure tracing to rcutorture

Trace the rcutorture RCU accesses and dump the trace buffer when the
first failure is detected.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agotrace: Allow ftrace_dump() to be called from modules
Paul E. McKenney [Sun, 2 Oct 2011 18:01:15 +0000]
trace: Allow ftrace_dump() to be called from modules

Add an EXPORT_SYMBOL_GPL() so that rcutorture can dump the trace buffer
upon detection of an RCU error.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: Track idleness independent of idle tasks
Paul E. McKenney [Fri, 30 Sep 2011 19:10:22 +0000]
rcu: Track idleness independent of idle tasks

Earlier versions of RCU used the scheduling-clock tick to detect idleness
by checking for the idle task, but handled idleness differently for
CONFIG_NO_HZ=y.  But there are now a number of uses of RCU read-side
critical sections in the idle task, for example, for tracing.  A more
fine-grained detection of idleness is therefore required.

This commit presses the old dyntick-idle code into full-time service,
so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is
always invoked at the beginning of an idle loop iteration.  Similarly,
rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked
at the end of an idle-loop iteration.  This allows the idle task to
use RCU everywhere except between consecutive rcu_idle_enter() and
rcu_idle_exit() calls, in turn allowing architecture maintainers to
specify exactly where in the idle loop that RCU may be used.

Because some of the userspace upcall uses can result in what looks
to RCU like half of an interrupt, it is not possible to expect that
the irq_enter() and irq_exit() hooks will give exact counts.  This
patch therefore expands the ->dynticks_nesting counter to 64 bits
and uses two separate bitfields to count process/idle transitions
and interrupt entry/exit transitions.  It is presumed that userspace
upcalls do not happen in the idle loop or from usermode execution
(though usermode might do a system call that results in an upcall).
The counter is hard-reset on each process/idle transition, which
avoids the interrupt entry/exit error from accumulating.  Overflow
is avoided by the 64-bitness of the ->dyntick_nesting counter.

This commit also adds warnings if a non-idle task asks RCU to enter
idle state (and these checks will need some adjustment before applying
Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246).
In addition, validation of ->dynticks and ->dynticks_nesting is added.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agolockdep: Update documentation for lock-class leak detection
Paul E. McKenney [Wed, 28 Sep 2011 17:23:39 +0000]
lockdep: Update documentation for lock-class leak detection

There are a number of bugs that can leak or overuse lock classes,
which can cause the maximum number of lock classes (currently 8191)
to be exceeded.  However, the documentation does not tell you how to
track down these problems.  This commit addresses this shortcoming.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

8 years agorcu: Make synchronize_sched_expedited() better at work sharing
Paul E. McKenney [Thu, 22 Sep 2011 20:18:44 +0000]
rcu: Make synchronize_sched_expedited() better at work sharing

When synchronize_sched_expedited() takes its second and subsequent
snapshots of sync_sched_expedited_started, it subtracts 1.  This
means that the concurrent caller of synchronize_sched_expedited()
that incremented to that value sees our successful completion, it
will not be able to take advantage of it.  This restriction is
pointless, given that our full expedited grace period would have
happened after the other guy started, and thus should be able to
serve as a proxy for the other guy successfully executing

This commit therefore removes the subtraction of 1.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: Avoid RCU-preempt expedited grace-period botch
Paul E. McKenney [Wed, 21 Sep 2011 21:41:37 +0000]
rcu: Avoid RCU-preempt expedited grace-period botch

Because rcu_read_unlock_special() samples rcu_preempted_readers_exp(rnp)
after dropping rnp->lock, the following sequence of events is possible:

1. Task A exits its RCU read-side critical section, and removes
itself from the ->blkd_tasks list, releases rnp->lock, and is
then preempted.  Task B remains on the ->blkd_tasks list, and
blocks the current expedited grace period.

2. Task B exits from its RCU read-side critical section and removes
itself from the ->blkd_tasks list.  Because it is the last task
blocking the current expedited grace period, it ends that
expedited grace period.

3. Task A resumes, and samples rcu_preempted_readers_exp(rnp) which
of course indicates that nothing is blocking the nonexistent
expedited grace period. Task A is again preempted.

4. Some other CPU starts an expedited grace period.  There are several
tasks blocking this expedited grace period queued on the
same rcu_node structure that Task A was using in step 1 above.

5. Task A examines its state and incorrectly concludes that it was
the last task blocking the expedited grace period on the current
rcu_node structure.  It therefore reports completion up the
rcu_node tree.

6. The expedited grace period can then incorrectly complete before
the tasks blocked on this same rcu_node structure exit their
RCU read-side critical sections.  Arbitrarily bad things happen.

This commit therefore takes a snapshot of rcu_preempted_readers_exp(rnp)
prior to dropping the lock, so that only the last task thinks that it is
the last task, thus avoiding the failure scenario laid out above.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agorcu: ->signaled better named ->fqs_state
Paul E. McKenney [Sun, 11 Sep 2011 04:54:08 +0000]
rcu: ->signaled better named ->fqs_state

The ->signaled field was named before complications in the form of
dyntick-idle mode and offlined CPUs.  These complications have required
that force_quiescent_state() be implemented as a state machine, instead
of simply unconditionally sending reschedule IPIs.  Therefore, this
commit renames ->signaled to ->fqs_state to catch up with the new
force_quiescent_state() reality.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

8 years agoLinux 3.2-rc5
Linus Torvalds [Fri, 9 Dec 2011 23:09:32 +0000]
Linux 3.2-rc5

8 years agoMerge git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Fri, 9 Dec 2011 22:45:44 +0000]
Merge git://git.samba.org/sfrench/cifs-2.6

* git://git.samba.org/sfrench/cifs-2.6:
  cifs: check for NULL last_entry before calling cifs_save_resume_key
  cifs: attempt to freeze while looping on a receive attempt
  cifs: Fix sparse warning when calling cifs_strtoUCS
  CIFS: Add descriptions to the brlock cache functions

8 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 9 Dec 2011 22:45:12 +0000]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, efi: Calling __pa() with an ioremap()ed address is invalid
  x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked
  x86/intel_mid: Kconfig select fix
  x86/intel_mid: Fix the Kconfig for MID selection

8 years agoMerge branch 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6
Linus Torvalds [Fri, 9 Dec 2011 22:41:50 +0000]
Merge branch 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6

* 'spi/for-3.2' of git://git.pengutronix.de/git/wsa/linux-2.6:
  spi/gpio: fix section mismatch warning
  spi/fsl-espi: disable CONFIG_SPI_FSL_ESPI=m build
  spi/nuc900: Include linux/module.h
  spi/ath79: fix compile error due to missing include

8 years agoMerge branch 'for-linus' of git://neil.brown.name/md
Linus Torvalds [Fri, 9 Dec 2011 16:18:08 +0000]
Merge branch 'for-linus' of git://neil.brown.name/md

* 'for-linus' of git://neil.brown.name/md:
  md: raid5 crash during degradation
  md/raid5: never wait for bad-block acks on failed device.
  md: ensure new badblocks are handled promptly.
  md: bad blocks shouldn't cause a Blocked status on a Faulty device.
  md: take a reference to mddev during sysfs access.
  md: refine interpretation of "hold_active == UNTIL_IOCTL".
  md/lock: ensure updates to page_attrs are properly locked.

8 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Linus Torvalds [Fri, 9 Dec 2011 16:08:57 +0000]
Merge git://git./linux/kernel/git/cmetcalf/linux-tile

* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  arch/tile: use new generic {enable,disable}_percpu_irq() routines
  drivers/net/ethernet/tile: use skb_frag_page() API
  asm-generic/unistd.h: support new process_vm_{readv,write} syscalls
  arch/tile: fix double-free bug in homecache_free_pages()
  arch/tile: add a few #includes and an EXPORT to catch up with kernel changes.

8 years agoMerge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro...
Linus Torvalds [Fri, 9 Dec 2011 16:08:14 +0000]
Merge branch 'iommu/fixes' of git://git./linux/kernel/git/joro/iommu

* 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  MAINTAINERS: Update amd-iommu F: patterns
  iommu/amd: Fix typo in kernel-parameters.txt
  iommu/msm: Fix compile error in mach-msm/devices-iommu.c
  Fix comparison using wrong pointer variable in dma debug code

8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Linus Torvalds [Fri, 9 Dec 2011 16:07:42 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda/realtek - Fix lost speaker volume controls
  ALSA: hda/realtek - Create "Bass Speaker" for two speaker pins
  ALSA: hda/realtek - Don't create extra controls with channel suffix
  ALSA: hda - Fix remaining VREF mute-LED NID check in post-3.1 changes
  ALSA: hda - Fix GPIO LED setup for IDT 92HD75 codecs
  ASoC: Provide a more complete DMA driver stub
  ASoC: Remove references to corgi and spitz from machine driver document
  ASoC: Make SND_SOC_MX27VIS_AIC32X4 depend on I2C
  ASoC: Fix dependency for SND_SOC_RAUMFELD and SND_PXA2XX_SOC_HX4700
  ASoC: uda1380: Return proper error in uda1380_modinit failure path
  ASoC: kirkwood: Make SND_KIRKWOOD_SOC_OPENRD and SND_KIRKWOOD_SOC_T5325 depend on I2C
  ASoC: Mark WM8994 ADC muxes as virtual
  ALSA: hda/realtek - Fix Oops in alc_mux_select()
  ALSA: sis7019 - give slow codecs more time to reset

8 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 9 Dec 2011 16:07:24 +0000]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Do no try to schedule task events if there are none
  lockdep, kmemcheck: Annotate ->lock in lockdep_init_map()
  perf header: Use event_name() to get an event name
  perf stat: Failure with "Operation not supported"

8 years agosys_getppid: add missing rcu_dereference
Mandeep Singh Baines [Thu, 8 Dec 2011 22:34:44 +0000]
sys_getppid: add missing rcu_dereference

In order to safely dereference current->real_parent inside an
rcu_read_lock, we need an rcu_dereference.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agorapidio/tsi721: modify PCIe capability settings
Alexandre Bounine [Thu, 8 Dec 2011 22:34:42 +0000]
rapidio/tsi721: modify PCIe capability settings

Modify initialization of PCIe capability registers in Tsi721 mport driver:
 - change Completion Timeout value to avoid unexpected data transfer
   aborts during intensive traffic.
 - replace hardcoded offset of PCIe capability block by making it use the
   common function.

This patch is applicable to kernel versions starting from 3.2-rc1.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agorapidio/tsi721: fix mailbox resource reporting
Alexandre Bounine [Thu, 8 Dec 2011 22:34:36 +0000]
rapidio/tsi721: fix mailbox resource reporting

Bug fix for Tsi721 RapidIO mport driver: Tsi721 supports four RapidIO
mailboxes (MBOX0 - MBOX3) as defined by RapidIO specification.  Mailbox
resources has to be properly reported to allow use of all available
mailboxes (initial version reports only MBOX0).

This patch is applicable to kernel versions staring from 3.2-rc1.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agorapidio/tsi721: switch to dma_zalloc_coherent
Alexandre Bounine [Thu, 8 Dec 2011 22:34:35 +0000]
rapidio/tsi721: switch to dma_zalloc_coherent

Replace the pair dma_alloc_coherent()+memset() with the new
dma_zalloc_coherent() added by Andrew Morton for kernel version 3.2

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agoprocfs: do not overflow get_{idle,iowait}_time for nohz
Michal Hocko [Thu, 8 Dec 2011 22:34:32 +0000]
procfs: do not overflow get_{idle,iowait}_time for nohz

Since commit a25cac5198d4 ("proc: Consider NO_HZ when printing idle and
iowait times") we are reporting idle/io_wait time also while a CPU is
tickless.  We rely on get_{idle,iowait}_time functions to retrieve
proper data.

These functions, however, use usecs_to_cputime to translate micro
seconds time to cputime64_t.  This is just an alias to usecs_to_jiffies
which reduces the data type from u64 to unsigned int and also checks
whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
and returns MAX_JIFFY_OFFSET in that case.

When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300
it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for
>3000s! until we overflow unsigned int.  Just for reference
CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and
CONFIG_HZ_1000 ~2s.

This results in a bug when people saw [h]top going mad reporting 100%
CPU usage even though there was basically no CPU load.  The reason was
simply that /proc/stat stopped reporting idle/io_wait changes (and
reported MAX_JIFFY_OFFSET) and so the only change happening was for user
system time.

Let's use nsecs_to_jiffies64 instead which doesn't reduce the precision
to 32b type and it is much more appropriate for cumulative time values
(unlike usecs_to_jiffies which intended for timeout calculations).

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Artem S. Tashkinov <t.artem@mailcity.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agomm: vmalloc: check for page allocation failure before vmlist insertion
Mel Gorman [Thu, 8 Dec 2011 22:34:30 +0000]
mm: vmalloc: check for page allocation failure before vmlist insertion

Commit f5252e00 ("mm: avoid null pointer access in vm_struct via
/proc/vmallocinfo") adds newly allocated vm_structs to the vmlist after
it is fully initialised.  Unfortunately, it did not check that
__vmalloc_area_node() successfully populated the area.  In the event of
allocation failure, the vmalloc area is freed but the pointer to freed
memory is inserted into the vmlist leading to a a crash later in

This patch adds a check for ____vmalloc_area_node() failure within
__vmalloc_node_range.  It does not use "goto fail" as in the previous
error path as a warning was already displayed by __vmalloc_area_node()
before it called vfree in its failure path.

Credit goes to Luciano Chavez for doing all the real work of identifying
exactly where the problem was.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com>
Tested-by: Luciano Chavez <lnx1138@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [3.1.x+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agomm: Ensure that pfn_valid() is called once per pageblock when reserving pageblocks
Michal Hocko [Thu, 8 Dec 2011 22:34:27 +0000]
mm: Ensure that pfn_valid() is called once per pageblock when reserving pageblocks

setup_zone_migrate_reserve() expects that zone->start_pfn starts at
pageblock_nr_pages aligned pfn otherwise we could access beyond an
existing memblock resulting in the following panic if
CONFIG_HOLES_IN_ZONE is not configured and we do not check pfn_valid:

  IP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180
  *pdpt = 0000000000000000 *pde = f000ff53f000ff53
  Oops: 0000 [#1] SMP
  Pid: 1, comm: swapper Not tainted 3.0.7-0.7-pae #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
  EIP: 0060:[<c02d331d>] EFLAGS: 00010006 CPU: 0
  EIP is at setup_zone_migrate_reserve+0xcd/0x180
  EAX: 000c0000 EBX: f5801fc0 ECX: 000c0000 EDX: 00000000
  ESI: 000c01fe EDI: 000c01fe EBP: 00140000 ESP: f2475f58
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  Process swapper (pid: 1, ti=f2474000 task=f2472cd0 task.ti=f2474000)
  Call Trace:
  [<c02d389c>] __setup_per_zone_wmarks+0xec/0x160
  [<c02d3a1f>] setup_per_zone_wmarks+0xf/0x20
  [<c08a771c>] init_per_zone_wmark_min+0x27/0x86
  [<c020111b>] do_one_initcall+0x2b/0x160
  [<c086639d>] kernel_init+0xbe/0x157
  [<c05cae26>] kernel_thread_helper+0x6/0xd
  Code: a5 39 f5 89 f7 0f 46 fd 39 cf 76 40 8b 03 f6 c4 08 74 32 eb 91 90 89 c8 c1 e8 0e 0f be 80 80 2f 86 c0 8b 14 85 60 2f 86 c0 89 c8 <2b> 82 b4 12 00 00 c1 e0 05 03 82 ac 12 00 00 8b 00 f6 c4 08 0f
  EIP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180 SS:ESP 0068:f2475f58
  CR2: 00000000000012b4

We crashed in pageblock_is_reserved() when accessing pfn 0xc0000 because
highstart_pfn = 0x36ffe.

The issue was introduced in 3.0-rc1 by 6d3163ce ("mm: check if any page
in a pageblock is reserved before marking it MIGRATE_RESERVE").

Make sure that start_pfn is always aligned to pageblock_nr_pages to
ensure that pfn_valid s always called at the start of each pageblock.
Architectures with holes in pageblocks will be correctly handled by
pfn_valid_within in pageblock_is_reserved.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Dang Bo <bdang@vmware.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Arve Hjnnevg <arve@android.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> [3.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agomm/migrate.c: pair unlock_page() and lock_page() when migrating huge pages
Hillf Danton [Thu, 8 Dec 2011 22:34:20 +0000]
mm/migrate.c: pair unlock_page() and lock_page() when migrating huge pages

Avoid unlocking and unlocked page if we failed to lock it.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agothp: set compound tail page _count to zero
Youquan Song [Thu, 8 Dec 2011 22:34:18 +0000]
thp: set compound tail page _count to zero

Commit 70b50f94f1644 ("mm: thp: tail page refcounting fix") keeps all
page_tail->_count zero at all times.  But the current kernel does not
set page_tail->_count to zero if a 1GB page is utilized.  So when an
IOMMU 1GB page is used by KVM, it wil result in a kernel oops because a
tail page's _count does not equal zero.

  kernel BUG at include/linux/mm.h:386!
  invalid opcode: 0000 [#1] SMP
  Call Trace:
    ? trace_hardirqs_off+0xd/0xf
  RIP  gup_huge_pud+0xf2/0x159

Signed-off-by: Youquan Song <youquan.song@intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agothp: add compound tail page _mapcount when mapped
Youquan Song [Thu, 8 Dec 2011 22:34:16 +0000]
thp: add compound tail page _mapcount when mapped

With the 3.2-rc kernel, IOMMU 2M pages in KVM works.  But when I tried
to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
failed to be used.

The root cause is that 1GB page allocation calls gup_huge_pud() while 2M
page calls gup_huge_pmd.  If compound pages are used and the page is a
tail page, gup_huge_pmd() increases _mapcount to record tail page are
mapped while gup_huge_pud does not do that.

So when the mapped page is relesed, it will result in kernel oops
because the page is not marked mapped.

This patch add tail process for compound page in 1GB huge page which
keeps the same process as 2M page.

Reproduce like:
1. Add grub boot option: hugepagesz=1G hugepages=8
2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
-net none -device pci-assign,host=07:00.1

  kernel BUG at mm/swap.c:114!
  invalid opcode: 0000 [#1] SMP
  Call Trace:
  RIP  put_compound_page+0xd4/0x168

Signed-off-by: Youquan Song <youquan.song@intel.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agoprintk: avoid double lock acquire
Peter Zijlstra [Thu, 8 Dec 2011 22:34:13 +0000]
printk: avoid double lock acquire

Commit 4f2a8d3cf5e ("printk: Fix console_sem vs logbuf_lock unlock race")
introduced another silly bug where we would want to acquire an already
held lock.  Avoid this.

Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agomemcg: update maintainers
KAMEZAWA Hiroyuki [Thu, 8 Dec 2011 22:34:10 +0000]
memcg: update maintainers

More players joined to memory cgroup developments and Johannes' great work
changed internal design of memory cgroup dramatically.  And he will do
more works.  Michal Hokko did many bug fixes and know memory cgroup very
well.  Daisuke Nishimura helped us very much but he seems busy now.
Thanks to his works.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agodrivers/rtc/rtc-s3c.c: fix driver clock enable/disable balance issues
Jonghwan Choi [Thu, 8 Dec 2011 22:34:02 +0000]
drivers/rtc/rtc-s3c.c: fix driver clock enable/disable balance issues

If an error occurs after the clock is enabled, the enable/disable state
can become unbalanced.

Signed-off-by: Jonghwan Choi <jhbird.choi@samsung.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agoCREDITS: update Kees's expired fingerprint and fix details
Kees Cook [Thu, 8 Dec 2011 22:34:00 +0000]
CREDITS: update Kees's expired fingerprint and fix details

Small clean-up for my CREDITS entry; the GPG fingerprint was not up to
date, so I fixed other details at the same time too.

Signed-off-by: Kees Cook <kees@outflux.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agothp: reduce khugepaged freezing latency
Andrea Arcangeli [Thu, 8 Dec 2011 22:33:57 +0000]
thp: reduce khugepaged freezing latency

khugepaged can sometimes cause suspend to fail, requiring that the user
retry the suspend operation.

Use wait_event_freezable_timeout() instead of
schedule_timeout_interruptible() to avoid missing freezer wakeups.  A
try_to_freeze() would have been needed in the khugepaged_alloc_hugepage
tight loop too in case of the allocation failing repeatedly, and
wait_event_freezable_timeout will provide it too.

khugepaged would still freeze just fine by trying again the next minute
but it's better if it freezes immediately.

Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Jiri Slaby <jslaby@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: "Rafael J. Wysocki" <rjw@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agofs/proc/meminfo.c: fix compilation error
Claudio Scordino [Thu, 8 Dec 2011 22:33:56 +0000]
fs/proc/meminfo.c: fix compilation error

Fix the error message "directives may not be used inside a macro argument"
which appears when the kernel is compiled for the cris architecture.

Signed-off-by: Claudio Scordino <claudio@evidence.eu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agovmscan: use atomic-long for shrinker batching
Konstantin Khlebnikov [Thu, 8 Dec 2011 22:33:54 +0000]
vmscan: use atomic-long for shrinker batching

Use atomic-long operations instead of looping around cmpxchg().

[akpm@linux-foundation.org: massage atomic.h inclusions]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agovmscan: fix initial shrinker size handling
Konstantin Khlebnikov [Thu, 8 Dec 2011 22:33:51 +0000]
vmscan: fix initial shrinker size handling

A shrinker function can return -1, means that it cannot do anything
without a risk of deadlock.  For example prune_super() does this if it
cannot grab a superblock refrence, even if nr_to_scan=0.  Currently we
interpret this -1 as a ULONG_MAX size shrinker and evaluate `total_scan'
according to this.  So the next time around this shrinker can cause
really big pressure.  Let's skip such shrinkers instead.

Also make total_scan signed, otherwise the check (total_scan < 0) below
never works.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agoMAINTAINERS: Update amd-iommu F: patterns
Joe Perches [Fri, 9 Dec 2011 04:21:40 +0000]
MAINTAINERS: Update amd-iommu F: patterns

Commit 29b68415e335 ("x86: amd_iommu: move to drivers/iommu/")
moved the files, update the patterns.

CC: Ohad Ben-Cohen <ohad@wizery.com>
CC: Joerg Roedel <joerg.roedel@amd.com>

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

8 years agox86, efi: Calling __pa() with an ioremap()ed address is invalid
Matt Fleming [Fri, 18 Nov 2011 13:09:11 +0000]
x86, efi: Calling __pa() with an ioremap()ed address is invalid

If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set
in ->attribute we currently call set_memory_uc(), which in turn
calls __pa() on a potentially ioremap'd address.

On CONFIG_X86_32 this is invalid, resulting in the following
oops on some machines:

  BUG: unable to handle kernel paging request at f7f22280
  IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210

  Call Trace:
   [<c104f8ca>] ? page_is_ram+0x1a/0x40
   [<c1025aff>] reserve_memtype+0xdf/0x2f0
   [<c1024dc9>] set_memory_uc+0x49/0xa0
   [<c19334d0>] efi_enter_virtual_mode+0x1c2/0x3aa
   [<c19216d4>] start_kernel+0x291/0x2f2
   [<c19211c7>] ? loglevel+0x1b/0x1b
   [<c19210bf>] i386_start_kernel+0xbf/0xc8

A better approach to this problem is to map the memory region
with the correct attributes from the start, instead of modifying
it after the fact. The uncached case can be handled by
ioremap_nocache() and the cached by ioremap_cache().

Despite first impressions, it's not possible to use
ioremap_cache() to map all cached memory regions on
CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really
don't like being mapped into the vmalloc space, as detailed in
the following bug report,


Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA
regions are covered by the direct kernel mapping table on
CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI
regions via the direct kernel mapping with the initial call to
init_memory_mapping() in setup_arch(), whereas previously these
regions wouldn't be mapped if they were after the last E820_RAM
region until efi_ioremap() was called. Doing it this way allows
us to delete efi_ioremap() completely.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Huang Ying <huang.ying.caritas@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

8 years agocifs: check for NULL last_entry before calling cifs_save_resume_key
Jeff Layton [Fri, 2 Dec 2011 01:23:34 +0000]
cifs: check for NULL last_entry before calling cifs_save_resume_key

Prior to commit eaf35b1, cifs_save_resume_key had some NULL pointer
checks at the top. It turns out that at least one of those NULL
pointer checks is needed after all.

When the LastNameOffset in a FIND reply appears to be beyond the end of
the buffer, CIFSFindFirst and CIFSFindNext will set srch_inf.last_entry
to NULL. Since eaf35b1, the code will now oops in this situation.

Fix this by having the callers check for a NULL last entry pointer
before calling cifs_save_resume_key. No change is needed for the
call site in cifs_readdir as it's not reachable with a NULL
current_entry pointer.

This should fix:


Cc: stable@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Reported-by: Adam G. Metzler <adamgmetzler@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>

8 years agocifs: attempt to freeze while looping on a receive attempt
Jeff Layton [Fri, 2 Dec 2011 01:22:41 +0000]
cifs: attempt to freeze while looping on a receive attempt

In the recent overhaul of the demultiplex thread receive path, I
neglected to ensure that we attempt to freeze on each pass through the
receive loop.

Reported-and-Tested-by: Woody Suwalski <terraluna977@gmail.com>
Reported-and-Tested-by: Adam Williamson <awilliam@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>

8 years agocifs: Fix sparse warning when calling cifs_strtoUCS
Steve French [Thu, 10 Nov 2011 18:48:20 +0000]
cifs: Fix sparse warning when calling cifs_strtoUCS

Fix sparse endian check warning while calling cifs_strtoUCS

CHECK   fs/cifs/smbencrypt.c
fs/cifs/smbencrypt.c:216:37: warning: incorrect type in argument 1
(different base types)
fs/cifs/smbencrypt.c:216:37:    expected restricted __le16 [usertype] *<noident>
fs/cifs/smbencrypt.c:216:37:    got unsigned short *<noident>

Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com

8 years agoCIFS: Add descriptions to the brlock cache functions
Pavel Shilovsky [Mon, 7 Nov 2011 13:11:24 +0000]
CIFS: Add descriptions to the brlock cache functions

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <sfrench@us.ibm.com>

8 years agomd: raid5 crash during degradation
Adam Kwolek [Fri, 9 Dec 2011 03:26:11 +0000]
md: raid5 crash during degradation

NULL pointer access causes crash in raid5 module.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

8 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 8 Dec 2011 21:21:28 +0000]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  alarmtimers: Fix time comparison
  ptp: Fix clock_getres() implementation

8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux...
Linus Torvalds [Thu, 8 Dec 2011 21:18:59 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: drop spin lock when memory alloc fails
  Btrfs: check if the to-be-added device is writable
  Btrfs: try cluster but don't advance in search list
  Btrfs: try to allocate from cluster even at LOOP_NO_EMPTY_SIZE

8 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Linus Torvalds [Thu, 8 Dec 2011 21:18:38 +0000]
Merge branch 'fixes' of git://git./linux/kernel/git/arm/arm-soc

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits)
  ARM: sa1100: fix build error
  ARM: OMAP1: recalculate loops per jiffy after dpll1 reprogram
  ARM: davinci: dm365 evm: align nand partition table to u-boot
  ARM: davinci: da850 evm: change audio edma event queue to EVENTQ_0
  ARM: davinci: dm646x evm: wrong register used in setup_vpif_input_channel_mode
  ARM: davinci: dm646x does not have a DSP domain
  ARM: davinci: psc: fix incorrect offsets
  ARM: davinci: psc: fix incorrect mask
  ARM: mx28: LRADC macro rename
  arm: mx23: recognise stmp378x as mx23
  ARM: mxs: fix machines' initializers order
  ARM: mxs/tx28: add __initconst for fec pdata
  ARM: S3C64XX: Staticise s3c6400_sysclass
  ARM: S3C64XX: Add linux/export.h to dev-spi.c
  ARM: S3C64XX: Remove extern from definition of framebuffer setup call
  MAINTAINERS: Extend Samsung patterns to cover SPI and ASoC drivers
  MAINTAINERS: Add linux-samsung-soc mailing list for Samsung
  ARM: CSR: PM: fix build error due to undeclared 'THIS_MODULE'
  ARM: CSR: fix build error due to new mdesc->dma_zone_size

8 years agoTOMOYO: Fix pathname handling of disconnected paths.
Tetsuo Handa [Thu, 8 Dec 2011 12:24:06 +0000]
TOMOYO: Fix pathname handling of disconnected paths.

Current tomoyo_realpath_from_path() implementation returns strange pathname
when calculating pathname of a file which belongs to lazy unmounted tree.
Use local pathname rather than strange absolute pathname in that case.

Also, this patch fixes a regression by commit 02125a82 "fix apparmor
dereferencing potentially freed dentry, sanitize __d_path() API".

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agox86, hpet: Immediately disable HPET timer 1 if rtc irq is masked
Mark Langsdorf [Fri, 18 Nov 2011 15:33:06 +0000]
x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked

When HPET is operating in RTC mode, the TN_ENABLE bit on timer1
controls whether the HPET or the RTC delivers interrupts to irq8. When
the system goes into suspend, the RTC driver sends a signal to the
HPET driver so that the HPET releases control of irq8, allowing the
RTC to wake the system from suspend. The switchover is accomplished by
a write to the HPET configuration registers which currently only
occurs while servicing the HPET interrupt.

On some systems, I have seen the system suspend before an HPET
interrupt occurs, preventing the write to the HPET configuration
register and leaving the HPET in control of the irq8. As the HPET is
not active during suspend, it does not generate a wake signal and RTC
alarms do not work.

This patch forces the HPET driver to immediately transfer control of
the irq8 channel to the RTC instead of waiting until the next
interrupt event.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Link: http://lkml.kernel.org/r/20111118153306.GB16319@alberich.amd.com
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org

8 years agoMerge branch 'fixes' of git://github.com/hzhuang1/linux into fixes
Arnd Bergmann [Thu, 8 Dec 2011 15:52:23 +0000]
Merge branch 'fixes' of git://github.com/hzhuang1/linux into fixes

8 years agoBtrfs: drop spin lock when memory alloc fails
Liu Bo [Thu, 8 Dec 2011 01:08:40 +0000]
Btrfs: drop spin lock when memory alloc fails

Drop spin lock in convert_extent_bit() when memory alloc fails,
otherwise, it will be a deadlock.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

8 years agoBtrfs: check if the to-be-added device is writable
Li Zefan [Thu, 8 Dec 2011 01:08:40 +0000]
Btrfs: check if the to-be-added device is writable

If we call ioctl(BTRFS_IOC_ADD_DEV) directly, we'll succeed in adding
a readonly device to a btrfs filesystem, and btrfs will write to
that device, emitting kernel errors:

[ 3109.833692] lost page write due to I/O error on loop2
[ 3109.833720] lost page write due to I/O error on loop2

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

8 years agoBtrfs: try cluster but don't advance in search list
Alexandre Oliva [Thu, 8 Dec 2011 01:08:40 +0000]
Btrfs: try cluster but don't advance in search list

When we find an existing cluster, we switch to its block group as the
current block group, possibly skipping multiple blocks in the process.
Furthermore, under heavy contention, multiple threads may fail to
allocate from a cluster and then release just-created clusters just to
proceed to create new ones in a different block group.

This patch tries to allocate from an existing cluster regardless of its
block group, and doesn't switch to that group, instead proceeding to
try to allocate a cluster from the group it was iterating before the

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

8 years agoARM: sa1100: fix build error
Jett.Zhou [Wed, 30 Nov 2011 06:32:54 +0000]
ARM: sa1100: fix build error

arm-eabi-4.4.3-ld:--defsym zreladdr=: syntax error
make[2]: *** [arch/arm/boot/compressed/vmlinux] Error 1
make[1]: *** [arch/arm/boot/compressed/vmlinux] Error 2
make: *** [uImage] Error 2

Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Signed-off-by: Jett.Zhou <jtzhou@marvell.com>

8 years agomd/raid5: never wait for bad-block acks on failed device.
NeilBrown [Thu, 8 Dec 2011 05:27:57 +0000]
md/raid5: never wait for bad-block acks on failed device.

Once a device is failed we really want to completely ignore it.
It should go away soon anyway.

In particular the presence of bad blocks on it should not cause us to
block as we won't be trying to write there anyway.

So as soon as we can check if a device is Faulty, do so and pretend
that it is already gone if it is Faulty.

Signed-off-by: NeilBrown <neilb@suse.de>

8 years agomd: ensure new badblocks are handled promptly.
NeilBrown [Thu, 8 Dec 2011 05:26:08 +0000]
md: ensure new badblocks are handled promptly.

When we mark blocks as bad we need them to be acknowledged by the
metadata handler promptly.

For an in-kernel metadata handler that was already being done.  But
for an external metadata handler we need to alert it of the change by
sending a notification through the sysfs file.  This adds that

Signed-off-by: NeilBrown <neilb@suse.de>

8 years agomd: bad blocks shouldn't cause a Blocked status on a Faulty device.
NeilBrown [Thu, 8 Dec 2011 05:22:48 +0000]
md: bad blocks shouldn't cause a Blocked status on a Faulty device.

Once a device is marked Faulty the badblocks - whether acknowledged or
not - become irrelevant.  So they shouldn't cause the device to be
marked as Blocked.

Without this patch, a process might write "-blocked" to clear the
Blocked status, but while that will correctly fail the device, it
won't remove the apparent 'blocked' status.

Signed-off-by: NeilBrown <neilb@suse.de>

8 years agomd: take a reference to mddev during sysfs access.
NeilBrown [Thu, 8 Dec 2011 04:49:46 +0000]
md: take a reference to mddev during sysfs access.

When we are accessing an mddev via sysfs we know that the
mddev cannot disappear because it has an embedded kobj which
is refcounted by sysfs.
And we also take the mddev_lock.
However this is not enough.

The final mddev_put could have been called and the
mddev_delayed_delete is waiting for sysfs to let go so it can destroy
the kobj and mddev.
In this state there are a lot of changes that should not be attempted.

To to guard against this we:
 - initialise mddev->all_mddevs in on last put so the state can be
   easily detected.
 - in md_attr_show and md_attr_store, check ->all_mddevs under
   all_mddevs_lock and mddev_get the mddev if it still appears to
   be active.

This means that if we get to sysfs as the mddev is being deleted we
will get -EBUSY.

rdev_attr_store and rdev_attr_show are similar but already have
sufficient protection.  They check that rdev->mddev still points to
mddev after taking mddev_lock.  As this is cleared  before delayed
removal which can only be requested under the mddev_lock, this
ensure the rdev and mddev are still alive.

Signed-off-by: NeilBrown <neilb@suse.de>

8 years agomd: refine interpretation of "hold_active == UNTIL_IOCTL".
NeilBrown [Thu, 8 Dec 2011 04:49:12 +0000]
md: refine interpretation of "hold_active == UNTIL_IOCTL".

We like md devices to disappear when they really are not needed.
However it is not possible to tell from the current state whether it
is needed or not.  We can only tell from recent history of changes.

In particular immediately after we create an md device it looks very
similar to immediately after we have finished with it.

So we always preserve a newly created md device until something
significant happens.  This state is stored in 'hold_active'.

The normal case is to keep it until an ioctl happens, as that will
normally either activate it, or explicitly de-activate it.  If it
doesn't then it was probably created by mistake and it is now time to
get rid of it.

We can also modify an array via sysfs (instead of via ioctl) and we
currently treat any change via sysfs like an ioctl as a sign that if
it now isn't more active, it should be destroyed.
However this is not appropriate as changes made via sysfs are more
gradual so we should look for a more definitive change.

So this patch only clears 'hold_active' from UNTIL_IOCTL to clear when
the array_state is changed via sysfs.  Other changes via sysfs
are ignored.

Signed-off-by: NeilBrown <neilb@suse.de>

8 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux...
Olof Johansson [Thu, 8 Dec 2011 04:36:27 +0000]
Merge branch 'fixes' of git://git./linux/kernel/git/tmlind/linux-omap into fixes

8 years agoMerge branch '3.2-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab...
Linus Torvalds [Thu, 8 Dec 2011 02:18:27 +0000]
Merge branch '3.2-rc-fixes' of git://git./linux/kernel/git/nab/target-pending

* '3.2-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (25 commits)
  iscsi-target: Fix hex2bin warn_unused compile message
  target: Don't return an error if disabling unsupported features
  target/rd: fix or rewrite the copy routine
  target/rd: simplify the page/offset computation
  target: remove the unused se_dev_list
  target/file: walk properly over sg list
  target: remove unused struct fields
  target: Fix page length in emulated INQUIRY VPD page 86h
  target: Handle 0 correctly in transport_get_sectors_6()
  target: Don't return an error status for 0-length READ and WRITE
  iscsi-target: Use kmemdup rather than duplicating its implementation
  iscsi-target: Add missing F_BIT for iscsi_tm_rsp
  iscsi-target: Fix residual count hanlding + remove iscsi_cmd->residual_count
  target: Reject SCSI data overflow for fabrics using transport_generic_map_mem_to_cmd
  target: remove the unused t_task_pt_sgl and t_task_pt_sgl_num se_cmd fields
  target: remove the t_tasks_bidi se_cmd field
  target: remove the t_tasks_fua se_cmd field
  target: remove the se_ordered_node se_cmd field
  target: remove the se_obj_ptr and se_orig_obj_ptr se_cmd fields
  target: Drop config_item_name usage in fabric TFO->free_wwn()

8 years agoBtrfs: try to allocate from cluster even at LOOP_NO_EMPTY_SIZE
Alexandre Oliva [Thu, 8 Dec 2011 00:50:42 +0000]
Btrfs: try to allocate from cluster even at LOOP_NO_EMPTY_SIZE

If we reach LOOP_NO_EMPTY_SIZE, we won't even try to use a cluster that
others might have set up.  Odds are that there won't be one, but if
someone else succeeded in setting it up, we might as well use it, even
if we don't try to set up a cluster again.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

8 years agoMerge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
Linus Torvalds [Thu, 8 Dec 2011 00:13:54 +0000]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: fix the logspace waiting algorithm
  xfs: fix nfs export of 64-bit inodes numbers on 32-bit kernels
  xfs: fix allocation length overflow in xfs_bmapi_write()

8 years agoMerge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Thu, 8 Dec 2011 00:12:29 +0000]
Merge branch 'pm-fixes' of git://git./linux/kernel/git/rafael/linux-pm

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / Driver core: leave runtime PM enabled during system shutdown

8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux...
Linus Torvalds [Thu, 8 Dec 2011 00:12:14 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/geert/linux-m68k

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: Wire up process_vm_{read,write}v

8 years agoMerge branch 'perf/urgent' of git://github.com/acmel/linux into perf/urgent
Ingo Molnar [Wed, 7 Dec 2011 22:23:44 +0000]
Merge branch 'perf/urgent' of git://github.com/acmel/linux into perf/urgent

8 years agoPM / Driver core: leave runtime PM enabled during system shutdown
Alan Stern [Tue, 6 Dec 2011 22:24:52 +0000]
PM / Driver core: leave runtime PM enabled during system shutdown

Disabling all runtime PM during system shutdown turns out not to be a
good idea, because some devices may need to be woken up from a
low-power state at that time.

The whole point of disabling runtime PM for system shutdown was to
prevent untimely runtime-suspend method calls.  This patch (as1504)
accomplishes the same result by incrementing the usage count for each
device and waiting for ongoing runtime-PM callbacks to finish.  This
is what we already do during system suspend and hibernation, which
makes sense since the shutdown method is pretty much a legacy analog
of the pm->poweroff method.

This fixes a recent regression on some OMAP systems introduced by
commit af8db1508f2c9f3b6e633e2d2d906c6557c617f9 (PM / driver core:
disable device's runtime PM during shutdown).

Reported-and-tested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

8 years agospi/gpio: fix section mismatch warning
Manuel Lauss [Sat, 19 Nov 2011 12:08:10 +0000]
spi/gpio: fix section mismatch warning

The function __devinit spi_gpio_probe() references
a function __init spi_gpio_alloc.isra.4().
If spi_gpio_alloc.isra.4 is only used by spi_gpio_probe then
annotate spi_gpio_alloc.isra.4 with a matching annotation.

[wsa: fix spi_gpio_request(), too]

Signed-off-by: Manuel Lauss <manuel.lauss@googlemail.com>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>

8 years agospi/fsl-espi: disable CONFIG_SPI_FSL_ESPI=m build
Jiri Slaby [Wed, 7 Dec 2011 20:18:16 +0000]
spi/fsl-espi: disable CONFIG_SPI_FSL_ESPI=m build

When spi_fsl_espi is chosen to be built as a module, there is a build
error because we test only CONFIG_SPI_FSL_ESPI in declaration of
struct mpc8xxx_spi in drivers/spi/spi_fsl_lib.h. Also some called
functions are not exported.

So we forbid CONFIG_SPI_FSL_ESPI to be tristate here.

The error looks like:
drivers/spi/spi_fsl_espi.c: In function 'fsl_espi_bufs':
drivers/spi/spi_fsl_espi.c:232: error: 'struct mpc8xxx_spi' has no member named 'len'

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>

8 years agospi/nuc900: Include linux/module.h
Axel Lin [Thu, 24 Nov 2011 03:10:51 +0000]
spi/nuc900: Include linux/module.h

Include linux/module.h to fix below build error:

CC drivers/spi/spi-nuc900.o
drivers/spi/spi-nuc900.c:484: error: 'THIS_MODULE' undeclared here (not in a function)
drivers/spi/spi-nuc900.c:489: error: expected declaration specifiers or '...' before string constant
drivers/spi/spi-nuc900.c:489: warning: data definition has no type or storage class
drivers/spi/spi-nuc900.c:489: warning: type defaults to 'int' in declaration of 'MODULE_AUTHOR'
drivers/spi/spi-nuc900.c:489: warning: function declaration isn't a prototype
drivers/spi/spi-nuc900.c:490: error: expected declaration specifiers or '...' before string constant
drivers/spi/spi-nuc900.c:490: warning: data definition has no type or storage class
drivers/spi/spi-nuc900.c:490: warning: type defaults to 'int' in declaration of 'MODULE_DESCRIPTION'
drivers/spi/spi-nuc900.c:490: warning: function declaration isn't a prototype
drivers/spi/spi-nuc900.c:491: error: expected declaration specifiers or '...' before string constant
drivers/spi/spi-nuc900.c:491: warning: data definition has no type or storage class
drivers/spi/spi-nuc900.c:491: warning: type defaults to 'int' in declaration of 'MODULE_LICENSE'
drivers/spi/spi-nuc900.c:491: warning: function declaration isn't a prototype
drivers/spi/spi-nuc900.c:492: error: expected declaration specifiers or '...' before string constant
drivers/spi/spi-nuc900.c:492: warning: data definition has no type or storage class
drivers/spi/spi-nuc900.c:492: warning: type defaults to 'int' in declaration of 'MODULE_ALIAS'
drivers/spi/spi-nuc900.c:492: warning: function declaration isn't a prototype
make[2]: *** [drivers/spi/spi-nuc900.o] Error 1
make[1]: *** [drivers/spi] Error 2
make: *** [drivers] Error 2

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>

8 years agospi/ath79: fix compile error due to missing include
Gabor Juhos [Wed, 16 Nov 2011 19:01:43 +0000]
spi/ath79: fix compile error due to missing include

Whithout including 'linux/module.h' spi-ath79 driver fails to compile
with the these errors:

drivers/spi/spi-ath79.c:273:12: error: 'THIS_MODULE' undeclared here (not in a function)
drivers/spi/spi-ath79.c:278:20: error: expected declaration specifiers or '...' before string constant
drivers/spi/spi-ath79.c:278:1: warning: data definition has no type or storage class
drivers/spi/spi-ath79.c:278:1: warning: type defaults to 'int' in declaration of 'MODULE_DESCRIPTION'
drivers/spi/spi-ath79.c:278:20: warning: function declaration isn't a prototype
drivers/spi/spi-ath79.c:279:15: error: expected declaration specifiers or '...' before string constant
drivers/spi/spi-ath79.c:279:1: warning: data definition has no type or storage class
drivers/spi/spi-ath79.c:279:1: warning: type defaults to 'int' in declaration of 'MODULE_AUTHOR'
drivers/spi/spi-ath79.c:279:15: warning: function declaration isn't a prototype
drivers/spi/spi-ath79.c:280:16: error: expected declaration specifiers or '...' before string constant
drivers/spi/spi-ath79.c:280:1: warning: data definition has no type or storage class
drivers/spi/spi-ath79.c:280:1: warning: type defaults to 'int' in declaration of 'MODULE_LICENSE'
drivers/spi/spi-ath79.c:280:16: warning: function declaration isn't a prototype
drivers/spi/spi-ath79.c:281:14: error: expected declaration specifiers or '...' before string constant
drivers/spi/spi-ath79.c:281:1: warning: data definition has no type or storage class
drivers/spi/spi-ath79.c:281:1: warning: type defaults to 'int' in declaration of 'MODULE_ALIAS'
drivers/spi/spi-ath79.c:281:14: warning: function declaration isn't a prototype

Signed-off-by: Gabor Juhos <juhosg@openwrt.org>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>

8 years agoof/irq: Get rid of NO_IRQ usage
Anton Vorontsov [Tue, 6 Dec 2011 23:16:26 +0000]
of/irq: Get rid of NO_IRQ usage

PPC32/64 defines NO_IRQ to zero, so no problems expected.
ARM defines NO_IRQ to -1, but OF code relies on IRQ domains support,
which returns correct ('0') value in 'no irq' case. So everything
should be fine.

Other arches might break if some of their OF drivers rely on NO_IRQ
being not 0. If so, the drivers must be fixed, finally.

[ Rob Herring points out that microblaze should be fixed, and has posted
  a patch for testing for that.   - Linus ]

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8 years agoALSA: hda/realtek - Fix lost speaker volume controls
Takashi Iwai [Wed, 7 Dec 2011 16:20:30 +0000]
ALSA: hda/realtek - Fix lost speaker volume controls

When there are the same or more number of HP pins are available, HP pins
are used as the primary outputs instead of the speaker pins.  But, in
some cases (especially with ALC663 & co), some DACs are available only
with a later pin and it's assigned to a speaker, and since the driver
parses the pins from the lower NID, such a DAC was skipped eventually
without assignments.  This resulted in a regression, the missing speaker
volume control in the new parser.

As a workaround for this, now the driver retries the pin->DAC mapping
again after restoring the speaker-pins as primary.  This is still an ad
hoc fix, but it works so far for most of Realtek codecs.

Signed-off-by: Takashi Iwai <tiwai@suse.de>

8 years agoALSA: hda/realtek - Create "Bass Speaker" for two speaker pins
Takashi Iwai [Wed, 7 Dec 2011 16:14:20 +0000]
ALSA: hda/realtek - Create "Bass Speaker" for two speaker pins

On systems with two speaker pins, the secondary speaker pin is mostly
assigned to a bass speaker instead of a surround.  Thus it makes more
sense to rename the control properly.

Signed-off-by: Takashi Iwai <tiwai@suse.de>

8 years agoALSA: hda/realtek - Don't create extra controls with channel suffix
Takashi Iwai [Wed, 7 Dec 2011 15:55:19 +0000]
ALSA: hda/realtek - Don't create extra controls with channel suffix

The multiple headphone or speaker pins are usually provided to
output the same stream unlike line-out jacks (which are supposed
to be multi-channel surrounds).  Thus giving a mixer name like
"Headphone Surround" is rather confusing.  Instead, when multiple
headphone volumes are available, use index with the same "Headphone"

Signed-off-by: Takashi Iwai <tiwai@suse.de>

8 years agoMerge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Wed, 7 Dec 2011 16:20:01 +0000]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  vmwgfx: Use kcalloc instead of kzalloc to allocate array
  drm/i915: fix infinite recursion on unbind due to ilk vt-d w/a
  drm/radeon/kms: fix return type for radeon_encoder_get_dp_bridge_encoder_id

8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Wed, 7 Dec 2011 16:14:42 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API

8 years agoperf: Do no try to schedule task events if there are none
Gleb Natapov [Tue, 22 Nov 2011 14:08:21 +0000]
perf: Do no try to schedule task events if there are none

perf_event_sched_in() shouldn't try to schedule task events if there
are none otherwise task's ctx->is_active will be set and will not be
cleared during sched_out. This will prevent newly added events from
being scheduled into the task context.

Fixes a boo-boo in commit 1d5f003f5a9 ("perf: Do not set task_ctx
pointer in cpuctx if there are no events in the context").

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111122140821.GF2557@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

8 years agovmwgfx: Use kcalloc instead of kzalloc to allocate array
Thomas Meyer [Tue, 29 Nov 2011 21:08:00 +0000]
vmwgfx: Use kcalloc instead of kzalloc to allocate array

The advantage of kcalloc is, that will prevent integer overflows which could
result from the multiplication of number of elements and size and it is also
a bit nicer to read.

The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

8 years agodrm/i915: fix infinite recursion on unbind due to ilk vt-d w/a
Daniel Vetter [Tue, 6 Dec 2011 11:12:33 +0000]
drm/i915: fix infinite recursion on unbind due to ilk vt-d w/a

The recursion loop goes retire_requests->unbind->gpu_idle->retire_reqeusts.

Every time we go through this we need a
- active object that can be retired
- and there are no other references to that object than the one from
  the active list, so that it gets unbound and freed immediately.
Otherwise the recursion stops. So the recursion is only limited by the
number of objects that fit these requirements sitting in the active list
any time retire_request is called.

Issue exercised by tests/gem_unref_active_buffers from i-g-t.

There's been a decent bikeshed discussion whether it wouldn't be
better to pass around a flag, but imo this is o.k. for such a limited
case that only supports a w/a.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42180

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson>
[ickle- we built better bikesheds, but this keeps the rain off for now]
Tested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

8 years agodrm/radeon/kms: fix return type for radeon_encoder_get_dp_bridge_encoder_id
Alex Deucher [Fri, 2 Dec 2011 23:15:27 +0000]
drm/radeon/kms: fix return type for radeon_encoder_get_dp_bridge_encoder_id

Seems like something got mis-merged here.

Noticed by kallisti5 on IRC.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

8 years agoMerge branch 'samsung-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene...
Arnd Bergmann [Wed, 7 Dec 2011 10:27:55 +0000]
Merge branch 'samsung-fixes-2' of git://git./linux/kernel/git/kgene/linux-samsung into fixes

8 years agofix apparmor dereferencing potentially freed dentry, sanitize __d_path() API
Al Viro [Mon, 5 Dec 2011 13:43:34 +0000]
fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API

__d_path() API is asking for trouble and in case of apparmor d_namespace_path()
getting just that.  The root cause is that when __d_path() misses the root
it had been told to look for, it stores the location of the most remote ancestor
in *root.  Without grabbing references.  Sure, at the moment of call it had
been pinned down by what we have in *path.  And if we raced with umount -l, we
could have very well stopped at vfsmount/dentry that got freed as soon as
prepend_path() dropped vfsmount_lock.

It is safe to compare these pointers with pre-existing (and known to be still
alive) vfsmount and dentry, as long as all we are asking is "is it the same
address?".  Dereferencing is not safe and apparmor ended up stepping into
that.  d_namespace_path() really wants to examine the place where we stopped,
even if it's not connected to our namespace.  As the result, it looked
at ->d_sb->s_magic of a dentry that might've been already freed by that point.
All other callers had been careful enough to avoid that, but it's really
a bad interface - it invites that kind of trouble.

The fix is fairly straightforward, even though it's bigger than I'd like:
* prepend_path() root argument becomes const.
* __d_path() is never called with NULL/NULL root.  It was a kludge
to start with.  Instead, we have an explicit function - d_absolute_root().
Same as __d_path(), except that it doesn't get root passed and stops where
it stops.  apparmor and tomoyo are using it.
* __d_path() returns NULL on path outside of root.  The main
caller is show_mountinfo() and that's precisely what we pass root for - to
skip those outside chroot jail.  Those who don't want that can (and do)
use d_path().
* __d_path() root argument becomes const.  Everyone agrees, I hope.
* apparmor does *NOT* try to use __d_path() or any of its variants
when it sees that path->mnt is an internal vfsmount.  In that case it's
definitely not mounted anywhere and dentry_path() is exactly what we want
there.  Handling of sysctl()-triggered weirdness is moved to that place.
* if apparmor is asked to do pathname relative to chroot jail
and __d_path() tells it we it's not in that jail, the sucker just calls
d_absolute_path() instead.  That's the other remaining caller of __d_path(),
        * seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway -
the normal seq_file logics will take care of growing the buffer and redoing
the call of ->show() just fine).  However, if it gets path not reachable
from root, it returns SEQ_SKIP.  The only caller adjusted (i.e. stopped
ignoring the return value as it used to do).

Reviewed-by: John Johansen <john.johansen@canonical.com>
ACKed-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org