9 years agorcu: Add expedited grace-period support for preemptible RCU
Paul E. McKenney [Wed, 2 Dec 2009 20:10:15 +0000]
rcu: Add expedited grace-period support for preemptible RCU

Implement an synchronize_rcu_expedited() for preemptible RCU
that actually is expedited.  This uses
synchronize_sched_expedited() to force all threads currently
running in a preemptible-RCU read-side critical section onto the
appropriate ->blocked_tasks[] list, then takes a snapshot of all
of these lists and waits for them to drain.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1259784616158-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Enable fourth level of TREE_RCU hierarchy
Paul E. McKenney [Wed, 2 Dec 2009 20:10:14 +0000]
rcu: Enable fourth level of TREE_RCU hierarchy

Enable a fourth level of rcu_node hierarchy for TREE_RCU and
TREE_PREEMPT_RCU.  This is for stress-testing and experiemental
purposes only, although in theory this would enable 16,777,216
CPUs on 64-bit systems, though only 1,048,576 CPUs on 32-bit
systems. Normal experimental use of this fourth level will
normally set CONFIG_RCU_FANOUT=2, requiring a 16-CPU system,
though the more adventurous (and more fortunate) experimenters
may wish to chose CONFIG_RCU_FANOUT=3 for 81-CPU systems or even
CONFIG_RCU_FANOUT=4 for 256-CPU systems.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12597846161257-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Rename "quiet" functions
Paul E. McKenney [Wed, 2 Dec 2009 20:10:13 +0000]
rcu: Rename "quiet" functions

The number of "quiet" functions has grown recently, and the
names are no longer very descriptive.  The point of all of these
functions is to do some portion of the task of reporting a
quiescent state, so rename them accordingly:

o cpu_quiet() becomes rcu_report_qs_rdp(), which reports a
quiescent state to the per-CPU rcu_data structure.  If this
turns out to be a new quiescent state for this grace period,
then rcu_report_qs_rnp() will be invoked to propagate the
quiescent state up the rcu_node hierarchy.

o cpu_quiet_msk() becomes rcu_report_qs_rnp(), which reports
a quiescent state for a given CPU (or possibly a set of CPUs)
up the rcu_node hierarchy.

o cpu_quiet_msk_finish() becomes rcu_report_qs_rsp(), which
reports a full set of quiescent states to the global rcu_state
structure.

o task_quiet() becomes rcu_report_unblock_qs_rnp(), which reports
a quiescent state due to a task exiting an RCU read-side critical
section that had previously blocked in that same critical section.
As indicated by the new name, this type of quiescent state is
reported up the rcu_node hierarchy (using rcu_report_qs_rnp()
to do so).

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12597846163698-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Re-arrange code to reduce #ifdef pain
Paul E. McKenney [Sun, 22 Nov 2009 16:53:50 +0000]
rcu: Re-arrange code to reduce #ifdef pain

Remove #ifdefs from kernel/rcupdate.c and
include/linux/rcupdate.h by moving code to
include/linux/rcutiny.h, include/linux/rcutree.h, and
kernel/rcutree.c.

Also remove some definitions that are no longer used.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1258908830885-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Eliminate unneeded function wrapping
Paul E. McKenney [Sun, 22 Nov 2009 16:53:49 +0000]
rcu: Eliminate unneeded function wrapping

The functions rcu_init() is a wrapper for __rcu_init(), and also
sets up the CPU-hotplug notifier for rcu_barrier_cpu_hotplug().
But TINY_RCU doesn't need CPU-hotplug notification, and the
rcu_barrier_cpu_hotplug() is a simple wrapper for
rcu_cpu_notify().

So push rcu_init() out to kernel/rcutree.c and kernel/rcutiny.c
and get rid of the wrapper function rcu_barrier_cpu_hotplug().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12589088302320-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Fix grace-period-stall bug on large systems with CPU hotplug
Paul E. McKenney [Sun, 22 Nov 2009 16:53:48 +0000]
rcu: Fix grace-period-stall bug on large systems with CPU hotplug

When the last CPU of a given leaf rcu_node structure goes
offline, all of the tasks queued on that leaf rcu_node structure
(due to having blocked in their current RCU read-side critical
sections) are requeued onto the root rcu_node structure.  This
requeuing is carried out by rcu_preempt_offline_tasks().
However, it is possible that these queued tasks are the only
thing preventing the leaf rcu_node structure from reporting a
quiescent state up the rcu_node hierarchy.  Unfortunately, the
old code would fail to do this reporting, resulting in a
grace-period stall given the following sequence of events:

1. Kernel built for more than 32 CPUs on 32-bit systems or for more
than 64 CPUs on 64-bit systems, so that there is more than one
rcu_node structure.  (Or CONFIG_RCU_FANOUT is artificially set
to a number smaller than CONFIG_NR_CPUS.)

2. The kernel is built with CONFIG_TREE_PREEMPT_RCU.

3. A task running on a CPU associated with a given leaf rcu_node
structure blocks while in an RCU read-side critical section
-and- that CPU has not yet passed through a quiescent state
for the current RCU grace period.  This will cause the task
to be queued on the leaf rcu_node's blocked_tasks[] array, in
particular, on the element of this array corresponding to the
current grace period.

4. Each of the remaining CPUs corresponding to this same leaf rcu_node
structure pass through a quiescent state.  However, the task is
still in its RCU read-side critical section, so these quiescent
states cannot be reported further up the rcu_node hierarchy.
Nevertheless, all bits in the leaf rcu_node structure's ->qsmask
field are now zero.

5. Each of the remaining CPUs go offline.  (The events in step
#4 and #5 can happen in any order as long as each CPU passes
through a quiescent state before going offline.)

6. When the last CPU goes offline, __rcu_offline_cpu() will invoke
rcu_preempt_offline_tasks(), which will move the task to the
root rcu_node structure, but without reporting a quiescent state
up the rcu_node hierarchy (and this failure to report a quiescent
state is the bug).

But because this leaf rcu_node structure's ->qsmask field is
already zero and its ->block_tasks[] entries are all empty,
force_quiescent_state() will skip this rcu_node structure.

Therefore, grace periods are now hung.

This patch abstracts some code out of rcu_read_unlock_special(),
calling the result task_quiet() by analogy with cpu_quiet(), and
invokes task_quiet() from both rcu_read_lock_special() and
__rcu_offline_cpu().  Invoking task_quiet() from
__rcu_offline_cpu() reports the quiescent state up the rcu_node
hierarchy, fixing the bug.  This ends up requiring a separate
lock_class_key per level of the rcu_node hierarchy, which this
patch also provides.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12589088301770-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Eliminate __rcu_pending() false positives
Paul E. McKenney [Sat, 14 Nov 2009 03:51:39 +0000]
rcu: Eliminate __rcu_pending() false positives

Now that there are both ->gpnum and ->completed fields in the
rcu_node structure, __rcu_pending() should check rdp->gpnum and
rdp->completed against rnp->gpnum and rdp->completed, respectively,
instead of the prior comparison against the rcu_state fields
rsp->gpnum and rsp->completed.

Given the old comparison, __rcu_pending() could return 1, resulting
in a needless raise_softirq(RCU_SOFTIRQ).  This useless work would
happen if RCU responded to a scheduling-clock interrupt after the
rcu_state fields had been updated, but before the rcu_node fields
had been updated.

Changing the comparison from the rcu_state fields to the rcu_node
fields prevents this useless work from happening.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12581706991966-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Further cleanups of use of lastcomp
Paul E. McKenney [Sat, 14 Nov 2009 03:51:38 +0000]
rcu: Further cleanups of use of lastcomp

Now that a copy of the rsp->completed flag is available in all
rcu_node structures, make full use of it.  It is still
legitimate to access rsp->completed while holding the root
rcu_node structure's lock, however.

Also, tighten up force_quiescent_state()'s checks for end of
current grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1258170699933-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Simplify association of forced quiescent states with grace periods
Paul E. McKenney [Fri, 13 Nov 2009 06:35:04 +0000]
rcu: Simplify association of forced quiescent states with grace periods

The force_quiescent_state() function also took a snapshot
of the ->completed field, which was as obnoxious as it was in
rcu_sched_qs() and friends.  So snapshot ->gpnum-1.

Also, since the dyntick_record_completed() and
dyntick_recall_completed() functions are now simple assignments
that are independent of CONFIG_NO_HZ, and since their names are
now misleading, get rid of them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12580941042308-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Accelerate callback processing on CPUs not detecting GP end
Paul E. McKenney [Fri, 13 Nov 2009 06:35:03 +0000]
rcu: Accelerate callback processing on CPUs not detecting GP end

An earlier fix for a race resulted in a situation where the CPUs
other than the CPU that detected the end of the grace period would
not process their callbacks until the next grace period started.

This means that these other CPUs would unnecessarily demand that an
extra grace period be started.

This patch eliminates this extra grace period and speeds callback
processing by propagating rsp->completed to the rcu_node structures
in the case where the CPU detecting the end of the grace period
sees no reason to start a new grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1258094104417-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Mark init-time-only rcu_bootup_announce() as __init
Paul E. McKenney [Wed, 11 Nov 2009 19:28:06 +0000]
rcu: Mark init-time-only rcu_bootup_announce() as __init

Because rcu_bootup_announce() is used only at boot time, mark it
as __init, presumably so that its memory can be reclaimed.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <20091111192806.GA10073@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Simplify association of quiescent states with grace periods
Paul E. McKenney [Tue, 10 Nov 2009 21:37:22 +0000]
rcu: Simplify association of quiescent states with grace periods

The rdp->passed_quiesc_completed fields are used to properly
associate the recorded quiescent state with a grace period.  It
is OK to wrongly associate a given quiescent state with a
preceding grace period, but it is fatal to associate a given
quiescent state with a grace period that begins after the
quiescent state occurred.  Grace periods are numbered, and the
following fields track them:

o ->gpnum is the number of the grace period currently in
progress, or the number of the last grace period to
complete if no grace period is currently in progress.

o ->completed is the number of the last grace period to
have completed.

These two fields are equal if there is no grace period in
progress, otherwise ->gpnum is one greater than ->completed.
But the rdp->passed_quiesc_completed field compared against
->completed, and if equal, the quiescent state is presumed to
count against the current grace period.

The earlier code copied rdp->completed to
rdp->passed_quiesc_completed, which has been made to work, but
is error-prone.  In contrast, copying one less than rdp->gpnum
is guaranteed safe, because rdp->gpnum is not incremented until
after the start of the corresponding grace period. At the end of
the grace period, when ->completed has incremented, then any
quiescent periods recorded previously will be discarded.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12578890421011-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Rename dynticks_completed to completed_fqs
Paul E. McKenney [Tue, 10 Nov 2009 21:37:21 +0000]
rcu: Rename dynticks_completed to completed_fqs

This field is used whether or not CONFIG_NO_HZ is set, so the
old name of ->dynticks_completed is quite misleading.

Change to ->completed_fqs, given that it the value that
force_quiescent_state() is trying to drive the ->completed field
away from.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12578890423298-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Enable synchronize_sched_expedited() fastpath
Paul E. McKenney [Tue, 10 Nov 2009 21:37:20 +0000]
rcu: Enable synchronize_sched_expedited() fastpath

This patch adds a counter increment to enable tasks to actually
take the synchronize_sched_expedited() function's fastpath.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1257889042435-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Remove inline from forward-referenced functions
Paul E. McKenney [Tue, 10 Nov 2009 21:37:19 +0000]
rcu: Remove inline from forward-referenced functions

Some variants of gcc are reputed to dislike forward references
to functions declared "inline".  Remove the "inline" keyword
from such functions.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12578890422402-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Fix note_new_gpnum() uses of ->gpnum
Paul E. McKenney [Mon, 2 Nov 2009 21:52:29 +0000]
rcu: Fix note_new_gpnum() uses of ->gpnum

Impose a clear locking design on the note_new_gpnum()
function's use of the ->gpnum counter.  This is done by updating
rdp->gpnum only from the corresponding leaf rcu_node structure's
rnp->gpnum field, and even then only under the protection of
that same rcu_node structure's ->lock field.  Performance and
scalability are maintained using a form of double-checked
locking, and excessive spinning is avoided by use of the
spin_trylock() function.  The use of spin_trylock() is safe due
to the fact that CPUs who fail to acquire this lock will try
again later. The hierarchical nature of the rcu_node data
structure limits contention (which could be limited further if
need be using the RCU_FANOUT kernel parameter).

Without this patch, obscure but quite possible races could
result in a quiescent state that occurred during one grace
period to be accounted to the following grace period, causing
this following grace period to end prematurely.  Not good!

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: <stable@kernel.org> # .32.x
LKML-Reference: <12571987492350-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Fix synchronization for rcu_process_gp_end() uses of ->completed counter
Paul E. McKenney [Mon, 2 Nov 2009 21:52:28 +0000]
rcu: Fix synchronization for rcu_process_gp_end() uses of ->completed counter

Impose a clear locking design on the rcu_process_gp_end()
function's use of the ->completed counter.  This is done by
creating a ->completed field in the rcu_node structure, which
can safely be accessed under the protection of that structure's
lock.  Performance and scalability are maintained by using a
form of double-checked locking, so that rcu_process_gp_end()
only acquires the leaf rcu_node structure's ->lock if a grace
period has recently ended.

This fix reduces rcutorture failure rate by at least two orders
of magnitude under heavy stress with force_quiescent_state()
being invoked artificially often.  Without this fix,
unsynchronized access to the ->completed field can cause
rcu_process_gp_end() to advance callbacks whose grace period has
not yet expired.  (Bad idea!)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: <stable@kernel.org> # .32.x
LKML-Reference: <12571987494069-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling of ->complete...
Paul E. McKenney [Mon, 2 Nov 2009 21:52:27 +0000]
rcu: Prepare for synchronization fixes: clean up for non-NO_HZ handling of ->completed counter

Impose a clear locking design on non-NO_HZ handling of the
->completed counter.  This increases the distance between the
RCU and the CPU-hotplug mechanisms.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: <stable@kernel.org> # .32.x
LKML-Reference: <12571987491353-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agoMerge branch 'core/urgent' into core/rcu
Ingo Molnar [Tue, 10 Nov 2009 03:10:31 +0000]
Merge branch 'core/urgent' into core/rcu

Merge reason: Pick up RCU fixlet to base further commits on.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls
Lai Jiangshan [Wed, 28 Oct 2009 15:14:48 +0000]
rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls

Currently, rcu_irq_exit() is invoked only for CONFIG_NO_HZ,
while rcu_irq_enter() is invoked unconditionally.  This patch
moves rcu_irq_exit() out from under CONFIG_NO_HZ so that the
calls are balanced.

This patch has no effect on the behavior of the kernel because
both rcu_irq_enter() and rcu_irq_exit() are empty for
!CONFIG_NO_HZ, but the code is easier to understand if the calls
are obviously balanced in all cases.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12567428891605-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Fix long-grace-period race between forcing and initialization
Paul E. McKenney [Wed, 28 Oct 2009 15:14:49 +0000]
rcu: Fix long-grace-period race between forcing and initialization

Very long RCU read-side critical sections (50 milliseconds or
so) can cause a race between force_quiescent_state() and
rcu_start_gp() as follows on kernel builds with multi-level
rcu_node hierarchies:

1. CPU 0 calls force_quiescent_state(), sees that there is a
grace period in progress, and acquires ->fsqlock.

2. CPU 1 detects the end of the grace period, and so
cpu_quiet_msk_finish() sets rsp->completed to rsp->gpnum.
This operation is carried out under the root rnp->lock,
but CPU 0 has not yet acquired that lock.  Note that
rsp->signaled is still RCU_SAVE_DYNTICK from the last
grace period.

3. CPU 1 calls rcu_start_gp(), but no one wants a new grace
period, so it drops the root rnp->lock and returns.

4. CPU 0 acquires the root rnp->lock and picks up rsp->completed
and rsp->signaled, then drops rnp->lock.  It then enters the
RCU_SAVE_DYNTICK leg of the switch statement.

5. CPU 2 invokes call_rcu(), and now needs a new grace period.
It calls rcu_start_gp(), which acquires the root rnp->lock, sets
rsp->signaled to RCU_GP_INIT (too bad that CPU 0 is already in
the RCU_SAVE_DYNTICK leg of the switch statement!)  and starts
initializing the rcu_node hierarchy.  If there are multiple
levels to the hierarchy, it will drop the root rnp->lock and
initialize the lower levels of the hierarchy.

6. CPU 0 notes that rsp->completed has not changed, which permits
        both CPU 2 and CPU 0 to try updating it concurrently.  If CPU 0's
update prevails, later calls to force_quiescent_state() can
count old quiescent states against the new grace period, which
can in turn result in premature ending of grace periods.

Not good.

This patch adds an RCU_GP_IDLE state for rsp->signaled that is
set initially at boot time and any time a grace period ends.
This prevents CPU 0 from getting into the workings of
force_quiescent_state() in step 4.  Additional locking and
checks prevent the concurrent update of rsp->signaled in step 6.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1256742889199-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agouids: Prevent tear down race
Thomas Gleixner [Mon, 2 Nov 2009 12:01:56 +0000]
uids: Prevent tear down race

Ingo triggered the following warning:

WARNING: at lib/debugobjects.c:255 debug_print_object+0x42/0x50()
Hardware name: System Product Name
ODEBUG: init active object type: timer_list
Modules linked in:
Pid: 2619, comm: dmesg Tainted: G        W  2.6.32-rc5-tip+ #5298
Call Trace:
 [<81035443>] warn_slowpath_common+0x6a/0x81
 [<8120e483>] ? debug_print_object+0x42/0x50
 [<81035498>] warn_slowpath_fmt+0x29/0x2c
 [<8120e483>] debug_print_object+0x42/0x50
 [<8120ec2a>] __debug_object_init+0x279/0x2d7
 [<8120ecb3>] debug_object_init+0x13/0x18
 [<810409d2>] init_timer_key+0x17/0x6f
 [<81041526>] free_uid+0x50/0x6c
 [<8104ed2d>] put_cred_rcu+0x61/0x72
 [<81067fac>] rcu_do_batch+0x70/0x121

debugobjects warns about an enqueued timer being initialized. If
CONFIG_USER_SCHED=y the user management code uses delayed work to
remove the user from the hash table and tear down the sysfs objects.

free_uid is called from RCU and initializes/schedules delayed work if
the usage count of the user_struct is 0. The init/schedule happens
outside of the uidhash_lock protected region which allows a concurrent
caller of find_user() to reference the about to be destroyed
user_struct w/o preventing the work from being scheduled. If the next
free_uid call happens before the work timer expired then the active
timer is initialized and the work scheduled again.

The race was introduced in commit 5cb350ba (sched: group scheduling,
sysfs tunables) and made more prominent by commit 3959214f (sched:
delayed cleanup of user_struct)

Move the init/schedule_delayed_work inside of the uidhash_lock
protected region to prevent the race.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: stable@kernel.org

9 years agofutex: Fix spurious wakeup for requeue_pi really
Thomas Gleixner [Wed, 28 Oct 2009 19:26:48 +0000]
futex: Fix spurious wakeup for requeue_pi really

The requeue_pi path doesn't use unqueue_me() (and the racy lock_ptr ==
NULL test) nor does it use the wake_list of futex_wake() which where
the reason for commit 41890f2 (futex: Handle spurious wake up)

See debugging discussing on LKML Message-ID: <4AD4080C.20703@us.ibm.com>

The changes in this fix to the wait_requeue_pi path were considered to
be a likely unecessary, but harmless safety net. But it turns out that
due to the fact that for unknown $@#!*( reasons EWOULDBLOCK is defined
as EAGAIN we built an endless loop in the code path which returns
correctly EWOULDBLOCK.

Spurious wakeups in wait_requeue_pi code path are unlikely so we do
the easy solution and return EWOULDBLOCK^WEAGAIN to user space and let
it deal with the spurious wakeup.

Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
LKML-Reference: <4AE23C74.1090502@us.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

9 years agorcu: Fix TINY_RCU #elif condition
Paul E. McKenney [Mon, 26 Oct 2009 20:57:44 +0000]
rcu: Fix TINY_RCU #elif condition

Some compilers are happy with "#elif CONFIG_RCU_TINY", while
others strongly prefer "#elif defined(CONFIG_RCU_TINY)".  Change
to the latter to make more compilers happy.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12565906642768-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Simplify creating of lockdep class for root rcu_node
Peter Zijlstra [Mon, 26 Oct 2009 17:24:31 +0000]
rcu: Simplify creating of lockdep class for root rcu_node

Use lockdep_set_class() to simplify the code and to avoid any
additional overhead in the !LOCKDEP case.  Also move the
definition of rcu_root_class into kernel/rcutree.c, as suggested
by Lai Jiangshan.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1256577871443-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Do tiny cleanups in rcutiny
Ingo Molnar [Mon, 26 Oct 2009 06:55:55 +0000]
rcu: Do tiny cleanups in rcutiny

No change in functionality - just straighten out a few small
stylistic details.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: avi@redhat.com
Cc: mtosatti@redhat.com
LKML-Reference: <12565226351355-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Improve rcutorture diagnostics when bad torture_type specified
Paul E. McKenney [Mon, 26 Oct 2009 02:03:54 +0000]
rcu: Improve rcutorture diagnostics when bad torture_type specified

Make rcutorture list the available torture_type values when it
doesn't like the one specified.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: avi@redhat.com
Cc: mtosatti@redhat.com
LKML-Reference: <12565226351868-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Add synchronize_srcu_expedited() to the documentation
Paul E. McKenney [Mon, 26 Oct 2009 02:03:53 +0000]
rcu: Add synchronize_srcu_expedited() to the documentation

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: avi@redhat.com
Cc: mtosatti@redhat.com
LKML-Reference: <12565226354176-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Add synchronize_srcu_expedited() to the rcutorture test suite
Paul E. McKenney [Mon, 26 Oct 2009 02:03:52 +0000]
rcu: Add synchronize_srcu_expedited() to the rcutorture test suite

Adds the "srcu_expedited" torture type, and also renames
sched_ops_sync to sched_sync_ops for consistency while we are in
this file.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: avi@redhat.com
Cc: mtosatti@redhat.com
LKML-Reference: <12565226353636-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Add synchronize_srcu_expedited()
Paul E. McKenney [Mon, 26 Oct 2009 02:03:51 +0000]
rcu: Add synchronize_srcu_expedited()

This patch creates a synchronize_srcu_expedited() that uses
synchronize_sched_expedited() where synchronize_srcu()
uses synchronize_sched().  The synchronize_srcu() and
synchronize_srcu_expedited() functions become one-liners that
pass synchronize_sched() or synchronize_sched_expedited(),
repectively, to a new __synchronize_srcu() function.

While in the file, move the EXPORT_SYMBOL_GPL()s to immediately
follow the corresponding functions.

Requested-by: Avi Kivity <avi@redhat.com>
Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: avi@redhat.com
LKML-Reference: <12565226354038-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: "Tiny RCU", The Bloatwatch Edition
Paul E. McKenney [Mon, 26 Oct 2009 02:03:50 +0000]
rcu: "Tiny RCU", The Bloatwatch Edition

This patch is a version of RCU designed for !SMP provided for a
small-footprint RCU implementation.  In particular, the
implementation of synchronize_rcu() is extremely lightweight and
high performance. It passes rcutorture testing in each of the
four relevant configurations (combinations of NO_HZ and PREEMPT)
on x86.  This saves about 1K bytes compared to old Classic RCU
(which is no longer in mainline), and more than three kilobytes
compared to Hierarchical RCU (updated to 2.6.30):

CONFIG_TREE_RCU:

   text    data     bss     dec     filename
    183       4       0     187     kernel/rcupdate.o
   2783     520      36    3339     kernel/rcutree.o
   3526 Total (vs 4565 for v7)

CONFIG_TREE_PREEMPT_RCU:

   text    data     bss     dec     filename
    263       4       0     267     kernel/rcupdate.o
   4594     776      52    5422     kernel/rcutree.o
       5689 Total (6155 for v7)

CONFIG_TINY_RCU:

   text    data     bss     dec     filename
     96       4       0     100     kernel/rcupdate.o
    734      24       0     758     kernel/rcutiny.o
         858 Total (vs 848 for v7)

The above is for x86.  Your mileage may vary on other platforms.
Further compression is possible, but is being procrastinated.

Changes from v7 (http://lkml.org/lkml/2009/10/9/388)

o Apply Lai Jiangshan's review comments (aside from
might_sleep()  in synchronize_sched(), which is covered by SMP builds).

o Fix up expedited primitives.

Changes from v6 (http://lkml.org/lkml/2009/9/23/293).

o Forward ported to put it into the 2.6.33 stream.

o Added lockdep support.

o Make lightweight rcu_barrier.

Changes from v5 (http://lkml.org/lkml/2009/6/23/12).

o Ported to latest pre-2.6.32 merge window kernel.

- Renamed rcu_qsctr_inc() to rcu_sched_qs().
- Renamed rcu_bh_qsctr_inc() to rcu_bh_qs().
- Provided trivial rcu_cpu_notify().
- Provided trivial exit_rcu().
- Provided trivial rcu_needs_cpu().
- Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h.

o Removed the dependence on EMBEDDED, with a view to making
TINY_RCU default for !SMP at some time in the future.

o Added (trivial) support for expedited grace periods.

Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include:

o Squeeze the size down a bit further by removing the
->completed field from struct rcu_ctrlblk.

o This permits synchronize_rcu() to become the empty function.
Previous concerns about rcutorture were unfounded, as
rcutorture correctly handles a constant value from
rcu_batches_completed() and rcu_batches_completed_bh().

Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include:

o Changed rcu_batches_completed(), rcu_batches_completed_bh()
rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and
rcu_nmi_exit(), to be static inlines, as suggested by David
Howells.  Doing this saves about 100 bytes from rcutiny.o.
(The numbers between v3 and this v4 of the patch are not directly
comparable, since they are against different versions of Linux.)

Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include:

o Fix whitespace issues.

o Change short-circuit "||" operator to instead be "+" in order
to  fix performance bug noted by "kraai" on LWN.

(http://lwn.net/Articles/324348/)

Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include:

o This version depends on EMBEDDED as well as !SMP, as suggested
by Ingo.

o Updated rcu_needs_cpu() to unconditionally return zero,
permitting the CPU to enter dynticks-idle mode at any time.
This works because callbacks can be invoked upon entry to
dynticks-idle mode.

o Paul is now OK with this being included, based on a poll at
the  Kernel Miniconf at linux.conf.au, where about ten people said
that they cared about saving 900 bytes on single-CPU systems.

o Applies to both mainline and tip/core/rcu.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: avi@redhat.com
Cc: mtosatti@redhat.com
LKML-Reference: <12565226351355-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agofutex: Move drop_futex_key_refs out of spinlock'ed region
Darren Hart [Thu, 15 Oct 2009 22:30:48 +0000]
futex: Move drop_futex_key_refs out of spinlock'ed region

When requeuing tasks from one futex to another, the reference held
by the requeued task to the original futex location needs to be
dropped eventually.

Dropping the reference may ultimately lead to a call to
"iput_final" and subsequently call into filesystem- specific code -
which may be non-atomic.

It is therefore safer to defer this drop operation until after the
futex_hash_bucket spinlock has been dropped.

Originally-From: Helge Bahmann <hcb@chaoticmind.net>
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: <stable@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@linux.vnet.ibm.com>
Cc: Sven-Thorsten Dietrich <sdietrich@novell.com>
Cc: John Kacur <jkacur@redhat.com>
LKML-Reference: <4AD7A298.5040802@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang
Paul E. McKenney [Thu, 15 Oct 2009 16:26:14 +0000]
rcu: Fix TREE_PREEMPT_RCU CPU_HOTPLUG bad-luck hang

If the following sequence of events occurs, then
TREE_PREEMPT_RCU will hang waiting for a grace period to
complete, eventually OOMing the system:

o A TREE_PREEMPT_RCU build of the kernel is booted on a system
with more than 64 physical CPUs present (32 on a 32-bit system).
Alternatively, a TREE_PREEMPT_RCU build of the kernel is booted
with RCU_FANOUT set to a sufficiently small value that the
physical CPUs populate two or more leaf rcu_node structures.

o A task is preempted in an RCU read-side critical section
while running on a CPU corresponding to a given leaf rcu_node
structure.

o All CPUs corresponding to this same leaf rcu_node structure
record quiescent states for the current grace period.

o All of these same CPUs go offline (hence the need for enough
physical CPUs to populate more than one leaf rcu_node structure).
This causes the preempted task to be moved to the root rcu_node
structure.

At this point, there is nothing left to cause the quiescent
state to be propagated up the rcu_node tree, so the current
grace period never completes.

The simplest fix, especially after considering the deadlock
possibilities, is to detect this situation when the last CPU is
offlined, and to set that CPU's ->qsmask bit in its leaf
rcu_node structure.  This will cause the next invocation of
force_quiescent_state() to end the grace period.

Without this fix, this hang can be triggered in an hour or so on
some machines with rcutorture and random CPU onlining/offlining.
With this fix, these same machines pass a full 10 hours of this
sort of abuse.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <20091015162614.GA19131@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Update trace.txt documentation for blocked-tasks lists
Paul E. McKenney [Wed, 14 Oct 2009 17:15:59 +0000]
rcu: Update trace.txt documentation for blocked-tasks lists

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: npiggin@suse.de
Cc: jens.axboe@oracle.com
LKML-Reference: <12555405592804-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Update trace.txt documentation to reflect recent changes
Paul E. McKenney [Wed, 14 Oct 2009 17:15:54 +0000]
rcu: Update trace.txt documentation to reflect recent changes

o Remove the CONFIG_PREEMPT_RCU documentation since this
config option has now been removed.

o Change the now-incorrect references to "rcu" labels to
instead be "rcu_sched".

o Add notes stating that CONFIG_TREE_PREEMPT_RCU kernels will
have additional "rcu_preempt" output.

o Note the new "oqlen" field in the rcuhier output (for
RCU callbacks orphaned by an offlined CPU).

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: npiggin@suse.de
Cc: jens.axboe@oracle.com
LKML-Reference: <1255540559799-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Add rnp->blocked_tasks to tracing
Paul E. McKenney [Wed, 14 Oct 2009 23:36:38 +0000]
rcu: Add rnp->blocked_tasks to tracing

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: npiggin@suse.de
Cc: jens.axboe@oracle.com
Cc: Josh Triplett <josh@joshtriplett.org>
LKML-Reference: <20091014233638.GE6763@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
 kernel/rcutree_trace.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

9 years agorcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU
Paul E. McKenney [Wed, 14 Oct 2009 17:15:56 +0000]
rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU

For the short term, map synchronize_rcu_expedited() to
synchronize_rcu() for TREE_PREEMPT_RCU and to
synchronize_sched_expedited() for TREE_RCU.

Longer term, there needs to be a real expedited grace period for
TREE_PREEMPT_RCU, but candidate patches to date are considerably
more complex and intrusive.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: npiggin@suse.de
Cc: jens.axboe@oracle.com
LKML-Reference: <12555405592331-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agorcu: Prevent RCU IPI storms in presence of high call_rcu() load
Paul E. McKenney [Wed, 14 Oct 2009 17:15:55 +0000]
rcu: Prevent RCU IPI storms in presence of high call_rcu() load

As the number of callbacks on a given CPU rises, invoke
force_quiescent_state() only every blimit number of callbacks
(defaults to 10,000), and even then only if no other CPU has
invoked force_quiescent_state() in the meantime.

This should fix the performance regression reported by Nick.

Reported-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: jens.axboe@oracle.com
LKML-Reference: <12555405592133-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agofutex: Check for NULL keys in match_futex
Darren Hart [Wed, 14 Oct 2009 17:12:39 +0000]
futex: Check for NULL keys in match_futex

If userspace tries to perform a requeue_pi on a non-requeue_pi waiter,
it will find the futex_q->requeue_pi_key to be NULL and OOPS.

Check for NULL in match_futex() instead of doing explicit NULL pointer
checks on all call sites.  While match_futex(NULL, NULL) returning
false is a little odd, it's still correct as we expect valid key
references.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Dinakar Guniguntala <dino@in.ibm.com>
CC: John Stultz <johnstul@us.ibm.com>
Cc: stable@kernel.org
LKML-Reference: <4AD60687.10306@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

9 years agofutex: Handle spurious wake up
Thomas Gleixner [Tue, 13 Oct 2009 18:40:43 +0000]
futex: Handle spurious wake up

The futex code does not handle spurious wake up in futex_wait and
futex_wait_requeue_pi.

The code assumes that any wake up which was not caused by futex_wake /
requeue or by a timeout was caused by a signal wake up and returns one
of the syscall restart error codes.

In case of a spurious wake up the signal delivery code which deals
with the restart error codes is not invoked and we return that error
code to user space. That causes applications which actually check the
return codes to fail. Blaise reported that on preempt-rt a python test
program run into a exception trap. -rt exposed that due to a built in
spurious wake up accelerator :)

Solve this by checking signal_pending(current) in the wake up path and
handle the spurious wake up case w/o returning to user space.

Reported-by: Blaise Gassend <blaise@willowgarage.com>
Debugged-by: Darren Hart <dvhltc@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org
LKML-Reference: <new-submission>

9 years agoMerge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile...
Ingo Molnar [Mon, 12 Oct 2009 21:26:36 +0000]
Merge branch 'urgent' of git://git./linux/kernel/git/rric/oprofile into core/urgent

9 years agooprofile: warn on freeing event buffer too early
Robert Richter [Fri, 9 Oct 2009 01:17:44 +0000]
oprofile: warn on freeing event buffer too early

A race shouldn't happen since all workqueues or handlers are canceled
or flushed before the event buffer is freed. A warning is triggered
now if the buffer is freed too early.

Also, this patch adds some comments about event buffer protection,
reworks some code and adds code to clear buffer_pos during alloc and
free of the event buffer.

Cc: David Rientjes <rientjes@google.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>

9 years agooprofile: fix race condition in event_buffer free
David Rientjes [Wed, 9 Sep 2009 13:02:33 +0000]
oprofile: fix race condition in event_buffer free

Looking at the 2.6.31-rc9 code, it appears there is a race condition
in the event_buffer cleanup code path (shutdown). This could lead to
kernel panic as some CPUs may be operating on the event buffer AFTER
it has been freed. The attached patch solves the problem and makes
sure CPUs check if the buffer is not NULL before they access it as
some may have been spinning on the mutex while the buffer was being
freed.

The race may happen if the buffer is freed during pending reads. But
it is not clear why there are races in add_event_entry() since all
workqueues or handlers are canceled or flushed before the event buffer
is freed.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>

9 years agolockdep: Use cpu_clock() for lockstat
Peter Zijlstra [Fri, 9 Oct 2009 08:12:41 +0000]
lockdep: Use cpu_clock() for lockstat

Some tracepoint magic (TRACE_EVENT(lock_acquired)) relies on
the fact that lock hold times are positive and uses div64 on
that. That triggered a build warning on MIPS, and probably
causes bad output in certain circumstances as well.

Make it truly positive.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1254818502.21044.112.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzi...
Linus Torvalds [Thu, 8 Oct 2009 19:22:45 +0000]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  pata_atp867x: add Power Management support
  pata_atp867x: PIO support fixes
  pata_atp867x: clarifications in timings calculations and cable detection
  pata_atp867x: fix it to not claim MWDMA support
  libata: fix incorrect link online check during probe
  ahci: filter FPDMA non-zero offset enable for Aspire 3810T
  libata: make gtf_filter per-dev
  libata: implement more acpi filtering options
  libata: cosmetic updates
  ahci: display all AHCI 1.3 HBA capability flags (v2)
  pata_ali: trivial fix of a very frequent spelling mistake
  ahci: disable 64bit DMA by default on SB600s

9 years agoMerge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 8 Oct 2009 19:16:35 +0000]
Merge branch 'core-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  futex: fix requeue_pi key imbalance
  futex: Fix typo in FUTEX_WAIT/WAKE_BITSET_PRIVATE definitions
  rcu: Place root rcu_node structure in separate lockdep class
  rcu: Make hot-unplugged CPU relinquish its own RCU callbacks
  rcu: Move rcu_barrier() to rcutree
  futex: Move exit_pi_state() call to release_mm()
  futex: Nullify robust lists after cleanup
  futex: Fix locking imbalance
  panic: Fix panic message visibility by calling bust_spinlocks(0) before dying
  rcu: Replace the rcu_barrier enum with pointer to call_rcu*() function
  rcu: Clean up code based on review feedback from Josh Triplett, part 4
  rcu: Clean up code based on review feedback from Josh Triplett, part 3
  rcu: Fix rcu_lock_map build failure on CONFIG_PROVE_LOCKING=y
  rcu: Clean up code to address Ingo's checkpatch feedback
  rcu: Clean up code based on review feedback from Josh Triplett, part 2
  rcu: Clean up code based on review feedback from Josh Triplett

9 years agoMerge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 8 Oct 2009 19:07:24 +0000]
Merge branch 'sched-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Set correct normal_prio and prio values in sched_fork()

9 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 8 Oct 2009 19:06:36 +0000]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, pci: Correct spelling in a comment
  x86: Simplify bound checks in the MTRR code
  x86: EDAC: carve out AMD MCE decoding logic
  initcalls: Add early_initcall() for modules
  x86: EDAC: MCE: Fix MCE decoding callback logic

9 years agoMerge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 8 Oct 2009 19:06:09 +0000]
Merge branch 'tracing-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: user local buffer variable for trace branch tracer
  tracing: fix warning on kernel/trace/trace_branch.c andtrace_hw_branches.c
  ftrace: check for failure for all conversions
  tracing: correct module boundaries for ftrace_release
  tracing: fix transposed numbers of lock_depth and preempt_count
  trace: Fix missing assignment in trace_ctxwake_*
  tracing: Use free_percpu instead of kfree
  tracing: Check total refcount before releasing bufs in profile_enable failure

9 years agoMerge branch 'sparc-perf-events-fixes-for-linus' of git://git.kernel.org/pub/scm...
Linus Torvalds [Thu, 8 Oct 2009 19:05:50 +0000]
Merge branch 'sparc-perf-events-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'sparc-perf-events-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  mm, perf_event: Make vmalloc_user() align base kernel virtual address to SHMLBA
  perf_event: Provide vmalloc() based mmap() backing

9 years agoMerge branch 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 8 Oct 2009 19:05:00 +0000]
Merge branch 'perf-fixes-for-linus-2' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf_events: Make ABI definitions available to userspace
  perf tools: elf_sym__is_function() should accept "zero" sized functions
  tracing/syscalls: Use long for syscall ret format and field definitions
  perf trace: Update eval_flag() flags array to match interrupt.h
  perf trace: Remove unused code in builtin-trace.c
  perf: Propagate term signal to child

9 years agoMerge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 8 Oct 2009 19:04:04 +0000]
Merge branch 'timers-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, timers: Check for pending timers after (device) interrupts
  NOHZ: update idle state also when NOHZ is inactive

9 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
Linus Torvalds [Thu, 8 Oct 2009 19:03:21 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: ice1724: increase SPDIF and independent stereo buffer sizes
  ALSA: opl3: circular locking in the snd_opl3_note_on() and snd_opl3_note_off()
  ALSA: ICE1712/24 - Change the Multi Track Peak control (level meters) from MIXER to PCM type
  ALSA: hda - Fix yet another auto-mic bug in ALC268
  ASoC: WM8350 capture PGA mutes are inverted
  ASoC: Remove absent SYNC and TDM DAI format options from i.MX SSI
  sound: via82xx: move DXS volume controls to PCM interface
  ALSA: hda - Don't pick up invalid HP pins in alc_subsystem_id()
  ALSA: hda - Add a workaround for ASUS A7K
  ALSA: hda - Fix invalid initializations for ALC861 auto mode
  ASoC: wm8940: Fix check on error code form snd_soc_codec_set_cache_io
  ASoC: Fix SND_SOC_DAPM_LINE handling

9 years agoMerge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied...
Linus Torvalds [Thu, 8 Oct 2009 19:02:06 +0000]
Merge branch 'drm-linus' of git://git./linux/kernel/git/airlied/drm-2.6

* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (24 commits)
  drm/radeon/kms: fix vline register for second head.
  drm/r600: avoid assigning vb twice in blit code
  drm/radeon: use list_for_each_entry instead of list_for_each
  drm/radeon/kms: Fix AGP support for R600/RV770 family (v2)
  drm/radeon/kms: Fallback to non AGP when acceleration fails to initialize (v2)
  drm/radeon/kms: Fix RS600/RV515/R520/RS690 IRQ
  drm/radeon: Fix setting of bits
  drm/ttm: fix refcounting in ttm global code.
  drm/fb: add more correct 8/16/24/32 bpp fb support.
  drm/fb: add setcmap and fix 8-bit support.
  drm/radeon/kms: respect single crtc cards, only create one crtc. (v2)
  drm: Delete the DRM_DEBUG_KMS in drm_mode_cursor_ioctl
  drm/radeon/kms: add support for "Surround View"
  drm/radeon/kms: Fix irq handling on AVIVO hw
  drm/radeon/kms: R600/RV770 remove dead code and print message for wrong BIOS
  drm/radeon/kms: Fix R600/RV770 disable acceleration path
  drm/radeon/kms: Fix R600/RV770 startup path & reset
  drm/radeon/kms: Fix R600 write back buffer
  drm/radeon/kms: Remove old init path as no hw use it anymore
  drm/radeon/kms: Convert RS600 to new init path
  ...

9 years agoMerge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 8 Oct 2009 19:01:01 +0000]
Merge branch 'omap-fixes-for-linus' of git://git./linux/kernel/git/tmlind/linux-omap-2.6

* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
  omapfb: Blizzard: constify register address tables
  omapfb: Blizzard: fix pointer to be const
  omapfb: Condition mutex acquisition
  omap: iovmm: Add missing mutex_unlock
  omap: iovmm: Fix incorrect spelling
  omap: SRAM: flush the right address after memcpy in omap_sram_push
  omap: Lock DPLL5 at boot
  omap: Fix incorrect 730 vs 850 detection
  OMAP3: PM: introduce a new powerdomain walk helper
  OMAP3: PM: Enable GPIO module-level wakeups
  OMAP3: PM: USBHOST: clear wakeup events on both hosts
  OMAP3: PM: PRCM interrupt: only handle selected PRCM interrupts
  OMAP3: PM: PRCM interrupt: check MPUGRPSEL register
  OMAP3: PM: Prevent hang in prcm_interrupt_handler

9 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Linus Torvalds [Thu, 8 Oct 2009 19:00:39 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/bp/bp

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  amd64_edac: beef up DRAM error injection
  amd64_edac: fix DRAM base and limit extraction
  amd64_edac: fix chip select handling
  amd64_edac: simple fix to allow reporting of CECC errors
  amd64_edac: fix K8 intlv_sel check
  amd64_edac: fix interleave enable tests
  amd64_edac: fix DRAM base and limit address extraction
  amd64_edac: fix driver instance lookup table allocation

9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Thu, 8 Oct 2009 18:59:30 +0000]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
  ethoc: limit the number of buffers to 128
  ethoc: use system memory as buffer
  ethoc: align received packet to make IP header at word boundary
  ethoc: fix buffer address mapping
  ethoc: fix typo to compute number of tx descriptors
  au1000_eth: Duplicate test of RX_OVERLEN bit in update_rx_stats()
  netxen: Fix Unlikely(x) > y
  pasemi_mac: ethtool get settings fix
  add maintainer for network drop monitor kernel service
  tg3: Fix phylib locking strategy
  rndis_host: support ETHTOOL_GPERMADDR
  ipv4: arp_notify address list bug
  gigaset: add kerneldoc comments
  gigaset: correct debugging output selection
  gigaset: improve error recovery
  gigaset: fix device ERROR response handling
  gigaset: announce if built with debugging
  gigaset: handle isoc frame errors more gracefully
  gigaset: linearize skb
  gigaset: fix reject/hangup handling
  ...

9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6
Linus Torvalds [Thu, 8 Oct 2009 18:59:06 +0000]
Merge git://git./linux/kernel/git/davem/ide-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6:
  Revert "Revert "ide: try to use PIO Mode 0 during probe if possible""
  sis5513: fix PIO setup for ATAPI devices

9 years agox86, timers: Check for pending timers after (device) interrupts
Arjan van de Ven [Thu, 8 Oct 2009 13:40:41 +0000]
x86, timers: Check for pending timers after (device) interrupts

Now that range timers and deferred timers are common, I found a
problem with these using the "perf timechart" tool. Frans Pop also
reported high scheduler latencies via LatencyTop, when using
iwlagn.

It turns out that on x86, these two 'opportunistic' timers only get
checked when another "real" timer happens. These opportunistic
timers have the objective to save power by hitchhiking on other
wakeups, as to avoid CPU wakeups by themselves as much as possible.

The change in this patch runs this check not only at timer
interrupts, but at all (device) interrupts. The effect is that:

 1) the deferred timers/range timers get delayed less

 2) the range timers cause less wakeups by themselves because
    the percentage of hitchhiking on existing wakeup events goes up.

I've verified the working of the patch using "perf timechart", the
original exposed bug is gone with this patch. Frans also reported
success - the latencies are now down in the expected ~10 msec
range.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Tested-by: Frans Pop <elendil@planet.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091008064041.67219b13@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agomm, perf_event: Make vmalloc_user() align base kernel virtual address to SHMLBA
David Miller [Mon, 21 Sep 2009 19:22:34 +0000]
mm, perf_event: Make vmalloc_user() align base kernel virtual address to SHMLBA

When a vmalloc'd area is mmap'd into userspace, some kind of
co-ordination is necessary for this to work on platforms with cpu
D-caches which can have aliases.

Otherwise kernel side writes won't be seen properly in userspace
and vice versa.

If the kernel side mapping and the user side one have the same
alignment, modulo SHMLBA, this can work as long as VM_SHARED is
shared of VMA and for all current users this is true.  VM_SHARED
will force SHMLBA alignment of the user side mmap on platforms with
D-cache aliasing matters.

The bulk of this patch is just making it so that a specific
alignment can be passed down into __get_vm_area_node().  All
existing callers pass in '1' which preserves existing behavior.
vmalloc_user() gives SHMLBA for the alignment.

As a side effect this should get the video media drivers and other
vmalloc_user() users into more working shape on such systems.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <200909211922.n8LJMYjw029425@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6
Linus Torvalds [Thu, 8 Oct 2009 14:40:19 +0000]
Merge branch 'fixes' of git://git./linux/kernel/git/kyle/parisc-2.6

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6:
  agp: parisc-agp.c - use correct page_mask function
  parisc: Fix linker script breakage.
  parisc: convert to asm-generic/hardirq.h
  parisc: Make THREAD_SIZE available to assembly files and linker scripts.
  parisc: correct use of SHF_ALLOC
  parisc: rename parisc's vmalloc_start to parisc_vmalloc_start
  parisc: add me to Maintainers
  parisc: includecheck fix: signal.c
  parisc: HAVE_ARCH_TRACEHOOK
  parisc: add skeleton syscall.h
  parisc: stop using task->ptrace for {single,block}step flags
  parisc: split syscall_trace into two halves
  parisc: add missing TI_TASK macro in syscall.S
  parisc: tracehook_signal_handler
  parisc: tracehook_report_syscall

9 years agolis3lv02d_spi: module unload didn't remove sysfs entry
Samu Onkalo [Wed, 7 Oct 2009 23:32:35 +0000]
lis3lv02d_spi: module unload didn't remove sysfs entry

In module unload, lis3lv02d core driver sysfs clean up was not called.

Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com>
Acked-by: Daniel Mack <daniel@caiaq.de>
Cc: √Čric Piel <eric.piel@tremplin-utc.net>
Cc: "Trisal, Kalhan" <kalhan.trisal@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agommc: sdio: don't require CISTPL_VERS_1 to contain 4 strings
David Vrabel [Wed, 7 Oct 2009 23:32:33 +0000]
mmc: sdio: don't require CISTPL_VERS_1 to contain 4 strings

The PC Card 8.0 specification (vol.  4, section 3.2.10) says the
TPLLV1_INFO field of the CISTPL_VERS_1 tuple must contain 4 strings.  Some
cards don't have all 4 so just parse as many as we can.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: David Vrabel <david.vrabel@csr.com>
Tested-by: Jonathan Cameron <jic23@cam.ac.uk>
Tested-by: Bing Zhao <bzhao@marvell.com>
Cc: Roel Kluin <roel.kluin@gmail.com>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agopage-types: add hwpoison/unpoison feature
Wu Fengguang [Wed, 7 Oct 2009 23:32:32 +0000]
page-types: add hwpoison/unpoison feature

For hwpoison stress testing.  The debugfs mount point is assumed to be
/debug/.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agopage-types: introduce kpageflags_flags()
Wu Fengguang [Wed, 7 Oct 2009 23:32:31 +0000]
page-types: introduce kpageflags_flags()

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agopage-types: make voffset local variables
Wu Fengguang [Wed, 7 Oct 2009 23:32:30 +0000]
page-types: make voffset local variables

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agopage-types: make standalone pagemap/kpageflags read routines
Wu Fengguang [Wed, 7 Oct 2009 23:32:30 +0000]
page-types: make standalone pagemap/kpageflags read routines

Refactor the code to be more modular and easier to reuse.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agopage-types: introduce checked_open()
Wu Fengguang [Wed, 7 Oct 2009 23:32:29 +0000]
page-types: introduce checked_open()

This helps merge duplicate code (now and future) and outstand the main
logic.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agopage-types: add GPL note
Wu Fengguang [Wed, 7 Oct 2009 23:32:28 +0000]
page-types: add GPL note

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agopagemap: document KPF_KSM and show it in page-types
Wu Fengguang [Wed, 7 Oct 2009 23:32:28 +0000]
pagemap: document KPF_KSM and show it in page-types

It indicates to the system admin that processes mapping such pages may be
eating less physical memory than the reported numbers by legacy tools.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Acked-by: Chris Wright <chrisw@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agopagemap: export KPF_HWPOISON
Wu Fengguang [Wed, 7 Oct 2009 23:32:27 +0000]
pagemap: export KPF_HWPOISON

This flag indicates a hardware detected memory corruption on the page.
Any future access of the page data may bring down the machine.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agocgroups: update documentation of cgroups tasks and procs files
Paul Menage [Wed, 7 Oct 2009 23:32:26 +0000]
cgroups: update documentation of cgroups tasks and procs files

Update documentation of cgroups tasks and procs files

Document the cgroup.procs file.

Clarify the semantics of the cgroup.procs and tasks files.  Although the
current cgroup.procs interface returns a sorted and uniqified list of
pids, potential future performance enhancements could result in those
properties being removed - explicitly document this aspect of the API.

There are no existing users of cgroup.procs, so compatibility isn't an
issue.  There are users of the "tasks" file, but none that would appear to
break in the event of the sorted property being broken.  The standard
"libcpuset" explicitly sorts the results of reading from the tasks file,
and "libcg" and other users don't appear to care about ordering.

Signed-off-by: Paul Menage <menage@google.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agovideo: includecheck fix: da8xx-fb.c
Jaswinder Singh Rajput [Wed, 7 Oct 2009 23:32:25 +0000]
video: includecheck fix: da8xx-fb.c

fix the following 'make includecheck' warning:

  drivers/video/da8xx-fb.c: linux/device.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agovideo: includecheck fix: msm, mddi.c
Jaswinder Singh Rajput [Wed, 7 Oct 2009 23:32:24 +0000]
video: includecheck fix: msm, mddi.c

fix the following 'make includecheck' warning:

  drivers/video/msm/mddi.c: linux/delay.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agofs: includecheck fix: proc, kcore.c
Jaswinder Singh Rajput [Wed, 7 Oct 2009 23:32:24 +0000]
fs: includecheck fix: proc, kcore.c

fix the following 'make includecheck' warning:

  fs/proc/kcore.c: linux/mm.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomm: includecheck fix: vmalloc.c
Jaswinder Singh Rajput [Wed, 7 Oct 2009 23:32:23 +0000]
mm: includecheck fix: vmalloc.c

fix the following 'make includecheck' warning:

  mm/vmalloc.c: linux/highmem.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoksm: more on default values
Hugh Dickins [Wed, 7 Oct 2009 23:32:22 +0000]
ksm: more on default values

Adjust the max_kernel_pages default to a quarter of totalram_pages,
instead of nr_free_buffer_pages() / 4: the KSM pages themselves come from
highmem, and even on a 16GB PAE machine, 4GB of KSM pages would only be
pinning 32MB of lowmem with their rmap_items, so no need for the more
obscure calculation (nor for its own special init function).

There is no way for the user to switch KSM on if CONFIG_SYSFS is not
enabled, so in that case default run to KSM_RUN_MERGE.

Update KSM Documentation and Kconfig to reflect the new defaults.

Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoMerge branch 'fix/misc' into for-linus
Takashi Iwai [Thu, 8 Oct 2009 11:00:02 +0000]
Merge branch 'fix/misc' into for-linus

9 years agoMerge branch 'fix/hda' into for-linus
Takashi Iwai [Thu, 8 Oct 2009 10:59:58 +0000]
Merge branch 'fix/hda' into for-linus

9 years agoALSA: ice1724: increase SPDIF and independent stereo buffer sizes
Robert Hancock [Thu, 8 Oct 2009 02:19:21 +0000]
ALSA: ice1724: increase SPDIF and independent stereo buffer sizes

Increase the default and maximum PCM buffer prellocation size for ice1724's
SPDIF and independent stereo pair outputs to 256K, which is the hardware's
maximum supported size. This allows a reduction in interrupt rate and
potentially power usage when an application is not latency-critical.

Signed-off-by: Robert Hancock <hancockrwd@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

9 years agoALSA: opl3: circular locking in the snd_opl3_note_on() and snd_opl3_note_off()
Krzysztof Helt [Wed, 7 Oct 2009 20:51:34 +0000]
ALSA: opl3: circular locking in the snd_opl3_note_on() and snd_opl3_note_off()

Fix following circular locking in the opl3 driver.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-rc3 #87
-------------------------------------------------------
swapper/0 is trying to acquire lock:
 (&opl3->voice_lock){..-...}, at: [<cca748fe>] snd_opl3_note_off+0x1e/0xe0 [snd_opl3_synth]

but task is already holding lock:
 (&opl3->sys_timer_lock){..-...}, at: [<cca75169>] snd_opl3_timer_func+0x19/0xc0 [snd_opl3_synth]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&opl3->sys_timer_lock){..-...}:
       [<c02461d5>] validate_chain+0xa25/0x1040
       [<c0246aca>] __lock_acquire+0x2da/0xab0
       [<c024731a>] lock_acquire+0x7a/0xa0
       [<c044c300>] _spin_lock_irqsave+0x40/0x60
       [<cca75046>] snd_opl3_note_on+0x686/0x790 [snd_opl3_synth]
       [<cca68912>] snd_midi_process_event+0x322/0x590 [snd_seq_midi_emul]
       [<cca74245>] snd_opl3_synth_event_input+0x15/0x20 [snd_opl3_synth]
       [<cca4dcc0>] snd_seq_deliver_single_event+0x100/0x200 [snd_seq]
       [<cca4de07>] snd_seq_deliver_event+0x47/0x1f0 [snd_seq]
       [<cca4e50b>] snd_seq_dispatch_event+0x3b/0x140 [snd_seq]
       [<cca5008c>] snd_seq_check_queue+0x10c/0x120 [snd_seq]
       [<cca5037b>] snd_seq_enqueue_event+0x6b/0xe0 [snd_seq]
       [<cca4e0fd>] snd_seq_client_enqueue_event+0xdd/0x100 [snd_seq]
       [<cca4eb7a>] snd_seq_write+0xea/0x190 [snd_seq]
       [<c02827b6>] vfs_write+0x96/0x160
       [<c0282c9d>] sys_write+0x3d/0x70
       [<c0202c45>] syscall_call+0x7/0xb

-> #0 (&opl3->voice_lock){..-...}:
       [<c02467e6>] validate_chain+0x1036/0x1040
       [<c0246aca>] __lock_acquire+0x2da/0xab0
       [<c024731a>] lock_acquire+0x7a/0xa0
       [<c044c300>] _spin_lock_irqsave+0x40/0x60
       [<cca748fe>] snd_opl3_note_off+0x1e/0xe0 [snd_opl3_synth]
       [<cca751f0>] snd_opl3_timer_func+0xa0/0xc0 [snd_opl3_synth]
       [<c022ac46>] run_timer_softirq+0x166/0x1e0
       [<c02269e8>] __do_softirq+0x78/0x110
       [<c0226ac6>] do_softirq+0x46/0x50
       [<c0226e26>] irq_exit+0x36/0x40
       [<c0204bd2>] do_IRQ+0x42/0xb0
       [<c020328e>] common_interrupt+0x2e/0x40
       [<c021092f>] apm_cpu_idle+0x10f/0x290
       [<c0201b11>] cpu_idle+0x21/0x40
       [<c04443cd>] rest_init+0x4d/0x60
       [<c055c835>] start_kernel+0x235/0x280
       [<c055c066>] i386_start_kernel+0x66/0x70

other info that might help us debug this:

2 locks held by swapper/0:
 #0:  (&opl3->tlist){+.-...}, at: [<c022abd0>] run_timer_softirq+0xf0/0x1e0
 #1:  (&opl3->sys_timer_lock){..-...}, at: [<cca75169>] snd_opl3_timer_func+0x19/0xc0 [snd_opl3_synth]

stack backtrace:
Pid: 0, comm: swapper Not tainted 2.6.32-rc3 #87
Call Trace:
 [<c0245188>] print_circular_bug+0xc8/0xd0
 [<c02467e6>] validate_chain+0x1036/0x1040
 [<c0247f14>] ? check_usage_forwards+0x54/0xd0
 [<c0246aca>] __lock_acquire+0x2da/0xab0
 [<c024731a>] lock_acquire+0x7a/0xa0
 [<cca748fe>] ? snd_opl3_note_off+0x1e/0xe0 [snd_opl3_synth]
 [<c044c300>] _spin_lock_irqsave+0x40/0x60
 [<cca748fe>] ? snd_opl3_note_off+0x1e/0xe0 [snd_opl3_synth]
 [<cca748fe>] snd_opl3_note_off+0x1e/0xe0 [snd_opl3_synth]
 [<c044c307>] ? _spin_lock_irqsave+0x47/0x60
 [<cca751f0>] snd_opl3_timer_func+0xa0/0xc0 [snd_opl3_synth]
 [<c022ac46>] run_timer_softirq+0x166/0x1e0
 [<c022abd0>] ? run_timer_softirq+0xf0/0x1e0
 [<cca75150>] ? snd_opl3_timer_func+0x0/0xc0 [snd_opl3_synth]
 [<c02269e8>] __do_softirq+0x78/0x110
 [<c044c0fd>] ? _spin_unlock+0x1d/0x20
 [<c025915f>] ? handle_level_irq+0xaf/0xe0
 [<c0226ac6>] do_softirq+0x46/0x50
 [<c0226e26>] irq_exit+0x36/0x40
 [<c0204bd2>] do_IRQ+0x42/0xb0
 [<c024463c>] ? trace_hardirqs_on_caller+0x12c/0x180
 [<c020328e>] common_interrupt+0x2e/0x40
 [<c0208d88>] ? default_idle+0x38/0x50
 [<c021092f>] apm_cpu_idle+0x10f/0x290
 [<c0201b11>] cpu_idle+0x21/0x40
 [<c04443cd>] rest_init+0x4d/0x60
 [<c055c835>] start_kernel+0x235/0x280
 [<c055c210>] ? unknown_bootoption+0x0/0x210
 [<c055c066>] i386_start_kernel+0x66/0x70

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

9 years agoALSA: ICE1712/24 - Change the Multi Track Peak control (level meters) from MIXER...
Pavel Hofman [Tue, 6 Oct 2009 14:04:11 +0000]
ALSA: ICE1712/24 - Change the Multi Track Peak control (level meters) from MIXER to PCM type

* PLEASE NOTE - this change requires the corresponding update of
  envy24control for ice1712 - kind of an ABI change.
* The "Multi Track Peak" control is read-only level meters indicator.
* The control is VERY confusing to most users since it is currently displayed
  in regular mixers. E.g. alsamixer ignores its read-only status
  and allows changing the levels with keys which makes no sense.

Signed-off-by: Pavel Hofman <pavel.hofman@ivitera.com>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

9 years agoMerge branch 'drm-next' of ../drm-next into drm-linus
Dave Airlie [Thu, 8 Oct 2009 04:03:05 +0000]
Merge branch 'drm-next' of ../drm-next into drm-linus

conflict in radeon since new init path merged with vga arb code.

Conflicts:
drivers/gpu/drm/radeon/radeon.h
drivers/gpu/drm/radeon/radeon_asic.h
drivers/gpu/drm/radeon/radeon_device.c

9 years agotracing: user local buffer variable for trace branch tracer
Steven Rostedt [Thu, 8 Oct 2009 01:53:41 +0000]
tracing: user local buffer variable for trace branch tracer

Just using the tr->buffer for the API to trace_buffer_lock_reserve
is not good enough. This is because the tr->buffer may change, and we
do not want to commit with a different buffer that we reserved from.

This patch uses a local variable to hold the buffer that was used to
reserve and commit with.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

9 years agotracing: fix warning on kernel/trace/trace_branch.c andtrace_hw_branches.c
Zhenwen Xu [Thu, 8 Oct 2009 01:21:46 +0000]
tracing: fix warning on kernel/trace/trace_branch.c andtrace_hw_branches.c

fix warnings that caused the API change of trace_buffer_lock_reserve()
change files: kernel/trace/trace_hw_branch.c
              kernel/trace/trace_branch.c

Signed-off-by: Zhenwen Xu <helight.xu@gmail.com>
LKML-Reference: <20091008012146.GA4170@helight>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

9 years agodrm/radeon/kms: fix vline register for second head.
Dave Airlie [Thu, 8 Oct 2009 01:32:49 +0000]
drm/radeon/kms: fix vline register for second head.

Both r100/r600 had this wrong, use the macro to extract the register
to relocate.

Signed-off-by: Dave Airlie <airlied@redhat.com>

9 years agodrm/r600: avoid assigning vb twice in blit code
Robert Noland [Mon, 5 Oct 2009 15:56:44 +0000]
drm/r600: avoid assigning vb twice in blit code

There is no need to assign vb before you know that space is available.

[agd5f: adapted for kernel tree.]

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

9 years agodrm/radeon: use list_for_each_entry instead of list_for_each
Dave Airlie [Wed, 7 Oct 2009 23:28:19 +0000]
drm/radeon: use list_for_each_entry instead of list_for_each

This is just a cleanup of the list macro usage.

Signed-off-by: Dave Airlie <airlied@redhat.com>

9 years agodrm/radeon/kms: Fix AGP support for R600/RV770 family (v2)
Jerome Glisse [Tue, 6 Oct 2009 17:04:30 +0000]
drm/radeon/kms: Fix AGP support for R600/RV770 family (v2)

For AGP to work unmapped access must cover VRAM & AGP as
AGP is treated like VRAM by the GPU (ie physical address).
This patch properly setup the virtual memory system aperture
to cover AGP if AGP is enabled. It seems that there is memory
corruption after resume when using AGP (RV770 seems unaffected
thought). Version 2 just fix merge issue with updated AGP
fallback patch.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

9 years agodrm/radeon/kms: Fallback to non AGP when acceleration fails to initialize (v2)
Jerome Glisse [Tue, 6 Oct 2009 17:04:29 +0000]
drm/radeon/kms: Fallback to non AGP when acceleration fails to initialize (v2)

When GPU acceleration is not working with AGP try to fallback to non
AGP GART (either PCI or PCIE GART). This should make KMS failure on
AGP less painfull. We still need to find out what is wrong when AGP
fails but at least user have a lot of more chances to get a working
configuration with acceleration. This patch also cleanup R600/RV770
fallback path so they use same code as others asics. Version 2
factorize agp disabling logic to avoid code duplication and bugs.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

9 years agodrm/radeon/kms: Fix RS600/RV515/R520/RS690 IRQ
Jerome Glisse [Wed, 7 Oct 2009 09:08:22 +0000]
drm/radeon/kms: Fix RS600/RV515/R520/RS690 IRQ

Bad generated header file leaded to use wrong register
to check IRQ status and acknowledge them. Fix the header
and use proper registers.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

9 years agoftrace: check for failure for all conversions
Steven Rostedt [Wed, 7 Oct 2009 20:57:56 +0000]
ftrace: check for failure for all conversions

Due to legacy code from back when the dynamic tracer used a daemon,
only core kernel code was checking for failures. This is no longer
the case. We must check for failures any time we perform text modifications.

Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

9 years agotracing: correct module boundaries for ftrace_release
jolsa@redhat.com [Wed, 7 Oct 2009 17:00:35 +0000]
tracing: correct module boundaries for ftrace_release

When the module is about the unload we release its call records.
The ftrace_release function was given wrong values representing
the module core boundaries, thus not releasing its call records.

Plus making ftrace_release function module specific.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1254934835-363-3-git-send-email-jolsa@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

9 years agofutex: fix requeue_pi key imbalance
Darren Hart [Wed, 7 Oct 2009 18:46:54 +0000]
futex: fix requeue_pi key imbalance

If futex_wait_requeue_pi() wakes prior to requeue, we drop the
reference to the source futex_key twice, once in
handle_early_requeue_pi_wakeup() and once on our way out.

Remove the drop from the handle_early_requeue_pi_wakeup() and keep
the get/drops together in futex_wait_requeue_pi().

Reported-by: Helge Bahmann <hcb@chaoticmind.net>
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Helge Bahmann <hcb@chaoticmind.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: stable-2.6.31 <stable@kernel.org>
LKML-Reference: <4ACCE21E.5030805@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

9 years agotracing: fix transposed numbers of lock_depth and preempt_count
Steven Rostedt [Sun, 27 Sep 2009 11:02:07 +0000]
tracing: fix transposed numbers of lock_depth and preempt_count

The lock_depth and preempt_count numbers in the latency format is
transposed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

9 years agoamd64_edac: beef up DRAM error injection
Borislav Petkov [Thu, 24 Sep 2009 09:05:30 +0000]
amd64_edac: beef up DRAM error injection

When injecting DRAM ECC errors (F3xBC_x8), EccVector[15:0] is a bitmask
of which bits should be error injected when written to and holds the
payload of 16-bit DRAM word when read, respectively.

Add /sysfs members to show the DRAM ECC section/word/vector.

Fail wrong injection values entered over /sysfs instead of truncating
them.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

9 years agoamd64_edac: fix DRAM base and limit extraction
Borislav Petkov [Tue, 22 Sep 2009 14:48:37 +0000]
amd64_edac: fix DRAM base and limit extraction

On Fam10h and above, F1x[1, 0][7C:40] are DRAM Base/Limit registers
which specify the destination node of a DRAM address. Those address
boundaries are being extracted into ->dram_base[] and ->dram_limit[].
Correct the extraction masks to match the respective address bits.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

9 years agoamd64_edac: fix chip select handling
Borislav Petkov [Mon, 21 Sep 2009 12:35:51 +0000]
amd64_edac: fix chip select handling

Different processor families support a different number of chip selects.
Handle this in a family-dependent way with the proper values assigned at
init time (see amd64_set_dct_base_and_mask).

Remove _DCSM_COUNT defines since they're used at one place and originate
from public documentation.

CC: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

9 years agoamd64_edac: simple fix to allow reporting of CECC errors
Keith Mannthey [Fri, 18 Sep 2009 12:35:23 +0000]
amd64_edac: simple fix to allow reporting of CECC errors

This allows the errors to be further decoded and mapped to csrows.
Tested with ECC debug dimms and an Rev F cpu based system.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

9 years agoamd64_edac: fix K8 intlv_sel check
Borislav Petkov [Fri, 18 Sep 2009 10:39:19 +0000]
amd64_edac: fix K8 intlv_sel check

The check when DRAM interleaving is enabled should be done against the
pvt->dram_IntlvSel field and not against the ->dram_limit.

Simplify first loop and fixup printk formatting while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>