7 years agovtime: Warn if irqs aren't disabled on system time accounting APIs
Frederic Weisbecker [Mon, 19 Nov 2012 16:00:24 +0000]
vtime: Warn if irqs aren't disabled on system time accounting APIs

System time accounting APIs such as vtime_account_system() and
vtime_account_idle() need to be irqsafe. Current callers include
irq entry, exit and kvm, all of which have been checked against that
requirement. Now it's better to grow that with an automatic check
in case we have further callers or we missed something.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>

7 years agovtime: No need to disable irqs on vtime_account()
Frederic Weisbecker [Tue, 13 Nov 2012 23:26:54 +0000]
vtime: No need to disable irqs on vtime_account()

vtime_account() is only called from irq entry. irqs
are always disabled at this point so we can safely
remove the irq disabling guards on that function.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>

7 years agovtime: Consolidate a bit the ctx switch code
Frederic Weisbecker [Tue, 13 Nov 2012 23:24:25 +0000]
vtime: Consolidate a bit the ctx switch code

On ia64 and powerpc, vtime context switch only consists
in flushing system and user pending time, plus a few
arch housekeeping.

Consolidate that into a generic implementation. s390 is
a special case because pending user and system time accounting
there is hard to dissociate. So it's keeping its own implementation.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>

7 years agovtime: Explicitly account pending user time on process tick
Frederic Weisbecker [Tue, 13 Nov 2012 22:51:06 +0000]
vtime: Explicitly account pending user time on process tick

All vtime implementations just flush the user time on process
tick. Consolidate that in generic code by calling a user time
accounting helper. This avoids an indirect call in ia64 and
prepare to also consolidate vtime context switch code.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>

7 years agovtime: Remove the underscore prefix invasion
Frederic Weisbecker [Tue, 13 Nov 2012 17:21:22 +0000]
vtime: Remove the underscore prefix invasion

Prepending irq-unsafe vtime APIs with underscores was actually
a bad idea as the result is a big mess in the API namespace that
is even waiting to be further extended. Also these helpers
are always called from irq safe callers except kvm. Just
provide a vtime_account_system_irqsafe() for this specific
case so that we can remove the underscore prefix on other
vtime functions.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>

7 years agoMerge tag 'cputime-cleanups-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel...
Ingo Molnar [Tue, 30 Oct 2012 07:33:01 +0000]
Merge tag 'cputime-cleanups-for-mingo' of git://git./linux/kernel/git/frederic/linux-dynticks into sched/core

Pull cputime cleanups and optimizations from Frederic Weisbecker:

 * Gather vtime headers that were a bit scattered around

 * Separate irqtime and vtime namespaces that were
   colliding, resulting in useless calls to irqtime accounting.

 * Slightly optimize irq and guest vtime accounting.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agocputime: Separate irqtime accounting from generic vtime
Frederic Weisbecker [Sat, 6 Oct 2012 03:23:22 +0000]
cputime: Separate irqtime accounting from generic vtime

vtime_account() doesn't have the same role in
CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_IRQ_TIME_ACCOUNTING.

In the first case it handles time accounting in any context. In
the second case it only handles irq time accounting.

So when vtime_account() is called from outside vtime_account_irq_*()
this call is pointless to CONFIG_IRQ_TIME_ACCOUNTING.

To fix the confusion, change vtime_account() to irqtime_account_irq()
in CONFIG_IRQ_TIME_ACCOUNTING. This way we ensure future account_vtime()
calls won't waste useless cycles in the irqtime APIs.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>

7 years agocputime: Specialize irq vtime hooks
Frederic Weisbecker [Sat, 6 Oct 2012 02:07:19 +0000]
cputime: Specialize irq vtime hooks

With CONFIG_VIRT_CPU_ACCOUNTING, when vtime_account()
is called in irq entry/exit, we perform a check on the
context: if we are interrupting the idle task we
account the pending cputime to idle, otherwise account
to system time or its sub-areas: tsk->stime, hardirq time,
softirq time, ...

However this check for idle only concerns the hardirq entry
and softirq entry:

* Hardirq may directly interrupt the idle task, in which case
we need to flush the pending CPU time to idle.

* The idle task may be directly interrupted by a softirq if
it calls local_bh_enable(). There is probably no such call
in any idle task but we need to cover every case. Ksoftirqd
is not concerned because the idle time is flushed on context
switch and softirq in the end of hardirq have the idle time
already flushed from the hardirq entry.

In the other cases we always account to system/irq time:

* On hardirq exit we account the time to hardirq time.
* On softirq exit we account the time to softirq time.

To optimize this and avoid the indirect call to vtime_account()
and the checks it performs, specialize the vtime irq APIs and
only perform the check on irq entry. Irq exit can directly call
vtime_account_system().

CONFIG_IRQ_TIME_ACCOUNTING behaviour doesn't change and directly
maps to its own vtime_account() implementation. One may want
to take benefits from the new APIs to optimize irq time accounting
as well in the future.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>

7 years agokvm: Directly account vtime to system on guest switch
Frederic Weisbecker [Fri, 5 Oct 2012 21:07:19 +0000]
kvm: Directly account vtime to system on guest switch

Switching to or from guest context is done on ioctl context.
So by the time we call kvm_guest_enter() or kvm_guest_exit()
we know we are not running the idle task.

As a result, we can directly account the cputime using
vtime_account_system().

There are two good reasons to do this:

* We avoid some useless checks on guest switch. It optimizes
a bit this fast path.

* In the case of CONFIG_IRQ_TIME_ACCOUNTING, calling vtime_account()
checks for irq time to account. This is pointless since we know
we are not in an irq on guest switch. This is wasting cpu cycles
for no good reason. vtime_account_system() OTOH is a no-op in
this config option.

* We can remove the irq disable/enable around kvm guest switch in s390.

A further optimization may consist in introducing a vtime_account_guest()
that directly calls account_guest_time().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Xiantao Zhang <xiantao.zhang@intel.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>

7 years agovtime: Make vtime_account_system() irqsafe
Frederic Weisbecker [Wed, 24 Oct 2012 16:05:51 +0000]
vtime: Make vtime_account_system() irqsafe

vtime_account_system() currently has only one caller with
vtime_account() which is irq safe.

Now we are going to call it from other places like kvm where
irqs are not always disabled by the time we account the cputime.

So let's make it irqsafe. The arch implementation part is now
prefixed with "__".

vtime_account_idle() arch implementation is prefixed accordingly
to stay consistent.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>

7 years agovtime: Gather vtime declarations to their own header file
Frederic Weisbecker [Fri, 5 Oct 2012 21:07:19 +0000]
vtime: Gather vtime declarations to their own header file

These APIs are scattered around and are going to expand a bit.
Let's create a dedicated header file for sanity.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>

7 years agosched: Describe CFS load-balancer
Peter Zijlstra [Tue, 3 Jul 2012 11:53:26 +0000]
sched: Describe CFS load-balancer

Add some scribbles on how and why the load-balancer works..

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1341316406.23484.64.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking
Paul Turner [Thu, 4 Oct 2012 11:18:32 +0000]
sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking

While per-entity load-tracking is generally useful, beyond computing shares
distribution, e.g. runnable based load-balance (in progress), governors,
power-management, etc.

These facilities are not yet consumers of this data.  This may be trivially
reverted when the information is required; but avoid paying the overhead for
calculations we will not use until then.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141507.422162369@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Make __update_entity_runnable_avg() fast
Paul Turner [Thu, 4 Oct 2012 11:18:32 +0000]
sched: Make __update_entity_runnable_avg() fast

__update_entity_runnable_avg forms the core of maintaining an entity's runnable
load average.  In this function we charge the accumulated run-time since last
update and handle appropriate decay.  In some cases, e.g. a waking task, this
time interval may be much larger than our period unit.

Fortunately we can exploit some properties of our series to perform decay for a
blocked update in constant time and account the contribution for a running
update in essentially-constant* time.

[*]: For any running entity they should be performing updates at the tick which
gives us a soft limit of 1 jiffy between updates, and we can compute up to a
32 jiffy update in a single pass.

C program to generate the magic constants in the arrays:

  #include <math.h>
  #include <stdio.h>

  #define N 32
  #define WMULT_SHIFT 32

  const long WMULT_CONST = ((1UL << N) - 1);
  double y;

  long runnable_avg_yN_inv[N];
  void calc_mult_inv() {
   int i;
   double yn = 0;

   printf("inverses\n");
   for (i = 0; i < N; i++) {
   yn = (double)WMULT_CONST * pow(y, i);
   runnable_avg_yN_inv[i] = yn;
   printf("%2d: 0x%8lx\n", i, runnable_avg_yN_inv[i]);
   }
   printf("\n");
  }

  long mult_inv(long c, int n) {
   return (c * runnable_avg_yN_inv[n]) >>  WMULT_SHIFT;
  }

  void calc_yn_sum(int n)
  {
   int i;
   double sum = 0, sum_fl = 0, diff = 0;

   /*
    * We take the floored sum to ensure the sum of partial sums is never
    * larger than the actual sum.
    */
   printf("sum y^n\n");
   printf("   %8s  %8s %8s\n", "exact", "floor", "error");
   for (i = 1; i <= n; i++) {
   sum = (y * sum + y * 1024);
   sum_fl = floor(y * sum_fl+ y * 1024);
   printf("%2d: %8.0f  %8.0f %8.0f\n", i, sum, sum_fl,
   sum_fl - sum);
   }
   printf("\n");
  }

  void calc_conv(long n) {
   long old_n;
   int i = -1;

   printf("convergence (LOAD_AVG_MAX, LOAD_AVG_MAX_N)\n");
   do {
   old_n = n;
   n = mult_inv(n, 1) + 1024;
   i++;
   } while (n != old_n);
   printf("%d> %ld\n", i - 1, n);
   printf("\n");
  }

  void main() {
   y = pow(0.5, 1/(double)N);
   calc_mult_inv();
   calc_conv(1024);
   calc_yn_sum(N);
  }

[ Compile with -lm ]
Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141507.277808946@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Update_cfs_shares at period edge
Paul Turner [Thu, 4 Oct 2012 11:18:31 +0000]
sched: Update_cfs_shares at period edge

Now that our measurement intervals are small (~1ms) we can amortize the posting
of update_shares() to be about each period overflow.  This is a large cost
saving for frequently switching tasks.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141507.200772172@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Refactor update_shares_cpu() -> update_blocked_avgs()
Paul Turner [Thu, 4 Oct 2012 11:18:31 +0000]
sched: Refactor update_shares_cpu() -> update_blocked_avgs()

Now that running entities maintain their own load-averages the work we must do
in update_shares() is largely restricted to the periodic decay of blocked
entities.  This allows us to be a little less pessimistic regarding our
occupancy on rq->lock and the associated rq->clock updates required.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141507.133999170@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Replace update_shares weight distribution with per-entity computation
Paul Turner [Thu, 4 Oct 2012 11:18:31 +0000]
sched: Replace update_shares weight distribution with per-entity computation

Now that the machinery in place is in place to compute contributed load in a
bottom up fashion; replace the shares distribution code within update_shares()
accordingly.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141507.061208672@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Maintain runnable averages across throttled periods
Paul Turner [Thu, 4 Oct 2012 11:18:31 +0000]
sched: Maintain runnable averages across throttled periods

With bandwidth control tracked entities may cease execution according to user
specified bandwidth limits.  Charging this time as either throttled or blocked
however, is incorrect and would falsely skew in either direction.

What we actually want is for any throttled periods to be "invisible" to
load-tracking as they are removed from the system for that interval and
contribute normally otherwise.

Do this by moderating the progression of time to omit any periods in which the
entity belonged to a throttled hierarchy.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141506.998912151@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Normalize tg load contributions against runnable time
Paul Turner [Thu, 4 Oct 2012 11:18:31 +0000]
sched: Normalize tg load contributions against runnable time

Entities of equal weight should receive equitable distribution of cpu time.
This is challenging in the case of a task_group's shares as execution may be
occurring on multiple cpus simultaneously.

To handle this we divide up the shares into weights proportionate with the load
on each cfs_rq.  This does not however, account for the fact that the sum of
the parts may be less than one cpu and so we need to normalize:
  load(tg) = min(runnable_avg(tg), 1) * tg->shares
Where runnable_avg is the aggregate time in which the task_group had runnable
children.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141506.930124292@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Compute load contribution by a group entity
Paul Turner [Thu, 4 Oct 2012 11:18:31 +0000]
sched: Compute load contribution by a group entity

Unlike task entities who have a fixed weight, group entities instead own a
fraction of their parenting task_group's shares as their contributed weight.

Compute this fraction so that we can correctly account hierarchies and shared
entity nodes.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141506.855074415@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Aggregate total task_group load
Paul Turner [Thu, 4 Oct 2012 11:18:30 +0000]
sched: Aggregate total task_group load

Maintain a global running sum of the average load seen on each cfs_rq belonging
to each task group so that it may be used in calculating an appropriate
shares:weight distribution.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141506.792901086@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Account for blocked load waking back up
Paul Turner [Thu, 4 Oct 2012 11:18:30 +0000]
sched: Account for blocked load waking back up

When a running entity blocks we migrate its tracked load to
cfs_rq->blocked_runnable_avg.  In the sleep case this occurs while holding
rq->lock and so is a natural transition.  Wake-ups however, are potentially
asynchronous in the presence of migration and so special care must be taken.

We use an atomic counter to track such migrated load, taking care to match this
with the previously introduced decay counters so that we don't migrate too much
load.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141506.726077467@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Add an rq migration call-back to sched_class
Paul Turner [Thu, 4 Oct 2012 11:18:30 +0000]
sched: Add an rq migration call-back to sched_class

Since we are now doing bottom up load accumulation we need explicit
notification when a task has been re-parented so that the old hierarchy can be
updated.

Adds: migrate_task_rq(struct task_struct *p, int next_cpu)

(The alternative is to do this out of __set_task_cpu, but it was suggested that
this would be a cleaner encapsulation.)

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141506.660023400@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Maintain the load contribution of blocked entities
Paul Turner [Thu, 4 Oct 2012 11:18:30 +0000]
sched: Maintain the load contribution of blocked entities

We are currently maintaining:

  runnable_load(cfs_rq) = \Sum task_load(t)

For all running children t of cfs_rq.  While this can be naturally updated for
tasks in a runnable state (as they are scheduled); this does not account for
the load contributed by blocked task entities.

This can be solved by introducing a separate accounting for blocked load:

  blocked_load(cfs_rq) = \Sum runnable(b) * weight(b)

Obviously we do not want to iterate over all blocked entities to account for
their decay, we instead observe that:

  runnable_load(t) = \Sum p_i*y^i

and that to account for an additional idle period we only need to compute:

  y*runnable_load(t).

This means that we can compute all blocked entities at once by evaluating:

  blocked_load(cfs_rq)` = y * blocked_load(cfs_rq)

Finally we maintain a decay counter so that when a sleeping entity re-awakens
we can determine how much of its load should be removed from the blocked sum.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141506.585389902@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Aggregate load contributed by task entities on parenting cfs_rq
Paul Turner [Thu, 4 Oct 2012 11:18:30 +0000]
sched: Aggregate load contributed by task entities on parenting cfs_rq

For a given task t, we can compute its contribution to load as:

  task_load(t) = runnable_avg(t) * weight(t)

On a parenting cfs_rq we can then aggregate:

  runnable_load(cfs_rq) = \Sum task_load(t), for all runnable children t

Maintain this bottom up, with task entities adding their contributed load to
the parenting cfs_rq sum.  When a task entity's load changes we add the same
delta to the maintained sum.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141506.514678907@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Maintain per-rq runnable averages
Ben Segall [Thu, 4 Oct 2012 10:51:20 +0000]
sched: Maintain per-rq runnable averages

Since runqueues do not have a corresponding sched_entity we instead embed a
sched_avg structure directly.

Signed-off-by: Ben Segall <bsegall@google.com>
Reviewed-by: Paul Turner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141506.442637130@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agosched: Track the runnable average on a per-task entity basis
Paul Turner [Thu, 4 Oct 2012 11:18:29 +0000]
sched: Track the runnable average on a per-task entity basis

Instead of tracking averaging the load parented by a cfs_rq, we can track
entity load directly. With the load for a given cfs_rq then being the sum
of its children.

To do this we represent the historical contribution to runnable average
within each trailing 1024us of execution as the coefficients of a
geometric series.

We can express this for a given task t as:

  runnable_sum(t) = \Sum u_i * y^i, runnable_avg_period(t) = \Sum 1024 * y^i
  load(t) = weight_t * runnable_sum(t) / runnable_avg_period(t)

Where: u_i is the usage in the last i`th 1024us period (approximately 1ms)
~ms and y is chosen such that y^k = 1/2.  We currently choose k to be 32 which
roughly translates to about a sched period.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120823141506.372695337@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agoMerge tag 'stable/for-linus-3.7-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 24 Oct 2012 02:17:27 +0000]
Merge tag 'stable/for-linus-3.7-rc2-tag' of git://git./linux/kernel/git/konrad/xen

Pull xen bug-fixes from Konrad Rzeszutek Wilk:
 - Fix mysterious SIGSEGV or SIGKILL in applications due to corrupting
   of the %eip when returning from a signal handler.
 - Fix various ARM compile issues after the merge fallout.
 - Continue on making more of the Xen generic code usable by ARM
   platform.
 - Fix SR-IOV passthrough to mirror multifunction PCI devices.
 - Fix various compile warnings.
 - Remove hypercalls that don't exist anymore.

* tag 'stable/for-linus-3.7-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: dbgp: Fix warning when CONFIG_PCI is not enabled.
  xen: arm: comment on why 64-bit xen_pfn_t is safe even on 32 bit
  xen: balloon: use correct type for frame_list
  xen/x86: don't corrupt %eip when returning from a signal handler
  xen: arm: make p2m operations NOPs
  xen: balloon: don't include e820.h
  xen: grant: use xen_pfn_t type for frame_list.
  xen: events: pirq_check_eoi_map is X86 specific
  xen: XENMEM_translate_gpfn_list was remove ages ago and is unused.
  xen: sysfs: fix build warning.
  xen: sysfs: include err.h for PTR_ERR etc
  xen: xenbus: quirk uses x86 specific cpuid
  xen PV passthru: assign SR-IOV virtual functions to separate virtual slots
  xen/xenbus: Fix compile warning.
  xen/x86: remove duplicated include from enlighten.c

7 years agoalpha: separate thread-synchronous flags
Al Viro [Sat, 20 Oct 2012 14:52:23 +0000]
alpha: separate thread-synchronous flags

... and fix the race in updating unaligned control ones

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agoMerge tag 'kvm-3.7-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Wed, 24 Oct 2012 01:08:42 +0000]
Merge tag 'kvm-3.7-2' of git://git./virt/kvm/kvm

Pull kvm fixes from Avi Kivity:
 "KVM updates for 3.7-rc2"

* tag 'kvm-3.7-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM guest: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT
  KVM: apic: fix LDR calculation in x2apic mode
  KVM: MMU: fix release noslot pfn

7 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 24 Oct 2012 01:07:51 +0000]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Most of these are uprobes race fixes from Oleg, and their preparatory
  cleanups.  (It's larger than what I'd normally send for an -rc kernel,
  but they looked significant enough to not delay them.)

  There's also an oprofile fix and an uncore PMU fix."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  perf/x86: Disable uncore on virtualized CPUs
  oprofile, x86: Fix wrapping bug in op_x86_get_ctrl()
  ring-buffer: Check for uninitialized cpu buffer before resizing
  uprobes: Fix the racy uprobe->flags manipulation
  uprobes: Fix prepare_uprobe() race with itself
  uprobes: Introduce prepare_uprobe()
  uprobes: Fix handle_swbp() vs unregister() + register() race
  uprobes: Do not delete uprobe if uprobe_unregister() fails
  uprobes: Don't return success if alloc_uprobe() fails
  uprobes/x86: Only rep+nop can be emulated correctly
  uprobes: Simplify is_swbp_at_addr(), remove stale comments
  uprobes: Kill set_orig_insn()->is_swbp_at_addr()
  uprobes: Introduce copy_opcode(), kill read_opcode()
  uprobes: Kill set_swbp()->is_swbp_at_addr()
  uprobes: Restrict valid_vma(false) to skip VM_SHARED vmas
  uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXEC
  uprobes: Change write_opcode() to use FOLL_FORCE
  uprobes: Move clear_thread_flag(TIF_UPROBE) to uprobe_notify_resume()
  uprobes: Kill UTASK_BP_HIT state
  uprobes: Fix UPROBE_SKIP_SSTEP checks in handle_swbp()
  ...

7 years agoMerge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 24 Oct 2012 01:07:02 +0000]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull core kernel fixes from Ingo Molnar:
 "Two small fixes"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation: Reflect the new location of the NMI watchdog info
  nohz: Fix idle ticks in cpu summary line of /proc/stat

7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Wed, 24 Oct 2012 01:05:56 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
 "Among the usual minor bug fixes the more interesting patches are the
  perf counters for the latest machine, the missing select to enable
  transparent huge pages and a build fix for the UAPI rework."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390,uapi: do not use uapi/asm-generic/kvm_para.h
  s390/cache: fix data/instruction cache output
  s390: fix linker script for 31 bit builds
  s390/thp: select HAVE_ARCH_TRANSPARENT_HUGEPAGE
  s390/kdump: Use 64 bit mode for 0x10000 entry point
  perf_cpum_cf: Add support for counters available with IBM zEC12
  s390/css: stop stsch loop after cc 3
  s390/cio: use generic bitmap functions
  s390/chpid: make headers usable (again)

7 years agoMerge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux...
Linus Torvalds [Wed, 24 Oct 2012 01:05:15 +0000]
Merge branch 'stable' of git://git./linux/kernel/git/cmetcalf/linux-tile

Pull tile fixes from Chris Metcalf:
 "This fixes one issue with compiler flags that can cause modules not to
  load, and cleans up some warnings with ELF_R_xxx defines."

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  arch/tile: avoid build warnings from duplicate ELF_R_xxx #defines
  arch/tile: avoid generating .eh_frame information in modules

7 years agoMerge tag 'please-pull-uapi-fix' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 24 Oct 2012 01:03:21 +0000]
Merge tag 'please-pull-uapi-fix' of git://git./linux/kernel/git/aegl/linux

Pull ia64 fix from Tony Luck:
 "Fix from dhowells for UAPI fallout"

* tag 'please-pull-uapi-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  UAPI: Make arch/ia64/include/asm/kvm_para.h generic

7 years agoarch/tile: avoid build warnings from duplicate ELF_R_xxx #defines
Chris Metcalf [Fri, 19 Oct 2012 20:29:43 +0000]
arch/tile: avoid build warnings from duplicate ELF_R_xxx #defines

These are now provided in <asm-generic/module.h>, so clean up warnings
by not re-defining them in module.c.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>

7 years agoarch/tile: avoid generating .eh_frame information in modules
Chris Metcalf [Fri, 19 Oct 2012 15:43:11 +0000]
arch/tile: avoid generating .eh_frame information in modules

The tile tool chain uses the .eh_frame information for backtracing.
The vmlinux build drops any .eh_frame sections at link time, but when
present in kernel modules, it causes a module load failure due to the
presence of unsupported pc-relative relocations.  When compiling to
use compiler feedback support, the compiler by default omits .eh_frame
information, so we don't see this problem.  But when not using feedback,
we need to explicitly suppress the .eh_frame.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: stable@vger.kernel.org

7 years agoMerge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Tue, 23 Oct 2012 05:51:07 +0000]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "Fixes for intel and nouveau mainly.

   - intel: disable HSW by default, sdvo fixes, link train regression
     fix
   - nouveau: acpi rom loading regression fix, with a few other fixes
     from the rework
   -core: just other minor fixes and race fixes for ttm."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (24 commits)
  drm/ttm: Fix a theoretical race in ttm_bo_cleanup_refs()
  drm/ttm: Fix a theoretical race
  drm: platform: Don't initialize driver-private data
  drm/debugfs: remove redundant info from gem_names
  drm: fb: cma: Fail gracefully on allocation failure
  drm: fb: cma: Fix typo in debug message
  drm/nouveau/clock: fix missing pll type/addr when matching default entry
  drm/nouveau/fb: fix reporting of memory type on GF8+ IGPs
  drm/nv41/vm: don't init hw pciegart on boards with agp bridge
  drm/nouveau/bios: fetch full 4KiB block to determine ACPI ROM image size
  drm/nouveau: validate vbios size
  drm/nouveau: warn when trying to free mm which is still in use
  drm/nouveau: fix nouveau_mm/nouveau_mm_node leak
  drm/nouveau/bios: improve error handling when reading the vbios from ACPI
  drm/nouveau: handle same-fb page flips
  drm/i915: Initialize obj->pages before use by i915_gem_object_do_bit17_swizzle()
  drm/i915: Add no-lvds quirk for Supermicro X7SPA-H
  drm/i915: Insert i915_preliminary_hw_support variable.
  drm/i915: shut up spurious WARN in the gtt fault handler
  Revert "drm/i915: Try harder to complete DP training pattern 1"
  ...

7 years agoMerge tag 'jfs-3.7-2' of git://github.com/kleikamp/linux-shaggy
Linus Torvalds [Tue, 23 Oct 2012 05:49:34 +0000]
Merge tag 'jfs-3.7-2' of git://github.com/kleikamp/linux-shaggy

Pull jfs fix from Dave Kleikamp:
 "Bug fix: Fix FITRIM argument handling"

* tag 'jfs-3.7-2' of git://github.com/kleikamp/linux-shaggy:
  jfs: Fix FITRIM argument handling

7 years agoMerge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso...
Linus Torvalds [Tue, 23 Oct 2012 05:48:26 +0000]
Merge tag 'ext4_for_linus' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Various bug fixes for ext4.  The most serious of them fixes a security
  bug (CVE-2012-4508) which leads to stale data exposure when we have
  fallocate racing against writes to files undergoing delayed
  allocation.  We also have two fixes for the metadata checksum feature,
  the most serious of which can cause the superblock to have a invalid
  checksum after a power failure."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: Avoid underflow in ext4_trim_fs()
  ext4: Checksum the block bitmap properly with bigalloc enabled
  ext4: fix undefined bit shift result in ext4_fill_flex_info
  ext4: fix metadata checksum calculation for the superblock
  ext4: race-condition protection for ext4_convert_unwritten_extents_endio
  ext4: serialize fallocate with ext4_convert_unwritten_extents

7 years agoMerge tag 'nfs-for-3.7-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Tue, 23 Oct 2012 05:47:38 +0000]
Merge tag 'nfs-for-3.7-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Do not call pnfs_return_layout() from an rpciod context
 - nfs4_ds_disconnect can cause Oopses.  Kill it...
 - Fix the return value for nfs_callback_start_svc
 - Fix a number of compile warnings

* tag 'nfs-for-3.7-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix the return value for nfs_callback_start_svc
  NFSv4.1: Declare osd_pri_2_pnfs_err(), objio_init_read/write to be static
  NFSv4: fs/nfs/nfs4getroot.c needs to include "internal.h"
  NFSv4.1: Use kcalloc() to allocate zeroed arrays instead of kzalloc()
  NFSv4.1: Do not call pnfs_return_layout() from an rpciod context
  NFSv4.1: Kill nfs4_ds_disconnect()

7 years agoMerge tag 'regmap-fix-mmio' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie...
Linus Torvalds [Tue, 23 Oct 2012 05:39:38 +0000]
Merge tag 'regmap-fix-mmio' of git://git./linux/kernel/git/broonie/regmap

Pull regmap fix from Mark Brown:
 "regmap: Fix for dependencies for MMIO

  Trivial dependency issue, not noticed before as the only user of MMIO
  also needs I2C."

* tag 'regmap-fix-mmio' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: select REGMAP if REGMAP_MMIO and REGMAP_IRQ enabled

7 years agodrm/ttm: Fix a theoretical race in ttm_bo_cleanup_refs()
Thomas Hellstrom [Mon, 22 Oct 2012 12:51:26 +0000]
drm/ttm: Fix a theoretical race in ttm_bo_cleanup_refs()

In theory, that function could release the lru lock between
checking for bo on ddestroy list and a successful reserve if the bo
was already reserved, and the function was called with waiting reserves
allowed.
However, all current reservers of a bo on the ddestroy list would
atomically take the bo off the list after a successful reserve so this
race should not have been hit, so no need to backport for stable.

This patch also fixes a case found by Maarten Lankhorst where
ttm_mem_evict_first called with no_wait_gpu would incorrectly
spin waiting for bo idle if trying to evict a busy buffer that
also sits on the ddestroy list.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

7 years agodrm/ttm: Fix a theoretical race
Thomas Hellstrom [Mon, 22 Oct 2012 12:51:25 +0000]
drm/ttm: Fix a theoretical race

The ttm_mem_evict_first function could theoretically drop the
lru lock without retrying if a reservation from off the LRU list
ended up waiting.
However, since currently there are no users that could cause a wait
in that situation so this is not suitable for stable

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

7 years agodrm: platform: Don't initialize driver-private data
Thierry Reding [Mon, 15 Oct 2012 18:03:42 +0000]
drm: platform: Don't initialize driver-private data

Platform device drivers usually use the driver-private data for their
own purposes. Having it overwritten by drm_platform_init() is confusing
and error-prone.

Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>

7 years agodrm/debugfs: remove redundant info from gem_names
Marcin Slusarz [Tue, 16 Oct 2012 21:47:35 +0000]
drm/debugfs: remove redundant info from gem_names

It's a relic of "drm: Convert proc files to seq_file and introduce debugfs",
which wrongly converted DRM_INFO + sprintf to 2 seq_printfs.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Ben Gamari <bgamari@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

7 years agodrm: fb: cma: Fail gracefully on allocation failure
Thierry Reding [Sat, 20 Oct 2012 10:32:47 +0000]
drm: fb: cma: Fail gracefully on allocation failure

The drm_gem_cma_create() function never returns NULL but rather an error
encoded in the return value using the ERR_PTR() macro. Callers therefore
need to check for errors using the IS_ERR() macro. This change allows
drivers to handle contiguous DMA allocation failures gracefully.

Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>

7 years agodrm: fb: cma: Fix typo in debug message
Thierry Reding [Sat, 20 Oct 2012 10:32:46 +0000]
drm: fb: cma: Fix typo in debug message

The debug message showing the resolution of a framebuffer to be
allocated is missing a closing parenthesis.

Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>

7 years agoext4: Avoid underflow in ext4_trim_fs()
Lukas Czerner [Mon, 22 Oct 2012 22:01:19 +0000]
ext4: Avoid underflow in ext4_trim_fs()

Currently if len argument in ext4_trim_fs() is smaller than one block,
the 'end' variable underflow. Avoid that by returning EINVAL if len is
smaller than file system block.

Also remove useless unlikely().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

7 years agoKVM guest: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT
Sasha Levin [Fri, 19 Oct 2012 16:11:55 +0000]
KVM guest: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT

KVM_PV_REASON_PAGE_NOT_PRESENT kicks cpu out of idleness, but we haven't
marked that spot as an exit from idleness.

Not doing so can cause RCU warnings such as:

[  732.788386] ===============================
[  732.789803] [ INFO: suspicious RCU usage. ]
[  732.790032] 3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63 Tainted: G        W
[  732.790032] -------------------------------
[  732.790032] include/linux/rcupdate.h:738 rcu_read_lock() used illegally while idle!
[  732.790032]
[  732.790032] other info that might help us debug this:
[  732.790032]
[  732.790032]
[  732.790032] RCU used illegally from idle CPU!
[  732.790032] rcu_scheduler_active = 1, debug_locks = 1
[  732.790032] RCU used illegally from extended quiescent state!
[  732.790032] 2 locks held by trinity-child31/8252:
[  732.790032]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff83a67528>] __schedule+0x178/0x8f0
[  732.790032]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81152bde>] cpuacct_charge+0xe/0x200
[  732.790032]
[  732.790032] stack backtrace:
[  732.790032] Pid: 8252, comm: trinity-child31 Tainted: G        W    3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63
[  732.790032] Call Trace:
[  732.790032]  [<ffffffff8118266b>] lockdep_rcu_suspicious+0x10b/0x120
[  732.790032]  [<ffffffff81152c60>] cpuacct_charge+0x90/0x200
[  732.790032]  [<ffffffff81152bde>] ? cpuacct_charge+0xe/0x200
[  732.790032]  [<ffffffff81158093>] update_curr+0x1a3/0x270
[  732.790032]  [<ffffffff81158a6a>] dequeue_entity+0x2a/0x210
[  732.790032]  [<ffffffff81158ea5>] dequeue_task_fair+0x45/0x130
[  732.790032]  [<ffffffff8114ae29>] dequeue_task+0x89/0xa0
[  732.790032]  [<ffffffff8114bb9e>] deactivate_task+0x1e/0x20
[  732.790032]  [<ffffffff83a67c29>] __schedule+0x879/0x8f0
[  732.790032]  [<ffffffff8117e20d>] ? trace_hardirqs_off+0xd/0x10
[  732.790032]  [<ffffffff810a37a5>] ? kvm_async_pf_task_wait+0x1d5/0x2b0
[  732.790032]  [<ffffffff83a67cf5>] schedule+0x55/0x60
[  732.790032]  [<ffffffff810a37c4>] kvm_async_pf_task_wait+0x1f4/0x2b0
[  732.790032]  [<ffffffff81139e50>] ? abort_exclusive_wait+0xb0/0xb0
[  732.790032]  [<ffffffff81139c25>] ? prepare_to_wait+0x25/0x90
[  732.790032]  [<ffffffff810a3a66>] do_async_page_fault+0x56/0xa0
[  732.790032]  [<ffffffff83a6a6e8>] async_page_fault+0x28/0x30

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

7 years agoKVM: apic: fix LDR calculation in x2apic mode
Gleb Natapov [Sun, 14 Oct 2012 11:08:58 +0000]
KVM: apic: fix LDR calculation in x2apic mode

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Chegu Vinod  <chegu_vinod@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

7 years agoKVM: MMU: fix release noslot pfn
Xiao Guangrong [Tue, 16 Oct 2012 12:07:03 +0000]
KVM: MMU: fix release noslot pfn

We can not directly call kvm_release_pfn_clean to release the pfn
since we can meet noslot pfn which is used to cache mmio info into
spte

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>

7 years agoMerge branch 'drm-nouveau-fixes' of git://git.freedesktop.org/git/nouveau/linux-2...
Dave Airlie [Mon, 22 Oct 2012 07:50:07 +0000]
Merge branch 'drm-nouveau-fixes' of git://git.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes

Fixes from Ben, off note:
ACPI ROM regression fix,
some IGP and AGP regressions fixes from rework fallout.

* 'drm-nouveau-fixes' of git://git.freedesktop.org/git/nouveau/linux-2.6:
  drm/nouveau/clock: fix missing pll type/addr when matching default entry
  drm/nouveau/fb: fix reporting of memory type on GF8+ IGPs
  drm/nv41/vm: don't init hw pciegart on boards with agp bridge
  drm/nouveau/bios: fetch full 4KiB block to determine ACPI ROM image size
  drm/nouveau: validate vbios size
  drm/nouveau: warn when trying to free mm which is still in use
  drm/nouveau: fix nouveau_mm/nouveau_mm_node leak
  drm/nouveau/bios: improve error handling when reading the vbios from ACPI
  drm/nouveau: handle same-fb page flips

7 years agomodule_signing: fix printk format warning
Randy Dunlap [Sun, 21 Oct 2012 01:59:31 +0000]
module_signing: fix printk format warning

Fix the warning:

  kernel/module_signing.c:195:2: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t'

by using the proper 'z' modifier for printing a size_t.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux...
Linus Torvalds [Mon, 22 Oct 2012 05:54:24 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/geert/linux-m68k

Pull m68k updates from Geert Uytterhoeven:
 "Just the expected UAPI disintegration and the "new" kcmp syscall."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: Wire up kcmp
  m68k: Remove empty #ifdef/#else/#endif block
  UAPI: (Scripted) Disintegrate arch/m68k/include/asm

7 years agoInput: fix use-after-free introduced with dynamic minor changes
Dmitry Torokhov [Mon, 22 Oct 2012 00:57:20 +0000]
Input: fix use-after-free introduced with dynamic minor changes

Commit 7f8d4cad1e4e ("Input: extend the number of event (and other)
devices") made evdev, joydev and mousedev to embed struct cdev into
their respective structures representing input devices.

Unfortunately character device structure may outlive the parent
structure unless we do not set it up as parent of character device so
that it will stay pinned until character device is freed.

Also, now that parent structure is pinned while character device exists
we do not need to pin and unpin it every time user opens or closes it.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agochar_dev: pin parent kobject
Dmitry Torokhov [Mon, 22 Oct 2012 00:57:19 +0000]
char_dev: pin parent kobject

In certain cases (for example when a cdev structure is embedded into
another object whose lifetime is controlled by a separate kobject) it is
beneficial to tie lifetime of another object to the lifetime of
character device so that related object is not freed until after
char_dev object is freed.

To achieve this let's pin kobject's parent when doing cdev_add() and
unpin when last reference to cdev structure is being released.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agodrm/nouveau/clock: fix missing pll type/addr when matching default entry
Ben Skeggs [Mon, 22 Oct 2012 04:10:16 +0000]
drm/nouveau/clock: fix missing pll type/addr when matching default entry

This issue is a regression from 70790f4f819875e8f390871fd15bbbf823f28e1b,
and causes us to miss a special-case for C51 (NV4E) chipsets and return
the wrong reference frequency for the VPLLs.

Should fix fdo#56202

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

7 years agoext4: Checksum the block bitmap properly with bigalloc enabled
Tao Ma [Mon, 22 Oct 2012 04:34:32 +0000]
ext4: Checksum the block bitmap properly with bigalloc enabled

In mke2fs, we only checksum the whole bitmap block and it is right.
While in the kernel, we use EXT4_BLOCKS_PER_GROUP to indicate the
size of the checksumed bitmap which is wrong when we enable bigalloc.
The right size should be EXT4_CLUSTERS_PER_GROUP and this patch fixes
it.

Also as every caller of ext4_block_bitmap_csum_set and
ext4_block_bitmap_csum_verify pass in EXT4_BLOCKS_PER_GROUP(sb)/8,
we'd better removes this parameter and sets it in the function itself.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Cc: stable@vger.kernel.org

7 years agodrm/nouveau/fb: fix reporting of memory type on GF8+ IGPs
Ben Skeggs [Mon, 22 Oct 2012 03:40:36 +0000]
drm/nouveau/fb: fix reporting of memory type on GF8+ IGPs

Purely a cosmetic issue at this point.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

7 years agodrm/nv41/vm: don't init hw pciegart on boards with agp bridge
Ben Skeggs [Mon, 22 Oct 2012 00:56:07 +0000]
drm/nv41/vm: don't init hw pciegart on boards with agp bridge

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

7 years agodrm/nouveau/bios: fetch full 4KiB block to determine ACPI ROM image size
Ben Skeggs [Mon, 22 Oct 2012 00:08:19 +0000]
drm/nouveau/bios: fetch full 4KiB block to determine ACPI ROM image size

Buggy firmware leads to bad things happening otherwise..

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

7 years agodrm/nouveau: validate vbios size
Marcin Slusarz [Sun, 21 Oct 2012 23:59:20 +0000]
drm/nouveau: validate vbios size

Without checking, we could detect vbios size as 0, allocate 0-byte array
(kmalloc returns invalid pointer for such allocation) and crash in
nouveau_bios_score while checking for vbios signature.

Reported-by: Heinz Diehl <htd@fritha.org>
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

7 years agodrm/nouveau: warn when trying to free mm which is still in use
Marcin Slusarz [Sun, 21 Oct 2012 22:21:39 +0000]
drm/nouveau: warn when trying to free mm which is still in use

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

7 years agodrm/nouveau: fix nouveau_mm/nouveau_mm_node leak
Marcin Slusarz [Sun, 21 Oct 2012 22:20:32 +0000]
drm/nouveau: fix nouveau_mm/nouveau_mm_node leak

v2: use already existing parent

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

7 years agodrm/nouveau/bios: improve error handling when reading the vbios from ACPI
Martin Peres [Sat, 20 Oct 2012 09:03:36 +0000]
drm/nouveau/bios: improve error handling when reading the vbios from ACPI

Reported-by: Pawel Sikora <pawel.sikora@agmk.net>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

7 years agodrm/nouveau: handle same-fb page flips
Marcin Slusarz [Fri, 5 Oct 2012 10:26:32 +0000]
drm/nouveau: handle same-fb page flips

It's questionable use case, but weston/wayland already relies on this
behaviour, and other drivers don't care about it, so it's a matter of
compatibility.  Without it, process invoking such page flip hangs in
unkillable state, trying to reserve the same buffer twice.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

7 years agoMerge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel...
Dave Airlie [Sun, 21 Oct 2012 23:55:29 +0000]
Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes

Daniel writes:
The big thing is the disabling of the hsw support by default, cc: stable.
We've aimed for basic hsw support in 3.6, but due to a few bad
happenstances we've screwed up and only 3.8 will have better modeset
support than vesa. To avoid yet another round of fallout from such a
gaffle on for the next platform we've added a module option to disable
early hw support by default. That should also give us more flexibility in
bring-up.

 Otherwise just small fixes:
 - 3 fixes from Egbert for sdvo corner cases
 - invert-brightness quirk entry from Egbert
 - revert a dp link training change, it regresses some setups
 - and shut up a spurious WARN in our gem fault handler.
 - regression fix for an oops on bit17 swizzling machines, introduce in 3.7
 - another no-lvds quirk

* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: Initialize obj->pages before use by i915_gem_object_do_bit17_swizzle()
  drm/i915: Add no-lvds quirk for Supermicro X7SPA-H
  drm/i915: Insert i915_preliminary_hw_support variable.
  drm/i915: shut up spurious WARN in the gtt fault handler
  Revert "drm/i915: Try harder to complete DP training pattern 1"
  DRM/i915: Restore sdvo_flags after dtd->mode->dtd Roundrtrip.
  DRM/i915: Don't clone SDVO LVDS with analog.
  DRM/i915: Add QUIRK_INVERT_BRIGHTNESS for NCR machines.
  DRM/i915: Don't delete DPLL Multiplier during DAC init.

7 years agoMerge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/roste...
Ingo Molnar [Sun, 21 Oct 2012 17:53:34 +0000]
Merge branch 'tip/perf/urgent' of git://git./linux/kernel/git/rostedt/linux-trace into perf/urgent

Pull ftrace ring-buffer resizing fix from Steve Rostedt.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agoDocumentation: Reflect the new location of the NMI watchdog info
Jean Delvare [Sun, 21 Oct 2012 10:05:51 +0000]
Documentation: Reflect the new location of the NMI watchdog info

Commit 9919cba7 ("watchdog: Update documentation") moved the
NMI watchdog documentation from nmi_watchdog.txt to
lockup-watchdogs.txt. Update the index file accordingly.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/20121021120551.4656d99b@endymion.delvare
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agoMerge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg...
Ingo Molnar [Sun, 21 Oct 2012 16:18:17 +0000]
Merge branch 'uprobes/core' of git://git./linux/kernel/git/oleg/misc into perf/urgent

Pull various uprobes bugfixes from Oleg Nesterov - mostly race and
failure path fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agoMerge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile...
Ingo Molnar [Sun, 21 Oct 2012 16:11:20 +0000]
Merge branch 'urgent' of git://git./linux/kernel/git/rric/oprofile into perf/urgent

Pull event-wrapping Oprofile fix from Robert Richter.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agoLinux 3.7-rc2
Linus Torvalds [Sat, 20 Oct 2012 19:11:32 +0000]
Linux 3.7-rc2

7 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas...
Linus Torvalds [Sat, 20 Oct 2012 16:48:10 +0000]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/cmarinas/linux-aarch64

Pull arm64 fixes from Catalin Marinas:
 "Main changes:
   - AArch64 Linux compilation fixes following 3.7-rc1 changes
     (MODULES_USE_ELF_RELA, update_vsyscall() prototype)
   - Unnecessary register setting in start_thread() (thanks to Al Viro)
   - ptrace fixes"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
  arm64: fix alignment padding in assembly code
  arm64: ptrace: use HW_BREAKPOINT_EMPTY type for disabled breakpoints
  arm64: ptrace: make structure padding explicit for debug registers
  arm64: No need to set the x0-x2 registers in start_thread()
  arm64: Ignore memory blocks below PHYS_OFFSET
  arm64: Fix the update_vsyscall() prototype
  arm64: Select MODULES_USE_ELF_RELA
  arm64: Remove duplicate inclusion of mmu_context.h in smp.c

7 years agoarm64: fix alignment padding in assembly code
Marc Zyngier [Fri, 19 Oct 2012 16:33:27 +0000]
arm64: fix alignment padding in assembly code

An interesting effect of using the generic version of linkage.h
is that the padding is defined in terms of x86 NOPs, which can have
even more interesting effects when the assembly code looks like this:

ENTRY(func1)
mov x0, xzr
ENDPROC(func1)
// fall through
ENTRY(func2)
mov x0, #1
ret
ENDPROC(func2)

Admittedly, the code is not very nice. But having code from another
architecture doesn't look completely sane either.

The fix is to add arm64's version of linkage.h, which causes the insertion
of proper AArch64 NOPs.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

7 years agoperf/x86: Disable uncore on virtualized CPUs
Yan, Zheng [Tue, 21 Aug 2012 09:08:37 +0000]
perf/x86: Disable uncore on virtualized CPUs

Initializing uncore PMU on virtualized CPU may hang the kernel.
This is because kvm does not emulate the entire hardware. Thers
are lots of uncore related MSRs, making kvm enumerate them all
is a non-trival task. So just disable uncore on virtualized CPU.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Tested-by: Pekka Enberg <penberg@kernel.org>
Cc: a.p.zijlstra@chello.nl
Cc: eranian@google.com
Cc: andi@firstfloor.org
Cc: avi@redhat.com
Link: http://lkml.kernel.org/r/1345540117-14164-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agouse clamp_t in UNAME26 fix
Kees Cook [Sat, 20 Oct 2012 01:45:53 +0000]
use clamp_t in UNAME26 fix

The min/max call needed to have explicit types on some architectures
(e.g. mn10300). Use clamp_t instead to avoid the warning:

  kernel/sys.c: In function 'override_release':
  kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks a cast [enabled by default]

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 20 Oct 2012 01:39:36 +0000]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Assorted small fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf python: Properly link with libtraceevent
  perf hists browser: Add back callchain folding symbol
  perf tools: Fix build on sparc.
  perf python: Link with libtraceevent
  perf python: Initialize 'page_size' variable
  tools lib traceevent: Fix missed freeing of subargs in free_arg() in filter
  lib tools traceevent: Add back pevent assignment in __pevent_parse_format()
  perf hists browser: Fix off-by-two bug on the first column
  perf tools: Remove warnings on JIT samples for srcline sort key
  perf tools: Fix segfault when using srcline sort key
  perf: Require exclude_guest to use PEBS - kernel side enforcement
  perf tool: Precise mode requires exclude_guest

7 years agoperf python: Properly link with libtraceevent
Arnaldo Carvalho de Melo [Thu, 18 Oct 2012 14:38:35 +0000]
perf python: Properly link with libtraceevent

Namhyung Kim reported that the build fails with:

  GEN python/perf.so
  gcc: error: python_ext_build/tmp//../../libtraceevent.a: No such file or directory
  error: command 'gcc' failed with exit status 1
  cp: cannot stat `python_ext_build/lib/perf.so': No such file or directory
  make: *** [python/perf.so] Error 1

We need to propagate the TE_PATH variable to the setup.py file.

Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-8umiPbm4sxpknKivbjgykhut@git.kernel.org
[ Fixed superfluous variable build error. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agoMerge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git...
Ingo Molnar [Sat, 20 Oct 2012 00:32:56 +0000]
Merge tag 'perf-urgent-for-mingo' of git://git./linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

* The python binding needs to link with libtraceevent and to initialize
  the 'page_size' variable so that mmaping works again.

* The callchain folding character that appears on the TUI just before
  the overhead had disappeared due to recent changes, add it back.

* Intel PEBS in VT-x context uses the DS address as a guest linear address,
  even though its programmed by the host as a host linear address. This either
  results in guest memory corruption and or the hardware faulting and 'crashing'
  the virtual machine.  Therefore we have to disable PEBS on VT-x enter and
  re-enable on VT-x exit, enforcing a strict exclude_guest.

  Kernel side enforcement fix by Peter Zijlstra, tooling side fix by David Ahern.

* Fix build on sparc due to UAPI, fix from David Miller.

* Fixes for the srclike sort key for unresolved symbols and when processing
  samples in JITted code, where we don't have an ELF file, just an special
  symbol table, fixes from Namhyung Kim.

* Fix some leaks in libtraceevent, from Steven Rostedt.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

7 years agoMerge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Sat, 20 Oct 2012 00:32:37 +0000]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM soc fixes from Olof Johansson:
 "A set of fixes and some minor cleanups for -rc2:

   - A series from Arnd that fixes warnings in drivers and other code
     included by ARM defconfigs.  Most have been acked by corresponding
     maintainers (and seem quite hard to argue not picking up anyway in
     the few exception cases).
   - A few misc patches from the list for integrator/vt8500/i.MX
   - A batch of fixes to OMAP platforms, fixing:
     - boot problems on beaglebone,
     - regression fixes for local timers
     - clockdomain locking fixes
     - a few boot/sparse warnings
   - For Tegra:
     - Clock rate calculation overflow fix
     - Revert a change that removed timer clocks and a fix for symbol
       name clashes
   - For Renesas:
     - IO accessor / annotation cleanups to remove warnings
   - For Kirkwood/Dove/mvebu:
     - Fixes for device trees for Dove (some minor cleanups, some fixes)
     - Fixes for the mvebu gpio driver
     - Fix build problem for Feroceon due to missing ifdefs
     - Fix lsxl DTS files"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (31 commits)
  ARM: kirkwood: fix buttons on lsxl boards
  ARM: kirkwood: fix LEDs names for lsxl boards
  ARM: Kirkwood: fix disabling CACHE_FEROCEON_L2
  gpio: mvebu: Add missing breaks in mvebu_gpio_irq_set_type
  ARM: dove: Add crypto engine to DT
  ARM: dove: Remove watchdog from DT
  ARM: dove: Restructure SoC device tree descriptor
  ARM: dove: Fix clock names of sata and gbe
  ARM: dove: Fix tauros2 device tree init
  ARM: dove: Add pcie clock support
  ARM: OMAP2+: Allow kernel to boot even if GPMC fails to reserve memory
  ARM: OMAP: clockdomain: Fix locking on _clkdm_clk_hwmod_enable / disable
  ARM: s3c: mark s3c2440_clk_add as __init_refok
  spi/s3c64xx: use correct dma_transfer_direction type
  ARM: OMAP4: devices: fixup OMAP4 DMIC platform device error message
  ARM: OMAP2+: clock data: Add dev-id for the omap-gpmc dummy fck
  ARM: OMAP: resolve sparse warning concerning debug_card_init()
  ARM: OMAP4: Fix twd_local_timer_register regression
  ARM: tegra: add tegra_timer clock
  ARM: tegra: rename tegra system timer
  ...

7 years agoMODSIGN: Move the magic string to the end of a module and eliminate the search
David Howells [Sat, 20 Oct 2012 00:19:29 +0000]
MODSIGN: Move the magic string to the end of a module and eliminate the search

Emit the magic string that indicates a module has a signature after the
signature data instead of before it.  This allows module_sig_check() to
be made simpler and faster by the elimination of the search for the
magic string.  Instead we just need to do a single memcmp().

This works because at the end of the signature data there is the
fixed-length signature information block.  This block then falls
immediately prior to the magic number.

From the contents of the information block, it is trivial to calculate
the size of the signature data and thus the size of the actual module
data.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agoMerge tag 'kirkwood_fixes_for_v3.7' of git://git.infradead.org/users/jcooper/linux...
Olof Johansson [Fri, 19 Oct 2012 23:17:51 +0000]
Merge tag 'kirkwood_fixes_for_v3.7' of git://git.infradead.org/users/jcooper/linux into fixes

From Jason Cooper:
 - improve #ifdef logic to prevent linker errors with CACHE_FEROCEON_L2
 - lsxl board dts fixes

* tag 'kirkwood_fixes_for_v3.7' of git://git.infradead.org/users/jcooper/linux:
  ARM: kirkwood: fix buttons on lsxl boards
  ARM: kirkwood: fix LEDs names for lsxl boards
  ARM: Kirkwood: fix disabling CACHE_FEROCEON_L2

7 years agoMODSIGN: Cleanup .gitignore
David Howells [Fri, 19 Oct 2012 22:56:45 +0000]
MODSIGN: Cleanup .gitignore

The module build process no longer creates intermediate files for module
signing, so remove them from .gitignore.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agoMODSIGN: perlify sign-file and merge in x509keyid
David Howells [Fri, 19 Oct 2012 22:56:37 +0000]
MODSIGN: perlify sign-file and merge in x509keyid

Turn sign-file into perl and merge in x509keyid.  The latter doesn't
need to be a separate script as it doesn't actually need to work out the
SHA1 sum of the X.509 certificate itself, since it can get that from the
X.509 certificate.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agoMerge branch 'testing/driver-warnings' of git://git.kernel.org/pub/scm/linux/kernel...
Olof Johansson [Fri, 19 Oct 2012 22:40:18 +0000]
Merge branch 'testing/driver-warnings' of git://git./linux/kernel/git/arm/arm-soc into fixes

A collection of warning fixes on non-ARM code from Arnd Bergmann:

* 'testing/driver-warnings' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: s3c: mark s3c2440_clk_add as __init_refok
  spi/s3c64xx: use correct dma_transfer_direction type
  pcmcia: sharpsl: don't discard sharpsl_pcmcia_ops
  USB: EHCI: mark ehci_orion_conf_mbus_windows __devinit
  mm/slob: use min_t() to compare ARCH_SLAB_MINALIGN
  SCSI: ARM: make fas216_dumpinfo function conditional
  SCSI: ARM: ncr5380/oak uses no interrupts

7 years agohold task->mempolicy while numa_maps scans.
KAMEZAWA Hiroyuki [Fri, 19 Oct 2012 08:00:55 +0000]
hold task->mempolicy while numa_maps scans.

  /proc/<pid>/numa_maps scans vma and show mempolicy under
  mmap_sem. It sometimes accesses task->mempolicy which can
  be freed without mmap_sem and numa_maps can show some
  garbage while scanning.

This patch tries to take reference count of task->mempolicy at reading
numa_maps before calling get_vma_policy(). By this, task->mempolicy
will not be freed until numa_maps reaches its end.

V2->v3
  -  updated comments to be more verbose.
  -  removed task_lock() in numa_maps code.
V1->V2
  -  access task->mempolicy only once and remember it.  Becase kernel/exit.c
     can overwrite it.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agoMerge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds [Fri, 19 Oct 2012 21:15:16 +0000]
Merge branch 'x86/urgent' of git://git./linux/kernel/git/tip/tip

Pull miscellaneous x86 fixes from Peter Anvin:
 "The biggest ones are fixing suspend/resume breakage on 32 bits, and an
  interrim fix for mapping over holes that allows AMD kit with more than
  1 TB.

  A final solution for the latter is in the works, but involves some
  fairly invasive changes that will probably mean it will only be
  appropriate for 3.8."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, MCE: Remove bios_cmci_threshold sysfs attribute
  x86, amd, mce: Avoid NULL pointer reference on CPU northbridge lookup
  x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
  x86/cache_info: Use ARRAY_SIZE() in amd_l3_attrs()
  x86/reboot: Remove quirk entry for SBC FITPC
  x86, suspend: Correct the restore of CR4, EFER; skip computing EFLAGS.ID

7 years agoMerge branch 'akpm' (Fixes from Andrew)
Linus Torvalds [Fri, 19 Oct 2012 21:07:55 +0000]
Merge branch 'akpm' (Fixes from Andrew)

Merge misc fixes from Andrew Morton:
 "Seven fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (7 patches)
  lib/dma-debug.c: fix __hash_bucket_find()
  mm: compaction: correct the nr_strict va isolated check for CMA
  firmware/memmap: avoid type conflicts with the generic memmap_init()
  pidns: remove recursion from free_pid_ns()
  drivers/video/backlight/lm3639_bl.c: return proper error in lm3639_bled_mode_store() error paths
  kernel/sys.c: fix stack memory content leak via UNAME26
  linux/coredump.h needs asm/siginfo.h

7 years agolib/dma-debug.c: fix __hash_bucket_find()
Ming Lei [Fri, 19 Oct 2012 20:57:01 +0000]
lib/dma-debug.c: fix __hash_bucket_find()

If there is only one match, the unique matched entry should be returned.

Without the fix, the upcoming dma debug interfaces ("dma-debug: new
interfaces to debug dma mapping errors") can't work reliably because
only device and dma_addr are passed to dma_mapping_error().

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Tested-by: Shuah Khan <shuah.khan@hp.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agomm: compaction: correct the nr_strict va isolated check for CMA
Mel Gorman [Fri, 19 Oct 2012 20:56:57 +0000]
mm: compaction: correct the nr_strict va isolated check for CMA

Thierry reported that the "iron out" patch for isolate_freepages_block()
had problems due to the strict check being too strict with "mm:
compaction: Iron out isolate_freepages_block() and
isolate_freepages_range() -fix1".  It's possible that more pages than
necessary are isolated but the check still fails and I missed that this
fix was not picked up before RC1.  This same problem has been identified
in 3.7-RC1 by Tony Prisk and should be addressed by the following patch.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Tony Prisk <linux@prisktech.co.nz>
Reported-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Richard Davies <richard@arachsys.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agofirmware/memmap: avoid type conflicts with the generic memmap_init()
Fengguang Wu [Fri, 19 Oct 2012 20:56:55 +0000]
firmware/memmap: avoid type conflicts with the generic memmap_init()

Fix this build error:

  drivers/firmware/memmap.c:240:19: error: conflicting types for 'memmap_init'
  arch/ia64/include/asm/pgtable.h:565:17: note: previous declaration of 'memmap_init' was here

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bernhard Walle <bwalle@suse.de>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agopidns: remove recursion from free_pid_ns()
Cyrill Gorcunov [Fri, 19 Oct 2012 20:56:53 +0000]
pidns: remove recursion from free_pid_ns()

free_pid_ns() operates in a recursive fashion:

free_pid_ns(parent)
  put_pid_ns(parent)
    kref_put(&ns->kref, free_pid_ns);
      free_pid_ns

thus if there was a huge nesting of namespaces the userspace may trigger
avalanche calling of free_pid_ns leading to kernel stack exhausting and a
panic eventually.

This patch turns the recursion into an iterative loop.

Based on a patch by Andrew Vagin.

[akpm@linux-foundation.org: export put_pid_ns() to modules]
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agodrivers/video/backlight/lm3639_bl.c: return proper error in lm3639_bled_mode_store...
Axel Lin [Fri, 19 Oct 2012 20:56:52 +0000]
drivers/video/backlight/lm3639_bl.c: return proper error in lm3639_bled_mode_store() error paths

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agokernel/sys.c: fix stack memory content leak via UNAME26
Kees Cook [Fri, 19 Oct 2012 20:56:51 +0000]
kernel/sys.c: fix stack memory content leak via UNAME26

Calling uname() with the UNAME26 personality set allows a leak of kernel
stack contents.  This fixes it by defensively calculating the length of
copy_to_user() call, making the len argument unsigned, and initializing
the stack buffer to zero (now technically unneeded, but hey, overkill).

CVE-2012-0957

Reported-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agolinux/coredump.h needs asm/siginfo.h
Richard Weinberger [Fri, 19 Oct 2012 20:56:47 +0000]
linux/coredump.h needs asm/siginfo.h

Commit 5ab1c309b344 ("coredump: pass siginfo_t* to do_coredump() and
below, not merely signr") added siginfo_t to linux/coredump.h but forgot
to include asm/siginfo.h.  This breaks the build for UML/i386.  (And any
other arch where asm/siginfo.h is not magically preincluded...)

  In file included from arch/x86/um/elfcore.c:2:0: include/linux/coredump.h:15:25: error: unknown type name 'siginfo_t'
  make[1]: *** [arch/x86/um/elfcore.o] Error 1

Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: "Jonathan M. Foote" <jmfoote@cert.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agoremap_file_pages: correctly handle the case of a NULL vm_ops pointer
Linus Torvalds [Fri, 19 Oct 2012 20:37:57 +0000]
remap_file_pages: correctly handle the case of a NULL vm_ops pointer

In commit 0b173bc4daa8 ("mm: kill vma flag VM_CAN_NONLINEAR") we
replaced the VM_CAN_NONLINEAR test with checking whether the mapping has
a '->remap_pages()' vm operation, but there is no guarantee that there
it even has a vm_ops pointer at all.

Add the appropriate test for NULL vm_ops.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agodrm/i915: Initialize obj->pages before use by i915_gem_object_do_bit17_swizzle()
Chris Wilson [Fri, 19 Oct 2012 14:51:06 +0000]
drm/i915: Initialize obj->pages before use by i915_gem_object_do_bit17_swizzle()

If we leave obj->pages set to NULL before attempting to deswizzle them,
then an OOPS is well deserved.

Fixes regression introduced in commit 9da3da660d8c19a54f6e93361d147509be3fff84
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 1 15:20:22 2012 +0100

    drm/i915: Replace the array of pages with a scatterlist

Reported-and-tested-by: Krzysztof Kolasa
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

7 years agoMerge tag 'xtensa-next-20121018' of git://github.com/czankel/xtensa-linux
Linus Torvalds [Fri, 19 Oct 2012 19:52:06 +0000]
Merge tag 'xtensa-next-20121018' of git://github.com/czankel/xtensa-linux

Pull Xtensa patchset from Chris Zankel:
 "These are all limited to the xtensa subtree and include some important
  changes (adding long missing system calls for newer libc versions and
  other fixes) and the UAPI changes"

* tag 'xtensa-next-20121018' of git://github.com/czankel/xtensa-linux:
  xtensa: add missing system calls to the syscall table
  xtensa: minor compiler warning fix
  xtensa: Use Kbuild infrastructure to handle asm-generic headers
  UAPI: (Scripted) Disintegrate arch/xtensa/include/asm
  xtensa: fix unaligned usermode access
  xtensa: reorganize SR referencing
  xtensa: fix boot parameters parsing
  xtensa: fix missing return in do_page_fault for SIGBUS case
  xtensa: copy_thread with CLONE_VM must not copy live parent AR windows
  xtensa: fix memmove(), bcopy(), and memcpy().
  xtensa: ISS: fix rs_put_char
  xtensa: ISS: fix specific simcalls

7 years agodrm/i915: Add no-lvds quirk for Supermicro X7SPA-H
Chris Wilson [Thu, 18 Oct 2012 20:07:01 +0000]
drm/i915: Add no-lvds quirk for Supermicro X7SPA-H

Reported-and-tested-by: Francois Tigeot <ftigeot@wolfpond.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55375
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>