9 years ago numa: slab: use numa_mem_id() for slab local memory node
Lee Schermerhorn [Wed, 26 May 2010 21:45:03 +0000]
numa: slab: use numa_mem_id() for slab local memory node

Example usage of generic "numa_mem_id()":

The mainline slab code, since ~2.6.19, does not handle memoryless nodes
well.  Specifically, the "fast path"--____cache_alloc()--will never
succeed, because slab doesn't cache off-node objects on the per cpu queues
and, for memoryless nodes, all memory is "off node" relative to
numa_node_id().  This adds significant overhead to all kmem cache
allocations, a significant regression relative to earlier
kernels [from before slab.c was reorganized].

This patch uses the generic topology function "numa_mem_id()" to return
the "effective local memory node" for the calling context.  This is the
first node in the local node's generic fallback zonelist-- the same node
that "local" mempolicy-based allocations would use.  This lets slab cache
these "local" allocations and avoid fallback/refill on every allocation.
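The effect on the fast path can be sketched in userspace C.  This is only a
model under stated assumptions: struct sketch_cpu and can_cache_locally() are
hypothetical names, and the real check lives inside slab.c.

```c
#include <stdbool.h>

/* Hypothetical topology: the cpu's own node has no memory. */
struct sketch_cpu {
	int node;      /* what numa_node_id() would return */
	int mem_node;  /* what numa_mem_id() would return */
};

/* Slab keeps an object on the per cpu queue only if it counts as "local". */
static bool can_cache_locally(int object_node, int local_node)
{
	return object_node == local_node;
}
```

With node 0 memoryless and node 1 the nearest node with memory, every object
lives on node 1, so comparing against 'node' never matches while comparing
against 'mem_node' does.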

N.B.: Slab will need to handle node and memory hotplug events that could
change the value returned by numa_mem_id() for any given node if recent
changes to address memory hotplug don't already address this.  E.g., flush
all per cpu slab queues before rebuilding the zonelists while the
"machine" is held in the stopped state.

Performance impact on "hackbench 400 process 200"

2.6.34-rc3-mmotm-100405-1609               no-patch   this-patch
ia64 no memoryless nodes [avg of 10]:        11.713       11.637  ~0.65% diff
ia64 cpus all on memless nodes  [10]:       228.259       26.484  ~8.6x speedup

The slowdown of the patched kernel from ~12 sec to ~26 seconds when
configured with memoryless nodes is the result of all cpus allocating from
a single node's mm pagepool.  The cache lines of the single node are
distributed/interleaved over the memory of the real physical nodes, but
the zone lock, list heads, ...  of the single node with memory still each
live in a single cache line that is accessed from all processors.

x86_64 [8x6 AMD] [avg of 40]: no-patch 2.883, this-patch 2.845

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago numa: ia64: support numa_mem_id() for memoryless nodes
Lee Schermerhorn [Wed, 26 May 2010 21:45:01 +0000]
numa: ia64: support numa_mem_id() for memoryless nodes

Enable 'HAVE_MEMORYLESS_NODES' by default when NUMA configured on ia64.
Initialize percpu 'numa_mem' variable when starting secondary cpus.
Generic initialization will handle the boot cpu.

Nothing uses 'numa_mem_id()' yet.  A subsequent patch will modify slab to
use this.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago numa: introduce numa_mem_id()- effective local memory node id
Lee Schermerhorn [Wed, 26 May 2010 21:45:00 +0000]
numa: introduce numa_mem_id()- effective local memory node id

Introduce numa_mem_id(), based on generic percpu variable infrastructure
to track "nearest node with memory" for archs that support memoryless
nodes.

Define the API in <linux/topology.h> when CONFIG_HAVE_MEMORYLESS_NODES is
defined; otherwise provide stubs.  Architectures will define
HAVE_MEMORYLESS_NODES if/when they support memoryless nodes.

Archs can override definitions of:

numa_mem_id()  - returns node number of "local memory" node
set_numa_mem() - initialize [this cpu's] per cpu variable 'numa_mem'
cpu_to_mem()   - return numa_mem for specified cpu; may be used as lvalue

Generic initialization of 'numa_mem' occurs in __build_all_zonelists().
This will initialize the boot cpu at boot time, and all cpus on change of
numa_zonelist_order, or when node or memory hot-plug requires zonelist
rebuild.  Archs that support memoryless nodes will need to initialize
'numa_mem' for secondary cpus as they're brought on-line.
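A minimal userspace sketch of the three accessors described above, assuming a
plain array stands in for the per cpu variable 'numa_mem'.  SKETCH_NR_CPUS,
sketch_numa_mem and sketch_current_cpu are hypothetical; the real definitions
live in <linux/topology.h> and use the percpu infrastructure.

```c
/* Userspace model only: a plain array stands in for the percpu 'numa_mem'. */
#define SKETCH_NR_CPUS 4

static int sketch_numa_mem[SKETCH_NR_CPUS]; /* nearest node with memory, per cpu */
static int sketch_current_cpu;              /* hypothetical stand-in for the running cpu */

/* cpu_to_mem(): numa_mem for the specified cpu */
static int cpu_to_mem(int cpu)
{
	return sketch_numa_mem[cpu];
}

/* set_numa_mem(): initialize this cpu's 'numa_mem' */
static void set_numa_mem(int node)
{
	sketch_numa_mem[sketch_current_cpu] = node;
}

/* numa_mem_id(): node number of the "local memory" node for the calling cpu */
static int numa_mem_id(void)
{
	return sketch_numa_mem[sketch_current_cpu];
}
```

Unlike the description above, cpu_to_mem() in this sketch is a function and
cannot be used as an lvalue; that simplification is deliberate.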

[akpm@linux-foundation.org: fix build]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago numa: ia64: use generic percpu var numa_node_id() implementation
Lee Schermerhorn [Wed, 26 May 2010 21:44:59 +0000]
numa: ia64: use generic percpu var numa_node_id() implementation

ia64:  Use generic percpu implementation of numa_node_id()
   + initialize per cpu 'numa_node'
   + remove ia64 cpu_to_node() macro;  use generic
   + define CONFIG_USE_PERCPU_NUMA_NODE_ID when NUMA configured

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago numa: x86_64: use generic percpu var numa_node_id() implementation
Lee Schermerhorn [Wed, 26 May 2010 21:44:58 +0000]
numa: x86_64: use generic percpu var numa_node_id() implementation

x86 arch specific changes to use the generic numa_node_id() based on the
generic percpu variable infrastructure.  Back out x86's custom version of
numa_node_id().

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago numa: add generic percpu var numa_node_id() implementation
Lee Schermerhorn [Wed, 26 May 2010 21:44:56 +0000]
numa: add generic percpu var numa_node_id() implementation

Rework the generic version of the numa_node_id() function to use the new
generic percpu variable infrastructure.

Guard the new implementation with a new config option:

        CONFIG_USE_PERCPU_NUMA_NODE_ID.

Archs which support this new implementation will default this option to 'y'
when NUMA is configured.  This config option could be removed if/when all
archs switch over to the generic percpu implementation of numa_node_id().
Arch support involves:

  1) converting any existing per cpu variable implementations to use
     this implementation.  x86_64 is an instance of such an arch.
  2) archs that don't use a per cpu variable for numa_node_id() will
     need to initialize the new per cpu variable "numa_node" as cpus
     are brought on-line.  ia64 is an example.
  3) Defining USE_PERCPU_NUMA_NODE_ID in arch dependent Kconfig--e.g.,
     when NUMA is configured.  This is required because I have
     retained the old implementation by default to allow archs to
     be modified incrementally, as desired.

Subsequent patches will convert x86_64 and ia64 to use this implementation.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago Documentation/filesystems/Locking: update documentation on llseek() wrt BKL
Jan Blunck [Wed, 26 May 2010 21:44:54 +0000]
Documentation/filesystems/Locking: update documentation on llseek() wrt BKL

The inode's i_size is not protected by the big kernel lock.  Therefore it
does not make sense to recommend taking the BKL in filesystems' llseek
operations.  Instead, they should take the inode's mutex or just use
i_size_read().  Add a note that this does not protect
file->f_pos.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago fs/: do not fallback to default_llseek() when readdir() uses BKL
Jan Blunck [Wed, 26 May 2010 21:44:53 +0000]
fs/: do not fallback to default_llseek() when readdir() uses BKL

Do not use the fallback default_llseek() if the readdir operation of the
filesystem still uses the big kernel lock.

Since llseek() modifies file->f_pos of the directory directly, it may need
locking so as not to confuse readdir, which usually uses file->f_pos
directly as well.

Since the special characteristics of the BKL (unlocked on schedule) are
not necessary in this case, the inode mutex can be used for locking as
provided by generic_file_llseek().  This is only possible since all
filesystems, except reiserfs, either use a directory as a flat file or
with disk address offsets.  Reiserfs, on the other hand, uses a 32bit hash
of the filename as the offset, so generic_file_llseek() can be used there as
well since the hash is always smaller than sb->s_maxbytes (= (512 << 32) -
blocksize).
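The bound quoted above can be checked with a few lines of C.  This is a
worked arithmetic sketch only; the function names are hypothetical and any
block size passed in is an assumed example value, not taken from reiserfs.

```c
#include <stdint.h>

/* Upper bound quoted above: (512 << 32) - blocksize, computed in 64 bits. */
static uint64_t sketch_maxbytes(uint64_t blocksize)
{
	return ((uint64_t)512 << 32) - blocksize;
}

/* Largest offset a 32-bit filename hash can produce. */
static uint64_t sketch_max_hash_offset(void)
{
	return UINT32_MAX;
}
```

512 << 32 is 2^41, so any 32-bit hash stays far below the limit for every
realistic block size.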

Signed-off-by: Jan Blunck <jblunck@suse.de>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Anders Larsen <al@alarsen.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago st: use noop_llseek() instead of default_llseek()
Jan Blunck [Wed, 26 May 2010 21:44:51 +0000]
st: use noop_llseek() instead of default_llseek()

st_open() suggests that llseek() doesn't work: "We really want to do
nonseekable_open(inode, filp); here, but some versions of tar incorrectly
call lseek on tapes and bail out if that fails.  So we disallow pread()
and pwrite(), but permit lseeks."

Instead of using the fallback default_llseek() the driver should use
noop_llseek() which leaves the file->f_pos untouched but succeeds.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kai Makisara <Kai.Makisara@kolumbus.fi>
Cc: Willem Riede <osst@riede.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago osst: use noop_llseek() instead of default_llseek()
Jan Blunck [Wed, 26 May 2010 21:44:50 +0000]
osst: use noop_llseek() instead of default_llseek()

__os_scsi_tape_open() suggests that llseek() doesn't work: "We really want
to do nonseekable_open(inode, filp); here, but some versions of tar
incorrectly call lseek on tapes and bail out if that fails.  So we
disallow pread() and pwrite(), but permit lseeks."

Instead of using the fallback default_llseek() the driver should use
noop_llseek() which leaves the file->f_pos untouched but succeeds.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Willem Riede <osst@riede.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago vfs: introduce noop_llseek()
Jan Blunck [Wed, 26 May 2010 21:44:48 +0000]
vfs: introduce noop_llseek()

This is an implementation of ->llseek useable for the rare special case
when userspace expects the seek to succeed but the (device) file is
actually not able to perform the seek.  In this case you use noop_llseek()
instead of falling back to the default implementation of ->llseek.
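The behaviour described above (the seek succeeds but the position does not
move) can be sketched in userspace C.  struct sketch_file and
sketch_noop_llseek() are hypothetical stand-ins for struct file and the new
helper, not the kernel code itself.

```c
#include <stdint.h>

typedef int64_t sketch_loff_t;  /* stand-in for the kernel's loff_t */

struct sketch_file {
	sketch_loff_t f_pos;
};

/* Succeed without moving: report the current position, ignore offset/whence. */
static sketch_loff_t sketch_noop_llseek(struct sketch_file *file,
					sketch_loff_t offset, int whence)
{
	(void)offset;
	(void)whence;
	return file->f_pos;
}
```

A caller such as an old tar sees a non-negative return value and carries on,
while f_pos is left untouched.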

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago rtc-m41t80: use nonseekable_open()
Jan Blunck [Wed, 26 May 2010 21:44:47 +0000]
rtc-m41t80: use nonseekable_open()

Use nonseekable_open() for this since seeking is not supported anyway.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Gortmaker <p_gortmaker@yahoo.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago mISDN: remove unnecessary test on f_pos
Jan Blunck [Wed, 26 May 2010 21:44:46 +0000]
mISDN: remove unnecessary test on f_pos

This test is not doing anything since it is always false if the
mISDN_read() is called from vfs_read().  Besides that the driver uses
nonseekable_open() and is not using off or file->f_pos anywhere.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago frv: remove "struct file *" argument from sysctl ->proc_handler
Jan Blunck [Wed, 26 May 2010 21:44:46 +0000]
frv: remove "struct file *" argument from sysctl ->proc_handler

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago osst: update ppos instead of using file->f_pos
Jan Blunck [Wed, 26 May 2010 21:44:44 +0000]
osst: update ppos instead of using file->f_pos

osst_read()/osst_write() modify file->f_pos directly instead of the ppos
given to them.  The VFS later updates the file->f_pos and overwrites it
with the value of ppos.
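Why the direct f_pos update is lost can be shown with a simplified userspace
model.  All names below are hypothetical, and sketch_vfs_read() only imitates
the copy-in/copy-back of the position that the message describes.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_file { int64_t f_pos; };

/* Buggy pattern: the handler advances file->f_pos and leaves *ppos alone. */
static long read_updates_f_pos(struct sketch_file *file, size_t count, int64_t *ppos)
{
	(void)ppos;
	file->f_pos += (int64_t)count;
	return (long)count;
}

/* Fixed pattern: the handler advances only the position it was handed. */
static long read_updates_ppos(struct sketch_file *file, size_t count, int64_t *ppos)
{
	(void)file;
	*ppos += (int64_t)count;
	return (long)count;
}

/* Simplified model of the caller: copy f_pos in, call, copy the result back. */
static int64_t sketch_vfs_read(struct sketch_file *file, size_t count,
			       long (*handler)(struct sketch_file *, size_t, int64_t *))
{
	int64_t pos = file->f_pos;

	handler(file, count, &pos);
	file->f_pos = pos;	/* overwrites whatever the handler stored in f_pos */
	return file->f_pos;
}
```

With the first handler the position stays where it was after the call; with
the second it advances by the number of bytes transferred.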

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Willem Riede <osst@riede.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago parisc: use asm-generic/scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:43 +0000]
parisc: use asm-generic/scatterlist.h

parisc uses iova and iova_length in the scatterlist structure instead of
dma_address and dma_length.  However, the accessors are used, so we can
convert parisc to use asm-generic/scatterlist.h easily.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago mn10300: use asm-generic/scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:42 +0000]
mn10300: use asm-generic/scatterlist.h

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago frv: use asm-generic/scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:42 +0000]
frv: use asm-generic/scatterlist.h

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago blackfin: use asm-generic/scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:41 +0000]
blackfin: use asm-generic/scatterlist.h

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago xtensa: use asm-generic/scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:41 +0000]
xtensa: use asm-generic/scatterlist.h

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago mips: use asm-generic/scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:40 +0000]
mips: use asm-generic/scatterlist.h

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago m68k: use asm-generic/scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:39 +0000]
m68k: use asm-generic/scatterlist.h

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago m32r: use asm-generic/scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:37 +0000]
m32r: use asm-generic/scatterlist.h

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago h8300: use asm-generic/scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:36 +0000]
h8300: use asm-generic/scatterlist.h

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago cris: use asm-generic/scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:35 +0000]
cris: use asm-generic/scatterlist.h

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago avr32: use asm-generic/scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:35 +0000]
avr32: use asm-generic/scatterlist.h

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago asm-generic: remove ARCH_HAS_SG_CHAIN in scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:34 +0000]
asm-generic: remove ARCH_HAS_SG_CHAIN in scatterlist.h

There are more architectures that don't support ARCH_HAS_SG_CHAIN than
those that support it.  This removes ARCH_HAS_SG_CHAIN from
asm-generic/scatterlist.h and lets architectures define it.

It's clearer than defining ARCH_HAS_SG_CHAIN in asm-generic/scatterlist.h
and undefining it in architectures that don't support it.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago alpha: use asm-generic/scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:34 +0000]
alpha: use asm-generic/scatterlist.h

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago powerpc: use asm-generic/scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:33 +0000]
powerpc: use asm-generic/scatterlist.h

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago x86_32: use asm-generic/scatterlist.h
Andrew Morton [Wed, 26 May 2010 21:44:33 +0000]
x86_32: use asm-generic/scatterlist.h

Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago asm-generic: add NEED_SG_DMA_LENGTH to define sg_dma_len()
FUJITA Tomonori [Wed, 26 May 2010 21:44:32 +0000]
asm-generic: add NEED_SG_DMA_LENGTH to define sg_dma_len()

There are only two ways to define sg_dma_len(); use sg->dma_length or
sg->length.  This patch introduces NEED_SG_DMA_LENGTH that enables
architectures to choose sg->dma_length or sg->length.
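The choice described above can be sketched with a preprocessor switch.  The
SKETCH_ prefixed names are hypothetical stand-ins; this is a userspace
illustration of the idea, not the contents of asm-generic/scatterlist.h.

```c
/* Define this before the struct to model an arch that selects the option. */
/* #define SKETCH_NEED_SG_DMA_LENGTH */

struct sketch_scatterlist {
	unsigned int length;
#ifdef SKETCH_NEED_SG_DMA_LENGTH
	unsigned int dma_length;	/* only present when the arch asks for it */
#endif
};

#ifdef SKETCH_NEED_SG_DMA_LENGTH
#define sketch_sg_dma_len(sg)	((sg)->dma_length)
#else
#define sketch_sg_dma_len(sg)	((sg)->length)
#endif
```

Callers only ever use the accessor, so the struct layout can differ per
architecture without touching driver code.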

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago asm-generic: remove ISA_DMA_THRESHOLD in scatterlist.h
FUJITA Tomonori [Wed, 26 May 2010 21:44:30 +0000]
asm-generic: remove ISA_DMA_THRESHOLD in scatterlist.h

This is the first half of the attempt to use asm-generic/scatterlist.h
on every architecture.

There are only two ways to define scatterlist structure. So it's easy
to convert every architecture to use asm-generic/scatterlist.h.

This patch:

The trick for ISA_DMA_THRESHOLD in asm-generic/scatterlist.h doesn't work
for powerpc.  This lets architectures define ISA_DMA_THRESHOLD.

Hopefully, we can remove ISA_DMA_THRESHOLD in the future; there are better
ways to decide whether bouncing is necessary.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago lkdtm: add support for hardlockup, softlockup and hung task crashes
Frederic Weisbecker [Wed, 26 May 2010 21:44:29 +0000]
lkdtm: add support for hardlockup, softlockup and hung task crashes

This adds three new types of kernel "crashes" in the lkdtm driver to
trigger hardlockups, softlockups and hung task states at will.

The first two are useful to test the new generic lockup detector and to
check it for future regressions.  The last one is a bonus to check the
hung task detector for regressions, even though it is not currently being
reworked.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Simon Kagstrom <simon.kagstrom@netinsight.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago initramfs: add support for in-kernel initramfs compressed with LZO
Albin Tonnerre [Wed, 26 May 2010 21:44:28 +0000]
initramfs: add support for in-kernel initramfs compressed with LZO

Add the necessary parts to enable the use of an LZO-compressed initramfs
built into the kernel.

Signed-off-by: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago radix-tree: fix radix_tree_prev_hole() underflow case
Cesar Eduardo Barros [Wed, 26 May 2010 21:44:27 +0000]
radix-tree: fix radix_tree_prev_hole() underflow case

radix_tree_prev_hole() used LONG_MAX to detect underflow; however,
ULONG_MAX is clearly what was intended, both here and by its only user
(count_history_pages at mm/readahead.c).
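The underflow case can be illustrated with a small userspace loop.  This is a
sketch under stated assumptions: sketch_present() is a hypothetical stand-in
for the tree lookup, and the loop only imitates the backwards scan.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tree lookup: indices 0..3 are populated. */
static bool sketch_present(unsigned long index)
{
	return index <= 3;
}

/*
 * Scan backwards from 'index' for at most 'max_scan' slots and return the
 * first index that is not present.  Decrementing past 0 wraps to ULONG_MAX,
 * which is the value the underflow check has to compare against.
 */
static unsigned long sketch_prev_hole(unsigned long index, unsigned long max_scan)
{
	unsigned long i;

	for (i = 0; i < max_scan; i++) {
		if (!sketch_present(index))
			break;
		index--;
		if (index == ULONG_MAX)	/* wrapped below 0: stop scanning */
			break;
	}
	return index;
}
```

An unsigned index can never equal LONG_MAX as a result of wrapping below
zero, so a comparison against LONG_MAX would not detect the wrap.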

Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago aio: fix the compat vectored operations
Jeff Moyer [Wed, 26 May 2010 21:44:26 +0000]
aio: fix the compat vectored operations

The aio compat code was not converting the struct iovecs from 32bit to
64bit pointers, causing either EINVAL to be returned from io_getevents, or
EFAULT as the result of the I/O.  This patch passes a compat flag to
io_submit to signal that pointer conversion is necessary for a given iocb
array.
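The pointer conversion at issue can be sketched in userspace C.  The struct
and function names are hypothetical, fixed-width integers stand in for user
pointers, and the real code additionally validates what it copies from
userspace.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit layout as a 32-bit process would pass it (hypothetical names). */
struct sketch_compat_iovec {
	uint32_t iov_base;	/* 32-bit user pointer */
	uint32_t iov_len;
};

/* Native 64-bit layout. */
struct sketch_iovec {
	uint64_t iov_base;
	uint64_t iov_len;
};

/* Widen each entry; reinterpreting the array in place would misread it. */
static void sketch_convert_iovecs(const struct sketch_compat_iovec *in,
				  struct sketch_iovec *out, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		out[i].iov_base = in[i].iov_base;
		out[i].iov_len = in[i].iov_len;
	}
}
```

Because the 32-bit entry is half the size of the native one, treating the
compat array as native entries pairs the wrong fields, which is consistent
with the EINVAL and EFAULT results described above.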

A variant of this was tested by Michael Tokarev.  I have also updated the
libaio test harness to exercise this code path with good success.
Further, I grabbed a copy of ltp and ran the
testcases/kernel/syscall/readv and writev tests there (compiled with -m32
on my 64bit system).  All seems happy, but extra eyes on this would be
welcome.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_COMPAT=n build]
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: <stable@kernel.org> [2.6.35.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago compat: factor out compat_rw_copy_check_uvector from compat_do_readv_writev
Jeff Moyer [Wed, 26 May 2010 21:44:25 +0000]
compat: factor out compat_rw_copy_check_uvector from compat_do_readv_writev

It was reported in http://lkml.org/lkml/2010/3/8/309 that 32 bit readv and
writev AIO operations were not functioning properly.  It turns out that
the code to convert the 32bit io vectors to 64 bits was never written.
The results of that can be pretty bad, but in my testing, it mostly ended
up in generating EFAULT as we walked off the list of I/O vectors provided.

This patch set fixes the problem in my environment.  Comments are greatly
appreciated.

This patch:

Factor out code that will be used by both compat_do_readv_writev and the
compat aio submission code paths.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: <stable@kernel.org> [2.6.35.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago panic: call console_verbose() in panic
Anton Blanchard [Wed, 26 May 2010 21:44:24 +0000]
panic: call console_verbose() in panic

Most distros turn the console verbosity down and that means a backtrace
after a panic never makes it to the console.  I assume we haven't seen
this because a panic is often preceded by an oops which will have called
console_verbose.  There are however a lot of places where we call panic
directly, and they are broken.

Use console_verbose like we do in the oops path to ensure a directly
called panic will print a backtrace.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago fs/affs: use ERR_CAST
Julia Lawall [Wed, 26 May 2010 21:44:23 +0000]
fs/affs: use ERR_CAST

Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes the
purpose of the operation clearer; the latter otherwise looks like a
no-op.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
type T;
T x;
identifier f;
@@

T f (...) { <+...
- ERR_PTR(PTR_ERR(x))
+ x
 ...+> }

@@
expression x;
@@

- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
// </smpl>
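
The equivalence that the semantic patch relies on can be sketched in
userspace C.  The SKETCH_ helpers below only imitate the kernel's
error-pointer helpers and are assumptions for illustration, not the
definitions from <linux/err.h>.

```c
#include <stdint.h>

/* Userspace imitation of the kernel's error-pointer helpers. */
static inline void *SKETCH_ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long SKETCH_PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Same value, different pointer type: states the intent directly. */
static inline void *SKETCH_ERR_CAST(const void *ptr)
{
	return (void *)(uintptr_t)ptr;
}
```

Both spellings yield the same pointer value; the cast form simply says that
an error is being passed along under a different pointer type.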

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago Documentation/DMA-API-HOWTO: add ARCH_KMALLOC_MINALIGN description
FUJITA Tomonori [Wed, 26 May 2010 21:44:23 +0000]
Documentation/DMA-API-HOWTO: add ARCH_KMALLOC_MINALIGN description

Add ARCH_KMALLOC_MINALIGN description in "Platform Issues" section.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago Documentation: move the error handling to the better place in DMA-API-HOWTO
FUJITA Tomonori [Wed, 26 May 2010 21:44:22 +0000]
Documentation: move the error handling to the better place in DMA-API-HOWTO

Handling DMA mapping errors is essential.  Let's put it in a more
appropriate place rather than at the end of the doc.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago Documentation: update scatterlist struct description in DMA-API-HOWTO
FUJITA Tomonori [Wed, 26 May 2010 21:44:21 +0000]
Documentation: update scatterlist struct description in DMA-API-HOWTO

Now we have <asm-generic/scatterlist.h>.  Architectures should use it
instead of inventing their own scatterlist struct.  Let's update the
description.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoDocumentation: add SCSI drivers' mapping error handling to DMA-API-HOWTO
FUJITA Tomonori [Wed, 26 May 2010 21:44:21 +0000]
Documentation: add SCSI drivers' mapping error handling to DMA-API-HOWTO

Add the concrete DMA mapping error handling for SCSI drivers on the
queuecommand path.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agodma-mapping: remove deprecated dma_sync_single and dma_sync_sg API
FUJITA Tomonori [Wed, 26 May 2010 21:44:20 +0000]
dma-mapping: remove deprecated dma_sync_single and dma_sync_sg API

Since 2.6.5, the API has carried the comment 'for backwards compatibility,
removed in 2.7.x'.  Since 2.6.31, it has been marked as __deprecated.

I think that we can remove the API safely now.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoStaging: saa7134-go7007: replace dma_sync_single with dma_sync_single_for_cpu
FUJITA Tomonori [Wed, 26 May 2010 21:44:20 +0000]
Staging: saa7134-go7007: replace dma_sync_single with dma_sync_single_for_cpu

dma_sync_single() is deprecated and will be removed soon.

No functional change, since dma_sync_single is a wrapper around
dma_sync_single_for_cpu.

saa7134-go7007.c is commented out, but let's replace it anyway.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoDocumentation: add networking driver's mapping error handling to DMA-API-HOWTO
FUJITA Tomonori [Wed, 26 May 2010 21:44:19 +0000]
Documentation: add networking driver's mapping error handling to DMA-API-HOWTO

Adds the concrete DMA mapping error handling for Networking drivers on the
transmit path.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agodma-mapping: remove unnecessary sync_single_range_* in dma_map_ops
FUJITA Tomonori [Wed, 26 May 2010 21:44:18 +0000]
dma-mapping: remove unnecessary sync_single_range_* in dma_map_ops

sync_single_range_for_cpu and sync_single_range_for_device hooks are
unnecessary because sync_single_for_cpu and sync_single_for_device can
be used instead.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoswiotlb: remove unnecessary swiotlb_sync_single_range_*
FUJITA Tomonori [Wed, 26 May 2010 21:44:18 +0000]
swiotlb: remove unnecessary swiotlb_sync_single_range_*

swiotlb_sync_single_range_for_cpu and swiotlb_sync_single_range_for_device
are unnecessary because swiotlb_sync_single_for_cpu and
swiotlb_sync_single_for_device can be used instead.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agopowerpc: remove unnecessary sync_single_range_* in swiotlb_dma_ops
FUJITA Tomonori [Wed, 26 May 2010 21:44:17 +0000]
powerpc: remove unnecessary sync_single_range_* in swiotlb_dma_ops

sync_single_range_for_cpu and sync_single_range_for_device hooks in
swiotlb_dma_ops are unnecessary because sync_single_for_cpu and
sync_single_for_device are used there.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Becky Bruce <beckyb@kernel.crashing.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agox86: remove unnecessary sync_single_range_* in swiotlb_dma_ops
FUJITA Tomonori [Wed, 26 May 2010 21:44:16 +0000]
x86: remove unnecessary sync_single_range_* in swiotlb_dma_ops

sync_single_range_for_cpu and sync_single_range_for_device hooks in
swiotlb_dma_ops are unnecessary because sync_single_for_cpu and
sync_single_for_device are used there.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoia64: remove unnecessary sync_single_range_* in swiotlb_dma_ops
FUJITA Tomonori [Wed, 26 May 2010 21:44:15 +0000]
ia64: remove unnecessary sync_single_range_* in swiotlb_dma_ops

sync_single_range_for_cpu and sync_single_range_for_device hooks in
swiotlb_dma_ops are unnecessary because sync_single_for_cpu and
sync_single_for_device are used there.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agodrivers/edac: convert logging messages direct uses of __FILE__ to %s, __FILE
Joe Perches [Wed, 26 May 2010 21:44:14 +0000]
drivers/edac: convert logging messages direct uses of __FILE__ to %s, __FILE

Reduces text by eliminating multiple __FILE__ uses.
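
A rough user-space sketch of the two styles follows.  The function names
and the message text are made up for this example and are not taken from
drivers/edac.  With literal concatenation the file name is pasted into
every format string; with "%s" the format strings no longer contain it,
so the compiler can usually keep a single copy of the file name.

```c
#include <stddef.h>
#include <stdio.h>

/* Before: __FILE__ is concatenated into the format string itself. */
static int log_with_concat(char *buf, size_t len, int code)
{
	return snprintf(buf, len, __FILE__ ": error %d\n", code);
}

/* After: __FILE__ is passed as an argument to a shared "%s" format. */
static int log_with_arg(char *buf, size_t len, int code)
{
	return snprintf(buf, len, "%s: error %d\n", __FILE__, code);
}
```

Both functions produce the same text; only the string data emitted by the
compiler differs.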

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Tim Small <tim@buttersideup.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agolib/random32: export pseudo-random number generator for modules
Joe Eykholt [Wed, 26 May 2010 21:44:13 +0000]
lib/random32: export pseudo-random number generator for modules

This patch moves the definition of struct rnd_state and the inline
__seed() function to linux/random.h.  It renames the static __random32()
function to prandom32() and exports it for use in modules.

prandom32() is useful as a privately-seeded pseudo random number generator
that can give the same result every time it is initialized.

For FCoE FC-BB-6 VN2VN mode self-selected unique FC address generation, we
need a pseudo-random number generator seeded with the 64-bit world-wide
port name.  A truly random generator or one seeded with randomness won't
do because the same sequence of numbers should be generated each time we
boot or the link comes up.

A prandom32_seed() inline function is added to the header file.  It is
inlined not for speed, but so the function won't be expanded in the base
kernel, but only in the module that uses it.
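
The property described above (same seed, same sequence) can be sketched in
user space as follows.  This is modeled on a three-component combined
Tausworthe generator; the struct and function names are hypothetical, and
the shift and mask constants and the seed-folding step are assumptions of
this sketch rather than a copy of lib/random32.c.

```c
#include <stdint.h>

/* Hypothetical stand-in for a privately held generator state. */
struct rnd_state_sketch {
	uint32_t s1, s2, s3;
};

/* Keep each component at or above the minimum its recurrence needs. */
static uint32_t seed_min(uint32_t x, uint32_t m)
{
	return (x < m) ? x + m : x;
}

/* Fold a 64-bit seed (for example a port name) into the three components. */
static void prandom_sketch_seed(struct rnd_state_sketch *st, uint64_t seed)
{
	uint32_t i = (uint32_t)((seed >> 32) ^ (seed << 10) ^ seed);

	st->s1 = seed_min(i, 1);
	st->s2 = seed_min(i, 7);
	st->s3 = seed_min(i, 15);
}

#define TAUSWORTHE(s, a, b, c, d) \
	((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

/* Advance the private state and return the next 32-bit value. */
static uint32_t prandom_sketch(struct rnd_state_sketch *st)
{
	st->s1 = TAUSWORTHE(st->s1, 13, 19, 4294967294U, 12);
	st->s2 = TAUSWORTHE(st->s2, 2, 25, 4294967288U, 4);
	st->s3 = TAUSWORTHE(st->s3, 3, 11, 4294967280U, 17);

	return st->s1 ^ st->s2 ^ st->s3;
}
```

Because the state is private to the caller and derived only from the seed,
two states seeded with the same value always produce the same sequence.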

Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoINIT_SIGHAND: use SIG_DFL instead of NULL
Oleg Nesterov [Wed, 26 May 2010 21:44:12 +0000]
INIT_SIGHAND: use SIG_DFL instead of NULL

Cosmetic, no changes in the compiled code. Just s/NULL/SIG_DFL/ to make
it more readable and grep-friendly.

Note: probably SIG_IGN makes more sense, we could kill ignore_signals().
But then kernel_init() should do flush_signal_handlers() before exec().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Mathias Krause <Mathias.Krause@secunet.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agopids: fix fork_idle() to setup ->pids correctly
Oleg Nesterov [Wed, 26 May 2010 21:44:11 +0000]
pids: fix fork_idle() to setup ->pids correctly

copy_process(pid => &init_struct_pid) doesn't do attach_pid/etc.

It shouldn't, but this means that the idle threads run with the wrong
pids copied from the caller's task_struct. In the x86 case the caller is
either the kernel_init() thread or keventd.

In particular, this means that after a series of cpu_up/cpu_down an
idle thread (which never exits) can run with .pid pointing to nowhere.

Change fork_idle() to initialize idle->pids[] correctly. We only set
.pid = &init_struct_pid but do not add .node to list, INIT_TASK() does
the same for the boot-cpu idle thread (swapper).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Mathias Krause <Mathias.Krause@secunet.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agopids: init_struct_pid.tasks should never see the swapper process
Oleg Nesterov [Wed, 26 May 2010 21:44:10 +0000]
pids: init_struct_pid.tasks should never see the swapper process

"statically initialize struct pid for swapper" commit 820e45db says:

Statically initialize a struct pid for the swapper process (pid_t == 0)
and attach it to init_task.  This is needed so task_pid(), task_pgrp()
and task_session() interfaces work on the swapper process also.

OK, but:

- it doesn't make sense to add init_task.pids[].node into
  init_struct_pid.tasks[], and in fact this is just wrong.

  idle threads are special, they shouldn't be visible on any
  global list. In particular do_each_pid_task(init_struct_pid)
  shouldn't see swapper.

  This is the actual reason why kill(0, SIGKILL) from /sbin/init
  (which starts with 0,0 special pids) crashes the kernel. The
  signal sent to pgid/sid == 0 must never see idle threads, even
  if the previous patch fixed the crash itself.

- we have other idle threads running on the non-boot CPUs, see
  the next patch.

Change INIT_STRUCT_PID/INIT_PID_LINK to create the empty/unhashed
hlist_head/hlist_node. Like any other idle thread swapper can never exit,
so detach_pid()->__hlist_del() is not possible, but we could change
INIT_PID_LINK() to set pprev = &next if needed.

All we need is the valid swapper->pids[].pid == &init_struct_pid.
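
The "empty/unhashed" state and the optional pprev = &next variant can be
sketched in user space as below.  The types and helpers are simplified,
hypothetical stand-ins for the kernel's hlist, written for this example
only.

```c
#include <stddef.h>

/* Simplified stand-ins for hlist_head / hlist_node. */
struct hhead {
	struct hnode *first;
};

struct hnode {
	struct hnode *next;
	struct hnode **pprev;	/* address of the pointer that points at us */
};

/* A node that is on no list has pprev == NULL. */
static int hnode_unhashed(const struct hnode *n)
{
	return n->pprev == NULL;
}

/* Unlink a node; this dereferences pprev, so pprev must be valid. */
static void hnode_del(struct hnode *n)
{
	struct hnode *next = n->next;

	*n->pprev = next;
	if (next)
		next->pprev = n->pprev;
}

/* Variant mentioned above: point pprev at the node's own next field. */
static void hnode_init_self(struct hnode *n)
{
	n->next = NULL;
	n->pprev = &n->next;
}
```

With pprev == NULL the node is invisible to list walkers but must never be
passed to the delete helper; with pprev = &next the delete becomes a
harmless store into the node itself.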

Reported-by: Mathias Krause <mathias.krause@secunet.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Mathias Krause <Mathias.Krause@secunet.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoINIT_TASK() should initialize ->thread_group list
Oleg Nesterov [Wed, 26 May 2010 21:44:08 +0000]
INIT_TASK() should initialize ->thread_group list

The trivial /sbin/init doing

#include <signal.h>

int main(void)
{
	kill(0, SIGKILL);
	return 0;
}

crashes the kernel.

This happens because __kill_pgrp_info(init_struct_pid) also sends SIGKILL
to the swapper process which runs with the uninitialized ->thread_group.

Change INIT_TASK() to initialize ->thread_group properly.

Note: the real problem is that the swapper process must not be visible to
signals, see the next patch. But this change is right anyway and fixes
the crash.
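
To illustrate what "initialize properly" means for a circular list head,
here is a small user-space sketch.  The type, macro and helper names are
hypothetical and only mirror the general shape of the kernel's list_head.

```c
/* Simplified stand-in for a circular, doubly linked list head. */
struct list_node {
	struct list_node *next, *prev;
};

/* An empty list points at itself in both directions. */
#define LIST_NODE_INIT(name) { &(name), &(name) }

static int list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

/* Walk the list; terminates only if the head was initialized. */
static int list_count(const struct list_node *head)
{
	int n = 0;

	for (const struct list_node *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Insert entry just before head, i.e. at the tail of the list. */
static void list_add_tail_sketch(struct list_node *entry,
				 struct list_node *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}
```

A zero-filled head has next == NULL, so the loop in list_count() would
dereference a NULL pointer on its first step instead of stopping at the
head.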

Reported-and-tested-by: Mathias Krause <mathias.krause@secunet.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Mathias Krause <Mathias.Krause@secunet.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agopids: increase pid_max based on num_possible_cpus
Hedi Berriche [Wed, 26 May 2010 21:44:06 +0000]
pids: increase pid_max based on num_possible_cpus

On a system with a substantial number of processors, the early default
pid_max of 32k will not be enough.  On a system with 1664 CPUs, 25163
processes are started before the login prompt.  It's estimated that with
2048 CPUs we will pass the 32k limit.  With 4096, we'll reach that limit
very early during the boot cycle, and processes would stall waiting for an
available pid.

This patch increases the early maximum number of pids available, and
increases the minimum number of pids that can be set during runtime.
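
One way to express this kind of scaling is sketched below.  The per-CPU
factors, the 4M upper limit and the old minimum of 301 are assumptions
chosen for this example; the commit message itself only gives the 32k
default.

```c
/* All constants below are assumptions of this sketch. */
#define PID_MAX_DEFAULT_SKETCH		32768L		/* the 32k default */
#define PID_MAX_LIMIT_SKETCH		(4L * 1024 * 1024)
#define PID_MIN_FLOOR_SKETCH		301L
#define PIDS_PER_CPU_DEFAULT_SKETCH	1024L
#define PIDS_PER_CPU_MIN_SKETCH		8L

static long max_long(long a, long b) { return a > b ? a : b; }
static long min_long(long a, long b) { return a < b ? a : b; }

/* Early default: grow with the CPU count, but never past the hard limit. */
static long scaled_pid_max(long possible_cpus)
{
	long wanted = PIDS_PER_CPU_DEFAULT_SKETCH * possible_cpus;

	return min_long(PID_MAX_LIMIT_SKETCH,
			max_long(PID_MAX_DEFAULT_SKETCH, wanted));
}

/* Lowest value an administrator may set at runtime. */
static long scaled_pid_max_min(long possible_cpus)
{
	return max_long(PID_MIN_FLOOR_SKETCH,
			PIDS_PER_CPU_MIN_SKETCH * possible_cpus);
}
```

Under these assumed factors a small machine keeps the 32k default, while a
1664-CPU machine would start with 1664 * 1024 = 1703936 available pids.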

[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Hedi Berriche <hedi@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg KH <gregkh@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: John Stoffel <john@stoffel.org>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agorapidio: fix maintenance access to higher memory areas
Thomas Moll [Wed, 26 May 2010 21:44:05 +0000]
rapidio: fix maintenance access to higher memory areas

Fix the maintenance access functions to far-end RapidIO devices:
1. Fix the shift of the given offset, to open the maintenance window
2. Mask the offset to limit access to the opened maintenance window
3. Add the extended destid part to the rowtear register, required for 16-bit mode

This method matches the generation of maintenance transactions described
by Freescale in the appnote AN2932.  With this modification, full access
to a 16MB maintenance window is possible; this patch is required for
IDT CPS switches.  For easier handling of the access routines, access
is limited to aligned memory regions.  This should be no problem
because all registers are 32 bits wide.
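
The shift-and-mask idea can be sketched generically as follows.  The
aperture size, the 24-bit offset space and every name here are assumptions
made for illustration; they are not values or identifiers taken from the
driver or from AN2932.

```c
#include <stdint.h>

/* Hypothetical sizes: a 4 KB aperture sliding over a 16 MB offset space. */
#define MAINT_WIN_SHIFT_SKETCH	12
#define MAINT_WIN_SIZE_SKETCH	(1u << MAINT_WIN_SHIFT_SKETCH)
#define MAINT_SPACE_SKETCH	(1u << 24)

struct maint_access_sketch {
	uint32_t window;	/* upper offset bits, selects the aperture */
	uint32_t in_window;	/* offset inside the opened aperture */
};

/* Split a maintenance offset; returns 0 on success, -1 if rejected. */
static int maint_split(uint32_t offset, struct maint_access_sketch *out)
{
	if (offset & 3)				/* 32-bit aligned only */
		return -1;
	if (offset >= MAINT_SPACE_SKETCH)	/* outside the offset space */
		return -1;

	out->window = offset >> MAINT_WIN_SHIFT_SKETCH;
	out->in_window = offset & (MAINT_WIN_SIZE_SKETCH - 1);
	return 0;
}
```

The upper bits choose which part of the device's register space the window
exposes, and the masked lower bits keep the actual load or store inside
that window.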

Signed-off-by: Thomas Moll <thomas.moll@sysgo.com>
Tested-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agorapidio: use default route value for CPS switches
Alexandre Bounine [Wed, 26 May 2010 21:44:05 +0000]
rapidio: use default route value for CPS switches

Fix to use correct default value for routing table entries.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agorapidio: add switch domain routines
Alexandre Bounine [Wed, 26 May 2010 21:44:04 +0000]
rapidio: add switch domain routines

Add switch specific domain routines required for 16-bit routing support in
switches with hierarchical implementation of routing tables.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agorapidio: modify initialization of switch operations
Alexandre Bounine [Wed, 26 May 2010 21:44:03 +0000]
rapidio: modify initialization of switch operations

Modify the way RapidIO switch operations are declared.  Multiple
assignments through the linker script are replaced by a single
initialization call.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agorapidio: add debug configuration option
Alexandre Bounine [Wed, 26 May 2010 21:44:03 +0000]
rapidio: add debug configuration option

Add debug configuration option for RapidIO subsystem.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agorapidio: fix typos and minor edits
Alexandre Bounine [Wed, 26 May 2010 21:44:02 +0000]
rapidio: fix typos and minor edits

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agorapidio: add enabling SRIO port RX and TX
Thomas Moll [Wed, 26 May 2010 21:44:01 +0000]
rapidio: add enabling SRIO port RX and TX

Add the functionality to enable Input receiver and Output transmitter of
every port, to allow non-maintenance traffic.

Signed-off-by: Thomas Moll <thomas.moll@sysgo.com>
Signed-off-by: Alexandre Bounine <abounine@tundra.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agorapidio, powerpc/85xx: Add MChk handler for SRIO port
Alexandre Bounine [Wed, 26 May 2010 21:44:00 +0000]
rapidio, powerpc/85xx: Add MChk handler for SRIO port

Add Machine Check exception handling into RapidIO port driver for
Freescale SoCs (MPC85xx).

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agorapidio, powerpc/85xx: add Port-Write message handler for SRIO port
Alexandre Bounine [Wed, 26 May 2010 21:44:00 +0000]
rapidio, powerpc/85xx: add Port-Write message handler for SRIO port

Add RapidIO Port-Write message handler for Freescale SoCs with RapidIO
port.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agorapidio: add Port-Write handling for EM
Alexandre Bounine [Wed, 26 May 2010 21:43:59 +0000]
rapidio: add Port-Write handling for EM

Add RapidIO Port-Write message handling in the context of the Error
Management Extensions Specification Rev.1.3.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agorapidio: add switch locking during discovery
Alexandre Bounine [Wed, 26 May 2010 21:43:58 +0000]
rapidio: add switch locking during discovery

Add switch access locking during RapidIO discovery.  An access lock is
required when reading the switch routing table contents because of the
indexed mechanism of RT addressing.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agorapidio: add IDT CPS/TSI switches
Alexandre Bounine [Wed, 26 May 2010 21:43:57 +0000]
rapidio: add IDT CPS/TSI switches

Extensions to RapidIO switch support:

1. modify switch route operation declarations to allow using a single
   switch-specific file for a family of switches that share the same route
   table operations.

2. add standard route table operations for switches that support
   route table manipulation registers as defined in Rev.1.3 of the RapidIO
   specification.

3. add clear-route-table operation for switches

4. add CPSxx and TSIxxx families of RapidIO switches

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Tested-by: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agodrivers/char/applicom.c: use memdup_user
Julia Lawall [Wed, 26 May 2010 21:43:56 +0000]
drivers/char/applicom.c: use memdup_user

Use memdup_user when user data is immediately copied into the
allocated region.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@

-  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+  to = memdup_user(from,size);
   if (
-      to==NULL
+      IS_ERR(to)
                 || ...) {
   <+... when != goto l1;
-  -ENOMEM
+  PTR_ERR(to)
   ...+>
   }
-  if (copy_from_user(to, from, size) != 0) {
-    <+... when != goto l2;
-    -EFAULT
-    ...+>
-  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agodrivers/char/ppdev.c: use kasprintf
Julia Lawall [Wed, 26 May 2010 21:43:55 +0000]
drivers/char/ppdev.c: use kasprintf

kasprintf combines kmalloc and sprintf, and takes care of the size
calculation itself.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression a,flag;
expression list args;
statement S;
@@

  a =
-  \(kmalloc\|kzalloc\)(...,flag)
+  kasprintf(flag,args)
  <... when != a
  if (a == NULL || ...) S
  ...>
- sprintf(a,args);
// </smpl>
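
For readers unfamiliar with the pattern, a user-space analogue of
"allocate and format in one step" might look like the sketch below.
asprintf_sketch() is a hypothetical name; unlike kasprintf it takes no
allocation flags and uses malloc.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format into a freshly allocated buffer of exactly the right size.
 * Returns NULL on failure; the caller frees the result.
 */
static char *asprintf_sketch(const char *fmt, ...)
{
	va_list ap, aq;
	char *buf;
	int len;

	va_start(ap, fmt);

	/* First pass: measure the formatted length without writing. */
	va_copy(aq, ap);
	len = vsnprintf(NULL, 0, fmt, aq);
	va_end(aq);
	if (len < 0) {
		va_end(ap);
		return NULL;
	}

	/* Second pass: allocate length + terminating NUL, then format. */
	buf = malloc((size_t)len + 1);
	if (buf)
		vsnprintf(buf, (size_t)len + 1, fmt, ap);

	va_end(ap);
	return buf;
}
```

The caller no longer has to guess a buffer size, which is the point of the
conversion: the separate size calculation and sprintf call both disappear.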

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Michael Buesch <mb@bu3sch.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agochar drivers: RAM oops/panic logger
Marco Stornelli [Wed, 26 May 2010 21:43:54 +0000]
char drivers: RAM oops/panic logger

Ramoops, like mtdoops, can log oops/panic information, but in RAM.  It can
be used in a very easy way with persistent RAM for systems without flash
support.  For these systems, with this driver, it is no longer required to
include the mtd subsystem in the kernel, with an advantage in footprint.
In addition, you can save flash space and store this information only in
RAM.

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Simon Kagstrom <simon.kagstrom@netinsight.net>
Cc: David Woodhouse <David.Woodhouse@intel.com>
Cc: Anders Grafstrom <anders.grafstrom@netinsight.net>
Cc: Yuasa Yoichi <yuasa@linux-mips.org>
Cc: Jamie Lokier <jamie@shareable.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoipmi: handle run_to_completion properly in deliver_recv_msg()
Jiri Kosina [Wed, 26 May 2010 21:43:53 +0000]
ipmi: handle run_to_completion properly in deliver_recv_msg()

If the run_to_completion flag is set, it means that we are running in a
single-threaded mode, and thus no locks are held.

This fixes a deadlock when the IPMI notifier is called during panic.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Corey Minyard <minyard@acm.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoipmi: update driver to use dev_printk and its constructs
Myron Stowe [Wed, 26 May 2010 21:43:52 +0000]
ipmi: update driver to use dev_printk and its constructs

Update core IPMI driver printk()'s with dev_printk(), and its constructs,
to provide additional device topology information.

An example of the additional device topology for a PNP device -
  ipmi_si 00:02: probing via ACPI
  ipmi_si 00:02: [io  0x0ca2-0x0ca3] regsize 1 spacing 1 irq 0
  ipmi_si 00:02: Found new BMC (man_id: 0x00000b, prod_id: 0x0000, ...
  ipmi_si 00:02: IPMI kcs interface initialized

and for a PCI device -
  ipmi_si 0000:01:04.6: probing via PCI
  ipmi_si 0000:01:04.6: PCI INT A -> GSI 21 (level, low) -> IRQ 21
  ipmi_si 0000:01:04.6: [mem 0xf1ef0000-0xf1ef00ff] regsize 1 spaci...
  ipmi_si 0000:01:04.6: IPMI kcs interface initialized

[minyard@acm.org: rework to fix rejects, extended it a bit]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Myron Stowe <myron.stowe@hp.com>
Signed-off-by: Corey Minyard <minyard@acm.org>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoipmi: convert tracking of the ACPI device pointer to a PNP device
Myron Stowe [Wed, 26 May 2010 21:43:51 +0000]
ipmi: convert tracking of the ACPI device pointer to a PNP device

Convert PNP patch (git 9e368fa011d4e0aa050db348d69514900520e40b) to
maintain a pointer to a PNP device, 'pnp_dev', instead of the ACPI device,
'acpi_dev', that is currently being tracked with PNP based IPMI device
discovery.

Signed-off-by: Myron Stowe <myron.stowe@hp.com>
Acked-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Corey Minyard <minyard@acm.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoipmi: change timeout and event poll to one second
Corey Minyard [Wed, 26 May 2010 21:43:50 +0000]
ipmi: change timeout and event poll to one second

The timeouts in IPMI are in the 1-5 second range in message handling, so a
1 second timeout is a reasonable thing to do.  This should help with
reducing power consumption on idle systems.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoipmi: attempt to register multiple SIs of the same type
Matthew Garrett [Wed, 26 May 2010 21:43:49 +0000]
ipmi: attempt to register multiple SIs of the same type

Some odd systems may have multiple BMCs, and we want to be able to support
them.  Let's make the assumption that if a system legitimately has
multiple BMCs then each BMC's SI will be of the same type, and also that
we won't see multiple SIs of the same type unless we have multiple BMCs.
If these hold true then we should register all SIs of the same type.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoipmi: reduce polling
Matthew Garrett [Wed, 26 May 2010 21:43:49 +0000]
ipmi: reduce polling

We can reasonably alter the poll rate depending on whether we're
performing a transaction or merely waiting for an event.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoipmi: reduce polling when interrupts are available
Matthew Garrett [Wed, 26 May 2010 21:43:48 +0000]
ipmi: reduce polling when interrupts are available

If we're not currently in the middle of a transaction, and if we have
interrupts, there's no real reason to poll the controller more frequently
than the core IPMI code does.  Set the interrupt_disabled flag
appropriately as the interrupt state changes, and make the timeout code
reset itself only if the transaction is incomplete or we have no
interrupts.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoipmi: change device discovery order
Matthew Garrett [Wed, 26 May 2010 21:43:47 +0000]
ipmi: change device discovery order

The ipmi spec provides an ordering for si discovery.  Change the driver to
match, with the exception of preferring smbios to SPMI as HPs (at least)
contain accurate information in the former but not the latter.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoipmi: only register one si per bmc
Matthew Garrett [Wed, 26 May 2010 21:43:46 +0000]
ipmi: only register one si per bmc

Only register one si per bmc.  Use any user-provided devices first,
followed by the first device with an irq, followed by the first device
discovered.
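
The preference order can be sketched as a simple selection function.  The
types and names are hypothetical, and the sketch simplifies by returning a
single candidate even when several user-provided devices exist.

```c
#include <stddef.h>

/* Hypothetical description of one discovered system interface. */
enum si_source_sketch {
	SI_SRC_USER,		/* given explicitly by the user */
	SI_SRC_DISCOVERED	/* found by some probing method */
};

struct si_candidate {
	enum si_source_sketch source;
	int irq;		/* 0 means no interrupt available */
};

/* Pick one candidate: user-provided, else first with irq, else first. */
static const struct si_candidate *
pick_si(const struct si_candidate *c, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (c[i].source == SI_SRC_USER)
			return &c[i];

	for (i = 0; i < n; i++)
		if (c[i].irq)
			return &c[i];

	return n ? &c[0] : NULL;
}
```

Each pass only runs if the previous one found nothing, so discovery order
matters only as the final tie-breaker.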

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoipmi: split device discovery and registration
Matthew Garrett [Wed, 26 May 2010 21:43:46 +0000]
ipmi: split device discovery and registration

The ipmi spec indicates that we should only make use of one si per bmc, so
separate device discovery and registration to make that possible.

[thenzl@redhat.com: fix mutex use]
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago ipmi: change addr_source to an enum rather than strings
Matthew Garrett [Wed, 26 May 2010 21:43:45 +0000]
ipmi: change addr_source to an enum rather than strings

Switch from a char* to an enum to identify the address source of SIs,
making it easier to handle them appropriately during registration.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago ipc/sem.c: use ERR_CAST
Julia Lawall [Wed, 26 May 2010 21:43:44 +0000]
ipc/sem.c: use ERR_CAST

Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes the
purpose of the operation clearer, whereas the latter looks like a no-op.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
type T;
T x;
identifier f;
@@

T f (...) { <+...
- ERR_PTR(PTR_ERR(x))
+ x
 ...+> }

@@
expression x;
@@

- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
// </smpl>
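
To illustrate why ERR_PTR(PTR_ERR(x)) looks like a no-op, here is a minimal
userspace sketch of the error-pointer idiom.  The helpers are simplified
stand-ins written for this example, not the kernel's own definitions, and
struct perm, struct sem_obj, find_perm() and find_sem() are hypothetical
names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in the top page of the address space. */
static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);	/* sketch-only check */
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Same bits, different pointer type: this only documents the intent. */
static inline void *ERR_CAST(const void *ptr)
{
	return (void *)(uintptr_t)ptr;
}

struct perm { int id; };
struct sem_obj { struct perm perm; int nsems; };

static struct sem_obj the_sem = { { 1 }, 4 };

static struct perm *find_perm(int id)
{
	if (id != the_sem.perm.id)
		return ERR_PTR(-EINVAL);
	return &the_sem.perm;
}

static struct sem_obj *find_sem(int id)
{
	struct perm *p = find_perm(id);

	if (IS_ERR(p))
		return ERR_CAST(p);	/* instead of ERR_PTR(PTR_ERR(p)) */
	/* perm is the first member, so both addresses coincide. */
	return (struct sem_obj *)p;
}
```

Both spellings yield the same pointer value; ERR_CAST() just says "forward
this error under another pointer type" directly.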

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago ipc/sem.c: update description of the implementation
Manfred Spraul [Wed, 26 May 2010 21:43:43 +0000]
ipc/sem.c: update description of the implementation

ipc/sem.c begins with a 15-year-old description of bugs in the initial
implementation in Linux-1.0.  The patch replaces that with a top-level
description of the current code.

A TODO could be derived from this text:

The opengroup man page for semop() does not mandate FIFO.  Thus there is
no need for a semaphore array list of pending operations.

If

- this list is removed
- the per-semaphore-array spinlock is removed (possible if there is no
  list to protect)
- sem_otime is moved into the semaphores and calculated on demand during
  semctl()

then the array would be read-mostly - which would significantly improve
scaling for applications that use semaphore arrays with lots of entries.

The price would be expensive semctl() calls, which would have to take
every per-semaphore lock (the sem_base[i].lock field is hypothetical):

for (i = 0; i < sma->sem_nsems; i++) spin_lock(&sma->sem_base[i].lock);
<do stuff>
for (i = 0; i < sma->sem_nsems; i++) spin_unlock(&sma->sem_base[i].lock);

I'm not sure if the complexity is worth the effort, so this patch
documents the current behavior first.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago ipc/sem.c: cacheline align the ipc spinlock for semaphores
Manfred Spraul [Wed, 26 May 2010 21:43:42 +0000]
ipc/sem.c: cacheline align the ipc spinlock for semaphores

Cacheline align the spinlock for sysv semaphores.  Without the patch, the
spinlock and sem_otime [written by every semop that modifies the array]
and sem_base [read in the hot path of try_atomic_semop()] can end up in
the same cacheline.
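
The layout problem can be shown with a small userspace sketch.  It assumes
64-byte cache lines and uses C11 _Alignas; the struct names and the
same_cacheline() helper are hypothetical, and the aligned layout is just
one way to separate the fields, not necessarily the one the patch uses.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_BYTES 64	/* assumption: 64-byte cache lines */

/* Before: lock, sem_otime and sem_base are packed together. */
struct sem_array_packed {
	int lock;		/* stand-in for the ipc spinlock */
	long sem_otime;		/* written by every modifying semop */
	void *sem_base;		/* read in the hot path */
};

/* After: the lock starts a cache line, the other fields start the next. */
struct sem_array_aligned {
	_Alignas(CACHELINE_BYTES) int lock;
	_Alignas(CACHELINE_BYTES) long sem_otime;
	void *sem_base;
};

static_assert(offsetof(struct sem_array_aligned, sem_otime) == CACHELINE_BYTES,
	      "sem_otime should start the second cache line");

/* True if two member offsets fall into the same cache line. */
static int same_cacheline(size_t off_a, size_t off_b)
{
	return off_a / CACHELINE_BYTES == off_b / CACHELINE_BYTES;
}
```

The cost of the aligned layout is padding: the structure grows to a multiple
of the cache line size.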

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago ipc/sem.c: move wake_up_process out of the spinlock section
Manfred Spraul [Wed, 26 May 2010 21:43:41 +0000]
ipc/sem.c: move wake_up_process out of the spinlock section

The wake-up part of semtimedop() consists of two steps:

- the right tasks must be identified.
- they must be woken up.

Right now, both steps run while the array spinlock is held.  This patch
reorders the code and moves the actual wake_up_process() behind the point
where the spinlock is dropped.

The code also moves setting sem->sem_otime to one place: it does not make
sense to set the last semop time multiple times.
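
The reordering can be sketched as a single-threaded userspace model: waiters
are identified and unlinked while the lock is held, collected on a private
list, and only woken after the lock is dropped.  Everything here is
hypothetical (struct waiter, struct toy_array, post_and_wake()); the "lock"
is a plain flag and the wake condition is far simpler than real semaphore
operations.

```c
#include <assert.h>
#include <stddef.h>

struct waiter {
	int needed;			/* value this waiter is waiting for */
	int woken;
	struct waiter *next;		/* link in the pending list */
	struct waiter *wake_next;	/* link in the private wake list */
};

struct toy_array {
	int locked;			/* stand-in for the array spinlock */
	int value;
	struct waiter *pending;
};

static void toy_lock(struct toy_array *a)
{
	assert(!a->locked);
	a->locked = 1;
}

static void toy_unlock(struct toy_array *a)
{
	assert(a->locked);
	a->locked = 0;
}

static void toy_wake(struct toy_array *a, struct waiter *w)
{
	assert(!a->locked);	/* wake-ups must run with the lock dropped */
	w->woken = 1;
}

/* Set a new value, then wake every waiter it satisfies. Returns the count. */
static int post_and_wake(struct toy_array *a, int new_value)
{
	struct waiter *wake_list = NULL, **pp, *w;
	int n = 0;

	toy_lock(a);
	a->value = new_value;
	/* Step 1, under the lock: identify and unlink the right waiters. */
	pp = &a->pending;
	while ((w = *pp) != NULL) {
		if (w->needed <= a->value) {
			*pp = w->next;
			w->wake_next = wake_list;
			wake_list = w;
		} else {
			pp = &w->next;
		}
	}
	toy_unlock(a);

	/* Step 2, lock dropped: perform the actual wake-ups. */
	while (wake_list) {
		w = wake_list;
		wake_list = w->wake_next;
		toy_wake(a, w);
		n++;
	}
	return n;
}
```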

[akpm@linux-foundation.org: repair kerneldoc]
[akpm@linux-foundation.org: fix uninitialised retval]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago ipc/sem.c: optimize update_queue() for bulk wakeup calls
Manfred Spraul [Wed, 26 May 2010 21:43:40 +0000]
ipc/sem.c: optimize update_queue() for bulk wakeup calls

The following series of patches tries to fix the spinlock contention
reported by Chris Mason.  His benchmark exposes problems in the current
code:

- In the worst case, the algorithm used by update_queue() is O(N^2).
  Bulk wake-up calls can hit this worst case.  The patch series fixes
  that.

  Note that the benchmark app doesn't expose this problem, but it should
  be fixed anyway: real-world apps might do the wake-ups in an order other
  than perfect FIFO.

- The part of the code that runs within the semaphore array spinlock is
  significantly larger than necessary.

  The patch series fixes that.  This change is responsible for the main
  improvement.

- The cacheline with the spinlock is also used for a variable that is
  read in the hot path (sem_base) and for a variable that is unnecessarily
  written to multiple times (sem_otime).  The last step of the series
  cacheline-aligns the spinlock.

This patch:

The SysV semaphore code allows multiple operations on the semaphores in
an array to be performed as one atomic operation.  After a modification,
update_queue() checks which of the waiting tasks can complete.

The algorithm that is used to identify the tasks is O(N^2) in the worst
case.  For some cases, it is simple to avoid the O(N^2).

The patch adds detection logic for some of these cases, especially for the
case of an array where all sleeping tasks are single sembuf operations and
a multi-sembuf operation is used to wake up multiple tasks.

A big database application uses that approach.

This version of the patch also fixes wakeups due to semctl(,,SETALL,),
which the initial version of the patch broke.

[akpm@linux-foundation.org: make do_smart_update() static]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago idr: fix backtrack logic in idr_remove_all
Imre Deak [Wed, 26 May 2010 21:43:38 +0000]
idr: fix backtrack logic in idr_remove_all

Currently idr_remove_all will fail with a use-after-free error if
idr::layers is bigger than 2, which on 32-bit systems corresponds to more
than 1024 items.  This is due to stepping back too many levels during
backtracking.  For simplicity let's assume that IDR_BITS=1, so we have 2
nodes at each level below the root node and each leaf node stores two IDs.
(In reality, for 32-bit systems IDR_BITS=5, with 32 nodes at each sub-root
level and 32 IDs in each leaf node.)  The sequence of freeing the nodes at
the moment is as follows (the number in parentheses is the step at which
the node is freed):

layer
1 ->                       a(7)
2 ->            b(3)                  c(5)
3 ->        d(1)   e(2)           f(4)    g(6)

Until step 4 things go fine, but then node c is freed, whereas node g
should be freed first.  Since node c contains the pointer to node g we'll
have a use after free error at step 6.

How many levels we step back after visiting the leaf nodes is currently
determined by the msb of the id we are currently visiting:

Step
1.          node d with IDs 0,1 is freed, current ID is advanced to 2.
            The msb of the current ID is bit 1. This means we need to step
            back 1 level to node b and take the next sibling, node e.
2-3.        node e with IDs 2,3 is freed, current ID is 4, msb is bit 2.
            This means we need to step back 2 levels to node a, freeing
            node b on the way.
4-5.        node f with IDs 4,5 is freed, current ID is 6, msb is still
            bit 2. This means we again need to step back 2 levels to node
            a and free c on the way.
6.          We should visit node g, but its pointer is not available as
            node c was freed.

The fix changes how we determine the number of levels to step back.
Instead of deducing this merely from the msb of the current ID, we should
really check if advancing the ID causes an overflow to a bit position
corresponding to a given layer.  In the above example overflow from bit 0
to bit 1 should mean stepping back 1 level.  Overflow from bit 1 to bit 2
should mean stepping back 2 levels and so on.
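
The difference between the two rules can be checked with a small userspace
sketch for the IDR_BITS=1 example.  The helper names are hypothetical, and
detecting the overflow by XOR-ing the ID before and after the increment is
an assumption of this sketch, one possible way to find the highest bit that
the increment changed.

```c
#include <assert.h>

#define IDR_BITS 1	/* toy value from the example above */

/* 1-based position of the most significant set bit; 0 for x == 0. */
static int fls_u(unsigned int x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* Levels to climb from a leaf until the covered bit range reaches top_bit. */
static int levels_for_bit(int top_bit)
{
	int n = IDR_BITS, levels = 0;

	while (n < top_bit) {
		n += IDR_BITS;
		levels++;
	}
	return levels;
}

/* Old rule: look only at the msb of the ID after the increment. */
static int step_back_msb(unsigned int next_id)
{
	return levels_for_bit(fls_u(next_id));
}

/* New rule: look at the highest bit that the increment changed. */
static int step_back_overflow(unsigned int cur_id, unsigned int next_id)
{
	assert(next_id > cur_id);
	return levels_for_bit(fls_u(cur_id ^ next_id));
}
```

For the move from ID 4 to ID 6 the old rule steps back 2 levels (freeing
node c too early), while the overflow rule steps back only 1 level, so node
g is still reachable through c.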

The fix was tested with IDs up to 1 << 20, which corresponds to 4 layers
on 32 bit systems.

Signed-off-by: Imre Deak <imre.deak@nokia.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: <stable@kernel.org> [2.6.34.1]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago cpuhotplug: do not need cpu_hotplug_begin() when CONFIG_HOTPLUG_CPU=n
Lai Jiangshan [Wed, 26 May 2010 21:43:36 +0000]
cpuhotplug: do not need cpu_hotplug_begin() when CONFIG_HOTPLUG_CPU=n

When CONFIG_HOTPLUG_CPU=n, get_online_cpus() does nothing, so we don't
need cpu_hotplug_begin() either.

This patch moves cpu_hotplug_begin()/cpu_hotplug_done() into the code
block of CONFIG_HOTPLUG_CPU=y.
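
The resulting pattern can be sketched in userspace as follows.  Only the
conditional structure is the point: the function bodies, the
hotplug_writers counter and toy_cpu_up() are hypothetical stand-ins, not
the kernel's implementation.

```c
#include <assert.h>

static int hotplug_writers;	/* toy state: active hotplug writers */

#ifdef CONFIG_HOTPLUG_CPU
static void cpu_hotplug_begin(void)
{
	hotplug_writers++;
}

static void cpu_hotplug_done(void)
{
	assert(hotplug_writers > 0);
	hotplug_writers--;
}
#else
/* No readers can exist without CPU hotplug, so these are no-ops. */
static void cpu_hotplug_begin(void) { }
static void cpu_hotplug_done(void) { }
#endif

/* Toy caller: reports how many writers were active inside the section. */
static int toy_cpu_up(void)
{
	int seen;

	cpu_hotplug_begin();
	seen = hotplug_writers;
	cpu_hotplug_done();
	return seen;
}
```

Callers stay free of #ifdefs; building with -DCONFIG_HOTPLUG_CPU switches in
the real bodies.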

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago fault-injection: add CPU notifier error injection module
Akinobu Mita [Wed, 26 May 2010 21:43:36 +0000]
fault-injection: add CPU notifier error injection module

I used this module to test the series of modifications to the cpu notifier
code.

Example1: inject CPU offline error (-1 == -EPERM)

# modprobe cpu-notifier-error-inject cpu_down_prepare_error=-1
# echo 0 > /sys/devices/system/cpu/cpu1/online
bash: echo: write error: Operation not permitted

Example2: inject CPU online error (-2 == -ENOENT)

# modprobe cpu-notifier-error-inject cpu_up_prepare_error=-2
# echo 1 > /sys/devices/system/cpu/cpu1/online
bash: echo: write error: No such file or directory

[akpm@linux-foundation.org: fix Kconfig help text]
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago md: convert cpu notifier to return encapsulate errno value
Akinobu Mita [Wed, 26 May 2010 21:43:35 +0000]
md: convert cpu notifier to return encapsulate errno value

With the previous modification, a cpu notifier can return an encapsulated
errno value.  This converts the cpu notifiers for raid5.
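
As a rough illustration of what "encapsulated errno value" means, here is a
userspace sketch modeled on my understanding of the kernel's
notifier_from_errno() and notifier_to_errno() helpers.  The constants and
function bodies are reproduced from memory and should be treated as
assumptions; toy_cpu_callback() is a hypothetical notifier.

```c
#include <assert.h>
#include <errno.h>

#define NOTIFY_DONE		0x0000
#define NOTIFY_OK		0x0001
#define NOTIFY_STOP_MASK	0x8000
#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)

/* Pack a negative errno into a notifier return value that stops the chain. */
static int notifier_from_errno(int err)
{
	assert(err <= 0);	/* sketch-only check */
	return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
}

/* Recover the errno (0 if the value carries no error). */
static int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

/* Hypothetical callback: report the real cause instead of NOTIFY_BAD. */
static int toy_cpu_callback(int alloc_failed)
{
	if (alloc_failed)
		return notifier_from_errno(-ENOMEM);
	return NOTIFY_OK;
}
```

Under this encoding a plain NOTIFY_BAD decodes to -EPERM, so a converted
notifier lets the caller see the real error code instead.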

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago s390: convert cpu notifier to return encapsulate errno value
Akinobu Mita [Wed, 26 May 2010 21:43:34 +0000]
s390: convert cpu notifier to return encapsulate errno value

With the previous modification, a cpu notifier can return an encapsulated
errno value.  This converts the cpu notifiers for s390.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago ehca: convert cpu notifier to return encapsulate errno value
Akinobu Mita [Wed, 26 May 2010 21:43:34 +0000]
ehca: convert cpu notifier to return encapsulate errno value

With the previous modification, a cpu notifier can return an encapsulated
errno value.  This converts the cpu notifiers for ehca.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago iucv: convert cpu notifier to return encapsulate errno value
Akinobu Mita [Wed, 26 May 2010 21:43:33 +0000]
iucv: convert cpu notifier to return encapsulate errno value

With the previous modification, a cpu notifier can return an encapsulated
errno value.  This converts the cpu notifiers for iucv.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago slab: convert cpu notifier to return encapsulate errno value
Akinobu Mita [Wed, 26 May 2010 21:43:32 +0000]
slab: convert cpu notifier to return encapsulate errno value

With the previous modification, a cpu notifier can return an encapsulated
errno value.  This converts the cpu notifiers for slab.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago kernel/: convert cpu notifier to return encapsulate errno value
Akinobu Mita [Wed, 26 May 2010 21:43:32 +0000]
kernel/: convert cpu notifier to return encapsulate errno value

With the previous modification, a cpu notifier can return an encapsulated
errno value.  This converts the cpu notifiers for kernel/*.c.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago topology: convert cpu notifier to return encapsulate errno value
Akinobu Mita [Wed, 26 May 2010 21:43:31 +0000]
topology: convert cpu notifier to return encapsulate errno value

With the previous modification, a cpu notifier can return an encapsulated
errno value.  This converts the cpu notifiers for topology.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years ago x86: convert cpu notifier to return encapsulate errno value
Akinobu Mita [Wed, 26 May 2010 21:43:30 +0000]
x86: convert cpu notifier to return encapsulate errno value

With the previous modification, a cpu notifier can return an encapsulated
errno value.  This converts the cpu notifiers for msr, cpuid, and
therm_throt.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>