9 years agodevcgroup_inode_permission: take "is it a device node" checks to inlined wrapper
Al Viro [Sun, 19 Jun 2011 17:01:04 +0000]
devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper

inode_permission() calls devcgroup_inode_permission() and almost all such
calls are _not_ for device nodes; let's at least keep the common path

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agofix comment in generic_permission()
Al Viro [Sun, 19 Jun 2011 05:56:53 +0000]
fix comment in generic_permission()

CAP_DAC_OVERRIDE is enough for MAY_EXEC on directory, even if
no exec bits are set.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agokill obsolete comment for follow_down()
Al Viro [Fri, 17 Jun 2011 23:20:48 +0000]
kill obsolete comment for follow_down()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agoproc_sys_permission() is OK in RCU mode
Al Viro [Sun, 19 Jun 2011 00:42:00 +0000]
proc_sys_permission() is OK in RCU mode

nothing blocking there, since all instances of sysctl
->permissions() method are non-blocking - both of them,
that is.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agoreiserfs_permission() doesn't need to bail out in RCU mode
Al Viro [Sun, 19 Jun 2011 00:37:33 +0000]
reiserfs_permission() doesn't need to bail out in RCU mode

nothing blocking other than generic_permission() (and
check_acl callback does bail out in RCU mode).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agoproc_fd_permission() is doesn't need to bail out in RCU mode
Al Viro [Sun, 19 Jun 2011 00:35:23 +0000]
proc_fd_permission() is doesn't need to bail out in RCU mode

nothing blocking except generic_permission()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agonilfs2_permission() doesn't need to bail out in RCU mode
Al Viro [Sun, 19 Jun 2011 00:21:44 +0000]
nilfs2_permission() doesn't need to bail out in RCU mode

Nothing blocking except for generic_permission().  Which will DTRT.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agologfs doesn't need ->permission() at all
Al Viro [Sun, 19 Jun 2011 00:17:22 +0000]
logfs doesn't need ->permission() at all

... and never did, what with its ->permission() being what we do by default
when ->permission is NULL...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agocoda_ioctl_permission() is safe in RCU mode
Al Viro [Sun, 19 Jun 2011 00:11:43 +0000]
coda_ioctl_permission() is safe in RCU mode

return (mask & MAY_EXEC) ? -EACCES : 0; is non-blocking...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agocifs_permission() doesn't need to bail out in RCU mode
Al Viro [Sun, 19 Jun 2011 00:03:36 +0000]
cifs_permission() doesn't need to bail out in RCU mode

nothing potentially blocking except generic_permission(), which
will DTRT

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agobad_inode_permission() is safe from RCU mode
Al Viro [Sat, 18 Jun 2011 23:59:04 +0000]
bad_inode_permission() is safe from RCU mode

return -EIO; is *not* a blocking operation, thank you very much.
Nick, what the hell have you been smoking?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agoubifs: dereferencing an ERR_PTR in ubifs_mount()
Dan Carpenter [Mon, 20 Jun 2011 07:10:24 +0000]
ubifs: dereferencing an ERR_PTR in ubifs_mount()

d251ed271d5 "ubifs: fix sget races" left out the goto from this
error path so the static checkers complain that we're dereferencing
"sb" when it's an ERR_PTR.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd
Linus Torvalds [Thu, 16 Jun 2011 22:02:20 +0000]
Merge git://git./linux/kernel/git/ebiederm/linux-2.6-nsfd

* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd:
  proc: Fix Oops on stat of /proc/<zombie pid>/ns/net

9 years agomigrate: don't account swapcache as shmem
Andrea Arcangeli [Thu, 16 Jun 2011 19:56:19 +0000]
migrate: don't account swapcache as shmem

swapcache will reach the below code path in migrate_page_move_mapping,
and swapcache is accounted as NR_FILE_PAGES but it's not accounted as

Hugh pointed out we must use PageSwapCache instead of comparing
mapping to &swapper_space, to avoid build failure with CONFIG_SWAP=n.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoMerge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuil...
Linus Torvalds [Thu, 16 Jun 2011 17:26:58 +0000]
Merge branch 'rc-fixes' of git://git./linux/kernel/git/mmarek/kbuild-2.6

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
  kbuild: Call depmod.sh via shell
  perf: clear out make flags when calling kernel make kernelver

9 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
Linus Torvalds [Thu, 16 Jun 2011 17:21:59 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  AFS: Use i_generation not i_version for the vnode uniquifier
  AFS: Set s_id in the superblock to the volume name
  vfs: Fix data corruption after failed write in __block_write_begin()
  afs: afs_fill_page reads too much, or wrong data
  VFS: Fix vfsmount overput on simultaneous automount
  fix wrong iput on d_inode introduced by e6bc45d65d
  Delay struct net freeing while there's a sysfs instance refering to it
  afs: fix sget() races, close leak on umount
  ubifs: fix sget races
  ubifs: split allocation of ubifs_info into a separate function
  fix leak in proc_set_super()

9 years agoMerge branch 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 16 Jun 2011 16:46:24 +0000]
Merge branch 'sh-fixes-for-linus' of git://git./linux/kernel/git/lethal/sh-3.x

* 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-3.x:
  sh: sh7724: Add USBHS DMAEngine support
  sh: ecovec: Add renesas_usbhs support
  sh, exec: remove redundant set_fs(USER_DS)
  drivers: sh: resume enabled clocks fix
  dmaengine: shdma: SH_DMAC_MAX_CHANNELS message fix
  sh: Fix up xchg/cmpxchg corruption with gUSA RB.
  sh: Remove compressed kernel libgcc dependency.
  sh: fix wrong icache/dcache address-array start addr in cache-debugfs.

9 years agoMerge branch 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 16 Jun 2011 16:46:08 +0000]
Merge branch 'rmobile-fixes-for-linus' of git://git./linux/kernel/git/lethal/sh-3.x

* 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-3.x:
  ARM: mach-shmobile: mackerel: tidyup usbhs driver settings
  ARM: mach-shmobile: Correct SCIF port types for SH7367.
  ARM: mach-shmobile: sh73a0 gic_arch_extn.irq_set_wake() fix
  ARM: mach-shmobile: Mackerel USB platform data update
  ARM: mach-shmobile: AG5EVM SDHI1 platform data update

9 years agoMerge branch 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 16 Jun 2011 16:45:47 +0000]
Merge branch 'fbdev-fixes-for-linus' of git://git./linux/kernel/git/lethal/fbdev-3.x

* 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-3.x:
  fbdev: sh_mobile_hdmi: fix regression: statically enable RTPM
  fbdev/atyfb: Fix 2 defined-but-not-used warnings
  efifb: Fix call to wrong unregister function
  video: s3c-fb: move enabling channel for window
  video: s3c-fb: fix virtual resolution checking
  video: s3c-fb: fix misleading kfree in remove function

9 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Thu, 16 Jun 2011 16:44:20 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  SELinux: skip file_name_trans_write() when policy downgraded.
  selinux: fix case of names with whitespace/multibytes on /selinux/create

9 years agoAFS: Use i_generation not i_version for the vnode uniquifier
David Howells [Mon, 13 Jun 2011 23:45:44 +0000]
AFS: Use i_generation not i_version for the vnode uniquifier

Store the AFS vnode uniquifier in the i_generation field, not the i_version
field of the inode struct.  i_version can then be given the AFS data version

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agoAFS: Set s_id in the superblock to the volume name
David Howells [Mon, 13 Jun 2011 23:38:44 +0000]
AFS: Set s_id in the superblock to the volume name

Set s_id in the superblock to the name of the AFS volume that this superblock
corresponds to.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agovfs: Fix data corruption after failed write in __block_write_begin()
Jan Kara [Mon, 13 Jun 2011 22:58:27 +0000]
vfs: Fix data corruption after failed write in __block_write_begin()

I've got a report of a file corruption from fsxlinux on ext3. The important
operations to the page were:
mapwrite to a hole
partial write to the page
read - found the page zeroed from the end of the normal write

The culprit seems to be that if get_block() fails in __block_write_begin()
(e.g. transient ENOSPC in ext3), the function does ClearPageUptodate(page).
Thus when we retry the write, the logic in __block_write_begin() thinks zeroing
of the page is needed and overwrites old data.  In fact, I don't see why we
should ever need to zero the uptodate bit here - either the page was uptodate
when we entered __block_write_begin() and it should stay so when we leave it,
or it was not uptodate and noone had right to set it uptodate during
__block_write_begin() so it remains !uptodate when we leave as well. So just
remove clearing of the bit.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agoafs: afs_fill_page reads too much, or wrong data
Anton Blanchard [Mon, 13 Jun 2011 21:31:12 +0000]
afs: afs_fill_page reads too much, or wrong data

afs_fill_page should read the page that is about to be written but
the current implementation has a number of issues. If we aren't
extending the file we always read PAGE_CACHE_SIZE at offset 0. If we
are extending the file we try to read the entire file.

Change afs_fill_page to read PAGE_CACHE_SIZE at the right offset,
clamped to i_size.

While here, avoid calling afs_fill_page when we are doing a

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agostaging: fix iio builds when IIO_RING_BUFFER is not enabled
Randy Dunlap [Tue, 14 Jun 2011 22:00:18 +0000]
staging: fix iio builds when IIO_RING_BUFFER is not enabled

Fix build by moving enum list outside of

  drivers/staging/iio/accel/adis16201_core.c:413: error: 'ADIS16201_SCAN_SUPPLY' undeclared here (not in a function)
  drivers/staging/iio/accel/adis16201_core.c:417: error: 'ADIS16201_SCAN_TEMP' undeclared here (not in a function)

  drivers/staging/iio/accel/adis16203_core.c:374: error: 'ADIS16203_SCAN_SUPPLY' undeclared here (not in a function)
  drivers/staging/iio/accel/adis16203_core.c:378: error: 'ADIS16203_SCAN_AUX_ADC' undeclared here (not in a function)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: linux-iio@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoVFS: Fix vfsmount overput on simultaneous automount
Al Viro [Thu, 16 Jun 2011 14:10:06 +0000]
VFS: Fix vfsmount overput on simultaneous automount

[Kudos to dhowells for tracking that crap down]

If two processes attempt to cause automounting on the same mountpoint at the
same time, the vfsmount holding the mountpoint will be left with one too few
references on it, causing a BUG when the kernel tries to clean up.

The problem is that lock_mount() drops the caller's reference to the
mountpoint's vfsmount in the case where it finds something already mounted on
the mountpoint as it transits to the mounted filesystem and replaces path->mnt
with the new mountpoint vfsmount.

During a pathwalk, however, we don't take a reference on the vfsmount if it is
the same as the one in the nameidata struct, but do_add_mount() doesn't know

The fix is to make sure we have a ref on the vfsmount of the mountpoint before
calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
left with an extra ref on the mountpoint vfsmount which needs releasing.
We can handle that in follow_managed() by not making assumptions about what
we can and what we cannot get from lookup_mnt() as the current code does.

The callers of follow_managed() expect that reference to path->mnt will be
grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
keep track of whether such reference has been grabbed and assume that it'll
happen in those and only those cases that'll have us return with changed
path->mnt.  That assumption is almost correct - it breaks in case of
racing automounts and in even harder to hit race between following a mountpoint
and a couple of mount --move.  The thing is, we don't need to make that
assumption at all - after the end of loop in follow_manage() we can check
if path->mnt has ended up unchanged and do mntput() if needed.

The BUG can be reproduced with the following test program:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <sys/wait.h>
int main(int argc, char **argv)
int pid, ws;
struct stat buf;
pid = fork();
stat(argv[1], &buf);
if (pid > 0) wait(&ws);
return 0;

and the following procedure:

 (1) Mount an NFS volume that on the server has something else mounted on a
     subdirectory.  For instance, I can mount / from my server:

mount warthog:/ /mnt -t nfs4 -r

     On the server /data has another filesystem mounted on it, so NFS will see
     a change in FSID as it walks down the path, and will mark /mnt/data as
     being a mountpoint.  This will cause the automount code to be triggered.

     !!! Do not look inside the mounted fs at this point !!!

 (2) Run the above program on a file within the submount to generate two
     simultaneous automount requests:

/tmp/forkstat /mnt/data/testfile

 (3) Unmount the automounted submount:

umount /mnt/data

 (4) Unmount the original mount:

umount /mnt

     At this point the kernel should throw a BUG with something like the

BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]

Note that the bug appears on the root dentry of the original mount, not the
mountpoint and not the submount because sys_umount() hasn't got to its final
mntput_no_expire() yet, but this isn't so obvious from the call trace:

 [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
 [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
 [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
 [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
 [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
 [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
 [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
 [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
 [<ffffffff811862a8>] mntput+0x3b/0x44
 [<ffffffff81186d87>] release_mounts+0xa2/0xbf
 [<ffffffff811876af>] sys_umount+0x47a/0x4ba
 [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
 [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b

as do_umount() is inlined.  However, you can see release_mounts() in there.

Note also that it may be necessary to have multiple CPU cores to be able to
trigger this bug.

Tested-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agofix wrong iput on d_inode introduced by e6bc45d65d
Török Edwin [Wed, 15 Jun 2011 21:06:14 +0000]
fix wrong iput on d_inode introduced by e6bc45d65d

Git bisection shows that commit e6bc45d65df8599fdbae73be9cec4ceed274db53 causes
BUG_ONs under high I/O load:

kernel BUG at fs/inode.c:1368!
[ 2862.501007] Call Trace:
[ 2862.501007]  [<ffffffff811691d8>] d_kill+0xf8/0x140
[ 2862.501007]  [<ffffffff81169c19>] dput+0xc9/0x190
[ 2862.501007]  [<ffffffff8115577f>] fput+0x15f/0x210
[ 2862.501007]  [<ffffffff81152171>] filp_close+0x61/0x90
[ 2862.501007]  [<ffffffff81152251>] sys_close+0xb1/0x110
[ 2862.501007]  [<ffffffff814c14fb>] system_call_fastpath+0x16/0x1b

A reliable way to reproduce this bug is:
Login to KDE, run 'rsnapshot sync', and apt-get install openjdk-6-jdk,
and apt-get remove openjdk-6-jdk.

The buggy part of the patch is this:
struct inode *inode = NULL;
-               if (nd.last.name[nd.last.len])
-                       goto slashes;
                inode = dentry->d_inode;
-               if (inode)
-                       ihold(inode);
+               if (nd.last.name[nd.last.len] || !inode)
+                       goto slashes;
+               ihold(inode)
if (inode)
iput(inode); /* truncate the inode here */

If nd.last.name[nd.last.len] is nonzero (and thus goto slashes branch is taken),
and dentry->d_inode is non-NULL, then this code now does an additional iput on
the inode, which is wrong.

Fix this by only setting the inode variable if nd.last.name[nd.last.len] is 0.

Reference: https://lkml.org/lkml/2011/6/15/50
Reported-by: Norbert Preining <preining@logic.at>
Reported-by: Török Edwin <edwintorok@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9 years agomm: get rid of the most spurious find_vma_prev() users
Linus Torvalds [Thu, 16 Jun 2011 07:35:09 +0000]
mm: get rid of the most spurious find_vma_prev() users

We have some users of this function that date back to before the vma
list was doubly linked, and just are silly.  These days, you can find
the previous vma by just following the vma->vm_prev pointer.

In some cases you don't need any find_vma() lookup at all, and in other
cases you're better off with the regular "find_vma()" that uses the vma
cache front-end lookup.

Some "find_vma_prev()" users are still valid, though.  For example, in
the case of a stack that grows up, it can be the case that we don't find
any 'vma' at all (because we're looking up an address that is past the
last vma), and that the stack that we want to grow is the 'prev' vma.

But that kind of special case aside, we generally should prefer to use

Noticed due to a totally unrelated POWER memory corruption bug that just
happened to hit in 'find_vma_prev()' and made me go "Hmm - why are we
using that function here?".

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agosh: sh7724: Add USBHS DMAEngine support
Kuninori Morimoto [Wed, 15 Jun 2011 06:08:28 +0000]
sh: sh7724: Add USBHS DMAEngine support

Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

9 years agosh: ecovec: Add renesas_usbhs support
Kuninori Morimoto [Wed, 15 Jun 2011 06:08:18 +0000]
sh: ecovec: Add renesas_usbhs support

Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

9 years agoMerge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm
Linus Torvalds [Thu, 16 Jun 2011 05:01:36 +0000]
Merge branch 'fixes' of /home/rmk/linux-2.6-arm

* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: footbridge: fix clock event support
  ARM: footbridge: fix debug macros
  ARM: initrd: disable initrds outside of memory
  ARM: extend Code: line by one 16-bit quantity for Thumb instructions
  ARM: 6955/1: cmpxchg syscall should data abort if page not write
  ARM: 6954/1: zImage: fix Thumb2 breakage
  ARM: 6953/1: DT: don't try to access physical address zero
  ARM: 6949/2: mach-u300: fix compilaton warning in IO accessors
  Revert "ARM: 6944/1: mm: allow ASID 0 to be allocated to tasks"
  Revert "ARM: 6943/1: mm: use TTBR1 instead of reserved context ID"
  davinci: make PCM platform devices static
  arm: davinci: Fix fallout from generic irq chip conversion
  ARM: 6894/1: mmci: trigger card detect IRQs on falling and rising edges
  ARM: 6952/1: fix lockdep warning of "unannotated irqs-off"
  ARM: 6951/1: include .bss in memory layout information
  ARM: 6948/1: Fix .size directives for __arm{7,9}tdmi_proc_info
  ARM: 6947/2: mach-u300: fix compilation error in timer
  ARM: 6946/1: vexpress: move v2m clock init to init_early
  ARM: mx51/sdma: Check the chip revision in run-time
  arm: mxs: include asm/processor.h for cpu_relax()

9 years agoRevert "fs/exec.c: use BUILD_BUG_ON for VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP"
Linus Torvalds [Thu, 16 Jun 2011 04:53:52 +0000]

This reverts commit 7f81c8890c15a10f5220bebae3b6dfae4961962a.

It turns out that it's not actually a build-time check on x86-64 UML,
which does some seriously crazy stuff with VM_STACK_FLAGS.

The VM_STACK_FLAGS define depends on the arch-supplied
VM_STACK_DEFAULT_FLAGS value, and on x86-64 UML we have


(test_thread_flag(TIF_IA32) ? vm_stack_flags32 : vm_stack_flags)

#define VM_STACK_DEFAULT_FLAGS vm_stack_flags

(yes, seriously: two different #define's for that thing, with the first
one being inside an "#ifdef TIF_IA32")

It's possible that it is UML that should just be fixed in this area, but
for now let's just undo the (very small) optimization.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoDocumentation: fix cgroup typos and formatting
Jörg Sommer [Wed, 15 Jun 2011 20:00:47 +0000]
Documentation: fix cgroup typos and formatting

Fix format and spelling.

Signed-off-by: Jörg Sommer <joerg@alea.gnuu.de>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoDocumentation: update cgroupfs mount point
Jörg Sommer [Wed, 15 Jun 2011 19:59:45 +0000]
Documentation: update cgroupfs mount point

According to commit 676db4af0430 ("cgroupfs: create /sys/fs/cgroup to
mount cgroupfs on") the canonical mountpoint for the cgroup filesystem
is /sys/fs/cgroup.  Hence, this should be used in the documentation.

Signed-off-by: Jörg Sommer <joerg@alea.gnuu.de>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoDocumentation: update kmemleak supported archs
Maxin B. John [Wed, 15 Jun 2011 19:58:29 +0000]
Documentation: update kmemleak supported archs

Instead of listing the architectures that are supported by
kmemleak in Documentation/kmemleak.txt, just refer people to
the list of supported architecutures in lib/Kconfig.debug so
that Documentation/kmemleak.txt does not need more updates
for this.

Signed-off-by: Maxin B. John <maxin.john@gmail.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoDocumentation: update printk-formats.txt
Andrew Murray [Wed, 15 Jun 2011 19:57:09 +0000]
Documentation: update printk-formats.txt

This patch updates the incomplete documentation concerning the printk
extended format specifiers.

Signed-off-by: Andrew Murray <amurray@mpc-data.co.uk>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 16 Jun 2011 04:45:18 +0000]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Check if lowest_mask is initialized in find_lowest_rq()
  sched: Fix need_resched() when checking peempt

9 years agoalpha: fix several security issues
Dan Rosenberg [Wed, 15 Jun 2011 22:09:01 +0000]
alpha: fix several security issues

Fix several security issues in Alpha-specific syscalls.  Untested, but
mostly trivial.

1. Signedness issue in osf_getdomainname allows copying out-of-bounds
kernel memory to userland.

2. Signedness issue in osf_sysinfo allows copying large amounts of
kernel memory to userland.

3. Typo (?) in osf_getsysinfo bounds minimum instead of maximum copy
size, allowing copying large amounts of kernel memory to userland.

4. Usage of user pointer in osf_wait4 while under KERNEL_DS allows
privilege escalation via writing return value of sys_wait4 to kernel

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agodrivers/misc/apds990x.c: apds990x_chip_on() should depend on CONFIG_PM || CONFIG_PM_R...
Geert Uytterhoeven [Wed, 15 Jun 2011 22:08:59 +0000]
drivers/misc/apds990x.c: apds990x_chip_on() should depend on CONFIG_PM || CONFIG_PM_RUNTIME

Fixes this warning:

  drivers/misc/apds990x.c: At top level:
  drivers/misc/apds990x.c:613: warning: `apds990x_chip_on' defined but not used

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Samu Onkalo <samu.p.onkalo@nokia.com>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoksm: fix NULL pointer dereference in scan_get_next_rmap_item()
Hugh Dickins [Wed, 15 Jun 2011 22:08:58 +0000]
ksm: fix NULL pointer dereference in scan_get_next_rmap_item()

Andrea Righi reported a case where an exiting task can race against
ksmd::scan_get_next_rmap_item (http://lkml.org/lkml/2011/6/1/742) easily
triggering a NULL pointer dereference in ksmd.

ksm_scan.mm_slot == &ksm_mm_head with only one registered mm

CPU 1 (__ksm_exit) CPU 2 (scan_get_next_rmap_item)
  list_empty() is false
lock slot == &ksm_mm_head
(list now empty)
slot = list_entry(slot->mm_list.next)
(list is empty, so slot is still ksm_mm_head)
slot->mm == NULL ... Oops

Close this race by revalidating that the new slot is not simply the list
head again.

Andrea's test case:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define BUFSIZE getpagesize()

int main(int argc, char **argv)
void *ptr;

if (posix_memalign(&ptr, getpagesize(), BUFSIZE) < 0) {
if (madvise(ptr, BUFSIZE, MADV_MERGEABLE) < 0) {
*(char *)NULL = 0;

return 0;

Reported-by: Andrea Righi <andrea@betterlinux.com>
Tested-by: Andrea Righi <andrea@betterlinux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agortc: fix build warnings in defconfigs
Wanlong Gao [Wed, 15 Jun 2011 22:08:56 +0000]
rtc: fix build warnings in defconfigs

RTC_CLASS is changed to bool, so 'm' is invalid.

Signed-off-by: Wanlong Gao <wanlong.gao@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agodrivers/tty/serial/pch_uart.c: don't oops if dmi_get_system_info returns NULL
Alexander Stein [Wed, 15 Jun 2011 22:08:55 +0000]
drivers/tty/serial/pch_uart.c: don't oops if dmi_get_system_info returns NULL

If dmi_get_system_info() returns NULL, pch_uart_init_port() will
dereferencea a zero pointer.

This oops was observed on an Atom based board which has no BIOS, but
a bootloder which doesn't provide DMI data.

Signed-off-by: Alexander Stein <alexander.stein@systec-electronic.com>
Cc: Greg KH <gregkh@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agodrivers/char/hpet.c: fix periodic-emulation for delayed interrupts
Nils Carlson [Wed, 15 Jun 2011 22:08:54 +0000]
drivers/char/hpet.c: fix periodic-emulation for delayed interrupts

When interrupts are delayed due to interrupt masking or due to other
interrupts being serviced the HPET periodic-emuation would fail.  This
happened because given an interval t and a time for the current interrupt
m we would compute the next time as t + m.  This works until we are
delayed for > t, in which case we would be writing a new value which is in
fact in the past.

This can be solved by computing the next time instead as (k * t) + m where
k is large enough to be in the future.  The exact computation of k is
described in a comment to the code.

More detail:

Assuming an interval of 5 between each expected interrupt we have a normal
case of

t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
t5: interrupt, read t5 from comparator, set next interrupt t5 + 5
t10: interrupt, read t10 from comparator, set next interrupt t10 + 5

So, what happens when the interrupt is serviced too late?

t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
t11: delayed interrupt serviced, read t5 from comparator, set next
interrupt t5 + 5, which is in the past!
... counter loops ...
t10: Much much later, get the next interrupt.

This can happen either because we have interrupts masked for too long
(some stupid driver goes on a printk rampage) or just because we are
pushing the limits of the interval (too small a period), or both most

My solution is to read the main counter as well and set the next interrupt
to occur at the right interval, for example:

t0: interrupt, read t0 from comparator, set next interrupt t0 + 5
t11: delayed interrupt serviced, read t5 from comparator, set next
interrupt t15 as t10 has been missed.
t15: back on track.

Signed-off-by: Nils Carlson <nils.carlson@ericsson.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoDocumentation/feature-removal-schedule.txt: remove ns_cgroup from feature-removal...
akpm@linux-foundation.org [Wed, 15 Jun 2011 22:08:52 +0000]
Documentation/feature-removal-schedule.txt: remove ns_cgroup from feature-removal-schedule.txt

Commit a77aea92010acf ("cgroup: remove the ns_cgroup") removed the
ns_cgroup but it forgot to remove the related doc in

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomm: compaction: abort compaction if too many pages are isolated and caller is asynchr...
Mel Gorman [Wed, 15 Jun 2011 22:08:52 +0000]
mm: compaction: abort compaction if too many pages are isolated and caller is asynchronous V2

Asynchronous compaction is used when promoting to huge pages.  This is all
very nice but if there are a number of processes in compacting memory, a
large number of pages can be isolated.  An "asynchronous" process can
stall for long periods of time as a result with a user reporting that
firefox can stall for 10s of seconds.  This patch aborts asynchronous
compaction if too many pages are isolated as it's better to fail a
hugepage promotion than stall a process.

[minchan.kim@gmail.com: return COMPACT_PARTIAL for abort]
Reported-and-tested-by: Ury Stankevich <urykhy@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomm: vmscan: do not use page_count without a page pin
Andrea Arcangeli [Wed, 15 Jun 2011 22:08:51 +0000]
mm: vmscan: do not use page_count without a page pin

It is unsafe to run page_count during the physical pfn scan because
compound_head could trip on a dangling pointer when reading
page->first_page if the compound page is being freed by another CPU.

[mgorman@suse.de: split out patch]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomm: compaction: ensure that the compaction free scanner does not move to the next...
Mel Gorman [Wed, 15 Jun 2011 22:08:50 +0000]
mm: compaction: ensure that the compaction free scanner does not move to the next zone

Compaction works with two scanners, a migration and a free scanner.  When
the scanners crossover, migration within the zone is complete.  The
location of the scanner is recorded on each cycle to avoid excesive

When a zone is small and mostly reserved, it's very easy for the migration
scanner to be close to the end of the zone.  Then the following situation
can occurs

  o migration scanner isolates some pages near the end of the zone
  o free scanner starts at the end of the zone but finds that the
    migration scanner is already there
  o free scanner gets reinitialised for the next cycle as
    cc->migrate_pfn + pageblock_nr_pages
    moving the free scanner into the next zone
  o migration scanner moves into the next zone

When this happens, NR_ISOLATED accounting goes haywire because some of the
accounting happens against the wrong zone.  One zones counter remains
positive while the other goes negative even though the overall global
count is accurate.  This was reported on X86-32 with !SMP because !SMP
allows the negative counters to be visible.  The fact that it is the bug
should theoritically be possible there.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agocompaction: checks correct fragmentation index
Shaohua Li [Wed, 15 Jun 2011 22:08:49 +0000]
compaction: checks correct fragmentation index

fragmentation_index() returns -1000 when the allocation might succeed
This doesn't match the comment and code in compaction_suitable(). I
thought compaction_suitable should return COMPACT_PARTIAL in -1000
case, because in this case allocation could succeed depending on

The impact of this is that compaction starts and compact_finished() is
called which rechecks the watermarks and the free lists.  It should have
the same result in that compaction should not start but is more expensive.

Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomm/memory-failure.c: fix page isolated count mismatch
Minchan Kim [Wed, 15 Jun 2011 22:08:48 +0000]
mm/memory-failure.c: fix page isolated count mismatch

Pages isolated for migration are accounted with the vmstat counters
NR_ISOLATE_[ANON|FILE].  Callers of migrate_pages() are expected to
increment these counters when pages are isolated from the LRU.  Once the
pages have been migrated, they are put back on the LRU or freed and the
isolated count is decremented.

Memory failure is not properly accounting for pages it isolates causing
the NR_ISOLATED counters to be negative.  On SMP builds, this goes
unnoticed as negative counters are treated as 0 due to expected per-cpu
drift.  On UP builds, the counter is treated by too_many_isolated() as a
large value causing processes to enter D state during page reclaim or
compaction.  This patch accounts for pages isolated by memory failure

[mel@csn.ul.ie: rewrote changelog]
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agogcov: disable CONFIG_CONSTRUCTORS when not needed by CONFIG_GCOV_KERNEL
Josh Triplett [Wed, 15 Jun 2011 22:08:47 +0000]
gcov: disable CONFIG_CONSTRUCTORS when not needed by CONFIG_GCOV_KERNEL

CONFIG_CONSTRUCTORS controls support for running constructor functions at
kernel init time.  According to commit b99b87f70c7785ab ("kernel:
constructor support"), gcov (CONFIG_GCOV_KERNEL) needs this.  However,
CONFIG_CONSTRUCTORS currently defaults to y, with no option to disable it,
and CONFIG_GCOV_KERNEL depends on it.  Instead, default it to n and have
CONFIG_GCOV_KERNEL select it, so that the normal case of

Observed in the short list of =y values in a minimal kernel configuration.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoMAINTAINERS: add entry for legacy eeprom driver
Jean Delvare [Wed, 15 Jun 2011 22:08:46 +0000]
MAINTAINERS: add entry for legacy eeprom driver

I shall maintain the legacy eeprom driver, until we finally get rid of it.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomemcg: avoid percpu cached charge draining at softlimit
KAMEZAWA Hiroyuki [Wed, 15 Jun 2011 22:08:46 +0000]
memcg: avoid percpu cached charge draining at softlimit

Based on Michal Hocko's comment.

We are not draining per cpu cached charges during soft limit reclaim
because background reclaim doesn't care about charges.  It tries to free
some memory and charges will not give any.

Cached charges might influence only selection of the biggest soft limit
offender but as the call is done only after the selection has been already
done it makes no change.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomemcg: fix percpu cached charge draining frequency
KAMEZAWA Hiroyuki [Wed, 15 Jun 2011 22:08:45 +0000]
memcg: fix percpu cached charge draining frequency

For performance, memory cgroup caches some "charge" from res_counter into
per cpu cache.  This works well but because it's cache, it needs to be
flushed in some cases.  Typical cases are

   1. when someone hit limit.

   2. when rmdir() is called and need to charges to be 0.

But "1" has problem.

Recently, with large SMP machines, we see many kworker runs because of
flushing memcg's cache.  Bad things in implementation are that even if a
cpu contains a cache for memcg not related to a memcg which hits limit,
drain code is called.

This patch does
        A) check percpu cache contains a useful data or not.
        B) check other asynchronous percpu draining doesn't run.
        C) don't call local cpu callback.

(*)This patch avoid changing the calling condition with hard-limit.

When I run "cat 1Gfile > /dev/null" under 300M limit memcg,

13767 kamezawa  20   0 98.6m  424  416 D 10.0  0.0   0:00.61 cat
   58 root      20   0     0    0    0 S  0.6  0.0   0:00.09 kworker/2:1
   60 root      20   0     0    0    0 S  0.6  0.0   0:00.08 kworker/4:1
    4 root      20   0     0    0    0 S  0.3  0.0   0:00.02 kworker/0:0
   57 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/1:1
   61 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/5:1
   62 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/6:1
   63 root      20   0     0    0    0 S  0.3  0.0   0:00.05 kworker/7:1

 2676 root      20   0 98.6m  416  416 D  9.3  0.0   0:00.87 cat
 2626 kamezawa  20   0 15192 1312  920 R  0.3  0.0   0:00.28 top
    1 root      20   0 19384 1496 1204 S  0.0  0.0   0:00.66 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kworker/0:0

[akpm@linux-foundation.org: make percpu_charge_mutex static, tweak comments]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomemcg: fix wrong check of noswap with softlimit
KAMEZAWA Hiroyuki [Wed, 15 Jun 2011 22:08:44 +0000]
memcg: fix wrong check of noswap with softlimit

Hierarchical reclaim doesn't swap out if memsw and resource limits are
thye same (memsw_is_minimum == true) because we would hit mem+swap limit
anyway (during hard limit reclaim).

If it comes to the soft limit we shouldn't consider memsw_is_minimum at
all because it doesn't make much sense.  Either the soft limit is bellow
the hard limit and then we cannot hit mem+swap limit or the direct reclaim
takes a precedence.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomemcg: clear mm->owner when last possible owner leaves
KAMEZAWA Hiroyuki [Wed, 15 Jun 2011 22:08:43 +0000]
memcg: clear mm->owner when last possible owner leaves

The following crash was reported:

> Call Trace:
> [<ffffffff81139792>] mem_cgroup_from_task+0x15/0x17
> [<ffffffff8113a75a>] __mem_cgroup_try_charge+0x148/0x4b4
> [<ffffffff810493f3>] ? need_resched+0x23/0x2d
> [<ffffffff814cbf43>] ? preempt_schedule+0x46/0x4f
> [<ffffffff8113afe8>] mem_cgroup_charge_common+0x9a/0xce
> [<ffffffff8113b6d1>] mem_cgroup_newpage_charge+0x5d/0x5f
> [<ffffffff81134024>] khugepaged+0x5da/0xfaf
> [<ffffffff81078ea0>] ? __init_waitqueue_head+0x4b/0x4b
> [<ffffffff81133a4a>] ? add_mm_counter.constprop.5+0x13/0x13
> [<ffffffff81078625>] kthread+0xa8/0xb0
> [<ffffffff814d13e8>] ? sub_preempt_count+0xa1/0xb4
> [<ffffffff814d5664>] kernel_thread_helper+0x4/0x10
> [<ffffffff814ce858>] ? retint_restore_args+0x13/0x13
> [<ffffffff8107857d>] ? __init_kthread_worker+0x5a/0x5a

What happens is that khugepaged tries to charge a huge page against an mm
whose last possible owner has already exited, and the memory controller
crashes when the stale mm->owner is used to look up the cgroup to charge.

mm->owner has never been set to NULL with the last owner going away, but
nobody cared until khugepaged came along.

Even then it wasn't a problem because the final mmput() on an mm was
forced to acquire and release mmap_sem in write-mode, preventing an
exiting owner to go away while the mmap_sem was held, and until "692e0b3
mm: thp: optimize memcg charge in khugepaged", the memory cgroup charge
was protected by mmap_sem in read-mode.

Instead of going back to relying on the mmap_sem to enforce lifetime of a
task, this patch ensures that mm->owner is properly set to NULL when the
last possible owner is exiting, which the memory controller can handle
just fine.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomemcg: fix init_page_cgroup nid with sparsemem
KAMEZAWA Hiroyuki [Wed, 15 Jun 2011 22:08:42 +0000]
memcg: fix init_page_cgroup nid with sparsemem

Commit 21a3c9646873 ("memcg: allocate memory cgroup structures in local
nodes") makes page_cgroup allocation as NUMA aware.  But that caused a
problem https://bugzilla.kernel.org/show_bug.cgi?id=36192.

The problem was getting a NID from invalid struct pages, which was not
initialized because it was out-of-node, out of [node_start_pfn,

Now, with sparsemem, page_cgroup_init scans pfn from 0 to max_pfn.  But
this may scan a pfn which is not on any node and can access memmap which
is not initialized.

This makes page_cgroup_init() for SPARSEMEM node aware and remove a code
to get nid from page->flags.  (Then, we'll use valid NID always.)

[akpm@linux-foundation.org: try to fix up comments]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomm: memory.numa_stat: fix file permission
KAMEZAWA Hiroyuki [Wed, 15 Jun 2011 22:08:41 +0000]
mm: memory.numa_stat: fix file permission

Commit 406eb0c9ba76 ("memcg: add memory.numastat api for numa
statistics") adds memory.numa_stat file for memory cgroup.  But the file
permissions are wrong.

  [kamezawa@bluextal linux-2.6]$ ls -l /cgroup/memory/A/memory.numa_stat
  ---------- 1 root root 0 Jun  9 18:36 /cgroup/memory/A/memory.numa_stat

This patch fixes the permission as

  [root@bluextal kamezawa]# ls -l /cgroup/memory/A/memory.numa_stat
  -r--r--r-- 1 root root 0 Jun 10 16:49 /cgroup/memory/A/memory.numa_stat

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoleds: fix the incorrect display in menuconfig
Eric Miao [Wed, 15 Jun 2011 22:08:40 +0000]
leds: fix the incorrect display in menuconfig

Seems when a config option does not have a dependency of the menuconfig,
it messes the display of the rest configs, even if it's a hidden one.

Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomm: fix negative commitlimit when gigantic hugepages are allocated
Rafael Aquini [Wed, 15 Jun 2011 22:08:39 +0000]
mm: fix negative commitlimit when gigantic hugepages are allocated

When 1GB hugepages are allocated on a system, free(1) reports less
available memory than what really is installed in the box.  Also, if the
total size of hugepages allocated on a system is over half of the total
memory size, CommitLimit becomes a negative number.

The problem is that gigantic hugepages (order > MAX_ORDER) can only be
allocated at boot with bootmem, thus its frames are not accounted to
'totalram_pages'.  However, they are accounted to hugetlb_total_pages()

What happens to turn CommitLimit into a negative number is this
calculation, in fs/proc/meminfo.c:

        allowed = ((totalram_pages - hugetlb_total_pages())
                * sysctl_overcommit_ratio / 100) + total_swap_pages;

A similar calculation occurs in __vm_enough_memory() in mm/mmap.c.

Also, every vm statistic which depends on 'totalram_pages' will render
confusing values, as if system were 'missing' some part of its memory.

Impact of this bug:

When gigantic hugepages are allocated and sysctl_overcommit_memory ==
OVERCOMMIT_NEVER.  In a such situation, __vm_enough_memory() goes through
the mentioned 'allowed' calculation and might end up mistakenly returning
-ENOMEM, thus forcing the system to start reclaiming pages earlier than it
would be ususal, and this could cause detrimental impact to overall
system's performance, depending on the workload.

Besides the aforementioned scenario, I can only think of this causing
annoyances with memory reports from /proc/meminfo and free(1).

[akpm@linux-foundation.org: standardize comment layout]
Reported-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Rafael Aquini <aquini@linux.com>
Acked-by: Russ Anderson <rja@sgi.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomm/memory_hotplug.c: fix building of node hotplug zonelist
KAMEZAWA Hiroyuki [Wed, 15 Jun 2011 22:08:38 +0000]
mm/memory_hotplug.c: fix building of node hotplug zonelist

During memory hotplug we refresh zonelists when we online a page in a new
zone.  It means that the node's zonelist is not initialized until pages
are onlined.  So for example, "nid" passed by MEM_GOING_ONLINE notifier
will point to NODE_DATA(nid) which has no zone fallback list.  Moreover,
if we hot-add cpu-only nodes, alloc_pages() will do no fallback.

This patch makes a zonelist when a new pgdata is available.

Note: in production, at fujitsu, memory should be onlined before cpu
      and our server didn't have any memory-less nodes and had no problems.

      But recent changes in MEM_GOING_ONLINE+page_cgroup
      will access not initialized zonelist of node.
      Anyway, there are memory-less node and we need some care.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoinit/calibrate.c: remove annoying printk
Borislav Petkov [Wed, 15 Jun 2011 22:08:37 +0000]
init/calibrate.c: remove annoying printk

Remove calibrate_delay_direct()'s KERN_DEBUG printk related to bogomips
calculation as it appears when booting every core on setups with
'ignore_loglevel' which dmesg people scan for possible issues.  As the
message doesn't show very useful information to the widest audience of
kernel boot message gazers, it should be removed.

Introduced by commit d2b463135f84 ("init/calibrate.c: fix for critical
bogoMIPS intermittent calculation failure").

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andrew Worsley <amworsley@gmail.com>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agow1: W1_MASTER_DS1WM should depend on GENERIC_HARDIRQS
Geert Uytterhoeven [Wed, 15 Jun 2011 22:08:35 +0000]
w1: W1_MASTER_DS1WM should depend on GENERIC_HARDIRQS

On m68k (which doesn't support generic hardirqs yet):

  drivers/w1/masters/ds1wm.c: In function `ds1wm_probe':
  drivers/w1/masters/ds1wm.c: error: implicit declaration of function `irq_set_irq_type'

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Jean-Franois Dagenais <dagenaisj@sonatest.com>
Cc: Matt Reimer <mreimer@vpop.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoinclude/asm-generic/pgtable.h: fix unbalanced parenthesis
Nicolas Kaiser [Wed, 15 Jun 2011 22:08:34 +0000]
include/asm-generic/pgtable.h: fix unbalanced parenthesis

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoMAINTAINERS: add videobuf2 maintainers
Pawel Osciak [Wed, 15 Jun 2011 22:08:32 +0000]
MAINTAINERS: add videobuf2 maintainers

Add maintainers for the videobuf2 V4L2 driver framework.

Signed-off-by: Pawel Osciak <pawel@osciak.com>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoleds: move LEDS_GPIO_REGISTER out of menuconfig NEW_LEDS
Uwe Kleine-König [Wed, 15 Jun 2011 22:08:31 +0000]
leds: move LEDS_GPIO_REGISTER out of menuconfig NEW_LEDS

Commit 4440673a95e6 ("leds: provide helper to register "leds-gpio"
devices") broke the display of the NEW_LEDS menu as it didn't depend on
NEW_LEDS and so made "LED drivers" and "LED Triggers" appear at the same
level as "LED Support" instead of below it as it was before 4440673a.

Moving LEDS_GPIO_REGISTER out of the menuconfig NEW_LEDS fixes this
unintended side effect.

Reported-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agodrivers/leds/leds-asic3: make LEDS_ASIC3 depend on LEDS_CLASS
Axel Lin [Wed, 15 Jun 2011 22:08:31 +0000]
drivers/leds/leds-asic3: make LEDS_ASIC3 depend on LEDS_CLASS

We call led_classdev_unregister/led_classdev_register in
asic3_led_remove/asic3_led_probe, thus make LEDS_ASIC3 depend on

This patch fixes below build error if LEDS_CLASS is not configured.

    LD      .tmp_vmlinux1
  drivers/built-in.o: In function `asic3_led_remove':
  clkdev.c:(.devexit.text+0x1860): undefined reference to `led_classdev_unregister'
  drivers/built-in.o: In function `asic3_led_probe':
  clkdev.c:(.devinit.text+0xcee8): undefined reference to `led_classdev_register'
  make: *** [.tmp_vmlinux1] Error 1

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Paul Parsons <lost.distance@yahoo.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoMAINTAINERS: Balbir has moved
Balbir Singh [Wed, 15 Jun 2011 22:08:30 +0000]
MAINTAINERS: Balbir has moved

Update my email address.  Email will start to the old address bouncing

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agouts: make default hostname configurable, rather than always using "(none)"
Josh Triplett [Wed, 15 Jun 2011 22:08:28 +0000]
uts: make default hostname configurable, rather than always using "(none)"

The "hostname" tool falls back to setting the hostname to "localhost" if
/etc/hostname does not exist.  Distribution init scripts have the same
fallback.  However, if userspace never calls sethostname, such as when
booting with init=/bin/sh, or otherwise booting a minimal system without
the usual init scripts, the default hostname of "(none)" remains,
unhelpfully appearing in various places such as prompts ("root@(none):~#")
and logs.  Furthermore, "(none)" doesn't typically resolve to anything

Make the default hostname configurable.  This removes the need for the
standard fallback, provides a useful default for systems that never call
sethostname, and makes minimal systems that much more useful with less
configuration.  Distributions could choose to use "localhost" here to
avoid the fallback, while embedded systems may wish to use a specific
target hostname.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David Miller <davem@davemloft.net>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Kel Modderman <kel@otaku42.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoBUILD_BUG_ON_ZERO: fix sparse breakage
Dr. David Alan Gilbert [Wed, 15 Jun 2011 22:08:27 +0000]
BUILD_BUG_ON_ZERO: fix sparse breakage

BUILD_BUG_ON_ZERO and BUILD_BUG_ON_NULL must return values, even in the
CHECKER case otherwise various users of it become syntactically invalid.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomm: compaction: fix special case -1 order checks
Michal Hocko [Wed, 15 Jun 2011 22:08:25 +0000]
mm: compaction: fix special case -1 order checks

Commit 56de7263fcf3 ("mm: compaction: direct compact when a high-order
allocation fails") introduced a check for cc->order == -1 in
compact_finished.  We should continue compacting in that case because
the request came from userspace and there is no particular order to
compact for.  Similar check has been added by 82478fb7 (mm: compaction:
prevent division-by-zero during user-requested compaction) for

The check is, however, done after zone_watermark_ok which uses order as a
right hand argument for shifts.  Not only watermark check is pointless if
we can break out without it but it also uses 1 << -1 which is not well
defined (at least from C standard).  Let's move the -1 check above

[minchan.kim@gmail.com> - caught compaction_suitable]
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hioryu@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomm: fix wrong kunmap_atomic() pointer
Steven Rostedt [Wed, 15 Jun 2011 22:08:23 +0000]
mm: fix wrong kunmap_atomic() pointer

Running a ktest.pl test, I hit the following bug on x86_32:

  ------------[ cut here ]------------
  WARNING: at arch/x86/mm/highmem_32.c:81 __kunmap_atomic+0x64/0xc1()
   Hardware name:
  Modules linked in:
  Pid: 93, comm: sh Not tainted 2.6.39-test+ #1
  Call Trace:
   [<c04450da>] warn_slowpath_common+0x7c/0x91
   [<c042f5df>] ? __kunmap_atomic+0x64/0xc1
   [<c042f5df>] ? __kunmap_atomic+0x64/0xc1^M
   [<c0445111>] warn_slowpath_null+0x22/0x24
   [<c042f5df>] __kunmap_atomic+0x64/0xc1
   [<c04d4a22>] unmap_vmas+0x43a/0x4e0
   [<c04d9065>] exit_mmap+0x91/0xd2
   [<c0443057>] mmput+0x43/0xad
   [<c0448358>] exit_mm+0x111/0x119
   [<c044855f>] do_exit+0x1ff/0x5fa
   [<c0454ea2>] ? set_current_blocked+0x3c/0x40
   [<c0454f24>] ? sigprocmask+0x7e/0x8e
   [<c0448b55>] do_group_exit+0x65/0x88
   [<c0448b90>] sys_exit_group+0x18/0x1c
   [<c0c3915f>] sysenter_do_call+0x12/0x38
  ---[ end trace 8055f74ea3c0eb62 ]---

Running a ktest.pl git bisect, found the culprit: commit e303297e6c3a
("mm: extended batches for generic mmu_gather")

But although this was the commit triggering the bug, it was not the one
originally responsible for the bug.  That was commit d16dfc550f53 ("mm:
mmu_gather rework").

The code in zap_pte_range() has something that looks like the following:

pte =  pte_offset_map_lock(mm, pmd, addr, &ptl);
do {
} while (pte++, addr += PAGE_SIZE, addr != end);
pte_unmap_unlock(pte - 1, ptl);

The pte starts off pointing at the first element in the page table
directory that was returned by the pte_offset_map_lock().  When it's done
with the page, pte will be pointing to anything between the next entry and
the first entry of the next page inclusive.  By doing a pte - 1, this puts
the pte back onto the original page, which is all that pte_unmap_unlock()

In most archs (64 bit), this is not an issue as the pte is ignored in the
pte_unmap_unlock().  But on 32 bit archs, where things may be kmapped, it
is essential that the pte passed to pte_unmap_unlock() resides on the same
page that was given by pte_offest_map_lock().

The problem came in d16dfc55 ("mm: mmu_gather rework") where it introduced
a "break;" from the while loop.  This alone did not seem to easily trigger
the bug.  But the modifications made by e303297e6 caused that "break;" to
be hit on the first iteration, before the pte++.

The pte not being incremented will now cause pte_unmap_unlock(pte - 1) to
be pointing to the previous page.  This will cause the wrong page to be
unmapped, and also trigger the warning above.

The simple solution is to just save the pointer given by
pte_offset_map_lock() and use it in the unlock.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agodrivers/misc/cs5535-mfgpt.c: fix wrong if condition
Christian Gmeiner [Wed, 15 Jun 2011 22:08:22 +0000]
drivers/misc/cs5535-mfgpt.c: fix wrong if condition

Fix the wrong `if' condition for the check if the requested timer is

The bitmap avail is used to store if a timer is used already.  test_bit()
is used to check if the requested timer is available.  If a bit in the
avail bitmap is set it means that the timer is available.

The runtime effect would be that allocating a specific timer always fails
(versus telling cs5535_mfgpt_alloc_timer to allocate the first available
timer, which works).

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Andres Salomon <dilinger@queued.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agodrivers/misc/spear13xx_pcie_gadget.c: fix a memory leak in spear_pcie_gadget_probe...
Axel Lin [Wed, 15 Jun 2011 22:08:21 +0000]
drivers/misc/spear13xx_pcie_gadget.c: fix a memory leak in spear_pcie_gadget_probe error path

In the case of goto err_kzalloc, we should kfree target.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Pratyush Anand <pratyush.anand@st.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomm: increase RECLAIM_DISTANCE to 30
KOSAKI Motohiro [Wed, 15 Jun 2011 22:08:20 +0000]
mm: increase RECLAIM_DISTANCE to 30

Recently, Robert Mueller reported (http://lkml.org/lkml/2010/9/12/236)
that zone_reclaim_mode doesn't work properly on his new NUMA server (Dual
Xeon E5520 + Intel S5520UR MB).  He is using Cyrus IMAPd and it's built on
a very traditional single-process model.

  * a master process which reads config files and manages the other
  * multiple imapd processes, one per connection
  * multiple pop3d processes, one per connection
  * multiple lmtpd processes, one per connection
  * periodical "cleanup" processes.

There are thousands of independent processes.  The problem is, recent
Intel motherboard turn on zone_reclaim_mode by default and traditional
prefork model software don't work well on it.  Unfortunatelly, such models
are still typical even in the 21st century.  We can't ignore them.

This patch raises the zone_reclaim_mode threshold to 30.  30 doesn't have
any specific meaning.  but 20 means that one-hop QPI/Hypertransport and
such relatively cheap 2-4 socket machine are often used for traditional
servers as above.  The intention is that these machines don't use

Note: ia64 and Power have arch specific RECLAIM_DISTANCE definitions.
This patch doesn't change such high-end NUMA machine behavior.

Dave Hansen said:

: I know specifically of pieces of x86 hardware that set the information
: in the BIOS to '21' *specifically* so they'll get the zone_reclaim_mode
: behavior which that implies.
: They've done performance testing and run very large and scary benchmarks
: to make sure that they _want_ this turned on.  What this means for them
: is that they'll probably be de-optimized, at least on newer versions of
: the kernel.
: If you want to do this for particular systems, maybe _that_'s what we
: should do.  Have a list of specific configurations that need the
: defaults overridden either because they're buggy, or they have an
: unusual hardware configuration not really reflected in the distance
: table.

And later said:

: The original change in the hardware tables was for the benefit of a
: benchmark.  Said benchmark isn't going to get run on mainline until the
: next batch of enterprise distros drops, at which point the hardware where
: this was done will be irrelevant for the benchmark.  I'm sure any new
: hardware will just set this distance to another yet arbitrary value to
: make the kernel do what it wants.  :)
: Also, when the hardware got _set_ to this initially, I complained.  So, I
: guess I'm getting my way now, with this patch.  I'm cool with it.

Reported-by: Robert Mueller <robm@fastmail.fm>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agocheckpatch: add warning for uses of printk_ratelimit
Joe Perches [Wed, 15 Jun 2011 22:08:17 +0000]
checkpatch: add warning for uses of printk_ratelimit

Warn about uses of printk_ratelimit() because it uses a global state and
can hide subsequent useful messages.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agokmsg_dump.h: fix build when CONFIG_PRINTK is disabled
Randy Dunlap [Wed, 15 Jun 2011 22:08:17 +0000]
kmsg_dump.h: fix build when CONFIG_PRINTK is disabled

Fix <linux/kmsg_dump.h> when CONFIG_PRINTK is not enabled:

  include/linux/kmsg_dump.h:56: error: 'EINVAL' undeclared (first use in this function)
  include/linux/kmsg_dump.h:61: error: 'EINVAL' undeclared (first use in this function)

Looks like commit 595dd3d8bf95 ("kmsg_dump: fix build for
CONFIG_PRINTK=n") uses EINVAL without having the needed header file(s),
but I'm sure that I build tested that patch also.  oh well.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomemcg: add documentation for the memory.numastat API
Ying Han [Wed, 15 Jun 2011 22:08:16 +0000]
memcg: add documentation for the memory.numastat API

[akpm@linux-foundation.org: rework text, fit it into 80-cols]
Signed-off-by: Ying Han <yinghan@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agovmscan: implement swap token priority aging
KOSAKI Motohiro [Wed, 15 Jun 2011 22:08:15 +0000]
vmscan: implement swap token priority aging

While testing for memcg aware swap token, I observed a swap token was
often grabbed an intermittent running process (eg init, auditd) and they
never release a token.


Some processes (eg init, auditd, audispd) wake up when a process exiting.
And swap token can be get first page-in process when a process exiting
makes no swap token owner.  Thus such above intermittent running process
often get a token.

And currently, swap token priority is only decreased at page fault path.
Then, if the process sleep immediately after to grab swap token, the swap
token priority never be decreased.  That's obviously undesirable.

This patch implement very poor (and lightweight) priority aging.  It only
be affect to the above corner case and doesn't change swap tendency
workload performance (eg multi process qsbench load)

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agovmscan: implement swap token trace
KOSAKI Motohiro [Wed, 15 Jun 2011 22:08:14 +0000]
vmscan: implement swap token trace

This is useful for observing swap token activity.

example output:

             zsh-1845  [000]   598.962716: update_swap_token_priority:
mm=ffff88015eaf7700 old_prio=1 new_prio=0
          memtoy-1830  [001]   602.033900: update_swap_token_priority:
mm=ffff880037a45880 old_prio=947 new_prio=949
          memtoy-1830  [000]   602.041509: update_swap_token_priority:
mm=ffff880037a45880 old_prio=949 new_prio=951
          memtoy-1830  [000]   602.051959: update_swap_token_priority:
mm=ffff880037a45880 old_prio=951 new_prio=953
          memtoy-1830  [000]   602.052188: update_swap_token_priority:
mm=ffff880037a45880 old_prio=953 new_prio=955
          memtoy-1830  [001]   602.427184: put_swap_token:
             zsh-1789  [000]   602.427281: replace_swap_token:
old_token_mm=          (null) old_prio=0 new_token_mm=ffff88015eaf7018
             zsh-1789  [001]   602.433456: update_swap_token_priority:
mm=ffff88015eaf7018 old_prio=2 new_prio=4
             zsh-1789  [000]   602.437613: update_swap_token_priority:
mm=ffff88015eaf7018 old_prio=4 new_prio=6
             zsh-1789  [000]   602.443924: update_swap_token_priority:
mm=ffff88015eaf7018 old_prio=6 new_prio=8
             zsh-1789  [000]   602.451873: update_swap_token_priority:
mm=ffff88015eaf7018 old_prio=8 new_prio=10
             zsh-1789  [001]   602.462639: update_swap_token_priority:
mm=ffff88015eaf7018 old_prio=10 new_prio=12

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel<riel@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agovmscan,memcg: memcg aware swap token
KOSAKI Motohiro [Wed, 15 Jun 2011 22:08:13 +0000]
vmscan,memcg: memcg aware swap token

Currently, memcg reclaim can disable swap token even if the swap token mm
doesn't belong in its memory cgroup.  It's slightly risky.  If an admin
creates very small mem-cgroup and silly guy runs contentious heavy memory
pressure workload, every tasks are going to lose swap token and then
system may become unresponsive.  That's bad.

This patch adds 'memcg' parameter into disable_swap_token().  and if the
parameter doesn't match swap token, VM doesn't disable it.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Rik van Riel<riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agodrivers/video/backlight/adp8870_bl.c: add missed props.type conversion
Andrew Morton [Wed, 15 Jun 2011 22:08:12 +0000]
drivers/video/backlight/adp8870_bl.c: add missed props.type conversion

Cc: Michael Hennerich <michael.hennerich@analog.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agobacklight: new driver for the ADP8870 backlight devices
Michael Hennerich [Wed, 15 Jun 2011 22:08:11 +0000]
backlight: new driver for the ADP8870 backlight devices

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Michal Hocko [Wed, 15 Jun 2011 22:08:11 +0000]

Commit a8bef8ff6ea1 ("mm: migration: avoid race between shift_arg_pages()
and rmap_walk() during migration by not migrating temporary stacks")
introduced a BUG_ON() to ensure that VM_STACK_FLAGS and
VM_STACK_INCOMPLETE_SETUP do not overlap.  The check is a compile time
one, so BUILD_BUG_ON is more appropriate.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agolib/bitmap.c: fix kernel-doc notation
Randy Dunlap [Wed, 15 Jun 2011 22:08:10 +0000]
lib/bitmap.c: fix kernel-doc notation

Fix new kernel-doc warnings in lib/bitmap.c:

  Warning(lib/bitmap.c:596): No description found for parameter 'buf'
  Warning(lib/bitmap.c:596): Excess function parameter 'bp' description in '__bitmap_parselist'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomm/memory.c: fix kernel-doc notation
Randy Dunlap [Wed, 15 Jun 2011 22:08:09 +0000]
mm/memory.c: fix kernel-doc notation

Fix new kernel-doc warnings in mm/memory.c:

  Warning(mm/memory.c:1327): No description found for parameter 'tlb'
  Warning(mm/memory.c:1327): Excess function parameter 'tlbp' description in 'unmap_vmas'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agomm: remove khugepaged double thp vmstat update with CONFIG_NUMA=n
Andrea Arcangeli [Wed, 15 Jun 2011 22:08:08 +0000]
mm: remove khugepaged double thp vmstat update with CONFIG_NUMA=n

Johannes noticed the vmstat update is already taken care of by
khugepaged_alloc_hugepage() internally.  The only places that are required
to update the vmstat are the callers of alloc_hugepage (callers of
khugepaged_alloc_hugepage aren't).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoproc: Fix Oops on stat of /proc/<zombie pid>/ns/net
Eric W. Biederman [Wed, 15 Jun 2011 19:47:04 +0000]
proc: Fix Oops on stat of /proc/<zombie pid>/ns/net

Don't call iput with the inode half setup to be a namespace filedescriptor.
Instead rearrange the code so that we don't initialize ei->ns_ops until
after I ns_ops->get succeeds, preventing us from invoking ns_ops->put
when ns_ops->get failed.

Reported-by: Ingo Saitz <Ingo.Saitz@stud.uni-hannover.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

9 years agokbuild: Call depmod.sh via shell
Michal Marek [Wed, 15 Jun 2011 20:15:47 +0000]
kbuild: Call depmod.sh via shell

The script has the executable bit in git, but plain old patch(1) can't
create executable files.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Marek <mmarek@suse.cz>

9 years agoperf: clear out make flags when calling kernel make kernelver
Andy Whitcroft [Wed, 15 Jun 2011 13:35:00 +0000]
perf: clear out make flags when calling kernel make kernelver

When generating the perf version from the kernel version using 'make
kernelver' it is necessary to clear out any MAKEFLAGS otherwise they may
trigger additional output which pollute the contents.

Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>

9 years agosched: Check if lowest_mask is initialized in find_lowest_rq()
Steven Rostedt [Tue, 14 Jun 2011 22:36:25 +0000]
sched: Check if lowest_mask is initialized in find_lowest_rq()

On system boot up, the lowest_mask is initialized with an
early_initcall(). But RT tasks may wake up on other
early_initcall() callers before the lowest_mask is initialized,
causing a system crash.

Commit "d72bce0e67 rcu: Cure load woes" was the first commit
to wake up RT tasks in early init. Before this commit this bug
should not happen.

Reported-by: Andrew Theurer <habanero@linux.vnet.ibm.com>
Tested-by: Andrew Theurer <habanero@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20110614223657.824872966@goodmis.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agosched: Fix need_resched() when checking peempt
Hillf Danton [Tue, 14 Jun 2011 22:36:24 +0000]
sched: Fix need_resched() when checking peempt

The RT preempt check tests the wrong task if NEED_RESCHED is
set. It currently checks the local CPU task. It is supposed to
check the task that is running on the runqueue we are about to
wake another task on.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110614223657.450239027@goodmis.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9 years agoARM: mach-shmobile: mackerel: tidyup usbhs driver settings
Kuninori Morimoto [Wed, 15 Jun 2011 06:16:35 +0000]
ARM: mach-shmobile: mackerel: tidyup usbhs driver settings

- usb0 pipe is same as default. own pipe config is not needed
- usb1 lost get_id function

Signed-off-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
Acked-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

9 years agofbdev: sh_mobile_hdmi: fix regression: statically enable RTPM
Guennadi Liakhovetski [Tue, 14 Jun 2011 14:27:22 +0000]
fbdev: sh_mobile_hdmi: fix regression: statically enable RTPM

A recent modification to the runtime PM code on mach-shmobile made a wrong
RTPM implementation in the sh_mobile_hdmi driver apparent, which broke
HDMI hotplug detection support on ap4evb. This patch does not implement a
proper dynamic RTPM support for sh_mobile_hdmi, instead it restores the
previous working state by statically enabling it. A more power-efficient
solution should be implemented for the next kernel version.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

9 years agosignal.c: fix kernel-doc notation
Randy Dunlap [Tue, 14 Jun 2011 22:50:11 +0000]
signal.c: fix kernel-doc notation

Fix kernel-doc warnings in signal.c:

  Warning(kernel/signal.c:2374): No description found for parameter 'nset'
  Warning(kernel/signal.c:2374): Excess function parameter 'set' description in 'sys_rt_sigprocmask'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoMerge branch 'for-linus' of git://git.infradead.org/users/eparis/selinux into for...
James Morris [Tue, 14 Jun 2011 23:41:48 +0000]
Merge branch 'for-linus' of git://git.infradead.org/users/eparis/selinux into for-linus

9 years agox86 idle: APM requires pm_idle/default_idle unconditionally when a module
Andy Whitcroft [Tue, 14 Jun 2011 19:45:10 +0000]
x86 idle: APM requires pm_idle/default_idle unconditionally when a module

[ Also from Ben Hutchings <ben@decadent.org.uk> and Vitaliy Ivanov
  <vitalivanov@gmail.com> ]

Commit 06ae40ce073d ("x86 idle: EXPORT_SYMBOL(default_idle, pm_idle)
only when APM demands it") removed the export for pm_idle/default_idle
unless the apm module was modularised and CONFIG_APM_CPU_IDLE was set.

But the apm module uses pm_idle/default_idle unconditionally,
CONFIG_APM_CPU_IDLE only affects the bios idle threshold.  Adjust the
export accordingly.

[ Used #ifdef instead of #if defined() as it's shorter, and what both
  Ben and Vitaliy used.. Andy, you're out-voted ;)    - Linus ]

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9 years agoMerge branch 'for-linus-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg...
Linus Torvalds [Tue, 14 Jun 2011 18:28:54 +0000]
Merge branch 'for-linus-fixes' of git://git./linux/kernel/git/gerg/m68knommu

* 'for-linus-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: use kernel processor defines for conditional optimizations
  m68knommu: create config options for CPU classes
  m68knommu: fix linker script exported name sections

9 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Tue, 14 Jun 2011 18:28:38 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/security-testing-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  TOMOYO: Fix oops in tomoyo_mount_acl().

9 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt...
Linus Torvalds [Tue, 14 Jun 2011 18:25:56 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/egtvedt/avr32-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/avr32-2.6:
  avr32, exec: remove redundant set_fs(USER_DS)
  avr32: make intc_resume() return void to conform to syscore_ops
  avr32: add some more at91 to cpu.h definition
  avr32: set CONFIG_CC_OPTIMIZE_FOR_SIZE=y for all defconfigs
  avr32/at32ap: fix mapping of platform device id for USART
  avr32: fix use of non-existing portnr variable in at32_map_usart()

9 years agoMerge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied...
Linus Torvalds [Tue, 14 Jun 2011 18:25:32 +0000]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm: Compare only lower 32 bits of framebuffer map offsets
  drm/i915: Don't leak in i915_gem_shmem_pread_slow()
  drm/radeon/kms: do bounds checking for 3D_LOAD_VBPNTR and bump array limit
  drm/radeon/kms: fix mac g5 quirk
  x86/uv/x2apic: update for change in pci bridge handling.
  alpha, drm: Remove obsolete Alpha support in MGA DRM code
  alpha/drm: Cleanup Alpha support in DRM generic code
  savage: remove unnecessary if statement
  drm/radeon: fix GUI idle IH debug statements
  drm/radeon/kms: check modes against max pixel clock
  drm: fix fbs in DRM_IOCTL_MODE_GETRESOURCES ioctl