Matthew Wilcox [Thu, 6 Dec 2007 17:00:00 +0000]
Add wait_event_killable

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 16:59:46 +0000]
Add schedule_timeout_killable

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Liam R. Howlett [Thu, 6 Dec 2007 22:39:54 +0000]
Use mutex_lock_killable in vfs_readdir

Signed-off-by: Liam R. Howlett <howlett@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Liam R. Howlett [Thu, 6 Dec 2007 22:37:59 +0000]
Add mutex_lock_killable

Similar to mutex_lock_interruptible, it can be interrupted by a fatal
signal only.

Signed-off-by: Liam R. Howlett <howlett@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 16:19:57 +0000]
Use lock_page_killable

Replacing lock_page with lock_page_killable in do_generic_mapping_read()
allows us to kill `cat' of a file on an NFS-mounted filesystem.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 16:18:49 +0000]
Add lock_page_killable

This routine is like lock_page, but can be interrupted by a fatal signal

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 16:15:50 +0000]
Add fatal_signal_pending

Like signal_pending, but it's only true for signals which are fatal to
this process
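
As a rough illustration of the distinction, here is a user-space model, not
the kernel code.  It assumes that a queued SIGKILL is what marks a task as
fatally signalled; struct task_model and the model_* helpers are
hypothetical names invented for this sketch.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task's pending-signal set (one bit per signal). */
struct task_model {
    unsigned long pending;
};

/* Queue a signal on the model task. */
static void model_queue_signal(struct task_model *t, int sig)
{
    t->pending |= 1ul << sig;
}

/* True if any signal at all is queued. */
static bool model_signal_pending(const struct task_model *t)
{
    return t->pending != 0;
}

/* Assumption: only a queued SIGKILL counts as fatal in this model. */
static bool model_fatal_signal_pending(const struct task_model *t)
{
    return model_signal_pending(t) &&
           (t->pending & (1ul << SIGKILL)) != 0;
}
```

A sleeper that only checks the fatal variant keeps waiting when SIGUSR1
arrives but gives up once SIGKILL is queued.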

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 16:13:16 +0000]
Add TASK_WAKEKILL

Set TASK_WAKEKILL for TASK_STOPPED and TASK_TRACED, add TASK_KILLABLE and
use TASK_WAKEKILL in signal_wake_up()
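
A minimal sketch of how the new bit might combine with the existing states.
The numeric values, the *_BIT names and signal_would_wake() are assumptions
made for illustration; the authoritative definitions are in
include/linux/sched.h and kernel/signal.c.

```c
#include <stdbool.h>

/* Illustrative bit values only. */
#define TASK_INTERRUPTIBLE    1u
#define TASK_UNINTERRUPTIBLE  2u
#define TASK_STOPPED_BIT      4u    /* hypothetical name for the raw stopped bit */
#define TASK_TRACED_BIT       8u    /* hypothetical name for the raw traced bit */
#define TASK_WAKEKILL         128u

/* Composite states: a sleeping state plus "wake me for fatal signals". */
#define TASK_KILLABLE  (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_STOPPED   (TASK_WAKEKILL | TASK_STOPPED_BIT)
#define TASK_TRACED    (TASK_WAKEKILL | TASK_TRACED_BIT)

/* Hypothetical helper: would a signal wake a task sleeping in this state? */
static bool signal_would_wake(unsigned int state, bool fatal)
{
    unsigned int mask = TASK_INTERRUPTIBLE;

    if (fatal)
        mask |= TASK_WAKEKILL;   /* fatal signals also wake WAKEKILL sleepers */
    return (state & mask) != 0;
}
```

Under these assumptions a TASK_KILLABLE sleeper ignores ordinary signals like
an uninterruptible one, but is woken by a fatal signal.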

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 16:09:35 +0000]
exit: Use task_is_*

Also restructure the loop in do_wait()

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 16:07:35 +0000]
signal: Use task_is_*

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 16:07:07 +0000]
sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 16:06:16 +0000]
ptrace: Use task_is_*

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 16:06:01 +0000]
power: Use task_is_*

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 22:34:36 +0000]
wait: Use TASK_NORMAL

Also move wake_up_locked() to be with the related functions

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 16:04:01 +0000]
proc/base.c: Use task_is_*

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 16:03:36 +0000]
proc/array.c: Use TASK_REPORT

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 16:02:55 +0000]
perfmon: Use task_is_*

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 6 Dec 2007 15:55:25 +0000]
Add macros to replace direct uses of TASK_ flags

With the changes to support TASK_KILLABLE, ->state becomes a bitmask, and
moving these tests to convenience macros will fix all the users.
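
To see why a bitmask state calls for such macros, consider this sketch.  The
bit values, struct task_model and stopped_by_equality() are hypothetical;
only the task_is_stopped() name is borrowed from the series, and its body
here is an assumption.

```c
#include <stdbool.h>

#define STOPPED_BIT   4u     /* illustrative value */
#define WAKEKILL_BIT  128u   /* illustrative value */

/* Hypothetical stand-in for the ->state field of a task. */
struct task_model {
    unsigned int state;
};

/* Old style: exact comparison, which breaks once extra bits are OR-ed in. */
static bool stopped_by_equality(const struct task_model *t)
{
    return t->state == STOPPED_BIT;
}

/* Macro style: test only the bit of interest. */
#define task_is_stopped(t)  (((t)->state & STOPPED_BIT) != 0)
```

With state set to STOPPED_BIT | WAKEKILL_BIT the equality test reports "not
stopped" while the macro still answers correctly, so callers converted to the
macro need no further change when the encoding grows another bit.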

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

Matthew Wilcox [Thu, 30 Aug 2007 20:10:22 +0000]
Use wake_up_locked() in eventpoll

Replace the uses of __wake_up_locked with wake_up_locked

Signed-off-by: Matthew Wilcox <matthew@wil.cx>

Linus Torvalds [Thu, 6 Dec 2007 17:43:26 +0000]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] Oprofile: Fix computation of number of counters.
  [MIPS] Alchemy: fix IRQ bases
  [MIPS] Alchemy: replace ffs() with __ffs()
  [MIPS] BCM1480: Fix interrupt routing, take 2.

Linus Torvalds [Thu, 6 Dec 2007 17:41:12 +0000]
Tiny clean-up of OPROFILE/KPROBES configuration

Make the Kconfig.instrumentation file a bit easier on the eyes, and use
the new ARCH_SUPPORTS_OPROFILE for x86[-64].

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Ralf Baechle [Thu, 6 Dec 2007 16:53:19 +0000]
Fix oprofile configuration breakage

The cleanup 09cadedbdc01f1a4bea1f427d4fb4642eaa19da9 broke the oprofile
configuration for MIPS by allowing oprofile support to be built for
kernel models where oprofile doesn't have a chance in hell to work.

A bare dependency list naming a number of architectures is - surprise -
broken, and as per past discussions such lists should probably be
considered broken in most cases.  So I introduce a dependency for the
oprofile configuration on ARCH_SUPPORTS_OPROFILE.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Ralf Baechle [Thu, 6 Dec 2007 09:12:28 +0000]
[MIPS] Oprofile: Fix computation of number of counters.

VSMP kernels will split the available performance counters between the two
processors / cores.  But don't do this when we're not on a VSMP system ...

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

Sergei Shtylyov [Wed, 5 Dec 2007 16:08:26 +0000]
[MIPS] Alchemy: fix IRQ bases

Do what commits f3e8d1da389fe2e514e31f6e93c690c8e1243849 and
9d360ab4a7568a8d177280f651a8a772ae52b9b9 failed to achieve -- actually
convert the Alchemy code to irq_cpu.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

Sergei Shtylyov [Wed, 5 Dec 2007 16:08:24 +0000]
[MIPS] Alchemy: replace ffs() with __ffs()

Fix havoc wrought by commit 56f621c7f6f735311eed3f36858b402013023c18 --
au_ffs() and ffs() are equivalent, that patch should have just replaced one
with another.  Now replace ffs() with __ffs() which returns an unbiased bit
number.
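
The off-by-one between the two conventions can be seen in user space.  This
sketch uses the GCC/Clang builtins __builtin_ffs() and __builtin_ctz() as
stand-ins for the kernel's ffs() and __ffs(); the wrapper names are
hypothetical.

```c
/* Biased convention, like ffs(): 1-based bit position, 0 when no bit is set. */
static int bit_number_biased(unsigned int word)
{
    return __builtin_ffs((int)word);
}

/*
 * Unbiased convention, like __ffs(): 0-based index of the lowest set bit.
 * The result is undefined for word == 0, so callers must check first.
 */
static int bit_number_unbiased(unsigned int word)
{
    return __builtin_ctz(word);
}
```

For a pending mask of 0x8 the biased helper yields 4 and the unbiased one
yields 3, so only the unbiased result can be added directly to an IRQ base.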

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

Ralf Baechle [Thu, 6 Dec 2007 17:15:57 +0000]
[MIPS] BCM1480: Fix interrupt routing, take 2.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

Linus Torvalds [Wed, 5 Dec 2007 17:27:46 +0000]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched

* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
  futex: correctly return -EFAULT not -EINVAL
  lockdep: in_range() fix
  lockdep: fix debug_show_all_locks()
  sched: style cleanups
  futex: fix for futex_wait signal stack corruption

Linus Torvalds [Wed, 5 Dec 2007 17:26:52 +0000]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
  VM/Security: add security hook to do_brk
  Security: round mmap hint address above mmap_min_addr
  security: protect from stack expantion into low vm addresses
  Security: allow capable check to permit mmap or low vm space
  SELinux: detect dead booleans
  SELinux: do not clear f_op when removing entries

Linus Torvalds [Wed, 5 Dec 2007 17:26:13 +0000]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  [LRO]: fix lro_gen_skb() alignment
  [TCP]: NAGLE_PUSH seems to be a wrong way around
  [TCP]: Move prior_in_flight collect to more robust place
  [TCP] FRTO: Use of existing funcs make code more obvious & robust
  [IRDA]: Move ircomm_tty_line_info() under #ifdef CONFIG_PROC_FS
  [ROSE]: Trivial compilation CONFIG_INET=n case
  [IPVS]: Fix sched registration race when checking for name collision.
  [IPVS]: Don't leak sysctl tables if the scheduler registration fails.

Linus Torvalds [Wed, 5 Dec 2007 17:25:53 +0000]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: Update defconfig.
  [SPARC]: Add missing of_node_put
  [SPARC64]: check for possible NULL pointer dereference
  [SPARC]: Add missing "space"
  [SPARC64]: Add missing "space"
  [SPARC64]: Add missing pci_dev_put
  [SYSCTL_CHECK]: Fix typo in KERN_SPARC_SCONS_PWROFF entry string.
  [SPARC64]: Missing mdesc_release() in ldc_init().

Al Viro [Wed, 5 Dec 2007 08:46:47 +0000]
remove nonsense force-casts from ocfs2

endianness annotations in networking code had been in place for quite a
while; in particular, sin_port and s_addr are annotated as big-endian.

Code in ocfs2 had __force casts added apparently to shut the sparse
warnings up; of course, these days they only serve to *produce* warnings
for no reason whatsoever...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Al Viro [Wed, 5 Dec 2007 08:32:52 +0000]
regression: bfs endianness bug

BFS_FILEBLOCKS() expects struct bfs_inode * (on-disk data, with little-
endian fields), not struct bfs_inode_info * (in-core stuff, with host-
endian ones).

It's a macro and fields with the right names are present in
bfs_inode_info, so it compiles, but on big-endian host it gives bogus
results.
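
The trap can be reproduced in miniature.  In this sketch the struct names,
the FILEBLOCKS() macro and the le32_* helpers are hypothetical simplifications
rather than the BFS definitions; the point is only that a macro accepts any
struct with matching field names.

```c
#include <stdint.h>

/* Decode a value stored in little-endian byte order, on any host. */
static uint32_t le32_decode(uint32_t stored)
{
    const unsigned char *b = (const unsigned char *)&stored;

    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Encode a host value into little-endian byte order, on any host. */
static uint32_t le32_encode(uint32_t value)
{
    uint32_t stored;
    unsigned char *b = (unsigned char *)&stored;

    b[0] = value & 0xff;
    b[1] = (value >> 8) & 0xff;
    b[2] = (value >> 16) & 0xff;
    b[3] = (value >> 24) & 0xff;
    return stored;
}

struct disk_inode { uint32_t i_sblock, i_eblock; };   /* little-endian fields */
struct core_inode { uint32_t i_sblock, i_eblock; };   /* host-endian fields */

/* Written for struct disk_inode, but compiles for any type with these fields. */
#define FILEBLOCKS(ip) \
    ((ip)->i_sblock == 0 ? 0 : \
     le32_decode((ip)->i_eblock) + 1 - le32_decode((ip)->i_sblock))
```

Passing a struct core_inode compiles without complaint and even gives the
right answer on a little-endian host, where le32_decode() is an identity; on
a big-endian host the same call byte-swaps values that were never
little-endian.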

Introduced in commit f433dc56344cb72cc3de5ba0819021cec3aef807 ("Fixes to
the BFS filesystem driver").

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Al Viro [Wed, 5 Dec 2007 08:38:56 +0000]
fcrypt endianness misannotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Al Viro [Wed, 5 Dec 2007 08:36:15 +0000]
no need to mess with KBUILD_CFLAGS on uml-i386 anymore

Now that X86_32 is provided on Kconfig level for uml-i386, there's no
need to play with it explicitly on Makefile level anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Al Viro [Wed, 5 Dec 2007 08:24:38 +0000]
regression: cifs endianness bug

access_flags_to_mode() gets on-the-wire data (little-endian) and treats
it as host-endian.

Introduced in commit e01b64001359034d04c695388870936ed3d1b56b ("[CIFS]
enable get mode from ACL when cifsacl mount option specified")

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Eric Paris [Wed, 5 Dec 2007 07:45:31 +0000]
VM/Security: add security hook to do_brk

Given a specifically crafted binary do_brk() can be used to get low pages
available in userspace virtual memory and can thus be used to circumvent
the mmap_min_addr low memory protection.  Add security checks in do_brk().

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Alan Cox <alan@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Vegard Nossum [Wed, 5 Dec 2007 07:45:30 +0000]
SLUB's ksize() fails for size > 2048

I can't pass memory allocated by kmalloc() to ksize() if it is allocated by
SLUB allocator and size is larger than (I guess) PAGE_SIZE / 2.

The error of ksize() seems to be that it does not check if the allocation
was made by SLUB or the page allocator.

Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Christoph Lameter <clameter@sgi.com>, Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Alexey Dobriyan [Wed, 5 Dec 2007 07:45:28 +0000]
proc: fix proc_dir_entry refcounting

Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.
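
The scheme listed above can be sketched in user space with C11 atomics.
struct entry_model and the entry_* helpers are hypothetical names, not the
procfs code; removal is modelled as one more entry_put() that drops the
creation reference.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted directory entry. */
struct entry_model {
    atomic_int count;
};

/* The entry is created with refcount 1; that reference belongs to the creator. */
static struct entry_model *entry_create(void)
{
    struct entry_model *e = malloc(sizeof(*e));

    if (e)
        atomic_init(&e->count, 1);
    return e;
}

/* Every get takes one more reference. */
static void entry_get(struct entry_model *e)
{
    atomic_fetch_add(&e->count, 1);
}

/*
 * Every put (and removal) drops one reference.  Whoever performs the
 * 1 -> 0 transition frees the entry; returns true in that case.
 */
static bool entry_put(struct entry_model *e)
{
    if (atomic_fetch_sub(&e->count, 1) == 1) {
        free(e);
        return true;
    }
    return false;
}
```

Because the decrement and the zero test are a single atomic operation,
exactly one caller sees the transition to zero, which is what closes both
windows without a separate "deleted" flag.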

This elegantly fixes at least two following races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():

1) PDE leak

remove_proc_entry                       de_put
-----------------                       ------
                        [refcnt = 1]
if (atomic_read(&de->count) == 0)
                                        if (atomic_dec_and_test(&de->count))
                                                if (de->deleted)
                                                        /* also not taken! */
                                                        free_proc_entry(de);
else
        de->deleted = 1;
                [refcount=0, deleted=1]

2) use after free

remove_proc_entry                       de_put
-----------------                       ------
                        [refcnt = 1]

                                        if (atomic_dec_and_test(&de->count))
if (atomic_read(&de->count) == 0)
        free_proc_entry(de);
                                                /* boom! */
                                                if (de->deleted)
                                                        free_proc_entry(de);

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4)
EIP: 0060:[<c10acdda>] EFLAGS: 00210097 CPU: 1
EIP is at strnlen+0x6/0x18
EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
       c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
       f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
Call Trace:
 [<c10ac4f0>] vsnprintf+0x2ad/0x49b
 [<c10ac779>] vscnprintf+0x14/0x1f
 [<c1018e6b>] vprintk+0xc5/0x2f9
 [<c10379f1>] handle_fasteoi_irq+0x0/0xab
 [<c1004f44>] do_IRQ+0x9f/0xb7
 [<c117db3b>] preempt_schedule_irq+0x3f/0x5b
 [<c100264e>] need_resched+0x1f/0x21
 [<c10190ba>] printk+0x1b/0x1f
 [<c107c8ad>] de_put+0x3d/0x50
 [<c107c8f8>] proc_delete_inode+0x38/0x41
 [<c107c8c0>] proc_delete_inode+0x0/0x41
 [<c1066298>] generic_delete_inode+0x5e/0xc6
 [<c1065aa9>] iput+0x60/0x62
 [<c1063c8e>] d_kill+0x2d/0x46
 [<c1063fa9>] dput+0xdc/0xe4
 [<c10571a1>] __fput+0xb0/0xcd
 [<c1054e49>] filp_close+0x48/0x4f
 [<c1055ee9>] sys_close+0x67/0xa5
 [<c10026b6>] sysenter_past_esp+0x5f/0x85
=======================
Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
EIP: [<c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:f380be44

Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen => nobody
can mark PDE deleted.

Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.

Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Jan Kara [Wed, 5 Dec 2007 07:45:27 +0000]
jbd: Fix assertion failure in fs/jbd/checkpoint.c

Before we start committing a transaction, we call
__journal_clean_checkpoint_list() to cleanup transaction's written-back
buffers.

If this call happens to remove all of them (and there were already some
buffers), __journal_remove_checkpoint() will decide to free the transaction
because it isn't (yet) a committing transaction and soon we fail some
assertion - the transaction really isn't ready to be freed :).

We change the check in __journal_remove_checkpoint() to free only a
transaction in T_FINISHED state.  The locking there is subtle though (as
everywhere in JBD ;().  We use j_list_lock to protect the check and a
subsequent call to __journal_drop_transaction() and do the same in the end
of journal_commit_transaction() which is the only place where a transaction
can get to T_FINISHED state.

Probably I'm too paranoid here and such locking is not really necessary -
checkpoint lists are processed only from log_do_checkpoint() where a
transaction must be already committed to be processed or from
__journal_clean_checkpoint_list() where kjournald itself calls it and thus
transaction cannot change state either.  Better be safe if something
changes in future...

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Nick Piggin [Wed, 5 Dec 2007 07:45:25 +0000]
mm: fix XIP file writes

Writing to XIP files at a non-page-aligned offset results in data corruption
because the writes were always sent to the start of the page.
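
A minimal sketch of the arithmetic involved, assuming a 4096-byte page and a
flat in-memory backing store.  MODEL_PAGE_SIZE and write_into_page() are
hypothetical; this is not the XIP write path itself.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for the sketch */

/*
 * Copy up to len bytes from src to file position pos, stopping at the end
 * of the page that contains pos.  Returns the number of bytes copied.
 */
static size_t write_into_page(unsigned char *pages, unsigned long pos,
                              const void *src, size_t len)
{
    unsigned long index = pos / MODEL_PAGE_SIZE;          /* which page */
    unsigned long offset = pos & (MODEL_PAGE_SIZE - 1);   /* where inside it */
    size_t room = MODEL_PAGE_SIZE - offset;
    size_t n = len < room ? len : room;

    /* Dropping "+ offset" here would send every write to the page start. */
    memcpy(pages + index * MODEL_PAGE_SIZE + offset, src, n);
    return n;
}
```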

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Ben Gardner [Wed, 5 Dec 2007 07:45:24 +0000]
gpio_cs5535: disable AUX on output

The AMD CS5535/CS5536 GPIO has two alternate output modes: AUX-1 and AUX-2.
When either AUX is enabled, the cs5535_gpio driver cannot control the
output.

Some BIOS code for the Geode processor enables AUX-1 for GPIO-1, which
configures it as the PC BEEP output.

This patch will disable AUX-1 and AUX-2 when the user enables output.

Signed-off-by: Ben Gardner <gardner.ben@gmail.com>
Cc: Richard Knutsson <ricknu-0@student.ltu.se>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Pavel Emelyanov [Wed, 5 Dec 2007 07:45:24 +0000]
Avoid potential NULL dereference in unregister_sysctl_table

register_sysctl_table() can return NULL sometimes, e.g.  when kmalloc()
returns NULL or when sysctl check fails.

I've also noticed that much (most?) code in the kernel doesn't check
the return value from register_sysctl_table() and later simply calls
unregister_sysctl_table() with a potentially NULL argument.

This is unlikely on a common kernel configuration, but in case we're
dealing with modules and/or fault-injection support, there's a slight
possibility of an OOPS.

Changing all the users to check the return code from registration does
not look like a good solution - there is too much code doing this, and
failure to register sysctl tables is not a good reason to abort module
loading (in most cases).

So I think, that we can just have this check in unregister_sysctl_table
just to avoid accidental OOPS-es (actually, the unregister_sysctl_table()
did exactly this, before the start_unregistering() appeared).
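
The idea is the same one that makes free(NULL) harmless.  A user-space sketch
with hypothetical names (struct table_header_model, model_register(),
model_unregister()), not the sysctl code:

```c
#include <stdlib.h>

/* Hypothetical stand-in for the header returned by registration. */
struct table_header_model {
    int registered;
};

/* Registration may fail and return NULL, e.g. when allocation fails. */
static struct table_header_model *model_register(int simulate_failure)
{
    struct table_header_model *h;

    if (simulate_failure)
        return NULL;
    h = malloc(sizeof(*h));
    if (h)
        h->registered = 1;
    return h;
}

/*
 * Unregistration tolerates NULL, so callers that never checked the
 * registration result stay safe.  Returns 1 if a table was torn down.
 */
static int model_unregister(struct table_header_model *h)
{
    if (h == NULL)
        return 0;
    free(h);
    return 1;
}
```

Putting the check in the one teardown function covers every caller at once,
which is the trade-off argued for above.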

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bryan Wu [Wed, 5 Dec 2007 07:45:23 +0000]
Blackfin SPI driver: reconfigure speed_hz and bits_per_word in each spi transfer

 - reconfigure SPI baud from speed_hz of each spi transfer
 - reprogram the register and set up the correct SPI operation handlers
   according to spi_transfer.bits_per_word

Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bryan Wu [Wed, 5 Dec 2007 07:45:22 +0000]
Blackfin SPI driver: move hard coded pin_req to board file

Remove some bloated code and try to get these pin_req arrays built at compile time

 - move these static things to the blackfin board file
 - add pin_req array to struct bfin5xx_spi_master
 - tested on BF537/BF548 with SPI flash

Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bryan Wu [Wed, 5 Dec 2007 07:45:22 +0000]
Blackfin SPI driver: use void __iomem * for regs_base

Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bryan Wu [Wed, 5 Dec 2007 07:45:21 +0000]
Blackfin SPI driver: use cpu_relax() to replace continue in while busywait

Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Sonic Zhang [Wed, 5 Dec 2007 07:45:21 +0000]
spi: spi_bfin: resequence DMA start/stop

Set correct baud for spi mmc and enable SPI only after DMA is started.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bryan Wu [Wed, 5 Dec 2007 07:45:20 +0000]
spi: spi_bfin: update handling of delay-after-deselect

Move cs_chg_udelay handling (specific to this driver) to cs_deactive(), fixing
a bug where some SPI LCD drivers need a delay after cs_deactive.

Fix bug reported by Cameron Barfield <cbarfield@cyberdata.net>
https://blackfin.uclinux.org/gf/project/uclinux-dist/forum/?action=ForumBrowse&forum_id=39&_forum_action=ForumMessageBrowse&thread_id=23630&feedback=Message%20replied.

Cc: Cameron Barfield <cbarfield@cyberdata.net>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bryan Wu [Wed, 5 Dec 2007 07:45:19 +0000]
spi: spi_bfin: bugfix for 8..16 bit word sizes

Fix bug in u16_cs_chg_reader: read the first data_len-2 bytes of data, then read
out the last 2 bytes.

Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bryan Wu [Wed, 5 Dec 2007 07:45:18 +0000]
spi: spi_bfin: handle multiple spi_masters

Move global SPI regs_base and dma_ch to struct driver_data.  Tested on BF54x SPI
Flash with 2 spi_master devices enabled.

Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Sonic Zhang [Wed, 5 Dec 2007 07:45:18 +0000]
spi: spi_bfin: relocate spin/waits

Move spin/waits to more correct locations in bfin SPI driver.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Sonic Zhang [Wed, 5 Dec 2007 07:45:17 +0000]
spi: spi_bfin: change handling of communication parameters

Fix SPI driver to work with SPI flash ST M25P16 on bf548

Currently the SPI driver enables the SPI controller and sets the SPI baud
register for each SPI transfer.  But they should never be changed within a SPI
message session, in which several SPI transfers are pumped.

This patch moves SPI setting to the beginning of a message session, and
never disables the SPI controller until an error occurs.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Sonic Zhang [Wed, 5 Dec 2007 07:45:16 +0000]
spi: spi_bfin, rearrange portmux calls

Move pin muxing to setup and cleanup methods.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Sonic Zhang [Wed, 5 Dec 2007 07:45:16 +0000]
spi: spi_bfin uses portmux for additional busses

Use portmux mechanism to support SPI busses 1 and 2, instead of just the
original bus 0.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bryan Wu [Wed, 5 Dec 2007 07:45:15 +0000]
spi: spi_bfin uses platform device resources

Update spi driver to support multi-ports by using platform resources; tested
on STAMP537+SPI_MMC, other boards need more testing.  Plus other minor
updates.

Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Mike Frysinger [Wed, 5 Dec 2007 07:45:14 +0000]
spi: spi_bfin, don't bypass spi framework

Prevent people from setting bits in ctl_reg that the SPI framework already
handles, hopefully we can one day drop ctl_reg completely

Signed-off-by: Mike Frysinger <michael.frysinger@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bryan Wu [Wed, 5 Dec 2007 07:45:14 +0000]
spi: spi_bfin handles spi_transfer.cs_change

Respect per-transfer cs_change field (protocol tweaking support) by
adding and using cs_active/cs_deactive functions.

Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bryan Wu [Wed, 5 Dec 2007 07:45:13 +0000]
spi: spi_bfin cleanups, error handling

Cleanup and error handling

 - add error handling in SPI bus driver with selecting clients
 - use proper defines to access Blackfin MMRs
 - remove useless SSYNCs
 - cleaner use of portmux calls

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Michael Hennerich [Wed, 5 Dec 2007 07:45:13 +0000]
spi: bfin spi uses portmux calls

Use new Blackfin portmux interface, add error handling.

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bryan Wu [Wed, 5 Dec 2007 07:45:12 +0000]
spi: initial BF54x SPI support

Initial BF54x SPI support

 - support BF54x SPI0
 - clean up some code (whitespace etc)
 - will support multiports in the future
 - start using portmux calls

Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Marc Pignat [Wed, 5 Dec 2007 07:45:11 +0000]
spi: use simplified spi_sync() calling convention

Given the patch which simplifies the spi_sync calling convention, this one
updates the callers of that routine which tried using it according to the
previous specification.  (Most didn't.)

Signed-off-by: Marc Pignat <marc.pignat@hevs.ch>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Marc Pignat [Wed, 5 Dec 2007 07:45:10 +0000]
spi: simplify spi_sync() calling convention

Simplify spi_sync calling convention, eliminating the need to check both
the return value AND the message->status.  In consequence, this corrects
misbehaviours of spi_read and spi_write (which only checked the former) and
their callers.
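
The difference between the two conventions can be modelled in a few lines.
struct message_model, sync_old() and sync_new() are hypothetical names for
this sketch and do not reproduce the SPI core; the error values are passed in
to simulate a submission failure or a transfer failure.

```c
/* Hypothetical stand-in for a message that records its transfer result. */
struct message_model {
    int status;
};

/*
 * Old convention as described: the return value reports only whether the
 * message could be submitted; the transfer result lands in msg->status.
 */
static int sync_old(struct message_model *msg, int submit_err, int transfer_err)
{
    if (submit_err)
        return submit_err;
    msg->status = transfer_err;
    return 0;
}

/* Simplified convention: one return value carries either kind of failure. */
static int sync_new(struct message_model *msg, int submit_err, int transfer_err)
{
    int ret = sync_old(msg, submit_err, transfer_err);

    if (ret == 0)
        ret = msg->status;
    return ret;
}
```

A caller that checks only the return value misses a failed transfer under the
old convention (it sees 0) but catches it under the new one.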

Signed-off-by: Marc Pignat <marc.pignat@hevs.ch>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

David Brownell [Wed, 5 Dec 2007 07:45:10 +0000]
spi: at25 driver is for EEPROM not FLASH

Add comment to at25 driver that it's for EEPROM chips, not FLASH
chips ... the AT25 series has both types of chip, and sometimes
they're even pin-compatible.  The command sets are different, as
is the treatment of erasure.  (FLASH needs explicit erasure, but
with EEPROM it's implicit.)  Note that all vendors seem to have
this same confusion in their *25* series SPI memory parts.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 years agoSPI: use mutex not semaphore
David Brownell [Wed, 5 Dec 2007 07:45:09 +0000]
SPI: use mutex not semaphore

Make spi_write_then_read() use a mutex not a binary semaphore.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 years agoAdd EXPORT_SYMBOL(ksize);
Tetsuo Handa [Wed, 5 Dec 2007 07:45:08 +0000]
Add EXPORT_SYMBOL(ksize);

mm/slub.c exports ksize(), but mm/slob.c and mm/slab.c don't.

It's used by binfmt_flat, which can be built as a module.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 years agomm/backing-dev.c: fix percpu_counter_destroy call bug in bdi_init
Denis Cheng [Wed, 5 Dec 2007 07:45:07 +0000]
mm/backing-dev.c: fix percpu_counter_destroy call bug in bdi_init

this call should use the array index j, not i.  But with this approach, just
one int i is enough, int j is not needed.
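
A minimal sketch of the single-index unwind, assuming a mock counter type.
struct counter, counter_init(), counter_destroy(), init_all() and NR_STATS
are hypothetical and only mirror the shape of the loop, not the code in
mm/backing-dev.c.

```c
#define NR_STATS 4	/* hypothetical number of counters */

struct counter {
	int live;
};

/* Mock init: fails for index fail_at, succeeds otherwise. */
static int counter_init(struct counter *c, int idx, int fail_at)
{
	if (idx == fail_at)
		return -1;
	c->live = 1;
	return 0;
}

static void counter_destroy(struct counter *c)
{
	c->live = 0;
}

/* Initialise every counter; on failure destroy only those already set up. */
static int init_all(struct counter *stats, int fail_at)
{
	int i;

	for (i = 0; i < NR_STATS; i++) {
		if (counter_init(&stats[i], i, fail_at))
			goto err;
	}
	return 0;
err:
	/* i is the index that failed; walk back over 0..i-1 with the same i. */
	while (i--)
		counter_destroy(&stats[i]);
	return -1;
}
```

Because the unwind reuses i, there is no second index that could be mixed
up with it.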

Signed-off-by: Denis Cheng <crquan@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 years agoufs: fix nexstep dir block size
Evgeniy Dushistov [Wed, 5 Dec 2007 07:45:06 +0000]
ufs: fix nexstep dir block size

This patch fixes a regression introduced since 2.6.16.  The NextStep variant
of UFS, like OpenStep, uses a directory block size equal to 1024.  Without
this change, ufs_check_page fails in many cases.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Dave Bailey <dsbailey@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 years agoRTC: assure proper memory ordering with respect to RTC_DEV_BUSY flag
Jiri Kosina [Wed, 5 Dec 2007 07:45:05 +0000]
RTC: assure proper memory ordering with respect to RTC_DEV_BUSY flag

We must make sure that the RTC_DEV_BUSY flag has proper lock semantics,
i.e.  that the RTC_DEV_BUSY stores clearing the flag don't get reordered
before the preceding stores and loads and vice versa.
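
For illustration, lock semantics on a busy flag can be shown with a
user-space analogy built on C11 atomics. This is not the kernel primitive
used by the RTC code; dev_busy, dev_try_claim() and dev_release() are
hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag dev_busy = ATOMIC_FLAG_INIT;

/*
 * Try to claim the device.  Acquire ordering keeps the accesses that
 * follow a successful claim from being reordered before it.
 */
static bool dev_try_claim(void)
{
	return !atomic_flag_test_and_set_explicit(&dev_busy,
						  memory_order_acquire);
}

/*
 * Release the device.  Release ordering keeps the preceding stores and
 * loads from being reordered after the flag is cleared.
 */
static void dev_release(void)
{
	atomic_flag_clear_explicit(&dev_busy, memory_order_release);
}
```

The pairing of acquire on the set and release on the clear is what gives
the flag the same ordering guarantees as a lock.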

Spotted by Nick Piggin.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Brownell <david-b@pacbell.net>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 years agofix clone(CLONE_NEWPID)
Eric W. Biederman [Wed, 5 Dec 2007 07:45:04 +0000]
fix clone(CLONE_NEWPID)

Currently we are complicating the code in copy_process, the clone ABI, and
(if we fix the bugs) sys_setsid itself, with an unnecessary open-coded
version of sys_setsid.

So just simplify everything and don't special case the session and pgrp of
the initial process in a pid namespace.

Having this special case actually presents to user space the classic linux
startup conditions with session == pgrp == 0 for /sbin/init.

We already handle sending signals to processes in a child pid namespace.

We need to handle sending signals to processes in a parent pid namespace
for cases like SIGCHLD and SIGIO.

This makes nothing extra visible inside a pid namespace.  So this extra
special case appears to have no redeeming merits.

Further removing this special case increases the flexibility of how we can
use pid namespaces, by not requiring the initial process in a pid namespace
to be a daemon.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 years agoaio: only account I/O wait time in read_events if there are active requests
Jeff Moyer [Wed, 5 Dec 2007 07:45:02 +0000]
aio: only account I/O wait time in read_events if there are active requests

On 2.6.24, top started showing 100% iowait on one CPU when a UML instance was
running (but completely idle).  The UML code sits in io_getevents waiting for
an event to be submitted and completed.

Fix this by checking ctx->reqs_active before scheduling to determine whether
or not we are waiting for I/O.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 years agofutex: correctly return -EFAULT not -EINVAL
Thomas Gleixner [Wed, 5 Dec 2007 14:46:09 +0000]
futex: correctly return -EFAULT not -EINVAL

return -EFAULT not -EINVAL. Found by review.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

12 years agolockdep: in_range() fix
Oleg Nesterov [Wed, 5 Dec 2007 14:46:09 +0000]
lockdep: in_range() fix

Torsten Kaiser wrote:

| static inline int in_range(const void *start, const void *addr, const void *end)
| {
|         return addr >= start && addr <= end;
| }
| This  will return true, if addr is in the range of start (including)
| to end (including).
|
| But debug_check_no_locks_freed() seems does:
| const void *mem_to = mem_from + mem_len
| -> mem_to is the last byte of the freed range, that fits in_range
| lock_from = (void *)hlock->instance;
| -> first byte of the lock
| lock_to = (void *)(hlock->instance + 1);
| -> first byte of the next lock, not last byte of the lock that is being checked!
|
| The test is:
| if (!in_range(mem_from, lock_from, mem_to) &&
|                                         !in_range(mem_from, lock_to, mem_to))
|                         continue;
| So it tests, if the first byte of the lock is in the range that is freed ->OK
| And if the first byte of the *next* lock is in the range that is freed
| -> Not OK.

We can also simplify in_range checks, we need only 2 comparisons, not 4.
If the lock is not in memory range, it should be either at the left of range
or at the right.
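
A sketch of the two-comparison test, written for user space with both
ranges treated as half-open.  The helper name and signature here are
assumptions for illustration and may not match the patch exactly.

```c
#include <stddef.h>

/*
 * Return nonzero when [lock, lock + lock_len) lies entirely outside
 * [mem, mem + mem_len): the lock is either wholly to the left of the
 * range or wholly to the right of it.
 */
static int not_in_range(const char *mem, size_t mem_len,
			const char *lock, size_t lock_len)
{
	return lock + lock_len <= mem || mem + mem_len <= lock;
}
```

With half-open ranges, a lock that ends exactly where the freed memory
starts is correctly reported as outside it, which is the case the
inclusive in_range() got wrong.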

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

12 years agolockdep: fix debug_show_all_locks()
Ingo Molnar [Wed, 5 Dec 2007 14:46:09 +0000]
lockdep: fix debug_show_all_locks()

fix the oops that can be seen in:

   http://bugzilla.kernel.org/attachment.cgi?id=13828&action=view

it is not safe to print the locks of running tasks.

(even with this fix we have a small race - but this is a debug
 function after all.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

12 years agosched: style cleanups
Ingo Molnar [Wed, 5 Dec 2007 14:46:09 +0000]
sched: style cleanups

style cleanup of various changes that were done recently.

no code changed:

      text    data     bss     dec     hex filename
     23680    2542      28   26250    668a sched.o.before
     23680    2542      28   26250    668a sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>

12 years agofutex: fix for futex_wait signal stack corruption
Steven Rostedt [Wed, 5 Dec 2007 14:46:09 +0000]
futex: fix for futex_wait signal stack corruption

David Holmes found a bug in the -rt tree with respect to
pthread_cond_timedwait. After trying his test program on the latest git
from mainline, I found the bug was there too.  The bug he was seeing
that his test program showed, was that if one were to do a "Ctrl-Z" on a
process that was in the pthread_cond_timedwait, and then did a "bg" on
that process, it would return with a "-ETIMEDOUT" but early. That is,
the timer would go off early.

Looking into this, I found the source of the problem. And it is a rather
nasty bug at that.

Here's the relevant code from kernel/futex.c: (not in order in the file)

[...]
asmlinkage long sys_futex(u32 __user *uaddr, int op, u32 val,
                          struct timespec __user *utime, u32 __user *uaddr2,
                          u32 val3)
{
        struct timespec ts;
        ktime_t t, *tp = NULL;
        u32 val2 = 0;
        int cmd = op & FUTEX_CMD_MASK;

        if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) {
                if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
                        return -EFAULT;
                if (!timespec_valid(&ts))
                        return -EINVAL;

                t = timespec_to_ktime(ts);
                if (cmd == FUTEX_WAIT)
                        t = ktime_add(ktime_get(), t);
                tp = &t;
        }
[...]
        return do_futex(uaddr, op, val, tp, uaddr2, val2, val3);
}

[...]

long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
                u32 __user *uaddr2, u32 val2, u32 val3)
{
        int ret;
        int cmd = op & FUTEX_CMD_MASK;
        struct rw_semaphore *fshared = NULL;

        if (!(op & FUTEX_PRIVATE_FLAG))
                fshared = &current->mm->mmap_sem;

        switch (cmd) {
        case FUTEX_WAIT:
                ret = futex_wait(uaddr, fshared, val, timeout);

[...]

static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared,
                      u32 val, ktime_t *abs_time)
{
[...]
                struct restart_block *restart;
                restart = &current_thread_info()->restart_block;
                restart->fn = futex_wait_restart;
                restart->arg0 = (unsigned long)uaddr;
                restart->arg1 = (unsigned long)val;
                restart->arg2 = (unsigned long)abs_time;
                restart->arg3 = 0;
                if (fshared)
                        restart->arg3 |= ARG3_SHARED;
                return -ERESTART_RESTARTBLOCK;
[...]

static long futex_wait_restart(struct restart_block *restart)
{
        u32 __user *uaddr = (u32 __user *)restart->arg0;
        u32 val = (u32)restart->arg1;
        ktime_t *abs_time = (ktime_t *)restart->arg2;
        struct rw_semaphore *fshared = NULL;

        restart->fn = do_no_restart_syscall;
        if (restart->arg3 & ARG3_SHARED)
                fshared = &current->mm->mmap_sem;
        return (long)futex_wait(uaddr, fshared, val, abs_time);
}

So when the futex_wait is interrupted by a signal we break out of the
hrtimer code and set up or return from signal. This code does not return
back to userspace, so we set up a RESTARTBLOCK.  The bug here is that we
save the "abs_time" which is a pointer to the stack variable "ktime_t t"
from sys_futex.

This returns and unwinds the stack before we get to call our signal. On
return from the signal we go to futex_wait_restart, where we update all
the parameters for futex_wait and call it. But here we have a problem
where abs_time is no longer valid.

I verified this with print statements, and sure enough, what abs_time
was set to ends up being garbage when we get to futex_wait_restart.

The solution I chose (with input from Linus Torvalds)
was to add unions to the restart_block to allow system calls to
use the restart with specific parameters.  This way the futex code now
saves the time in a 64bit value in the restart block instead of storing
it on the stack.
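
Roughly, the idea looks like the following user-space sketch.  The layout
and every name in it (struct mock_restart_block, mock_save_restart(),
mock_futex_restart(), seen_deadline) are hypothetical and simplified; the
real restart_block definition may differ.

```c
#include <stdint.h>

/* Hypothetical, simplified restart block; not the kernel's definition. */
struct mock_restart_block {
	long (*fn)(struct mock_restart_block *);
	union {
		/* Generic view: four untyped arguments. */
		struct {
			unsigned long arg0, arg1, arg2, arg3;
		};
		/* Futex view: the deadline is stored by value. */
		struct {
			uint32_t *uaddr;
			uint32_t val;
			uint32_t flags;
			uint64_t time;
		} futex;
	};
};

static uint64_t seen_deadline;

/* Restart handler: reads the deadline from the block itself. */
static long mock_futex_restart(struct mock_restart_block *r)
{
	seen_deadline = r->futex.time;
	return 0;
}

/* Interrupted path: copy *abs_time into the block, keep no pointer to it. */
static void mock_save_restart(struct mock_restart_block *r,
			      uint32_t *uaddr, uint32_t val,
			      const uint64_t *abs_time)
{
	r->fn = mock_futex_restart;
	r->futex.uaddr = uaddr;
	r->futex.val = val;
	r->futex.flags = 0;
	r->futex.time = *abs_time;
}
```

Since the value is copied, the restart handler no longer cares what
happens to the caller's stack frame after the system call unwinds.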

Note: I'm a bit nervous to add "linux/types.h" and use u32 and u64
in thread_info.h, when there's a #ifdef __KERNEL__ just below that.
Not sure what that is there for.  If this turns out to be a problem, I've
tested this with using "unsigned int" for u32 and "unsigned long long" for
u64 and it worked just the same. I'm using u32 and u64 just to be
consistent with what the futex code uses.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>

12 years ago[SPARC64]: Update defconfig.
David S. Miller [Tue, 4 Dec 2007 08:38:22 +0000]
[SPARC64]: Update defconfig.

Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[SPARC]: Add missing of_node_put
Julia Lawall [Tue, 4 Dec 2007 08:33:07 +0000]
[SPARC]: Add missing of_node_put

There should be an of_node_put when breaking out of a loop that iterates
using for_each_node_by_type.
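
The pattern can be sketched with a mock reference-counted iterator in user
space.  This assumes the iterator takes a reference on the node it returns
and drops the one held on the previous node; struct mock_node,
mock_find_by_type(), mock_node_put() and mock_has_nth() are hypothetical
names, not the OF API.

```c
#include <stddef.h>

struct mock_node {
	int type;
	int refs;
};

#define NR_NODES 4
static struct mock_node nodes[NR_NODES] = {
	{ 1, 0 }, { 2, 0 }, { 1, 0 }, { 1, 0 },
};

static void mock_node_put(struct mock_node *n)
{
	if (n)
		n->refs--;
}

/*
 * Return the next node of the given type after prev with a reference
 * held on it, and drop the reference held on prev.
 */
static struct mock_node *mock_find_by_type(struct mock_node *prev, int type)
{
	size_t i = prev ? (size_t)(prev - nodes) + 1 : 0;

	mock_node_put(prev);
	for (; i < NR_NODES; i++) {
		if (nodes[i].type == type) {
			nodes[i].refs++;
			return &nodes[i];
		}
	}
	return NULL;
}

/* Stop at the nth matching node; the early exit must drop its reference. */
static int mock_has_nth(int type, int nth)
{
	struct mock_node *n;
	int seen = 0, found = 0;

	for (n = mock_find_by_type(NULL, type); n;
	     n = mock_find_by_type(n, type)) {
		if (++seen == nth) {
			found = 1;
			mock_node_put(n);	/* otherwise this ref leaks */
			break;
		}
	}
	return found;
}
```

When the loop runs to completion the iterator drops the last reference
itself, so only the early exit needs the explicit put.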

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[SPARC64]: check for possible NULL pointer dereference
Cyrill Gorcunov [Wed, 21 Nov 2007 01:32:19 +0000]
[SPARC64]: check for possible NULL pointer dereference

This patch adds checking for possible NULL pointer dereference
if of_find_property() failed.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[SPARC]: Add missing "space"
Joe Perches [Tue, 20 Nov 2007 07:45:16 +0000]
[SPARC]: Add missing "space"

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[SPARC64]: Add missing "space"
Joe Perches [Tue, 20 Nov 2007 07:43:00 +0000]
[SPARC64]: Add missing "space"

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[SPARC64]: Add missing pci_dev_put
Julia Lawall [Tue, 20 Nov 2007 06:50:01 +0000]
[SPARC64]: Add missing pci_dev_put

There should be a pci_dev_put when breaking out of a loop that iterates
over calls to pci_get_device and similar functions.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[SYSCTL_CHECK]: Fix typo in KERN_SPARC_SCONS_PWROFF entry string.
David S. Miller [Tue, 20 Nov 2007 05:35:42 +0000]
[SYSCTL_CHECK]: Fix typo in KERN_SPARC_SCONS_PWROFF entry string.

Based upon a report by Mikael Pettersson.

Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[SPARC64]: Missing mdesc_release() in ldc_init().
David S. Miller [Thu, 15 Nov 2007 04:17:24 +0000]
[SPARC64]: Missing mdesc_release() in ldc_init().

Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[LRO]: fix lro_gen_skb() alignment
Andrew Gallatin [Wed, 5 Dec 2007 10:31:42 +0000]
[LRO]: fix lro_gen_skb() alignment

Add a field to the lro_mgr struct so that drivers can specify how much
padding is required to align layer 3 headers when a packet is copied
into a freshly allocated skb by inet_lro.c:lro_gen_skb().  Without
padding, skbs generated by LRO will cause alignment warnings on
architectures which require strict alignment (seen on sparc64).

Myri10GE is updated to use this field.

Signed-off-by: Andrew Gallatin <gallatin@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[TCP]: NAGLE_PUSH seems to be a wrong way around
Ilpo Järvinen [Wed, 5 Dec 2007 10:25:32 +0000]
[TCP]: NAGLE_PUSH seems to be a wrong way around

The comment in tcp_nagle_test suggests that. This bug is very
very old, even 2.4.0 seems to have it.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[TCP]: Move prior_in_flight collect to more robust place
Ilpo Järvinen [Wed, 5 Dec 2007 10:21:35 +0000]
[TCP]: Move prior_in_flight collect to more robust place

The previous location is after sacktag processing, which affects
counters tcp_packets_in_flight depends on. This may manifest as
wrong behavior if new SACK blocks are present and all is clear
for call to tcp_cong_avoid, which in the case of
tcp_reno_cong_avoid bails out early because it thinks that
TCP is not limited by cwnd.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[TCP] FRTO: Use of existing funcs make code more obvious & robust
Ilpo Järvinen [Wed, 5 Dec 2007 10:20:21 +0000]
[TCP] FRTO: Use of existing funcs make code more obvious & robust

Though there's little need for everything that tcp_may_send_now
does (actually, even the state had to be adjusted to pass some
checks FRTO does not want to occur), it's more robust to let it
make the decision if sending is allowed. State adjustments
needed:
- Make sure snd_cwnd limit is not hit in there
- Disable nagle (if necessary) through the frto_counter == 2

The result of check for frto_counter in argument to call for
tcp_enter_frto_loss can just be open coded, therefore there
isn't need to store the previous frto_counter past
tcp_may_send_now.

In addition, returns can then be combined.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[IRDA]: Move ircomm_tty_line_info() under #ifdef CONFIG_PROC_FS
Pavel Emelyanov [Wed, 5 Dec 2007 10:18:48 +0000]
[IRDA]: Move ircomm_tty_line_info() under #ifdef CONFIG_PROC_FS

The function in question is called only from ircomm_tty_read_proc,
which is under this option. Move this helper to the same place.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[ROSE]: Trivial compilation CONFIG_INET=n case
Pavel Emelyanov [Wed, 5 Dec 2007 10:18:15 +0000]
[ROSE]: Trivial compilation CONFIG_INET=n case

The rose_rebuild_header() consists only of some variables in
case INET=n, and gcc will warn us about it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[IPVS]: Fix sched registration race when checking for name collision.
Pavel Emelyanov [Tue, 4 Dec 2007 08:45:06 +0000]
[IPVS]: Fix sched registration race when checking for name collision.

The register_ip_vs_scheduler() checks for the scheduler with the
same name under the read-locked __ip_vs_sched_lock, then drops,
takes it for writing and puts the scheduler in list.

This is racy: a scheduler with the same name can be registered in the
window between dropping the read lock and taking the lock for writing.

The fix is to search the scheduler with the given name right under
the write-locked __ip_vs_sched_lock.
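
A user-space sketch of doing the check and the insert under one lock hold.
A pthread mutex stands in for the write-locked rwlock, and the table, the
function name and the error codes are assumptions for illustration only.

```c
#include <errno.h>
#include <pthread.h>
#include <string.h>

#define MAX_SCHED 8

static const char *sched_names[MAX_SCHED];
static int nr_sched;
static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register name; the duplicate check and the insert share one lock hold. */
static int mock_register_sched(const char *name)
{
	int i, ret = 0;

	pthread_mutex_lock(&sched_lock);
	for (i = 0; i < nr_sched; i++) {
		if (strcmp(sched_names[i], name) == 0) {
			ret = -EINVAL;	/* name collision */
			goto out;
		}
	}
	if (nr_sched == MAX_SCHED) {
		ret = -ENOMEM;	/* mock table is full */
		goto out;
	}
	sched_names[nr_sched++] = name;
out:
	pthread_mutex_unlock(&sched_lock);
	return ret;
}
```

Since the lock is never dropped between the search and the insert, no
second registration can slip in between them.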

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 years ago[IPVS]: Don't leak sysctl tables if the scheduler registration fails.
Pavel Emelyanov [Tue, 4 Dec 2007 08:43:24 +0000]
[IPVS]: Don't leak sysctl tables if the scheduler registration fails.

In case we load lblc or lblcr module we can leak some sysctl
tables if the call to register_ip_vs_scheduler() fails.

I've looked at the register_ip_vs_scheduler() code and saw that
the only reason for it to fail is a name collision, so I think that
with some 3rd party schedulers this becomes a relevant issue. No?

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 years agoVM/Security: add security hook to do_brk
Eric Paris [Tue, 4 Dec 2007 16:06:55 +0000]
VM/Security: add security hook to do_brk

Given a specifically crafted binary do_brk() can be used to get low
pages available in userspace virtual memory and can thus be used to
circumvent the mmap_min_addr low memory protection.  Add security checks
in do_brk().

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

12 years agoSecurity: round mmap hint address above mmap_min_addr
Eric Paris [Mon, 26 Nov 2007 23:47:40 +0000]
Security: round mmap hint address above mmap_min_addr

If mmap_min_addr is set and a process attempts to mmap (not fixed) with a
non-null hint address less than mmap_min_addr the mapping will fail the
security checks.  Since this is just a hint address this patch will round
such a hint address above mmap_min_addr.
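
The rounding can be sketched as below.  The 4096-byte page size, the
helper name and passing the minimum address as a parameter are assumptions
made to keep the example self-contained in user space.

```c
#define MOCK_PAGE_SIZE 4096UL
#define MOCK_PAGE_MASK (~(MOCK_PAGE_SIZE - 1))

/*
 * Page-align the hint.  A non-NULL hint below min_addr is moved up to the
 * first page boundary at or above min_addr; anything else is returned
 * unchanged, including 0, which means "no hint".
 */
static unsigned long mock_round_hint(unsigned long hint,
				     unsigned long min_addr)
{
	hint &= MOCK_PAGE_MASK;
	if (hint != 0 && hint < min_addr)
		return (min_addr + MOCK_PAGE_SIZE - 1) & MOCK_PAGE_MASK;
	return hint;
}
```

With min_addr set to 65536, a hint of 8192 becomes 65536, while a hint of
0 or a hint already above the minimum is left alone.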

gcj was found to try to be very frugal with vm usage and give hint addresses
in the 8k-32k range.  Without this patch all such programs failed and with
the patch they happily get a higher address.

This patch is wrapped in CONFIG_SECURITY since mmap_min_addr doesn't exist
without it and there would be no security check possible no matter what.  So
we should not bother compiling in this rounding if it is just a waste of
time.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

12 years agosecurity: protect from stack expansion into low vm addresses
Eric Paris [Mon, 26 Nov 2007 23:47:26 +0000]
security: protect from stack expansion into low vm addresses

Add security checks to make sure we are not attempting to expand the
stack into memory protected by mmap_min_addr

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

12 years agoSecurity: allow capable check to permit mmap or low vm space
Eric Paris [Mon, 26 Nov 2007 23:47:46 +0000]
Security: allow capable check to permit mmap or low vm space

On a kernel with CONFIG_SECURITY but without an LSM which implements
security_file_mmap it is impossible for an application to mmap addresses
lower than mmap_min_addr.  Based on a suggestion from a developer in the
openwall community this patch adds a check for CAP_SYS_RAWIO.  It is
assumed that any process with this capability can harm the system a lot
more easily than writing some stuff on the zero page and then trying to
get the kernel to trip over itself.  It also means that programs like X
on i686 which use vm86 emulation can work even with mmap_min_addr set.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

12 years agoSELinux: detect dead booleans
Stephen Smalley [Mon, 26 Nov 2007 16:12:53 +0000]
SELinux: detect dead booleans

Instead of using f_op to detect dead booleans, check the inode index
against the number of booleans and check the dentry name against the
boolean name for that index on reads and writes.  This prevents
incorrect use of a boolean file opened prior to a policy reload while
allowing valid use of it as long as it still corresponds to the same
boolean in the policy.
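
A sketch of the check described above, using a mock policy snapshot.
struct mock_policy and mock_bool_is_live() are hypothetical; the real code
works on inodes and dentries rather than plain strings.

```c
#include <string.h>

/* Hypothetical snapshot of the currently loaded policy's booleans. */
struct mock_policy {
	unsigned int nr_bools;
	const char **names;
};

/*
 * A boolean file opened earlier is still usable only if its index is in
 * range and the boolean at that index still has the file's name.
 */
static int mock_bool_is_live(const struct mock_policy *p,
			     unsigned int index, const char *name)
{
	if (index >= p->nr_bools)
		return 0;
	return strcmp(p->names[index], name) == 0;
}
```

A file whose index and name both still match keeps working across a
policy reload; one whose slot was removed or reassigned is rejected.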

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>

12 years agoSELinux: do not clear f_op when removing entries
Stephen Smalley [Wed, 21 Nov 2007 14:01:36 +0000]
SELinux: do not clear f_op when removing entries

Do not clear f_op when removing entries since it isn't safe to do.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>

12 years agoMerge branch 'upstream-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik...
Linus Torvalds [Tue, 4 Dec 2007 20:21:11 +0000]
Merge branch 'upstream-fixes' of /linux/kernel/git/jgarzik/netdev-2.6

* 'upstream-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
  PHY: Add the phy_device_release device method.
  gianfar: fix compile warning
  pasemi_mac: Fix reuse of free'd skb
  SMC911X: Fix using of dereferenced skb after netif_rx
  sky2: recovery deadlock fix
  Fix memory corruption in fec_mpc52xx
  Don't claim to do IPv6 checksum offload
  cxgb - revert file mode changes.

12 years agoPHY: Add the phy_device_release device method.
Anton Vorontsov [Tue, 4 Dec 2007 13:17:33 +0000]
PHY: Add the phy_device_release device method.

Lately I've got this nice badness on mdio bus removal:

Device 'e0103120:06' does not have a release() function, it is broken and must be fixed.
------------[ cut here ]------------
Badness at drivers/base/core.c:107
NIP: c015c1a8 LR: c015c1a8 CTR: c0157488
REGS: c34bdcf0 TRAP: 0700   Not tainted  (2.6.23-rc5-g9ebadfbb-dirty)
MSR: 00029032 <EE,ME,IR,DR>  CR: 24088422  XER: 00000000
...
[c34bdda0] [c015c1a8] device_release+0x78/0x80 (unreliable)
[c34bddb0] [c01354cc] kobject_cleanup+0x80/0xbc
[c34bddd0] [c01365f0] kref_put+0x54/0x6c
[c34bdde0] [c013543c] kobject_put+0x24/0x34
[c34bddf0] [c015c384] put_device+0x1c/0x2c
[c34bde00] [c0180e84] mdiobus_unregister+0x2c/0x58
...

Though actually nothing is broken; the device subsystem core
just expects another "pattern" of resource management.

This patch implements the phy device's release function, thus
getting rid of this badness.

Also a small hidden bug is fixed; hopefully no others are introduced. ;-)

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

12 years agogianfar: fix compile warning
Grant Likely [Sun, 2 Dec 2007 05:10:03 +0000]
gianfar: fix compile warning

Eliminate an uninitialized variable warning.  The code is correct, but
a pointer to the automatic variable 'addr' is passed to dma_alloc_coherent.
Since addr has never been initialized, and the compiler doesn't know
what dma_alloc_coherent will do with it, it complains.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Jeff Garzik <jeff@garzik.org>