12 years agoblock: fix intermittent dm timeout based oops
Hannes Reinecke [Thu, 23 Apr 2009 08:32:59 +0000 (10:32 +0200)]
block: fix intermittent dm timeout based oops

Very rarely under stress testing of dm, oopses are occurring as
something tampers with an old stack frame.  This has been traced back
to blk_abort_queue() leaving a timeout_list pointing to the stack.
The reason is that sometimes blk_abort_request() won't delete the
timer (if the request is marked as complete but before the timer has
been removed, a small race window).  Fix this by splicing back from
the usually empty list to the q->timeout_list.
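
A minimal userspace sketch of the pattern described above, assuming a
simplified circular list in place of the kernel's list_head helpers.  Every
name here (struct node, skip_abort, abort_all) is hypothetical and only
illustrates why splicing the on-stack list back matters; it is not the
actual block layer code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, standing in for list_head. */
struct node {
	struct node *prev, *next;
	bool skip_abort;	/* hypothetical: models an entry hit by the race */
};

static void list_init(struct node *h) { h->prev = h->next = h; }
static bool list_is_empty(const struct node *h) { return h->next == h; }

static void list_del_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

static void list_add_tail_node(struct node *n, struct node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Move every entry of 'from' to the tail of 'to', leaving 'from' empty. */
static void list_splice_all(struct node *from, struct node *to)
{
	while (!list_is_empty(from)) {
		struct node *n = from->next;

		list_del_node(n);
		list_add_tail_node(n, to);
	}
}

/*
 * Entries are moved to an on-stack list and aborted one by one.  Whatever
 * survives is spliced back, so nothing points at this stack frame after
 * the function returns.
 */
void abort_all(struct node *timeout_list)
{
	struct node local;	/* lives on this stack frame */
	struct node *n, *next;

	list_init(&local);
	list_splice_all(timeout_list, &local);

	for (n = local.next; n != &local; n = next) {
		next = n->next;
		if (!n->skip_abort)	/* usual case: entry leaves the list */
			list_del_node(n);
	}

	/* Normally a no-op, but it rescues entries left behind by the race. */
	list_splice_all(&local, timeout_list);
}
```

Without the final splice, an entry that was skipped would keep prev/next
pointers into the dead stack frame, which is the kind of corruption the
oops reports point to.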

Signed-off-by: Hannes Reinecke <>
Signed-off-by: Jens Axboe <>
12 years agoumem: fix request_queue lock warning
Sage Weil [Thu, 23 Apr 2009 06:37:58 +0000 (08:37 +0200)]
umem: fix request_queue lock warning

The umem driver issues two warnings on boot, due to blk_plug_device() and
blk_remove_plug() being called without q->queue_lock held.  Starting with
e48ec690 (block: extend queue_flag bitops), the queue_flag_* functions
warn if q->queue_lock doesn't appear to be locked.  In fact, q->queue_lock
is NULL (though that apparently isn't otherwise a problem as the driver is
using card->lock for everything).

Although blk_init_queue() will take a request_fn_proc and spinlock_t*,
there isn't a corresponding init helper that takes a make_request_fn.
Setting queue_lock to &card->lock explicitly seems to work fine for me.
The warning goes away and the device appears to behave.

[    1.531881] v2.3 : Micro Memory(tm) PCI memory board block driver
[    1.538136] umem 0000:02:01.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[    1.545018] umem 0000:02:01.0: Micro Memory(tm) controller found (PCI Mem Module (Battery Backup))
[    1.554176] umem 0000:02:01.0: CSR 0xfc9ffc00 -> 0xffffc200013d0c00 (0x100)
[    1.561279] umem 0000:02:01.0: Size 1048576 KB, Battery 1 Disabled (FAILURE), Battery 2 Disabled (FAILURE)
[    1.571114] umem 0000:02:01.0: Window size 16777216 bytes, IRQ 20
[    1.577304] umem 0000:02:01.0: memory NOT initialized. Consider over-writing whole device.
[    1.585989]  umema:<4>------------[ cut here ]------------
[    1.591775] WARNING: at include/linux/blkdev.h:492 blk_plug_device+0x6d/0x106()
[    1.592025] Hardware name: H8SSL
[    1.592025] Modules linked in:
[    1.592025] Pid: 1, comm: swapper Not tainted 2.6.29 #8
[    1.592025] Call Trace:
[    1.592025]  [<ffffffff8023c994>] warn_slowpath+0xd3/0xf2
[    1.592025]  [<ffffffff8025a5b5>] ? save_trace+0x3f/0x9b
[    1.592025]  [<ffffffff8025a68b>] ? add_lock_to_list+0x7a/0xba
[    1.592025]  [<ffffffff8025e609>] ? validate_chain+0xb3b/0xce8
[    1.592025]  [<ffffffff80441556>] ? mm_make_request+0x27/0x59
[    1.592025]  [<ffffffff80441556>] ? mm_make_request+0x27/0x59
[    1.592025]  [<ffffffff8025ef04>] ? __lock_acquire+0x74e/0x7b9
[    1.592025]  [<ffffffff8025a70e>] ? get_lock_stats+0x34/0x5e
[    1.592025]  [<ffffffff8025a746>] ? put_lock_stats+0xe/0x27
[    1.592025]  [<ffffffff80441556>] ? mm_make_request+0x27/0x59
[    1.592025]  [<ffffffff803ad165>] blk_plug_device+0x6d/0x106
[    1.592025]  [<ffffffff80441575>] mm_make_request+0x46/0x59
[    1.592025]  [<ffffffff803ac2d9>] generic_make_request+0x335/0x3cf
[    1.592025]  [<ffffffff8027fcc7>] ? mempool_alloc_slab+0x11/0x13
[    1.592025]  [<ffffffff8027fdce>] ? mempool_alloc+0x45/0x101
[    1.592025]  [<ffffffff8025a746>] ? put_lock_stats+0xe/0x27
[    1.592025]  [<ffffffff803adda5>] submit_bio+0x10a/0x119
[    1.592025]  [<ffffffff802c8d00>] submit_bh+0xe5/0x109
[    1.592025]  [<ffffffff802cbf43>] block_read_full_page+0x2aa/0x2cb
[    1.592025]  [<ffffffff802cf4c4>] ? blkdev_get_block+0x0/0x4c
[    1.592025]  [<ffffffff805c90a8>] ? _spin_unlock_irq+0x36/0x51
[    1.592025]  [<ffffffff80286836>] ? __lru_cache_add+0x92/0xb2
[    1.592025]  [<ffffffff802cf008>] blkdev_readpage+0x13/0x15
[    1.592025]  [<ffffffff8027de06>] read_cache_page_async+0x90/0x134
[    1.592025]  [<ffffffff802ceff5>] ? blkdev_readpage+0x0/0x15
[    1.592025]  [<ffffffff802f5f1c>] ? adfspart_check_ICS+0x0/0x16c
[    1.592025]  [<ffffffff8027deb8>] read_cache_page+0xe/0x45
[    1.592025]  [<ffffffff802f5170>] read_dev_sector+0x2e/0x93
[    1.592025]  [<ffffffff802f5f44>] adfspart_check_ICS+0x28/0x16c
[    1.592025]  [<ffffffff8025d427>] ? trace_hardirqs_on+0xd/0xf
[    1.592025]  [<ffffffff802f5f1c>] ? adfspart_check_ICS+0x0/0x16c
[    1.592025]  [<ffffffff802f59c5>] rescan_partitions+0x168/0x2fb
[    1.592025]  [<ffffffff802ceae9>] __blkdev_get+0x259/0x336
[    1.592025]  [<ffffffff803ca1e2>] ? kobject_put+0x47/0x4b
[    1.592025]  [<ffffffff802cebd1>] blkdev_get+0xb/0xd
[    1.592025]  [<ffffffff802f5773>] register_disk+0xc4/0x12b
[    1.592025]  [<ffffffff803b2a7b>] add_disk+0xc3/0x12d
[    1.592025]  [<ffffffff808a1d4a>] ? mm_init+0x0/0x1a5
[    1.592025]  [<ffffffff808a1e73>] mm_init+0x129/0x1a5
[    1.592025]  [<ffffffff808a1d4a>] ? mm_init+0x0/0x1a5
[    1.592025]  [<ffffffff80209056>] _stext+0x56/0x130
[    1.592025]  [<ffffffff80274932>] ? register_irq_proc+0xae/0xca
[    1.592025]  [<ffffffff802f0000>] ? proc_pid_lookup+0xb4/0x18b
[    1.592025]  [<ffffffff8087f975>] kernel_init+0x132/0x18b
[    1.592025]  [<ffffffff8020d17a>] child_rip+0xa/0x20
[    1.592025]  [<ffffffff8020cb40>] ? restore_args+0x0/0x30
[    1.592025]  [<ffffffff8087f843>] ? kernel_init+0x0/0x18b
[    1.592025]  [<ffffffff8020d170>] ? child_rip+0x0/0x20
[    1.592025] ---[ end trace 7150b3b86da74e1e ]---
[    1.889858] ------------[ cut here ]------------[ve_plug+0x5f/0x91()
[    1.893848] Hardware name: H8SSL
[    1.893848] Modules linked in:
[    1.893848] Pid: 1, comm: swapper Tainted: G        W  2.6.29 #8
[    1.893848] Call Trace:
[    1.893848]  [<ffffffff8023c994>] warn_slowpath+0xd3/0xf2
[    1.893848]  [<ffffffff805c8411>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[    1.893848]  [<ffffffff8020cb40>] ? restore_args+0x0/0x30
[    1.893848]  [<ffffffff80254245>] ? __atomic_notifier_call_chain+0x0/0xb2
[    1.893848]  [<ffffffff805c90a3>] ? _spin_unlock_irq+0x31/0x51
[    1.893848]  [<ffffffff805c90bf>] ? _spin_unlock_irq+0x4d/0x51
[    1.893848]  [<ffffffff8044157d>] ? mm_make_request+0x4e/0x59
[    1.893848]  [<ffffffff8025a70e>] ? get_lock_stats+0x34/0x5e
[    1.893848]  [<ffffffff8025a75d>] ? put_lock_stats+0x25/0x27
[    1.893848]  [<ffffffff80441504>] ? mm_unplug_device+0x25/0x50
[    1.893848]  [<ffffffff803acf23>] blk_remove_plug+0x5f/0x91
[    1.893848]  [<ffffffff8044150f>] mm_unplug_device+0x30/0x50
[    1.893848]  [<ffffffff803ab74a>] blk_unplug+0x78/0x7d
[    1.893848]  [<ffffffff803ab75c>] blk_backing_dev_unplug+0xd/0xf
[    1.893848]  [<ffffffff802c853c>] block_sync_page+0x4a/0x4c
[    1.893848]  [<ffffffff8027da1c>] sync_page+0x44/0x4d
[    1.893848]  [<ffffffff805c66fd>] __wait_on_bit_lock+0x42/0x8a
[    1.893848]  [<ffffffff8027d9d8>] ? sync_page+0x0/0x4d
[    1.893848]  [<ffffffff8027d9c4>] __lock_page+0x64/0x6b
[    1.893848]  [<ffffffff802508db>] ? wake_bit_function+0x0/0x2a
[    1.893848]  [<ffffffff8027de4a>] read_cache_page_async+0xd4/0x134
[    1.893848]  [<ffffffff802ceff5>] ? blkdev_readpage+0x0/0x15
[    1.893848]  [<ffffffff802f5f1c>] ? adfspart_check_ICS+0x0/0x16c
[    1.893848]  [<ffffffff8027deb8>] read_cache_page+0xe/0x45
[    1.893848]  [<ffffffff802f5170>] read_dev_sector+0x2e/0x93
[    1.893848]  [<ffffffff802f5f44>] adfspart_check_ICS+0x28/0x16c
[    1.893848]  [<ffffffff8025d427>] ? trace_hardirqs_on+0xd/0xf
[    1.893848]  [<ffffffff802f5f1c>] ? adfspart_check_ICS+0x0/0x16c
[    1.893848]  [<ffffffff802f59c5>] rescan_partitions+0x168/0x2fb
[    1.893848]  [<ffffffff802ceae9>] __blkdev_get+0x259/0x336
[    1.893848]  [<ffffffff803ca1e2>] ? kobject_put+0x47/0x4b
[    1.893848]  [<ffffffff802cebd1>] blkdev_get+0xb/0xd
[    1.893848]  [<ffffffff802f5773>] register_disk+0xc4/0x12b
[    1.893848]  [<ffffffff803b2a7b>] add_disk+0xc3/0x12d
[    1.893848]  [<ffffffff808a1d4a>] ? mm_init+0x0/0x1a5
[    1.893848]  [<ffffffff808a1e73>] mm_init+0x129/0x1a5
[    1.893848]  [<ffffffff808a1d4a>] ? mm_init+0x0/0x1a5
[    1.893848]  [<ffffffff80209056>] _stext+0x56/0x130
[    1.893848]  [<ffffffff80274932>] ? register_irq_proc+0xae/0xca
[    1.893848]  [<ffffffff802f0000>] ? proc_pid_lookup+0xb4/0x18b
[    1.893848]  [<ffffffff8087f975>] kernel_init+0x132/0x18b
[    1.893848]  [<ffffffff8020d17a>] child_rip+0xa/0x20
[    1.893848]  [<ffffffff8020cb40>] ? restore_args+0x0/0x30
[    1.893848]  [<ffffffff8087f843>] ? kernel_init+0x0/0x18b
[    1.893848]  [<ffffffff8020d170>] ? child_rip+0x0/0x20
[    1.893848] ---[ end trace 7150b3b86da74e1f ]---

Signed-off-by: Sage Weil <>
Signed-off-by: Jens Axboe <>
12 years agoblock: simplify I/O stat accounting
Jerome Marchand [Wed, 22 Apr 2009 12:01:49 +0000 (14:01 +0200)]
block: simplify I/O stat accounting

This simplifies I/O stat accounting switching code and separates it
completely from I/O scheduler switch code.

Requests are accounted according to the state of their request queue
at the time of the request allocation. There is no need anymore to
flush the request queue when switching I/O accounting state.

Signed-off-by: Jerome Marchand <>
Signed-off-by: Jens Axboe <>
12 years agopktcdvd.h should include mempool.h
Alexander Beregalov [Tue, 21 Apr 2009 07:33:14 +0000 (09:33 +0200)]
pktcdvd.h should include mempool.h

Fix this build error:
In file included from fs/compat_ioctl.c:104:
include/linux/pktcdvd.h:285: error: expected specifier-qualifier-list before 'mempool_t'

Signed-off-by: Alexander Beregalov <>
Signed-off-by: Jens Axboe <>
12 years agocfq-iosched: use the default seek distance when there aren't enough seek samples
Jeff Moyer [Tue, 21 Apr 2009 05:31:56 +0000 (07:31 +0200)]
cfq-iosched: use the default seek distance when there aren't enough seek samples

If the cfq io context doesn't have enough samples yet to provide a mean
seek distance, then use the default threshold we have for seeky IO instead
of defaulting to 0.
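
As a rough illustration of the fallback, assuming a simplified stats
structure: the constants MIN_VALID_SAMPLES and DEFAULT_SEEK_THRESHOLD and
both function names below are placeholders chosen for this sketch, not
values or identifiers taken from cfq-iosched.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_VALID_SAMPLES	80u	/* assumed cut-off, illustration only */
#define DEFAULT_SEEK_THRESHOLD	8192u	/* assumed default, in sectors */

struct seek_stats {
	unsigned int samples;	/* number of seek samples collected */
	uint64_t seek_mean;	/* mean seek distance, in sectors */
};

/* Use the measured mean only once enough samples back it up. */
uint64_t seek_threshold(const struct seek_stats *s)
{
	if (s->samples > MIN_VALID_SAMPLES)
		return s->seek_mean;
	return DEFAULT_SEEK_THRESHOLD;
}

/* A request counts as "close" if it lies within the threshold. */
bool is_close(const struct seek_stats *s, uint64_t last_pos, uint64_t pos)
{
	uint64_t dist = pos > last_pos ? pos - last_pos : last_pos - pos;

	return dist <= seek_threshold(s);
}
```

With a threshold of 0 for a fresh context, no request other than an exact
position match could ever be considered close; the default avoids that.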

Signed-off-by: Jeff Moyer <>
Signed-off-by: Jens Axboe <>
12 years agocfq-iosched: make seek_mean converge more quickly
Jeff Moyer [Tue, 21 Apr 2009 05:25:04 +0000 (07:25 +0200)]
cfq-iosched: make seek_mean converge more quickly

Right now, depending on the first sector to which a process issues I/O,
the seek time may start out way out of whack. So make sure we start
with 0 sectors in seek, instead of the offset of the first request.
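
The idea can be sketched as follows, assuming a hypothetical tracker
structure (struct seek_tracker and seek_sample() are made up for this
example).  The first request has no predecessor, so it contributes a
distance of 0 rather than its absolute sector offset.

```c
#include <stdint.h>

struct seek_tracker {
	uint64_t last_pos;	/* sector of the previous request */
	int have_last;		/* 0 until the first request was seen */
};

/* Return the seek distance to feed into the running mean. */
uint64_t seek_sample(struct seek_tracker *t, uint64_t pos)
{
	uint64_t dist = 0;	/* first request: no seek yet */

	if (t->have_last)
		dist = pos > t->last_pos ? pos - t->last_pos
					 : t->last_pos - pos;

	t->last_pos = pos;
	t->have_last = 1;
	return dist;
}
```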

Signed-off-by: Jeff Moyer <>
Signed-off-by: Jens Axboe <>
12 years agoblock: make blk_abort_queue() ignore non-request based devices
Jens Axboe [Fri, 17 Apr 2009 06:36:50 +0000 (08:36 +0200)]
block: make blk_abort_queue() ignore non-request based devices

There's nothing to do for those devices, since the timeout handling is
based on requests.

Signed-off-by: Jens Axboe <>
12 years agoblock: include empty disks in /proc/diskstats
Tejun Heo [Fri, 17 Apr 2009 06:34:48 +0000 (08:34 +0200)]
block: include empty disks in /proc/diskstats

/proc/diskstats used to show stats for all disks whether they're
zero-sized or not and their non-zero partitions.  Commit
074a7aca7afa6f230104e8e65eba3420263714a5 accidentally changed the
behavior such that it doesn't print out zero sized disks.  This patch
implements DISK_PITER_INCL_EMPTY_PART0 flag to partition iterator and
uses it in diskstats_show() such that empty part0 is shown in

Reported and bisected by Daniel Collins.

Signed-off-by: Tejun Heo <>
Reported-by: Daniel Collins <>
Signed-off-by: Jens Axboe <>
12 years agobio: use bio_kmalloc() in copy/map functions
Tejun Heo [Wed, 15 Apr 2009 13:10:27 +0000 (22:10 +0900)]
bio: use bio_kmalloc() in copy/map functions

Impact: remove possible deadlock condition

There is no reason to use mempool backed allocation for map functions.
Also, because kern mapping is used inside LLDs (e.g. for EH), using
mempool backed allocation can lead to deadlock under extreme
conditions (mempool already consumed by the time a request reached EH
and requests are blocked on EH).

Switch copy/map functions to bio_kmalloc().

Signed-off-by: Tejun Heo <>
Signed-off-by: Jens Axboe <>
12 years agobio: fix bio_kmalloc()
Tejun Heo [Wed, 15 Apr 2009 17:50:51 +0000 (19:50 +0200)]
bio: fix bio_kmalloc()

Impact: fix bio_kmalloc() and its destruction path

bio_kmalloc() was broken in two ways.

* bvec_alloc_bs() first allocates bvec using kmalloc() and then
  ignores it and allocates again like non-kmalloc bvecs.

* bio_kmalloc_destructor() didn't check for and free bio integrity

This patch fixes the above problems.  kmalloc path is separated out
from bio_alloc_bioset() and allocates the requested number of bvecs as
inline bvecs.

* bio_alloc_bioset() no longer takes NULL @bs.  None other than
  bio_kmalloc() used it and outside users can't know how it was
  allocated anyway.

* Define and use BIO_POOL_NONE so that pool index check in
  bvec_free_bs() triggers if inline or kmalloc allocated bvec gets

* Relocate destructors on top of each allocation function so that how
  they're used is more clear.

Jens Axboe suggested allocating bvecs inline.

Signed-off-by: Tejun Heo <>
Signed-off-by: Jens Axboe <>
12 years agoblock: fix queue bounce limit setting
Tejun Heo [Wed, 15 Apr 2009 13:10:25 +0000 (22:10 +0900)]
block: fix queue bounce limit setting

Impact: don't set GFP_DMA in q->bounce_gfp unnecessarily

All DMA address limits are expressed in terms of the last addressable
unit (byte or page) instead of one plus that.  However, when
determining bounce_gfp for 64bit machines in blk_queue_bounce_limit(),
it compares the specified limit against 0x100000000UL to determine
whether it's below 4G ending up falsely setting GFP_DMA in

As DMA zone is very small on x86_64, this makes larger SG_IO transfers
very eager to trigger OOM killer.  Fix it.  While at it, rename the
parameter to @dma_mask for clarity and convert comment to proper
winged style.
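
The off-by-one can be shown with a small standalone comparison.  This is
only a sketch: PAGE_SHIFT_ASSUMED and both function names are illustrative,
and the real check in blk_queue_bounce_limit() involves more than this
single comparison.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 12	/* 4 KiB pages, assumed for the example */

/* Old behaviour: the 4G boundary is written as "one past the last byte". */
bool needs_dma_zone_old(uint64_t dma_mask)
{
	return (dma_mask >> PAGE_SHIFT_ASSUMED) <
	       (UINT64_C(0x100000000) >> PAGE_SHIFT_ASSUMED);
}

/* Fixed: the boundary is the last addressable byte, like dma_mask itself. */
bool needs_dma_zone_new(uint64_t dma_mask)
{
	return (dma_mask >> PAGE_SHIFT_ASSUMED) <
	       (UINT64_C(0xffffffff) >> PAGE_SHIFT_ASSUMED);
}
```

A device with a full 32-bit mask (0xffffffff) is pushed into the DMA zone
by the old comparison but not by the new one, while a 24-bit mask still
selects the DMA zone in both.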

Signed-off-by: Tejun Heo <>
Signed-off-by: Jens Axboe <>
12 years agoblock: fix SG_IO vector request data length handling
Tejun Heo [Wed, 15 Apr 2009 13:10:24 +0000 (22:10 +0900)]
block: fix SG_IO vector request data length handling

Impact: fix SG_IO behavior such that it matches the documentation

SG_IO howto says that if ->dxfer_len and sum of iovec disagree, the
shorter one wins.  However, the current implementation returns -EINVAL
for such cases.  Trim iovec if it's longer than ->dxfer_len.

This patch uses iov_*() helpers which take struct iovec * by casting
struct sg_iovec * to it.  sg_iovec is always identical to iovec and
this will be further cleaned up with later patches.
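
A userspace sketch of the "shorter one wins" rule, using struct iovec from
<sys/uio.h>.  iovec_total() and iovec_trim() are hypothetical helpers
written for this example; they are not the kernel's iov_*() helpers and
make no claim about how those are implemented.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Sum of all segment lengths. */
size_t iovec_total(const struct iovec *iov, int count)
{
	size_t total = 0;

	for (int i = 0; i < count; i++)
		total += iov[i].iov_len;
	return total;
}

/*
 * Trim 'iov' in place so that it covers at most 'max_len' bytes.
 * Returns the number of segments still in use.  A vector that is already
 * shorter than 'max_len' is left untouched.
 */
int iovec_trim(struct iovec *iov, int count, size_t max_len)
{
	size_t seen = 0;

	for (int i = 0; i < count; i++) {
		if (seen + iov[i].iov_len >= max_len) {
			iov[i].iov_len = max_len - seen;
			return iov[i].iov_len ? i + 1 : i;
		}
		seen += iov[i].iov_len;
	}
	return count;
}
```

The effective transfer length is then the smaller of the requested length
and iovec_total() over the trimmed vector.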

Signed-off-by: Tejun Heo <>
Signed-off-by: Jens Axboe <>
12 years agoscatterlist: make sure sg_miter_next() doesn't return 0 sized mappings
Tejun Heo [Wed, 15 Apr 2009 13:10:23 +0000 (22:10 +0900)]
scatterlist: make sure sg_miter_next() doesn't return 0 sized mappings

Impact: fix not-so-critical but annoying bug

sg_miter_next() returns 0 sized mapping if there is a zero sized sg
entry in the list or at the end of each iteration.  As the users
always check the ->length field, this bug shouldn't be critical other
than causing unnecessary iteration.
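
The intended iterator behaviour can be sketched in plain C.  struct seg,
struct seg_iter and seg_iter_next() are invented for this illustration and
only mirror the idea of skipping empty entries; they are not the sg_miter
API.

```c
#include <stdbool.h>
#include <stddef.h>

struct seg {
	const char *addr;
	size_t len;
};

struct seg_iter {
	const struct seg *segs;
	size_t nsegs;
	size_t next;		/* index of the next segment to visit */
	const char *addr;	/* current mapping, valid after a true return */
	size_t length;
};

/* Advance to the next non-empty segment; false once the list is done. */
bool seg_iter_next(struct seg_iter *it)
{
	while (it->next < it->nsegs) {
		const struct seg *s = &it->segs[it->next++];

		if (s->len == 0)
			continue;	/* never hand out a 0 sized mapping */

		it->addr = s->addr;
		it->length = s->len;
		return true;
	}

	it->length = 0;
	return false;
}
```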

Fix it.

Signed-off-by: Tejun Heo <>
Signed-off-by: Jens Axboe <>
12 years agoLinux 2.6.30-rc3
Linus Torvalds [Wed, 22 Apr 2009 03:07:00 +0000 (20:07 -0700)]
Linux 2.6.30-rc3

12 years agodriver synchronization: make scsi_wait_scan more advanced
Arjan van de Ven [Tue, 21 Apr 2009 20:32:54 +0000 (13:32 -0700)]
driver synchronization: make scsi_wait_scan more advanced

There is currently only one way for userspace to say "wait for my storage
device to get ready for the modules I just loaded": to load the
scsi_wait_scan module. Expectations of userspace are that once this
module is loaded, all the (storage) devices for which the drivers
were loaded before the module load are present.

Now, there are some issues with the implementation, and the async
stuff got caught in the middle of this: The existing code only
waits for the scsi async probing to finish, but it did not take
into account at all that probing might not have begun yet.
(Russell ran into this problem on his computer and the fix works for him)

This patch fixes this more thoroughly than the previous "fix", which
had some bad side effects (namely, for kernel code that wanted to wait for
the scsi scan it would also do an async sync, which would deadlock if you did
it from async context already.. there's a report about that on lkml):
The patch makes the module first wait for all device driver probes, and then it
will wait for the scsi parallel scan to finish.

Signed-off-by: Arjan van de Ven <>
Tested-by: Russell King <>
Signed-off-by: Linus Torvalds <>
12 years agoTrivial: fix a typo in slow-work.h
Jonathan Corbet [Tue, 21 Apr 2009 22:30:32 +0000 (16:30 -0600)]
Trivial: fix a typo in slow-work.h

Fix a comment typo in slow-work.h

...a trivial mistake, but it will mess up kerneldoc if nothing else.

Signed-off-by: Jonathan Corbet <>
Signed-off-by: Linus Torvalds <>
12 years agoPERCPU: Collect the DECLARE/DEFINE declarations together
David Howells [Tue, 21 Apr 2009 22:00:29 +0000 (23:00 +0100)]
PERCPU: Collect the DECLARE/DEFINE declarations together

Collect the DECLARE/DEFINE declarations together in linux/percpu-defs.h so
that they're in one place, and give them descriptive comments, particularly
the SHARED_ALIGNED variant.

It would be nice to collect these in linux/percpu.h, but that's not possible
without sorting out the severe #include recursion between the x86 arch headers
and the general headers (and possibly other arches too).

Signed-off-by: David Howells <>
Signed-off-by: Linus Torvalds <>
12 years agoFRV: Fix the section attribute on UP DECLARE_PER_CPU()
David Howells [Tue, 21 Apr 2009 22:00:24 +0000 (23:00 +0100)]
FRV: Fix the section attribute on UP DECLARE_PER_CPU()

In non-SMP mode, the variable section attribute specified by DECLARE_PER_CPU()
does not agree with that specified by DEFINE_PER_CPU().  This means that
architectures that use small data section references relative to a base
register may throw up linkage errors due to too great a displacement between
where the base register points and the per-CPU variable.

On FRV, the .h declaration says that the variable is in the .sdata section, but
the .c definition says it's actually in the .data section.  The linker throws
up the following errors:

kernel/built-in.o: In function `release_task':
kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o

To fix this, DECLARE_PER_CPU() should simply apply the same section attribute
as does DEFINE_PER_CPU().  However, this is made slightly more complex by
virtue of the fact that there are several variants on DEFINE, so these need to
be matched by variants on DECLARE.

Signed-off-by: David Howells <>
Signed-off-by: Linus Torvalds <>
12 years agoMerge git://
Linus Torvalds [Tue, 21 Apr 2009 21:12:58 +0000 (14:12 -0700)]
Merge git://git./linux/kernel/git/mason/btrfs-unstable

* git://
  Btrfs: fix btrfs fallocate oops and deadlock
  Btrfs: use the right node in reada_for_balance
  Btrfs: fix oops on page->mapping->host during writepage
  Btrfs: add a priority queue to the async thread helpers
  Btrfs: use WRITE_SYNC for synchronous writes

12 years agoMerge branch 'i2c-for-linus' of git://
Linus Torvalds [Tue, 21 Apr 2009 21:12:43 +0000 (14:12 -0700)]
Merge branch 'i2c-for-linus' of git://

* 'i2c-for-linus' of git://
  go7007: Convert to the new i2c device binding model

12 years agobfin_5xx: misplaced parentheses
Roel Kluin [Tue, 21 Apr 2009 19:24:58 +0000 (12:24 -0700)]
bfin_5xx: misplaced parentheses

`!' has a higher precedence than `&', parentheses are misplaced.
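
A standalone illustration of the precedence problem.  STATUS_READY and the
two function names are hypothetical and are not taken from the bfin_5xx
driver.

```c
#include <stdbool.h>

#define STATUS_READY 0x20u	/* hypothetical status bit */

/*
 * What "!status & STATUS_READY" really means: the negation is applied
 * first and yields 0 or 1, so masking with 0x20 always gives 0.
 */
bool not_ready_misplaced(unsigned int status)
{
	return (!status) & STATUS_READY;
}

/* Intended test: is the READY bit clear? */
bool not_ready_intended(unsigned int status)
{
	return !(status & STATUS_READY);
}
```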

Signed-off-by: Roel Kluin <>
Cc: Alan Cox <>
Acked-by: Sonic Zhang <>
Cc: Bryan Wu <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agovmscan,memcg: reintroduce sc->may_swap
KOSAKI Motohiro [Tue, 21 Apr 2009 19:24:57 +0000 (12:24 -0700)]
vmscan,memcg: reintroduce sc->may_swap

Commit a6dc60f8975ad96d162915e07703a4439c80dcf0 ("vmscan: rename
sc.may_swap to may_unmap") removed the may_swap flag, but memcg had used
it as a flag for "we need to use swap?", as the name indicates.

And in the current implementation, memcg cannot reclaim mapped file
caches when mem+swap hits the limit.

Re-introduce may_swap flag and handle it at get_scan_ratio().  This
patch doesn't influence any scan_control users other than memcg.

Signed-off-by: KOSAKI Motohiro <>
Signed-off-by: Daisuke Nishimura <>
Acked-by: Johannes Weiner <>
Cc: Balbir Singh <>
Cc: KAMEZAWA Hiroyuki <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agoedac: ppc mpc85xx fix mc err detect
Dave Jiang [Tue, 21 Apr 2009 19:24:56 +0000 (12:24 -0700)]
edac: ppc mpc85xx fix mc err detect

Error found by Jeff Haran.

The error detect register is 0s when no errors are detected.  The check
code is incorrect, so reverse check sense.

Reported-by: Jeff Haran <jharan@Brocade.COM>
Signed-off-by: Dave Jiang <>
Signed-off-by: Doug Thompson <>
Cc: Benjamin Herrenschmidt <>
Acked-by: Kumar Gala <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agoscsi: mpt: suppress debugobjects warning
Eric Paris [Tue, 21 Apr 2009 19:24:54 +0000 (12:24 -0700)]
scsi: mpt: suppress debugobjects warning


ODEBUG: object is on stack, but not annotated
------------[ cut here ]------------
WARNING: at lib/debugobjects.c:253 __debug_object_init+0x1f3/0x276()
Hardware name: VMware Virtual Platform
Modules linked in: mptspi(+) mptscsih mptbase scsi_transport_spi ext3 jbd mbcache
Pid: 540, comm: insmod Not tainted 2.6.28-mm1 #2
Call Trace:
 [<c042c51c>] warn_slowpath+0x74/0x8a
 [<c0469600>] ? start_critical_timing+0x96/0xb7
 [<c060c8ea>] ? _spin_unlock_irqrestore+0x2f/0x3c
 [<c0446fad>] ? trace_hardirqs_off_caller+0x18/0xaf
 [<c044704f>] ? trace_hardirqs_off+0xb/0xd
 [<c060c8ea>] ? _spin_unlock_irqrestore+0x2f/0x3c
 [<c042cb84>] ? release_console_sem+0x1a5/0x1ad
 [<c05013e6>] __debug_object_init+0x1f3/0x276
 [<c0501494>] debug_object_init+0x13/0x17
 [<c0433c56>] init_timer+0x10/0x1a
 [<e08e5b54>] mpt_config+0x1c1/0x2b7 [mptbase]
 [<e08e3b82>] ? kmalloc+0x8/0xa [mptbase]
 [<e08e3b82>] ? kmalloc+0x8/0xa [mptbase]
 [<e08e6fa2>] mpt_do_ioc_recovery+0x950/0x1212 [mptbase]
 [<c04496c2>] ? __lock_acquire+0xa69/0xacc
 [<c060c8f1>] ? _spin_unlock_irqrestore+0x36/0x3c
 [<c060c3af>] ? _spin_unlock_irq+0x22/0x26
 [<c04f2d8b>] ? string+0x2b/0x76
 [<c04f310e>] ? vsnprintf+0x338/0x7b3
 [<c04496c2>] ? __lock_acquire+0xa69/0xacc
 [<c060c8ea>] ? _spin_unlock_irqrestore+0x2f/0x3c
 [<c04496c2>] ? __lock_acquire+0xa69/0xacc
 [<c044897d>] ? debug_check_no_locks_freed+0xeb/0x105
 [<c060c8f1>] ? _spin_unlock_irqrestore+0x36/0x3c
 [<c04488bc>] ? debug_check_no_locks_freed+0x2a/0x105
 [<c0446b8c>] ? lock_release_holdtime+0x43/0x48
 [<c043f742>] ? up_read+0x16/0x29
 [<c05076f8>] ? pci_get_slot+0x66/0x72
 [<e08e89ca>] mpt_attach+0x881/0x9b1 [mptbase]
 [<e091c8e5>] mptspi_probe+0x11/0x354 [mptspi]

Noticing that every caller of mpt_config has its CONFIGPARMS struct
declared on the stack and thus the &pCfg->timer is always on the stack I
changed init_timer() to init_timer_on_stack() and it seems to have shut

Cc: "Moore, Eric Dean" <>
Cc: James Bottomley <>
Cc: Thomas Gleixner <>
Acked-by: "Desai, Kashyap" <>
Cc: <> [2.6.29.x]
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agosgi-xp/sgi-gru: allow modules to load on non-uv systems
Robin Holt [Tue, 21 Apr 2009 19:24:53 +0000 (12:24 -0700)]
sgi-xp/sgi-gru: allow modules to load on non-uv systems

For an upcoming distro release, we need to have the xp kernel module
loadable even when not on UV equipment.  The xpc module will not load.
This will allow one set of modules dependent upon xp to work on either UV
or non-UV equipment.

Signed-off-by: Robin Holt <>
Signed-off-by: Jack Steiner <>
Cc: Ingo Molnar <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agouml: kill a kconfig warning
WANG Cong [Tue, 21 Apr 2009 19:24:52 +0000 (12:24 -0700)]
uml: kill a kconfig warning

Got this warning from Kconfig:

   boolean symbol INPUT tested for 'm'? test forced to 'n'

because INPUT is tristate, not bool.

Signed-off-by: WANG Cong <>
Cc: Sam Ravnborg <>
Cc: Jeff Dike <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agofrv: insert PCI root bus resources for the MB93090 devel motherboard
David Howells [Tue, 21 Apr 2009 19:24:51 +0000 (12:24 -0700)]
frv: insert PCI root bus resources for the MB93090 devel motherboard

Insert PCI root bus resources for the FRV-based MB93090 development kit
motherboard.  This is required because the CPU's window onto the PCI bus
address space is considerably smaller than the CPU's full address space
and non-PCI devices lie outside of the PCI window that we might want to

Without this patch, the PCI root bus uses the platform-level bus
resources, and these are then confined to the PCI window, thus making
platform_device_add() reject devices outside of this window.

Signed-off-by: David Howells <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agortc-cmos: fix printk output
Krzysztof Halasa [Tue, 21 Apr 2009 19:24:49 +0000 (12:24 -0700)]
rtc-cmos: fix printk output

With no IRQ available/defined, RTC-CMOS driver prints something like:
rtc0: alarms up to one no, y3k, 114 bytes nvram
I guess the following is a bit easier to understand:
rtc0: no alarms, y3k, 114 bytes nvram

Signed-off-by: Krzysztof Halasa <>
Cc: David Brownell <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agospi: documentation: emphasise spi_master.setup() semantics
David Brownell [Tue, 21 Apr 2009 19:24:49 +0000 (12:24 -0700)]
spi: documentation: emphasise spi_master.setup() semantics

This is a doc-only patch which I hope will reduce the number of
spi_master controller driver patches starting out with a common
implementation bug.

(As in: almost every spi_master driver I see starts out with its
version of this bug.  Sigh.)

It just re-emphasizes that the setup() method may be called for one
device while a transfer is active on another ...  which means that most
driver implementations shouldn't touch any registers.

Signed-off-by: David Brownell <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agoMAINTAINERS: add a more searchable string for the H8300 architecture.
Robert P. J. Day [Tue, 21 Apr 2009 19:24:47 +0000 (12:24 -0700)]
MAINTAINERS: add a more searchable string for the H8300 architecture.

Add a parenthesized string of "H8300" for more convenient searchability
in the MAINTAINERS file.

Signed-off-by: Robert P. J. Day <>
Cc: Yoshinori Sato <>
Cc: Joe Perches <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agoMAINTAINERS: add Matt Mackall to embedded maintainers
Matt Mackall [Tue, 21 Apr 2009 19:24:47 +0000 (12:24 -0700)]
MAINTAINERS: add Matt Mackall to embedded maintainers

Impact: make more work for myself

Signed-off-by: Matt Mackall <>
Cc: David Woodhouse <>
Acked-by: Paul Gortmaker <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agospi: pxa2xx: limit reaches -1
Roel Kluin [Tue, 21 Apr 2009 19:24:46 +0000 (12:24 -0700)]
spi: pxa2xx: limit reaches -1

On line 944 the return value of flush() is considered as a boolean,
but limit reaches -1 upon timeout which evaluates to true.

On 540, 594, 720 the same occurs for wait_ssp_rx_stall()
On 536 the same occurs for wait_dma_channel_stop()
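
The pattern can be reproduced outside the driver.  The sketch below uses
made-up polling callbacks and function names; it only assumes that the
waiting loops have the common "while (busy && limit--)" shape and that the
caller treats the return value as a boolean.

```c
#include <stdbool.h>

/* Hypothetical hardware polls used to exercise the loops. */
bool always_busy(void) { return true; }
bool never_busy(void) { return false; }

/* Post-decrement: on timeout the loop exits with limit == -1. */
int wait_postdec(bool (*busy)(void), int limit)
{
	while (busy() && limit--)
		;
	return limit;	/* -1 on timeout, which evaluates to true */
}

/* Pre-decrement: on timeout the loop exits with limit == 0. */
int wait_predec(bool (*busy)(void), int limit)
{
	while (busy() && --limit)
		;
	return limit;	/* 0 on timeout, nonzero on success */
}
```

With the pre-decrement form, "if (!wait_predec(...))" reliably detects the
timeout, whereas the post-decrement form reports success in that case.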

Signed-off-by: Roel Kluin <>
Acked-by: Eric Miao <>
Cc: David Brownell <>
Cc: Russell King <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agoMAINTAINERS: update KMEMTRACE pattern after file rename
Joe Perches [Tue, 21 Apr 2009 19:24:45 +0000 (12:24 -0700)]
MAINTAINERS: update KMEMTRACE pattern after file rename

Signed-off-by: Joe Perches <>
Acked-by: Pekka Enberg <>
Acked-by: Eduard - Gabriel Munteanu <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agoMAINTAINERS: remove include/asm-*/suspend* file patterns
Joe Perches [Tue, 21 Apr 2009 19:24:44 +0000 (12:24 -0700)]
MAINTAINERS: remove include/asm-*/suspend* file patterns

There are no more arches with suspend support using these directories.

Signed-off-by: Joe Perches <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agopxa2xx_spi: restore DRCMR on resume
Daniel Ribeiro [Tue, 21 Apr 2009 19:24:43 +0000 (12:24 -0700)]
pxa2xx_spi: restore DRCMR on resume

If DMA is enabled, any spi_sync call after suspend/resume would block
forever, because DRCMR is lost on suspend.  This patch restores DRCMR to
the same values set by probe.

Signed-off-by: Daniel Ribeiro <>
Signed-off-by: David Brownell <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agodrivers/input/serio/hp_sdc.c: fix crash when removing hp_sdc module
Helge Deller [Tue, 21 Apr 2009 19:24:42 +0000 (12:24 -0700)]
drivers/input/serio/hp_sdc.c: fix crash when removing hp_sdc module

On parisc machines, which don't have HIL, removing the hp_sdc module
panics the kernel.  Fix this by returning early in hp_sdc_exit() if no HP
SDC controller was found.

Add functionality to probe for the hp_sdc_mlc kernel module (which takes
care of the upper layer HIL functionality on parisc) after two seconds.
This is needed to get all the other HIL drivers (keyboard, mouse, ...)
automatically loaded by udev later as well.

Signed-off-by: Helge Deller <>
Cc: Geert Uytterhoeven <>
Cc: Frans Pop <>
Cc: Kyle McMartin <>
Cc: Grant Grundler <>
Acked-by: Dmitry Torokhov <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agomemcg: use rcu_dereference to access mm->owner
KAMEZAWA Hiroyuki [Tue, 21 Apr 2009 19:24:41 +0000 (12:24 -0700)]
memcg: use rcu_dereference to access mm->owner

mm->owner should be accessed with rcu_dereference().

Reported-by: KOSAKI Motohiro <>
Signed-off-by: KAMEZAWA Hiroyuki <>
Acked-by: Balbir Singh <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agohugetlbfs: return negative error code for bad mount option
Akinobu Mita [Tue, 21 Apr 2009 19:24:05 +0000 (12:24 -0700)]
hugetlbfs: return negative error code for bad mount option

This fixes the following BUG:

  # mount -o size=MM -t hugetlbfs none /huge
  hugetlbfs: Bad value 'MM' for mount option 'size=MM'
  ------------[ cut here ]------------
  kernel BUG at fs/super.c:996!

Due to


in vfs_kern_mount().

Also, remove unused #include <linux/quotaops.h>

Cc: William Irwin <>
Cc: <>
Signed-off-by: Akinobu Mita <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agoipmi: add oem message handling
dann frazier [Tue, 21 Apr 2009 19:24:05 +0000 (12:24 -0700)]
ipmi: add oem message handling

Enable userspace to receive messages that a BMC transmits using an OEM
medium.  This is used by the HP iLO2.

Based on code originally written by Patrick Schoeller.

Signed-off-by: dann frazier <>
Signed-off-by: Corey Minyard <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agoipmi: fix statistics counting issues
Corey Minyard [Tue, 21 Apr 2009 19:24:04 +0000 (12:24 -0700)]
ipmi: fix statistics counting issues

Bela Lubkin noticed that the statistics for send IPMB and LAN commands
in the IPMI driver could be incremented even if an error occurred.  Move
the increments to the proper place to avoid this.

Also add some statistics for retransmissions that failed, and some little
helper functions to neaten up the code a little.

Signed-off-by: Corey Minyard <>
Cc: Bela Lubkin <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agoipmi: test for event buffer before using
Corey Minyard [Tue, 21 Apr 2009 19:24:03 +0000 (12:24 -0700)]
ipmi: test for event buffer before using

The IPMI driver would attempt to use the event buffer even if it
did not exist on the BMC.  This patch modifies the IPMI driver to check
for the event buffer's existence before trying to use it.

Signed-off-by: Corey Minyard <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agoipmi: fix platform return check
Corey Minyard [Tue, 21 Apr 2009 19:24:02 +0000 (12:24 -0700)]
ipmi: fix platform return check

The wrong return value is being tested when allocating a platform device
in the IPMI SI code.  Check the right value.

Signed-off-by: Corey Minyard <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agoclocksource: add enable() and disable() callbacks
Magnus Damm [Tue, 21 Apr 2009 19:24:02 +0000 (12:24 -0700)]
clocksource: add enable() and disable() callbacks

Add enable() and disable() callbacks for clocksources.

This allows us to put unused clocksources in power save mode.  The
functions clocksource_enable() and clocksource_disable() wrap the
callbacks and are inserted in the timekeeping code to enable before use
and disable after switching to a new clocksource.
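
The wrapper idea can be sketched in plain C as below. The struct is a cut-down stand-in for the kernel's struct clocksource; the powered field, the demo_* callbacks and switch_clocksource() are assumptions made for illustration, not the timekeeping code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct clocksource. */
struct clocksource {
	const char *name;
	int (*enable)(struct clocksource *cs);		/* optional */
	void (*disable)(struct clocksource *cs);	/* optional */
	int powered;					/* illustration only */
};

/* A clocksource without an enable() callback counts as always on. */
static int clocksource_enable(struct clocksource *cs)
{
	return cs->enable ? cs->enable(cs) : 0;
}

static void clocksource_disable(struct clocksource *cs)
{
	if (cs->disable)
		cs->disable(cs);
}

/* Hypothetical driver callbacks that just track a power flag. */
static int demo_enable(struct clocksource *cs) { cs->powered = 1; return 0; }
static void demo_disable(struct clocksource *cs) { cs->powered = 0; }

/* Enable the new clocksource before use, disable the old one afterwards. */
static struct clocksource *switch_clocksource(struct clocksource *old,
					      struct clocksource *new_cs)
{
	if (clocksource_enable(new_cs))
		return old;	/* keep the old one if enabling fails */
	clocksource_disable(old);
	return new_cs;
}
```

Keeping the NULL checks inside the wrappers means clocksources that have no power management need no changes.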

Signed-off-by: Magnus Damm <>
Acked-by: John Stultz <>
Cc: Thomas Gleixner <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agoclocksource: pass clocksource to read() callback
Magnus Damm [Tue, 21 Apr 2009 19:24:00 +0000 (12:24 -0700)]
clocksource: pass clocksource to read() callback

Pass clocksource pointer to the read() callback for clocksources.  This
allows us to share the callback between multiple instances.
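
A minimal sketch of why the pointer helps: with the clocksource passed in, one callback can locate per-instance state. struct demo_timer, its counter field and the uint64_t return type are assumptions for the example; the local container_of() mirrors the usual kernel idiom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in: only the read() callback is modelled. */
struct clocksource {
	uint64_t (*read)(struct clocksource *cs);
};

/* Hypothetical driver state that embeds a clocksource. */
struct demo_timer {
	uint64_t counter;	/* stands in for a hardware register */
	struct clocksource cs;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* One callback serves every instance: it finds its state from cs. */
static uint64_t demo_timer_read(struct clocksource *cs)
{
	struct demo_timer *t = container_of(cs, struct demo_timer, cs);

	return t->counter;
}
```

Two demo_timer instances can both point their read member at demo_timer_read() and still return their own counter values.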

[ fix powerpc build of clocksource pass clocksource mods]
[ cleanup]
Signed-off-by: Magnus Damm <>
Acked-by: John Stultz <>
Cc: Thomas Gleixner <>
Signed-off-by: Hugh Dickins <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agopxafb: lcsr1 is unused without CONFIG_FB_PXA_OVERLAY
Denis V. Lunev [Tue, 21 Apr 2009 19:23:59 +0000 (12:23 -0700)]
pxafb: lcsr1 is unused without CONFIG_FB_PXA_OVERLAY

Fixes the warning:

  drivers/video/pxafb.c: In function 'pxafb_handle_irq':
  drivers/video/pxafb.c:1442: warning: unused variable 'lcsr1'

[ save an ifdef]
Signed-off-by: Denis V. Lunev <>
Cc: Eric Miao <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agoasiliantfb: add missing return statement
Vlada Peric [Tue, 21 Apr 2009 19:23:59 +0000 (12:23 -0700)]
asiliantfb: add missing return statement

Commit 032220ba (asiliantfb: fix cmap memory leaks) changed the function
init_asiliant from void to int, resulting in the following compile warning:

  drivers/video/asiliantfb.c: In function `init_asiliant':
  drivers/video/asiliantfb.c:536: warning: control reaches end of non-void function

Fix the warning by returning 0.

Signed-off-by: Vlada Peric <>
Cc: Andres Salomon <>
Signed-off-by: Andrew Morton <>
Signed-off-by: Linus Torvalds <>
12 years agogo7007: Convert to the new i2c device binding model
Jean Delvare [Tue, 21 Apr 2009 19:47:22 +0000 (21:47 +0200)]
go7007: Convert to the new i2c device binding model

Move the go7007 driver away from the legacy i2c binding model, which
is going away really soon now.

The I2C addresses of the audio and video chips in s2250-board didn't
look quite right, apparently they were left-aligned values when Linux
wants right-aligned values, so I fixed them too.
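
For readers unfamiliar with the two conventions, the conversion is a one-bit shift. This is a generic sketch; the helper name is made up and no actual s2250-board address is implied.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a left-aligned ("8-bit") I2C address, in which bit 0 is the
 * read/write flag, into the right-aligned 7-bit form.
 */
static uint8_t i2c_addr_to_7bit(uint8_t left_aligned)
{
	return left_aligned >> 1;
}
```

For an arbitrary left-aligned value such as 0x86, the right-aligned address is 0x43.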

Signed-off-by: Jean Delvare <>
Cc: Greg Kroah-Hartman <>
12 years agoBtrfs: fix btrfs fallocate oops and deadlock
Chris Mason [Tue, 21 Apr 2009 15:53:38 +0000 (11:53 -0400)]
Btrfs: fix btrfs fallocate oops and deadlock

Btrfs fallocate was incorrectly starting a transaction with a lock held
on the extent_io tree for the file, which could deadlock.  Strictly
speaking it was using join_transaction which would be safe, but it is better
to move the transaction outside of the lock.

When preallocated extents are overwritten, btrfs_mark_buffer_dirty was
being called on an unlocked buffer.  This was triggering an assertion and
oops because the lock is supposed to be held.

The bug was calling btrfs_mark_buffer_dirty on a leaf after btrfs_del_item had
been run.  btrfs_del_item takes care of dirtying things, so the solution is
to skip the btrfs_mark_buffer_dirty call in this case.

Signed-off-by: Chris Mason <>
12 years agoMerge git://
Linus Torvalds [Tue, 21 Apr 2009 15:27:30 +0000 (08:27 -0700)]
Merge git://git./linux/kernel/git/steve/gfs2-2.6-fixes

* git://
  GFS2: Fix page_mkwrite() return code
  GFS2: Clear dirty bit at end of inode glock sync

12 years agoMerge branch 'sh/for-2.6.30' of git://
Linus Torvalds [Tue, 21 Apr 2009 15:16:14 +0000 (08:16 -0700)]
Merge branch 'sh/for-2.6.30' of git://git./linux/kernel/git/lethal/sh-2.6

* 'sh/for-2.6.30' of git://
  sh: Fix mmap2 for handling differing PAGE_SIZEs.
  sh: sh7723: Don't default enable the RTC clock.
  sh: sh7722: Don't default enable the RTC clock.
  rtc: rtc-sh: clock framework support.

12 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Tue, 21 Apr 2009 14:56:17 +0000 (07:56 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://
  reiserfs: fix j_last_flush_trans_id type
  fs: Mark get_filesystem_list() as __init function.
  kill vfs_stat_fd / vfs_lstat_fd
  Separate out common fstatat code into vfs_fstatat
  ecryptfs: use memdup_user()
  ncpfs: use memdup_user()
  xfs: use memdup_user()
  sysfs: use memdup_user()
  btrfs: use memdup_user()
  xattr: use memdup_user()
  autofs4: use memchr() in invalid_string()
  Documentation/filesystems: remove out of date reference to BKL being held
  Fix i_mutex vs. readdir handling in nfsd
  fs/compat_ioctl: fix build when !BLOCK
  Fix autofs_expire()
  No need for crossing to mountpoint in audit_tag_tree()
  Safer nfsd_cross_mnt()
  Touch all affected namespaces on propagation of mount

12 years agoFix SYSCALL_ALIAS for older MIPS assembler
Thomas Bogendoerfer [Tue, 21 Apr 2009 11:44:13 +0000 (13:44 +0200)]
Fix SYSCALL_ALIAS for older MIPS assembler

Older MIPS assemblers don't support .set for defining aliases.
Using = works for old and new assemblers.

Signed-off-by: Thomas Bogendoerfer <>
Acked-by: Ralf Baechle <>
Signed-off-by: Linus Torvalds <>
12 years agoNFS: Fix the XDR iovec calculation in nfs3_xdr_setaclargs
Trond Myklebust [Mon, 20 Apr 2009 18:58:35 +0000 (14:58 -0400)]
NFS: Fix the XDR iovec calculation in nfs3_xdr_setaclargs

Commit ae46141ff08f1965b17c531b571953c39ce8b9e2 (NFSv3: Fix posix ACL code)
introduces a bug in the calculation of the XDR header iovec. In the case
where we are inlining the acls, we need to adjust the length of the iovec
req->rq_svec, in addition to adjusting the total buffer length.

Tested-by: Leonardo Chiquitto <>
Tested-by: Suresh Jayaraman <>
Signed-off-by: Trond Myklebust <>
Signed-off-by: Linus Torvalds <>
12 years agoMerge branch 'sh/stable-updates' into sh/for-2.6.30
Paul Mundt [Tue, 21 Apr 2009 08:12:16 +0000 (17:12 +0900)]
Merge branch 'sh/stable-updates' into sh/for-2.6.30

12 years agoreiserfs: fix j_last_flush_trans_id type
Al Viro [Tue, 21 Apr 2009 03:29:41 +0000 (23:29 -0400)]
reiserfs: fix j_last_flush_trans_id type

Conversion in commit 600ed41675d8c384519d8f0b3c76afed39ef2f4b had missed
that one, but converted format from %lu to %u.  As the result,
/proc/..../journal got buggered on 64bit boxen.

Signed-off-by: Al Viro <>
12 years agofs: Mark get_filesystem_list() as __init function.
Tetsuo Handa [Thu, 9 Apr 2009 11:17:52 +0000 (20:17 +0900)]
fs: Mark get_filesystem_list() as __init function.

"int get_filesystem_list(char * buf)" is called by only
"static void __init get_fs_names(char *page)".
We can mark get_filesystem_list() as "__init".

Signed-off-by: Tetsuo Handa <>
Signed-off-by: Al Viro <>
12 years agokill vfs_stat_fd / vfs_lstat_fd
Christoph Hellwig [Wed, 8 Apr 2009 20:34:03 +0000 (16:34 -0400)]
kill vfs_stat_fd / vfs_lstat_fd

There's really no reason to keep vfs_stat_fd and vfs_lstat_fd with
Oleg's vfs_fstatat.  Use vfs_fstatat for the few cases having the
directory fd, and switch all others to vfs_stat / vfs_lstat.

Reviewed-by: Christoph Hellwig <>
Signed-off-by: Al Viro <>
12 years agoSeparate out common fstatat code into vfs_fstatat
Oleg Drokin [Wed, 8 Apr 2009 16:05:42 +0000 (20:05 +0400)]
Separate out common fstatat code into vfs_fstatat

This is a version incorporating Christoph's suggestion.

Separate out common *fstatat functionality into a single function
instead of duplicating it all over the code.

Signed-off-by: Oleg Drokin <>
Signed-off-by: Al Viro <>
12 years agoecryptfs: use memdup_user()
Li Zefan [Wed, 8 Apr 2009 07:09:29 +0000 (15:09 +0800)]
ecryptfs: use memdup_user()

Remove open-coded memdup_user().
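
The pattern being folded away is "allocate, then copy, then clean up on failure". A userspace analogue is sketched below; memdup_sketch() is a hypothetical name, and unlike the kernel helper (which, as far as I know, copies from user space and reports failure through an error pointer) it simply returns NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate len bytes from src into a fresh allocation, or return NULL. */
static void *memdup_sketch(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}
```

Callers then replace their own malloc/copy/free-on-error sequence with a single call and a single error check.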

Signed-off-by: Li Zefan <>
Signed-off-by: Al Viro <>
12 years agoncpfs: use memdup_user()
Li Zefan [Wed, 8 Apr 2009 07:08:53 +0000 (15:08 +0800)]
ncpfs: use memdup_user()

Remove open-coded memdup_user()

Signed-off-by: Li Zefan <>
Signed-off-by: Al Viro <>
12 years agoxfs: use memdup_user()
Li Zefan [Wed, 8 Apr 2009 07:08:04 +0000 (15:08 +0800)]
xfs: use memdup_user()

Remove open-coded memdup_user()

Signed-off-by: Li Zefan <>
Signed-off-by: Al Viro <>
12 years agosysfs: use memdup_user()
Li Zefan [Wed, 8 Apr 2009 07:07:30 +0000 (15:07 +0800)]
sysfs: use memdup_user()

Remove open-coded memdup_user().

Signed-off-by: Li Zefan <>
Signed-off-by: Al Viro <>
12 years agobtrfs: use memdup_user()
Li Zefan [Wed, 8 Apr 2009 07:06:54 +0000 (15:06 +0800)]
btrfs: use memdup_user()

Remove open-coded memdup_user().

Note that this changes some GFP_NOFS to GFP_KERNEL: since copy_from_user()
may cause a page fault, it is pointless to pass GFP_NOFS to kmalloc().

Signed-off-by: Li Zefan <>
Signed-off-by: Al Viro <>
12 years agoxattr: use memdup_user()
Li Zefan [Wed, 8 Apr 2009 07:06:12 +0000 (15:06 +0800)]
xattr: use memdup_user()

Remove open-coded memdup_user()

Signed-off-by: Li Zefan <>
Signed-off-by: Al Viro <>
12 years agoautofs4: use memchr() in invalid_string()
Al Viro [Tue, 7 Apr 2009 15:12:46 +0000 (11:12 -0400)]
autofs4: use memchr() in invalid_string()

Signed-off-by: Al Viro <>
12 years agoDocumentation/filesystems: remove out of date reference to BKL being held
Adrian McMenamin [Tue, 21 Apr 2009 01:38:28 +0000 (18:38 -0700)]
Documentation/filesystems: remove out of date reference to BKL being held

Documentation/filesystems/vfs.txt incorrectly states that the kernel is
locked during the call to statfs (Documentation/filesystems/Locking
correctly says it is not). This patch removes the offending sentence.

remove reference to BKL being held in statfs

Signed-off-by: Adrian McMenamin <>
Signed-off-by: Randy Dunlap <>
Cc: Alexander Viro <>
Signed-off-by: Al Viro <>
12 years agoFix i_mutex vs. readdir handling in nfsd
David Woodhouse [Mon, 20 Apr 2009 22:18:37 +0000 (23:18 +0100)]
Fix i_mutex vs. readdir handling in nfsd

Commit 14f7dd63 ("Copy XFS readdir hack into nfsd code") introduced a
bug to generic code which had been extant for a long time in the XFS
version -- it started to call through into lookup_one_len() and hence
into the file systems' ->lookup() methods without i_mutex held on the

This patch fixes it by locking the directory's i_mutex again before
calling the filldir functions. The original deadlocks which commit
14f7dd63 was designed to avoid are still avoided, because they were due
to fs-internal locking, not i_mutex.

While we're at it, fix the return type of nfsd_buffered_readdir() which
should be a __be32 not an int -- it's an NFS errno, not a Linux errno.
And return nfserrno(-ENOMEM) when allocation fails, not just -ENOMEM.
Sparse would have caught that, if it wasn't so busy bitching about

Commit 05f4f678 ("nfsd4: don't do lookup within readdir in recovery
code") introduced a similar problem with calling lookup_one_len()
without i_mutex, which this patch also addresses. To fix that, it was
necessary to fix the called functions so that they expect i_mutex to be
held; that part was done by J. Bruce Fields.

Signed-off-by: David Woodhouse <>
Umm-I-can-live-with-that-by: Al Viro <>
Reported-by: J. R. Okajima <>
Tested-by: J. Bruce Fields <>
LKML-Reference: <8036.1237474444@jrobl>
Signed-off-by: Al Viro <>
12 years agofs/compat_ioctl: fix build when !BLOCK
Alexander Beregalov [Mon, 20 Apr 2009 08:23:02 +0000 (12:23 +0400)]
fs/compat_ioctl: fix build when !BLOCK

In file included from fs/compat_ioctl.c:61:
include/linux/loop.h:59: error: field 'lo_bio_list' has incomplete type

Signed-off-by: Alexander Beregalov <>
Signed-off-by: Al Viro <>
12 years agoFix autofs_expire()
Al Viro [Sat, 18 Apr 2009 15:19:26 +0000 (11:19 -0400)]
Fix autofs_expire()

mnt should remain the same for all iterations through the list;
as it is, if we have a busy mount, mnt follows into it and isn't
restored for the next iteration.

Signed-off-by: Al Viro <>
12 years agoNo need for crossing to mountpoint in audit_tag_tree()
Al Viro [Sat, 18 Apr 2009 07:25:41 +0000 (03:25 -0400)]
No need for crossing to mountpoint in audit_tag_tree()

is_under() will DTRT anyway.  And yes, is_subdir() behaviour
is intentional.

Signed-off-by: Al Viro <>
12 years agoSafer nfsd_cross_mnt()
Al Viro [Sat, 18 Apr 2009 06:32:31 +0000 (02:32 -0400)]
Safer nfsd_cross_mnt()

AFAICS, we have a subtle bug there: if we have crossed mountpoint
*and* it got mount --move'd away, we'll be holding only one
reference to fs containing dentry - exp->ex_path.mnt.  IOW, we
ought to dput() before exp_put().

Signed-off-by: Al Viro <>
12 years agoTouch all affected namespaces on propagation of mount
Al Viro [Tue, 7 Apr 2009 16:15:39 +0000 (12:15 -0400)]
Touch all affected namespaces on propagation of mount

We shouldn't just touch the namespace of current process

Caught-by: Trond Myklebust <>
Signed-off-by: Al Viro <>
Al Viro [Tue, 7 Apr 2009 13:03:30 +0000 (09:03 -0400)]

Missing conversion from kernel to userland dev_t; this sucker
breaks as soon as we get sufficiently many autofs mounts for
new_encode_dev(s_dev) != s_dev.
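
To see when the two encodings diverge, here is a small sketch of both layouts as I understand them: the in-kernel dev_t keeps the minor in the low 20 bits, while the userland encoding splits the minor around the major. The function names mkdev() and encode_dev() are local to the example, and the use of major 0 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)

/* Kernel-internal layout: major above bit 20, minor in the low 20 bits. */
static uint32_t mkdev(uint32_t major, uint32_t minor)
{
	return (major << MINORBITS) | minor;
}

/* Userland layout, following the bit arrangement of new_encode_dev(). */
static uint32_t encode_dev(uint32_t dev)
{
	uint32_t major = dev >> MINORBITS;
	uint32_t minor = dev & MINORMASK;

	return (minor & 0xff) | (major << 8) | ((minor & ~0xffU) << 12);
}
```

With major 0, the two values agree for minors up to 255 and differ from minor 256 onwards, which matches the "sufficiently many mounts" condition.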

Note: this is the minimal fix.

Signed-off-by: Al Viro <>
12 years agosh: Fix mmap2 for handling differing PAGE_SIZEs.
Toshinobu Sugioka [Mon, 20 Apr 2009 22:34:53 +0000 (07:34 +0900)]
sh: Fix mmap2 for handling differing PAGE_SIZEs.

mmap2 uses a fixed page shift of 12, regardless of the PAGE_SIZE setting.
Fix up the mmap2 code to add some sanity checks on the mapping, and to
update pgoff accordingly.
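
The conversion described here can be sketched as follows. PAGE_SHIFT is fixed at 16 (64 KiB pages) purely for the example, and mmap2_pgoff() is a hypothetical helper, not the sh syscall code.

```c
#include <assert.h>
#include <errno.h>

#define PAGE_SHIFT 16	/* assumption for the example: 64 KiB pages */

/*
 * mmap2 passes the offset in 4096-byte units (a shift of 12).  Convert
 * it to PAGE_SIZE units and reject offsets that are not page aligned.
 */
static long mmap2_pgoff(unsigned long off_4k, unsigned long *pgoff)
{
	if (off_4k & ((1UL << (PAGE_SHIFT - 12)) - 1))
		return -EINVAL;
	*pgoff = off_4k >> (PAGE_SHIFT - 12);
	return 0;
}
```

An offset of 16 units (64 KiB) maps to page 1, while an offset of 17 units is rejected because it does not fall on a 64 KiB boundary.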

Error handling bits based on 4280e3126f641898f0ed1a931645373d3489e2a6
("frv: fix mmap2 error handling").

Signed-off-by: Toshinobu Sugioka <>
Signed-off-by: Paul Mundt <>
12 years agoBtrfs: use the right node in reada_for_balance
Chris Mason [Mon, 20 Apr 2009 19:50:10 +0000 (15:50 -0400)]
Btrfs: use the right node in reada_for_balance

reada_for_balance was using the wrong index into the path node array,
so it wasn't reading the right blocks.  We never directly used the
results of the read done by this function because the btree search is
started over at the end.

This fixes reada_for_balance to reada in the correct node and to
avoid searching past the last slot in the node.  It also makes sure to
hold the parent lock while we are finding the nodes to read.

Signed-off-by: Chris Mason <>
12 years agoBtrfs: fix oops on page->mapping->host during writepage
Chris Mason [Mon, 20 Apr 2009 19:50:09 +0000 (15:50 -0400)]
Btrfs: fix oops on page->mapping->host during writepage

The extent_io writepage call updates the writepage index in the inode
as it makes progress.  But, it was doing the update after unlocking the page,
which isn't legal because page->mapping can't be trusted once the page
is unlocked.

This led to an oops, especially common with compression turned on.  The
fix here is to update the writeback index before unlocking the page.

Signed-off-by: Chris Mason <>
12 years agoBtrfs: add a priority queue to the async thread helpers
Chris Mason [Mon, 20 Apr 2009 19:50:09 +0000 (15:50 -0400)]
Btrfs: add a priority queue to the async thread helpers

Btrfs is using WRITE_SYNC_PLUG to send down synchronous IOs with a
higher priority.  But, the checksumming helper threads prevent it
from being fully effective.

There are two problems.  First, a big queue of pending checksumming
will delay the synchronous IO behind other lower priority writes.  Second,
the checksumming uses an ordered async work queue.  The ordering makes sure
that IOs are sent to the block layer in the same order they are sent
to the checksumming threads.  Usually this gives us less seeky IO.

But, when we start mixing IO priorities, the lower priority IO can delay
the higher priority IO.

This patch solves both problems by adding a high priority list to the async
helper threads, and a new btrfs_set_work_high_prio(), which is used
to put a new async work item onto the higher priority list.

The ordering is still done on high priority IO, but all of the high
priority bios are ordered separately from the low priority bios.  This
ordering is purely an IO optimization, it is not involved in data
or metadata integrity.
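
The two-list scheme can be sketched in a few lines of userspace C. Everything here (struct names, the fixed-size FIFOs, set_work_high_prio()) is a simplified assumption and not the btrfs async-thread code; it only shows that high priority items are drained first while each list keeps its own order.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_MAX 16

struct work {
	int id;
	int high_prio;
};

struct fifo {
	struct work *items[QUEUE_MAX];
	size_t head, tail;
};

struct worker {
	struct fifo prio_pending;	/* high priority list */
	struct fifo pending;		/* normal list */
};

static void set_work_high_prio(struct work *w)
{
	w->high_prio = 1;
}

/* Append to the list matching the item's priority; -1 if that list is full. */
static int queue_work(struct worker *wk, struct work *w)
{
	struct fifo *q = w->high_prio ? &wk->prio_pending : &wk->pending;

	if (q->tail == QUEUE_MAX)
		return -1;
	q->items[q->tail++] = w;
	return 0;
}

/* High priority items always go first; order within a list is preserved. */
static struct work *next_work(struct worker *wk)
{
	if (wk->prio_pending.head < wk->prio_pending.tail)
		return wk->prio_pending.items[wk->prio_pending.head++];
	if (wk->pending.head < wk->pending.tail)
		return wk->pending.items[wk->pending.head++];
	return NULL;
}
```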

Signed-off-by: Chris Mason <>
12 years agoBtrfs: use WRITE_SYNC for synchronous writes
Chris Mason [Mon, 20 Apr 2009 19:50:09 +0000 (15:50 -0400)]
Btrfs: use WRITE_SYNC for synchronous writes

Part of reducing fsync/O_SYNC/O_DIRECT latencies is using WRITE_SYNC for
writes we plan on waiting on in the near future.  This patch
mirrors recent changes in other filesystems and the generic code to
use WRITE_SYNC when WB_SYNC_ALL is passed and to use WRITE_SYNC for
other latency critical writes.

Btrfs uses async worker threads for checksumming before the write is done,
and then again to actually submit the bios.  The bio submission code just
runs a per-device list of bios that need to be sent down the pipe.

This list is split into low priority and high priority lists so the
WRITE_SYNC IO happens first.

Signed-off-by: Chris Mason <>
12 years agoMerge branch 'release' of git://
Linus Torvalds [Mon, 20 Apr 2009 19:34:36 +0000 (12:34 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/aegl/linux-2.6

* 'release' of git://
  [IA64] fix allmodconfig compilation breakage.
  [IA64] smp_flush_tlb_mm() should only send IPI's to cpus in cpu_vm_mask
  [IA64] export smp_send_reschedule

12 years ago[IA64] fix allmodconfig compilation breakage.
Isaku Yamahata [Sat, 18 Apr 2009 03:15:23 +0000 (12:15 +0900)]
[IA64] fix allmodconfig compilation breakage.

This patch fixes the following compilation error, which is caused by
recursive inclusion of kernel.h, the header that defines BUILD_BUG_ON().
The condition this BUILD_BUG_ON() catches is also caught in the
CONFIG_PARAVIRT=n case, so removing it does not weaken the compile-time
check much.  Fix the breakage by removing it.

  CC      arch/ia64/kernel/asm-offsets.s
In file included from include/linux/bitops.h:17,
                 from include/linux/kernel.h:15,
                 from include/linux/sched.h:52,
                 from arch/ia64/kernel/asm-offsets.c:9:
arch/ia64/include/asm/bitops.h: In function 'set_bit':
arch/ia64/include/asm/bitops.h:47: error: implicit declaration of function 'BUILD_BUG_ON'

Signed-off-by: Isaku Yamahata <>
Signed-off-by: Tony Luck <>
12 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Mon, 20 Apr 2009 15:43:06 +0000 (08:43 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/rafael/suspend-2.6

* 'for-linus' of git://
  PM/Suspend: Introduce two new platform callbacks to avoid breakage

12 years agoMerge branch 'drm-linus' of git://
Linus Torvalds [Mon, 20 Apr 2009 15:42:48 +0000 (08:42 -0700)]
Merge branch 'drm-linus' of git://git./linux/kernel/git/airlied/drm-2.6

* 'drm-linus' of git://
  agp: zero pages before sending to userspace
  drm: check for minor master before allowing drop master.
  drm: set/clear is_master when master changed
  drm: clean dirty memory after device release
  drm: count reaches -1

12 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Mon, 20 Apr 2009 15:37:37 +0000 (08:37 -0700)]
Merge branch 'for-linus' of git://

* 'for-linus' of git://
  md: support bitmaps on RAID10 arrays larger than 2 terabytes
  md: update sync_completed and reshape_position even more often.
  md: improve usefulness and accuracy of sysfs file md/sync_completed.
  md: allow setting newly added device to 'in_sync' via sysfs.
  md: tiny md.h cleanups

12 years agoFS-Cache: Add MAINTAINERS record for FS-Cache and CacheFiles
David Howells [Mon, 20 Apr 2009 14:46:45 +0000 (15:46 +0100)]
FS-Cache: Add MAINTAINERS record for FS-Cache and CacheFiles

Add MAINTAINERS record for FS-Cache and CacheFiles.

Signed-off-by: David Howells <>
Signed-off-by: Linus Torvalds <>
12 years agoFRV: Don't attempt to #include <linux/blk.h> as it doesn't exist
David Howells [Mon, 20 Apr 2009 11:46:24 +0000 (12:46 +0100)]
FRV: Don't attempt to #include <linux/blk.h> as it doesn't exist

Stop the FRV arch from attempting to #include <linux/blk.h> as it doesn't
exist.
Reported-by: Robert P. J. Day <>
Signed-off-by: David Howells <>
Signed-off-by: Linus Torvalds <>
12 years agodriver: dont update dev_name via device_add path
Kay Sievers [Sat, 18 Apr 2009 22:05:45 +0000 (15:05 -0700)]
driver: dont update dev_name via device_add path

On one system, some /proc/iomem entries were missing the name for their
pci_devices.

It turns out that the dev->dev.kobj name is changed after device_add.

for pci code: via acpi_pci_root_driver.ops.add (aka acpi_pci_root_add)
==> pci_acpi_scan_root is used to scan pci bus/device, and at the same
time we read the resource for pci_dev in the pci_read_bases, we have
res->name = pci_name(pci_dev); pci_name is calling dev_name.

later via acpi_pci_root_driver.ops.start (aka acpi_pci_root_start) ==>
pci_bus_add_device to add all pci_dev in kobj tree.  pci_bus_add_device
will call device_add.

actually in device_add

        /* first, register with generic layer. */
        error = kobject_add(&dev->kobj, dev->kobj.parent, "%s", dev_name(dev));
        if (error)
                goto Error;

gives that kobj a new name and frees the old one, so a res->name that
still points at the old name is left dangling.

[Impact: fix corrupted names in /proc/iomem ]

Signed-off-by: Yinghai Lu <>
Signed-off-by: Linus Torvalds <>
12 years agoGFS2: Fix page_mkwrite() return code
Steven Whitehouse [Mon, 20 Apr 2009 08:45:54 +0000 (09:45 +0100)]
GFS2: Fix page_mkwrite() return code

This allows for the possibility of returning VM_FAULT_OOM as
well as VM_FAULT_SIGBUS. This ensures that the correct action
is taken.
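
One way to picture the change is a small mapping from an errno to a fault code. The constants below are illustrative placeholders (the kernel defines its own VM_FAULT_* values) and errno_to_vm_fault() is a hypothetical helper, not the GFS2 function.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative values only; the kernel defines its own VM_FAULT_* codes. */
#define DEMO_VM_FAULT_OOM    0x0001
#define DEMO_VM_FAULT_SIGBUS 0x0002

static int errno_to_vm_fault(int err)
{
	if (err == 0)
		return 0;
	if (err == -ENOMEM)
		return DEMO_VM_FAULT_OOM;	/* report memory exhaustion */
	return DEMO_VM_FAULT_SIGBUS;		/* any other failure */
}
```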

Signed-off-by: Steven Whitehouse <>
12 years agoGFS2: Clear dirty bit at end of inode glock sync
Steven Whitehouse [Mon, 20 Apr 2009 07:58:45 +0000 (08:58 +0100)]
GFS2: Clear dirty bit at end of inode glock sync

The dirty bit can get set during the inode glock sync. It's too
complicated to change that at the moment, so this is the quick
fix - to clear the bit again at the end of the function.

Signed-off-by: Steven Whitehouse <>
12 years agomd: support bitmaps on RAID10 arrays larger than 2 terabytes
NeilBrown [Mon, 20 Apr 2009 01:50:24 +0000 (11:50 +1000)]
md: support bitmaps on RAID10 arrays larger than 2 terabytes

.. and other arrays with components larger than 2 terabytes.

We use a "long" rather than a "sector_t" in part of the bitmap
size calculations, which is sad.
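
The effect of a too-narrow type can be shown with a small sketch. The exact expression in the md bitmap code is not reproduced here; chunks_64(), chunks_32() and the chunk_shift parameter are assumptions, and uint32_t is used to simulate a 32-bit "long" with well-defined truncation.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bitmap chunks, computed in a 64-bit type (like sector_t). */
static uint64_t chunks_64(uint64_t sectors, unsigned int chunk_shift)
{
	return (sectors + (1ULL << chunk_shift) - 1) >> chunk_shift;
}

/* Same calculation with the sector count squeezed through 32 bits. */
static uint64_t chunks_32(uint64_t sectors, unsigned int chunk_shift)
{
	uint32_t truncated = (uint32_t)sectors;	/* high bits are lost */

	return ((uint64_t)truncated + (1ULL << chunk_shift) - 1) >> chunk_shift;
}
```

For a 2 TiB component (2^32 sectors of 512 bytes) and 128-sector chunks, the 64-bit version yields 33554432 chunks while the truncated version yields 0; below 2^32 sectors the two agree.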

Reported-by: "Mario 'BitKoenig' Holbe" <Mario.Holbe@TU-Ilmenau.DE>
Signed-off-by: NeilBrown <>
12 years agoagp: zero pages before sending to userspace
Shaohua Li [Mon, 20 Apr 2009 00:08:35 +0000 (10:08 +1000)]
agp: zero pages before sending to userspace

AGP pages may eventually be mapped into userspace, so they should be
zeroed before userspace can use them. Otherwise there is a potential
information leak.

Signed-off-by: Shaohua Li <>
Signed-off-by: Dave Airlie <>
12 years agodrm: check for minor master before allowing drop master.
Dave Airlie [Sun, 19 Apr 2009 23:32:50 +0000 (09:32 +1000)]
drm: check for minor master before allowing drop master.

With heavy use of fast user switching we eventually hit a case
where this function was checking for the wrong thing.

Signed-off-by: Dave Airlie <>
12 years agodrm: set/clear is_master when master changed
Jonas Bonn [Thu, 16 Apr 2009 07:00:02 +0000 (09:00 +0200)]
drm: set/clear is_master when master changed

The variable is_master is being used to track the drm_file that is currently
master, so its value needs to be updated accordingly when the master is
changed.
Signed-off-by: Jonas Bonn <>
Signed-off-by: Dave Airlie <>
12 years agodrm: clean dirty memory after device release
Ma Ling [Thu, 16 Apr 2009 09:51:25 +0000 (17:51 +0800)]
drm: clean dirty memory after device release

In the current code we register and unregister connector objects with the
drm_sysfs_connector_add/remove functions.

However, in some cases we need to register and unregister a device
dynamically multiple times, going through register -> unregister -> register.

Because the device memory is left dirty after device_unregister(), it must
be cleaned before the device can be registered again; otherwise the system
will crash.  This patch cleans the device after device release.

Signed-off-by: Ma Ling <>
Signed-off-by: Dave Airlie <>
12 years agodrm: count reaches -1
Roel Kluin [Thu, 16 Apr 2009 20:57:46 +0000 (22:57 +0200)]
drm: count reaches -1

With a postfix decrement in the test, count will reach -1 rather than 0,
so subsequent tests fail.
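
The off-by-one is easy to reproduce outside the driver. The two helpers below are hypothetical and their loops never "succeed", so they simply return the final counter value; how the actual patch resolves the mismatch is not shown here.

```c
#include <assert.h>

/* Postfix: the test that ends the loop still decrements, leaving -1. */
static int poll_postfix(int count)
{
	while (count--)
		;	/* a real loop would poll hardware and break on success */
	return count;
}

/* Prefix: the loop ends when the counter itself reaches 0. */
static int poll_prefix(int count)
{
	while (--count)
		;
	return count;
}
```

A later check such as "if (count == 0)" therefore detects a timeout only with the prefix form; with the postfix form the check would have to compare against -1.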

Signed-off-by: Roel Kluin <>
Signed-off-by: Dave Airlie <>
12 years agoPM/Suspend: Introduce two new platform callbacks to avoid breakage
Rafael J. Wysocki [Sun, 19 Apr 2009 18:08:42 +0000 (20:08 +0200)]
PM/Suspend: Introduce two new platform callbacks to avoid breakage

Commit 900af0d973856d6feb6fc088c2d0d3fde57707d3 (PM: Change suspend
code ordering) changed the ordering of suspend code in such a way
that the platform .prepare() callback is now executed after the
device drivers' late suspend callbacks have run.  Unfortunately, this
turns out to break ARM platforms that need to talk via I2C to power
control devices during the .prepare() callback.

For this reason introduce two new platform suspend callbacks,
.prepare_late() and .wake(), that will be called just prior to
disabling non-boot CPUs and right after bringing them back on line,
respectively, and use them instead of .prepare() and .finish() for
ACPI suspend.  Make the PM core execute the .prepare() and .finish()
platform suspend callbacks where they were executed previously (that
is, right after calling the regular suspend methods provided by
device drivers and right before executing their regular resume
methods, respectively).

It is not necessary to make analogous changes to the hibernation
code and data structures at the moment, because they are only used
by ACPI platforms.
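
The resulting call order can be sketched as a trace. This is an assumption-laden model: the step names, the void callback signatures and the exact position of the drivers' late/early callbacks are simplifications for illustration, not the PM core code.

```c
#include <assert.h>
#include <string.h>

static char trace[256];

/* Append a step name to the trace, separated by " > ". */
static void step(const char *name)
{
	if (trace[0])
		strncat(trace, " > ", sizeof(trace) - strlen(trace) - 1);
	strncat(trace, name, sizeof(trace) - strlen(trace) - 1);
}

/* Simplified: the real callbacks have richer signatures. */
struct platform_suspend_ops {
	void (*prepare)(void);
	void (*prepare_late)(void);
	void (*wake)(void);
	void (*finish)(void);
};

static void call(void (*cb)(void))
{
	if (cb)
		cb();
}

static void suspend_cycle(const struct platform_suspend_ops *ops)
{
	trace[0] = '\0';
	step("dev_suspend");
	call(ops->prepare);		/* right after regular suspend methods */
	step("dev_suspend_late");
	call(ops->prepare_late);	/* just before disabling non-boot CPUs */
	step("cpus_off");
	step("sleep");
	step("cpus_on");
	call(ops->wake);		/* right after CPUs are back online */
	step("dev_resume_early");
	call(ops->finish);		/* right before regular resume methods */
	step("dev_resume");
}

/* An ACPI-like platform that uses only the two new callbacks. */
static void demo_prepare_late(void) { step("prepare_late"); }
static void demo_wake(void) { step("wake"); }

static const struct platform_suspend_ops acpi_like_ops = {
	.prepare_late = demo_prepare_late,
	.wake = demo_wake,
};
```

A platform that needs I2C during preparation would instead fill in prepare and finish, which run while device drivers are still only in their regular suspended state.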

Signed-off-by: Rafael J. Wysocki <>
Reported-by: Russell King <>
Acked-by: Len Brown <>
12 years agoMerge git://
Linus Torvalds [Sun, 19 Apr 2009 17:58:20 +0000 (10:58 -0700)]
Merge git://git./linux/kernel/git/rusty/linux-2.6-lguest-and-virtio

* git://
  lguest: document 32-bit and PAE requirements
  lguest: tell git to ignore Documentation/lguest/lguest
  virtio: fix suspend when using virtio_balloon
  lguest: fix guest crash on non-linear addresses in gdt pvops
  lguest: fix crash on vmlinux images

12 years agoMerge branch 'for-linus' of git://
Linus Torvalds [Sun, 19 Apr 2009 17:57:38 +0000 (10:57 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://
  ALSA: hda - Set function_id only on FG nodes
  ALSA: emu10k1 - off by 1 in snd_emu10k1_wait()
  ASoC: OMAP: Fix FS polarity in OSK5912 machine driver
  ASoC: OMAP: Fix DSP_B format in OMAP McBSP DAI driver
  ASoC: Fix include build error in s3c2412-i2s.c
  ASoC: Fix s3c-i2s-v2.c snd_soc_dai changes
  ASoC: s3c-i2s-v2.c fix for s3c_i2sv2_iis_calc_rate
  ASoC: Fix jive_wm8750.c build problems
  ASoC: pxa-ssp: allow setting of dai format 0
  ALSA: hda - Add upper-limit of mixer amp for AD1884A-laptop model, too
  ALSA: hda - Fix headphone-detection on some machines with STAC/IDT codecs
  ALSA: Intel8x0: Add hp_only quirk for SSID 0x1028016a (Dell Inspiron 8600)
  ALSA: Intel8x0: Remove conflicting quirk for SSID 0x103c0934
  ALSA: hda_intel.c - Consolidate bitfields

12 years agoMerge git://
Linus Torvalds [Sun, 19 Apr 2009 17:54:06 +0000 (10:54 -0700)]
Merge git://git./linux/kernel/git/sam/kbuild-fixes

* git://
  kbuild: introduce subdir-ccflags-y
  kbuild: support include/generated

12 years agoRevert "console ASCII glyph 1:1 mapping"
Samuel Thibault [Sat, 18 Apr 2009 20:17:17 +0000 (22:17 +0200)]
Revert "console ASCII glyph 1:1 mapping"

This reverts commit 1c55f18717304100a5f624c923f7cb6511b4116d.

Ingo Brueckl was assuming that reverting to 1:1 mapping for chars >= 128
was not useful, but it happens to be: due to the limitations of the
Linux console, when a blind user wants to read BIG5 on it, he has no
other way than to load a font without SFM and let the 1:1 mapping permit
the screen reader to get the BIG5 encoding.

Signed-off-by: Samuel Thibault <>
Signed-off-by: Linus Torvalds <>
12 years ago<linux/seccomp.h> needs to include <linux/errno.h>.
Ralf Baechle [Sat, 18 Apr 2009 09:30:56 +0000 (11:30 +0200)]
<linux/seccomp.h> needs to include <linux/errno.h>.

<linux/seccomp.h> uses EINVAL so should include <linux/errno.h>.  This
fixes a build error on 64-bit MIPS if CONFIG_SECCOMP is disabled.

Signed-off-by: Ralf Baechle <>
Signed-off-by: Linus Torvalds <>