11 years agoSLAB_PANIC more (proc, posix-timers, shmem)
Alexey Dobriyan [Wed, 17 Oct 2007 06:26:10 +0000]
SLAB_PANIC more (proc, posix-timers, shmem)

These aren't modular, so SLAB_PANIC is OK.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agosoftlockup: add a /proc tuning parameter
Ravikiran G Thirumalai [Wed, 17 Oct 2007 06:26:09 +0000]
softlockup: add a /proc tuning parameter

Control the trigger limit for softlockup warnings.  This is useful for
debugging softlockups, by lowering the softlockup_thresh to identify
possible softlockups earlier.

This patch:
1. Adds a sysctl softlockup_thresh with valid values of 1-60s
   (Higher value to disable false positives)
2. Changes the softlockup printk to print the cpu softlockup time

[akpm@linux-foundation.org: Fix various warnings and add definition of "two"]
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agosoftlockup watchdog: style cleanups
Ingo Molnar [Wed, 17 Oct 2007 06:26:08 +0000]
softlockup watchdog: style cleanups

kernel/softirq.c grew a few style uncleanlinesses in the past few
months, clean that up. No functional changes:

   text    data     bss     dec     hex filename
   1126      76       4    1206     4b6 softlockup.o.before
   1129      76       4    1209     4b9 softlockup.o.after

( the 3 bytes .text increase is due to the "<1>" appended to one of
  the printk messages. )

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agosoftlockup: improve debug output
Ingo Molnar [Wed, 17 Oct 2007 06:26:08 +0000]
softlockup: improve debug output

Improve the debuggability of kernel lockups by enhancing the debug
output of the softlockup detector: print the task that causes the lockup
and try to print a more intelligent backtrace.

The old format was:

  BUG: soft lockup detected on CPU#1!
   [<c0105e4a>] show_trace_log_lvl+0x19/0x2e
   [<c0105f43>] show_trace+0x12/0x14
   [<c0105f59>] dump_stack+0x14/0x16
   [<c015f6bc>] softlockup_tick+0xbe/0xd0
   [<c013457d>] run_local_timers+0x12/0x14
   [<c01346b8>] update_process_times+0x3e/0x63
   [<c0145fb8>] tick_sched_timer+0x7c/0xc0
   [<c0140a75>] hrtimer_interrupt+0x135/0x1ba
   [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80
   [<c0105aa3>] apic_timer_interrupt+0x33/0x38
   [<c0104f8a>] syscall_call+0x7/0xb
   =======================

The new format is:

  BUG: soft lockup detected on CPU#1! [prctl:2363]

  Pid: 2363, comm:                prctl
  EIP: 0060:[<c013915f>] CPU: 1
  EIP is at sys_prctl+0x24/0x18c
   EFLAGS: 00000213    Not tainted  (2.6.22-cfs-v20 #26)
  EAX: 00000001 EBX: 000003e7 ECX: 00000001 EDX: f6df0000
  ESI: 000003e7 EDI: 000003e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8
  CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 000006d0
   [<c0105e4a>] show_trace_log_lvl+0x19/0x2e
   [<c0105f43>] show_trace+0x12/0x14
   [<c01040be>] show_regs+0x1ab/0x1b3
   [<c015f807>] softlockup_tick+0xef/0x108
   [<c013457d>] run_local_timers+0x12/0x14
   [<c01346b8>] update_process_times+0x3e/0x63
   [<c0145fcc>] tick_sched_timer+0x7c/0xc0
   [<c0140a89>] hrtimer_interrupt+0x135/0x1ba
   [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80
   [<c0105aa3>] apic_timer_interrupt+0x33/0x38
   [<c0104f8a>] syscall_call+0x7/0xb
   =======================

Note that in the old format we only knew that some system call locked
up, we didnt know _which_. With the new format we know that it's at a
specific place in sys_prctl(). [which was where i created an artificial
kernel lockup to test the new format.]

This is also useful if the lockup happens in user-space - the user-space
EIP (and other registers) will be printed too. (such a lockup would
either suggest that the task was running at SCHED_FIFO:99 and looping
for more than 10 seconds, or that the softlockup detector has a
false-positive.)

The task name is printed too first, just in case we dont manage to print
a useful backtrace.

[satyam@infradead.org: fix warning]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agosoftlockup: make asm/irq_regs.h available on every platform
Ingo Molnar [Wed, 17 Oct 2007 06:26:07 +0000]
softlockup: make asm/irq_regs.h available on every platform

The softlockup detector would like to use get_irq_regs(), so generalize the
availability on every Linux architecture.

(It is fine for an architecture to always return NULL to get_irq_regs(),
which it does by default.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Ian Molton <spyro@f2s.com>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agofix the softlockup watchdog to actually work
Ingo Molnar [Wed, 17 Oct 2007 06:26:06 +0000]
fix the softlockup watchdog to actually work

this Xen related commit:

   commit 966812dc98e6a7fcdf759cbfa0efab77500a8868
   Author: Jeremy Fitzhardinge <jeremy@goop.org>
   Date:   Tue May 8 00:28:02 2007 -0700

       Ignore stolen time in the softlockup watchdog

broke the softlockup watchdog to never report any lockups. (!)

print_timestamp defaults to 0, this makes the following condition
always true:

if (print_timestamp < (touch_timestamp + 1) ||

and we'll in essence never report soft lockups.

apparently the functionality of the soft lockup watchdog was never
actually tested with that patch applied ...

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agosoftlockup: use cpu_clock() instead of sched_clock()
Ingo Molnar [Wed, 17 Oct 2007 06:26:06 +0000]
softlockup: use cpu_clock() instead of sched_clock()

sched_clock() is not a reliable time-source, use cpu_clock() instead.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agoRemove workaround for unimmunized rcu_dereference from mce_log()
Paul E. McKenney [Wed, 17 Oct 2007 06:26:05 +0000]
Remove workaround for unimmunized rcu_dereference from mce_log()

Remove the rmb() from mce_log(), since the immunized version of
rcu_dereference() makes it unnecessary.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agoImmunize rcu_dereference() against crazy compiler writers
Paul E. McKenney [Wed, 17 Oct 2007 06:26:04 +0000]
Immunize rcu_dereference() against crazy compiler writers

Turns out that compiler writers are a bit more aggressive about optimizing
than one might expect.  This patch prevents a number of such optimizations
from messing up rcu_deference().  This is not merely a theoretical problem, as
evidenced by the rmb() in mce_log().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Josh Triplett <josh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agoMake unregister_binfmt() return void
Alexey Dobriyan [Wed, 17 Oct 2007 06:26:04 +0000]
Make unregister_binfmt() return void

list_del() hardly can fail, so checking for return value is pointless
(and current code always return 0).

Nobody really cared that return value anyway.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agoUse list_head in binfmt handling
Alexey Dobriyan [Wed, 17 Oct 2007 06:26:03 +0000]
Use list_head in binfmt handling

Switch single-linked binfmt formats list to usual list_head's.  This leads
to one-liners in register_binfmt() and unregister_binfmt().  The downside
is one pointer more in struct linux_binfmt.  This is not a problem, since
the set of registered binfmts on typical box is very small -- (ELF +
something distro enabled for you).

Test-booted, played with executable .txt files, modprobe/rmmod binfmt_misc.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agofs/reiserfs/: cleanups
Adrian Bunk [Wed, 17 Oct 2007 06:26:03 +0000]
fs/reiserfs/: cleanups

- remove the following no longer used functions:
  - bitmap.c: reiserfs_claim_blocks_to_be_allocated()
  - bitmap.c: reiserfs_release_claimed_blocks()
  - bitmap.c: reiserfs_can_fit_pages()

- make the following functions static:
  - inode.c: restart_transaction()
  - journal.c: reiserfs_async_progress_wait()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Vladimir V. Saveliev <vs@namesys.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agowriteback: don't propagate AOP_WRITEPAGE_ACTIVATE
Andrew Morton [Wed, 17 Oct 2007 06:26:02 +0000]
writeback: don't propagate AOP_WRITEPAGE_ACTIVATE

This is a writeback-internal marker but we're propagating it all the way back
to userspace!.

Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agomm: document tree_lock->zone.lock lockorder
Nick Piggin [Wed, 17 Oct 2007 06:26:02 +0000]
mm: document tree_lock->zone.lock lockorder

zone->lock is quite an "inner" lock and mostly constrained to page alloc as
well, so like slab locks, it probably isn't something that is critically
important to document here.  However unlike slab locks, zone lock could be
used more widely in future, and page_alloc.c might possibly have more
business to do tricky things with pagecache than does slab.  So...  I don't
think it hurts to document it.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agomm: test and set zone reclaim lock before starting reclaim
David Rientjes [Wed, 17 Oct 2007 06:26:01 +0000]
mm: test and set zone reclaim lock before starting reclaim

Introduces new zone flag interface for testing and setting flags:

int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag)

Instead of setting and clearing ZONE_RECLAIM_LOCKED each time shrink_zone() is
called, this flag is test and set before starting zone reclaim.  Zone reclaim
starts in __alloc_pages() when a zone's watermark fails and the system is in
zone_reclaim_mode.  If it's already in reclaim, there's no need to start again
so it is simply considered full for that allocation attempt.

There is a change of behavior with regard to concurrent zone shrinking.  It is
now possible for try_to_free_pages() or kswapd to already be shrinking a
particular zone when __alloc_pages() starts zone reclaim.  In this case, it is
possible for two concurrent threads to invoke shrink_zone() for a single zone.

This change forbids a zone to be in zone reclaim twice, which was always the
behavior, but allows for concurrent try_to_free_pages() or kswapd shrinking
when starting zone reclaim.

Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agooom: convert zone_scan_lock from mutex to spinlock
David Rientjes [Wed, 17 Oct 2007 06:26:00 +0000]
oom: convert zone_scan_lock from mutex to spinlock

There's no reason to sleep in try_set_zone_oom() or clear_zonelist_oom() if
the lock can't be acquired; it will be available soon enough once the zonelist
scanning is done.  All other threads waiting for the OOM killer are also
contingent on the exiting task being able to acquire the lock in
clear_zonelist_oom() so it doesn't make sense to put it to sleep.

Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agooom: add header file to Kbuild as unifdef
David Rientjes [Wed, 17 Oct 2007 06:26:00 +0000]
oom: add header file to Kbuild as unifdef

Preprocess include/linux/oom.h before exporting it to userspace.

Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agooom: prevent including sched.h in header file
David Rientjes [Wed, 17 Oct 2007 06:25:59 +0000]
oom: prevent including sched.h in header file

It's not necessary to include all of linux/sched.h in linux/oom.h.  Instead,
simply include prototypes for the relevant structs and include linux/types.h
for gfp_t.

Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agooom: do not take callback_mutex
David Rientjes [Wed, 17 Oct 2007 06:25:58 +0000]
oom: do not take callback_mutex

Since no task descriptor's 'cpuset' field is dereferenced in the execution of
the OOM killer anymore, it is no longer necessary to take callback_mutex.

[akpm@linux-foundation.org: restore cpuset_lock for other patches]
Cc: Andrea Arcangeli <andrea@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agooom: compare cpuset mems_allowed instead of exclusive ancestors
David Rientjes [Wed, 17 Oct 2007 06:25:58 +0000]
oom: compare cpuset mems_allowed instead of exclusive ancestors

Instead of testing for overlap in the memory nodes of the the nearest
exclusive ancestor of both current and the candidate task, it is better to
simply test for intersection between the task's mems_allowed in their task
descriptors.  This does not require taking callback_mutex since it is only
used as a hint in the badness scoring.

Tasks that do not have an intersection in their mems_allowed with the current
task are not explicitly restricted from being OOM killed because it is quite
possible that the candidate task has allocated memory there before and has
since changed its mems_allowed.

Cc: Andrea Arcangeli <andrea@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agooom: suppress extraneous stack and memory dump
David Rientjes [Wed, 17 Oct 2007 06:25:57 +0000]
oom: suppress extraneous stack and memory dump

Suppresses the extraneous stack and memory dump when a parallel OOM killing
has been found.  There's no need to fill the ring buffer with this information
if its already been printed and the condition that triggered the previous OOM
killer has not yet been alleviated.

Cc: Andrea Arcangeli <andrea@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agooom: add oom_kill_allocating_task sysctl
David Rientjes [Wed, 17 Oct 2007 06:25:56 +0000]
oom: add oom_kill_allocating_task sysctl

Adds a new sysctl, 'oom_kill_allocating_task', which will automatically kill
the OOM-triggering task instead of scanning through the tasklist to find a
memory-hogging target.  This is helpful for systems with an insanely large
number of tasks where scanning the tasklist significantly degrades
performance.

Cc: Andrea Arcangeli <andrea@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agooom: serialize out of memory calls
David Rientjes [Wed, 17 Oct 2007 06:25:56 +0000]
oom: serialize out of memory calls

A final allocation attempt with a very high watermark needs to be attempted
before invoking out_of_memory().  OOM killer serialization needs to occur
before this final attempt, otherwise tasks attempting to OOM-lock all zones in
its zonelist may spin and acquire the lock unnecessarily after the OOM
condition has already been alleviated.

If the final allocation does succeed, the zonelist is simply OOM-unlocked and
__alloc_pages() returns the page.  Otherwise, the OOM killer is invoked.

If the task cannot acquire OOM-locks on all zones in its zonelist, it is put
to sleep and the allocation is retried when it gets rescheduled.  One of its
zones is already marked as being in the OOM killer so it'll hopefully be
getting some free memory soon, at least enough to satisfy a high watermark
allocation attempt.  This prevents needlessly killing a task when the OOM
condition would have already been alleviated if it had simply been given
enough time.

Cc: Andrea Arcangeli <andrea@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agooom: add per-zone locking
David Rientjes [Wed, 17 Oct 2007 06:25:55 +0000]
oom: add per-zone locking

OOM killer synchronization should be done with zone granularity so that memory
policy and cpuset allocations may have their corresponding zones locked and
allow parallel kills for other OOM conditions that may exist elsewhere in the
system.  DMA allocations can be targeted at the zone level, which would not be
possible if locking was done in nodes or globally.

Synchronization shall be done with a variation of "trylocks." The goal is to
put the current task to sleep and restart the failed allocation attempt later
if the trylock fails.  Otherwise, the OOM killer is invoked.

Each zone in the zonelist that __alloc_pages() was called with is checked for
the newly-introduced ZONE_OOM_LOCKED flag.  If any zone has this flag present,
the "trylock" to serialize the OOM killer fails and returns zero.  Otherwise,
all the zones have ZONE_OOM_LOCKED set and the try_set_zone_oom() function
returns non-zero.

Cc: Andrea Arcangeli <andrea@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agooom: change all_unreclaimable zone member to flags
David Rientjes [Wed, 17 Oct 2007 06:25:54 +0000]
oom: change all_unreclaimable zone member to flags

Convert the int all_unreclaimable member of struct zone to unsigned long
flags.  This can now be used to specify several different zone flags such as
all_unreclaimable and reclaim_in_progress, which can now be removed and
converted to a per-zone flag.

Flags are set and cleared as follows:

zone_set_flag(struct zone *zone, zone_flags_t flag)
zone_clear_flag(struct zone *zone, zone_flags_t flag)

Defines the first zone flags, ZONE_ALL_UNRECLAIMABLE and ZONE_RECLAIM_LOCKED,
which have the same semantics as the old zone->all_unreclaimable and
zone->reclaim_in_progress, respectively.  Also converts all current users that
set or clear either flag to use the new interface.

Helper functions are defined to test the flags:

int zone_is_all_unreclaimable(const struct zone *zone)
int zone_is_reclaim_locked(const struct zone *zone)

All flag operators are of the atomic variety because there are currently
readers that are implemented that do not take zone->lock.

[akpm@linux-foundation.org: add needed include]
Cc: Andrea Arcangeli <andrea@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agooom: move constraints to enum
David Rientjes [Wed, 17 Oct 2007 06:25:53 +0000]
oom: move constraints to enum

The OOM killer's CONSTRAINT definitions are really more appropriate in an
enum, so define them in include/linux/oom.h.

Cc: Andrea Arcangeli <andrea@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agooom: move prototypes to appropriate header file
David Rientjes [Wed, 17 Oct 2007 06:25:53 +0000]
oom: move prototypes to appropriate header file

Move the OOM killer's extern function prototypes to include/linux/oom.h and
include it where necessary.

[clg@fr.ibm.com: build fix]
Cc: Andrea Arcangeli <andrea@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agoSlab API: remove useless ctor parameter and reorder parameters
Christoph Lameter [Wed, 17 Oct 2007 06:25:51 +0000]
Slab API: remove useless ctor parameter and reorder parameters

Slab constructors currently have a flags parameter that is never used.  And
the order of the arguments is opposite to other slab functions.  The object
pointer is placed before the kmem_cache pointer.

Convert

        ctor(void *object, struct kmem_cache *s, unsigned long flags)

to

        ctor(struct kmem_cache *s, void *object)

throughout the kernel

[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agoSLUB: simplify IRQ off handling
Christoph Lameter [Wed, 17 Oct 2007 06:25:51 +0000]
SLUB: simplify IRQ off handling

Move irq handling out of new slab into __slab_alloc.  That is useful for
Mathieu's cmpxchg_local patchset and also allows us to remove the crude
local_irq_off in early_kmem_cache_alloc().

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agomm: dirty balancing for tasks
Peter Zijlstra [Wed, 17 Oct 2007 06:25:50 +0000]
mm: dirty balancing for tasks

Based on ideas of Andrew:
  http://marc.info/?l=linux-kernel&m=102912915020543&w=2

Scale the bdi dirty limit inversly with the tasks dirty rate.
This makes heavy writers have a lower dirty limit than the occasional writer.

Andrea proposed something similar:
  http://lwn.net/Articles/152277/

The main disadvantage to his patch is that he uses an unrelated quantity to
measure time, which leaves him with a workload dependant tunable. Other than
that the two approaches appear quite similar.

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agomm: per device dirty threshold
Peter Zijlstra [Wed, 17 Oct 2007 06:25:50 +0000]
mm: per device dirty threshold

Scale writeback cache per backing device, proportional to its writeout speed.

By decoupling the BDI dirty thresholds a number of problems we currently have
will go away, namely:

 - mutual interference starvation (for any number of BDIs);
 - deadlocks with stacked BDIs (loop, FUSE and local NFS mounts).

It might be that all dirty pages are for a single BDI while other BDIs are
idling. By giving each BDI a 'fair' share of the dirty limit, each one can have
dirty pages outstanding and make progress.

A global threshold also creates a deadlock for stacked BDIs; when A writes to
B, and A generates enough dirty pages to get throttled, B will never start
writeback until the dirty pages go away. Again, by giving each BDI its own
'independent' dirty limit, this problem is avoided.

So the problem is to determine how to distribute the total dirty limit across
the BDIs fairly and efficiently. A DBI that has a large dirty limit but does
not have any dirty pages outstanding is a waste.

What is done is to keep a floating proportion between the DBIs based on
writeback completions. This way faster/more active devices get a larger share
than slower/idle devices.

[akpm@linux-foundation.org: fix warnings]
[hugh@veritas.com: Fix occasional hang when a task couldn't get out of balance_dirty_pages]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agolib: floating proportions
Peter Zijlstra [Wed, 17 Oct 2007 06:25:49 +0000]
lib: floating proportions

Given a set of objects, floating proportions aims to efficiently give the
proportional 'activity' of a single item as compared to the whole set. Where
'activity' is a measure of a temporal property of the items.

It is efficient in that it need not inspect any other items of the set
in order to provide the answer. It is not even needed to know how many
other items there are.

It has one parameter, and that is the period of 'time' over which the
'activity' is measured.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agomm: count writeback pages per BDI
Peter Zijlstra [Wed, 17 Oct 2007 06:25:48 +0000]
mm: count writeback pages per BDI

Count per BDI writeback pages.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agomm: count reclaimable pages per BDI
Peter Zijlstra [Wed, 17 Oct 2007 06:25:47 +0000]
mm: count reclaimable pages per BDI

Count per BDI reclaimable pages; nr_reclaimable = nr_dirty + nr_unstable.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agomm: scalable bdi statistics counters
Peter Zijlstra [Wed, 17 Oct 2007 06:25:47 +0000]
mm: scalable bdi statistics counters

Provide scalable per backing_dev_info statistics counters.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agomm: bdi init hooks
Peter Zijlstra [Wed, 17 Oct 2007 06:25:46 +0000]
mm: bdi init hooks

provide BDI constructor/destructor hooks

[akpm@linux-foundation.org: compile fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agolib: percpu_counter_init_irq
Peter Zijlstra [Wed, 17 Oct 2007 06:25:46 +0000]
lib: percpu_counter_init_irq

provide a way to tell lockdep about percpu_counters that are supposed to be
used from irq safe contexts.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agolib: percpu_counter_init error handling
Peter Zijlstra [Wed, 17 Oct 2007 06:25:45 +0000]
lib: percpu_counter_init error handling

alloc_percpu can fail, propagate that error.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agolib: percpu_count_sum()
Peter Zijlstra [Wed, 17 Oct 2007 06:25:45 +0000]
lib: percpu_count_sum()

Provide an accurate version of percpu_counter_read.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agolib: percpu_counter_sum_positive
Peter Zijlstra [Wed, 17 Oct 2007 06:25:44 +0000]
lib: percpu_counter_sum_positive

 s/percpu_counter_sum/&_positive/

Because its consitent with percpu_counter_read*

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agolib: percpu_counter_set
Peter Zijlstra [Wed, 17 Oct 2007 06:25:44 +0000]
lib: percpu_counter_set

Provide a method to set a percpu counter to a specified value.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agolib: make percpu_counter_add take s64
Peter Zijlstra [Wed, 17 Oct 2007 06:25:43 +0000]
lib: make percpu_counter_add take s64

percpu_counter is a s64 counter, make _add consitent.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agolib: percpu_counter variable batch
Peter Zijlstra [Wed, 17 Oct 2007 06:25:43 +0000]
lib: percpu_counter variable batch

Because the current batch setup has an quadric error bound on the counter,
allow for an alternative setup.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agolib: percpu_counter_sub
Peter Zijlstra [Wed, 17 Oct 2007 06:25:42 +0000]
lib: percpu_counter_sub

Hugh spotted that some code does:
  percpu_counter_add(&counter, -unsignedlong)

which, when the amount argument is of type s32, sort-of works thanks to
two's-complement. However when we'd change the type to s64 this breaks on 32bit
machines, because the promotion rules zero extend the unsigned number.

Provide percpu_counter_sub() to hide the s64 cast. That is:
  percpu_counter_sub(&counter, foo)
is equal to:
  percpu_counter_add(&counter, -(s64)foo);

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agolib: percpu_counter_add
Peter Zijlstra [Wed, 17 Oct 2007 06:25:42 +0000]
lib: percpu_counter_add

 s/percpu_counter_mod/percpu_counter_add/

Because its a better name, _mod implies modulo.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agonfs: remove congestion_end()
Peter Zijlstra [Wed, 17 Oct 2007 06:25:41 +0000]
nfs: remove congestion_end()

These patches aim to improve balance_dirty_pages() and directly address three
issues:
  1) inter device starvation
  2) stacked device deadlocks
  3) inter process starvation

1 and 2 are a direct result from removing the global dirty limit and using
per device dirty limits. By giving each device its own dirty limit is will
no longer starve another device, and the cyclic dependancy on the dirty limit
is broken.

In order to efficiently distribute the dirty limit across the independant
devices a floating proportion is used, this will allocate a share of the total
limit proportional to the device's recent activity.

3 is done by also scaling the dirty limit proportional to the current task's
recent dirty rate.

This patch:

nfs: remove congestion_end().  It's redundant, clear_bdi_congested() already
wakes the waiters.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agopowerpc: add Altivec/VMX state to coredumps
Mark Nelson [Wed, 17 Oct 2007 06:25:40 +0000]
powerpc: add Altivec/VMX state to coredumps

Update dump_task_altivec() (which has so far never been put to use) so that
it dumps the Altivec/VMX registers (VR[0] - VR[31], VSCR and VRSAVE) in the
same format as the ptrace get_vrregs(), and add the appropriate glue
typedef and #defines to make it work.

A new note type of NT_PPC_VMX was chosen to be 0x100 (arbitrarily) because
it allows the low range values to be used for more generic purposes and
0x100 seems an adequate starting point for PowerPC extensions.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agox86: replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE #define
Mark Nelson [Wed, 17 Oct 2007 06:25:39 +0000]
x86: replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE #define

Replace NT_PRXFPREG with ELF_CORE_XFPREG_TYPE in the coredump code which
allows for more flexibility in the note type for the state of 'extended
floating point' implementations in coredumps.  New note types can now be
added with an appropriate #define.

This does #define ELF_CORE_XFPREG_TYPE to be NT_PRXFPREG in all
current users so there's are no change in behaviour.

This will let us use different note types on powerpc for the Altivec/VMX
state that some PowerPC cpus have (G4, PPC970, POWER6) and for the SPE
(signal processing extension) state that some embedded PowerPC cpus from
Freescale have.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agopartially fix up the lookup_one_noperm mess
Christoph Hellwig [Wed, 17 Oct 2007 06:25:38 +0000]
partially fix up the lookup_one_noperm mess

Try to fix the mess created by sysfs braindamage.

 - refactor code internal to fs/namei.c a little to avoid too much
   duplication:
o __lookup_hash_kern is renamed back to __lookup_hash
o the old __lookup_hash goes away, permission checks moves to
  the two callers
o useless inline qualifiers on above functions go away
 - lookup_one_len_kern loses it's last argument and is renamed to
   lookup_one_noperm to make it's useage a little more clear
 - added kerneldoc comments to describe lookup_one_len aswell as
   lookup_one_noperm and make it very clear that no one should use
   the latter ever.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agoMerge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik...
Linus Torvalds [Wed, 17 Oct 2007 02:06:48 +0000]
Merge branch 'upstream-linus' of /linux/kernel/git/jgarzik/netdev-2.6

* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
  WOL bugfix for 3c59x.c
  skge 1.12
  skge: add a debug interface
  skge: eeprom support
  skge: internal stats
  skge: XM PHY handling fixes
  skge: changing MTU while running causes problems
  skge: fix ram buffer size calculation
  gianfar: Fix compile regression caused by 09f75cd7
  net: Fix new EMAC driver for NAPI changes
  bonding: two small fixes for IPoIB support
  e1000e: don't poke PHY registers to retreive link status
  e1000e: fix error checks
  e1000e: Fix debug printk macro
  tokenring/3c359.c: fixed array index problem
  [netdrvr] forcedeth: remove in-driver copy of net_device_stats
  [netdrvr] forcedeth: improved probe info; dev_printk() cleanups
  forcedeth: fix NAPI rx poll function

11 years agomissing include in mmc
Al Viro [Wed, 17 Oct 2007 00:09:07 +0000]
missing include in mmc

AFAICS, fallout from repacing include of blkdev.h with include of bio.h.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agofix adbhid mismerge
Al Viro [Wed, 17 Oct 2007 00:02:46 +0000]
fix adbhid mismerge

This fixes a lost 'key' variable declaration that went missing in a
mismerge (commit b981d8b3f5e008ff10d993be633ad00564fc22cd)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agoWOL bugfix for 3c59x.c
Steffen Klassert [Tue, 16 Oct 2007 21:24:09 +0000]
WOL bugfix for 3c59x.c

Some NICs (3c905B) can not generate PME in power state PCI_D0, while others
like 3c905C can.  Call pci_enable_wake() with PCI_D3hot should give proper
WOL for 3c905B.

Signed-off-by: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
Tested-by: Harry Coin <hcoin@n4comm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years agoskge 1.12
Stephen Hemminger [Tue, 16 Oct 2007 19:15:55 +0000]
skge 1.12

version update

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years agoskge: add a debug interface
Stephen Hemminger [Tue, 16 Oct 2007 19:15:54 +0000]
skge: add a debug interface

Add a debugfs interface to look at internal ring state.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years agoskge: eeprom support
Stephen Hemminger [Tue, 16 Oct 2007 19:15:53 +0000]
skge: eeprom support

Add ability to read/write EEPROM

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years agoskge: internal stats
Stephen Hemminger [Tue, 16 Oct 2007 19:15:52 +0000]
skge: internal stats

Use internal stats structure

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years agoskge: XM PHY handling fixes
Stephen Hemminger [Tue, 16 Oct 2007 19:15:51 +0000]
skge: XM PHY handling fixes

Change how PHY is managed on SysKonnect fibre based boards.
Poll for PHY coming up 1 per second, but use interrupt to detect loss.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years agoskge: changing MTU while running causes problems
Stephen Hemminger [Tue, 16 Oct 2007 19:15:50 +0000]
skge: changing MTU while running causes problems

Rather than bring network down/up when changing MTU,
only need to impact receiver.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years agoskge: fix ram buffer size calculation
Stephen Hemminger [Tue, 16 Oct 2007 19:15:49 +0000]
skge: fix ram buffer size calculation

This fixes problems with transmit hangs on older fiber based SysKonnect boards.

Adjust ram buffer sizing calculation to make it correct on all boards
and make it like the code in sky2 driver.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years agogianfar: Fix compile regression caused by 09f75cd7
Li Yang [Tue, 16 Oct 2007 06:18:13 +0000]
gianfar: Fix compile regression caused by 09f75cd7

Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years agonet: Fix new EMAC driver for NAPI changes
Benjamin Herrenschmidt [Tue, 16 Oct 2007 05:40:50 +0000]
net: Fix new EMAC driver for NAPI changes

net: Fix new EMAC driver for NAPI changes

This fixes the new EMAC driver for the NAPI updates. The previous patch
by Roland Dreier (already applied) to do that doesn't actually work. This
applies on top of it makes it work on my test Ebony machine.

This patch depends on "net: Add __napi_sycnhronize() to sync with napi poll"
posted previously.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years agobonding: two small fixes for IPoIB support
Jay Vosburgh [Mon, 15 Oct 2007 23:44:27 +0000]
bonding: two small fixes for IPoIB support

Two small fixes to IPoIB support for bonding:

1- copy header_ops from slave to bonding for IPoIB slaves
2- move release and destroy logic to UNREGISTER from GOING_DOWN
   notifier to avoid double release

Set bonding to version 3.2.1.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years agoe1000e: don't poke PHY registers to retreive link status
Auke Kok [Mon, 15 Oct 2007 21:30:59 +0000]
e1000e: don't poke PHY registers to retreive link status

Apparently poking the link status registers when autonegotiation
is running on the PHY might botch the PHY link on 80003es2lan
devices. While this is a very rare condition we can completely
avoid it alltogether by just using the MAC link bits to provide
the proper information to ethtool.

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years agoe1000e: fix error checks
Adrian Bunk [Mon, 15 Oct 2007 21:02:21 +0000]
e1000e: fix error checks

Spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years agoe1000e: Fix debug printk macro
Auke Kok [Mon, 15 Oct 2007 21:02:13 +0000]
e1000e: Fix debug printk macro

Spotted by Joe Perches.

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years agotokenring/3c359.c: fixed array index problem
Marcus Meissner [Sat, 13 Oct 2007 08:19:37 +0000]
tokenring/3c359.c: fixed array index problem

The xl_laa array is just 6 bytes long, so we should substract
10 from the index, like is also done some lines above already.

Signed-Off-By: Marcus Meissner <meissner@suse.de>

Signed-off-by: Jeff Garzik <jeff@garzik.org>

11 years ago[netdrvr] forcedeth: remove in-driver copy of net_device_stats
Jeff Garzik [Wed, 17 Oct 2007 00:56:09 +0000]
[netdrvr] forcedeth: remove in-driver copy of net_device_stats

A copy of struct net_device_stats now lives in struct net_device,
making in-driver copies a waste of memory.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

11 years ago[netdrvr] forcedeth: improved probe info; dev_printk() cleanups
Jeff Garzik [Tue, 16 Oct 2007 08:09:09 +0000]
[netdrvr] forcedeth: improved probe info; dev_printk() cleanups

main change:
* greatly improve per-NIC probe diagnostic output.  Similar to other
  net drivers, print out MAC address, PHY info, and various hardware and
  software flags that may be relevant.

other changes:
* similar to other net drivers, only print the initial version message
  when we have found at least one board.

* don't bother to print error message when pci_enable_device() fails,
  it will do so for us.

* use dev_printk() rather than printk() in nv_probe().  This gives
  use a standardized output similar to the rest of the kernel, and
  eliminates the need to manually print out PCI bus id.

* use DRV_NAME constant where appropriate

* clean struct pci_driver indentation

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

11 years agoforcedeth: fix NAPI rx poll function
Ingo Molnar [Wed, 17 Oct 2007 00:44:59 +0000]
forcedeth: fix NAPI rx poll function

fix the forcedeth NAPI poll function to not emit this warning:

[  186.635916] WARNING: at net/core/dev.c:2166 net_rx_action()
[  186.641351]  [<c060d9f5>] net_rx_action+0x145/0x1b0
[  186.646191]  [<c011d752>] __do_softirq+0x42/0x90
[  186.650784]  [<c011d7c6>] do_softirq+0x26/0x30
[  186.655202]  [<c011db48>] local_bh_enable+0x48/0xa0
[  186.660055]  [<c06023e0>] lock_sock_nested+0xa0/0xc0
[  186.664995]  [<c065da16>] tcp_recvmsg+0x16/0xbc0
[  186.669588]  [<c013e94b>] __generic_file_aio_write_nolock+0x27b/0x520
[  186.676001]  [<c0601d75>] sock_common_recvmsg+0x45/0x70
[  186.681202]  [<c05ff5df>] sock_aio_read+0x11f/0x140
[  186.686054]  [<c015c086>] do_sync_read+0xc6/0x110
[  186.690735]  [<c012b9b0>] autoremove_wake_function+0x0/0x40
[  186.696280]  [<c060dcfc>] net_tx_action+0x3c/0xe0
[  186.700961]  [<c015c9c2>] vfs_read+0x132/0x140
[  186.705378]  [<c015cd41>] sys_read+0x41/0x70
[  186.709625]  [<c0102b66>] sysenter_past_esp+0x5f/0x89
[  186.714651]  =======================

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

11 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6
Linus Torvalds [Tue, 16 Oct 2007 23:56:35 +0000]
Merge /pub/scm/linux/kernel/git/bart/ide-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6: (33 commits)
  amd74xx: remove /proc/ide/amd74xx
  amd74xx/via82cxxx: don't initialize drive->dn
  sis5513: remove /proc/ide/sis
  ide: remove CONFIG_IDEDMA_ONLYDISK
  ide: add "hdx=nodma" kernel parameter
  ide: remove hwif->autodma and drive->autodma
  ide: remove "idex=dma" kernel parameter
  ide: remove CONFIG_BLK_DEV_IDEDMA_FORCED
  ide: use PCI_VDEVICE() macro
  sis5513: clear prefetch and postwrite for ATAPI devices
  it8213/piix/slc90e66: "de-couple" PIO and UDMA modes
  ide: unexport noautodma
  ide: unexport ide_tune_dma
  ide: remove ->ide_dma_check (take 2)
  ide-pmac: add PIO autotune fallback to ->ide_dma_check
  ide-cris: add PIO autotune fallback to ->ide_dma_check
  sl82c105: add PIO autotune fallback to ->ide_dma_check
  cs5530/sc1200: add PIO autotune fallback to ->ide_dma_check
  ide: remove ide_use_fast_pio()
  ide: remove drive->init_speed zeroing
  ...

11 years agofix cirrusfb breakage
Al Viro [Tue, 16 Oct 2007 23:27:18 +0000]
fix cirrusfb breakage

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris...
Linus Torvalds [Tue, 16 Oct 2007 23:53:20 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/selinux-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
  SELinux: kills warnings in Improve SELinux performance when AVC misses
  SELinux: improve performance when AVC misses.
  SELinux: policy selectable handling of unknown classes and perms
  SELinux: Improve read/write performance
  SELinux: tune avtab to reduce memory usage

11 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394...
Linus Torvalds [Tue, 16 Oct 2007 23:52:21 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/ieee1394/linux1394-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (25 commits)
  firewire: fw-cdev: reorder wakeup vs. spinlock
  firewire: in-code doc updates.
  firewire: a header cleanup
  firewire: adopt read cycle timer ABI from raw1394
  firewire: fw-ohci: check for misconfigured bus (phyID == 63)
  firewire: fw-ohci: missing dma_unmap_single
  firewire: fw-ohci: log posted write errors
  firewire: fw-ohci: reorder includes
  firewire: fw-ohci: fix includes
  firewire: fw-ohci: enforce read order for selfID generation
  firewire: fw-sbp2: use an own workqueue (fix system responsiveness)
  firewire: fw-sbp2: expose module parameter for workarounds
  firewire: fw-sbp2: add support for multiple logical units per target
  firewire: fw-sbp2: always enable IRQs before calling command ORB callback
  firewire: fw-core: local variable shadows a global one
  firewire: optimize fw_core_add_address_handler
  ieee1394: ieee1394_core.c: use DEFINE_SPINLOCK for spinlock definition
  ieee1394: csr1212: proper refcounting
  ieee1394: nodemgr: fix leak of struct csr1212_keyval
  ieee1394: pcilynx: I2C cleanups
  ...

11 years agoMerge ssh://master.kernel.org/pub/scm/linux/kernel/git/sam/kbuild
Linus Torvalds [Tue, 16 Oct 2007 23:51:11 +0000]
Merge ssh:///pub/scm/linux/kernel/git/sam/kbuild

* ssh://master.kernel.org/pub/scm/linux/kernel/git/sam/kbuild:
  x86: fix boot error introduced by kbuild

11 years agoMerge branch 'release' of ssh://master.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
Linus Torvalds [Tue, 16 Oct 2007 23:48:59 +0000]
Merge branch 'release' of ssh:///linux/kernel/git/aegl/linux-2.6

* 'release' of ssh://master.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] Fix build for CONFIG_SMP=n

11 years agoSELinux: kills warnings in Improve SELinux performance when AVC misses
KaiGai Kohei [Wed, 3 Oct 2007 14:42:56 +0000]
SELinux: kills warnings in Improve SELinux performance when AVC misses

This patch kills ugly warnings when the "Improve SELinux performance
when ACV misses" patch.

Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: James Morris <jmorris@namei.org>

11 years agoSELinux: improve performance when AVC misses.
KaiGai Kohei [Fri, 28 Sep 2007 17:20:55 +0000]
SELinux: improve performance when AVC misses.

* We add ebitmap_for_each_positive_bit() which enables to walk on
  any positive bit on the given ebitmap, to improve its performance
  using common bit-operations defined in linux/bitops.h.
  In the previous version, this logic was implemented using a combination
  of ebitmap_for_each_bit() and ebitmap_node_get_bit(), but is was worse
  in performance aspect.
  This logic is most frequestly used to compute a new AVC entry,
  so this patch can improve SELinux performance when AVC misses are happen.
* struct ebitmap_node is redefined as an array of "unsigned long", to get
  suitable for using find_next_bit() which is fasted than iteration of
  shift and logical operation, and to maximize memory usage allocated
  from general purpose slab.
* Any ebitmap_for_each_bit() are repleced by the new implementation
  in ss/service.c and ss/mls.c. Some of related implementation are
  changed, however, there is no incompatibility with the previous
  version.
* The width of any new line are less or equal than 80-chars.

The following benchmark shows the effect of this patch, when we
access many files which have different security context one after
another. The number is more than /selinux/avc/cache_threshold, so
any access always causes AVC misses.

      selinux-2.6      selinux-2.6-ebitmap
AVG:   22.763 [s]          8.750 [s]
STD:    0.265              0.019
------------------------------------------
1st:   22.558 [s]          8.786 [s]
2nd:   22.458 [s]          8.750 [s]
3rd:   22.478 [s]          8.754 [s]
4th:   22.724 [s]          8.745 [s]
5th:   22.918 [s]          8.748 [s]
6th:   22.905 [s]          8.764 [s]
7th:   23.238 [s]          8.726 [s]
8th:   22.822 [s]          8.729 [s]

Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>

11 years agoSELinux: policy selectable handling of unknown classes and perms
Eric Paris [Fri, 21 Sep 2007 18:37:10 +0000]
SELinux: policy selectable handling of unknown classes and perms

Allow policy to select, in much the same way as it selects MLS support, how
the kernel should handle access decisions which contain either unknown
classes or unknown permissions in known classes.  The three choices for the
policy flags are

0 - Deny unknown security access. (default)
2 - reject loading policy if it does not contain all definitions
4 - allow unknown security access

The policy's choice is exported through 2 booleans in
selinuxfs.  /selinux/deny_unknown and /selinux/reject_unknown.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>

11 years agoSELinux: Improve read/write performance
Yuichi Nakamura [Fri, 14 Sep 2007 00:27:07 +0000]
SELinux: Improve read/write performance

It reduces the selinux overhead on read/write by only revalidating
permissions in selinux_file_permission if the task or inode labels have
changed or the policy has changed since the open-time check.  A new LSM
hook, security_dentry_open, is added to capture the necessary state at open
time to allow this optimization.

(see http://marc.info/?l=selinux&m=118972995207740&w=2)

Signed-off-by: Yuichi Nakamura<ynakam@hitachisoft.jp>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>

11 years agoSELinux: tune avtab to reduce memory usage
Yuichi Nakamura [Fri, 24 Aug 2007 02:55:11 +0000]
SELinux: tune avtab to reduce memory usage

This patch reduces memory usage of SELinux by tuning avtab. Number of hash
slots in avtab was 32768. Unused slots used memory when number of rules is
fewer. This patch decides number of hash slots dynamically based on number
of rules. (chain length)^2 is also printed out in avtab_hash_eval to see
standard deviation of avtab hash table.

Signed-off-by: Yuichi Nakamura<ynakam@hitachisoft.jp>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>

11 years agofirewire: fw-cdev: reorder wakeup vs. spinlock
Jay Fenlason [Mon, 8 Oct 2007 21:00:29 +0000]
firewire: fw-cdev: reorder wakeup vs. spinlock

Signed-off-by: Jay Fenlason <fenlason@redhat.com>

Prompted by https://bugzilla.redhat.com/show_bug.cgi?id=323411

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

11 years agofirewire: in-code doc updates.
Yann Dirson [Sun, 7 Oct 2007 00:21:29 +0000]
firewire: in-code doc updates.

Signed-off-by: Yann Dirson <ydirson@altern.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (update)

11 years agofirewire: a header cleanup
Stefan Richter [Sun, 7 Oct 2007 00:10:11 +0000]
firewire: a header cleanup

fw_node() is not used (and not useful) outside fw-topology.c.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

11 years agofirewire: adopt read cycle timer ABI from raw1394
Stefan Richter [Sat, 29 Sep 2007 08:41:58 +0000]
firewire: adopt read cycle timer ABI from raw1394

This duplicates the read cycle timer feature of raw1394 (added in Linux
2.6.21) in firewire-core's userspace ABI.  The argument to the ioctl is
reordered though to ensure 32/64 bit compatibility.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>

11 years agofirewire: fw-ohci: check for misconfigured bus (phyID == 63)
Stefan Richter [Wed, 29 Aug 2007 22:11:40 +0000]
firewire: fw-ohci: check for misconfigured bus (phyID == 63)

Check NodeID.nodeNumber as per OHCI 1.1 clause 7.2.3.2.  See also IEEE
1394a table 5B-1.

Also, demote the "node ID not valid" message from error to notification
as it is not an error condition.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

11 years agofirewire: fw-ohci: missing dma_unmap_single
Stefan Richter [Wed, 29 Aug 2007 17:40:28 +0000]
firewire: fw-ohci: missing dma_unmap_single

at_context_queue_packet() didn't clean up in an early exit path.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>

11 years agofirewire: fw-ohci: log posted write errors
Stefan Richter [Mon, 20 Aug 2007 19:58:30 +0000]
firewire: fw-ohci: log posted write errors

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

11 years agofirewire: fw-ohci: reorder includes
Stefan Richter [Mon, 20 Aug 2007 19:41:22 +0000]
firewire: fw-ohci: reorder includes

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

11 years agofirewire: fw-ohci: fix includes
Stefan Richter [Mon, 20 Aug 2007 19:40:30 +0000]
firewire: fw-ohci: fix includes

Add used includes, remove unused includes.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

11 years agofirewire: fw-ohci: enforce read order for selfID generation
Stefan Richter [Sat, 25 Aug 2007 12:08:19 +0000]
firewire: fw-ohci: enforce read order for selfID generation

It seems unlikely, but access to self_id_cpu[0] could at least in theory
be deferred until after the loop over self_id_cpu[1..n] or even after
the subsequent reg_read.  Enforce the desired order by a read barrier.

Also prevent the reg_read from being reordered relative to the for loop.
This isn't necessary if the loop's conditional printk counts as an
implicit barrier, but better make it explicit.

(self_id_cpu[] is a coherent DMA buffer.)

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

11 years agofirewire: fw-sbp2: use an own workqueue (fix system responsiveness)
Stefan Richter [Sun, 12 Aug 2007 10:51:18 +0000]
firewire: fw-sbp2: use an own workqueue (fix system responsiveness)

Firewire-sbp2 did very uncooperative things in the kernel's shared
workqueue:  Sleeping until reception of management status from the
target for up to 2 seconds, and performing SCSI inquiry and all of the
setup of SCSI command set drivers via scsi_add_device.  If there were
transient or permanent error conditions, this caused long blockage of
the kernel's events process, noticeable e.g. by blocked keyboard input.

We now allocate a workqueue process exclusive to fw-sbp2.  As a side
effect, this also increases parallelism of fw-sbp2's login and reconnect
work versus fw-core's device discovery and device update work which is
performed in the shared workqueue.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>

11 years agofirewire: fw-sbp2: expose module parameter for workarounds
Stefan Richter [Mon, 13 Aug 2007 15:48:25 +0000]
firewire: fw-sbp2: expose module parameter for workarounds

On rare occasions, the ability to set one of the workaround flags at
runtime may save the day.

People who experience I/O errors with firewire-sbp2 while the old sbp2
driver worked for them should try workarounds=1 and report to the devel
mailinglist whether that improves things.  Firewire-sbp2 defaults to the
SCSI stack's maximum transfer size per command, while sbp2 limits them
to 128 kBytes.  Flag 1 accomplishes just that.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

11 years agofirewire: fw-sbp2: add support for multiple logical units per target
Stefan Richter [Sat, 25 Aug 2007 12:05:28 +0000]
firewire: fw-sbp2: add support for multiple logical units per target

Fixes "New firewire stack only recognizing half of a chain of drives",
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=242254

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

11 years agofirewire: fw-sbp2: always enable IRQs before calling command ORB callback
Stefan Richter [Sat, 25 Aug 2007 08:40:42 +0000]
firewire: fw-sbp2: always enable IRQs before calling command ORB callback

On IOMMU-less noncoherent architectures, orb->callback will memcpy the
whole SCSI command buffer for READ-like SCSI commands.  It is therefore
friendlier to enable IRQs before the call, like before patch "Add
ref-counting for sbp2 orbs".

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Kristian Høgsberg <krh@redhat.com>

11 years agofirewire: fw-core: local variable shadows a global one
Stefan Richter [Fri, 3 Aug 2007 18:56:31 +0000]
firewire: fw-core: local variable shadows a global one

Sparse warned about it although it was apparently harmless:

drivers/firewire/fw-cdev.c:624:23: warning: symbol 'interrupt' shadows an earlier one
include/asm/hw_irq.h:29:13: originally declared here

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

11 years agofirewire: optimize fw_core_add_address_handler
Stefan Richter [Tue, 17 Jul 2007 00:10:16 +0000]
firewire: optimize fw_core_add_address_handler

Potentially avoids unnecessary loop runs.
Guarantee quadlet-aligned starts of address regions.
Document the return values.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

11 years agoieee1394: ieee1394_core.c: use DEFINE_SPINLOCK for spinlock definition
Matthias Kaehlcke [Fri, 12 Oct 2007 17:57:23 +0000]
ieee1394: ieee1394_core.c: use DEFINE_SPINLOCK for spinlock definition

drivers/ieee1394/ieee1394_core.c: Define spinlock using
DEFINE_SPINLOCK instead of assignment to SPIN_LOCK_UNLOCKED

Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

11 years agoieee1394: csr1212: proper refcounting
Stefan Richter [Sat, 15 Sep 2007 12:50:25 +0000]
ieee1394: csr1212: proper refcounting

At least since nodemgr got rid of coarse global locking, accesses to
struct csr1212_keyval's reference counter should be atomic and coupled
with proper barriers.  Also, calls to csr1212_keep_keyval(kv) should
occur before kv is being used.

(We probably should convert refcnt to struct kref, but how to keep
csr1212_destroy_keyval's implementation non-recursively then?)

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

11 years agoieee1394: nodemgr: fix leak of struct csr1212_keyval
Stefan Richter [Sat, 15 Sep 2007 12:45:53 +0000]
ieee1394: nodemgr: fix leak of struct csr1212_keyval

csr1212_keep_keyval(kv) in nodemgr_process_root_directory was
unbalanced if ne->vendor_name_kv already exists.  This happens for
example if eth1394 or raw1394 modify the local config ROM and it is
parsed again.

As a bonus, the attempt to add the vendor_name_kv sysfs attribute
when it already exists is now fixed for good.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>