7 years agoaudit: allow interfield comparison in audit rules
Eric Paris [Tue, 3 Jan 2012 19:23:08 +0000]
audit: allow interfield comparison in audit rules

We wish to be able to audit when a uid=500 task accesses a file which is
uid=0.  Or vice versa.  This patch introduces a new audit filter type
AUDIT_FIELD_COMPARE which takes as an 'enum' which indicates which fields
should be compared.  At this point we only define the task->uid vs
inode->uid, but other comparisons can be added.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoKernel: Audit Support For The ARM Platform
Nathaniel Husted [Tue, 3 Jan 2012 19:23:09 +0000]
Kernel: Audit Support For The ARM Platform

This patch provides functionality to audit system call events on the
ARM platform. The implementation was based off the structure of the
MIPS platform and information in this
(http://lists.fedoraproject.org/pipermail/arm/2009-October/000382.html)
mailing list thread. The required audit_syscall_exit and
audit_syscall_entry checks were added to ptrace using the standard
registers for system call values (r0 through r3). A thread information
flag was added for auditing (TIF_SYSCALL_AUDIT) and a meta-flag was
added (_TIF_SYSCALL_WORK) to simplify modifications to the syscall
entry/exit. Now, if either the TRACE flag is set or the AUDIT flag is
set, the syscall_trace function will be executed. The prober changes
were made to Kconfig to allow CONFIG_AUDITSYSCALL to be enabled.

Due to platform availability limitations, this patch was only tested
on the Android platform running the modified "android-goldfish-2.6.29"
kernel. A test compile was performed using Code Sourcery's
cross-compilation toolset and the current linux-3.0 stable kernel. The
changes compile without error. I'm hoping, due to the simple modifications,
the patch is "obviously correct".

Signed-off-by: Nathaniel Husted <nhusted@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: do not call audit_getname on error
Eric Paris [Tue, 3 Jan 2012 19:23:08 +0000]
audit: do not call audit_getname on error

Just a code cleanup really.  We don't need to make a function call just for
it to return on error.  This also makes the VFS function even easier to follow
and removes a conditional on a hot path.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: only allow tasks to set their loginuid if it is -1
Eric Paris [Tue, 3 Jan 2012 19:23:08 +0000]
audit: only allow tasks to set their loginuid if it is -1

At the moment we allow tasks to set their loginuid if they have
CAP_AUDIT_CONTROL.  In reality we want tasks to set the loginuid when they
log in and it be impossible to ever reset.  We had to make it mutable even
after it was once set (with the CAP) because on update and admin might have
to restart sshd.  Now sshd would get his loginuid and the next user which
logged in using ssh would not be able to set his loginuid.

Systemd has changed how userspace works and allowed us to make the kernel
work the way it should.  With systemd users (even admins) are not supposed
to restart services directly.  The system will restart the service for
them.  Thus since systemd is going to loginuid==-1, sshd would get -1, and
sshd would be allowed to set a new loginuid without special permissions.

If an admin in this system were to manually start an sshd he is inserting
himself into the system chain of trust and thus, logically, it's his
loginuid that should be used!  Since we have old systems I make this a
Kconfig option.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: remove task argument to audit_set_loginuid
Eric Paris [Tue, 3 Jan 2012 19:23:08 +0000]
audit: remove task argument to audit_set_loginuid

The function always deals with current.  Don't expose an option
pretending one can use it for something.  You can't.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: allow audit matching on inode gid
Eric Paris [Tue, 3 Jan 2012 19:23:07 +0000]
audit: allow audit matching on inode gid

Much like the ability to filter audit on the uid of an inode collected, we
should be able to filter on the gid of the inode.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: allow matching on obj_uid
Eric Paris [Tue, 3 Jan 2012 19:23:07 +0000]
audit: allow matching on obj_uid

Allow syscall exit filter matching based on the uid of the owner of an
inode used in a syscall.  aka:

auditctl -a always,exit -S open -F obj_uid=0 -F perm=wa

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: remove audit_finish_fork as it can't be called
Eric Paris [Tue, 3 Jan 2012 19:23:07 +0000]
audit: remove audit_finish_fork as it can't be called

Audit entry,always rules are not allowed and are automatically changed in
exit,always rules in userspace.  The kernel refuses to load such rules.

Thus a task in the middle of a syscall (and thus in audit_finish_fork())
can only be in one of two states: AUDIT_BUILD_CONTEXT or AUDIT_DISABLED.
Since the current task cannot be in AUDIT_RECORD_CONTEXT we aren't every
going to actually use the code in audit_finish_fork() since it will
return without doing anything.  Thus drop the code.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: reject entry,always rules
Eric Paris [Tue, 3 Jan 2012 19:23:07 +0000]
audit: reject entry,always rules

We deprecated entry,always rules a long time ago.  Reject those rules as
invalid.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: inline audit_free to simplify the look of generic code
Eric Paris [Tue, 3 Jan 2012 19:23:07 +0000]
audit: inline audit_free to simplify the look of generic code

make the conditional a static inline instead of doing it in generic code.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: drop audit_set_macxattr as it doesn't do anything
Eric Paris [Tue, 3 Jan 2012 19:23:07 +0000]
audit: drop audit_set_macxattr as it doesn't do anything

unused.  deleted.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: inline checks for not needing to collect aux records
Eric Paris [Tue, 3 Jan 2012 19:23:07 +0000]
audit: inline checks for not needing to collect aux records

A number of audit hooks make function calls before they determine that
auxilary records do not need to be collected.  Do those checks as static
inlines since the most common case is going to be that records are not
needed and we can skip the function call overhead.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: drop some potentially inadvisable likely notations
Eric Paris [Tue, 3 Jan 2012 19:23:06 +0000]
audit: drop some potentially inadvisable likely notations

The audit code makes heavy use of likely() and unlikely() macros, but they
don't always make sense.  Drop any that seem questionable and let the
computer do it's thing.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: remove AUDIT_SETUP_CONTEXT as it isn't used
Eric Paris [Tue, 3 Jan 2012 19:23:06 +0000]
audit: remove AUDIT_SETUP_CONTEXT as it isn't used

Audit contexts have 3 states.  Disabled, which doesn't collect anything,
build, which collects info but might not emit it, and record, which
collects and emits.  There is a 4th state, setup, which isn't used.  Get
rid of it.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: inline audit_syscall_entry to reduce burden on archs
Eric Paris [Tue, 3 Jan 2012 19:23:06 +0000]
audit: inline audit_syscall_entry to reduce burden on archs

Every arch calls:

if (unlikely(current->audit_context))
audit_syscall_entry()

which requires knowledge about audit (the existance of audit_context) in
the arch code.  Just do it all in static inline in audit.h so that arch's
can remain blissfully ignorant.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: ia32entry.S sign extend error codes when calling 64 bit code
Eric Paris [Tue, 3 Jan 2012 19:23:06 +0000]
audit: ia32entry.S sign extend error codes when calling 64 bit code

In the ia32entry syscall exit audit fastpath we have assembly code which calls
__audit_syscall_exit directly.  This code was, however, zeroes the upper 32
bits of the return code.  It then proceeded to call code which expects longs
to be 64bits long.  In order to handle code which expects longs to be 64bit we
sign extend the return code if that code is an error.  Thus the
__audit_syscall_exit function can correctly handle using the values in
snprintf("%ld").  This fixes the regression introduced in 5cbf1565f29eb57a86a.

Old record:
type=SYSCALL msg=audit(1306197182.256:281): arch=40000003 syscall=192 success=no exit=4294967283
New record:
type=SYSCALL msg=audit(1306197182.256:281): arch=40000003 syscall=192 success=no exit=-13

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>

7 years agoAudit: push audit success and retcode into arch ptrace.h
Eric Paris [Tue, 3 Jan 2012 19:23:06 +0000]
Audit: push audit success and retcode into arch ptrace.h

The audit system previously expected arches calling to audit_syscall_exit to
supply as arguments if the syscall was a success and what the return code was.
Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
by converting from negative retcodes to an audit internal magic value stating
success or failure.  This helper was wrong and could indicate that a valid
pointer returned to userspace was a failed syscall.  The fix is to fix the
layering foolishness.  We now pass audit_syscall_exit a struct pt_reg and it
in turns calls back into arch code to collect the return value and to
determine if the syscall was a success or failure.  We also define a generic
is_syscall_success() macro which determines success/failure based on if the
value is < -MAX_ERRNO.  This works for arches like x86 which do not use a
separate mechanism to indicate syscall failure.

We make both the is_syscall_success() and regs_return_value() static inlines
instead of macros.  The reason is because the audit function must take a void*
for the regs.  (uml calls theirs struct uml_pt_regs instead of just struct
pt_regs so audit_syscall_exit can't take a struct pt_regs).  Since the audit
function takes a void* we need to use static inlines to cast it back to the
arch correct structure to dereference it.

The other major change is that on some arches, like ia64, MIPS and ppc, we
change regs_return_value() to give us the negative value on syscall failure.
THE only other user of this macro, kretprobe_example.c, won't notice and it
makes the value signed consistently for the audit functions across all archs.

In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
audit code as the return value.  But the ptrace_64.h code defined the macro
regs_return_value() as regs[3].  I have no idea which one is correct, but this
patch now uses the regs_return_value() function, so it now uses regs[3].

For powerpc we previously used regs->result but now use the
regs_return_value() function which uses regs->gprs[3].  regs->gprs[3] is
always positive so the regs_return_value(), much like ia64 makes it negative
before calling the audit code when appropriate.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion]
Acked-by: Tony Luck <tony.luck@intel.com> [for ia64]
Acked-by: Richard Weinberger <richard@nod.at> [for uml]
Acked-by: David S. Miller <davem@davemloft.net> [for sparc]
Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]

7 years agoseccomp: audit abnormal end to a process due to seccomp
Eric Paris [Tue, 3 Jan 2012 19:23:05 +0000]
seccomp: audit abnormal end to a process due to seccomp

The audit system likes to collect information about processes that end
abnormally (SIGSEGV) as this may me useful intrusion detection information.
This patch adds audit support to collect information when seccomp forces a
task to exit because of misbehavior in a similar way.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: check current inode and containing object when filtering on major and minor
Eric Paris [Tue, 3 Jan 2012 19:23:05 +0000]
audit: check current inode and containing object when filtering on major and minor

The audit system has the ability to filter on the major and minor number of
the device containing the inode being operated upon.  Lets say that
/dev/sda1 has major,minor 8,1 and that we mount /dev/sda1 on /boot.  Now lets
say we add a watch with a filter on 8,1.  If we proceed to open an inode
inside /boot, such as /vboot/vmlinuz, we will match the major,minor filter.

Lets instead assume that one were to use a tool like debugfs and were to
open /dev/sda1 directly and to modify it's contents.  We might hope that
this would also be logged, but it isn't.  The rules will check the
major,minor of the device containing /dev/sda1.  In other words the rule
would match on the major/minor of the tmpfs mounted at /dev.

I believe these rules should trigger on either device.  The man page is
devoid of useful information about the intended semantics.  It only seems
logical that if you want to know everything that happened on a major,minor
that would include things that happened to the device itself...

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: drop the meaningless and format breaking word 'user'
Eric Paris [Tue, 3 Jan 2012 19:23:05 +0000]
audit: drop the meaningless and format breaking word 'user'

userspace audit messages look like so:

type=USER msg=audit(1271170549.415:24710): user pid=14722 uid=0 auid=500 ses=1 subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 msg=''

That third field just says 'user'.  That's useless and doesn't follow the
key=value pair we are trying to enforce.  We already know it came from the
user based on the record type.  Kill that word.  Die.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: dynamically allocate audit_names when not enough space is in the names array
Eric Paris [Tue, 3 Jan 2012 19:23:05 +0000]
audit: dynamically allocate audit_names when not enough space is in the names array

This patch does 2 things.  First it reduces the number of audit_names
allocated in every audit context from 20 to 5.  5 should be enough for all
'normal' syscalls (rename being the worst).  Some syscalls can still touch
more the 5 inodes such as mount.  When rpc filesystem is mounted it will
create inodes and those can exceed 5.  To handle that problem this patch will
dynamically allocate audit_names if it needs more than 5.  This should
decrease the typicall memory usage while still supporting all the possible
kernel operations.

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoaudit: make filetype matching consistent with other filters
Eric Paris [Tue, 3 Jan 2012 19:23:05 +0000]
audit: make filetype matching consistent with other filters

Every other filter that matches part of the inodes list collected by audit
will match against any of the inodes on that list.  The filetype matching
however had a strange way of doing things.  It allowed userspace to
indicated if it should match on the first of the second name collected by
the kernel.  Name collection ordering seems like a kernel internal and
making userspace rules get that right just seems like a bad idea.  As it
turns out the userspace audit writers had no idea it was doing this and
thus never overloaded the value field.  The kernel always checked the first
name collected which for the tested rules was always correct.

This patch just makes the filetype matching like the major, minor, inode,
and LSM rules in that it will match against any of the names collected.  It
also changes the rule validation to reject the old unused rule types.

Noone knew it was there.  Noone used it.  Why keep around the extra code?

Signed-off-by: Eric Paris <eparis@redhat.com>

7 years agoxfs: cleanup xfs_file_aio_write
Christoph Hellwig [Sun, 18 Dec 2011 20:00:14 +0000]
xfs: cleanup xfs_file_aio_write

With all the size field updates out of the way xfs_file_aio_write can
be further simplified by pushing all iolock handling into
xfs_file_dio_aio_write and xfs_file_buffered_aio_write and using
the generic generic_write_sync helper for synchronous writes.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

7 years agoxfs: always return with the iolock held from xfs_file_aio_write_checks
Christoph Hellwig [Sun, 18 Dec 2011 20:00:13 +0000]
xfs: always return with the iolock held from xfs_file_aio_write_checks

While xfs_iunlock is fine with 0 lockflags the calling conventions are much
cleaner if xfs_file_aio_write_checks never returns without the iolock held.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

7 years agoxfs: remove the i_new_size field in struct xfs_inode
Christoph Hellwig [Sun, 18 Dec 2011 20:00:12 +0000]
xfs: remove the i_new_size field in struct xfs_inode

Now that we use the VFS i_size field throughout XFS there is no need for the
i_new_size field any more given that the VFS i_size field gets updated
in ->write_end before unlocking the page, and thus is always uptodate when
writeback could see a page.  Removing i_new_size also has the advantage that
we will never have to trim back di_size during a failed buffered write,
given that it never gets updated past i_size.

Note that currently the generic direct I/O code only updates i_size after
calling our end_io handler, which requires a small workaround to make
sure di_size actually makes it to disk.  I hope to fix this properly in
the generic code.

A downside is that we lose the support for parallel non-overlapping O_DIRECT
appending writes that recently was added.  I don't think keeping the complex
and fragile i_new_size infrastructure for this is a good tradeoff - if we
really care about parallel appending writers we should investigate turning
the iolock into a range lock, which would also allow for parallel
non-overlapping buffered writers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

7 years agoxfs: remove the i_size field in struct xfs_inode
Christoph Hellwig [Sun, 18 Dec 2011 20:00:11 +0000]
xfs: remove the i_size field in struct xfs_inode

There is no fundamental need to keep an in-memory inode size copy in the XFS
inode.  We already have the on-disk value in the dinode, and the separate
in-memory copy that we need for regular files only in the XFS inode.

Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the
VFS inode i_size field for regular files.  Switch code that was directly
accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where
we are limited to regular files direct access of the VFS inode i_size field.

This also allows dropping some fairly complicated code in the write path
which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size
that is getting updated inside ->write_end.

Note that we do not bother resetting the VFS i_size when truncating a file
that gets freed to zero as there is no point in doing so because the VFS inode
is no longer in use at this point.  Just relax the assert in xfs_ifree to
only check the on-disk size instead.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

7 years agoxfs: replace i_pin_wait with a bit waitqueue
Christoph Hellwig [Sun, 18 Dec 2011 20:00:10 +0000]
xfs: replace i_pin_wait with a bit waitqueue

Replace i_pin_wait, which is only used during synchronous inode flushing
with a bit waitqueue.  This trades off a much smaller inode against
slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit)
bytes in the XFS inode.

Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

7 years agoxfs: replace i_flock with a sleeping bitlock
Christoph Hellwig [Sun, 18 Dec 2011 20:00:09 +0000]
xfs: replace i_flock with a sleeping bitlock

We almost never block on i_flock, the exception is synchronous inode
flushing.  Instead of bloating the inode with a 16/24-byte completion
that we abuse as a semaphore just implement it as a bitlock that uses
a bit waitqueue for the rare sleeping path.  This primarily is a
tradeoff between a much smaller inode and a faster non-blocking
path vs faster wakeups, and we are much better off with the former.

A small downside is that we will lose lockdep checking for i_flock, but
given that it's always taken inside the ilock that should be acceptable.

Note that for example the inode writeback locking is implemented in a
very similar way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

7 years agoxfs: make i_flags an unsigned long
Christoph Hellwig [Sun, 18 Dec 2011 20:00:08 +0000]
xfs: make i_flags an unsigned long

To be used for bit wakeup i_flags needs to be an unsigned long or we'll
run into trouble on big endian systems.  Because of the 1-byte i_update
field right after it this actually causes a fairly large size increase
on its own (4 or 8 bytes), but that increase will be more than offset
by the next two patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

7 years agoxfs: remove the if_ext_max field in struct xfs_ifork
Christoph Hellwig [Sun, 18 Dec 2011 20:00:07 +0000]
xfs: remove the if_ext_max field in struct xfs_ifork

We spent a lot of effort to maintain this field, but it always equals to the
fork size divided by the constant size of an extent.  The prime use of it is
to assert that the two stay in sync.  Just divide the fork size by the extent
size in the few places that we actually use it and remove the overhead
of maintaining it.  Also introduce a few helpers to consolidate the places
where we actually care about the value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

7 years agoinetpeer: initialize ->redirect_genid in inet_getpeer()
Dan Carpenter [Tue, 17 Jan 2012 10:48:43 +0000]
inetpeer: initialize ->redirect_genid in inet_getpeer()

kmemcheck complains that ->redirect_genid doesn't get initialized.
Presumably it should be set to zero.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agonet: fix NULL-deref in WARN() in skb_gso_segment()
Michał Mirosław [Tue, 17 Jan 2012 10:00:40 +0000]
net: fix NULL-deref in WARN() in skb_gso_segment()

Bug was introduced in commit c8f44affb7244f2ac3e703cab13d55ede27621bb.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agonet: WARN if skb_checksum_help() is called on skb requiring segmentation
Ben Hutchings [Tue, 17 Jan 2012 07:57:56 +0000]
net: WARN if skb_checksum_help() is called on skb requiring segmentation

skb_checksum_help() has never done anything useful with skbs that
require segmentation.  Setting skb->ip_summed = CHECKSUM_NONE makes
them invalid and provokes a later WARNing in skb_gso_segment().

Passing such an skb to skb_checksum_help() indicates a bug, so we
should warn about it immediately.  Move the warning from
skb_gso_segment() into a shared function, and add gso_type and
gso_size to it.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Tue, 17 Jan 2012 20:41:10 +0000]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

* 'for-linus' of git://git.kernel.dk/linux-block:
  cfq-iosched: fix use-after-free of cfqq

7 years agocfq-iosched: fix use-after-free of cfqq
Jens Axboe [Tue, 17 Jan 2012 20:26:11 +0000]
cfq-iosched: fix use-after-free of cfqq

With the changes in life time management between the cfq IO contexts
and the cfq queues, we now risk having cfqd->active_queue being
freed when cfq_slice_expired() is being called. cfq_preempt_queue()
caches this queue and uses it after calling said function, causing
a use-after-free condition. This triggers the following oops,
when cfqq_type() attempts to dereference it:

BUG: unable to handle kernel paging request at ffff8800746c4f0c
IP: [<ffffffff81266d59>] cfqq_type+0xb/0x20
PGD 18d4063 PUD 1fe15067 PMD 1ffb9067 PTE 80000000746c4160
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU 3
Modules linked in:

Pid: 1, comm: init Not tainted 3.2.0-josef+ #367 Bochs Bochs
RIP: 0010:[<ffffffff81266d59>]  [<ffffffff81266d59>] cfqq_type+0xb/0x20
RSP: 0018:ffff880079c11778  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff880076f3df08 RCX: 0000000000000000
RDX: 0000000000000006 RSI: ffff880074271888 RDI: ffff8800746c4f08
RBP: ffff880079c11778 R08: 0000000000000078 R09: 0000000000000001
R10: 09f911029d74e35b R11: 09f911029d74e35b R12: ffff880076f337f0
R13: ffff8800746c4f08 R14: ffff8800746c4f08 R15: 0000000000000002
FS:  00007f62fd44f700(0000) GS:ffff88007cd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8800746c4f0c CR3: 0000000076c21000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process init (pid: 1, threadinfo ffff880079c10000, task ffff880079c0a040)
Stack:
 ffff880079c117c8 ffffffff812683d8 ffff880079c117a8 ffffffff8125de43
 ffff8800744fcf48 ffff880074b43e98 ffff8800770c8828 ffff880074b43e98
 0000000000000003 0000000000000000 ffff880079c117f8 ffffffff81254149
Call Trace:
 [<ffffffff812683d8>] cfq_insert_request+0x3f5/0x47c
 [<ffffffff8125de43>] ? blk_recount_segments+0x20/0x31
 [<ffffffff81254149>] __elv_add_request+0x1ca/0x200
 [<ffffffff8125aa99>] blk_queue_bio+0x2ef/0x312
 [<ffffffff81258f7b>] generic_make_request+0x9f/0xe0
 [<ffffffff8125907b>] submit_bio+0xbf/0xca
 [<ffffffff81136ec7>] submit_bh+0xdf/0xfe
 [<ffffffff81176d04>] ext3_bread+0x50/0x99
 [<ffffffff811785b3>] dx_probe+0x38/0x291
 [<ffffffff81178864>] ext3_dx_find_entry+0x58/0x219
 [<ffffffff81178ad5>] ext3_find_entry+0xb0/0x406
 [<ffffffff8110c4d5>] ? cache_alloc_debugcheck_after.isra.46+0x14d/0x1a0
 [<ffffffff8110cfbd>] ? kmem_cache_alloc+0xef/0x191
 [<ffffffff8117a330>] ext3_lookup+0x39/0xe1
 [<ffffffff81119461>] d_alloc_and_lookup+0x45/0x6c
 [<ffffffff8111ac41>] do_lookup+0x1e4/0x2f5
 [<ffffffff8111aef6>] link_path_walk+0x1a4/0x6ef
 [<ffffffff8111b557>] path_lookupat+0x59/0x5ea
 [<ffffffff8127406c>] ? __strncpy_from_user+0x30/0x5a
 [<ffffffff8111bce0>] do_path_lookup+0x23/0x59
 [<ffffffff8111cfd6>] user_path_at_empty+0x53/0x99
 [<ffffffff8107b37b>] ? remove_wait_queue+0x51/0x56
 [<ffffffff8111d02d>] user_path_at+0x11/0x13
 [<ffffffff811141f5>] vfs_fstatat+0x3a/0x64
 [<ffffffff8111425a>] vfs_stat+0x1b/0x1d
 [<ffffffff81114359>] sys_newstat+0x1a/0x33
 [<ffffffff81060e12>] ? task_stopped_code+0x42/0x42
 [<ffffffff815d6712>] system_call_fastpath+0x16/0x1b
Code: 89 e6 48 89 c7 e8 fa ca fe ff 85 c0 74 06 4c 89 2b 41 b6 01 5b 44 89 f0 41 5c 41 5d 41 5e 5d c3 55 48 89 e5 66 66 66 66 90 31 c0 <8b> 57 04 f6 c6 01 74 0b 83 e2 20 83 fa 01 19 c0 83 c0 02 5d c3
RIP  [<ffffffff81266d59>] cfqq_type+0xb/0x20
 RSP <ffff880079c11778>
CR2: ffff8800746c4f0c

Get rid of the caching of cfqd->active_queue, and reorder the
check so that it happens before we expire the active queue.

Thanks to Tejun for pin pointing the error location.

Reported-by: Chris Mason <chris.mason@oracle.com>
Tested-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7 years agox86, opcode: ANDN and Group 17 in x86-opcode-map.txt
Ulrich Drepper [Tue, 17 Jan 2012 19:14:02 +0000]
x86, opcode: ANDN and Group 17 in x86-opcode-map.txt

The Intel documentation at

http://software.intel.com/file/36945

shows the ANDN opcode and Group 17 with encoding f2 and f3 encoding
respectively.  The current version of x86-opcode-map.txt shows them
with f3 and f4.  Unless someone can point to documentation which shows
the currently used encoding the following patch be applied.

Signed-off-by: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/CAOPLpQdq5SuVo9=023CYhbFLAX9rONyjmYq7jJkqc5xwctW5eA@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

7 years agoMerge branch 'stable/for-linus-fixes-3.3' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Tue, 17 Jan 2012 19:56:29 +0000]
Merge branch 'stable/for-linus-fixes-3.3' of git://git./linux/kernel/git/konrad/xen

* 'stable/for-linus-fixes-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/balloon: Move the registration from device to subsystem.

7 years agoACPI processor: Remove unneeded cpuidle_unregister_driver call
Thomas Renninger [Tue, 17 Jan 2012 16:35:22 +0000]
ACPI processor: Remove unneeded cpuidle_unregister_driver call

Since commit 46bcfad7a819bd17ac4e831b04405152d59784ab registering
and unregistering cpuidle is done in processor_idle.c.
Unregistering via:
acpi_bus_unregister_driver(&acpi_processor_driver)
   -> acpi_processor_remove()
      -> acpi_processor_power_exit()

Remove not needed cpuidle_unregister_driver() call from
acpi_processor_exit

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agointel idle: Make idle driver more robust
Thomas Renninger [Sun, 4 Dec 2011 21:17:29 +0000]
intel idle: Make idle driver more robust

kvm -cpu host passes the original cpuid info to the guest.

Latest kvm version seem to return true for mwait_leaf cpuid
function on recent Intel CPUs. But it does not return mwait
C-states (mwait_substates), instead zero is returned.

While real CPUs seem to always return non-zero values, the intel
idle driver should not get active in kvm (mwait_substates == 0)
case and bail out.
Otherwise a Null pointer exception will happen later when the
cpuidle subsystem tries to get active:
[0.984807] BUG: unable to handle kernel NULL pointer dereference at (null)
[0.984807] IP: [<(null)>] (null)
...
[0.984807][<ffffffff8143cf34>] ? cpuidle_idle_call+0xb4/0x340
[0.984807][<ffffffff8159e7bc>] ? __atomic_notifier_call_chain+0x4c/0x70
[0.984807][<ffffffff81001198>] ? cpu_idle+0x78/0xd0

Reference:
https://bugzilla.novell.com/show_bug.cgi?id=726296

Cc: stable@vger.kernel.org
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Bruno Friedmann <bruno@ioda-net.ch>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agointel_idle: Fix a cast to pointer from integer of different size warning in intel_idle
David Howells [Thu, 15 Dec 2011 13:03:14 +0000]
intel_idle: Fix a cast to pointer from integer of different size warning in intel_idle

Fix the following warning:

drivers/idle/intel_idle.c: In function 'intel_idle_cpuidle_devices_init':
drivers/idle/intel_idle.c:518:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

By making get_driver_data() return a long instead of an int.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI: kernel-parameters.txt : Add intel_idle.max_cstate
Masanari Iida [Wed, 14 Dec 2011 16:18:52 +0000]
ACPI: kernel-parameters.txt : Add intel_idle.max_cstate

Add missing intel_idle.max_cstate in kernel-parameters.txt

Signed-off-by Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agointel_idle: remove redundant local_irq_disable() call
Yanmin Zhang [Tue, 10 Jan 2012 23:48:21 +0000]
intel_idle: remove redundant local_irq_disable() call

irq disabling happens earlier in process_32.c:cpu_idle.  Basically,
cpuidle_state->enter is called, cpu irq is disabled.  cpuidle_state->enter
would turn on irq when exiting.

intel_idle doesn't follow this assumption.  Although it doesn't cause real
issue, it misleads developers.  Remove the call to local_irq_disable() at
entry.

[akpm@linux-foundation.org: add comment]
Signed-off-by: Mingming Zhang <mingmingx.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoMerge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze
Linus Torvalds [Tue, 17 Jan 2012 18:49:06 +0000]
Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze

* 'next' of git://git.monstr.eu/linux-2.6-microblaze:
  USB: EHCI: Don't use NO_IRQ in xilinx ehci driver
  microblaze: Add topology init

7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Linus Torvalds [Tue, 17 Jan 2012 18:48:13 +0000]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: virtuoso: Xonar DS: fix polarity of front output
  ALSA: Au88x0 - Reduce the number of playback subdevices of au8830 from 32 to 16
  ALSA: Au88x0 - Support 4 channels playback when AC97 codecs has SDAC bit
  ALSA: HDA: Fix internal microphone on Dell Studio 16 XPS 1645
  ALSA: Don't prompt for CONFIG_SND_COMPRESS_OFFLOAD
  ALSA: HDA: Use LPIB position fix for Macbook Pro 7,1

7 years agotty: remove unused tty_driver->termios_locked
Konstantin Khlebnikov [Tue, 17 Jan 2012 08:54:01 +0000]
tty: remove unused tty_driver->termios_locked

This field is unused since 2.6.28 (commit fe6e29fdb1a7: "tty: simplify
ktermios allocation", to be exact)

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years ago[media] dvb_frontend: Don't call get_frontend() if idle
Mauro Carvalho Chehab [Tue, 17 Jan 2012 18:20:37 +0000]
[media] dvb_frontend: Don't call get_frontend() if idle

If the frontend is in idle state, don't call get_frontend.

Calling get_frontend() when the device is not tuned may
result in wrong parameters to be returned to the
userspace.

I was tempted to not call get_frontend() at all, except
inside the dvb frontend thread, but this won't work for
all cases. The ISDB-T specs (ABNT NBR 15601 and ARIB
STD-B31) allow the broadcaster to dynamically change the
channel specs at runtime. That means that an ISDB-T optimized
application may want/need to monitor the TMCC tables, decoded
at the frontends via get_frontend call.

So, let's do the simpler change here.

Eventually, the logic could be changed to work only if
the device is tuned and has lock, but, even so, the
lock is also standard-dependent. For ISDB-T, the right
lock to wait is that the demod has TMCC lock. So, drivers
may need to implement some logic to detect if the get_frontend
info was retrieved or not.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

7 years agoRevert "capabitlies: ns_capable can use the cap helpers rather than lsm call"
Linus Torvalds [Tue, 17 Jan 2012 18:19:41 +0000]
Revert "capabitlies: ns_capable can use the cap helpers rather than lsm call"

This reverts commit d2a7009f0bb03fa22ad08dd25472efa0568126b9.

J. R. Okajima explains:

 "After this commit, I am afraid access(2) on NFS may not work
  correctly.  The scenario based upon my guess.
   - access(2) overrides the credentials.
   - calls inode_permission() -- ... -- generic_permission() --
      ns_capable().
   - while the old ns_capable() calls security_capable(current_cred()),
     the new ns_capable() calls has_ns_capability(current) --
     security_capable(__task_cred(t)).

  current_cred() returns current->cred which is effective (overridden)
  credentials, but __task_cred(current) returns current->real_cred (the
  NFSD's credential).  And the overridden credentials by access(2) lost."

Requested-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
David S. Miller [Tue, 17 Jan 2012 17:11:52 +0000]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless

7 years agoASoC: Wait for WM8993 FLL to stabilise
Mark Brown [Tue, 17 Jan 2012 16:28:59 +0000]
ASoC: Wait for WM8993 FLL to stabilise

Ensure the FLL is locked before we return from set_fll().

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>

7 years agocaif: Remove bad WARN_ON in caif_dev
sjur.brandeland@stericsson.com [Tue, 17 Jan 2012 03:03:14 +0000]
caif: Remove bad WARN_ON in caif_dev

Remove WARN_ON and bad handling of SKB without destructor callback
in caif_flow_cb. SKB without destructor cannot be handled as an
error case.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agocaif: Fix typo in Vendor/Product-ID for CAIF modems
sjur.brandeland@stericsson.com [Tue, 17 Jan 2012 03:03:13 +0000]
caif: Fix typo in Vendor/Product-ID for CAIF modems

Fix typo for the Vendor/Product Id for ST-Ericsson CAIF modems.
Discovery is based on fixed USB vendor 0x04cc (ST-Ericsson),
product-id 0x230f (NCM).

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agobnx2x: Disable AN KR work-around for BCM57810
Yaniv Rosner [Tue, 17 Jan 2012 02:33:29 +0000]
bnx2x: Disable AN KR work-around for BCM57810

Disable the work-around for the autoneg KR of the BCM57810 in case the Warpcore version is 0xD108 and above, which fixes this problem.

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agobnx2x: Remove AutoGrEEEn for BCM84833
Yaniv Rosner [Tue, 17 Jan 2012 02:33:28 +0000]
bnx2x: Remove AutoGrEEEn for BCM84833

Disable the autoGrEEEn feature for BCM84833.

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agobnx2x: Remove 100Mb force speed for BCM84833
Yaniv Rosner [Tue, 17 Jan 2012 02:33:27 +0000]
bnx2x: Remove 100Mb force speed for BCM84833

Remove unsupported speed of 100Mb force for BCM84833 due to hardware
limitation.

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agobnx2x: Fix PFC setting on BCM57840
Yaniv Rosner [Tue, 17 Jan 2012 02:33:26 +0000]
bnx2x: Fix PFC setting on BCM57840

This patch handles the second port of a path in a 4-port device of
BCM57840.

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agobnx2x: Fix Super-Isolate mode for BCM84833
Yaniv Rosner [Tue, 17 Jan 2012 02:33:25 +0000]
bnx2x: Fix Super-Isolate mode for BCM84833

The Super-Isolate mode comes to isolate the BCM84833 PHY from the
outside world.  Not doing it correctly, made link partner see the link
before the driver was loaded.

This patch also involves SPIROM version fixes since it is used to
determine whether the common init of the PHY was already executed, and
the common init of this PHY is partially responsible for setting the
Super-Isolate mode.

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agonet: fix some sparse errors
Eric Dumazet [Mon, 16 Jan 2012 19:27:39 +0000]
net: fix some sparse errors

make C=2 CF="-D__CHECK_ENDIAN__" M=net

And fix flowi4_init_output() prototype for sport

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agonet: kill duplicate included header
Shan Wei [Mon, 16 Jan 2012 18:34:24 +0000]
net: kill duplicate included header

For net part, remove duplicate included header.

Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agonet: sh-eth: Fix build error by the value which is not defined
Nobuhiro Iwamatsu [Mon, 16 Jan 2012 16:50:16 +0000]
net: sh-eth: Fix build error by the value which is not defined

-----
drivers/net/ethernet/renesas/sh_eth.c:1706: error: 'pdid' undeclared (first use in this function)
drivers/net/ethernet/renesas/sh_eth.c:1706: error: (Each undeclared identifier is reported only once
drivers/net/ethernet/renesas/sh_eth.c:1706: error: for each function it appears in.)
make[5]: *** [drivers/net/ethernet/renesas/sh_eth.o] Error 1
-----

Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
CC: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agonet: Use device model to get driver name in skb_gso_segment()
Ben Hutchings [Mon, 16 Jan 2012 12:38:59 +0000]
net: Use device model to get driver name in skb_gso_segment()

ethtool operations generally require the caller to hold RTNL and are
not safe to call in atomic context.  The device model provides this
information for most devices; we'll only lose it for some old ISA
drivers.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agobridge: BH already disabled in br_fdb_cleanup()
Eric Dumazet [Mon, 16 Jan 2012 04:35:50 +0000]
bridge: BH already disabled in br_fdb_cleanup()

br_fdb_cleanup() is run from timer interrupt, BH already masked.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Štefan Gula <steweg@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agonet: move sock_update_memcg outside of CONFIG_INET
Glauber Costa [Sun, 15 Jan 2012 22:04:39 +0000]
net: move sock_update_memcg outside of CONFIG_INET

Although only used currently for tcp sockets, this function
is now used in common sock code (for sock_clone())

Commit 475f1b52645a29936b9df1d8fcd45f7e56bd4a9f moved the
declaration of sock_update_clone() to inside sock.c, but
this only fixes the problem when CONFIG_CGROUP_MEM_RES_CTLR_KMEM
is also not defined.

This patch here is verified to fix both problems, although
reverting the previous one is not necessary.

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: David S. Miller <davem@davemloft.net>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agomwl8k: Fixing Sparse ENDIAN CHECK warning
Yogesh Ashok Powar [Tue, 17 Jan 2012 10:15:15 +0000]
mwl8k: Fixing Sparse ENDIAN CHECK warning

Fixing following sparse warning
>drivers/net/wireless/mwl8k.c:2780:15: warning: incorrect type in assignment (different base types)
>drivers/net/wireless/mwl8k.c:2780:15:    expected restricted unsigned short [usertype] channel
>drivers/net/wireless/mwl8k.c:2780:15:    got unsigned short [unsigned] [usertype] hw_value

Signed-off-by: Yogesh Ashok Powar <yogeshp@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

7 years agomac80211: Fix possible race between sta_unblock and network softirq
Helmut Schaa [Tue, 17 Jan 2012 08:22:49 +0000]
mac80211: Fix possible race between sta_unblock and network softirq

All other code paths in sta_unblock synchronize with the network
softirq by using local_bh_disable/enable. Do the same around
ieee80211_sta_ps_deliver_wakeup.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

7 years agomwl8k: fix condition in mwl8k_cmd_encryption_remove_key()
Dan Carpenter [Tue, 17 Jan 2012 07:33:31 +0000]
mwl8k: fix condition in mwl8k_cmd_encryption_remove_key()

The intent here was to check whether key->cipher was WEP40 or WEP104.
We do a similar check correctly in several other places in this file.
The current condition is always true.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

7 years agobrcmfmac: work-around gcc 4.7 build issue
Alexandre Oliva [Mon, 16 Jan 2012 19:00:12 +0000]
brcmfmac: work-around gcc 4.7 build issue

Alexandre Oliva <oliva@lsd.ic.unicamp.br> says:

"It's an issue brought about by GCC 4.7's partial-inlining, that ends up
splitting the udelay function just at the wrong spot, in such a way that
some sanity checks for constants fails, and we end up calling
bad_udelay.

This patch fixes the problem.  Feel free to push it upstream if it makes
sense to you."

Signed-off-by: John W. Linville <linville@tuxdriver.com>

7 years agonet: remove version.h includes in net/openvswitch/
Devendra Naga [Sat, 14 Jan 2012 08:16:21 +0000]
net: remove version.h includes in net/openvswitch/

remove version.h includes in net/openswitch/ as reported by make versioncheck.

Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agobql: Fix inconsistency between file mode and attr method.
Hiroaki SHIMODA [Sat, 14 Jan 2012 07:10:21 +0000]
bql: Fix inconsistency between file mode and attr method.

There is no store() method for inflight attribute in the
tx-<n>/byte_queue_limits sysfs directory.
So remove S_IWUSR bit.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agoehea: make some functions and variables static
Thadeu Lima de Souza Cascardo [Fri, 13 Jan 2012 08:06:32 +0000]
ehea: make some functions and variables static

Some functions and variables in ehea are only used in their own file, so
they should be static. One particular function had a very generic name,
print_error_data.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7 years agobrcmsmac: remove PCI suspend/resume from bcma driver
Linus Torvalds [Fri, 13 Jan 2012 22:58:42 +0000]
brcmsmac: remove PCI suspend/resume from bcma driver

The brcmsmac driver isn't a PCI driver any more, it's a bcma one.  The
PCI device has been resumed by the PCI driver (the generic PCI layer,
really), we should be resuming just our own driver state.

Also add pr_debug() calls to show that we now actually get the
suspend/resume events.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

7 years agobcma: connect the bcma bus suspend/resume to the bcma driver suspend/resume
Linus Torvalds [Fri, 13 Jan 2012 22:58:41 +0000]
bcma: connect the bcma bus suspend/resume to the bcma driver suspend/resume

Now the low-level driver actually gets informed that it is getting suspended and resumed.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

7 years agobcma: add stub for bcma_bus_suspend()
Linus Torvalds [Fri, 13 Jan 2012 22:58:40 +0000]
bcma: add stub for bcma_bus_suspend()

.. and connect it up with the pci host bcma driver.

Now, the next step is to connect those bcma bus-level suspend/resume
functions to the actual bcma device suspend resume functions.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

7 years agobcma: convert suspend/resume to pm_ops
Linus Torvalds [Fri, 13 Jan 2012 22:58:39 +0000]
bcma: convert suspend/resume to pm_ops

.. and avoid doing the unnecessary PCI operations - the PCI layer will
do them for us.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

7 years ago[media] DocBook/dvbproperty.xml: Remove DTV_MODULATION from ISDB-T
Mauro Carvalho Chehab [Tue, 17 Jan 2012 12:00:41 +0000]
[media] DocBook/dvbproperty.xml: Remove DTV_MODULATION from ISDB-T

On ISDB-T, each layer can have its own independent modulation,
applied to the carriers that belong to the segments associated
with them. So, there's no sense to define a global modulation
parameter.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

7 years ago[media] DocBook/dvbproperty.xml: Fix ISDB-T delivery system parameters
Mauro Carvalho Chehab [Tue, 17 Jan 2012 11:49:28 +0000]
[media] DocBook/dvbproperty.xml: Fix ISDB-T delivery system parameters

The ISDB-T differs on its way to implement the hierarchical
transmissions: instead of using a low-priority/high-priority
FEC codes, it does that by using different layers, each layer
with their groups of segments. So, those parameters don't make sense
for ISDB-T.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

7 years ago[media] DocBook/dvbproperty.xml: Fix the units for DTV_FREQUENCY
Mauro Carvalho Chehab [Tue, 17 Jan 2012 11:45:48 +0000]
[media] DocBook/dvbproperty.xml: Fix the units for DTV_FREQUENCY

The units for DTV_FREQUENCY are kHz for satellital delivery systems
(DVB-S/DVB-S2/DVB-TURBO/ISDB-S). Fix it at the API spec.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

7 years agoACPI processor: Fix error path, also remove sysdev link
Thomas Renninger [Thu, 17 Nov 2011 22:37:00 +0000]
ACPI processor: Fix error path, also remove sysdev link

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI: processor: fix acpi_get_cpuid for UP processor
Lin Ming [Tue, 13 Dec 2011 01:36:03 +0000]
ACPI: processor: fix acpi_get_cpuid for UP processor

For UP processor, it is likely that no _MAT method or MADT table defined.
So currently acpi_get_cpuid(...) always return -1 for UP processor.
This is wrong. It should return valid value for CPU0.

In the other hand, BIOS may define multiple CPU handles even for UP
processor, for example

        Scope (_PR)
        {
            Processor (CPU0, 0x00, 0x00000410, 0x06) {}
            Processor (CPU1, 0x01, 0x00000410, 0x06) {}
            Processor (CPU2, 0x02, 0x00000410, 0x06) {}
            Processor (CPU3, 0x03, 0x00000410, 0x06) {}
        }

We should only return valid value for CPU0's acpi handle.
And return invalid value for others.

http://marc.info/?t=132329819900003&r=1&w=2

Cc: stable@vger.kernel.org
Reported-and-tested-by: wallak@free.fr
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agointel_idle: fix API misuse
Shaohua Li [Tue, 10 Jan 2012 23:48:19 +0000]
intel_idle: fix API misuse

smp_call_function() only lets all other CPUs execute a specific function,
while we expect all CPUs do in intel_idle.  Without the fix, we could have
one cpu which has auto_demotion enabled or has no broadcast timer setup.
Usually we don't see impact because auto demotion just harms power and the
intel_idle init is called in CPU 0, where boradcast timer delivers
interrupt, but this still could be a problem.

Cc: stable@vger.kernel.org
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agonetfilter: ipset: dumping error triggered removing references twice
Jozsef Kadlecsik [Sat, 14 Jan 2012 15:42:13 +0000]
netfilter: ipset: dumping error triggered removing references twice

If there was a dumping error in the middle, the set-specific variable was
not zeroed out and thus the 'done' function of the dumping wrongly tried
to release the already released reference of the set. The already released
reference was caught by __ip_set_put and triggered a kernel BUG message.
Reported by Jean-Philippe Menil.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

7 years agonetfilter: ipset: autoload set type modules safely
Jozsef Kadlecsik [Tue, 17 Jan 2012 09:39:05 +0000]
netfilter: ipset: autoload set type modules safely

Jan Engelhardt noticed when userspace requests a set type unknown
to the kernel, it can lead to a loop due to the unsafe type module
loading. The issue is fixed in this patch.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

7 years agox86/kconfig: Move the ZONE_DMA entry under a menu
Randy Dunlap [Mon, 16 Jan 2012 19:57:18 +0000]
x86/kconfig: Move the ZONE_DMA entry under a menu

Move the ZONE_DMA kconfig symbol under a menu item instead
of having it listed before everything else in
"make {xconfig | gconfig | nconfig | menuconfig}".

This drops the first line of the top-level kernel config menu
(in 3.2) below and moves it under "Processor type and features".

          [*] DMA memory allocation support
              General setup  --->
          [*] Enable loadable module support  --->
          [*] Enable the block layer  --->
              Processor type and features  --->
              Power management and ACPI options  --->
              Bus options (PCI etc.)  --->
              Executable file formats / Emulations  --->

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-mm@kvack.org <linux-mm@kvack.org>
Link: http://lkml.kernel.org/r/4F14811E.6090107@xenotime.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>

7 years agoACPI APEI: Convert atomicio routines
Myron Stowe [Mon, 7 Nov 2011 23:23:41 +0000]
ACPI APEI: Convert atomicio routines

APEI needs memory access in interrupt context.  The obvious choice is
acpi_read(), but originally it couldn't be used in interrupt context
because it makes temporary mappings with ioremap().  Therefore, we added
drivers/acpi/atomicio.c, which provides:
    acpi_pre_map_gar()     -- ioremap in process context
acpi_atomic_read()     -- memory access in interrupt context
acpi_post_unmap_gar()  -- iounmap

Later we added acpi_os_map_generic_address() (2971852) and enhanced
acpi_read() so it works in interrupt context as long as the address has
been previously mapped (620242a).  Now this sequence:
    acpi_os_map_generic_address()    -- ioremap in process context
    acpi_read()/apei_read()          -- now OK in interrupt context
    acpi_os_unmap_generic_address()
is equivalent to what atomicio.c provides.

This patch introduces apei_read() and apei_write(), which currently are
functional equivalents of acpi_read() and acpi_write().  This is mainly
proactive, to prevent APEI breakages if acpi_read() and acpi_write()
are ever augmented to support the 'bit_offset' field of GAS, as APEI's
__apei_exec_write_register() precludes splitting up functionality
related to 'bit_offset' and APEI's 'mask' (see its
APEI_EXEC_PRESERVE_REGISTER block).

With apei_read() and apei_write() in place, usages of atomicio routines
are converted to apei_read()/apei_write() and existing calls within
osl.c and the CA, based on the re-factoring that was done in an earlier
patch series - http://marc.info/?l=linux-acpi&m=128769263327206&w=2:
    acpi_pre_map_gar()     -->  acpi_os_map_generic_address()
    acpi_post_unmap_gar()  -->  acpi_os_unmap_generic_address()
    acpi_atomic_read()     -->  apei_read()
    acpi_atomic_write()    -->  apei_write()

Note that acpi_read() and acpi_write() currently use 'bit_width'
for accessing GARs which seems incorrect.  'bit_width' is the size of
the register, while 'access_width' is the size of the access the
processor must generate on the bus.  The 'access_width' may be larger,
for example, if the hardware only supports 32-bit or 64-bit reads.  I
wanted to minimize any possible impacts with this patch series so I
did *not* change this behavior.

Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI: Export interfaces for ioremapping/iounmapping ACPI registers
Myron Stowe [Mon, 7 Nov 2011 23:23:34 +0000]
ACPI: Export interfaces for ioremapping/iounmapping ACPI registers

Export remapping and unmapping interfaces - acpi_os_map_generic_address()
and acpi_os_unmap_generic_address() - for ACPI generic registers that are
backed by memory mapped I/O (MMIO).

The acpi_os_map_generic_address() and acpi_os_unmap_generic_address()
declarations may more properly belong in include/acpi/acpiosxf.h next to
acpi_os_read_memory() but I believe that would require the ACPI CA making
them an official part of the ACPI CA - OS interface.

ACPI Generic Address Structure (GAS) reference (ACPI's fixed/generic
hardware registers use the GAS format):
  ACPI Specification, Revision 4.0, Section 5.2.3.1, "Generic Address
  Structure"

Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI: Fix possible alignment issues with GAS 'address' references
Myron Stowe [Mon, 7 Nov 2011 23:23:27 +0000]
ACPI: Fix possible alignment issues with GAS 'address' references

Generic Address Structures (GAS) may reside within ACPI tables which
are byte aligned.  This patch copies GAS 'address' references to a local
variable, which will be naturally aligned, to be used going forward.

ACPI Generic Address Structure (GAS) reference:
  ACPI Specification, Revision 4.0, Section 5.2.3.1, "Generic Address
  Structure"

Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI, ia64: Use SRAT table rev to use 8bit or 16/32bit PXM fields (ia64)
Kurt Garloff [Tue, 17 Jan 2012 09:21:49 +0000]
ACPI, ia64: Use SRAT table rev to use 8bit or 16/32bit PXM fields (ia64)

In SRAT v1, we had 8bit proximity domain (PXM) fields; SRAT v2 provides
32bits for these. The new fields were reserved before.
According to the ACPI spec, the OS must disregrard reserved fields.

ia64 did handle the PXM fields almost consistently, but depending on
sgi's sn2 platform. This patch leaves the sn2 logic in, but does also
use 16/32 bits for PXM if the SRAT has rev 2 or higher.

The patch also adds __init to the two pxm accessor functions, as they
access __initdata now and are called from an __init function only anyway.

Note that the code only uses 16 bits for the PXM field in the processor
proximity field; the patch does not address this as 16 bits are more than
enough.

Signed-off-by: Kurt Garloff <kurt@garloff.de>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI, x86: Use SRAT table rev to use 8bit or 32bit PXM fields (x86/x86-64)
Kurt Garloff [Tue, 17 Jan 2012 09:20:31 +0000]
ACPI, x86: Use SRAT table rev to use 8bit or 32bit PXM fields (x86/x86-64)

In SRAT v1, we had 8bit proximity domain (PXM) fields; SRAT v2 provides
32bits for these. The new fields were reserved before.
According to the ACPI spec, the OS must disregrard reserved fields.

x86/x86-64 was rather inconsistent prior to this patch; it used 8 bits
for the pxm field in cpu_affinity, but 32 bits in mem_affinity.
This patch makes it consistent: Either use 8 bits consistently (SRAT
rev 1 or lower) or 32 bits (SRAT rev 2 or higher).

cc: x86@kernel.org
Signed-off-by: Kurt Garloff <kurt@garloff.de>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI: Store SRAT table revision
Kurt Garloff [Tue, 17 Jan 2012 09:18:02 +0000]
ACPI: Store SRAT table revision

In SRAT v1, we had 8bit proximity domain (PXM) fields; SRAT v2 provides
32bits for these. The new fields were reserved before.
According to the ACPI spec, the OS must disregrard reserved fields.
In order to know whether or not, we must know what version the SRAT
table has.

This patch stores the SRAT table revision for later consumption
by arch specific __init functions.

Signed-off-by: Kurt Garloff <kurt@garloff.de>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI, APEI, Resolve false conflict between ACPI NVS and APEI
Huang Ying [Thu, 8 Dec 2011 03:25:50 +0000]
ACPI, APEI, Resolve false conflict between ACPI NVS and APEI

Some firmware will access memory in ACPI NVS region via APEI.  That
is, instructions in APEI ERST/EINJ table will read/write ACPI NVS
region.  The original resource conflict checking in APEI code will
check memory/ioport accessed by APEI via general resource management
mech.  But ACPI NVS region is marked as busy already, so that the
false resource conflict will prevent APEI ERST/EINJ to work.

To fix this, this patch excludes ACPI NVS regions when APEI components
request resources.  So that they will not conflict with ACPI NVS
regions.

Reported-and-tested-by: Pavel Ivanov <paivanof@gmail.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI, Record ACPI NVS regions
Huang Ying [Thu, 8 Dec 2011 03:25:49 +0000]
ACPI, Record ACPI NVS regions

Some firmware will access memory in ACPI NVS region via APEI.  That
is, instructions in APEI ERST/EINJ table will read/write ACPI NVS
region.  The original resource conflict checking in APEI code will
check memory/ioport accessed by APEI via general resource management
mechanism.  But ACPI NVS region is marked as busy already, so that the
false resource conflict will prevent APEI ERST/EINJ to work.

To fix this, this patch record ACPI NVS regions, so that we can avoid
request resources for memory region inside it.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI, APEI, EINJ, Refine the fix of resource conflict
Xiao, Hui [Thu, 8 Dec 2011 03:25:48 +0000]
ACPI, APEI, EINJ, Refine the fix of resource conflict

Current fix for resource conflict is to remove the address region <param1 &
param2, ~param2+1> from trigger resource, which is highly relies on valid user
input. This patch is trying to avoid such potential issues by fetching the
exact address region from trigger action table entry.

Signed-off-by: Xiao, Hui <hui.xiao@linux.intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI, APEI, EINJ, Fix resource conflict on some machine
Huang Ying [Thu, 8 Dec 2011 03:25:47 +0000]
ACPI, APEI, EINJ, Fix resource conflict on some machine

Some APEI firmware implementation will access injected address
specified in param1 to trigger the error when injecting memory error.
This will cause resource conflict with RAM.

On one of our testing machine, if injecting at memory address
0x10000000, the following error will be reported in dmesg:

  APEI: Can not request iomem region <0000000010000000-0000000010000008> for GARs.

This patch removes the injecting memory address range from trigger
table resources to avoid conflict.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI, Add RAM mapping support to ACPI atomic IO support
Huang Ying [Thu, 8 Dec 2011 03:25:46 +0000]
ACPI, Add RAM mapping support to ACPI atomic IO support

On one of our testing machine, the following EINJ command lines:

  # echo 0x10000000 > param1
  # echo 0xfffffffffffff000 > param2
  # echo 0x8 > error_type
  # echo 1 > error_inject

Will get:

  echo: write error: Input/output error

The EIO comes from:

    rc = apei_exec_pre_map_gars(&trigger_ctx);

The root cause is as follow.  Normally, ACPI atomic IO support is used
to access IO memory.  But in EINJ of that machine, it is used to
access RAM to trigger the injected error.  And the ioremap() called by
apei_exec_pre_map_gars() can not map the RAM.

This patch add RAM mapping support to ACPI atomic IO support to
satisfy EINJ requirement.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI, APEI, Printk queued error record before panic
Huang Ying [Thu, 8 Dec 2011 03:25:45 +0000]
ACPI, APEI, Printk queued error record before panic

Because printk is not safe inside NMI handler, the recoverable error
records received in NMI handler will be queued to be printked in a
delayed IRQ context via irq_work.  If a fatal error occurs after the
recoverable error and before the irq_work processed, we lost a error
report.

To solve the issue, the queued error records are printked in NMI
handler if system will go panic.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI, APEI, GHES, Distinguish interleaved error report in kernel log
Huang Ying [Thu, 8 Dec 2011 03:25:44 +0000]
ACPI, APEI, GHES, Distinguish interleaved error report in kernel log

In most cases, printk only guarantees messages from different printk
calling will not be interleaved between each other.  But, one APEI
GHES hardware error report will involve multiple printk calling,
normally each for one line.  So it is possible that the hardware error
report comes from different generic hardware error source will be
interleaved.

In this patch, a sequence number is prefixed to each line of error
report.  So that, even if they are interleaved, they still can be
distinguished by the prefixed sequence number.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI, APEI, Remove table not found message
Huang Ying [Thu, 8 Dec 2011 03:25:43 +0000]
ACPI, APEI, Remove table not found message

Because APEI tables are optional, these message may confuse users, for
example,

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/599715

Reported-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI, APEI, Print resource errors in conventional format
Bjorn Helgaas [Thu, 8 Dec 2011 03:25:42 +0000]
ACPI, APEI, Print resource errors in conventional format

Use the normal %pR-like format for MMIO and I/O port ranges.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI, APEI, GHES: Add PCIe AER recovery support
Huang Ying [Thu, 8 Dec 2011 03:25:41 +0000]
ACPI, APEI, GHES: Add PCIe AER recovery support

aer_recover_queue() is called when recoverable PCIe AER errors are
notified by firmware to do the recovery work.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoACPI, Add 64bit read/write support to atomicio on i386
Huang Ying [Thu, 8 Dec 2011 03:25:40 +0000]
ACPI, Add 64bit read/write support to atomicio on i386

There is no 64bit read/write support in ACPI atomicio because
readq/writeq is used to implement 64bit read/write, but readq/writeq
is not available on i386.  This patch implement 64bit read/write
support in atomicio via two readl/writel.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

7 years agoMerge branch 'tip/perf/urgent-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Ingo Molnar [Tue, 17 Jan 2012 08:51:46 +0000]
Merge branch 'tip/perf/urgent-2' of git://git./linux/kernel/git/rostedt/linux-trace into perf/urgent