8 years agonet neigh: RCU conversion of neigh hash table
Eric Dumazet [Mon, 4 Oct 2010 06:15:44 +0000]
net neigh: RCU conversion of neigh hash table

David

This is the first step for RCU conversion of neigh code.

Next patches will convert hash_buckets[] and "struct neighbour" to RCU
protected objects.

Thanks

[PATCH net-next] net neigh: RCU conversion of neigh hash table

Instead of storing hash_buckets, hash_mask and hash_rnd in "struct
neigh_table", a new structure is defined :

struct neigh_hash_table {
       struct neighbour        **hash_buckets;
       unsigned int            hash_mask;
       __u32                   hash_rnd;
       struct rcu_head         rcu;
};

And "struct neigh_table" has an RCU protected pointer to such a
neigh_hash_table.

This means the signature of (*hash)() function changed: We need to add a
third parameter with the actual hash_rnd value, since this is not
anymore a neigh_table field.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agonet neigh: neigh_delete() and neigh_add() changes
Eric Dumazet [Mon, 4 Oct 2010 04:27:36 +0000]
net neigh: neigh_delete() and neigh_add() changes

neigh_delete() and neigh_add() dont need to touch device refcount,
we hold RTNL when calling them, so device cannot disappear under us.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agonet: add a core netdev->rx_dropped counter
Eric Dumazet [Thu, 30 Sep 2010 21:06:55 +0000]
net: add a core netdev->rx_dropped counter

In various situations, a device provides a packet to our stack and we
drop it before it enters protocol stack :
- softnet backlog full (accounted in /proc/net/softnet_stat)
- bad vlan tag (not accounted)
- unknown/unregistered protocol (not accounted)

We can handle a per-device counter of such dropped frames at core level,
and automatically adds it to the device provided stats (rx_dropped), so
that standard tools can be used (ifconfig, ip link, cat /proc/net/dev)

This is a generalization of commit 8990f468a (net: rx_dropped
accounting), thus reverting it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoppp: Use a real SKB control block in fragmentation engine.
David S. Miller [Tue, 5 Oct 2010 08:36:52 +0000]
ppp: Use a real SKB control block in fragmentation engine.

Do this instead of subverting fields in skb proper.

The macros that could very easily match variable or function
names were also just asking for trouble.

Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoipv6: make __ipv6_isatap_ifid static
stephen hemminger [Mon, 4 Oct 2010 20:17:53 +0000]
ipv6: make __ipv6_isatap_ifid static

Another exported symbol only used in one file

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agofib: fib_rules_cleanup can be static
stephen hemminger [Mon, 4 Oct 2010 20:14:17 +0000]
fib: fib_rules_cleanup can be static

fib_rules_cleanup_ups is only defined and used in one place.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agofib: cleanups
Eric Dumazet [Mon, 4 Oct 2010 20:00:18 +0000]
fib: cleanups

Code style cleanups before upcoming functional changes.
C99 initializer for fib_props array.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agowimax: make functions local
stephen hemminger [Mon, 4 Oct 2010 19:59:59 +0000]
wimax: make functions local

Make wimax variables and functions local if possible.
Compile tested only.

This also removes a couple of unused EXPORT_SYMBOL.
If this breaks some out of tree code, please fix that
by putting the code in the kernel tree.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoqlcnic: remove dead code
stephen hemminger [Mon, 4 Oct 2010 15:44:30 +0000]
qlcnic: remove dead code

This driver has several pieces of dead code (found by running
make namespacecheck). This patch removes them.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agocaif: remove duplicated include
Nicolas Kaiser [Mon, 4 Oct 2010 04:35:39 +0000]
caif: remove duplicated include

Remove duplicated include.

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agodon't let BCM63XX_PHY depend on non-existant symbol
Uwe Kleine-König [Sun, 3 Oct 2010 23:43:33 +0000]
don't let BCM63XX_PHY depend on non-existant symbol

The kernel doesn't have a symbol called BCM63XX.  There is a symbol
BCM63XX_ENET (introduced in 9b1fc55a0500, 6 weeks after 09bb9aa0ed that
introduced BCM63XX_PHY), but the driver compiles without that, too.

Cc: Maxime Bizon <mbizon@freebox.fr>
Cc: Florian Fainelli <florian@openwrt.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agonet/phy: fix many "defined but unused" warnings
Uwe Kleine-König [Sun, 3 Oct 2010 23:43:32 +0000]
net/phy: fix many "defined but unused" warnings

MODULE_DEVICE_TABLE only expands to something if it's compiled
for a module.  So when building-in support for the phys, the
mdio_device_id tables are unused.  Marking them with __maybe_unused
fixes the following warnings:

drivers/net/phy/bcm63xx.c:134: warning: 'bcm63xx_tbl' defined but not used
drivers/net/phy/broadcom.c:933: warning: 'broadcom_tbl' defined but not used
drivers/net/phy/cicada.c:162: warning: 'cicada_tbl' defined but not used
drivers/net/phy/davicom.c:222: warning: 'davicom_tbl' defined but not used
drivers/net/phy/et1011c.c:114: warning: 'et1011c_tbl' defined but not used
drivers/net/phy/icplus.c:137: warning: 'icplus_tbl' defined but not used
drivers/net/phy/lxt.c:226: warning: 'lxt_tbl' defined but not used
drivers/net/phy/marvell.c:724: warning: 'marvell_tbl' defined but not used
drivers/net/phy/micrel.c:234: warning: 'micrel_tbl' defined but not used
drivers/net/phy/national.c:154: warning: 'ns_tbl' defined but not used
drivers/net/phy/qsemi.c:141: warning: 'qs6612_tbl' defined but not used
drivers/net/phy/realtek.c:82: warning: 'realtek_tbl' defined but not used
drivers/net/phy/smsc.c:257: warning: 'smsc_tbl' defined but not used
drivers/net/phy/ste10Xp.c:135: warning: 'ste10Xp_tbl' defined but not used
drivers/net/phy/vitesse.c:195: warning: 'vitesse_tbl' defined but not used

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agonet: relax rtnl_dereference()
David S. Miller [Tue, 5 Oct 2010 07:29:48 +0000]
net: relax rtnl_dereference()

rtnl_dereference() is used in contexts where RTNL is held, to fetch an
RCU protected pointer.

Updates to this pointer are prevented by RTNL, so we dont need
smp_read_barrier_depends() and the ACCESS_ONCE() provided in
rcu_dereference_check().

rtnl_dereference() is mainly a macro to document the locking invariant.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoipvs: Use frag walker helper in SCTP proto support.
David S. Miller [Tue, 5 Oct 2010 07:27:05 +0000]
ipvs: Use frag walker helper in SCTP proto support.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Simon Horman <horms@verge.net.au>

8 years agonet: dynamic ingress_queue allocation
Eric Dumazet [Sat, 2 Oct 2010 06:11:55 +0000]
net: dynamic ingress_queue allocation

ingress being not used very much, and net_device->ingress_queue being
quite a big object (128 or 256 bytes), use a dynamic allocation if
needed (tc qdisc add dev eth0 ingress ...)

dev_ingress_queue(dev) helper should be used only with RTNL taken.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoqlcnic: set mtu lower limit
Sritej Velaga [Mon, 4 Oct 2010 04:20:16 +0000]
qlcnic: set mtu lower limit

Setting mtu < 68 is not supported.

Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoqlcnic: cleanup port mode setting
Sritej Velaga [Mon, 4 Oct 2010 04:20:15 +0000]
qlcnic: cleanup port mode setting

Port mode setting is not required for Qlogic CNA adapters.

Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoqlcnic: sparse warning fixes
Sucheta Chakraborty [Mon, 4 Oct 2010 04:20:14 +0000]
qlcnic: sparse warning fixes

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoqlcnic: fix vlan TSO on big endian machine
Sucheta Chakraborty [Mon, 4 Oct 2010 04:20:13 +0000]
qlcnic: fix vlan TSO on big endian machine

o desc->vlan_tci is in __le16 format. Doing htons and
  cpu_to_le64 again on vlan_tci, result in invalid value on ppc.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoqlcnic: fix endianess for lro
Sucheta Chakraborty [Mon, 4 Oct 2010 04:20:12 +0000]
qlcnic: fix endianess for lro

ipaddress in ifa->ifa_address field are in big endian format.
Also device requires ip address in big endian only.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoqlcnic: fix diag register
Amit Kumar Salecha [Mon, 4 Oct 2010 04:20:11 +0000]
qlcnic: fix diag register

regs_buff[i] and diag_registers[j] array should use different index
variable.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoqlcnic: fix eswitch stats
Amit Kumar Salecha [Mon, 4 Oct 2010 04:20:10 +0000]
qlcnic: fix eswitch stats

Some of the counters are not implemented in fw.
Fw return NOT AVAILABLE VALUE as (0xffffffffffffffff).
Adding these counters, result in invalid value.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoqlcnic: fix internal loopback test
Amit Kumar Salecha [Mon, 4 Oct 2010 04:20:09 +0000]
qlcnic: fix internal loopback test

o Loop 10 times with delay of 1 ms to rcv packet.
o Print garbage packet.
o Try send/receive MAX(16) packet, instead of exit from test,
  if a packet is not received.

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoMerge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
David S. Miller [Mon, 4 Oct 2010 18:56:38 +0000]
Merge branch 'master' of /linux/kernel/git/davem/net-2.6

Conflicts:
net/ipv4/Kconfig
net/ipv4/tcp_timer.c

8 years agonet: introduce DST_NOCACHE flag
Eric Dumazet [Mon, 4 Oct 2010 05:17:54 +0000]
net: introduce DST_NOCACHE flag

While doing stress tests with IP route cache disabled, and multi queue
devices, I noticed a very high contention on one rwlock used in
neighbour code.

When many cpus are trying to send frames (possibly using a high
performance multiqueue device) to the same neighbour, they fight for the
neigh->lock rwlock in order to call neigh_hh_init(), and fight on
hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test())

But we dont need to call neigh_hh_init() for dst that are used only
once. It costs four atomic operations at least, on two contended cache
lines, plus the high contention on neigh->lock rwlock.

Introduce a new dst flag, DST_NOCACHE, that is set when dst was not
inserted in route cache.

With the stress test bench, sending 160000000 frames on one neighbour,
results are :

Before patch:

real 2m28.406s
user 0m11.781s
sys 36m17.964s

After patch:

real 1m26.532s
user 0m12.185s
sys 20m3.903s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agosctp: Fix break indentation in sctp_ioctl().
David S. Miller [Mon, 4 Oct 2010 05:14:37 +0000]
sctp: Fix break indentation in sctp_ioctl().

Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agobe2net: add multiple RX queue support
Sathya Perla [Mon, 4 Oct 2010 05:12:27 +0000]
be2net: add multiple RX queue support

This patch adds multiple RX queue support to be2net. There are
upto 4 extra rx-queues per port into which TCP/UDP traffic can be hashed into.
Some of the ethtool stats are now displayed on a per queue basis.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville...
David S. Miller [Mon, 4 Oct 2010 05:09:32 +0000]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless-next-2.6

8 years agoqeth: tagging with VLAN-ID 0
Ursula Braun [Fri, 1 Oct 2010 02:51:13 +0000]
qeth: tagging with VLAN-ID 0

This patch adapts qeth to handle tagged frames with VLAN-ID 0 and
with or without priority information in the tag. It enables qeth to
receive priority-tagged frames on a base interface, for example from
z/OS, without configuring an additional VLAN interface.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agocxgb4: remove a bogus PCI function number check
Dimitris Michailidis [Thu, 30 Sep 2010 09:17:12 +0000]
cxgb4: remove a bogus PCI function number check

Remove a bogus PCI function number check from the driver's .remove
method that causes pci_release_regions not to be called for function 0
if additional functions are attached and one of them is used as primary.

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agodrivers/atm/idt77252.c: Remove unnecessary error check
Julia Lawall [Sat, 2 Oct 2010 04:37:07 +0000]
drivers/atm/idt77252.c: Remove unnecessary error check

This code does not call deinit_card(card); in an error case, as done in
other error-handling code in the same function.  But actually, the called
function init_sram can only return 0, so there is no need for the error
check at all.

init_sram is also given a void return type, and its single return statement
at the end of the function is dropped.

A simplified version of the sematic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
@r@
statement S1,S2,S3;
constant C1,C2,C3;
@@

*if (...)
 {... S1 return -C1;}
...
*if (...)
 {... when != S1
    return -C2;}
...
*if (...)
 {... S1 return -C3;}
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agodrivers-net-tulip-de4x5c-fix-copy-length-in-de4x5_ioctl-checkpatch-fixes
Andrew Morton [Fri, 1 Oct 2010 11:17:12 +0000]
drivers-net-tulip-de4x5c-fix-copy-length-in-de4x5_ioctl-checkpatch-fixes

ERROR: trailing statements should be on next line
#23: FILE: drivers/net/tulip/de4x5.c:5477:
+ if (copy_to_user(ioc->data, tmp.lval, ioc->len)) return -EFAULT;

total: 1 errors, 0 warnings, 8 lines checked

./patches/drivers-net-tulip-de4x5c-fix-copy-length-in-de4x5_ioctl.patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Cc: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agosctp: Fix out-of-bounds reading in sctp_asoc_get_hmac()
Dan Rosenberg [Fri, 1 Oct 2010 11:51:47 +0000]
sctp: Fix out-of-bounds reading in sctp_asoc_get_hmac()

The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids
array and attempts to ensure that only a supported hmac entry is
returned.  The current code fails to do this properly - if the last id
in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the
id integer remains set after exiting the loop, and the address of an
out-of-bounds entry will be returned and subsequently used in the parent
function, causing potentially ugly memory corruption.  This patch resets
the id integer to 0 on encountering an invalid id so that NULL will be
returned after finishing the loop if no valid ids are found.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agosctp: prevent reading out-of-bounds memory
Dan Rosenberg [Fri, 1 Oct 2010 11:16:58 +0000]
sctp: prevent reading out-of-bounds memory

Two user-controlled allocations in SCTP are subsequently dereferenced as
sockaddr structs, without checking if the dereferenced struct members fall
beyond the end of the allocated chunk.  There doesn't appear to be any
information leakage here based on how these members are used and
additional checking, but it's still worth fixing.

[akpm@linux-foundation.org: remove unfashionable newlines, fix gmail tab->space conversion]
Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoipv4: correct IGMP behavior on v3 query during v2-compatibility mode
David Stevens [Thu, 30 Sep 2010 14:29:40 +0000]
ipv4: correct IGMP behavior on v3 query during v2-compatibility mode

A recent patch to allow IGMPv2 responses to IGMPv3 queries
bypasses length checks for valid query lengths, incorrectly
resets the v2_seen timer, and does not support IGMPv1.

The following patch responds with a v2 report as required
by IGMPv2 while correcting the other problems introduced
by the patch.

Signed-Off-By: David L Stevens <dlstevens@us.ibm.com>

Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoipmr: cleanups
Eric Dumazet [Fri, 1 Oct 2010 16:15:29 +0000]
ipmr: cleanups

Various code style cleanups

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoipmr: RCU protection for mfc_cache_array
Eric Dumazet [Fri, 1 Oct 2010 16:15:08 +0000]
ipmr: RCU protection for mfc_cache_array

Use RCU & RTNL protection for mfc_cache_array[]

ipmr_cache_find() is called under rcu_read_lock();

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoipmr: RCU conversion of mroute_sk
Eric Dumazet [Fri, 1 Oct 2010 16:15:01 +0000]
ipmr: RCU conversion of mroute_sk

Use RCU and RTNL to protect (struct mr_table)->mroute_sk

Readers use RCU, writers use RTNL.

ip_ra_control() already use an RCU grace period before
ip_ra_destroy_rcu(), so we dont need synchronize_rcu() in
mrtsock_destruct()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoipmr: __pim_rcv() is called under rcu_read_lock
Eric Dumazet [Fri, 1 Oct 2010 16:14:55 +0000]
ipmr: __pim_rcv() is called under rcu_read_lock

No need to get a reference on reg_dev and release it, we are in a
rcu_read_lock() protected section.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agogre: protocol table can be static
stephen hemminger [Fri, 1 Oct 2010 13:58:00 +0000]
gre: protocol table can be static

This table is only used in gre.c

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agonetdev: Depend on INET before selecting INET_LRO
Ben Hutchings [Sun, 3 Oct 2010 15:42:05 +0000]
netdev: Depend on INET before selecting INET_LRO

Since 'select' ignores dependencies, drivers that select INET_LRO must
depend on INET.  This fixes the broken configuration reported in
<http://article.gmane.org/gmane.linux.kernel/825646>.

Reported-by: Subrata Modak <subrata@linux.vnet.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoRevert "ipv4: Make INET_LRO a bool instead of tristate."
Ben Hutchings [Sun, 3 Oct 2010 15:37:42 +0000]
Revert "ipv4: Make INET_LRO a bool instead of tristate."

This reverts commit e81963b180ac502fda0326edf059b1e29cdef1a2.

LRO is now deprecated in favour of GRO, and only a few drivers use it,
so it is desirable to build it as a module in distribution kernels.

The original change to prevent building it as a module was made in an
attempt to avoid the case where some dependents are set to y and some
to m, and INET_LRO can be set to m rather than y.  However, the
Kconfig system will reliably set INET_LRO=y in this case.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agonet: Fix the condition passed to sk_wait_event()
Nagendra Tomar [Sat, 2 Oct 2010 23:45:06 +0000]
net: Fix the condition passed to sk_wait_event()

This patch fixes the condition (3rd arg) passed to sk_wait_event() in
sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory()
causes the following soft lockup in tcp_sendmsg() when the global tcp
memory pool has exhausted.

>>> snip <<<

localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429]
localhost kernel: CPU 3:
localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200]  [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200
localhost kernel:
localhost kernel: Call Trace:
localhost kernel:  [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200
localhost kernel:  [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel:  [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0
localhost kernel:  [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140
localhost kernel:  [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130
localhost kernel:  [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel:  [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170
localhost kernel:  [vfs_write+0x185/0x190] vfs_write+0x185/0x190
localhost kernel:  [sys_write+0x50/0x90] sys_write+0x50/0x90
localhost kernel:  [system_call+0x7e/0x83] system_call+0x7e/0x83

>>> snip <<<

What is happening is, that the sk_wait_event() condition passed from
sk_stream_wait_memory() evaluates to true for the case of tcp global memory
exhaustion. This is because both sk_stream_memory_free() and vm_wait are true
which causes sk_wait_event() to *not* call schedule_timeout().
Hence sk_stream_wait_memory() returns immediately to the caller w/o sleeping.
This causes the caller to again try allocation, which again fails and again
calls sk_stream_wait_memory(), and so on.

[ Bug introduced by commit c1cbe4b7ad0bc4b1d98ea708a3fecb7362aa4088
  ("[NET]: Avoid atomic xchg() for non-error case") -DaveM ]

Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agonet: Fix IPv6 PMTU disc. w/ asymmetric routes
Maciej Żenczykowski [Sun, 3 Oct 2010 21:49:00 +0000]
net: Fix IPv6 PMTU disc. w/ asymmetric routes

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
John W. Linville [Fri, 1 Oct 2010 15:12:36 +0000]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next-2.6 into for-davem

8 years agoenic: Update MAINTAINERS
Vasanthy Kolluri [Thu, 30 Sep 2010 13:36:05 +0000]
enic: Update MAINTAINERS

Update MAINTAINERS list

Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoenic: Make local functions static
Vasanthy Kolluri [Thu, 30 Sep 2010 13:35:45 +0000]
enic: Make local functions static

Make functions used locally in a file as static

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoenic: Remove dead code
Vasanthy Kolluri [Thu, 30 Sep 2010 13:35:34 +0000]
enic: Remove dead code

Removed code that is unused

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoneigh: reorder fields in struct neighbour
Eric Dumazet [Thu, 30 Sep 2010 05:36:29 +0000]
neigh: reorder fields in struct neighbour

On 64bit arches, there are two 32bit holes that we can remove.

sizeof(struct neighbour) shrinks from 0xf8 to 0xf0 bytes

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoisdn/gigaset: improve bas_gigaset USB error reporting
Tilman Schmidt [Thu, 30 Sep 2010 13:35:52 +0000]
isdn/gigaset: improve bas_gigaset USB error reporting

Rephrase some USB error messages to make them clearer and more consistent.
Downgrade some warning messages that may occur during normal operation to
debug messages.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoisdn/gigaset: fix bas_gigaset interrupt read error handling
Tilman Schmidt [Thu, 30 Sep 2010 13:35:42 +0000]
isdn/gigaset: fix bas_gigaset interrupt read error handling

Rework the handling of USB errors in interrupt input reads
to clear halts correctly, delay URB resubmission after errors,
limit retries, and improve error recovery.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoisdn/gigaset: unclog bas_gigaset AT response pipe
Tilman Schmidt [Thu, 30 Sep 2010 13:35:31 +0000]
isdn/gigaset: unclog bas_gigaset AT response pipe

Recover from a lost HD_RECEIVEATDATA_ACK message by sending a
zero-length HD_READ_ATMESSAGE command when ev_layer sends "+++".

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoisdn/gigaset: try USB reset for bas_gigaset error recovery
Tilman Schmidt [Thu, 30 Sep 2010 13:35:21 +0000]
isdn/gigaset: try USB reset for bas_gigaset error recovery

In error_reset(), if sending HD_RESET_INTERRUPT_PIPE to the device
fails, try performing an USB reset.
Also correct an error in the leading comment.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoisdn/gigaset: bas_gigaset timer cleanup
Tilman Schmidt [Thu, 30 Sep 2010 13:35:11 +0000]
isdn/gigaset: bas_gigaset timer cleanup

Use setup_timer() and mod_timer() instead of direct assignment to
timer structure members, simplify the argument of one timer routine,
and make extra sure all timers are stopped during suspend.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoisdn/gigaset: drop obsolete debug option
Tilman Schmidt [Thu, 30 Sep 2010 13:35:01 +0000]
isdn/gigaset: drop obsolete debug option

Remove the debug flag DEBUG_DRIVER and associated code.
It doesn't serve any useful purpose anymore.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoisdn/gigaset: correct bas_gigaset rx buffer handling
Tilman Schmidt [Thu, 30 Sep 2010 13:34:51 +0000]
isdn/gigaset: correct bas_gigaset rx buffer handling

In transparent data reception, avoid a NULL pointer dereference
in case an skbuff cannot be allocated, remove an inappropriate
call to the HDLC flush routine, and correct the accounting of
received bytes for continued buffers.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoisdn/gigaset: fix bas_gigaset AT read error handling
Tilman Schmidt [Thu, 30 Sep 2010 13:34:40 +0000]
isdn/gigaset: fix bas_gigaset AT read error handling

Rework the handling of USB errors in AT response reads
to fix a possible infinite retry loop and a memory leak,
and silence a few overly verbose kernel messages.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoisdn/gigaset: bas_gigaset locking fix
Tilman Schmidt [Thu, 30 Sep 2010 13:34:30 +0000]
isdn/gigaset: bas_gigaset locking fix

Unlock cs->lock before calling error_hangup() which is marked
"cs->lock must not be held".

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agotg3: Update version to 3.114
Matt Carlson [Thu, 30 Sep 2010 10:34:37 +0000]
tg3: Update version to 3.114

This patch updates the tg3 version to 3.114.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agotg3: Add extend rx ring sizes for 5717 and 5719
Matt Carlson [Thu, 30 Sep 2010 10:34:36 +0000]
tg3: Add extend rx ring sizes for 5717 and 5719

This patch increases the rx ring sizes for those asic revs that support
them.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agotg3: Prepare for larger rx ring sizes
Matt Carlson [Thu, 30 Sep 2010 10:34:35 +0000]
tg3: Prepare for larger rx ring sizes

This patch adds two new variables to track the size of the standard and
jumbo rx producer ring sizes.  The code is then pivoted to these
variables from preprocessor constants.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agotg3: Futureproof the loopback test
Matt Carlson [Thu, 30 Sep 2010 10:34:34 +0000]
tg3: Futureproof the loopback test

There are other multiqueue modes 5717 and 5719 devices can assume.  This
patch makes sure that the loopback test is safe, should those other
modes be enabled in the future.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agotg3: Cleanup missing VPD partno section
Matt Carlson [Thu, 30 Sep 2010 10:34:33 +0000]
tg3: Cleanup missing VPD partno section

This patch cleans up the default VPD partno section.  New entries for
5717 asic rev devices were also added.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agotg3: Remove 5724 device ID
Matt Carlson [Thu, 30 Sep 2010 10:34:32 +0000]
tg3: Remove 5724 device ID

This product was never released to the public.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agotg3: 5719: Prevent tx data corruption
Matt Carlson [Thu, 30 Sep 2010 10:34:31 +0000]
tg3: 5719: Prevent tx data corruption

This patch enables a bit that prevents read DMA overflows and adjusts
the txmbuf margin from the hardware default.  The combination of these
modifications prevents a tx data corruption issue we were seeing on the
5719.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agotg3: Fix potential netpoll crash
Matt Carlson [Thu, 30 Sep 2010 10:34:30 +0000]
tg3: Fix potential netpoll crash

Up until now the tg3 driver would call netif_napi_add() for the maximum
number of NAPI instances the driver could use.  The problem is that
netpoll could call tg3_poll() on instances that are not active.  The net
effect is that the driver will crash attempting to dereference
uninitialized pointers.

The fix is to only allocate as many NAPI instances as the driver would
use in tg3_open() and deleted them in tg3_close().

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoipv4: rcu conversion in ip_route_output_slow
Eric Dumazet [Thu, 30 Sep 2010 03:33:58 +0000]
ipv4: rcu conversion in ip_route_output_slow

ip_route_output_slow() is enclosed in an rcu_read_lock() protected
section, so that no references are taken/released on device, thanks to
__ip_dev_find() & dev_get_by_index_rcu()

Tested with ip route cache disabled, and a stress test :

Before patch:

elapsed time :

real 1m38.347s
user 0m11.909s
sys 23m51.501s

Profile:

13788.00 22.7% ip_route_output_slow [kernel]
 7875.00 13.0% dst_destroy          [kernel]
 3925.00  6.5% fib_semantic_match   [kernel]
 3144.00  5.2% fib_rules_lookup     [kernel]
 3061.00  5.0% dst_alloc            [kernel]
 2276.00  3.7% rt_set_nexthop       [kernel]
 1762.00  2.9% fib_table_lookup     [kernel]
 1538.00  2.5% _raw_read_lock       [kernel]
 1358.00  2.2% ip_output            [kernel]

After patch:

real 1m28.808s
user 0m13.245s
sys 20m37.293s

10950.00 17.2% ip_route_output_slow [kernel]
10726.00 16.9% dst_destroy          [kernel]
 5170.00  8.1% fib_semantic_match   [kernel]
 3937.00  6.2% dst_alloc            [kernel]
 3635.00  5.7% rt_set_nexthop       [kernel]
 2900.00  4.6% fib_rules_lookup     [kernel]
 2240.00  3.5% fib_table_lookup     [kernel]
 1427.00  2.2% _raw_read_lock       [kernel]
 1157.00  1.8% kmem_cache_alloc     [kernel]

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoipv4: introduce __ip_dev_find()
Eric Dumazet [Thu, 30 Sep 2010 03:31:56 +0000]
ipv4: introduce __ip_dev_find()

ip_dev_find(net, addr) finds a device given an IPv4 source address and
takes a reference on it.

Introduce __ip_dev_find(), taking a third argument, to optionally take
the device reference. Callers not asking the reference to be taken
should be in an rcu_read_lock() protected section.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agovlan: dont drop packets from unknown vlans in promiscuous mode
Eric Dumazet [Thu, 30 Sep 2010 02:16:44 +0000]
vlan: dont drop packets from unknown vlans in promiscuous mode

Roger Luethi noticed packets for unknown VLANs getting silently dropped
even in promiscuous mode.

Check for promiscuous mode in __vlan_hwaccel_rx() and vlan_gro_common()
before drops.

As suggested by Patrick, mark such packets to have skb->pkt_type set to
PACKET_OTHERHOST to make sure they are dropped by IP stack.

Reported-by: Roger Luethi <rl@hellgate.ch>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoe1000e: 82579 performance improvements
Bruce Allan [Wed, 29 Sep 2010 21:39:37 +0000]
e1000e: 82579 performance improvements

The initial support for 82579 was tuned poorly for performance.  Adjust the
packet buffer allocation appropriately for both standard and jumbo frames;
and for jumbo frames increase the receive descriptor pre-fetch, disable
adaptive interrupt moderation and set the DMA latency tolerance.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoe1000e: use hardware writeback batching
Jesse Brandeburg [Wed, 29 Sep 2010 21:38:49 +0000]
e1000e: use hardware writeback batching

Most e1000e parts support batching writebacks.  The problem with this is
that when some of the TADV or TIDV timers are not set, Tx can sit forever.

This is solved in this patch with write flushes using the Flush Partial
Descriptors (FPD) bit in TIDV and RDTR.

This improves bus utilization and removes partial writes on e1000e,
particularly from 82571 parts in S5500 chipset based machines.

Only ES2LAN and 82571/2 parts are included in this optimization, to reduce
testing load.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoixgbe: fix link issues and panic with shared interrupts for 82598
Emil Tantilov [Wed, 29 Sep 2010 21:35:23 +0000]
ixgbe: fix link issues and panic with shared interrupts for 82598

Fix possible panic/hang with shared Legacy interrupts by not enabling
interrupts when interface is down.

Also fixes an intermittent link by enabling LSC upon exit from ixgbe_intr()

This patch adds flags to ixgbe_irq_enable() to allow for some flexibility
when enabling interrupts.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoipv4: __mkroute_output() speedup
Eric Dumazet [Wed, 29 Sep 2010 11:53:50 +0000]
ipv4: __mkroute_output() speedup

While doing stress tests with a disabled IP route cache, I found
__mkroute_output() was touching three times in_device atomic refcount.

Use RCU to touch it once to reduce cache line ping pongs.

Before patch

time to perform the test
real 1m42.009s
user 0m12.545s
sys 25m0.726s

Profile :

16109.00 26.4% ip_route_output_slow   vmlinux
 7434.00 12.2% dst_destroy            vmlinux
 3280.00  5.4% fib_rules_lookup       vmlinux
 3252.00  5.3% fib_semantic_match     vmlinux
 2622.00  4.3% fib_table_lookup       vmlinux
 2535.00  4.1% dst_alloc              vmlinux
 1750.00  2.9% _raw_read_lock         vmlinux
 1532.00  2.5% rt_set_nexthop         vmlinux

After patch

real 1m36.503s
user 0m12.977s
sys 23m25.608s

14234.00 22.4% ip_route_output_slow   vmlinux
 8717.00 13.7% dst_destroy            vmlinux
 4052.00  6.4% fib_rules_lookup       vmlinux
 3951.00  6.2% fib_semantic_match     vmlinux
 3191.00  5.0% dst_alloc              vmlinux
 1764.00  2.8% fib_table_lookup       vmlinux
 1692.00  2.7% _raw_read_lock         vmlinux
 1605.00  2.5% rt_set_nexthop         vmlinux

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoPhonet: restore flow control credits when sending fails
Rémi Denis-Courmont [Wed, 29 Sep 2010 22:33:50 +0000]
Phonet: restore flow control credits when sending fails

This patch restores the below flow control patch submitted by Rémi
Denis-Courmont, which accidentaly got lost due to Pipe controller patch
on Phonet.

commit 1a98214feef2221cd7c24b17cd688a5a9d85b2ea
Author: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Date:   Mon Aug 30 12:57:03 2010 +0000

Phonet: restore flow control credits when sending fails

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agonet: pxa168_etc.c recognize additional contributors
Philip Rakity [Tue, 28 Sep 2010 04:26:30 +0000]
net: pxa168_etc.c recognize additional contributors

Signed-off-by: Philip Rakity <prakity@marvell.com>
Signed-off-by: Sachin Sanap <ssanap@marvell.com>
Signed-off-by: Mark Brown <markb@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
David S. Miller [Thu, 30 Sep 2010 19:02:22 +0000]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-2.6

8 years agoip_gre: comments change
Eric Dumazet [Thu, 30 Sep 2010 06:35:10 +0000]
ip_gre: comments change

HARD_TX_LOCK no longer protects tunnels from dead loops,
but xmit_recursion percpu counter.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agode2104x: remove experimental status
Ondrej Zary [Tue, 28 Sep 2010 08:46:17 +0000]
de2104x: remove experimental status

It should be ready after 8 years...remove the experimental dependency.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agode2104x: disable media debug messages by default
Ondrej Zary [Tue, 28 Sep 2010 08:18:55 +0000]
de2104x: disable media debug messages by default

Print media debug messages only when HW debug is enabled.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agomyri10ge: DCA update (resubmit)
Andrew Gallatin [Tue, 28 Sep 2010 08:13:12 +0000]
myri10ge: DCA update (resubmit)

This patch contains the following DCA improvements to myri10ge:

1) Finally move myri10ge to use dca3 API

2) Disable PCIe relaxed ordering when enabling DCA on
     myri10ge.  This provides a performance boost on Nehalem
     based Xeons

3) Make sure to properly initialize NIC's DCA state when it is enabled,
     rather than giving the NIC a bogus tag (0) and waiting for
     the first received packet to trigger an update.  Not using a
     real tag can cause hardware exceptions on some motherboards
     when a CPU socket is empty.

3) Always update the cached CPU when our interrupt affinity changes
     so as to avoid excessive calls to dca3_get_tag()

Signed-off-by: Andrew Gallatin <gallatin@myri.com>
Signed-off-by: Loic Prylli <loic@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agobnx2x: Moved enabling of MSI to the bnx2x_set_num_queues()
Dmitry Kravkov [Wed, 29 Sep 2010 01:05:37 +0000]
bnx2x: Moved enabling of MSI to the bnx2x_set_num_queues()

Moved enabling of MSI to the bnx2x_set_num_queues() - the same functions that
 handles the initialization of the MSI-X.

From: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agotcp: tcp_enter_quickack_mode can be static
stephen hemminger [Tue, 28 Sep 2010 19:30:14 +0000]
tcp: tcp_enter_quickack_mode can be static

Function only used in tcp_input.c

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoarp: remove unnecessary export of arp_broken_ops
stephen hemminger [Tue, 28 Sep 2010 17:08:02 +0000]
arp: remove unnecessary export of arp_broken_ops

arp_broken_ops is only used in arp.c

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoPhonet: Correct header retrieval after pskb_may_pull
Kumar Sanghvi [Mon, 27 Sep 2010 23:10:42 +0000]
Phonet: Correct header retrieval after pskb_may_pull

Retrieve the header after doing pskb_may_pull since, pskb_may_pull
could change the buffer structure.

This is based on the comment given by Eric Dumazet on Phonet
Pipe controller patch for a similar problem.

Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoum: Proper Fix for f25c80a4: remove duplicate structure field initialization
Boaz Harrosh [Wed, 29 Sep 2010 08:34:27 +0000]
um: Proper Fix for f25c80a4: remove duplicate structure field initialization

uml_net_set_mac() was broken and luckily it was never used, before.
What it was trying to do is spin_lock before memcopy the mac address.
Linus attempted to fix it in assumption that someone decided the
lock was needed. But since it was never ever used at all, and was
just dead code, I think we can assume that it is not needed, after
all.

On the other hand patch [f25c80a4] was trying to use eth_mac_addr()
in eth_configure(), *which was the real fallout*. Because of state
checks done inside eth_mac_addr() the address was never set. I have
not reintroduced the memcpy wrapper, but I've put a comment for future
cats.

The code now is back to exactly as it was before [f25c80a4]. With
the cleanup applied. If the spin_lock is indeed needed then a contender
should supply a test case that fails, then fix it with the proper
locking, as a separate unrelated patch.

CC: Julia Lawall <julia@diku.dk>
CC: David S. Miller <davem@davemloft.net>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Al Viro <viro@ZenIV.linux.org.uk>
Tested-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agonet: rename netdev rx_queue to ingress_queue
Eric Dumazet [Tue, 28 Sep 2010 05:58:37 +0000]
net: rename netdev rx_queue to ingress_queue

There is some confusion with rx_queue name after RPS, and net drivers
private rx_queue fields.

I suggest to rename "struct net_device"->rx_queue to ingress_queue.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoip6tnl: percpu stats accounting
Eric Dumazet [Tue, 28 Sep 2010 03:23:34 +0000]
ip6tnl: percpu stats accounting

Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.

Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.

This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoipip: enable lockless xmits
Eric Dumazet [Tue, 28 Sep 2010 00:17:17 +0000]
ipip: enable lockless xmits

IPIP tunnels can benefit from lockless xmits, using NETIF_F_LLTX

Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one ipip tunnel (size:200 bytes per frame)

Before patch :
real 2m53.321s
user 0m10.277s
sys 46m0.597s

After patch:
real 0m32.063s
user 0m9.237s
sys 8m16.255s

Last problem to solve is the contention on dst :

16118.00 28.3% __ip_route_output_key         vmlinux
 6135.00 10.8% dst_release                   vmlinux
 3220.00  5.6% ip_finish_output              vmlinux
 2149.00  3.8% ip_route_output_flow          vmlinux
 1575.00  2.8% ip_append_data                vmlinux
 1481.00  2.6% ip_push_pending_frames        vmlinux
 1349.00  2.4% __xfrm_lookup                 vmlinux
 1216.00  2.1% csum_partial_copy_generic     vmlinux
 1208.00  2.1% udp_sendmsg                   vmlinux

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoip_gre: lockless xmit
Eric Dumazet [Mon, 27 Sep 2010 23:05:47 +0000]
ip_gre: lockless xmit

GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX

Note: If tunnels are created with the "oseq" option, LLTX is not
enabled :

Even using an atomic_t o_seq, we would increase chance for packets being
out of order at receiver.

Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one gre tunnel (size:200 bytes per frame)

Before patch :
real 3m0.094s
user 0m9.365s
sys 47m50.103s

After patch:
real 0m29.756s
user 0m11.097s
sys 7m33.012s

Last problem to solve is the contention on dst :

38660.00 21.4% __ip_route_output_key          vmlinux
20786.00 11.5% dst_release                    vmlinux
14191.00  7.8% __xfrm_lookup                  vmlinux
12410.00  6.9% ip_finish_output               vmlinux
 4540.00  2.5% ip_push_pending_frames         vmlinux
 4427.00  2.4% ip_append_data                 vmlinux
 4265.00  2.4% __alloc_skb                    vmlinux
 4140.00  2.3% __ip_local_out                 vmlinux
 3991.00  2.2% dev_queue_xmit                 vmlinux

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agosit: enable lockless xmits
Eric Dumazet [Tue, 28 Sep 2010 02:53:41 +0000]
sit: enable lockless xmits

SIT tunnels can benefit from lockless xmits, using NETIF_F_LLTX

Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one sit tunnel (size:220 bytes per frame)

Before patch :

real 3m15.399s
user 0m9.185s
sys 51m55.403s

75029.00 87.5% _raw_spin_lock            vmlinux
 1090.00  1.3% dst_release               vmlinux
  902.00  1.1% dev_queue_xmit            vmlinux
  627.00  0.7% sock_wfree                vmlinux
  613.00  0.7% ip6_push_pending_frames   ipv6.ko
  505.00  0.6% __ip_route_output_key     vmlinux

After patch:

real 1m1.387s
user 0m12.489s
sys 15m58.868s

28239.00 23.3% dst_release               vmlinux
13570.00 11.2% ip6_push_pending_frames   ipv6.ko
13118.00 10.8% ip6_append_data           ipv6.ko
 7995.00  6.6% __ip_route_output_key     vmlinux
 7924.00  6.5% sk_dst_check              vmlinux
 5015.00  4.1% udpv6_sendmsg             ipv6.ko
 3594.00  3.0% sock_alloc_send_pskb      vmlinux
 3135.00  2.6% sock_wfree                vmlinux
 3055.00  2.5% ip6_sk_dst_lookup         ipv6.ko
 2473.00  2.0% ip_finish_output          vmlinux

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agosit: fix percpu stats accounting
Eric Dumazet [Tue, 28 Sep 2010 02:17:58 +0000]
sit: fix percpu stats accounting

commit 15fc1f7056ebd (sit: percpu stats accounting) forgot the fallback
tunnel case (sit0), and can crash pretty fast.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoipip: fix percpu stats accounting
Eric Dumazet [Mon, 27 Sep 2010 23:56:46 +0000]
ipip: fix percpu stats accounting

commit 3c97af99a5aa1 (ipip: percpu stats accounting) forgot the fallback
tunnel case (tunl0), and can crash pretty fast.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agodummy: percpu stats and lockless xmit
Eric Dumazet [Mon, 27 Sep 2010 20:50:33 +0000]
dummy: percpu stats and lockless xmit

Converts dummy network device driver to :

- percpu stats

- 64bit stats

- lockless xmit (NETIF_F_LLTX)

- performance features added (NETIF_F_SG | NETIF_F_FRAGLIST |
NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agonet: add a recursion limit in xmit path
Eric Dumazet [Wed, 29 Sep 2010 20:23:09 +0000]
net: add a recursion limit in xmit path

As tunnel devices are going to be lockless, we need to make sure a
misconfigured machine wont enter an infinite loop.

Add a percpu variable, and limit to three the number of stacked xmits.

Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoipv6: Implement Any-IP support for IPv6.
Maciej Żenczykowski [Mon, 27 Sep 2010 00:07:02 +0000]
ipv6: Implement Any-IP support for IPv6.

AnyIP is the capability to receive packets and establish incoming
connections on IPs we have not explicitly configured on the machine.

An example use case is to configure a machine to accept all incoming
traffic on eth0, and leave the policy of whether traffic for a given IP
should be delivered to the machine up to the load balancer.

Can be setup as follows:
  ip -6 rule from all iif eth0 lookup 200
  ip -6 route add local default dev lo table 200
(in this case for all IPv6 addresses)

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoipv4: Allow configuring subnets as local addresses
Tom Herbert [Sun, 23 May 2010 19:54:12 +0000]
ipv4: Allow configuring subnets as local addresses

This patch allows a host to be configured to respond to any address in
a specified range as if it were local, without actually needing to
configure the address on an interface.  This is done through routing
table configuration.  For instance, to configure a host to respond
to any address in 10.1/16 received on eth0 as a local address we can do:

ip rule add from all iif eth0 lookup 200
ip route add local 10.1/16 dev lo proto kernel scope host src 127.0.0.1 table 200

This host is now reachable by any 10.1/16 address (route lookup on
input for packets received on eth0 can find the route).  On output, the
rule will not be matched so that this host can still send packets to
10.1/16 (not sent on loopback).  Presumably, external routing can be
configured to make sense out of this.

To make this work, we needed to modify the logic in finding the
interface which is assigned a given source address for output
(dev_ip_find).  We perform a normal fib_lookup instead of just a
lookup on the local table, and in the lookup we ignore the input
interface for matching.

This patch is useful to implement IP-anycast for subnets of virtual
addresses.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoip_gre: Fix dependencies wrt. ipv6.
David S. Miller [Wed, 29 Sep 2010 05:37:56 +0000]
ip_gre: Fix dependencies wrt. ipv6.

The GRE tunnel driver needs to invoke icmpv6 helpers in the
ipv6 stack when ipv6 support is enabled.

Therefore if IPV6 is enabled, we have to enforce that GRE's
enabling (modular or static) matches that of ipv6.

Reported-by: Patrick McHardy <kaber@trash.net>
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agonet-2.6: SYN retransmits: Add new parameter to retransmits_timed_out()
Damian Lukowski [Tue, 28 Sep 2010 20:08:32 +0000]
net-2.6: SYN retransmits: Add new parameter to retransmits_timed_out()

Fixes kernel Bugzilla Bug 18952

This patch adds a syn_set parameter to the retransmits_timed_out()
routine and updates its callers. If not set, TCP_RTO_MIN is taken
as the calculation basis as before. If set, TCP_TIMEOUT_INIT is
used instead, so that sysctl_syn_retries represents the actual
amount of SYN retransmissions in case no SYNACKs are received when
establishing a new connection.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

8 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
John W. Linville [Tue, 28 Sep 2010 20:01:47 +0000]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-2.6

Conflicts:
drivers/net/wireless/iwlwifi/iwl-agn-lib.c
drivers/net/wireless/iwlwifi/iwl3945-base.c

8 years agomac80211: Fix WMM driver queue configuration
Juuso Oikarinen [Tue, 28 Sep 2010 11:39:32 +0000]
mac80211: Fix WMM driver queue configuration

The WMM parameter configuration function (ieee80211_sta_wmm_params) only
configures the WMM parameters to the driver is the wmm_last_param_set
counter value is changed by the AP.

The wmm_last_param_set is initialized to -1 on association in order to ensure
the configuration is made to the driver at least once on association, but
currently this initialization is done *after* the WMM parameter configuration
function was called.

This leads to unreliability in the driver getting properly configured on first
association (depending on what counter value the AP happens to use.) When
disassociating (the wmm default parameters are configured to the driver) and
then reassociating, due to the above the WMM configuration is not set to the
driver at all.

On drivers without beacon filtering the problem is corrected by later beacons,
but on drivers with beacon filtering the WMM will remain permanently incorrectly
configured.

Fix this by moving the initialization of wmm_last_param_set to -1 before
ieee80211_sta_wmm_params is called on association.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>