9 years ago af_unix: Allow SO_PEERCRED to work across namespaces.
Eric W. Biederman [Sun, 13 Jun 2010 03:30:14 +0000]
af_unix: Allow SO_PEERCRED to work across namespaces.

Use struct pid and struct cred to store the peer credentials on struct
sock.  This gives enough information to convert the peer credential
information to a value relative to whatever namespace the socket is in
at the time.

This removes nasty surprises when using SO_PEERCRED on socket
connections where the processes on either side are in different pid and
user namespaces.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
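
For readers who have not used the option, here is a minimal userspace sketch of the
SO_PEERCRED query this commit affects.  It is an illustration, not part of the
patch: peer_credentials() is a name invented for this sketch, and only the
getsockopt() call and struct ucred come from the Linux socket API.

```c
#define _GNU_SOURCE             /* exposes struct ucred in <sys/socket.h> */
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fill *out with the credentials of the process at the other end of the
 * connected AF_UNIX socket fd.  Returns 0 on success, -1 on error.
 * peer_credentials() is a hypothetical helper name.
 */
static int peer_credentials(int fd, struct ucred *out)
{
    socklen_t len = sizeof(*out);

    assert(out != NULL);
    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, out, &len) < 0)
        return -1;
    /* The kernel should hand back exactly one struct ucred. */
    return len == sizeof(*out) ? 0 : -1;
}
```

The pid, uid and gid returned are the ones the commit message describes as being
translated into the namespace of the caller.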

9 years ago sock: Introduce cred_to_ucred
Eric W. Biederman [Sun, 13 Jun 2010 03:28:59 +0000]
sock: Introduce cred_to_ucred

To keep the coming code clear and to allow both the sock
code and the scm code to share the logic, introduce a
function to translate from struct cred to struct ucred.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago user_ns: Introduce user_nsmap_uid and user_ns_map_gid.
Eric W. Biederman [Sun, 13 Jun 2010 03:28:03 +0000]
user_ns: Introduce user_nsmap_uid and user_ns_map_gid.

Define what happens when we view a uid from one user_namespace
in another user_namespace.

- If the user namespaces are the same no mapping is necessary.

- For most cases of difference use overflowuid and overflowgid,
  the uid and gid currently used for 16bit apis when we have a 32bit uid
  that does not fit in 16bits.  Effectively the situation is the same,
  we want to return a uid or gid that is not assigned to any user.

- For the case when we happen to be mapping the uid or gid of the
  creator of the target user namespace use uid 0 and gid 0, as confusing
  that user with root is not a problem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
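
The three rules above can be modelled in a few lines of userspace C.  This is a
simplified sketch under stated assumptions, not the kernel function: struct toy_ns
and toy_map_uid() are invented for illustration, only the direct creator of the
target namespace is checked, and OVERFLOW_UID is set to 65534, the kernel's default
overflowuid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OVERFLOW_UID 65534u     /* default value of the kernel's overflowuid */

/* Toy stand-in for a user namespace: only its creator matters here. */
struct toy_ns {
    const struct toy_ns *creator_ns;    /* namespace the creator lives in */
    uint32_t creator_uid;               /* creator's uid in creator_ns */
};

/* View uid, which belongs to namespace from, from inside namespace to. */
static uint32_t toy_map_uid(const struct toy_ns *to,
                            const struct toy_ns *from, uint32_t uid)
{
    assert(to != NULL && from != NULL);

    if (to == from)                     /* same namespace: no mapping needed */
        return uid;
    if (to->creator_ns == from && to->creator_uid == uid)
        return 0;                       /* creator of the target looks like root */
    return OVERFLOW_UID;                /* anything else: "nobody" */
}
```

The gid case would follow the same shape with overflowgid in place of overflowuid.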

9 years ago scm: Reorder scm_cookie.
Eric W. Biederman [Sun, 13 Jun 2010 03:27:04 +0000]
scm: Reorder scm_cookie.

Reorder the fields in scm_cookie so they pack better on 64bit.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago qlcnic: Bumped up version number
Anirban Chakraborty [Wed, 16 Jun 2010 09:07:36 +0000]
qlcnic: Bumped up version number

Changed the driver version number to 5.0.4

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago qlcnic: Fix a bug in setting up NIC partitioning mode
Anirban Chakraborty [Wed, 16 Jun 2010 09:07:27 +0000]
qlcnic: Fix a bug in setting up NIC partitioning mode

The driver was not detecting the presence of NIC partitioning capability of the
firmware properly. Now, it checks the eswitch set bit in the FW capabilities
register and accordingly sets the driver mode as NPAR capable or not.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago syncookies: check decoded options against sysctl settings
Florian Westphal [Wed, 16 Jun 2010 21:42:15 +0000]
syncookies: check decoded options against sysctl settings

Discard the ACK if we find options that do not match current sysctl
settings.

Previously it was possible to create a connection with sack, wscale,
etc. enabled even if the feature was disabled via sysctl.

Also remove an unneeded call to tcp_sack_reset() in
cookie_check_timestamp: Both call sites (cookie_v4_check,
cookie_v6_check) zero "struct tcp_options_received", hand it to
tcp_parse_options() (which does not change tcp_opt->num_sacks/dsack)
and then call cookie_check_timestamp().

Even if num_sacks/dsacks were changed, the structure is allocated on
the stack and after cookie_check_timestamp returns only a few selected
members are copied to the inet_request_sock.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago inetpeer: restore small inet_peer structures
Eric Dumazet [Wed, 16 Jun 2010 04:52:13 +0000]
inetpeer: restore small inet_peer structures

Addition of rcu_head to struct inet_peer added 16 bytes on 64bit arches.

That's a bit unfortunate, since the old size was exactly 64 bytes.

This can be solved using a union between this rcu_head and four fields
that are normally used only when a refcount is taken on inet_peer.
rcu_head is used only when refcnt=-1, right before structure freeing.

Add an inet_peer_refcheck() function to check this assertion for a while.

We can bring back SLAB_HWCACHE_ALIGN qualifier in kmem cache creation.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
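
A rough userspace model of the union described above.  This is a sketch, not the
kernel definition: struct toy_rcu_head is a two-pointer stand-in for rcu_head, the
four field names are the ones listed in the "inetpeer: RCU conversion" entry
further down this log, and their 32-bit widths are an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two-pointer stand-in for the kernel's struct rcu_head. */
struct toy_rcu_head {
    struct toy_rcu_head *next;
    void (*func)(struct toy_rcu_head *head);
};

/* Tail of a toy inet_peer: the two views share the same storage. */
struct toy_peer_tail {
    union {
        struct {                        /* valid while a reference is held */
            uint32_t rid;
            uint32_t ip_id_count;
            uint32_t tcp_ts;
            uint32_t tcp_ts_stamp;
        };
        struct toy_rcu_head rcu;        /* valid only once refcnt == -1 */
    };
};

/* The union costs no more than the four fields did on their own. */
static_assert(sizeof(struct toy_peer_tail) == 4 * sizeof(uint32_t),
              "rcu head must fit in the space of the four fields");
```

Because the two views are never live at the same time, the rcu_head rides for free
in space the structure already paid for.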

9 years ago gadget/rndis: dev_get_stats() now returns rtnl_link_stats64.
David S. Miller [Wed, 16 Jun 2010 04:50:14 +0000]
gadget/rndis: dev_get_stats() now returns rtnl_link_stats64.

Based upon a report by Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago inetpeer: do not use zero refcnt for freed entries
Eric Dumazet [Wed, 16 Jun 2010 04:47:39 +0000]
inetpeer: do not use zero refcnt for freed entries

Followup of commit aa1039e73cc2 (inetpeer: RCU conversion)

Unused inet_peer entries have a null refcnt.

Using atomic_inc_not_zero() in rcu lookups is not going to work for
them, and the slow path is taken.

Fix this by using a -1 marker instead of 0 for deleted entries.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
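
To make the difference concrete, here is a userspace sketch of the -1 marker scheme
using C11 atomics.  It is an analogy rather than the kernel code: peer_tryget(),
peer_mark_deleted() and PEER_DELETED are names invented here.  A reader may take a
reference on an unused entry (refcnt 0) but never on one marked deleted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define PEER_DELETED (-1)       /* marker for an entry that is being freed */

/*
 * Take a reference unless the entry is marked deleted.  An unused entry
 * (refcnt == 0) is still valid and can be picked up by a lockless reader.
 */
static bool peer_tryget(atomic_int *refcnt)
{
    int old;

    assert(refcnt != NULL);
    old = atomic_load(refcnt);
    while (old != PEER_DELETED) {
        /* On failure, old is refreshed with the current value. */
        if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Writer side: claim an entry for deletion only if nobody holds it. */
static bool peer_mark_deleted(atomic_int *refcnt)
{
    int expected = 0;

    return atomic_compare_exchange_strong(refcnt, &expected, PEER_DELETED);
}
```

With 0 as the deleted marker, the reader could not tell "unused" from "gone", which
is why the lookups described above fell back to the slow path.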

9 years ago netpoll: Use correct primitives for RCU dereferencing
Herbert Xu [Wed, 16 Jun 2010 04:44:29 +0000]
netpoll: Use correct primitives for RCU dereferencing

Now that RCU debugging checks for matching rcu_dereference calls
and rcu_read_lock, we need to use the correct primitives or face
nasty warnings.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago bridge: Add const to dummy br_netpoll_send_skb
Herbert Xu [Wed, 16 Jun 2010 04:43:48 +0000]
bridge: Add const to dummy br_netpoll_send_skb

The version of br_netpoll_send_skb used when netpoll is off is
missing a const thus causing a warning.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago net: NET_SKB_PAD should depend on L1_CACHE_BYTES
Eric Dumazet [Wed, 16 Jun 2010 01:16:43 +0000]
net: NET_SKB_PAD should depend on L1_CACHE_BYTES

In old kernels, NET_SKB_PAD was defined to 16.

Then commit d6301d3dd1c2 (net: Increase default NET_SKB_PAD to 32), and
commit 18e8c134f4e9 (net: Increase NET_SKB_PAD to 64 bytes) increased it
to 64.

While the first patch was governed by network stack needs, the second was
more driven by performance issues on current hardware. The real intent was
to align data on a cache line boundary.

So use max(32, L1_CACHE_BYTES) instead of 64, to be more generic.

Remove the microblaze and powerpc private NET_SKB_PAD definitions.

Thanks to Alexander Duyck and David Miller for their comments.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
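
As a worked example of the max(32, L1_CACHE_BYTES) rule, the following sketch
computes the padding for a few cache line sizes.  skb_pad_for() is a helper
invented for illustration; in the kernel the value is a compile-time constant and
L1_CACHE_BYTES is supplied by the architecture.

```c
#include <assert.h>

/*
 * Padding reserved in front of packet data: at least 32 bytes for the
 * network stack, and at least one full cache line for alignment.
 */
static unsigned int skb_pad_for(unsigned int l1_cache_bytes)
{
    assert(l1_cache_bytes > 0);
    return l1_cache_bytes > 32 ? l1_cache_bytes : 32;
}
```

An architecture with 16-byte or 32-byte cache lines gets 32, one with 64-byte lines
keeps the previous 64, and one with 128-byte lines gets 128.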

9 years ago ipfrag : frag_kfree_skb() cleanup
Eric Dumazet [Sun, 13 Jun 2010 23:22:43 +0000]
ipfrag : frag_kfree_skb() cleanup

Third param (work) is unused, remove it.

Remove __inline__ and inline qualifiers.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago ip_frag: Remove some atomic ops
Eric Dumazet [Sun, 13 Jun 2010 23:02:24 +0000]
ip_frag: Remove some atomic ops

Instead of doing one atomic operation per frag, we can factorize them.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
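
The idea of factorizing the atomic operations can be sketched in userspace C11.
This is an illustration under assumptions, not the patch itself: struct toy_frag
and frag_list_uncharge() are invented names, and mem stands in for a shared memory
accounting counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy fragment: only its accounted size and list linkage matter here. */
struct toy_frag {
    size_t truesize;
    struct toy_frag *next;
};

/* Uncharge a whole fragment list; returns the number of bytes released. */
static size_t frag_list_uncharge(atomic_size_t *mem, const struct toy_frag *head)
{
    size_t sum = 0;

    assert(mem != NULL);
    for (const struct toy_frag *f = head; f != NULL; f = f->next)
        sum += f->truesize;         /* plain addition, no atomic op per frag */

    atomic_fetch_sub(mem, sum);     /* one atomic op for the whole list */
    return sum;
}
```

The shared counter is touched once per list instead of once per fragment, which
reduces traffic on its cache line.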

9 years ago ipv6: syncookies: do not skip ->iif initialization
Florian Westphal [Sun, 13 Jun 2010 11:29:39 +0000]
ipv6: syncookies: do not skip ->iif initialization

When syncookies are in effect, req->iif is left uninitialized.
In case of e.g. link-local addresses the route lookup then fails
and no syn-ack is sent.

Rearrange things so ->iif is also initialized in the syncookie case.

want_cookie can only be true when the isn was zero, thus move the want_cookie
check into the "!isn" branch.

Cc: Glenn Griffin <ggriffin.kernel@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago net: Fix error in comment on net_device_ops::ndo_get_stats
Ben Hutchings [Tue, 15 Jun 2010 22:08:48 +0000]
net: Fix error in comment on net_device_ops::ndo_get_stats

ndo_get_stats still returns struct net_device_stats *; there is
no struct net_device_stats64.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago netdev:bfin_mac: reclaim and free tx skb as soon as possible after transfer
Sonic Zhang [Fri, 11 Jun 2010 09:44:31 +0000]
netdev:bfin_mac: reclaim and free tx skb as soon as possible after transfer

SKBs hold onto resources that can't be held indefinitely, such as TCP
socket references and netfilter conntrack state.  So if a packet is left
in TX ring for a long time, there might be a TCP socket that cannot be
closed and freed up.

The current blackfin EMAC driver always reclaims and frees used tx skbs in future
transfers. The problem is that a future transfer may not come soon enough.
This patch starts a timer after each transfer to reclaim and free the skb.
There is nearly no performance drop with this patch.

TX interrupt is not enabled because of a strange behavior of the Blackfin EMAC.
If EMAC TX transfer control is turned on, endless TX interrupts are triggered
no matter if TX DMA is enabled or not. Since DMA walks down the ring automatically,
TX transfer control can't be turned off in the middle. The only way is to disable
TX interrupt completely.

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago inetpeer: RCU conversion
Eric Dumazet [Tue, 15 Jun 2010 08:23:14 +0000]
inetpeer: RCU conversion

inetpeer currently uses an AVL tree protected by an rwlock.

It's possible to make most lookups use RCU:

1) Add a struct rcu_head to struct inet_peer

2) add a lookup_rcu_bh() helper to perform lockless and opportunistic
lookup. This is a normal function, not a macro like lookup().

3) Add a limit to number of links followed by lookup_rcu_bh(). This is
needed in case we fall in a loop.

4) add an smp_wmb() in link_to_pool() right before node insert.

5) make unlink_from_pool() use atomic_cmpxchg() to make sure it can take
last reference to an inet_peer, since lockless readers could increase
refcount, even while we hold peers.lock.

6) Delay struct inet_peer freeing after rcu grace period so that
lookup_rcu_bh() cannot crash.

7) inet_getpeer() first attempts lockless lookup.
   Note this lookup can fail even if target is in AVL tree, because a
concurrent writer can leave the tree in an inconsistent form.
   If this attempt fails, the lock is taken and a regular lookup is performed
again.

8) convert peers.lock from rwlock to a spinlock

9) Remove SLAB_HWCACHE_ALIGN when peer_cachep is created, because
rcu_head adds 16 bytes on 64bit arches, doubling effective size (64 ->
128 bytes)
In a future patch, it is probably possible to revert this part, if the rcu
field is put in a union to share space with rid, ip_id_count, tcp_ts &
tcp_ts_stamp, these fields being manipulated only with refcnt > 0.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
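
Point 3 above (bounding the number of links followed) can be sketched with a plain
binary search tree in userspace.  This is a simplified model, not the kernel's
lookup_rcu_bh(): struct toy_peer, lookup_bounded() and the MAX_LINKS value are
assumptions made for illustration, and no RCU or memory barriers are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LINKS 32            /* hypothetical bound on links followed */

struct toy_peer {
    uint32_t addr;
    struct toy_peer *left, *right;
};

/*
 * Opportunistic lookup: give up after MAX_LINKS links so that a tree seen
 * in an inconsistent state (even one containing a loop) cannot trap us.
 * A NULL result means "not found here"; the caller would then retry
 * under the lock, as point 7 describes.
 */
static struct toy_peer *lookup_bounded(struct toy_peer *root, uint32_t addr)
{
    struct toy_peer *u = root;
    int count = 0;

    while (u != NULL) {
        if (u->addr == addr)
            return u;
        u = addr < u->addr ? u->left : u->right;
        if (++count == MAX_LINKS)
            break;
    }
    return NULL;
}
```

A miss from this function is only a hint, which is why the locked lookup is still
needed as a fallback.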

9 years ago cnic: Fix cnic_cm_abort() error handling.
Michael Chan [Tue, 15 Jun 2010 08:57:03 +0000]
cnic: Fix cnic_cm_abort() error handling.

Fix the code that handles the error case when cnic_cm_abort() cannot
proceed normally.  We cannot just set the csk->state and we must
go through cnic_ready_to_close() to handle all the conditions.  We
also add error return code in cnic_cm_abort().

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Eddie Wai <waie@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago cnic: Refactor and fix cnic_ready_to_close().
Michael Chan [Tue, 15 Jun 2010 08:57:02 +0000]
cnic: Refactor and fix cnic_ready_to_close().

Combine RESET_RECEIVED and RESET_COMP logic and fix race condition
between these 2 events and cnic_cm_close().  In particular, we need
to (test_and_clear_bit(SK_F_OFFLD_COMPLETE, &csk->flags)) before we
update csk->state.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Eddie Wai <waie@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago cnic: Refactor code in cnic_cm_process_kcqe().
Michael Chan [Tue, 15 Jun 2010 08:57:01 +0000]
cnic: Refactor code in cnic_cm_process_kcqe().

Move chip-specific code to the respective chip's ->close_conn() functions
for better code organization.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Eddie Wai <waie@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago cnic: Return error code in cnic_cm_close() if unsuccessful.
Michael Chan [Tue, 15 Jun 2010 08:57:00 +0000]
cnic: Return error code in cnic_cm_close() if unsuccessful.

So that bnx2i can handle the error condition immediately and not have to
wait for timeout.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Eddie Wai <waie@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago ixgbe: update set_rx_mode to fix issues w/ macvlan
Alexander Duyck [Tue, 15 Jun 2010 09:25:48 +0000]
ixgbe: update set_rx_mode to fix issues w/ macvlan

This change corrects issues where macvlan was not correctly triggering
promiscuous mode on ixgbe due to the filters not being correctly set.  It
also corrects the fact that VF rar filters were being overwritten when the
PF was reset.

CC: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
David S. Miller [Tue, 15 Jun 2010 20:49:24 +0000]
Merge branch 'master' of git://git./linux/kernel/git/kaber/nf-next-2.6

9 years ago tcp: unify tcp flag macros
Changli Gao [Sat, 12 Jun 2010 14:01:43 +0000]
tcp: unify tcp flag macros

unify tcp flag macros: TCPHDR_FIN, TCPHDR_SYN, TCPHDR_RST, TCPHDR_PSH,
TCPHDR_ACK, TCPHDR_URG, TCPHDR_ECE and TCPHDR_CWR. TCBCB_FLAG_* are replaced
with the corresponding TCPHDR_*.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/net/tcp.h                      |   24 ++++++-------
 net/ipv4/tcp.c                         |    8 ++--
 net/ipv4/tcp_input.c                   |    2 -
 net/ipv4/tcp_output.c                  |   59 ++++++++++++++++-----------------
 net/netfilter/nf_conntrack_proto_tcp.c |   32 ++++++-----------
 net/netfilter/xt_TCPMSS.c              |    4 --
 6 files changed, 58 insertions(+), 71 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
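
For reference, the unified macros correspond to the bit positions of the flags
byte in the TCP header.  The values below match those positions; the helper
tcp_is_pure_syn() is invented here to show how the macros combine and is not part
of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TCPHDR_FIN 0x01
#define TCPHDR_SYN 0x02
#define TCPHDR_RST 0x04
#define TCPHDR_PSH 0x08
#define TCPHDR_ACK 0x10
#define TCPHDR_URG 0x20
#define TCPHDR_ECE 0x40
#define TCPHDR_CWR 0x80

/* The eight flags must fill the byte without overlapping. */
static_assert((TCPHDR_FIN | TCPHDR_SYN | TCPHDR_RST | TCPHDR_PSH |
               TCPHDR_ACK | TCPHDR_URG | TCPHDR_ECE | TCPHDR_CWR) == 0xff,
              "flag macros must cover the whole flags byte");

/* True for a connection request: SYN set, and none of ACK, RST or FIN. */
static bool tcp_is_pure_syn(uint8_t flags)
{
    const uint8_t mask = TCPHDR_SYN | TCPHDR_ACK | TCPHDR_RST | TCPHDR_FIN;

    return (flags & mask) == TCPHDR_SYN;
}
```

Having one set of names lets the TCP stack and netfilter share expressions like
the mask above instead of each keeping private definitions.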

9 years ago bridge: use rx_handler_data pointer to store net_bridge_port pointer
Jiri Pirko [Tue, 15 Jun 2010 06:50:45 +0000]
bridge: use rx_handler_data pointer to store net_bridge_port pointer

Register net_bridge_port pointer as rx_handler data pointer. As br_port is
removed from struct net_device, another netdev priv_flag is added to indicate
the device serves as a bridge port. Also rcuized pointers are now correctly
dereferenced in br_fdb.c and in netfilter parts.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago macvlan: use rx_handler_data pointer to store macvlan_port pointer V2
Jiri Pirko [Tue, 15 Jun 2010 03:27:57 +0000]
macvlan: use rx_handler_data pointer to store macvlan_port pointer V2

Register macvlan_port pointer as rx_handler data pointer. As macvlan_port is
removed from struct net_device, another netdev priv_flag is added to indicate
the device serves as a macvlan port.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago net: add rx_handler data pointer
Jiri Pirko [Thu, 10 Jun 2010 03:34:59 +0000]
net: add rx_handler data pointer

Add the possibility to register an rx_handler data pointer along with an rx_handler.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago bridge: Fix netpoll support
Herbert Xu [Thu, 10 Jun 2010 16:12:50 +0000]
bridge: Fix netpoll support

There are multiple problems with the newly added netpoll support:

1) Use-after-free on each netpoll packet.
2) Invoking unsafe code on netpoll/IRQ path.
3) Breaks when netpoll is enabled on the underlying device.

This patch fixes all of these problems.  In particular, we now
allocate proper netpoll structures for each underlying device.

We only allow netpoll to be enabled on the bridge when all the
devices underneath it support netpoll.  Once it is enabled, we
do not allow non-netpoll devices to join the bridge (until netpoll
is disabled again).

This allows us to do away with the npinfo juggling that caused
problem number 1.

Incidentally this patch fixes number 2 by bypassing unsafe code
such as multicast snooping and netfilter.

Reported-by: Qianfeng Zhang <frzhang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago netpoll: Add netpoll_tx_running
Herbert Xu [Thu, 10 Jun 2010 16:12:49 +0000]
netpoll: Add netpoll_tx_running

This patch adds the helper netpoll_tx_running for use within
ndo_start_xmit.  It returns non-zero if ndo_start_xmit is being
invoked by netpoll, and zero otherwise.

This is currently implemented by simply looking at the hardirq
count.  This is because for all non-netpoll uses of ndo_start_xmit,
IRQs must be enabled while netpoll always disables IRQs before
calling ndo_start_xmit.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago netpoll: Allow netpoll_setup/cleanup recursion
Herbert Xu [Thu, 10 Jun 2010 16:12:48 +0000]
netpoll: Allow netpoll_setup/cleanup recursion

This patch adds the functions __netpoll_setup/__netpoll_cleanup
which are designed to be called recursively through ndo_netpoll_setup.

They must be called with RTNL held, and the caller must initialise
np->dev and ensure that it has a valid reference count.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago netpoll: Add ndo_netpoll_setup
Herbert Xu [Thu, 10 Jun 2010 16:12:47 +0000]
netpoll: Add ndo_netpoll_setup

This patch adds ndo_netpoll_setup as the initialisation primitive
to complement ndo_netpoll_cleanup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago netpoll: Add locking for netpoll_setup/cleanup
Herbert Xu [Thu, 10 Jun 2010 16:12:46 +0000]
netpoll: Add locking for netpoll_setup/cleanup

As it stands, netpoll_setup and netpoll_cleanup have no locking
protection whatsoever.  So chaos ensues if two entities try to
perform them on the same device.

This patch adds RTNL to the equation.  The code has been rearranged so
that bits that do not need RTNL protection are now moved to the top of
netpoll_setup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago netpoll: Fix RCU usage
Herbert Xu [Thu, 10 Jun 2010 16:12:44 +0000]
netpoll: Fix RCU usage

The use of RCU in netpoll is incorrect in a number of places:

1) The initial setting is lacking a write barrier.
2) The synchronize_rcu is in the wrong place.
3) Read barriers are missing.
4) Some places are even missing rcu_read_lock.
5) npinfo is zeroed after freeing.

This patch fixes those issues.  As most users are in BH context,
this also converts the RCU usage to the BH variant.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago bridge: Remove redundant npinfo NULL setting
Herbert Xu [Thu, 10 Jun 2010 16:12:43 +0000]
bridge: Remove redundant npinfo NULL setting

Now that netpoll always zaps npinfo we no longer need to do it
in bridge.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup
Herbert Xu [Thu, 10 Jun 2010 16:12:42 +0000]
netpoll: Set npinfo to NULL even with ndo_netpoll_cleanup

Since we have to NULL npinfo regardless of whether there is a
ndo_netpoll_cleanup, it makes sense to do this unconditionally
in netpoll_cleanup rather than having every driver do it by
themselves.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago Merge branch 'master' of /repos/git/net-next-2.6
Patrick McHardy [Tue, 15 Jun 2010 15:31:06 +0000]
Merge branch 'master' of /repos/git/net-next-2.6

Conflicts:
include/net/netfilter/xt_rateest.h
net/bridge/br_netfilter.c
net/netfilter/nf_conntrack_core.c

Signed-off-by: Patrick McHardy <kaber@trash.net>

9 years ago netfilter: xtables: idletimer target implementation
Luciano Coelho [Tue, 15 Jun 2010 13:04:00 +0000]
netfilter: xtables: idletimer target implementation

This patch implements an idletimer Xtables target that can be used to
identify when interfaces have been idle for a certain period of time.

Timers are identified by labels and are created when a rule is set with a new
label.  The rules also take a timeout value (in seconds) as an option.  If
more than one rule uses the same timer label, the timer will be restarted
whenever any of the rules get a hit.

One entry for each timer is created in sysfs.  This attribute contains the
time remaining for the timer to expire.  The attributes are located under
the xt_idletimer class:

/sys/class/xt_idletimer/timers/<label>

When the timer expires, the target module sends a sysfs notification to
userspace, which can then decide what to do (e.g. disconnect to save power).

Cc: Timo Teras <timo.teras@iki.fi>
Signed-off-by: Luciano Coelho <luciano.coelho@nokia.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

9 years ago netfilter: CLUSTERIP: RCU conversion
Eric Dumazet [Tue, 15 Jun 2010 11:08:51 +0000]
netfilter: CLUSTERIP: RCU conversion

- clusterip_lock becomes a spinlock
- lockless lookups
- kfree() deferred after RCU grace period
- rcu_barrier_bh() inserted in clusterip_tg_exit()

v2)
- As Patrick pointed out, we use atomic_inc_not_zero() in
clusterip_config_find_get().
- list_add_rcu() and list_del_rcu() variants are used.
- atomic_dec_and_lock() used in clusterip_config_entry_put()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

9 years ago bnx2x: Fix link problem with some DACs
Yaniv Rosner [Tue, 15 Jun 2010 06:25:19 +0000]
bnx2x: Fix link problem with some DACs

Change 2wire transfer rate of SFP+ module EEPROM from 400 kHz to 100 kHz
since some DACs (direct attached cables) do not work at 400 kHz.

Reported-by: Krzysztof Oldzki <ole@ans.pl>
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago inetpeer: various changes
Eric Dumazet [Mon, 14 Jun 2010 19:35:21 +0000]
inetpeer: various changes

Try to reduce cache line contentions in peer management, to reduce IP
defragmentation overhead.

- peer_fake_node is marked 'const' to make sure it's not modified.
  (tested with CONFIG_DEBUG_RODATA=y)

- Group variables in two structures to reduce number of dirtied cache
lines. One named "peers" for avl tree root, its number of entries, and
associated lock. (candidate for RCU conversion)

- A second one named "unused_peers" for unused list and its lock

- Add a !list_empty() test in unlink_from_unused() to avoid taking lock
when entry is not unused.

- Use atomic_dec_and_lock() in inet_putpeer() to avoid taking lock in
some cases.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago loopback: Implement 64bit stats on 32bit arches
Eric Dumazet [Mon, 14 Jun 2010 05:59:22 +0000]
loopback: Implement 64bit stats on 32bit arches

Uses a seqcount_t to synchronize stat producer and consumer, for packets
and bytes counter, now u64 types.

(dropped counter being rarely used, stays a native "unsigned long" type)

No noticeable performance impact on x86, as it only adds two increments
per frame. It might be more expensive on arches where smp_wmb() is not
free.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
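
The producer/consumer synchronization described above follows the sequence counter
pattern, which can be approximated in userspace with C11 atomics.  This is a
sketch under assumptions, not the driver code: struct toy_stats, stats_add() and
stats_read() are invented names, a single writer per counter set is assumed, and
the fences stand in for the kernel's barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct toy_stats {
    atomic_uint seq;            /* even: stable, odd: update in progress */
    _Atomic uint64_t packets;
    _Atomic uint64_t bytes;
};

/* Producer (single writer assumed): two sequence increments per frame. */
static void stats_add(struct toy_stats *s, uint64_t len)
{
    unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_fetch_add_explicit(&s->packets, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&s->bytes, len, memory_order_relaxed);
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Consumer: retry until a snapshot was taken with no update in between. */
static void stats_read(struct toy_stats *s, uint64_t *packets, uint64_t *bytes)
{
    unsigned int start;

    assert(packets != NULL && bytes != NULL);
    do {
        start = atomic_load_explicit(&s->seq, memory_order_acquire);
        *packets = atomic_load_explicit(&s->packets, memory_order_relaxed);
        *bytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while ((start & 1) ||
             start != atomic_load_explicit(&s->seq, memory_order_relaxed));
}
```

The reader never blocks the writer; it simply retries if it observes an odd or
changed sequence number, so a 32-bit machine cannot return a half-updated 64-bit
value.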

9 years ago ipv6: RCU changes in ipv6_get_mtu() and ip6_dst_hoplimit()
Eric Dumazet [Mon, 14 Jun 2010 04:46:20 +0000]
ipv6: RCU changes in ipv6_get_mtu() and ip6_dst_hoplimit()

Use RCU to avoid atomic ops on idev refcnt in ipv6_get_mtu()
and ip6_dst_hoplimit()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago ipv6: avoid two atomics in ipv6_rthdr_rcv()
Eric Dumazet [Mon, 14 Jun 2010 04:39:27 +0000]
ipv6: avoid two atomics in ipv6_rthdr_rcv()

Use __in6_dev_get() instead of in6_dev_get()/in6_dev_put()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago ethtool: Revert incorrect indentation changes
Ben Hutchings [Mon, 14 Jun 2010 08:53:26 +0000]
ethtool: Revert incorrect indentation changes

commit 97f8aefbbfb5aa5c9944e5fa8149f1fdaf71c7b6 "net: fix ethtool
coding style errors and warnings" changed the indentation of several
macro definitions in ethtool.h.  These definitions line up in the diff
where there is an extra character at the start of each line, but not
in the resulting file.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
David S. Miller [Tue, 15 Jun 2010 05:59:34 +0000]
Merge branch 'master' of /linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/ixgbe/ixgbe_ethtool.c

With merge conflict help from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago netfilter: defrag: kill unused work parameter of frag_kfree_skb()
Shan Wei [Mon, 14 Jun 2010 14:30:47 +0000]
netfilter: defrag: kill unused work parameter of frag_kfree_skb()

The parameter (work) is unused, remove it.
Reported by Eric Dumazet.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

9 years ago netfilter: defrag: remove one redundant atomic ops
Shan Wei [Mon, 14 Jun 2010 14:28:23 +0000]
netfilter: defrag: remove one redundant atomic ops

Instead of doing one atomic operation per frag, we can factorize them.
Reported by Eric Dumazet.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

9 years ago netfilter: kill redundant check code in which setting ip_summed value
Shan Wei [Mon, 14 Jun 2010 14:20:02 +0000]
netfilter: kill redundant check code in which setting ip_summed value

If the returned csum value is 0, we have already set ip_summed to
CHECKSUM_UNNECESSARY in __skb_checksum_complete_head().

So this patch kills the check and returns to the caller directly.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

9 years ago netfilter: nfnetlink_log: RCU conversion, part 2
Eric Dumazet [Mon, 14 Jun 2010 14:15:23 +0000]
netfilter: nfnetlink_log: RCU conversion, part 2

- must use atomic_inc_not_zero() in instance_lookup_get()

- must use hlist_add_head_rcu() instead of hlist_add_head()

- must use hlist_del_rcu() instead of hlist_del()

- Introduce NFULNL_COPY_DISABLED to stop lockless reader from using an
instance, before we do final instance_put() on it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

9 years ago ixgbe: fix automatic LRO/RSC settings for low latency
Andy Gospodarek [Fri, 11 Jun 2010 12:47:03 +0000]
ixgbe: fix automatic LRO/RSC settings for low latency

This patch, added in 2.6.34:

commit f8d1dcaf88bddc7f282722ec1fdddbcb06a72f18
Author: Jesse Brandeburg <jesse.brandeburg@intel.com>
Date:   Tue Apr 27 01:37:20 2010 +0000

    ixgbe: enable extremely low latency

introduced a feature where LRO (called RSC on the hardware) was disabled
automatically when setting rx-usecs to 0 via ethtool.  Some might not
like the fact that LRO was disabled automatically, but I'm fine with
that.  What I don't like is that LRO/RSC is automatically enabled when
rx-usecs is set >0 via ethtool.

This would certainly be a problem if the device was used for forwarding
and it was determined that the low latency wasn't needed after the
device was already forwarding.  I played around with saving the state of
LRO in the driver, but it just didn't seem worthwhile and would require
a small change to dev_disable_lro() that I did not like.

This patch simply leaves LRO disabled when setting rx-usecs >0 and
requires that the user enable it again.  An extra informational message
will also now appear in the log so users can understand why LRO isn't
being enabled as they expect.

Inconsistency of LRO setting first noticed by Stanislaw Gruszka.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
CC: Stanislaw Gruszka <sgruszka@redhat.com>
CC: stable@kernel.org
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago e1000: Fix message logging defect
Joe Perches [Fri, 11 Jun 2010 12:51:49 +0000]
e1000: Fix message logging defect

commit 675ad47375c76a7c3be4ace9554d92cd55518ced
removed the capability to use ethtool.set_msglevel to
control the types of messages emitted by the driver.

That commit should probably be reverted.

If not, then this patch fixes a message logging defect
introduced by converting a printk without KERN_<level>
to e_info.

This also reduces text by about 200 bytes.

Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago ixgbe: fix for race with 8259(8|9) during shutdown
Don Skidmore [Fri, 11 Jun 2010 13:20:29 +0000]
ixgbe: fix for race with 8259(8|9) during shutdown

There is a small window where the watchdog could be running as the
interface is brought down on a NIC with two ports wired back to back.
If ixgbe_update_status is then called, it can lead to a panic.  This patch
allows the update to bail if we are in that condition.

This issue was originally reported, and a fix proposed, by Akihiko Saitou.

CC: Akihiko Saitou <asaitou@users.sourceforge.net>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years ago net: rxhash already set in __copy_skb_header
Eric Dumazet [Sun, 13 Jun 2010 10:50:46 +0000]
net: rxhash already set in __copy_skb_header

No need to copy rxhash again in __skb_clone()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agonet: fix deliver_no_wcard regression on loopback device
John Fastabend [Sun, 13 Jun 2010 10:36:30 +0000]
net: fix deliver_no_wcard regression on loopback device

deliver_no_wcard is not being set in skb_copy_header.
In the skb_cloned case it is not being cleared and
may cause the skb to be dropped when the loopback device
pushes it back up the stack.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoirttp: Print device parameters and statistics as unsigned
Ben Hutchings [Tue, 8 Jun 2010 08:23:01 +0000]
irttp: Print device parameters and statistics as unsigned

Device statistics have type unsigned long and several of the
device-specific parameters printed here have type __u32.
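
As a minimal sketch of why the conversion specifier matters (the helper
names below are hypothetical and are not taken from the irttp code), a
counter above the signed range only prints as intended with an unsigned
specifier:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a device statistic of type unsigned long. */
const char *format_stat(unsigned long value)
{
    static char buf[32];
    int n = snprintf(buf, sizeof(buf), "%lu", value);

    assert(n > 0 && (size_t)n < sizeof(buf));
    return buf;
}

/* Hypothetical helper: format a 32-bit parameter (uint32_t stands in
 * for the kernel's __u32 here). */
const char *format_param(uint32_t value)
{
    static char buf[16];
    int n = snprintf(buf, sizeof(buf), "%" PRIu32, value);

    assert(n > 0 && (size_t)n < sizeof(buf));
    return buf;
}
```

With a signed specifier such as %ld or %d, values above the signed
maximum would typically show up as negative numbers.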

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agousbnet: Print device statistics as unsigned
Ben Hutchings [Tue, 8 Jun 2010 08:20:59 +0000]
usbnet: Print device statistics as unsigned

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agosfc: Implement 64-bit net device statistics on all architectures
Ben Hutchings [Tue, 8 Jun 2010 07:21:12 +0000]
sfc: Implement 64-bit net device statistics on all architectures

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agonet: Enable 64-bit net device statistics on 32-bit architectures
Ben Hutchings [Tue, 8 Jun 2010 07:19:54 +0000]
net: Enable 64-bit net device statistics on 32-bit architectures

Use struct rtnl_link_stats64 as the statistics structure.

On 32-bit architectures, insert 32 bits of padding after/before each
field of struct net_device_stats to make its layout compatible with
struct rtnl_link_stats64.  Add an anonymous union in net_device; move
stats into the union and add struct rtnl_link_stats64 stats64.

Add net_device_ops::ndo_get_stats64, implementations of which will
return a pointer to struct rtnl_link_stats64.  Drivers that implement
this operation must not update the structure asynchronously.

Change dev_get_stats() to call ndo_get_stats64 if available, and to
return a pointer to struct rtnl_link_stats64.  Change callers of
dev_get_stats() accordingly.
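
The layout trick described above can be sketched in user-space C.  This is
only an illustration under stated assumptions: the struct names are
hypothetical, only two counters are shown, and the padding order is the
little-endian one (a big-endian machine would place the padding before
each field).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rtnl_link_stats64: every counter is 64 bits. */
struct stats64 {
    uint64_t rx_packets;
    uint64_t tx_packets;
};

/* Hypothetical 32-bit view: each 32-bit counter is followed by
 * 32 bits of padding so it overlays the low half of the 64-bit one. */
struct stats32 {
    uint32_t rx_packets;
    uint32_t pad_rx;
    uint32_t tx_packets;
    uint32_t pad_tx;
};

/* Both views share storage, as with the anonymous union in net_device. */
union dev_stats {
    struct stats32 stats;
    struct stats64 stats64;
};

/* Returns nonzero when the two views line up field for field. */
int layouts_are_compatible(void)
{
    return sizeof(struct stats32) == sizeof(struct stats64) &&
           offsetof(struct stats32, rx_packets) ==
               offsetof(struct stats64, rx_packets) &&
           offsetof(struct stats32, tx_packets) ==
               offsetof(struct stats64, tx_packets);
}

/* Update a counter through the 64-bit view of the shared storage. */
uint64_t count_tx_packet(union dev_stats *s)
{
    assert(s != NULL && layouts_are_compatible());
    return ++s->stats64.tx_packets;
}
```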

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoucc_geth driver: add ioctl
Sergey Matyukevich [Mon, 7 Jun 2010 08:38:13 +0000]
ucc_geth driver: add ioctl

ioctl operation (ndo_do_ioctl) is added to make mii-tools work

Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoenic: fix pci_alloc_consistent argument
Randy Dunlap [Tue, 8 Jun 2010 07:00:20 +0000]
enic: fix pci_alloc_consistent argument

Fix build warning on i386 (32-bit) with 32-bit dma_addr_t:

drivers/net/enic/vnic_dev.c: In function 'vnic_dev_init_prov':
drivers/net/enic/vnic_dev.c:716: warning: passing argument 3 of 'pci_alloc_consistent' from incompatible pointer type
include/asm-generic/pci-dma-compat.h:16: note: expected 'dma_addr_t *' but argument is of type 'u64 *'

Now builds without warnings on i386 and on x86_64.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Scott Feldman <scofeldm@cisco.com>
Cc: Vasanthy Kolluri <vkolluri@cisco.com>
Cc: Roopa Prabhu <roprabhu@cisco.com>
Acked-by: Scott Feldman <scofeldm@cisco.com>

9 years agopktgen: increasing transmission granularity
Daniel Turull [Wed, 9 Jun 2010 22:49:57 +0000]
pktgen: increasing transmission granularity

This patch increases the granularity of the rate generated by pktgen.
The previous version of pktgen used microsecond (udelay) resolution for its
delays, which caused gaps in the achievable rates.  The delay is now
expressed in nanoseconds (ndelay), so any rate is possible.

It also allows the desired rate to be set in Mb/s or packets per second.

The documentation has been updated.
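
As a rough sketch of the arithmetic involved (the function names are
hypothetical, and this is not claimed to be the exact pktgen code), a rate
can be turned into an inter-packet delay in nanoseconds like this:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL
#define NSEC_PER_SEC  1000000000ULL

/* Delay between packets for a target rate in Mb/s.
 * bits / (Mbit/s) yields microseconds, so scale to nanoseconds
 * before dividing to keep the sub-microsecond part. */
uint64_t delay_ns_from_mbps(uint64_t pkt_size_bytes, uint64_t rate_mbps)
{
    assert(rate_mbps > 0);
    return pkt_size_bytes * 8 * NSEC_PER_USEC / rate_mbps;
}

/* Delay between packets for a target rate in packets per second. */
uint64_t delay_ns_from_pps(uint64_t pps)
{
    assert(pps > 0);
    return NSEC_PER_SEC / pps;
}
```

For 64-byte packets at 300 Mb/s this gives 1706 ns.  With microsecond
resolution the closest choices would be 1 us or 2 us, that is roughly
512 Mb/s or 256 Mb/s, which illustrates the gaps mentioned above.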

Signed-off-by: Daniel Turull <daniel.turull@gmail.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoeconet: fix locking
Eric Dumazet [Wed, 9 Jun 2010 16:33:05 +0000]
econet: fix locking

econet lacks proper locking. It holds econet_lock only when inserting or
deleting an entry in econet_sklist, not during lookups.

- convert econet_lock from rwlock to spinlock

- use econet_lock in ec_listening_socket() lookup

- use appropriate sock_hold() / sock_put() to avoid corruptions.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agopkt_sched: gen_kill_estimator() rcu fixes
Eric Dumazet [Wed, 9 Jun 2010 02:09:23 +0000]
pkt_sched: gen_kill_estimator() rcu fixes

The gen_kill_estimator() API is incomplete or not well documented, since
the caller should make sure an RCU grace period is respected before
freeing stats_lock.

This was partially addressed in commit 5d944c640b4
(gen_estimator: deadlock fix), but the same problem exists for all
gen_kill_estimator() users if the lock they use is not already RCU
protected.

A code review shows xt_RATEEST.c, act_api.c and act_police.c have this
problem. The others are ok because they use the qdisc lock, which is
already RCU protected.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agobnx2: Fix compiler warning in bnx2_disable_forced_2g5().
Michael Chan [Tue, 8 Jun 2010 07:21:30 +0000]
bnx2: Fix compiler warning in bnx2_disable_forced_2g5().

drivers/net/bnx2.c: In function 'bnx2_disable_forced_2g5':
drivers/net/bnx2.c:1489: warning: 'bmcr' may be used uninitialized in this function

We fix it by checking return values from all bnx2_read_phy() and proceeding
to do read-modify-write only if the read operation is successful.

The related bnx2_enable_forced_2g5() is also fixed the same way.
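
The pattern can be sketched as follows.  The register file and the
fake_read_phy()/fake_write_phy() helpers are hypothetical stand-ins, not
the bnx2 functions; the point is only that the modify and write steps run
when, and only when, the read succeeded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_REG_COUNT 4

/* Hypothetical register file standing in for the PHY. */
static uint32_t fake_regs[FAKE_REG_COUNT];

/* Returns 0 on success, -1 if the register does not exist. */
int fake_read_phy(unsigned int reg, uint32_t *val)
{
    assert(val != NULL);
    if (reg >= FAKE_REG_COUNT)
        return -1;
    *val = fake_regs[reg];
    return 0;
}

int fake_write_phy(unsigned int reg, uint32_t val)
{
    if (reg >= FAKE_REG_COUNT)
        return -1;
    fake_regs[reg] = val;
    return 0;
}

/* Read-modify-write that never touches val after a failed read,
 * so there is no path on which val is used uninitialized. */
int clear_bits_checked(unsigned int reg, uint32_t mask)
{
    uint32_t val;
    int err = fake_read_phy(reg, &val);

    if (err)
        return err;
    val &= ~mask;
    return fake_write_phy(reg, val);
}
```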

Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoenic: cleanup vic_provinfo_alloc()
Dan Carpenter [Wed, 9 Jun 2010 21:59:03 +0000]
enic: cleanup vic_provinfo_alloc()

If oui were a null variable then vic_provinfo_alloc() would leak memory.
But this function is only called from one place and oui is not null so
I removed the check.

I also moved the memory allocation down a line so it was easier to spot.
(No one ever reads variable declarations).

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoMerge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
David S. Miller [Fri, 11 Jun 2010 20:32:31 +0000]
Merge branch 'master' of /linux/kernel/git/davem/net-2.6

9 years agoethoc: use devres resource management
Jonas Bonn [Fri, 11 Jun 2010 02:47:40 +0000]
ethoc: use devres resource management

The point of using the devres resource management routines is that they
simplify the driver by taking care of releasing resources on failure and
release.  A recent commit added a bunch of error handling that is unnecessary
in this context.

This patch removes this redundant error handling, as well as using
dmam_alloc_coherent in place of dma_alloc_coherent in order to use this
framework consistently throughout the driver.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoethoc: Clear command buffer after write
Jonas Bonn [Fri, 11 Jun 2010 02:47:39 +0000]
ethoc: Clear command buffer after write

This matches what ethoc_mdio_read does and makes the functions
symmetric.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoRemove unused variable
Jonas Bonn [Fri, 11 Jun 2010 02:47:38 +0000]
Remove unused variable

Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoethoc: Clean up PHY probing
Jonas Bonn [Fri, 11 Jun 2010 02:47:37 +0000]
ethoc: Clean up PHY probing

- No need to iterate over all possible addresses on bus
- Use helper function phy_find_first
- Use phy_connect_direct as we already have the relevant structure

Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoethoc: write number of TX buffers in init_ring
Jonas Bonn [Fri, 11 Jun 2010 02:47:36 +0000]
ethoc: write number of TX buffers in init_ring

This moves the write of the TX_BD_NUM to init_ring together with the
rest of the code setting up the transmission buffers.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoethoc: Write bus addresses to registers
Jonas Bonn [Fri, 11 Jun 2010 02:47:35 +0000]
ethoc: Write bus addresses to registers

The ethoc driver should be writing bus addresses to the ethoc registers, not
virtual addresses.  This patch adds an array to store the virtual addresses
in and references that array when manipulating the contents of the buffer
descriptors.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoethoc: calculate number of buffers in ethoc_probe
Jonas Bonn [Fri, 11 Jun 2010 02:47:34 +0000]
ethoc: calculate number of buffers in ethoc_probe

This moves the calculation of the number of transmission buffers to
ethoc_probe where it more logically fits with the rest of the memory
allocation code.

Signed-off-by: Jonas Bonn <jonas@southpole.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoMerge branch 'wimax-2.6.35.y' of git://git.kernel.org/pub/scm/linux/kernel/git/inaky...
David S. Miller [Fri, 11 Jun 2010 19:38:23 +0000]
Merge branch 'wimax-2.6.35.y' of git://git./linux/kernel/git/inaky/wimax

9 years agowimax/i2400m: fix missing endian correction read in fw loader
Inaky Perez-Gonzalez [Fri, 11 Jun 2010 18:51:20 +0000]
wimax/i2400m: fix missing endian correction read in fw loader

i2400m_fw_hdr_check() was accessing hardware field
bcf_hdr->module_type (little endian 32) without converting to host
byte order.
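
In the kernel this kind of conversion is normally done with
le32_to_cpu().  As a user-space illustration only (le32_decode() is a
hypothetical helper, not part of the driver), decoding a little-endian
32-bit field independently of the host byte order looks like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit value from four bytes stored least significant
 * byte first.  The result is the same on any host byte order. */
uint32_t le32_decode(const uint8_t *bytes)
{
    assert(bytes != NULL);
    return (uint32_t)bytes[0] |
           ((uint32_t)bytes[1] << 8) |
           ((uint32_t)bytes[2] << 16) |
           ((uint32_t)bytes[3] << 24);
}
```

Reading the field directly as a native integer only happens to work on
little-endian hosts, which is why the missing conversion can go unnoticed
there.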

Reported-by: Данилин Михаил <mdanilin@nsg.net.ru>

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

9 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
David S. Miller [Fri, 11 Jun 2010 18:34:06 +0000]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-next-2.6

Conflicts:
drivers/net/wireless/wl12xx/wl1271.h
drivers/net/wireless/wl12xx/wl1271_cmd.h

9 years agonet-next: remove useless union keyword
Changli Gao [Fri, 11 Jun 2010 06:31:35 +0000]
net-next: remove useless union keyword

remove useless union keyword in rtable, rt6_info and dn_route.

Since there is only one member in a union, the union keyword isn't useful.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agonet8139: fix a race at the end of NAPI
Figo.zhang [Mon, 7 Jun 2010 21:13:22 +0000]
net8139: fix a race at the end of NAPI

Fix a race at the end of NAPI complete processing: the driver should
call __napi_complete() before re-enabling interrupts.

Signed-off-by: Figo.zhang <figo1802@gmail.com>

Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agopktgen: Fix accuracy of inter-packet delay.
Daniel Turull [Fri, 11 Jun 2010 06:08:11 +0000]
pktgen: Fix accuracy of inter-packet delay.

This patch corrects a bug in the delay of pktgen.
It makes sure the inter-packet interval is accurate.

Signed-off-by: Daniel Turull <daniel.turull@gmail.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agopkt_sched: gen_estimator: add a new lock
Eric Dumazet [Tue, 8 Jun 2010 23:39:10 +0000]
pkt_sched: gen_estimator: add a new lock

gen_kill_estimator() / gen_new_estimator() is not always called with
RTNL held.

net/netfilter/xt_RATEEST.c is one user of these APIs that does not hold
RTNL, so random corruptions can occur between "tc" and "iptables".

Add a new fine grained lock instead of trying to use RTNL in netfilter.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoip: ip_ra_control() rcu fix
Eric Dumazet [Wed, 9 Jun 2010 16:21:07 +0000]
ip: ip_ra_control() rcu fix

commit 66018506e15b (ip: Router Alert RCU conversion) introduced RCU
lookups to ip_call_ra_chain(). It missed a proper deinit phase:
When ip_ra_control() deletes an ip_ra_chain, it should make sure
ip_call_ra_chain() users cannot start to use the socket during the RCU
grace period. It should also delay the sock_put() until after the grace
period, or we risk premature socket freeing and corruption, as
raw sockets are not RCU protected yet.

This delay avoids using expensive atomic_inc_not_zero() in
ip_call_ra_chain().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agonet: deliver skbs on inactive slaves to exact matches
John Fastabend [Thu, 3 Jun 2010 09:30:11 +0000]
net: deliver skbs on inactive slaves to exact matches

Currently, the accelerated receive path for VLANs will
drop packets if the real device is an inactive slave and
the packet is not one of the special pkts tested for in
skb_bond_should_drop().  This behavior is different from
the non-accelerated path and from pkts over a bonded vlan.

For example,

vlanx -> bond0 -> ethx

will be dropped in the vlan path and not delivered to any
packet handlers at all.  However,

bond0 -> vlanx -> ethx

and

bond0 -> ethx

will be delivered to handlers that match the exact dev,
because the VLAN path checks the real_dev which is not a
slave and netif_recv_skb() doesn't drop frames but only
delivers them to exact matches.

This patch adds a sk_buff flag which is used for tagging
skbs that would previously have been dropped and allows the
skb to continue to skb_netif_recv().  Here we add
logic to check for the deliver_no_wcard flag and if it
is set only deliver to handlers that match exactly.  This
makes both paths above consistent and gives pkt handlers
a way to identify skbs that come from inactive slaves.
Without this patch, in some configurations skbs will be
delivered to handlers with exact matches and in others
be dropped outright in the vlan path.
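
The delivery rule can be sketched in a few lines of user-space C.  The
struct and function names are hypothetical simplifications (devices are
compared by name, and a handler with no device is a wildcard); this is
not the kernel's receive path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet handler: dev == NULL means "any device". */
struct handler {
    const char *dev;
};

/* Hypothetical packet: the flag marks skbs from inactive slaves. */
struct pkt {
    const char *dev;
    bool deliver_no_wcard;
};

/* Wildcard handlers are skipped for flagged packets; handlers bound
 * to exactly the receiving device still get them. */
bool should_deliver(const struct pkt *p, const struct handler *h)
{
    assert(p != NULL && h != NULL && p->dev != NULL);
    if (h->dev == NULL)
        return !p->deliver_no_wcard;
    return strcmp(h->dev, p->dev) == 0;
}
```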

I have tested the following 4 configurations in failover modes
and load balancing modes.

# bond0 -> ethx

# vlanx -> bond0 -> ethx

# bond0 -> vlanx -> ethx

# bond0 -> ethx
            |
  vlanx -> --

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoipv6: fix ICMP6_MIB_OUTERRORS
Eric Dumazet [Mon, 7 Jun 2010 22:24:44 +0000]
ipv6: fix ICMP6_MIB_OUTERRORS

In commit 1f8438a85366 (icmp: Account for ICMP out errors), I made a typo
on the IPv6 side, using ICMP6_MIB_OUTMSGS instead of ICMP6_MIB_OUTERRORS.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoipv6: mcast: RCU conversions
Eric Dumazet [Mon, 7 Jun 2010 21:05:02 +0000]
ipv6: mcast: RCU conversions

- ipv6_sock_mc_join() : doesn't touch dev refcount

- ipv6_sock_mc_drop() : doesn't touch dev/idev refcounts

- ip6_mc_find_dev() becomes ip6_mc_find_dev_rcu() (called from rcu),
                    and doesn't touch dev/idev refcounts

- ipv6_sock_mc_close() : doesn't touch dev/idev refcounts

- ip6_mc_source() uses ip6_mc_find_dev_rcu()

- ip6_mc_msfilter() uses ip6_mc_find_dev_rcu()

- ip6_mc_msfget() uses ip6_mc_find_dev_rcu()

- ipv6_dev_mc_dec(), ipv6_chk_mcast_addr(),
  igmp6_event_query(), igmp6_event_report(),
  mld_sendpack(), igmp6_send() don't touch idev refcount

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agocleanup: remove pppoe_xmit() declaration.
Rami Rosen [Tue, 8 Jun 2010 19:07:56 +0000]
cleanup: remove pppoe_xmit() declaration.

There is no need for pppoe_xmit() forward declaration in
drivers/net/pppoe.c. This patch removes the pppoe_xmit() declaration.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoicmp: RCU conversion in icmp_address_reply()
Eric Dumazet [Mon, 7 Jun 2010 22:34:35 +0000]
icmp: RCU conversion in icmp_address_reply()

- rcu_read_lock() already held by caller
- use __in_dev_get_rcu() instead of in_dev_get() / in_dev_put()
- remove goto out;

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agor8169: fix mdio_read and update mdio_write according to hw specs
Timo Teräs [Thu, 10 Jun 2010 00:31:48 +0000]
r8169: fix mdio_read and update mdio_write according to hw specs

Realtek confirmed that a 20us delay is needed after mdio_read and
mdio_write operations. Reduce the delay in mdio_write, and add it
to mdio_read too. Also add a comment that the 20us is from hw specs.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agoMerge branch 'num_rx_queues' of git://kernel.ubuntu.com/rtg/net-2.6
David S. Miller [Wed, 9 Jun 2010 23:28:25 +0000]
Merge branch 'num_rx_queues' of git://kernel.ubuntu.com/rtg/net-2.6

9 years agogianfar: Revive the driver for eTSEC devices (disable timestamping)
Anton Vorontsov [Wed, 9 Jun 2010 23:27:08 +0000]
gianfar: Revive the driver for eTSEC devices (disable timestamping)

Since commit cc772ab7cdcaa24d1fae332d92a1602788644f7a ("gianfar: Add
hardware RX timestamping support"), the driver no longer works on
at least MPC8313ERDB and MPC8568EMDS boards (and possibly many more
boards as well).

This is how the MPC8313 Reference Manual describes the RCTRL_TS_ENABLE bit:

  Timestamp incoming packets as padding bytes. PAL field is set
  to 8 if the PAL field is programmed to less than 8. Must be set
  to zero if TMR_CTRL[TE]=0.

I see that the commit above sets this bit, but it doesn't handle
TMR_CTRL. Manfred probably had this bit set by the firmware for
his boards. But obviously this isn't true for all boards in the
wild.

Also, I recall that Freescale BSPs were explicitly disabling the
timestamping because of a performance drop.

For now, the best way to deal with this is just disable the
timestamping, and later we can discuss proper device tree bindings
and implement enabling this feature via some property.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agocaif: fix a couple range checks
Dan Carpenter [Mon, 7 Jun 2010 04:51:58 +0000]
caif: fix a couple range checks

The extra ! character means that these conditions are always false.
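
A generic illustration of this kind of bug (the function names and the
condition are hypothetical and are not the caif code): logical NOT binds
tighter than a comparison, so the negated operand collapses to 0 or 1
before it is compared.

```c
#include <assert.h>
#include <stdbool.h>

/* Buggy check: `!id > limit` parses as `(!id) > limit`.  Since !id
 * is either 0 or 1, this is false for every limit >= 1. */
bool out_of_range_buggy(int id, int limit)
{
    return (!id) > limit;
}

/* Intended check: compare the value itself against the limit. */
bool out_of_range_fixed(int id, int limit)
{
    return id > limit;
}
```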

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agophylib: Add support for the LXT973 phy.
Richard Cochran [Mon, 7 Jun 2010 05:39:32 +0000]
phylib: Add support for the LXT973 phy.

This patch implements a workaround for Erratum 5, "3.3 V Fiber Speed
Selection." If the hardware wiring does not respect this erratum, then
fiber optic mode will not work properly.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agophonet: use call_rcu for phonet device free
Jiri Pirko [Mon, 7 Jun 2010 03:27:39 +0000]
phonet: use call_rcu for phonet device free

Use call_rcu rather than synchronize_rcu.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9 years agonet: Print num_rx_queues imbalance warning only when there are allocated queues
Tim Gardner [Tue, 8 Jun 2010 23:51:27 +0000]
net: Print num_rx_queues imbalance warning only when there are allocated queues

BugLink: http://bugs.launchpad.net/bugs/591416

There are a number of network drivers (bridge, bonding, etc) that are not yet
receive multi-queue enabled and use alloc_netdev(), so don't print a
num_rx_queues imbalance warning in that case.

Also, only print the warning once for those drivers that _are_ multi-queue
enabled.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

9 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wirel...
David S. Miller [Wed, 9 Jun 2010 18:13:23 +0000]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless-2.6

9 years agonetfilter: nfnetlink_log: RCU conversion
Eric Dumazet [Wed, 9 Jun 2010 16:14:58 +0000]
netfilter: nfnetlink_log: RCU conversion

- instances_lock becomes a spinlock
- lockless lookups

While nfnetlink_log is probably not performance critical, using fewer
rwlocks in our code is always welcome...

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

9 years agonetfilter: nfnetlink_queue: some optimizations
Eric Dumazet [Wed, 9 Jun 2010 16:07:06 +0000]
netfilter: nfnetlink_queue: some optimizations

- Use an atomic_t for id_sequence to avoid a spin_lock/spin_unlock pair

- Group highly modified struct nfqnl_instance fields together
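
The first point can be illustrated in user space.  The kernel has its own
atomic_t helpers; C11 <stdatomic.h> is swapped in here so the sketch
builds outside the kernel, and next_packet_id() is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>

/* Shared sequence counter; static storage starts at zero. */
static atomic_uint id_sequence;

/* Hand out the next id with a single atomic read-modify-write,
 * instead of taking a lock, incrementing, and releasing the lock. */
unsigned int next_packet_id(void)
{
    return atomic_fetch_add(&id_sequence, 1) + 1;
}
```

Each caller gets a distinct id even when several threads call this
concurrently, because the increment and the read happen as one operation.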

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

9 years agonetfilter: ip6_queue: rwlock to spinlock conversion
Eric Dumazet [Wed, 9 Jun 2010 14:25:08 +0000]
netfilter: ip6_queue: rwlock to spinlock conversion

Converts queue_lock rwlock to a spinlock.

(readlocked part can be changed by reads of integer values)

One atomic operation instead of four per ipq_enqueue_packet() call.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

9 years agonetfilter: ip_queue: rwlock to spinlock conversion
Eric Dumazet [Wed, 9 Jun 2010 13:47:41 +0000]
netfilter: ip_queue: rwlock to spinlock conversion

Converts queue_lock rwlock to a spinlock.

(readlocked part can be changed by reads of integer values)

One atomic operation instead of four per ipq_enqueue_packet() call.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>