Commits · 5317d9af12a59e83a6f173eac3808cc21f6e9d2b · Kirill Smelkov / linux

21 Mar, 2016 1 commit

ipv6: prevent fib6_run_gc() contention · 5317d9af

Michal Kubeček authored 11 years ago

commit 2ac3ac8f upstream.

On a high-traffic router with many processors and many IPv6 dst
entries, soft lockup in fib6_run_gc() can occur when number of
entries reaches gc_thresh.

This happens because fib6_run_gc() uses fib6_gc_lock to allow
only one thread to run the garbage collector but ip6_dst_gc()
doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
returns. On a system with many entries, this can take some time
so that in the meantime, other threads pass the tests in
ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
the lock. They then have to run the garbage collector one after
another which blocks them for quite long.

Resolve this by replacing special value ~0UL of expire parameter
to fib6_run_gc() by explicit "force" parameter to choose between
spin_lock_bh() and spin_trylock_bh() and call fib6_run_gc() with
force=false if gc_thresh is reached but not max_size.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <lizefan@huawei.com>

5317d9af
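
The locking choice described in the commit message above can be sketched in user space. This is only an illustration, assuming a C11 `atomic_flag` as a stand-in for fib6_gc_lock; `run_gc()` and `gc_runs` are hypothetical names and this is not the kernel code. With `force` set the caller waits for the lock; otherwise it gives up at once if another thread is already collecting.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for fib6_gc_lock. */
static atomic_flag gc_lock = ATOMIC_FLAG_INIT;

/* Counts how many times the (pretend) collector actually ran. */
static int gc_runs;

/* Returns true if the collector ran, false if it was skipped
 * because another caller already held the lock. */
static bool run_gc(bool force)
{
    if (force) {
        /* Like spin_lock_bh(): wait until the lock is free. */
        while (atomic_flag_test_and_set(&gc_lock))
            ; /* spin */
    } else if (atomic_flag_test_and_set(&gc_lock)) {
        /* Like a failed spin_trylock_bh(): someone else is collecting. */
        return false;
    }

    gc_runs++;                      /* the real work would happen here */
    atomic_flag_clear(&gc_lock);
    return true;
}
```

A caller that has reached gc_thresh but not max_size would pass `force = false`, so it never queues up behind a collector that is already running.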

14 Sep, 2013 1 commit

ipv6: Don't depend on per socket memory for neighbour discovery messages · cce0727a

Thomas Graf authored 11 years ago

[ Upstream commit 25a6e6b8 ]

Allocating skbs when sending out neighbour discovery messages
currently uses sock_alloc_send_skb() based on a per net namespace
socket and thus shares a socket wmem buffer space.

If a netdevice is temporarily unable to transmit due to carrier
loss or for other reasons, the queued up ndisc messages will consume
all of the wmem space and will thus prevent any more skbs from
being allocated even for netdevices that are able to transmit packets.

The number of neighbour discovery messages sent is very limited,
use of alloc_skb() bypasses the socket wmem buffer size enforcement
while the manual call to skb_set_owner_w() maintains the socket
reference needed for the IPv6 output path.

This patch was originally posted by Eric Dumazet in a modified
form.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Fabio Estevam <festevam@gmail.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cce0727a

17 Nov, 2012 1 commit

ipv6: send unsolicited neighbour advertisements to all-nodes · f0d6767a

Hannes Frederic Sowa authored 12 years ago

[ Upstream commit 60713a0c ]

As documented in RFC4861 (Neighbor Discovery for IP version 6) 7.2.6.,
unsolicited neighbour advertisements should be sent to the all-nodes
multicast address.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f0d6767a

13 Apr, 2012 1 commit

ipv6: fix problem with expired dst cache · 1716a961

Gao feng authored 12 years ago


If an IPv6 dst cache entry is copied from a dst generated by an ICMPv6 RA packet,
the copy is never checked for expiry because it has no RTF_EXPIRES flag.
So this dst cache entry will always be used until the dst gc runs.

Change struct dst_entry: add a union containing a new pointer, from, alongside expires.
When rt6_info.rt6i_flags has no RTF_EXPIRES flag, dst.expires is unused,
so we can use this field to point to the dst the cache entry was copied from.
dst.from is only used in IPv6.

rt6_check_expired() checks whether rt6_info.dst.from is expired.

ip6_rt_copy() only sets dst.from when the ort has the RTF_ADDRCONF
and RTF_DEFAULT flags, and then holds the ort.

ip6_dst_destroy() releases the ort.

Add some functions to operate on the RTF_EXPIRES flag and expires (from) together,
and change the code to use these new functions.

Changes from v5:
modify ip6_route_add and ndisc_router_discovery to use the new functions.

Only set dst.from when the ort has the RTF_ADDRCONF
and RTF_DEFAULT flags, then hold the ort.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1716a961

22 Feb, 2012 1 commit

ipv6: ip6_route_output() never returns NULL. · 5095d64d

RongQing.Li authored 13 years ago


ip6_route_output() never returns NULL, so it is wrong to
check if the return value is NULL.
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5095d64d

28 Jan, 2012 2 commits

ipv6: Remove neigh argument from ndisc_send_redirect() · 4991969a

David S. Miller authored 13 years ago


Instead, compute it as-needed inside of that function using
dst_neigh_lookup().
Signed-off-by: David S. Miller <davem@davemloft.net>

4991969a

ipv6: ndisc: Convert to dst_neigh_lookup() · eb857186

David S. Miller authored 13 years ago


Now all code paths grab a local reference to the neigh, so if neigh
is not NULL we unconditionally release it at the end.  The old logic
would only release if we didn't have a non-NULL 'rt'.
Signed-off-by: David S. Miller <davem@davemloft.net>

eb857186

04 Jan, 2012 1 commit

ipv6: Check RA for sllao when configuring optimistic ipv6 address (v2) · e6bff995

Neil Horman authored 13 years ago

Recently Dave noticed that a test we did in ipv6_add_addr to see if we have a next hop
route for the interface we're adding an address to was wrong (see commit
7ffbcecb).  For one, it never triggers, and two,
it was completely wrong to begin with.  This test was meant to cover this
section of RFC 4429:

3.3 Modifications to RFC 2462 Stateless Address Autoconfiguration

   * (modifies section 5.5) A host MAY choose to configure a new address
        as an Optimistic Address.  A host that does not know the SLLAO
        of its router SHOULD NOT configure a new address as Optimistic.
        A router SHOULD NOT configure an Optimistic Address.

This patch should bring us into proper compliance with the above clause.  Since
we only add a SLAAC address after we've received a RA which may or may not
contain a source link layer address option, we can pass a pointer to that option
to addrconf_prefix_rcv (which may be null if the option is not present), and
only set the optimistic flag if the option was found in the RA.

Change notes:
(v2) modified the new parameter to addrconf_prefix_rcv to be a bool rather than
a pointer to make its use more clear as per request from davem.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e6bff995

29 Dec, 2011 1 commit

ipv6: Kill rt6i_dev and rt6i_expires defines. · d1918542

David S. Miller authored 13 years ago


It just obscures that the netdevice pointer and the expires value are
implemented in the dst_entry sub-object of the ipv6 route.

And it makes grepping for dst_entry member uses much harder too.
Signed-off-by: David S. Miller <davem@davemloft.net>

d1918542

28 Dec, 2011 1 commit

ipv6: Use universal hash for NDISC. · 2c2aba6c

David S. Miller authored 13 years ago


In order to perform a proper universal hash on a vector of integers,
we have to use different universal hashes on each vector element.

Which means we need 4 different hash randoms for ipv6.
Signed-off-by: David S. Miller <davem@davemloft.net>

2c2aba6c
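
The idea in the commit message above, one independent multiplier per 32-bit word of the key, can be sketched as follows. This is a minimal illustration, not the kernel's hash function; `hash_vec4()` is a hypothetical name, and the four multipliers are assumed to be random values chosen when the table is created.

```c
#include <stdint.h>

/* Hypothetical sketch: hash a 128-bit key, viewed as four 32-bit words,
 * using a different multiplier for each word and summing the products. */
static uint32_t hash_vec4(const uint32_t key[4], const uint32_t rnd[4])
{
    uint32_t h = 0;

    for (int i = 0; i < 4; i++)
        h += key[i] * rnd[i];   /* unsigned arithmetic wraps modulo 2^32 */

    return h;
}
```

Because each word has its own multiplier, swapping two words of the key generally changes the result, which a single shared multiplier would not guarantee.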

06 Dec, 2011 1 commit

ipv6: Move xfrm_lookup() call down into icmp6_dst_alloc(). · 87a11578

David S. Miller authored 13 years ago


And return error pointers.
Signed-off-by: David S. Miller <davem@davemloft.net>

87a11578

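
For readers unfamiliar with the "error pointer" convention mentioned above, here is a user-space sketch of the idea: an errno value is encoded in the top of the pointer range, so one return value can carry either a valid object or an error. The helper names mirror the kernel's `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()`, but the bodies are a simplified re-implementation for illustration, and `demo_alloc()` is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095  /* largest errno value that is encoded in a pointer */

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the negative errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the last MAX_ERRNO addresses. */
static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator: returns an object, or an encoded errno on failure. */
static int slot;
static void *demo_alloc(bool fail)
{
    return fail ? ERR_PTR(-ENOMEM) : (void *)&slot;
}
```

A caller checks `IS_ERR()` on the result instead of comparing against NULL, and uses `PTR_ERR()` to learn why the call failed.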
05 Dec, 2011 1 commit

net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}. · 27217455

David Miller authored 13 years ago


To reflect the fact that a reference is not obtained to the
resulting neighbour entry.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>

27217455

30 Nov, 2011 1 commit

neigh: Do not set tbl->entry_size in ipv4/ipv6 neigh tables. · 76cc714e

David Miller authored 13 years ago


Let the core self-size the neigh entry based upon the key length.
Signed-off-by: David S. Miller <davem@davemloft.net>

76cc714e

23 Nov, 2011 1 commit

ipv6: fix a bug in ndisc_send_redirect · 4d65a246

Li Wei authored 13 years ago


Release the skb when the transmit rate limit does _not_ allow sending.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d65a246

22 Nov, 2011 1 commit

net: remove ipv6_addr_copy() · 4e3fd7a0

Alexey Dobriyan authored 13 years ago


C assignment can handle struct in6_addr copying.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e3fd7a0
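
The point of the commit above is that plain C assignment copies a whole struct, including an array member, so a dedicated copy helper is unnecessary. A minimal sketch, using a hypothetical `struct addr128` as a stand-in for struct in6_addr:

```c
/* Hypothetical stand-in for struct in6_addr: 16 bytes of address data. */
struct addr128 {
    unsigned char bytes[16];
};

/* Plain assignment copies the entire struct, array member included,
 * so no memcpy()-style helper is needed. */
static struct addr128 copy_addr(const struct addr128 *src)
{
    struct addr128 dst = *src;
    return dst;
}
```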

18 Nov, 2011 1 commit

ipv6: Remove all uses of LL_ALLOCATED_SPACE · a7ae1992

Herbert Xu authored 13 years ago


ipv6: Remove all uses of LL_ALLOCATED_SPACE

The macro LL_ALLOCATED_SPACE was ill-conceived.  It applies the
alignment to the sum of needed_headroom and needed_tailroom.  As
the amount that is then reserved for head room is needed_headroom
with alignment, this means that the tail room left may be too small.

This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv6
with the macro LL_RESERVED_SPACE and direct reference to
needed_tailroom.

This also fixes the problem with needed_headroom changing between
allocating the skb and reserving the head room.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7ae1992
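
The sizing problem described above can be worked through with small numbers. The sketch below is an assumption-laden illustration, not the kernel macros: `align_space()`, `old_alloc()` and `new_alloc()` are hypothetical names, and the alignment rule (round down to a 16-byte unit, then add one unit) is assumed for the example.

```c
#define HDR_ALIGN 16u  /* assumed alignment unit for this sketch */

/* Round down to the alignment unit, then add one full unit. */
static unsigned int align_space(unsigned int len)
{
    return (len & ~(HDR_ALIGN - 1)) + HDR_ALIGN;
}

/* Old scheme: apply the alignment to headroom + tailroom together. */
static unsigned int old_alloc(unsigned int head, unsigned int tail)
{
    return align_space(head + tail);
}

/* New scheme: align the headroom alone and add the tailroom unaligned. */
static unsigned int new_alloc(unsigned int head, unsigned int tail)
{
    return align_space(head) + tail;
}
```

With a headroom of 10 and a tailroom of 4, the old scheme allocates 16 bytes, but reserving the aligned headroom also takes 16, leaving no tail room at all. The new scheme allocates 20, so the full 4 bytes of tail room remain.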

14 Nov, 2011 1 commit

neigh: new unresolved queue limits · 8b5c171b

Eric Dumazet authored 13 years ago


On Wednesday 09 November 2011 at 16:21 -0500, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
>
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Wed, 09 Nov 2011 12:14:09 +0100
> >
> >> unres_qlen is the number of frames we are able to queue per unresolved
> >> neighbour. Its default value (3) was never changed and is responsible
> >> for strange drops, especially if IP fragments are used, or multiple
> >> sessions start in parallel. Even a single tcp flow can hit this limit.
> >  ...
> >
> > Ok, I've applied this, let's see what happens :-)
>
> Early answer, build fails.
>
> Please test build this patch with DECNET enabled and resubmit.  The
> decnet neigh layer still refers to the removed ->queue_len member.
>
> Thanks.

Ouch, this was fixed on one machine yesterday, but not the other one I
used this morning, sorry.

[PATCH V5 net-next] neigh: new unresolved queue limits

unres_qlen is the number of frames we are able to queue per unresolved
neighbour. Its default value (3) was never changed and is responsible
for strange drops, especially if IP fragments are used, or multiple
sessions start in parallel. Even a single tcp flow can hit this limit.

$ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms
Signed-off-by: David S. Miller <davem@davemloft.net>

8b5c171b

24 Oct, 2011 1 commit

ipv6: Do not use routes from locally generated RAs · 9f56220f

Andreas Hofmeister authored 13 years ago


When hybrid mode is enabled (accept_ra == 2), the kernel also sees RAs
generated locally. This is useful since it allows the kernel to auto-configure
its own interface addresses.

However, if 'accept_ra_defrtr' and/or 'accept_ra_rtr_pref' are set and the
locally generated RAs announce the default route and/or other route information,
the kernel happily inserts bogus routes with its own address as gateway.

With this patch, adding routes from an RA will be skipped when the RA's source
address matches any local address, just as if 'accept_ra_defrtr' and
'accept_ra_rtr_pref' were set to 0.
Signed-off-by: Andreas Hofmeister <andi@collax.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f56220f

17 Oct, 2011 1 commit

ipv6: remove a rcu_read_lock in ndisc_constructor · 01b7806c

Roy.Li authored 13 years ago


in6_dev_get(dev) takes a reference on struct inet6_dev, so we don't need
rcu locking in ndisc_constructor().
Signed-off-by: Roy.Li <rongqing.li@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01b7806c

01 Aug, 2011 1 commit

ipv6: some RCU conversions · cfdf7647

Eric Dumazet authored 13 years ago


ICMP and ND are not fast path, but still we can avoid changing idev
refcount, using RCU.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cfdf7647

18 Jul, 2011 3 commits

net: Abstract dst->neighbour accesses behind helpers. · 69cce1d1

David S. Miller authored 13 years ago


dst_{get,set}_neighbour()
Signed-off-by: David S. Miller <davem@davemloft.net>

69cce1d1

ipv6: Get rid of rt6i_nexthop macro. · 9cbb7ecb

David S. Miller authored 13 years ago


It just makes it harder to see 1) what the code is doing
and 2) grep for all users of dst{->,.}neighbour
Signed-off-by: David S. Miller <davem@davemloft.net>

9cbb7ecb

neigh: Pass neighbour entry to output ops. · 8f40b161

David S. Miller authored 13 years ago


This will get us closer to being able to do "neigh stuff"
completely independent of the underlying dst_entry for
protocols (ipv4/ipv6) that wish to do so.

We will also be able to make dst entries neigh-less.
Signed-off-by: David S. Miller <davem@davemloft.net>

8f40b161

17 Jul, 2011 2 commits

neigh: Kill ndisc_ops->queue_xmit · 542d4d68

David S. Miller authored 13 years ago


It is always dev_queue_xmit().
Signed-off-by: David S. Miller <davem@davemloft.net>

542d4d68

neigh: Kill neigh_ops->hh_output · 47ec132a

David S. Miller authored 13 years ago


It's always dev_queue_xmit().
Signed-off-by: David S. Miller <davem@davemloft.net>

47ec132a

29 Apr, 2011 1 commit

ipv4, ipv6, bonding: Restore control over number of peer notifications · ad246c99

Ben Hutchings authored 13 years ago


For backward compatibility, we should retain the module parameters and
sysfs attributes to control the number of peer notifications
(gratuitous ARPs and unsolicited NAs) sent after bonding failover.
Also, it is possible for failover to take place even though the new
active slave does not have link up, and in that case the peer
notification should be deferred until it does.

Change ipv4 and ipv6 so they do not automatically send peer
notifications on bonding failover.

Change the bonding driver to send separate NETDEV_NOTIFY_PEERS
notifications when the link is up, as many times as requested.  Since
it does not directly control which protocols send notifications, make
num_grat_arp and num_unsol_na aliases for a single parameter.  Bump
the bonding version number and update its documentation.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Acked-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad246c99

22 Apr, 2011 1 commit

inet: constify ip headers and in6_addr · b71d1d42

Eric Dumazet authored 13 years ago


Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b71d1d42

18 Apr, 2011 2 commits

bonding, ipv4, ipv6, vlan: Handle NETDEV_BONDING_FAILOVER like NETDEV_NOTIFY_PEERS · 7c899432

Ben Hutchings authored 13 years ago


It is undesirable for the bonding driver to be poking into higher
level protocols, and notifiers provide a way to avoid that.  This does
mean removing the ability to configure repetition of gratuitous ARPs
and unsolicited NAs.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7c899432

ipv6: Send unsolicited neighbour advertisements when notified · f47b9464

Ben Hutchings authored 13 years ago


The NETDEV_NOTIFY_PEERS notifier is a request to send such
advertisements following migration to a different physical link,
e.g. virtual machine migration.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f47b9464

15 Apr, 2011 1 commit

ipv6: ignore looped-back NA while dad is running · bd015928

Daniel Walter authored 13 years ago


[ipv6] Ignore looped-back NAs while in Duplicate Address Detection

If we send an unsolicited NA shortly after bringing up an
IPv6 address, the duplicate address detection algorithm
fails and the ip stays in tentative mode forever.
This is due to a missing check if the NA is looped-back to us.
Signed-off-by: Daniel Walter <dwalter@barracuda.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd015928

30 Mar, 2011 1 commit

net: gre: provide multicast mappings for ipv4 and ipv6 · 93ca3bb5

Timo Teräs authored 13 years ago

My commit 6d55cb91 (gre: fix hard header destination
address checking) broke multicast.

The reason is that ip_gre used to get ipgre_header() calls with
zero destination if we have NOARP or multicast destination. Instead
the actual target was decided at ipgre_tunnel_xmit() time based on
per-protocol dissection.

Instead of allowing the "abuse" of ->header() calls with invalid
destination, this creates multicast mappings for ip_gre. This also
fixes "ip neigh show nud noarp" to display the proper multicast
mappings used by the gre device.
Reported-by: Doug Kehn <rdkehn@yahoo.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Doug Kehn <rdkehn@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

93ca3bb5

12 Mar, 2011 1 commit

ipv6: Convert to use flowi6 where applicable. · 4c9483b2

David S. Miller authored 14 years ago


Signed-off-by: David S. Miller <davem@davemloft.net>

4c9483b2

02 Mar, 2011 1 commit

xfrm: Return dst directly from xfrm_lookup() · 452edd59

David S. Miller authored 14 years ago


Instead of on the stack.
Signed-off-by: David S. Miller <davem@davemloft.net>

452edd59

04 Feb, 2011 1 commit

inetpeer: Move ICMP rate limiting state into inet_peer entries. · 92d86829

David S. Miller authored 14 years ago


Like metrics, the ICMP rate limiting bits are cached state about
a destination.  So move it into the inet_peer entries.

If an inet_peer cannot be bound (the reason is memory allocation
failure or similar), the policy is to allow.
Signed-off-by: David S. Miller <davem@davemloft.net>

92d86829

09 Dec, 2010 1 commit

net: Abstract away all dst_entry metrics accesses. · defb3519

David S. Miller authored 14 years ago


Use helper functions to hide all direct accesses, especially writes,
to dst_entry metrics values.

This will allow us to:

1) More easily change how the metrics are stored.

2) Implement COW for metrics.

In particular this will help us put metrics into the inetpeer
cache if that is what we end up doing.  We can make the _metrics
member a pointer instead of an array, initially have it point
at the read-only metrics in the FIB, and then on the first set
grab an inetpeer entry and point the _metrics member there.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

defb3519
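
The copy-on-write plan outlined above (point at read-only metrics first, switch to a private copy on the first set) can be sketched with accessor helpers. Everything here is hypothetical: `struct dst_sketch`, `metric_get()`, `metric_set()` and `NMETRICS` are illustrative names, and the private copy lives inside the struct rather than in an inetpeer entry as the message suggests.

```c
#include <stdbool.h>
#include <string.h>

#define NMETRICS 4  /* hypothetical number of metrics */

struct dst_sketch {
    const unsigned int *metrics;   /* shared read-only defaults until first write */
    unsigned int own[NMETRICS];    /* private copy, valid once writable is set */
    bool writable;
};

/* All reads go through the helper, so the storage can change freely. */
static unsigned int metric_get(const struct dst_sketch *d, int idx)
{
    return d->metrics[idx];
}

/* The first write copies the shared defaults, then redirects the pointer. */
static void metric_set(struct dst_sketch *d, int idx, unsigned int val)
{
    if (!d->writable) {
        memcpy(d->own, d->metrics, sizeof(d->own));
        d->metrics = d->own;
        d->writable = true;
    }
    d->own[idx] = val;
}
```

Because callers never touch the array directly, the shared defaults stay untouched and only entries that are actually modified pay for a private copy.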

02 Dec, 2010 1 commit

ipv6: use ND_REACHABLE_TIME and ND_RETRANS_TIMER instead of magic number · b672083e

Shan Wei authored 14 years ago


ND_REACHABLE_TIME and ND_RETRANS_TIMER have been defined
since v2.6.12-rc2, but have never been used.
So use them instead of magic numbers.

This patch also changes the original code style to make it more comfortable to read.

Thanks to YOSHIFUJI Hideaki for pointing it out.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b672083e

05 Oct, 2010 1 commit

net neigh: RCU conversion of neigh hash table · d6bf7817

Eric Dumazet authored 14 years ago


David

This is the first step for RCU conversion of neigh code.

Next patches will convert hash_buckets[] and "struct neighbour" to RCU
protected objects.

Thanks

[PATCH net-next] net neigh: RCU conversion of neigh hash table

Instead of storing hash_buckets, hash_mask and hash_rnd in "struct
neigh_table", a new structure is defined :

struct neigh_hash_table {
       struct neighbour        **hash_buckets;
       unsigned int            hash_mask;
       __u32                   hash_rnd;
       struct rcu_head         rcu;
};

And "struct neigh_table" has an RCU protected pointer to such a
neigh_hash_table.

This means the signature of (*hash)() function changed: We need to add a
third parameter with the actual hash_rnd value, since this is not
anymore a neigh_table field.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d6bf7817

23 Sep, 2010 1 commit

net: return operator cleanup · a02cec21

Eric Dumazet authored 14 years ago


Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a02cec21

03 Sep, 2010 1 commit

ipv6: add special mode accept_ra=2 to accept RA while configured as router · 65e9b62d

Thomas Graf authored 14 years ago


The current IPv6 behavior is to not accept router advertisements while
forwarding, i.e. configured as router.

This does make sense, a router is typically not supposed to be auto
configured. However there are exceptions and we should allow the
current behavior to be overwritten.

Therefore this patch enables the user to overrule the "if forwarding
enabled then don't listen to RAs" rule by setting accept_ra to the
special value of 2.

An alternative would be to ignore the forwarding switch altogether
and solely accept RAs based on the value of accept_ra. However, I
found that if not intended, accepting RAs as a router can lead to
strange unwanted behavior, therefore it seems wise to only do so
if the user explicitly asks for this behavior.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

65e9b62d

26 Jun, 2010 1 commit

ipv6: fix NULL reference in proxy neighbor discovery · 9f888160

stephen hemminger authored 14 years ago

The addition of TLLAO option created a kernel OOPS regression
for the case where neighbor advertisement is being sent via
proxy path.  When using proxy, ipv6_get_ifaddr() returns NULL
causing the NULL dereference.

Change causing the bug was:
commit f7734fdf
Author: Octavian Purdila <opurdila@ixiacom.com>
Date:   Fri Oct 2 11:39:15 2009 +0000

    make TLLAO option for NA packets configurable
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f888160