- 13 Mar, 2015 22 commits
-
-
David S. Miller authored
Eric W. Biederman says: ==================== tcp_metrics: Network namespace bloat reduction v3 This is a small pile of patches that convert tcp_metrics from using a hash table per network namespace to using a single hash table for all network namespaces. This is broken up into several patches so that each small step along the way could be carefully scrutinized as I wrote it, and equally so that each small step can be reviewed. There are several cleanups included in this series. The addition of panic calls during boot where we can not handle failure, and not trying simplifies the code. The removal of the return code from tcp_metrics_flush_all. The motivation for this change is that the tcp_metrics hash table at 128KiB is one of the largest components of a freshly allocated network namespace. I am resending the the previous version I sent has suffered bitrot, so I have respun the patches so that they apply. I believe I have addressed all of the review concerns except optimal behavior on little machines with 32-byte cache lines, which is beyond me as even the current code has bad behavior in that case. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
Now that all of the operations are safe on a single hash table accross network namespaces, allocate a single global hash table and update the code to use it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
Rewrite tcp_metrics_flush_all so that it can cope with entries from different network namespaces on it's hash chain. This is based on the logic in tcp_metrics_nl_cmd_del for deleting a selection of entries from a tcp metrics hash chain. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
tcp_metrics_flush_all always returns 0. Remove the unnecessary return code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
In preparation for using one tcp metrics hash table for all network namespaces add a field tcpm_net to struct tcp_metrics_block, and verify that field on all hash table lookups. Make the field tcpm_net of type possible_net_t so it takes no space when network namespaces are disabled. Further add a function tm_net to read that field so we can be efficient when network namespaces are disabled and concise the rest of the time. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
In preparation for using one hash table for all network namespaces mix the network namespace into the hash value. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
There is not a practical way to cleanup during boot so just panic if there is a problem initializing tcp_metrics. That will at least give us a clear place to start debugging if something does go wrong. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Simon Horman authored
In the case of AF_INET s_addr was set to INADDR_ANY (0) which which both symmetric with the AF_INET6 case, where s_addr is not set, and unnecessary as udp_conf is zeroed out earlier in the same function. I suspect this change does not have any run-time effect due to compiler optimisations. But it does make the code a little easier on the/my eyes. Cc: Tom Herbert <therbert@google.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
Reobert Shearman noticed that mpls_egress is failing to verify that the bytes to be examined are in fact present in the packet before mpls_egress reads those bytes. As suggested by David Miller reduce this to a single pskb_may_pull call so that we don't do unnecessary work in the fast path. Reported-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jaeden Amero authored
The PHY state machine (in drivers/net/phy/phy.c) will unconditionally call phydev->adjust_link (macb_handle_link_change) when polling in the PHY_CHANGELINK state. As currently written, macb always ends up requesting a new tx_clk frequency in macb_handle_link_change. It is a waste of time to request a new tx_clk frequency if the link state hasn't changed, as the tx_clk will already be configured properly. Let's only request a new tx_clk clock frequency when necessary. Signed-off-by: Jaeden Amero <jaeden.amero@ni.com> Cc: Josh Cartwright <joshc@ni.com> Cc: Soren Brinkmann <soren.brinkmann@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
This patch fixes a typo rhashtable_lookup_compare where we fail to recompute the hash when looking up the new table. This causes elements to be missed and potentially a crash during a resize. Reported-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Commit c0c09bfd ("rhashtable: avoid unnecessary wakeup for worker queue") changed ht->shift to be atomic, which is actually unnecessary. Instead of leaving the current shift in the core rhashtable structure, it can be cached inside the individual bucket tables. There, it will only be initialized once during a new table allocation in the shrink/expansion slow path, and from then onward it stays immutable for the rest of the bucket table liftime. That allows shift to be non-atomic. The patch also moves hash_rnd management into the table setup. The rhashtable structure now consumes 3 instead of 4 cachelines. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Ying Xue <ying.xue@windriver.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
There is a potential race condition between readers and the rehasher. In particular, the rehasher could have started a rehash while the reader finishes a scan of the old table but fails to see the new table pointer. This patch closes this window by adding smp_wmb/smp_rmb. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says: ==================== inet: tcp listener refactoring, part 8 These patches prepare request socks being hashed into general ehash table : We declare 3 aliases (ireq_state, ireq_refcnt, ireq_family) Note that refcnt is not yet handled, this will be done later. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Before inserting request socks into general hash table, fill their socket family. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
ireq->ir_num contains local port, use it. Also, get_openreq4() dumping listen_sk->refcnt makes litle sense. inet_diag_fill_req() can also use ireq->ir_num Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
sock_edemux() & sock_gen_put() should be ready to cope with request socks. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Make proto_register() & proto_unregister() a bit nicer. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
When request socks will be in ehash, they'll need to be refcounted. This patch adds rsk_refcnt/ireq_refcnt macros, and adds reqsk_put() function, but nothing yet use them. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We need to identify request sock when they'll be visible in global ehash table. ireq_state is an alias to req.__req_common.skc_state. Its value is set to TCP_NEW_SYN_RECV Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
TCP_SYN_RECV state is currently used by fast open sockets. Initial TCP requests (the pseudo sockets created when a SYN is received) are not yet associated to a state. They are attached to their parent, and the parent is in TCP_LISTEN state. This commit adds TCP_NEW_SYN_RECV state, so that we can convert TCP stack to a different schem gradually. This state is not exported to user space. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
I forgot to update dccp_v6_conn_request() & cookie_v6_check(). They both need to set ireq->ireq_net and ireq->ir_cookie Lets clear ireq->ir_cookie in inet_reqsk_alloc() Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 33cf7c90 ("net: add real socket cookies") Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 12 Mar, 2015 18 commits
-
-
Daniel Borkmann authored
Currently, it is possible in cls_bpf to access eBPF maps only under rcu_read_lock_bh() variants: while on ingress side, that is, handle_ing(), the classifier would be called from __netif_receive_skb_core() under rcu_read_lock(); on egress side, however, it's rcu_read_lock_bh() via __dev_queue_xmit(). This rcu/rcu_bh mix doesn't work together with eBPF maps as they require soley to be called under rcu_read_lock(). eBPF maps could also be shared among various other eBPF programs (possibly even with other eBPF program types, f.e. tracing) and user space processes, so any context is assumed. Therefore, a possible fix for cls_bpf is to wrap/nest eBPF program invocation under non-bh RCU lock variant. Fixes: e2e9b654 ("cls_bpf: add initial eBPF support for programmable classifiers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Alexander Duyck says: ==================== fib_trie: Minor fixes for table merge This patch set addresses two issues reported with the tables merged, the first is a NULL pointer dereference, and the other is to remove a WARN_ON and set the ordering for aliases from different tables with the same slen values. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
This change makes it so that we should always have a deterministic ordering for the main and local aliases within the merged table when two leaves overlap. So for example if we have a leaf with a key of 192.168.254.0. If we previously added two aliases with a prefix length of 24 from both local and main the first entry would be first and the second would be second. When I was coding this I had added a WARN_ON should such a situation occur as I wasn't sure how likely it would be. However this WARN_ON has been triggered so this is something that should be addressed. With this patch the ordering of the aliases is as follows. First they are sorted on prefix length, then on their table ID, then tos, and finally priority. This way what we end up doing is essentially interleaving the two tables on what used to be leaf_info structure boundaries. Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
The function fib_unmerge assumed the local table had already been allocated. If that is not the case however when custom rules are applied then this can result in a NULL pointer dereference. In order to prevent this we must check the value of the local table pointer and if it is NULL simply return 0 as there is no local table to separate from the main. Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse") Reported-by: Madhu Challa <challa@noironetworks.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
I noticed that a helper function with argument type ARG_ANYTHING does not need to have an initialized value (register). This can worst case lead to unintented stack memory leakage in future helper functions if they are not carefully designed, or unintended application behaviour in case the application developer was not careful enough to match a correct helper function signature in the API. The underlying issue is that ARG_ANYTHING should actually be split into two different semantics: 1) ARG_DONTCARE for function arguments that the helper function does not care about (in other words: the default for unused function arguments), and 2) ARG_ANYTHING that is an argument actually being used by a helper function and *guaranteed* to be an initialized register. The current risk is low: ARG_ANYTHING is only used for the 'flags' argument (r4) in bpf_map_update_elem() that internally does strict checking. Fixes: 17a52670 ("bpf: verifier (add verifier core)") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric W. Biederman says: ==================== Introduce possible_net_t The current usage of write_pnet and read_pnet is a little laborious and error prone as you only notice if you failed to include them if are compiling with network namespaces enabled. possible_net_t remedies that by using a type that is 0 bytes when network namespaces are disabled and can only be read and written to with read_pnet and write_pnet. Aka less work and safer for the same effect. I kill hold_net and release_net first as are they are haven't been used since 2008 and are noise at the points where write_pnet and read_pnet are used. I have folded in Eric Dumazets suggestions to improve the killing of hold_net and release net. And respon. I had to respin anyway as there was enough changes elsewhere in the tree the previous version of these patches did not quite apply cleanly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
Having to say > #ifdef CONFIG_NET_NS > struct net *net; > #endif in structures is a little bit wordy and a little bit error prone. Instead it is possible to say: > typedef struct { > #ifdef CONFIG_NET_NS > struct net *net; > #endif > } possible_net_t; And then in a header say: > possible_net_t net; Which is cleaner and easier to use and easier to test, as the possible_net_t is always there no matter what the compile options. Further this allows read_pnet and write_pnet to be functions in all cases which is better at catching typos. This change adds possible_net_t, updates the definitions of read_pnet and write_pnet, updates optional struct net * variables that write_pnet uses on to have the type possible_net_t, and finally fixes up the b0rked users of read_pnet and write_pnet. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric W. Biederman authored
hold_net and release_net were an idea that turned out to be useless. The code has been disabled since 2008. Kill the code it is long past due. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Herbert Xu says: ==================== rhashtable hash cleanups This is a rebase on top of the nested lock annotation fix. Nothing to see here, just a bunch of simple clean-ups before I move onto something more substantial (hopefully). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
Now that the only caller of obj_raw_hashfn is head_hashfn, we can simply kill it and fold it into the latter. This patch also moves the common shift from head_hashfn/key_hashfn into rht_bucket_index. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
key_hashfn has only one caller and it doesn't really need to supply the key length as an extra parameter. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
Now that we don't have cross-table hashes, we no longer need to keep the entire hash value so all users of obj_raw_hashfn can use head_hashfn instead. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
This patch reverts commit c88455ce ("rhashtable: key_hashfn() must return full hash value") because the only user of it always masks the hash value. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Julia Lawall authored
The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ type T; identifier f; @@ static T f (...) { ... } @@ identifier r.f; declarer name EXPORT_SYMBOL; @@ -EXPORT_SYMBOL(f); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
Commit aa34a6cb ("rhashtable: Add arbitrary rehash function") killed the annotation on the nested lock which leads to bitching from lockdep. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
I forgot to use write_pnet() in three locations. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 33cf7c90 ("net: add real socket cookies") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Lubomir Rintel authored
This makes it possible to retain the route preference when RAs are handled in userspace. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Simon Horman authored
Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-