- 30 Nov, 2017 23 commits
-
-
Paolo Abeni authored
Since commit e32ea7e7 ("soreuseport: fast reuseport UDP socket selection") and commit c125e80b ("soreuseport: fast reuseport TCP socket selection") the relevant reuseport socket matching the current packet is selected by the reuseport_select_sock() call. The only exceptions are invalid BPF filters/filters returning out-of-range indices. In the latter case the code implicitly falls back to using the hash demultiplexing, but instead of selecting the socket inside the reuseport_select_sock() function, it relies on the hash selection logic introduced with the early soreuseport implementation. With this patch, in case of a BPF filter returning a bad socket index value, we fall back to hash-based selection inside the reuseport_select_sock() body, so that we can drop some duplicate code in the ipv4 and ipv6 stack. This also allows faster lookup in the above scenario and will allow us to avoid computing the hash value for successful, BPF based demultiplexing - in a later patch. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
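The fallback described above can be sketched in userspace C. This is a simplified illustration under stated assumptions, not the kernel code: `struct group_sketch`, `select_sock()` and the `bpf_index` parameter are hypothetical stand-ins, and only the `reciprocal_scale()` arithmetic mirrors the kernel helper of that name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scale a 32-bit hash into [0, n) without a division. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t n)
{
    return (uint32_t)(((uint64_t)val * n) >> 32);
}

/* Hypothetical stand-in for a reuseport group: an array of socket ids. */
struct group_sketch {
    const int *socks;
    uint32_t num_socks;
};

/*
 * bpf_index < 0 models "no usable filter result"; an index that is
 * >= num_socks models a filter returning an out-of-range value.
 * Both cases fall back to hash-based selection in the same function.
 */
static int select_sock(const struct group_sketch *g, long bpf_index,
                       uint32_t hash)
{
    assert(g != NULL);
    if (g->num_socks == 0)
        return -1;
    if (bpf_index >= 0 && (uint64_t)bpf_index < g->num_socks)
        return g->socks[bpf_index];
    return g->socks[reciprocal_scale(hash, g->num_socks)];
}
```

Keeping the fallback inside one selection function is what lets callers drop their own copy of the hash logic.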
-
Linus Walleij authored
This is not supported anymore: devices needing a MAC address just assign one at random; it's just a driver peculiarity. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
David Miller says:

====================
net: Significantly shrink the size of routes.

Through a combination of several things, our route structures are larger than they need to be.

Mostly this stems from having members in dst_entry which are only used by one class of routes. So the majority of the work in this series is about "un-commoning" these members and pushing them into the type specific structures.

Unfortunately, IPSEC needed the most surgery. The majority of the changes here had to do with bundle creation and management.

The other issue is the refcount alignment in dst_entry. Once we get rid of the not-so-common members, it really opens the door to removing that alignment entirely.

I think the new layout looks really nice, so I'll reproduce it here:

    struct net_device *dev;
    struct dst_ops *ops;
    unsigned long _metrics;
    unsigned long expires;
    struct xfrm_state *xfrm;
    int (*input)(struct sk_buff *);
    int (*output)(struct net *net, struct sock *sk, struct sk_buff *skb);
    unsigned short flags;
    short obsolete;
    unsigned short header_len;
    unsigned short trailer_len;
    atomic_t __refcnt;
    int __use;
    unsigned long lastuse;
    struct lwtunnel_state *lwtstate;
    struct rcu_head rcu_head;
    short error;
    short __pad;
    __u32 tclassid;

(This is for 64-bit, on 32-bit the __refcnt comes at the very end)

So, the good news:

1) struct dst_entry shrinks from 160 to 112 bytes.
2) struct rtable shrinks from 216 to 168 bytes.
3) struct rt6_info shrinks from 384 to 320 bytes.

Enjoy.

v2:
    Collapse some patches logically based upon feedback.
    Fix the strange patch #7.

v3:
    xfrm_dst_path() needs inline keyword
    Properly align __refcnt on 32-bit.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
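The 112-byte figure can be checked with a userspace mock of the layout quoted above. This is only a sketch: the `*_sketch` types are assumed stand-ins for the kernel's `atomic_t` and `struct rcu_head`, all pointer targets are replaced by `void *`, the leading underscores of `__refcnt`, `__use` and `__pad` are dropped, and the numbers hold only on an LP64 target.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-ins so the sketch compiles outside the kernel. */
typedef struct { int counter; } atomic_sketch_t;
struct rcu_head_sketch { void *next; void (*func)(void *); };

struct dst_entry_sketch {
    void *dev;
    void *ops;
    unsigned long _metrics;
    unsigned long expires;
    void *xfrm;
    int (*input)(void *skb);
    int (*output)(void *net, void *sk, void *skb);
    unsigned short flags;
    short obsolete;
    unsigned short header_len;
    unsigned short trailer_len;
    atomic_sketch_t refcnt;     /* __refcnt in the quoted layout */
    int use;                    /* __use */
    unsigned long lastuse;
    void *lwtstate;
    struct rcu_head_sketch rcu_head;
    short error;
    short pad;                  /* __pad */
    unsigned int tclassid;
};

/* Verify the size from the cover letter; skipped on non-LP64 targets. */
static void check_lp64_layout(void)
{
    if (sizeof(void *) == 8 && sizeof(long) == 8) {
        assert(sizeof(struct dst_entry_sketch) == 112);
        assert(offsetof(struct dst_entry_sketch, refcnt) == 64);
    }
}
```

In this mock the refcount ends up at byte offset 64, so with 64-byte cache lines it starts a new line without any explicit padding.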
-
David Miller authored
There are no more users. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
-
David Miller authored
While building ipsec bundles, blocks of xfrm dsts are linked together using dst->next from bottom to the top. The only thing this is used for is initializing the pmtu values of the xfrm stack, and for updating the mtu values at xfrm_bundle_ok() time. The bundle pmtu entries must be processed in this order so that pmtu values lower in the stack of routes can propagate up to the higher ones. Avoid using dst->next by simply maintaining an array of dst pointers as we already do for the xfrm_state objects when building the bundle. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
-
David Miller authored
We have padding to try and align the refcount on a separate cache line. But after several simplifications the padding has increased substantially. So now it's easy to change the layout to get rid of the padding entirely. We group the write-heavy __refcnt and __use with less often used items such as the rcu_head and the error code. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
-
David Miller authored
The first member of an IPSEC route bundle chain sets its dst->path to the underlying ipv4/ipv6 route that carries the bundle. Stated another way, if one were to follow the xfrm_dst->child chain of the bundle, the final non-NULL pointer would be the path and point to either an ipv4 or an ipv6 route. This is largely used to make sure that PMTU events propagate down to the correct ipv4 or ipv6 route. When we don't have the top of an IPSEC bundle, 'dst->path == dst'. Move it down into xfrm_dst and key off of dst->xfrm. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
-
David Miller authored
The dst->from value is only used by ipv6 routes to track where a route "came from". Any time we clone or copy a core ipv6 route in the ipv6 routing tables, we have the copy/clone's ->from point to the base route. This is used to handle route expiration properly. Only ipv6 uses this mechanism, and only ipv6 code references it. So it is safe to move it into rt6_info. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
-
David Miller authored
XFRM bundle child chains look like this:

    xdst1 --> xdst2 --> xdst3 --> path_dst

All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL. The final child pointer in the chain, here called 'path_dst', is some other kind of route such as an ipv4 or ipv6 one.

The xfrm output path pops routes, one at a time, via the child pointer, until we hit one which has a dst->xfrm pointer which is NULL.

We can easily preserve the above mechanisms with child sitting only in the xfrm_dst structure. All children in the chain before we break out of the xfrm_output() loop have dst->xfrm non-NULL and are therefore xfrm_dst objects.

Since we break out of the loop when we find dst->xfrm NULL, we will not try to dereference 'dst' as if it were an xfrm_dst.

Signed-off-by: David S. Miller <davem@davemloft.net>
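The safety argument can be illustrated with a small userspace model. Everything here is a hypothetical simplification (the `*_sketch` types and `walk_to_path()` are not kernel names): the child pointer lives only in the IPsec-specific wrapper, and the loop casts to that wrapper only while `xfrm` is non-NULL.

```c
#include <assert.h>
#include <stddef.h>

struct xfrm_state_sketch { int id; };

/* Common part of every route; xfrm is non-NULL only for IPsec routes. */
struct dst_sketch {
    struct xfrm_state_sketch *xfrm;
};

/* IPsec-specific wrapper: the child pointer lives here, not in dst_sketch. */
struct xfrm_dst_sketch {
    struct dst_sketch dst;      /* first member, so the cast below is valid */
    struct dst_sketch *child;
};

/* Pop routes via the child pointer until one with xfrm == NULL is reached. */
static struct dst_sketch *walk_to_path(struct dst_sketch *dst, int *hops)
{
    assert(hops != NULL);
    *hops = 0;
    while (dst && dst->xfrm) {
        /* xfrm != NULL implies dst is embedded in an xfrm_dst_sketch. */
        struct xfrm_dst_sketch *xdst = (struct xfrm_dst_sketch *)dst;

        dst = xdst->child;
        (*hops)++;
    }
    return dst;                 /* never cast once xfrm is NULL */
}
```

For a chain of three wrappers ending in a plain route, the walk returns the plain route after three hops and never treats it as a wrapper.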
-
David Miller authored
This will make a future change moving the dst->child pointer less invasive. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
-
David Miller authored
Only IPSEC routes have a non-NULL dst->child pointer. And IPSEC routes are identified by a non-NULL dst->xfrm pointer. Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Miller authored
Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
-
David Miller authored
Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
-
David Miller authored
Delete it. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
-
Zhu Yanjun authored
In xmit, it is very unlikely that TX_ERROR occurs, so marking that branch with unlikely() optimizes the xmit process. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Joe Jin <joe.jin@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
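For readers unfamiliar with the hint, here is a minimal sketch of how such a branch annotation works with GCC or Clang. The macros follow the usual `__builtin_expect` definitions; `enum tx_status` and `count_tx_errors()` are made-up names for illustration and do not come from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hints built on the GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

enum tx_status { TX_OK = 0, TX_ERROR = 1 };   /* hypothetical status codes */

/* Count failed transmissions; the error branch is hinted as the cold path. */
static int count_tx_errors(const enum tx_status *status, int n)
{
    int errors = 0;

    assert(status != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (unlikely(status[i] == TX_ERROR))
            errors++;
    }
    return errors;
}
```

The hint changes only how the compiler lays out the code, not the result of the comparison.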
-
Tina Ruchandani authored
net/atm/mpoa_* files use 'struct timeval' to store event timestamps. struct timeval uses a 32-bit seconds field which will overflow in the year 2038 and beyond. Moreover, the timestamps are being compared only to get seconds elapsed, so struct timeval, which stores both a seconds and a microseconds field, is overkill. This patch replaces the use of struct timeval with time64_t to store a 64-bit seconds field. Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
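A short sketch of why a 64-bit seconds counter is enough for this use. `time64_sketch_t` is an assumed stand-in for the kernel's `time64_t`, and both helper names are hypothetical; the only hard fact used is that a signed 32-bit seconds field tops out at 2147483647, which corresponds to 19 January 2038.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t time64_sketch_t;    /* stand-in for the kernel's time64_t */

/* Would this second count still fit in a signed 32-bit field? */
static int fits_in_32bit_seconds(time64_sketch_t t)
{
    return t >= INT32_MIN && t <= INT32_MAX;
}

/* Elapsed whole seconds; no microsecond field is needed for this. */
static time64_sketch_t seconds_since(time64_sketch_t now, time64_sketch_t then)
{
    assert(now >= then);            /* timestamps are expected in order */
    return now - then;
}
```

With 64-bit values, the difference stays correct even when both timestamps lie past the 32-bit limit.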
-
Colin Ian King authored
There are several statements that have incorrect indentation. Fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
timespec is deprecated because of the y2038 overflow, so let's convert this one to ktime_get_ts64(). The code is already safe even on 32-bit architectures, since it uses monotonic times. On 64-bit architectures, nothing changes, while on 32-bit architectures this avoids one type conversion. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Arnd Bergmann authored
netxen_collect_minidump() evidently just wants to get a monotonic timestamp. Using jiffies_to_timespec(jiffies, &ts) is not appropriate here, since it will overflow after 2^32 jiffies, which may be as short as 49 days of uptime. ktime_get_seconds() is the correct interface here. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
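The "49 days" figure follows from simple arithmetic, assuming a tick rate of 1000 Hz (other rates give longer periods). The helper below is a hypothetical illustration, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Whole days until a 32-bit tick counter wraps at the given tick rate. */
static uint64_t days_until_wrap(uint32_t hz)
{
    const uint64_t ticks = (uint64_t)1 << 32;   /* 2^32 jiffies */

    assert(hz != 0);
    return ticks / hz / 86400;                  /* 86400 seconds per day */
}
```

At 1000 Hz this yields 49 whole days, at 250 Hz 198 days, and at 100 Hz 497 days.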
-
Richard Leitner authored
Previously phy_id was u32 and phy_id_mask was unsigned int. As the phy_id_mask defines the important bits of the phy_id (and is therefore the same size) these two variables should be the same data type. Signed-off-by: Richard Leitner <richard.leitner@skidata.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
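A sketch of mask-based ID matching shows why both fields should share a type: the mask is applied bit for bit to the ID. The struct, the function names and the ID values below are made up for illustration and are not taken from any real PHY driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct phy_driver_sketch {          /* hypothetical, trimmed-down entry */
    const char *name;
    uint32_t phy_id;
    uint32_t phy_id_mask;           /* same width as phy_id */
};

/* IDs match when they agree on every bit selected by the driver's mask. */
static int phy_id_matches(uint32_t dev_id, const struct phy_driver_sketch *drv)
{
    return (dev_id & drv->phy_id_mask) == (drv->phy_id & drv->phy_id_mask);
}

/* Return the first table entry matching dev_id, or NULL if none does. */
static const struct phy_driver_sketch *
find_driver(const struct phy_driver_sketch *table, size_t n, uint32_t dev_id)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (phy_id_matches(dev_id, &table[i]))
            return &table[i];
    }
    return NULL;
}
```

A mask such as 0xfffffff0 ignores the low four bits, so different revisions of one part match the same entry.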
-
Lukas Wunner authored
No need to reinvent the wheel, we have bus_find_device_by_name(). Cc: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sunil Goutham authored
On T81 there are only 4 cores, hence setting max queue count to 4 would leave nothing for XDP_TX. This patch fixes this by doubling max queue count in the above scenario. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: cjacob <cjacob@caviumnetworks.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sunil Goutham authored
This patch adds support for XDP_REDIRECT. Flush is not yet supported. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: cjacob <cjacob@caviumnetworks.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 29 Nov, 2017 17 commits
-
-
git://linux-nfs.org/~bfields/linux
Linus Torvalds authored
Pull nfsd fixes from Bruce Fields:
 "I screwed up my merge window pull request; I only sent half of what I meant to.

  There were no new features, just bugfixes of various importance and some very minor cleanup, so I think it's all still appropriate for -rc2.

  Highlights:

   - Fixes from Trond for some races in the NFSv4 state code.
   - Fix from Naofumi Honda for a typo in the blocked lock notification code
   - Fixes from Vasily Averin for some problems starting and stopping lockd especially in network namespaces"

* tag 'nfsd-4.15-1' of git://linux-nfs.org/~bfields/linux: (23 commits)
  lockd: fix "list_add double add" caused by legacy signal interface
  nlm_shutdown_hosts_net() cleanup
  race of nfsd inetaddr notifiers vs nn->nfsd_serv change
  race of lockd inetaddr notifiers vs nlmsvc_rqst change
  SUNRPC: make cache_detail structures const
  NFSD: make cache_detail structures const
  sunrpc: make the function arg as const
  nfsd: check for use of the closed special stateid
  nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat
  lockd: lost rollback of set_grace_period() in lockd_down_net()
  lockd: added cleanup checks in exit_net hook
  grace: replace BUG_ON by WARN_ONCE in exit_net hook
  nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class
  lockd: remove net pointer from messages
  nfsd: remove net pointer from debug messages
  nfsd: Fix races with check_stateid_generation()
  nfsd: Ensure we check stateid validity in the seqid operation checks
  nfsd: Fix race in lock stateid creation
  nfsd4: move find_lock_stateid
  nfsd: Ensure we don't recognise lock stateids after freeing them
  ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Linus Torvalds authored
Pull btrfs fixes from David Sterba:
 "We've collected some fixes in since the pre-merge window freeze.

  There's technically only one regression fix for 4.15, but the rest seems important and candidates for stable.

   - fix missing flush bio puts in error cases (is serious, but rarely happens)
   - fix reporting stat::st_blocks for buffered append writes
   - fix space cache invalidation
   - fix out of bound memory access when setting zlib level
   - fix potential memory corruption when fsync fails in the middle
   - fix crash in integrity checker
   - incremental send fix, path mixup for certain unlink/rename combination
   - pass flags to writeback so compressed writes can be throttled properly
   - error handling fixes"

* tag 'for-4.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: incremental send, fix wrong unlink path after renaming file
  btrfs: tree-checker: Fix false panic for sanity test
  Btrfs: fix list_add corruption and soft lockups in fsync
  btrfs: Fix wild memory access in compression level parser
  btrfs: fix deadlock when writing out space cache
  btrfs: clear space cache inode generation always
  Btrfs: fix reported number of inode blocks after buffered append writes
  Btrfs: move definition of the function btrfs_find_new_delalloc_bytes
  Btrfs: bail out gracefully rather than BUG_ON
  btrfs: dev_alloc_list is not protected by RCU, use normal list_del
  btrfs: add missing device::flush_bio puts
  btrfs: Fix transaction abort during failure in btrfs_rm_dev_item
  Btrfs: add write_flags for compression bio
-
git://git.monstr.eu/linux-2.6-microblaze
Linus Torvalds authored
Pull Microblaze fix from Michal Simek:
 "Add missing header to mmu_context_mm.h"

* tag 'microblaze-4.15-rc2' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: add missing include to mmu_context_mm.h
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds authored
Pull sparc fix from David Miller:
 "Sparc T4 and later cpu bootup regression fix"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix boot on T4 and later.
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds authored
Pull networking fixes from David Miller:

 1) The forcedeth conversion from pci_*() DMA interfaces to dma_*() ones missed one spot. From Zhu Yanjun.
 2) Missing CRYPTO_SHA256 Kconfig dep in cfg80211, from Johannes Berg.
 3) Fix checksum offloading in thunderx driver, from Sunil Goutham.
 4) Add SPDX to vm_sockets_diag.h, from Stephen Hemminger.
 5) Fix use after free of packet headers in TIPC, from Jon Maloy.
 6) "sizeof(ptr)" vs "sizeof(*ptr)" bug in i40e, from Gustavo A R Silva.
 7) Tunneling fixes in mlxsw driver, from Petr Machata.
 8) Fix crash in fanout_demux_rollover() of AF_PACKET, from Mike Maloney.
 9) Fix race in AF_PACKET bind() vs. NETDEV_UP notifier, from Eric Dumazet.
 10) Fix regression in sch_sfq.c due to one of the timer_setup() conversions. From Paolo Abeni.
 11) SCTP does list_for_each_entry() using wrong struct member, fix from Xin Long.
 12) Don't use big endian netlink attribute read for IFLA_BOND_AD_ACTOR_SYSTEM, it is in cpu endianness. Also from Xin Long.
 13) Fix mis-initialization of q->link.clock in CBQ scheduler, preventing adding filters there. From Jiri Pirko.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
  ethernet: dwmac-stm32: Fix copyright
  net: via: via-rhine: use %p to format void * address instead of %x
  net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit
  myri10ge: Update MAINTAINERS
  net: sched: cbq: create block for q->link.block
  atm: suni: remove extraneous space to fix indentation
  atm: lanai: use %p to format kernel addresses instead of %x
  VSOCK: Don't set sk_state to TCP_CLOSE before testing it
  atm: fore200e: use %pK to format kernel addresses instead of %x
  ambassador: fix incorrect indentation of assignment statement
  vxlan: use __be32 type for the param vni in __vxlan_fdb_delete
  bonding: use nla_get_u64 to extract the value for IFLA_BOND_AD_ACTOR_SYSTEM
  sctp: use right member as the param of list_for_each_entry
  sch_sfq: fix null pointer dereference at timer expiration
  cls_bpf: don't decrement net's refcount when offload fails
  net/packet: fix a race in packet_bind() and packet_notifier()
  packet: fix crash in fanout_demux_rollover()
  sctp: remove extern from stream sched
  sctp: force the params with right types for sctp csum apis
  sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail
  ...
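Item 6 in the list above names a classic C mistake. The following sketch shows the general pattern only; `struct stats_sketch` and the helper names are hypothetical and unrelated to the actual i40e code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct stats_sketch {               /* hypothetical 128-byte structure */
    uint64_t counters[16];
};

/* Bug pattern: sizeof(p) is the size of the pointer, not the object. */
static size_t buggy_len(const struct stats_sketch *p)
{
    return sizeof(p);
}

/* Fix: sizeof(*p) is the size of the pointed-to structure. */
static size_t correct_len(const struct stats_sketch *p)
{
    return sizeof(*p);
}

/* Clearing with the correct length zeroes every counter. */
static void clear_stats(struct stats_sketch *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof(*p));
}
```

With the buggy length, a `memset()` or `memcpy()` would touch only the first pointer-sized chunk of the structure and silently leave the rest untouched.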
-
David S. Miller authored
If we don't put the NG4fls.o object into the same part of the link as the generic sparc64 objects for fls() and __fls() then the relocation in the branch we use for patching will not fit. Move NG4fls.o into lib-y to fix this problem. Fixes: 46ad8d2d ("sparc64: Use sparc optimized fls and __fls for T4 and above") Signed-off-by: David S. Miller <davem@davemloft.net> Reported-by: Anatoly Pugachev <matorola@gmail.com> Tested-by: Anatoly Pugachev <matorola@gmail.com>
-
Linus Torvalds authored
Instead, just fall back on the new '%p' behavior which hashes the pointer. Otherwise, '%pK' - that was intended to mark a pointer as restricted - just ends up leaking pointers that a normal '%p' wouldn't leak. Which just makes the whole thing pointless. I suspect we should actually get rid of '%pK' entirely, and make it just work as '%p' regardless, but this is the minimal obvious fix. People who actually use 'kptr_restrict' should weigh in on which behavior they want. Cc: Tobin Harding <me@tobin.cc> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
The conditional kallsym hex printing used a special fixed-width '%lx' output (KALLSYM_FMT) in preparation for the hashing of %p, but that series ended up adding a %px specifier to help with the conversions. Use it, and avoid the "print pointer as an unsigned long" code. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://github.com/tcharding/linux
Linus Torvalds authored
Pull printk pointer hashing update from Tobin Harding:
 "Here is the patch set that implements hashing of printk specifier %p.

  First we have two clean up patches then we do the hashing. Hashing is done via the SipHash algorithm. The next patch adds printk specifier %px for printing pointers when we _really_ want to see the address, i.e. %px is functionally equivalent to %lx. Final patch in the set fixes KASAN since we break it by hashing %p.

  For the record here is the justification for the series:

    Currently there exist approximately 14 000 places in the Kernel where addresses are being printed using an unadorned %p. This potentially leaks sensitive information about the Kernel layout in memory. Many of these calls are stale; instead of fixing every call we hash the address by default before printing. We then add %px to provide a way to print the actual address. Although this is achievable using %lx, using %px will assist us if we ever want to change pointer printing behaviour. %px is more uniquely grep'able (there are already >50 000 uses of %lx).

    The added advantage of hashing %p is that security is now opt-out, if you _really_ want the address you have to work a little harder and use %px.

  This will of course break some users, forcing code printing needed addresses to be updated"

[ I do expect this to be an annoyance, and a number of %px users to be added for debuggability. But nobody is willing to audit existing %p users for information leaks, and a number of places really only use the pointer as an object identifier rather than really 'I need the address'.

  IOW - sorry for the inconvenience, but it's the least inconvenient of the options. - Linus ]

* tag 'printk-hash-pointer-4.15-rc2' of git://github.com/tcharding/linux:
  kasan: use %px to print addresses instead of %p
  vsprintf: add printk specifier %px
  printk: hash addresses printed with %p
  vsprintf: refactor %pK code out of pointer()
  docs: correct documentation for %pK
-
Linus Torvalds authored
This reverts commit 152e93af. It was a nice cleanup in theory, but as Nicolai Stange points out, we do need to make the page dirty for the copy-on-write case even when we didn't end up making it writable, since the dirty bit is what we use to check that we've gone through a COW cycle. Reported-by: Michal Hocko <mhocko@kernel.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Benjamin Gaignard authored
Make the STMicroelectronics copyright headers uniform. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@st.com> CC: Alexandre Torgue <alexandre.torgue@st.com> Acked-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
Don't use %x and casting to print out an address, instead use %p and remove the casting.

Cleans up smatch warnings:

drivers/net/ethernet/via/via-rhine.c:998 rhine_init_one_common() warn: argument 4 to %lx specifier is cast from pointer

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Geert Uytterhoeven authored
On 64-bit (e.g. powerpc64/allmodconfig):

    drivers/net/ethernet/xilinx/ll_temac_main.c: In function 'temac_start_xmit_done':
    drivers/net/ethernet/xilinx/ll_temac_main.c:633:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
        dev_kfree_skb_irq((struct sk_buff *)cur_p->app4);
                          ^

cdmac_bd.app4 is u32, so it is too small to hold a kernel pointer.

Note that several other fields in struct cdmac_bd are also too small to hold physical addresses on 64-bit platforms.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
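The underlying problem, storing a pointer in a 32-bit field, can be reproduced in a few lines of portable C. The function names and the sample address are hypothetical; the sketch only demonstrates the truncation, not the driver's descriptor handling.

```c
#include <assert.h>
#include <stdint.h>

/* Does an address survive a round trip through a 32-bit field? */
static int survives_u32(uintptr_t addr)
{
    uint32_t field = (uint32_t)addr;    /* upper bits dropped on 64-bit */

    return (uintptr_t)field == addr;
}

/* Guarded variant: refuses (in debug builds) to store what will not fit. */
static uint32_t pack_addr_checked(uintptr_t addr)
{
    assert(survives_u32(addr));
    return (uint32_t)addr;
}
```

On a 64-bit target, any address with bits set above bit 31 comes back changed, which is why casting such a field back to a pointer is unsafe.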
-
Hyong-Youb Kim authored
Change the maintainer to Chris Lee who has access to Myricom hardware and can test/review. Update the website URL. Signed-off-by: Hyong-Youb Kim <hykim@myri.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tobin C. Harding authored
Pointers printed with %p are now hashed by default. Kasan needs the actual address. We can use the new printk specifier %px for this purpose. Use %px instead of %p to print addresses. Signed-off-by: Tobin C. Harding <me@tobin.cc>
-
Tobin C. Harding authored
printk specifier %p now hashes all addresses before printing. Sometimes we need to see the actual unmodified address. This can be achieved using %lx but then we face the risk that if in future we want to change the way the Kernel handles printing of pointers we will have to grep through the already existent 50 000 %lx call sites. Let's add specifier %px as a clear, opt-in, way to print a pointer and maintain some level of isolation from all the other hex integer output within the Kernel. Add printk specifier %px to print the actual unmodified address. Signed-off-by: Tobin C. Harding <me@tobin.cc>
-
Tobin C. Harding authored
Currently there exist approximately 14 000 places in the kernel where addresses are being printed using an unadorned %p. This potentially leaks sensitive information regarding the Kernel layout in memory. Many of these calls are stale; instead of fixing every call, let's hash the address by default before printing. This will of course break some users, forcing code printing needed addresses to be updated.

Code that _really_ needs the address will soon be able to use the new printk specifier %px to print the address.

For what it's worth, usage of unadorned %p can be broken down as follows (thanks to Joe Perches).

$ git grep -E '%p[^A-Za-z0-9]' | cut -f1 -d"/" | sort | uniq -c
   1084 arch
     20 block
     10 crypto
     32 Documentation
   8121 drivers
   1221 fs
    143 include
    101 kernel
     69 lib
    100 mm
   1510 net
     40 samples
      7 scripts
     11 security
    166 sound
    152 tools
      2 virt

Add function ptr_to_id() to map an address to a 32 bit unique identifier. Hash any unadorned usage of specifier %p and any malformed specifiers.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
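The idea of mapping an address to a 32-bit identifier can be sketched in userspace as follows. This is an illustration under loud assumptions: `mix64()` is a stand-in mixer (the splitmix64 finalizer), NOT SipHash, and `ptr_to_id_sketch()`, `format_hashed()` and the key handling are hypothetical names; a real implementation would use a keyed cryptographic hash with a secret random key.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in mixer (splitmix64 finalizer); not suitable for hiding addresses. */
static uint64_t mix64(uint64_t z)
{
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Map an address to a 32-bit identifier under a key (assumed secret). */
static uint32_t ptr_to_id_sketch(const void *ptr, uint64_t key)
{
    return (uint32_t)mix64((uint64_t)(uintptr_t)ptr ^ key);
}

/* Print the identifier as 8 hex digits instead of the raw address. */
static int format_hashed(char *buf, size_t len, const void *ptr, uint64_t key)
{
    assert(len >= 9);               /* 8 digits plus the terminating NUL */
    return snprintf(buf, len, "%08x",
                    (unsigned int)ptr_to_id_sketch(ptr, key));
}
```

The same pointer always yields the same identifier for a given key, so log lines can still be correlated by object without revealing where the object lives.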
-