1. 02 Aug, 2018 9 commits
  2. 01 Aug, 2018 31 commits
    • Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm · 6b470376
      Linus Torvalds authored
      Pull ARM fix from Russell King:
       "Just a single fix this time around for recent binutils causing build
        problems when generating Thumb-2 code"
      
      * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: 8781/1: Fix Thumb-2 syscall return for binutils 2.29+
    • net: don't declare IPv6 non-local bind helper if CONFIG_IPV6 undefined · db57dc7c
      Vincent Bernat authored
      Fixes: 83ba4645 ("net: add helpers checking if socket can be bound to nonlocal address")
      Signed-off-by: Vincent Bernat <vincent@bernat.im>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mm: do not initialize TLB stack vma's with vma_init() · 8b11ec1b
      Linus Torvalds authored
      Commit 2c4541e2 ("mm: use vma_init() to initialize VMAs on stack and
      data segments") tried to initialize various left-over ad-hoc vma's
      "properly", but actually made things worse for the temporary vma's used
      for TLB flushing.
      
      vma_init() doesn't actually initialize all of the vma, just a few
      fields, so doing something like
      
         -       struct vm_area_struct vma = { .vm_mm = tlb->mm, };
         +       struct vm_area_struct vma;
         +
         +       vma_init(&vma, tlb->mm);
      
      was actually very bad: instead of having a nicely initialized vma with
      every field but "vm_mm" zeroed, you'd have an entirely uninitialized vma
      with only a couple of fields initialized.  And they weren't even fields
      that the code in question mostly cared about.
      
      The flush_tlb_range() function takes a "struct vma" rather than a
      "struct mm_struct", because a few architectures actually care about what
      kind of range it is - being able to only do an ITLB flush if it's a
      range that doesn't have data accesses enabled, for example.  And all the
      normal users already have the vma for doing the range invalidation.
      
      But a few people want to call flush_tlb_range() with a range they just
      made up, so they also end up using a made-up vma.  x86 just has a
      special "flush_tlb_mm_range()" function for this, but other
      architectures (arm and ia64) do the "use fake vma" thing instead, and
      thus got caught up in the vma_init() changes.
      
      At the same time, the TLB flushing code really doesn't care about most
      other fields in the vma, so vma_init() is just unnecessary and
      pointless.
      
      This fixes things by having an explicit "this is just an initializer for
      the TLB flush" initializer macro, which is used by the arm/arm64/ia64
      people who mis-use this interface with just a dummy vma.
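
The difference between the two initialization styles can be sketched in plain C. This is a minimal illustration rather than kernel code: `struct fake_vma`, `fake_vma_init()` and `FAKE_TLB_FLUSH_VMA()` are hypothetical stand-ins for the structures and helpers named above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_area_struct (fields are illustrative). */
struct fake_vma {
	void *mm;
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/* Partial initializer in the style of vma_init(): only 'mm' is written. */
void fake_vma_init(struct fake_vma *vma, void *mm)
{
	assert(vma != NULL);
	vma->mm = mm;	/* start, end and flags keep whatever was there before */
}

/*
 * Hypothetical initializer macro in the spirit of the fix. A designated
 * initializer zeroes every member that is not named explicitly.
 */
#define FAKE_TLB_FLUSH_VMA(mm_ptr, fl) { .mm = (mm_ptr), .flags = (fl) }

/* Returns 1 when the fields not covered by the macro are all zero. */
int fake_vma_rest_is_zero(const struct fake_vma *vma)
{
	return vma->start == 0 && vma->end == 0;
}
```

Declaring `struct fake_vma vma = FAKE_TLB_FLUSH_VMA(mm, 0);` yields zeroed `start` and `end`, whereas calling `fake_vma_init()` on an uninitialized stack object leaves those fields indeterminate.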
      
      Fixes: 2c4541e2 ("mm: use vma_init() to initialize VMAs on stack and data segments")
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: delete historical BUG from zap_pmd_range() · 53406ed1
      Hugh Dickins authored
      Delete the old VM_BUG_ON_VMA() from zap_pmd_range(), which asserted
      that mmap_sem must be held when splitting an "anonymous" vma there.
      Whether that's still strictly true nowadays is not entirely clear,
      but the danger of sometimes crashing on the BUG is now fairly clear.
      
      Even with the new stricter rules for anonymous vma marking, the
      condition it checks for can possibly trigger. Commit 44960f2a
      ("staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem
      pages") is good, and originally I thought it was safe from that
      VM_BUG_ON_VMA(), because the /dev/ashmem fd exposed to the user is
      disconnected from the vm_file in the vma, and madvise(,,MADV_REMOVE)
      insists on VM_SHARED.
      
      But after I read John's earlier mail, drawing attention to the
      vfs_fallocate() in there: I may be wrong, and I don't know if Android
      has THP in the config anyway, but it looks to me like an
      unmap_mapping_range() from ashmem's vfs_fallocate() could hit precisely
      the VM_BUG_ON_VMA(), once it's vma_is_anonymous().
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'rxrpc-next-20180801' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · b69ab96a
      David S. Miller authored
      David Howells says:
      
      ====================
      rxrpc: Development
      
      Here are some patches that add some more tracepoints to AF_RXRPC and fix
      some issues therein.  The most significant points are:
      
       (1) Display the call timeout information in /proc/net/rxrpc/calls.
      
       (2) Save the call's debug_id in the rxrpc_channel struct so that it can be
           used in traces after the rxrpc_call struct has been destroyed.
      
       (3) Increase the size of the kAFS Rx window from 32 to 63 to be about the
           same as the Auristor server.
      
       (4) Propose the terminal ACK for a client call after it has received all
           its data to be transmitted after a short interval so that it will get
           transmitted if not first superseded by a new call on the same channel.
      
       (5) Flush ACKs during the data reception if we detect that we've run out
           of data.[*]
      
       (6) Trace successful packet transmission and softirq to process context
           socket notification.
      
      [*] Note that on an uncontended gigabit network, rxrpc runs into trouble
          with ACK packets getting batched together (up to ~32 at a time)
          somewhere between the IP transmit queue on the client and the ethernet
          receive queue on the server.
      
          I can see the kernel afs filesystem client and Auristor userspace
          server stalling occasionally on a 512MB single read.  Sticking
          tracepoints in the network driver at either end seems to show that,
          although the ACK transmissions made by the client are reasonably spaced
          timewise, the received ACKs come in batches from the network card on
          the server.
      
          I'm not sure what, if anything, can be done about this.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Fix user call ID check in rxrpc_service_prealloc_one · c01f6c9b
      YueHaibing authored
      The check here only verifies that the user call ID isn't already in
      use, so it should compare user_call_ID with xcall->user_call_ID, which
      is the current node's user_call_ID.
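
The kind of check described here can be sketched as a tree walk that compares the new ID against the node currently being visited. This is a simplified illustration under stated assumptions: `struct call_node` and `call_tree_insert()` are hypothetical, and a plain binary search tree stands in for whatever tree the kernel code actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; only the ID and the child links matter here. */
struct call_node {
	unsigned long user_call_ID;
	struct call_node *left;
	struct call_node *right;
};

/*
 * Insert 'node' unless its ID is already present.
 * Returns 0 on success and -1 when the ID is already in use.
 */
int call_tree_insert(struct call_node **root, struct call_node *node)
{
	struct call_node **pp = root;

	assert(root != NULL && node != NULL);
	while (*pp) {
		struct call_node *xcall = *pp;	/* node currently being visited */

		/* Compare against the visited node's ID, not the new node's own. */
		if (node->user_call_ID < xcall->user_call_ID)
			pp = &xcall->left;
		else if (node->user_call_ID > xcall->user_call_ID)
			pp = &xcall->right;
		else
			return -1;	/* ID already in use */
	}
	node->left = NULL;
	node->right = NULL;
	*pp = node;
	return 0;
}
```

Comparing the new ID with itself would never detect a duplicate, which is why the comparison has to use the visited node.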
      
      Fixes: 540b1c48 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
      Suggested-by: David Howells <dhowells@redhat.com>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mmc-v4.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 9a97ebf7
      Linus Torvalds authored
      Pull MMC fix from Ulf Hansson:
       "MMC host: mxcmmc: Fix build error for powerpc"
      
      * tag 'mmc-v4.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: mxcmmc: Fix missing parentheses and brace
    • Merge tag 'pm-urgent-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · f390b7bf
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix the scope of a recent intel_pstate driver optimization used
        incorrectly on some systems due to processor identification ambiguity
        and fix a few issues in the turbostat utility, including three recent
        regressions.
      
        Specifics:
      
         - Use ACPI FADT preferred PM Profile to distinguish Skylake desktop
           processors from some server ones with the same model number in
           order to limit the scope of the recent IO-wait boost optimization
           to servers, as intended (Srinivas Pandruvada).
      
         - Fix several issues in the turbostat utility:
            * Fix the -S option on 1-CPU systems (Len Brown).
            * Fix computations using incorrect processor core counts (Artem
              Bityutskiy).
            * Fix the x2apic debug message (Len Brown).
            * Fix logical node enumeration to allow for non-sequential
              physical nodes (Prarit Bhargava).
            * Fix reported family on modern AMD processors (Calvin Walton).
            * Clarify the RAPL column information in the man page (Len Brown)"
      
      * tag 'pm-urgent-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: intel_pstate: Limit the scope of HWP dynamic boost platforms
        tools/power turbostat: version 18.07.27
        tools/power turbostat: Read extended processor family from CPUID
        tools/power turbostat: Fix logical node enumeration to allow for non-sequential physical nodes
        tools/power turbostat: fix x2apic debug message output file
        tools/power turbostat: fix bogus summary values
        tools/power turbostat: fix -S on UP systems
        tools/power turbostat: Update turbostat(8) RAPL throttling column description
    • squashfs metadata 2: electric boogaloo · cdbb65c4
      Linus Torvalds authored
      Anatoly continues to find issues with fuzzed squashfs images.
      
      This time, corrupt, missing, or undersized data for the page filling
      wasn't checked for, because the squashfs_{copy,read}_cache() functions
      did the squashfs_copy_data() call without checking the resulting data
      size.
      
      Which could result in the page cache pages being incompletely filled in,
      and no error indication to the user space reading garbage data.
      
      So make a helper function for the "fill in pages" case, because the
      exact same incomplete sequence existed in two places.
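
A helper of the kind described above can be sketched as follows. This is a minimal user-space illustration, not the squashfs code: `example_copy_data()`, `example_fill_page()` and `EXAMPLE_PAGE_SIZE` are hypothetical names, and the short copy is simulated by a source buffer that holds fewer bytes than requested.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define EXAMPLE_PAGE_SIZE 16	/* tiny hypothetical page size for the sketch */

/* Copy up to 'want' bytes; returns how many bytes were actually copied. */
size_t example_copy_data(unsigned char *dst, const unsigned char *src,
			 size_t src_len, size_t want)
{
	size_t n = want < src_len ? want : src_len;

	memcpy(dst, src, n);
	return n;
}

/*
 * Fill one page with 'avail' bytes and zero the tail. A short copy is
 * reported as -EIO instead of being silently accepted.
 */
int example_fill_page(unsigned char page[EXAMPLE_PAGE_SIZE],
		      const unsigned char *src, size_t src_len, size_t avail)
{
	assert(avail <= EXAMPLE_PAGE_SIZE);

	if (example_copy_data(page, src, src_len, avail) != avail)
		return -EIO;

	memset(page + avail, 0, EXAMPLE_PAGE_SIZE - avail);
	return 0;
}
```

Putting the size check in one helper means both callers get the same error handling, which is the point of sharing the "fill in pages" sequence.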
      
      [ I should have made a squashfs branch for these things, but I didn't
        intend to start doing them in the first place.
      
        My historical connection through cramfs is why I got into looking at
        these issues at all, and every time I (continue to) think it's a
        one-off.
      
        Because _this_ time is always the last time. Right?   - Linus ]
      Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
      Tested-by: Willy Tarreau <w@1wt.eu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Phillip Lougher <phillip@squashfs.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • staging: ashmem: Fix SIGBUS crash when traversing mmaped ashmem pages · 44960f2a
      John Stultz authored
      Amit Pundir and Youling in parallel reported crashes with recent
      mainline kernels running Android:
      
        F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
        F DEBUG   : Build fingerprint: 'Android/db410c32_only/db410c32_only:Q/OC-MR1/102:userdebug/test-key
        F DEBUG   : Revision: '0'
        F DEBUG   : ABI: 'arm'
        F DEBUG   : pid: 2261, tid: 2261, name: zygote  >>> zygote <<<
        F DEBUG   : signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 0xec00008
        ... <snip> ...
        F DEBUG   : backtrace:
        F DEBUG   :     #00 pc 00001c04  /system/lib/libc.so (memset+48)
        F DEBUG   :     #01 pc 0010c513  /system/lib/libart.so (create_mspace_with_base+82)
        F DEBUG   :     #02 pc 0015c601  /system/lib/libart.so (art::gc::space::DlMallocSpace::CreateMspace(void*, unsigned int, unsigned int)+40)
        F DEBUG   :     #03 pc 0015c3ed  /system/lib/libart.so (art::gc::space::DlMallocSpace::CreateFromMemMap(art::MemMap*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, unsigned int, unsigned int, unsigned int, unsigned int, bool)+36)
        ...
      
      This was bisected back to commit bfd40eaf ("mm: fix
      vma_is_anonymous() false-positives").
      
      create_mspace_with_base() in the trace above, utilizes ashmem, and with
      ashmem, for shared mappings we use shmem_zero_setup(), which sets the
      vma->vm_ops to &shmem_vm_ops.  But for private ashmem mappings nothing
      sets the vma->vm_ops.
      
      Looking at the problematic patch, it seems to add a requirement that one
      call vma_set_anonymous() on a vma, otherwise the dummy_vm_ops will be
      used.  Using the dummy_vm_ops seems to trigger SIGBUS when traversing
      unmapped pages.
      
      Thus, this patch adds a call to vma_set_anonymous() for ashmem private
      mappings and seems to avoid the reported problem.
      
      Fixes: bfd40eaf ("mm: fix vma_is_anonymous() false-positives")
      Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Colin Cross <ccross@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Reported-by: Amit Pundir <amit.pundir@linaro.org>
      Reported-by: Youling 257 <youling257@gmail.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cxgb4: fix endian to test F_FW_PORT_CMD_DCBXDIS32 · 90d4c5bb
      Ganesh Goudar authored
      For FW_PORT_ACTION_GET_PORT_INFO32 messages, the
      u.info32.lstatus32_to_cbllen32 is 32-bit Big Endian.
      We need to translate that to CPU Endian in order to
      test F_FW_PORT_CMD_DCBXDIS32.
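
The byte-order issue can be illustrated with a small portable sketch. The flag value below is hypothetical (the real bit position of F_FW_PORT_CMD_DCBXDIS32 is not given here), and `be32_bytes_to_cpu()` is an illustrative helper rather than the conversion routine the driver uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit, chosen only for the example. */
#define EXAMPLE_DCBXDIS_FLAG (UINT32_C(1) << 7)

/* Decode a 32-bit big-endian field from raw bytes into CPU byte order. */
uint32_t be32_bytes_to_cpu(const unsigned char *raw)
{
	assert(raw != NULL);
	return ((uint32_t)raw[0] << 24) | ((uint32_t)raw[1] << 16) |
	       ((uint32_t)raw[2] << 8) | (uint32_t)raw[3];
}

/* Test the flag only after converting the field to CPU byte order. */
int example_dcbx_disabled(const unsigned char *raw)
{
	return (be32_bytes_to_cpu(raw) & EXAMPLE_DCBXDIS_FLAG) != 0;
}
```

Testing the flag against the raw big-endian word would examine the wrong bit on a little-endian CPU, so the conversion has to come first.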
      Signed-off-by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-sched-cleanups' · cef238d7
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      net: sched: couple of adjustments/fixes
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: make tcf_chain_{get,put}() static · 290b1c8b
      Jiri Pirko authored
      These are no longer used outside of cls_api.c so make them static.
      Move tcf_chain_flush() to avoid fwd declaration of tcf_chain_put().
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      
      v1->v2:
      - new patch
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: fix notifications for action-held chains · 53681407
      Jiri Pirko authored
      Chains that only have action references serve as placeholders.
      Until a non-action reference is created, the user should not be aware
      of the chain, nor receive any notifications about it.
      So send the notification for a new chain only when the chain gets
      its first non-action reference. Symmetrically, when the last
      non-action reference is dropped, send the notification about
      the deleted chain.
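
The notification rule can be sketched with two counters. This is an illustrative model only: `struct example_chain`, `example_chain_get()`, `example_chain_put()` and the event enum are hypothetical names, not the cls_api.c implementation.

```c
#include <assert.h>
#include <stddef.h>

enum example_chain_event {
	EXAMPLE_EV_NONE,	/* nothing to report to the user */
	EXAMPLE_EV_NEW,		/* chain became visible */
	EXAMPLE_EV_DEL		/* chain stopped being visible */
};

/* Hypothetical chain with a total count and an action-only count. */
struct example_chain {
	unsigned int refcnt;
	unsigned int action_refcnt;
};

unsigned int example_non_action_refs(const struct example_chain *c)
{
	return c->refcnt - c->action_refcnt;
}

/* Take a reference; report NEW only for the first non-action reference. */
enum example_chain_event example_chain_get(struct example_chain *c, int by_act)
{
	assert(c != NULL);
	c->refcnt++;
	if (by_act) {
		c->action_refcnt++;
		return EXAMPLE_EV_NONE;
	}
	return example_non_action_refs(c) == 1 ? EXAMPLE_EV_NEW : EXAMPLE_EV_NONE;
}

/* Drop a reference; report DEL only when the last non-action one goes. */
enum example_chain_event example_chain_put(struct example_chain *c, int by_act)
{
	assert(c != NULL && c->refcnt > 0);
	if (by_act) {
		assert(c->action_refcnt > 0);
		c->action_refcnt--;
		c->refcnt--;
		return EXAMPLE_EV_NONE;
	}
	assert(example_non_action_refs(c) > 0);
	c->refcnt--;
	return example_non_action_refs(c) == 0 ? EXAMPLE_EV_DEL : EXAMPLE_EV_NONE;
}
```

A chain held only by actions therefore never produces an event, and the new/deleted events always come in matched pairs.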
      Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      
      v1->v2:
      - made __tcf_chain_{get,put}() static as suggested by Cong
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: change name of zombie chain to "held_by_acts_only" · 3d32f4c5
      Jiri Pirko authored
      As mentioned by Cong and Jakub during the review process, it is a bit
      odd to sometimes (act flow) create a new chain which would be
      immediately a "zombie". So just rename it to "held_by_acts_only".
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
      Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: fix return value error while hclge_cmd_csq_clean failed · 4a62e252
      Huazhong Tan authored
      While cleaning the command queue, the value of the HEAD register is not
      in the range of next_to_clean and next_to_use, meaning that this value
      is invalid. This also means that there is a hardware error and the
      hardware will trigger a reset soon. At this time we should return an
      error code instead of 0, and HCLGE_STATE_CMD_DISABLE needs to be set to
      prevent sending commands again.
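
The range check and error return can be sketched on a simple circular ring. This is a hedged illustration: `struct example_ring`, `example_head_is_valid()` and `example_ring_clean()` are hypothetical, the `disabled` flag stands in for HCLGE_STATE_CMD_DISABLE, and the choice of -EIO as the error code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command ring with producer and consumer indices. */
struct example_ring {
	int desc_num;		/* number of descriptors in the ring */
	int next_to_clean;	/* consumer index */
	int next_to_use;	/* producer index */
};

/* Head is valid when it lies between next_to_clean and next_to_use. */
int example_head_is_valid(const struct example_ring *ring, int head)
{
	int c = ring->next_to_clean;
	int u = ring->next_to_use;

	if (head < 0 || head >= ring->desc_num)
		return 0;
	/* The second form handles the case where the range wraps around. */
	return u >= c ? (head >= c && head <= u) : (head >= c || head <= u);
}

/*
 * Clean up to 'head'. Returns the number of descriptors cleaned, or
 * -EIO with *disabled set when the head value is out of range.
 */
int example_ring_clean(struct example_ring *ring, int head, int *disabled)
{
	int cleaned;

	assert(ring != NULL && disabled != NULL);
	if (!example_head_is_valid(ring, head)) {
		*disabled = 1;	/* stop sending further commands */
		return -EIO;
	}
	cleaned = (head - ring->next_to_clean + ring->desc_num) % ring->desc_num;
	ring->next_to_clean = head;
	return cleaned;
}
```

Returning a negative error instead of 0 lets the caller distinguish "nothing to clean" from "the hardware reported an impossible head".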
      
      Fixes: 3ff50490 ("net: hns3: fix a dead loop in hclge_cmd_csq_clean")
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rds: remove redundant variable 'rds_ibdev' · 87f70132
      YueHaibing authored
      Variable 'rds_ibdev' is being assigned but never used,
      so can be removed.
      
      fix this clang warning:
       net/rds/ib_send.c:762:24: warning: variable ‘rds_ibdev’ set but not used [-Wunused-but-set-variable]
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • strparser: remove redundant variable 'rd_desc' · bd707f17
      YueHaibing authored
      Variable 'rd_desc' is being assigned but never used,
      so can be removed.
      
      fix this clang warning:
      net/strparser/strparser.c:411:20: warning: variable ‘rd_desc’ set but not used [-Wunused-but-set-variable]
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_gre: remove redundant variables t_hlen · 1296ee8f
      YueHaibing authored
      After commit ffc2b6ee ("ip_gre: fix IFLA_MTU ignored on NEWLINK")
      variable t_hlen is assigned values that are never read,
      hence they are redundant and can be removed.
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ia64: mark special ia64 memory areas anonymous · ebad825c
      Linus Torvalds authored
      Commit bfd40eaf ("mm: fix vma_is_anonymous() false-positives") made
      newly allocated vma's have a dummy vm_ops field so that they wouldn't be
      mistaken for anonymous mappings, and if you wanted an anonymous vma you
      had to explicitly say so by calling "vma_set_anonymous()" on it.
      
      However, it missed the two special vmas that ia64 processes have: the
      register backing store and the NaT page.  So they wouldn't actually act
      like anonymous ranges, and page faults on them caused a SIGBUS rather
      than the creation of a new anon page in them.
      
      That obviously will make any ia64 binary very unhappy indeed, and the
      boot fails early.
      
      Fixes: bfd40eaf ("mm: fix vma_is_anonymous() false-positives")
      Reported-by: Tony Luck <tony.luck@intel.com>
      Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tcp: remove set but not used variable 'skb_size' · 13dde04f
      Wei Yongjun authored
      Fixes gcc '-Wunused-but-set-variable' warning:
      
      net/ipv4/tcp_output.c: In function 'tcp_collapse_retrans':
      net/ipv4/tcp_output.c:2700:6: warning:
       variable 'skb_size' set but not used [-Wunused-but-set-variable]
        int skb_size, next_skb_size;
            ^
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-add-4-new-stats' · fab9593d
      David S. Miller authored
      Wei Wang says:
      
      ====================
      tcp: add 4 new stats
      
      This patch series adds 3 RFC4898 stats:
      1. tcpEStatsPerfHCDataOctetsOut
      2. tcpEStatsPerfOctetsRetrans
      3. tcpEStatsStackDSACKDups
      and an additional stat to record the number of data packet reordering
      events seen:
      4. tcp_reord_seen
      
      Together with the existing stats, applications can use them to measure
      the retransmission rate in bytes, exclude spurious retransmissions
      reflected by DSACK, and keep track of the reordering events on live
      connections.
      In particular the networks with different MTUs make bytes-based loss stats
      more useful. Google servers have been using these stats for many years to
      instrument transport and network performance.
      
      Note: The first patch is a refactor to add a helper to calculate
      opt_stats size in order to make later changes cleaner.
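
As a rough illustration of the byte-based loss measurement mentioned above, the retransmission rate is the ratio of retransmitted bytes to bytes sent. The struct and function below are hypothetical; an application would fill the two counters from whatever interface exposes them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the two byte counters for one connection. */
struct example_tcp_bytes {
	uint64_t bytes_sent;	/* data bytes sent */
	uint64_t bytes_retrans;	/* data bytes retransmitted */
};

/* Fraction of sent bytes that were retransmissions; 0.0 if nothing sent. */
double example_retrans_rate(const struct example_tcp_bytes *s)
{
	assert(s != NULL);
	if (s->bytes_sent == 0)
		return 0.0;
	return (double)s->bytes_retrans / (double)s->bytes_sent;
}
```

Because the ratio is computed in bytes rather than packets, it stays comparable across paths with different MTUs.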
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add stat of data packet reordering events · 7ec65372
      Wei Wang authored
      Introduce a new TCP stat to record the number of reordering events seen
      and expose it in both tcp_info (TCP_INFO) and opt_stats
      (SOF_TIMESTAMPING_OPT_STATS).
      Applications can use this stat to track the frequency of reordering
      events, in addition to the existing reordering stat, which tracks the
      magnitude of the latest reordering event.
      
      Note: this new stat tracks reordering events triggered by ACKs, which
      could often be fewer than the actual number of packets being delivered
      out-of-order.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add dsack blocks received stats · 7e10b655
      Wei Wang authored
      Introduce a new TCP stat to record the number of DSACK blocks received
      (RFC4898 tcpEStatsStackDSACKDups) and expose it in both tcp_info
      (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS).
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add data bytes retransmitted stats · fb31c9b9
      Wei Wang authored
      Introduce a new TCP stat to record the number of bytes retransmitted
      (RFC4898 tcpEStatsPerfOctetsRetrans) and expose it in both tcp_info
      (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS).
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add data bytes sent stats · ba113c3a
      Wei Wang authored
      Introduce a new TCP stat to record the number of bytes sent
      (RFC4898 tcpEStatsPerfHCDataOctetsOut) and expose it in both tcp_info
      (TCP_INFO) and opt_stats (SOF_TIMESTAMPING_OPT_STATS).
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add a helper to calculate size of opt_stats · 984988aa
      Wei Wang authored
      This is to refactor the calculation of the size of opt_stats to a helper
      function to make the code cleaner and easier for later changes.
      Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Do not suspend/resume closed slave_dev · a94c689e
      Florian Fainelli authored
      If a DSA slave network device was previously disabled, there is no need
      to suspend or resume it.
      
      Fixes: 24462549 ("net: dsa: allow switch drivers to implement suspend/resume hooks")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipv4-Control-SKB-reprioritization-after-forwarding' · 53dd9652
      David S. Miller authored
      Petr Machata says:
      
      ====================
      ipv4: Control SKB reprioritization after forwarding
      
      After IPv4 packets are forwarded, the priority of the corresponding SKB
      is updated according to the TOS field of IPv4 header. This overrides any
      prioritization done earlier by e.g. an skbedit action or ingress-qos-map
      defined at a vlan device.
      
      Such overriding may not always be desirable. Even if the packet ends up
      being routed, which implies this is an L3 network node, an administrator
      may wish to preserve whatever prioritization was done earlier on in the
      pipeline.
      
      Therefore this patch set introduces a sysctl that controls this
      behavior, net.ipv4.ip_forward_update_priority. Its value is 1 by
      default to preserve the current behavior.
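
The effect of such a switch can be sketched as a conditional update at the end of forwarding. Everything here is a simplified assumption: `struct example_pkt`, `example_tos_to_priority()` and `example_forward_finish()` are hypothetical, and the TOS-to-priority mapping shown (the three precedence bits) is a placeholder rather than the kernel's actual table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet carrying a TOS byte and a queueing priority. */
struct example_pkt {
	unsigned char tos;
	unsigned int priority;
};

/* Placeholder mapping: take the three IP precedence bits of the TOS byte. */
unsigned int example_tos_to_priority(unsigned char tos)
{
	return (unsigned int)tos >> 5;
}

/*
 * Final forwarding step. With update_priority == 0, any priority assigned
 * earlier in the pipeline is preserved; with 1, it is derived from TOS.
 */
void example_forward_finish(struct example_pkt *pkt, int update_priority)
{
	assert(pkt != NULL);
	if (update_priority)
		pkt->priority = example_tos_to_priority(pkt->tos);
}
```

With the switch at 1 the sketch reproduces the overriding behavior described above, and with 0 an earlier prioritization survives routing.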
      
      All of the above is implemented in patch #1.
      
      Value changes prompt a new NETEVENT_IPV4_FWD_UPDATE_PRIORITY_UPDATE
      notification, so that the drivers can hook up whatever logic may depend
      on this value. That is implemented in patch #2.
      
      In patches #3 and #4, mlxsw is adapted to recognize the sysctl. On
      initialization, the RGCR register that handles router configuration is
      set in accordance with the sysctl. The new notification is listened to
      and RGCR is reconfigured as necessary.
      
      In patches #5 to #7, a selftest is added to verify that mlxsw reflects
      the sysctl value as necessary. The test is expressed in terms of the
      recently-introduced ieee_setapp support, and works by observing how DSCP
      value gets rewritten depending on packet priority. For this reason, the
      test is added to the subdirectory drivers/net/mlxsw. Even though it's
      not particularly specific to mlxsw, it's not suitable for running on
      soft devices (which don't support the ieee_setapp et al.).
      
      Changes from v1 to v2:
      
      - In patch #1, init sysctl_ip_fwd_update_priority to 1 instead of true.
      
      Changes from RFC to v1:
      
      - Fix wrong sysctl name in ip-sysctl.txt
      - Add notifications
      - Add mlxsw support
      - Add self test
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: Add test for ip_forward_update_priority · 9bae0451
      Petr Machata authored
      Verify that with that sysctl turned off, DSCP prioritization and rewrite
      work the same way as in the qos_dscp_bridge test. However, when the sysctl
      is enabled, there should be a reprioritization after the routing stage,
      which will be observed as a different DSCP rewrite on egress.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: Move DSCP capture to lib.sh · cf608698
      Petr Machata authored
      dscp_capture_install() and dscp_capture_uninstall() are going to be
      useful for a test added by a following patch, so move them to
      lib.sh together with the related helpers.
      
      While doing so, change the rule preference from mere DSCP value to
      DSCP+100 in order to support adding captures of packets with DSCP of 0.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>