1. 04 Aug, 2015 2 commits
    • iwlwifi: mvm: LRU-assign key offsets · 2dc2a15e
      Johannes Berg authored
      The current key offset assignment algorithm always uses the lowest
      unused key offset, which will potentially lead to issues when the
      firmware will change to take the key material for TX from the key
      table rather than from the TX command.
      
      In order to avoid those issues (and avoid forgetting about them)
      change the key offset allocation algorithm now to avoid reusing key
      offsets quickly.
      
      The new algorithm always picks as the next offset the least recently
      freed offset, i.e. the offset that has been unused for the longest
      amount of time. This is implemented by having a generation counter
      for each key offset that is incremented every time a key is deleted,
      except for the one that's deleted, which is reset to zero. Thus the
      highest counter is the key that's been unused longest.
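
The allocation scheme described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the driver's code: the table size, the struct, and the names `key_alloc()` and `key_free()` are hypothetical, and ties between equally old offsets are broken by picking the lowest index.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_KEY_OFFSETS 4 /* hypothetical table size, chosen for illustration */

struct key_table {
    uint8_t in_use[NUM_KEY_OFFSETS];     /* 1 while a key occupies the offset */
    uint8_t unused_cnt[NUM_KEY_OFFSETS]; /* generation counter per offset */
};

/* Pick the free offset with the highest counter, i.e. the least recently
 * freed one. Returns -1 when every offset is in use. */
static int key_alloc(struct key_table *t)
{
    int best = -1, best_cnt = -1;

    for (int i = 0; i < NUM_KEY_OFFSETS; i++) {
        if (t->in_use[i])
            continue;
        if ((int)t->unused_cnt[i] > best_cnt) {
            best_cnt = t->unused_cnt[i];
            best = i;
        }
    }
    if (best >= 0)
        t->in_use[best] = 1;
    return best;
}

/* On deletion, age every counter (saturating at UINT8_MAX), then reset the
 * counter of the offset that was just freed to zero. */
static void key_free(struct key_table *t, int offs)
{
    assert(offs >= 0 && offs < NUM_KEY_OFFSETS);
    assert(t->in_use[offs]);

    for (int i = 0; i < NUM_KEY_OFFSETS; i++) {
        if (t->unused_cnt[i] < UINT8_MAX)
            t->unused_cnt[i]++;
    }
    t->unused_cnt[offs] = 0;
    t->in_use[offs] = 0;
}
```

With a zero-initialized table, allocating twice, freeing offset 0, and allocating again yields offset 2 rather than 0, which is the "avoid quick reuse" property the commit message is after.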
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
    • iwlwifi: pcie: Set scheduler to work on auto mode · 94ce9e5e
      Haim Dreyfuss authored
      During NIC initialization shared HW is reset and this disables the
      scheduler. Some HW platforms do not activate the scheduler after it.
      Consequently, all HCMDs sent by the driver stay in the queues, which
      causes the queues to get stuck.
      Set the scheduler to work in auto active mode so that it is activated
      whenever the write pointer of one of the queues changes.
      Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
      Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
  2. 26 Jun, 2015 5 commits
  3. 24 Jun, 2015 33 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next · e0456717
      Linus Torvalds authored
      Pull networking updates from David Miller:
      
       1) Add TX fast path in mac80211, from Johannes Berg.
      
       2) Add TSO/GRO support to ibmveth, from Thomas Falcon
      
       3) Move away from cached routes in ipv6, just like ipv4, from Martin
          KaFai Lau.
      
       4) Lots of new rhashtable tests, from Thomas Graf.
      
       5) Run ingress qdisc lockless, from Alexei Starovoitov.
      
       6) Allow servers to fetch TCP packet headers for SYN packets of new
          connections, for fingerprinting.  From Eric Dumazet.
      
       7) Add mode parameter to pktgen, for testing receive.  From Alexei
          Starovoitov.
      
       8) Cache access optimizations via simplifications of build_skb(), from
          Alexander Duyck.
      
       9) Move page frag allocator under mm/, also from Alexander.
      
      10) Add xmit_more support to hv_netvsc, from KY Srinivasan.
      
      11) Add a counter guard in case we try to perform endless reclassify
          loops in the packet scheduler.
      
      12) Extend flow dissector to be programmable and use it in new "Flower"
          classifier.  From Jiri Pirko.
      
      13) AF_PACKET fanout rollover fixes, performance improvements, and new
          statistics.  From Willem de Bruijn.
      
      14) Add netdev driver for GENEVE tunnels, from John W Linville.
      
      15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.
      
      16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.
      
      17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
          Borkmann.
      
      18) Add tail call support to BPF, from Alexei Starovoitov.
      
      19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.
      
      20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.
      
      21) Favor even port numbers for allocation to connect() requests, and
          odd port numbers for bind(0), in an effort to help avoid
          ip_local_port_range exhaustion.  From Eric Dumazet.
      
      22) Add Cavium ThunderX driver, from Sunil Goutham.
      
      23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
          from Alexei Starovoitov.
      
      24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.
      
      25) Double TCP Small Queues default to 256K to accommodate situations
          like the XEN driver and wireless aggregation.  From Wei Liu.
      
      26) Add more entropy inputs to flow dissector, from Tom Herbert.
      
      27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
          Jonassen.
      
      28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.
      
      29) Track and act upon link status of ipv4 route nexthops, from Andy
          Gospodarek.
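
Item 21 in the list above (even ports for connect(), odd ports for bind(0)) can be illustrated with a toy allocator. This is a simplified sketch, not the kernel's algorithm: `pick_port()` and the `used` array are hypothetical, and the sketch scans linearly from the bottom of the range, which the real code is not claimed to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: return a free port in [low, high], trying ports of
 * the preferred parity first and falling back to the other parity only if
 * none is free. used[i] describes port low + i. Returns -1 if the whole
 * range is exhausted. */
static int pick_port(const bool *used, int low, int high, bool prefer_even)
{
    int want = prefer_even ? 0 : 1;

    assert(low <= high);

    for (int pass = 0; pass < 2; pass++) {
        int parity = (pass == 0) ? want : 1 - want;

        for (int p = low; p <= high; p++) {
            if ((p & 1) == parity && !used[p - low])
                return p;
        }
    }
    return -1;
}
```

A connect()-style caller would pass `prefer_even = true` and a bind(0)-style caller `prefer_even = false`, so the two kinds of requests compete for the same ports only once their preferred half of the range is full.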
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
        bridge: vlan: flush the dynamically learned entries on port vlan delete
        bridge: multicast: add a comment to br_port_state_selection about blocking state
        net: inet_diag: export IPV6_V6ONLY sockopt
        stmmac: troubleshoot unexpected bits in des0 & des1
        net: ipv4 sysctl option to ignore routes when nexthop link is down
        net: track link-status of ipv4 nexthops
        net: switchdev: ignore unsupported bridge flags
        net: Cavium: Fix MAC address setting in shutdown state
        drivers: net: xgene: fix for ACPI support without ACPI
        ip: report the original address of ICMP messages
        net/mlx5e: Prefetch skb data on RX
        net/mlx5e: Pop cq outside mlx5e_get_cqe
        net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
        net/mlx5e: Remove extra spaces
        net/mlx5e: Avoid TX CQE generation if more xmit packets expected
        net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
        net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
        net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
        net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
        net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
        ...
    • Merge branch 'sched-hrtimers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 98ec21a0
      Linus Torvalds authored
      Pull scheduler updates from Thomas Gleixner:
       "This series of scheduler updates depends on sched/core and timers/core
        branches, which are already in your tree:
      
         - Scheduler balancing overhaul to plug a hard to trigger race which
           causes an oops in the balancer (Peter Zijlstra)
      
         - Lockdep updates which are related to the balancing updates (Peter
           Zijlstra)"
      
      * 'sched-hrtimers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched,lockdep: Employ lock pinning
        lockdep: Implement lock pinning
        lockdep: Simplify lock_release()
        sched: Streamline the task migration locking a little
        sched: Move code around
        sched,dl: Fix sched class hopping CBS hole
        sched, dl: Convert switched_{from, to}_dl() / prio_changed_dl() to balance callbacks
        sched,dl: Remove return value from pull_dl_task()
        sched, rt: Convert switched_{from, to}_rt() / prio_changed_rt() to balance callbacks
        sched,rt: Remove return value from pull_rt_task()
        sched: Allow balance callbacks for check_class_changed()
        sched: Use replace normalize_task() with __sched_setscheduler()
        sched: Replace post_schedule with a balance callback list
    • Merge branch 'sched-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a2629483
      Linus Torvalds authored
      Pull locking updates from Thomas Gleixner:
       "These locking updates depend on the already merged sched/core branch:
      
         - Lockless top waiter wakeup for rtmutex (Davidlohr)
      
         - Reduce hash bucket lock contention for PI futexes (Sebastian)
      
         - Documentation update (Davidlohr)"
      
      * 'sched-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/rtmutex: Update stale plist comments
        futex: Lower the lock contention on the HB lock during wake up
        locking/rtmutex: Implement lockless top-waiter wakeup
    • Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · e3d8238d
      Linus Torvalds authored
      Pull arm64 updates from Catalin Marinas:
       "Mostly refactoring/clean-up:
      
         - CPU ops and PSCI (Power State Coordination Interface) refactoring
           following the merging of the arm64 ACPI support, together with
           handling of Trusted (secure) OS instances
      
         - Using fixmap for permanent FDT mapping, removing the initial dtb
           placement requirements (within 512MB from the start of the kernel
           image).  This required moving the FDT self reservation out of the
           memreserve processing
      
         - Idmap (1:1 mapping used for MMU on/off) handling clean-up
      
         - Removing flush_cache_all() - not safe on ARM unless the MMU is off.
           Last stages of CPU power down/up are handled by firmware already
      
         - "Alternatives" (run-time code patching) refactoring and support for
           immediate branch patching, GICv3 CPU interface access
      
         - User faults handling clean-up
      
        And some fixes:
      
         - Fix for VDSO building with broken ELF toolchains
      
         - Fix another case of init_mm.pgd usage for user mappings (during
           ASID roll-over broadcasting)
      
         - Fix for FPSIMD reloading after CPU hotplug
      
         - Fix for missing syscall trace exit
      
         - Workaround for .inst asm bug
      
         - Compat fix for switching the user tls tpidr_el0 register"
      
      * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits)
        arm64: use private ratelimit state along with show_unhandled_signals
        arm64: show unhandled SP/PC alignment faults
        arm64: vdso: work-around broken ELF toolchains in Makefile
        arm64: kernel: rename __cpu_suspend to keep it aligned with arm
        arm64: compat: print compat_sp instead of sp
        arm64: mm: Fix freeing of the wrong memmap entries with !SPARSEMEM_VMEMMAP
        arm64: entry: fix context tracking for el0_sp_pc
        arm64: defconfig: enable memtest
        arm64: mm: remove reference to tlb.S from comment block
        arm64: Do not attempt to use init_mm in reset_context()
        arm64: KVM: Switch vgic save/restore to alternative_insn
        arm64: alternative: Introduce feature for GICv3 CPU interface
        arm64: psci: fix !CONFIG_HOTPLUG_CPU build warning
        arm64: fix bug for reloading FPSIMD state after CPU hotplug.
        arm64: kernel thread don't need to save fpsimd context.
        arm64: fix missing syscall trace exit
        arm64: alternative: Work around .inst assembler bugs
        arm64: alternative: Merge alternative-asm.h into alternative.h
        arm64: alternative: Allow immediate branch as alternative instruction
        arm64: Rework alternate sequence for ARM erratum 845719
        ...
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 4e241557
      Linus Torvalds authored
      Pull first batch of KVM updates from Paolo Bonzini:
       "The bulk of the changes here is for x86.  And for once it's not for
        silicon that no one owns: these are really new features for everyone.
      
        Details:
      
         - ARM:
              several features are in progress but missed the 4.2 deadline.
              So here is just a smattering of bug fixes, plus enabling the
              VFIO integration.
      
         - s390:
              Some fixes/refactorings/optimizations, plus support for 2GB
              pages.
      
         - x86:
              * host and guest support for marking kvmclock as a stable
                scheduler clock.
              * support for write combining.
              * support for system management mode, needed for secure boot in
                guests.
              * a bunch of cleanups required for the above
              * support for virtualized performance counters on AMD
              * legacy PCI device assignment is deprecated and defaults to "n"
                in Kconfig; VFIO replaces it
      
              On top of this there are also bug fixes and eager FPU context
              loading for FPU-heavy guests.
      
         - Common code:
              Support for multiple address spaces; for now it is used only for
              x86 SMM but the s390 folks also have plans"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
        KVM: s390: clear floating interrupt bitmap and parameters
        KVM: x86/vPMU: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRs
        KVM: x86/vPMU: Implement AMD vPMU code for KVM
        KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch
        KVM: x86/vPMU: introduce kvm_pmu_msr_idx_to_pmc
        KVM: x86/vPMU: reorder PMU functions
        KVM: x86/vPMU: whitespace and stylistic adjustments in PMU code
        KVM: x86/vPMU: use the new macros to go between PMC, PMU and VCPU
        KVM: x86/vPMU: introduce pmu.h header
        KVM: x86/vPMU: rename a few PMU functions
        KVM: MTRR: do not map huge page for non-consistent range
        KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type
        KVM: MTRR: introduce mtrr_for_each_mem_type
        KVM: MTRR: introduce fixed_mtrr_addr_* functions
        KVM: MTRR: sort variable MTRRs
        KVM: MTRR: introduce var_mtrr_range
        KVM: MTRR: introduce fixed_mtrr_segment table
        KVM: MTRR: improve kvm_mtrr_get_guest_memory_type
        KVM: MTRR: do not split 64 bits MSR content
        KVM: MTRR: clean up mtrr default type
        ...
    • Merge tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux · 08d183e3
      Linus Torvalds authored
      Pull powerpc updates from Michael Ellerman:
      
       - disable the 32-bit vdso when building LE, so we can build with a
         64-bit only toolchain.
      
       - EEH fixes from Gavin & Richard.
      
       - enable the sys_kcmp syscall from Laurent.
      
       - sysfs control for fastsleep workaround from Shreyas.
      
       - expose OPAL events as an irq chip by Alistair.
      
       - MSI ops moved to pci_controller_ops by Daniel.
      
       - fix for kernel to userspace backtraces for perf from Anton.
      
       - merge pseries and pseries_le defconfigs from Cyril.
      
       - CXL in-kernel API from Mikey.
      
       - OPAL prd driver from Jeremy.
      
       - fix for DSCR handling & tests from Anshuman.
      
       - Powernv flash mtd driver from Cyril.
      
       - dynamic DMA Window support on powernv from Alexey.
      
       - LLVM clang fixes & workarounds from Anton.
      
       - reworked version of the patch to abort syscalls when transactional.
      
       - fix the swap encoding to support 4TB, from Aneesh.
      
       - various fixes as usual.
      
       - Freescale updates from Scott: Highlights include more 8xx
         optimizations, an e6500 hugetlb optimization, QMan device tree nodes,
         t1024/t1023 support, and various fixes and cleanup.
      
      * tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (180 commits)
        cxl: Fix typo in debug print
        cxl: Add CXL_KERNEL_API config option
        powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma()
        powerpc/mm: Change the swap encoding in pte.
        powerpc/mm: PTE_RPN_MAX is not used, remove the same
        powerpc/tm: Abort syscalls in active transactions
        powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off
        powerpc/include: Add opal-prd to installed uapi headers
        powerpc/powernv: fix construction of opal PRD messages
        powerpc/powernv: Increase opal-irqchip initcall priority
        powerpc: Make doorbell check preemption safe
        powerpc/powernv: pnv_init_idle_states() should only run on powernv
        macintosh/nvram: Remove as unused
        powerpc: Don't use gcc specific options on clang
        powerpc: Don't use -mno-strict-align on clang
        powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it
        powerpc: Only use -mabi=altivec if toolchain supports it
        powerpc: Fix duplicate const clang warning in user access code
        vfio: powerpc/spapr: Support Dynamic DMA windows
        vfio: powerpc/spapr: Register memory and define IOMMU v2
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 4b1f2af6
      Linus Torvalds authored
      Pull s390 updates from Martin Schwidefsky:
       "Pretty boring for a merge window pull.
      
        One change in behaviour is the patch for dasd driver, the module which
        provides the diagnose discipline is now loaded automatically.
      
        The SCLP code got a nice cleanup, a new global structure replaces a
        bunch of accessor functions.
      
        And a couple of random, small improvements"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/pci: improve handling of hotplug event 0x301
        s390/setup: fix DMA_API_DEBUG warnings
        s390/zcrypt: remove obsolete __constant
        s390/keyboard: avoid off-by-one when using strnlen_user()
        s390/sclp: pass timeout as HZ independent value
        s390/mm: s/specifiation/specification/, s/an specification/a specification/
        s390/sclp: Use DECLARE_BITMAP
        s390/dasd: Enable automatic loading of dasd_diag_mod
        s390/sclp: move sclp_facilities into "struct sclp"
        s390/sclp: get rid of sclp_get_mtid() and sclp_get_mtid_max()
        s390/sclp: unify basic sclp access by exposing "struct sclp"
        s390/sclp: prepare smp_fill_possible_mask for global "struct sclp"
    • Merge tag 'microblaze-4.2-rc1' of git://git.monstr.eu/linux-2.6-microblaze · aaa64485
      Linus Torvalds authored
      Pull Microblaze updates from Michal Simek:
      
       - some PCI fixups
      
       - add new MB versions
      
       - sparse fixups
      
      * tag 'microblaze-4.2-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
        microblaze/PCI: Remove unnecessary struct pci_dev declaration
        microblaze/PCI: Remove unnecessary pci_bus_find_capability() declaration
        microblaze/PCI: Remove unused declarations
        microblaze: Label local function static
        microblaze: Add missing release version code
    • bridge: vlan: flush the dynamically learned entries on port vlan delete · 1ea2d020
      Nikolay Aleksandrov authored
      Add a new argument to br_fdb_delete_by_port which allows to specify a
      vid to match when flushing entries and use it in nbp_vlan_delete() to
      flush the dynamically learned entries of the vlan/port pair when removing
      a vlan from a port. Before this patch only the local mac was being
      removed and the dynamically learned ones were left to expire.
      Note that the do_all argument is still respected and if specified, the
      vid will be ignored.
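
The matching rule described above (skip static entries and non-matching vids unless do_all is set) can be shown on a toy forwarding table. The struct and function below are hypothetical stand-ins, not the bridge code, and treating `vid == 0` as "match any VLAN" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy forwarding-database entry (hypothetical layout). */
struct fdb_entry {
    int port;
    uint16_t vid;
    bool is_static; /* local/static entry, not dynamically learned */
    bool valid;     /* false once the entry has been flushed */
};

/* Flush the entries of one port and return how many were removed.
 * Without do_all, only dynamically learned entries are removed, and a
 * non-zero vid restricts the flush to that VLAN. With do_all, every
 * entry of the port is removed and vid is ignored. */
static size_t fdb_delete_by_port(struct fdb_entry *tbl, size_t n,
                                 int port, uint16_t vid, bool do_all)
{
    size_t removed = 0;

    assert(tbl != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct fdb_entry *e = &tbl[i];

        if (!e->valid || e->port != port)
            continue;
        if (!do_all && (e->is_static || (vid && e->vid != vid)))
            continue;

        e->valid = false;
        removed++;
    }
    return removed;
}
```

Removing a VLAN from a port then corresponds to a call with that port, the VLAN's vid, and `do_all = false`, which leaves the port's learned entries in other VLANs untouched.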
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: multicast: add a comment to br_port_state_selection about blocking state · 9aa66382
      Nikolay Aleksandrov authored
      Add a comment to explain why we're not disabling port's multicast when it
      goes in blocking state. Since there's a check in the timer's function which
      bypasses the timer if the port's in blocking/disabled state, the timer will
      simply expire and stop without sending more queries.
      Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 3a07bd6f
      David S. Miller authored
      Conflicts:
      	drivers/net/ethernet/mellanox/mlx4/main.c
      	net/packet/af_packet.c
      
      Both conflicts were cases of simple overlapping changes.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: inet_diag: export IPV6_V6ONLY sockopt · 20462155
      Phil Sutter authored
      For AF_INET6 sockets, the value of struct ipv6_pinfo.ipv6only is
      exported to userspace. It indicates whether a socket bound to in6addr_any
      listens on IPv4 as well as IPv6. Since the socket is natively IPv6, it is not
      listed by e.g. 'ss -l -4'.
      
      This patch is accompanied by an appropriate one for iproute2 to enable
      the additional information in 'ss -e'.
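
For context, the flag being exported is the one userspace controls through the standard IPV6_V6ONLY socket option. The helpers below are a minimal sketch of setting and reading it; the names `set_v6only()` and `get_v6only()` are made up for this example, while `setsockopt()`, `getsockopt()`, `IPPROTO_IPV6`, and `IPV6_V6ONLY` are the standard sockets API.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Set or clear IPV6_V6ONLY on an AF_INET6 socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int set_v6only(int fd, int on)
{
    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));
}

/* Read the flag back: returns 1 if the socket is IPv6-only, 0 if it also
 * accepts IPv4 (as IPv4-mapped addresses), or -1 on error. */
static int get_v6only(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    assert(fd >= 0);
    if (getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

A socket bound to in6addr_any with the flag cleared also serves IPv4 clients, which is the case the commit message says is otherwise invisible in an IPv4-only listing.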
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: troubleshoot unexpected bits in des0 & des1 · f1590670
      Alexey Brodkin authored
      Current implementation of descriptor init procedure only takes
      care about setting/clearing ownership flag in "des0"/"des1"
      fields while it is perfectly possible to get unexpected bits
      set because of the following factors:
      
       [1] On driver probe underlying memory allocated with
           dma_alloc_coherent() might not be zeroed and so
           it will be filled with garbage.
      
       [2] During driver operation some bits could be set by SD/MMC
           controller (for example error flags etc).
      
      And unexpected and/or randomly set flags in "des0"/"des1"
      fields may lead to unpredictable behavior of GMAC DMA block.
      
      This change addresses both items above with:
      
       [1] Use of dma_zalloc_coherent() instead of simple
           dma_alloc_coherent() to make sure allocated memory is
           zeroed. That shouldn't affect performance because
           this allocation only happens once on driver probe.
      
       [2] Do explicit zeroing of both "des0" and "des1" fields
           of all buffer descriptors during initialization of
           DMA transfer.
      
      And while at it, fix the indentation of the dma_free_coherent()
      counterpart as well.
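
The two fixes can be mimicked in plain userspace C. This is a sketch under loud assumptions, not the driver: `calloc()` stands in for the zeroing DMA allocation, the four-word descriptor layout is simplified, and `DESC_OWN` is a hypothetical ownership bit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified descriptor: only des0/des1 matter for this illustration. */
struct dma_desc {
    uint32_t des0;
    uint32_t des1;
    uint32_t des2;
    uint32_t des3;
};

#define DESC_OWN (1u << 31) /* hypothetical "owned by hardware" bit */

/* Fix [1]: allocate the ring zeroed, so no garbage bits are present
 * after probe. calloc() plays the role of the zeroing allocator. */
static struct dma_desc *desc_ring_alloc(size_t n)
{
    return calloc(n, sizeof(struct dma_desc));
}

/* Fix [2]: on every (re)initialization, clear des0 and des1 completely
 * before setting ownership, so stale status or error bits cannot leak
 * into the next transfer. */
static void desc_ring_init(struct dma_desc *ring, size_t n, int hw_owned)
{
    assert(ring != NULL);

    for (size_t i = 0; i < n; i++) {
        ring[i].des0 = 0;
        ring[i].des1 = 0;
        if (hw_owned)
            ring[i].des0 |= DESC_OWN;
    }
}
```

The point of the second function is that it assigns the whole word rather than toggling only the ownership flag, which is the behavior change the commit message describes.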
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: arc-linux-dev@synopsys.com
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipv4-nexthop-link-status' · f389a40e
      David S. Miller authored
      Andy Gospodarek says:
      
      ====================
      changes to make ipv4 routing table aware of next-hop link status
      
      This series adds the ability to have the Linux kernel track whether or
      not a particular route should be used based on the link-status of the
      interface associated with the next-hop.
      
      Before this patch any link-failure on an interface that was serving as a
      gateway for some systems could result in those systems being isolated
      from the rest of the network as the stack would continue to attempt to
      send frames out of an interface that is actually linked-down.  When the
      kernel is responsible for all forwarding, it should also be responsible
      for taking action when the traffic can no longer be forwarded -- there
      is no real need to outsource link-monitoring to userspace anymore.
      
      This feature is only enabled with the new per-interface or ipv4 global
      sysctls called 'ignore_routes_with_linkdown'.
      
      net.ipv4.conf.all.ignore_routes_with_linkdown = 0
      net.ipv4.conf.default.ignore_routes_with_linkdown = 0
      net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
      ...
      
      When the above sysctls are set, the kernel will not only report to
      userspace that the link is down, but it will also report to userspace
      that a route is dead.  This will signal to userspace that the route will
      not be selected.
      
      With the new sysctls set, the following behavior can be observed
      (interface p8p1 is link-down):
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
          cache
      
      While the route does remain in the table (so it can be modified if
      needed rather than being wiped away as it would be if IFF_UP was
      cleared), the proper next-hop is chosen automatically when the link is
      down.  Now interface p8p1 is linked-up:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
      90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      
      and the output changes to what one would expect.
      
      If the global or interface sysctl is not set, the following output would
      be expected when p8p1 is down:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      
      If the dead flag does not appear there should be no expectation that the
      kernel would skip using this route due to link being down.
      
      v2: Split kernel changes into 2 patches: first to add linkdown flag and
      second to add new sysctl settings.  Also took suggestion from Alex to
      simplify code by only checking sysctl during fib lookup and suggestion
      from Scott to add a per-interface sysctl.  Added iproute2 patch to
      recognize and print linkdown flag.
      
      v3: Code cleanups along with reverse-path checks suggested by Alex and
      small fixes related to problems found when multipath was disabled.
      
      v4: Drop binary sysctls
      
      v5: Whitespace and variable declaration fixups suggested by Dave
      
      v6: Style changes noticed by Dave and checkpatch suggestions.
      
      v7: Last checkpatch fixup.
      
      Some preferred not to have a configuration option and to make this
      behavior the default when it was discussed in Ottawa earlier this year,
      since "it was time to do this."  However, I wanted to propose the config
      option to preserve the current behavior for those that desire it.  I'll
      happily remove it if Dave and Linus approve.
      
      An IPv6 implementation is also needed (DECnet too!), but I wanted to
      start with the IPv4 implementation to get people comfortable with the
      idea before moving forward.  If this is accepted the IPv6 implementation
      can be posted shortly.
      
      There was also a request for switchdev support for this, but that will
      be posted as a followup as switchdev does not currently handle dead
      next-hops in a multi-path case and I felt that infra needed to be added
      first.
      
      FWIW, we have been running the original version of this series with a
      global sysctl and our customers have been happily using a backported
      version for IPv4 and IPv6 for >6 months.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4 sysctl option to ignore routes when nexthop link is down · 0eeb075f
      Andy Gospodarek authored
      This feature is only enabled with the new per-interface or ipv4 global
      sysctls called 'ignore_routes_with_linkdown'.
      
      net.ipv4.conf.all.ignore_routes_with_linkdown = 0
      net.ipv4.conf.default.ignore_routes_with_linkdown = 0
      net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
      ...
      
      When the above sysctls are set, the kernel will report to userspace that a route is
      dead and will no longer resolve to this nexthop when performing a fib
      lookup.  This will signal to userspace that the route will not be
      selected.  The signalling of a RTNH_F_DEAD is only passed to userspace
      if the sysctl is enabled and link is down.  This was done as without it
      the netlink listeners would have no idea whether or not a nexthop would
      be selected.   The kernel only sets RTNH_F_DEAD internally if the
      interface has IFF_UP cleared.
      
      With the new sysctl set, the following behavior can be observed
      (interface p8p1 is link-down):
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
          cache
      
      While the route does remain in the table (so it can be modified if
      needed rather than being wiped away as it would be if IFF_UP was
      cleared), the proper next-hop is chosen automatically when the link is
      down.  Now interface p8p1 is linked-up:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
      90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      local 80.0.0.1 dev lo  src 80.0.0.1
          cache <local>
      80.0.0.2 dev p8p1  src 80.0.0.1
          cache
      
      and the output changes to what one would expect.
      
      If the sysctl is not set, the following output would be expected when
      p8p1 is down:
      
      default via 10.0.5.2 dev p9p1
      10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
      70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
      80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
      90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
      90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
      
      Since the dead flag does not appear, there should be no expectation that
      the kernel would skip using this route due to link being down.
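      
      The selection and reporting rules described above can be sketched as
      follows.  This is a simplified userspace model, not the kernel's fib
      code: the struct, the SKETCH_* constants and both function names are
      hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the kernel nexthop flags (illustrative values). */
#define SKETCH_RTNH_F_DEAD     0x01u
#define SKETCH_RTNH_F_LINKDOWN 0x10u

/* Hypothetical, simplified nexthop record. */
struct sketch_nexthop {
    const char *dev;
    unsigned int flags;
    int metric;
};

/*
 * Return the index of the usable nexthop with the lowest metric, or -1.
 * Dead nexthops are always skipped; link-down nexthops are skipped only
 * when ignore_linkdown (modelling the sysctl) is set.
 */
static int sketch_select_nexthop(const struct sketch_nexthop *nh, size_t n,
                                 bool ignore_linkdown)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (nh[i].flags & SKETCH_RTNH_F_DEAD)
            continue;
        if (ignore_linkdown && (nh[i].flags & SKETCH_RTNH_F_LINKDOWN))
            continue;
        if (best < 0 || nh[i].metric < nh[best].metric)
            best = (int)i;
    }
    return best;
}

/*
 * Flags as reported to userspace: "dead" is added to a link-down nexthop
 * only when the sysctl is enabled.
 */
static unsigned int sketch_reported_flags(unsigned int flags,
                                          bool ignore_linkdown)
{
    if (ignore_linkdown && (flags & SKETCH_RTNH_F_LINKDOWN))
        flags |= SKETCH_RTNH_F_DEAD;
    return flags;
}
```

      With the two 90.0.0.0/24 nexthops from the listings above, the model
      picks p7p1 (metric 2) when the sysctl is set and p8p1 is link-down, and
      p8p1 (metric 1) when the sysctl is not set.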
      
      v2: Split kernel changes into 2 patches, this actually makes a
      behavioral change if the sysctl is set.  Also took suggestion from Alex
      to simplify code by only checking sysctl during fib lookup and
      suggestion from Scott to add a per-interface sysctl.
      
      v3: Code clean-ups to make it more readable and efficient as well as a
      reverse path check fix.
      
      v4: Drop binary sysctl
      
      v5: Whitespace fixups from Dave
      
      v6: Style changes from Dave and checkpatch suggestions
      
      v7: One more checkpatch fixup
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
      Acked-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0eeb075f
    • Andy Gospodarek's avatar
      net: track link-status of ipv4 nexthops · 8a3d0316
      Andy Gospodarek authored
      Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
      reachable via an interface where carrier is off.  No action is taken,
      but additional flags are passed to userspace to indicate carrier status.
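      
      A minimal sketch of the flag bookkeeping, assuming a plain bitmask per
      nexthop.  The constant and the helper name are hypothetical; the real
      patch updates the flag from the fib netdevice event handling.

```c
#include <stdbool.h>

/* Local stand-in for RTNH_F_LINKDOWN (illustrative value). */
#define SKETCH_NH_LINKDOWN 0x10u

/*
 * Set the link-down bit when carrier is off and clear it when carrier
 * returns.  All other flag bits are left untouched, and no routing
 * decision is made here.
 */
static unsigned int sketch_update_linkdown(unsigned int nh_flags,
                                           bool carrier_ok)
{
    if (carrier_ok)
        return nh_flags & ~SKETCH_NH_LINKDOWN;
    return nh_flags | SKETCH_NH_LINKDOWN;
}
```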
      
      This also includes a cleanup to fib_disable_ip so that it clearly
      indicates which event triggered the call, replacing the more cryptic
      force option previously used.
      
      v2: Split out kernel functionality into 2 patches, this patch simply
      sets and clears new nexthop flag RTNH_F_LINKDOWN.
      
      v3: Cleanups suggested by Alex as well as a bug noticed in
      fib_sync_down_dev and fib_sync_up when multipath was not enabled.
      
      v5: Whitespace and variable declaration fixups suggested by Dave.
      
      v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them
      all.
      Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
      Acked-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a3d0316
    • Vivien Didelot's avatar
      net: switchdev: ignore unsupported bridge flags · 5c8079d0
      Vivien Didelot authored
      switchdev_port_bridge_getlink() queries SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS
      attributes, but a driver doesn't need to implement this in order to get
      bridge link information.
      
      So error out only on errors different than -EOPNOTSUPP.
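      
      The error-handling pattern can be sketched like this.  The callback
      type, the stub getters and the wrapper name are hypothetical; only the
      "ignore -EOPNOTSUPP, propagate everything else" check reflects the
      change described above.

```c
#include <errno.h>

/* Hypothetical attribute getter, standing in for a driver callback. */
typedef int (*sketch_attr_get_fn)(unsigned long *flags);

/* Stub driver that does not implement the bridge-flags attribute. */
static int sketch_get_unsupported(unsigned long *flags)
{
    (void)flags;
    return -EOPNOTSUPP;
}

/* Stub driver that fails with a real error. */
static int sketch_get_failing(unsigned long *flags)
{
    (void)flags;
    return -EIO;
}

/*
 * Query the optional flags.  A driver that does not support the
 * attribute is not an error: the flags simply stay zero.
 */
static int sketch_getlink_flags(sketch_attr_get_fn get, unsigned long *flags)
{
    int err;

    *flags = 0;
    err = get(flags);
    if (err && err != -EOPNOTSUPP)
        return err;
    return 0;
}
```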
      
      (This is a follow-up patch for 7d4f8d87.)
      
      Fixes: 8793d0a6 ("switchdev: add new switchdev_port_bridge_getlink")
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5c8079d0
    • Pavel Fedin's avatar
      net: Cavium: Fix MAC address setting in shutdown state · bd049a90
      Pavel Fedin authored
      This bug pops up with NetworkManager on Fedora 21. NetworkManager tends to
      stop the interface (nicvf_stop() is called) before changing settings. In
      stopped state MAC cannot be sent to a PF. However, when the interface is
      restarted (nicvf_open() is called), we ping the PF using NIC_MBOX_MSG_READY
      message, and the PF replies back with old MAC address, overriding what we
      had after MAC setting from userspace. As a result, we cannot set MAC
      address using NetworkManager.
      
      This patch introduces special tracking of a MAC change in stopped state
      so that the correct new MAC address is sent to a PF when the interface
      is reopened.
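      
      One way to picture the tracking is a "pending" flag that is set when
      the address changes while the interface is stopped and consumed on the
      next open.  This is a sketch under that assumption; the struct, its
      field names and the helpers are hypothetical and do not mirror the
      driver's actual mailbox code.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Hypothetical VF state; pf_mac models what the PF currently holds. */
struct sketch_vf {
    bool running;
    bool mac_pending;
    unsigned char mac[SKETCH_ETH_ALEN];
    unsigned char pf_mac[SKETCH_ETH_ALEN];
};

/* Model of sending the VF's address to the PF. */
static void sketch_send_mac_to_pf(struct sketch_vf *vf)
{
    memcpy(vf->pf_mac, vf->mac, SKETCH_ETH_ALEN);
    vf->mac_pending = false;
}

/* Userspace sets a new address; defer the PF update if stopped. */
static void sketch_set_mac(struct sketch_vf *vf, const unsigned char *mac)
{
    memcpy(vf->mac, mac, SKETCH_ETH_ALEN);
    if (vf->running)
        sketch_send_mac_to_pf(vf);
    else
        vf->mac_pending = true;
}

static void sketch_stop(struct sketch_vf *vf)
{
    vf->running = false;
}

/*
 * On open, push a pending address first; otherwise adopt the address
 * the PF reports, as in the READY exchange described above.
 */
static void sketch_open(struct sketch_vf *vf)
{
    vf->running = true;
    if (vf->mac_pending)
        sketch_send_mac_to_pf(vf);
    else
        memcpy(vf->mac, vf->pf_mac, SKETCH_ETH_ALEN);
}
```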
      Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd049a90
    • Julian Anastasov's avatar
      ip: report the original address of ICMP messages · 34b99df4
      Julian Anastasov authored
      ICMP messages can trigger ICMP and local errors. In this case
      serr->port is 0 and starting from Linux 4.0 we do not return
      the original target address to the error queue readers.
      Add function to define which errors provide addr_offset.
      With this fix my ping command is not silent anymore.
      
      Fixes: c247f053 ("ip: fix error queue empty skb handling")
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      34b99df4
    • David S. Miller's avatar
      Merge branch 'mlx-next' · 12d4ae9d
      David S. Miller authored
      Or Gerlitz says:
      
      ====================
      Mellanox NIC drivers update, June 23 2015
      
      This series has two fixes from Eran to his recent SRIOV counters work in
      mlx4 and a few more updates from Saeed and Achiad to the mlx5 Ethernet
      code. All fixes here relate to net-next code, so no need for -stable.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      12d4ae9d
    • Saeed Mahameed's avatar
      net/mlx5e: Prefetch skb data on RX · 99611ba1
      Saeed Mahameed authored
      Prefetch the first cache line of the buffer pointed to by
      the skb linear data.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      99611ba1
    • Achiad Shochat's avatar
      net/mlx5e: Pop cq outside mlx5e_get_cqe · a1f5a1a8
      Achiad Shochat authored
      Separate mlx5e_get_cqe() from mlx5_cqwq_pop(); this improves code
      readability and CQ buffer management.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1f5a1a8
    • Achiad Shochat's avatar
      net/mlx5e: Remove mlx5e_cq.sqrq back-pointer · e3391054
      Achiad Shochat authored
      Use container_of() instead.
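      
      For readers unfamiliar with the idiom, here is a small userspace sketch
      of recovering an enclosing object from a pointer to an embedded member.
      The macro is a simplified stand-in for the kernel's container_of(), and
      the struct layout is hypothetical, assuming the CQ is embedded in its
      owning queue.

```c
#include <stddef.h>

/* Simplified userspace version of the container_of() idiom. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical completion queue embedded in a send queue. */
struct sketch_cq {
    int cqn;
};

struct sketch_sq {
    int sqn;
    struct sketch_cq cq;   /* embedded, so no back-pointer is needed */
};

/* Recover the owning send queue from its embedded completion queue. */
static struct sketch_sq *sketch_cq_to_sq(struct sketch_cq *cq)
{
    return sketch_container_of(cq, struct sketch_sq, cq);
}
```

      The design trade-off is that the owner is computed from the member's
      offset at compile time instead of being stored, which saves a pointer
      per CQ and removes one field that could go stale.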
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e3391054
    • Achiad Shochat's avatar
      net/mlx5e: Remove extra spaces · 8ca56ce3
      Achiad Shochat authored
      Coding Style fix, remove extra spaces.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8ca56ce3
    • Achiad Shochat's avatar
      net/mlx5e: Avoid TX CQE generation if more xmit packets expected · 059ba072
      Achiad Shochat authored
      In order to save PCI BW consumed by TX CQEs and to reduce the amount of
      CPU cache misses caused by TX CQE reading, we request TX CQE generation
      only when skb->xmit_more=0.
      
      As a consequence of the above, a single TX CQE may now indicate the
      transmission completion of multiple TX SKBs.
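      
      The batching effect can be modelled with two counters.  This is a
      sketch under simplifying assumptions (one descriptor per packet, a CQE
      that reports the last completed position); the struct and function
      names are hypothetical and not taken from the driver.

```c
#include <stdbool.h>

/* Hypothetical TX queue bookkeeping. */
struct sketch_txq {
    unsigned int posted;          /* packets posted so far */
    unsigned int completed;       /* packets already reclaimed */
    unsigned int cqes_requested;  /* completions asked of the HW */
};

/*
 * Post one packet.  A completion is requested only for the last packet
 * of a burst, i.e. when the stack signals that no more are coming.
 */
static bool sketch_xmit(struct sketch_txq *q, bool xmit_more)
{
    q->posted++;
    if (xmit_more)
        return false;
    q->cqes_requested++;
    return true;
}

/*
 * Handle one CQE that covers everything posted up to position `upto`.
 * Returns how many packets this single completion reclaims.
 */
static unsigned int sketch_handle_cqe(struct sketch_txq *q, unsigned int upto)
{
    unsigned int reclaimed = upto - q->completed;

    q->completed = upto;
    return reclaimed;
}
```

      In this model a burst of three packets, the first two sent with
      xmit_more set, requests one CQE, and handling that CQE reclaims all
      three packets.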
      
      This also handles a problem introduced in commit b1b8105ebf41 "net/mlx5e:
      Support NETIF_F_SG" where we didn't ask for NOP completions while the
      driver didn't have the proper code to handle this case.
      
      Fixes: b1b8105ebf41 ('net/mlx5e: Support NETIF_F_SG')
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      059ba072
    • Achiad Shochat's avatar
      net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion · 9fc59306
      Achiad Shochat authored
      NOP completion SKBs are always NULL.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9fc59306
    • Achiad Shochat's avatar
      net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq() · ef583d03
      Achiad Shochat authored
      It is already assigned at mlx5e_build_rq_param()
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ef583d03
    • Saeed Mahameed's avatar
      net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them · fb6c6f25
      Saeed Mahameed authored
      Instead of counting number of gso fragments, we can use
      skb_shinfo(skb)->gso_segs.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fb6c6f25
    • Saeed Mahameed's avatar
      net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues · 03289b88
      Saeed Mahameed authored
      To save per-packet calculations, we use the following static mappings:
      1) priv {channel, tc} to netdev txq (used in mlx5e_select_queue())
      2) netdev txq to priv sq (used in mlx5e_xmit())
      
      Thanks to these static mappings, there is no longer a need for a separate
      implementation of ndo_start_xmit when multiple TCs are configured.
      We believe the performance improvement of such separation would be
      negligible, if any.
      The previous way of dynamically calculating the above mappings required
      allocating more TX queues than actually used (in alloc_etherdev_mqs()),
      which is no longer needed.
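      
      The two lookup tables can be sketched as below.  The index formula
      (channel + tc * num_channels), the array bounds and all names are
      assumptions chosen for illustration; the commit message does not state
      the exact layout.

```c
#define SKETCH_MAX_CHANNELS 8
#define SKETCH_MAX_TC       4

/* Hypothetical {channel, tc} pair identifying a send queue. */
struct sketch_sq_id {
    int channel;
    int tc;
};

/* Hypothetical private state holding both static mappings. */
struct sketch_priv {
    int num_channels;
    int num_tc;
    int channeltc_to_txq[SKETCH_MAX_CHANNELS][SKETCH_MAX_TC];
    struct sketch_sq_id txq_to_sq[SKETCH_MAX_CHANNELS * SKETCH_MAX_TC];
};

/*
 * Fill both tables once at configuration time so the per-packet paths
 * only perform array lookups.
 */
static void sketch_build_maps(struct sketch_priv *p)
{
    for (int ch = 0; ch < p->num_channels; ch++) {
        for (int tc = 0; tc < p->num_tc; tc++) {
            int txq = ch + tc * p->num_channels;  /* assumed layout */

            p->channeltc_to_txq[ch][tc] = txq;
            p->txq_to_sq[txq].channel = ch;
            p->txq_to_sq[txq].tc = tc;
        }
    }
}
```

      With 4 channels and 2 TCs under this assumed layout, {channel 1, tc 1}
      maps to txq 5, and txq 5 maps back to the same pair.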
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      03289b88
    • Eran Ben Elisha's avatar
      net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device · f1a3badb
      Eran Ben Elisha authored
      Under SRIOV, the port rx/tx bytes/packets statistics should be read
      from the HW instead of using the PF netdevice SW accounting. This is
      needed in order to get the full port statistics and not just the PF's
      own ones.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f1a3badb
    • Eran Ben Elisha's avatar
      net/mlx4_en: Fix off-by-four in ethtool · 9a2abf5a
      Eran Ben Elisha authored
      NUM_ALL_STATS was not updated with the four new entries; NUM_FLOW_STATS
      was updated instead.  This caused an off-by-four for all counters below
      pf_*_*.  Fix it.
      
      Fixes: b42de4d0 ('net/mlx4_en: Show PF own statistics via ethtool')
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9a2abf5a
    • Linus Torvalds's avatar
      Merge tag 'iommu-updates-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 6eae81a5
      Linus Torvalds authored
      Pull IOMMU updates from Joerg Roedel:
       "This time with bigger changes than usual:
      
         - A new IOMMU driver for the ARM SMMUv3.
      
           This IOMMU is pretty different from SMMUv1 and v2 in that it is
           configured through in-memory structures and not through the MMIO
           register region.  The ARM SMMUv3 also supports IO demand paging for
           PCI devices with PRI/PASID capabilities, but this is not
           implemented in the driver yet.
      
         - Lots of cleanups and device-tree support for the Exynos IOMMU
           driver.  This is part of the effort to bring Exynos DRM support
           upstream.
      
         - Introduction of default domains into the IOMMU core code.
      
           The rationale behind this is to move functionality out of the IOMMU
           drivers to common code to get to a unified behavior between
           different drivers.  The patches here introduce a default domain for
           iommu-groups (isolation groups).
      
           A device will now always be attached to a domain, either the
           default domain or another domain handled by the device driver.  The
           IOMMU drivers have to be modified to make use of that feature.  So
           far the AMD IOMMU driver is converted, with others to follow.
      
         - Patches for the Intel VT-d driver to fix DMAR faults that happen
           when a kdump kernel boots.
      
           When the kdump kernel boots it re-initializes the IOMMU hardware,
           which destroys all mappings from the crashed kernel.  As this
           happens before the endpoint devices are re-initialized, any
           in-flight DMA causes a DMAR fault.  These faults cause PCI master
           aborts, which some devices can't handle properly and go into an
           undefined state, so that the device driver in the kdump kernel
           fails to initialize them and the dump fails.
      
           This is now fixed by copying over the mapping structures (only
           context tables and interrupt remapping tables) from the old kernel
           and keeping the old mappings in place until the device driver of the
           new kernel takes over.  This emulates the behavior without an
           IOMMU to the best degree possible.
      
         - A couple of other small fixes and cleanups"
      
      * tag 'iommu-updates-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (69 commits)
        iommu/amd: Handle large pages correctly in free_pagetable
        iommu/vt-d: Don't disable IR when it was previously enabled
        iommu/vt-d: Make sure copied over IR entries are not reused
        iommu/vt-d: Copy IR table from old kernel when in kdump mode
        iommu/vt-d: Set IRTA in intel_setup_irq_remapping
        iommu/vt-d: Disable IRQ remapping in intel_prepare_irq_remapping
        iommu/vt-d: Move QI initializationt to intel_setup_irq_remapping
        iommu/vt-d: Move EIM detection to intel_prepare_irq_remapping
        iommu/vt-d: Enable Translation only if it was previously disabled
        iommu/vt-d: Don't disable translation prior to OS handover
        iommu/vt-d: Don't copy translation tables if RTT bit needs to be changed
        iommu/vt-d: Don't do early domain assignment if kdump kernel
        iommu/vt-d: Allocate si_domain in init_dmars()
        iommu/vt-d: Mark copied context entries
        iommu/vt-d: Do not re-use domain-ids from the old kernel
        iommu/vt-d: Copy translation tables from old kernel
        iommu/vt-d: Detect pre enabled translation
        iommu/vt-d: Make root entry visible for hardware right after allocation
        iommu/vt-d: Init QI before root entry is allocated
        iommu/vt-d: Cleanup log messages
        ...
      6eae81a5