1. 31 Oct, 2014 32 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 89453379
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "A bit has accumulated, but it's been a week or so since my last batch
        of post-merge-window fixes, so...
      
         1) Missing module license in netfilter reject module, from Pablo.
            Lots of people ran into this.
      
         2) Off by one in mac80211 baserate calculation, from Karl Beldan.
      
         3) Fix incorrect return value from ax88179_178a driver's set_mac_addr
            op, which broke use of it with bonding.  From Ian Morgan.
      
         4) Checking of skb_gso_segment()'s return value was not all
            encompassing, it can return an SKB pointer, a pointer error, or
            NULL.  Fix from Florian Westphal.
      
            This is crummy, and longer term will be fixed to just return error
            pointers or a real SKB.
      
         6) Encapsulation offloads not being handled by
            skb_gso_transport_seglen().  From Florian Westphal.
      
         7) Fix deadlock in TIPC stack, from Ying Xue.
      
         8) Fix performance regression from using rhashtable for netlink
            sockets.  The problem was the synchronize_net() invoked for every
            socket destroy.  From Thomas Graf.
      
         9) Fix bug in eBPF verifier, and remove the strong dependency of BPF
            on NET.  From Alexei Starovoitov.
      
        10) In qdisc_create(), use the correct interface to allocate
            ->cpu_bstats, otherwise the u64_stats_sync member isn't
            initialized properly.  From Sabrina Dubroca.
      
        11) Off by one in ip_set_nfnl_get_byindex(), from Dan Carpenter.
      
        12) nf_tables_newchain() was erroneously expecting error pointers from
            netdev_alloc_pcpu_stats().  It only returns a valid pointer or
            NULL.  From Sabrina Dubroca.
      
        13) Fix use-after-free in _decode_session6(), from Li RongQing.
      
        14) When we set the TX flow hash on a socket, we mistakenly do so
            before we've nailed down the final source port.  Move the setting
            deeper to fix this.  From Sathya Perla.
      
        15) NAPI budget accounting in amd-xgbe driver was counting descriptors
            instead of full packets, fix from Thomas Lendacky.
      
        16) Fix total_data_buflen calculation in hyperv driver, from Haiyang
            Zhang.
      
        17) Fix bcma driver build with OF_ADDRESS disabled, from Hauke
            Mehrtens.
      
        18) Fix mis-use of per-cpu memory in TCP md5 code.  The problem is
            that something that ends up being vmalloc memory can't be passed
            to the crypto hash routines via scatter-gather lists.  From Eric
            Dumazet.
      
        19) Fix regression in promiscuous mode enabling in cdc-ether, from
            Olivier Blin.
      
        20) Bucket eviction and frag entry killing can race with each other,
            causing an unlink of the object from the wrong list.  Fix from
            Nikolay Aleksandrov.
      
        21) Missing initialization of spinlock in cxgb4 driver, from Anish
            Bhatt.
      
        22) Do not cache ipv4 routing failures, otherwise if the sysctl for
            forwarding is subsequently enabled this won't be seen.  From
            Nicolas Cavallari"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (131 commits)
        drivers: net: cpsw: Support ALLMULTI and fix IFF_PROMISC in switch mode
        drivers: net: cpsw: Fix broken loop condition in switch mode
        net: ethtool: Return -EOPNOTSUPP if user space tries to read EEPROM with lengh 0
        stmmac: pci: set default of the filter bins
        net: smc91x: Fix gpios for device tree based booting
        mpls: Allow mpls_gso to be built as module
        mpls: Fix mpls_gso handler.
        r8152: stop submitting intr for -EPROTO
        netfilter: nft_reject_bridge: restrict reject to prerouting and input
        netfilter: nft_reject_bridge: don't use IP stack to reject traffic
        netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functions
        netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions
        netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routing
        drivers/net: macvtap and tun depend on INET
        drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets
        drivers/net: Disable UFO through virtio
        net: skb_fclone_busy() needs to detect orphaned skb
        gre: Use inner mac length when computing tunnel length
        mlx4: Avoid leaking steering rules on flow creation error flow
        net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN
        ...
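
Item 4 of the pull message notes that skb_gso_segment() can hand back an SKB pointer, an error pointer, or NULL, so a caller has three cases to tell apart. The following is a minimal userspace sketch of that three-way check, not kernel code: `ERR_PTR`, `PTR_ERR` and `IS_ERR` are stand-ins modeled on the kernel helpers of the same names, while `struct sk_buff` here is a placeholder and `classify_segments()` is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins modeled on the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Placeholder type; the real struct sk_buff is far larger. */
struct sk_buff {
	int len;
};

enum seg_result { SEG_LIST, SEG_NONE, SEG_ERROR };

/* Hypothetical helper: distinguish all three possible return values. */
enum seg_result classify_segments(const struct sk_buff *segs, long *err)
{
	*err = 0;
	if (IS_ERR(segs)) {
		*err = PTR_ERR(segs);
		return SEG_ERROR;
	}
	if (segs == NULL)
		return SEG_NONE;
	return SEG_LIST;
}

/* Usage: one call per outcome. */
void classify_segments_example(void)
{
	struct sk_buff skb = { .len = 1500 };
	long err;

	assert(classify_segments(&skb, &err) == SEG_LIST);
	assert(classify_segments(NULL, &err) == SEG_NONE);
	assert(classify_segments(ERR_PTR(-EINVAL), &err) == SEG_ERROR);
	assert(err == -EINVAL);
}
```

A check that only tests for NULL, or only for an error pointer, misses one of the three cases, which is the class of bug the pull message describes.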
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 53429290
      Linus Torvalds authored
      Pull sparc update from David Miller:
       "Two changes:
      
        1) It makes no sense to execute a VTOC partition table request in the
           Sun virtual block device driver and fail to load if it doesn't
           succeed because a) we don't use the result at all and b) it won't
           succeed if there is an EFI partition on the disk, for example.
      
           We read the partition table via the normal means in the block layer
           anyways, so this is really completely useless, so just remove it.
      
           From Dwight Engen.
      
        2) Hook up new bpf system call"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sunvdc: don't call VD_OP_GET_VTOC
        sparc: Hook up bpf system call.
    • Merge tag 'microblaze-3.18-rc3' of git://git.monstr.eu/linux-2.6-microblaze · 9f58c62f
      Linus Torvalds authored
      Pull Microblaze updates from Michal Simek:
       - wire-up new bpf syscall
       - fix PCI bug
       - fix Kconfig warning
      
      * tag 'microblaze-3.18-rc3' of git://git.monstr.eu/linux-2.6-microblaze:
        microblaze: Wire up bpf syscall
        microblaze: Fix IO space breakage after of_pci_range_to_resource() change
        microblaze: Fix missing NR_CPUS in menuconfig
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 19e0d5f1
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Fixes from all around the place:
      
         - hyper-V 32-bit PAE guest kernel fix
         - two IRQ allocation fixes on certain x86 boards
         - intel-mid boot crash fix
         - intel-quark quirk
         - /proc/interrupts duplicate irq chip name fix
         - cma boot crash fix
         - syscall audit fix
         - boot crash fix with certain TSC configurations (seen on Qemu)
         - smpboot.c build warning fix"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
        ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi()
        x86, intel-mid: Create IRQs for APB timers and RTC timers
        x86: Don't enable F00F workaround on Intel Quark processors
        x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts
        x86, cma: Reserve DMA contiguous area after initmem_init()
        i386/audit: stop scribbling on the stack frame
        x86, apic: Handle a bad TSC more gracefully
        x86: ACPI: Do not translate GSI number if IOAPIC is disabled
        x86/smpboot: Move data structure to its primary usage scope
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f5fa3630
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "Various scheduler fixes all over the place: three SCHED_DL fixes,
        three sched/numa fixes, two generic race fixes and a comment fix"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/dl: Fix preemption checks
        sched: Update comments for CLONE_NEWNS
        sched: stop the unbound recursion in preempt_schedule_context()
        sched/fair: Fix division by zero sysctl_numa_balancing_scan_size
        sched/fair: Care divide error in update_task_scan_period()
        sched/numa: Fix unsafe get_task_struct() in task_numa_assign()
        sched/deadline: Fix races between rt_mutex_setprio() and dl_task_timer()
        sched/deadline: Don't replenish from a !SCHED_DEADLINE entity
        sched: Fix race between task_group and sched_task_group
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5656b408
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Mostly tooling fixes, plus on the kernel side:
      
         - a revert for a newly introduced PMU driver which isn't complete yet
           and where we ran out of time with fixes (to be tried again in
           v3.19) - this makes up for a large chunk of the diffstat.
      
         - compilation warning fixes
      
         - a printk message fix
      
         - event_idx usage fixes/cleanups"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf probe: Trivial typo fix for --demangle
        perf tools: Fix report -F dso_from for data without branch info
        perf tools: Fix report -F dso_to for data without branch info
        perf tools: Fix report -F symbol_from for data without branch info
        perf tools: Fix report -F symbol_to for data without branch info
        perf tools: Fix report -F mispredict for data without branch info
        perf tools: Fix report -F in_tx for data without branch info
        perf tools: Fix report -F abort for data without branch info
        perf tools: Make CPUINFO_PROC an array to support different kernel versions
        perf callchain: Use global caching provided by libunwind
        perf/x86/intel: Revert incomplete and undocumented Broadwell client support
        perf/x86: Fix compile warnings for intel_uncore
        perf: Fix typos in sample code in the perf_event.h header
        perf: Fix and clean up initialization of pmu::event_idx
        perf: Fix bogus kernel printk
        perf diff: Add missing hists__init() call at tool start
    • Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c958f920
      Linus Torvalds authored
      Pull futex fixes from Ingo Molnar:
       "This contains two futex fixes: one fixes a race condition, the other
        clarifies shared/private futex comments"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        futex: Fix a race condition between REQUEUE_PI and task death
        futex: Mention key referencing differences between shared and private futexes
    • Merge tag 'master-2014-10-30' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless · 99a49ce6
      David S. Miller authored
      John W. Linville says:
      
      ====================
      pull request: wireless 2014-10-31
      
      Please pull this small batch of spooky fixes intended for the 3.18
      stream...boo!
      
      Cyril Brulebois adds an rt2x00 device ID.
      
      Dan Carpenter provides a one-line masking fix for an ath9k debugfs
      entry.
      
      Larry Finger gives us a package of small rtlwifi fixes which add some
      bits that were left out of some feature updates that were included
      in the merge window.  Hopefully this isn't a sign that the rtlwifi
      base is getting too big...
      
      Marc Yang brings a fix for a temporary mwifiex stall when doing 11n
      RX reordering.
      
      Please let me know if there are problems!
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers: net: cpsw: Support ALLMULTI and fix IFF_PROMISC in switch mode · 1e5c4bc4
      Lennart Sorensen authored
      The cpsw driver did not support the IFF_ALLMULTI flag which makes dynamic
      multicast routing not work.  Related to this, when enabling IFF_PROMISC
      in switch mode, all registered multicast addresses are flushed, resulting
      in only broadcast and unicast traffic being received.
      
      A new cpsw_ale_set_allmulti function now scans through the ALE entry
      table and adds/removes the host port from the unregistered multicast
      port mask of each vlan entry depending on the state of IFF_ALLMULTI.
      In promiscuous mode, cpsw_ale_set_allmulti is used to force reception
      of all multicast traffic in addition to the unicast and broadcast traffic.
      
      With this change dynamic multicast and promiscuous mode both work in
      switch mode.
      Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers: net: cpsw: Fix broken loop condition in switch mode · 6f979eb3
      Lennart Sorensen authored
      Commit 0d961b3b (drivers: net: cpsw: fix buggy loop condition)
      accidentally changed a loop comparison in too many places while
      fixing a real bug.
      
      It was correct to fix the dual_emac mode section since there 'i' is used
      as an index into priv->slaves which is a 0 based array.
      
      However the other two changes (which are only used in switch mode)
      are wrong since there 'i' is actually the ALE port number, and port 0
      is the host port, while port 1 and up are the slave ports.
      
      Putting the loop condition back in the switch mode section fixes it.
      
      A comment has been added to point out the intent clearly to avoid future
      confusion.  Also a comment is fixed that said the opposite of what was
      actually happening.
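
The difference between the two loop bounds described above can be sketched as follows. This is an illustrative userspace model only, assuming the numbering the commit message states (slave array slots are 0-based, ALE port 0 is the host port and ports 1 and up are slave ports). The function names and the `visited` output array are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stddef.h>

#define HOST_PORT 0 /* per the commit message: ALE port 0 is the host port */

/* Dual-EMAC style: 'i' indexes a 0-based slave array, so stop before num_slaves. */
size_t visit_slave_slots(size_t num_slaves, int *visited, size_t cap)
{
	size_t n = 0;

	for (size_t i = 0; i < num_slaves && n < cap; i++)
		visited[n++] = (int)i;
	return n;
}

/* Switch-mode style: 'i' is an ALE port number, so the last slave port
 * is num_slaves itself and the bound has to be inclusive. */
size_t visit_ale_ports(size_t num_slaves, int *visited, size_t cap)
{
	size_t n = 0;

	for (size_t i = HOST_PORT; i <= num_slaves && n < cap; i++)
		visited[n++] = (int)i;
	return n;
}

/* Usage: with two slaves, the array loop sees 2 slots, the port loop 3 ports. */
void loop_bounds_example(void)
{
	int seen[8];

	assert(visit_slave_slots(2, seen, 8) == 2);
	assert(visit_ale_ports(2, seen, 8) == 3);
	assert(seen[2] == 2); /* the last slave port is not skipped */
}
```

Using the exclusive bound in the port loop would silently skip the highest-numbered slave port, which matches the breakage the commit describes for switch mode.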
      Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
      Acked-by: Heiko Schocher <hs@denx.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethtool: Return -EOPNOTSUPP if user space tries to read EEPROM with lengh 0 · e0fb6fb6
      Guenter Roeck authored
      If a driver supports reading EEPROM but no EEPROM is installed in the system,
      the driver's get_eeprom_len function returns 0. ethtool will subsequently
      try to read that zero-length EEPROM anyway. If the driver does not support
      EEPROM access at all, this operation will return -EOPNOTSUPP. If the driver
      does support EEPROM access but no EEPROM is installed, the operation will
      return -EINVAL. Return -EOPNOTSUPP in both cases for consistency.
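
The unified return value can be illustrated with a small userspace sketch. It assumes a simplified capability record instead of the real ethtool ops table; `struct nic_caps` and `eeprom_read_precheck()` are hypothetical names used only to show the decision, not the kernel's actual code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified view of what a driver offers. */
struct nic_caps {
	int has_eeprom_ops; /* driver implements EEPROM access at all */
	int eeprom_len;     /* length reported by the driver, 0 if none fitted */
};

/* Returns 0 if a read may proceed, or -EOPNOTSUPP in both "unsupported"
 * cases: no EEPROM ops at all, or ops present but a zero-length EEPROM. */
int eeprom_read_precheck(const struct nic_caps *caps)
{
	if (!caps->has_eeprom_ops)
		return -EOPNOTSUPP;
	if (caps->eeprom_len == 0)
		return -EOPNOTSUPP;
	return 0;
}

/* Usage: the two failure cases are indistinguishable to user space. */
void eeprom_precheck_example(void)
{
	struct nic_caps no_ops = { 0, 0 };
	struct nic_caps empty = { 1, 0 };
	struct nic_caps fitted = { 1, 256 };

	assert(eeprom_read_precheck(&no_ops) == -EOPNOTSUPP);
	assert(eeprom_read_precheck(&empty) == -EOPNOTSUPP);
	assert(eeprom_read_precheck(&fitted) == 0);
}
```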
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: pci: set default of the filter bins · 1e19e084
      Andy Shevchenko authored
      The commit 3b57de95 brought the support for a different amount of the
      filter bins, but didn't update the PCI driver accordingly. This patch appends
      the default values when the device is enumerated via PCI bus.
      
      Fixes: 3b57de95 (net: stmmac: Support devicetree configs for mcast and ucast filter entries)
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: smc91x: Fix gpios for device tree based booting · 7d2911c4
      Tony Lindgren authored
      With legacy booting, the platform init code was taking care of
      the configuring of GPIOs. With device tree based booting, things
      may or may not work depending what bootloader has configured or
      if the legacy platform code gets called.
      
      Let's add support for the pwrdn and reset GPIOs to the smc91x
      driver to fix the issues of smc91x not working properly when
      booted in device tree mode.
      
      And let's change n900 to use these settings as some versions
      of the bootloader do not configure things properly causing
      errors.
      Reported-by: Kevin Hilman <khilman@linaro.org>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sunvdc: don't call VD_OP_GET_VTOC · 85b0c6e6
      Dwight Engen authored
      The VD_OP_GET_VTOC operation will succeed only if the vdisk backend has a
      VTOC label, otherwise it will fail. In particular, it will return error
      48 (ENOTSUP) if the disk has an EFI label. VTOC disk labels are already
      handled by directly reading the disk in block/partitions/sun.c (enabled by
      CONFIG_SUN_PARTITION which defaults to y on SPARC). Since port->label is
      unused in the driver, remove the call and the field.
      Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mpls: Allow mpls_gso to be built as module · de05c400
      Pravin B Shelar authored
      Kconfig already allows mpls to be built as module. Following patch
      fixes Makefile to do same.
      
      CC: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Acked-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mpls: Fix mpls_gso handler. · f7065f4b
      Pravin B Shelar authored
      mpls gso handler needs to pull skb after segmenting skb.
      
      CC: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Acked-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · aea4869f
      Linus Torvalds authored
      Pull core fixes from Ingo Molnar:
       "The tree contains two RCU fixes and a compiler quirk comment fix"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        rcu: Make rcu_barrier() understand about missing rcuo kthreads
        compiler/gcc4+: Remove inaccurate comment about 'asm goto' miscompiles
        rcu: More on deadlock between CPU hotplug and expedited grace periods
    • Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0f4b0676
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "As you requested in the rc2 release mail the timer department serves
        you a few real bug fixes:
      
         - Fix the probe logic of the architected arm/arm64 timer
         - Plug a stack info leak in posix-timers
         - Prevent a shift out of bounds issue in the clockevents core"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        ARM/ARM64: arch-timer: fix arch_timer_probed logic
        clockevents: Prevent shift out of bounds
        posix-timers: Fix stack info leak in timer_create()
    • Merge tag 'trace-fixes-v3.18-rc1-2' of... · bcdfdaee
      Linus Torvalds authored
      Merge tag 'trace-fixes-v3.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull tracing fix from Steven Rostedt:
       "ARM has system calls outside the NR_syscalls range, and the generic
        tracing system does not support that and without checks, it can cause
        an oops to be reported.
      
        Rabin Vincent added checks in the return code on syscall events to
        make sure that the system call number is within the range that tracing
        knows about, and if not, simply ignores the system call.
      
        The system call tracing infrastructure needs to be rewritten to handle
        these cases better, but for now, to keep from oopsing, this patch will
        do"
      
      * tag 'trace-fixes-v3.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing/syscalls: Ignore numbers outside NR_syscalls' range
    • Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6 · 4f080f05
      Linus Torvalds authored
      Pull documentation fixes from Jonathan Corbet:
       "So this is my first pull request since I rashly agreed to look after
        the documentation subtree.  It contains some typo fixes, a few minor
        documentation improvements, and, most importantly, fixes for a couple
        of build problems in various bits of sample code.
      
        I fully intend to start sending pull requests with signed tags.
        However, due to poor planning on my part and the general obnoxiousness
        of life, I'm 2000 miles away from my private key which is sitting on a
        powered-down machine.  This should be fixed before my next request.
      
        Meanwhile git.lwn.net is a machine under my control, the patches are
        all trivial, and all have done time in linux-next"
      
      * tag 'docs-for-linus' of git://git.lwn.net/linux-2.6:
        Documentation/SubmittingPatches: Reported-by tags and permission
        Documentation: remove outdated references to the linux-next wiki
        Documentation: Restrict TSC test code to x86
        doc: kernel-parameters.txt: Add ide-generic.probe-mask
        vdso: don't require 64-bit math in standalone test
        Documentation: Add CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF case
        Documentation: Add default kmemleak off case in kernel-parameters.txt
        Docs: Document that the sticky bit is understood by hugetlbfs
        DocBook: Reduce noise from make cleandocs
        Documentation: fix vdso_standalone_test_x86 on 32-bit
        Documentation: dt-bindings: Explain order in patch series
        Documentation/ABI/testing/sysfs-ibft: fix a typo
    • r8152: stop submitting intr for -EPROTO · d59c876d
      hayeswang authored
      With a Renesas USB 3.0 host controller, unplugging a USB hub that
      has an RTL8153 attached makes the driver receive -EPROTO for its
      interrupt transfer. If the driver keeps submitting the interrupt
      transfer before disconnect() is called, there is a high probability
      of ending up with "HC died; cleaning up".
      
      [ 1024.197678] r8152 9-1.4:1.0 eth0: intr status -71
      [ 1024.213673] r8152 9-1.4:1.0 eth0: intr status -71
      [ 1024.229668] r8152 9-1.4:1.0 eth0: intr status -71
      [ 1024.245661] r8152 9-1.4:1.0 eth0: intr status -71
      [ 1024.261653] r8152 9-1.4:1.0 eth0: intr status -71
      [ 1024.277648] r8152 9-1.4:1.0 eth0: intr status -71
      [ 1024.293642] r8152 9-1.4:1.0 eth0: intr status -71
      [ 1024.309638] r8152 9-1.4:1.0 eth0: intr status -71
      [ 1024.325633] r8152 9-1.4:1.0 eth0: intr status -71
      [ 1024.341627] r8152 9-1.4:1.0 eth0: intr status -71
      [ 1024.357621] r8152 9-1.4:1.0 eth0: intr status -71
      [ 1024.373615] r8152 9-1.4:1.0 eth0: intr status -71
      [ 1024.383097] usb 9-1: USB disconnect, device number 2
      [ 1024.383103] usb 9-1.4: USB disconnect, device number 6
      [ 1029.391010] xhci_hcd 0000:04:00.0: xHCI host not responding to stop endpoint command.
      [ 1029.391016] xhci_hcd 0000:04:00.0: Assuming host is dying, halting host.
      [ 1029.392551] xhci_hcd 0000:04:00.0: HC died; cleaning up
      [ 1029.421480] usb 8-1: USB disconnect, device number 2
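
The behavior change named in the commit title can be sketched as a small decision function. This is a hedged userspace model, not the driver's completion handler: `enum intr_action` and `intr_status_action()` are hypothetical, and the real driver distinguishes more status codes than the three cases shown here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcomes for an interrupt-transfer completion. */
enum intr_action { INTR_PROCESS, INTR_STOP, INTR_RESUBMIT };

/* Decide what to do with a completed interrupt transfer.
 * The point illustrated: -EPROTO ends the resubmission loop instead of
 * re-queuing the transfer on a device that is going away. */
enum intr_action intr_status_action(int status)
{
	switch (status) {
	case 0:
		return INTR_PROCESS;  /* normal completion: handle the data */
	case -EPROTO:
		return INTR_STOP;     /* do not submit the transfer again */
	default:
		return INTR_RESUBMIT; /* simplified: retry on anything else */
	}
}

/* Usage: a protocol error stops the loop, an unrelated error does not. */
void intr_status_example(void)
{
	assert(intr_status_action(0) == INTR_PROCESS);
	assert(intr_status_action(-EPROTO) == INTR_STOP);
	assert(intr_status_action(-EOVERFLOW) == INTR_RESUBMIT);
}
```

On Linux, -EPROTO is the -71 that repeats in the log above, once per resubmitted transfer.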
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · e3a88f9c
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      netfilter/ipvs fixes for net
      
      The following patchset contains fixes for netfilter/ipvs. This round of
      fixes is larger than usual at this stage, specifically because of the
      nf_tables bridge reject fixes that I would like to see in 3.18. The
      patches are:
      
      1) Fix a null-pointer dereference that may occur when logging
         errors. This problem was introduced by 4a4739d5 ("ipvs: Pull
         out crosses_local_route_boundary logic") in v3.17-rc5.
      
      2) Update hook mask in nft_reject_bridge so we can also filter out
         packets from there. This fixes 36d2af59 ("netfilter: nf_tables: allow
         to filter from prerouting and postrouting"), which needs this chunk
         to work.
      
      3) Two patches to refactor common code to forge the IPv4 and IPv6
         reject packets from the bridge. These are required by the nf_tables
         reject bridge fix.
      
      4) Fix nft_reject_bridge by avoiding the use of the IP stack to reject
         packets from the bridge. The idea is to forge the reject packets and
         inject them to the original port via br_deliver() which is now
         exported for that purpose.
      
      5) Restrict nft_reject_bridge to bridge prerouting and input hooks.
         The original skbuff may be cloned after prerouting when the bridge
         stack needs to flood it to several bridge ports, at which point it
         is too late to reject the traffic.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nft_reject_bridge: restrict reject to prerouting and input · 127917c2
      Pablo Neira Ayuso authored
      Restrict the reject expression to the prerouting and input bridge
      hooks. If we allow this to be used from forward or any other later
      bridge hook, if the frame is flooded to several ports, we'll end up
      sending several reject packets, one per cloned packet.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_reject_bridge: don't use IP stack to reject traffic · 523b929d
      Pablo Neira Ayuso authored
      If the packet is received via the bridge stack, this cannot reject
      packets from the IP stack.
      
      This adds functions to build the reject packet and send it from the
      bridge stack. Comments and assumptions on this patch:
      
      1) Validate the IPv4 and IPv6 headers before further processing.
         Given that the packet comes from the bridge stack, we cannot assume
         they are clean. Truncated packets are dropped, following a similar
         approach to the existing iptables match/target extensions, which
         drop packets when the layer 4 headers they need to inspect are not
         available. This also includes packets that are directed to
         multicast and broadcast ethernet addresses.
      
      2) br_deliver() is exported to inject the reject packet via
         bridge localout -> postrouting. So the approach is similar to what
         we already do in the iptables reject target. The reject packet is
         sent to the bridge port from which we have received the original
         packet.
      
      3) The reject packet is forged based on the original packet. The TTL
         is set based on sysctl_ip_default_ttl for IPv4 and per-net
         ipv6.devconf_all hoplimit for IPv6.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functions · 8bfcdf66
      Pablo Neira Ayuso authored
      That can be reused by the reject bridge expression to build the reject
      packet. The new functions are:
      
      * nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header.
      * nf_reject_ip6hdr_put(): to build the IPv6 header.
      * nf_reject_ip6_tcphdr_put(): to build the TCP header.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions · 052b9498
      Pablo Neira Ayuso authored
      That can be reused by the reject bridge expression to build the reject
      packet. The new functions are:
      
      * nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header.
      * nf_reject_iphdr_put(): to build the IPv4 header.
      * nf_reject_ip_tcphdr_put(): to build the TCP header.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routing · 4d87716c
      Pablo Neira Ayuso authored
      Fixes: 36d2af59 ("netfilter: nf_tables: allow to filter from prerouting and postrouting")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • drivers/net: macvtap and tun depend on INET · de11b0e8
      Ben Hutchings authored
      These drivers now call ipv6_proxy_select_ident(), which is defined
      only if CONFIG_INET is enabled.  However, they have really depended
      on CONFIG_INET for as long as they have allowed sending GSO packets
      from userland.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Fixes: f43798c2 ("tun: Allow GSO using virtio_net_hdr")
      Fixes: b9fb9ee0 ("macvtap: add GSO/csum offload support")
      Fixes: 5188cd44 ("drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets")
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tracing/syscalls: Ignore numbers outside NR_syscalls' range · 086ba77a
      Rabin Vincent authored
      ARM has some private syscalls (for example, set_tls(2)) which lie
      outside the range of NR_syscalls.  If any of these are called while
      syscall tracing is being performed, out-of-bounds array access will
      occur in the ftrace and perf sys_{enter,exit} handlers.
      
       # trace-cmd record -e raw_syscalls:* true && trace-cmd report
       ...
       true-653   [000]   384.675777: sys_enter:            NR 192 (0, 1000, 3, 4000022, ffffffff, 0)
       true-653   [000]   384.675812: sys_exit:             NR 192 = 1995915264
       true-653   [000]   384.675971: sys_enter:            NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1)
       true-653   [000]   384.675988: sys_exit:             NR 983045 = 0
       ...
      
       # trace-cmd record -e syscalls:* true
       [   17.289329] Unable to handle kernel paging request at virtual address aaaaaace
       [   17.289590] pgd = 9e71c000
       [   17.289696] [aaaaaace] *pgd=00000000
       [   17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
       [   17.290169] Modules linked in:
       [   17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
       [   17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
       [   17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
       [   17.290866] LR is at syscall_trace_enter+0x124/0x184
      
      Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.
      
      Commit cd0980fc "tracing: Check invalid syscall nr while tracing syscalls"
      added the check for less than zero, but it should have also checked
      for greater than NR_syscalls.
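
      The range check described above can be sketched in user-space C as
      follows.  This is only an illustration: DEMO_NR_SYSCALLS, demo_enabled
      and the demo_* helpers are hypothetical stand-ins, not the kernel's
      own names, and the actual patch may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical table size standing in for NR_syscalls. */
#define DEMO_NR_SYSCALLS 400

/* Hypothetical per-syscall table, indexed by syscall number. */
static bool demo_enabled[DEMO_NR_SYSCALLS];

/* True only when nr can safely be used as an index into the table. */
static bool demo_syscall_nr_valid(long nr)
{
	/* Checking "nr < 0" alone is not enough; the upper bound matters too. */
	return nr >= 0 && nr < DEMO_NR_SYSCALLS;
}

/* Ignore out-of-range numbers instead of indexing past the table. */
static bool demo_should_trace(long nr)
{
	if (!demo_syscall_nr_valid(nr))
		return false;
	return demo_enabled[nr];
}
```

      With demo_enabled[192] set, demo_should_trace(192) is true, while a
      private syscall number such as 983045 from the trace above is simply
      ignored.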
      
      Link: http://lkml.kernel.org/p/1414620418-29472-1-git-send-email-rabin@rab.in
      
      Fixes: cd0980fc "tracing: Check invalid syscall nr while tracing syscalls"
      Cc: stable@vger.kernel.org # 2.6.33+
      Signed-off-by: Rabin Vincent <rabin@rab.in>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      086ba77a
    • David S. Miller's avatar
      Merge branch 'ufo-fix' · c1304b21
      David S. Miller authored
      Ben Hutchings says:
      
      ====================
      drivers/net,ipv6: Fix IPv6 fragment ID selection for virtio
      
      The virtio net protocol supports UFO but does not provide for passing a
      fragment ID for fragmentation of IPv6 packets.  We used to generate a
      fragment ID wherever such a packet was fragmented, but currently we
      always use ID=0!
      
      v2: Add blank lines after declarations
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c1304b21
    • Ben Hutchings's avatar
      drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets · 5188cd44
      Ben Hutchings authored
      UFO is now disabled on all drivers that work with virtio net headers,
      but userland may try to send UFO/IPv6 packets anyway.  Instead of
      sending with ID=0, we should select identifiers on their behalf (as we
      used to).
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Fixes: 916e4cf4 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5188cd44
    • Ben Hutchings's avatar
      drivers/net: Disable UFO through virtio · 3d0ad094
      Ben Hutchings authored
      IPv6 does not allow fragmentation by routers, so there is no
      fragmentation ID in the fixed header.  UFO for IPv6 requires the ID to
      be passed separately, but there is no provision for this in the virtio
      net protocol.
      
      Until recently our software implementation of UFO/IPv6 generated a new
      ID, but this was a bug.  Now we will use ID=0 for any UFO/IPv6 packet
      passed through a tap, which is even worse.
      
      Unfortunately there is no distinction between UFO/IPv4 and v6
      features, so disable UFO on taps and virtio_net completely until we
      have a proper solution.
      
      We cannot depend on VM managers respecting the tap feature flags, so
      keep accepting UFO packets but log a warning the first time we do
      this.
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Fixes: 916e4cf4 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3d0ad094
  2. 30 Oct, 2014 8 commits
    • Eric Dumazet's avatar
      net: skb_fclone_busy() needs to detect orphaned skb · 39bb5e62
      Eric Dumazet authored
      Some drivers are unable to perform TX completions in a bounded time.
      They instead call skb_orphan().
      
      The problem is that skb_fclone_busy() has to detect this case;
      otherwise we block TCP retransmits and can freeze unlucky TCP
      sessions on mostly idle hosts.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Fixes: 1f3279ae ("tcp: avoid retransmits of TCP packets hanging in host queues")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      39bb5e62
    • Tom Herbert's avatar
      gre: Use inner mac length when computing tunnel length · 14051f04
      Tom Herbert authored
      Currently, skb_inner_network_header is used, but this does not account
      for the Ethernet header when the payload is ETH_P_TEB. Use
      skb_inner_mac_header instead, which handles TEB and should also work
      with IP encapsulation, in which case the inner mac and inner network
      headers are the same.
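
      A small user-space sketch of why the choice of header matters.  The
      struct, field names and offsets below are hypothetical stand-ins for
      the skb header offsets, not kernel code; the tunnel header length is
      taken as the distance from the transport (GRE) header to the inner
      mac header.

```c
#include <assert.h>
#include <stddef.h>

/* Length of an Ethernet header, as carried inside a TEB payload. */
#define DEMO_ETH_HLEN 14

/* Hypothetical byte offsets into a packet buffer. */
struct demo_offsets {
	size_t transport;     /* start of the GRE header */
	size_t inner_mac;     /* start of the encapsulated frame */
	size_t inner_network; /* start of the inner IP header */
};

/* Tunnel header length measured up to the inner mac header. */
static size_t demo_tnl_hlen(const struct demo_offsets *o)
{
	assert(o->inner_mac >= o->transport);
	return o->inner_mac - o->transport;
}

/* The older computation, measured up to the inner network header. */
static size_t demo_tnl_hlen_old(const struct demo_offsets *o)
{
	assert(o->inner_network >= o->transport);
	return o->inner_network - o->transport;
}
```

      For IP-in-GRE both functions agree, because inner_mac and
      inner_network coincide.  For TEB the older computation is too large
      by DEMO_ETH_HLEN bytes.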
      
      Tested: Ran TCP_STREAM over GRE, worked as expected.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      14051f04
    • David S. Miller's avatar
      Merge branch 'mellanox-net' · 292dd654
      David S. Miller authored
      Or Gerlitz says:
      
      ====================
      mlx4 driver encapsulation/steering fixes
      
      The 1st patch fixes a bug in the TX path that supports offloading the
      TX checksum of (VXLAN) encapsulated TCP packets. It turns out that the
      bug is revealed only when the receiver runs in non-offloaded mode, so
      we somehow missed it so far... please queue it for -stable >= 3.14
      
      The 2nd patch makes sure not to leak steering entry on error flow,
      please queue it to 3.17-stable
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      292dd654
    • Or Gerlitz's avatar
      mlx4: Avoid leaking steering rules on flow creation error flow · 571e1b2c
      Or Gerlitz authored
      If mlx4_ib_create_flow() attempts to create more than one rule with
      the firmware and one of these registrations fails, we leak the
      already created flow rules.
      
      One example of the leak: when the registration of the VXLAN ghost
      steering rule fails, we don't unregister the original rule requested
      by the user, introduced in commit d2fce8a9 "mlx4: Set
      user-space raw Ethernet QPs to properly handle VXLAN traffic".
      
      While here, add dump of the VXLAN portion of steering rules
      so it can actually be seen when flow creation fails.
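
      The general unwind-on-error pattern can be sketched as below.  This is
      a user-space illustration under stated assumptions: demo_registered
      and the demo_* functions are hypothetical and do not correspond to the
      mlx4 driver's actual API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_RULES 4

/* Hypothetical "firmware" state: which rule slots are registered. */
static int demo_registered[DEMO_MAX_RULES];

/* Hypothetical registration call; fails for the slot equal to fail_at. */
static int demo_register_rule(size_t i, size_t fail_at)
{
	if (i == fail_at)
		return -1;
	demo_registered[i] = 1;
	return 0;
}

static void demo_unregister_rule(size_t i)
{
	demo_registered[i] = 0;
}

/* Register n rules; on failure, unregister the ones already created. */
static int demo_create_flow(size_t n, size_t fail_at)
{
	size_t i;

	assert(n <= DEMO_MAX_RULES);
	for (i = 0; i < n; i++) {
		if (demo_register_rule(i, fail_at) != 0)
			goto err_unwind;
	}
	return 0;

err_unwind:
	/* Walk back over rules 0..i-1 so nothing is left behind. */
	while (i > 0)
		demo_unregister_rule(--i);
	return -1;
}

/* Count the slots that are still registered. */
static size_t demo_count_registered(void)
{
	size_t i, n = 0;

	for (i = 0; i < DEMO_MAX_RULES; i++)
		n += (size_t)demo_registered[i];
	return n;
}
```

      If the second of two registrations fails, demo_create_flow() returns
      an error and demo_count_registered() reports zero, so the first rule
      is not leaked.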
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      571e1b2c
    • Or Gerlitz's avatar
      net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN · a4f2dacb
      Or Gerlitz authored
      For VXLAN/NVGRE encapsulation, the current HW doesn't support offloading
      both the outer UDP TX checksum and the inner TCP/UDP TX checksum.
      
      The driver doesn't advertise SKB_GSO_UDP_TUNNEL_CSUM, yet we are wrongly
      telling the HW to offload the outer UDP checksum for encapsulated packets.
      Fix that.
      
      Fixes: 837052d0 ("net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling")
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4f2dacb
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net · 9cc233fb
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2014-10-30
      
      This series contains updates to e1000, igb and ixgbe.
      
      Francesco Ruggeri fixes an issue with e1000 where the driver did not
      support unicast filtering when running in a VM.
      
      Roman Gushchin fixes an issue with igb where the driver was re-using
      mapped pages, so that packets were still getting dropped even after
      all the memory issues were gone and free memory was available.
      
      Junwei Zhang found that ixgbe_clean_rx_ring() was repeating the
      assignment of NULL to the receive buffer skb, and fixes it.
      
      Emil fixes a race condition between setup_link and the SFP detection
      routine in the watchdog when setting the advertised speed.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9cc233fb
    • Nicolas Cavallari's avatar
      ipv4: Do not cache routing failures due to disabled forwarding. · fa19c2b0
      Nicolas Cavallari authored
      If we cache them, the kernel will reuse them regardless of whether
      forwarding is enabled.  This means that if forwarding is disabled on
      the input interface where the first routing request comes from, then
      that unreachable result will be cached and reused for other
      interfaces, even if forwarding is enabled on them.  The opposite
      is also true.
      
      This can be verified with two input interfaces A and B and an output
      interface C, where B has forwarding enabled but A does not, by running:
      ip route get $dst iif A from $src && ip route get $dst iif B from $src
      Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
      Reviewed-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fa19c2b0
    • Eric Rannaud's avatar
      fs: allow open(dir, O_TMPFILE|..., 0) with mode 0 · 69a91c23
      Eric Rannaud authored
      The man page for open(2) indicates that when O_CREAT is specified, the
      'mode' argument applies only to future accesses to the file:
      
      	Note that this mode applies only to future accesses of the newly
      	created file; the open() call that creates a read-only file
      	may well return a read/write file descriptor.
      
      The man page for open(2) implies that 'mode' is treated identically by
      O_CREAT and O_TMPFILE.
      
      O_TMPFILE, however, behaves differently:
      
      	int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);
      	assert(fd == -1);
      	assert(errno == EACCES);
      
      	fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
      	assert(fd >= 0);
      
      For O_CREAT, do_last() sets acc_mode to MAY_OPEN only:
      
      	if (*opened & FILE_CREATED) {
      		/* Don't check for write permission, don't truncate */
      		open_flag &= ~O_TRUNC;
      		will_truncate = false;
      		acc_mode = MAY_OPEN;
      		path_to_nameidata(path, nd);
      		goto finish_open_created;
      	}
      
      But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to
      may_open().
      
      This patch lines up the behavior of O_TMPFILE with O_CREAT. After the
      inode is created, may_open() is called with acc_mode = MAY_OPEN, in
      do_tmpfile().
      
      A different, but related glibc bug revealed the discrepancy:
      https://sourceware.org/bugzilla/show_bug.cgi?id=17523
      
      glibc lazily loads the 'mode' argument of open() and openat() using
      va_arg() only if O_CREAT is present in 'flags' (to support both the
      2-argument and the 3-argument forms of open(); same idea for openat()).
      As a result, glibc ignores the 'mode' argument if O_TMPFILE is in
      'flags' without O_CREAT.
      
      On x86_64, for open(), it magically works anyway, as 'mode' is in
      RDX when entering open(), and is still in RDX on SYSCALL, which is where
      the kernel looks for the 3rd argument of a syscall.
      
      But openat() is not quite so lucky: 'mode' is in RCX when entering the
      glibc wrapper for openat(), while the kernel looks for the 4th argument
      of a syscall in R10. Indeed, the syscall calling convention differs from
      the regular calling convention in this respect on x86_64. So the kernel
      sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and
      fails with EACCES.
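
      The lazy 'mode' loading described above can be illustrated with a
      small variadic wrapper.  This is a sketch, not glibc's code: the
      DEMO_* flag values and the demo_* functions are hypothetical, and the
      wrapper shown here reads 'mode' for both flags, which is the behavior
      a caller of the 3-argument form would expect.

```c
#include <assert.h> /* used by the checks in the usage note */
#include <stdarg.h>

/* Hypothetical flag values, chosen only for this illustration. */
#define DEMO_O_CREAT   0x40u
#define DEMO_O_TMPFILE 0x410000u /* multi-bit flag: compare the full mask */

/* True when the caller is expected to have passed a 'mode' argument. */
static int demo_needs_mode(unsigned int flags)
{
	return (flags & DEMO_O_CREAT) != 0 ||
	       (flags & DEMO_O_TMPFILE) == DEMO_O_TMPFILE;
}

/* Read the optional 'mode' argument only when the flags call for it. */
static unsigned int demo_pick_mode(unsigned int flags, ...)
{
	unsigned int mode = 0;

	if (demo_needs_mode(flags)) {
		va_list ap;

		va_start(ap, flags);
		mode = va_arg(ap, unsigned int);
		va_end(ap);
	}
	return mode;
}
```

      A wrapper that tested only DEMO_O_CREAT would return mode 0 for
      demo_pick_mode(DEMO_O_TMPFILE | 2u, 0600u), which mirrors the
      mode = 0 seen by the kernel in the openat() case.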
      Signed-off-by: Eric Rannaud <e@nanocritical.com>
      Acked-by: Andy Lutomirski <luto@amacapital.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      69a91c23