1. 12 Sep, 2014 26 commits
  2. 03 Sep, 2014 1 commit
  3. 27 Aug, 2014 13 commits
    • Boris Ostrovsky's avatar
      x86/espfix/xen: Fix allocation of pages for paravirt page tables · 5d1b4311
      Boris Ostrovsky authored
      commit 8762e509 upstream.
      
      init_espfix_ap() is currently off by one level when informing hypervisor
      that allocated pages will be used for ministacks' page tables.
      
      The most immediate effect of this on a PV guest is that if
      'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor
      will refuse to use it for a page table (which it shouldn't be anyway). This will
      result in warnings by both Xen and Linux.
      
      More importantly, a subsequent write to that page (again, by a PV guest) is
      likely to result in fatal page fault.
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Link: http://lkml.kernel.org/r/1404926298-5565-1-git-send-email-boris.ostrovsky@oracle.comReviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d1b4311
    • willy tarreau's avatar
      net: mvneta: replace Tx timer with a real interrupt · 10d937e0
      willy tarreau authored
      Right now the mvneta driver doesn't handle Tx IRQ, and relies on two
      mechanisms to flush Tx descriptors : a flush at the end of mvneta_tx()
      and a timer. If a burst of packets is emitted faster than the device
      can send them, then the queue is stopped until next wake-up of the
      timer 10ms later. This causes jerky output traffic with bursts and
      pauses, making it difficult to reach line rate with very few streams.
      
      A test on UDP traffic shows that it's not possible to go beyond 134
      Mbps / 12 kpps of outgoing traffic with 1500-bytes IP packets. Routed
      traffic tends to observe pauses as well if the traffic is bursty,
      making it even burstier after the wake-up.
      
      It seems that this feature was inherited from the original driver but
      nothing there mentions any reason for not using the interrupt instead,
      which the chip supports.
      
      Thus, this patch enables Tx interrupts and removes the timer. It does
      the two at once because it's not really possible to make the two
      mechanisms coexist, so a split patch doesn't make sense.
      
      First tests performed on a Mirabox (Armada 370) show that less CPU
      seems to be used when sending traffic. One reason might be that we now
      call the mvneta_tx_done_gbe() with a mask indicating which queues have
      been done instead of looping over all of them.
      
      The same UDP test above now happily reaches 987 Mbps / 87.7 kpps.
      Single-stream TCP traffic can now more easily reach line rate. HTTP
      transfers of 1 MB objects over a single connection went from 730 to
      840 Mbps. It is even possible to go significantly higher (>900 Mbps)
      by tweaking tcp_tso_win_divisor.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Cc: Arnaud Ebalard <arno@natisbad.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 71f6d1b3)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      10d937e0
    • willy tarreau's avatar
      net: mvneta: add missing bit descriptions for interrupt masks and causes · 7a399737
      willy tarreau authored
      Marvell has not published the chip's datasheet yet, so it's very hard
      to find the relevant bits to manipulate to change the IRQ behaviour.
      Fortunately, these bits are described in the proprietary LSP patch set
      which is publicly available here :
      
          http://www.plugcomputer.org/downloads/mirabox/
      
      So let's put them back in the driver in order to reduce the burden of
      current and future maintenance.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 40ba35e7)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7a399737
    • willy tarreau's avatar
      net: mvneta: do not schedule in mvneta_tx_timeout · f1b0e3b7
      willy tarreau authored
      If a queue timeout is reported, we can oops because of some
      schedules while the caller is atomic, as shown below :
      
        mvneta d0070000.ethernet eth0: tx timeout
        BUG: scheduling while atomic: bash/1528/0x00000100
        Modules linked in: slhttp_ethdiv(C) [last unloaded: slhttp_ethdiv]
        CPU: 2 PID: 1528 Comm: bash Tainted: G        WC   3.13.0-rc4-mvebu-nf #180
        [<c0011bd9>] (unwind_backtrace+0x1/0x98) from [<c000f1ab>] (show_stack+0xb/0xc)
        [<c000f1ab>] (show_stack+0xb/0xc) from [<c02ad323>] (dump_stack+0x4f/0x64)
        [<c02ad323>] (dump_stack+0x4f/0x64) from [<c02abe67>] (__schedule_bug+0x37/0x4c)
        [<c02abe67>] (__schedule_bug+0x37/0x4c) from [<c02ae261>] (__schedule+0x325/0x3ec)
        [<c02ae261>] (__schedule+0x325/0x3ec) from [<c02adb97>] (schedule_timeout+0xb7/0x118)
        [<c02adb97>] (schedule_timeout+0xb7/0x118) from [<c0020a67>] (msleep+0xf/0x14)
        [<c0020a67>] (msleep+0xf/0x14) from [<c01dcbe5>] (mvneta_stop_dev+0x21/0x194)
        [<c01dcbe5>] (mvneta_stop_dev+0x21/0x194) from [<c01dcfe9>] (mvneta_tx_timeout+0x19/0x24)
        [<c01dcfe9>] (mvneta_tx_timeout+0x19/0x24) from [<c024afc7>] (dev_watchdog+0x18b/0x1c4)
        [<c024afc7>] (dev_watchdog+0x18b/0x1c4) from [<c0020b53>] (call_timer_fn.isra.27+0x17/0x5c)
        [<c0020b53>] (call_timer_fn.isra.27+0x17/0x5c) from [<c0020cad>] (run_timer_softirq+0x115/0x170)
        [<c0020cad>] (run_timer_softirq+0x115/0x170) from [<c001ccb9>] (__do_softirq+0xbd/0x1a8)
        [<c001ccb9>] (__do_softirq+0xbd/0x1a8) from [<c001cfad>] (irq_exit+0x61/0x98)
        [<c001cfad>] (irq_exit+0x61/0x98) from [<c000d4bf>] (handle_IRQ+0x27/0x60)
        [<c000d4bf>] (handle_IRQ+0x27/0x60) from [<c000843b>] (armada_370_xp_handle_irq+0x33/0xc8)
        [<c000843b>] (armada_370_xp_handle_irq+0x33/0xc8) from [<c000fba9>] (__irq_usr+0x49/0x60)
      
      Ben Hutchings attempted to propose a better fix consisting in using a
      scheduled work for this, but while it fixed this panic, it caused other
      random freezes and panics proving that the reset sequence in the driver
      is unreliable and that additional fixes should be investigated.
      
      When sending multiple streams over a link limited to 100 Mbps, Tx timeouts
      happen from time to time, and the driver correctly recovers only when the
      function is disabled.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 29021366)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f1b0e3b7
    • willy tarreau's avatar
      net: mvneta: use per_cpu stats to fix an SMP lock up · 6a026d8f
      willy tarreau authored
      Stats writers are mvneta_rx() and mvneta_tx(). They don't lock anything
      when they update the stats, and as a result, it randomly happens that
      the stats freeze on SMP if two updates happen during stats retrieval.
      This is very easily reproducible by starting two HTTP servers and binding
      each of them to a different CPU, then consulting /proc/net/dev in loops
      during transfers, the interface should immediately lock up. This issue
      also randomly happens upon link state changes during transfers, because
      the stats are collected in this situation, but it takes more attempts to
      reproduce it.
      
      The comments in netdevice.h suggest using per_cpu stats instead to get
      rid of this issue.
      
      This patch implements this. It merges both rx_stats and tx_stats into
      a single "stats" member with a single syncp. Both mvneta_rx() and
      mvneta_rx() now only update the a single CPU's counters.
      
      In turn, mvneta_get_stats64() does the summing by iterating over all CPUs
      to get their respective stats.
      
      With this change, stats are still correct and no more lockup is encountered.
      
      Note that this bug was present since the first import of the mvneta
      driver.  It might make sense to backport it to some stable trees. If
      so, it depends on "d33dc73 net: mvneta: increase the 64-bit rx/tx stats
      out of the hot path".
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 74c41b04)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      6a026d8f
    • willy tarreau's avatar
      net: mvneta: increase the 64-bit rx/tx stats out of the hot path · 7d798913
      willy tarreau authored
      Better count packets and bytes in the stack and on 32 bit then
      accumulate them at the end for once. This saves two memory writes
      and two memory barriers per packet. The incoming packet rate was
      increased by 4.7% on the Openblocks AX3 thanks to this.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit dc4277dd)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7d798913
    • Johannes Berg's avatar
      Revert "mac80211: move "bufferable MMPDU" check to fix AP mode scan" · 9e1bbd3f
      Johannes Berg authored
      This reverts commit 277d916f as it was
      at least breaking iwlwifi by setting the IEEE80211_TX_CTL_NO_PS_BUFFER
      flag in all kinds of interface modes, not only for AP mode where it is
      appropriate.
      
      To avoid reintroducing the original problem, explicitly check for probe
      request frames in the multicast buffering code.
      
      Cc: stable@vger.kernel.org
      Fixes: 277d916f ("mac80211: move "bufferable MMPDU" check to fix AP mode scan")
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      
      (cherry picked from commit 08b99399)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      9e1bbd3f
    • Andy Lutomirski's avatar
      x86_64/entry/xen: Do not invoke espfix64 on Xen · 350cc3fc
      Andy Lutomirski authored
      This moves the espfix64 logic into native_iret.  To make this work,
      it gets rid of the native patch for INTERRUPT_RETURN:
      INTERRUPT_RETURN on native kernels is now 'jmp native_iret'.
      
      This changes the 16-bit SS behavior on Xen from OOPSing to leaking
      some bits of the Xen hypervisor's RSP (I think).
      
      [ hpa: this is a nonzero cost on native, but probably not enough to
        measure. Xen needs to fix this in their own code, probably doing
        something equivalent to espfix64. ]
      Signed-off-by: default avatarAndy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.netSigned-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      
      (cherry picked from commit 7209a75d)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      350cc3fc
    • H. Peter Anvin's avatar
      x86, espfix: Make it possible to disable 16-bit support · 5190bdb7
      H. Peter Anvin authored
      Embedded systems, which may be very memory-size-sensitive, are
      extremely unlikely to ever encounter any 16-bit software, so make it
      a CONFIG_EXPERT option to turn off support for any 16-bit software
      whatsoever.
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
      
      (cherry picked from commit 34273f41)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5190bdb7
    • H. Peter Anvin's avatar
      x86, espfix: Make espfix64 a Kconfig option, fix UML · 54d052e0
      H. Peter Anvin authored
      Make espfix64 a hidden Kconfig option.  This fixes the x86-64 UML
      build which had broken due to the non-existence of init_espfix_bsp()
      in UML: since UML uses its own Kconfig, this option does not appear in
      the UML build.
      
      This also makes it possible to make support for 16-bit segments a
      configuration option, for the people who want to minimize the size of
      the kernel.
      Reported-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      Cc: Richard Weinberger <richard@nod.at>
      Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
      
      (cherry picked from commit 197725de)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      54d052e0
    • H. Peter Anvin's avatar
      x86, espfix: Fix broken header guard · 04d5b66d
      H. Peter Anvin authored
      Header guard is #ifndef, not #ifdef...
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      
      (cherry picked from commit 20b68535)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      04d5b66d
    • H. Peter Anvin's avatar
      x86, espfix: Move espfix definitions into a separate header file · 90bfa721
      H. Peter Anvin authored
      Sparse warns that the percpu variables aren't declared before they are
      defined.  Rather than hacking around it, move espfix definitions into
      a proper header file.
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      
      (cherry picked from commit e1fe9ed8)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      90bfa721
    • H. Peter Anvin's avatar
      x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack · 7f76a341
      H. Peter Anvin authored
      The IRET instruction, when returning to a 16-bit segment, only
      restores the bottom 16 bits of the user space stack pointer.  This
      causes some 16-bit software to break, but it also leaks kernel state
      to user space.  We have a software workaround for that ("espfix") for
      the 32-bit kernel, but it relies on a nonzero stack segment base which
      is not available in 64-bit mode.
      
      In checkin:
      
          b3b42ac2 x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
      
      we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
      the logic that 16-bit support is crippled on 64-bit kernels anyway (no
      V86 support), but it turns out that people are doing stuff like
      running old Win16 binaries under Wine and expect it to work.
      
      This works around this by creating percpu "ministacks", each of which
      is mapped 2^16 times 64K apart.  When we detect that the return SS is
      on the LDT, we copy the IRET frame to the ministack and use the
      relevant alias to return to userspace.  The ministacks are mapped
      readonly, so if IRET faults we promote #GP to #DF which is an IST
      vector and thus has its own stack; we then do the fixup in the #DF
      handler.
      
      (Making #GP an IST exception would make the msr_safe functions unsafe
      in NMI/MC context, and quite possibly have other effects.)
      
      Special thanks to:
      
      - Andy Lutomirski, for the suggestion of using very small stack slots
        and copy (as opposed to map) the IRET frame there, and for the
        suggestion to mark them readonly and let the fault promote to #DF.
      - Konrad Wilk for paravirt fixup and testing.
      - Borislav Petkov for testing help and useful comments.
      Reported-by: default avatarBrian Gerst <brgerst@gmail.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Lutomriski <amluto@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Dirk Hohndel <dirk@hohndel.org>
      Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
      Cc: comex <comexk@gmail.com>
      Cc: Alexander van Heukelum <heukelum@fastmail.fm>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: <stable@vger.kernel.org> # consider after upstream merge
      
      (cherry picked from commit 3891a04a)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7f76a341