1. 19 Aug, 2016 40 commits
    • Soheil Hassas Yeganeh's avatar
      tcp: consider recv buf for the initial window scale · 9e0b3f94
      Soheil Hassas Yeganeh authored
      [ Upstream commit f626300a ]
      
      tcp_select_initial_window() intends to advertise a window
      scaling for the maximum possible window size. To do so,
      it considers the maximum of net.ipv4.tcp_rmem[2] and
      net.core.rmem_max as the only possible upper-bounds.
      However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE
      to set the socket's receive buffer size to values
      larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max.
      Thus, SO_RCVBUFFORCE is effectively ignored by
      tcp_select_initial_window().
      
      To fix this, consider the maximum of net.ipv4.tcp_rmem[2],
      net.core.rmem_max and socket's initial buffer space.
      
      Fixes: b0573dea ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options")
      Signed-off-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Suggested-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9e0b3f94
    • Vegard Nossum's avatar
      net/irda: fix NULL pointer dereference on memory allocation failure · f7149544
      Vegard Nossum authored
      [ Upstream commit d3e6952c ]
      
      I ran into this:
      
          kasan: CONFIG_KASAN_INLINE enabled
          kasan: GPF could be caused by NULL-ptr deref or user memory access
          general protection fault: 0000 [#1] PREEMPT SMP KASAN
          CPU: 2 PID: 2012 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
          task: ffff8800b745f2c0 ti: ffff880111740000 task.ti: ffff880111740000
          RIP: 0010:[<ffffffff82bbf066>]  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
          RSP: 0018:ffff880111747bb8  EFLAGS: 00010286
          RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000069dd8358
          RDX: 0000000000000009 RSI: 0000000000000027 RDI: 0000000000000048
          RBP: ffff880111747c00 R08: 0000000000000000 R09: 0000000000000000
          R10: 0000000069dd8358 R11: 1ffffffff0759723 R12: 0000000000000000
          R13: ffff88011a7e4780 R14: 0000000000000027 R15: 0000000000000000
          FS:  00007fc738404700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00007fc737fdfb10 CR3: 0000000118087000 CR4: 00000000000006e0
          Stack:
           0000000000000200 ffff880111747bd8 ffffffff810ee611 ffff880119f1f220
           ffff880119f1f4f8 ffff880119f1f4f0 ffff88011a7e4780 ffff880119f1f232
           ffff880119f1f220 ffff880111747d58 ffffffff82bca542 0000000000000000
          Call Trace:
           [<ffffffff82bca542>] irda_connect+0x562/0x1190
           [<ffffffff825ae582>] SYSC_connect+0x202/0x2a0
           [<ffffffff825b4489>] SyS_connect+0x9/0x10
           [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
           [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25
          Code: 41 89 ca 48 89 e5 41 57 41 56 41 55 41 54 41 89 d7 53 48 89 fb 48 83 c7 48 48 89 fa 41 89 f6 48 c1 ea 03 48 83 ec 20 4c 8b 65 10 <0f> b6 04 02 84 c0 74 08 84 c0 0f 8e 4c 04 00 00 80 7b 48 00 74
          RIP  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
           RSP <ffff880111747bb8>
          ---[ end trace 4cda2588bc055b30 ]---
      
      The problem is that irda_open_tsap() can fail and leave self->tsap = NULL,
      and then irttp_connect_request() almost immediately dereferences it.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f7149544
    • Eric Dumazet's avatar
      tcp: make challenge acks less predictable · 56d86b8a
      Eric Dumazet authored
      [ Upstream commit 75ff39cc ]
      
      Yue Cao claims that current host rate limiting of challenge ACKS
      (RFC 5961) could leak enough information to allow a patient attacker
      to hijack TCP sessions. He will soon provide details in an academic
      paper.
      
      This patch increases the default limit from 100 to 1000, and adds
      some randomization so that the attacker can no longer hijack
      sessions without spending a considerable amount of probes.
      
      Based on initial analysis and patch from Linus.
      
      Note that we also have per socket rate limiting, so it is tempting
      to remove the host limit in the future.
      
      v2: randomize the count of challenge acks per second, not the period.
      
      js: backport to 3.12
      
      Fixes: 282f23c6 ("tcp: implement RFC 5961 3.2")
      Reported-by: default avatarYue Cao <ycao009@ucr.edu>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      56d86b8a
    • Daniel Borkmann's avatar
      random32: add prandom_u32_max and convert open coded users · 73596d54
      Daniel Borkmann authored
      commit f337db64 upstream.
      
      Many functions have open coded a function that returns a random
      number in range [0,N-1]. Under the assumption that we have a PRNG
      such as taus113 with being well distributed in [0, ~0U] space,
      we can implement such a function as uword t = (n*m')>>32, where
      m' is a random number obtained from PRNG, n the right open interval
      border and t our resulting random number, with n,m',t in u32 universe.
      
      Lets go with Joe and simply call it prandom_u32_max(), although
      technically we have an right open interval endpoint, but that we
      have documented. Other users can further be migrated to the new
      prandom_u32_max() function later on; for now, we need to make sure
      to migrate reciprocal_divide() users for the reciprocal_divide()
      follow-up fixup since their function signatures are going to change.
      
      Joint work with Hannes Frederic Sowa.
      
      Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      73596d54
    • Dmitri Epshtein's avatar
      net: mvneta: set real interrupt per packet for tx_done · 61ed3761
      Dmitri Epshtein authored
      commit 06708f81 upstream.
      
      Commit aebea2ba ("net: mvneta: fix Tx interrupt delay") intended to
      set coalescing threshold to a value guaranteeing interrupt generation
      per each sent packet, so that buffers can be released with no delay.
      
      In fact setting threshold to '1' was wrong, because it causes interrupt
      every two packets. According to the documentation a reason behind it is
      following - interrupt occurs once sent buffers counter reaches a value,
      which is higher than one specified in MVNETA_TXQ_SIZE_REG(q). This
      behavior was confirmed during tests. Also when testing the SoC working
      as a NAS device, better performance was observed with int-per-packet,
      as it strongly depends on the fact that all transmitted packets are
      released immediately.
      
      This commit enables NETA controller work in interrupt per sent packet mode
      by setting coalescing threshold to 0.
      Signed-off-by: default avatarDmitri Epshtein <dima@marvell.com>
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Fixes aebea2ba ("net: mvneta: fix Tx interrupt delay")
      Acked-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      61ed3761
    • Brian King's avatar
      ipr: Clear interrupt on croc/crocodile when running with LSI · 8968041d
      Brian King authored
      commit 54e430bb upstream.
      
      If we fall back to using LSI on the Croc or Crocodile chip we need to
      clear the interrupt so we don't hang the system.
      Tested-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarBrian King <brking@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8968041d
    • Oliver Hartkopp's avatar
      can: fix oops caused by wrong rtnl dellink usage · fbe33977
      Oliver Hartkopp authored
      commit 25e1ed6e upstream.
      
      For 'real' hardware CAN devices the netlink interface is used to set CAN
      specific communication parameters. Real CAN hardware can not be created nor
      removed with the ip tool ...
      
      This patch adds a private dellink function for the CAN device driver interface
      that does just nothing.
      
      It's a follow up to commit 993e6f2f ("can: fix oops caused by wrong rtnl
      newlink usage") but for dellink.
      Reported-by: default avatarajneu <ajneu1@gmail.com>
      Signed-off-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      fbe33977
    • Oliver Hartkopp's avatar
      can: fix handling of unmodifiable configuration options fix · ded1c127
      Oliver Hartkopp authored
      commit bce271f2 upstream.
      
      With upstream commit bb208f14 (can: fix handling of unmodifiable
      configuration options) a new can_validate() function was introduced.
      
      When invoking 'ip link set can0 type can' without any configuration data
      can_validate() tries to validate the content without taking into account that
      there's totally no content. This patch adds a check for missing content.
      Reported-by: default avatarajneu <ajneu1@gmail.com>
      Signed-off-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ded1c127
    • Wolfgang Grandegger's avatar
      can: at91_can: RX queue could get stuck at high bus load · 8c39ed7f
      Wolfgang Grandegger authored
      commit 43200a44 upstream.
      
      At high bus load it could happen that "at91_poll()" enters with all RX
      message boxes filled up. If then at the end the "quota" is exceeded as
      well, "rx_next" will not be reset to the first RX mailbox and hence the
      interrupts remain disabled.
      Signed-off-by: default avatarWolfgang Grandegger <wg@grandegger.com>
      Tested-by: default avatarAmr Bekhit <amrbekhit@gmail.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8c39ed7f
    • Andi Kleen's avatar
      x86, asmlinkage, lguest: Pass in globals into assembler statement · 2d7bf05e
      Andi Kleen authored
      commit cdd77e87 upstream.
      
      Tell the compiler that the inline assembler statement
      references lguest_entry.
      
      This fixes compile problems with LTO where the variable
      and the assembler code may end up in different files.
      
      Cc: x86@kernel.org
      Cc: rusty@rustcorp.com.au
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2d7bf05e
    • Andrea Arcangeli's avatar
      mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED · 0208cd52
      Andrea Arcangeli authored
      commit ad33bb04 upstream.
      
      pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
      introduced to locklessy (but atomically) detect when a pmd is a regular
      (stable) pmd or when the pmd is unstable and can infinitely transition
      from pmd_none() and pmd_trans_huge() from under us, while only holding
      the mmap_sem for reading (for writing not).
      
      While holding the mmap_sem only for reading, MADV_DONTNEED can run from
      under us and so before we can assume the pmd to be a regular stable pmd
      we need to compare it against pmd_none() and pmd_trans_huge() in an
      atomic way, with pmd_trans_unstable().  The old pmd_trans_huge() left a
      tiny window for a race.
      
      Useful applications are unlikely to notice the difference as doing
      MADV_DONTNEED concurrently with a page fault would lead to undefined
      behavior.
      
      [js] 3.12 backport: no pmd_devmap in 3.12 yet.
      
      [akpm@linux-foundation.org: tidy up comment grammar/layout]
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0208cd52
    • Taras Kondratiuk's avatar
      mmc: block: fix packed command header endianness · 45fabee6
      Taras Kondratiuk authored
      commit f68381a7 upstream.
      
      The code that fills packed command header assumes that CPU runs in
      little-endian mode. Hence the header is malformed in big-endian mode
      and causes MMC data transfer errors:
      
      [  563.200828] mmcblk0: error -110 transferring data, sector 2048, nr 8, cmd response 0x900, card status 0xc40
      [  563.219647] mmcblk0: packed cmd failed, nr 2, sectors 16, failure index: -1
      
      Convert header data to LE.
      Signed-off-by: default avatarTaras Kondratiuk <takondra@cisco.com>
      Fixes: ce39f9d1 ("mmc: support packed write command for eMMC4.5 devices")
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      45fabee6
    • Ursula Braun's avatar
      qeth: delete napi struct when removing a qeth device · 1d0f1ab1
      Ursula Braun authored
      commit 7831b4ff upstream.
      
      A qeth_card contains a napi_struct linked to the net_device during
      device probing. This struct must be deleted when removing the qeth
      device, otherwise Panic on oops can occur when qeth devices are
      repeatedly removed and added.
      
      Fixes: a1c3ed4c ("qeth: NAPI support for l2 and l3 discipline")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.vnet.ibm.com>
      Tested-by: default avatarAlexander Klein <ALKL@de.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1d0f1ab1
    • Vegard Nossum's avatar
      ext4: verify extent header depth · 2dcbf826
      Vegard Nossum authored
      commit 7bc94916 upstream.
      
      Although the extent tree depth of 5 should enough be for the worst
      case of 2*32 extents of length 1, the extent tree code does not
      currently to merge nodes which are less than half-full with a sibling
      node, or to shrink the tree depth if possible.  So it's possible, at
      least in theory, for the tree depth to be greater than 5.  However,
      even in the worst case, a tree depth of 32 is highly unlikely, and if
      the file system is maliciously corrupted, an insanely large eh_depth
      can cause memory allocation failures that will trigger kernel warnings
      (here, eh_depth = 65280):
      
          JBD2: ext4.exe wants too many credits credits:195849 rsv_credits:0 max:256
          ------------[ cut here ]------------
          WARNING: CPU: 0 PID: 50 at fs/jbd2/transaction.c:293 start_this_handle+0x569/0x580
          CPU: 0 PID: 50 Comm: ext4.exe Not tainted 4.7.0-rc5+ #508
          Stack:
           604a8947 625badd8 0002fd09 00000000
           60078643 00000000 62623910 601bf9bc
           62623970 6002fc84 626239b0 900000125
          Call Trace:
           [<6001c2dc>] show_stack+0xdc/0x1a0
           [<601bf9bc>] dump_stack+0x2a/0x2e
           [<6002fc84>] __warn+0x114/0x140
           [<6002fdff>] warn_slowpath_null+0x1f/0x30
           [<60165829>] start_this_handle+0x569/0x580
           [<60165d4e>] jbd2__journal_start+0x11e/0x220
           [<60146690>] __ext4_journal_start_sb+0x60/0xa0
           [<60120a81>] ext4_truncate+0x131/0x3a0
           [<60123677>] ext4_setattr+0x757/0x840
           [<600d5d0f>] notify_change+0x16f/0x2a0
           [<600b2b16>] do_truncate+0x76/0xc0
           [<600c3e56>] path_openat+0x806/0x1300
           [<600c55c9>] do_filp_open+0x89/0xf0
           [<600b4074>] do_sys_open+0x134/0x1e0
           [<600b4140>] SyS_open+0x20/0x30
           [<6001ea68>] handle_syscall+0x88/0x90
           [<600295fd>] userspace+0x3fd/0x500
           [<6001ac55>] fork_handler+0x85/0x90
      
          ---[ end trace 08b0b88b6387a244 ]---
      
      [ Commit message modified and the extent tree depath check changed
      from 5 to 32 -- tytso ]
      
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2dcbf826
    • Cameron Gutman's avatar
      Input: xpad - validate USB endpoint count during probe · 07c6f70c
      Cameron Gutman authored
      commit caca925f upstream.
      
      This prevents a malicious USB device from causing an oops.
      Signed-off-by: default avatarCameron Gutman <aicommander@gmail.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      07c6f70c
    • Ping Cheng's avatar
      Input: wacom_w8001 - w8001_MAX_LENGTH should be 13 · 1b59979d
      Ping Cheng authored
      commit 12afb344 upstream.
      
      Somehow the patch that added two-finger touch support forgot to update
      W8001_MAX_LENGTH from 11 to 13.
      Signed-off-by: default avatarPing Cheng <pingc@wacom.com>
      Reviewed-by: default avatarPeter Hutterer <peter.hutterer@who-t.net>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1b59979d
    • Andrey Grodzovsky's avatar
      xen/pciback: Fix conf_space read/write overlap check. · 0a98c066
      Andrey Grodzovsky authored
      commit 02ef871e upstream.
      
      Current overlap check is evaluating to false a case where a filter
      field is fully contained (proper subset) of a r/w request.  This
      change applies classical overlap check instead to include all the
      scenarios.
      
      More specifically, for (Hilscher GmbH CIFX 50E-DP(M/S)) device driver
      the logic is such that the entire confspace is read and written in 4
      byte chunks. In this case as an example, CACHE_LINE_SIZE,
      LATENCY_TIMER and PCI_BIST are arriving together in one call to
      xen_pcibk_config_write() with offset == 0xc and size == 4.  With the
      exsisting overlap check the LATENCY_TIMER field (offset == 0xd, length
      == 1) is fully contained in the write request and hence is excluded
      from write, which is incorrect.
      Signed-off-by: default avatarAndrey Grodzovsky <andrey2805@gmail.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: default avatarJan Beulich <JBeulich@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0a98c066
    • Alexey Brodkin's avatar
      arc: unwind: warn only once if DW2_UNWIND is disabled · 6140b1a0
      Alexey Brodkin authored
      commit 9bd54517 upstream.
      
      If CONFIG_ARC_DW2_UNWIND is disabled every time arc_unwind_core()
      gets called following message gets printed in debug console:
      ----------------->8---------------
      CONFIG_ARC_DW2_UNWIND needs to be enabled
      ----------------->8---------------
      
      That message makes sense if user indeed wants to see a backtrace or
      get nice function call-graphs in perf but what if user disabled
      unwinder for the purpose? Why pollute his debug console?
      
      So instead we'll warn user about possibly missing feature once and
      let him decide if that was what he or she really wanted.
      Signed-off-by: default avatarAlexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6140b1a0
    • Torsten Hilbrich's avatar
      fs/nilfs2: fix potential underflow in call to crc32_le · 1a2e331d
      Torsten Hilbrich authored
      commit 63d2f95d upstream.
      
      The value `bytes' comes from the filesystem which is about to be
      mounted.  We cannot trust that the value is always in the range we
      expect it to be.
      
      Check its value before using it to calculate the length for the crc32_le
      call.  It value must be larger (or equal) sumoff + 4.
      
      This fixes a kernel bug when accidentially mounting an image file which
      had the nilfs2 magic value 0x3434 at the right offset 0x406 by chance.
      The bytes 0x01 0x00 were stored at 0x408 and were interpreted as a
      s_bytes value of 1.  This caused an underflow when substracting sumoff +
      4 (20) in the call to crc32_le.
      
        BUG: unable to handle kernel paging request at ffff88021e600000
        IP:  crc32_le+0x36/0x100
        ...
        Call Trace:
          nilfs_valid_sb.part.5+0x52/0x60 [nilfs2]
          nilfs_load_super_block+0x142/0x300 [nilfs2]
          init_nilfs+0x60/0x390 [nilfs2]
          nilfs_mount+0x302/0x520 [nilfs2]
          mount_fs+0x38/0x160
          vfs_kern_mount+0x67/0x110
          do_mount+0x269/0xe00
          SyS_mount+0x9f/0x100
          entry_SYSCALL_64_fastpath+0x16/0x71
      
      Link: http://lkml.kernel.org/r/1466778587-5184-2-git-send-email-konishi.ryusuke@lab.ntt.co.jpSigned-off-by: default avatarTorsten Hilbrich <torsten.hilbrich@secunet.com>
      Tested-by: default avatarTorsten Hilbrich <torsten.hilbrich@secunet.com>
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1a2e331d
    • Jan Willeke's avatar
      s390/seccomp: fix error return for filtered system calls · 5f21bb04
      Jan Willeke authored
      commit dc295880 upstream.
      
      The syscall_set_return_value function of s390 negates the error argument
      before storing the value to the return register gpr2. This is incorrect,
      the seccomp code already passes the negative error value.
      Store the unmodified error value to gpr2.
      Signed-off-by: default avatarJan Willeke <willeke@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5f21bb04
    • Hugh Dickins's avatar
      tmpfs: fix regression hang in fallocate undo · 9f00ea67
      Hugh Dickins authored
      commit 7f556567 upstream.
      
      The well-spotted fallocate undo fix is good in most cases, but not when
      fallocate failed on the very first page.  index 0 then passes lend -1
      to shmem_undo_range(), and that has two bad effects: (a) that it will
      undo every fallocation throughout the file, unrestricted by the current
      range; but more importantly (b) it can cause the undo to hang, because
      lend -1 is treated as truncation, which makes it keep on retrying until
      every page has gone, but those already fully instantiated will never go
      away.  Big thank you to xfstests generic/269 which demonstrates this.
      
      Fixes: b9b4bb26 ("tmpfs: don't undo fallocate past its last page")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9f00ea67
    • Anthony Romano's avatar
      tmpfs: don't undo fallocate past its last page · 2bac711d
      Anthony Romano authored
      commit b9b4bb26 upstream.
      
      When fallocate is interrupted it will undo a range that extends one byte
      past its range of allocated pages.  This can corrupt an in-use page by
      zeroing out its first byte.  Instead, undo using the inclusive byte
      range.
      
      Fixes: 1635f6a7 ("tmpfs: undo fallocation on failure")
      Link: http://lkml.kernel.org/r/1462713387-16724-1-git-send-email-anthony.romano@coreos.comSigned-off-by: default avatarAnthony Romano <anthony.romano@coreos.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Brandon Philips <brandon@ifup.co>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2bac711d
    • Jan Beulich's avatar
      xen/acpi: allow xen-acpi-processor driver to load on Xen 4.7 · cd61e45b
      Jan Beulich authored
      commit 6f2d9d99 upstream.
      
      As of Xen 4.7 PV CPUID doesn't expose either of CPUID[1].ECX[7] and
      CPUID[0x80000007].EDX[7] anymore, causing the driver to fail to load on
      both Intel and AMD systems. Doing any kind of hardware capability
      checks in the driver as a prerequisite was wrong anyway: With the
      hypervisor being in charge, all such checking should be done by it. If
      ACPI data gets uploaded despite some missing capability, the hypervisor
      is free to ignore part or all of that data.
      
      Ditch the entire check_prereq() function, and do the only valid check
      (xen_initial_domain()) in the caller in its place.
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      cd61e45b
    • Steve French's avatar
      Fix reconnect to not defer smb3 session reconnect long after socket reconnect · 875cc09c
      Steve French authored
      commit 4fcd1813 upstream.
      
      Azure server blocks clients that open a socket and don't do anything on it.
      In our reconnect scenarios, we can reconnect the tcp session and
      detect the socket is available but we defer the negprot and SMB3 session
      setup and tree connect reconnection until the next i/o is requested, but
      this looks suspicous to some servers who expect SMB3 negprog and session
      setup soon after a socket is created.
      
      In the echo thread, reconnect SMB3 sessions and tree connections
      that are disconnected.  A later patch will replay persistent (and
      resilient) handle opens.
      Signed-off-by: default avatarSteve French <steve.french@primarydata.com>
      Acked-by: default avatarPavel Shilovsky <pshilovsky@samba.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      875cc09c
    • Wei Fang's avatar
      scsi: fix race between simultaneous decrements of ->host_failed · ac00a333
      Wei Fang authored
      commit 72d8c36e upstream.
      
      sas_ata_strategy_handler() adds the works of the ata error handler to
      system_unbound_wq. This workqueue asynchronously runs work items, so the
      ata error handler will be performed concurrently on different CPUs. In
      this case, ->host_failed will be decreased simultaneously in
      scsi_eh_finish_cmd() on different CPUs, and become abnormal.
      
      It will lead to permanently inequality between ->host_failed and
      ->host_busy, and scsi error handler thread won't start running. IO
      errors after that won't be handled.
      
      Since all scmds must have been handled in the strategy handler, just
      remove the decrement in scsi_eh_finish_cmd() and zero ->host_busy after
      the strategy handler to fix this race.
      
      Fixes: 50824d6c ("[SCSI] libsas: async ata-eh")
      Signed-off-by: default avatarWei Fang <fangwei1@huawei.com>
      Reviewed-by: default avatarJames Bottomley <jejb@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ac00a333
    • Takashi Iwai's avatar
      ALSA: ctl: Stop notification after disconnection · ce3b334f
      Takashi Iwai authored
      commit f388cdcd upstream.
      
      snd_ctl_remove() has a notification for the removal event.  It's
      superfluous when done during the device got disconnected.  Although
      the notification itself is mostly harmless, it may potentially be
      harmful, and should be suppressed.  Actually some components PCM may
      free ctl elements during the disconnect or free callbacks, thus it's
      no theoretical issue.
      
      This patch adds the check of card->shutdown flag for avoiding
      unnecessary notifications after (or during) the disconnect.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ce3b334f
    • Takashi Iwai's avatar
      ALSA: au88x0: Fix calculation in vortex_wtdma_bufshift() · 61216b5a
      Takashi Iwai authored
      commit 62db7152 upstream.
      
      vortex_wtdma_bufshift() function does calculate the page index
      wrongly, first masking then shift, which always results in zero.
      The proper computation is to first shift, then mask.
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      61216b5a
    • Takashi Iwai's avatar
      ALSA: dummy: Fix a use-after-free at closing · 9d12ce32
      Takashi Iwai authored
      commit d5dbbe65 upstream.
      
      syzkaller fuzzer spotted a potential use-after-free case in snd-dummy
      driver when hrtimer is used as backend:
      > ==================================================================
      > BUG: KASAN: use-after-free in rb_erase+0x1b17/0x2010 at addr ffff88005e5b6f68
      >  Read of size 8 by task syz-executor/8984
      > =============================================================================
      > BUG kmalloc-192 (Not tainted): kasan: bad access detected
      > -----------------------------------------------------------------------------
      >
      > Disabling lock debugging due to kernel taint
      > INFO: Allocated in 0xbbbbbbbbbbbbbbbb age=18446705582212484632
      > ....
      > [<      none      >] dummy_hrtimer_create+0x49/0x1a0 sound/drivers/dummy.c:464
      > ....
      > INFO: Freed in 0xfffd8e09 age=18446705496313138713 cpu=2164287125 pid=-1
      > [<      none      >] dummy_hrtimer_free+0x68/0x80 sound/drivers/dummy.c:481
      > ....
      > Call Trace:
      >  [<ffffffff8179e59e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:333
      >  [<     inline     >] rb_set_parent include/linux/rbtree_augmented.h:111
      >  [<     inline     >] __rb_erase_augmented include/linux/rbtree_augmented.h:218
      >  [<ffffffff82ca5787>] rb_erase+0x1b17/0x2010 lib/rbtree.c:427
      >  [<ffffffff82cb02e8>] timerqueue_del+0x78/0x170 lib/timerqueue.c:86
      >  [<ffffffff814d0c80>] __remove_hrtimer+0x90/0x220 kernel/time/hrtimer.c:903
      >  [<     inline     >] remove_hrtimer kernel/time/hrtimer.c:945
      >  [<ffffffff814d23da>] hrtimer_try_to_cancel+0x22a/0x570 kernel/time/hrtimer.c:1046
      >  [<ffffffff814d2742>] hrtimer_cancel+0x22/0x40 kernel/time/hrtimer.c:1066
      >  [<ffffffff85420531>] dummy_hrtimer_stop+0x91/0xb0 sound/drivers/dummy.c:417
      >  [<ffffffff854228bf>] dummy_pcm_trigger+0x17f/0x1e0 sound/drivers/dummy.c:507
      >  [<ffffffff85392170>] snd_pcm_do_stop+0x160/0x1b0 sound/core/pcm_native.c:1106
      >  [<ffffffff85391b26>] snd_pcm_action_single+0x76/0x120 sound/core/pcm_native.c:956
      >  [<ffffffff85391e01>] snd_pcm_action+0x231/0x290 sound/core/pcm_native.c:974
      >  [<     inline     >] snd_pcm_stop sound/core/pcm_native.c:1139
      >  [<ffffffff8539754d>] snd_pcm_drop+0x12d/0x1d0 sound/core/pcm_native.c:1784
      >  [<ffffffff8539d3be>] snd_pcm_common_ioctl1+0xfae/0x2150 sound/core/pcm_native.c:2805
      >  [<ffffffff8539ee91>] snd_pcm_capture_ioctl1+0x2a1/0x5e0 sound/core/pcm_native.c:2976
      >  [<ffffffff8539f2ec>] snd_pcm_kernel_ioctl+0x11c/0x160 sound/core/pcm_native.c:3020
      >  [<ffffffff853d9a44>] snd_pcm_oss_sync+0x3a4/0xa30 sound/core/oss/pcm_oss.c:1693
      >  [<ffffffff853da27d>] snd_pcm_oss_release+0x1ad/0x280 sound/core/oss/pcm_oss.c:2483
      >  .....
      
      A workaround is to call hrtimer_cancel() in dummy_hrtimer_sync() which
      is called certainly before other blocking ops.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Tested-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9d12ce32
    • Dmitry Torokhov's avatar
      tty/vt/keyboard: fix OOB access in do_compute_shiftstate() · f401ce4c
      Dmitry Torokhov authored
      commit 510cccb5 upstream.
      
      The size of individual keymap in drivers/tty/vt/keyboard.c is NR_KEYS,
      which is currently 256, whereas number of keys/buttons in input device (and
      therefor in key_down) is much larger - KEY_CNT - 768, and that can cause
      out-of-bound access when we do
      
      	sym = U(key_maps[0][k]);
      
      with large 'k'.
      
      To fix it we should not attempt iterating beyond smaller of NR_KEYS and
      KEY_CNT.
      
      Also while at it let's switch to for_each_set_bit() instead of open-coding
      it.
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f401ce4c
    • Mark Brown's avatar
      iio:ad7266: Fix probe deferral for vref · 56008dd6
      Mark Brown authored
      commit 68b356eb upstream.
      
      Currently the ad7266 driver treats any failure to get vref as though the
      regulator were not present but this means that if probe deferral is
      triggered the driver will act as though the regulator were not present.
      Instead only use the internal reference if we explicitly got -ENODEV which
      is what is returned for absent regulators.
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      56008dd6
    • Mark Brown's avatar
      iio:ad7266: Fix support for optional regulators · 801044f8
      Mark Brown authored
      commit e5511c81 upstream.
      
      The ad7266 driver attempts to support deciding between the use of internal
      and external power supplies by checking to see if an error is returned when
      requesting the regulator. This doesn't work with the current code since the
      driver uses a normal regulator_get() which is for non-optional supplies
      and so assumes that if a regulator is not provided by the platform then
      this is a bug in the platform integration and so substitutes a dummy
      regulator. Use regulator_get_optional() instead which indicates to the
      framework that the regulator may be absent and provides a dummy regulator
      instead.
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      801044f8
    • Mark Brown's avatar
      iio:ad7266: Fix broken regulator error handling · bd2f349d
      Mark Brown authored
      commit 6b7f4e25 upstream.
      
      All regulator_get() variants return either a pointer to a regulator or an
      ERR_PTR() so testing for NULL makes no sense and may lead to bugs if we
      use NULL as a valid regulator. Fix this by using IS_ERR() as expected.
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      bd2f349d
    • Linus Walleij's avatar
      iio: accel: kxsd9: fix the usage of spi_w8r8() · 95ac1169
      Linus Walleij authored
      commit 0c1f91b9 upstream.
      
      These two spi_w8r8() calls return a value with is used by the code
      following the error check. The dubious use was caused by a cleanup
      patch.
      
      Fixes: d34dbee8 ("staging:iio:accel:kxsd9 cleanup and conversion to iio_chan_spec.")
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      95ac1169
    • Luis de Bethencourt's avatar
      staging: iio: accel: fix error check · 14a450ef
      Luis de Bethencourt authored
      commit ef3149eb upstream.
      
      sca3000_read_ctrl_reg() returns a negative number on failure, check for
      this instead of zero.
      Signed-off-by: default avatarLuis de Bethencourt <luisbg@osg.samsung.com>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      14a450ef
    • Crestez Dan Leonard's avatar
      iio: Fix error handling in iio_trigger_attach_poll_func · 4521af12
      Crestez Dan Leonard authored
      commit 99543823 upstream.
      
      When attaching a pollfunc iio_trigger_attach_poll_func will allocate a
      virtual irq and call the driver's set_trigger_state function. Fix error
      handling to undo previous steps if any fails.
      
      In particular this fixes handling errors from a driver's
      set_trigger_state function. When using triggered buffers a failure to
      enable the trigger used to make the buffer unusable.
      Signed-off-by: default avatarCrestez Dan Leonard <leonard.crestez@intel.com>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4521af12
    • Lyude's avatar
      drm/i915/ilk: Don't disable SSC source if it's in use · fb3bb94d
      Lyude authored
      commit 476490a9 upstream.
      
      Thanks to Ville Syrjälä for pointing me towards the cause of this issue.
      
      Unfortunately one of the sideaffects of having the refclk for a DPLL set
      to SSC is that as long as it's set to SSC, the GPU will prevent us from
      powering down any of the pipes or transcoders using it. A couple of
      BIOSes enable SSC in both PCH_DREF_CONTROL and in the DPLL
      configurations. This causes issues on the first modeset, since we don't
      expect SSC to be left on and as a result, can't successfully power down
      the pipes or the transcoders using it. Here's an example from this Dell
      OptiPlex 990:
      
      [drm:intel_modeset_init] SSC enabled by BIOS, overriding VBT which says disabled
      [drm:intel_modeset_init] 2 display pipes available.
      [drm:intel_update_cdclk] Current CD clock rate: 400000 kHz
      [drm:intel_update_max_cdclk] Max CD clock rate: 400000 kHz
      [drm:intel_update_max_cdclk] Max dotclock rate: 360000 kHz
      vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
      [drm:intel_crt_reset] crt adpa set to 0xf40000
      [drm:intel_dp_init_connector] Adding DP connector on port C
      [drm:intel_dp_aux_init] registering DPDDC-C bus for card0-DP-1
      [drm:ironlake_init_pch_refclk] has_panel 0 has_lvds 0 has_ck505 0
      [drm:ironlake_init_pch_refclk] Disabling SSC entirely
      … later we try committing the first modeset …
      [drm:intel_dump_pipe_config] [CRTC:26][modeset] config ffff88041b02e800 for pipe A
      [drm:intel_dump_pipe_config] cpu_transcoder: A
      …
      [drm:intel_dump_pipe_config] dpll_hw_state: dpll: 0xc4016001, dpll_md: 0x0, fp0: 0x20e08, fp1: 0x30d07
      [drm:intel_dump_pipe_config] planes on this crtc
      [drm:intel_dump_pipe_config] STANDARD PLANE:23 plane: 0.0 idx: 0 enabled
      [drm:intel_dump_pipe_config]     FB:42, fb = 800x600 format = 0x34325258
      [drm:intel_dump_pipe_config]     scaler:0 src (0, 0) 800x600 dst (0, 0) 800x600
      [drm:intel_dump_pipe_config] CURSOR PLANE:25 plane: 0.1 idx: 1 disabled, scaler_id = 0
      [drm:intel_dump_pipe_config] STANDARD PLANE:27 plane: 0.1 idx: 2 disabled, scaler_id = 0
      [drm:intel_get_shared_dpll] CRTC:26 allocated PCH DPLL A
      [drm:intel_get_shared_dpll] using PCH DPLL A for pipe A
      [drm:ilk_audio_codec_disable] Disable audio codec on port C, pipe A
      [drm:intel_disable_pipe] disabling pipe A
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 130 at drivers/gpu/drm/i915/intel_display.c:1146 intel_disable_pipe+0x297/0x2d0 [i915]
      pipe_off wait timed out
      …
      ---[ end trace 94fc8aa03ae139e8 ]---
      [drm:intel_dp_link_down]
      [drm:ironlake_crtc_disable [i915]] *ERROR* failed to disable transcoder A
      
      Later modesets succeed since they reset the DPLL's configuration anyway,
      but this is enough to get stuck with a big fat warning in dmesg.
      
      A better solution would be to add refcounts for the SSC source, but for
      now leaving the source clock on should suffice.
      
      Changes since v4:
       - Fix calculation of final for systems with LVDS panels (fixes BUG() on
         CI test suite)
      Changes since v3:
       - Move temp variable into loop
       - Move checks for using_ssc_source to after we've figured out has_ck505
       - Add using_ssc_source to debug output
      Changes since v2:
       - Fix debug output for when we disable the CPU source
      Changes since v1:
       - Leave the SSC source clock on instead of just shutting it off on all
         of the DPLL configurations.
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarLyude <cpaul@redhat.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1465916649-10228-1-git-send-email-cpaul@redhat.comSigned-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      fb3bb94d
    • Alex Deucher's avatar
      drm/radeon: fix asic initialization for virtualized environments · a5591555
      Alex Deucher authored
      commit 05082b8b upstream.
      
      When executing in a PCI passthrough based virtuzliation environment, the
      hypervisor will usually attempt to send a PCIe bus reset signal to the
      ASIC when the VM reboots. In this scenario, the card is not correctly
      initialized, but we still consider it to be posted. Therefore, in a
      passthrough based environemnt we should always post the card to guarantee
      it is in a good state for driver initialization.
      
      Ported from amdgpu commit:
      amdgpu: fix asic initialization for virtualized environments
      
      Cc: Andres Rodriguez <andres.rodriguez@amd.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a5591555
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Handle NULL formats in hold_module_trace_bprintk_format() · b16e8932
      Steven Rostedt (Red Hat) authored
      commit 70c8217a upstream.
      
      If a task uses a non constant string for the format parameter in
      trace_printk(), then the trace_printk_fmt variable is set to NULL. This
      variable is then saved in the __trace_printk_fmt section.
      
      The function hold_module_trace_bprintk_format() checks to see if duplicate
      formats are used by modules, and reuses them if so (saves them to the list
      if it is new). But this function calls lookup_format() that does a strcmp()
      to the value (which is now NULL) and can cause a kernel oops.
      
      This wasn't an issue till 3debb0a9 ("tracing: Fix trace_printk() to print
      when not using bprintk()") which added "__used" to the trace_printk_fmt
      variable, and before that, the kernel simply optimized it out (no NULL value
      was saved).
      
      The fix is simply to handle the NULL pointer in lookup_format() and have the
      caller ignore the value if it was NULL.
      
      Link: http://lkml.kernel.org/r/1464769870-18344-1-git-send-email-zhengjun.xing@intel.comReported-by: default avatarxingzhen <zhengjun.xing@intel.com>
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Fixes: 3debb0a9 ("tracing: Fix trace_printk() to print when not using bprintk()")
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b16e8932
    • Xiubo Li's avatar
      kvm: Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES · 08ad57e2
      Xiubo Li authored
      commit caf1ff26 upstream.
      
      These days, we experienced one guest crash with 8 cores and 3 disks,
      with qemu error logs as bellow:
      
      qemu-system-x86_64: /build/qemu-2.0.0/kvm-all.c:984:
      kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
      
      And then we found one patch(bdf026317d) in qemu tree, which said
      could fix this bug.
      
      Execute the following script will reproduce the BUG quickly:
      
      irq_affinity.sh
      ========================================================================
      
      vda_irq_num=25
      vdb_irq_num=27
      while [ 1 ]
      do
          for irq in {1,2,4,8,10,20,40,80}
              do
                  echo $irq > /proc/irq/$vda_irq_num/smp_affinity
                  echo $irq > /proc/irq/$vdb_irq_num/smp_affinity
                  dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct
                  dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct
              done
      done
      ========================================================================
      
      The following qemu log is added in the qemu code and is displayed when
      this bug reproduced:
      
      kvm_irqchip_commit_routes: max gsi: 1008, nr_allocated_irq_routes: 1024,
      irq_routes->nr: 1024, gsi_count: 1024.
      
      That's to say when irq_routes->nr == 1024, there are 1024 routing entries,
      but in the kernel code when routes->nr >= 1024, will just return -EINVAL;
      
      The nr is the number of the routing entries which is in of
      [1 ~ KVM_MAX_IRQ_ROUTES], not the index in [0 ~ KVM_MAX_IRQ_ROUTES - 1].
      
      This patch fix the BUG above.
      Signed-off-by: default avatarXiubo Li <lixiubo@cmss.chinamobile.com>
      Signed-off-by: default avatarWei Tang <tangwei@cmss.chinamobile.com>
      Signed-off-by: default avatarZhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      08ad57e2
    • Ilya Dryomov's avatar
      libceph: apply new_state before new_up_client on incrementals · bbc3aa6b
      Ilya Dryomov authored
      commit 930c5328 upstream.
      
      Currently, osd_weight and osd_state fields are updated in the encoding
      order.  This is wrong, because an incremental map may look like e.g.
      
          new_up_client: { osd=6, addr=... } # set osd_state and addr
          new_state: { osd=6, xorstate=EXISTS } # clear osd_state
      
      Suppose osd6's current osd_state is EXISTS (i.e. osd6 is down).  After
      applying new_up_client, osd_state is changed to EXISTS | UP.  Carrying
      on with the new_state update, we flip EXISTS and leave osd6 in a weird
      "!EXISTS but UP" state.  A non-existent OSD is considered down by the
      mapping code
      
      2087    for (i = 0; i < pg->pg_temp.len; i++) {
      2088            if (ceph_osd_is_down(osdmap, pg->pg_temp.osds[i])) {
      2089                    if (ceph_can_shift_osds(pi))
      2090                            continue;
      2091
      2092                    temp->osds[temp->size++] = CRUSH_ITEM_NONE;
      
      and so requests get directed to the second OSD in the set instead of
      the first, resulting in OSD-side errors like:
      
      [WRN] : client.4239 192.168.122.21:0/2444980242 misdirected client.4239.1:2827 pg 2.5df899f2 to osd.4 not [1,4,6] in e680/680
      
      and hung rbds on the client:
      
      [  493.566367] rbd: rbd0: write 400000 at 11cc00000 (0)
      [  493.566805] rbd: rbd0:   result -6 xferred 400000
      [  493.567011] blk_update_request: I/O error, dev rbd0, sector 9330688
      
      The fix is to decouple application from the decoding and:
      - apply new_weight first
      - apply new_state before new_up_client
      - twiddle osd_state flags if marking in
      - clear out some of the state if osd is destroyed
      
      Fixes: http://tracker.ceph.com/issues/14901Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarJosh Durgin <jdurgin@redhat.com>
      [idryomov@gmail.com: backport to 3.10-3.14: strip primary-affinity]
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      bbc3aa6b