1. 10 Feb, 2017 40 commits
    • Willy Tarreau's avatar
      Linux 3.10.105 · ec55e7c2
      Willy Tarreau authored
      ec55e7c2
    • Guenter Roeck's avatar
      metag: Only define atomic_dec_if_positive conditionally · 63d7f335
      Guenter Roeck authored
      commit 35d04077 upstream.
      
      The definition of atomic_dec_if_positive() assumes that
      atomic_sub_if_positive() exists, which is only the case if
      metag specific atomics are used. This results in the following
      build error when trying to build metag1_defconfig.
      
      kernel/ucount.c: In function 'dec_ucount':
      kernel/ucount.c:211: error:
      	implicit declaration of function 'atomic_sub_if_positive'
      
      Moving the definition of atomic_dec_if_positive() into the metag
      conditional code fixes the problem.
      
      Fixes: 6006c0d8 ("metag: Atomics, locks and bitops")
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      63d7f335
    • Max Staudt's avatar
      fbdev/efifb: Fix 16 color palette entry calculation · 16654a56
      Max Staudt authored
      commit d50b3f43 upstream.
      
      When using efifb with a 16-bit (5:6:5) visual, fbcon's text is rendered
      in the wrong colors - e.g. text gray (#aaaaaa) is rendered as green
      (#50bc50) and neighboring pixels have slightly different values
      (such as #50bc78).
      
      The reason is that fbcon loads its 16 color palette through
      efifb_setcolreg(), which in turn calculates a 32-bit value to write
      into memory for each palette index.
      Until now, this code could only handle 8-bit visuals and didn't mask
      overlapping values when ORing them.
      
      With this patch, fbcon displays the correct colors when a qemu VM is
      booted in 16-bit mode (in GRUB: "set gfxpayload=800x600x16").
      
      Fixes: 7c83172b ("x86_64 EFI boot support: EFI frame buffer driver")  # v2.6.24+
      Signed-off-by: default avatarMax Staudt <mstaudt@suse.de>
      Acked-By: default avatarPeter Jones <pjones@redhat.com>
      Signed-off-by: default avatarTomi Valkeinen <tomi.valkeinen@ti.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      16654a56
    • Bart Van Assche's avatar
      dm: mark request_queue dead before destroying the DM device · fa88cd79
      Bart Van Assche authored
      commit 3b785fbc upstream.
      
      This avoids that new requests are queued while __dm_destroy() is in
      progress.
      
      [js] use md->queue instead of non-present helper
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      fa88cd79
    • Jan Remmet's avatar
      regulator: tps65910: Work around silicon erratum SWCZ010 · 7fbbad13
      Jan Remmet authored
      commit 8f9165c9 upstream.
      
      http://www.ti.com/lit/pdf/SWCZ010:
        DCDC o/p voltage can go higher than programmed value
      
      Impact:
      VDDI, VDD2, and VIO output programmed voltage level can go higher than
      expected or crash, when coming out of PFM to PWM mode or using DVFS.
      
      Description:
      When DCDC CLK SYNC bits are 11/01:
      * VIO 3-MHz oscillator is the source clock of the digital core and input
        clock of VDD1 and VDD2
      * Turn-on of VDD1 and VDD2 HSD PFETis synchronized or at a constant
        phase shift
      * Current pulled though VCC1+VCC2 is Iload(VDD1) + Iload(VDD2)
      * The 3 HSD PFET will be turned-on at the same time, causing the highest
        possible switching noise on the application. This noise level depends
        on the layout, the VBAT level, and the load current. The noise level
        increases with improper layout.
      
      When DCDC CLK SYNC bits are 00:
      * VIO 3-MHz oscillator is the source clock of digital core
      * VDD1 and VDD2 are running on their own 3-MHz oscillator
      * Current pulled though VCC1+VCC2 average of Iload(VDD1) + Iload(VDD2)
      * The switching noise of the 3 SMPS will be randomly spread over time,
        causing lower overall switching noise.
      
      Workaround:
      Set DCDCCTRL_REG[1:0]= 00.
      Signed-off-by: default avatarJan Remmet <j.remmet@phytec.de>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      7fbbad13
    • Peter Ujfalusi's avatar
      ASoC: omap-mcpdm: Fix irq resource handling · bc90bffd
      Peter Ujfalusi authored
      commit a8719670 upstream.
      
      Fixes: ddd17531 ("ASoC: omap-mcpdm: Clean up with devm_* function")
      
      Managed irq request will not doing any good in ASoC probe level as it is
      not going to free up the irq when the driver is unbound from the sound
      card.
      Signed-off-by: default avatarPeter Ujfalusi <peter.ujfalusi@ti.com>
      Reported-by: default avatarRussell King <linux@armlinux.org.uk>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      bc90bffd
    • Dan Carpenter's avatar
      mfd: 88pm80x: Double shifting bug in suspend/resume · a28d8358
      Dan Carpenter authored
      commit 9a6dc644 upstream.
      
      set_bit() and clear_bit() take the bit number so this code is really
      doing "1 << (1 << irq)" which is a double shift bug.  It's done
      consistently so it won't cause a problem unless "irq" is more than 4.
      
      Fixes: 70c6cce0 ('mfd: Support 88pm80x in 80x driver')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a28d8358
    • Andrey Ryabinin's avatar
      mpi: Fix NULL ptr dereference in mpi_powm() [ver #3] · 940d1175
      Andrey Ryabinin authored
      commit f5527fff upstream.
      
      This fixes CVE-2016-8650.
      
      If mpi_powm() is given a zero exponent, it wants to immediately return
      either 1 or 0, depending on the modulus.  However, if the result was
      initalised with zero limb space, no limbs space is allocated and a
      NULL-pointer exception ensues.
      
      Fix this by allocating a minimal amount of limb space for the result when
      the 0-exponent case when the result is 1 and not touching the limb space
      when the result is 0.
      
      This affects the use of RSA keys and X.509 certificates that carry them.
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
      PGD 0
      Oops: 0002 [#1] SMP
      Modules linked in:
      CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      task: ffff8804011944c0 task.stack: ffff880401294000
      RIP: 0010:[<ffffffff8138ce5d>]  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
      RSP: 0018:ffff880401297ad8  EFLAGS: 00010212
      RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0
      RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0
      RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000
      R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50
      FS:  00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0
      Stack:
       ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4
       0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30
       ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8
      Call Trace:
       [<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66
       [<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d
       [<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd
       [<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146
       [<ffffffff8132a95c>] rsa_verify+0x9d/0xee
       [<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb
       [<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1
       [<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228
       [<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4
       [<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1
       [<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1
       [<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61
       [<ffffffff812fc9f3>] key_create_or_update+0x145/0x399
       [<ffffffff812fe227>] SyS_add_key+0x154/0x19e
       [<ffffffff81001c2b>] do_syscall_64+0x80/0x191
       [<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25
      Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f
      RIP  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
       RSP <ffff880401297ad8>
      CR2: 0000000000000000
      ---[ end trace d82015255d4a5d8d ]---
      
      Basically, this is a backport of a libgcrypt patch:
      
      	http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526
      
      Fixes: cdec9cb5 ("crypto: GnuPG based MPI lib - source files (part 1)")
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
      cc: linux-ima-devel@lists.sourceforge.net
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      940d1175
    • Michael Walle's avatar
      hwmon: (adt7411) set bit 3 in CFG1 register · 2c8c5e67
      Michael Walle authored
      commit b53893aa upstream.
      
      According to the datasheet you should only write 1 to this bit. If it is
      not set, at least AIN3 will return bad values on newer silicon revisions.
      
      Fixes: d84ca5b3 ("hwmon: Add driver for ADT7411 voltage and temperature sensor")
      Signed-off-by: default avatarMichael Walle <michael@walle.cc>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2c8c5e67
    • Sergei Miroshnichenko's avatar
      can: dev: fix deadlock reported after bus-off · ebee2e2f
      Sergei Miroshnichenko authored
      commit 9abefcb1 upstream.
      
      A timer was used to restart after the bus-off state, leading to a
      relatively large can_restart() executed in an interrupt context,
      which in turn sets up pinctrl. When this happens during system boot,
      there is a high probability of grabbing the pinctrl_list_mutex,
      which is locked already by the probe() of other device, making the
      kernel suspect a deadlock condition [1].
      
      To resolve this issue, the restart_timer is replaced by a delayed
      work.
      
      [1] https://github.com/victronenergy/venus/issues/24Signed-off-by: default avatarSergei Miroshnichenko <sergeimir@emcraft.com>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      ebee2e2f
    • zhong jiang's avatar
      mm,ksm: fix endless looping in allocating memory when ksm enable · 3419dc51
      zhong jiang authored
      commit 5b398e41 upstream.
      
      I hit the following hung task when runing a OOM LTP test case with 4.1
      kernel.
      
      Call trace:
      [<ffffffc000086a88>] __switch_to+0x74/0x8c
      [<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
      [<ffffffc000a1c09c>] schedule+0x3c/0x94
      [<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
      [<ffffffc000a1e32c>] down_write+0x64/0x80
      [<ffffffc00021f794>] __ksm_exit+0x90/0x19c
      [<ffffffc0000be650>] mmput+0x118/0x11c
      [<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
      [<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
      [<ffffffc0000d0f34>] get_signal+0x444/0x5e0
      [<ffffffc000089fcc>] do_signal+0x1d8/0x450
      [<ffffffc00008a35c>] do_notify_resume+0x70/0x78
      
      The oom victim cannot terminate because it needs to take mmap_sem for
      write while the lock is held by ksmd for read which loops in the page
      allocator
      
      ksm_do_scan
      	scan_get_next_rmap_item
      		down_read
      		get_next_rmap_item
      			alloc_rmap_item   #ksmd will loop permanently.
      
      There is no way forward because the oom victim cannot release any memory
      in 4.1 based kernel.  Since 4.6 we have the oom reaper which would solve
      this problem because it would release the memory asynchronously.
      Nevertheless we can relax alloc_rmap_item requirements and use
      __GFP_NORETRY because the allocation failure is acceptable as ksm_do_scan
      would just retry later after the lock got dropped.
      
      Such a patch would be also easy to backport to older stable kernels which
      do not have oom_reaper.
      
      While we are at it add GFP_NOWARN so the admin doesn't have to be alarmed
      by the allocation failure.
      
      Link: http://lkml.kernel.org/r/1474165570-44398-1-git-send-email-zhongjiang@huawei.comSigned-off-by: default avatarzhong jiang <zhongjiang@huawei.com>
      Suggested-by: default avatarHugh Dickins <hughd@google.com>
      Suggested-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      3419dc51
    • Mike Snitzer's avatar
      dm flakey: fix reads to be issued if drop_writes configured · 46049e5c
      Mike Snitzer authored
      commit 299f6230 upstream.
      
      v4.8-rc3 commit 99f3c90d ("dm flakey: error READ bios during the
      down_interval") overlooked the 'drop_writes' feature, which is meant to
      allow reads to be issued rather than errored, during the down_interval.
      
      Fixes: 99f3c90d ("dm flakey: error READ bios during the down_interval")
      Reported-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      46049e5c
    • Chris Metcalf's avatar
      tile: avoid using clocksource_cyc2ns with absolute cycle count · 04be31f5
      Chris Metcalf authored
      commit e658a6f1 upstream.
      
      For large values of "mult" and long uptimes, the intermediate
      result of "cycles * mult" can overflow 64 bits.  For example,
      the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
      we have mult = 853, and after 208.5 days, we overflow 64 bits.
      
      Since clocksource_cyc2ns() is intended to be used for relative
      cycle counts, not absolute cycle counts, performance is more
      importance than accepting a wider range of cycle values.  So,
      just use mult_frac() directly in tile's sched_clock().
      
      Commit 4cecf6d4 ("sched, x86: Avoid unnecessary overflow
      in sched_clock") by Salman Qazi results in essentially the same
      generated code for x86 as this change does for tile.  In fact,
      a follow-on change by Salman introduced mult_frac() and switched
      to using it, so the C code was largely identical at that point too.
      
      Peter Zijlstra then added mul_u64_u32_shr() and switched x86
      to use it.  This is, in principle, better; by optimizing the
      64x64->64 multiplies to be 32x32->64 multiplies we can potentially
      save some time.  However, the compiler piplines the 64x64->64
      multiplies pretty well, and the conditional branch in the generic
      mul_u64_u32_shr() causes some bubbles in execution, with the
      result that it's pretty much a wash.  If tilegx provided its own
      implementation of mul_u64_u32_shr() without the conditional branch,
      we could potentially save 3 cycles, but that seems like small gain
      for a fair amount of additional build scaffolding; no other platform
      currently provides a mul_u64_u32_shr() override, and tile doesn't
      currently have an <asm/div64.h> header to put the override in.
      
      Additionally, gcc currently has an optimization bug that prevents
      it from recognizing the opportunity to use a 32x32->64 multiply,
      and so the result would be no better than the existing mult_frac()
      until such time as the compiler is fixed.
      
      For now, just using mult_frac() seems like the right answer.
      Signed-off-by: default avatarChris Metcalf <cmetcalf@mellanox.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      04be31f5
    • Myron Stowe's avatar
      PCI: Handle read-only BARs on AMD CS553x devices · 569d62a3
      Myron Stowe authored
      commit 06cf35f9 upstream.
      
      Some AMD CS553x devices have read-only BARs because of a firmware or
      hardware defect.  There's a workaround in quirk_cs5536_vsa(), but it no
      longer works after 36e81648 ("PCI: Restore detection of read-only
      BARs").  Prior to 36e81648, we filled in res->start; afterwards we
      leave it zeroed out.  The quirk only updated the size, so the driver tried
      to use a region starting at zero, which didn't work.
      
      Expand quirk_cs5536_vsa() to read the base addresses from the BARs and
      hard-code the sizes.
      
      On Nix's system BAR 2's read-only value is 0x6200.  Prior to 36e81648,
      we interpret that as a 512-byte BAR based on the lowest-order bit set.  Per
      datasheet sec 5.6.1, that BAR (MFGPT) requires only 64 bytes; use that to
      avoid clearing any address bits if a platform uses only 64-byte alignment.
      
      [js] pcibios_bus_to_resource takes pdev, not bus in 3.12
      
      [bhelgaas: changelog, reduce BAR 2 size to 64]
      Fixes: 36e81648 ("PCI: Restore detection of read-only BARs")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=85991#c4
      Link: http://support.amd.com/TechDocs/31506_cs5535_databook.pdf
      Link: http://support.amd.com/TechDocs/33238G_cs5536_db.pdfReported-and-tested-by: default avatarNix <nix@esperi.org.uk>
      Signed-off-by: default avatarMyron Stowe <myron.stowe@redhat.com>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      569d62a3
    • Punit Agrawal's avatar
      ACPI / APEI: Fix incorrect return value of ghes_proc() · bdb29c7b
      Punit Agrawal authored
      commit 806487a8 upstream.
      
      Although ghes_proc() tests for errors while reading the error status,
      it always return success (0). Fix this by propagating the return
      value.
      
      Fixes: d334a491 (ACPI, APEI, Generic Hardware Error Source memory error support)
      Signed-of-by: default avatarPunit Agrawal <punit.agrawa.@arm.com>
      Tested-by: default avatarTyler Baicar <tbaicar@codeaurora.org>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      [ rjw: Subject ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      bdb29c7b
    • Alexander Usyskin's avatar
      mei: bus: fix received data size check in NFC fixup · 2ef72292
      Alexander Usyskin authored
      commit 582ab27a upstream.
      
      NFC version reply size checked against only header size, not against
      full message size. That may lead potentially to uninitialized memory access
      in version data.
      
      That leads to warnings when version data is accessed:
      drivers/misc/mei/bus-fixup.c: warning: '*((void *)&ver+11)' may be used uninitialized in this function [-Wuninitialized]:  => 212:2
      
      Reported in
      Build regressions/improvements in v4.9-rc3
      https://lkml.org/lkml/2016/10/30/57
      
      [js] the check is in 3.12 only once
      
      Fixes: 59fcd7c6 (mei: nfc: Initial nfc implementation)
      Signed-off-by: default avatarAlexander Usyskin <alexander.usyskin@intel.com>
      Signed-off-by: default avatarTomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      2ef72292
    • Arnd Bergmann's avatar
      staging: iio: ad5933: avoid uninitialized variable in error case · 0b0d4fc3
      Arnd Bergmann authored
      commit 34eee70a upstream.
      
      The ad5933_i2c_read function returns an error code to indicate
      whether it could read data or not. However ad5933_work() ignores
      this return code and just accesses the data unconditionally,
      which gets detected by gcc as a possible bug:
      
      drivers/staging/iio/impedance-analyzer/ad5933.c: In function 'ad5933_work':
      drivers/staging/iio/impedance-analyzer/ad5933.c:649:16: warning: 'status' may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      This adds minimal error handling so we only evaluate the
      data if it was correctly read.
      
      Link: https://patchwork.kernel.org/patch/8110281/Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarLars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      0b0d4fc3
    • Long Li's avatar
      hv: do not lose pending heartbeat vmbus packets · c670aaed
      Long Li authored
      commit 407a3aee upstream.
      
      The host keeps sending heartbeat packets independent of the
      guest responding to them.  Even though we respond to the heartbeat messages at
      interrupt level, we can have situations where there maybe multiple heartbeat
      messages pending that have not been responded to. For instance this occurs when the
      VM is paused and the host continues to send the heartbeat messages.
      Address this issue by draining and responding to all
      the heartbeat messages that maybe pending.
      Signed-off-by: default avatarLong Li <longli@microsoft.com>
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      c670aaed
    • David Howells's avatar
      KEYS: Fix short sprintf buffer in /proc/keys show function · b9eabdf0
      David Howells authored
      commit 03dab869 upstream.
      
      This fixes CVE-2016-7042.
      
      Fix a short sprintf buffer in proc_keys_show().  If the gcc stack protector
      is turned on, this can cause a panic due to stack corruption.
      
      The problem is that xbuf[] is not big enough to hold a 64-bit timeout
      rendered as weeks:
      
      	(gdb) p 0xffffffffffffffffULL/(60*60*24*7)
      	$2 = 30500568904943
      
      That's 14 chars plus NUL, not 11 chars plus NUL.
      
      Expand the buffer to 16 chars.
      
      I think the unpatched code apparently works if the stack-protector is not
      enabled because on a 32-bit machine the buffer won't be overflowed and on a
      64-bit machine there's a 64-bit aligned pointer at one side and an int that
      isn't checked again on the other side.
      
      The panic incurred looks something like:
      
      Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81352ebe
      CPU: 0 PID: 1692 Comm: reproducer Not tainted 4.7.2-201.fc24.x86_64 #1
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       0000000000000086 00000000fbbd2679 ffff8800a044bc00 ffffffff813d941f
       ffffffff81a28d58 ffff8800a044bc98 ffff8800a044bc88 ffffffff811b2cb6
       ffff880000000010 ffff8800a044bc98 ffff8800a044bc30 00000000fbbd2679
      Call Trace:
       [<ffffffff813d941f>] dump_stack+0x63/0x84
       [<ffffffff811b2cb6>] panic+0xde/0x22a
       [<ffffffff81352ebe>] ? proc_keys_show+0x3ce/0x3d0
       [<ffffffff8109f7f9>] __stack_chk_fail+0x19/0x30
       [<ffffffff81352ebe>] proc_keys_show+0x3ce/0x3d0
       [<ffffffff81350410>] ? key_validate+0x50/0x50
       [<ffffffff8134db30>] ? key_default_cmp+0x20/0x20
       [<ffffffff8126b31c>] seq_read+0x2cc/0x390
       [<ffffffff812b6b12>] proc_reg_read+0x42/0x70
       [<ffffffff81244fc7>] __vfs_read+0x37/0x150
       [<ffffffff81357020>] ? security_file_permission+0xa0/0xc0
       [<ffffffff81246156>] vfs_read+0x96/0x130
       [<ffffffff81247635>] SyS_read+0x55/0xc0
       [<ffffffff817eb872>] entry_SYSCALL_64_fastpath+0x1a/0xa4
      Reported-by: default avatarOndrej Kozina <okozina@redhat.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Tested-by: default avatarOndrej Kozina <okozina@redhat.com>
      Signed-off-by: default avatarJames Morris <james.l.morris@oracle.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      b9eabdf0
    • Jan Viktorin's avatar
      uio: fix dmem_region_start computation · a6e9fafc
      Jan Viktorin authored
      commit 4d31a258 upstream.
      
      The variable i contains a total number of resources (including
      IORESOURCE_IRQ). However, we want the dmem_region_start to point
      after the last resource of type IORESOURCE_MEM. The original behaviour
      leads (very likely) to skipping several UIO mapping regions and makes
      them useless. Fix this by computing dmem_region_start from the uiomem
      which points to the last used UIO mapping.
      
      Fixes: 0a0c3b5a ("Add new uio device for dynamic memory allocation")
      Signed-off-by: default avatarJan Viktorin <viktorin@rehivetech.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      a6e9fafc
    • Liu Gang's avatar
      gpio: mpc8xxx: Correct irq handler function · 4499fcbd
      Liu Gang authored
      commit d71cf15b upstream.
      
      From the beginning of the gpio-mpc8xxx.c, the "handle_level_irq"
      has being used to handle GPIO interrupts in the PowerPC/Layerscape
      platforms. But actually, almost all PowerPC/Layerscape platforms
      assert an interrupt request upon either a high-to-low change or
      any change on the state of the signal.
      
      So the "handle_level_irq" is not reasonable for PowerPC/Layerscape
      GPIO interrupt, it should be "handle_edge_irq". Otherwise the system
      may lost some interrupts from the PIN's state changes.
      Signed-off-by: default avatarLiu Gang <Gang.Liu@nxp.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      4499fcbd
    • Mauro Carvalho Chehab's avatar
      cx231xx: fix GPIOs for Pixelview SBTVD hybrid · dd228932
      Mauro Carvalho Chehab authored
      commit 24b923f0 upstream.
      
      This device uses GPIOs: 28 to switch between analog and
      digital modes: on digital mode, it should be set to 1.
      
      The code that sets it on analog mode is OK, but it misses
      the logic that sets it on digital mode.
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      dd228932
    • Mauro Carvalho Chehab's avatar
      cx231xx: don't return error on success · 9865f089
      Mauro Carvalho Chehab authored
      commit 1871d718 upstream.
      
      The cx231xx_set_agc_analog_digital_mux_select() callers
      expect it to return 0 or an error. Returning a positive value
      makes the first attempt to switch between analog/digital to fail.
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      9865f089
    • Mauro Carvalho Chehab's avatar
      mb86a20s: fix demod settings · e3ba79c7
      Mauro Carvalho Chehab authored
      commit 505a0ea7 upstream.
      
      With the current settings, only one channel locks properly.
      That's likely because, when this driver was written, Brazil
      were still using experimental transmissions.
      
      Change it to reproduce the settings used by the newer drivers.
      That makes it lock on other channels.
      
      Tested with both PixelView SBTVD Hybrid (cx231xx-based) and
      C3Tech Digital Duo HDTV/SDTV (em28xx-based) devices.
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      e3ba79c7
    • Mauro Carvalho Chehab's avatar
      mb86a20s: fix the locking logic · e80cf277
      Mauro Carvalho Chehab authored
      commit dafb65fb upstream.
      
      On this frontend, it takes a while to start output normal
      TS data. That only happens on state S9. On S8, the TS output
      is enabled, but it is not reliable enough.
      
      However, the zigzag loop is too fast to let it sync.
      
      As, on practical tests, the zigzag software loop doesn't
      seem to be helping, but just slowing down the tuning, let's
      switch to hardware algorithm, as the tuners used on such
      devices are capable of work with frequency drifts without
      any help from software.
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      e80cf277
    • Andrew Bresticker's avatar
      pstore/ram: Use memcpy_fromio() to save old buffer · 8996ef00
      Andrew Bresticker authored
      commit d771fdf9 upstream.
      
      The ramoops buffer may be mapped as either I/O memory or uncached
      memory.  On ARM64, this results in a device-type (strongly-ordered)
      mapping.  Since unnaligned accesses to device-type memory will
      generate an alignment fault (regardless of whether or not strict
      alignment checking is enabled), it is not safe to use memcpy().
      memcpy_fromio() is guaranteed to only use aligned accesses, so use
      that instead.
      Signed-off-by: default avatarAndrew Bresticker <abrestic@chromium.org>
      Signed-off-by: default avatarEnric Balletbo Serra <enric.balletbo@collabora.com>
      Reviewed-by: default avatarPuneet Kumar <puneetster@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      8996ef00
    • Furquan Shaikh's avatar
      pstore/ram: Use memcpy_toio instead of memcpy · 0d29b98a
      Furquan Shaikh authored
      commit 7e75678d upstream.
      
      persistent_ram_update uses vmap / iomap based on whether the buffer is in
      memory region or reserved region. However, both map it as non-cacheable
      memory. For armv8 specifically, non-cacheable mapping requests use a
      memory type that has to be accessed aligned to the request size. memcpy()
      doesn't guarantee that.
      Signed-off-by: default avatarFurquan Shaikh <furquan@google.com>
      Signed-off-by: default avatarEnric Balletbo Serra <enric.balletbo@collabora.com>
      Reviewed-by: default avatarAaron Durbin <adurbin@chromium.org>
      Reviewed-by: default avatarOlof Johansson <olofj@chromium.org>
      Tested-by: default avatarFurquan Shaikh <furquan@chromium.org>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      0d29b98a
    • Sebastian Andrzej Siewior's avatar
      pstore/core: drop cmpxchg based updates · 90efe9d4
      Sebastian Andrzej Siewior authored
      commit d5a9bf0b upstream.
      
      I have here a FPGA behind PCIe which exports SRAM which I use for
      pstore. Now it seems that the FPGA no longer supports cmpxchg based
      updates and writes back 0xff…ff and returns the same.  This leads to
      crash during crash rendering pstore useless.
      Since I doubt that there is much benefit from using cmpxchg() here, I am
      dropping this atomic access and use the spinlock based version.
      
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Rabin Vincent <rabinv@axis.com>
      Tested-by: default avatarRabin Vincent <rabinv@axis.com>
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: default avatarGuenter Roeck <linux@roeck-us.net>
      [kees: remove "_locked" suffix since it's the only option now]
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      90efe9d4
    • Daniel Glöckner's avatar
      mmc: block: don't use CMD23 with very old MMC cards · 98af342b
      Daniel Glöckner authored
      commit 0ed50abb upstream.
      
      CMD23 aka SET_BLOCK_COUNT was introduced with MMC v3.1.
      Older versions of the specification allowed to terminate
      multi-block transfers only with CMD12.
      
      The patch fixes the following problem:
      
        mmc0: new MMC card at address 0001
        mmcblk0: mmc0:0001 SDMB-16 15.3 MiB
        mmcblk0: timed out sending SET_BLOCK_COUNT command, card status 0x400900
        ...
        blk_update_request: I/O error, dev mmcblk0, sector 0
        Buffer I/O error on dev mmcblk0, logical block 0, async page read
         mmcblk0: unable to read partition table
      Signed-off-by: default avatarDaniel Glöckner <dg@emlix.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      98af342b
    • Fabio Estevam's avatar
      mmc: mxs: Initialize the spinlock prior to using it · 0d9dddc0
      Fabio Estevam authored
      commit f91346e8 upstream.
      
      An interrupt may occur right after devm_request_irq() is called and
      prior to the spinlock initialization, leading to a kernel oops,
      as the interrupt handler uses the spinlock.
      
      In order to prevent this problem, move the spinlock initialization
      prior to requesting the interrupts.
      
      Fixes: e4243f13 (mmc: mxs-mmc: add mmc host driver for i.MX23/28)
      Signed-off-by: default avatarFabio Estevam <fabio.estevam@nxp.com>
      Reviewed-by: default avatarMarek Vasut <marex@denx.de>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      0d9dddc0
    • Johan Hovold's avatar
      PM / sleep: fix device reference leak in test_suspend · 6d97f02a
      Johan Hovold authored
      commit ceb75787 upstream.
      
      Make sure to drop the reference taken by class_find_device() after
      opening the RTC device.
      
      Fixes: 77437fd4 (pm: boot time suspend selftest)
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      6d97f02a
    • Johan Hovold's avatar
      mfd: core: Fix device reference leak in mfd_clone_cell · 26b5a4a5
      Johan Hovold authored
      commit 722f1910 upstream.
      
      Make sure to drop the reference taken by bus_find_device_by_name()
      before returning from mfd_clone_cell().
      
      Fixes: a9bbba99 ("mfd: add platform_device sharing support for mfd")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      26b5a4a5
    • Jaewon Kim's avatar
      ratelimit: fix bug in time interval by resetting right begin time · 479ce88b
      Jaewon Kim authored
      commit c2594bc3 upstream.
      
      rs->begin in ratelimit is set in two cases.
       1) when rs->begin was not initialized
       2) when rs->interval was passed
      
      For case #2, current ratelimit sets the begin to 0.  This incurrs
      improper suppression.  The begin value will be set in the next ratelimit
      call by 1).  Then the time interval check will be always false, and
      rs->printed will not be initialized.  Although enough time passed,
      ratelimit may return 0 if rs->printed is not less than rs->burst.  To
      reset interval properly, begin should be jiffies rather than 0.
      
      For an example code below:
      
          static DEFINE_RATELIMIT_STATE(mylimit, 1, 1);
          for (i = 1; i <= 10; i++) {
              if (__ratelimit(&mylimit))
                  printk("ratelimit test count %d\n", i);
              msleep(3000);
          }
      
      test result in the current code shows suppression even there is 3 seconds sleep.
      
        [  78.391148] ratelimit test count 1
        [  81.295988] ratelimit test count 2
        [  87.315981] ratelimit test count 4
        [  93.336267] ratelimit test count 6
        [  99.356031] ratelimit test count 8
        [ 105.376367] ratelimit test count 10
      Signed-off-by: default avatarJaewon Kim <jaewon31.kim@samsung.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      479ce88b
    • Ding Tianhong's avatar
      rcu: Fix soft lockup for rcu_nocb_kthread · fe5080ab
      Ding Tianhong authored
      commit bedc1969 upstream.
      
      Carrying out the following steps results in a softlockup in the
      RCU callback-offload (rcuo) kthreads:
      
      1. Connect to ixgbevf, and set the speed to 10Gb/s.
      2. Use ifconfig to bring the nic up and down repeatedly.
      
      [  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
      [  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
      [  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
      [  368.106005] RIP: 0010:[<ffffffff81579e04>]  [<ffffffff81579e04>] fib_table_lookup+0x14/0x390
      [  368.106005] RSP: 0018:ffff88061fc83ce8  EFLAGS: 00000286
      [  368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
      [  368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
      [  368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
      [  368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
      [  368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
      [  368.106005] FS:  0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
      [  368.106005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
      [  368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  368.106005] Stack:
      [  368.106005]  00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
      [  368.106005]  ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
      [  368.106005]  ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
      [  368.106005] Call Trace:
      [  368.106005]  <IRQ>
      [  368.106005]
      [  368.106005]  [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0
      [  368.106005]  [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110
      [  368.106005]  [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0
      [  368.106005]  [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350
      [  368.106005]  [<ffffffff81537034>] ip_rcv+0x234/0x380
      [  368.106005]  [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870
      [  368.106005]  [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60
      [  368.106005]  [<ffffffff814fe4de>] process_backlog+0xae/0x180
      [  368.106005]  [<ffffffff814fdcb2>] net_rx_action+0x152/0x240
      [  368.106005]  [<ffffffff81077b3f>] __do_softirq+0xef/0x280
      [  368.106005]  [<ffffffff8161619c>] call_softirq+0x1c/0x30
      [  368.106005]  <EOI>
      [  368.106005]
      [  368.106005]  [<ffffffff81015d95>] do_softirq+0x65/0xa0
      [  368.106005]  [<ffffffff81077174>] local_bh_enable+0x94/0xa0
      [  368.106005]  [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370
      [  368.106005]  [<ffffffff81098250>] ? wake_up_bit+0x30/0x30
      [  368.106005]  [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40
      [  368.106005]  [<ffffffff8109728f>] kthread+0xcf/0xe0
      [  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
      [  368.106005]  [<ffffffff816147d8>] ret_from_fork+0x58/0x90
      [  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
      
      ==================================cut here==============================
      
      It turns out that the rcuos callback-offload kthread is busy processing
      a very large quantity of RCU callbacks, and it is not reliquishing the
      CPU while doing so.  This commit therefore adds an cond_resched_rcu_qs()
      within the loop to allow other tasks to run.
      
      [js] use onlu cond_resched() in 3.12
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Dhaval Giani <dhaval.giani@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      fe5080ab
    • Dan Carpenter's avatar
      tools/vm/slabinfo: fix an unintentional printf · 9db8b648
      Dan Carpenter authored
      commit 2d6a4d64 upstream.
      
      The curly braces are missing here so we print stuff unintentionally.
      
      Fixes: 9da4714a ('slub: slabinfo update for cmpxchg handling')
      Link: http://lkml.kernel.org/r/20160715211243.GE19522@mwandaSigned-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Laura Abbott <labbott@fedoraproject.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      9db8b648
    • Daniel Mentz's avatar
      lib/genalloc.c: start search from start of chunk · f98c2c40
      Daniel Mentz authored
      commit 62e931fa upstream.
      
      gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
      a contiguous block of memory that satisfies the allocation request.
      
      The shortcut
      
      	if (size > atomic_read(&chunk->avail))
      		continue;
      
      makes the loop skip over chunks that do not have enough bytes left to
      fulfill the request.  There are two situations, though, where an
      allocation might still fail:
      
      (1) The available memory is not contiguous, i.e.  the request cannot
          be fulfilled due to external fragmentation.
      
      (2) A race condition.  Another thread runs the same code concurrently
          and is quicker to grab the available memory.
      
      In those situations, the loop calls pool->algo() to search the entire
      chunk, and pool->algo() returns some value that is >= end_bit to
      indicate that the search failed.  This return value is then assigned to
      start_bit.  The variables start_bit and end_bit describe the range that
      should be searched, and this range should be reset for every chunk that
      is searched.  Today, the code fails to reset start_bit to 0.  As a
      result, prefixes of subsequent chunks are ignored.  Memory allocations
      might fail even though there is plenty of room left in these prefixes of
      those other chunks.
      
      Fixes: 7f184275 ("lib, Make gen_pool memory allocator lockless")
      Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.comSigned-off-by: default avatarDaniel Mentz <danielmentz@google.com>
      Reviewed-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      f98c2c40
    • Richard Weinberger's avatar
      drbd: Fix kernel_sendmsg() usage - potential NULL deref · 7a59aff6
      Richard Weinberger authored
      commit d8e9e5e8 upstream.
      
      Don't pass a size larger than iov_len to kernel_sendmsg().
      Otherwise it will cause a NULL pointer deref when kernel_sendmsg()
      returns with rv < size.
      
      DRBD as external module has been around in the kernel 2.4 days already.
      We used to be compatible to 2.4 and very early 2.6 kernels,
      we used to use
       rv = sock_sendmsg(sock, &msg, iov.iov_len);
      then later changed to
       rv = kernel_sendmsg(sock, &msg, &iov, 1, size);
      when we should have used
       rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len);
      
      tcp_sendmsg() used to totally ignore the size parameter.
       57be5bda ip: convert tcp_sendmsg() to iov_iter primitives
      changes that, and exposes our long standing error.
      
      Even with this error exposed, to trigger the bug, we would need to have
      an environment (config or otherwise) causing us to not use sendpage()
      for larger transfers, a failing connection, and have it fail "just at the
      right time".  Apparently that was unlikely enough for most, so this went
      unnoticed for years.
      
      Still, it is known to trigger at least some of these,
      and suspected for the others:
      [0] http://lists.linbit.com/pipermail/drbd-user/2016-July/023112.html
      [1] http://lists.linbit.com/pipermail/drbd-dev/2016-March/003362.html
      [2] https://forums.grsecurity.net/viewtopic.php?f=3&t=4546
      [3] https://ubuntuforums.org/showthread.php?t=2336150
      [4] http://e2.howsolveproblem.com/i/1175162/
      
      This should go into 4.9,
      and into all stable branches since and including v4.0,
      which is the first to contain the exposing change.
      
      It is correct for all stable branches older than that as well
      (which contain the DRBD driver; which is 2.6.33 and up).
      
      It requires a small "conflict" resolution for v4.4 and earlier, with v4.5
      we dropped the comment block immediately preceding the kernel_sendmsg().
      
      Fixes: b411b363 ("The DRBD driver")
      Cc: viro@zeniv.linux.org.uk
      Cc: christoph.lechleitner@iteg.at
      Cc: wolfgang.glas@iteg.at
      Reported-by: default avatarChristoph Lechleitner <christoph.lechleitner@iteg.at>
      Tested-by: default avatarChristoph Lechleitner <christoph.lechleitner@iteg.at>
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      [changed oneliner to be "obvious" without context; more verbose message]
      Signed-off-by: default avatarLars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      7a59aff6
    • Glauber Costa's avatar
      cfq: fix starvation of asynchronous writes · 5dd23e67
      Glauber Costa authored
      commit 3932a86b upstream.
      
      While debugging timeouts happening in my application workload (ScyllaDB), I have
      observed calls to open() taking a long time, ranging everywhere from 2 seconds -
      the first ones that are enough to time out my application - to more than 30
      seconds.
      
      The problem seems to happen because XFS may block on pending metadata updates
      under certain circumnstances, and that's confirmed with the following backtrace
      taken by the offcputime tool (iovisor/bcc):
      
          ffffffffb90c57b1 finish_task_switch
          ffffffffb97dffb5 schedule
          ffffffffb97e310c schedule_timeout
          ffffffffb97e1f12 __down
          ffffffffb90ea821 down
          ffffffffc046a9dc xfs_buf_lock
          ffffffffc046abfb _xfs_buf_find
          ffffffffc046ae4a xfs_buf_get_map
          ffffffffc046babd xfs_buf_read_map
          ffffffffc0499931 xfs_trans_read_buf_map
          ffffffffc044a561 xfs_da_read_buf
          ffffffffc0451390 xfs_dir3_leaf_read.constprop.16
          ffffffffc0452b90 xfs_dir2_leaf_lookup_int
          ffffffffc0452e0f xfs_dir2_leaf_lookup
          ffffffffc044d9d3 xfs_dir_lookup
          ffffffffc047d1d9 xfs_lookup
          ffffffffc0479e53 xfs_vn_lookup
          ffffffffb925347a path_openat
          ffffffffb9254a71 do_filp_open
          ffffffffb9242a94 do_sys_open
          ffffffffb9242b9e sys_open
          ffffffffb97e42b2 entry_SYSCALL_64_fastpath
          00007fb0698162ed [unknown]
      
      Inspecting my run with blktrace, I can see that the xfsaild kthread exhibit very
      high "Dispatch wait" times, on the dozens of seconds range and consistent with
      the open() times I have saw in that run.
      
      Still from the blktrace output, we can after searching a bit, identify the
      request that wasn't dispatched:
      
        8,0   11      152    81.092472813   804  A  WM 141698288 + 8 <- (8,1) 141696240
        8,0   11      153    81.092472889   804  Q  WM 141698288 + 8 [xfsaild/sda1]
        8,0   11      154    81.092473207   804  G  WM 141698288 + 8 [xfsaild/sda1]
        8,0   11      206    81.092496118   804  I  WM 141698288 + 8 (   22911) [xfsaild/sda1]
        <==== 'I' means Inserted (into the IO scheduler) ===================================>
        8,0    0   289372    96.718761435     0  D  WM 141698288 + 8 (15626265317) [swapper/0]
        <==== Only 15s later the CFQ scheduler dispatches the request ======================>
      
      As we can see above, in this particular example CFQ took 15 seconds to dispatch
      this request. Going back to the full trace, we can see that the xfsaild queue
      had plenty of opportunity to run, and it was selected as the active queue many
      times. It would just always be preempted by something else (example):
      
        8,0    1        0    81.117912979     0  m   N cfq1618SN / insert_request
        8,0    1        0    81.117913419     0  m   N cfq1618SN / add_to_rr
        8,0    1        0    81.117914044     0  m   N cfq1618SN / preempt
        8,0    1        0    81.117914398     0  m   N cfq767A  / slice expired t=1
        8,0    1        0    81.117914755     0  m   N cfq767A  / resid=40
        8,0    1        0    81.117915340     0  m   N / served: vt=1948520448 min_vt=1948520448
        8,0    1        0    81.117915858     0  m   N cfq767A  / sl_used=1 disp=0 charge=0 iops=1 sect=0
      
      where cfq767 is the xfsaild queue and cfq1618 corresponds to one of the ScyllaDB
      IO dispatchers.
      
      The requests preempting the xfsaild queue are synchronous requests. That's a
      characteristic of ScyllaDB workloads, as we only ever issue O_DIRECT requests.
      While it can be argued that preempting ASYNC requests in favor of SYNC is part
      of the CFQ logic, I don't believe that doing so for 15+ seconds is anyone's
      goal.
      
      Moreover, unless I am misunderstanding something, that breaks the expectation
      set by the "fifo_expire_async" tunable, which in my system is set to the
      default.
      
      Looking at the code, it seems to me that the issue is that after we make
      an async queue active, there is no guarantee that it will execute any request.
      
      When the queue itself tests if it cfq_may_dispatch() it can bail if it sees SYNC
      requests in flight. An incoming request from another queue can also preempt it
      in such situation before we have the chance to execute anything (as seen in the
      trace above).
      
      This patch sets the must_dispatch flag if we notice that we have requests
      that are already fifo_expired. This flag is always cleared after
      cfq_dispatch_request() returns from cfq_dispatch_requests(), so it won't pin
      the queue for subsequent requests (unless they are themselves expired)
      
      Care is taken during preempt to still allow rt requests to preempt us
      regardless.
      
      Testing my workload with this patch applied produces much better results.
      From the application side I see no timeouts, and the open() latency histogram
      generated by systemtap looks much better, with the worst outlier at 131ms:
      
      Latency histogram of xfs_buf_lock acquisition (microseconds):
       value |-------------------------------------------------- count
           0 |                                                     11
           1 |@@@@                                                161
           2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  1966
           4 |@                                                    54
           8 |                                                     36
          16 |                                                      7
          32 |                                                      0
          64 |                                                      0
             ~
        1024 |                                                      0
        2048 |                                                      0
        4096 |                                                      1
        8192 |                                                      1
       16384 |                                                      2
       32768 |                                                      0
       65536 |                                                      0
      131072 |                                                      1
      262144 |                                                      0
      524288 |                                                      0
      Signed-off-by: default avatarGlauber Costa <glauber@scylladb.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: linux-block@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      Signed-off-by: default avatarGlauber Costa <glauber@scylladb.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      5dd23e67
    • Willy Tarreau's avatar
      Revert "ipc/sem.c: optimize sem_lock()" · 3d971ee1
      Willy Tarreau authored
      This reverts commit 901f6fed
      (upstream commit 6d07b68c).
      
      As suggested in commit 5864a2fd
      (ipc/sem.c: fix complex_count vs. simple op race) since it introduces
      a regression and the candidate fix requires too many changes for 3.10.
      
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      3d971ee1
    • Michal Hocko's avatar
      kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd · 9c559b0f
      Michal Hocko authored
      commit 735f2770 upstream.
      
      Commit fec1d011 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal
      exit") has caused a subtle regression in nscd which uses
      CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the
      shared databases, so that the clients are notified when nscd is
      restarted.  Now, when nscd uses a non-persistent database, clients that
      have it mapped keep thinking the database is being updated by nscd, when
      in fact nscd has created a new (anonymous) one (for non-persistent
      databases it uses an unlinked file as backend).
      
      The original proposal for the CLONE_CHILD_CLEARTID change claimed
      (https://lkml.org/lkml/2006/10/25/233):
      
      : The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
      : on behalf of pthread_create() library calls.  This feature is used to
      : request that the kernel clear the thread-id in user space (at an address
      : provided in the syscall) when the thread disassociates itself from the
      : address space, which is done in mm_release().
      :
      : Unfortunately, when a multi-threaded process incurs a core dump (such as
      : from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
      : the other threads, which then proceed to clear their user-space tids
      : before synchronizing in exit_mm() with the start of core dumping.  This
      : misrepresents the state of process's address space at the time of the
      : SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
      : problems (misleading him/her to conclude that the threads had gone away
      : before the fault).
      :
      : The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
      : core dump has been initiated.
      
      The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
      seems to have a larger scope than the original patch asked for.  It
      seems that limitting the scope of the check to core dumping should work
      for SIGSEGV issue describe above.
      
      [Changelog partly based on Andreas' description]
      Fixes: fec1d011 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
      Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Tested-by: default avatarWilliam Preston <wpreston@suse.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Andreas Schwab <schwab@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      9c559b0f