1. 13 Apr, 2018 40 commits
    • Pardha Saradhi K's avatar
      ASoC: Intel: Skylake: Disable clock gating during firmware and library download · 42353500
      Pardha Saradhi K authored
      
      [ Upstream commit d5cc0a1f ]
      
      During firmware and library download, sometimes it is observed that
      firmware and library download is timed-out resulting into probe failure.
      
      This patch disables dynamic clock gating while firmware and library
      download.
      Signed-off-by: default avatarPardha Saradhi K <pardha.saradhi.kesapragada@intel.com>
      Signed-off-by: default avatarSanyog Kale <sanyog.r.kale@intel.com>
      Signed-off-by: default avatarGuneshwor Singh <guneshwor.o.singh@intel.com>
      Acked-By: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      42353500
    • Mauro Carvalho Chehab's avatar
      media: videobuf2-core: don't go out of the buffer range · 1be3eb15
      Mauro Carvalho Chehab authored
      
      [ Upstream commit df93dc61 ]
      
      Currently, there's no check if an invalid buffer range
      is passed. However, while testing DVB memory mapped apps,
      I got this:
      
         videobuf2_core: VB: num_buffers -2143943680, buffer 33, index -2143943647
         unable to handle kernel paging request at ffff888b773c0890
         IP: __vb2_queue_alloc+0x134/0x4e0 [videobuf2_core]
         PGD 4142c7067 P4D 4142c7067 PUD 0
         Oops: 0002 [#1] SMP
         Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables bluetooth rfkill ecdh_generic binfmt_misc rc_dvbsky sp2 ts2020 intel_rapl x86_pkg_temp_thermal dvb_usb_dvbsky intel_powerclamp dvb_usb_v2 coretemp m88ds3103 kvm_intel i2c_mux dvb_core snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul videobuf2_vmalloc videobuf2_memops snd_hda_intel ghash_clmulni_intel videobuf2_core snd_hda_codec rc_core mei_me intel_cstate snd_hwdep snd_hda_core videodev intel_uncore snd_pcm mei media tpm_tis tpm_tis_core intel_rapl_perf tpm snd_timer lpc_ich snd soundcore kvm irqbypass libcrc32c i915 i2c_algo_bit drm_kms_helper
         e1000e ptp drm crc32c_intel video pps_core
         CPU: 3 PID: 1776 Comm: dvbv5-zap Not tainted 4.14.0+ #78
         Hardware name:                  /NUC5i7RYB, BIOS RYBDWi35.86A.0364.2017.0511.0949 05/11/2017
         task: ffff88877c73bc80 task.stack: ffffb7c402418000
         RIP: 0010:__vb2_queue_alloc+0x134/0x4e0 [videobuf2_core]
         RSP: 0018:ffffb7c40241bc60 EFLAGS: 00010246
         RAX: 0000000080360421 RBX: 0000000000000021 RCX: 000000000000000a
         RDX: ffffb7c40241bcf4 RSI: ffff888780362c60 RDI: ffff888796d8e130
         RBP: ffffb7c40241bcc8 R08: 0000000000000316 R09: 0000000000000004
         R10: ffff888780362c00 R11: 0000000000000001 R12: 000000000002f000
         R13: ffff8887758be700 R14: 0000000000021000 R15: 0000000000000001
         FS:  00007f2849024740(0000) GS:ffff888796d80000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         CR2: ffff888b773c0890 CR3: 000000043beb2005 CR4: 00000000003606e0
         Call Trace:
          vb2_core_reqbufs+0x226/0x420 [videobuf2_core]
          dvb_vb2_reqbufs+0x2d/0xc0 [dvb_core]
          dvb_dvr_do_ioctl+0x98/0x1d0 [dvb_core]
          dvb_usercopy+0x53/0x1b0 [dvb_core]
          ? dvb_demux_ioctl+0x20/0x20 [dvb_core]
          ? tty_ldisc_deref+0x16/0x20
          ? tty_write+0x1f9/0x310
          ? process_echoes+0x70/0x70
          dvb_dvr_ioctl+0x15/0x20 [dvb_core]
          do_vfs_ioctl+0xa5/0x600
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x1a/0xa5
         RIP: 0033:0x7f28486f7ea7
         RSP: 002b:00007ffc13b2db18 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
         RAX: ffffffffffffffda RBX: 000055b10fc06130 RCX: 00007f28486f7ea7
         RDX: 00007ffc13b2db48 RSI: 00000000c0086f3c RDI: 0000000000000007
         RBP: 0000000000000203 R08: 000055b10df1e02c R09: 000000000000002e
         R10: 0036b42415108357 R11: 0000000000000246 R12: 0000000000000000
         R13: 00007f2849062f60 R14: 00000000000001f1 R15: 00007ffc13b2da54
         Code: 74 0a 60 8b 0a 48 83 c0 30 48 83 c2 04 89 48 d0 89 48 d4 48 39 f0 75 eb 41 8b 42 08 83 7d d4 01 41 c7 82 ec 01 00 00 ff ff ff ff <4d> 89 94 c5 88 00 00 00 74 14 83 c3 01 41 39 dc 0f 85 f1 fe ff
         RIP: __vb2_queue_alloc+0x134/0x4e0 [videobuf2_core] RSP: ffffb7c40241bc60
         CR2: ffff888b773c0890
      
      So, add a sanity check in order to prevent going past array.
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Acked-by: default avatarSakari Ailus <sakari.ailus@linux.intel.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1be3eb15
    • Maciej Purski's avatar
      hwmon: (ina2xx) Make calibration register value fixed · 8d008c0c
      Maciej Purski authored
      
      [ Upstream commit 5d389b12 ]
      
      Calibration register is used for calculating current register in
      hardware according to datasheet:
      current = shunt_volt * calib_register / 2048 (ina 226)
      current = shunt_volt * calib_register / 4096 (ina 219)
      
      Fix calib_register value to 2048 for ina226 and 4096 for ina 219 in
      order to avoid truncation error and provide best precision allowed
      by shunt_voltage measurement. Make current scale value follow changes
      of shunt_resistor from sysfs as calib_register value is now fixed.
      
      Power_lsb value should also follow shunt_resistor changes as stated in
      datasheet:
      power_lsb = 25 * current_lsb (ina 226)
      power_lsb = 20 * current_lsb (ina 219)
      Signed-off-by: default avatarMaciej Purski <m.purski@samsung.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d008c0c
    • Gustavo A. R. Silva's avatar
      PM / devfreq: Fix potential NULL pointer dereference in governor_store · 715f7e3e
      Gustavo A. R. Silva authored
      
      [ Upstream commit 63f1e05f ]
      
      df->governor is being dereferenced before it is null checked,
      hence there is a potential null pointer dereference.
      
      Notice that df->governor is being null checked at line 1004:
      if (df->governor) {, which implies it might be null.
      
      Fix this by null checking df->governor before dereferencing it.
      
      Addresses-Coverity-ID: 1401988 ("Dereference before null check")
      Fixes: bcf23c79 ("PM / devfreq: Fix available_governor sysfs")
      Signed-off-by: default avatarGustavo A. R. Silva <garsilva@embeddedor.com>
      Reviewed-by: default avatarChanwoo Choi <cw00.choi@samsung.com>
      Signed-off-by: default avatarMyungJoo Ham <myungjoo.ham@samsung.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      715f7e3e
    • NeilBrown's avatar
      VFS: close race between getcwd() and d_move() · 3aa66ba5
      NeilBrown authored
      
      [ Upstream commit 61647823 ]
      
      d_move() will call __d_drop() and then __d_rehash()
      on the dentry being moved.  This creates a small window
      when the dentry appears to be unhashed.  Many tests
      of d_unhashed() are made under ->d_lock and so are safe
      from racing with this window, but some aren't.
      In particular, getcwd() calls d_unlinked() (which calls
      d_unhashed()) without d_lock protection, so it can race.
      
      This races has been seen in practice with lustre, which uses d_move() as
      part of name lookup.  See:
         https://jira.hpdd.intel.com/browse/LU-9735
      It could race with a regular rename(), and result in ENOENT instead
      of either the 'before' or 'after' name.
      
      The race can be demonstrated with a simple program which
      has two threads, one renaming a directory back and forth
      while another calls getcwd() within that directory: it should never
      fail, but does.  See:
        https://patchwork.kernel.org/patch/9455345/
      
      We could fix this race by taking d_lock and rechecking when
      d_unhashed() reports true.  Alternately when can remove the window,
      which is the approach this patch takes.
      
      ___d_drop() is introduce which does *not* clear d_hash.pprev
      so the dentry still appears to be hashed.  __d_drop() calls
      ___d_drop(), then clears d_hash.pprev.
      __d_move() now uses ___d_drop() and only clears d_hash.pprev
      when not rehashing.
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3aa66ba5
    • Moni Shoua's avatar
      net/mlx4_en: Change default QoS settings · 9e156f31
      Moni Shoua authored
      
      [ Upstream commit a42b63c1 ]
      
      Change the default mapping between TC and TCG as follows:
      
      Prio     |             TC/TCG
               |      from             to
               |    (set by FW)      (set by SW)
      ---------+-----------------------------------
      0        |      0/0              0/7
      1        |      1/0              0/6
      2        |      2/0              0/5
      3        |      3/0              0/4
      4        |      4/0              0/3
      5        |      5/0              0/2
      6        |      6/0              0/1
      7        |      7/0              0/0
      
      These new settings cause that a pause frame for any prio stops
      traffic for all prios.
      
      Fixes: 564c274c ("net/mlx4_en: DCB QoS support")
      Signed-off-by: default avatarMoni Shoua <monis@mellanox.com>
      Signed-off-by: default avatarMaor Gottlieb <maorg@mellanox.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e156f31
    • Hans de Goede's avatar
      ACPI / video: Default lcd_only to true on Win8-ready and newer machines · 2fb63a4c
      Hans de Goede authored
      
      [ Upstream commit 5928c281 ]
      
      We're seeing a lot of bogus backlight interfaces on newer machines without
      a LCD such as desktops, servers and HDMI sticks. This causes userspace to
      show a non-functional brightness slider in e.g. the GNOME3 system menu,
      which is undesirable. And, in general, we should simply just not register
      a non functional backlight interface.
      
      Checking the LCD flag causes the bogus acpi_video backlight interfaces to
      go away (on the machines this was tested on).
      
      This change sets the lcd_only option by default on any machines which
      are Win8-ready, to fix this.
      
      This is not entirely without a risk of regressions, but video_detect.c
      already prefers native-backlight interfaces over the acpi_video one
      on Win8-ready machines, calling acpi_video_unregister_backlight() as soon
      as a native interface shows up. This is done because the ACPI backlight
      interface often is broken on Win8-ready machines, because win8 does not
      seem to actually use it.
      
      So in practice we already end up not registering the ACPI backlight
      interface on (most) Win8-ready machines with a LCD panel, thus this
      change does not change anything for (most) machines with a LCD panel
      and on machines without a LCD panel we actually don't want to register
      any backlight interfaces.
      
      This has been tested on the following machines and fixes a bogus backlight
      interface showing up there:
       - Desktop with an Asrock B150M Pro4S/D3 m.b. using i5-6500 builtin gfx
       - Intel Compute Stick STK1AW32SC
       - Meegopad T08 HDMI stick
      
      Bogus backlight interfaces have also been reported on:
       - Desktop with Asus H87I-Plus m.b.
       - Desktop with ASRock B75M-ITX m.b.
       - Desktop with Gigabyte Z87-D3HP m.b.
       - Dell PowerEdge T20 desktop
      
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1097436
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1133327
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1133329
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1133646Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2fb63a4c
    • Sowmini Varadhan's avatar
      rds; Reset rs->rs_bound_addr in rds_add_bound() failure path · 59bf0365
      Sowmini Varadhan authored
      
      [ Upstream commit 7ae0c649 ]
      
      If the rds_sock is not added to the bind_hash_table, we must
      reset rs_bound_addr so that rds_remove_bound will not trip on
      this rds_sock.
      
      rds_add_bound() does a rds_sock_put() in this failure path, so
      failing to reset rs_bound_addr will result in a socket refcount
      bug, and will trigger a WARN_ON with the stack shown below when
      the application subsequently tries to close the PF_RDS socket.
      
           WARNING: CPU: 20 PID: 19499 at net/rds/af_rds.c:496 \
      		rds_sock_destruct+0x15/0x30 [rds]
             :
           __sk_destruct+0x21/0x190
           rds_remove_bound.part.13+0xb6/0x140 [rds]
           rds_release+0x71/0x120 [rds]
           sock_release+0x1a/0x70
           sock_close+0xe/0x20
           __fput+0xd5/0x210
           task_work_run+0x82/0xa0
           do_exit+0x2ce/0xb30
           ? syscall_trace_enter+0x1cc/0x2b0
           do_group_exit+0x39/0xa0
           SyS_exit_group+0x10/0x10
           do_syscall_64+0x61/0x1a0
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59bf0365
    • Hangbin Liu's avatar
      l2tp: fix missing print session offset info · ca1853c0
      Hangbin Liu authored
      
      [ Upstream commit 820da535 ]
      
      Report offset parameter in L2TP_CMD_SESSION_GET command if
      it has been configured by userspace
      
      Fixes: 309795f4 ("l2tp: Add netlink control API for L2TP")
      Reported-by: default avatarJianlin Shi <jishi@redhat.com>
      Signed-off-by: default avatarHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca1853c0
    • Masami Hiramatsu's avatar
      perf probe: Add warning message if there is unexpected event name · 24cce930
      Masami Hiramatsu authored
      
      [ Upstream commit 9f5c6d87 ]
      
      This improve the error message so that user can know event-name error
      before writing new events to kprobe-events interface.
      
      E.g.
         ======
         #./perf probe -x /lib64/libc-2.25.so malloc_get_state*
         Internal error: "malloc_get_state@GLIBC_2" is an invalid event name.
           Error: Failed to add events.
         ======
      Reported-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Acked-by: default avatarRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Reviewed-by: default avatarThomas Richter <tmricht@linux.vnet.ibm.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: bhargavb <bhargavaramudu@gmail.com>
      Cc: linux-rt-users@vger.kernel.org
      Link: http://lkml.kernel.org/r/151275040665.24652.5188568529237584489.stgit@devboxSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      24cce930
    • Yi Zeng's avatar
      thermal: power_allocator: fix one race condition issue for thermal_instances list · 8e50e9d6
      Yi Zeng authored
      
      [ Upstream commit a5de11d6 ]
      
      When invoking allow_maximum_power and traverse tz->thermal_instances,
      we should grab thermal_zone_device->lock to avoid race condition. For
      example, during the system reboot, if the mali GPU device implements
      device shutdown callback and unregister GPU devfreq cooling device,
      the deleted list head may be accessed to cause panic, as the following
      log shows:
      
      [   33.551070] c3 25 (kworker/3:0) Unable to handle kernel paging request at virtual address dead000000000070
      [   33.566708] c3 25 (kworker/3:0) pgd = ffffffc0ed290000
      [   33.572071] c3 25 (kworker/3:0) [dead000000000070] *pgd=00000001ed292003, *pud=00000001ed292003, *pmd=0000000000000000
      [   33.581515] c3 25 (kworker/3:0) Internal error: Oops: 96000004 [#1] PREEMPT SMP
      [   33.599761] c3 25 (kworker/3:0) CPU: 3 PID: 25 Comm: kworker/3:0 Not tainted 4.4.35+ #912
      [   33.614137] c3 25 (kworker/3:0) Workqueue: events_freezable thermal_zone_device_check
      [   33.620245] c3 25 (kworker/3:0) task: ffffffc0f32e4200 ti: ffffffc0f32f0000 task.ti: ffffffc0f32f0000
      [   33.629466] c3 25 (kworker/3:0) PC is at power_allocator_throttle+0x7c8/0x8a4
      [   33.636609] c3 25 (kworker/3:0) LR is at power_allocator_throttle+0x808/0x8a4
      [   33.643742] c3 25 (kworker/3:0) pc : [<ffffff8008683dd0>] lr : [<ffffff8008683e10>] pstate: 20000145
      [   33.652874] c3 25 (kworker/3:0) sp : ffffffc0f32f3bb0
      [   34.468519] c3 25 (kworker/3:0) Process kworker/3:0 (pid: 25, stack limit = 0xffffffc0f32f0020)
      [   34.477220] c3 25 (kworker/3:0) Stack: (0xffffffc0f32f3bb0 to 0xffffffc0f32f4000)
      [   34.819822] c3 25 (kworker/3:0) Call trace:
      [   34.824021] c3 25 (kworker/3:0) Exception stack(0xffffffc0f32f39c0 to 0xffffffc0f32f3af0)
      [   34.924993] c3 25 (kworker/3:0) [<ffffff8008683dd0>] power_allocator_throttle+0x7c8/0x8a4
      [   34.933184] c3 25 (kworker/3:0) [<ffffff80086807f4>] handle_thermal_trip.part.25+0x70/0x224
      [   34.941545] c3 25 (kworker/3:0) [<ffffff8008680a68>] thermal_zone_device_update+0xc0/0x20c
      [   34.949818] c3 25 (kworker/3:0) [<ffffff8008680bd4>] thermal_zone_device_check+0x20/0x2c
      [   34.957924] c3 25 (kworker/3:0) [<ffffff80080b93a4>] process_one_work+0x168/0x458
      [   34.965414] c3 25 (kworker/3:0) [<ffffff80080ba068>] worker_thread+0x13c/0x4b4
      [   34.972650] c3 25 (kworker/3:0) [<ffffff80080c0a4c>] kthread+0xe8/0xfc
      [   34.979187] c3 25 (kworker/3:0) [<ffffff8008084e90>] ret_from_fork+0x10/0x40
      [   34.986244] c3 25 (kworker/3:0) Code: f9405e73 eb1302bf d102e273 54ffc460 (b9402a61)
      [   34.994339] c3 25 (kworker/3:0) ---[ end trace 32057901e3b7e1db ]---
      Signed-off-by: default avatarYi Zeng <yizeng@asrmicro.com>
      Signed-off-by: default avatarZhang Rui <rui.zhang@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e50e9d6
    • Rasmus Villemoes's avatar
      ARM: dts: ls1021a: add "fsl,ls1021a-esdhc" compatible string to esdhc node · 9487ce39
      Rasmus Villemoes authored
      
      [ Upstream commit d5c7b4d5 ]
      
      Commit a22950c8 (mmc: sdhci-of-esdhc: add quirk
      SDHCI_QUIRK_BROKEN_TIMEOUT_VAL for ls1021a) added logic to the driver to
      enable the broken timeout val quirk for ls1021a, but did not add the
      corresponding compatible string to the device tree, so it didn't really
      have any effect. Fix that.
      Signed-off-by: default avatarRasmus Villemoes <rasmus.villemoes@prevas.dk>
      Signed-off-by: default avatarShawn Guo <shawnguo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9487ce39
    • Shiraz Saleem's avatar
      i40iw: Correct Q1/XF object count equation · 11c7a4b5
      Shiraz Saleem authored
      
      [ Upstream commit fe99afd1 ]
      
      Lower Inbound RDMA Read Queue (Q1) object count by a factor of 2
      as it is incorrectly doubled. Also, round up Q1 and Transmit FIFO (XF)
      object count to power of 2 to satisfy hardware requirement.
      
      Fixes: 86dbcd0f ("i40iw: add file to handle cqp calls")
      Signed-off-by: default avatarShiraz Saleem <shiraz.saleem@intel.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      11c7a4b5
    • Shiraz Saleem's avatar
      i40iw: Fix sequence number for the first partial FPDU · ff2620e3
      Shiraz Saleem authored
      
      [ Upstream commit df8b13a1 ]
      
      Partial FPDU processing is broken as the sequence number
      for the first partial FPDU is wrong due to incorrect
      Q2 buffer offset. The offset should be 64 rather than 16.
      
      Fixes: 786c6adb ("i40iw: add puda code")
      Signed-off-by: default avatarShiraz Saleem <shiraz.saleem@intel.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff2620e3
    • Jordan Crouse's avatar
      drm/msm: Take the mutex before calling msm_gem_new_impl · a435a0ce
      Jordan Crouse authored
      
      [ Upstream commit 90dd57de ]
      
      Amongst its other duties, msm_gem_new_impl adds the newly created
      GEM object to the shared inactive list which may also be actively
      modifiying the list during submission.  All the paths to modify
      the list are protected by the mutex except for the one through
      msm_gem_import which can end up causing list corruption.
      Signed-off-by: default avatarJordan Crouse <jcrouse@codeaurora.org>
      [add extra WARN_ON(!mutex_is_locked(&dev->struct_mutex))]
      Signed-off-by: default avatarRob Clark <robdclark@gmail.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a435a0ce
    • linzhang's avatar
      net: llc: add lock_sock in llc_ui_bind to avoid a race condition · 04a87dfa
      linzhang authored
      
      [ Upstream commit 0908cf4d ]
      
      There is a race condition in llc_ui_bind if two or more processes/threads
      try to bind a same socket.
      
      If more processes/threads bind a same socket success that will lead to
      two problems, one is this action is not what we expected, another is
      will lead to kernel in unstable status or oops(in my simple test case,
      cause llc2.ko can't unload).
      
      The current code is test SOCK_ZAPPED bit to avoid a process to
      bind a same socket twice but that is can't avoid more processes/threads
      try to bind a same socket at the same time.
      
      So, add lock_sock in llc_ui_bind like others, such as llc_ui_connect.
      Signed-off-by: default avatarLin Zhang <xiaolou4617@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      04a87dfa
    • Jan H. Schönherr's avatar
      KVM: nVMX: Fix handling of lmsw instruction · a74bec40
      Jan H. Schönherr authored
      
      [ Upstream commit e1d39b17 ]
      
      The decision whether or not to exit from L2 to L1 on an lmsw instruction is
      based on bogus values: instead of using the information encoded within the
      exit qualification, it uses the data also used for the mov-to-cr
      instruction, which boils down to using whatever is in %eax at that point.
      
      Use the correct values instead.
      
      Without this fix, an L1 may not get notified when a 32-bit Linux L2
      switches its secondary CPUs to protected mode; the L1 is only notified on
      the next modification of CR0. This short time window poses a problem, when
      there is some other reason to exit to L1 in between. Then, L2 will be
      resumed in real mode and chaos ensues.
      Signed-off-by: default avatarJan H. Schönherr <jschoenh@amazon.de>
      Reviewed-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a74bec40
    • Wanpeng Li's avatar
      KVM: X86: Fix preempt the preemption timer cancel · 5ea0519a
      Wanpeng Li authored
      
      [ Upstream commit 5acc1ca4 ]
      
      Preemption can occur during cancel preemption timer, and there will be
      inconsistent status in lapic, vmx and vmcs field.
      
                CPU0                    CPU1
      
        preemption timer vmexit
        handle_preemption_timer(vCPU0)
          kvm_lapic_expired_hv_timer
            vmx_cancel_hv_timer
              vmx->hv_deadline_tsc = -1
              vmcs_clear_bits
              /* hv_timer_in_use still true */
        sched_out
                                 sched_in
                                 kvm_arch_vcpu_load
                                   vmx_set_hv_timer
                                     write vmx->hv_deadline_tsc
                                     vmcs_set_bits
                                 /* back in kvm_lapic_expired_hv_timer */
                                 hv_timer_in_use = false
                                 ...
                                 vmx_vcpu_run
                                   vmx_arm_hv_run
                                     write preemption timer deadline
                                   spurious preemption timer vmexit
                                     handle_preemption_timer(vCPU0)
                                       kvm_lapic_expired_hv_timer
                                         WARN_ON(!apic->lapic_timer.hv_timer_in_use);
      
      This can be reproduced sporadically during boot of L2 on a
      preemptible L1, causing a splat on L1.
      
       WARNING: CPU: 3 PID: 1952 at arch/x86/kvm/lapic.c:1529 kvm_lapic_expired_hv_timer+0xb5/0xd0 [kvm]
       CPU: 3 PID: 1952 Comm: qemu-system-x86 Not tainted 4.12.0-rc1+ #24 RIP: 0010:kvm_lapic_expired_hv_timer+0xb5/0xd0 [kvm]
        Call Trace:
        handle_preemption_timer+0xe/0x20 [kvm_intel]
        vmx_handle_exit+0xc9/0x15f0 [kvm_intel]
        ? lock_acquire+0xdb/0x250
        ? lock_acquire+0xdb/0x250
        ? kvm_arch_vcpu_ioctl_run+0xdf3/0x1ce0 [kvm]
        kvm_arch_vcpu_ioctl_run+0xe55/0x1ce0 [kvm]
        kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? kvm_vcpu_ioctl+0x384/0x7b0 [kvm]
        ? __fget+0xf3/0x210
        do_vfs_ioctl+0xa4/0x700
        ? __fget+0x114/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x8f/0x750
        ? trace_hardirqs_on_thunk+0x1a/0x1c
        entry_SYSCALL64_slow_path+0x25/0x25
      
      This patch fixes it by disabling preemption while cancelling
      preemption timer.  This way cancel_hv_timer is atomic with
      respect to kvm_arch_vcpu_load.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ea0519a
    • Christoph Hellwig's avatar
      PCI/msi: fix the pci_alloc_irq_vectors_affinity stub · 63913462
      Christoph Hellwig authored
      
      [ Upstream commit 83b4605b ]
      
      We need to return an error for any call that asks for MSI / MSI-X
      vectors only, so that non-trivial fallback logic can work properly.
      
      Also valid dev->irq and use the "correct" errno value based on feedback
      from Linus.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reported-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Fixes: aff17164 ("PCI: Provide sensible IRQ vector alloc/free routines")
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      63913462
    • Thomas Gleixner's avatar
      cpuhotplug: Link lock stacks for hotplug callbacks · c198e227
      Thomas Gleixner authored
      
      [ Upstream commit 49dfe2a6 ]
      
      The CPU hotplug callbacks are not covered by lockdep versus the cpu hotplug
      rwsem.
      
      CPU0						CPU1
      cpuhp_setup_state(STATE, startup, teardown);
       cpus_read_lock();
        invoke_callback_on_ap();
          kick_hotplug_thread(ap);
          wait_for_completion();			hotplug_thread_fn()
          						  lock(m);
      						  do_stuff();
      						  unlock(m);
      
      Lockdep does not know about this dependency and will not trigger on the
      following code sequence:
      
      	  lock(m);
      	  cpus_read_lock();
      
      Add a lockdep map and connect the initiators lock chain with the hotplug
      thread lock chain, so potential deadlocks can be detected.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20170524081549.709375845@linutronix.deSigned-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c198e227
    • Nithin Sujir's avatar
      bonding: Don't update slave->link until ready to commit · 1a6906e7
      Nithin Sujir authored
      
      [ Upstream commit 797a9364 ]
      
      In the loadbalance arp monitoring scheme, when a slave link change is
      detected, the slave->link is immediately updated and slave_state_changed
      is set. Later down the function, the rtnl_lock is acquired and the
      changes are committed, updating the bond link state.
      
      However, the acquisition of the rtnl_lock can fail. The next time the
      monitor runs, since slave->link is already updated, it determines that
      link is unchanged. This results in the bond link state permanently out
      of sync with the slave link.
      
      This patch modifies bond_loadbalance_arp_mon() to handle link changes
      identical to bond_ab_arp_{inspect/commit}(). The new link state is
      maintained in slave->new_link until we're ready to commit at which point
      it's copied into slave->link.
      
      NOTE: miimon_{inspect/commit}() has a more complex state machine
      requiring the use of the bond_{propose,commit}_link_state() functions
      which maintains the intermediate state in slave->link_new_state. The arp
      monitors don't require that.
      
      Testing: This bug is very easy to reproduce with the following steps.
      1. In a loop, toggle a slave link of a bond slave interface.
      2. In a separate loop, do ifconfig up/down of an unrelated interface to
      create contention for rtnl_lock.
      Within a few iterations, the bond link goes out of sync with the slave
      link.
      Signed-off-by: default avatarNithin Nayak Sujir <nsujir@tintri.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
      Acked-by: default avatarMahesh Bandewar <maheshb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a6906e7
    • KT Liao's avatar
      Input: elan_i2c - clear INT before resetting controller · 969eeef3
      KT Liao authored
      
      [ Upstream commit 4b3c7dbb ]
      
      Some old touchpad FWs need to have interrupt cleared before issuing reset
      command after updating firmware. We clear interrupt by attempting to read
      full report from the controller, and discarding any data read.
      Signed-off-by: default avatarKT Liao <kt.liao@emc.com.tw>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      969eeef3
    • Roman Kapl's avatar
      net: move somaxconn init from sysctl code · 419490af
      Roman Kapl authored
      
      [ Upstream commit 7c3f1875 ]
      
      The default value for somaxconn is set in sysctl_core_net_init(), but this
      function is not called when kernel is configured without CONFIG_SYSCTL.
      
      This results in the kernel not being able to accept TCP connections,
      because the backlog has zero size. Usually, the user ends up with:
      "TCP: request_sock_TCP: Possible SYN flooding on port 7. Dropping request.  Check SNMP counters."
      If SYN cookies are not enabled the connection is rejected.
      
      Before ef547f2a (tcp: remove max_qlen_log), the effects were less
      severe, because the backlog was always at least eight slots long.
      Signed-off-by: default avatarRoman Kapl <roman.kapl@sysgo.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      419490af
    • Eric Dumazet's avatar
      tcp: better validation of received ack sequences · 71c0b334
      Eric Dumazet authored
      
      [ Upstream commit d0e1a1b5 ]
      
      Paul Fiterau Brostean reported :
      
      <quote>
      Linux TCP stack we analyze exhibits behavior that seems odd to me.
      The scenario is as follows (all packets have empty payloads, no window
      scaling, rcv/snd window size should not be a factor):
      
             TEST HARNESS (CLIENT)                        LINUX SERVER
      
         1.  -                                          LISTEN (server listen,
      then accepts)
      
         2.  - --> <SEQ=100><CTL=SYN>               --> SYN-RECEIVED
      
         3.  - <-- <SEQ=300><ACK=101><CTL=SYN,ACK>  <-- SYN-RECEIVED
      
         4.  - --> <SEQ=101><ACK=301><CTL=ACK>      --> ESTABLISHED
      
         5.  - <-- <SEQ=301><ACK=101><CTL=FIN,ACK>  <-- FIN WAIT-1 (server
      opts to close the data connection calling "close" on the connection
      socket)
      
         6.  - --> <SEQ=101><ACK=99999><CTL=FIN,ACK> --> CLOSING (client sends
      FIN,ACK with not yet sent acknowledgement number)
      
         7.  - <-- <SEQ=302><ACK=102><CTL=ACK>      <-- CLOSING (ACK is 102
      instead of 101, why?)
      
      ... (silence from CLIENT)
      
         8.  - <-- <SEQ=301><ACK=102><CTL=FIN,ACK>  <-- CLOSING
      (retransmission, again ACK is 102)
      
      Now, note that packet 6 while having the expected sequence number,
      acknowledges something that wasn't sent by the server. So I would
      expect
      the packet to maybe prompt an ACK response from the server, and then be
      ignored. Yet it is not ignored and actually leads to an increase of the
      acknowledgement number in the server's retransmission of the FIN,ACK
      packet. The explanation I found is that the FIN  in packet 6 was
      processed, despite the acknowledgement number being unacceptable.
      Further experiments indeed show that the server processes this FIN,
      transitioning to CLOSING, then on receiving an ACK for the FIN it had
      send in packet 5, the server (or better said connection) transitions
      from CLOSING to TIME_WAIT (as signaled by netstat).
      
      </quote>
      
      Indeed, tcp_rcv_state_process() calls tcp_ack() but
      does not exploit the @acceptable status but for TCP_SYN_RECV
      state.
      
      What we want here is to send a challenge ACK, if not in TCP_SYN_RECV
      state. TCP_FIN_WAIT1 state is not the only state we should fix.
      
      Add a FLAG_NO_CHALLENGE_ACK so that tcp_rcv_state_process()
      can choose to send a challenge ACK and discard the packet instead
      of wrongly change socket state.
      
      With help from Neal Cardwell.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarPaul Fiterau Brostean <p.fiterau-brostean@science.ru.nl>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71c0b334
    • Timmy Li's avatar
      ARM64: PCI: Fix struct acpi_pci_root_ops allocation failure path · bca3e27b
      Timmy Li authored
      
      [ Upstream commit 717902cc ]
      
      Commit 093d24a2 ("arm64: PCI: Manage controller-specific data on
      per-controller basis") added code to allocate ACPI PCI root_ops
      dynamically on a per host bridge basis but failed to update the
      corresponding memory allocation failure path in pci_acpi_scan_root()
      leading to a potential memory leakage.
      
      Fix it by adding the required kfree call.
      
      Fixes: 093d24a2 ("arm64: PCI: Manage controller-specific data on per-controller basis")
      Reviewed-by: default avatarTomasz Nowicki <tn@semihalf.com>
      Signed-off-by: default avatarTimmy Li <lixiaoping3@huawei.com>
      [lorenzo.pieralisi@arm.com: refactored code, rewrote commit log]
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      CC: Will Deacon <will.deacon@arm.com>
      CC: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bca3e27b
    • Eryu Guan's avatar
      ext4: fix off-by-one on max nr_pages in ext4_find_unwritten_pgoff() · 78b336af
      Eryu Guan authored
      
      [ Upstream commit 624327f8 ]
      
      ext4_find_unwritten_pgoff() is used to search for offset of hole or
      data in page range [index, end] (both inclusive), and the max number
      of pages to search should be at least one, if end == index.
      Otherwise the only page is missed and no hole or data is found,
      which is not correct.
      
      When block size is smaller than page size, this can be demonstrated
      by preallocating a file with size smaller than page size and writing
      data to the last block. E.g. run this xfs_io command on a 1k block
      size ext4 on x86_64 host.
      
        # xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \
        	    -c "seek -d 0" /mnt/ext4/testfile
        wrote 1024/1024 bytes at offset 2048
        1 KiB, 1 ops; 0.0000 sec (42.459 MiB/sec and 43478.2609 ops/sec)
        Whence  Result
        DATA    EOF
      
      Data at offset 2k was missed, and lseek(2) returned ENXIO.
      
      This is unconvered by generic/285 subtest 07 and 08 on ppc64 host,
      where pagesize is 64k. Because a recent change to generic/285
      reduced the preallocated file size to smaller than 64k.
      Signed-off-by: default avatarEryu Guan <eguan@redhat.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      78b336af
    • Michael Schmitz's avatar
      fix race in drivers/char/random.c:get_reg() · a1df375c
      Michael Schmitz authored
      
      [ Upstream commit 9dfa7bba ]
      
      get_reg() can be reentered on architectures with prioritized interrupts
      (m68k in this case), causing f->reg_index to be incremented after the
      range check. Out of bounds memory access past the pt_regs struct results.
      This will go mostly undetected unless access is beyond end of memory.
      
      Prevent the race by disabling interrupts in get_reg().
      
      Tested on m68k (Atari Falcon, and ARAnyM emulator).
      
      Kudos to Geert Uytterhoeven for helping to trace this race.
      Signed-off-by: default avatarMichael Schmitz <schmitzmic@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1df375c
    • Maurizio Lombardi's avatar
      scsi: bnx2fc: fix race condition in bnx2fc_get_host_stats() · 5cddd2f3
      Maurizio Lombardi authored
      
      [ Upstream commit c2dd893a ]
      
      If multiple tasks attempt to read the stats, it may happen that the
      start_req_done completion is re-initialized while still being used by
      another task, causing a list corruption.
      
      This patch fixes the bug by adding a mutex to serialize the calls to
      bnx2fc_get_host_stats().
      
      WARNING: at lib/list_debug.c:48 list_del+0x6e/0xa0() (Not tainted)
      Hardware name: PowerEdge R820
      list_del corruption. prev->next should be ffff882035627d90, but was ffff884069541588
      
      Pid: 40267, comm: perl Not tainted 2.6.32-642.3.1.el6.x86_64 #1
      Call Trace:
       [<ffffffff8107c691>] ? warn_slowpath_common+0x91/0xe0
       [<ffffffff8107c796>] ? warn_slowpath_fmt+0x46/0x60
       [<ffffffff812ad16e>] ? list_del+0x6e/0xa0
       [<ffffffff81547eed>] ? wait_for_common+0x14d/0x180
       [<ffffffff8106c4a0>] ? default_wake_function+0x0/0x20
       [<ffffffff81547fd3>] ? wait_for_completion_timeout+0x13/0x20
       [<ffffffffa05410b1>] ? bnx2fc_get_host_stats+0xa1/0x280 [bnx2fc]
       [<ffffffffa04cf630>] ? fc_stat_show+0x90/0xc0 [scsi_transport_fc]
       [<ffffffffa04cf8b6>] ? show_fcstat_tx_frames+0x16/0x20 [scsi_transport_fc]
       [<ffffffff8137c647>] ? dev_attr_show+0x27/0x50
       [<ffffffff8113b9be>] ? __get_free_pages+0xe/0x50
       [<ffffffff812170e1>] ? sysfs_read_file+0x111/0x200
       [<ffffffff8119a305>] ? vfs_read+0xb5/0x1a0
       [<ffffffff8119b0b6>] ? fget_light_pos+0x16/0x50
       [<ffffffff8119a651>] ? sys_read+0x51/0xb0
       [<ffffffff810ee1fe>] ? __audit_syscall_exit+0x25e/0x290
       [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
      Signed-off-by: default avatarMaurizio Lombardi <mlombard@redhat.com>
      Acked-by: default avatarChad Dupuis <chad.dupuis@cavium.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5cddd2f3
    • Kuninori Morimoto's avatar
      ASoC: rsnd: SSI PIO adjust to 24bit mode · b3311f62
      Kuninori Morimoto authored
      
      [ Upstream commit 7819a942 ]
      
      commit 90431eb4 ("ASoC: rsnd: don't use PDTA bit for 24bit on SSI")
      fixups 24bit mode data alignment, but PIO was not cared.
      This patch fixes PIO mode 24bit data alignment
      Signed-off-by: default avatarKuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3311f62
    • Dan Carpenter's avatar
      pNFS/flexfiles: missing error code in ff_layout_alloc_lseg() · 07b2a328
      Dan Carpenter authored
      
      [ Upstream commit 662f9a10 ]
      
      If xdr_inline_decode() fails then we end up returning ERR_PTR(0).  The
      caller treats NULL returns as -ENOMEM so it doesn't really hurt runtime,
      but obviously we intended to set an error code here.
      
      Fixes: d67ae825 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07b2a328
    • Liping Zhang's avatar
      netfilter: ctnetlink: fix incorrect nf_ct_put during hash resize · 2171500a
      Liping Zhang authored
      
      [ Upstream commit fefa9267 ]
      
      If nf_conntrack_htable_size was adjusted by the user during the ct
      dump operation, we may invoke nf_ct_put twice for the same ct, i.e.
      the "last" ct. This will cause the ct will be freed but still linked
      in hash buckets.
      
      It's very easy to reproduce the problem by the following commands:
        # while : ; do
        echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets
        done
        # while : ; do
        conntrack -L
        done
        # iperf -s 127.0.0.1 &
        # iperf -c 127.0.0.1 -P 60 -t 36000
      
      After a while, the system will hang like this:
        NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [bash:20184]
        NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [iperf:20382]
        ...
      
      So at last if we find cb->args[1] is equal to "last", this means hash
      resize happened, then we can set cb->args[1] to 0 to fix the above
      issue.
      
      Fixes: d205dc40 ("[NETFILTER]: ctnetlink: fix deadlock in table dumping")
      Signed-off-by: default avatarLiping Zhang <zlpnobody@gmail.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2171500a
    • Milian Wolff's avatar
      perf report: Fix off-by-one for non-activation frames · 695d3a35
      Milian Wolff authored
      
      [ Upstream commit 1982ad48 ]
      
      As the documentation for dwfl_frame_pc says, frames that
      are no activation frames need to have their program counter
      decremented by one to properly find the function of the caller.
      
      This fixes many cases where perf report currently attributes
      the cost to the next line. I.e. I have code like this:
      
      ~~~~~~~~~~~~~~~
        #include <thread>
        #include <chrono>
      
        using namespace std;
      
        int main()
        {
          this_thread::sleep_for(chrono::milliseconds(1000));
          this_thread::sleep_for(chrono::milliseconds(100));
          this_thread::sleep_for(chrono::milliseconds(10));
      
          return 0;
        }
      ~~~~~~~~~~~~~~~
      
      Now compile and record it:
      
      ~~~~~~~~~~~~~~~
        g++ -std=c++11 -g -O2 test.cpp
        echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
        perf record \
          --event sched:sched_stat_sleep \
          --event sched:sched_process_exit \
          --event sched:sched_switch --call-graph=dwarf \
          --output perf.data.raw \
          ./a.out
        echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
        perf inject --sched-stat --input perf.data.raw --output perf.data
      ~~~~~~~~~~~~~~~
      
      Before this patch, the report clearly shows the off-by-one issue.
      Most notably, the last sleep invocation is incorrectly attributed
      to the "return 0;" line:
      
      ~~~~~~~~~~~~~~~
        Overhead  Source:Line
        ........  ...........
      
         100.00%  core.c:0
                  |
                  ---__schedule core.c:0
                     schedule
                     do_nanosleep hrtimer.c:0
                     hrtimer_nanosleep
                     sys_nanosleep
                     entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
                     __nanosleep_nocancel .:0
                     std::this_thread::sleep_for<long, std::ratio<1l, 1000l> > thread:323
                     |
                     |--90.08%--main test.cpp:9
                     |          __libc_start_main
                     |          _start
                     |
                     |--9.01%--main test.cpp:10
                     |          __libc_start_main
                     |          _start
                     |
                      --0.91%--main test.cpp:13
                                __libc_start_main
                                _start
      ~~~~~~~~~~~~~~~
      
      With this patch here applied, the issue is fixed. The report becomes
      much more usable:
      
      ~~~~~~~~~~~~~~~
        Overhead  Source:Line
        ........  ...........
      
         100.00%  core.c:0
                  |
                  ---__schedule core.c:0
                     schedule
                     do_nanosleep hrtimer.c:0
                     hrtimer_nanosleep
                     sys_nanosleep
                     entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
                     __nanosleep_nocancel .:0
                     std::this_thread::sleep_for<long, std::ratio<1l, 1000l> > thread:323
                     |
                     |--90.08%--main test.cpp:8
                     |          __libc_start_main
                     |          _start
                     |
                     |--9.01%--main test.cpp:9
                     |          __libc_start_main
                     |          _start
                     |
                      --0.91%--main test.cpp:10
                                __libc_start_main
                                _start
      ~~~~~~~~~~~~~~~
      
      Similarly it works for signal frames:
      
      ~~~~~~~~~~~~~~~
        __noinline void bar(void)
        {
          volatile long cnt = 0;
      
          for (cnt = 0; cnt < 100000000; cnt++);
        }
      
        __noinline void foo(void)
        {
          bar();
        }
      
        void sig_handler(int sig)
        {
          foo();
        }
      
        int main(void)
        {
          signal(SIGUSR1, sig_handler);
          raise(SIGUSR1);
      
          foo();
          return 0;
        }
      ~~~~~~~~~~~~~~~~
      
      Before, the report wrongly points to `signal.c:29` after raise():
      
      ~~~~~~~~~~~~~~~~
        $ perf report --stdio --no-children -g srcline -s srcline
        ...
         100.00%  signal.c:11
                  |
                  ---bar signal.c:11
                     |
                     |--50.49%--main signal.c:29
                     |          __libc_start_main
                     |          _start
                     |
                      --49.51%--0x33a8f
                                raise .:0
                                main signal.c:29
                                __libc_start_main
                                _start
      ~~~~~~~~~~~~~~~~
      
      With this patch in, the issue is fixed and we instead get:
      
      ~~~~~~~~~~~~~~~~
         100.00%  signal   signal            [.] bar
                  |
                  ---bar signal.c:11
                     |
                     |--50.49%--main signal.c:29
                     |          __libc_start_main
                     |          _start
                     |
                      --49.51%--0x33a8f
                                raise .:0
                                main signal.c:27
                                __libc_start_main
                                _start
      ~~~~~~~~~~~~~~~~
      
      Note how this patch fixes this issue for both unwinding methods, i.e.
      both dwfl and libunwind. The former case is straight-forward thanks
      to dwfl_frame_pc(). For libunwind, we replace the functionality via
      unw_is_signal_frame() for any but the very first frame.
      Signed-off-by: default avatarMilian Wolff <milian.wolff@kdab.com>
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Cc: kernel-team@lge.com
      Link: http://lkml.kernel.org/r/20170524062129.32529-4-namhyung@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      695d3a35
    • Dan Carpenter's avatar
      libceph: NULL deref on crush_decode() error path · 0bd1a03e
      Dan Carpenter authored
      
      [ Upstream commit 293dffaa ]
      
      If there is not enough space then ceph_decode_32_safe() does a goto bad.
      We need to return an error code in that situation.  The current code
      returns ERR_PTR(0) which is NULL.  The callers are not expecting that
      and it results in a NULL dereference.
      
      Fixes: f24e9980 ("ceph: OSD client")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0bd1a03e
    • Lin Zhang's avatar
      net: ieee802154: fix net_device reference release too early · 9fc2a46a
      Lin Zhang authored
      
      [ Upstream commit a611c58b ]
      
      This patch fixes the kernel oops when release net_device reference in
      advance. In function raw_sendmsg(i think the dgram_sendmsg has the same
      problem), there is a race condition between dev_put and dev_queue_xmit
      when the device is gong that maybe lead to dev_queue_ximt to see
      an illegal net_device pointer.
      
      My test kernel is 3.13.0-32 and because i am not have a real 802154
      device, so i change lowpan_newlink function to this:
      
              /* find and hold real wpan device */
              real_dev = dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
              if (!real_dev)
                      return -ENODEV;
      //      if (real_dev->type != ARPHRD_IEEE802154) {
      //              dev_put(real_dev);
      //              return -EINVAL;
      //      }
              lowpan_dev_info(dev)->real_dev = real_dev;
              lowpan_dev_info(dev)->fragment_tag = 0;
              mutex_init(&lowpan_dev_info(dev)->dev_list_mtx);
      
      Also, in order to simulate preempt, i change the raw_sendmsg function
      to this:
      
              skb->dev = dev;
              skb->sk  = sk;
              skb->protocol = htons(ETH_P_IEEE802154);
              dev_put(dev);
              //simulate preempt
              schedule_timeout_uninterruptible(30 * HZ);
              err = dev_queue_xmit(skb);
              if (err > 0)
                      err = net_xmit_errno(err);
      
      and this is my userspace test code named test_send_data:
      
      int main(int argc, char **argv)
      {
              char buf[127];
              int sockfd;
              sockfd = socket(AF_IEEE802154, SOCK_RAW, 0);
              if (sockfd < 0) {
                      printf("create sockfd error: %s\n", strerror(errno));
                      return -1;
              }
              send(sockfd, buf, sizeof(buf), 0);
              return 0;
      }
      
      This is my test case:
      
      root@zhanglin-x-computer:~/develop/802154# uname -a
      Linux zhanglin-x-computer 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15
      03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
      root@zhanglin-x-computer:~/develop/802154# ip link add link eth0 name
      lowpan0 type lowpan
      root@zhanglin-x-computer:~/develop/802154#
      //keep the lowpan0 device down
      root@zhanglin-x-computer:~/develop/802154# ./test_send_data &
      //wait a while
      root@zhanglin-x-computer:~/develop/802154# ip link del link dev lowpan0
      //the device is gone
      //oops
      [381.303307] general protection fault: 0000 [#1]SMP
      [381.303407] Modules linked in: af_802154 6lowpan bnep rfcomm
      bluetooth nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
      rts5139(C) snd_hda_intel
      snd_had_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi
      snd_seq_midi_event snd_rawmidi snd_req intel_rapl snd_seq_device
      coretemp i915 kvm_intel
      kvm snd_timer snd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
      cypted drm_kms_helper drm i2c_algo_bit soundcore video mac_hid
      parport_pc ppdev ip parport hid_generic
      usbhid hid ahci r8169 mii libahdi
      [381.304286] CPU:1 PID: 2524 Commm: 1 Tainted: G C 0 3.13.0-32-generic
      [381.304409] Hardware name: Haier Haier DT Computer/Haier DT Codputer,
      BIOS FIBT19H02_X64 06/09/2014
      [381.304546] tasks: ffff000096965fc0 ti: ffffB0013779c000 task.ti:
      ffffB8013779c000
      [381.304659] RIP: 0010:[<ffffffff01621fe1>] [<ffffffff81621fe1>]
      __dev_queue_ximt+0x61/0x500
      [381.304798] RSP: 0018:ffffB8013779dca0 EFLAGS: 00010202
      [381.304880] RAX: 272b031d57565351 RBX: 0000000000000000 RCX: ffff8800968f1a00
      [381.304987] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800968f1a00
      [381.305095] RBP: ffff8e013773dce0 R08: 0000000000000266 R09: 0000000000000004
      [381.305202] R10: 0000000000000004 R11: 0000000000000005 R12: ffff88013902e000
      [381.305310] R13: 000000000000007f R14: 000000000000007f R15: ffff8800968f1a00
      [381.305418] FS:  00007fc57f50f740(0000) GS: ffff88013fc80000(0000)
      knlGS: 0000000000000000
      [381.305540] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [381.305627] CR2: 00007fad0841c000 CR3: 00000001368dd000 CR4: 00000000001007e0
      [361.905734] Stack:
      [381.305768]  00000000002052d0 000000003facb30a ffff88013779dcc0
      ffff880137764000
      [381.305898]  ffff88013779de70 000000000000007f 000000000000007f
      ffff88013902e000
      [381.306026]  ffff88013779dcf0 ffffffff81622490 ffff88013779dd39
      ffffffffa03af9f1
      [381.306155] Call Trace:
      [381.306202]  [<ffffffff81622490>] dev_queue_xmit+0x10/0x20
      [381.306294]  [<ffffffffa03af9f1>] raw_sendmsg+0x1b1/0x270 [af_802154]
      [381.306396]  [<ffffffffa03af054>] ieee802154_sock_sendmsg+0x14/0x20 [af_802154]
      [381.306512]  [<ffffffff816079eb>] sock_sendmsg+0x8b/0xc0
      [381.306600]  [<ffffffff811d52a5>] ? __d_alloc+0x25/0x180
      [381.306687]  [<ffffffff811a1f56>] ? kmem_cache_alloc_trace+0x1c6/0x1f0
      [381.306791]  [<ffffffff81607b91>] SYSC_sendto+0x121/0x1c0
      [381.306878]  [<ffffffff8109ddf4>] ? vtime_account_user+x54/0x60
      [381.306975]  [<ffffffff81020d45>] ? syscall_trace_enter+0x145/0x250
      [381.307073]  [<ffffffff816086ae>] SyS_sendto+0xe/0x10
      [381.307156]  [<ffffffff8172c87f>] tracesys+0xe1/0xe6
      [381.307233] Code: c6 a1 a4 ff 41 8b 57 78 49 8b 47 20 85 d2 48 8b 80
      78 07 00 00 75 21 49 8b 57 18 48 85 d2 74 18 48 85 c0 74 13 8b 92 ac
      01 00 00 <3b> 50 10 73 08 8b 44 90 14 41 89 47 78 41 f6 84 24 d5 00 00
      00
      [381.307801] RIP [<ffffffff81621fe1>] _dev_queue_xmit+0x61/0x500
      [381.307901]  RSP <ffff88013779dca0>
      [381.347512] Kernel panic - not syncing: Fatal exception in interrupt
      [381.347747] drm_kms_helper: panic occurred, switching back to text console
      
      In my opinion, there is always exist a chance that the device is gong
      before call dev_queue_xmit.
      
      I think the latest kernel is have the same problem and that
      dev_put should be behind of the dev_queue_xmit.
      Signed-off-by: default avatarLin Zhang <xiaolou4617@gmail.com>
      Acked-by: default avatarStefan Schmidt <stefan@osg.samsung.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9fc2a46a
    • Jesper Dangaard Brouer's avatar
      mlx5: fix bug reading rss_hash_type from CQE · 8adab3b9
      Jesper Dangaard Brouer authored
      
      [ Upstream commit 12e8b570 ]
      
      Masks for extracting part of the Completion Queue Entry (CQE)
      field rss_hash_type was swapped, namely CQE_RSS_HTYPE_IP and
      CQE_RSS_HTYPE_L4.
      
      The bug resulted in setting skb->l4_hash, even-though the
      rss_hash_type indicated that hash was NOT computed over the
      L4 (UDP or TCP) part of the packet.
      
      Added comments from the datasheet, to make it more clear what
      these masks are selecting.
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8adab3b9
    • Dan Carpenter's avatar
      block: fix an error code in add_partition() · e6687645
      Dan Carpenter authored
      
      [ Upstream commit 7bd897cf ]
      
      We don't set an error code on this path.  It means that we return NULL
      instead of an error pointer and the caller does a NULL dereference.
      
      Fixes: 6d1d8050 ("block, partition: add partition_meta_info to hd_struct")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6687645
    • Stephen Smalley's avatar
      selinux: do not check open permission on sockets · b983b2a5
      Stephen Smalley authored
      
      [ Upstream commit ccb54478 ]
      
      open permission is currently only defined for files in the kernel
      (COMMON_FILE_PERMS rather than COMMON_FILE_SOCK_PERMS). Construction of
      an artificial test case that tries to open a socket via /proc/pid/fd will
      generate a recvfrom avc denial because recvfrom and open happen to map to
      the same permission bit in socket vs file classes.
      
      open of a socket via /proc/pid/fd is not supported by the kernel regardless
      and will ultimately return ENXIO. But we hit the permission check first and
      can thus produce these odd/misleading denials.  Omit the open check when
      operating on a socket.
      Signed-off-by: default avatarStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b983b2a5
    • Tariq Toukan's avatar
      net/mlx5: Tolerate irq_set_affinity_hint() failures · 2c582b6b
      Tariq Toukan authored
      
      [ Upstream commit b665d98e ]
      
      Add tolerance to failures of irq_set_affinity_hint().
      Its role is to give hints that optimizes performance,
      and should not block the driver load.
      
      In non-SMP systems, functionality is not available as
      there is a single core, and all these calls definitely
      fail.  Hence, do not call the function and avoid the
      warning prints.
      
      Fixes: db058a18 ("net/mlx5_core: Set irq affinity hints")
      Signed-off-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c582b6b
    • Hans de Goede's avatar
      gpio: crystalcove: Do not write regular gpio registers for virtual GPIOs · 72185b97
      Hans de Goede authored
      
      [ Upstream commit 9a752b4c ]
      
      The Crystal Cove PMIC has 16 real GPIOs but the ACPI code for devices
      with this PMIC may address up to 95 GPIOs, these extra GPIOs are
      called virtual GPIOs and are used by the ACPI code as a method of
      accessing various non GPIO bits of PMIC.
      
      Commit dcdc3018 ("gpio: crystalcove: support virtual GPIO") added
      dummy support for these to avoid a bunch of ACPI errors, but instead of
      ignoring writes / reads to them by doing:
      
      if (gpio >= CRYSTALCOVE_GPIO_NUM)
      	return 0;
      
      It accidentally introduced the following wrong check:
      
      if (gpio > CRYSTALCOVE_VGPIO_NUM)
      	return 0;
      
      Which means that attempts by the ACPI code to access these gpios
      causes some arbitrary gpio to get touched through for example
      GPIO1P0CTLO + gpionr % 8.
      
      Since we do support input/output (but not interrupts) on the 0x5e
      virtual GPIO, this commit makes to_reg return -ENOTSUPP for unsupported
      virtual GPIOs so as to not have to check for (gpio >= CRYSTALCOVE_GPIO_NUM
      && gpio != 0x5e) everywhere and to make it easier to add support for more
      virtual GPIOs in the future.
      
      It then adds a check for to_reg returning an error to all callers where
      this may happen fixing the ACPI code accessing virtual GPIOs accidentally
      causing changes to real GPIOs.
      
      Fixes: dcdc3018 ("gpio: crystalcove: support virtual GPIO")
      Cc: Aaron Lu <aaron.lu@intel.com>
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Reviewed-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72185b97
    • Vlastimil Babka's avatar
      sched/numa: Use down_read_trylock() for the mmap_sem · a1e7a9e2
      Vlastimil Babka authored
      
      [ Upstream commit 8655d549 ]
      
      A customer has reported a soft-lockup when running an intensive
      memory stress test, where the trace on multiple CPU's looks like this:
      
       RIP: 0010:[<ffffffff810c53fe>]
        [<ffffffff810c53fe>] native_queued_spin_lock_slowpath+0x10e/0x190
      ...
       Call Trace:
        [<ffffffff81182d07>] queued_spin_lock_slowpath+0x7/0xa
        [<ffffffff811bc331>] change_protection_range+0x3b1/0x930
        [<ffffffff811d4be8>] change_prot_numa+0x18/0x30
        [<ffffffff810adefe>] task_numa_work+0x1fe/0x310
        [<ffffffff81098322>] task_work_run+0x72/0x90
      
      Further investigation showed that the lock contention here is pmd_lock().
      
      The task_numa_work() function makes sure that only one thread is let to perform
      the work in a single scan period (via cmpxchg), but if there's a thread with
      mmap_sem locked for writing for several periods, multiple threads in
      task_numa_work() can build up a convoy waiting for mmap_sem for read and then
      all get unblocked at once.
      
      This patch changes the down_read() to the trylock version, which prevents the
      build up. For a workload experiencing mmap_sem contention, it's probably better
      to postpone the NUMA balancing work anyway. This seems to have fixed the soft
      lockups involving pmd_lock(), which is in line with the convoy theory.
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170515131316.21909-1-vbabka@suse.czSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1e7a9e2