1. 10 Nov, 2018 40 commits
    • ixgbe: Correct X550EM_x revision check · 83aacb9c
      Mark Rustad authored
      [ Upstream commit 3ca2b250 ]
      
      The X550EM_x revision check needs to check a value, not just a bit.
      Use a mask and check the value. Also remove the redundant check
      inside the ixgbe_enter_lplu_t_x550em, because it can only be called
      when both the mac type and revision check pass.
      Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
      Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
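
      A minimal sketch of why the revision check has to compare a masked value
      rather than test a bit. The field layout, mask, and target revision below
      are hypothetical and chosen only for illustration; they are not the ixgbe
      register definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the revision ID occupies bits 3:0 of a register. */
#define REV_MASK   0x0Fu
#define REV_TARGET 0x03u /* hypothetical revision of interest */

/* Bit-test form: true for any value sharing a set bit with REV_TARGET. */
bool rev_check_bit(uint32_t reg)
{
    return (reg & REV_TARGET) != 0;
}

/* Mask-and-compare form: isolate the field, then compare the whole value. */
bool rev_check_value(uint32_t reg)
{
    return (reg & REV_MASK) == REV_TARGET;
}
```

      With these made-up values, revision 0x01 passes the bit test but is
      correctly rejected by the masked comparison.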
    • ixgbe: fix RSS limit for X550 · 83662744
      Emil Tantilov authored
      [ Upstream commit e9ee3238 ]
      
      X550 allows for up to 64 RSS queues, but the driver can have max
      of 63 (-1 MSIX vector for link).
      
      On systems with >= 64 CPUs the driver will set the redirection table
      for all 64 queues which will result in packets being dropped.
      Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
      Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
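
      The arithmetic in the message can be sketched as a simple clamp. The
      constants come from the text (64 hardware RSS queues, one MSI-X vector
      kept for link); the function and macro names are hypothetical and this is
      not the driver's code.

```c
#define HW_RSS_QUEUES 64u /* X550 hardware limit, per the commit message */
#define LINK_VECTORS   1u /* one MSI-X vector is reserved for link events */

/* Number of RSS queues to use for a given count of online CPUs. */
unsigned int rss_queue_count(unsigned int online_cpus)
{
    unsigned int limit = HW_RSS_QUEUES - LINK_VECTORS; /* 63 */

    return online_cpus < limit ? online_cpus : limit;
}
```

      On a machine with 64 or more CPUs this yields 63, so the redirection
      table never refers to a queue the driver cannot service.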
    • net/mlx5e: Correctly handle RSS indirection table when changing number of channels · fcac753b
      Tariq Toukan authored
      [ Upstream commit 85082dba ]
      
      Upon changing num_channels, reset the RSS indirection table to
      match the new value.
      
      Fixes: 2d75b2bc ('net/mlx5e: Add ethtool RSS configuration options')
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
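
      A rough sketch of the idea, assuming a round-robin default table. The
      structure, table size, and function names are hypothetical stand-ins, not
      the mlx5e data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define INDIR_SIZE 128u /* hypothetical table size */

struct rss_state {
    uint32_t indir[INDIR_SIZE];
    unsigned int num_channels;
};

/* Spread table entries round-robin over the channels that exist now. */
void rss_reset_indir(struct rss_state *rss)
{
    for (size_t i = 0; i < INDIR_SIZE; i++)
        rss->indir[i] = (uint32_t)(i % rss->num_channels);
}

/* Changing the channel count always rebuilds the table. */
int rss_set_channels(struct rss_state *rss, unsigned int n)
{
    if (n == 0)
        return -1;
    rss->num_channels = n;
    rss_reset_indir(rss);
    return 0;
}

/* Checking helper: largest queue index left in the table after a resize. */
uint32_t rss_max_entry_after_resize(unsigned int from, unsigned int to)
{
    struct rss_state rss;
    uint32_t max = 0;

    if (rss_set_channels(&rss, from) != 0 || rss_set_channels(&rss, to) != 0)
        return UINT32_MAX;
    for (size_t i = 0; i < INDIR_SIZE; i++)
        if (rss.indir[i] > max)
            max = rss.indir[i];
    return max;
}
```

      Without the reset, shrinking from 8 channels to 3 would leave entries
      pointing at channels 3..7, which no longer exist.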
    • net/mlx5e: Fix LRO modify · 40e9abaa
      Tariq Toukan authored
      [ Upstream commit ab0394fe ]
      
      Ethtool LRO enable/disable is broken: as of today we only modify TCP
      TIRs in order to apply the requested configuration.
      
      Hardware requires that all TIRs pointing to the same RQ share the
      same LRO configuration. To satisfy that, the LRO fields of all other
      TIRs must be modified as well.
      
      Fixes: 5c50368f ('net/mlx5e: Light-weight netdev open/stop')
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ixgbevf: Fix handling of NAPI budget when multiple queues are enabled per vector · c97383a6
      William Dauchy authored
      [ Upstream commit d0f71aff ]
      
      This is the same patch as for ixgbe but applied differently according to
      busy polling.  See commit 5d6002b7 ("ixgbe: Fix handling of NAPI
      budget when multiple queues are enabled per vector")
      Signed-off-by: William Dauchy <william@gandi.net>
      Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
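
      The message does not spell out the change itself. One plausible reading,
      sketched here as an assumption, is that the poll budget of a vector is
      divided among its rings, with a floor of one packet per ring so that no
      ring is starved. The function name is hypothetical.

```c
/* Budget each ring may consume when several rings share one vector. */
int per_ring_budget(int budget, int ring_count)
{
    if (ring_count > 1) {
        int share = budget / ring_count;

        return share > 1 ? share : 1; /* never hand a ring a zero budget */
    }
    return budget; /* a single ring (or none) keeps the full budget */
}
```

      With a budget of 64 and four rings, each ring may process 16 packets, so
      the vector as a whole stays within the budget it was given.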
    • fuse: Dont call set_page_dirty_lock() for ITER_BVEC pages for async_dio · 1b6a863f
      Ashish Samant authored
      [ Upstream commit 61c12b49 ]
      
      Commit 8fba54ae ("fuse: direct-io: don't dirty ITER_BVEC pages") fixes
      the ITER_BVEC page deadlock for direct io in fuse by checking in
      fuse_direct_io(), whether the page is a bvec page or not, before locking
      it.  However, this check is missed when the "async_dio" mount option is
      enabled.  In this case, set_page_dirty_lock() is called from the req->end
      callback in request_end(), when the fuse thread is returning from userspace
      to respond to the read request.  This will cause the same deadlock because
      the bvec condition is not checked in this path.
      
      Here is the stack of the deadlocked thread, while returning from userspace:
      
      [13706.656686] INFO: task glusterfs:3006 blocked for more than 120 seconds.
      [13706.657808] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [13706.658788] glusterfs       D ffffffff816c80f0     0  3006      1 0x00000080
      [13706.658797]  ffff8800d6713a58 0000000000000086 ffff8800d9ad7000 ffff8800d9ad5400
      [13706.658799]  ffff88011ffd5cc0 ffff8800d6710008 ffff88011fd176c0 7fffffffffffffff
      [13706.658801]  0000000000000002 ffffffff816c80f0 ffff8800d6713a78 ffffffff816c790e
      [13706.658803] Call Trace:
      [13706.658809]  [<ffffffff816c80f0>] ? bit_wait_io_timeout+0x80/0x80
      [13706.658811]  [<ffffffff816c790e>] schedule+0x3e/0x90
      [13706.658813]  [<ffffffff816ca7e5>] schedule_timeout+0x1b5/0x210
      [13706.658816]  [<ffffffff81073ffb>] ? gup_pud_range+0x1db/0x1f0
      [13706.658817]  [<ffffffff810668fe>] ? kvm_clock_read+0x1e/0x20
      [13706.658819]  [<ffffffff81066909>] ? kvm_clock_get_cycles+0x9/0x10
      [13706.658822]  [<ffffffff810f5792>] ? ktime_get+0x52/0xc0
      [13706.658824]  [<ffffffff816c6f04>] io_schedule_timeout+0xa4/0x110
      [13706.658826]  [<ffffffff816c8126>] bit_wait_io+0x36/0x50
      [13706.658828]  [<ffffffff816c7d06>] __wait_on_bit_lock+0x76/0xb0
      [13706.658831]  [<ffffffffa0545636>] ? lock_request+0x46/0x70 [fuse]
      [13706.658834]  [<ffffffff8118800a>] __lock_page+0xaa/0xb0
      [13706.658836]  [<ffffffff810c8500>] ? wake_atomic_t_function+0x40/0x40
      [13706.658838]  [<ffffffff81194d08>] set_page_dirty_lock+0x58/0x60
      [13706.658841]  [<ffffffffa054d968>] fuse_release_user_pages+0x58/0x70 [fuse]
      [13706.658844]  [<ffffffffa0551430>] ? fuse_aio_complete+0x190/0x190 [fuse]
      [13706.658847]  [<ffffffffa0551459>] fuse_aio_complete_req+0x29/0x90 [fuse]
      [13706.658849]  [<ffffffffa05471e9>] request_end+0xd9/0x190 [fuse]
      [13706.658852]  [<ffffffffa0549126>] fuse_dev_do_write+0x336/0x490 [fuse]
      [13706.658854]  [<ffffffffa054963e>] fuse_dev_write+0x6e/0xa0 [fuse]
      [13706.658857]  [<ffffffff812a9ef3>] ? security_file_permission+0x23/0x90
      [13706.658859]  [<ffffffff81205300>] do_iter_readv_writev+0x60/0x90
      [13706.658862]  [<ffffffffa05495d0>] ? fuse_dev_splice_write+0x350/0x350 [fuse]
      [13706.658863]  [<ffffffff812062a1>] do_readv_writev+0x171/0x1f0
      [13706.658866]  [<ffffffff810b3d00>] ? try_to_wake_up+0x210/0x210
      [13706.658868]  [<ffffffff81206361>] vfs_writev+0x41/0x50
      [13706.658870]  [<ffffffff81206496>] SyS_writev+0x56/0xf0
      [13706.658872]  [<ffffffff810257a1>] ? syscall_trace_leave+0xf1/0x160
      [13706.658874]  [<ffffffff816cbb2e>] system_call_fastpath+0x12/0x71
      
      Fix this by making should_dirty a fuse_io_priv parameter that can be
      checked in fuse_aio_complete_req().
      Reported-by: Tiger Yang <tiger.yang@oracle.com>
      Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
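
      A userspace sketch of the shape of the fix, under the assumption that the
      "should dirty" decision is recorded once when the I/O is set up and then
      consulted on the asynchronous completion path. All types and functions
      here are hypothetical stand-ins; only the should_dirty name is taken from
      the message.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 16u /* hypothetical bound for this sketch */

/* Stand-in for per-I/O private state carrying the decision. */
struct io_priv_stub {
    bool should_dirty; /* false when the pages came from an ITER_BVEC iterator */
};

/* Stand-in for releasing pages: dirty them only when asked to. */
void release_pages_stub(bool *dirtied, size_t n, bool should_dirty)
{
    for (size_t i = 0; i < n; i++)
        if (should_dirty)
            dirtied[i] = true;
}

/* Stand-in for the async completion callback: it reads the flag from io. */
void aio_complete_stub(const struct io_priv_stub *io, bool *dirtied, size_t n)
{
    release_pages_stub(dirtied, n, io->should_dirty);
}

/* Run one completion and report how many pages ended up dirtied. */
size_t simulate_completion(bool iter_is_bvec, size_t npages)
{
    bool dirtied[MAX_PAGES] = { false };
    struct io_priv_stub io = { .should_dirty = !iter_is_bvec };
    size_t count = 0;

    if (npages > MAX_PAGES)
        npages = MAX_PAGES;
    aio_complete_stub(&io, dirtied, npages);
    for (size_t i = 0; i < npages; i++)
        count += dirtied[i] ? 1 : 0;
    return count;
}
```

      Because the flag travels with the I/O state, the synchronous and
      asynchronous paths make the same decision.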
    • drm/nouveau/fbcon: fix oops without fbdev emulation · 4a3948cb
      Pavel Roskin authored
      [ Upstream commit 48137663 ]
      
      This is similar to an earlier commit 52dfcc5c ("drm/nouveau: fix for
      disabled fbdev emulation"), but protects all occurrences of helper.fbdev
      in the source.
      
      I see an oops in nouveau_fbcon_accel_save_disable() called from
      nouveau_fbcon_set_suspend_work() on Linux 3.13 when the
      CONFIG_DRM_FBDEV_EMULATION option is disabled.
      Signed-off-by: Pavel Roskin <plroskin@gmail.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: generally move prog destruction to RCU deferral · e25dc63a
      Daniel Borkmann authored
      [ Upstream commit 1aacde3d ]
      
      Jann Horn reported the following analysis, which could potentially result
      in a very hard to trigger (if not impossible) UAF race. To quote his
      event timeline:
      
       - Set up a process with threads T1, T2 and T3
       - Let T1 set up a socket filter F1 that invokes another filter F2
         through a BPF map [tail call]
       - Let T1 trigger the socket filter via a unix domain socket write,
         don't wait for completion
       - Let T2 call PERF_EVENT_IOC_SET_BPF with F2, don't wait for completion
       - Now T2 should be behind bpf_prog_get(), but before bpf_prog_put()
       - Let T3 close the file descriptor for F2, dropping the reference
         count of F2 to 2
       - At this point, T1 should have looked up F2 from the map, but not
         finished executing it
       - Let T3 remove F2 from the BPF map, dropping the reference count of
         F2 to 1
       - Now T2 should call bpf_prog_put() (wrong BPF program type), dropping
         the reference count of F2 to 0 and scheduling bpf_prog_free_deferred()
         via schedule_work()
       - At this point, the BPF program could be freed
       - BPF execution is still running in a freed BPF program
      
      At PERF_EVENT_IOC_SET_BPF time it's only guaranteed that the perf
      event fd we're doing the syscall on doesn't disappear from underneath us
      for the whole duration of the syscall. That may not be the case for the
      bpf fd, which is used only as an argument, after we did the put. It needs
      to be a valid fd pointing to a BPF program at the time of the call in
      order to make the bpf_prog_get(), and while T2 gets preempted, F2 must
      have dropped its reference count to 1 on the other CPU. The fput() from
      the close() in T3 should also add additional delay to the reference drop
      via exit_task_work() when bpf_prog_release() gets called, as well as
      scheduling bpf_prog_free_deferred().
      
      That said, it nevertheless makes sense to move the BPF prog destruction
      generally after an RCU grace period, to guarantee that the scenario above,
      as well as others such as the one recently fixed in ceb56070 ("bpf, perf:
      delay release of BPF prog after grace period") with regard to tail calls,
      won't happen. Integrating bpf_prog_free_deferred() directly into the RCU
      callback is not allowed since the invocation might happen from either
      softirq or process context, so we're not permitted to block. Reviewing all
      bpf_prog_put() invocations from the eBPF side (note, cBPF -> eBPF progs
      don't use this for their destruction) with call_rcu() looks good to me.
      
      Since we don't know whether, at the time of attaching the program, we're
      already part of a tail call map, we need to use the RCU variant. However,
      this won't put significantly more stress on the RCU callback queue:
      situations with the above bpf_prog_get() and bpf_prog_put() combo in
      practice normally won't lead to releases, but even if they would, enough
      effort/cycles have to be put into loading a BPF program into the kernel
      already.
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • usb-storage: fix bogus hardware error messages for ATA pass-thru devices · 75c85423
      Alan Stern authored
      [ Upstream commit a4fd4a72 ]
      
      Ever since commit a621bac3 ("scsi_lib: correctly retry failed zero
      length REQ_TYPE_FS commands"), people have been getting bogus error
      messages for USB disk drives using ATA pass-thru.  For example:
      
      [ 1344.880193] sd 6:0:0:0: [sdb] Attached SCSI disk
      [ 1345.069152] sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_SENSE
      [ 1345.069159] sd 6:0:0:0: [sdb] tag#0 Sense Key : Hardware Error [current] [descriptor]
      [ 1345.069162] sd 6:0:0:0: [sdb] tag#0 Add. Sense: No additional sense information
      [ 1345.069168] sd 6:0:0:0: [sdb] tag#0 CDB: ATA command pass through(16) 85 06 20 00 00 00 00 00 00 00 00 00 00 00 e5 00
      [ 1345.172252] sd 6:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_SENSE
      [ 1345.172258] sd 6:0:0:0: [sdb] tag#0 Sense Key : Hardware Error [current] [descriptor]
      [ 1345.172261] sd 6:0:0:0: [sdb] tag#0 Add. Sense: No additional sense information
      [ 1345.172266] sd 6:0:0:0: [sdb] tag#0 CDB: ATA command pass through(12)/Blank a1 06 20 da 00 00 4f c2 00 b0 00 00
      
      These messages can be quite annoying, because programs like udisks2
      provoke them every 10 minutes or so.  Other programs can also have
      this effect, such as those in smartmontools.
      
      I don't fully understand how that commit induced the SCSI core to log
      these error messages, but the underlying cause for them is code added
      to usb-storage by commit f1a0743b ("USB: storage: When a device
      returns no sense data, call it a Hardware Error").  At the time it was
      necessary to do this, in order to prevent an infinite retry loop with
      some not-so-great mass storage devices.
      
      However, the ATA pass-thru protocol uses SCSI sense data to return
      command status values, and some devices always report Check Condition
      status for ATA pass-thru commands to ensure that the host retrieves
      the sense data, even if the command succeeded.  This violates the USB
      mass-storage protocol (Check Condition status is supposed to mean the
      command failed), but we can't help that.
      
      This patch attempts to mitigate the problem of these bogus error
      reports by changing usb-storage.  The HARDWARE ERROR sense key will be
      inserted only for commands that aren't ATA pass-thru.
      
      Thanks to Ewan Milne for pointing out that this mechanism was present
      in usb-storage.  8 years after writing it, I had completely forgotten
      its existence.
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Tested-by: Kris Lindgren <kris.lindgren@gmail.com>
      Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1351305
      CC: Ewan D. Milne <emilne@redhat.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
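
      A small sketch of the decision described above: when a device reports a
      failure but returns no sense data, substitute HARDWARE ERROR unless the
      command was ATA pass-thru. The opcodes 0x85 and 0xa1 match the CDBs in
      the log excerpt, and the sense key values are the standard SCSI ones; the
      function names are hypothetical and this is not the usb-storage code.

```c
#include <stdbool.h>
#include <stdint.h>

#define OP_ATA_16 0x85 /* ATA command pass through(16) */
#define OP_ATA_12 0xa1 /* ATA command pass through(12) */

#define SENSE_NO_SENSE       0x0
#define SENSE_HARDWARE_ERROR 0x4

bool is_ata_passthru(uint8_t opcode)
{
    return opcode == OP_ATA_12 || opcode == OP_ATA_16;
}

/* Sense key to report when the device returned no sense data at all. */
uint8_t sense_key_for_empty_sense(uint8_t opcode)
{
    return is_ata_passthru(opcode) ? SENSE_NO_SENSE : SENSE_HARDWARE_ERROR;
}
```

      An ordinary command such as READ(10) (opcode 0x28) still gets the
      substituted key, which preserves the original protection against an
      infinite retry loop.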
    • sch_red: update backlog as well · ab0b3b9d
      WANG Cong authored
      [ Upstream commit d7f4f332 ]
      
      Fixes: 2ccccf5f ("net_sched: update hierarchical backlog too")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sparc/pci: Refactor dev_archdata initialization into pci_init_dev_archdata · c300313d
      Sowmini Varadhan authored
      [ Upstream commit 9a78d4fc ]
      
      The function pcibios_add_device() added by commit d0c31e02
      ("sparc/PCI: Fix for panic while enabling SR-IOV") initializes
      the dev_archdata by doing a memcpy from the PF. This has the
      problem that it erroneously copies the OF device without
      explicitly refcounting it.
      
      As David Miller pointed out: "Generally speaking we don't
      really support hot-plug for OF probed devices, but if we did
      all of the device tree pointers have to be refcounted properly."
      
      To fix this error, and also avoid code duplication, this patch
      creates a new helper function, pci_init_dev_archdata(), that
      initializes the fields in dev_archdata, and can be invoked
      by callers after they have taken the needed refcounts.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Tested-by: Babu Moger <babu.moger@oracle.com>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • scsi: Add STARGET_CREATED_REMOVE state to scsi_target_state · d896b9f2
      Ewan D. Milne authored
      [ Upstream commit f9279c96 ]
      
      The addition of the STARGET_REMOVE state had the side effect of
      introducing a race condition that can cause a crash.
      
      scsi_target_reap_ref_release() checks the starget->state to
      see if it is still in STARGET_CREATED, and if so, skips calling
      transport_remove_device() and device_del(), because the starget->state
      is only set to STARGET_RUNNING after scsi_target_add() has called
      device_add() and transport_add_device().
      
      However, if an rport loss occurs while a target is being scanned,
      it can happen that scsi_remove_target() will be called while the
      starget is still in the STARGET_CREATED state.  In this case, the
      starget->state will be set to STARGET_REMOVE, and as a result,
      scsi_target_reap_ref_release() will take the wrong path.  The end
      result is a panic:
      
      [ 1255.356653] Oops: 0000 [#1] SMP
      [ 1255.360154] Modules linked in: x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel ghash_clmulni_i
      [ 1255.393234] CPU: 5 PID: 149 Comm: kworker/u96:4 Tainted: G        W       4.11.0+ #8
      [ 1255.401879] Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
      [ 1255.410327] Workqueue: scsi_wq_6 fc_scsi_scan_rport [scsi_transport_fc]
      [ 1255.417720] task: ffff88060ca8c8c0 task.stack: ffffc900048a8000
      [ 1255.424331] RIP: 0010:kernfs_find_ns+0x13/0xc0
      [ 1255.429287] RSP: 0018:ffffc900048abbf0 EFLAGS: 00010246
      [ 1255.435123] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [ 1255.443083] RDX: 0000000000000000 RSI: ffffffff8188d659 RDI: 0000000000000000
      [ 1255.451043] RBP: ffffc900048abc10 R08: 0000000000000000 R09: 0000012433fe0025
      [ 1255.459005] R10: 0000000025e5a4b5 R11: 0000000025e5a4b5 R12: ffffffff8188d659
      [ 1255.466972] R13: 0000000000000000 R14: ffff8805f55e5088 R15: 0000000000000000
      [ 1255.474931] FS:  0000000000000000(0000) GS:ffff880616b40000(0000) knlGS:0000000000000000
      [ 1255.483959] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1255.490370] CR2: 0000000000000068 CR3: 0000000001c09000 CR4: 00000000000406e0
      [ 1255.498332] Call Trace:
      [ 1255.501058]  kernfs_find_and_get_ns+0x31/0x60
      [ 1255.505916]  sysfs_unmerge_group+0x1d/0x60
      [ 1255.510498]  dpm_sysfs_remove+0x22/0x60
      [ 1255.514783]  device_del+0xf4/0x2e0
      [ 1255.518577]  ? device_remove_file+0x19/0x20
      [ 1255.523241]  attribute_container_class_device_del+0x1a/0x20
      [ 1255.529457]  transport_remove_classdev+0x4e/0x60
      [ 1255.534607]  ? transport_add_class_device+0x40/0x40
      [ 1255.540046]  attribute_container_device_trigger+0xb0/0xc0
      [ 1255.546069]  transport_remove_device+0x15/0x20
      [ 1255.551025]  scsi_target_reap_ref_release+0x25/0x40
      [ 1255.556467]  scsi_target_reap+0x2e/0x40
      [ 1255.560744]  __scsi_scan_target+0xaa/0x5b0
      [ 1255.565312]  scsi_scan_target+0xec/0x100
      [ 1255.569689]  fc_scsi_scan_rport+0xb1/0xc0 [scsi_transport_fc]
      [ 1255.576099]  process_one_work+0x14b/0x390
      [ 1255.580569]  worker_thread+0x4b/0x390
      [ 1255.584651]  kthread+0x109/0x140
      [ 1255.588251]  ? rescuer_thread+0x330/0x330
      [ 1255.592730]  ? kthread_park+0x60/0x60
      [ 1255.596815]  ret_from_fork+0x29/0x40
      [ 1255.600801] Code: 24 08 48 83 42 40 01 5b 41 5c 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
      [ 1255.621876] RIP: kernfs_find_ns+0x13/0xc0 RSP: ffffc900048abbf0
      [ 1255.628479] CR2: 0000000000000068
      [ 1255.632756] ---[ end trace 34a69ba0477d036f ]---
      
      Fix this by adding another scsi_target state STARGET_CREATED_REMOVE
      to distinguish this case.
      
      Fixes: f05795d3 ("scsi: Add intermediate STARGET_REMOVE state to scsi_target_state")
      Reported-by: David Jeffery <djeffery@redhat.com>
      Signed-off-by: Ewan D. Milne <emilne@redhat.com>
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Laurence Oberman <loberman@redhat.com>
      Tested-by: Laurence Oberman <loberman@redhat.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
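
      The state handling described above can be modelled roughly as follows.
      The state names come from the message; the numeric values and the two
      helper functions are hypothetical and exist only to show why the extra
      state removes the ambiguity.

```c
#include <stdbool.h>

enum starget_state {
    STARGET_CREATED = 1,   /* allocated, device_add() not yet called */
    STARGET_RUNNING,       /* device_add()/transport_add_device() done */
    STARGET_REMOVE,        /* removal requested after the target was added */
    STARGET_CREATED_REMOVE /* removal requested before the target was added */
};

/* State a target moves to when removal is requested. */
enum starget_state state_after_remove(enum starget_state s)
{
    return s == STARGET_CREATED ? STARGET_CREATED_REMOVE : STARGET_REMOVE;
}

/* Whether the release path must undo device_add()/transport_add_device(). */
bool release_needs_device_del(enum starget_state s)
{
    return s != STARGET_CREATED && s != STARGET_CREATED_REMOVE;
}
```

      A target removed while still being scanned now ends up in
      STARGET_CREATED_REMOVE, so the release path skips the teardown of a
      device that was never added.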
    • xfrm: Clear sk_dst_cache when applying per-socket policy. · 9e9fe58a
      Jonathan Basseri authored
      [ Upstream commit 2b06cdf3 ]
      
      If a socket has a valid dst cache, then xfrm_lookup_route will get
      skipped. However, the cache is not invalidated when applying policy to a
      socket (i.e. IPV6_XFRM_POLICY). The result is that new policies are
      sometimes ignored on those sockets. (Note: This was broken for IPv4 and
      IPv6 at different times.)
      
      This can be demonstrated as follows:
      1. Create UDP socket.
      2. connect() the socket.
      3. Apply an outbound XFRM policy to the socket. (setsockopt)
      4. send() data on the socket.
      
      Packets will continue to be sent in the clear instead of matching an
      xfrm or returning a no-match error (EAGAIN). This affects calls to
      send() and not sendto().
      
      Invalidating the sk_dst_cache is necessary to correctly apply xfrm
      policies. Since we do this in xfrm_user_policy(), the sk_lock was
      already acquired in either do_ip_setsockopt() or do_ipv6_setsockopt(),
      and we may call __sk_dst_reset().
      
      Performance impact should be negligible, since this code is only called
      when changing xfrm policy, and only affects the socket in question.
      
      Fixes: 00bc0ef5 ("ipv6: Skip XFRM lookup if dst_entry in socket cache is valid")
      Tested: https://android-review.googlesource.com/517555
      Tested: https://android-review.googlesource.com/418659
      Signed-off-by: Jonathan Basseri <misterikkit@google.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • arm64: Fix potential race with hardware DBM in ptep_set_access_flags() · db769a03
      Catalin Marinas authored
      [ Upstream commit 6d332747 ]
      
      In a system with DBM (dirty bit management) capable agents there is a
      possible race between a CPU executing ptep_set_access_flags() (maybe
      non-DBM capable) and a hardware update of the dirty state (clearing of
      PTE_RDONLY). The scenario:
      
      a) the pte is writable (PTE_WRITE set), clean (PTE_RDONLY set) and old
         (PTE_AF clear)
      b) ptep_set_access_flags() is called as a result of a read access and it
         needs to set the pte to writable, clean and young (PTE_AF set)
      c) a DBM-capable agent, as a result of a different write access, is
         marking the entry as young (setting PTE_AF) and dirty (clearing
         PTE_RDONLY)
      
      The current ptep_set_access_flags() implementation would set the
      PTE_RDONLY bit in the resulting value overriding the DBM update and
      losing the dirty state.
      
      This patch fixes this race by setting PTE_RDONLY to the most permissive
      (lowest value) of the current entry and the new one.
      
      Fixes: 66dbd6e6 ("arm64: Implement ptep_set_access_flags() for hardware AF/DBM")
      Cc: Will Deacon <will.deacon@arm.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Steve Capper <steve.capper@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
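
      A plain C sketch of the "most permissive PTE_RDONLY" rule. The bit
      positions are illustrative, the function name is hypothetical, and the
      atomic update the kernel needs is deliberately left out; this only shows
      how the two values are combined.

```c
#include <stdint.h>

/* Illustrative bit positions for this sketch. */
#define PTE_RDONLY (UINT64_C(1) << 7)
#define PTE_AF     (UINT64_C(1) << 10)
#define PTE_WRITE  (UINT64_C(1) << 51)

/*
 * Combine the current entry with the requested one. Flags are OR-ed in,
 * except PTE_RDONLY, which stays set only if both values have it set,
 * so a hardware update that cleared it (marking the page dirty) survives.
 */
uint64_t merge_access_flags(uint64_t cur, uint64_t entry)
{
    uint64_t rdonly = cur & entry & PTE_RDONLY;
    uint64_t merged = (cur | entry) & ~PTE_RDONLY;

    return merged | rdonly;
}
```

      In the scenario from the message, hardware has already cleared
      PTE_RDONLY in the current entry, so the merged value keeps the page
      dirty even though the software-supplied entry still had the bit set.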
    • CIFS: handle guest access errors to Windows shares · 22979776
      Mark Syms authored
      [ Upstream commit 40920c2b ]
      
      Commit 1a967d6c ("correctly to
      anonymous authentication for the NTLM(v2) authentication") introduces
      a regression in handling errors related to attempting a guest
      connection to a Windows share which requires authentication. This
      should result in a permission denied error but actually causes the
      kernel module to enter a never-ending loop trying to follow a DFS
      referral which doesn't exist.
      
      The root cause is that the failure now occurs later in the process,
      during tree connect rather than at session setup, and all errors
      in tree connect are interpreted as needing to follow the DFS paths,
      which is not correct in this case. So, check the returned error against
      EACCES and fail if that is the error returned.
      
      Feedback from Aurelien:
      
        PS> net user guest /activate:no
        PS> mkdir C:\guestshare
        PS> icacls C:\guestshare /grant 'Everyone:(OI)(CI)F'
        PS> new-smbshare -name guestshare -path C:\guestshare -fullaccess Everyone
      
        I've tested v3.10, v4.4, master, master+your patch using default options
        (empty or no user "NU") and user=abc (U).
      
        NT_LOGON_FAILURE in session setup: LF
        This is what you seem to have in 3.10.
      
        NT_ACCESS_DENIED in tree connect to the share: AD
        This is what you get before your infinite loop.
      
                     |   NU       U
        --------------------------------
        3.10         |   LF       LF
        4.4          |   LF       LF
        master       |   AD       LF
        master+patch |   AD       LF
      
        No infinite DFS loop :(
        All these issues result in mount failing very fast with permission denied.
      
        I guess it could be from either the Windows version or the share/folder
        ACL. A deeper analysis of the packets might reveal more.
      
        In any case I did not notice any issues on a basic DFS setup with
        the patch so I don't think it introduced any regressions, which is
        probably all that matters. It still bothers me a little I couldn't hit
        the bug.
      
        I've included kernel output w/ debugging output and network capture of
        my tests if anyone wants to have a look at it. (master+patch = ml-guestfix).
      Signed-off-by: Mark Syms <mark.syms@citrix.com>
      Reviewed-by: Aurelien Aptel <aaptel@suse.com>
      Tested-by: Aurelien Aptel <aaptel@suse.com>
      Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
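
      The error handling described in the message can be sketched as a single
      predicate. The function name is hypothetical and the real mount path has
      more conditions; the point is only that -EACCES from tree connect is
      treated as final instead of as a reason to chase a DFS referral.

```c
#include <errno.h>
#include <stdbool.h>

/* Decide whether a tree connect result should trigger a DFS referral lookup. */
bool should_try_dfs_referral(int tcon_rc)
{
    if (tcon_rc == 0)
        return false; /* tree connect succeeded, nothing to follow */
    if (tcon_rc == -EACCES)
        return false; /* permission denied: fail the mount right away */
    return true;      /* other errors may still mean the path is a DFS link */
}
```

      A guest mount of a share that requires authentication then fails quickly
      with permission denied instead of looping.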
    • ASoC: wm8940: Enable cache usage to fix crashes on resume · 796c9c01
      Geert Uytterhoeven authored
      [ Upstream commit 50c7a0ef ]
      
      The wm8940 driver is using a regmap cache sync to restore the
      configuration of the chip when switching from OFF to STANDBY, but does
      not actually define a register cache which means that the restore is
      never going to work and we trigger asserts in regmap.  Fix this by
      enabling caching.
      
      Based on commit d3030d11 ("ASoC: ak4642: Enable cache usage to
      fix crashes on resume") by Mark Brown <broonie@kernel.org>.
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ASoC: ak4613: Enable cache usage to fix crashes on resume · f6291044
      Geert Uytterhoeven authored
      [ Upstream commit dcd2d1f7 ]
      
      During system resume:
      
          kernel BUG at drivers/base/regmap/regcache.c:347!
          ...
          PC is at regcache_sync+0x1c/0x128
          LR is at ak4613_resume+0x28/0x34
      
      The ak4613 driver is using a regmap cache sync to restore the
      configuration of the chip on resume but does not actually define a
      register cache which means that the resume is never going to work and we
      trigger asserts in regmap.  Fix this by enabling caching.
      
      Based on commit d3030d11 ("ASoC: ak4642: Enable cache usage to
      fix crashes on resume") by Mark Brown <broonie@kernel.org>.
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • MIPS: Fix FCSR Cause bit handling for correct SIGFPE issue · 1c84d14e
      Maciej W. Rozycki authored
      [ Upstream commit 5a1aca44 ]
      
      Sanitize FCSR Cause bit handling, following a trail of past attempts:
      
      * commit 42495484 ("MIPS: ptrace: Fix FP context restoration FCSR
      regression"),
      
      * commit 443c4403 ("MIPS: Always clear FCSR cause bits after
      emulation"),
      
      * commit 64bedffe ("MIPS: Clear [MSA]FPE CSR.Cause after
      notify_die()"),
      
      * commit b1442d39 ("MIPS: Prevent user from setting FCSR cause
      bits"),
      
      * commit b54d2901517d ("Properly handle branch delay slots in connection
      with signals.").
      
      Specifically do not mask these bits out in ptrace(2) processing and send
      a SIGFPE signal instead whenever a matching pair of an FCSR Cause and
      Enable bit is seen as execution of an affected context is about to
      resume.  Only then clear Cause bits, and even then do not clear any bits
      that are set but masked with the respective Enable bits.  Adjust Cause
      bit clearing throughout code likewise, except within the FPU emulator
      proper where they are set according to IEEE 754 exceptions raised as the
      operation emulated executed.  Do so so that any IEEE 754 exceptions
      subject to their default handling are recorded like with operations
      executed by FPU hardware.
      Signed-off-by: Maciej W. Rozycki <macro@imgtec.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/14460/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1c84d14e
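      The Cause/Enable pairing described in the MIPS commit above can be
      sketched in plain C. This is a minimal illustration, not the kernel's
      code: the bit positions follow the architectural FCSR layout (Cause in
      bits 17..12, Enable in bits 11..7), while every macro and function name
      here is an assumption made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FCSR field layout; the macro names are illustrative, not the kernel's. */
#define FCSR_CAUSE_UNIMPL UINT32_C(0x00020000) /* Cause E: no Enable bit, always traps */
#define FCSR_CAUSE_MASK   UINT32_C(0x0003f000) /* Cause bits 17..12 */
#define FCSR_ENABLE_MASK  UINT32_C(0x00000f80) /* Enable bits 11..7 */
#define FCSR_CAUSE_SHIFT  5                    /* Enable V,Z,O,U,I lines up with Cause V,Z,O,U,I */

/* Cause bits that would trap: those with a matching Enable bit, plus Unimplemented. */
static uint32_t fcsr_trapping_causes(uint32_t fcsr)
{
    uint32_t enabled = (fcsr & FCSR_ENABLE_MASK) << FCSR_CAUSE_SHIFT;

    return fcsr & (enabled | FCSR_CAUSE_UNIMPL);
}

/* True when a SIGFPE should be sent before the context resumes. */
static bool fcsr_needs_sigfpe(uint32_t fcsr)
{
    return fcsr_trapping_causes(fcsr) != 0;
}

/* Clear only the trapping Cause bits; bits that are set but masked are kept. */
static uint32_t fcsr_clear_trapping_causes(uint32_t fcsr)
{
    return fcsr & ~fcsr_trapping_causes(fcsr);
}
```

      A context with Cause I set and Enable I clear passes through unchanged,
      which is the "set but masked" case the commit message calls out.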
    • Vladis Dronov's avatar
      usbvision: revert commit 588afcc1 · af521c7f
      Vladis Dronov authored
      [ Upstream commit d5468d7a ]
      
      Commit 588afcc1 ("[media] usbvision fix overflow of interfaces
      array") should be reverted, because:
      
      * "!dev->actconfig->interface[ifnum]" won't catch a case where the value
      is not NULL but some garbage. This way the system may crash later with
      GPF.
      
      * "(ifnum >= USB_MAXINTERFACES)" does not cover all the error
      conditions. "ifnum" should be compared to "dev->actconfig->
      desc.bNumInterfaces", i.e. compared to the number of "struct
      usb_interface" kzalloc()-ed, not to USB_MAXINTERFACES.
      
      * There is a "struct usb_device" leak in this error path, as there is
      usb_get_dev(), but no usb_put_dev() on this path.
      
      * There is a bug of the same type several lines below with number of
      endpoints. The code is accessing hard-coded second endpoint
      ("interface->endpoint[1].desc") which may not exist. It would be great
      to handle this in the same patch too.
      
      * All the concerns above are resolved by already-accepted commit fa52bd50
      ("[media] usbvision: fix crash on detecting device with invalid
      configuration")
      
      * Mailing list message:
      http://www.spinics.net/lists/linux-media/msg94832.html
      Signed-off-by: Vladis Dronov <vdronov@redhat.com>
      Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: <stable@vger.kernel.org>      # for v4.5
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      af521c7f
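      The bounds checks argued for in the usbvision commit above (compare
      the interface number with the count actually allocated, and check that
      an endpoint exists before touching it) might look like the following
      sketch. The structures and function names are simplified, hypothetical
      stand-ins, not the USB core's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the USB core structures. */
struct sketch_interface {
    unsigned num_endpoints;
};

struct sketch_config {
    unsigned num_interfaces;               /* plays the role of desc.bNumInterfaces */
    struct sketch_interface *interface[8]; /* only the first num_interfaces entries are valid */
};

/*
 * Return the interface only if ifnum lies within the allocated range and
 * the interface really has the endpoint the caller is about to access.
 */
static struct sketch_interface *
sketch_get_interface(const struct sketch_config *cfg, unsigned ifnum,
                     unsigned needed_endpoint)
{
    struct sketch_interface *intf;

    if (cfg == NULL || ifnum >= cfg->num_interfaces)
        return NULL;

    intf = cfg->interface[ifnum];
    if (intf == NULL || needed_endpoint >= intf->num_endpoints)
        return NULL;

    return intf;
}

/* Usage example: a device exposing one interface with a single endpoint. */
static bool sketch_example_accepts(unsigned ifnum, unsigned needed_endpoint)
{
    struct sketch_interface intf = { .num_endpoints = 1 };
    struct sketch_config cfg = { .num_interfaces = 1, .interface = { &intf } };

    return sketch_get_interface(&cfg, ifnum, needed_endpoint) != NULL;
}
```

      With this shape, an index that is below the array size but at or above
      the reported interface count is rejected, as is a request for a second
      endpoint that the device never described.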
    • Alexander Shishkin's avatar
      perf/core: Don't leak event in the syscall error path · 5796c70e
      Alexander Shishkin authored
      [ Upstream commit 201c2f85 ]
      
      In the error path, event_file not being NULL is used to determine
      whether the event itself still needs to be free'd, so fix it up to
      avoid leaking.
      Reported-by: Leon Yu <chianglungyu@gmail.com>
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 13005627 ("perf: Do not double free")
      Link: http://lkml.kernel.org/r/87twk06yxp.fsf@ashishki-desk.ger.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5796c70e
    • Raghava Aditya Renukunta's avatar
      aacraid: Start adapter after updating number of MSIX vectors · 93e8e691
      Raghava Aditya Renukunta authored
      [ Upstream commit 116d77fe ]
      
      The adapter has to be started after updating the number of MSIX vectors.
      
      Fixes: ecc479e0 (aacraid: Set correct MSIX count for EEH recovery)
      Cc: stable@vger.kernel.org
      Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      93e8e691
    • Prarit Bhargava's avatar
      x86/PCI: Mark Broadwell-EP Home Agent 1 as having non-compliant BARs · 92fe37c0
      Prarit Bhargava authored
      [ Upstream commit da77b671 ]
      
      Commit b8941571 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having
      non-compliant BARs") marked Home Agent 0 & PCU as having non-compliant
      BARs.  Home Agent 1 also has non-compliant BARs.
      
      Mark Home Agent 1 as having non-compliant BARs so the PCI core doesn't
      touch them.
      
      The problem with these devices is documented in the Xeon v4 specification
      update:
      
        BDF2          PCI BARs in the Home Agent Will Return Non-Zero Values
                      During Enumeration
      
        Problem:      During system initialization the Operating System may access
                      the standard PCI BARs (Base Address Registers).  Due to
                      this erratum, accesses to the Home Agent BAR registers (Bus
                      1; Device 18; Function 0,4; Offsets 0x14-0x24) will return
                      non-zero values.
      
        Implication:  The operating system may issue a warning.  Intel has not
                      observed any functional failures due to this erratum.
      
      Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v4-spec-update.html
      Fixes: b8941571 ("x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs")
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: "H. Peter Anvin" <hpa@zytor.com>
      CC: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      92fe37c0
    • Jarkko Sakkinen's avatar
      tpm: fix: return rc when devm_add_action() fails · c447410b
      Jarkko Sakkinen authored
      [ Upstream commit 4f3b193d ]
      
      Call put_device() and return error code if devm_add_action() fails.
      Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Fixes: 8e0ee3c9 ("tpm: fix the cleanup of struct tpm_chip")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c447410b
    • Arnd Bergmann's avatar
      thermal: allow u8500-thermal driver to be a module · c8a5f83f
      Arnd Bergmann authored
      [ Upstream commit 26716ce1 ]
      
      When the thermal subsystem is a loadable module, the u8500 driver
      fails to build:
      
      drivers/thermal/built-in.o: In function `db8500_thermal_probe':
      db8500_thermal.c:(.text+0x96c): undefined reference to `thermal_zone_device_register'
      drivers/thermal/built-in.o: In function `db8500_thermal_work':
      db8500_thermal.c:(.text+0xab4): undefined reference to `thermal_zone_device_update'
      
      This changes the symbol to a tristate, so Kconfig can track the
      dependency correctly.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c8a5f83f
    • Arnd Bergmann's avatar
      thermal: allow spear-thermal driver to be a module · 90f9ed93
      Arnd Bergmann authored
      [ Upstream commit 4d2f1794 ]
      
      When the thermal subsystem is a loadable module, the spear driver
      fails to build:
      
      drivers/thermal/built-in.o: In function `spear_thermal_exit':
      spear_thermal.c:(.text+0xf8): undefined reference to `thermal_zone_device_unregister'
      drivers/thermal/built-in.o: In function `spear_thermal_probe':
      spear_thermal.c:(.text+0x230): undefined reference to `thermal_zone_device_register'
      
      This changes the symbol to a tristate, so Kconfig can track the
      dependency correctly.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      90f9ed93
    • Jeff Mahoney's avatar
      btrfs: don't create or leak aliased root while cleaning up orphans · c1504091
      Jeff Mahoney authored
      [ Upstream commit 35bbb97f ]
      
      commit 909c3a22 (Btrfs: fix loading of orphan roots leading to BUG_ON)
      avoids the BUG_ON but can add an aliased root to the dead_roots list or
      leak the root.
      
      Since we've already been loading roots into the radix tree, we should
      use it before looking the root up on disk.
      
      Cc: <stable@vger.kernel.org> # 4.5
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c1504091
    • Peter Zijlstra's avatar
      sched/cgroup: Fix cgroup entity load tracking tear-down · 137b1ce3
      Peter Zijlstra authored
      [ Upstream commit 6fe1f348 ]
      
      When a cgroup's CPU runqueue is destroyed, it should remove its
      remaining load accounting from its parent cgroup.
      
      The current site for doing so is unsuited because it's far too late and
      unordered against other cgroup removal (->css_free() will be, but we're also
      in an RCU callback).
      
      Put it in the ->css_offline() callback, which is the start of cgroup
      destruction, right after the group has been made unavailable to
      userspace. The ->css_offline() callbacks are called in hierarchical order
      after the following v4.4 commit:
      
        aa226ff4 ("cgroup: make sure a parent css isn't offlined before its children")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160121212416.GL6357@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      137b1ce3
    • Florian Fainelli's avatar
      um: Avoid longjmp/setjmp symbol clashes with libpthread.a · 53025e7f
      Florian Fainelli authored
      [ Upstream commit f44f1e7d ]
      
      Building a statically linked UML kernel on a Centos 6.9 host resulted in
      the following linking failure (GCC 4.4, glibc-2.12):
      
      /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libpthread.a(libpthread.o):
      In function `siglongjmp':
      (.text+0x8490): multiple definition of `longjmp'
      arch/x86/um/built-in.o:/local/users/fainelli/openwrt/trunk/build_dir/target-x86_64_musl/linux-uml/linux-4.4.69/arch/x86/um/setjmp_64.S:44:
      first defined here
      /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/libpthread.a(libpthread.o):
      In function `sem_open':
      (.text+0x77cd): warning: the use of `mktemp' is dangerous, better use
      `mkstemp'
      collect2: ld returned 1 exit status
      make[4]: *** [vmlinux] Error 1
      
      Adopt a solution similar to the one done for vmap where we define
      longjmp/setjmp to be kernel_longjmp/setjmp. In the process, make sure we
      do rename the functions in arch/x86/um/setjmp_*.S accordingly.
      
      Fixes: a7df4716 ("um: link with -lpthread")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      53025e7f
    • Eric Dumazet's avatar
      ipv6: orphan skbs in reassembly unit · 274337f8
      Eric Dumazet authored
      [ Upstream commit 48cac18e ]
      
      Andrey reported a use-after-free in IPv6 stack.
      
      Issue here is that we free the socket while it still has skb
      in TX path and in some queues.
      
      It happens here because IPv6 reassembly unit messes skb->truesize,
      breaking skb_set_owner_w() badly.
      
      We fixed a similar issue for IPV4 in commit 8282f274 ("inet: frag:
      Always orphan skbs inside ip_defrag()")
      Acked-by: Joe Stringer <joe@ovn.org>
      
      ==================================================================
      BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
      Read of size 8 at addr ffff880062da0060 by task a.out/4140
      
      page:ffffea00018b6800 count:1 mapcount:0 mapping:          (null)
      index:0x0 compound_mapcount: 0
      flags: 0x100000000008100(slab|head)
      raw: 0100000000008100 0000000000000000 0000000000000000 0000000180130013
      raw: dead000000000100 dead000000000200 ffff88006741f140 0000000000000000
      page dumped because: kasan: bad access detected
      
      CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15
       dump_stack+0x292/0x398 lib/dump_stack.c:51
       describe_address mm/kasan/report.c:262
       kasan_report_error+0x121/0x560 mm/kasan/report.c:370
       kasan_report mm/kasan/report.c:392
       __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413
       sock_flag ./arch/x86/include/asm/bitops.h:324
       sock_wfree+0x118/0x120 net/core/sock.c:1631
       skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655
       skb_release_all+0x15/0x60 net/core/skbuff.c:668
       __kfree_skb+0x15/0x20 net/core/skbuff.c:684
       kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705
       inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
       inet_frag_put ./include/net/inet_frag.h:133
       nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617
       ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn ./include/linux/netfilter.h:102
       nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
       nf_hook ./include/linux/netfilter.h:212
       __ip6_local_out+0x52c/0xaf0 net/ipv6/output_core.c:160
       ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
       ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
       ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
       rawv6_push_pending_frames net/ipv6/raw.c:613
       rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635
       sock_sendmsg+0xca/0x110 net/socket.c:645
       sock_write_iter+0x326/0x620 net/socket.c:848
       new_sync_write fs/read_write.c:499
       __vfs_write+0x483/0x760 fs/read_write.c:512
       vfs_write+0x187/0x530 fs/read_write.c:560
       SYSC_write fs/read_write.c:607
       SyS_write+0xfb/0x230 fs/read_write.c:599
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      RIP: 0033:0x7ff26e6f5b79
      RSP: 002b:00007ff268e0ed98 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007ff268e0f9c0 RCX: 00007ff26e6f5b79
      RDX: 0000000000000010 RSI: 0000000020f50fe1 RDI: 0000000000000003
      RBP: 00007ff26ebc1220 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
      R13: 00007ff268e0f9c0 R14: 00007ff26efec040 R15: 0000000000000003
      
      The buggy address belongs to the object at ffff880062da0000
       which belongs to the cache RAWv6 of size 1504
      The buggy address ffff880062da0060 is located 96 bytes inside
       of 1504-byte region [ffff880062da0000, ffff880062da05e0)
      
      Freed by task 4113:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514
       kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
       slab_free_hook mm/slub.c:1352
       slab_free_freelist_hook mm/slub.c:1374
       slab_free mm/slub.c:2951
       kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973
       sk_prot_free net/core/sock.c:1377
       __sk_destruct+0x49c/0x6e0 net/core/sock.c:1452
       sk_destruct+0x47/0x80 net/core/sock.c:1460
       __sk_free+0x57/0x230 net/core/sock.c:1468
       sk_free+0x23/0x30 net/core/sock.c:1479
       sock_put ./include/net/sock.h:1638
       sk_common_release+0x31e/0x4e0 net/core/sock.c:2782
       rawv6_close+0x54/0x80 net/ipv6/raw.c:1214
       inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
       inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431
       sock_release+0x8d/0x1e0 net/socket.c:599
       sock_close+0x16/0x20 net/socket.c:1063
       __fput+0x332/0x7f0 fs/file_table.c:208
       ____fput+0x15/0x20 fs/file_table.c:244
       task_work_run+0x19b/0x270 kernel/task_work.c:116
       exit_task_work ./include/linux/task_work.h:21
       do_exit+0x186b/0x2800 kernel/exit.c:839
       do_group_exit+0x149/0x420 kernel/exit.c:943
       SYSC_exit_group kernel/exit.c:954
       SyS_exit_group+0x1d/0x20 kernel/exit.c:952
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      
      Allocated by task 4115:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
       slab_post_alloc_hook mm/slab.h:432
       slab_alloc_node mm/slub.c:2708
       slab_alloc mm/slub.c:2716
       kmem_cache_alloc+0x1af/0x250 mm/slub.c:2721
       sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334
       sk_alloc+0x105/0x1010 net/core/sock.c:1396
       inet6_create+0x44d/0x1150 net/ipv6/af_inet6.c:183
       __sock_create+0x4f6/0x880 net/socket.c:1199
       sock_create net/socket.c:1239
       SYSC_socket net/socket.c:1269
       SyS_socket+0xf9/0x230 net/socket.c:1249
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      
      Memory state around the buggy address:
       ffff880062d9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff880062d9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff880062da0000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
       ffff880062da0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff880062da0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      274337f8
    • Eugenia Emantayev's avatar
      net/mlx4_en: Resolve dividing by zero in 32-bit system · 61918dbc
      Eugenia Emantayev authored
      [ Upstream commit 4850cf45 ]
      
      When doing roundup_pow_of_two for a large enough number with
      bit 31 set, an overflow will occur and a value equal to 1 will
      be returned. In this case 1 will be subtracted from the return
      value and division by zero will be reached.
      
      Fixes: 31c128b6 ("net/mlx4_en: Choose time-stamping shift value according to HW frequency")
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      61918dbc
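      The overflow in the mlx4_en commit above can be reproduced in
      userspace. The sketch below is an illustration under stated
      assumptions, not the driver's code: all function names are
      hypothetical, and the 32-bit variant spells out a shift count masked to
      5 bits (as some CPUs do) because shifting a 32-bit value by 32 is
      undefined in C. Doing the same computation in 64 bits is one way to
      keep the result non-degenerate; the upstream fix may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the highest set bit; 0 when x == 0. */
static unsigned sketch_fls32(uint32_t x)
{
    unsigned n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/*
 * Models a 32-bit round-up-to-power-of-two on hardware that masks the
 * shift count to 5 bits (an assumption of this sketch). For inputs above
 * 2^31 the shift count of 32 wraps to 0 and the result collapses to 1.
 */
static uint32_t sketch_roundup_pow2_32(uint32_t n)
{
    return UINT32_C(1) << (sketch_fls32(n - 1) & 31);
}

/* Same computation carried out in 64 bits, so 2^32 is representable. */
static uint64_t sketch_roundup_pow2_64(uint32_t n)
{
    return UINT64_C(1) << sketch_fls32(n - 1);
}
```

      Subtracting 1 from the collapsed 32-bit result yields 0, which is the
      divisor the commit message warns about.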
    • Mateusz Jurczyk's avatar
      af_iucv: Move sockaddr length checks to before accessing sa_family in bind and connect handlers · 80176161
      Mateusz Jurczyk authored
      [ Upstream commit e3c42b61 ]
      
      Verify that the caller-provided sockaddr structure is large enough to
      contain the sa_family field, before accessing it in bind() and connect()
      handlers of the AF_IUCV socket. Since neither syscall enforces a minimum
      size of the corresponding memory region, very short sockaddrs (zero or
      one byte long) result in operating on uninitialized memory while
      referencing .sa_family.
      
      Fixes: 52a82e23 ("af_iucv: Validate socket address length in iucv_sock_bind()")
      Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
      [jwi: removed unneeded null-check for addr]
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      80176161
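      The ordering the af_iucv commit above asks for (validate the length,
      then read the family) can be shown with a small userspace sketch. The
      structure, constant, and function names are hypothetical
      simplifications; only the order of the two checks is the point.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_AF_IUCV 32 /* illustrative address-family value */

/* Simplified, hypothetical stand-in for struct sockaddr_iucv. */
struct sketch_sockaddr_iucv {
    unsigned short family;
    char user_id[8];
    char name[8];
};

/*
 * Check the length first. The family field is read only once the caller
 * is known to have supplied enough bytes to contain it.
 */
static int sketch_iucv_check_addr(const void *addr, size_t addr_len)
{
    struct sketch_sockaddr_iucv sa;

    if (addr_len < sizeof(sa))
        return -EINVAL;

    memcpy(&sa, addr, sizeof(sa));
    if (sa.family != SKETCH_AF_IUCV)
        return -EINVAL;

    return 0;
}

/* Usage example: a well-formed address passed with a caller-claimed length. */
static int sketch_iucv_example(size_t claimed_len)
{
    struct sketch_sockaddr_iucv sa = { .family = SKETCH_AF_IUCV };

    return sketch_iucv_check_addr(&sa, claimed_len);
}
```

      A zero- or one-byte address is rejected before any field is examined,
      so no uninitialized memory is interpreted as a family value.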
    • Andrey Ryabinin's avatar
      radix-tree: fix radix_tree_iter_retry() for tagged iterators. · 11eea056
      Andrey Ryabinin authored
      [ Upstream commit 3cb9185c ]
      
      radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
      A NULL slot and non-zero iter.tags are then passed to
      radix_tree_next_slot(), leading to a crash:
      
        RIP: radix_tree_next_slot include/linux/radix-tree.h:473
          find_get_pages_tag+0x334/0x930 mm/filemap.c:1452
        ....
        Call Trace:
          pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960
          mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516
          ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736
          do_writepages+0x97/0x100 mm/page-writeback.c:2364
          __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300
          filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490
          ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115
          vfs_fsync_range+0x10a/0x250 fs/sync.c:195
          vfs_fsync fs/sync.c:209
          do_fsync+0x42/0x70 fs/sync.c:219
          SYSC_fdatasync fs/sync.c:232
          SyS_fdatasync+0x19/0x20 fs/sync.c:230
          entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207
      
      We must reset iterator's tags to bail out from radix_tree_next_slot()
      and go to the slow-path in radix_tree_next_chunk().
      
      Fixes: 46437f9a ("radix-tree: fix race in gang lookup")
      Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.com
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      11eea056
    • Matt Fleming's avatar
      x86/mm/pat: Prevent hang during boot when mapping pages · 9903f3ab
      Matt Fleming authored
      [ Upstream commit e535ec08 ]
      
      There's a mixture of signed 32-bit and unsigned 32-bit and 64-bit data
      types used for keeping track of how many pages have been mapped.
      
      This leads to hangs during boot when mapping large numbers of pages
      (multiple terabytes, as reported by Waiman) because those values are
      interpreted as being negative.
      
      commit 74256377 ("x86/mm/pat: Avoid truncation when converting
      cpa->numpages to address") fixed one of those bugs, but there is
      another lurking in __change_page_attr_set_clr().
      
      Additionally, the return value type for the populate_*() functions can
      return negative values when a large number of pages have been mapped,
      triggering the error paths even though no error occurred.
      
      Consistently use 64-bit types on 64-bit platforms when counting pages.
      Even in the signed case this gives us room for regions 8PiB
      (pebibytes) in size whilst still allowing the usual negative value
      error checking idiom.
      Reported-by: Waiman Long <waiman.long@hpe.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      CC: Theodore Ts'o <tytso@mit.edu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Scott J Norton <scott.norton@hpe.com>
      Cc: Douglas Hatch <doug.hatch@hpe.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9903f3ab
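      To see why a signed 32-bit page counter breaks for multi-terabyte
      mappings, as the x86/mm/pat commit above describes, consider this
      small sketch. It assumes 4 KiB pages, and the helper names are
      hypothetical; it only checks whether a page count is representable as
      a non-negative value in each signed type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* 4 KiB pages, an assumption of this sketch */

/* Number of whole pages in a region of the given size. */
static uint64_t sketch_pages_in(uint64_t bytes)
{
    return bytes >> SKETCH_PAGE_SHIFT;
}

/* Can the count be stored in a signed 32-bit counter without going negative? */
static bool sketch_fits_in_int32(uint64_t pages)
{
    return pages <= (uint64_t)INT32_MAX;
}

/* Same question for a signed 64-bit counter. */
static bool sketch_fits_in_int64(uint64_t pages)
{
    return pages <= (uint64_t)INT64_MAX;
}
```

      An 8 TiB region is 2^31 pages, one more than a signed 32-bit counter
      can hold, while a signed 64-bit counter still leaves room for the
      negative-value error idiom.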
    • Srinivas Kandagatla's avatar
      ARM: dts: apq8064: add ahci ports-implemented mask · 011859fd
      Srinivas Kandagatla authored
      [ Upstream commit bb4add2c ]
      
      This patch adds the new ports-implemented mask, which is required to get
      AHCI working on mainline. Without this patch, the value read from the
      PORTS_IMPL register, which is zero, would not enable any ports for
      software to use.
      
      Fixes: 566d1827 ("libata: disable forced PORTS_IMPL for >= AHCI 1.3")
      Cc: stable@vger.kernel.org # v4.5+
      Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Reviewed-by: Andy Gross <andy.gross@linaro.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      011859fd
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Skip more functions when doing stack tracing of events · 70b3d6c5
      Steven Rostedt (Red Hat) authored
      [ Upstream commit be54f69c ]
      
       # echo 1 > options/stacktrace
       # echo 1 > events/sched/sched_switch/enable
       # cat trace
                <idle>-0     [002] d..2  1982.525169: <stack trace>
       => save_stack_trace
       => __ftrace_trace_stack
       => trace_buffer_unlock_commit_regs
       => event_trigger_unlock_commit
       => trace_event_buffer_commit
       => trace_event_raw_event_sched_switch
       => __schedule
       => schedule
       => schedule_preempt_disabled
       => cpu_startup_entry
       => start_secondary
      
      The above shows that we are seeing 6 functions before ever making it to the
      caller of the sched_switch event.
      
       # echo stacktrace > events/sched/sched_switch/trigger
       # cat trace
                <idle>-0     [002] d..3  2146.335208: <stack trace>
       => trace_event_buffer_commit
       => trace_event_raw_event_sched_switch
       => __schedule
       => schedule
       => schedule_preempt_disabled
       => cpu_startup_entry
       => start_secondary
      
      The stacktrace trigger isn't as bad, because it adds its own skip to the
      stacktracing, but still has two events extra.
      
      One issue is that if the stacktrace passes its own "regs" then there should
      be no addition to the skip, as the regs will not include the functions being
      called. This was an issue that was fixed by commit 7717c6be ("tracing:
      Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()"), as adding
      the skip number for kprobes made the probes not have any stack at all.
      
      But since this is only an issue when regs is being used, a skip should be
      added if regs is NULL. Now we have:
      
       # echo 1 > options/stacktrace
       # echo 1 > events/sched/sched_switch/enable
       # cat trace
                <idle>-0     [000] d..2  1297.676333: <stack trace>
       => __schedule
       => schedule
       => schedule_preempt_disabled
       => cpu_startup_entry
       => rest_init
       => start_kernel
       => x86_64_start_reservations
       => x86_64_start_kernel
      
       # echo stacktrace > events/sched/sched_switch/trigger
       # cat trace
                <idle>-0     [002] d..3  1370.759745: <stack trace>
       => __schedule
       => schedule
       => schedule_preempt_disabled
       => cpu_startup_entry
       => start_secondary
      
      And kprobes are not touched.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      70b3d6c5
    • Paul Bolle's avatar
      ser_gigaset: use container_of() instead of detour · 6b3d1619
      Paul Bolle authored
      [ Upstream commit 8d2c3ab4 ]
      
      The purpose of gigaset_device_release() is to kfree() the struct
      ser_cardstate that contains our struct device. This is done via a bit of
      a detour. First we make our struct device's driver_data point to the
      container of our struct ser_cardstate (which is a struct cardstate). In
      gigaset_device_release() we then retrieve that driver_data again. And
      after that we finally kfree() the struct ser_cardstate that was saved in
      the struct cardstate.
      
      All of this can be achieved much easier by using container_of() to get
      from our struct device to its container, struct ser_cardstate. Do so.
      
      Note that at the time the detour was implemented commit b8b2c7d8
      ("base/platform: assert that dev_pm_domain callbacks are called
      unconditionally") had just entered the tree. That commit disconnected
      our platform_device and our platform_driver. These were reconnected
      again in v4.5-rc2 through commit 25cad69f ("base/platform: Fix
      platform drivers with no probe callback"). And one of the consequences
      of that fix was that it broke the detour via driver_data. That's because
      it made __device_release_driver() stop being a NOP for our struct device
      and actually do stuff again. One of the things it now does, is setting
      our driver_data to NULL. That, in turn, makes it impossible for
      gigaset_device_release() to get to our struct cardstate. Which has the
      net effect of leaking a struct ser_cardstate at every call of this
      driver's tty close() operation. So using container_of() has the
      additional benefit of actually working.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Acked-by: Tilman Schmidt <tilman@imap.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6b3d1619
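      The container_of() approach from the ser_gigaset commit above can be
      sketched in userspace as follows. The macro is a simplified equivalent
      of the kernel's (without its type checking), and the structures are
      hypothetical stand-ins: the real struct ser_cardstate embeds a
      platform device rather than the bare struct shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace equivalent of the kernel's container_of(), minus type checking. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct device and struct ser_cardstate. */
struct sketch_device {
    int id;
};

struct sketch_ser_cardstate {
    int other_state;
    struct sketch_device dev; /* embedded in the container, not pointed to */
};

static int sketch_released; /* counts releases, for the example only */

/*
 * Release callback: walk back from the embedded device to its container
 * and free it. No driver_data pointer is consulted, so nothing breaks if
 * that pointer has been cleared in the meantime.
 */
static void sketch_device_release(struct sketch_device *dev)
{
    struct sketch_ser_cardstate *scs =
        sketch_container_of(dev, struct sketch_ser_cardstate, dev);

    free(scs);
    sketch_released++;
}

/* Usage example: returns 1 if the container was recovered correctly. */
static int sketch_example(void)
{
    struct sketch_ser_cardstate *scs = calloc(1, sizeof(*scs));
    struct sketch_device *dev;
    int same;

    if (scs == NULL)
        return -1;

    dev = &scs->dev;
    same = (sketch_container_of(dev, struct sketch_ser_cardstate, dev) == scs);
    sketch_device_release(dev);
    return same;
}
```

      Because the lookup depends only on the struct layout, it keeps working
      regardless of what the driver core does to driver_data before the
      release callback runs.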
    • David Herrmann's avatar
      net: drop write-only stack variable · ca4a744b
      David Herrmann authored
      [ Upstream commit 3575dbf2 ]
      
      Remove a write-only stack variable from unix_attach_fds(). This is a
      left-over from the security fix in:
      
          commit 712f4aad
          Author: willy tarreau <w@1wt.eu>
          Date:   Sun Jan 10 07:54:56 2016 +0100
      
              unix: properly account for FDs passed over unix sockets
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ca4a744b
    • Johannes Berg's avatar
      ipv6: suppress sparse warnings in IP6_ECN_set_ce() · 997a4944
      Johannes Berg authored
      [ Upstream commit c15c0ab1 ]
      
      Pass the correct type __wsum to csum_sub() and csum_add(). This doesn't
      really change anything since __wsum really *is* __be32, but removes the
      address space warnings from sparse.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Fixes: 34ae6a1a ("ipv6: update skb->csum when CE mark is propagated")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      997a4944
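The kind of change described above, casting a big-endian word to the checksum type before handing it to the checksum helpers, can be sketched in userspace C. The type names be32_t and wsum_t, the macros sparse_bitwise and sparse_force, and the helper csum_replace_word() are assumptions made for this example. The two arithmetic helpers are modelled on the kernel's generic csum_add() and csum_sub(), not copied from the patched file.

```c
#include <stdint.h>

/* The annotations only mean something to sparse; a normal compiler
 * sees plain uint32_t, so the casts change no generated code. */
#ifdef __CHECKER__
#define sparse_bitwise __attribute__((bitwise))
#define sparse_force   __attribute__((force))
#else
#define sparse_bitwise
#define sparse_force
#endif

typedef uint32_t sparse_bitwise be32_t;	/* stand-in for __be32 */
typedef uint32_t sparse_bitwise wsum_t;	/* stand-in for __wsum */

/* One's-complement addition with end-around carry. */
static wsum_t csum_add(wsum_t csum, wsum_t addend)
{
	uint32_t res = (sparse_force uint32_t)csum;

	res += (sparse_force uint32_t)addend;
	return (sparse_force wsum_t)(res + (res < (sparse_force uint32_t)addend));
}

/* Subtraction is addition of the one's complement. */
static wsum_t csum_sub(wsum_t csum, wsum_t addend)
{
	return csum_add(csum, ~addend);
}

/* Replace the word `from` by `to` in a running checksum, with explicit
 * forced casts so the helpers receive the type they declare. */
static wsum_t csum_replace_word(wsum_t csum, be32_t from, be32_t to)
{
	return csum_add(csum_sub(csum, (sparse_force wsum_t)from),
			(sparse_force wsum_t)to);
}
```

Because both typedefs resolve to the same 32-bit integer outside of sparse, the casts are purely a type-checking aid, which matches the message's point that nothing really changes at run time.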
    • Eric Biggers's avatar
      KEYS: put keyring if install_session_keyring_to_cred() fails · 9aae17f8
      Eric Biggers authored
      [ Upstream commit d636bd9f ]
      
      In join_session_keyring(), if install_session_keyring_to_cred() were to
      fail, we would leak the keyring reference, just like in the bug fixed by
      commit 23567fd0 ("KEYS: Fix keyring ref leak in
      join_session_keyring()").  Fortunately this cannot happen currently, but
      we really should be more careful.  Do this by adding and using a new
      error label at which the keyring reference is dropped.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9aae17f8
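The error-label pattern the message refers to can be sketched as follows. This is a simplified userspace model under stated assumptions: struct keyring, struct cred, keyring_get(), keyring_put(), install_keyring() and join_keyring() are hypothetical stand-ins, and the real function also deals with locking and credential commit, which are left out here.

```c
#include <errno.h>

/* Hypothetical stand-ins; none of these are the kernel's real types. */
struct keyring { int refcount; };
struct cred { struct keyring *session_keyring; };

static void keyring_get(struct keyring *k) { k->refcount++; }
static void keyring_put(struct keyring *k) { k->refcount--; }

/* Simulated install step; fails when fail_install is non-zero. */
static int install_keyring(struct cred *new, struct keyring *k,
			   int fail_install)
{
	if (fail_install)
		return -ENOMEM;
	keyring_get(k);		/* the cred holds its own reference */
	new->session_keyring = k;
	return 0;
}

static int join_keyring(struct cred *new, struct keyring *k,
			int fail_install)
{
	int ret;

	keyring_get(k);		/* reference taken by the lookup */

	ret = install_keyring(new, k, fail_install);
	if (ret < 0)
		goto error_put;	/* returning here directly would leak it */

	ret = 0;
error_put:
	keyring_put(k);		/* lookup reference dropped on every path */
	return ret;
}
```

Routing the failure through a label that drops the reference keeps the count balanced even if the install step starts failing in the future.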
    • Wenwen Wang's avatar
      net: cxgb3_main: fix a missing-check bug · 6596d0ab
      Wenwen Wang authored
      [ Upstream commit 2c05d888 ]
      
      In cxgb_extension_ioctl(), the ioctl command is first copied from
      the user-space buffer 'useraddr' to 'cmd' and checked through the
      switch statement. If the command is not as expected, an error code
      EOPNOTSUPP is returned. In the following execution, i.e., the cases of the
      switch statement, the whole buffer of 'useraddr' is copied again to a
      specific data structure, according to what kind of command is requested.
      However, after the second copy, there is no re-check on the newly-copied
      command. Given that the buffer 'useraddr' is in the user space, a malicious
      user can race to change the command between the two copies. By doing so,
      the attacker can supply malicious data to the kernel and cause undefined
      behavior.
      
      This patch adds a re-check in each case of the switch statement if there is
      a second copy in that case, to re-check whether the command obtained in the
      second copy is the same as the one in the first copy. If not, an error code
      EINVAL is returned.
      Signed-off-by: Wenwen Wang <wang6495@umn.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      6596d0ab
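The double fetch and the added re-check can be modelled in userspace C. Everything named here is an assumption for illustration: CMD_SET_PARAMS, struct set_params, copy_from_user_sim() and extension_ioctl() are hypothetical. The race is modelled by passing two views of the user buffer, one as seen by the first copy and one as seen by the second.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define CMD_SET_PARAMS 1u	/* hypothetical command number */

struct set_params {		/* hypothetical request layout */
	uint32_t cmd;
	uint32_t value;
};

/* Stand-in for copy_from_user(); returns 0 on success like the real one. */
static int copy_from_user_sim(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/*
 * at_first_fetch and at_second_fetch model the same user buffer at two
 * points in time; a racing user thread may have changed it in between.
 */
static int extension_ioctl(const void *at_first_fetch,
			   const void *at_second_fetch, uint32_t *out)
{
	uint32_t cmd;
	struct set_params t;

	/* First fetch: only the command word. */
	if (copy_from_user_sim(&cmd, at_first_fetch, sizeof(cmd)))
		return -EFAULT;

	switch (cmd) {
	case CMD_SET_PARAMS:
		/* Second fetch: the whole structure, command included. */
		if (copy_from_user_sim(&t, at_second_fetch, sizeof(t)))
			return -EFAULT;
		/* Re-check: the copy in use must match what was dispatched. */
		if (t.cmd != CMD_SET_PARAMS)
			return -EINVAL;
		*out = t.value;
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}
```

With identical views the call succeeds. If the command word differs in the second view, the re-check returns -EINVAL before any of the second copy's fields are used.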