1. 09 Jan, 2017 40 commits
    • NeilBrown's avatar
      SUNRPC: fix refcounting problems with auth_gss messages. · 0349fbeb
      NeilBrown authored
      commit 1cded9d2 upstream.
      
      There are two problems with refcounting of auth_gss messages.
      
      First, the reference on the pipe->pipe list (taken by a call
      to rpc_queue_upcall()) is not counted.  It seems to be
      assumed that a message in pipe->pipe will always also be in
      pipe->in_downcall, where it is correctly reference counted.
      
      However there is no guaranty of this.  I have a report of a
      NULL dereferences in rpc_pipe_read() which suggests a msg
      that has been freed is still on the pipe->pipe list.
      
      One way I imagine this might happen is:
      - message is queued for uid=U and auth->service=S1
      - rpc.gssd reads this message and starts processing.
        This removes the message from pipe->pipe
      - message is queued for uid=U and auth->service=S2
      - rpc.gssd replies to the first message. gss_pipe_downcall()
        calls __gss_find_upcall(pipe, U, NULL) and it finds the
        *second* message, as new messages are placed at the head
        of ->in_downcall, and the service type is not checked.
      - This second message is removed from ->in_downcall and freed
        by gss_release_msg() (even though it is still on pipe->pipe)
      - rpc.gssd tries to read another message, and dereferences a pointer
        to this message that has just been freed.
      
      I fix this by incrementing the reference count before calling
      rpc_queue_upcall(), and decrementing it if that fails, or normally in
      gss_pipe_destroy_msg().
      
      It seems strange that the reply doesn't target the message more
      precisely, but I don't know all the details.  In any case, I think the
      reference counting irregularity became a measureable bug when the
      extra arg was added to __gss_find_upcall(), hence the Fixes: line
      below.
      
      The second problem is that if rpc_queue_upcall() fails, the new
      message is not freed. gss_alloc_msg() set the ->count to 1,
      gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1,
      then the pointer is discarded so the memory never gets freed.
      
      Fixes: 9130b8db ("SUNRPC: allow for upcalls for same uid but different gss service")
      Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0349fbeb
    • Trond Myklebust's avatar
      pNFS: Fix a deadlock between read resends and layoutreturn · c513ade4
      Trond Myklebust authored
      commit 54e4a0df upstream.
      
      We must not call nfs_pageio_init_read() on a new nfs_pageio_descriptor
      while holding a reference to a layout segment, as that can deadlock
      pnfs_update_layout().
      
      Fixes: d67ae825 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c513ade4
    • Trond Myklebust's avatar
      pNFS: Clear NFS_LAYOUT_RETURN_REQUESTED when invalidating the layout stateid · abb2903f
      Trond Myklebust authored
      commit ae5a459d upstream.
      
      We must ensure that we don't schedule a layoutreturn if the layout stateid
      has been marked as invalid.
      
      Fixes: 2a59a041 ("pNFS: Fix pnfs_set_layout_stateid() to clear...")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      abb2903f
    • Trond Myklebust's avatar
      pNFS: Don't clear the layout stateid if a layout return is outstanding · f061c76c
      Trond Myklebust authored
      commit 7b650994 upstream.
      
      If we no longer hold any layout segments, we're normally expected to
      consider the layout stateid to be invalid. However we cannot assume this
      if we're about to, or in the process of sending a layoutreturn.
      
      Fixes: 334a8f37 ("pNFS: Don't forget the layout stateid if...")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f061c76c
    • Trond Myklebust's avatar
      pNFS: On error, do not send LAYOUTGET until the LAYOUTRETURN has completed · 7de1b81c
      Trond Myklebust authored
      commit 6604b203 upstream.
      
      If there is an I/O error, we should not call LAYOUTGET until the
      LAYOUTRETURN that reports the error is complete.
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7de1b81c
    • Al Viro's avatar
      nfs_write_end(): fix handling of short copies · 8f5ff877
      Al Viro authored
      commit c0cf3ef5 upstream.
      
      What matters when deciding if we should make a page uptodate is
      not how much we _wanted_ to copy, but how much we actually have
      copied.  As it is, on architectures that do not zero tail on
      short copy we can leave uninitialized data in page marked uptodate.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f5ff877
    • Ilya Dryomov's avatar
      libceph: verify authorize reply on connect · 1678adac
      Ilya Dryomov authored
      commit 5c056fdc upstream.
      
      After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b),
      the client gets back a ceph_x_authorize_reply, which it is supposed to
      verify to ensure the authenticity and protect against replay attacks.
      The code for doing this is there (ceph_x_verify_authorizer_reply(),
      ceph_auth_verify_authorizer_reply() + plumbing), but it is never
      invoked by the the messenger.
      
      AFAICT this goes back to 2009, when ceph authentication protocols
      support was added to the kernel client in 4e7a5dcd ("ceph:
      negotiate authentication protocol; implement AUTH_NONE protocol").
      
      The second param of ceph_connection_operations::verify_authorizer_reply
      is unused all the way down.  Pass 0 to facilitate backporting, and kill
      it in the next commit.
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1678adac
    • Alan Stern's avatar
      PCI: Check for PME in targeted sleep state · 1f93d1a7
      Alan Stern authored
      commit 6496ebd7 upstream.
      
      One some systems, the firmware does not allow certain PCI devices to be put
      in deep D-states.  This can cause problems for wakeup signalling, if the
      device does not support PME# in the deepest allowed suspend state.  For
      example, Pierre reports that on his system, ACPI does not permit his xHCI
      host controller to go into D3 during runtime suspend -- but D3 is the only
      state in which the controller can generate PME# signals.  As a result, the
      controller goes into runtime suspend but never wakes up, so it doesn't work
      properly.  USB devices plugged into the controller are never detected.
      
      If the device relies on PME# for wakeup signals but is not capable of
      generating PME# in the target state, the PCI core should accurately report
      that it cannot do wakeup from runtime suspend.  This patch modifies the
      pci_dev_run_wake() routine to add this check.
      Reported-by: default avatarPierre de Villemereuil <flyos@mailoo.org>
      Tested-by: default avatarPierre de Villemereuil <flyos@mailoo.org>
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      CC: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f93d1a7
    • Shiraz Saleem's avatar
      i40iw: Use correct src address in memcpy to rdma stats counters · c75bc2bd
      Shiraz Saleem authored
      commit 91c42b72 upstream.
      
      hw_stats is a pointer to i40_iw_dev_stats struct in i40iw_get_hw_stats().
      Use hw_stats and not &hw_stats in the memcpy to copy the i40iw device stats
      data into rdma_hw_stats counters.
      
      Fixes: b40f4757 ("IB/core: Make device counter infrastructure dynamic")
      Signed-off-by: default avatarShiraz Saleem <shiraz.saleem@intel.com>
      Signed-off-by: default avatarFaisal Latif <faisal.latif@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c75bc2bd
    • Jingkui Wang's avatar
      Input: drv260x - fix input device's parent assignment · 7d0a6cf3
      Jingkui Wang authored
      commit 5a8a6b89 upstream.
      
      We were assigning I2C bus controller instead of client as parent device.
      Besides being logically wrong, it messed up with devm handling of input
      device. As a result we were leaving input device and event node behind
      after rmmod-ing the driver, which lead to a kernel oops if one were to
      access the event node later.
      
      Let's remove the assignment and rely on devm_input_allocate_device() to
      set it up properly for us.
      Signed-off-by: default avatarJingkui Wang <jkwang@google.com>
      Fixes: 7132fe4f ("Input: drv260x - add TI drv260x haptics driver")
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7d0a6cf3
    • Laurent Pinchart's avatar
      v4l: tvp5150: Add missing break in set control handler · b7843712
      Laurent Pinchart authored
      commit d183e4ef upstream.
      
      A break is missing resulting in the hue control enabling or disabling
      the decode completely. Fix it.
      
      Fixes: c43875f6 ("[media] tvp5150: replace MEDIA_ENT_F_CONN_TEST by a control")
      Signed-off-by: default avatarLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b7843712
    • Andrey Utkin's avatar
      media: solo6x10: fix lockup by avoiding delayed register write · 4963b191
      Andrey Utkin authored
      commit 5fc4b067 upstream.
      
      This fixes a lockup at device probing which happens on some solo6010
      hardware samples. This is a regression introduced by commit e1ceb25a
      ("[media] SOLO6x10: remove unneeded register locking and barriers")
      
      The observed lockup happens in solo_set_motion_threshold() called from
      solo_motion_config().
      
      This extra "flushing" is not fundamentally needed for every write, but
      apparently the code in driver assumes such behaviour at last in some
      places.
      
      Actual fix was proposed by Hans Verkuil.
      
      Fixes: e1ceb25a ("[media] SOLO6x10: remove unneeded register locking and barriers")
      Signed-off-by: default avatarAndrey Utkin <andrey.utkin@corp.bluecherry.net>
      Signed-off-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4963b191
    • Marek Szyprowski's avatar
      s5p-mfc: fix failure path of s5p_mfc_alloc_memdev() · 88bfde25
      Marek Szyprowski authored
      commit 3467c9a7 upstream.
      
      s5p_mfc_alloc_memdev() function lacks proper releasing
      of allocated device in case of reserved memory initialization
      failure. This results in NULL pointer dereference:
      
      [    2.828457] Unable to handle kernel NULL pointer dereference at virtual address 00000001
      [    2.835089] pgd = c0004000
      [    2.837752] [00000001] *pgd=00000000
      [    2.844696] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
      [    2.848680] Modules linked in:
      [    2.851722] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc6-00002-gafa1b97 #878
      [    2.859357] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
      [    2.865433] task: ef080000 task.stack: ef06c000
      [    2.869952] PC is at strcmp+0x0/0x30
      [    2.873508] LR is at platform_match+0x84/0xac
      [    2.877847] pc : [<c032621c>]    lr : [<c03f65e8>]    psr: 20000013
      [    2.877847] sp : ef06dea0  ip : 00000000  fp : 00000000
      [    2.889303] r10: 00000000  r9 : c0b34848  r8 : c0b1e968
      [    2.894511] r7 : 00000000  r6 : 00000001  r5 : c086e7fc  r4 : eeb8e010
      [    2.901021] r3 : 0000006d  r2 : 00000000  r1 : c086e7fc  r0 : 00000001
      [    2.907533] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      [    2.914649] Control: 10c5387d  Table: 4000404a  DAC: 00000051
      [    2.920378] Process swapper/0 (pid: 1, stack limit = 0xef06c210)
      [    2.926367] Stack: (0xef06dea0 to 0xef06e000)
      [    2.930711] dea0: eeb8e010 c0c2d91c c03f4a6c c03f4a8c 00000000 c0c2d91c c03f4a6c c03f2fc8
      [    2.938870] dec0: ef003274 ef10c4c0 c0c2d91c ef10cc80 c0c21270 c03f3fa4 c09c1be8 c0c2d91c
      [    2.947028] dee0: 00000006 c0c2d91c 00000006 c0b3483c c0c47000 c03f5314 c0c2d908 c0b5fed8
      [    2.955188] df00: 00000006 c010178c 60000013 c0a4ef14 00000000 c06feaa0 ef080000 60000013
      [    2.963347] df20: 00000000 c0c095c8 efffca76 c0816b8c 000000d5 c0134098 c0b34848 c09d6cdc
      [    2.971506] df40: c0a4de70 00000000 00000006 00000006 c0c09568 efffca40 c0b5fed8 00000006
      [    2.979665] df60: c0b3483c c0c47000 000000d5 c0b34848 c0b005a4 c0b00d84 00000006 00000006
      [    2.987824] df80: 00000000 c0b005a4 00000000 c06fb4d8 00000000 00000000 00000000 00000000
      [    2.995983] dfa0: 00000000 c06fb4e0 00000000 c01079b8 00000000 00000000 00000000 00000000
      [    3.004142] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    3.012302] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff
      [    3.020469] [<c032621c>] (strcmp) from [<c03f65e8>] (platform_match+0x84/0xac)
      [    3.027672] [<c03f65e8>] (platform_match) from [<c03f4a8c>] (__driver_attach+0x20/0xb0)
      [    3.035654] [<c03f4a8c>] (__driver_attach) from [<c03f2fc8>] (bus_for_each_dev+0x54/0x88)
      [    3.043812] [<c03f2fc8>] (bus_for_each_dev) from [<c03f3fa4>] (bus_add_driver+0xe8/0x1f4)
      [    3.051971] [<c03f3fa4>] (bus_add_driver) from [<c03f5314>] (driver_register+0x78/0xf4)
      [    3.059958] [<c03f5314>] (driver_register) from [<c010178c>] (do_one_initcall+0x3c/0x16c)
      [    3.068123] [<c010178c>] (do_one_initcall) from [<c0b00d84>] (kernel_init_freeable+0x120/0x1ec)
      [    3.076802] [<c0b00d84>] (kernel_init_freeable) from [<c06fb4e0>] (kernel_init+0x8/0x118)
      [    3.084958] [<c06fb4e0>] (kernel_init) from [<c01079b8>] (ret_from_fork+0x14/0x3c)
      [    3.092506] Code: 1afffffb e12fff1e e1a03000 eafffff7 (e4d03001)
      [    3.098618] ---[ end trace 511bf9d750810709 ]---
      [    3.103207] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      
      This patch fixes this issue.
      
      Fixes: c79667dd ("media: s5p-mfc: replace custom
      	reserved memory handling code with generic one")
      Signed-off-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: default avatarSylwester Nawrocki <s.nawrocki@samsung.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      88bfde25
    • Antti Palosaari's avatar
      mn88473: fix chip id check on probe · c3fe33d1
      Antti Palosaari authored
      commit d930b5b5 upstream.
      
      A register used to identify chip during probe was overwritten during
      firmware download and due to that later probe's for warm chip were
      failing. Detect chip from the another register, which is located on
      different register bank 2.
      
      Fixes: 7908fad9 ("[media] mn88473: finalize driver")
      Signed-off-by: default avatarAntti Palosaari <crope@iki.fi>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3fe33d1
    • Antti Palosaari's avatar
      mn88472: fix chip id check on probe · 84b2f664
      Antti Palosaari authored
      commit 365fe4e0 upstream.
      
      A register used to identify chip during probe was overwritten during
      firmware download and due to that later probe's for warm chip were
      failing. Detect chip from the another register, which is located on
      different register bank 2.
      
      Fixes: 94d0eaa4 ("[media] mn88472: move out of staging to media")
      Signed-off-by: default avatarAntti Palosaari <crope@iki.fi>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84b2f664
    • Bart Van Assche's avatar
      IB/cma: Fix a race condition in iboe_addr_get_sgid() · 15d1d226
      Bart Van Assche authored
      commit fba332b0 upstream.
      
      Code that dereferences the struct net_device ip_ptr member must be
      protected with an in_dev_get() / in_dev_put() pair. Hence insert
      calls to these functions.
      
      Fixes: commit 7b85627b ("IB/cma: IBoE (RoCE) IP-based GID addressing")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: default avatarMoni Shoua <monis@mellanox.com>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Roland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      15d1d226
    • Bart Van Assche's avatar
      IB/rxe: Fix a memory leak in rxe_qp_cleanup() · 7b3721af
      Bart Van Assche authored
      commit e259934d upstream.
      
      A socket is associated with every QP by the rxe driver but sock_release()
      is never called. Add a call to sock_release() in rxe_qp_cleanup().
      
      Fixes: commit 8700e3e7c48A5 ("Add Soft RoCE driver")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Moni Shoua <monis@mellanox.com>
      Cc: Kamal Heib <kamalh@mellanox.com>
      Cc: Amir Vadai <amirv@mellanox.com>
      Cc: Haggai Eran <haggaie@mellanox.com>
      Reviewed-by: default avatarMoni Shoua <monis@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b3721af
    • Bart Van Assche's avatar
      IB/multicast: Check ib_find_pkey() return value · 2a0aa77a
      Bart Van Assche authored
      commit d3a2418e upstream.
      
      This patch avoids that Coverity complains about not checking the
      ib_find_pkey() return value.
      
      Fixes: commit 547af765 ("IB/multicast: Report errors on multicast groups if P_key changes")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2a0aa77a
    • Bart Van Assche's avatar
      IPoIB: Avoid reading an uninitialized member variable · 37d4adba
      Bart Van Assche authored
      commit 11b642b8 upstream.
      
      This patch avoids that Coverity reports the following:
      
          Using uninitialized value port_attr.state when calling printk
      
      Fixes: commit 94232d9c ("IPoIB: Start multicast join process only on active ports")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Erez Shitrit <erezsh@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      37d4adba
    • Bart Van Assche's avatar
      IB/mad: Fix an array index check · f079fc11
      Bart Van Assche authored
      commit 2fe2f378 upstream.
      
      The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS
      (80) elements. Hence compare the array index with that value instead
      of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity
      reports the following:
      
      Overrunning array class->method_table of 80 8-byte elements at element index 127 (byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class) (which evaluates to 127).
      
      Fixes: commit b7ab0b19 ("IB/mad: Verify mgmt class in received MADs")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Reviewed-by: default avatarHal Rosenstock <hal@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f079fc11
    • Steven Rostedt (Red Hat)'s avatar
      fgraph: Handle a case where a tracer ignores set_graph_notrace · e661b5d4
      Steven Rostedt (Red Hat) authored
      commit 794de08a upstream.
      
      Both the wakeup and irqsoff tracers can use the function graph tracer when
      the display-graph option is set. The problem is that they ignore the notrace
      file, and record the entry of functions that would be ignored by the
      function_graph tracer. This causes the trace->depth to be recorded into the
      ring buffer. The set_graph_notrace uses a trick by adding a large negative
      number to the trace->depth when a graph function is to be ignored.
      
      On trace output, the graph function uses the depth to record a stack of
      functions. But since the depth is negative, it accesses the array with a
      negative number and causes an out of bounds access that can cause a kernel
      oops or corrupt data.
      
      Have the print functions handle cases where a tracer still records functions
      even when they are in set_graph_notrace.
      
      Also add warnings if the depth is below zero before accessing the array.
      
      Note, the function graph logic will still prevent the return of these
      functions from being recorded, which means that they will be left hanging
      without a return. For example:
      
         # echo '*spin*' > set_graph_notrace
         # echo 1 > options/display-graph
         # echo wakeup > current_tracer
         # cat trace
         [...]
            _raw_spin_lock() {
              preempt_count_add() {
              do_raw_spin_lock() {
            update_rq_clock();
      
      Where it should look like:
      
            _raw_spin_lock() {
              preempt_count_add();
              do_raw_spin_lock();
            }
            update_rq_clock();
      
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Fixes: 29ad23b0 ("ftrace: Add set_graph_notrace filter")
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e661b5d4
    • Marcos Paulo de Souza's avatar
      platform/x86: asus-nb-wmi.c: Add X45U quirk · b2758da0
      Marcos Paulo de Souza authored
      commit e74e2599 upstream.
      
      Without this patch, the Asus X45U wireless card can't be turned
      on (hard-blocked), but after a suspend/resume it just starts working.
      
      Following this bug report[1], there are other cases like this one, but
      this Asus is the only model that I can test.
      
      [1] https://ubuntuforums.org/showthread.php?t=2181558Signed-off-by: default avatarMarcos Paulo de Souza <marcos.souza.org@gmail.com>
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2758da0
    • Steven Rostedt (Red Hat)'s avatar
      ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it · f61152e3
      Steven Rostedt (Red Hat) authored
      commit 847fa1a6 upstream.
      
      With new binutils, gcc may get smart with its optimization and change a jmp
      from a 5 byte jump to a 2 byte one even though it was jumping to a global
      function. But that global function existed within a 2 byte radius, and gcc
      was able to optimize it. Unfortunately, that jump was also being modified
      when function graph tracing begins. Since ftrace expected that jump to be 5
      bytes, but it was only two, it overwrote code after the jump, causing a
      crash.
      
      This was fixed for x86_64 with commit 8329e818, with the same subject as
      this commit, but nothing was done for x86_32.
      
      Fixes: d61f82d0 ("ftrace: use dynamic patching for updating mcount calls")
      Reported-by: default avatarColin Ian King <colin.king@canonical.com>
      Tested-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f61152e3
    • Michael S. Tsirkin's avatar
      vsock/virtio: fix src/dst cid format · 8569aade
      Michael S. Tsirkin authored
      commit f83f12d6 upstream.
      
      These fields are 64 bit, using le32_to_cpu and friends
      on these will not do the right thing.
      Fix this up.
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8569aade
    • Jan Kara's avatar
      fsnotify: Fix possible use-after-free in inode iteration on umount · 576ea9e5
      Jan Kara authored
      commit 5716863e upstream.
      
      fsnotify_unmount_inodes() plays complex tricks to pin next inode in the
      sb->s_inodes list when iterating over all inodes. Furthermore the code has a
      bug that if the current inode is the last on i_sb_list that does not have e.g.
      I_FREEING set, then we leave next_i pointing to inode which may get removed
      from the i_sb_list once we drop s_inode_list_lock thus resulting in
      use-after-free issues (usually manifesting as infinite looping in
      fsnotify_unmount_inodes()).
      
      Fix the problem by keeping current inode pinned somewhat longer. Then we can
      make the code much simpler and standard.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      576ea9e5
    • Jim Mattson's avatar
      kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF) · 43983ce7
      Jim Mattson authored
      commit ef85b673 upstream.
      
      When L2 exits to L0 due to "exception or NMI", software exceptions
      (#BP and #OF) for which L1 has requested an intercept should be
      handled by L1 rather than L0. Previously, only hardware exceptions
      were forwarded to L1.
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43983ce7
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Don't lose hardware R/C bit updates in H_PROTECT · 9a5cf8c4
      Paul Mackerras authored
      commit f064a0de upstream.
      
      The hashed page table MMU in POWER processors can update the R
      (reference) and C (change) bits in a HPTE at any time until the
      HPTE has been invalidated and the TLB invalidation sequence has
      completed.  In kvmppc_h_protect, which implements the H_PROTECT
      hypercall, we read the HPTE, modify the second doubleword,
      invalidate the HPTE in memory, do the TLB invalidation sequence,
      and then write the modified value of the second doubleword back
      to memory.  In doing so we could overwrite an R/C bit update done
      by hardware between when we read the HPTE and when the TLB
      invalidation completed.  To fix this we re-read the second
      doubleword after the TLB invalidation and OR in the (possibly)
      new values of R and C.  We can use an OR since hardware only ever
      sets R and C, never clears them.
      
      This race was found by code inspection.  In principle this bug could
      cause occasional guest memory corruption under host memory pressure.
      
      Fixes: a8606e20 ("KVM: PPC: Handle some PAPR hcalls in the kernel", 2011-06-29)
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9a5cf8c4
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Save/restore XER in checkpointed register state · b751eb6e
      Paul Mackerras authored
      commit 0d808df0 upstream.
      
      When switching from/to a guest that has a transaction in progress,
      we need to save/restore the checkpointed register state.  Although
      XER is part of the CPU state that gets checkpointed, the code that
      does this saving and restoring doesn't save/restore XER.
      
      This fixes it by saving and restoring the XER.  To allow userspace
      to read/write the checkpointed XER value, we also add a new ONE_REG
      specifier.
      
      The visible effect of this bug is that the guest may see its XER
      value being corrupted when it uses transactions.
      
      Fixes: e4e38121 ("KVM: PPC: Book3S HV: Add transactional memory support")
      Fixes: 0a8eccef ("KVM: PPC: Book3S HV: Add missing code for transaction reclaim on guest exit")
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: default avatarThomas Huth <thuth@redhat.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b751eb6e
    • Kevin Barnett's avatar
      scsi: aacraid: remove wildcard for series 9 controllers · 0773e924
      Kevin Barnett authored
      commit ae2aae24 upstream.
      
      Controllers with this PCI ID never shipped outside of
      PMCS/Microsemi. Remove the ID from the aacraid driver. smartpqi is the
      correct driver for these controllers.
      
      [mkp: patch description]
      Reviewed-by: default avatarScott Teel <scott.teel@microsemi.com>
      Signed-off-by: default avatarKevin Barnett <kevin.barnett@microsemi.com>
      Signed-off-by: default avatarDon Brace <don.brace@microsemi.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0773e924
    • Konstantin Khlebnikov's avatar
      md/raid5: limit request size according to implementation limits · b202064f
      Konstantin Khlebnikov authored
      commit e8d7c332 upstream.
      
      Current implementation employ 16bit counter of active stripes in lower
      bits of bio->bi_phys_segments. If request is big enough to overflow
      this counter bio will be completed and freed too early.
      
      Fortunately this not happens in default configuration because several
      other limits prevent that: stripe_cache_size * nr_disks effectively
      limits count of active stripes. And small max_sectors_kb at lower
      disks prevent that during normal read/write operations.
      
      Overflow easily happens in discard if it's enabled by module parameter
      "devices_handle_discard_safely" and stripe_cache_size is set big enough.
      
      This patch limits requests size with 256Mb - 8Kb to prevent overflows.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Neil Brown <neilb@suse.com>
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b202064f
    • Josh Cartwright's avatar
      sc16is7xx: Drop bogus use of IRQF_ONESHOT · 47090341
      Josh Cartwright authored
      commit 04da7380 upstream.
      
      The use of IRQF_ONESHOT when registering an interrupt handler with
      request_irq() is non-sensical.
      
      Not only that, it also prevents the handler from being threaded when it
      otherwise should be w/ IRQ_FORCED_THREADING is enabled.  This causes the
      following deadlock observed by Sean Nyekjaer on -rt:
      
      Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
      [..]
         rt_spin_lock_slowlock from queue_kthread_work
         queue_kthread_work from sc16is7xx_irq
         sc16is7xx_irq [sc16is7xx] from handle_irq_event_percpu
         handle_irq_event_percpu from handle_irq_event
         handle_irq_event from handle_level_irq
         handle_level_irq from generic_handle_irq
         generic_handle_irq from mxc_gpio_irq_handler
         mxc_gpio_irq_handler from mx3_gpio_irq_handler
         mx3_gpio_irq_handler from generic_handle_irq
         generic_handle_irq from __handle_domain_irq
         __handle_domain_irq from gic_handle_irq
         gic_handle_irq from __irq_svc
         __irq_svc from rt_spin_unlock
         rt_spin_unlock from kthread_worker_fn
         kthread_worker_fn from kthread
         kthread from ret_from_fork
      
      Fixes: 9e6f4ca3 ("sc16is7xx: use kthread_worker for tx_work and irq")
      Reported-by: default avatarSean Nyekjaer <sean.nyekjaer@prevas.dk>
      Signed-off-by: default avatarJosh Cartwright <joshc@ni.com>
      Cc: linux-rt-users@vger.kernel.org
      Cc: Jakub Kicinski <moorray3@wp.pl>
      Cc: linux-serial@vger.kernel.org
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: default avatarJulia Cartwright <julia@ni.com>
      Acked-by: default avatarJakub Kicinski <kubakici@wp.pl>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      47090341
    • Marc Zyngier's avatar
      arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest · c33e1abd
      Marc Zyngier authored
      commit 21cbe3cc upstream.
      
      The ARMv8 architecture allows the cycle counter to be configured
      by setting PMSELR_EL0.SEL==0x1f and then accessing PMXEVTYPER_EL0,
      hence accessing PMCCFILTR_EL0. But it disallows the use of
      PMSELR_EL0.SEL==0x1f to access the cycle counter itself through
      PMXEVCNTR_EL0.
      
      Linux itself doesn't violate this rule, but we may end up with
      PMSELR_EL0.SEL being set to 0x1f when we enter a guest. If that
      guest accesses PMXEVCNTR_EL0, the access may UNDEF at EL1,
      despite the guest not having done anything wrong.
      
      In order to avoid this unfortunate course of events (haha!), let's
      sanitize PMSELR_EL0 on guest entry. This ensures that the guest
      won't explode unexpectedly.
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c33e1abd
    • Heiko Carstens's avatar
      s390/kexec: use node 0 when re-adding crash kernel memory · d3d61bb3
      Heiko Carstens authored
      commit 9f88eb4d upstream.
      
      When re-adding crash kernel memory within setup_resources() the
      function memblock_add() is used. That function will add memory by
      default to node "MAX_NUMNODES" instead of node 0, like the memory
      detection code does. In case of !NUMA this will trigger this warning
      when the kernel generates the vmemmap:
      
      Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead
      WARNING: CPU: 0 PID: 0 at mm/memblock.c:1261 memblock_virt_alloc_internal+0x76/0x220
      CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc6 #16
      Call Trace:
       [<0000000000d0b2e8>] memblock_virt_alloc_try_nid+0x88/0xc8
       [<000000000083c8ea>] __earlyonly_bootmem_alloc.constprop.1+0x42/0x50
       [<000000000083e7f4>] vmemmap_populate+0x1ac/0x1e0
       [<0000000000840136>] sparse_mem_map_populate+0x46/0x68
       [<0000000000d0c59c>] sparse_init+0x184/0x238
       [<0000000000cf45f6>] paging_init+0xbe/0xf8
       [<0000000000cf1d4a>] setup_arch+0xa02/0xae0
       [<0000000000ced75a>] start_kernel+0x72/0x450
       [<0000000000100020>] _stext+0x20/0x80
      
      If NUMA is selected numa_setup_memory() will fix the node assignments
      before the vmemmap will be populated; so this warning will only appear
      if NUMA is not selected.
      
      To fix this simply use memblock_add_node() and re-add crash kernel
      memory explicitly to node 0.
      Reported-and-tested-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Fixes: 4e042af4 ("s390/kexec: fix crash on resize of reserved memory")
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d3d61bb3
    • Gerald Schaefer's avatar
      s390/vmlogrdr: fix IUCV buffer allocation · 9652b62a
      Gerald Schaefer authored
      commit 5457e03d upstream.
      
      The buffer for iucv_message_receive() needs to be below 2 GB. In
      __iucv_message_receive(), the buffer address is casted to an u32, which
      would result in either memory corruption or an addressing exception when
      using addresses >= 2 GB.
      
      Fix this by using GFP_DMA for the buffer allocation.
      Signed-off-by: default avatarGerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9652b62a
    • Yves-Alexis Perez's avatar
      firmware: fix usermode helper fallback loading · f356ab03
      Yves-Alexis Perez authored
      commit 2e700f8d upstream.
      
      When you use the firmware usermode helper fallback with a timeout value set to a
      value greater than INT_MAX (2147483647) a cast overflow issue causes the
      timeout value to go negative and breaks all usermode helper loading. This
      regression was introduced through commit 68ff2a00 ("firmware_loader:
      handle timeout via wait_for_completion_interruptible_timeout()") on kernel
      v4.0.
      
      The firmware_class drivers relies on the firmware usermode helper
      fallback as a mechanism to look for firmware if the direct filesystem
      search failed only if:
      
        a) You've enabled CONFIG_FW_LOADER_USER_HELPER_FALLBACK (not many distros):
      
        Then all of these callers will rely on the fallback mechanism in case
        the firmware is not found through an initial direct filesystem lookup:
      
        o request_firmware()
        o request_firmware_into_buf()
        o request_firmware_nowait()
      
        b) If you've only enabled CONFIG_FW_LOADER_USER_HELPER (most distros):
      
        Then only callers using request_firmware_nowait() with the second
        argument set to false, this explicitly is requesting the UMH firmware
        fallback to be relied on in case the first filesystem lookup fails.
      
        Using Coccinelle SmPL grammar we have identified only two drivers
        explicitly requesting the UMH firmware fallback mechanism:
      
        - drivers/firmware/dell_rbu.c
        - drivers/leds/leds-lp55xx-common.c
      
      Since most distributions only enable CONFIG_FW_LOADER_USER_HELPER the
      biggest impact of this regression are users of the dell_rbu and
      leds-lp55xx-common device driver which required the UMH to find their
      respective needed firmwares.
      
      The default timeout for the UMH is set to 60 seconds always, as of
      commit 68ff2a00 ("firmware_loader: handle timeout via
      wait_for_completion_interruptible_timeout()") the timeout was bumped
      to MAX_JIFFY_OFFSET ((LONG_MAX >> 1)-1). Additionally the MAX_JIFFY_OFFSET
      value was also used if the timeout was configured by a user to 0.
      
      The following works:
      
      echo 2147483647 > /sys/class/firmware/timeout
      
      But both of the following set the timeout to MAX_JIFFY_OFFSET even if
      we display 0 back to userspace:
      
      echo 2147483648 > /sys/class/firmware/timeout
      cat /sys/class/firmware/timeout
      0
      
      echo 0> /sys/class/firmware/timeout
      cat /sys/class/firmware/timeout
      0
      
      A max value of INT_MAX (2147483647) seconds is therefore implicit due to the
      another cast with simple_strtol().
      
      This fixes the secondary cast (the first one is simple_strtol() but its an
      issue only by forcing an implicit limit) by re-using the timeout variable and
      only setting retval in appropriate cases.
      
      Lastly worth noting systemd had ripped out the UMH firmware fallback
      mechanism from udev since udev 2014 via commit be2ea723b1d023b3d
      ("udev: remove userspace firmware loading support"), so as of systemd v217.
      Signed-off-by: default avatarYves-Alexis Perez <corsac@corsac.net>
      Fixes: 68ff2a00 "firmware_loader: handle timeout via wait_for_completion_interruptible_timeout()"
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: default avatarLuis R. Rodriguez <mcgrof@kernel.org>
      Reviewed-by: default avatarBjorn Andersson <bjorn.andersson@linaro.org>
      [mcgrof@kernel.org: gave commit log a whole lot of love]
      Signed-off-by: default avatarLuis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f356ab03
    • Vineet Gupta's avatar
      ARC: mm: arc700: Don't assume 2 colours for aliasing VIPT dcache · 5c98bf83
      Vineet Gupta authored
      commit 08fe0079 upstream.
      
      An ARC700 customer reported linux boot crashes when upgrading to bigger
      L1 dcache (64K from 32K). Turns out they had an aliasing VIPT config and
      current code only assumed 2 colours, while theirs had 4. So default to 4
      colours and complain if there are fewer. Ideally this needs to be a
      Kconfig option, but heck that's too much of hassle for a single user.
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5c98bf83
    • Wei Fang's avatar
      scsi: avoid a permanent stop of the scsi device's request queue · cc328ce5
      Wei Fang authored
      commit d2a14525 upstream.
      
      A race between scanning and fc_remote_port_delete() may result in a
      permanent stop if the device gets blocked before scsi_sysfs_add_sdev()
      and unblocked after.  The reason is that blocking a device sets both the
      SDEV_BLOCKED state and the QUEUE_FLAG_STOPPED.  However,
      scsi_sysfs_add_sdev() unconditionally sets SDEV_RUNNING which causes the
      device to be ignored by scsi_target_unblock() and thus never have its
      QUEUE_FLAG_STOPPED cleared leading to a device which is apparently
      running but has a stopped queue.
      
      We actually have two places where SDEV_RUNNING is set: once in
      scsi_add_lun() which respects the blocked flag and once in
      scsi_sysfs_add_sdev() which doesn't.  Since the second set is entirely
      spurious, simply remove it to fix the problem.
      Reported-by: default avatarZengxi Chen <chenzengxi@huawei.com>
      Signed-off-by: default avatarWei Fang <fangwei1@huawei.com>
      Reviewed-by: default avatarEwan D. Milne <emilne@redhat.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc328ce5
    • Steffen Maier's avatar
      scsi: zfcp: fix rport unblock race with LUN recovery · 6d675dff
      Steffen Maier authored
      commit 6f2ce1c6 upstream.
      
      It is unavoidable that zfcp_scsi_queuecommand() has to finish requests
      with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time
      window when zfcp detected an unavailable rport but
      fc_remote_port_delete(), which is asynchronous via
      zfcp_scsi_schedule_rport_block(), has not yet blocked the rport.
      
      However, for the case when the rport becomes available again, we should
      prevent unblocking the rport too early.  In contrast to other FCP LLDDs,
      zfcp has to open each LUN with the FCP channel hardware before it can
      send I/O to a LUN.  So if a port already has LUNs attached and we
      unblock the rport just after port recovery, recoveries of LUNs behind
      this port can still be pending which in turn force
      zfcp_scsi_queuecommand() to unnecessarily finish requests with
      DID_IMM_RETRY.
      
      This also opens a time window with unblocked rport (until the followup
      LUN reopen recovery has finished).  If a scsi_cmnd timeout occurs during
      this time window fc_timed_out() cannot work as desired and such command
      would indeed time out and trigger scsi_eh. This prevents a clean and
      timely path failover.  This should not happen if the path issue can be
      recovered on FC transport layer such as path issues involving RSCNs.
      
      Fix this by only calling zfcp_scsi_schedule_rport_register(), to
      asynchronously trigger fc_remote_port_add(), after all LUN recoveries as
      children of the rport have finished and no new recoveries of equal or
      higher order were triggered meanwhile.  Finished intentionally includes
      any recovery result no matter if successful or failed (still unblock
      rport so other successful LUNs work).  For simplicity, we check after
      each finished LUN recovery if there is another LUN recovery pending on
      the same port and then do nothing.  We handle the special case of a
      successful recovery of a port without LUN children the same way without
      changing this case's semantics.
      
      For debugging we introduce 2 new trace records written if the rport
      unblock attempt was aborted due to still unfinished or freshly triggered
      recovery. The records are only written above the default trace level.
      
      Benjamin noticed the important special case of new recovery that can be
      triggered between having given up the erp_lock and before calling
      zfcp_erp_action_cleanup() within zfcp_erp_strategy().  We must avoid the
      following sequence:
      
      ERP thread                 rport_work      other context
      -------------------------  --------------  --------------------------------
      port is unblocked, rport still blocked,
       due to pending/running ERP action,
       so ((port->status & ...UNBLOCK) != 0)
       and (port->rport == NULL)
      unlock ERP
      zfcp_erp_action_cleanup()
      case ZFCP_ERP_ACTION_REOPEN_LUN:
      zfcp_erp_try_rport_unblock()
      ((status & ...UNBLOCK) != 0) [OLD!]
                                                 zfcp_erp_port_reopen()
                                                 lock ERP
                                                 zfcp_erp_port_block()
                                                 port->status clear ...UNBLOCK
                                                 unlock ERP
                                                 zfcp_scsi_schedule_rport_block()
                                                 port->rport_task = RPORT_DEL
                                                 queue_work(rport_work)
                                 zfcp_scsi_rport_work()
                                 (port->rport_task != RPORT_ADD)
                                 port->rport_task = RPORT_NONE
                                 zfcp_scsi_rport_block()
                                 if (!port->rport) return
      zfcp_scsi_schedule_rport_register()
      port->rport_task = RPORT_ADD
      queue_work(rport_work)
                                 zfcp_scsi_rport_work()
                                 (port->rport_task == RPORT_ADD)
                                 port->rport_task = RPORT_NONE
                                 zfcp_scsi_rport_register()
                                 (port->rport == NULL)
                                 rport = fc_remote_port_add()
                                 port->rport = rport;
      
      Now the rport was erroneously unblocked while the zfcp_port is blocked.
      This is another situation we want to avoid due to scsi_eh
      potential. This state would at least remain until the new recovery from
      the other context finished successfully, or potentially forever if it
      failed.  In order to close this race, we take the erp_lock inside
      zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or
      LUN.  With that, the possible corresponding rport state sequences would
      be: (unblock[ERP thread],block[other context]) if the ERP thread gets
      erp_lock first and still sees ((port->status & ...UNBLOCK) != 0),
      (block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock
      after the other context has already cleard ...UNBLOCK from port->status.
      
      Since checking fields of struct erp_action is unsafe because they could
      have been overwritten (re-used for new recovery) meanwhile, we only
      check status of zfcp_port and LUN since these are only changed under
      erp_lock elsewhere. Regarding the check of the proper status flags (port
      or port_forced are similar to the shown adapter recovery):
      
      [zfcp_erp_adapter_shutdown()]
      zfcp_erp_adapter_reopen()
       zfcp_erp_adapter_block()
        * clear UNBLOCK ---------------------------------------+
       zfcp_scsi_schedule_rports_block()                       |
       write_lock_irqsave(&adapter->erp_lock, flags);-------+  |
       zfcp_erp_action_enqueue()                            |  |
        zfcp_erp_setup_act()                                |  |
         * set ERP_INUSE -----------------------------------|--|--+
       write_unlock_irqrestore(&adapter->erp_lock, flags);--+  |  |
      .context-switch.                                         |  |
      zfcp_erp_thread()                                        |  |
       zfcp_erp_strategy()                                     |  |
        write_lock_irqsave(&adapter->erp_lock, flags);------+  |  |
        ...                                                 |  |  |
        zfcp_erp_strategy_check_target()                    |  |  |
         zfcp_erp_strategy_check_adapter()                  |  |  |
          zfcp_erp_adapter_unblock()                        |  |  |
           * set UNBLOCK -----------------------------------|--+  |
        zfcp_erp_action_dequeue()                           |     |
         * clear ERP_INUSE ---------------------------------|-----+
        ...                                                 |
        write_unlock_irqrestore(&adapter->erp_lock, flags);-+
      
      Hence, we should check for both UNBLOCK and ERP_INUSE because they are
      interleaved.  Also we need to explicitly check ERP_FAILED for the link
      down case which currently does not clear the UNBLOCK flag in
      zfcp_fsf_link_down_info_eval().
      Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Fixes: 8830271c ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport")
      Fixes: a2fa0aed ("[SCSI] zfcp: Block FC transport rports early on errors")
      Fixes: 5f852be9 ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI")
      Fixes: 338151e0 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable")
      Fixes: 3859f6a2 ("[PATCH] zfcp: add rports to enable scsi_add_device to work again")
      Reviewed-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d675dff
    • Steffen Maier's avatar
      scsi: zfcp: do not trace pure benign residual HBA responses at default level · 057fe03d
      Steffen Maier authored
      commit 56d23ed7 upstream.
      
      Since quite a while, Linux issues enough SCSI commands per scsi_device
      which successfully return with FCP_RESID_UNDER, FSF_FCP_RSP_AVAILABLE,
      and SAM_STAT_GOOD.  This floods the HBA trace area and we cannot see
      other and important HBA trace records long enough.
      
      Therefore, do not trace HBA response errors for pure benign residual
      under counts at the default trace level.
      
      This excludes benign residual under count combined with other validity
      bits set in FCP_RSP_IU, such as FCP_SNS_LEN_VAL.  For all those other
      cases, we still do want to see both the HBA record and the corresponding
      SCSI record by default.
      Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Fixes: a54ca0f6 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
      Reviewed-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      057fe03d
    • Benjamin Block's avatar
      scsi: zfcp: fix use-after-"free" in FC ingress path after TMF · 5cebfea8
      Benjamin Block authored
      commit dac37e15 upstream.
      
      When SCSI EH invokes zFCP's callbacks for eh_device_reset_handler() and
      eh_target_reset_handler(), it expects us to relent the ownership over
      the given scsi_cmnd and all other scsi_cmnds within the same scope - LUN
      or target - when returning with SUCCESS from the callback ('release'
      them).  SCSI EH can then reuse those commands.
      
      We did not follow this rule to release commands upon SUCCESS; and if
      later a reply arrived for one of those supposed to be released commands,
      we would still make use of the scsi_cmnd in our ingress tasklet. This
      will at least result in undefined behavior or a kernel panic because of
      a wrong kernel pointer dereference.
      
      To fix this, we NULLify all pointers to scsi_cmnds (struct zfcp_fsf_req
      *)->data in the matching scope if a TMF was successful. This is done
      under the locks (struct zfcp_adapter *)->abort_lock and (struct
      zfcp_reqlist *)->lock to prevent the requests from being removed from
      the request-hashtable, and the ingress tasklet from making use of the
      scsi_cmnd-pointer in zfcp_fsf_fcp_cmnd_handler().
      
      For cases where a reply arrives during SCSI EH, but before we get a
      chance to NULLify the pointer - but before we return from the callback
      -, we assume that the code is protected from races via the CAS operation
      in blk_complete_request() that is called in scsi_done().
      
      The following stacktrace shows an example for a crash resulting from the
      previous behavior:
      
      Unable to handle kernel pointer dereference at virtual kernel address fffffee17a672000
      Oops: 0038 [#1] SMP
      CPU: 2 PID: 0 Comm: swapper/2 Not tainted
      task: 00000003f7ff5be0 ti: 00000003f3d38000 task.ti: 00000003f3d38000
      Krnl PSW : 0404d00180000000 00000000001156b0 (smp_vcpu_scheduled+0x18/0x40)
                 R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
      Krnl GPRS: 000000200000007e 0000000000000000 fffffee17a671fd8 0000000300000015
                 ffffffff80000000 00000000005dfde8 07000003f7f80e00 000000004fa4e800
                 000000036ce8d8f8 000000036ce8d9c0 00000003ece8fe00 ffffffff969c9e93
                 00000003fffffffd 000000036ce8da10 00000000003bf134 00000003f3b07918
      Krnl Code: 00000000001156a2: a7190000        lghi    %r1,0
                 00000000001156a6: a7380015        lhi    %r3,21
                #00000000001156aa: e32050000008    ag    %r2,0(%r5)
                >00000000001156b0: 482022b0        lh    %r2,688(%r2)
                 00000000001156b4: ae123000        sigp    %r1,%r2,0(%r3)
                 00000000001156b8: b2220020        ipm    %r2
                 00000000001156bc: 8820001c        srl    %r2,28
                 00000000001156c0: c02700000001    xilf    %r2,1
      Call Trace:
      ([<0000000000000000>] 0x0)
       [<000003ff807bdb8e>] zfcp_fsf_fcp_cmnd_handler+0x3de/0x490 [zfcp]
       [<000003ff807be30a>] zfcp_fsf_req_complete+0x252/0x800 [zfcp]
       [<000003ff807c0a48>] zfcp_fsf_reqid_check+0xe8/0x190 [zfcp]
       [<000003ff807c194e>] zfcp_qdio_int_resp+0x66/0x188 [zfcp]
       [<000003ff80440c64>] qdio_kick_handler+0xdc/0x310 [qdio]
       [<000003ff804463d0>] __tiqdio_inbound_processing+0xf8/0xcd8 [qdio]
       [<0000000000141fd4>] tasklet_action+0x9c/0x170
       [<0000000000141550>] __do_softirq+0xe8/0x258
       [<000000000010ce0a>] do_softirq+0xba/0xc0
       [<000000000014187c>] irq_exit+0xc4/0xe8
       [<000000000046b526>] do_IRQ+0x146/0x1d8
       [<00000000005d6a3c>] io_return+0x0/0x8
       [<00000000005d6422>] vtime_stop_cpu+0x4a/0xa0
      ([<0000000000000000>] 0x0)
       [<0000000000103d8a>] arch_cpu_idle+0xa2/0xb0
       [<0000000000197f94>] cpu_startup_entry+0x13c/0x1f8
       [<0000000000114782>] smp_start_secondary+0xda/0xe8
       [<00000000005d6efe>] restart_int_handler+0x56/0x6c
       [<0000000000000000>] 0x0
      Last Breaking-Event-Address:
       [<00000000003bf12e>] arch_spin_lock_wait+0x56/0xb0
      Suggested-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Fixes: ea127f97 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git)
      Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5cebfea8