1. 27 Sep, 2017 40 commits
    • Jose Abreu's avatar
      ARC: Re-enable MMU upon Machine Check exception · 81306fc3
      Jose Abreu authored
      commit 1ee55a8f upstream.
      
      I recently came upon a scenario where I would get a double fault
      machine check exception triggered by a kernel module.
      However the ensuing crash stacktrace (ksym lookup) was not working
      correctly.
      
      Turns out that a machine check auto-disables the MMU, while modules are
      allocated in kernel vaddr space.
      
      This patch re-enables the MMU before printing the stacktrace,
      making stacktracing of modules work upon a fatal exception.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      [vgupta: moved code into low level handler to avoid in 2 places]
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      81306fc3
    • Baohong Liu's avatar
      tracing: Apply trace_clock changes to instance max buffer · d28e96be
      Baohong Liu authored
      commit 170b3b10 upstream.
      
      Currently trace_clock timestamps are applied to both regular and max
      buffers only for global trace. For instance trace, trace_clock
      timestamps are applied only to regular buffer. But, regular and max
      buffers can be swapped, for example, following a snapshot. So, for
      instance trace, bad timestamps can be seen following a snapshot.
      Let's apply trace_clock timestamps to instance max buffer as well.
      
      Link: http://lkml.kernel.org/r/ebdb168d0be042dcdf51f81e696b17fabe3609c1.1504642143.git.tom.zanussi@linux.intel.com
      
      Fixes: 277ba044 ("tracing: Add interface to allow multiple trace buffers")
      Signed-off-by: Baohong Liu <baohong.liu@intel.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d28e96be
    • Steven Rostedt (VMware)'s avatar
      ftrace: Fix selftest goto location on error · 753154fc
      Steven Rostedt (VMware) authored
      commit 46320a6a upstream.
      
      In the second iteration of trace_selftest_ops(), the error goto label is
      wrong in the case where trace_selftest_test_global_cnt is off. In the
      case of error, it leaks the dynamic ops that was allocated.
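
The leak described above is the usual hazard of goto-based cleanup: a jump to a label that sits after the cleanup of a resource which already exists. The following is only a minimal sketch of that pattern, not the selftest code itself; `run_checks`, `tracked_alloc`, `tracked_free` and `live_allocations` are hypothetical names invented for illustration.

```c
#include <stdlib.h>

/* Hypothetical counter so the example can observe whether cleanup ran. */
static int live_allocations;

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);

    if (p)
        live_allocations++;
    return p;
}

static void tracked_free(void *p)
{
    if (p) {
        live_allocations--;
        free(p);
    }
}

/*
 * Returns 0 on success, -1 on failure. 'fail_late' simulates a check
 * that fails after the dynamic object has already been allocated.
 */
static int run_checks(int fail_late)
{
    int ret = -1;
    void *dyn_ops;

    dyn_ops = tracked_alloc(64);
    if (!dyn_ops)
        goto out;        /* nothing allocated yet: plain exit is fine */

    if (fail_late)
        goto out_free;   /* 'goto out' here would leak dyn_ops */

    ret = 0;
out_free:
    tracked_free(dyn_ops);
out:
    return ret;
}
```

Calling `run_checks(1)` fails but leaves `live_allocations` at zero, which is the property the wrong label breaks.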
      
      Fixes: 95950c2e ("ftrace: Add self-tests for multiple function trace users")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      753154fc
    • Dan Carpenter's avatar
      scsi: qla2xxx: Fix an integer overflow in sysfs code · d8663aa2
      Dan Carpenter authored
      commit e6f77540 upstream.
      
      The value of "size" comes from the user.  When we add "start + size" it
      could lead to an integer overflow bug.
      
      It means we vmalloc() a lot more memory than we had intended.  I believe
      that on 64 bit systems vmalloc() can succeed even if we ask it to
      allocate huge 4GB buffers.  So we would get memory corruption and likely
      a crash when we call ha->isp_ops->write_optrom() and ->read_optrom().
      
      Only root can trigger this bug.
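
To illustrate the class of bug, here is a small sketch of a range check that cannot wrap, next to the naive form that can. It assumes 32-bit unsigned offsets and a hypothetical `region_len`; the function names are made up for this example and are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when [start, start + size) lies inside a region of
 * 'region_len' bytes. Written so that no intermediate sum can wrap.
 */
static bool range_fits(uint32_t start, uint32_t size, uint32_t region_len)
{
    if (start > region_len)
        return false;
    /* region_len - start cannot underflow because start <= region_len */
    return size <= region_len - start;
}

/* The naive form: start + size wraps modulo 2^32 for large inputs. */
static bool range_fits_naive(uint32_t start, uint32_t size, uint32_t region_len)
{
    return start + size <= region_len;
}
```

With `start = 0x10`, `size = 0xFFFFFFF8` and `region_len = 0x100`, the naive check accepts the request because the sum wraps to 8, while the subtraction-based check rejects it.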
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=194061
      
      Fixes: b7cc176c ("[SCSI] qla2xxx: Allow region-based flash-part accesses.")
      Reported-by: shqking <shqking@gmail.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d8663aa2
    • Hannes Reinecke's avatar
      scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE · 72896ca3
      Hannes Reinecke authored
      commit 3e009749 upstream.
      
      When calling SG_GET_REQUEST_TABLE ioctl only a half-filled table is
      returned; the remaining part will then contain stale kernel memory
      information.  This patch zeroes out the entire table to avoid this
      issue.
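
A minimal sketch of the idea, assuming a simplified table layout: clear the whole buffer first, then fill only the slots that have data, so the unused tail never carries stale bytes. `struct req_info`, `TABLE_SLOTS` and `fill_table` are hypothetical stand-ins, not the sg driver's definitions.

```c
#include <string.h>

#define TABLE_SLOTS 16   /* hypothetical table size for this sketch */

struct req_info {        /* hypothetical, simplified table entry */
    int in_use;
    int pack_id;
};

/*
 * Fill 'table' from 'active' pending entries. The memset guarantees that
 * slots beyond 'active' hold zeros rather than whatever the buffer held
 * before.
 */
static void fill_table(struct req_info *table, const int *pack_ids, int active)
{
    int i;

    memset(table, 0, sizeof(*table) * TABLE_SLOTS);
    for (i = 0; i < active && i < TABLE_SLOTS; i++) {
        table[i].in_use = 1;
        table[i].pack_id = pack_ids[i];
    }
}
```
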
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      72896ca3
    • Hannes Reinecke's avatar
      scsi: sg: factor out sg_fill_request_table() · c04996ad
      Hannes Reinecke authored
      commit 4759df90 upstream.
      
      Factor out sg_fill_request_table() for better readability.
      
      [mkp: typos, applied by hand]
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c04996ad
    • Dan Carpenter's avatar
      scsi: sg: off by one in sg_ioctl() · f0cd701d
      Dan Carpenter authored
      commit bd46fc40 upstream.
      
      If "val" is SG_MAX_QUEUE then we are one element beyond the end of the
      "rinfo" array so the > should be >=.
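
The boundary condition can be shown in a few lines. This is a sketch only: `MAX_QUEUE` stands in for SG_MAX_QUEUE (its real value is not given here), and both helper functions are hypothetical.

```c
#include <stdbool.h>

#define MAX_QUEUE 16  /* stand-in for SG_MAX_QUEUE; real value not given here */

/* Valid indexes into an array of MAX_QUEUE elements are 0 .. MAX_QUEUE - 1. */
static bool index_ok(int val)
{
    return val >= 0 && val < MAX_QUEUE;   /* rejects val >= MAX_QUEUE */
}

/* The buggy variant rejects only val > MAX_QUEUE, letting MAX_QUEUE through. */
static bool index_ok_buggy(int val)
{
    return val >= 0 && !(val > MAX_QUEUE);
}
```

`index_ok_buggy(MAX_QUEUE)` returns true even though that index is one past the last element, which is exactly the case the `>=` comparison closes.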
      
      Fixes: 109bade9 ("scsi: sg: use standard lists for sg_requests")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0cd701d
    • Hannes Reinecke's avatar
      scsi: sg: use standard lists for sg_requests · 3682e0c6
      Hannes Reinecke authored
      commit 109bade9 upstream.
      
      'Sg_request' is using a private list implementation; convert it to
      standard lists.
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3682e0c6
    • Long Li's avatar
      scsi: storvsc: fix memory leak on ring buffer busy · cf22210c
      Long Li authored
      commit 0208eeaa upstream.
      
      When storvsc is sending I/O to Hyper-V, it may allocate a bigger buffer
      descriptor for a large data payload that can't fit into a pre-allocated
      buffer descriptor. This bigger buffer is freed on the return path.
      
      If an I/O request to Hyper-V fails because the ring buffer is busy, the
      buffer descriptor allocated by storvsc should also be freed.
      
      [mkp: applied by hand]
      
      Fixes: be0cf6ca ("scsi: storvsc: Set the tablesize based on the information given by the host")
      Signed-off-by: Long Li <longli@microsoft.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf22210c
    • Shivasharan S's avatar
      scsi: megaraid_sas: Return pended IOCTLs with cmd_status MFI_STAT_WRONG_STATE in case adapter is dead · b4730f45
      Shivasharan S authored
      commit eb3fe263 upstream.
      
      After the adapter has been killed, cmd_status is not set, so pended
      IOCTLs hang in the driver and the application hangs with them.  Set
      cmd_status to MFI_STAT_WRONG_STATE when completing pended IOCTLs.
      Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
      Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Tomas Henzl <thenzl@redhat.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      b4730f45
    • Steffen Maier's avatar
      scsi: zfcp: trace high part of "new" 64 bit SCSI LUN · 4dd6cbbc
      Steffen Maier authored
      commit 5d4a3d0a upstream.
      
      Complements debugging aspects of the otherwise functionally complete
      v3.17 commit 9cb78c16 ("scsi: use 64-bit LUNs").
      
      While I don't have access to a target exporting 3 or 4 level LUNs,
      I did test it by explicitly attaching a non-existent fake 4 level LUN
      by means of zfcp sysfs attribute "unit_add".
      In order to see corresponding trace records of otherwise successful
      events, we had to increase the trace level of area SCSI and HBA to 6.
      
      $ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_scsi/level
      $ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_hba/level
      
      $ echo 0x4011402240334044 > \
        /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/unit_add
      
      Example output formatted by an updated zfcpdbf from the s390-tools
      package interspersed with kernel messages at scsi_logging_level=4605:
      
      Timestamp      : ...
      Area           : REC
      Subarea        : 00
      Level          : 1
      Exception      : -
      CPU ID         : ..
      Caller         : 0x...
      Record ID      : 1
      Tag            : scsla_1
      LUN            : 0x4011402240334044
      WWPN           : 0x50050763031bd327
      D_ID           : 0x00......
      Adapter status : 0x5400050b
      Port status    : 0x54000001
      LUN status     : 0x41000000
      Ready count    : 0x00000001
      Running count  : 0x00000000
      ERP want       : 0x01
      ERP need       : 0x01
      
      scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 1 length 36
      scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0
      
      Timestamp      : ...
      Area           : HBA
      Subarea        : 00
      Level          : 6
      Exception      : -
      CPU ID         : ..
      Caller         : 0x...
      Record ID      : 1
      Tag            : fs_norm
      Request ID     : 0x<inquiry2-req-id>
      Request status : 0x00000010
      FSF cmnd       : 0x00000001
      FSF sequence no: 0x...
      FSF issued     : ...
      FSF stat       : 0x00000000
      FSF stat qual  : 00000000 00000000 00000000 00000000
      Prot stat      : 0x00000001
      Prot stat qual : ........ ........ 00000000 00000000
      Port handle    : 0x...
      LUN handle     : 0x...
      |
      Timestamp      : ...
      Area           : SCSI
      Subarea        : 00
      Level          : 6
      Exception      : -
      CPU ID         : ..
      Caller         : 0x...
      Record ID      : 1
      Tag            : rsl_nor
      Request ID     : 0x<inquiry2-req-id>
      SCSI ID        : 0x00000000
      SCSI LUN       : 0x40224011
      SCSI LUN high  : 0x40444033 <=======================
      SCSI result    : 0x00000000
      SCSI retries   : 0x00
      SCSI allowed   : 0x03
      SCSI scribble  : 0x<inquiry2-req-id>
      SCSI opcode    : 12000000 a4000000 00000000 00000000
      FCP rsp inf cod: 0x00
      FCP rsp IU     : 00000000 00000000 00000000 00000000
                       00000000 00000000
      
      scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 2 length 164
      scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0
      scsi 2:0:0:4630896905707208721: scsi scan: peripheral device type of 31, \
      no device added
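
The values in the records above are consistent with each other if the 8 LUN bytes are folded into a 64-bit integer two bytes at a time, and the trace then stores the low and high 32-bit halves separately. The sketch below reproduces that arithmetic; `lun_bytes_to_int`, `lun_low` and `lun_high` are hypothetical helpers written for this example, and the byte layout is inferred from the numbers shown, not quoted from the driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper: fold the 8 LUN bytes (in the order written to
 * unit_add, most significant first) into a 64-bit integer, two bytes at
 * a time, so that the first 2-byte level ends up in the lowest 16 bits.
 */
static uint64_t lun_bytes_to_int(const uint8_t b[8])
{
    uint64_t lun = 0;
    int i;

    for (i = 0; i < 8; i += 2)
        lun |= ((uint64_t)b[i] << ((i + 1) * 8)) |
               ((uint64_t)b[i + 1] << (i * 8));
    return lun;
}

/* Low and high halves, as they appear in "SCSI LUN" and "SCSI LUN high". */
static uint32_t lun_low(uint64_t lun)
{
    return (uint32_t)lun;
}

static uint32_t lun_high(uint64_t lun)
{
    return (uint32_t)(lun >> 32);
}
```

For the bytes 40 11 40 22 40 33 40 44 this yields 4630896905707208721, with low half 0x40224011 and high half 0x40444033, matching the kernel messages and the SCSI record above.
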
      Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
      Fixes: 9cb78c16 ("scsi: use 64-bit LUNs")
      Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4dd6cbbc
    • Steffen Maier's avatar
      scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response · 1e6c640a
      Steffen Maier authored
      commit fdb7cee3 upstream.
      
      At the default trace level, we only trace unsuccessful events including
      FSF responses.
      
      zfcp_dbf_hba_fsf_response() only used protocol status and FSF status to
      decide on an unsuccessful response. However, this is only one of multiple
      possible sources determining a failed struct zfcp_fsf_req.
      
      An FSF request can also "fail" if its response runs into an ERP timeout
      or if it gets dismissed because a higher level recovery was triggered
      [trace tags "erscf_1" or "erscf_2" in zfcp_erp_strategy_check_fsfreq()].
      FSF requests with ERP timeout are:
      FSF_QTCB_EXCHANGE_CONFIG_DATA, FSF_QTCB_EXCHANGE_PORT_DATA,
      FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT or
      FSF_QTCB_CLOSE_PHYSICAL_PORT for target ports,
      FSF_QTCB_OPEN_LUN, FSF_QTCB_CLOSE_LUN.
      One example is slow queue processing which can cause follow-on errors,
      e.g. FSF_PORT_ALREADY_OPEN after FSF_QTCB_OPEN_PORT_WITH_DID timed out.
      In order to see the root cause, we need to see late responses even if the
      channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.
      Example trace records formatted with zfcpdbf from the s390-tools package:
      
      Timestamp      : ...
      Area           : REC
      Subarea        : 00
      Level          : 1
      Exception      : -
      CPU ID         : ..
      Caller         : ...
      Record ID      : 1
      Tag            : fcegpf1
      LUN            : 0xffffffffffffffff
      WWPN           : 0x<WWPN>
      D_ID           : 0x00<D_ID>
      Adapter status : 0x5400050b
      Port status    : 0x41200000
      LUN status     : 0x00000000
      Ready count    : 0x00000001
      Running count  : 0x...
      ERP want       : 0x02				ZFCP_ERP_ACTION_REOPEN_PORT
      ERP need       : 0x02				ZFCP_ERP_ACTION_REOPEN_PORT
      |
      Timestamp      : ...				30 seconds later
      Area           : REC
      Subarea        : 00
      Level          : 1
      Exception      : -
      CPU ID         : ..
      Caller         : ...
      Record ID      : 2
      Tag            : erscf_2
      LUN            : 0xffffffffffffffff
      WWPN           : 0x<WWPN>
      D_ID           : 0x00<D_ID>
      Adapter status : 0x5400050b
      Port status    : 0x41200000
      LUN status     : 0x00000000
      Request ID     : 0x<request_ID>
      ERP status     : 0x10000000			ZFCP_STATUS_ERP_TIMEDOUT
      ERP step       : 0x0800				ZFCP_ERP_STEP_PORT_OPENING
      ERP action     : 0x02				ZFCP_ERP_ACTION_REOPEN_PORT
      ERP count      : 0x00
      |
      Timestamp      : ...				later than previous record
      Area           : HBA
      Subarea        : 00
      Level          : 5	> default level		=> 3	<= default level
      Exception      : -
      CPU ID         : 00
      Caller         : ...
      Record ID      : 1
      Tag            : fs_qtcb			=> fs_rerr
      Request ID     : 0x<request_ID>
      Request status : 0x00001010			ZFCP_STATUS_FSFREQ_DISMISSED
      						| ZFCP_STATUS_FSFREQ_CLEANUP
      FSF cmnd       : 0x00000005
      FSF sequence no: 0x...
      FSF issued     : ...				> 30 seconds ago
      FSF stat       : 0x00000000			FSF_GOOD
      FSF stat qual  : 00000000 00000000 00000000 00000000
      Prot stat      : 0x00000001			FSF_PROT_GOOD
      Prot stat qual : 00000000 00000000 00000000 00000000
      Port handle    : 0x...
      LUN handle     : 0x00000000
      QTCB log length: ...
      QTCB log info  : ...
      
      In case of problems detecting that new responses are waiting on the input
      queue, we sooner or later trigger adapter recovery due to an FSF request
      timeout (trace tag "fsrth_1").
      FSF requests with FSF request timeout are:
      typically FSF_QTCB_ABORT_FCP_CMND; but theoretically also
      FSF_QTCB_EXCHANGE_CONFIG_DATA or FSF_QTCB_EXCHANGE_PORT_DATA via sysfs,
      FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT for WKA ports,
      FSF_QTCB_FCP_CMND for task management function (LUN / target reset).
      One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD
      because the channel filled in the response via DMA into the request's QTCB.
      
      In a theoretical case, inject code can create an erroneous FSF request
      on purpose. If data router is enabled, it uses deferred error reporting.
      A READ SCSI command can succeed with FSF_PROT_GOOD, FSF_GOOD, and
      SAM_STAT_GOOD. But on writing the read data to host memory via DMA,
      it can still fail, e.g. if an intentionally wrong scatter list does not
      provide enough space. Rather than getting an unsuccessful response,
      we get a QDIO activate check which in turn triggers adapter recovery.
      One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD
      because the channel filled in the response via DMA into the request's QTCB.
      Example trace records formatted with zfcpdbf from the s390-tools package:
      
      Timestamp      : ...
      Area           : HBA
      Subarea        : 00
      Level          : 6	> default level		=> 3	<= default level
      Exception      : -
      CPU ID         : ..
      Caller         : ...
      Record ID      : 1
      Tag            : fs_norm			=> fs_rerr
      Request ID     : 0x<request_ID2>
      Request status : 0x00001010			ZFCP_STATUS_FSFREQ_DISMISSED
      						| ZFCP_STATUS_FSFREQ_CLEANUP
      FSF cmnd       : 0x00000001
      FSF sequence no: 0x...
      FSF issued     : ...
      FSF stat       : 0x00000000			FSF_GOOD
      FSF stat qual  : 00000000 00000000 00000000 00000000
      Prot stat      : 0x00000001			FSF_PROT_GOOD
      Prot stat qual : ........ ........ 00000000 00000000
      Port handle    : 0x...
      LUN handle     : 0x...
      |
      Timestamp      : ...
      Area           : SCSI
      Subarea        : 00
      Level          : 3
      Exception      : -
      CPU ID         : ..
      Caller         : ...
      Record ID      : 1
      Tag            : rsl_err
      Request ID     : 0x<request_ID2>
      SCSI ID        : 0x...
      SCSI LUN       : 0x...
      SCSI result    : 0x000e0000			DID_TRANSPORT_DISRUPTED
      SCSI retries   : 0x00
      SCSI allowed   : 0x05
      SCSI scribble  : 0x<request_ID2>
      SCSI opcode    : 28...				Read(10)
      FCP rsp inf cod: 0x00
      FCP rsp IU     : 00000000 00000000 00000000 00000000
                                               ^^	SAM_STAT_GOOD
                       00000000 00000000
      
      In both cases above, only with luck could we see a follow-on trace record
      of an unsuccessful event following a successful but late FSF response with
      FSF_PROT_GOOD and FSF_GOOD. Typically this was the case for I/O requests
      resulting in a SCSI trace record "rsl_err" with DID_TRANSPORT_DISRUPTED
      [On ZFCP_STATUS_FSFREQ_DISMISSED, zfcp_fsf_protstatus_eval() sets
      ZFCP_STATUS_FSFREQ_ERROR seen by the request handler functions as failure].
      However, the reason for this follow-on trace was invisible because the
      corresponding HBA trace record was missing at the default trace level
      (by default hidden records with tags "fs_norm", "fs_qtcb", or "fs_open").
      
      On adapter recovery, after we had shut down the QDIO queues, we perform
      unsuccessful pseudo completions with flag ZFCP_STATUS_FSFREQ_DISMISSED
      for each pending FSF request in zfcp_fsf_req_dismiss_all().
      In order to find the root cause, we need to see all pseudo responses even
      if the channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.
      
      Therefore, check zfcp_fsf_req.status for ZFCP_STATUS_FSFREQ_DISMISSED
      or ZFCP_STATUS_FSFREQ_ERROR and trace with a new tag "fs_rerr".
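
The decision described here can be sketched as a small selection function. This is an illustration only and covers just the new status check, not the existing protocol-status checks. The bit values 0x1000 (dismissed) and 0x0010 (cleanup) are inferred from the "Request status" fields in the records above, the value used for the error flag is an assumption of this sketch, and `choose_trace` and `struct trace_choice` are hypothetical names.

```c
#include <stdint.h>

/* Inferred from the "Request status" fields in the trace records. */
#define REQ_DISMISSED 0x00001000u
#define REQ_CLEANUP   0x00000010u
/* Assumed value, for illustration only. */
#define REQ_ERROR     0x00000008u

struct trace_choice {
    const char *tag;
    int level;
};

/*
 * Pick tag and level for an FSF response whose protocol and FSF status
 * look good. A dismissed or failed request is traced as "fs_rerr" at
 * level 3 (the level shown for fs_rerr above); otherwise the caller's
 * normal tag and level are kept.
 */
static struct trace_choice choose_trace(uint32_t req_status,
                                        const char *normal_tag,
                                        int normal_level)
{
    struct trace_choice c = { normal_tag, normal_level };

    if (req_status & (REQ_DISMISSED | REQ_ERROR)) {
        c.tag = "fs_rerr";
        c.level = 3;
    }
    return c;
}
```

A request status of 0x00001010 therefore ends up as "fs_rerr" at level 3, while 0x00000010 keeps its normal tag and level.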
      
      It does not matter that there are numerous places which set
      ZFCP_STATUS_FSFREQ_ERROR after the location where we trace an FSF response
      early. These cases are based on protocol status != FSF_PROT_GOOD or
      == FSF_PROT_FSF_STATUS_PRESENTED and are thus already traced by default
      as trace tag "fs_perr" or "fs_ferr" respectively.
      
      NB: The trace record with tag "fssrh_1" for status read buffers on dismiss
      all remains. zfcp_fsf_req_complete() handles this and returns early.
      All other FSF request types are handled separately and as described above.
      Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
      Fixes: 8a36e453 ("[SCSI] zfcp: enhancement of zfcp debug features")
      Fixes: 2e261af8 ("[SCSI] zfcp: Only collect FSF/HBA debug data for matching trace levels")
      Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e6c640a
    • Steffen Maier's avatar
      scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records · 71948224
      Steffen Maier authored
      commit 12c3e575 upstream.
      
      If the FCP_RSP IU has optional parts (FCP_SNS_INFO or FCP_RSP_INFO) and
      thus does not fit into the fcp_rsp field built into a SCSI trace record,
      trace the full FCP_RSP IU with all optional parts as payload record
      instead of just FCP_SNS_INFO as payload and
      a 1 byte RSP_INFO_CODE part of FCP_RSP_INFO built into the SCSI record.
      
      That way we would also get the full FCP_SNS_INFO in case a
      target would ever send more than
      min(SCSI_SENSE_BUFFERSIZE==96, ZFCP_DBF_PAY_MAX_REC==256)==96.
      
      The mandatory part of FCP_RSP IU is only 24 bytes.
      PAYload costs at least one full PAY record of 256 bytes anyway.
      We cap to the hardware response size which is only FSF_FCP_RSP_SIZE==128.
      So we can just put the whole FCP_RSP IU with any optional parts into
      PAYload similarly as we do for SAN PAY since v4.9 commit aceeffbb
      ("zfcp: trace full payload of all SAN records (req,resp,iels)").
      This does not cause any additional trace records wasting memory.
      
      Decoded trace records were confusing because they showed a hard-coded
      sense data length of 96 even if the FCP_RSP_IU field FCP_SNS_LEN showed
      actually less.
      
      Since the same commit, we set pl_len for SAN traces to the full length of a
      request/response even if we cap the corresponding trace.
      In contrast, here for SCSI traces we set pl_len to the pre-computed
      length of FCP_RSP IU considering SNS_LEN or RSP_LEN if valid.
      Nonetheless we trace a hardcoded payload of length FSF_FCP_RSP_SIZE==128
      if there were optional parts.
      This makes it easier for the zfcpdbf tool to format only the relevant
      part of the long FCP_RSP IU buffer. And any trailing information is still
      available in the payload trace record just in case.
      
      Rename the payload record tag from "fcp_sns" to "fcp_riu" to make the new
      content explicit to zfcpdbf which can then pick a suitable field name such
      as "FCP rsp IU all:" instead of "Sense info :"
      Also, the same zfcpdbf can still be backwards compatible with "fcp_sns".
      
      Old example trace record before this fix, formatted with the tool zfcpdbf
      from s390-tools:
      
      Timestamp      : ...
      Area           : SCSI
      Subarea        : 00
      Level          : 3
      Exception      : -
      CPU id         : ..
      Caller         : 0x...
      Record id      : 1
      Tag            : rsl_err
      Request id     : 0x<request_id>
      SCSI ID        : 0x...
      SCSI LUN       : 0x...
      SCSI result    : 0x00000002
      SCSI retries   : 0x00
      SCSI allowed   : 0x05
      SCSI scribble  : 0x<request_id>
      SCSI opcode    : 00000000 00000000 00000000 00000000
      FCP rsp inf cod: 0x00
      FCP rsp IU     : 00000000 00000000 00000202 00000000
                                             ^^==FCP_SNS_LEN_VALID
                       00000020 00000000
                       ^^^^^^^^==FCP_SNS_LEN==32
      Sense len      : 96 <==min(SCSI_SENSE_BUFFERSIZE,ZFCP_DBF_PAY_MAX_REC)
      Sense info     : 70000600 00000018 00000000 29000000
                       00000400 00000000 00000000 00000000
                       00000000 00000000 00000000 00000000<==superfluous
                       00000000 00000000 00000000 00000000<==superfluous
                       00000000 00000000 00000000 00000000<==superfluous
                       00000000 00000000 00000000 00000000<==superfluous
      
      New example trace records with this fix:
      
      Timestamp      : ...
      Area           : SCSI
      Subarea        : 00
      Level          : 3
      Exception      : -
      CPU ID         : ..
      Caller         : 0x...
      Record ID      : 1
      Tag            : rsl_err
      Request ID     : 0x<request_id>
      SCSI ID        : 0x...
      SCSI LUN       : 0x...
      SCSI result    : 0x00000002
      SCSI retries   : 0x00
      SCSI allowed   : 0x03
      SCSI scribble  : 0x<request_id>
      SCSI opcode    : a30c0112 00000000 02000000 00000000
      FCP rsp inf cod: 0x00
      FCP rsp IU     : 00000000 00000000 00000a02 00000200
                       00000020 00000000
      FCP rsp IU len : 56
      FCP rsp IU all : 00000000 00000000 00000a02 00000200
                                             ^^=FCP_RESID_UNDER|FCP_SNS_LEN_VALID
                       00000020 00000000 70000500 00000018
                       ^^^^^^^^==FCP_SNS_LEN
                                         ^^^^^^^^^^^^^^^^^
                       00000000 240000cb 00011100 00000000
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                       00000000 00000000
                       ^^^^^^^^^^^^^^^^^==FCP_SNS_INFO
      
      Timestamp      : ...
      Area           : SCSI
      Subarea        : 00
      Level          : 1
      Exception      : -
      CPU ID         : ..
      Caller         : 0x...
      Record ID      : 1
      Tag            : lr_okay
      Request ID     : 0x<request_id>
      SCSI ID        : 0x...
      SCSI LUN       : 0x...
      SCSI result    : 0x00000000
      SCSI retries   : 0x00
      SCSI allowed   : 0x05
      SCSI scribble  : 0x<request_id>
      SCSI opcode    : <CDB of unrelated SCSI command passed to eh handler>
      FCP rsp inf cod: 0x00
      FCP rsp IU     : 00000000 00000000 00000100 00000000
                       00000000 00000008
      FCP rsp IU len : 32
      FCP rsp IU all : 00000000 00000000 00000100 00000000
                                             ^^==FCP_RSP_LEN_VALID
                       00000000 00000008 00000000 00000000
                                ^^^^^^^^==FCP_RSP_LEN
                                         ^^^^^^^^^^^^^^^^^==FCP_RSP_INFO
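
The "FCP rsp IU len" values in the two new records follow from the 24-byte mandatory part plus whichever optional lengths are flagged valid. A minimal sketch of that calculation, assuming the flag bits as annotated in the records (0x01 for FCP_RSP_LEN_VALID, 0x02 for FCP_SNS_LEN_VALID); the macro and function names are hypothetical:

```c
#include <stdint.h>

#define FCP_RSP_BASE_LEN 24u   /* mandatory part of the FCP_RSP IU */
#define RSP_LEN_VALID  0x01u   /* flag bits as annotated in the records */
#define SNS_LEN_VALID  0x02u

/*
 * Length of the FCP_RSP IU including only those optional parts whose
 * length field is marked valid in the flags byte.
 */
static uint32_t fcp_rsp_iu_len(uint8_t flags, uint32_t sns_len,
                               uint32_t rsp_len)
{
    uint32_t len = FCP_RSP_BASE_LEN;

    if (flags & RSP_LEN_VALID)
        len += rsp_len;
    if (flags & SNS_LEN_VALID)
        len += sns_len;
    return len;
}
```

With flags 0x0a and FCP_SNS_LEN 0x20 this gives 56, and with flags 0x01 and FCP_RSP_LEN 8 it gives 32, the two lengths shown above.
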
      Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
      Fixes: 250a1352 ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.")
      Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      71948224
    • Steffen Maier's avatar
      scsi: zfcp: fix missing trace records for early returns in TMF eh handlers · d0fbe221
      Steffen Maier authored
      commit 1a5d999e upstream.
      
      For problem determination we need to see that we were in scsi_eh
      as well as whether and why we were successful or not.
      
      The following commits introduced new early returns without adding
      a trace record:
      
      v2.6.35 commit a1dbfddd
      ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
      on fc_block_scsi_eh() returning != 0 which is FAST_IO_FAIL,
      
      v2.6.30 commit 63caf367
      ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
      on not having gotten an FSF request after the maximum number of retry
      attempts, so that it could not issue a TMF and has to return FAILED.
      Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
      Fixes: a1dbfddd ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
      Fixes: 63caf367 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
      Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d0fbe221
    • Steffen Maier's avatar
      scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA · 1a847369
      Steffen Maier authored
      commit 9fe5d2b2 upstream.
      
      Without this fix we get SCSI trace records on task management functions
      which cannot be correlated to HBA trace records because all fields
      related to the FSF request are empty (zero).
      Also, the FCP_RSP_IU is missing as well as any sense data if available.
      
      This was caused by v2.6.14 commit 8a36e453 ("[SCSI] zfcp: enhancement
      of zfcp debug features") introducing trace records for TMFs but
      hard coding NULL for a possibly existing TMF FSF request.
      The scsi_cmnd scribble is also zero or unrelated for the TMF request,
      so a suitable FSF request could not be looked up from there either.
      
      A broken example trace record formatted with zfcpdbf from the s390-tools
      package:
      
      Timestamp      : ...
      Area           : SCSI
      Subarea        : 00
      Level          : 1
      Exception      : -
      CPU ID         : ..
      Caller         : 0x...
      Record ID      : 1
      Tag            : lr_fail
      Request ID     : 0x0000000000000000
                         ^^^^^^^^^^^^^^^^ no correlation to HBA record
      SCSI ID        : 0x<scsitarget>
      SCSI LUN       : 0x<scsilun>
      SCSI result    : 0x000e0000
      SCSI retries   : 0x00
      SCSI allowed   : 0x05
      SCSI scribble  : 0x0000000000000000
      SCSI opcode    : 2a000017 3bb80000 08000000 00000000
      FCP rsp inf cod: 0x00
                         ^^ no TMF response
      FCP rsp IU     : 00000000 00000000 00000000 00000000
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                       00000000 00000000
                       ^^^^^^^^^^^^^^^^^ no interesting FCP_RSP_IU
      Sense len      : ...
      ^^^^^^^^^^^^^^^^^^^^ no sense data length
      Sense info     : ...
      ^^^^^^^^^^^^^^^^^^^^ no sense data content, even if present
      
      There are some true cases where we really do not have an FSF request:
      "rsl_fai" from zfcp_dbf_scsi_fail_send() called for early
      returns / completions in zfcp_scsi_queuecommand(),
      "abrt_or", "abrt_bl", "abrt_ru", "abrt_ar" from
      zfcp_scsi_eh_abort_handler() where we did not get as far,
      "lr_nres", "tr_nres" from zfcp_task_mgmt_function() where we're
      successful and do not need to do anything because the adapter stopped.
      For these cases it's correct to pass NULL for fsf_req to _zfcp_dbf_scsi().
      Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
      Fixes: 8a36e453 ("[SCSI] zfcp: enhancement of zfcp debug features")
      Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a847369
    • Steffen Maier's avatar
      scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records · 52661717
      Steffen Maier authored
      commit 975171b4 upstream.
      
      v4.9 commit aceeffbb ("zfcp: trace full payload of all SAN records
      (req,resp,iels)") fixed trace data loss of 2.6.38 commit 2c55b750
      ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
      necessary for problem determination, e.g. to see the
      currently active zone set during automatic port scan.
      
      While it already saves space by not dumping any empty residual entries
      of the large successful GPN_FT response (4 pages), there are rare cases
      where the GPN_FT response is unsuccessful and likely does not have
      FC_NS_FID_LAST set in fp_flags, so we did not cap the trace record.
      We typically see such a case for an initiator WWPN which is not in any zone.
      
      Cap unsuccessful responses to at least the actual basic CT_IU response
      plus whatever fits the SAN trace record built-in "payload" buffer
      just in case there's trailing information
      of which we would at least see the existence and its beginning.
      
      In order not to erroneously cap successful responses, we need to swap
      calling the trace function and setting the CT / ELS status to success (0).
      
      Example trace record pair formatted with zfcpdbf:
      
      Timestamp      : ...
      Area           : SAN
      Subarea        : 00
      Level          : 1
      Exception      : -
      CPU ID         : ..
      Caller         : 0x...
      Record ID      : 1
      Tag            : fssct_1
      Request ID     : 0x<request_id>
      Destination ID : 0x00fffffc
      SAN req short  : 01000000 fc020000 01720ffc 00000000
                       00000008
      SAN req length : 20
      |
      Timestamp      : ...
      Area           : SAN
      Subarea        : 00
      Level          : 1
      Exception      : -
      CPU ID         : ..
      Caller         : 0x...
      Record ID      : 2
      Tag            : fsscth2
      Request ID     : 0x<request_id>
      Destination ID : 0x00fffffc
      SAN resp short : 01000000 fc020000 80010000 00090700
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
      SAN resp length: 16384
      San resp info  : 01000000 fc020000 80010000 00090700
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
                       00000000 00000000 00000000 00000000 [trailing info]
      
      The fix saves all but one of the previously associated 64 PAYload trace
      record chunks of size 256 bytes each.
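      
      The arithmetic behind that saving can be sketched as follows. This is
      an illustration in Python, not zfcp code: payload_chunks() is a
      hypothetical helper, and the capped length of 256 bytes is an
      assumption read off the 16 rows of 16 bytes shown under
      "San resp info" above.

```python
CHUNK_SIZE = 256  # bytes per PAYload trace record chunk (from the text above)


def payload_chunks(length, chunk_size=CHUNK_SIZE):
    """Number of chunks needed to hold `length` bytes (ceiling division)."""
    return -(-length // chunk_size)


full_response = 16384   # "SAN resp length" of the GPN_FT response (4 pages)
capped_response = 256   # assumption: 16 rows x 16 bytes shown in the record

before = payload_chunks(full_response)
after = payload_chunks(capped_response)
saved = before - after
print(before, after, saved)
```

      Under these assumptions the uncapped response needs 64 chunks, the
      capped one needs 1, and 63 chunks are saved.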
      Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
      Fixes: aceeffbb ("zfcp: trace full payload of all SAN records (req,resp,iels)")
      Fixes: 2c55b750 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
      Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      52661717
    • Benjamin Block's avatar
      scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path · d0c02c6f
      Benjamin Block authored
      commit a099b7b1 upstream.
      
      Up until now zfcp would just ignore the FCP_RESID_OVER flag in the FCP
      response IU. When this flag is set, the FCP standard allows the
      storage server to process the command normally up to the point where
      data is missing and to simply ignore the missing data.
      
      In this case no CHECK CONDITION would be set, and because we ignored the
      FCP_RESID_OVER flag, the result was at least data loss or even data
      corruption as a follow-up error, depending on how the
      applications/layers on top behave. To prevent this, we now set the
      host-byte of the corresponding scsi_cmnd to DID_ERROR.
      
      Other storage-behaviors, where the same condition results in a CHECK
      CONDITION set in the answer, don't need to be changed as they are
      handled in the mid-layer already.
      
      Following is an example trace record decoded with zfcpdbf from the
      s390-tools package. We forcefully injected a fc_dl which is one byte too
      small:
      
      Timestamp      : ...
      Area           : SCSI
      Subarea        : 00
      Level          : 3
      Exception      : -
      CPU ID         : ..
      Caller         : 0x...
      Record ID      : 1
      Tag            : rsl_err
      Request ID     : 0x...
      SCSI ID        : 0x...
      SCSI LUN       : 0x...
      SCSI result    : 0x00070000
                           ^^DID_ERROR
      SCSI retries   : 0x..
      SCSI allowed   : 0x..
      SCSI scribble  : 0x...
      SCSI opcode    : 2a000000 00000000 08000000 00000000
      FCP rsp inf cod: 0x00
      FCP rsp IU     : 00000000 00000000 00000400 00000001
                                             ^^fr_flags==FCP_RESID_OVER
                                               ^^fr_status==SAM_STAT_GOOD
                                                  ^^^^^^^^fr_resid
                       00000000 00000000
      
      As of now, we don't actively handle the possibility that a response IU
      has both flags - FCP_RESID_OVER and FCP_RESID_UNDER - set at once.
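      
      The annotated fields of the trace record above can be decoded by hand
      as in the following sketch. It is an illustration in Python, not
      zfcp code: host_byte() and parse_fcp_rsp() are hypothetical helpers,
      and the field layout (8 reserved bytes, 16-bit retry delay, flags,
      status, 32-bit residual count) is assumed from the kernel's
      struct fcp_resp / struct fcp_resp_ext definitions.

```python
import struct

DID_ERROR = 0x07         # host byte value, as annotated in the trace record
FCP_RESID_OVER = 0x04    # fr_flags bit, as annotated in the trace record
FCP_RESID_UNDER = 0x08   # assumed from the kernel's fc_fcp.h


def host_byte(result):
    """Extract the host byte (bits 16..23) from a SCSI result word."""
    return (result >> 16) & 0xFF


def parse_fcp_rsp(hex_words):
    """Decode the first 16 bytes of an FCP_RSP IU given as hex words."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    # Big-endian: 8 reserved bytes, retry delay, flags, status, residual.
    _, _retry_delay, flags, status, resid = struct.unpack(">8sHBBI", raw[:16])
    return {"flags": flags, "status": status, "resid": resid}


rsp = parse_fcp_rsp("00000000 00000000 00000400 00000001")
over = bool(rsp["flags"] & FCP_RESID_OVER)
under = bool(rsp["flags"] & FCP_RESID_UNDER)
```

      For the record shown, this yields fr_flags with only FCP_RESID_OVER
      set, fr_status 0 (SAM_STAT_GOOD) and fr_resid 1, while the host byte
      of SCSI result 0x00070000 is DID_ERROR.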
      Reported-by: Luke M. Hopkins <lmhopkin@us.ibm.com>
      Reviewed-by: Steffen Maier <maier@linux.vnet.ibm.com>
      Fixes: 553448f6 ("[SCSI] zfcp: Message cleanup")
      Fixes: ea127f97 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git)
      Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d0c02c6f
    • Steffen Maier's avatar
      scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled · cfc49967
      Steffen Maier authored
      commit 71b8e45d upstream.
      
      Since commit db007fc5 ("[SCSI] Command protection operation"),
      scsi_eh_prep_cmnd() saves scmd->prot_op and temporarily resets it to
      SCSI_PROT_NORMAL.
      Other FCP LLDDs such as qla2xxx and lpfc shield their queuecommand()
      to only access any of scsi_prot_sg...() if
      (scsi_get_prot_op(cmd) != SCSI_PROT_NORMAL).
      
      Do the same thing for zfcp, which introduced DIX support with
      commit ef3eb71d ("[SCSI] zfcp: Introduce experimental support for
      DIF/DIX").
      
      Otherwise, TUR SCSI commands as part of scsi_eh likely fail in zfcp,
      because the regular SCSI command with DIX protection data, that scsi_eh
      re-uses in scsi_send_eh_cmnd(), of course still has
      (scsi_prot_sg_count() != 0) and so zfcp sends down bogus requests to the
      FCP channel hardware.
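      
      The difference between the unguarded and the guarded check can be
      modelled as below. This is a toy model in Python, not the zfcp
      source: the Cmd class and both helper functions are hypothetical,
      and the "unguarded" variant only reflects the behaviour described
      above (deciding on the protection scatterlist count alone).

```python
from dataclasses import dataclass

SCSI_PROT_NORMAL = 0  # assumed value of SCSI_PROT_NORMAL in the kernel enum


@dataclass
class Cmd:
    prot_op: int        # models scsi_get_prot_op()
    prot_sg_count: int  # models scsi_prot_sg_count()


def uses_dix_unguarded(cmd):
    # Looks only at the protection scatterlist count.
    return cmd.prot_sg_count != 0


def uses_dix_guarded(cmd):
    # Touches protection data only if the protection operation asks for it.
    return cmd.prot_op != SCSI_PROT_NORMAL and cmd.prot_sg_count != 0


# A DIX command re-used by scsi_eh for a TUR: prot_op was temporarily reset
# to SCSI_PROT_NORMAL, but the protection scatterlist is still attached.
eh_tur = Cmd(prot_op=SCSI_PROT_NORMAL, prot_sg_count=1)
```

      In this model the unguarded check still treats the TUR as a DIX
      request, while the guarded check does not.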
      
      This causes scsi_eh_test_devices() to have (finish_cmds == 0)
      [not SCSI device is online or not scsi_eh_tur() failed]
      so regular SCSI commands, that caused / were affected by scsi_eh,
      are moved to work_q and scsi_eh_test_devices() itself returns false.
      In turn, it unnecessarily escalates in our case in scsi_eh_ready_devs()
      beyond host reset to finally scsi_eh_offline_sdevs()
      which sets affected SCSI devices offline with the following kernel message:
      
      "kernel: sd H:0:T:L: Device offlined - not ready after error recovery"
      Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
      Fixes: ef3eb71d ("[SCSI] zfcp: Introduce experimental support for DIF/DIX")
      Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cfc49967
    • Bart Van Assche's avatar
      skd: Submit requests to firmware before triggering the doorbell · 19978c50
      Bart Van Assche authored
      commit 5fbd545c upstream.
      
      Ensure that the members of struct skd_msg_buf have been transferred
      to the PCIe adapter before the doorbell is triggered. This patch
      prevents sporadic I/O failures and the following error
      message:
      
      (skd0:STM000196603:[0000:00:09.0]): Completion mismatch comp_id=0x0000 skreq=0x0400 new=0x0000
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      19978c50
    • Bart Van Assche's avatar
      skd: Avoid that module unloading triggers a use-after-free · 0bcaf517
      Bart Van Assche authored
      commit 7277cc67 upstream.
      
      Since put_disk() triggers a disk_release() call and since that
      last function calls blk_put_queue() if disk->queue != NULL, clear
      the disk->queue pointer before calling put_disk(). This prevents
      unloading the skd kernel module from triggering the following
      use-after-free:
      
      WARNING: CPU: 8 PID: 297 at lib/refcount.c:128 refcount_sub_and_test+0x70/0x80
      refcount_t: underflow; use-after-free.
      CPU: 8 PID: 297 Comm: kworker/8:1 Not tainted 4.11.10-300.fc26.x86_64 #1
      Workqueue: events work_for_cpu_fn
      Call Trace:
       dump_stack+0x63/0x84
       __warn+0xcb/0xf0
       warn_slowpath_fmt+0x5a/0x80
       refcount_sub_and_test+0x70/0x80
       refcount_dec_and_test+0x11/0x20
       kobject_put+0x1f/0x50
       blk_put_queue+0x15/0x20
       disk_release+0xae/0xf0
       device_release+0x32/0x90
       kobject_release+0x67/0x170
       kobject_put+0x2b/0x50
       put_disk+0x17/0x20
       skd_destruct+0x5c/0x890 [skd]
       skd_pci_probe+0x124d/0x13a0 [skd]
       local_pci_probe+0x42/0xa0
       work_for_cpu_fn+0x14/0x20
       process_one_work+0x19e/0x470
       worker_thread+0x1dc/0x4a0
       kthread+0x125/0x140
       ret_from_fork+0x25/0x30
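      
      The double put behind this warning can be modelled with a toy
      reference counter. The sketch below is Python, not the skd driver:
      all class and function names are hypothetical, and it assumes that
      the driver has already dropped its own queue reference during
      teardown before put_disk() is reached.

```python
class Queue:
    def __init__(self):
        self.refcount = 1
        self.underflow = False

    def put(self):
        if self.refcount == 0:
            # The kernel would warn "refcount_t: underflow; use-after-free."
            self.underflow = True
            return
        self.refcount -= 1


class Disk:
    def __init__(self, queue):
        self.queue = queue


def put_disk(disk):
    # Models disk_release(): drops a queue reference if one is still attached.
    if disk.queue is not None:
        disk.queue.put()


def destruct(disk, clear_queue_first):
    queue = disk.queue
    queue.put()  # assumption: the driver's own reference is dropped here
    if clear_queue_first:
        disk.queue = None  # the fix: detach the queue before put_disk()
    put_disk(disk)
    return queue.underflow
```

      With clear_queue_first=False the model reports an underflow; with
      clear_queue_first=True it does not.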
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0bcaf517
    • NeilBrown's avatar
      md/bitmap: disable bitmap_resize for file-backed bitmaps. · f05dafbd
      NeilBrown authored
      commit e8a27f83 upstream.
      
      bitmap_resize() does not work for file-backed bitmaps.
      The buffer_heads are allocated and initialized when
      the bitmap is read from the file, but resize doesn't
      read from the file, it loads from the internal bitmap.
      When it comes time to write the new bitmap, the bh is
      non-existent and we crash.
      
      The common case when growing an array involves making the array larger,
      and that normally means making the bitmap larger.  Doing
      that inside the kernel is possible, but would need more code.
      It is probably easier to require people who use file-backed
      bitmaps to remove them and re-add after a reshape.
      
      So this patch disables the resizing of arrays which have
      file-backed bitmaps.  This is better than crashing.
      Reported-by: Zhilong Liu <zlliu@suse.com>
      Fixes: d60b479d ("md/bitmap: add bitmap_resize function to allow bitmap resizing.")
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f05dafbd
    • Bart Van Assche's avatar
      block: Relax a check in blk_start_queue() · 30e81e7f
      Bart Van Assche authored
      commit 4ddd56b0 upstream.
      
      Calling blk_start_queue() from interrupt context with the queue
      lock held and without disabling IRQs, as the skd driver does, is
      safe. This patch prevents loading the skd driver from triggering the
      following warning:
      
      WARNING: CPU: 11 PID: 1348 at block/blk-core.c:283 blk_start_queue+0x84/0xa0
      RIP: 0010:blk_start_queue+0x84/0xa0
      Call Trace:
       skd_unquiesce_dev+0x12a/0x1d0 [skd]
       skd_complete_internal+0x1e7/0x5a0 [skd]
       skd_complete_other+0xc2/0xd0 [skd]
       skd_isr_completion_posted.isra.30+0x2a5/0x470 [skd]
       skd_isr+0x14f/0x180 [skd]
       irq_forced_thread_fn+0x2a/0x70
       irq_thread+0x144/0x1a0
       kthread+0x125/0x140
       ret_from_fork+0x2a/0x40
      
      Fixes: commit a038e253 ("[PATCH] blk_start_queue() must be called with irq disabled - add warning")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      30e81e7f
    • Michael Ellerman's avatar
      powerpc: Fix DAR reporting when alignment handler faults · a918d325
      Michael Ellerman authored
      commit f9effe92 upstream.
      
      Anton noticed that if we fault part way through emulating an unaligned
      instruction, we don't update the DAR to reflect that.
      
      The DAR value is eventually reported back to userspace as the address
      in the SEGV signal, and if userspace is using that value to demand
      fault then it can be confused by us not setting the value correctly.
      
      This patch is ugly as hell, but is intended to be the minimal fix and
      to backport easily.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a918d325
    • zhangyi (F)'s avatar
      ext4: fix quota inconsistency during orphan cleanup for read-only mounts · c53f0169
      zhangyi (F) authored
      commit 95f1fda4 upstream.
      
      Quota does not get enabled for read-only mounts if the filesystem
      has the quota feature, so quotas cannot be updated during orphan
      cleanup, which leads to quota inconsistency.
      
      This patch turns on quotas during orphan cleanup for this case, to
      make sure quotas can be updated correctly.
      Reported-by: Jan Kara <jack@suse.cz>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c53f0169
    • zhangyi (F)'s avatar
      ext4: fix incorrect quotaoff if the quota feature is enabled · cd46241e
      zhangyi (F) authored
      commit b0a5a958 upstream.
      
      Current ext4 quota should always be "usage enabled" if the
      quota feature is enabled. But ext4_orphan_cleanup()
      turns quotas off directly (used for the older journaled
      quota), so we cannot turn it on again via "quotaon" unless
      we umount and remount ext4.
      
      Simple reproducer:
      
        mkfs.ext4 -O project,quota /dev/vdb1
        mount -o prjquota /dev/vdb1 /mnt
        chattr -p 123 /mnt
        chattr +P /mnt
        touch /mnt/aa /mnt/bb
        exec 100<>/mnt/aa
        rm -f /mnt/aa
        sync
        echo c > /proc/sysrq-trigger
      
        #reboot and mount
        mount -o prjquota /dev/vdb1 /mnt
        #query status
        quotaon -Ppv /dev/vdb1
        #output
        quotaon: Cannot find mountpoint for device /dev/vdb1
        quotaon: No correct mountpoint specified.
      
      This patch adds a check for journaled quotas to avoid an incorrect
      quotaoff when ext4 has the quota feature.
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd46241e
    • Stephan Mueller's avatar
      crypto: AF_ALG - remove SGL terminator indicator when chaining · 5e9d28b0
      Stephan Mueller authored
      Fixed differently upstream as commit 2d97591e ("crypto: af_alg - consolidation of duplicate code")
      
      The SGL is MAX_SGL_ENTS + 1 in size. The last SG entry is used for the
      chaining and is properly updated with the sg_chain invocation. During
      the filling-in of the initial SG entries, sg_mark_end is called for each
      SG entry. This is appropriate as long as no additional SGL is chained
      with the current SGL. However, when a new SGL is chained and the last
      SG entry is updated with sg_chain, the last but one entry still contains
      the end marker from the sg_mark_end. This end marker must be removed as
      otherwise a walk of the chained SGLs will cause a NULL pointer
      dereference at the last but one SG entry, because sg_next will return
      NULL.
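      
      The effect of a stale end marker on a chained walk can be modelled as
      follows. This is a toy model in Python, not kernel code: Entry,
      make_sgl(), walk() and build() are hypothetical, and sg_next() here
      only mimics the behaviour described above (stop at an end marker,
      otherwise advance and follow a chain entry).

```python
class Entry:
    def __init__(self):
        self.end = False   # set by the model's equivalent of sg_mark_end()
        self.chain = None  # set by the model's equivalent of sg_chain()


def make_sgl(data_entries):
    # Data entries plus one spare entry reserved for chaining.
    return [Entry() for _ in range(data_entries + 1)]


def sg_next(sgl, i):
    """Return (list, index) of the next entry, or None at an end marker."""
    if sgl[i].end:
        return None
    i += 1
    if sgl[i].chain is not None:
        return sgl[i].chain, 0
    return sgl, i


def walk(sgl):
    count, pos = 0, (sgl, 0)
    while pos is not None:
        count += 1
        pos = sg_next(*pos)
    return count


def build(clear_end_marker):
    first, second = make_sgl(2), make_sgl(2)
    first[1].end = True       # end marker left on the last data entry
    second[1].end = True
    first[2].chain = second   # chain the second list via the spare entry
    if clear_end_marker:
        first[1].end = False  # the fix: drop the stale end marker
    return walk(first)
```

      With the stale marker in place the walk stops after the first list
      (2 entries); once the marker is cleared it reaches all 4 data
      entries. In the kernel the premature NULL is what gets dereferenced.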
      
      This patch only applies to kernels up to and including 4.13.
      Commit 2d97591e, added in 4.14-rc1,
      introduced a completely new code base which addresses this bug in
      a different way. Yet, that patch is too invasive for stable kernels
      and was therefore not marked for stable.
      
      Fixes: 8ff59090 ("crypto: algif_skcipher - User-space interface for skcipher operations")
      Signed-off-by: Stephan Mueller <smueller@chronox.de>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e9d28b0
    • Aleksandar Markovic's avatar
      MIPS: math-emu: MINA.<D|S>: Fix some cases of infinity and zero inputs · 9354f4d0
      Aleksandar Markovic authored
      commit 304bfe47 upstream.
      
      Fix the following special cases for MINA.<D|S>:
      
        - if one of the inputs is zero, and the other is subnormal, normal,
          or infinity, the value of the former should be returned (that is,
          a zero).
        - if one of the inputs is infinity, and the other input is normal,
          or subnormal, the value of the latter should be returned.
      
      The previous implementation's logic for such cases was incorrect: it
      behaved as if it implemented the MAXA instruction rather than MINA.
      
      A relevant example:
      
      MINA.S fd,fs,ft:
        If fs contains 100.0, and ft contains 0.0, fd is going to contain
        0.0 (without this patch, it used to contain 100.0).
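      
      The intended behaviour for these cases can be stated as a small
      reference model. The sketch below is Python, not the math-emu code:
      mina() is a hypothetical helper that only covers non-NaN inputs of
      different magnitude, which is all the cases above need.

```python
import math


def mina(fs, ft):
    """Return the input with the smaller absolute value.

    Model only: NaN inputs and inputs of equal magnitude are not handled.
    """
    return fs if abs(fs) < abs(ft) else ft


zero_vs_normal = mina(100.0, 0.0)
zero_vs_inf = mina(0.0, -math.inf)
inf_vs_normal = mina(math.inf, 3.0)
```

      The zero wins against a normal value or an infinity, and the finite
      value wins against an infinity.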
      
      Fixes: a79f5f9b ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
      Fixes: 4e9561b2 ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")
      Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
      Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
      Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
      Reviewed-by: James Hogan <james.hogan@imgtec.com>
      Cc: Bo Hu <bohu@google.com>
      Cc: Douglas Leung <douglas.leung@imgtec.com>
      Cc: Jin Qian <jinqian@google.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
      Cc: Raghu Gandham <raghu.gandham@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/16885/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9354f4d0
    • Aleksandar Markovic's avatar
      MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of both infinite inputs · f4d77fc7
      Aleksandar Markovic authored
      commit 3444c4eb upstream.
      
      Fix the value returned by <MAXA|MINA>.<D|S> fd,fs,ft, if both inputs
      are infinite. The previous implementation always returned the value
      contained in ft in such cases. The correct behavior is specified
      in the MIPS instruction set manual and is as follows:
      
          fs    ft        MAXA     MINA
        ---------------------------------
          inf   inf        inf      inf
          inf  -inf        inf     -inf
         -inf   inf        inf     -inf
         -inf  -inf       -inf     -inf
      
      A relevant example:
      
      MAXA.S fd,fs,ft:
        If fs contains +inf, and ft contains -inf, fd is going to contain
        +inf (without this patch, it used to contain -inf).
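      
      The table above can be checked against a small reference model. The
      sketch below is Python, not the math-emu code: maxa() and mina() are
      hypothetical helpers for non-NaN inputs, and the tie-break for equal
      magnitudes (MAXA prefers the positive input, MINA the negative one)
      is taken from the table itself.

```python
import math

INF = math.inf


def maxa(fs, ft):
    if abs(fs) != abs(ft):
        return fs if abs(fs) > abs(ft) else ft
    return max(fs, ft)  # equal magnitude: prefer the positive input


def mina(fs, ft):
    if abs(fs) != abs(ft):
        return fs if abs(fs) < abs(ft) else ft
    return min(fs, ft)  # equal magnitude: prefer the negative input


# (fs, ft, expected MAXA, expected MINA), copied from the table above.
TABLE = [
    (INF, INF, INF, INF),
    (INF, -INF, INF, -INF),
    (-INF, INF, INF, -INF),
    (-INF, -INF, -INF, -INF),
]

results = [(maxa(fs, ft), mina(fs, ft)) for fs, ft, _, _ in TABLE]
expected = [(want_max, want_min) for _, _, want_max, want_min in TABLE]
```

      In this model results equals expected for all four rows.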
      
      Fixes: a79f5f9b ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
      Fixes: 4e9561b2 ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")
      Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
      Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
      Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
      Reviewed-by: James Hogan <james.hogan@imgtec.com>
      Cc: Bo Hu <bohu@google.com>
      Cc: Douglas Leung <douglas.leung@imgtec.com>
      Cc: Jin Qian <jinqian@google.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
      Cc: Raghu Gandham <raghu.gandham@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/16884/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4d77fc7
    • Aleksandar Markovic's avatar
      MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of input values with opposite signs · 322bf697
      Aleksandar Markovic authored
      commit 1a41b3b4 upstream.
      
      Fix the value returned by <MAXA|MINA>.<D|S>, if the inputs are normal
      fp numbers of the same absolute value, but opposite signs.
      
      A relevant example:
      
      MAXA.S fd,fs,ft:
        If fs contains -3.0, and ft contains +3.0, fd is going to contain
        +3.0 (without this patch, it used to contain -3.0).
      
      Fixes: a79f5f9b ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
      Fixes: 4e9561b2 ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")
      Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
      Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
      Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
      Reviewed-by: James Hogan <james.hogan@imgtec.com>
      Cc: Bo Hu <bohu@google.com>
      Cc: Douglas Leung <douglas.leung@imgtec.com>
      Cc: Jin Qian <jinqian@google.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
      Cc: Raghu Gandham <raghu.gandham@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/16883/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      322bf697
    • Aleksandar Markovic's avatar
      MIPS: math-emu: <MAX|MIN>.<D|S>: Fix cases of both inputs negative · a83ffb58
      Aleksandar Markovic authored
      commit aabf5cf0 upstream.
      
      Fix the value returned by <MAX|MIN>.<D|S>, if both inputs are negative
      normal fp numbers. The previous logic did not take into account that
      if both inputs have the same sign, there should be separate treatment
      of the cases when both inputs are negative and when both inputs are
      positive.
      
      A relevant example:
      
      MAX.S fd,fs,ft:
        If fs contains -5.0, and ft contains -7.0, fd is going to contain
        -5.0 (without this patch, it used to contain -7.0).
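      
      Why same-sign inputs need two separate cases can be seen from a
      sign-and-magnitude comparison. The sketch below is Python, not the
      math-emu code: sign_and_magnitude() and max_s() are hypothetical
      helpers for normal single-precision inputs, where the magnitude bits
      order the same way as the absolute values.

```python
import struct


def sign_and_magnitude(x):
    """Split a value, rounded to single precision, into sign and magnitude bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, bits & 0x7FFFFFFF


def max_s(fs, ft):
    s_sign, s_mag = sign_and_magnitude(fs)
    t_sign, t_mag = sign_and_magnitude(ft)
    if s_sign != t_sign:
        return ft if s_sign else fs          # the positive input wins
    if s_sign == 0:
        return fs if s_mag >= t_mag else ft  # both positive: larger magnitude
    return fs if s_mag <= t_mag else ft      # both negative: smaller magnitude


both_negative = max_s(-5.0, -7.0)
both_positive = max_s(5.0, 7.0)
```

      Using the "larger magnitude" rule for both branches would pick -7.0
      for the negative pair, which matches the misbehaviour described
      above.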
      
      Fixes: a79f5f9b ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
      Fixes: 4e9561b2 ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")
      Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
      Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
      Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
      Reviewed-by: James Hogan <james.hogan@imgtec.com>
      Cc: Bo Hu <bohu@google.com>
      Cc: Douglas Leung <douglas.leung@imgtec.com>
      Cc: Jin Qian <jinqian@google.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
      Cc: Raghu Gandham <raghu.gandham@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/16882/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a83ffb58
    • Aleksandar Markovic's avatar
      MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix cases of both inputs zero · 6acd1d26
      Aleksandar Markovic authored
      commit 15560a58 upstream.
      
      Fix the value returned by <MAX|MAXA|MIN|MINA>.<D|S>, if both inputs
      are zeros. The right behavior in such cases is stated in the instruction
      reference manual and is as follows:
      
         fs  ft       MAX     MIN       MAXA    MINA
        ---------------------------------------------
          0   0        0       0         0       0
          0  -0        0      -0         0      -0
         -0   0        0      -0         0      -0
         -0  -0       -0      -0        -0      -0
      
      Prior to this patch, some of the above cases were yielding correct
      results. However, for the sake of code consistency, all such cases
      are rewritten in this patch.
      
      A relevant example:
      
      MAX.S fd,fs,ft:
        If fs contains +0.0, and ft contains -0.0, fd is going to contain
        +0.0 (without this patch, it used to contain -0.0).
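      
      The zero table reduces to two rules on the sign bits, sketched below.
      This is Python, not the math-emu code: the helper names are
      hypothetical, and the rules (MAX/MAXA give -0 only if both inputs
      are -0, MIN/MINA give -0 if either input is -0) are read off the
      table above.

```python
import math


def is_negative(x):
    """True for -0.0 and other negative values (0.0 == -0.0 hides the sign)."""
    return math.copysign(1.0, x) < 0


def max_of_zeros(fs, ft):
    # MAX and MAXA columns: result is -0 only if both inputs are -0.
    return -0.0 if is_negative(fs) and is_negative(ft) else 0.0


def min_of_zeros(fs, ft):
    # MIN and MINA columns: result is -0 if either input is -0.
    return -0.0 if is_negative(fs) or is_negative(ft) else 0.0


def sign(x):
    return math.copysign(1.0, x)
```

      Because 0.0 == -0.0 compares equal, the results have to be inspected
      through their sign, for example sign(max_of_zeros(0.0, -0.0)).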
      
      Fixes: a79f5f9b ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
      Fixes: 4e9561b2 ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")
      Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
      Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
      Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
      Reviewed-by: James Hogan <james.hogan@imgtec.com>
      Cc: Bo Hu <bohu@google.com>
      Cc: Douglas Leung <douglas.leung@imgtec.com>
      Cc: Jin Qian <jinqian@google.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
      Cc: Raghu Gandham <raghu.gandham@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/16881/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6acd1d26
    • Aleksandar Markovic's avatar
      MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix quiet NaN propagation · b6c818d8
      Aleksandar Markovic authored
      commit e78bf0dc upstream.
      
      Fix the value returned by <MAX|MAXA|MIN|MINA>.<D|S> fd,fs,ft, if both
      inputs are quiet NaNs. The <MAX|MAXA|MIN|MINA>.<D|S> specifications
      state that the returned value in such cases should be the quiet NaN
      contained in register fs.
      
      A relevant example:
      
      MAX.S fd,fs,ft:
        If fs contains qNaN1, and ft contains qNaN2, fd is going to contain
        qNaN1 (without this patch, it used to contain qNaN2).
      
      Fixes: a79f5f9b ("MIPS: math-emu: Add support for the MIPS R6 MAX{, A} FPU instruction")
      Fixes: 4e9561b2 ("MIPS: math-emu: Add support for the MIPS R6 MIN{, A} FPU instruction")
      Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com>
      Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com>
      Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com>
      Reviewed-by: James Hogan <james.hogan@imgtec.com>
      Cc: Bo Hu <bohu@google.com>
      Cc: Douglas Leung <douglas.leung@imgtec.com>
      Cc: Jin Qian <jinqian@google.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Petar Jovanovic <petar.jovanovic@imgtec.com>
      Cc: Raghu Gandham <raghu.gandham@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/16880/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b6c818d8
    • Kai-Heng Feng's avatar
      Input: i8042 - add Gigabyte P57 to the keyboard reset table · bf592dde
      Kai-Heng Feng authored
      commit 697c5d8a upstream.
      
      Similar to other Gigabyte laptops, the touchpad on the P57 requires a
      keyboard reset to detect the Elantech touchpad correctly.
      
      BugLink: https://bugs.launchpad.net/bugs/1594214
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf592dde
    • Arnd Bergmann's avatar
      tty: fix __tty_insert_flip_char regression · c13c5c7e
      Arnd Bergmann authored
      commit 8a5a90a2 upstream.
      
      Sergey noticed a small but fatal mistake in __tty_insert_flip_char,
      leading to an oops in an interrupt handler when using any serial
      port.
      
      The problem is that I accidentally took the tty_buffer pointer
      before calling __tty_buffer_request_room(), which replaces the
      buffer. This moves the pointer lookup to the right place after
      allocating the new buffer space.
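
The ordering problem described above can be shown with a small userspace sketch. It is not the tty code: `struct buf`, `struct port`, `port_new`, `request_room` and `insert_char` are hypothetical stand-ins, and the sketch simply frees the old buffer when it replaces it, which is a simplification of whatever bookkeeping the real __tty_buffer_request_room() does.

```c
#include <stdlib.h>

struct buf {
    size_t size;
    size_t used;
    unsigned char *data;
};

struct port {
    struct buf *tail;           /* buffer currently being filled */
};

static struct buf *buf_new(size_t size)
{
    struct buf *b = malloc(sizeof *b);

    if (!b)
        return NULL;
    b->data = malloc(size);
    if (!b->data) {
        free(b);
        return NULL;
    }
    b->size = size;
    b->used = 0;
    return b;
}

struct port *port_new(size_t size)
{
    struct port *p = malloc(sizeof *p);

    if (!p)
        return NULL;
    p->tail = buf_new(size);
    if (!p->tail) {
        free(p);
        return NULL;
    }
    return p;
}

/* Stand-in for the room request: a full tail buffer is replaced. */
static size_t request_room(struct port *p, size_t want)
{
    struct buf *old = p->tail;
    struct buf *fresh;

    if (old->size - old->used >= want)
        return want;
    fresh = buf_new(want > 16 ? want : 16);
    if (!fresh)
        return 0;
    p->tail = fresh;            /* the tail pointer changes here */
    free(old->data);
    free(old);
    return want;
}

int insert_char(struct port *p, unsigned char ch)
{
    struct buf *tb;

    /* Reading p->tail before request_room() would leave tb stale. */
    if (!request_room(p, 1))
        return 0;
    tb = p->tail;               /* look up the buffer after the request */
    tb->data[tb->used++] = ch;
    return 1;
}
```

Starting from a one-byte buffer, the second insert forces a replacement, and the character lands in the new tail because the pointer is read only after room has been requested.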
      
      Fixes: 979990c6 ("tty: improve tty_insert_flip_char() fast path")
      Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c13c5c7e
    • Arnd Bergmann's avatar
      tty: improve tty_insert_flip_char() slow path · 077933dc
      Arnd Bergmann authored
      commit 065ea0a7 upstream.
      
      While working on improving the fast path of tty_insert_flip_char(),
      I noticed that by calling tty_buffer_request_room(), we needlessly
      move to the separate flag buffer mode for the tty, even when all
      characters use TTY_NORMAL as the flag.
      
      This changes the code to call __tty_buffer_request_room() with the
      correct flag, which will then allocate a regular buffer when it runs
      out of space but no special flags have been used. I'm guessing that
      this is the behavior that Peter Hurley intended when he introduced
      the compacted flip buffers.
      
      Fixes: acc0f67f ("tty: Halve flip buffer GFP_ATOMIC memory consumption")
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      077933dc
    • Arnd Bergmann's avatar
      tty: improve tty_insert_flip_char() fast path · e1e6620f
      Arnd Bergmann authored
      commit 979990c6 upstream.
      
      kernelci.org reports a crazy stack usage for the VT code when CONFIG_KASAN
      is enabled:
      
      drivers/tty/vt/keyboard.c: In function 'kbd_keycode':
      drivers/tty/vt/keyboard.c:1452:1: error: the frame size of 2240 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
      
      The problem is that tty_insert_flip_char() gets inlined many times into
      kbd_keycode(), and also into other functions, and each copy requires 128
      bytes for stack redzone to check for a possible out-of-bounds access on
      the 'ch' and 'flags' arguments that are passed into
      tty_insert_flip_string_flags as a variable-length string.
      
      This introduces a new __tty_insert_flip_char() function for the slow
      path, which receives the two arguments by value. This completely avoids
      the problem and the stack usage goes back down to around 100 bytes.
      
      Without KASAN, this is also slightly better, as we don't have to
      spill the arguments to the stack but can simply pass 'ch' and 'flag'
      in registers, saving a few bytes in .text for each call site.
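
The fast-path/slow-path split described above can be sketched as follows. This is a simplified illustration, not the tty implementation: `struct flipbuf`, `FLAG_NORMAL`, `flip_insert` and `flip_insert_slow` are hypothetical names, and the buffer is a fixed-size array rather than an allocated flip buffer. The point is only that the out-of-line helper takes 'ch' and 'flag' by value, so inlined callers do not have to place them in memory.

```c
#include <stddef.h>

#define FLAG_NORMAL 0           /* hypothetical stand-in for TTY_NORMAL */

struct flipbuf {
    unsigned char chars[4];
    unsigned char flags[4];
    size_t used;
    int flags_in_use;           /* 0 while every stored flag is FLAG_NORMAL */
};

/* Slow path: kept out of line, arguments passed by value. */
int flip_insert_slow(struct flipbuf *b, unsigned char ch, unsigned char flag)
{
    size_t i;

    if (b->used >= sizeof b->chars)
        return 0;               /* no room; a real driver would allocate */
    if (flag != FLAG_NORMAL && !b->flags_in_use) {
        /* First special flag: start tracking flags for this buffer. */
        for (i = 0; i < b->used; i++)
            b->flags[i] = FLAG_NORMAL;
        b->flags_in_use = 1;
    }
    if (b->flags_in_use)
        b->flags[b->used] = flag;
    b->chars[b->used++] = ch;
    return 1;
}

/* Fast path: small enough to inline at every call site. */
static inline int flip_insert(struct flipbuf *b, unsigned char ch,
                              unsigned char flag)
{
    int change = !b->flags_in_use && flag != FLAG_NORMAL;

    if (!change && b->used < sizeof b->chars) {
        if (b->flags_in_use)
            b->flags[b->used] = flag;
        b->chars[b->used++] = ch;
        return 1;
    }
    return flip_insert_slow(b, ch, flag);
}
```

In this sketch the inline wrapper handles the common case of a normal character with room available, and everything else (a mode change or a full buffer) is delegated to the by-value helper.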
      
      This should be backported to linux-4.0 or later, which first introduced
      the stack sanitizer in the kernel.
      
      Fixes: c420f167 ("kasan: enable stack instrumentation")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e1e6620f
    • Minchan Kim's avatar
      mm: prevent double decrease of nr_reserved_highatomic · c576160f
      Minchan Kim authored
      commit 4855e4a7 upstream.
      
      There is a race between page freeing and unreserving a highatomic
      pageblock.
      
       CPU 0				    CPU 1
      
          free_hot_cold_page
            mt = get_pfnblock_migratetype
            set_pcppage_migratetype(page, mt)
          				    unreserve_highatomic_pageblock
          				    spin_lock_irqsave(&zone->lock)
          				    move_freepages_block
          				    set_pageblock_migratetype(page)
          				    spin_unlock_irqrestore(&zone->lock)
            free_pcppages_bulk
              __free_one_page(mt) <- mt is stale
      
      Because of the above race, a page on CPU 0 can end up on a
      non-highorderatomic free list, since the pageblock's type has changed.
      As a result, the highorderatomic unreserve logic can decrease the
      reserved count for the same pageblock several times, which creates a
      mismatch between nr_reserved_highatomic and the number of reserved
      pageblocks.
      
      So, this patch verifies whether the pageblock is highatomic and
      decreases the count only if it is.
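
The check described above can be sketched in isolation. This is a userspace model, not the page allocator: `struct zone_sketch`, `struct pageblock_sketch`, `unreserve_sketch` and the `PAGEBLOCK_NR_PAGES` value are assumptions made for illustration, and locking is not modeled.

```c
#include <stddef.h>

enum migratetype_sketch { MT_MOVABLE, MT_HIGHATOMIC };

#define PAGEBLOCK_NR_PAGES 512UL    /* assumed pageblock size in pages */

struct zone_sketch {
    unsigned long nr_reserved_highatomic;
};

struct pageblock_sketch {
    enum migratetype_sketch mt;
};

/*
 * Unreserve one pageblock. The counter is decreased only if the block is
 * still marked highatomic, so a second call on the same block is a no-op.
 * Returns 1 if the block was unreserved, 0 otherwise.
 */
int unreserve_sketch(struct zone_sketch *z, struct pageblock_sketch *pb)
{
    unsigned long dec;

    if (pb->mt != MT_HIGHATOMIC)
        return 0;               /* already converted: do not count twice */

    /* Clamp so the counter can never underflow. */
    dec = z->nr_reserved_highatomic < PAGEBLOCK_NR_PAGES ?
          z->nr_reserved_highatomic : PAGEBLOCK_NR_PAGES;
    z->nr_reserved_highatomic -= dec;
    pb->mt = MT_MOVABLE;
    return 1;
}
```

Calling the function twice on the same block changes the counter only once, which is the invariant the patch is after.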
      
      Link: http://lkml.kernel.org/r/1476259429-18279-3-git-send-email-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Sangseok Lee <sangseok.lee@lge.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miles Chen <miles.chen@mediatek.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c576160f
    • Chuck Lever's avatar
      nfsd: Fix general protection fault in release_lock_stateid() · 6ea627b2
      Chuck Lever authored
      commit f46c445b upstream.
      
      When I push NFSv4.1 / RDMA hard (xfstests generic/089, for example),
      I get this crash on the server:
      
      Oct 28 22:04:30 klimt kernel: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      Oct 28 22:04:30 klimt kernel: Modules linked in: cts rpcsec_gss_krb5 iTCO_wdt iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btrfs irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd xor pcspkr raid6_pq i2c_i801 i2c_smbus lpc_ich mfd_core sg mei_me mei ioatdma shpchp wmi ipmi_si ipmi_msghandler rpcrdma ib_ipoib rdma_ucm acpi_power_meter acpi_pad ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb ahci libahci ptp mlx4_core pps_core dca libata i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
      Oct 28 22:04:30 klimt kernel: CPU: 7 PID: 1558 Comm: nfsd Not tainted 4.9.0-rc2-00005-g82cd754 #8
      Oct 28 22:04:30 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
      Oct 28 22:04:30 klimt kernel: task: ffff880835c3a100 task.stack: ffff8808420d8000
      Oct 28 22:04:30 klimt kernel: RIP: 0010:[<ffffffffa05a759f>]  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
      Oct 28 22:04:30 klimt kernel: RSP: 0018:ffff8808420dbce0  EFLAGS: 00010246
      Oct 28 22:04:30 klimt kernel: RAX: ffff88084e6660f0 RBX: ffff88084e667020 RCX: 0000000000000000
      Oct 28 22:04:30 klimt kernel: RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88084e667020
      Oct 28 22:04:30 klimt kernel: RBP: ffff8808420dbcf8 R08: 0000000000000001 R09: 0000000000000000
      Oct 28 22:04:30 klimt kernel: R10: ffff880835c3a100 R11: ffff880835c3aca8 R12: 6b6b6b6b6b6b6b6b
      Oct 28 22:04:30 klimt kernel: R13: ffff88084e6670d8 R14: ffff880835f546f0 R15: ffff880835f1c548
      Oct 28 22:04:30 klimt kernel: FS:  0000000000000000(0000) GS:ffff88087bdc0000(0000) knlGS:0000000000000000
      Oct 28 22:04:30 klimt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Oct 28 22:04:30 klimt kernel: CR2: 00007ff020389000 CR3: 0000000001c06000 CR4: 00000000001406e0
      Oct 28 22:04:30 klimt kernel: Stack:
      Oct 28 22:04:30 klimt kernel: ffff88084e667020 0000000000000000 ffff88084e6670d8 ffff8808420dbd20
      Oct 28 22:04:30 klimt kernel: ffffffffa05ac80d ffff880835f54548 ffff88084e640008 ffff880835f545b0
      Oct 28 22:04:30 klimt kernel: ffff8808420dbd70 ffffffffa059803d ffff880835f1c768 0000000000000870
      Oct 28 22:04:30 klimt kernel: Call Trace:
      Oct 28 22:04:30 klimt kernel: [<ffffffffa05ac80d>] nfsd4_free_stateid+0xfd/0x1b0 [nfsd]
      Oct 28 22:04:30 klimt kernel: [<ffffffffa059803d>] nfsd4_proc_compound+0x40d/0x690 [nfsd]
      Oct 28 22:04:30 klimt kernel: [<ffffffffa0583114>] nfsd_dispatch+0xd4/0x1d0 [nfsd]
      Oct 28 22:04:30 klimt kernel: [<ffffffffa047bbf9>] svc_process_common+0x3d9/0x700 [sunrpc]
      Oct 28 22:04:30 klimt kernel: [<ffffffffa047ca64>] svc_process+0xf4/0x330 [sunrpc]
      Oct 28 22:04:30 klimt kernel: [<ffffffffa05827ca>] nfsd+0xfa/0x160 [nfsd]
      Oct 28 22:04:30 klimt kernel: [<ffffffffa05826d0>] ? nfsd_destroy+0x170/0x170 [nfsd]
      Oct 28 22:04:30 klimt kernel: [<ffffffff810b367b>] kthread+0x10b/0x120
      Oct 28 22:04:30 klimt kernel: [<ffffffff810b3570>] ? kthread_stop+0x280/0x280
      Oct 28 22:04:30 klimt kernel: [<ffffffff8174e8ba>] ret_from_fork+0x2a/0x40
      Oct 28 22:04:30 klimt kernel: Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 b0 00 00 00 48 89 fb 4c 8b a0 98 00 00 00 <49> 8b 44 24 20 48 8d b8 80 03 00 00 e8 10 66 1a e1 48 89 df e8
      Oct 28 22:04:30 klimt kernel: RIP  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
      Oct 28 22:04:30 klimt kernel: RSP <ffff8808420dbce0>
      Oct 28 22:04:30 klimt kernel: ---[ end trace cf5d0b371973e167 ]---
      
      Jeff Layton says:
      > Hm...now that I look though, this is a little suspicious:
      >
      >    struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
      >
      > I wonder if it's possible for the openstateid to have already been
      > destroyed at this point.
      >
      > We might be better off doing something like this to get the client pointer:
      >
      >    stp->st_stid.sc_client;
      >
      > ...which should be more direct and less dependent on other stateids
      > staying valid.
      
      With the suggested change, I am no longer able to reproduce the above oops.
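
The difference between the two lookups Jeff describes can be sketched with stripped-down structures. Only `st_openstp`, `st_stateowner`, `st_stid` and `sc_client` come from the quoted text; the struct layouts, the `so_client` field and both function names are assumptions for illustration and do not mirror the real nfsd definitions.

```c
#include <stddef.h>

struct client { int id; };

struct stateowner { struct client *so_client; };

struct stid { struct client *sc_client; };

struct ol_stateid {
    struct stid st_stid;
    struct stateowner *st_stateowner;
    struct ol_stateid *st_openstp;  /* may already be gone for a lock stateid */
};

/* Indirect lookup: depends on the open stateid and its owner staying valid. */
struct client *client_via_openstp(const struct ol_stateid *stp)
{
    return stp->st_openstp->st_stateowner->so_client;
}

/* Direct lookup: uses only state embedded in the stateid itself. */
struct client *client_direct(const struct ol_stateid *stp)
{
    return stp->st_stid.sc_client;
}
```

The direct form keeps working even when `st_openstp` no longer points at a live object, which is the property the suggested change relies on.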
      
      v2: Fix unhash_lock_stateid() as well
      Fix-suggested-by: Jeff Layton <jlayton@redhat.com>
      Fixes: 42691398 ('nfsd: Fix race between FREE_STATEID and LOCK')
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Cc: Christian Theune <ct@flyingcircus.io>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ea627b2