1. 26 Jan, 2017 40 commits
    • Geoff Levand's avatar
      powerpc/ps3: Fix system hang with GCC 5 builds · 818e3035
      Geoff Levand authored
      commit 6dff5b67 upstream.
      
      GCC 5 generates different code for this bootwrapper null check that
      causes the PS3 to hang very early in its bootup. This check is of
      limited value, so just get rid of it.
      Signed-off-by: default avatarGeoff Levand <geoff@infradead.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      818e3035
    • Al Viro's avatar
      nfs_write_end(): fix handling of short copies · 274125f9
      Al Viro authored
      commit c0cf3ef5 upstream.
      
      What matters when deciding if we should make a page uptodate is
      not how much we _wanted_ to copy, but how much we actually have
      copied.  As it is, on architectures that do not zero tail on
      short copy we can leave uninitialized data in page marked uptodate.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      274125f9
    • Ilya Dryomov's avatar
      libceph: verify authorize reply on connect · b8e9e113
      Ilya Dryomov authored
      commit 5c056fdc upstream.
      
      After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b),
      the client gets back a ceph_x_authorize_reply, which it is supposed to
      verify to ensure the authenticity and protect against replay attacks.
      The code for doing this is there (ceph_x_verify_authorizer_reply(),
      ceph_auth_verify_authorizer_reply() + plumbing), but it is never
      invoked by the the messenger.
      
      AFAICT this goes back to 2009, when ceph authentication protocols
      support was added to the kernel client in 4e7a5dcd ("ceph:
      negotiate authentication protocol; implement AUTH_NONE protocol").
      
      The second param of ceph_connection_operations::verify_authorizer_reply
      is unused all the way down.  Pass 0 to facilitate backporting, and kill
      it in the next commit.
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b8e9e113
    • Alan Stern's avatar
      PCI: Check for PME in targeted sleep state · d211a4a2
      Alan Stern authored
      commit 6496ebd7 upstream.
      
      One some systems, the firmware does not allow certain PCI devices to be put
      in deep D-states.  This can cause problems for wakeup signalling, if the
      device does not support PME# in the deepest allowed suspend state.  For
      example, Pierre reports that on his system, ACPI does not permit his xHCI
      host controller to go into D3 during runtime suspend -- but D3 is the only
      state in which the controller can generate PME# signals.  As a result, the
      controller goes into runtime suspend but never wakes up, so it doesn't work
      properly.  USB devices plugged into the controller are never detected.
      
      If the device relies on PME# for wakeup signals but is not capable of
      generating PME# in the target state, the PCI core should accurately report
      that it cannot do wakeup from runtime suspend.  This patch modifies the
      pci_dev_run_wake() routine to add this check.
      Reported-by: default avatarPierre de Villemereuil <flyos@mailoo.org>
      Tested-by: default avatarPierre de Villemereuil <flyos@mailoo.org>
      Signed-off-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      CC: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d211a4a2
    • Bart Van Assche's avatar
      IB/multicast: Check ib_find_pkey() return value · aacf6f08
      Bart Van Assche authored
      commit d3a2418e upstream.
      
      This patch avoids that Coverity complains about not checking the
      ib_find_pkey() return value.
      
      Fixes: commit 547af765 ("IB/multicast: Report errors on multicast groups if P_key changes")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      aacf6f08
    • Bart Van Assche's avatar
      IB/mad: Fix an array index check · 9d765155
      Bart Van Assche authored
      commit 2fe2f378 upstream.
      
      The array ib_mad_mgmt_class_table.method_table has MAX_MGMT_CLASS
      (80) elements. Hence compare the array index with that value instead
      of with IB_MGMT_MAX_METHODS (128). This patch avoids that Coverity
      reports the following:
      
      Overrunning array class->method_table of 80 8-byte elements at element index 127 (byte offset 1016) using index convert_mgmt_class(mad_hdr->mgmt_class) (which evaluates to 127).
      
      Fixes: commit b7ab0b19 ("IB/mad: Verify mgmt class in received MADs")
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Reviewed-by: default avatarHal Rosenstock <hal@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9d765155
    • Steven Rostedt (Red Hat)'s avatar
      ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it · 7a2a5f7c
      Steven Rostedt (Red Hat) authored
      commit 847fa1a6 upstream.
      
      With new binutils, gcc may get smart with its optimization and change a jmp
      from a 5 byte jump to a 2 byte one even though it was jumping to a global
      function. But that global function existed within a 2 byte radius, and gcc
      was able to optimize it. Unfortunately, that jump was also being modified
      when function graph tracing begins. Since ftrace expected that jump to be 5
      bytes, but it was only two, it overwrote code after the jump, causing a
      crash.
      
      This was fixed for x86_64 with commit 8329e818, with the same subject as
      this commit, but nothing was done for x86_32.
      
      Fixes: d61f82d0 ("ftrace: use dynamic patching for updating mcount calls")
      Reported-by: default avatarColin Ian King <colin.king@canonical.com>
      Tested-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      7a2a5f7c
    • Jim Mattson's avatar
      kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF) · aaa9f982
      Jim Mattson authored
      commit ef85b673 upstream.
      
      When L2 exits to L0 due to "exception or NMI", software exceptions
      (#BP and #OF) for which L1 has requested an intercept should be
      handled by L1 rather than L0. Previously, only hardware exceptions
      were forwarded to L1.
      Signed-off-by: default avatarJim Mattson <jmattson@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      aaa9f982
    • Konstantin Khlebnikov's avatar
      md/raid5: limit request size according to implementation limits · 1bf684a8
      Konstantin Khlebnikov authored
      commit e8d7c332 upstream.
      
      Current implementation employ 16bit counter of active stripes in lower
      bits of bio->bi_phys_segments. If request is big enough to overflow
      this counter bio will be completed and freed too early.
      
      Fortunately this not happens in default configuration because several
      other limits prevent that: stripe_cache_size * nr_disks effectively
      limits count of active stripes. And small max_sectors_kb at lower
      disks prevent that during normal read/write operations.
      
      Overflow easily happens in discard if it's enabled by module parameter
      "devices_handle_discard_safely" and stripe_cache_size is set big enough.
      
      This patch limits requests size with 256Mb - 8Kb to prevent overflows.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Neil Brown <neilb@suse.com>
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1bf684a8
    • Gerald Schaefer's avatar
      s390/vmlogrdr: fix IUCV buffer allocation · 22f604f0
      Gerald Schaefer authored
      commit 5457e03d upstream.
      
      The buffer for iucv_message_receive() needs to be below 2 GB. In
      __iucv_message_receive(), the buffer address is casted to an u32, which
      would result in either memory corruption or an addressing exception when
      using addresses >= 2 GB.
      
      Fix this by using GFP_DMA for the buffer allocation.
      Signed-off-by: default avatarGerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      22f604f0
    • Wei Fang's avatar
      scsi: avoid a permanent stop of the scsi device's request queue · 75343701
      Wei Fang authored
      commit d2a14525 upstream.
      
      A race between scanning and fc_remote_port_delete() may result in a
      permanent stop if the device gets blocked before scsi_sysfs_add_sdev()
      and unblocked after.  The reason is that blocking a device sets both the
      SDEV_BLOCKED state and the QUEUE_FLAG_STOPPED.  However,
      scsi_sysfs_add_sdev() unconditionally sets SDEV_RUNNING which causes the
      device to be ignored by scsi_target_unblock() and thus never have its
      QUEUE_FLAG_STOPPED cleared leading to a device which is apparently
      running but has a stopped queue.
      
      We actually have two places where SDEV_RUNNING is set: once in
      scsi_add_lun() which respects the blocked flag and once in
      scsi_sysfs_add_sdev() which doesn't.  Since the second set is entirely
      spurious, simply remove it to fix the problem.
      Reported-by: default avatarZengxi Chen <chenzengxi@huawei.com>
      Signed-off-by: default avatarWei Fang <fangwei1@huawei.com>
      Reviewed-by: default avatarEwan D. Milne <emilne@redhat.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      75343701
    • Steffen Maier's avatar
      scsi: zfcp: fix rport unblock race with LUN recovery · 197663e6
      Steffen Maier authored
      commit 6f2ce1c6 upstream.
      
      It is unavoidable that zfcp_scsi_queuecommand() has to finish requests
      with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time
      window when zfcp detected an unavailable rport but
      fc_remote_port_delete(), which is asynchronous via
      zfcp_scsi_schedule_rport_block(), has not yet blocked the rport.
      
      However, for the case when the rport becomes available again, we should
      prevent unblocking the rport too early.  In contrast to other FCP LLDDs,
      zfcp has to open each LUN with the FCP channel hardware before it can
      send I/O to a LUN.  So if a port already has LUNs attached and we
      unblock the rport just after port recovery, recoveries of LUNs behind
      this port can still be pending which in turn force
      zfcp_scsi_queuecommand() to unnecessarily finish requests with
      DID_IMM_RETRY.
      
      This also opens a time window with unblocked rport (until the followup
      LUN reopen recovery has finished).  If a scsi_cmnd timeout occurs during
      this time window fc_timed_out() cannot work as desired and such command
      would indeed time out and trigger scsi_eh. This prevents a clean and
      timely path failover.  This should not happen if the path issue can be
      recovered on FC transport layer such as path issues involving RSCNs.
      
      Fix this by only calling zfcp_scsi_schedule_rport_register(), to
      asynchronously trigger fc_remote_port_add(), after all LUN recoveries as
      children of the rport have finished and no new recoveries of equal or
      higher order were triggered meanwhile.  Finished intentionally includes
      any recovery result no matter if successful or failed (still unblock
      rport so other successful LUNs work).  For simplicity, we check after
      each finished LUN recovery if there is another LUN recovery pending on
      the same port and then do nothing.  We handle the special case of a
      successful recovery of a port without LUN children the same way without
      changing this case's semantics.
      
      For debugging we introduce 2 new trace records written if the rport
      unblock attempt was aborted due to still unfinished or freshly triggered
      recovery. The records are only written above the default trace level.
      
      Benjamin noticed the important special case of new recovery that can be
      triggered between having given up the erp_lock and before calling
      zfcp_erp_action_cleanup() within zfcp_erp_strategy().  We must avoid the
      following sequence:
      
      ERP thread                 rport_work      other context
      -------------------------  --------------  --------------------------------
      port is unblocked, rport still blocked,
       due to pending/running ERP action,
       so ((port->status & ...UNBLOCK) != 0)
       and (port->rport == NULL)
      unlock ERP
      zfcp_erp_action_cleanup()
      case ZFCP_ERP_ACTION_REOPEN_LUN:
      zfcp_erp_try_rport_unblock()
      ((status & ...UNBLOCK) != 0) [OLD!]
                                                 zfcp_erp_port_reopen()
                                                 lock ERP
                                                 zfcp_erp_port_block()
                                                 port->status clear ...UNBLOCK
                                                 unlock ERP
                                                 zfcp_scsi_schedule_rport_block()
                                                 port->rport_task = RPORT_DEL
                                                 queue_work(rport_work)
                                 zfcp_scsi_rport_work()
                                 (port->rport_task != RPORT_ADD)
                                 port->rport_task = RPORT_NONE
                                 zfcp_scsi_rport_block()
                                 if (!port->rport) return
      zfcp_scsi_schedule_rport_register()
      port->rport_task = RPORT_ADD
      queue_work(rport_work)
                                 zfcp_scsi_rport_work()
                                 (port->rport_task == RPORT_ADD)
                                 port->rport_task = RPORT_NONE
                                 zfcp_scsi_rport_register()
                                 (port->rport == NULL)
                                 rport = fc_remote_port_add()
                                 port->rport = rport;
      
      Now the rport was erroneously unblocked while the zfcp_port is blocked.
      This is another situation we want to avoid due to scsi_eh
      potential. This state would at least remain until the new recovery from
      the other context finished successfully, or potentially forever if it
      failed.  In order to close this race, we take the erp_lock inside
      zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or
      LUN.  With that, the possible corresponding rport state sequences would
      be: (unblock[ERP thread],block[other context]) if the ERP thread gets
      erp_lock first and still sees ((port->status & ...UNBLOCK) != 0),
      (block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock
      after the other context has already cleard ...UNBLOCK from port->status.
      
      Since checking fields of struct erp_action is unsafe because they could
      have been overwritten (re-used for new recovery) meanwhile, we only
      check status of zfcp_port and LUN since these are only changed under
      erp_lock elsewhere. Regarding the check of the proper status flags (port
      or port_forced are similar to the shown adapter recovery):
      
      [zfcp_erp_adapter_shutdown()]
      zfcp_erp_adapter_reopen()
       zfcp_erp_adapter_block()
        * clear UNBLOCK ---------------------------------------+
       zfcp_scsi_schedule_rports_block()                       |
       write_lock_irqsave(&adapter->erp_lock, flags);-------+  |
       zfcp_erp_action_enqueue()                            |  |
        zfcp_erp_setup_act()                                |  |
         * set ERP_INUSE -----------------------------------|--|--+
       write_unlock_irqrestore(&adapter->erp_lock, flags);--+  |  |
      .context-switch.                                         |  |
      zfcp_erp_thread()                                        |  |
       zfcp_erp_strategy()                                     |  |
        write_lock_irqsave(&adapter->erp_lock, flags);------+  |  |
        ...                                                 |  |  |
        zfcp_erp_strategy_check_target()                    |  |  |
         zfcp_erp_strategy_check_adapter()                  |  |  |
          zfcp_erp_adapter_unblock()                        |  |  |
           * set UNBLOCK -----------------------------------|--+  |
        zfcp_erp_action_dequeue()                           |     |
         * clear ERP_INUSE ---------------------------------|-----+
        ...                                                 |
        write_unlock_irqrestore(&adapter->erp_lock, flags);-+
      
      Hence, we should check for both UNBLOCK and ERP_INUSE because they are
      interleaved.  Also we need to explicitly check ERP_FAILED for the link
      down case which currently does not clear the UNBLOCK flag in
      zfcp_fsf_link_down_info_eval().
      Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Fixes: 8830271c ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport")
      Fixes: a2fa0aed ("[SCSI] zfcp: Block FC transport rports early on errors")
      Fixes: 5f852be9 ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI")
      Fixes: 338151e0 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable")
      Fixes: 3859f6a2 ("[PATCH] zfcp: add rports to enable scsi_add_device to work again")
      Reviewed-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      197663e6
    • Steffen Maier's avatar
      scsi: zfcp: do not trace pure benign residual HBA responses at default level · 82a5b214
      Steffen Maier authored
      commit 56d23ed7 upstream.
      
      Since quite a while, Linux issues enough SCSI commands per scsi_device
      which successfully return with FCP_RESID_UNDER, FSF_FCP_RSP_AVAILABLE,
      and SAM_STAT_GOOD.  This floods the HBA trace area and we cannot see
      other and important HBA trace records long enough.
      
      Therefore, do not trace HBA response errors for pure benign residual
      under counts at the default trace level.
      
      This excludes benign residual under count combined with other validity
      bits set in FCP_RSP_IU, such as FCP_SNS_LEN_VAL.  For all those other
      cases, we still do want to see both the HBA record and the corresponding
      SCSI record by default.
      Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Fixes: a54ca0f6 ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
      Reviewed-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      82a5b214
    • Benjamin Block's avatar
      scsi: zfcp: fix use-after-"free" in FC ingress path after TMF · bd8d14a5
      Benjamin Block authored
      commit dac37e15 upstream.
      
      When SCSI EH invokes zFCP's callbacks for eh_device_reset_handler() and
      eh_target_reset_handler(), it expects us to relent the ownership over
      the given scsi_cmnd and all other scsi_cmnds within the same scope - LUN
      or target - when returning with SUCCESS from the callback ('release'
      them).  SCSI EH can then reuse those commands.
      
      We did not follow this rule to release commands upon SUCCESS; and if
      later a reply arrived for one of those supposed to be released commands,
      we would still make use of the scsi_cmnd in our ingress tasklet. This
      will at least result in undefined behavior or a kernel panic because of
      a wrong kernel pointer dereference.
      
      To fix this, we NULLify all pointers to scsi_cmnds (struct zfcp_fsf_req
      *)->data in the matching scope if a TMF was successful. This is done
      under the locks (struct zfcp_adapter *)->abort_lock and (struct
      zfcp_reqlist *)->lock to prevent the requests from being removed from
      the request-hashtable, and the ingress tasklet from making use of the
      scsi_cmnd-pointer in zfcp_fsf_fcp_cmnd_handler().
      
      For cases where a reply arrives during SCSI EH, but before we get a
      chance to NULLify the pointer - but before we return from the callback
      -, we assume that the code is protected from races via the CAS operation
      in blk_complete_request() that is called in scsi_done().
      
      The following stacktrace shows an example for a crash resulting from the
      previous behavior:
      
      Unable to handle kernel pointer dereference at virtual kernel address fffffee17a672000
      Oops: 0038 [#1] SMP
      CPU: 2 PID: 0 Comm: swapper/2 Not tainted
      task: 00000003f7ff5be0 ti: 00000003f3d38000 task.ti: 00000003f3d38000
      Krnl PSW : 0404d00180000000 00000000001156b0 (smp_vcpu_scheduled+0x18/0x40)
                 R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
      Krnl GPRS: 000000200000007e 0000000000000000 fffffee17a671fd8 0000000300000015
                 ffffffff80000000 00000000005dfde8 07000003f7f80e00 000000004fa4e800
                 000000036ce8d8f8 000000036ce8d9c0 00000003ece8fe00 ffffffff969c9e93
                 00000003fffffffd 000000036ce8da10 00000000003bf134 00000003f3b07918
      Krnl Code: 00000000001156a2: a7190000        lghi    %r1,0
                 00000000001156a6: a7380015        lhi    %r3,21
                #00000000001156aa: e32050000008    ag    %r2,0(%r5)
                >00000000001156b0: 482022b0        lh    %r2,688(%r2)
                 00000000001156b4: ae123000        sigp    %r1,%r2,0(%r3)
                 00000000001156b8: b2220020        ipm    %r2
                 00000000001156bc: 8820001c        srl    %r2,28
                 00000000001156c0: c02700000001    xilf    %r2,1
      Call Trace:
      ([<0000000000000000>] 0x0)
       [<000003ff807bdb8e>] zfcp_fsf_fcp_cmnd_handler+0x3de/0x490 [zfcp]
       [<000003ff807be30a>] zfcp_fsf_req_complete+0x252/0x800 [zfcp]
       [<000003ff807c0a48>] zfcp_fsf_reqid_check+0xe8/0x190 [zfcp]
       [<000003ff807c194e>] zfcp_qdio_int_resp+0x66/0x188 [zfcp]
       [<000003ff80440c64>] qdio_kick_handler+0xdc/0x310 [qdio]
       [<000003ff804463d0>] __tiqdio_inbound_processing+0xf8/0xcd8 [qdio]
       [<0000000000141fd4>] tasklet_action+0x9c/0x170
       [<0000000000141550>] __do_softirq+0xe8/0x258
       [<000000000010ce0a>] do_softirq+0xba/0xc0
       [<000000000014187c>] irq_exit+0xc4/0xe8
       [<000000000046b526>] do_IRQ+0x146/0x1d8
       [<00000000005d6a3c>] io_return+0x0/0x8
       [<00000000005d6422>] vtime_stop_cpu+0x4a/0xa0
      ([<0000000000000000>] 0x0)
       [<0000000000103d8a>] arch_cpu_idle+0xa2/0xb0
       [<0000000000197f94>] cpu_startup_entry+0x13c/0x1f8
       [<0000000000114782>] smp_start_secondary+0xda/0xe8
       [<00000000005d6efe>] restart_int_handler+0x56/0x6c
       [<0000000000000000>] 0x0
      Last Breaking-Event-Address:
       [<00000000003bf12e>] arch_spin_lock_wait+0x56/0xb0
      Suggested-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Block <bblock@linux.vnet.ibm.com>
      Fixes: ea127f97 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git)
      Signed-off-by: default avatarSteffen Maier <maier@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      bd8d14a5
    • Rabin Vincent's avatar
      block: protect iterate_bdevs() against concurrent close · 2a0a8ae2
      Rabin Vincent authored
      commit af309226 upstream.
      
      If a block device is closed while iterate_bdevs() is handling it, the
      following NULL pointer dereference occurs because bdev->b_disk is NULL
      in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
      turn called by the mapping_cap_writeback_dirty() call in
      __filemap_fdatawrite_range()):
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000508
       IP: [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
       PGD 9e62067 PUD 9ee8067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
       Modules linked in:
       CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
       task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000
       RIP: 0010:[<ffffffff81314790>]  [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
       RSP: 0018:ffff880009f5fe68  EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940
       RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0
       RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860
       R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38
       FS:  00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0
       Stack:
        ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff
        0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001
        ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f
       Call Trace:
        [<ffffffff8112e7f5>] __filemap_fdatawrite_range+0x85/0x90
        [<ffffffff8112e81f>] filemap_fdatawrite+0x1f/0x30
        [<ffffffff811b25d6>] fdatawrite_one_bdev+0x16/0x20
        [<ffffffff811bc402>] iterate_bdevs+0xf2/0x130
        [<ffffffff811b2763>] sys_sync+0x63/0x90
        [<ffffffff815d4272>] entry_SYSCALL_64_fastpath+0x12/0x76
       Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d
       RIP  [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
        RSP <ffff880009f5fe68>
       CR2: 0000000000000508
       ---[ end trace 2487336ceb3de62d ]---
      
      The crash is easily reproducible by running the following command, if an
      msleep(100) is inserted before the call to func() in iterate_devs():
      
       while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done
      
      Fix it by holding the bd_mutex across the func() call and only calling
      func() if the bdev is opened.
      
      Fixes: 5c0d6b60 ("vfs: Create function for iterating over block devices")
      Reported-and-tested-by: default avatarWei Fang <fangwei1@huawei.com>
      Signed-off-by: default avatarRabin Vincent <rabinv@axis.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2a0a8ae2
    • Russell Currey's avatar
      drivers/gpu/drm/ast: Fix infinite loop if read fails · 0401ed69
      Russell Currey authored
      commit 298360af upstream.
      
      ast_get_dram_info() configures a window in order to access BMC memory.
      A BMC register can be configured to disallow this, and if so, causes
      an infinite loop in the ast driver which renders the system unusable.
      
      Fix this by erroring out if an error is detected.  On powerpc systems with
      EEH, this leads to the device being fenced and the system continuing to
      operate.
      Signed-off-by: default avatarRussell Currey <ruscur@russell.cc>
      Reviewed-by: default avatarJoel Stanley <joel@jms.id.au>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161215051241.20815-1-ruscur@russell.ccSigned-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0401ed69
    • Patrik Jakobsson's avatar
      drm/gma500: Add compat ioctl · cc61286c
      Patrik Jakobsson authored
      commit 0a97c81a upstream.
      
      Hook up drm_compat_ioctl to support 32-bit userspace on 64-bit kernels.
      It turns out that N2600 and N2800 comes with 64-bit enabled. We
      previously assumed there where no such systems out there.
      Signed-off-by: default avatarPatrik Jakobsson <patrik.r.jakobsson@gmail.com>
      Signed-off-by: default avatarSean Paul <seanpaul@chromium.org>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161101144315.2955-1-patrik.r.jakobsson@gmail.comSigned-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      cc61286c
    • Alex Deucher's avatar
      drm/radeon: add additional pci revision to dpm workaround · 83adc27e
      Alex Deucher authored
      commit 8729675c upstream.
      
      New variant.
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      83adc27e
    • Krzysztof Kozlowski's avatar
      thermal: hwmon: Properly report critical temperature in sysfs · 9aaa0c7c
      Krzysztof Kozlowski authored
      commit f37fabb8 upstream.
      
      In the critical sysfs entry the thermal hwmon was returning wrong
      temperature to the user-space.  It was reporting the temperature of the
      first trip point instead of the temperature of critical trip point.
      
      For example:
      	/sys/class/hwmon/hwmon0/temp1_crit:50000
      	/sys/class/thermal/thermal_zone0/trip_point_0_temp:50000
      	/sys/class/thermal/thermal_zone0/trip_point_0_type:active
      	/sys/class/thermal/thermal_zone0/trip_point_3_temp:120000
      	/sys/class/thermal/thermal_zone0/trip_point_3_type:critical
      
      Since commit e68b16ab ("thermal: add hwmon sysfs I/F") the driver
      have been registering a sysfs entry if get_crit_temp() callback was
      provided.  However when accessed, it was calling get_trip_temp() instead
      of the get_crit_temp().
      
      Fixes: e68b16ab ("thermal: add hwmon sysfs I/F")
      Signed-off-by: default avatarKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: default avatarZhang Rui <rui.zhang@intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9aaa0c7c
    • Larry Finger's avatar
      ssb: Fix error routine when fallback SPROM fails · a906df2c
      Larry Finger authored
      commit 8052d724 upstream.
      
      When there is a CRC error in the SPROM read from the device, the code
      attempts to handle a fallback SPROM. When this also fails, the driver
      returns zero rather than an error code.
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a906df2c
    • Eric Sandeen's avatar
      xfs: set AGI buffer type in xlog_recover_clear_agi_bucket · b514c427
      Eric Sandeen authored
      commit 6b10b23c upstream.
      
      xlog_recover_clear_agi_bucket didn't set the
      type to XFS_BLFT_AGI_BUF, so we got a warning during log
      replay (or an ASSERT on a debug build).
      
          XFS (md0): Unknown buffer type 0!
          XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1
      
      Fix this, as was done in f19b872b for 2 other locations
      with the same problem.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b514c427
    • Julien Grall's avatar
      arm/xen: Use alloc_percpu rather than __alloc_percpu · b7e20558
      Julien Grall authored
      commit 24d5373d upstream.
      
      The function xen_guest_init is using __alloc_percpu with an alignment
      which are not power of two.
      
      However, the percpu allocator never supported alignments which are not power
      of two and has always behaved incorectly in thise case.
      
      Commit 3ca45a46 "percpu: ensure requested alignment is power of two"
      introduced a check which trigger a warning [1] when booting linux-next
      on Xen. But in reality this bug was always present.
      
      This can be fixed by replacing the call to __alloc_percpu with
      alloc_percpu. The latter will use an alignment which are a power of two.
      
      [1]
      
      [    0.023921] illegal size (48) or align (48) for percpu allocation
      [    0.024167] ------------[ cut here ]------------
      [    0.024344] WARNING: CPU: 0 PID: 1 at linux/mm/percpu.c:892 pcpu_alloc+0x88/0x6c0
      [    0.024584] Modules linked in:
      [    0.024708]
      [    0.024804] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
      4.9.0-rc7-next-20161128 #473
      [    0.025012] Hardware name: Foundation-v8A (DT)
      [    0.025162] task: ffff80003d870000 task.stack: ffff80003d844000
      [    0.025351] PC is at pcpu_alloc+0x88/0x6c0
      [    0.025490] LR is at pcpu_alloc+0x88/0x6c0
      [    0.025624] pc : [<ffff00000818e678>] lr : [<ffff00000818e678>]
      pstate: 60000045
      [    0.025830] sp : ffff80003d847cd0
      [    0.025946] x29: ffff80003d847cd0 x28: 0000000000000000
      [    0.026147] x27: 0000000000000000 x26: 0000000000000000
      [    0.026348] x25: 0000000000000000 x24: 0000000000000000
      [    0.026549] x23: 0000000000000000 x22: 00000000024000c0
      [    0.026752] x21: ffff000008e97000 x20: 0000000000000000
      [    0.026953] x19: 0000000000000030 x18: 0000000000000010
      [    0.027155] x17: 0000000000000a3f x16: 00000000deadbeef
      [    0.027357] x15: 0000000000000006 x14: ffff000088f79c3f
      [    0.027573] x13: ffff000008f79c4d x12: 0000000000000041
      [    0.027782] x11: 0000000000000006 x10: 0000000000000042
      [    0.027995] x9 : ffff80003d847a40 x8 : 6f697461636f6c6c
      [    0.028208] x7 : 6120757063726570 x6 : ffff000008f79c84
      [    0.028419] x5 : 0000000000000005 x4 : 0000000000000000
      [    0.028628] x3 : 0000000000000000 x2 : 000000000000017f
      [    0.028840] x1 : ffff80003d870000 x0 : 0000000000000035
      [    0.029056]
      [    0.029152] ---[ end trace 0000000000000000 ]---
      [    0.029297] Call trace:
      [    0.029403] Exception stack(0xffff80003d847b00 to
                                     0xffff80003d847c30)
      [    0.029621] 7b00: 0000000000000030 0001000000000000
      ffff80003d847cd0 ffff00000818e678
      [    0.029901] 7b20: 0000000000000002 0000000000000004
      ffff000008f7c060 0000000000000035
      [    0.030153] 7b40: ffff000008f79000 ffff000008c4cd88
      ffff80003d847bf0 ffff000008101778
      [    0.030402] 7b60: 0000000000000030 0000000000000000
      ffff000008e97000 00000000024000c0
      [    0.030647] 7b80: 0000000000000000 0000000000000000
      0000000000000000 0000000000000000
      [    0.030895] 7ba0: 0000000000000035 ffff80003d870000
      000000000000017f 0000000000000000
      [    0.031144] 7bc0: 0000000000000000 0000000000000005
      ffff000008f79c84 6120757063726570
      [    0.031394] 7be0: 6f697461636f6c6c ffff80003d847a40
      0000000000000042 0000000000000006
      [    0.031643] 7c00: 0000000000000041 ffff000008f79c4d
      ffff000088f79c3f 0000000000000006
      [    0.031877] 7c20: 00000000deadbeef 0000000000000a3f
      [    0.032051] [<ffff00000818e678>] pcpu_alloc+0x88/0x6c0
      [    0.032229] [<ffff00000818ece8>] __alloc_percpu+0x18/0x20
      [    0.032409] [<ffff000008d9606c>] xen_guest_init+0x174/0x2f4
      [    0.032591] [<ffff0000080830f8>] do_one_initcall+0x38/0x130
      [    0.032783] [<ffff000008d90c34>] kernel_init_freeable+0xe0/0x248
      [    0.032995] [<ffff00000899a890>] kernel_init+0x10/0x100
      [    0.033172] [<ffff000008082ec0>] ret_from_fork+0x10/0x50
      Reported-by: default avatarWei Chen <wei.chen@arm.com>
      Link: https://lkml.org/lkml/2016/11/28/669Signed-off-by: default avatarJulien Grall <julien.grall@arm.com>
      Signed-off-by: default avatarStefano Stabellini <sstabellini@kernel.org>
      Reviewed-by: default avatarStefano Stabellini <sstabellini@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b7e20558
    • Boris Ostrovsky's avatar
      xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing · 7c5f9434
      Boris Ostrovsky authored
      commit 30faaafd upstream.
      
      Commit 9c17d965 ("xen/gntdev: Grant maps should not be subject to
      NUMA balancing") set VM_IO flag to prevent grant maps from being
      subjected to NUMA balancing.
      
      It was discovered recently that this flag causes get_user_pages() to
      always fail with -EFAULT.
      
      check_vma_flags
      __get_user_pages
      __get_user_pages_locked
      __get_user_pages_unlocked
      get_user_pages_fast
      iov_iter_get_pages
      dio_refill_pages
      do_direct_IO
      do_blockdev_direct_IO
      do_blockdev_direct_IO
      ext4_direct_IO_read
      generic_file_read_iter
      aio_run_iocb
      
      (which can happen if guest's vdisk has direct-io-safe option).
      
      To avoid this let's use VM_MIXEDMAP flag instead --- it prevents
      NUMA balancing just as VM_IO does and has no effect on
      check_vma_flags().
      Reported-by: default avatarOlaf Hering <olaf@aepfle.de>
      Suggested-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Tested-by: default avatarOlaf Hering <olaf@aepfle.de>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      7c5f9434
    • Pavel Shilovsky's avatar
      CIFS: Fix a possible memory corruption in push locks · dd4617bd
      Pavel Shilovsky authored
      commit e3d240e9 upstream.
      
      If maxBuf is not 0 but less than a size of SMB2 lock structure
      we can end up with a memory corruption.
      Signed-off-by: default avatarPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      dd4617bd
    • Pavel Shilovsky's avatar
      c291abb7
    • Pavel Shilovsky's avatar
      CIFS: Fix a possible memory corruption during reconnect · 123b228a
      Pavel Shilovsky authored
      commit 53e0e11e upstream.
      
      We can not unlock/lock cifs_tcp_ses_lock while walking through ses
      and tcon lists because it can corrupt list iterator pointers and
      a tcon structure can be released if we don't hold an extra reference.
      Fix it by moving a reconnect process to a separate delayed work
      and acquiring a reference to every tcon that needs to be reconnected.
      Also do not send an echo request on newly established connections.
      Signed-off-by: default avatarPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      123b228a
    • Benjamin Marzinski's avatar
      dm space map metadata: fix 'struct sm_metadata' leak on failed create · 102ddf72
      Benjamin Marzinski authored
      commit 314c25c5 upstream.
      
      In dm_sm_metadata_create() we temporarily change the dm_space_map
      operations from 'ops' (whose .destroy function deallocates the
      sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't).
      
      If dm_sm_metadata_create() fails in sm_ll_new_metadata() or
      sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls
      dm_sm_destroy() with the intention of freeing the sm_metadata, but it
      doesn't (because the dm_space_map operations is still set to
      'bootstrap_ops').
      
      Fix this by setting the dm_space_map operations back to 'ops' if
      dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'.
      
      [js] no nr_blocks test in 3.12 yet
      Signed-off-by: default avatarBenjamin Marzinski <bmarzins@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      102ddf72
    • Ondrej Kozina's avatar
      dm crypt: mark key as invalid until properly loaded · 1a131486
      Ondrej Kozina authored
      commit 265e9098 upstream.
      
      In crypt_set_key(), if a failure occurs while replacing the old key
      (e.g. tfm->setkey() fails) the key must not have DM_CRYPT_KEY_VALID flag
      set.  Otherwise, the crypto layer would have an invalid key that still
      has DM_CRYPT_KEY_VALID flag set.
      Signed-off-by: default avatarOndrej Kozina <okozina@redhat.com>
      Reviewed-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1a131486
    • Aleksa Sarai's avatar
      fs: exec: apply CLOEXEC before changing dumpable task flags · aae555fb
      Aleksa Sarai authored
      commit 613cc2b6 upstream.
      
      If you have a process that has set itself to be non-dumpable, and it
      then undergoes exec(2), any CLOEXEC file descriptors it has open are
      "exposed" during a race window between the dumpable flags of the process
      being reset for exec(2) and CLOEXEC being applied to the file
      descriptors. This can be exploited by a process by attempting to access
      /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE.
      
      The race in question is after set_dumpable has been (for get_link,
      though the trace is basically the same for readlink):
      
      [vfs]
      -> proc_pid_link_inode_operations.get_link
         -> proc_pid_get_link
            -> proc_fd_access_allowed
               -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
      
      Which will return 0, during the race window and CLOEXEC file descriptors
      will still be open during this window because do_close_on_exec has not
      been called yet. As a result, the ordering of these calls should be
      reversed to avoid this race window.
      
      This is of particular concern to container runtimes, where joining a
      PID namespace with file descriptors referring to the host filesystem
      can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
      against access of CLOEXEC file descriptors -- file descriptors which may
      reference filesystem objects the container shouldn't have access to).
      
      Cc: dev@opencontainers.org
      Reported-by: default avatarMichael Crosby <crosbymichael@gmail.com>
      Signed-off-by: default avatarAleksa Sarai <asarai@suse.de>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      aae555fb
    • Shaohua Li's avatar
      mm/vmscan.c: set correct defer count for shrinker · 9ba6fb6a
      Shaohua Li authored
      commit 5f33a080 upstream.
      
      Our system uses significantly more slab memory with memcg enabled with
      the latest kernel.  With 3.10 kernel, slab uses 2G memory, while with
      4.6 kernel, 6G memory is used.  The shrinker has problem.  Let's see we
      have two memcg for one shrinker.  In do_shrink_slab:
      
      1. Check cg1.  nr_deferred = 0, assume total_scan = 700.  batch size
         is 1024, then no memory is freed.  nr_deferred = 700
      
      2. Check cg2.  nr_deferred = 700.  Assume freeable = 20, then
         total_scan = 10 or 40.  Let's assume it's 10.  No memory is freed.
         nr_deferred = 10.
      
      The deferred share of cg1 is lost in this case.  kswapd will free no
      memory even run above steps again and again.
      
      The fix makes sure one memcg's deferred share isn't lost.
      
      Link: http://lkml.kernel.org/r/2414be961b5d25892060315fbb56bb19d81d0c07.1476227351.git.shli@fb.comSigned-off-by: default avatarShaohua Li <shli@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9ba6fb6a
    • Nicolai Stange's avatar
      f2fs: set ->owner for debugfs status file's file_operations · 5b0d12d3
      Nicolai Stange authored
      commit 05e6ea26 upstream.
      
      The struct file_operations instance serving the f2fs/status debugfs file
      lacks an initialization of its ->owner.
      
      This means that although that file might have been opened, the f2fs module
      can still get removed. Any further operation on that opened file, releasing
      included,  will cause accesses to unmapped memory.
      
      Indeed, Mike Marshall reported the following:
      
        BUG: unable to handle kernel paging request at ffffffffa0307430
        IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90
        <...>
        Call Trace:
         [] __fput+0xdf/0x1d0
         [] ____fput+0xe/0x10
         [] task_work_run+0x8e/0xc0
         [] do_exit+0x2ae/0xae0
         [] ? __audit_syscall_entry+0xae/0x100
         [] ? syscall_trace_enter+0x1ca/0x310
         [] do_group_exit+0x44/0xc0
         [] SyS_exit_group+0x14/0x20
         [] do_syscall_64+0x61/0x150
         [] entry_SYSCALL64_slow_path+0x25/0x25
        <...>
        ---[ end trace f22ae883fa3ea6b8 ]---
        Fixing recursive fault but reboot is needed!
      
      Fix this by initializing the f2fs/status file_operations' ->owner with
      THIS_MODULE.
      
      This will allow debugfs to grab a reference to the f2fs module upon any
      open on that file, thus preventing it from getting removed.
      
      Fixes: 902829aa ("f2fs: move proc files to debugfs")
      Reported-by: default avatarMike Marshall <hubcap@omnibond.com>
      Reported-by: default avatarMartin Brandenburg <martin@omnibond.com>
      Signed-off-by: default avatarNicolai Stange <nicstange@gmail.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5b0d12d3
    • Dan Carpenter's avatar
      ext4: return -ENOMEM instead of success · e832a16a
      Dan Carpenter authored
      commit 578620f4 upstream.
      
      We should set the error code if kzalloc() fails.
      
      Fixes: 67cf5b09 ("ext4: add the basic function for inline data support")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e832a16a
    • Darrick J. Wong's avatar
      ext4: reject inodes with negative size · 30256224
      Darrick J. Wong authored
      commit 7e6e1ef4 upstream.
      
      Don't load an inode with a negative size; this causes integer overflow
      problems in the VFS.
      
      [ Added EXT4_ERROR_INODE() to mark file system as corrupted. -TYT]
      
      js: use EIO for 3.12 instead of EFSCORRUPTED.
      
      Fixes: a48380f7 (ext4: rename i_dir_acl to i_size_high)
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      30256224
    • Theodore Ts'o's avatar
      ext4: add sanity checking to count_overhead() · 2b47f735
      Theodore Ts'o authored
      commit c48ae41b upstream.
      
      The commit "ext4: sanity check the block and cluster size at mount
      time" should prevent any problems, but in case the superblock is
      modified while the file system is mounted, add an extra safety check
      to make sure we won't overrun the allocated buffer.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2b47f735
    • Theodore Ts'o's avatar
      ext4: fix in-superblock mount options processing · 399cf969
      Theodore Ts'o authored
      commit 5aee0f8a upstream.
      
      Fix a large number of problems with how we handle mount options in the
      superblock.  For one, if the string in the superblock is long enough
      that it is not null terminated, we could run off the end of the string
      and try to interpret superblocks fields as characters.  It's unlikely
      this will cause a security problem, but it could result in an invalid
      parse.  Also, parse_options is destructive to the string, so in some
      cases if there is a comma-separated string, it would be modified in
      the superblock.  (Fortunately it only happens on file systems with a
      1k block size.)
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      399cf969
    • Theodore Ts'o's avatar
      ext4: use more strict checks for inodes_per_block on mount · de4f994b
      Theodore Ts'o authored
      commit cd6bb35b upstream.
      
      Centralize the checks for inodes_per_block and be more strict to make
      sure the inodes_per_block_group can't end up being zero.
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      de4f994b
    • Chandan Rajendra's avatar
      ext4: fix stack memory corruption with 64k block size · c4c0dbd2
      Chandan Rajendra authored
      commit 30a9d7af upstream.
      
      The number of 'counters' elements needed in 'struct sg' is
      super_block->s_blocksize_bits + 2. Presently we have 16 'counters'
      elements in the array. This is insufficient for block sizes >= 32k. In
      such cases the memcpy operation performed in ext4_mb_seq_groups_show()
      would cause stack memory corruption.
      
      Fixes: c9de560dSigned-off-by: default avatarChandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      c4c0dbd2
    • Chandan Rajendra's avatar
      ext4: fix mballoc breakage with 64k block size · a844e8a7
      Chandan Rajendra authored
      commit 69e43e8c upstream.
      
      'border' variable is set to a value of 2 times the block size of the
      underlying filesystem. With 64k block size, the resulting value won't
      fit into a 16-bit variable. Hence this commit changes the data type of
      'border' to 'unsigned int'.
      
      Fixes: c9de560dSigned-off-by: default avatarChandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a844e8a7
    • Alex Porosanu's avatar
      crypto: caam - fix AEAD givenc descriptors · a8e872c9
      Alex Porosanu authored
      commit d128af17 upstream.
      
      The AEAD givenc descriptor relies on moving the IV through the
      output FIFO and then back to the CTX2 for authentication. The
      SEQ FIFO STORE could be scheduled before the data can be
      read from OFIFO, especially since the SEQ FIFO LOAD needs
      to wait for the SEQ FIFO LOAD SKIP to finish first. The
      SKIP takes more time when the input is SG than when it's
      a contiguous buffer. If the SEQ FIFO LOAD is not scheduled
      before the STORE, the DECO will hang waiting for data
      to be available in the OFIFO so it can be transferred to C2.
      In order to overcome this, first force transfer of IV to C2
      by starting the "cryptlen" transfer first and then starting to
      store data from OFIFO to the output buffer.
      
      Fixes: 1acebad3 ("crypto: caam - faster aead implementation")
      Signed-off-by: default avatarAlex Porosanu <alexandru.porosanu@nxp.com>
      Signed-off-by: default avatarHoria Geantă <horia.geanta@nxp.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a8e872c9
    • NeilBrown's avatar
      block_dev: don't test bdev->bd_contains when it is not stable · 94b87438
      NeilBrown authored
      commit bcc7f5b4 upstream.
      
      bdev->bd_contains is not stable before calling __blkdev_get().
      When __blkdev_get() is called on a parition with ->bd_openers == 0
      it sets
        bdev->bd_contains = bdev;
      which is not correct for a partition.
      After a call to __blkdev_get() succeeds, ->bd_openers will be > 0
      and then ->bd_contains is stable.
      
      When FMODE_EXCL is used, blkdev_get() calls
         bd_start_claiming() ->  bd_prepare_to_claim() -> bd_may_claim()
      
      This call happens before __blkdev_get() is called, so ->bd_contains
      is not stable.  So bd_may_claim() cannot safely use ->bd_contains.
      It currently tries to use it, and this can lead to a BUG_ON().
      
      This happens when a whole device is already open with a bd_holder (in
      use by dm in my particular example) and two threads race to open a
      partition of that device for the first time, one opening with O_EXCL and
      one without.
      
      The thread that doesn't use O_EXCL gets through blkdev_get() to
      __blkdev_get(), gains the ->bd_mutex, and sets bdev->bd_contains = bdev;
      
      Immediately thereafter the other thread, using FMODE_EXCL, calls
      bd_start_claiming() from blkdev_get().  This should fail because the
      whole device has a holder, but because bdev->bd_contains == bdev
      bd_may_claim() incorrectly reports success.
      This thread continues and blocks on bd_mutex.
      
      The first thread then sets bdev->bd_contains correctly and drops the mutex.
      The thread using FMODE_EXCL then continues and when it calls bd_may_claim()
      again in:
      			BUG_ON(!bd_may_claim(bdev, whole, holder));
      The BUG_ON fires.
      
      Fix this by removing the dependency on ->bd_contains in
      bd_may_claim().  As bd_may_claim() has direct access to the whole
      device, it can simply test if the target bdev is the whole device.
      
      Fixes: 6b4517a7 ("block: implement bd_claiming and claiming block")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      94b87438