Commits · 6aaa58c994277647f8b05ffef3b9b225a2d08f36 · nexedi / linux

22 Oct, 2018 1 commit

Jack Wang authored Oct 19, 2018

I noticed kmemleak report memory leak when run create/stop
md in a loop, backtrace:
[<000000001ca975e7>] mempool_create_node+0x86/0xd0
[<0000000095576bcd>] md_run+0x1057/0x1410 [md_mod]
[<000000007b45c5fc>] do_md_run+0x15/0x130 [md_mod]
[<000000001ede9ec0>] md_ioctl+0x1f49/0x25d0 [md_mod]
[<000000004142cacf>] blkdev_ioctl+0x680/0xd00

The root cause is we alloc mddev->flush_pool and
mddev->flush_bio_pool in md_run, but from do_md_stop
will not call into md_stop but __md_stop, move the
mempool_destroy to __md_stop fixes the problem for me.

The bug was introduced in 5a409b4f, the fixes should go to
4.18+

Fixes: 5a409b4f ("MD: fix lock contention for flush bios")
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>

6aaa58c9

18 Oct, 2018 8 commits

md-cluster: remove suspend_info · ea89238c

Guoqing Jiang authored Oct 18, 2018

Previously, we allow multiple nodes can resync device, but we
had changed it to only support one node can do resync at one
time, but suspend_info is still used.

Now, let's remove the structure and use suspend_lo/hi to record
the range.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

ea89238c

md-cluster: send BITMAP_NEEDS_SYNC message if reshaping is interrupted · cb9ee154

Guoqing Jiang authored Oct 18, 2018

We need to continue the reshaping if it was interrupted in
original node. So original node should call resync_bitmap
in case reshaping is aborted.

Then BITMAP_NEEDS_SYNC message is broadcasted to other nodes,
node which continues the reshaping should restart reshape from
mddev->reshape_position instead of from the first beginning.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

cb9ee154

md-cluster/bitmap: don't call md_bitmap_sync_with_cluster during reshaping stage · cbce6863

Guoqing Jiang authored Oct 18, 2018

When reshape is happening in one node, other nodes could receive
lots of RESYNCING messages, so md_bitmap_sync_with_cluster is called.

Since the resyncing window is typically small in these RESYNCING
messages, so WARN is always triggered, so we should not call the
func when reshape is happening.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

cbce6863

md-cluster/raid10: don't call remove_and_add_spares during reshaping stage · ca1e98e0

Guoqing Jiang authored Oct 18, 2018

remove_and_add_spares is not needed if reshape is
happening in another node, because raid10_add_disk
called inside raid10_start_reshape would handle the
role changes of disk. Plus, remove_and_add_spares
can't deal with the role change due to reshape.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

ca1e98e0

md-cluster/raid10: call update_size in md_reap_sync_thread · aefb2e5f

Guoqing Jiang authored Oct 18, 2018

We need to change the capacity in all nodes after one node
finishs reshape. And as we did before, we can't change the
capacity directly in md_do_sync, instead, the capacity should
be only changed in update_size or received CHANGE_CAPACITY
msg.

So master node calls update_size after completes reshape in
md_reap_sync_thread, but we need to skip ops->update_size if
MD_CLOSING is set since reshaping could not be finish.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

aefb2e5f

md-cluster: introduce resync_info_get interface for sanity check · 5ebaf80b

Guoqing Jiang authored Oct 18, 2018

Since the resync region from suspend_info means one node
is reshaping this area, so the position of reshape_progress
should be included in the area.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

5ebaf80b

md-cluster/raid10: support add disk under grow mode · 7564beda

Guoqing Jiang authored Oct 18, 2018

For clustered raid10 scenario, we need to let all the nodes
know about that a new disk is added to the array, and the
reshape caused by add new member just need to be happened in
one node, but other nodes should know about the change.

Since reshape means read data from somewhere (which is already
used by array) and write data to unused region. Obviously, it
is awful if one node is reading data from address while another
node is writing to the same address. Considering we have
implemented suspend writes in the resyncing area, so we can
just broadcast the reading address to other nodes to avoid the
trouble.

For master node, it would call reshape_request then update sb
during the reshape period. To avoid above trouble, we call
resync_info_update to send RESYNC message in reshape_request.

Then from slave node's view, it receives two type messages:
1. RESYNCING message
Slave node add the address (where master node reading data from)
to suspend list.

2. METADATA_UPDATED message
Once slave nodes know the reshaping is started in master node,
it is time to update reshape position and call start_reshape to
follow master node's step. After reshape is done, only reshape
position is need to be updated, so the majority task of reshaping
is happened on the master node.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

7564beda

md-cluster/raid10: resize all the bitmaps before start reshape · afd75628

Guoqing Jiang authored Oct 18, 2018

To support add disk under grow mode, we need to resize
all the bitmaps of each node before reshape, so that we
can ensure all nodes have the same view of the bitmap of
the clustered raid.

So after the master node resized the bitmap, it broadcast
a message to other slave nodes, and it checks the size of
each bitmap are same or not by compare pages. We can only
continue the reshaping after all nodes update the bitmap
to the same size (by checking the pages), otherwise revert
bitmap size to previous value.

The resize_bitmaps interface and BITMAP_RESIZE message are
introduced in md-cluster.c for the purpose.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

afd75628

15 Oct, 2018 1 commit

MD: fix invalid stored role for a disk - try2 · 9e753ba9

Shaohua Li authored Oct 14, 2018

Commit d595567d (MD: fix invalid stored role for a disk) broke linear
hotadd. Let's only fix the role for disks in raid1/10.
Based on Guoqing's original patch.
Reported-by: kernel test robot <rong.a.chen@intel.com>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

9e753ba9

10 Oct, 2018 2 commits

md/bitmap: use mddev_suspend/resume instead of ->quiesce() · f8f83d8f

Jack Wang authored Oct 08, 2018

After 9e1cc0a5 ("md: use mddev_suspend/resume instead of ->quiesce()")
We still have similar left in bitmap functions.

Replace quiesce() with mddev_suspend/resume.

Also move md_bitmap_create out of mddev_suspend. and move mddev_resume
after md_bitmap_destroy. as we did in set_bitmap_file.
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Shaohua Li <shli@fb.com>

f8f83d8f

md: remove redundant code that is no longer reachable · 116d99ad

Colin Ian King authored Oct 08, 2018

And earlier commit removed the error label to two statements that
are now never reachable.  Since this code is now dead code, remove it.

Detected by CoverityScan, CID#1462409 ("Structurally dead code")

Fixes: d5d885fd ("md: introduce new personality funciton start()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Shaohua Li <shli@fb.com>

116d99ad

03 Oct, 2018 1 commit

md: allow metadata updates while suspending an array - fix · 059421e0

NeilBrown authored Oct 03, 2018

Commit 35bfc521 ("md: allow metadata update while suspending.")
added support for allowing md_check_recovery() to still perform
metadata updates while the array is entering the 'suspended' state.
This is needed to allow the processes of entering the state to
complete.

Unfortunately, the patch doesn't really work.  The test for
"mddev->suspended" at the start of md_check_recovery() means that the
function doesn't try to do anything at all while entering suspend.

This patch moves the code of updating the metadata while suspending to
*before* the test on mddev->suspended.
Reported-by: Jeff Mahoney <jeffm@suse.com>
Fixes: 35bfc521 ("md: allow metadata update while suspending.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

059421e0

02 Oct, 2018 1 commit

MD: fix invalid stored role for a disk · d595567d

Shaohua Li authored Oct 01, 2018

If we change the number of array's device after device is removed from array,
then add the device back to array, we can see that device is added as active
role instead of spare which we expected.

Please see the below link for details:
https://marc.info/?l=linux-raid&m=153736982015076&w=2

This is caused by that we prefer to use device's previous role which is
recorded by saved_raid_disk, but we should respect the new number of
conf->raid_disks since it could be changed after device is removed.
Reported-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Tested-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Acked-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>

d595567d

28 Sep, 2018 5 commits

md/raid10: Fix raid10 replace hang when new added disk faulty · ee37d731

Alex Wu authored Sep 21, 2018

[Symptom]

Resync thread hang when new added disk faulty during replacing.

[Root Cause]

In raid10_sync_request(), we expect to issue a bio with callback
end_sync_read(), and a bio with callback end_sync_write().

In normal situation, we will add resyncing sectors into
mddev->recovery_active when raid10_sync_request() returned, and sub
resynced sectors from mddev->recovery_active when end_sync_write()
calls end_sync_request().

If new added disk, which are replacing the old disk, is set faulty,
there is a race condition:
    1. In the first rcu protected section, resync thread did not detect
       that mreplace is set faulty and pass the condition.
    2. In the second rcu protected section, mreplace is set faulty.
    3. But, resync thread will prepare the read object first, and then
       check the write condition.
    4. It will find that mreplace is set faulty and do not have to
       prepare write object.
This cause we add resync sectors but never sub it.

[How to Reproduce]

This issue can be easily reproduced by the following steps:
    mdadm -C /dev/md0 --assume-clean -l 10 -n 4 /dev/sd[abcd]
    mdadm /dev/md0 -a /dev/sde
    mdadm /dev/md0 --replace /dev/sdd
    sleep 1
    mdadm /dev/md0 -f /dev/sde

[How to Fix]

This issue can be fixed by using local variables to record the result
of test conditions. Once the conditions are satisfied, we can make sure
that we need to issue a bio for read and a bio for write.

Previous 'commit 24afd80d ("md/raid10: handle recovery of
replacement devices.")' will also check whether bio is NULL, but leave
the comment saying that it is a pointless test. So we remove this dummy
check.
Reported-by: Alex Chen <alexchen@synology.com>
Reviewed-by: Allen Peng <allenpeng@synology.com>
Reviewed-by: BingJing Chang <bingjingc@synology.com>
Signed-off-by: Alex Wu <alexwu@synology.com>
Signed-off-by: Shaohua Li <shli@fb.com>

ee37d731

raid5: block failing device if raid will be failed · fb73b357

Mariusz Tkaczyk authored Sep 04, 2018

Currently there is an inconsistency for failing the member drives
for arrays with different RAID levels. For RAID456 - there is a possibility
to fail all of the devices. However - for other RAID levels - kernel blocks
removing the member drive, if the operation results in array's FAIL state
(EBUSY is returned). For example - removing last drive from RAID1 is not
possible.
This kind of blocker was never implemented for raid456 and we cannot see
the reason why.

We had tested following patch and did not observe any regression, so do you
have any comments/reasons for current approach, or we can send the proper
patch for this?
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>

fb73b357

Merge tag 'drm-fixes-2018-09-28' of git://anongit.freedesktop.org/drm/drm · f151f57b

Greg Kroah-Hartman authored Sep 28, 2018

Dave writes:
  "drm fixes for 4.19-rc6

   Looks like a pretty normal week for graphics,

   core: syncobj fix, panel link regression revert
   amd: suspend/resume fixes, EDID emulation fix
   mali-dp: NV12 writeback and vblank reset fixes
   etnaviv: DMA setup fix"

* tag 'drm-fixes-2018-09-28' of git://anongit.freedesktop.org/drm/drm:
  drm/amd/display: Fix Edid emulation for linux
  drm/amd/display: Fix Vega10 lightup on S3 resume
  drm/amdgpu: Fix vce work queue was not cancelled when suspend
  Revert "drm/panel: Add device_link from panel device to DRM device"
  drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
  drm/malidp: Fix writeback in NV12
  drm: mali-dp: Call drm_crtc_vblank_reset on device init
  drm/etnaviv: add DMA configuration for etnaviv platform device

f151f57b

Merge tag 'riscv-for-linus-4.19-rc6' of... · ed1b3f4c

Greg Kroah-Hartman authored Sep 28, 2018

Merge tag 'riscv-for-linus-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux

Palmer writes:
  "A Single RISC-V Update for 4.19-rc6

   The Debian guys have been pushing on our port and found some
   unversioned symbols leaking into modules.  This PR contains a single
   fix for that issue."

* tag 'riscv-for-linus-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  RISC-V: include linux/ftrace.h in asm-prototypes.h

ed1b3f4c

Merge tag 'pci-v4.19-fixes-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 278e59a0

Greg Kroah-Hartman authored Sep 28, 2018

Bjorn writes:
  "PCI fixes:

  - Fix ACPI hotplug issue that causes black screen crash at boot (Mika
    Westerberg)

  - Fix DesignWare "scheduling while atomic" issues (Jisheng Zhang)

  - Add PPC contacts to MAINTAINERS for PCI core error handling (Bjorn
    Helgaas)

  - Sort Mobiveil MAINTAINERS entry (Lorenzo Pieralisi)"

* tag 'pci-v4.19-fixes-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
  PCI: dwc: Fix scheduling while atomic issues
  MAINTAINERS: Move mobiveil PCI driver entry where it belongs
  MAINTAINERS: Update PPC contacts for PCI core error handling

278e59a0

27 Sep, 2018 10 commits

Merge branch 'drm-fixes-4.19' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · fcb1349a

Dave Airlie authored Sep 28, 2018

Just a few fixes for 4.19:
- Couple of suspend/resume fixes
- Fix EDID emulation with DC
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180927155418.2813-1-alexander.deucher@amd.com

fcb1349a

Merge tag 'drm-misc-fixes-2018-09-27-1' of... · adba0e54

Dave Airlie authored Sep 28, 2018

Merge tag 'drm-misc-fixes-2018-09-27-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

- Revert adding device-link to panels
- Don't leak fences in drm/syncobj
Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20180927152712.GA53076@art_vandelay

adba0e54

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · ad037148

Greg Kroah-Hartman authored Sep 27, 2018

Jason writes:
  "Second RDMA rc pull request

   - Fix a long standing race bug when destroying comp_event file descriptors

   - srp, hfi1, bnxt_re: Various driver crashes from missing validation
     and other cases

   - Fixes for regressions in patches merged this window in the gid
     cache, devx, ucma and uapi."

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/core: Set right entry state before releasing reference
  IB/mlx5: Destroy the DEVX object upon error flow
  IB/uverbs: Free uapi on destroy
  RDMA/bnxt_re: Fix system crash during RDMA resource initialization
  IB/hfi1: Fix destroy_qp hang after a link down
  IB/hfi1: Fix context recovery when PBC has an UnsupportedVL
  IB/hfi1: Invalid user input can result in crash
  IB/hfi1: Fix SL array bounds check
  RDMA/uverbs: Fix validity check for modify QP
  IB/srp: Avoid that sg_reset -d ${srp_device} triggers an infinite loop
  ucma: fix a use-after-free in ucma_resolve_ip()
  RDMA/uverbs: Atomically flush and mark closed the comp event queue
  cxgb4: fix abort_req_rss6 struct

ad037148

Merge tag 'for_v4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · c127e59b

Greg Kroah-Hartman authored Sep 27, 2018

Jan writes:
  "an ext2 patch fixing fsync(2) for DAX mounts."

* tag 'for_v4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext2, dax: set ext2_dax_aops for dax files

c127e59b

drm/amd/display: Fix Edid emulation for linux · fbbdadf2

Bhawanpreet Lakha authored Sep 26, 2018

[Why]
EDID emulation didn't work properly for linux, as we stop programming
if nothing is connected physically.

[How]
We get a flag from DRM when we want to do edid emulation. We check if
this flag is true and nothing is connected physically, if so we only
program the front end using VIRTUAL_SIGNAL.
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

fbbdadf2

drm/amd/display: Fix Vega10 lightup on S3 resume · 599760d6

Roman Li authored Sep 26, 2018

[Why]
There have been a few reports of Vega10 display remaining blank
after S3 resume. The regression is caused by workaround for mode
change on Vega10 - skip set_bandwidth if stream count is 0.
As a result we skipped dispclk reset on suspend, thus on resume
we may skip the clock update assuming it hasn't been changed.
On some systems it causes display blank or 'out of range'.

[How]
Revert "drm/amd/display: Fix Vega10 black screen after mode change"
Verified that it hadn't cause mode change regression.
Signed-off-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Sun peng Li <Sunpeng.Li@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

599760d6

drm/amdgpu: Fix vce work queue was not cancelled when suspend · 61ea6f58

Rex Zhu authored Sep 27, 2018

The vce cancel_delayed_work_sync never be called.
driver call the function in error path.

This caused the A+A suspend hang when runtime pm enebled.
As we will visit the smu in the idle queue. this will cause
smu hang because the dgpu has been suspend, and the dgpu also
will be waked up. As the smu has been hang, so the dgpu resume
will failed.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

61ea6f58

Revert "drm/panel: Add device_link from panel device to DRM device" · d6a77ba0

Linus Walleij authored Sep 27, 2018

This reverts commit 0c08754b.

commit 0c08754b
("drm/panel: Add device_link from panel device to DRM device")
creates a circular dependency under these circumstances:

1. The panel depends on dsi-host because it is MIPI-DSI child
   device.
2. dsi-host depends on the drm parent device (connector->dev->dev)
   this should be allowed.
3. drm parent dev (connector->dev->dev) depends on the panel
   after this patch.

This makes the dependency circular and while it appears it
does not affect any in-tree drivers (they do not seem to have
dsi hosts depending on the same parent device) this does not
seem right.

As noted in a response from Andrzej Hajda, the intent is
likely to make the panel dependent on the DRM device
(connector->dev) not its parent. But we have no way of
doing that since the DRM device doesn't contain any
struct device on its own (arguably it should).

Revert this until a proper approach is figured out.

Cc: Jyri Sarha <jsarha@ti.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20180927124130.9102-1-linus.walleij@linaro.org

d6a77ba0

Merge branch 'for-upstream/malidp-fixes' of git://linux-arm.org/linux-ld into drm-fixes · 576156bb

Dave Airlie authored Sep 27, 2018

Fix NV12 writeback and fix vblank reset.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Liviu Dudau <Liviu.Dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180921112354.GR936@e110455-lin.cambridge.arm.com

576156bb

Merge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux into drm-fixes · e89fe98b

Dave Airlie authored Sep 27, 2018

one fix to get a proper DMA configuration in place for the etnaviv
virtual device. I'm sending this as a fix, as a dma-mapping change at
the ARC architecture side during the 4.19 cycle broke etnaviv on this
platform, which gets remedied with this patch, but it also enables
ARM64.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas Stach <l.stach@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/ea1f712bf09bf9439c6b092bf2c2bde7bb01cf5e.camel@pengutronix.de

e89fe98b

26 Sep, 2018 4 commits

ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge · f188b99f

Mika Westerberg authored Sep 26, 2018

HP 6730b laptop has an ethernet NIC connected to one of the PCIe root
ports.  The root ports themselves are native PCIe hotplug capable.  Now,
during boot after PCI devices are scanned the BIOS triggers ACPI bus check
directly to the NIC:

  ACPI: \_SB_.PCI0.RP06.NIC_: Bus check in hotplug_event()

It is not clear why it is sending bus check but regardless the ACPI hotplug
notify handler calls enable_slot() directly (instead of going through
acpiphp_check_bridge() as there is no bridge), which ends up handling
special case for non-hotplug bridges with native PCIe hotplug.  This
results a crash of some kind but the reporter only sees black screen so it
is hard to figure out the exact spot and what actually happens.  Based on
a few fix proposals it was tracked to crash somewhere inside
pci_assign_unassigned_bridge_resources().

In any case we should not really be in that special branch at all because
the ACPI notify happened to a slot that is not a PCI bridge (it is just a
regular PCI device).

Fix this so that we only go to that special branch if we are calling
enable_slot() for a bridge (e.g., the ACPI notification was for the
bridge).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=201127
Fixes: 84c8b58e ("ACPI / hotplug / PCI: Don't scan bridges managed by native hotplug")
Reported-by: Peter Anemone <peter.anemone@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: stable@vger.kernel.org	# v4.18+

f188b99f

drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set · 337fe9f5

Jason Ekstrand authored Sep 26, 2018

We attempt to get fences earlier in the hopes that everything will
already have fences and no callbacks will be needed. If we do succeed
in getting a fence, getting one a second time will result in a duplicate
ref with no unref. This is causing memory leaks in Vulkan applications
that create a lot of fences; playing for a few hours can, apparently,
bring down the system.

Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107899Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20180926071703.15257-1-jason.ekstrand@intel.com

337fe9f5

Merge tag 'iommu-fixes-v4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · c307aaf3

Greg Kroah-Hartman authored Sep 26, 2018

Joerg writes:
  "IOMMU Fixes for Linux v4.19-rc5

   Three fixes queued up:

	- Warning fix for Rockchip IOMMU where there were IRQ handlers
	  for offlined hardware.

	- Fix for Intel VT-d because recent changes caused boot failures
	  on some machines because it tried to allocate to much
	  contiguous memory.

	- Fix for AMD IOMMU to handle eMMC devices correctly that appear
	  as ACPI HID devices."

* tag 'iommu-fixes-v4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Return devid as alias for ACPI HID devices
  iommu/vt-d: Handle memory shortage on pasid table allocation
  iommu/rockchip: Free irqs in shutdown handler

c307aaf3

iommu/amd: Return devid as alias for ACPI HID devices · 5ebb1bc2

Arindam Nath authored Sep 18, 2018

ACPI HID devices do not actually have an alias for
them in the IVRS. But dev_data->alias is still used
for indexing into the IOMMU device table for devices
being handled by the IOMMU. So for ACPI HID devices,
we simply return the corresponding devid as an alias,
as parsed from IVRS table.
Signed-off-by: Arindam Nath <arindam.nath@amd.com>
Fixes: 2bf9a0a1 ('iommu/amd: Add iommu support for ACPI HID devices')
Signed-off-by: Joerg Roedel <jroedel@suse.de>

5ebb1bc2

25 Sep, 2018 7 commits

RDMA/core: Set right entry state before releasing reference · 5c5702e2

Parav Pandit authored Sep 25, 2018

Currently add_modify_gid() for IB link layer has followong issue
in cache update path.

When GID update event occurs, core releases reference to the GID
table without updating its state and/or entry pointer.

CPU-0                              CPU-1
------                             -----
ib_cache_update()                    IPoIB ULP
   add_modify_gid()                   [..]
      put_gid_entry()
      refcnt = 0, but
      state = valid,
      entry is valid.
      (work item is not yet executed).
                                   ipoib_create_ah()
                                     rdma_create_ah()
                                        rdma_get_gid_attr() <--
                                   	Tries to acquire gid_attr
                                        which has refcnt = 0.
                                   	This is incorrect.

GID entry state and entry pointer is provides the accurate GID enty
state. Such fields must be updated with rwlock to protect against
readers and, such fields must be in sane state before refcount can drop
to zero. Otherwise above race condition can happen leading to
use-after-free situation.

Following backtrace has been observed when cache update for an IB port
is triggered while IPoIB ULP is creating an AH.

Therefore, when updating GID entry, first mark a valid entry as invalid
through state and set the barrier so that no callers can acquired
the GID entry, followed by release reference to it.

refcount_t: increment on 0; use-after-free.
WARNING: CPU: 4 PID: 29106 at lib/refcount.c:153 refcount_inc_checked+0x30/0x50
Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
RIP: 0010:refcount_inc_checked+0x30/0x50
RSP: 0018:ffff8802ad36f600 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000008 RDI: ffffffff86710100
RBP: ffff8802d6e60a30 R08: ffffed005d67bf8b R09: ffffed005d67bf8b
R10: 0000000000000001 R11: ffffed005d67bf8a R12: ffff88027620cee8
R13: ffff8802d6e60988 R14: ffff8802d6e60a78 R15: 0000000000000202
FS: 0000000000000000(0000) GS:ffff8802eb200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3ab35e5c88 CR3: 00000002ce84a000 CR4: 00000000000006e0
IPv6: ADDRCONF(NETDEV_CHANGE): ib1: link becomes ready
Call Trace:
rdma_get_gid_attr+0x220/0x310 [ib_core]
? lock_acquire+0x145/0x3a0
rdma_fill_sgid_attr+0x32c/0x470 [ib_core]
rdma_create_ah+0x89/0x160 [ib_core]
? rdma_fill_sgid_attr+0x470/0x470 [ib_core]
? ipoib_create_ah+0x52/0x260 [ib_ipoib]
ipoib_create_ah+0xf5/0x260 [ib_ipoib]
ipoib_mcast_join_complete+0xbbe/0x2540 [ib_ipoib]

Fixes: b150c386 ("IB/core: Introduce GID entry reference counts")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

5c5702e2

IB/mlx5: Destroy the DEVX object upon error flow · e8ef090a

Yishai Hadas authored Sep 25, 2018

Upon DEVX object creation the object must be destroyed upon a follows
error flow.

Fixes: 7efce369 ("IB/mlx5: Add obj create and destroy functionality")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

e8ef090a

IB/uverbs: Free uapi on destroy · a9360abd

Mark Bloch authored Sep 25, 2018

Make sure we free struct uverbs_api once we clean the radix tree. It was
allocated by uverbs_alloc_api().

Fixes: 9ed3e5f4 ("IB/uverbs: Build the specs into a radix tree at runtime")
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

a9360abd

erge tag 'libnvdimm-fixes-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · a3852318

Greg Kroah-Hartman authored Sep 25, 2018

Dan writes:
  "libnvdimm/dax for 4.19-rc6

  * (2) fixes for the dax error handling updates that were merged for
  v4.19-rc1. My mails to Al have been bouncing recently, so I do not have
  his ack but the uaccess change is of the trivial / obviously correct
  variety. The address_space_operations fixes a regression.

  * A filesystem-dax fix to correct the zero page lookup to be compatible
   with non-x86 (mips and s390) architectures."

* tag 'libnvdimm-fixes-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  device-dax: Add missing address_space_operations
  uaccess: Fix is_source param for check_copy_size() in copy_to_iter_mcsafe()
  filesystem-dax: Fix use of zero page

a3852318

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 846e8dd4

Greg Kroah-Hartman authored Sep 25, 2018

James writes:
  "SCSI fixes on 20180925

   Nine obvious bug fixes mostly in individual drivers.  The target fix
   is of particular importance because it's CVE related."

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: don't crash the host on invalid commands
  scsi: ipr: System hung while dlpar adding primary ipr adapter back
  scsi: target: iscsi: Use bin2hex instead of a re-implementation
  scsi: target: iscsi: Use hex2bin instead of a re-implementation
  scsi: lpfc: Synchronize access to remoteport via rport
  scsi: ufs: Disable blk-mq for now
  scsi: sd: Contribute to randomness when running rotational device
  scsi: ibmvscsis: Ensure partition name is properly NUL terminated
  scsi: ibmvscsis: Fix a stringop-overflow warning

846e8dd4

Merge tag 'usb-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · bfb0e9b4

Greg Kroah-Hartman authored Sep 25, 2018

I wrote:
  "USB fixes for 4.19-rc6

   Here are some small USB core and driver fixes for reported issues for
   4.19-rc6.

   The most visible is the oops fix for when the USB core is built into the
   kernel that is present in 4.18.  Turns out not many people actually do
   that so it went unnoticed for a while.  The rest is some tiny typec,
   musb, and other core fixes.

   All have been in linux-next with no reported issues."

* tag 'usb-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: mux: Take care of driver module reference counting
  usb: core: safely deal with the dynamic quirk lists
  usb: roles: Take care of driver module reference counting
  USB: handle NULL config in usb_find_alt_setting()
  USB: fix error handling in usb_driver_claim_interface()
  USB: remove LPM management from usb_driver_claim_interface()
  USB: usbdevfs: restore warning for nonsensical flags
  USB: usbdevfs: sanitize flags more
  Revert "usb: cdc-wdm: Fix a sleep-in-atomic-context bug in service_outstanding_interrupt()"
  usb: musb: dsps: do not disable CPPI41 irq in driver teardown

bfb0e9b4

Merge tag 'tty-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · ccf791e5

Greg Kroah-Hartman authored Sep 25, 2018

I wrote:
  "TTY/Serial driver fixes for 4.19-rc6

   Here are a number of small tty and serial driver fixes for reported
   issues for 4.19-rc6.

   One should hopefully resolve a much-reported issue that syzbot has found
   in the tty layer.  Although there are still more issues there, getting
   this fixed is nice to see finally happen.

   All of these have been in linux-next for a while with no reported
   issues."

* tag 'tty-4.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: imx: restore handshaking irq for imx1
  tty: vt_ioctl: fix potential Spectre v1
  tty: Drop tty->count on tty_reopen() failure
  serial: cpm_uart: return immediately from console poll
  tty: serial: lpuart: avoid leaking struct tty_struct
  serial: mvebu-uart: Fix reporting of effective CSIZE to userspace

ccf791e5