Commit 76727c02 authored by Johan Hedberg's avatar Johan Hedberg Committed by Marcel Holtmann

Bluetooth: Call drain_workqueue() before resetting state

Doing things like hci_conn_hash_flush() while holding the hdev lock is
risky since its synchronous pending work cancellation could cause the
L2CAP layer to try to reacquire the hdev lock. Right now there doesn't
seem to be any obvious places where this would for certain happen but
it's already enough to cause lockdep to start warning against the hdev
and the work struct locks being taken in the "wrong" order:

[  +0.000373] mgmt-tester/1603 is trying to acquire lock:
[  +0.000292]  ((&conn->pending_rx_work)){+.+.+.}, at: [<c104266d>] flush_work+0x0/0x181
[  +0.000270]
but task is already holding lock:
[  +0.000000]  (&hdev->lock){+.+.+.}, at: [<c13b9a80>] hci_dev_do_close+0x166/0x359
[  +0.000000]
which lock already depends on the new lock.

[  +0.000000]
the existing dependency chain (in reverse order) is:
[  +0.000000]
-> #1 (&hdev->lock){+.+.+.}:
[  +0.000000]        [<c105ea8f>] lock_acquire+0xe3/0x156
[  +0.000000]        [<c140c663>] mutex_lock_nested+0x54/0x375
[  +0.000000]        [<c13d644b>] l2cap_recv_frame+0x293/0x1a9c
[  +0.000000]        [<c13d7ca4>] process_pending_rx+0x50/0x5e
[  +0.000000]        [<c1041a3f>] process_one_work+0x21c/0x436
[  +0.000000]        [<c1041e3d>] worker_thread+0x1be/0x251
[  +0.000000]        [<c1045a22>] kthread+0x94/0x99
[  +0.000000]        [<c140f801>] ret_from_kernel_thread+0x21/0x30
[  +0.000000]
-> #0 ((&conn->pending_rx_work)){+.+.+.}:
[  +0.000000]        [<c105e158>] __lock_acquire+0xa07/0xc89
[  +0.000000]        [<c105ea8f>] lock_acquire+0xe3/0x156
[  +0.000000]        [<c1042696>] flush_work+0x29/0x181
[  +0.000000]        [<c1042864>] __cancel_work_timer+0x76/0x8f
[  +0.000000]        [<c104288c>] cancel_work_sync+0xf/0x11
[  +0.000000]        [<c13d4c18>] l2cap_conn_del+0x72/0x183
[  +0.000000]        [<c13d8953>] l2cap_disconn_cfm+0x49/0x55
[  +0.000000]        [<c13be37a>] hci_conn_hash_flush+0x7a/0xc3
[  +0.000000]        [<c13b9af6>] hci_dev_do_close+0x1dc/0x359
[  +0.012038]        [<c13bbe38>] hci_unregister_dev+0x6e/0x1a3
[  +0.000000]        [<c12d33c1>] vhci_release+0x28/0x47
[  +0.000000]        [<c10dd6a9>] __fput+0xd6/0x154
[  +0.000000]        [<c10dd757>] ____fput+0xd/0xf
[  +0.000000]        [<c1044bb2>] task_work_run+0x6b/0x8d
[  +0.000000]        [<c1001bd2>] do_notify_resume+0x3c/0x3f
[  +0.000000]        [<c140fa70>] work_notifysig+0x29/0x31
[  +0.000000]
other info that might help us debug this:

[  +0.000000]  Possible unsafe locking scenario:

[  +0.000000]        CPU0                    CPU1
[  +0.000000]        ----                    ----
[  +0.000000]   lock(&hdev->lock);
[  +0.000000]                                lock((&conn->pending_rx_work));
[  +0.000000]                                lock(&hdev->lock);
[  +0.000000]   lock((&conn->pending_rx_work));
[  +0.000000]
 *** DEADLOCK ***

Fully fixing this would require some quite heavy refactoring to change
how the hdev lock and hci_conn instances are handled together. A simpler
solution for now which this patch takes is to try ensure that the hdev
workqueue is empty before proceeding with the various cleanup calls,
including hci_conn_hash_flush().
Signed-off-by: default avatarJohan Hedberg <johan.hedberg@intel.com>
Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
parent 38da1703
...@@ -2561,6 +2561,11 @@ static int hci_dev_do_close(struct hci_dev *hdev) ...@@ -2561,6 +2561,11 @@ static int hci_dev_do_close(struct hci_dev *hdev)
if (test_bit(HCI_MGMT, &hdev->dev_flags)) if (test_bit(HCI_MGMT, &hdev->dev_flags))
cancel_delayed_work_sync(&hdev->rpa_expired); cancel_delayed_work_sync(&hdev->rpa_expired);
/* Avoid potential lockdep warnings from the *_flush() calls by
* ensuring the workqueue is empty up front.
*/
drain_workqueue(hdev->workqueue);
hci_dev_lock(hdev); hci_dev_lock(hdev);
hci_inquiry_cache_flush(hdev); hci_inquiry_cache_flush(hdev);
hci_pend_le_actions_clear(hdev); hci_pend_le_actions_clear(hdev);
...@@ -2684,6 +2689,11 @@ int hci_dev_reset(__u16 dev) ...@@ -2684,6 +2689,11 @@ int hci_dev_reset(__u16 dev)
skb_queue_purge(&hdev->rx_q); skb_queue_purge(&hdev->rx_q);
skb_queue_purge(&hdev->cmd_q); skb_queue_purge(&hdev->cmd_q);
/* Avoid potential lockdep warnings from the *_flush() calls by
* ensuring the workqueue is empty up front.
*/
drain_workqueue(hdev->workqueue);
hci_dev_lock(hdev); hci_dev_lock(hdev);
hci_inquiry_cache_flush(hdev); hci_inquiry_cache_flush(hdev);
hci_conn_hash_flush(hdev); hci_conn_hash_flush(hdev);
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment