libata: introduce ata_host->n_tags to avoid oops on SAS controllers
1871ee13 ("libata: support the ata host which implements a queue depth less than 32") directly used ata_port->scsi_host->can_queue from ata_qc_new() to determine the number of tags supported by the host; unfortunately, SAS controllers doing SATA don't initialize ->scsi_host leading to the following oops. BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 IP: [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0 PGD 0 Oops: 0002 [#1] SMP Modules linked in: isci libsas scsi_transport_sas mgag200 drm_kms_helper ttm CPU: 1 PID: 518 Comm: udevd Not tainted 3.16.0-rc6+ #62 Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013 task: ffff880c1a00b280 ti: ffff88061a000000 task.ti: ffff88061a000000 RIP: 0010:[<ffffffff814e0618>] [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0 RSP: 0018:ffff88061a003ae8 EFLAGS: 00010012 RAX: 0000000000000001 RBX: ffff88000241ca80 RCX: 00000000000000fa RDX: 0000000000000020 RSI: 0000000000000020 RDI: ffff8806194aa298 RBP: ffff88061a003ae8 R08: ffff8806194a8000 R09: 0000000000000000 R10: 0000000000000000 R11: ffff88000241ca80 R12: ffff88061ad58200 R13: ffff8806194aa298 R14: ffffffff814e67a0 R15: ffff8806194a8000 FS: 00007f3ad7fe3840(0000) GS:ffff880627620000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000058 CR3: 000000061a118000 CR4: 00000000001407e0 Stack: ffff88061a003b20 ffffffff814e96e1 ffff88000241ca80 ffff88061ad58200 ffff8800b6bf6000 ffff880c1c988000 ffff880619903850 ffff88061a003b68 ffffffffa0056ce1 ffff88061a003b48 0000000013d6e6f8 ffff88000241ca80 Call Trace: [<ffffffff814e96e1>] ata_sas_queuecmd+0xa1/0x430 [<ffffffffa0056ce1>] sas_queuecommand+0x191/0x220 [libsas] [<ffffffff8149afee>] scsi_dispatch_cmd+0x10e/0x300 [<ffffffff814a3bc5>] scsi_request_fn+0x2f5/0x550 [<ffffffff81317613>] __blk_run_queue+0x33/0x40 [<ffffffff8131781a>] queue_unplugged+0x2a/0x90 [<ffffffff8131ceb4>] blk_flush_plug_list+0x1b4/0x210 [<ffffffff8131d274>] blk_finish_plug+0x14/0x50 [<ffffffff8117eaa8>] __do_page_cache_readahead+0x198/0x1f0 [<ffffffff8117ee21>] force_page_cache_readahead+0x31/0x50 [<ffffffff8117ee7e>] page_cache_sync_readahead+0x3e/0x50 [<ffffffff81172ac6>] generic_file_read_iter+0x496/0x5a0 [<ffffffff81219897>] blkdev_read_iter+0x37/0x40 [<ffffffff811e307e>] new_sync_read+0x7e/0xb0 [<ffffffff811e3734>] vfs_read+0x94/0x170 [<ffffffff811e43c6>] SyS_read+0x46/0xb0 [<ffffffff811e33d1>] ? SyS_lseek+0x91/0xb0 [<ffffffff8171ee29>] system_call_fastpath+0x16/0x1b Code: 00 00 00 88 50 29 83 7f 08 01 19 d2 83 e2 f0 83 ea 50 88 50 34 c6 81 1d 02 00 00 40 c6 81 17 02 00 00 00 5d c3 66 0f 1f 44 00 00 <89> 14 25 58 00 00 00 Fix it by introducing ata_host->n_tags which is initialized to ATA_MAX_QUEUE - 1 in ata_host_init() for SAS controllers and set to scsi_host_template->can_queue in ata_host_register() for !SAS ones. As SAS hosts are never registered, this will give them the same ATA_MAX_QUEUE - 1 as before. Note that we can't use scsi_host->can_queue directly for SAS hosts anyway as they can go higher than the libata maximum. Signed-off-by:Tejun Heo <tj@kernel.org> Reported-by:
Mike Qiu <qiudayu@linux.vnet.ibm.com> Reported-by:
Jesse Brandeburg <jesse.brandeburg@gmail.com> Reported-by:
Peter Hurley <peter@hurleysoftware.com> Reported-by:
Peter Zijlstra <peterz@infradead.org> Tested-by:
Alexey Kardashevskiy <aik@ozlabs.ru> Fixes: 1871ee13 ("libata: support the ata host which implements a queue depth less than 32") Cc: Kevin Hao <haokexin@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: stable@vger.kernel.org ahci: add support for the Promise FastTrak TX8660 SATA HBA (ahci mode) Add support of the Promise FastTrak TX8660 SATA HBA in ahci mode by registering the board in the ahci_pci_tbl[]. Note: this HBA also provide a hardware RAID mode when activated in BIOS but specific drivers from the manufacturer are required in this case. Signed-off-by:
Romain Degez <romain.degez@gmail.com> Tested-by:
Romain Degez <romain.degez@gmail.com> Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org drivers/ata/pata_ep93xx.c: use signed int type for result of platform_get_irq() [linux-3.16-rc5/drivers/ata/pata_ep93xx.c:929]: (style) Checking if unsigned variable 'irq' is less than zero. Source code is irq = platform_get_irq(pdev, 0); if (irq < 0) { but unsigned int irq; $ fgrep platform_get_irq `find . -name \*.h -print` ./include/linux/platform_device.h:extern int platform_get_irq(struct platform_device *, unsigned int); Now using "int" type instead of "unsigned int" for "irq" variable. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=80401Reported-by:
David Binderman <dcb314@hotmail.com> Signed-off-by:
Andrey Utkin <andrey.krieger.utkin@gmail.com> Signed-off-by:
Tejun Heo <tj@kernel.org> libata: EH should handle AMNF error condition as a media error libata-eh.c should handle AMNF error condition (error byte bit 0, usually code 0x01) in libata-eh.c along with UNC as a media error so SCSI stack can handle it properly (translation code 0x01 is already present in libata-scsi.c) but was never passed down due to lack of handling in EH. While using linux-based machine (AMD 6550M-based notebook, PCI IDs for the controller are 1022:7801 subsys 1025:059d) and ddrescue to salvage data from failing hard drive (WD7500BPVT 2.5" 750G SATA2), I've found that pure AMNF 0x01 error code generates generic "device error" that is retried several times by SCSI stack instead of "media error" that is passed up to software. So we may assume deprecated AMNF error code is surely not dead yet, and it's better for it to be handled properly. As we may see it is used by modern enough devices, and used properly: drive returned AMNF only when IDs for track cannot be read completely due to dying head or positioning, otherwise it returned UNC(orrectables). Not handling it causes wrong generic error code ("device error") reporting down the stack, can damage failing drives further because of excessive retries, and slows salvaging down a lot. Also, there is handling code in libata-scsi.c for 0x01 AMNF error already. https://bugzilla.kernel.org/show_bug.cgi?id=80031 tj: Shortened $SUBJ and moved its content to the first paragraph. Signed-off-by:
Alexey Asemov <alex@alex-at.ru> Signed-off-by:
Tejun Heo <tj@kernel.org> libata: support the ata host which implements a queue depth less than 32 The sata on fsl mpc8315e is broken after the commit 8a4aeec8 ("libata/ahci: accommodate tag ordered controllers"). The reason is that the ata controller on this SoC only implement a queue depth of 16. When issuing the commands in tag order, all the commands in tag 16 ~ 31 are mapped to tag 0 unconditionally and then causes the sata malfunction. It makes no senses to use a 32 queue in software while the hardware has less queue depth. So consider the queue depth implemented by the hardware when requesting a command tag. Fixes: 8a4aeec8 ("libata/ahci: accommodate tag ordered controllers") Cc: stable@vger.kernel.org Signed-off-by:
Kevin Hao <haokexin@gmail.com> Acked-by:
Dan Williams <dan.j.williams@intel.com> Signed-off-by:
Tejun Heo <tj@kernel.org> (cherry picked from commit 1a112d10 1871ee13) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
Showing
Please register or sign in to comment