1. 07 Jun, 2017 40 commits
    • Alex Deucher's avatar
      drm/amd/powerplay/smu7: add vblank check for mclk switching (v2) · 662dbfcc
      Alex Deucher authored
      commit 09be4a52 upstream.
      
      Check to make sure the vblank period is long enough to support
      mclk switching.
      
      v2: drop needless initial assignment (Nils)
      
      bug: https://bugs.freedesktop.org/show_bug.cgi?id=96868Acked-by: default avatarChristian König <christian.koenig@amd.com>
      Reviewed-by: default avatarRex Zhu <Rex.Zhu@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      662dbfcc
    • Ming Lei's avatar
      nvme: avoid to use blk_mq_abort_requeue_list() · a8aa8a0c
      Ming Lei authored
      commit 986f75c8 upstream.
      
      NVMe may add request into requeue list simply and not kick off the
      requeue if hw queues are stopped. Then blk_mq_abort_requeue_list()
      is called in both nvme_kill_queues() and nvme_ns_remove() for
      dealing with this issue.
      
      Unfortunately blk_mq_abort_requeue_list() is absolutely a
      race maker, for example, one request may be requeued during
      the aborting. So this patch just calls blk_mq_kick_requeue_list() in
      nvme_kill_queues() to handle this issue like what nvme_start_queues()
      does. Now all requests in requeue list when queues are stopped will be
      handled by blk_mq_kick_requeue_list() when queues are restarted, either
      in nvme_start_queues() or in nvme_kill_queues().
      Reported-by: default avatarZhang Yi <yizhan@redhat.com>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8aa8a0c
    • Ming Lei's avatar
      nvme: use blk_mq_start_hw_queues() in nvme_kill_queues() · 20c03f45
      Ming Lei authored
      commit 806f026f upstream.
      
      Inside nvme_kill_queues(), we have to start hw queues for
      draining requests in sw queues, .dispatch list and requeue list,
      so use blk_mq_start_hw_queues() instead of blk_mq_start_stopped_hw_queues()
      which only run queues if queues are stopped, but the queues may have
      been started already, for example nvme_start_queues() is called in reset work
      function.
      
      blk_mq_start_hw_queues() run hw queues in current context, instead
      of running asynchronously like before. Given nvme_kill_queues() is
      run from either remove context or reset worker context, both are fine
      to run hw queue directly. And the mutex of namespaces_mutex isn't a
      problem too becasue nvme_start_freeze() runs hw queue in this way
      already.
      Reported-by: default avatarZhang Yi <yizhan@redhat.com>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      Reviewed-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      20c03f45
    • Marta Rybczynska's avatar
      nvme-rdma: support devices with queue size < 32 · 0fe9c551
      Marta Rybczynska authored
      commit 0544f549 upstream.
      
      In the case of small NVMe-oF queue size (<32) we may enter a deadlock
      caused by the fact that the IB completions aren't sent waiting for 32
      and the send queue will fill up.
      
      The error is seen as (using mlx5):
      [ 2048.693355] mlx5_0:mlx5_ib_post_send:3765:(pid 7273):
      [ 2048.693360] nvme nvme1: nvme_rdma_post_send failed with error code -12
      
      This patch changes the way the signaling is done so that it depends on
      the queue depth now. The magic define has been removed completely.
      Signed-off-by: default avatarMarta Rybczynska <marta.rybczynska@kalray.eu>
      Signed-off-by: default avatarSamuel Jones <sjones@kalray.eu>
      Acked-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0fe9c551
    • Jason Gerecke's avatar
      HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference · f88d3d6e
      Jason Gerecke authored
      commit 2ac97f0f upstream.
      
      The following Smatch complaint was generated in response to commit
      2a6cdbdd ("HID: wacom: Introduce new 'touch_input' device"):
      
          drivers/hid/wacom_wac.c:1586 wacom_tpc_irq()
                   error: we previously assumed 'wacom->touch_input' could be null (see line 1577)
      
      The 'touch_input' and 'pen_input' variables point to the 'struct input_dev'
      used for relaying touch and pen events to userspace, respectively. If a
      device does not have a touch interface or pen interface, the associated
      input variable is NULL. The 'wacom_tpc_irq()' function is responsible for
      forwarding input reports to a more-specific IRQ handler function. An
      unknown report could theoretically be mistaken as e.g. a touch report
      on a device which does not have a touch interface. This can be prevented
      by only calling the pen/touch functions are called when the pen/touch
      pointers are valid.
      
      Fixes: 2a6cdbdd ("HID: wacom: Introduce new 'touch_input' device")
      Signed-off-by: default avatarJason Gerecke <jason.gerecke@wacom.com>
      Reviewed-by: default avatarPing Cheng <ping.cheng@wacom.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f88d3d6e
    • Bryant G. Ly's avatar
      ibmvscsis: Fix the incorrect req_lim_delta · 8d975ebd
      Bryant G. Ly authored
      commit 75dbf2d3 upstream.
      
      The current code is not correctly calculating the req_lim_delta.
      
      We want to make sure vscsi->credit is always incremented when
      we do not send a response for the scsi op. Thus for the case where
      there is a successfully aborted task we need to make sure the
      vscsi->credit is incremented.
      
      v2 - Moves the original location of the vscsi->credit increment
      to a better spot. Since if we increment credit, the next command
      we send back will have increased req_lim_delta. But we probably
      shouldn't be doing that until the aborted cmd is actually released.
      Otherwise the client will think that it can send a new command, and
      we could find ourselves short of command elements. Not likely, but could
      happen.
      
      This patch depends on both:
      commit 25e78531 ("ibmvscsis: Do not send aborted task response")
      commit 98883f1b ("ibmvscsis: Clear left-over abort_cmd pointers")
      Signed-off-by: default avatarBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Reviewed-by: default avatarMichael Cyr <mikecyr@linux.vnet.ibm.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d975ebd
    • Bryant G. Ly's avatar
      ibmvscsis: Clear left-over abort_cmd pointers · e920be83
      Bryant G. Ly authored
      commit 98883f1b upstream.
      
      With the addition of ibmvscsis->abort_cmd pointer within
      commit 25e78531 ("ibmvscsis: Do not send aborted task response"),
      make sure to explicitly NULL these pointers when clearing
      DELAY_SEND flag.
      
      Do this for two cases, when getting the new new ibmvscsis
      descriptor in ibmvscsis_get_free_cmd() and before posting
      the response completion in ibmvscsis_send_messages().
      Signed-off-by: default avatarBryant G. Ly <bryantly@linux.vnet.ibm.com>
      Reviewed-by: default avatarMichael Cyr <mikecyr@linux.vnet.ibm.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e920be83
    • Artem Savkov's avatar
      scsi: scsi_dh_rdac: Use ctlr directly in rdac_failover_get() · 1fb66c6a
      Artem Savkov authored
      commit 0648a07c upstream.
      
      rdac_failover_get references struct rdac_controller as
      ctlr->ms_sdev->handler_data->ctlr for no apparent reason. Besides being
      inefficient this also introduces a null-pointer dereference as
      send_mode_select() sets ctlr->ms_sdev to NULL before calling
      rdac_failover_get():
      
      [   18.432550] device-mapper: multipath service-time: version 0.3.0 loaded
      [   18.436124] BUG: unable to handle kernel NULL pointer dereference at 0000000000000790
      [   18.436129] IP: send_mode_select+0xca/0x560
      [   18.436129] PGD 0
      [   18.436130] P4D 0
      [   18.436130]
      [   18.436132] Oops: 0000 [#1] SMP
      [   18.436133] Modules linked in: dm_service_time sd_mod dm_multipath amdkfd amd_iommu_v2 radeon(+) i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm qla2xxx drm serio_raw scsi_transport_fc bnx2 i2c_core dm_mirror dm_region_hash dm_log dm_mod
      [   18.436143] CPU: 4 PID: 443 Comm: kworker/u16:2 Not tainted 4.12.0-rc1.1.el7.test.x86_64 #1
      [   18.436144] Hardware name: IBM BladeCenter LS22 -[79013SG]-/Server Blade, BIOS -[L8E164AUS-1.07]- 05/25/2011
      [   18.436145] Workqueue: kmpath_rdacd send_mode_select
      [   18.436146] task: ffff880225116a40 task.stack: ffffc90002bd8000
      [   18.436148] RIP: 0010:send_mode_select+0xca/0x560
      [   18.436148] RSP: 0018:ffffc90002bdbda8 EFLAGS: 00010246
      [   18.436149] RAX: 0000000000000000 RBX: ffffc90002bdbe08 RCX: ffff88017ef04a80
      [   18.436150] RDX: ffffc90002bdbe08 RSI: ffff88017ef04a80 RDI: ffff8802248e4388
      [   18.436151] RBP: ffffc90002bdbe48 R08: 0000000000000000 R09: ffffffff81c104c0
      [   18.436151] R10: 00000000000001ff R11: 000000000000035a R12: ffffc90002bdbdd8
      [   18.436152] R13: ffff8802248e4390 R14: ffff880225152800 R15: ffff8802248e4400
      [   18.436153] FS:  0000000000000000(0000) GS:ffff880227d00000(0000) knlGS:0000000000000000
      [   18.436154] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   18.436154] CR2: 0000000000000790 CR3: 000000042535b000 CR4: 00000000000006e0
      [   18.436155] Call Trace:
      [   18.436159]  ? rdac_activate+0x14e/0x150
      [   18.436161]  ? refcount_dec_and_test+0x11/0x20
      [   18.436162]  ? kobject_put+0x1c/0x50
      [   18.436165]  ? scsi_dh_activate+0x6f/0xd0
      [   18.436168]  process_one_work+0x149/0x360
      [   18.436170]  worker_thread+0x4d/0x3c0
      [   18.436172]  kthread+0x109/0x140
      [   18.436173]  ? rescuer_thread+0x380/0x380
      [   18.436174]  ? kthread_park+0x60/0x60
      [   18.436176]  ret_from_fork+0x2c/0x40
      [   18.436177] Code: 49 c7 46 20 00 00 00 00 4c 89 ef c6 07 00 0f 1f 40 00 45 31 ed c7 45 b0 05 00 00 00 44 89 6d b4 4d 89 f5 4c 8b 75 a8 49 8b 45 20 <48> 8b b0 90 07 00 00 48 8b 56 10 8b 42 10 48 8d 7a 28 85 c0 0f
      [   18.436192] RIP: send_mode_select+0xca/0x560 RSP: ffffc90002bdbda8
      [   18.436192] CR2: 0000000000000790
      [   18.436198] ---[ end trace 40f3e4dca1ffabdd ]---
      [   18.436199] Kernel panic - not syncing: Fatal exception
      [   18.436222] Kernel Offset: disabled
      [-- MARK -- Thu May 18 11:45:00 2017]
      
      Fixes: 32782557 scsi_dh_rdac: switch to scsi_execute_req_flags()
      Signed-off-by: default avatarArtem Savkov <asavkov@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1fb66c6a
    • Nicholas Bellinger's avatar
      iscsi-target: Fix initial login PDU asynchronous socket close OOPs · 14ba7893
      Nicholas Bellinger authored
      commit 25cdda95 upstream.
      
      This patch fixes a OOPs originally introduced by:
      
         commit bb048357
         Author: Nicholas Bellinger <nab@linux-iscsi.org>
         Date:   Thu Sep 5 14:54:04 2013 -0700
      
         iscsi-target: Add sk->sk_state_change to cleanup after TCP failure
      
      which would trigger a NULL pointer dereference when a TCP connection
      was closed asynchronously via iscsi_target_sk_state_change(), but only
      when the initial PDU processing in iscsi_target_do_login() from iscsi_np
      process context was blocked waiting for backend I/O to complete.
      
      To address this issue, this patch makes the following changes.
      
      First, it introduces some common helper functions used for checking
      socket closing state, checking login_flags, and atomically checking
      socket closing state + setting login_flags.
      
      Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
      connection has dropped via iscsi_target_sk_state_change(), but the
      initial PDU processing within iscsi_target_do_login() in iscsi_np
      context is still running.  For this case, it sets LOGIN_FLAGS_CLOSED,
      but doesn't invoke schedule_delayed_work().
      
      The original NULL pointer dereference case reported by MNC is now handled
      by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
      transitioning to FFP to determine when the socket has already closed,
      or iscsi_target_start_negotiation() if the login needs to exchange
      more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
      closed.  For both of these cases, the cleanup up of remaining connection
      resources will occur in iscsi_target_start_negotiation() from iscsi_np
      process context once the failure is detected.
      
      Finally, to handle to case where iscsi_target_sk_state_change() is
      called after the initial PDU procesing is complete, it now invokes
      conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
      existing iscsi_target_sk_check_close() checks detect connection failure.
      For this case, the cleanup of remaining connection resources will occur
      in iscsi_target_do_login_rx() from delayed workqueue process context
      once the failure is detected.
      Reported-by: default avatarMike Christie <mchristi@redhat.com>
      Reviewed-by: default avatarMike Christie <mchristi@redhat.com>
      Tested-by: default avatarMike Christie <mchristi@redhat.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Reported-by: default avatarHannes Reinecke <hare@suse.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Varun Prakash <varun@chelsio.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      14ba7893
    • Jiang Yi's avatar
      iscsi-target: Always wait for kthread_should_stop() before kthread exit · c732f308
      Jiang Yi authored
      commit 5e0cf5e6 upstream.
      
      There are three timing problems in the kthread usages of iscsi_target_mod:
      
       - np_thread of struct iscsi_np
       - rx_thread and tx_thread of struct iscsi_conn
      
      In iscsit_close_connection(), it calls
      
       send_sig(SIGINT, conn->tx_thread, 1);
       kthread_stop(conn->tx_thread);
      
      In conn->tx_thread, which is iscsi_target_tx_thread(), when it receive
      SIGINT the kthread will exit without checking the return value of
      kthread_should_stop().
      
      So if iscsi_target_tx_thread() exit right between send_sig(SIGINT...)
      and kthread_stop(...), the kthread_stop() will try to stop an already
      stopped kthread.
      
      This is invalid according to the documentation of kthread_stop().
      
      (Fix -ECONNRESET logout handling in iscsi_target_tx_thread and
       early iscsi_target_rx_thread failure case - nab)
      Signed-off-by: default avatarJiang Yi <jiangyilism@gmail.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c732f308
    • Long Li's avatar
      scsi: zero per-cmd private driver data for each MQ I/O · a168ac5b
      Long Li authored
      commit 1bad6c4a upstream.
      
      In lower layer driver's (LLD) scsi_host_template, the driver may
      optionally ask SCSI to allocate its private driver memory for each
      command, by specifying cmd_size. This memory is allocated at the end of
      scsi_cmnd by SCSI.  Later when SCSI queues a command, the LLD can use
      scsi_cmd_priv to get to its private data.
      
      Some LLD, e.g. hv_storvsc, doesn't clear its private data before use. In
      this case, the LLD may get to stale or uninitialized data in its private
      driver memory. This may result in unexpected driver and hardware
      behavior.
      
      Fix this problem by also zeroing the private driver memory before
      passing them to LLD.
      Signed-off-by: default avatarLong Li <longli@microsoft.com>
      Reviewed-by: default avatarBart Van Assche <Bart.VanAssche@sandisk.com>
      Reviewed-by: default avatarKY Srinivasan <kys@microsoft.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a168ac5b
    • Srinath Mannam's avatar
      mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read · 21f8aa4c
      Srinath Mannam authored
      commit f5f968f2 upstream.
      
      The stingray SDHCI hardware supports ACMD12 and automatically
      issues after multi block transfer completed.
      
      If ACMD12 in SDHCI is disabled, spurious tx done interrupts are seen
      on multi block read command with below error message:
      
      Got data interrupt 0x00000002 even though no data
      operation was in progress.
      
      This patch uses SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12 to enable
      ACM12 support in SDHCI hardware and suppress spurious interrupt.
      Signed-off-by: default avatarSrinath Mannam <srinath.mannam@broadcom.com>
      Reviewed-by: default avatarRay Jui <ray.jui@broadcom.com>
      Reviewed-by: default avatarScott Branden <scott.branden@broadcom.com>
      Acked-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Fixes: b580c52d ("mmc: sdhci-iproc: add IPROC SDHCI driver")
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21f8aa4c
    • Benjamin Tissoires's avatar
      Revert "ACPI / button: Change default behavior to lid_init_state=open" · 4c5681af
      Benjamin Tissoires authored
      commit 878d8db0 upstream.
      
      Revert commit 77e9a4aa (ACPI / button: Change default behavior to
      lid_init_state=open) which changed the kernel's behavior on laptops
      that boot with closed lids and expect the lid switch state to be
      reported accurately by the kernel.
      
      If you boot or resume your laptop with the lid closed on a docking
      station while using an external monitor connected to it, both internal
      and external displays will light on, while only the external should.
      
      There is a design choice in gdm to only provide the greeter on the
      internal display when lit on, so users only see a gray area on the
      external monitor. Also, the cursor will not show up as it's by
      default on the internal display too.
      
      To "fix" that, users have to open the laptop once and close it once
      again to sync the state of the switch with the hardware state.
      
      Even if the "method" operation mode implementation can be buggy on
      some platforms, the "open" choice is worse.  It breaks docking
      stations basically and there is no way to have a user-space hwdb to
      fix that.
      
      On the contrary, it's rather easy in user-space to have a hwdb
      with the problematic platforms. Then,  libinput (1.7.0+) can fix
      the state of the lid switch for us: you need to set the udev
      property LIBINPUT_ATTR_LID_SWITCH_RELIABILITY to 'write_open'.
      
      When libinput detects internal keyboard events, it will overwrite the
      state of the switch to open, making it reliable again.  Given that
      logind only checks the lid switch value after a timeout, we can
      assume the user will use the internal keyboard before this timeout
      expires.
      
      For example, such a hwdb entry is:
      
      libinput:name:*Lid Switch*:dmi:*svnMicrosoftCorporation:pnSurface3:*
       LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=write_open
      
      Link: https://bugzilla.gnome.org/show_bug.cgi?id=782380Signed-off-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c5681af
    • Lv Zheng's avatar
      ACPICA: Tables: Fix regression introduced by a too early mechanism enabling · 5d0e4205
      Lv Zheng authored
      commit 2ea65321 upstream.
      
      In the Linux kernel, acpi_get_table() "clones" haven't been fully
      balanced by acpi_put_table() invocations.  In upstream ACPICA, due to
      the design change, there are also unbalanced acpi_get_table_by_index()
      invocations requiring special care.
      
      acpi_get_table() reference counting mismatches may occor due to that
      and printing error messages related to them is not useful at this
      point.  The strict balanced validation count check should only be
      enabled after confirming that all invocations are safe and aligned
      with their designed purposes.
      
      Thus this patch removes the error value returned by acpi_tb_get_table()
      in that case along with the accompanying error message to fix the
      issue.
      
      Fixes: 174cc718 (ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel)
      Reported-by: default avatarAnush Seetharaman <anush.seetharaman@intel.com>
      Reported-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      [ rjw: Changelog ]
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d0e4205
    • Dan Williams's avatar
      ACPI / sysfs: fix acpi_get_table() leak / acpi-sysfs denial of service · 34211cbf
      Dan Williams authored
      commit 0de0e198 upstream.
      
      Reading an ACPI table through the /sys/firmware/acpi/tables interface
      more than 65,536 times leads to the following log message:
      
       ACPI Error: Table ffff88033595eaa8, Validation count is zero after increment
        (20170119/tbutils-423)
      
      ...and the table being unavailable until the next reboot. Add the
      missing acpi_put_table() so the table ->validation_count is decremented
      after each read.
      Reported-by: default avatarAnush Seetharaman <anush.seetharaman@intel.com>
      Fixes: 174cc718 "ACPICA: Tables: Back port acpi_get_table_with_size() ..."
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34211cbf
    • Vishal Verma's avatar
      acpi, nfit: Fix the memory error check in nfit_handle_mce() · 93da4e6c
      Vishal Verma authored
      commit fc08a470 upstream.
      
      The check for an MCE being a memory error in the NFIT mce handler was
      bogus. Use the new mce_is_memory_error() helper to detect the error
      properly.
      Reported-by: default avatarTony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/20170519093915.15413-3-bp@alien8.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      93da4e6c
    • Borislav Petkov's avatar
      x86/MCE: Export memory_error() · 9183980a
      Borislav Petkov authored
      commit 2d1f4061 upstream.
      
      Export the function which checks whether an MCE is a memory error to
      other users so that we can reuse the logic. Drop the boot_cpu_data use,
      while at it, as mce.cpuvendor already has the CPU vendor in there.
      
      Integrate a piece from a patch from Vishal Verma
      <vishal.l.verma@intel.com> to export it for modules (nfit).
      
      The main reason we're exporting it is that the nfit handler
      nfit_handle_mce() needs to detect a memory error properly before doing
      its recovery actions.
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Link: http://lkml.kernel.org/r/20170519093915.15413-2-bp@alien8.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9183980a
    • Lv Zheng's avatar
      Revert "ACPI / button: Remove lid_init_state=method mode" · 8f8dca3c
      Lv Zheng authored
      commit f369fdf4 upstream.
      
      This reverts commit ecb10b69.
      
      The only expected ACPI control method lid device's usage model is
      
       1. Listen to the lid notification,
       2. Evaluate _LID after being notified by BIOS,
       3. Suspend the system (if users configure to do so) after seeing "close".
      
      It's not ensured that BIOS will notify OS after boot/resume, and
      it's not ensured that BIOS will always generate "open" event upon
      opening the lid.
      
      But there are 2 wrong usage models:
      
       1. When the lid device is responsible for suspend/resume the system,
          userspace requires to see "open" event to be paired with "close" after
          the system is resumed, or it will suspend the system again.
      
       2. When an external monitor connects to the laptop attached docks,
          userspace requires to see "close" event after the system is resumed so
          that it can determine whether the internal display should remain dark
          and the external display should be lit on.
      
      After we made default kernel behavior to be suitable for usage model 1,
      users of usage model 2 start to report regressions for such behavior
      change.
      
      Reversion of button.lid_init_state=method doesn't actually reverts to old
      default behavior as doing so can enter a regression loop, but facilitates
      users to work the reported regressions around with
      button.lid_init_state=method.
      
      Fixes: ecb10b69 (ACPI / button: Remove lid_init_state=method mode)
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=195455
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1430259Tested-by: default avatarSteffen Weber <steffen.weber@gmail.com>
      Tested-by: default avatarJulian Wiedmann <julian.wiedmann@jwi.name>
      Reported-by: default avatarJoachim Frieben <jfrieben@hotmail.com>
      Signed-off-by: default avatarLv Zheng <lv.zheng@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f8dca3c
    • Herbert Xu's avatar
      crypto: skcipher - Add missing API setkey checks · f5eef8d2
      Herbert Xu authored
      commit 9933e113 upstream.
      
      The API setkey checks for key sizes and alignment went AWOL during the
      skcipher conversion.  This patch restores them.
      
      Fixes: 4e6c3df4 ("crypto: skcipher - Add low-level skcipher...")
      Reported-by: default avatarBaozeng <sploving1@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5eef8d2
    • Sebastian Reichel's avatar
      i2c: i2c-tiny-usb: fix buffer not being DMA capable · 2da75188
      Sebastian Reichel authored
      commit 5165da59 upstream.
      
      Since v4.9 i2c-tiny-usb generates the below call trace
      and longer works, since it can't communicate with the
      USB device. The reason is, that since v4.9 the USB
      stack checks, that the buffer it should transfer is DMA
      capable. This was a requirement since v2.2 days, but it
      usually worked nevertheless.
      
      [   17.504959] ------------[ cut here ]------------
      [   17.505488] WARNING: CPU: 0 PID: 93 at drivers/usb/core/hcd.c:1587 usb_hcd_map_urb_for_dma+0x37c/0x570
      [   17.506545] transfer buffer not dma capable
      [   17.507022] Modules linked in:
      [   17.507370] CPU: 0 PID: 93 Comm: i2cdetect Not tainted 4.11.0-rc8+ #10
      [   17.508103] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [   17.509039] Call Trace:
      [   17.509320]  ? dump_stack+0x5c/0x78
      [   17.509714]  ? __warn+0xbe/0xe0
      [   17.510073]  ? warn_slowpath_fmt+0x5a/0x80
      [   17.510532]  ? nommu_map_sg+0xb0/0xb0
      [   17.510949]  ? usb_hcd_map_urb_for_dma+0x37c/0x570
      [   17.511482]  ? usb_hcd_submit_urb+0x336/0xab0
      [   17.511976]  ? wait_for_completion_timeout+0x12f/0x1a0
      [   17.512549]  ? wait_for_completion_timeout+0x65/0x1a0
      [   17.513125]  ? usb_start_wait_urb+0x65/0x160
      [   17.513604]  ? usb_control_msg+0xdc/0x130
      [   17.514061]  ? usb_xfer+0xa4/0x2a0
      [   17.514445]  ? __i2c_transfer+0x108/0x3c0
      [   17.514899]  ? i2c_transfer+0x57/0xb0
      [   17.515310]  ? i2c_smbus_xfer_emulated+0x12f/0x590
      [   17.515851]  ? _raw_spin_unlock_irqrestore+0x11/0x20
      [   17.516408]  ? i2c_smbus_xfer+0x125/0x330
      [   17.516876]  ? i2c_smbus_xfer+0x125/0x330
      [   17.517329]  ? i2cdev_ioctl_smbus+0x1c1/0x2b0
      [   17.517824]  ? i2cdev_ioctl+0x75/0x1c0
      [   17.518248]  ? do_vfs_ioctl+0x9f/0x600
      [   17.518671]  ? vfs_write+0x144/0x190
      [   17.519078]  ? SyS_ioctl+0x74/0x80
      [   17.519463]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
      [   17.519959] ---[ end trace d047c04982f5ac50 ]---
      Signed-off-by: default avatarSebastian Reichel <sebastian.reichel@collabora.co.uk>
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: default avatarTill Harbaum <till@harbaum.org>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2da75188
    • Ard Biesheuvel's avatar
      drivers/tty: 8250: only call fintek_8250_probe when doing port I/O · e4bab31c
      Ard Biesheuvel authored
      commit 4c4fc909 upstream.
      
      Commit fa01e2ca ("serial: 8250: Integrate Fintek into 8250_base")
      modified the probing logic for PNP0501 devices, to remove a collision
      between the generic 16550A driver and the Fintek driver, which reused
      the same ACPI _HID.
      
      The Fintek device probe is now incorporated into the common 8250 probe
      path, and gets called for all discovered 16550A compatible devices,
      including ones that are MMIO mapped rather than IO mapped. However,
      the Fintek driver assumes the port base is a I/O address, and proceeds
      to probe some arbitrary offsets above it.
      
      This is generally a wrong thing to do, but on ARM systems (having no
      native port I/O), this may result in faulting accesses of completely
      unrelated MMIO regions in the PCI I/O space. Given that this is at
      serial probe time, this results in hard to diagnose crashes at boot.
      
      So let's restrict the Fintek probe to devices that we know are using
      port I/O in the first place.
      
      Fixes: fa01e2ca ("serial: 8250: Integrate Fintek into 8250_base")
      Suggested-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarRicardo Ribalda <ricardo.ribalda@gmail.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4bab31c
    • Johan Hovold's avatar
      serdev: fix tty-port client deregistration · 84ac7693
      Johan Hovold authored
      commit aee5da78 upstream.
      
      The port client data must be set when registering the serdev controller
      or client deregistration will fail (and the serdev devices are left
      registered and allocated) if the port was never opened in between.
      
      Make sure to clear the port client data on any probe errors to avoid a
      use-after-free when the client is later deregistered unconditionally
      (e.g. in a tty-port deregistration helper).
      
      Also move port client operation initialisation to registration. Note
      that the client ops must be restored on failed probe.
      
      Fixes: bed35c6d ("serdev: add a tty port controller driver")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Reviewed-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84ac7693
    • Johan Hovold's avatar
      Revert "tty_port: register tty ports with serdev bus" · 427fa8e3
      Johan Hovold authored
      commit d3ba126a upstream.
      
      This reverts commit 8ee3fde0.
      
      The new serdev bus hooked into the tty layer in
      tty_port_register_device() by registering a serdev controller instead of
      a tty device whenever a serdev client is present, and by deregistering
      the controller in the tty-port destructor. This is broken in several
      ways:
      
      Firstly, it leads to a NULL-pointer dereference whenever a tty driver
      later deregisters its devices as no corresponding character device will
      exist.
      
      Secondly, far from every tty driver uses tty-port refcounting (e.g.
      serial core) so the serdev devices might never be deregistered or
      deallocated.
      
      Thirdly, deregistering at tty-port destruction is too late as the
      underlying device and structures may be long gone by then. A port is not
      released before an open tty device is closed, something which a
      registered serdev client can prevent from ever happening. A driver
      callback while the device is gone typically also leads to crashes.
      
      Many tty drivers even keep their ports around until the driver is
      unloaded (e.g. serial core), something which even if a late callback
      never happens, leads to leaks if a device is unbound from its driver and
      is later rebound.
      
      The right solution here is to add a new tty_port_unregister_device()
      helper and to never call tty_device_unregister() whenever the port has
      been claimed by serdev, but since this requires modifying just about
      every tty driver (and multiple subsystems) it will need to be done
      incrementally.
      
      Reverting the offending patch is the first step in fixing the broken
      lifetime assumptions. A follow-up patch will add a new pair of
      tty-device registration helpers, which a vetted tty driver can use to
      support serdev (initially serial core). When every tty driver uses the
      serdev helpers (at least for deregistration), we can add serdev
      registration to tty_port_register_device() again.
      
      Note that this also fixes another issue with serdev, which currently
      allocates and registers a serdev controller for every tty device
      registered using tty_port_device_register() only to immediately
      deregister and deallocate it when the corresponding OF node or serdev
      child node is missing. This should be addressed before enabling serdev
      for hot-pluggable buses.
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Reviewed-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      427fa8e3
    • Jeremy Kerr's avatar
      powerpc/spufs: Fix hash faults for kernel regions · baa4d411
      Jeremy Kerr authored
      commit d75e4919 upstream.
      
      Commit ac29c640 ("powerpc/mm: Replace _PAGE_USER with
      _PAGE_PRIVILEGED") swapped _PAGE_USER for _PAGE_PRIVILEGED, and
      introduced check_pte_access() which denied kernel access to
      non-_PAGE_PRIVILEGED pages.
      
      However, it didn't add _PAGE_PRIVILEGED to the hash fault handler
      for spufs' kernel accesses, so the DMAs required to establish SPE
      memory no longer work.
      
      This change adds _PAGE_PRIVILEGED to the hash fault handler for
      kernel accesses.
      
      Fixes: ac29c640 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED")
      Signed-off-by: default avatarJeremy Kerr <jk@ozlabs.org>
      Reported-by: default avatarSombat Tragolgosol <sombat3960@gmail.com>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      baa4d411
    • Michael Neuling's avatar
      powerpc: Fix booting P9 hash with CONFIG_PPC_RADIX_MMU=N · 919c7173
      Michael Neuling authored
      commit d957fb4d upstream.
      
      Currently if you disable CONFIG_PPC_RADIX_MMU you'll crash on boot on
      a P9. This is because we still set MMU_FTR_TYPE_RADIX via
      ibm,pa-features and MMU_FTR_TYPE_RADIX is what's used for code patching
      in much of the asm code (ie. slb_miss_realmode)
      
      This patch fixes the problem by stopping MMU_FTR_TYPE_RADIX from being
      set from ibm.pa-features.
      
      We may eventually end up removing the CONFIG_PPC_RADIX_MMU option
      completely but until then this fixes the issue.
      
      Fixes: 17a3dd2f ("powerpc/mm/radix: Use firmware feature to enable Radix MMU")
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      919c7173
    • Richard Narron's avatar
      fs/ufs: Set UFS default maximum bytes per file · 72351ac5
      Richard Narron authored
      commit 239e250e upstream.
      
      This fixes a problem with reading files larger than 2GB from a UFS-2
      file system:
      
          https://bugzilla.kernel.org/show_bug.cgi?id=195721
      
      The incorrect UFS s_maxsize limit became a problem as of commit
      c2a9737f ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
      which started using s_maxbytes to avoid a page index overflow in
      do_generic_file_read().
      
      That caused files to be truncated on UFS-2 file systems because the
      default maximum file size is 2GB (MAX_NON_LFS) and UFS didn't update it.
      
      Here I simply increase the default to a common value used by other file
      systems.
      Signed-off-by: default avatarRichard Narron <comet.berkeley@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Will B <will.brokenbourgh2877@gmail.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72351ac5
    • Liam R. Howlett's avatar
      sparc/ftrace: Fix ftrace graph time measurement · f351b122
      Liam R. Howlett authored
      
      [ Upstream commit 48078d2d ]
      
      The ftrace function_graph time measurements of a given function is not
      accurate according to those recorded by ftrace using the function
      filters.  This change pulls the x86_64 fix from 'commit 722b3c74
      ("ftrace/graph: Trace function entry before updating index")' into the
      sparc specific prepare_ftrace_return which stops ftrace from
      counting interrupted tasks in the time measurement.
      
      Example measurements for select_task_rq_fair running "hackbench 100
      process 1000":
      
                    |  tracing/trace_stat/function0  |  function_graph
       Before patch |  2.802 us                      |  4.255 us
       After patch  |  2.749 us                      |  3.094 us
      Signed-off-by: default avatarLiam R. Howlett <Liam.Howlett@Oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f351b122
    • Orlando Arias's avatar
      sparc: Fix -Wstringop-overflow warning · 76037bf9
      Orlando Arias authored
      
      [ Upstream commit deba804c ]
      
      Greetings,
      
      GCC 7 introduced the -Wstringop-overflow flag to detect buffer overflows
      in calls to string handling functions [1][2]. Due to the way
      ``empty_zero_page'' is declared in arch/sparc/include/setup.h, this
      causes a warning to trigger at compile time in the function mem_init(),
      which is subsequently converted to an error. The ensuing patch fixes
      this issue and aligns the declaration of empty_zero_page to that of
      other architectures. Thank you.
      
      Cheers,
      Orlando.
      
      [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02308.html
      [2] https://gcc.gnu.org/gcc-7/changes.htmlSigned-off-by: default avatarOrlando Arias <oarias@knights.ucf.edu>
      
      --------------------------------------------------------------------------------
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      76037bf9
    • Nitin Gupta's avatar
      sparc64: Fix mapping of 64k pages with MAP_FIXED · e346489f
      Nitin Gupta authored
      
      [ Upstream commit b6c41cb0 ]
      
      An incorrect huge page alignment check caused
      mmap failure for 64K pages when MAP_FIXED is used
      with address not aligned to HPAGE_SIZE.
      
      Orabug: 25885991
      
      Fixes: dcd1912d ("sparc64: Add 64K page size support")
      Signed-off-by: default avatarNitin Gupta <nitin.m.gupta@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e346489f
    • Daniel Borkmann's avatar
      bpf: adjust verifier heuristics · 21dccb0f
      Daniel Borkmann authored
      
      [ Upstream commit 3c2ce60b ]
      
      Current limits with regards to processing program paths do not
      really reflect today's needs anymore due to programs becoming
      more complex and verifier smarter, keeping track of more data
      such as const ALU operations, alignment tracking, spilling of
      PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for
      smarter matching of what LLVM generates.
      
      This also comes with the side-effect that we result in fewer
      opportunities to prune search states and thus often need to do
      more work to prove safety than in the past due to different
      register states and stack layout where we mismatch. Generally,
      it's quite hard to determine what caused a sudden increase in
      complexity, it could be caused by something as trivial as a
      single branch somewhere at the beginning of the program where
      LLVM assigned a stack slot that is marked differently throughout
      other branches and thus causing a mismatch, where verifier
      then needs to prove safety for the whole rest of the program.
      Subsequently, programs with even less than half the insn size
      limit can get rejected. We noticed that while some programs
      load fine under pre 4.11, they get rejected due to hitting
      limits on more recent kernels. We saw that in the vast majority
      of cases (90+%) pruning failed due to register mismatches. In
      case of stack mismatches, majority of cases failed due to
      different stack slot types (invalid, spill, misc) rather than
      differences in spilled registers.
      
      This patch makes pruning more aggressive by also adding markers
      that sit at conditional jumps as well. Currently, we only mark
      jump targets for pruning. For example in direct packet access,
      these are usually error paths where we bail out. We found that
      adding these markers, it can reduce number of processed insns
      by up to 30%. Another option is to ignore reg->id in probing
      PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning
      slightly as well by up to 7% observed complexity reduction as
      stand-alone. Meaning, if a previous path with register type
      PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then
      in the current state a PTR_TO_MAP_VALUE_OR_NULL register for
      the same map X must be safe as well. Last but not least the
      patch also adds a scheduling point and bumps the current limit
      for instructions to be processed to a more adequate value.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21dccb0f
    • Daniel Borkmann's avatar
      bpf: fix wrong exposure of map_flags into fdinfo for lpm · 87cebd0f
      Daniel Borkmann authored
      
      [ Upstream commit a316338c ]
      
      trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
      attr->map_flags, since it does not support preallocation yet. We
      check the flag, but we never copy the flag into trie->map.map_flags,
      which is later on exposed into fdinfo and used by loaders such as
      iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test
      whether a pinned map has the same spec as the one from the BPF obj
      file and if not, bails out, which is currently the case for lpm
      since it exposes always 0 as flags.
      
      Also copy over flags in array_map_alloc() and stack_map_alloc().
      They always have to be 0 right now, but we should make sure to not
      miss to copy them over at a later point in time when we add actual
      flags for them to use.
      
      Fixes: b95a5c4d ("bpf: add a longest prefix match trie map implementation")
      Reported-by: default avatarJarno Rajahalme <jarno@covalent.io>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87cebd0f
    • Daniel Borkmann's avatar
      bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data · d6d2860e
      Daniel Borkmann authored
      
      [ Upstream commit 41703a73 ]
      
      The bpf_clone_redirect() still needs to be listed in
      bpf_helper_changes_pkt_data() since we call into
      bpf_try_make_head_writable() from there, thus we need
      to invalidate prior pkt regs as well.
      
      Fixes: 36bbef52 ("bpf: direct packet write and access for helpers for clsact progs")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6d2860e
    • Eric Dumazet's avatar
      ipv4: add reference counting to metrics · 3b69d651
      Eric Dumazet authored
      
      [ Upstream commit 3fb07daf ]
      
      Andrey Konovalov reported crashes in ipv4_mtu()
      
      I could reproduce the issue with KASAN kernels, between
      10.246.7.151 and 10.246.7.152 :
      
      1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &
      
      2) At the same time run following loop :
      while :
      do
       ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
       ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
      done
      
      Cong Wang attempted to add back rt->fi in commit
      82486aa6 ("ipv4: restore rt->fi for reference counting")
      but this proved to add some issues that were complex to solve.
      
      Instead, I suggested to add a refcount to the metrics themselves,
      being a standalone object (in particular, no reference to other objects)
      
      I tried to make this patch as small as possible to ease its backport,
      instead of being super clean. Note that we believe that only ipv4 dst
      need to take care of the metric refcount. But if this is wrong,
      this patch adds the basic infrastructure to extend this to other
      families.
      
      Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
      for his efforts on this problem.
      
      Fixes: 2860583f ("ipv4: Kill rt->fi")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Reviewed-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b69d651
    • Peter Dawson's avatar
      ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets · d3edf403
      Peter Dawson authored
      
      [ Upstream commit 0e9a7095 ]
      
      This fix addresses two problems in the way the DSCP field is formulated
       on the encapsulating header of IPv6 tunnels.
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661
      
      1) The IPv6 tunneling code was manipulating the DSCP field of the
       encapsulating packet using the 32b flowlabel. Since the flowlabel is
       only the lower 20b it was incorrect to assume that the upper 12b
       containing the DSCP and ECN fields would remain intact when formulating
       the encapsulating header. This fix handles the 'inherit' and
       'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.
      
      2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
       incorrect and resulted in the DSCP value always being set to 0.
      
      Commit 90427ef5 ("ipv6: fix flow labels when the traffic class
       is non-0") caused the regression by masking out the flowlabel
       which exposed the incorrect handling of the DSCP portion of the
       flowlabel in ip6_tunnel and ip6_gre.
      
      Fixes: 90427ef5 ("ipv6: fix flow labels when the traffic class is non-0")
      Signed-off-by: default avatarPeter Dawson <peter.a.dawson@boeing.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d3edf403
    • Davide Caratti's avatar
      sctp: fix ICMP processing if skb is non-linear · 90e7c332
      Davide Caratti authored
      
      [ Upstream commit 804ec7eb ]
      
      sometimes ICMP replies to INIT chunks are ignored by the client, even if
      the encapsulated SCTP headers match an open socket. This happens when the
      ICMP packet is carried by a paged skb: use skb_header_pointer() to read
      packet contents beyond the SCTP header, so that chunk header and initiate
      tag are validated correctly.
      
      v2:
      - don't use skb_header_pointer() to read the transport header, since
        icmp_socket_deliver() already puts these 8 bytes in the linear area.
      - change commit message to make specific reference to INIT chunks.
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90e7c332
    • Wei Wang's avatar
      tcp: avoid fastopen API to be used on AF_UNSPEC · 0236d8c4
      Wei Wang authored
      
      [ Upstream commit ba615f67 ]
      
      Fastopen API should be used to perform fastopen operations on the TCP
      socket. It does not make sense to use fastopen API to perform disconnect
      by calling it with AF_UNSPEC. The fastopen data path is also prone to
      race conditions and bugs when using with AF_UNSPEC.
      
      One issue reported and analyzed by Vegard Nossum is as follows:
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      Thread A:                            Thread B:
      ------------------------------------------------------------------------
      sendto()
       - tcp_sendmsg()
           - sk_stream_memory_free() = 0
               - goto wait_for_sndbuf
      	     - sk_stream_wait_memory()
      	        - sk_wait_event() // sleep
                |                          sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
      	  |                           - tcp_sendmsg()
      	  |                              - tcp_sendmsg_fastopen()
      	  |                                 - __inet_stream_connect()
      	  |                                    - tcp_disconnect() //because of AF_UNSPEC
      	  |                                       - tcp_transmit_skb()// send RST
      	  |                                    - return 0; // no reconnect!
      	  |                           - sk_stream_wait_connect()
      	  |                                 - sock_error()
      	  |                                    - xchg(&sk->sk_err, 0)
      	  |                                    - return -ECONNRESET
      	- ... // wake up, see sk->sk_err == 0
          - skb_entail() on TCP_CLOSE socket
      
      If the connection is reopened then we will send a brand new SYN packet
      after thread A has already queued a buffer. At this point I think the
      socket internal state (sequence numbers etc.) becomes messed up.
      
      When the new connection is closed, the FIN-ACK is rejected because the
      sequence number is outside the window. The other side tries to
      retransmit,
      but __tcp_retransmit_skb() calls tcp_trim_head() on an empty skb which
      corrupts the skb data length and hits a BUG() in copy_and_csum_bits().
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      
      Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
      and return EOPNOTSUPP to user if such case happens.
      
      Fixes: cf60af03 ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0236d8c4
    • Eric Garver's avatar
      geneve: fix fill_info when using collect_metadata · 1642394f
      Eric Garver authored
      
      [ Upstream commit 11387fe4 ]
      
      Since 9b4437a5 ("geneve: Unify LWT and netdev handling.") fill_info
      does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is
      because it uses ip_tunnel_info_af() with the device level info, which is
      not valid for COLLECT_METADATA.
      
      Fix by checking for the presence of the actual sockets.
      
      Fixes: 9b4437a5 ("geneve: Unify LWT and netdev handling.")
      Signed-off-by: default avatarEric Garver <e@erig.me>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1642394f
    • Vlad Yasevich's avatar
      virtio-net: enable TSO/checksum offloads for Q-in-Q vlans · 4dbbbaad
      Vlad Yasevich authored
      
      [ Upstream commit 2836b4f2 ]
      
      Since virtio does not provide it's own ndo_features_check handler,
      TSO, and now checksum offload, are disabled for stacked vlans.
      Re-enable the support and let the host take care of it.  This
      restores/improves Guest-to-Guest performance over Q-in-Q vlans.
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4dbbbaad
    • Vlad Yasevich's avatar
      be2net: Fix offload features for Q-in-Q packets · acc866e9
      Vlad Yasevich authored
      
      [ Upstream commit cc6e9de6 ]
      
      At least some of the be2net cards do not seem to be capabled
      of performing checksum offload computions on Q-in-Q packets.
      In these case, the recevied checksum on the remote is invalid
      and TCP syn packets are dropped.
      
      This patch adds a call to check disbled acceleration features
      on Q-in-Q tagged traffic.
      
      CC: Sathya Perla <sathya.perla@broadcom.com>
      CC: Ajit Khaparde <ajit.khaparde@broadcom.com>
      CC: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      CC: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      acc866e9
    • Vlad Yasevich's avatar
      vlan: Fix tcp checksum offloads in Q-in-Q vlans · 423c1b43
      Vlad Yasevich authored
      
      [ Upstream commit 35d2f80b ]
      
      It appears that TCP checksum offloading has been broken for
      Q-in-Q vlans.  The behavior was execerbated by the
      series
          commit afb0bc97 ("Merge branch 'stacked_vlan_tso'")
      that that enabled accleleration features on stacked vlans.
      
      However, event without that series, it is possible to trigger
      this issue.  It just requires a lot more specialized configuration.
      
      The root cause is the interaction between how
      netdev_intersect_features() works, the features actually set on
      the vlan devices and HW having the ability to run checksum with
      longer headers.
      
      The issue starts when netdev_interesect_features() replaces
      NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
      if the HW advertises IP|IPV6 specific checksums.  This happens
      for tagged and multi-tagged packets.   However, HW that enables
      IP|IPV6 checksum offloading doesn't gurantee that packets with
      arbitrarily long headers can be checksummed.
      
      This patch disables IP|IPV6 checksums on the packet for multi-tagged
      packets.
      
      CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      CC: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      423c1b43