1. 30 May, 2018 40 commits
    • Michael Ellerman's avatar
      selftests: Print the test we're running to /dev/kmsg · 3feab927
      Michael Ellerman authored
      [ Upstream commit 88893cf7 ]
      
      Some tests cause the kernel to print things to the kernel log
      buffer (ie. printk), in particular oops and warnings etc. However when
      running all the tests in succession it's not always obvious which
      test(s) caused the kernel to print something.
      
      We can narrow it down by printing which test directory we're running
      in to /dev/kmsg, if it's writable.
      
      Example output:
      
        [  170.149149] kselftest: Running tests in powerpc
        [  305.300132] kworker/dying (71) used greatest stack depth: 7776 bytes
                       left
        [  808.915456] kselftest: Running tests in pstore
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarShuah Khan <shuahkh@osg.samsung.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3feab927
    • Frank Asseg's avatar
      tools/thermal: tmon: fix for segfault · 98b21980
      Frank Asseg authored
      [ Upstream commit 6c59f64b ]
      
      Fixes a segfault occurring when e.g. <TAB> is pressed multiple times in the
      ncurses tmon application. The segfault is caused by incrementing
      cur_thermal_record in the main function without checking if it's value reached
      NR_THERMAL_RECORD immediately. Since the boundary check only occurred in
      update_thermal_data a race condition existed, which lead to an attempted read
      beyond the last element of the trec array.
      
      The fix was implemented by moving the cur_thermal_record incrementation to the
      update_thermal_data function using a temporary variable on which the boundary
      condition is checked before updating cur_thread_record, so that the variable is
      never incremented beyond the trec array's boundary.
      
      It seems the segfault does not occur on every machine: On a HP EliteBook G4 the
      segfault happens, while it does not happen on a Thinkpad T540p.
      Signed-off-by: default avatarFrank Asseg <frank.asseg@objecthunter.net>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      98b21980
    • Michael Ellerman's avatar
      powerpc/perf: Fix kernel address leak via sampling registers · bbcc07d5
      Michael Ellerman authored
      [ Upstream commit e1ebd0e5 ]
      
      Current code in power_pmu_disable() does not clear the sampling
      registers like Sampling Instruction Address Register (SIAR) and
      Sampling Data Address Register (SDAR) after disabling the PMU. Since
      these are userspace readable and could contain kernel addresses, add
      code to explicitly clear the content of these registers.
      
      Also add a "context synchronizing instruction" to enforce no further
      updates to these registers as suggested by Power ISA v3.0B. From
      section 9.4, on page 1108:
      
        "If an mtspr instruction is executed that changes the value of a
        Performance Monitor register other than SIAR, SDAR, and SIER, the
        change is not guaranteed to have taken effect until after a
        subsequent context synchronizing instruction has been executed (see
        Chapter 11. "Synchronization Requirements for Context Alterations"
        on page 1133)."
      Signed-off-by: default avatarMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      [mpe: Massage change log and add ISA reference]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bbcc07d5
    • Madhavan Srinivasan's avatar
      powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer · 0ebbbeb8
      Madhavan Srinivasan authored
      [ Upstream commit bb19af81 ]
      
      The current Branch History Rolling Buffer (BHRB) code does not check
      for any privilege levels before updating the data from BHRB. This
      could leak kernel addresses to userspace even when profiling only with
      userspace privileges. Add proper checks to prevent it.
      Acked-by: default avatarBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: default avatarMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ebbbeb8
    • Guenter Roeck's avatar
      hwmon: (nct6775) Fix writing pwmX_mode · 2a48e89c
      Guenter Roeck authored
      [ Upstream commit 415eb2a1 ]
      
      pwmX_mode is defined in the ABI as 0=DC mode, 1=pwm mode. The chip
      register bit is set to 1 for DC mode. This got mixed up, and writing
      1 into pwmX_mode resulted in DC mode enabled. Fix it up by using
      the ABI definition throughout the driver for consistency.
      
      Fixes: 77eb5b37 ("hwmon: (nct6775) Add support for pwm, pwm_mode, ... ")
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2a48e89c
    • Helge Deller's avatar
      parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode · 0c16b7ed
      Helge Deller authored
      [ Upstream commit b845f66f ]
      
      Carlo Pisani noticed that his C3600 workstation behaved unstable during heavy
      I/O on the PCI bus with a VIA VT6421 IDE/SATA PCI card.
      
      To avoid such instability, this patch switches the LBA PCI bus from Hard Fail
      mode into Soft Fail mode. In this mode the bus will return -1UL for timed out
      MMIO transactions, which is exactly how the x86 (and most other architectures)
      PCI busses behave.
      
      This patch is based on a proposal by Grant Grundler and Kyle McMartin 10
      years ago:
      https://www.spinics.net/lists/linux-parisc/msg01027.html
      
      Cc: Carlo Pisani <carlojpisani@gmail.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Reviewed-by: default avatarGrant Grundler <grantgrundler@gmail.com>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c16b7ed
    • Greg Ungerer's avatar
      m68k: set dma and coherent masks for platform FEC ethernets · 4f971111
      Greg Ungerer authored
      [ Upstream commit f61e6431 ]
      
      As of commit 205e1b7f ("dma-mapping: warn when there is no
      coherent_dma_mask") the Freescale FEC driver is issuing the following
      warning on driver initialization on ColdFire systems:
      
      WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 0x40159e20
      Modules linked in:
      CPU: 0 PID: 1 Comm: swapper Not tainted 4.16.0-rc7-dirty #4
      Stack from 41833dd8:
              41833dd8 40259c53 40025534 40279e26 00000003 00000000 4004e514 41827000
              400255de 40244e42 00000204 40159e20 00000009 00000000 00000000 4024531d
              40159e20 40244e42 00000204 00000000 00000000 00000000 00000007 00000000
              00000000 40279e26 4028d040 40226576 4003ae88 40279e26 418273f6 41833ef8
              7fffffff 418273f2 41867028 4003c9a2 4180ac6c 00000004 41833f8c 4013e71c
              40279e1c 40279e26 40226c16 4013ced2 40279e26 40279e58 4028d040 00000000
      Call Trace:
              [<40025534>] 0x40025534
       [<4004e514>] 0x4004e514
       [<400255de>] 0x400255de
       [<40159e20>] 0x40159e20
       [<40159e20>] 0x40159e20
      
      It is not fatal, the driver and the system continue to function normally.
      
      As per the warning the coherent_dma_mask is not set on this device.
      There is nothing special about the DMA memory coherency on this hardware
      so we can just set the mask to 32bits in the platform data for the FEC
      ethernet devices.
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f971111
    • Michael Ellerman's avatar
      powerpc/mpic: Check if cpu_possible() in mpic_physmask() · c73749be
      Michael Ellerman authored
      [ Upstream commit 0834d627 ]
      
      In mpic_physmask() we loop over all CPUs up to 32, then get the hard
      SMP processor id of that CPU.
      
      Currently that's possibly walking off the end of the paca array, but
      in a future patch we will change the paca array to be an array of
      pointers, and in that case we will get a NULL for missing CPUs and
      oops. eg:
      
        Unable to handle kernel paging request for data at address 0x88888888888888b8
        Faulting instruction address: 0xc00000000004e380
        Oops: Kernel access of bad area, sig: 11 [#1]
        ...
        NIP .mpic_set_affinity+0x60/0x1a0
        LR  .irq_do_set_affinity+0x48/0x100
      
      Fix it by checking the CPU is possible, this also fixes the code if
      there are gaps in the CPU numbering which probably never happens on
      mpic systems but who knows.
      Debugged-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c73749be
    • Lenny Szubowicz's avatar
      ACPI: acpi_pad: Fix memory leak in power saving threads · bebc3f01
      Lenny Szubowicz authored
      [ Upstream commit 8b29d29a ]
      
      Fix once per second (round_robin_time) memory leak of about 1 KB in
      each acpi_pad kernel idling thread that is activated.
      
      Found by testing with kmemleak.
      Signed-off-by: default avatarLenny Szubowicz <lszubowi@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bebc3f01
    • Aaro Koskinen's avatar
      drivers: macintosh: rack-meter: really fix bogus memsets · bc45cf24
      Aaro Koskinen authored
      [ Upstream commit e283655b ]
      
      We should zero an array using sizeof instead of number of elements.
      
      Fixes the following compiler (GCC 7.3.0) warnings:
      
      drivers/macintosh/rack-meter.c: In function 'rackmeter_do_pause':
      drivers/macintosh/rack-meter.c:157:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
      drivers/macintosh/rack-meter.c:158:2: warning: 'memset' used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size]
      
      Fixes: 4f7bef7a ("drivers: macintosh: rack-meter: fix bogus memsets")
      Reported-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarAaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc45cf24
    • Dan Carpenter's avatar
      xen/acpi: off by one in read_acpi_id() · f2298423
      Dan Carpenter authored
      [ Upstream commit c37a3c94 ]
      
      If acpi_id is == nr_acpi_bits, then we access one element beyond the end
      of the acpi_psd[] array or we set one bit beyond the end of the bit map
      when we do __set_bit(acpi_id, acpi_id_present);
      
      Fixes: 59a56802 ("xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarJoao Martins <joao.m.martins@oracle.com>
      Reviewed-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2298423
    • David Howells's avatar
      rxrpc: Don't treat call aborts as conn aborts · 34f0516b
      David Howells authored
      [ Upstream commit 57b0c9d4 ]
      
      If a call-level abort is received for the previous call to complete on a
      connection channel, then that abort is queued for the connection processor
      to handle.  Unfortunately, the connection processor then assumes without
      checking that the abort is connection-level (ie. callNumber is 0) and
      distributes it over all active calls on that connection, thereby
      incorrectly aborting them.
      
      Fix this by discarding aborts aimed at a completed call.
      
      Further, discard all packets aimed at a call that's complete if there's
      currently an active call on a channel, since the DATA packets associated
      with the new call automatically terminate the old call.
      
      Fixes: 18bfeba5 ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor")
      Reported-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      34f0516b
    • David Howells's avatar
      rxrpc: Fix Tx ring annotation after initial Tx failure · 9b7c95ac
      David Howells authored
      [ Upstream commit 03877bf6 ]
      
      rxrpc calls have a ring of packets that are awaiting ACK or retransmission
      and a parallel ring of annotations that tracks the state of those packets.
      If the initial transmission of a packet on the underlying UDP socket fails
      then the packet annotation is marked for resend - but the setting of this
      mark accidentally erases the last-packet mark also stored in the same
      annotation slot.  If this happens, a call won't switch out of the Tx phase
      when all the packets have been transmitted.
      
      Fix this by retaining the last-packet mark and only altering the packet
      state.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b7c95ac
    • Jeff Mahoney's avatar
      btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers · 9c38c3ba
      Jeff Mahoney authored
      [ Upstream commit 8a5a916d ]
      
      While running btrfs/011, I hit the following lockdep splat.
      
      This is the important bit:
         pcpu_alloc+0x1ac/0x5e0
         __percpu_counter_init+0x4e/0xb0
         btrfs_init_fs_root+0x99/0x1c0 [btrfs]
         btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
         resolve_indirect_refs+0x130/0x830 [btrfs]
         find_parent_nodes+0x69e/0xff0 [btrfs]
         btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
         btrfs_find_all_roots+0x50/0x70 [btrfs]
         btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
         btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
      
      The percpu_counter_init call in btrfs_alloc_subvolume_writers
      uses GFP_KERNEL, which we can't do during transaction commit.
      
      This switches it to GFP_NOFS.
      
      ========================================================
      WARNING: possible irq lock inversion dependency detected
      4.12.14-kvmsmall #8 Tainted: G        W
      --------------------------------------------------------
      kswapd0/50 just changed the state of lock:
       (&delayed_node->mutex){+.+.-.}, at: [<ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
      but this lock took another, RECLAIM_FS-unsafe lock in the past:
       (pcpu_alloc_mutex){+.+.+.}
      
      and interrupts could create inverse lock ordering between them.
      
      other info that might help us debug this:
      Chain exists of:
        &delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex
      
       Possible interrupt unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(pcpu_alloc_mutex);
                                     local_irq_disable();
                                     lock(&delayed_node->mutex);
                                     lock(&found->groups_sem);
        <Interrupt>
          lock(&delayed_node->mutex);
      
       *** DEADLOCK ***
      
      2 locks held by kswapd0/50:
       #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811dc11f>] shrink_slab+0x7f/0x5b0
       #1:  (&type->s_umount_key#30){+++++.}, at: [<ffffffff8126dec6>] trylock_super+0x16/0x50
      
      the shortest dependencies between 2nd lock and 1st lock:
         -> (pcpu_alloc_mutex){+.+.+.} ops: 4904 {
            HARDIRQ-ON-W at:
                                __mutex_lock+0x4e/0x8c0
                                pcpu_alloc+0x1ac/0x5e0
                                alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                __do_tune_cpucache+0x2c/0x220
                                do_tune_cpucache+0x26/0xc0
                                enable_cpucache+0x6d/0xf0
                                kmem_cache_init_late+0x42/0x75
                                start_kernel+0x343/0x4cb
                                x86_64_start_kernel+0x127/0x134
                                secondary_startup_64+0xa5/0xb0
            SOFTIRQ-ON-W at:
                                __mutex_lock+0x4e/0x8c0
                                pcpu_alloc+0x1ac/0x5e0
                                alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                __do_tune_cpucache+0x2c/0x220
                                do_tune_cpucache+0x26/0xc0
                                enable_cpucache+0x6d/0xf0
                                kmem_cache_init_late+0x42/0x75
                                start_kernel+0x343/0x4cb
                                x86_64_start_kernel+0x127/0x134
                                secondary_startup_64+0xa5/0xb0
            RECLAIM_FS-ON-W at:
                                   __kmalloc+0x47/0x310
                                   pcpu_extend_area_map+0x2b/0xc0
                                   pcpu_alloc+0x3ec/0x5e0
                                   alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                   __do_tune_cpucache+0x2c/0x220
                                   do_tune_cpucache+0x26/0xc0
                                   enable_cpucache+0x6d/0xf0
                                   __kmem_cache_create+0x1bf/0x390
                                   create_cache+0xba/0x1b0
                                   kmem_cache_create+0x1f8/0x2b0
                                   ksm_init+0x6f/0x19d
                                   do_one_initcall+0x50/0x1b0
                                   kernel_init_freeable+0x201/0x289
                                   kernel_init+0xa/0x100
                                   ret_from_fork+0x3a/0x50
            INITIAL USE at:
                               __mutex_lock+0x4e/0x8c0
                               pcpu_alloc+0x1ac/0x5e0
                               alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                               setup_cpu_cache+0x2f/0x1f0
                               __kmem_cache_create+0x1bf/0x390
                               create_boot_cache+0x8b/0xb1
                               kmem_cache_init+0xa1/0x19e
                               start_kernel+0x270/0x4cb
                               x86_64_start_kernel+0x127/0x134
                               secondary_startup_64+0xa5/0xb0
          }
          ... key      at: [<ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0
          ... acquired at:
         pcpu_alloc+0x1ac/0x5e0
         __percpu_counter_init+0x4e/0xb0
         btrfs_init_fs_root+0x99/0x1c0 [btrfs]
         btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
         resolve_indirect_refs+0x130/0x830 [btrfs]
         find_parent_nodes+0x69e/0xff0 [btrfs]
         btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
         btrfs_find_all_roots+0x50/0x70 [btrfs]
         btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
         btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
         transaction_kthread+0x176/0x1b0 [btrfs]
         kthread+0x102/0x140
         ret_from_fork+0x3a/0x50
      
        -> (&fs_info->commit_root_sem){++++..} ops: 1566382 {
           HARDIRQ-ON-W at:
                              down_write+0x3e/0xa0
                              cache_block_group+0x287/0x420 [btrfs]
                              find_free_extent+0x106c/0x12d0 [btrfs]
                              btrfs_reserve_extent+0xd8/0x170 [btrfs]
                              cow_file_range.isra.66+0x133/0x470 [btrfs]
                              run_delalloc_range+0x121/0x410 [btrfs]
                              writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                              __extent_writepage+0x19a/0x360 [btrfs]
                              extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                              extent_writepages+0x4d/0x60 [btrfs]
                              do_writepages+0x1a/0x70
                              __filemap_fdatawrite_range+0xa7/0xe0
                              btrfs_rename+0x5ee/0xdb0 [btrfs]
                              vfs_rename+0x52a/0x7e0
                              SyS_rename+0x351/0x3b0
                              do_syscall_64+0x79/0x1e0
                              entry_SYSCALL_64_after_hwframe+0x42/0xb7
           HARDIRQ-ON-R at:
                              down_read+0x35/0x90
                              caching_thread+0x57/0x560 [btrfs]
                              normal_work_helper+0x1c0/0x5e0 [btrfs]
                              process_one_work+0x1e0/0x5c0
                              worker_thread+0x44/0x390
                              kthread+0x102/0x140
                              ret_from_fork+0x3a/0x50
           SOFTIRQ-ON-W at:
                              down_write+0x3e/0xa0
                              cache_block_group+0x287/0x420 [btrfs]
                              find_free_extent+0x106c/0x12d0 [btrfs]
                              btrfs_reserve_extent+0xd8/0x170 [btrfs]
                              cow_file_range.isra.66+0x133/0x470 [btrfs]
                              run_delalloc_range+0x121/0x410 [btrfs]
                              writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                              __extent_writepage+0x19a/0x360 [btrfs]
                              extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                              extent_writepages+0x4d/0x60 [btrfs]
                              do_writepages+0x1a/0x70
                              __filemap_fdatawrite_range+0xa7/0xe0
                              btrfs_rename+0x5ee/0xdb0 [btrfs]
                              vfs_rename+0x52a/0x7e0
                              SyS_rename+0x351/0x3b0
                              do_syscall_64+0x79/0x1e0
                              entry_SYSCALL_64_after_hwframe+0x42/0xb7
           SOFTIRQ-ON-R at:
                              down_read+0x35/0x90
                              caching_thread+0x57/0x560 [btrfs]
                              normal_work_helper+0x1c0/0x5e0 [btrfs]
                              process_one_work+0x1e0/0x5c0
                              worker_thread+0x44/0x390
                              kthread+0x102/0x140
                              ret_from_fork+0x3a/0x50
           INITIAL USE at:
                             down_write+0x3e/0xa0
                             cache_block_group+0x287/0x420 [btrfs]
                             find_free_extent+0x106c/0x12d0 [btrfs]
                             btrfs_reserve_extent+0xd8/0x170 [btrfs]
                             cow_file_range.isra.66+0x133/0x470 [btrfs]
                             run_delalloc_range+0x121/0x410 [btrfs]
                             writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                             __extent_writepage+0x19a/0x360 [btrfs]
                             extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                             extent_writepages+0x4d/0x60 [btrfs]
                             do_writepages+0x1a/0x70
                             __filemap_fdatawrite_range+0xa7/0xe0
                             btrfs_rename+0x5ee/0xdb0 [btrfs]
                             vfs_rename+0x52a/0x7e0
                             SyS_rename+0x351/0x3b0
                             do_syscall_64+0x79/0x1e0
                             entry_SYSCALL_64_after_hwframe+0x42/0xb7
         }
         ... key      at: [<ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs]
         ... acquired at:
         cache_block_group+0x287/0x420 [btrfs]
         find_free_extent+0x106c/0x12d0 [btrfs]
         btrfs_reserve_extent+0xd8/0x170 [btrfs]
         btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
         btrfs_create_tree+0xbb/0x2a0 [btrfs]
         btrfs_create_uuid_tree+0x37/0x140 [btrfs]
         open_ctree+0x23c0/0x2660 [btrfs]
         btrfs_mount+0xd36/0xf90 [btrfs]
         mount_fs+0x3a/0x160
         vfs_kern_mount+0x66/0x150
         btrfs_mount+0x18c/0xf90 [btrfs]
         mount_fs+0x3a/0x160
         vfs_kern_mount+0x66/0x150
         do_mount+0x1c1/0xcc0
         SyS_mount+0x7e/0xd0
         do_syscall_64+0x79/0x1e0
         entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
       -> (&found->groups_sem){++++..} ops: 2134587 {
          HARDIRQ-ON-W at:
                            down_write+0x3e/0xa0
                            __link_block_group+0x34/0x130 [btrfs]
                            btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                            open_ctree+0x2054/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          HARDIRQ-ON-R at:
                            down_read+0x35/0x90
                            btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
                            open_ctree+0x207b/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          SOFTIRQ-ON-W at:
                            down_write+0x3e/0xa0
                            __link_block_group+0x34/0x130 [btrfs]
                            btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                            open_ctree+0x2054/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          SOFTIRQ-ON-R at:
                            down_read+0x35/0x90
                            btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
                            open_ctree+0x207b/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          INITIAL USE at:
                           down_write+0x3e/0xa0
                           __link_block_group+0x34/0x130 [btrfs]
                           btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                           open_ctree+0x2054/0x2660 [btrfs]
                           btrfs_mount+0xd36/0xf90 [btrfs]
                           mount_fs+0x3a/0x160
                           vfs_kern_mount+0x66/0x150
                           btrfs_mount+0x18c/0xf90 [btrfs]
                           mount_fs+0x3a/0x160
                           vfs_kern_mount+0x66/0x150
                           do_mount+0x1c1/0xcc0
                           SyS_mount+0x7e/0xd0
                           do_syscall_64+0x79/0x1e0
                           entry_SYSCALL_64_after_hwframe+0x42/0xb7
        }
        ... key      at: [<ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs]
        ... acquired at:
         find_free_extent+0xcb4/0x12d0 [btrfs]
         btrfs_reserve_extent+0xd8/0x170 [btrfs]
         btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
         __btrfs_cow_block+0x110/0x5b0 [btrfs]
         btrfs_cow_block+0xd7/0x290 [btrfs]
         btrfs_search_slot+0x1f6/0x960 [btrfs]
         btrfs_lookup_inode+0x2a/0x90 [btrfs]
         __btrfs_update_delayed_inode+0x65/0x210 [btrfs]
         btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs]
         btrfs_evict_inode+0x3fe/0x6a0 [btrfs]
         evict+0xc4/0x190
         __dentry_kill+0xbf/0x170
         dput+0x2ae/0x2f0
         SyS_rename+0x2a6/0x3b0
         do_syscall_64+0x79/0x1e0
         entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      -> (&delayed_node->mutex){+.+.-.} ops: 5580204 {
         HARDIRQ-ON-W at:
                          __mutex_lock+0x4e/0x8c0
                          btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                          btrfs_update_inode+0x83/0x110 [btrfs]
                          btrfs_dirty_inode+0x62/0xe0 [btrfs]
                          touch_atime+0x8c/0xb0
                          do_generic_file_read+0x818/0xb10
                          __vfs_read+0xdc/0x150
                          vfs_read+0x8a/0x130
                          SyS_read+0x45/0xa0
                          do_syscall_64+0x79/0x1e0
                          entry_SYSCALL_64_after_hwframe+0x42/0xb7
         SOFTIRQ-ON-W at:
                          __mutex_lock+0x4e/0x8c0
                          btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                          btrfs_update_inode+0x83/0x110 [btrfs]
                          btrfs_dirty_inode+0x62/0xe0 [btrfs]
                          touch_atime+0x8c/0xb0
                          do_generic_file_read+0x818/0xb10
                          __vfs_read+0xdc/0x150
                          vfs_read+0x8a/0x130
                          SyS_read+0x45/0xa0
                          do_syscall_64+0x79/0x1e0
                          entry_SYSCALL_64_after_hwframe+0x42/0xb7
         IN-RECLAIM_FS-W at:
                             __mutex_lock+0x4e/0x8c0
                             __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
                             btrfs_evict_inode+0x22c/0x6a0 [btrfs]
                             evict+0xc4/0x190
                             dispose_list+0x35/0x50
                             prune_icache_sb+0x42/0x50
                             super_cache_scan+0x139/0x190
                             shrink_slab+0x262/0x5b0
                             shrink_node+0x2eb/0x2f0
                             kswapd+0x2eb/0x890
                             kthread+0x102/0x140
                             ret_from_fork+0x3a/0x50
         INITIAL USE at:
                         __mutex_lock+0x4e/0x8c0
                         btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                         btrfs_update_inode+0x83/0x110 [btrfs]
                         btrfs_dirty_inode+0x62/0xe0 [btrfs]
                         touch_atime+0x8c/0xb0
                         do_generic_file_read+0x818/0xb10
                         __vfs_read+0xdc/0x150
                         vfs_read+0x8a/0x130
                         SyS_read+0x45/0xa0
                         do_syscall_64+0x79/0x1e0
                         entry_SYSCALL_64_after_hwframe+0x42/0xb7
       }
       ... key      at: [<ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs]
       ... acquired at:
         __lock_acquire+0x264/0x11c0
         lock_acquire+0xbd/0x1e0
         __mutex_lock+0x4e/0x8c0
         __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
         btrfs_evict_inode+0x22c/0x6a0 [btrfs]
         evict+0xc4/0x190
         dispose_list+0x35/0x50
         prune_icache_sb+0x42/0x50
         super_cache_scan+0x139/0x190
         shrink_slab+0x262/0x5b0
         shrink_node+0x2eb/0x2f0
         kswapd+0x2eb/0x890
         kthread+0x102/0x140
         ret_from_fork+0x3a/0x50
      
      stack backtrace:
      CPU: 1 PID: 50 Comm: kswapd0 Tainted: G        W        4.12.14-kvmsmall #8 SLE15 (unreleased)
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
      Call Trace:
       dump_stack+0x78/0xb7
       print_irq_inversion_bug.part.38+0x19f/0x1aa
       check_usage_forwards+0x102/0x120
       ? ret_from_fork+0x3a/0x50
       ? check_usage_backwards+0x110/0x110
       mark_lock+0x16c/0x270
       __lock_acquire+0x264/0x11c0
       ? pagevec_lookup_entries+0x1a/0x30
       ? truncate_inode_pages_range+0x2b3/0x7f0
       lock_acquire+0xbd/0x1e0
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       __mutex_lock+0x4e/0x8c0
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       ? btrfs_evict_inode+0x1f6/0x6a0 [btrfs]
       __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       btrfs_evict_inode+0x22c/0x6a0 [btrfs]
       evict+0xc4/0x190
       dispose_list+0x35/0x50
       prune_icache_sb+0x42/0x50
       super_cache_scan+0x139/0x190
       shrink_slab+0x262/0x5b0
       shrink_node+0x2eb/0x2f0
       kswapd+0x2eb/0x890
       kthread+0x102/0x140
       ? mem_cgroup_shrink_node+0x2c0/0x2c0
       ? kthread_create_on_node+0x40/0x40
       ret_from_fork+0x3a/0x50
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c38c3ba
    • Filipe Manana's avatar
      Btrfs: fix copy_items() return value when logging an inode · bee3c02a
      Filipe Manana authored
      [ Upstream commit 8434ec46 ]
      
      When logging an inode, at tree-log.c:copy_items(), if we call
      btrfs_next_leaf() at the loop which checks for the need to log holes, we
      need to make sure copy_items() returns the value 1 to its caller and
      not 0 (on success). This is because the path the caller passed was
      released and is now different from what is was before, and the caller
      expects a return value of 0 to mean both success and that the path
      has not changed, while a return value of 1 means both success and
      signals the caller that it can not reuse the path, it has to perform
      another tree search.
      
      Even though this is a case that should not be triggered on normal
      circumstances or very rare at least, its consequences can be very
      unpredictable (especially when replaying a log tree).
      
      Fixes: 16e7549f ("Btrfs: incompatible format change to remove hole extents")
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bee3c02a
    • Qu Wenruo's avatar
      btrfs: tests/qgroup: Fix wrong tree backref level · 5934adaf
      Qu Wenruo authored
      [ Upstream commit 3c0efdf0 ]
      
      The extent tree of the test fs is like the following:
      
       BTRFS info (device (null)): leaf 16327509003777336587 total ptrs 1 free space 3919
        item 0 key (4096 168 4096) itemoff 3944 itemsize 51
                extent refs 1 gen 1 flags 2
                tree block key (68719476736 0 0) level 1
                                                 ^^^^^^^
                ref#0: tree block backref root 5
      
      And it's using an empty tree for fs tree, so there is no way that its
      level can be 1.
      
      For REAL (created by mkfs) fs tree backref with no skinny metadata, the
      result should look like:
      
       item 3 key (30408704 EXTENT_ITEM 4096) itemoff 3845 itemsize 51
               refs 1 gen 4 flags TREE_BLOCK
               tree block key (256 INODE_ITEM 0) level 0
                                                 ^^^^^^^
               tree block backref root 5
      
      Fix the level to 0, so it won't break later tree level checker.
      
      Fixes: faa2dbf0 ("Btrfs: add sanity tests for new qgroup accounting code")
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5934adaf
    • Florian Fainelli's avatar
      net: bgmac: Fix endian access in bgmac_dma_tx_ring_free() · deb064c4
      Florian Fainelli authored
      [ Upstream commit 60d6e6f0 ]
      
      bgmac_dma_tx_ring_free() assigns the ctl1 word which is a litle endian
      32-bit word without using proper accessors, fix this, and because a
      length cannot be negative, use unsigned int while at it.
      
      Fixes: 9cde9450 ("bgmac: implement scatter/gather support")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      deb064c4
    • David S. Miller's avatar
      sparc64: Make atomic_xchg() an inline function rather than a macro. · 8e02caa1
      David S. Miller authored
      [ Upstream commit d13864b6 ]
      
      This avoids a lot of -Wunused warnings such as:
      
      ====================
      kernel/debug/debug_core.c: In function ‘kgdb_cpu_enter’:
      ./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value]
       #define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
      
      ./arch/sparc/include/asm/atomic_64.h:86:30: note: in expansion of macro ‘xchg’
       #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
                                    ^~~~
      kernel/debug/debug_core.c:508:4: note: in expansion of macro ‘atomic_xchg’
          atomic_xchg(&kgdb_active, cpu);
          ^~~~~~~~~~~
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e02caa1
    • David Howells's avatar
      fscache: Fix hanging wait on page discarded by writeback · 9397f74d
      David Howells authored
      [ Upstream commit 2c984257 ]
      
      If the fscache asynchronous write operation elects to discard a page that's
      pending storage to the cache because the page would be over the store limit
      then it needs to wake the page as someone may be waiting on completion of
      the write.
      
      The problem is that the store limit may be updated by a different
      asynchronous operation - and so may miss the write - and that the store
      limit may not even get updated until later by the netfs.
      
      Fix the kernel hang by making fscache_write_op() mark as written any pages
      that are over the limit.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9397f74d
    • Sean Christopherson's avatar
      KVM: VMX: raise internal error for exception during invalid protected mode state · 94b4fed8
      Sean Christopherson authored
      [ Upstream commit add5ff7a ]
      
      Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter
      an exception in Protected Mode while emulating guest due to invalid
      guest state.  Unlike Big RM, KVM doesn't support emulating exceptions
      in PM, i.e. PM exceptions are always injected via the VMCS.  Because
      we will never do VMRESUME due to emulation_required, the exception is
      never realized and we'll keep emulating the faulting instruction over
      and over until we receive a signal.
      
      Exit to userspace iff there is a pending exception, i.e. don't exit
      simply on a requested event. The purpose of this check and exit is to
      aid in debugging a guest that is in all likelihood already doomed.
      Invalid guest state in PM is extremely limited in normal operation,
      e.g. it generally only occurs for a few instructions early in BIOS,
      and any exception at this time is all but guaranteed to be fatal.
      Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly
      handled/emulated, while checking for vectored interrupts, e.g. INTR
      and NMI, without hitting false positives would add a fair amount of
      complexity for almost no benefit (getting hit by lightning seems
      more likely than encountering this specific scenario).
      
      Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an
      exception via the VMCS and emulation_required is true.
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      94b4fed8
    • Davidlohr Bueso's avatar
      sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning · 3d063255
      Davidlohr Bueso authored
      [ Upstream commit d29a2064 ]
      
      While running rt-tests' pi_stress program I got the following splat:
      
        rq->clock_update_flags < RQCF_ACT_SKIP
        WARNING: CPU: 27 PID: 0 at kernel/sched/sched.h:960 assert_clock_updated.isra.38.part.39+0x13/0x20
      
        [...]
      
        <IRQ>
        enqueue_top_rt_rq+0xf4/0x150
        ? cpufreq_dbs_governor_start+0x170/0x170
        sched_rt_rq_enqueue+0x65/0x80
        sched_rt_period_timer+0x156/0x360
        ? sched_rt_rq_enqueue+0x80/0x80
        __hrtimer_run_queues+0xfa/0x260
        hrtimer_interrupt+0xcb/0x220
        smp_apic_timer_interrupt+0x62/0x120
        apic_timer_interrupt+0xf/0x20
        </IRQ>
      
        [...]
      
        do_idle+0x183/0x1e0
        cpu_startup_entry+0x5f/0x70
        start_secondary+0x192/0x1d0
        secondary_startup_64+0xa5/0xb0
      
      We can get rid of it be the "traditional" means of adding an
      update_rq_clock() call after acquiring the rq->lock in
      do_sched_rt_period_timer().
      
      The case for the RT task throttling (which this workload also hits)
      can be ignored in that the skip_update call is actually bogus and
      quite the contrary (the request bits are removed/reverted).
      
      By setting RQCF_UPDATED we really don't care if the skip is happening
      or not and will therefore make the assert_clock_updated() check happy.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Reviewed-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: linux-kernel@vger.kernel.org
      Cc: rostedt@goodmis.org
      Link: http://lkml.kernel.org/r/20180402164954.16255-1-dave@stgolabs.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d063255
    • Jun Piao's avatar
      ocfs2/dlm: don't handle migrate lockres if already in shutdown · fbf947dd
      Jun Piao authored
      [ Upstream commit bb34f24c ]
      
      We should not handle migrate lockres if we are already in
      'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after leaving
      dlm domain.  At last other nodes will get stuck into infinite loop when
      requsting lock from us.
      
      The problem is caused by concurrency umount between nodes.  Before
      receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as the
      migrate target.  So N2 will continue sending lockres to N1 even though
      N1 has left domain.
      
              N1                             N2 (owner)
                                             touch file
      
          access the file,
          and get pr lock
      
                                             begin leave domain and
                                             pick up N1 as new owner
      
          begin leave domain and
          migrate all lockres done
      
                                             begin migrate lockres to N1
      
          end leave domain, but
          the lockres left
          unexpectedly, because
          migrate task has passed
      
      [piaojun@huawei.com: v3]
        Link: http://lkml.kernel.org/r/5A9CBD19.5020107@huawei.com
      Link: http://lkml.kernel.org/r/5A99F028.2090902@huawei.comSigned-off-by: default avatarJun Piao <piaojun@huawei.com>
      Reviewed-by: default avatarYiwen Jiang <jiangyiwen@huawei.com>
      Reviewed-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Reviewed-by: default avatarChangwei Ge <ge.changwei@h3c.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fbf947dd
    • Nikolay Borisov's avatar
      btrfs: Fix possible softlock on single core machines · 4367fb9e
      Nikolay Borisov authored
      [ Upstream commit 1e1c50a9 ]
      
      do_chunk_alloc implements a loop checking whether there is a pending
      chunk allocation and if so causes the caller do loop. Generally this
      loop is executed only once, however testing with btrfs/072 on a single
      core vm machines uncovered an extreme case where the system could loop
      indefinitely. This is due to a missing cond_resched when loop which
      doesn't give a chance to the previous chunk allocator finish its job.
      
      The fix is to simply add the missing cond_resched.
      
      Fixes: 6d74119f ("Btrfs: avoid taking the chunk_mutex in do_chunk_alloc")
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4367fb9e
    • Liu Bo's avatar
      Btrfs: fix NULL pointer dereference in log_dir_items · 14c4d5f6
      Liu Bo authored
      [ Upstream commit 80c0b421 ]
      
      0, 1 and <0 can be returned by btrfs_next_leaf(), and when <0 is
      returned, path->nodes[0] could be NULL, log_dir_items lacks such a
      check for <0 and we may run into a null pointer dereference panic.
      
      Fixes: e02119d5 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      14c4d5f6
    • Liu Bo's avatar
      Btrfs: bail out on error during replay_dir_deletes · b3835752
      Liu Bo authored
      [ Upstream commit b98def7c ]
      
      If errors were returned by btrfs_next_leaf(), replay_dir_deletes needs
      to bail out, otherwise @ret would be forced to be 0 after 'break;' and
      the caller won't be aware of it.
      
      Fixes: e02119d5 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3835752
    • Huang Ying's avatar
      mm: fix races between address_space dereference and free in page_evicatable · d7f4e948
      Huang Ying authored
      [ Upstream commit e92bb4dd ]
      
      When page_mapping() is called and the mapping is dereferenced in
      page_evicatable() through shrink_active_list(), it is possible for the
      inode to be truncated and the embedded address space to be freed at the
      same time.  This may lead to the following race.
      
      CPU1                                                CPU2
      
      truncate(inode)                                     shrink_active_list()
        ...                                                 page_evictable(page)
        truncate_inode_page(mapping, page);
          delete_from_page_cache(page)
            spin_lock_irqsave(&mapping->tree_lock, flags);
              __delete_from_page_cache(page, NULL)
                page_cache_tree_delete(..)
                  ...                                         mapping = page_mapping(page);
                  page->mapping = NULL;
                  ...
            spin_unlock_irqrestore(&mapping->tree_lock, flags);
            page_cache_free_page(mapping, page)
              put_page(page)
                if (put_page_testzero(page)) -> false
      - inode now has no pages and can be freed including embedded address_space
      
                                                              mapping_unevictable(mapping)
      							  test_bit(AS_UNEVICTABLE, &mapping->flags);
      - we've dereferenced mapping which is potentially already free.
      
      Similar race exists between swap cache freeing and page_evicatable()
      too.
      
      The address_space in inode and swap cache will be freed after a RCU
      grace period.  So the races are fixed via enclosing the page_mapping()
      and address_space usage in rcu_read_lock/unlock().  Some comments are
      added in code to make it clear what is protected by the RCU read lock.
      
      Link: http://lkml.kernel.org/r/20180212081227.1940-1-ying.huang@intel.comSigned-off-by: default avatar"Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7f4e948
    • Claudio Imbrenda's avatar
      mm/ksm: fix interaction with THP · 2272b832
      Claudio Imbrenda authored
      [ Upstream commit 77da2ba0 ]
      
      This patch fixes a corner case for KSM.  When two pages belong or
      belonged to the same transparent hugepage, and they should be merged,
      KSM fails to split the page, and therefore no merging happens.
      
      This bug can be reproduced by:
      * making sure ksm is running (in case disabling ksmtuned)
      * enabling transparent hugepages
      * allocating a THP-aligned 1-THP-sized buffer
        e.g. on amd64: posix_memalign(&p, 1<<21, 1<<21)
      * filling it with the same values
        e.g. memset(p, 42, 1<<21)
      * performing madvise to make it mergeable
        e.g. madvise(p, 1<<21, MADV_MERGEABLE)
      * waiting for KSM to perform a few scans
      
      The expected outcome is that the all the pages get merged (1 shared and
      the rest sharing); the actual outcome is that no pages get merged (1
      unshared and the rest volatile)
      
      The reason of this behaviour is that we increase the reference count
      once for both pages we want to merge, but if they belong to the same
      hugepage (or compound page), the reference counter used in both cases is
      the one of the head of the compound page.  This means that
      split_huge_page will find a value of the reference counter too high and
      will fail.
      
      This patch solves this problem by testing if the two pages to merge
      belong to the same hugepage when attempting to merge them.  If so, the
      hugepage is split safely.  This means that the hugepage is not split if
      not necessary.
      
      Link: http://lkml.kernel.org/r/1521548069-24758-1-git-send-email-imbrenda@linux.vnet.ibm.comSigned-off-by: default avatarClaudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Co-authored-by: default avatarGerald Schaefer <gerald.schaefer@de.ibm.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2272b832
    • Esben Haabendal's avatar
      dp83640: Ensure against premature access to PHY registers after reset · bb928fbe
      Esben Haabendal authored
      [ Upstream commit 76327a35 ]
      
      The datasheet specifies a 3uS pause after performing a software
      reset. The default implementation of genphy_soft_reset() does not
      provide this, so implement soft_reset with the needed pause.
      Signed-off-by: default avatarEsben Haabendal <eha@deif.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bb928fbe
    • Shunyong Yang's avatar
      cpufreq: CPPC: Initialize shared perf capabilities of CPUs · 707f25a2
      Shunyong Yang authored
      [ Upstream commit 8913315e ]
      
      When multiple CPUs are related in one cpufreq policy, the first online
      CPU will be chosen by default to handle cpufreq operations. Let's take
      cpu0 and cpu1 as an example.
      
      When cpu0 is offline, policy->cpu will be shifted to cpu1. cpu1's perf
      capabilities should be initialized. Otherwise, perf capabilities are 0s
      and speed change can not take effect.
      
      This patch copies perf capabilities of the first online CPU to other
      shared CPUs when policy shared type is CPUFREQ_SHARED_TYPE_ANY.
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarShunyong Yang <shunyong.yang@hxt-semitech.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      707f25a2
    • Carlos Maiolino's avatar
      Force log to disk before reading the AGF during a fstrim · 3bb576ce
      Carlos Maiolino authored
      [ Upstream commit 8c81dd46 ]
      
      Forcing the log to disk after reading the agf is wrong, we might be
      calling xfs_log_force with XFS_LOG_SYNC with a metadata lock held.
      
      This can cause a deadlock when racing a fstrim with a filesystem
      shutdown.
      
      The deadlock has been identified due a miscalculation bug in device-mapper
      dm-thin, which returns lack of space to its users earlier than the device itself
      really runs out of space, changing the device-mapper volume into an error state.
      
      The problem happened while filling the filesystem with a single file,
      triggering the bug in device-mapper, consequently causing an IO error
      and shutting down the filesystem.
      
      If such file is removed, and fstrim executed before the XFS finishes the
      shut down process, the fstrim process will end up holding the buffer
      lock, and going to sleep on the cil wait queue.
      
      At this point, the shut down process will try to wake up all the threads
      waiting on the cil wait queue, but for this, it will try to hold the
      same buffer log already held my the fstrim, locking up the filesystem.
      Signed-off-by: default avatarCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3bb576ce
    • Jens Axboe's avatar
      sr: get/drop reference to device in revalidate and check_events · b2666b2b
      Jens Axboe authored
      [ Upstream commit 2d097c50 ]
      
      We can't just use scsi_cd() to get the scsi_cd structure, we have
      to grab a live reference to the device. For both callbacks, we're
      not inside an open where we already hold a reference to the device.
      
      This fixes device removal/addition under concurrent device access,
      which otherwise could result in the below oops.
      
      NULL pointer dereference at 0000000000000010
      PGD 0 P4D 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in:
      sr 12:0:0:0: [sr2] scsi-1 drive
       scsi_debug crc_t10dif crct10dif_generic crct10dif_common nvme nvme_core sb_edac xl
      sr 12:0:0:0: Attached scsi CD-ROM sr2
       sr_mod cdrom btrfs xor zstd_decompress zstd_compress xxhash lzo_compress zlib_defc
      sr 12:0:0:0: Attached scsi generic sg7 type 5
       igb ahci libahci i2c_algo_bit libata dca [last unloaded: crc_t10dif]
      CPU: 43 PID: 4629 Comm: systemd-udevd Not tainted 4.16.0+ #650
      Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
      RIP: 0010:sr_block_revalidate_disk+0x23/0x190 [sr_mod]
      RSP: 0018:ffff883ff357bb58 EFLAGS: 00010292
      RAX: ffffffffa00b07d0 RBX: ffff883ff3058000 RCX: ffff883ff357bb66
      RDX: 0000000000000003 RSI: 0000000000007530 RDI: ffff881fea631000
      RBP: 0000000000000000 R08: ffff881fe4d38400 R09: 0000000000000000
      R10: 0000000000000000 R11: 00000000000001b6 R12: 000000000800005d
      R13: 000000000800005d R14: ffff883ffd9b3790 R15: 0000000000000000
      FS:  00007f7dc8e6d8c0(0000) GS:ffff883fff340000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000010 CR3: 0000003ffda98005 CR4: 00000000003606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       ? __invalidate_device+0x48/0x60
       check_disk_change+0x4c/0x60
       sr_block_open+0x16/0xd0 [sr_mod]
       __blkdev_get+0xb9/0x450
       ? iget5_locked+0x1c0/0x1e0
       blkdev_get+0x11e/0x320
       ? bdget+0x11d/0x150
       ? _raw_spin_unlock+0xa/0x20
       ? bd_acquire+0xc0/0xc0
       do_dentry_open+0x1b0/0x320
       ? inode_permission+0x24/0xc0
       path_openat+0x4e6/0x1420
       ? cpumask_any_but+0x1f/0x40
       ? flush_tlb_mm_range+0xa0/0x120
       do_filp_open+0x8c/0xf0
       ? __seccomp_filter+0x28/0x230
       ? _raw_spin_unlock+0xa/0x20
       ? __handle_mm_fault+0x7d6/0x9b0
       ? list_lru_add+0xa8/0xc0
       ? _raw_spin_unlock+0xa/0x20
       ? __alloc_fd+0xaf/0x160
       ? do_sys_open+0x1a6/0x230
       do_sys_open+0x1a6/0x230
       do_syscall_64+0x5a/0x100
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      Reviewed-by: default avatarLee Duncan <lduncan@suse.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2666b2b
    • Tom Abraham's avatar
      swap: divide-by-zero when zero length swap file on ssd · 42379478
      Tom Abraham authored
      [ Upstream commit a06ad633 ]
      
      Calling swapon() on a zero length swap file on SSD can lead to a
      divide-by-zero.
      
      Although creating such files isn't possible with mkswap and they woud be
      considered invalid, it would be better for the swapon code to be more
      robust and handle this condition gracefully (return -EINVAL).
      Especially since the fix is small and straightforward.
      
      To help with wear leveling on SSD, the swapon syscall calculates a
      random position in the swap file using modulo p->highest_bit, which is
      set to maxpages - 1 in read_swap_header.
      
      If the swap file is zero length, read_swap_header sets maxpages=1 and
      last_page=0, resulting in p->highest_bit=0 and we divide-by-zero when we
      modulo p->highest_bit in swapon syscall.
      
      This can be prevented by having read_swap_header return zero if
      last_page is zero.
      
      Link: http://lkml.kernel.org/r/5AC747C1020000A7001FA82C@prv-mh.provo.novell.comSigned-off-by: default avatarThomas Abraham <tabraham@suse.com>
      Reported-by: <Mark.Landis@Teradata.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      42379478
    • Danilo Krummrich's avatar
      fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table · b6214383
      Danilo Krummrich authored
      [ Upstream commit a0b0d1c3 ]
      
      proc_sys_link_fill_cache() does not take currently unregistering sysctl
      tables into account, which might result into a page fault in
      sysctl_follow_link() - add a check to fix it.
      
      This bug has been present since v3.4.
      
      Link: http://lkml.kernel.org/r/20180228013506.4915-1-danilokrummrich@dk-develop.de
      Fixes: 0e47c99d ("sysctl: Replace root_list with links between sysctl_table_sets")
      Signed-off-by: default avatarDanilo Krummrich <danilokrummrich@dk-develop.de>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b6214383
    • Dave Hansen's avatar
      x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init · 2bbb81de
      Dave Hansen authored
      [ Upstream commit 639d6aaf ]
      
      __ro_after_init data gets stuck in the .rodata section.  That's normally
      fine because the kernel itself manages the R/W properties.
      
      But, if we run __change_page_attr() on an area which is __ro_after_init,
      the .rodata checks will trigger and force the area to be immediately
      read-only, even if it is early-ish in boot.  This caused problems when
      trying to clear the _PAGE_GLOBAL bit for these area in the PTI code:
      it cleared _PAGE_GLOBAL like I asked, but also took it up on itself
      to clear _PAGE_RW.  The kernel then oopses the next time it wrote to
      a __ro_after_init data structure.
      
      To fix this, add the kernel_set_to_readonly check, just like we have
      for kernel text, just a few lines below in this function.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205514.8D898241@viggo.jf.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2bbb81de
    • Joerg Roedel's avatar
      x86/pgtable: Don't set huge PUD/PMD on non-leaf entries · 1e269416
      Joerg Roedel authored
      [ Upstream commit e3e28812 ]
      
      The pmd_set_huge() and pud_set_huge() functions are used from
      the generic ioremap() code to establish large mappings where this
      is possible.
      
      But the generic ioremap() code does not check whether the
      PMD/PUD entries are already populated with a non-leaf entry,
      so that any page-table pages these entries point to will be
      lost.
      
      Further, on x86-32 with SHARED_KERNEL_PMD=0, this causes a
      BUG_ON() in vmalloc_sync_one() when PMD entries are synced
      from swapper_pg_dir to the current page-table. This happens
      because the PMD entry from swapper_pg_dir was promoted to a
      huge-page entry while the current PGD still contains the
      non-leaf entry. Because both entries are present and point
      to a different page, the BUG_ON() triggers.
      
      This was actually triggered with pti-x32 enabled in a KVM
      virtual machine by the graphics driver.
      
      A real and better fix for that would be to improve the
      page-table handling in the generic ioremap() code. But that is
      out-of-scope for this patch-set and left for later work.
      Reported-by: default avatarDavid H. Gutteridge <dhgutteridge@sympatico.ca>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Waiman Long <llong@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180411152437.GC15462@8bytes.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e269416
    • Johannes Thumshirn's avatar
      nvme: don't send keep-alives to the discovery controller · f2a1bf45
      Johannes Thumshirn authored
      [ Upstream commit 74c6c715 ]
      
      NVMe over Fabrics 1.0 Section 5.2 "Discovery Controller Properties and
      Command Support" Figure 31 "Discovery Controller – Admin Commands"
      explicitly listst all commands but "Get Log Page" and "Identify" as
      reserved, but NetApp report the Linux host is sending Keep Alive
      commands to the discovery controller, which is a violation of the
      Spec.
      
      We're already checking for discovery controllers when configuring the
      keep alive timeout but when creating a discovery controller we're not
      hard wiring the keep alive timeout to 0 and thus remain on
      NVME_DEFAULT_KATO for the discovery controller.
      
      This can be easily remproduced when issuing a direct connect to the
      discovery susbsystem using:
      'nvme connect [...] --nqn=nqn.2014-08.org.nvmexpress.discovery'
      Signed-off-by: default avatarJohannes Thumshirn <jthumshirn@suse.de>
      Fixes: 07bfcd09 ("nvme-fabrics: add a generic NVMe over Fabrics library")
      Reported-by: default avatarMartin George <marting@netapp.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarKeith Busch <keith.busch@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2a1bf45
    • Rich Felker's avatar
      sh: fix debug trap failure to process signals before return to user · bec95d21
      Rich Felker authored
      [ Upstream commit 96a59899 ]
      
      When responding to a debug trap (breakpoint) in userspace, the
      kernel's trap handler raised SIGTRAP but returned from the trap via a
      code path that ignored pending signals, resulting in an infinite loop
      re-executing the trapping instruction.
      Signed-off-by: default avatarRich Felker <dalias@libc.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bec95d21
    • Yelena Krivosheev's avatar
      net: mvneta: fix enable of all initialized RXQs · 2b1c1ad8
      Yelena Krivosheev authored
      [ Upstream commit e81b5e01 ]
      
      In mvneta_port_up() we enable relevant RX and TX port queues by write
      queues bit map to an appropriate register.
      
      q_map must be ZERO in the beginning of this process.
      Signed-off-by: default avatarYelena Krivosheev <yelena@marvell.com>
      Acked-by: default avatarThomas Petazzoni <thomas.petazzoni@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b1c1ad8
    • Toshiaki Makita's avatar
      net: Fix untag for vlan packets without ethernet header · b389e04a
      Toshiaki Makita authored
      [ Upstream commit ae474573 ]
      
      In some situation vlan packets do not have ethernet headers. One example
      is packets from tun devices. Users can specify vlan protocol in tun_pi
      field instead of IP protocol, and skb_vlan_untag() attempts to untag such
      packets.
      
      skb_vlan_untag() (more precisely, skb_reorder_vlan_header() called by it)
      however did not expect packets without ethernet headers, so in such a case
      size argument for memmove() underflowed and triggered crash.
      
      ====
      BUG: unable to handle kernel paging request at ffff8801cccb8000
      IP: __memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43
      PGD 9cee067 P4D 9cee067 PUD 1d9401063 PMD 1cccb7063 PTE 2810100028101
      Oops: 000b [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 17663 Comm: syz-executor2 Not tainted 4.16.0-rc7+ #368
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43
      RSP: 0018:ffff8801cc046e28 EFLAGS: 00010287
      RAX: ffff8801ccc244c4 RBX: fffffffffffffffe RCX: fffffffffff6c4c2
      RDX: fffffffffffffffe RSI: ffff8801cccb7ffc RDI: ffff8801cccb8000
      RBP: ffff8801cc046e48 R08: ffff8801ccc244be R09: ffffed0039984899
      R10: 0000000000000001 R11: ffffed0039984898 R12: ffff8801ccc244c4
      R13: ffff8801ccc244c0 R14: ffff8801d96b7c06 R15: ffff8801d96b7b40
      FS:  00007febd562d700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff8801cccb8000 CR3: 00000001ccb2f006 CR4: 00000000001606e0
      DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
      Call Trace:
       memmove include/linux/string.h:360 [inline]
       skb_reorder_vlan_header net/core/skbuff.c:5031 [inline]
       skb_vlan_untag+0x470/0xc40 net/core/skbuff.c:5061
       __netif_receive_skb_core+0x119c/0x3460 net/core/dev.c:4460
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4627
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4701
       netif_receive_skb+0xae/0x390 net/core/dev.c:4725
       tun_rx_batched.isra.50+0x5ee/0x870 drivers/net/tun.c:1555
       tun_get_user+0x299e/0x3c20 drivers/net/tun.c:1962
       tun_chr_write_iter+0xb9/0x160 drivers/net/tun.c:1990
       call_write_iter include/linux/fs.h:1782 [inline]
       new_sync_write fs/read_write.c:469 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:482
       vfs_write+0x189/0x510 fs/read_write.c:544
       SYSC_write fs/read_write.c:589 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:581
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x454879
      RSP: 002b:00007febd562cc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007febd562d6d4 RCX: 0000000000454879
      RDX: 0000000000000157 RSI: 0000000020000180 RDI: 0000000000000014
      RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000006b0 R14: 00000000006fc120 R15: 0000000000000000
      Code: 90 90 90 90 90 90 90 48 89 f8 48 83 fa 20 0f 82 03 01 00 00 48 39 fe 7d 0f 49 89 f0 49 01 d0 49 39 f8 0f 8f 9f 00 00 00 48 89 d1 <f3> a4 c3 48 81 fa a8 02 00 00 72 05 40 38 fe 74 3b 48 83 ea 20
      RIP: __memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43 RSP: ffff8801cc046e28
      CR2: ffff8801cccb8000
      ====
      
      We don't need to copy headers for packets which do not have preceding
      headers of vlan headers, so skip memmove() in that case.
      
      Fixes: 4bbb3e0e ("net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off")
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b389e04a
    • Vinayak Menon's avatar
      mm/kmemleak.c: wait for scan completion before disabling free · 51db6743
      Vinayak Menon authored
      [ Upstream commit 914b6dff ]
      
      A crash is observed when kmemleak_scan accesses the object->pointer,
      likely due to the following race.
      
        TASK A             TASK B                     TASK C
        kmemleak_write
         (with "scan" and
         NOT "scan=on")
        kmemleak_scan()
                           create_object
                           kmem_cache_alloc fails
                           kmemleak_disable
                           kmemleak_do_cleanup
                           kmemleak_free_enabled = 0
                                                      kfree
                                                      kmemleak_free bails out
                                                       (kmemleak_free_enabled is 0)
                                                      slub frees object->pointer
        update_checksum
        crash - object->pointer
         freed (DEBUG_PAGEALLOC)
      
      kmemleak_do_cleanup waits for the scan thread to complete, but not for
      direct call to kmemleak_scan via kmemleak_write.  So add a wait for
      kmemleak_scan completion before disabling kmemleak_free, and while at it
      fix the comment on stop_scan_thread.
      
      [vinmenon@codeaurora.org: fix stop_scan_thread comment]
        Link: http://lkml.kernel.org/r/1522219972-22809-1-git-send-email-vinmenon@codeaurora.org
      Link: http://lkml.kernel.org/r/1522063429-18992-1-git-send-email-vinmenon@codeaurora.orgSigned-off-by: default avatarVinayak Menon <vinmenon@codeaurora.org>
      Reviewed-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51db6743