1. 30 May, 2018 40 commits
    • Tang Junhui's avatar
      bcache: return attach error when no cache set exist · b2fce717
      Tang Junhui authored
      [ Upstream commit 7f4fc93d ]
      
      I attach a back-end device to a cache set, and the cache set is not
      registered yet, this back-end device did not attach successfully, and no
      error returned:
      [root]# echo 87859280-fec6-4bcc-20df7ca8f86b > /sys/block/sde/bcache/attach
      [root]#
      
      In sysfs_attach(), the return value "v" is initialized to "size" in
      the beginning, and if no cache set exist in bch_cache_sets, the "v" value
      would not change any more, and return to sysfs, sysfs regard it as success
      since the "size" is a positive number.
      
      This patch fixes this issue by assigning "v" with "-ENOENT" in the
      initialization.
      Signed-off-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2fce717
    • Tang Junhui's avatar
      bcache: fix for data collapse after re-attaching an attached device · 0b803fa8
      Tang Junhui authored
      [ Upstream commit 73ac105b ]
      
      back-end device sdm has already attached a cache_set with ID
      f67ebe1f-f8bc-4d73-bfe5-9dc88607f119, then try to attach with
      another cache set, and it returns with an error:
      [root]# cd /sys/block/sdm/bcache
      [root]# echo 5ccd0a63-148e-48b8-afa2-aca9cbd6279f > attach
      -bash: echo: write error: Invalid argument
      
      After that, execute a command to modify the label of bcache
      device:
      [root]# echo data_disk1 > label
      
      Then we reboot the system, when the system power on, the back-end
      device can not attach to cache_set, a messages show in the log:
      Feb  5 12:05:52 ceph152 kernel: [922385.508498] bcache:
      bch_cached_dev_attach() couldn't find uuid for sdm in set
      
      In sysfs_attach(), dc->sb.set_uuid was assigned to the value
      which input through sysfs, no matter whether it is success
      or not in bch_cached_dev_attach(). For example, If the back-end
      device has already attached to an cache set, bch_cached_dev_attach()
      would fail, but dc->sb.set_uuid was changed. Then modify the
      label of bcache device, it will call bch_write_bdev_super(),
      which would write the dc->sb.set_uuid to the super block, so we
      record a wrong cache set ID in the super block, after the system
      reboot, the cache set couldn't find the uuid of the back-end
      device, so the bcache device couldn't exist and use any more.
      
      In this patch, we don't assigned cache set ID to dc->sb.set_uuid
      in sysfs_attach() directly, but input it into bch_cached_dev_attach(),
      and assigned dc->sb.set_uuid to the cache set ID after the back-end
      device attached to the cache set successful.
      Signed-off-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b803fa8
    • Tang Junhui's avatar
      bcache: fix for allocator and register thread race · cab95b25
      Tang Junhui authored
      [ Upstream commit 682811b3 ]
      
      After long time running of random small IO writing,
      I reboot the machine, and after the machine power on,
      I found bcache got stuck, the stack is:
      [root@ceph153 ~]# cat /proc/2510/task/*/stack
      [<ffffffffa06b2455>] closure_sync+0x25/0x90 [bcache]
      [<ffffffffa06b6be8>] bch_journal+0x118/0x2b0 [bcache]
      [<ffffffffa06b6dc7>] bch_journal_meta+0x47/0x70 [bcache]
      [<ffffffffa06be8f7>] bch_prio_write+0x237/0x340 [bcache]
      [<ffffffffa06a8018>] bch_allocator_thread+0x3c8/0x3d0 [bcache]
      [<ffffffff810a631f>] kthread+0xcf/0xe0
      [<ffffffff8164c318>] ret_from_fork+0x58/0x90
      [<ffffffffffffffff>] 0xffffffffffffffff
      [root@ceph153 ~]# cat /proc/2038/task/*/stack
      [<ffffffffa06b1abd>] __bch_btree_map_nodes+0x12d/0x150 [bcache]
      [<ffffffffa06b1bd1>] bch_btree_insert+0xf1/0x170 [bcache]
      [<ffffffffa06b637f>] bch_journal_replay+0x13f/0x230 [bcache]
      [<ffffffffa06c75fe>] run_cache_set+0x79a/0x7c2 [bcache]
      [<ffffffffa06c0cf8>] register_bcache+0xd48/0x1310 [bcache]
      [<ffffffff812f702f>] kobj_attr_store+0xf/0x20
      [<ffffffff8125b216>] sysfs_write_file+0xc6/0x140
      [<ffffffff811dfbfd>] vfs_write+0xbd/0x1e0
      [<ffffffff811e069f>] SyS_write+0x7f/0xe0
      [<ffffffff8164c3c9>] system_call_fastpath+0x16/0x1
      The stack shows the register thread and allocator thread
      were getting stuck when registering cache device.
      
      I reboot the machine several times, the issue always
      exsit in this machine.
      
      I debug the code, and found the call trace as bellow:
      register_bcache()
         ==>run_cache_set()
            ==>bch_journal_replay()
               ==>bch_btree_insert()
                  ==>__bch_btree_map_nodes()
                     ==>btree_insert_fn()
                        ==>btree_split() //node need split
                           ==>btree_check_reserve()
      In btree_check_reserve(), It will check if there is enough buckets
      of RESERVE_BTREE type, since allocator thread did not work yet, so
      no buckets of RESERVE_BTREE type allocated, so the register thread
      waits on c->btree_cache_wait, and goes to sleep.
      
      Then the allocator thread initialized, the call trace is bellow:
      bch_allocator_thread()
      ==>bch_prio_write()
         ==>bch_journal_meta()
            ==>bch_journal()
               ==>journal_wait_for_write()
      In journal_wait_for_write(), It will check if journal is full by
      journal_full(), but the long time random small IO writing
      causes the exhaustion of journal buckets(journal.blocks_free=0),
      In order to release the journal buckets,
      the allocator calls btree_flush_write() to flush keys to
      btree nodes, and waits on c->journal.wait until btree nodes writing
      over or there has already some journal buckets space, then the
      allocator thread goes to sleep. but in btree_flush_write(), since
      bch_journal_replay() is not finished, so no btree nodes have journal
      (condition "if (btree_current_write(b)->journal)" never satisfied),
      so we got no btree node to flush, no journal bucket released,
      and allocator sleep all the times.
      
      Through the above analysis, we can see that:
      1) Register thread wait for allocator thread to allocate buckets of
         RESERVE_BTREE type;
      2) Alloctor thread wait for register thread to replay journal, so it
         can flush btree nodes and get journal bucket.
         then they are all got stuck by waiting for each other.
      
      Hua Rui provided a patch for me, by allocating some buckets of
      RESERVE_BTREE type in advance, so the register thread can get bucket
      when btree node splitting and no need to waiting for the allocator
      thread. I tested it, it has effect, and register thread run a step
      forward, but finally are still got stuck, the reason is only 8 bucket
      of RESERVE_BTREE type were allocated, and in bch_journal_replay(),
      after 2 btree nodes splitting, only 4 bucket of RESERVE_BTREE type left,
      then btree_check_reserve() is not satisfied anymore, so it goes to sleep
      again, and in the same time, alloctor thread did not flush enough btree
      nodes to release a journal bucket, so they all got stuck again.
      
      So we need to allocate more buckets of RESERVE_BTREE type in advance,
      but how much is enough?  By experience and test, I think it should be
      as much as journal buckets. Then I modify the code as this patch,
      and test in the machine, and it works.
      
      This patch modified base on Hua Rui’s patch, and allocate more buckets
      of RESERVE_BTREE type in advance to avoid register thread and allocate
      thread going to wait for each other.
      
      [patch v2] ca->sb.njournal_buckets would be 0 in the first time after
      cache creation, and no journal exists, so just 8 btree buckets is OK.
      Signed-off-by: default avatarHua Rui <huarui.dev@gmail.com>
      Signed-off-by: default avatarTang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cab95b25
    • Coly Li's avatar
      bcache: properly set task state in bch_writeback_thread() · acd8bb42
      Coly Li authored
      [ Upstream commit 99361bbf ]
      
      Kernel thread routine bch_writeback_thread() has the following code block,
      
      447         down_write(&dc->writeback_lock);
      448~450     if (check conditions) {
      451                 up_write(&dc->writeback_lock);
      452                 set_current_state(TASK_INTERRUPTIBLE);
      453
      454                 if (kthread_should_stop())
      455                         return 0;
      456
      457                 schedule();
      458                 continue;
      459         }
      
      If condition check is true, its task state is set to TASK_INTERRUPTIBLE
      and call schedule() to wait for others to wake up it.
      
      There are 2 issues in current code,
      1, Task state is set to TASK_INTERRUPTIBLE after the condition checks, if
         another process changes the condition and call wake_up_process(dc->
         writeback_thread), then at line 452 task state is set back to
         TASK_INTERRUPTIBLE, the writeback kernel thread will lose a chance to be
         waken up.
      2, At line 454 if kthread_should_stop() is true, writeback kernel thread
         will return to kernel/kthread.c:kthread() with TASK_INTERRUPTIBLE and
         call do_exit(). It is not good to enter do_exit() with task state
         TASK_INTERRUPTIBLE, in following code path might_sleep() is called and a
         warning message is reported by __might_sleep(): "WARNING: do not call
         blocking ops when !TASK_RUNNING; state=1 set at [xxxx]".
      
      For the first issue, task state should be set before condition checks.
      Ineed because dc->writeback_lock is required when modifying all the
      conditions, calling set_current_state() inside code block where dc->
      writeback_lock is hold is safe. But this is quite implicit, so I still move
      set_current_state() before all the condition checks.
      
      For the second issue, frankley speaking it does not hurt when kernel thread
      exits with TASK_INTERRUPTIBLE state, but this warning message scares users,
      makes them feel there might be something risky with bcache and hurt their
      data.  Setting task state to TASK_RUNNING before returning fixes this
      problem.
      
      In alloc.c:allocator_wait(), there is also a similar issue, and is also
      fixed in this patch.
      
      Changelog:
      v3: merge two similar fixes into one patch
      v2: fix the race issue in v1 patch.
      v1: initial buggy fix.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Reviewed-by: default avatarMichael Lyle <mlyle@lyle.org>
      Cc: Michael Lyle <mlyle@lyle.org>
      Cc: Junhui Tang <tang.junhui@zte.com.cn>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      acd8bb42
    • Arnd Bergmann's avatar
      cifs: silence compiler warnings showing up with gcc-8.0.0 · 94803688
      Arnd Bergmann authored
      [ Upstream commit ade7db99 ]
      
      This bug was fixed before, but came up again with the latest
      compiler in another function:
      
      fs/cifs/cifssmb.c: In function 'CIFSSMBSetEA':
      fs/cifs/cifssmb.c:6362:3: error: 'strncpy' offset 8 is out of the bounds [0, 4] [-Werror=array-bounds]
         strncpy(parm_data->list[0].name, ea_name, name_len);
      
      Let's apply the same fix that was used for the other instances.
      
      Fixes: b2a3ad9c ("cifs: silence compiler warnings showing up with gcc-4.7.0")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      94803688
    • Alexey Dobriyan's avatar
      proc: fix /proc/*/map_files lookup · bccf7f7c
      Alexey Dobriyan authored
      [ Upstream commit ac7f1061 ]
      
      Current code does:
      
      	if (sscanf(dentry->d_name.name, "%lx-%lx", start, end) != 2)
      
      However sscanf() is broken garbage.
      
      It silently accepts whitespace between format specifiers
      (did you know that?).
      
      It silently accepts valid strings which result in integer overflow.
      
      Do not use sscanf() for any even remotely reliable parsing code.
      
      	OK
      	# readlink '/proc/1/map_files/55a23af39000-55a23b05b000'
      	/lib/systemd/systemd
      
      	broken
      	# readlink '/proc/1/map_files/               55a23af39000-55a23b05b000'
      	/lib/systemd/systemd
      
      	broken
      	# readlink '/proc/1/map_files/55a23af39000-55a23b05b000    '
      	/lib/systemd/systemd
      
      	very broken
      	# readlink '/proc/1/map_files/1000000000000000055a23af39000-55a23b05b000'
      	/lib/systemd/systemd
      
      Andrei said:
      
      : This patch breaks criu.  It was a bug in criu.  And this bug is on a minor
      : path, which works when memfd_create() isn't available.  It is a reason why
      : I ask to not backport this patch to stable kernels.
      :
      : In CRIU this bug can be triggered, only if this patch will be backported
      : to a kernel which version is lower than v3.16.
      
      Link: http://lkml.kernel.org/r/20171120212706.GA14325@avx2Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bccf7f7c
    • Will Deacon's avatar
      arm64: spinlock: Fix theoretical trylock() A-B-A with LSE atomics · 07b1d60d
      Will Deacon authored
      [ Upstream commit 202fb4ef ]
      
      If the spinlock "next" ticket wraps around between the initial LDR
      and the cmpxchg in the LSE version of spin_trylock, then we can erroneously
      think that we have successfuly acquired the lock because we only check
      whether the next ticket return by the cmpxchg is equal to the owner ticket
      in our updated lock word.
      
      This patch fixes the issue by performing a full 32-bit check of the lock
      word when trying to determine whether or not the CASA instruction updated
      memory.
      Reported-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07b1d60d
    • Guanglei Li's avatar
      RDS: IB: Fix null pointer issue · c9eb3347
      Guanglei Li authored
      [ Upstream commit 2c0aa086 ]
      
      Scenario:
      1. Port down and do fail over
      2. Ap do rds_bind syscall
      
      PID: 47039  TASK: ffff89887e2fe640  CPU: 47  COMMAND: "kworker/u:6"
       #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
       #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
       #2 [ffff898e35f15b30] oops_end at ffffffff8150f518
       #3 [ffff898e35f15b60] no_context at ffffffff8104854c
       #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
       #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
       #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
       #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
          [exception RIP: unknown or invalid address]
          RIP: 0000000000000000  RSP: ffff898e35f15dc8  RFLAGS: 00010282
          RAX: 00000000fffffffe  RBX: ffff889b77f6fc00  RCX:ffffffff81c99d88
          RDX: 0000000000000000  RSI: ffff896019ee08e8  RDI:ffff889b77f6fc00
          RBP: ffff898e35f15df0   R8: ffff896019ee08c8  R9:0000000000000000
          R10: 0000000000000400  R11: 0000000000000000  R12:ffff896019ee08c0
          R13: ffff889b77f6fe68  R14: ffffffff81c99d80  R15: ffffffffa022a1e0
          ORIG_RAX: ffffffffffffffff  CS: 0010 SS: 0018
       #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
       #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
       #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
       #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6
      
      PID: 45659  TASK: ffff880d313d2500  CPU: 31  COMMAND: "oracle_45659_ap"
       #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
       #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
       #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
       #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
       #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
       #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
       #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
       #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
       #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670
      
      PID: 45659                          PID: 47039
      rds_ib_laddr_check
        /* create id_priv with a null event_handler */
        rdma_create_id
        rdma_bind_addr
          cma_acquire_dev
            /* add id_priv to cma_dev->id_list */
            cma_attach_to_dev
                                          cma_ndev_work_handler
                                            /* event_hanlder is null */
                                            id_priv->id.event_handler
      Signed-off-by: default avatarGuanglei Li <guanglei.li@oracle.com>
      Signed-off-by: default avatarHonglei Wang <honglei.wang@oracle.com>
      Reviewed-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: default avatarYanjun Zhu <yanjun.zhu@oracle.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Acked-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9eb3347
    • Ross Lagerwall's avatar
      xen/grant-table: Use put_page instead of free_page · 95f6218d
      Ross Lagerwall authored
      [ Upstream commit 3ac7292a ]
      
      The page given to gnttab_end_foreign_access() to free could be a
      compound page so use put_page() instead of free_page() since it can
      handle both compound and single pages correctly.
      
      This bug was discovered when migrating a Xen VM with several VIFs and
      CONFIG_DEBUG_VM enabled. It hits a BUG usually after fewer than 10
      iterations. All netfront devices disconnect from the backend during a
      suspend/resume and this will call gnttab_end_foreign_access() if a
      netfront queue has an outstanding skb. The mismatch between calling
      get_page() and free_page() on a compound page causes a reference
      counting error which is detected when DEBUG_VM is enabled.
      Signed-off-by: default avatarRoss Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95f6218d
    • Ross Lagerwall's avatar
      xen-netfront: Fix race between device setup and open · 6be4fe83
      Ross Lagerwall authored
      [ Upstream commit f599c64f ]
      
      When a netfront device is set up it registers a netdev fairly early on,
      before it has set up the queues and is actually usable. A userspace tool
      like NetworkManager will immediately try to open it and access its state
      as soon as it appears. The bug can be reproduced by hotplugging VIFs
      until the VM runs out of grant refs. It registers the netdev but fails
      to set up any queues (since there are no more grant refs). In the
      meantime, NetworkManager opens the device and the kernel crashes trying
      to access the queues (of which there are none).
      
      Fix this in two ways:
      * For initial setup, register the netdev much later, after the queues
      are setup. This avoids the race entirely.
      * During a suspend/resume cycle, the frontend reconnects to the backend
      and the queues are recreated. It is possible (though highly unlikely) to
      race with something opening the device and accessing the queues after
      they have been destroyed but before they have been recreated. Extend the
      region covered by the rtnl semaphore to protect against this race. There
      is a possibility that we fail to recreate the queues so check for this
      in the open function.
      Signed-off-by: default avatarRoss Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6be4fe83
    • Matt Redfearn's avatar
      MIPS: TXx9: use IS_BUILTIN() for CONFIG_LEDS_CLASS · 492917f3
      Matt Redfearn authored
      [ Upstream commit 0cde5b44 ]
      
      When commit b27311e1 ("MIPS: TXx9: Add RBTX4939 board support")
      added board support for the RBTX4939, it added a call to
      led_classdev_register even if the LED class is built as a module.
      Built-in arch code cannot call module code directly like this. Commit
      b33b4407 ("MIPS: TXX9: use IS_ENABLED() macro") subsequently
      changed the inclusion of this code to a single check that
      CONFIG_LEDS_CLASS is either builtin or a module, but the same issue
      remains.
      
      This leads to MIPS allmodconfig builds failing when CONFIG_MACH_TX49XX=y
      is set:
      
      arch/mips/txx9/rbtx4939/setup.o: In function `rbtx4939_led_probe':
      setup.c:(.init.text+0xc0): undefined reference to `of_led_classdev_register'
      make: *** [Makefile:999: vmlinux] Error 1
      
      Fix this by using the IS_BUILTIN() macro instead.
      
      Fixes: b27311e1 ("MIPS: TXx9: Add RBTX4939 board support")
      Signed-off-by: default avatarMatt Redfearn <matt.redfearn@mips.com>
      Reviewed-by: default avatarJames Hogan <jhogan@kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/18544/Signed-off-by: default avatarJames Hogan <jhogan@kernel.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      492917f3
    • Yonghong Song's avatar
      bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y · e7260c8f
      Yonghong Song authored
      [ Upstream commit 09584b40 ]
      
      With CONFIG_BPF_JIT_ALWAYS_ON is defined in the config file,
      tools/testing/selftests/bpf/test_kmod.sh failed like below:
        [root@localhost bpf]# ./test_kmod.sh
        sysctl: setting key "net.core.bpf_jit_enable": Invalid argument
        [ JIT enabled:0 hardened:0 ]
        [  132.175681] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  132.458834] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [ JIT enabled:1 hardened:0 ]
        [  133.456025] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  133.730935] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [ JIT enabled:1 hardened:1 ]
        [  134.769730] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  135.050864] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [ JIT enabled:1 hardened:2 ]
        [  136.442882] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
        [  136.821810] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
        [root@localhost bpf]#
      
      The test_kmod.sh load/remove test_bpf.ko multiple times with different
      settings for sysctl net.core.bpf_jit_{enable,harden}. The failed test #297
      of test_bpf.ko is designed such that JIT always fails.
      
      Commit 290af866 (bpf: introduce BPF_JIT_ALWAYS_ON config)
      introduced the following tightening logic:
          ...
              if (!bpf_prog_is_dev_bound(fp->aux)) {
                      fp = bpf_int_jit_compile(fp);
          #ifdef CONFIG_BPF_JIT_ALWAYS_ON
                      if (!fp->jited) {
                              *err = -ENOTSUPP;
                              return fp;
                      }
          #endif
          ...
      With this logic, Test #297 always gets return value -ENOTSUPP
      when CONFIG_BPF_JIT_ALWAYS_ON is defined, causing the test failure.
      
      This patch fixed the failure by marking Test #297 as expected failure
      when CONFIG_BPF_JIT_ALWAYS_ON is defined.
      
      Fixes: 290af866 (bpf: introduce BPF_JIT_ALWAYS_ON config)
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e7260c8f
    • Chen Yu's avatar
      ACPI: processor_perflib: Do not send _PPC change notification if not ready · 0194705a
      Chen Yu authored
      [ Upstream commit ba1edb9a ]
      
      The following warning was triggered after resumed from S3 -
      if all the nonboot CPUs were put offline before suspend:
      
      [ 1840.329515] unchecked MSR access error: RDMSR from 0x771 at rIP: 0xffffffff86061e3a (native_read_msr+0xa/0x30)
      [ 1840.329516] Call Trace:
      [ 1840.329521]  __rdmsr_on_cpu+0x33/0x50
      [ 1840.329525]  generic_exec_single+0x81/0xb0
      [ 1840.329527]  smp_call_function_single+0xd2/0x100
      [ 1840.329530]  ? acpi_ds_result_pop+0xdd/0xf2
      [ 1840.329532]  ? acpi_ds_create_operand+0x215/0x23c
      [ 1840.329534]  rdmsrl_on_cpu+0x57/0x80
      [ 1840.329536]  ? cpumask_next+0x1b/0x20
      [ 1840.329538]  ? rdmsrl_on_cpu+0x57/0x80
      [ 1840.329541]  intel_pstate_update_perf_limits+0xf3/0x220
      [ 1840.329544]  ? notifier_call_chain+0x4a/0x70
      [ 1840.329546]  intel_pstate_set_policy+0x4e/0x150
      [ 1840.329548]  cpufreq_set_policy+0xcd/0x2f0
      [ 1840.329550]  cpufreq_update_policy+0xb2/0x130
      [ 1840.329552]  ? cpufreq_update_policy+0x130/0x130
      [ 1840.329556]  acpi_processor_ppc_has_changed+0x65/0x80
      [ 1840.329558]  acpi_processor_notify+0x80/0x100
      [ 1840.329561]  acpi_ev_notify_dispatch+0x44/0x5c
      [ 1840.329563]  acpi_os_execute_deferred+0x14/0x20
      [ 1840.329565]  process_one_work+0x193/0x3c0
      [ 1840.329567]  worker_thread+0x35/0x3b0
      [ 1840.329569]  kthread+0x125/0x140
      [ 1840.329571]  ? process_one_work+0x3c0/0x3c0
      [ 1840.329572]  ? kthread_park+0x60/0x60
      [ 1840.329575]  ? do_syscall_64+0x67/0x180
      [ 1840.329577]  ret_from_fork+0x25/0x30
      [ 1840.329585] unchecked MSR access error: WRMSR to 0x774 (tried to write 0x0000000000000000) at rIP: 0xffffffff86061f78 (native_write_msr+0x8/0x30)
      [ 1840.329586] Call Trace:
      [ 1840.329587]  __wrmsr_on_cpu+0x37/0x40
      [ 1840.329589]  generic_exec_single+0x81/0xb0
      [ 1840.329592]  smp_call_function_single+0xd2/0x100
      [ 1840.329594]  ? acpi_ds_create_operand+0x215/0x23c
      [ 1840.329595]  ? cpumask_next+0x1b/0x20
      [ 1840.329597]  wrmsrl_on_cpu+0x57/0x70
      [ 1840.329598]  ? rdmsrl_on_cpu+0x57/0x80
      [ 1840.329599]  ? wrmsrl_on_cpu+0x57/0x70
      [ 1840.329602]  intel_pstate_hwp_set+0xd3/0x150
      [ 1840.329604]  intel_pstate_set_policy+0x119/0x150
      [ 1840.329606]  cpufreq_set_policy+0xcd/0x2f0
      [ 1840.329607]  cpufreq_update_policy+0xb2/0x130
      [ 1840.329610]  ? cpufreq_update_policy+0x130/0x130
      [ 1840.329613]  acpi_processor_ppc_has_changed+0x65/0x80
      [ 1840.329615]  acpi_processor_notify+0x80/0x100
      [ 1840.329617]  acpi_ev_notify_dispatch+0x44/0x5c
      [ 1840.329619]  acpi_os_execute_deferred+0x14/0x20
      [ 1840.329620]  process_one_work+0x193/0x3c0
      [ 1840.329622]  worker_thread+0x35/0x3b0
      [ 1840.329624]  kthread+0x125/0x140
      [ 1840.329625]  ? process_one_work+0x3c0/0x3c0
      [ 1840.329626]  ? kthread_park+0x60/0x60
      [ 1840.329628]  ? do_syscall_64+0x67/0x180
      [ 1840.329631]  ret_from_fork+0x25/0x30
      
      This is because if there's only one online CPU, the MSR_PM_ENABLE
      (package wide)can not be enabled after resumed, due to
      intel_pstate_hwp_enable() will only be invoked on AP's online
      process after resumed - if there's no AP online, the HWP remains
      disabled after resumed (BIOS has disabled it in S3). Then if
      there comes a _PPC change notification which touches HWP register
      during this stage, the warning is triggered.
      
      Since we don't call acpi_processor_register_performance() when
      HWP is enabled, the pr->performance will be NULL. When this is
      NULL we don't need to do _PPC change notification.
      Reported-by: default avatarDoug Smythies <dsmythies@telus.net>
      Suggested-by: default avatarSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: default avatarYu Chen <yu.c.chen@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0194705a
    • Jean Delvare's avatar
      firmware: dmi_scan: Fix handling of empty DMI strings · 50fb55b3
      Jean Delvare authored
      [ Upstream commit a7770ae1 ]
      
      The handling of empty DMI strings looks quite broken to me:
      * Strings from 1 to 7 spaces are not considered empty.
      * True empty DMI strings (string index set to 0) are not considered
        empty, and result in allocating a 0-char string.
      * Strings with invalid index also result in allocating a 0-char
        string.
      * Strings starting with 8 spaces are all considered empty, even if
        non-space characters follow (sounds like a weird thing to do, but
        I have actually seen occurrences of this in DMI tables before.)
      * Strings which are considered empty are reported as 8 spaces,
        instead of being actually empty.
      
      Some of these issues are the result of an off-by-one error in memcmp,
      the rest is incorrect by design.
      
      So let's get it square: missing strings and strings made of only
      spaces, regardless of their length, should be treated as empty and
      no memory should be allocated for them. All other strings are
      non-empty and should be allocated.
      Signed-off-by: default avatarJean Delvare <jdelvare@suse.de>
      Fixes: 79da4721 ("x86: fix DMI out of memory problems")
      Cc: Parag Warudkar <parag.warudkar@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50fb55b3
    • Arnd Bergmann's avatar
      x86/power: Fix swsusp_arch_resume prototype · 589d97ba
      Arnd Bergmann authored
      [ Upstream commit 328008a7 ]
      
      The declaration for swsusp_arch_resume marks it as 'asmlinkage', but the
      definition in x86-32 does not, and it fails to include the header with the
      declaration. This leads to a warning when building with
      link-time-optimizations:
      
      kernel/power/power.h:108:23: error: type of 'swsusp_arch_resume' does not match original declaration [-Werror=lto-type-mismatch]
       extern asmlinkage int swsusp_arch_resume(void);
                             ^
      arch/x86/power/hibernate_32.c:148:0: note: 'swsusp_arch_resume' was previously declared here
       int swsusp_arch_resume(void)
      
      This moves the declaration into a globally visible header file and fixes up
      both x86 definitions to match it.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: linux-pm@vger.kernel.org
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Link: https://lkml.kernel.org/r/20180202145634.200291-2-arnd@arndb.deSigned-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      589d97ba
    • Alex Estrin's avatar
      IB/ipoib: Fix for potential no-carrier state · cabf4efd
      Alex Estrin authored
      [ Upstream commit 10293610 ]
      
      On reboot SM can program port pkey table before ipoib registered its
      event handler, which could result in missing pkey event and leave root
      interface with initial pkey value from index 0.
      
      Since OPA port starts with invalid pkey in index 0, root interface will
      fail to initialize and stay down with no-carrier flag.
      
      For IB ipoib interface may end up with pkey different from value
      opensm put in pkey table idx 0, resulting in connectivity issues
      (different mcast groups, for example).
      
      Close the window by calling event handler after registration
      to make sure ipoib pkey is in sync with port pkey table.
      Reviewed-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: default avatarIra Weiny <ira.weiny@intel.com>
      Signed-off-by: default avatarAlex Estrin <alex.estrin@intel.com>
      Signed-off-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cabf4efd
    • Mel Gorman's avatar
      mm: pin address_space before dereferencing it while isolating an LRU page · 7a9e41c7
      Mel Gorman authored
      [ Upstream commit 69d763fc ]
      
      Minchan Kim asked the following question -- what locks protects
      address_space destroying when race happens between inode trauncation and
      __isolate_lru_page? Jan Kara clarified by describing the race as follows
      
      CPU1                                            CPU2
      
      truncate(inode)                                 __isolate_lru_page()
        ...
        truncate_inode_page(mapping, page);
          delete_from_page_cache(page)
            spin_lock_irqsave(&mapping->tree_lock, flags);
              __delete_from_page_cache(page, NULL)
                page_cache_tree_delete(..)
                  ...                                   mapping = page_mapping(page);
                  page->mapping = NULL;
                  ...
            spin_unlock_irqrestore(&mapping->tree_lock, flags);
            page_cache_free_page(mapping, page)
              put_page(page)
                if (put_page_testzero(page)) -> false
      - inode now has no pages and can be freed including embedded address_space
      
                                                        if (mapping && !mapping->a_ops->migratepage)
      - we've dereferenced mapping which is potentially already free.
      
      The race is theoretically possible but unlikely.  Before the
      delete_from_page_cache, truncate_cleanup_page is called so the page is
      likely to be !PageDirty or PageWriteback which gets skipped by the only
      caller that checks the mappping in __isolate_lru_page.  Even if the race
      occurs, a substantial amount of work has to happen during a tiny window
      with no preemption but it could potentially be done using a virtual
      machine to artifically slow one CPU or halt it during the critical
      window.
      
      This patch should eliminate the race with truncation by try-locking the
      page before derefencing mapping and aborting if the lock was not
      acquired.  There was a suggestion from Huang Ying to use RCU as a
      side-effect to prevent mapping being freed.  However, I do not like the
      solution as it's an unconventional means of preserving a mapping and
      it's not a context where rcu_read_lock is obviously protecting rcu data.
      
      Link: http://lkml.kernel.org/r/20180104102512.2qos3h5vqzeisrek@techsingularity.net
      Fixes: c8244935 ("mm: compaction: make isolate_lru_page() filter-aware again")
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7a9e41c7
    • Kirill A. Shutemov's avatar
      asm-generic: provide generic_pmdp_establish() · 8645c430
      Kirill A. Shutemov authored
      [ Upstream commit c58f0bb7 ]
      
      Patch series "Do not lose dirty bit on THP pages", v4.
      
      Vlastimil noted that pmdp_invalidate() is not atomic and we can lose
      dirty and access bits if CPU sets them after pmdp dereference, but
      before set_pmd_at().
      
      The bug can lead to data loss, but the race window is tiny and I haven't
      seen any reports that suggested that it happens in reality.  So I don't
      think it worth sending it to stable.
      
      Unfortunately, there's no way to address the issue in a generic way.  We
      need to fix all architectures that support THP one-by-one.
      
      All architectures that have THP supported have to provide atomic
      pmdp_invalidate() that returns previous value.
      
      If generic implementation of pmdp_invalidate() is used, architecture
      needs to provide atomic pmdp_estabish().
      
      pmdp_estabish() is not used out-side generic implementation of
      pmdp_invalidate() so far, but I think this can change in the future.
      
      This patch (of 12):
      
      This is an implementation of pmdp_establish() that is only suitable for
      an architecture that doesn't have hardware dirty/accessed bits.  In this
      case we can't race with CPU which sets these bits and non-atomic
      approach is fine.
      
      Link: http://lkml.kernel.org/r/20171213105756.69879-2-kirill.shutemov@linux.intel.comSigned-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Nitin Gupta <nitin.m.gupta@oracle.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8645c430
    • Yisheng Xie's avatar
      mm/mempolicy: add nodes_empty check in SYSC_migrate_pages · f8d71050
      Yisheng Xie authored
      [ Upstream commit 0486a38b ]
      
      As in manpage of migrate_pages, the errno should be set to EINVAL when
      none of the node IDs specified by new_nodes are on-line and allowed by
      the process's current cpuset context, or none of the specified nodes
      contain memory.  However, when test by following case:
      
      	new_nodes = 0;
      	old_nodes = 0xf;
      	ret = migrate_pages(pid, old_nodes, new_nodes, MAX);
      
      The ret will be 0 and no errno is set.  As the new_nodes is empty, we
      should expect EINVAL as documented.
      
      To fix the case like above, this patch check whether target nodes AND
      current task_nodes is empty, and then check whether AND
      node_states[N_MEMORY] is empty.
      
      Link: http://lkml.kernel.org/r/1510882624-44342-4-git-send-email-xieyisheng1@huawei.comSigned-off-by: default avatarYisheng Xie <xieyisheng1@huawei.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Chris Salls <salls@cs.ucsb.edu>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Tan Xiaojun <tanxiaojun@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f8d71050
    • Yisheng Xie's avatar
      mm/mempolicy: fix the check of nodemask from user · e6e3a480
      Yisheng Xie authored
      [ Upstream commit 56521e7a ]
      
      As Xiaojun reported the ltp of migrate_pages01 will fail on arm64 system
      which has 4 nodes[0...3], all have memory and CONFIG_NODES_SHIFT=2:
      
        migrate_pages01    0  TINFO  :  test_invalid_nodes
        migrate_pages01   14  TFAIL  :  migrate_pages_common.c:45: unexpected failure - returned value = 0, expected: -1
        migrate_pages01   15  TFAIL  :  migrate_pages_common.c:55: call succeeded unexpectedly
      
      In this case the test_invalid_nodes of migrate_pages01 will call:
      SYSC_migrate_pages as:
      
        migrate_pages(0, , {0x0000000000000001}, 64, , {0x0000000000000010}, 64) = 0
      
      The new nodes specifies one or more node IDs that are greater than the
      maximum supported node ID, however, the errno is not set to EINVAL as
      expected.
      
      As man pages of set_mempolicy[1], mbind[2], and migrate_pages[3]
      mentioned, when nodemask specifies one or more node IDs that are greater
      than the maximum supported node ID, the errno should set to EINVAL.
      However, get_nodes only check whether the part of bits
      [BITS_PER_LONG*BITS_TO_LONGS(MAX_NUMNODES), maxnode) is zero or not, and
      remain [MAX_NUMNODES, BITS_PER_LONG*BITS_TO_LONGS(MAX_NUMNODES)
      unchecked.
      
      This patch is to check the bits of [MAX_NUMNODES, maxnode) in get_nodes
      to let migrate_pages set the errno to EINVAL when nodemask specifies one
      or more node IDs that are greater than the maximum supported node ID,
      which follows the manpage's guide.
      
      [1] http://man7.org/linux/man-pages/man2/set_mempolicy.2.html
      [2] http://man7.org/linux/man-pages/man2/mbind.2.html
      [3] http://man7.org/linux/man-pages/man2/migrate_pages.2.html
      
      Link: http://lkml.kernel.org/r/1510882624-44342-3-git-send-email-xieyisheng1@huawei.comSigned-off-by: default avatarYisheng Xie <xieyisheng1@huawei.com>
      Reported-by: default avatarTan Xiaojun <tanxiaojun@huawei.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Chris Salls <salls@cs.ucsb.edu>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6e3a480
    • piaojun's avatar
      ocfs2: return error when we attempt to access a dirty bh in jbd2 · d278dc22
      piaojun authored
      [ Upstream commit d984187e ]
      
      We should not reuse the dirty bh in jbd2 directly due to the following
      situation:
      
      1. When removing extent rec, we will dirty the bhs of extent rec and
         truncate log at the same time, and hand them over to jbd2.
      
      2. The bhs are submitted to jbd2 area successfully.
      
      3. The write-back thread of device help flush the bhs to disk but
         encounter write error due to abnormal storage link.
      
      4. After a while the storage link become normal. Truncate log flush
         worker triggered by the next space reclaiming found the dirty bh of
         truncate log and clear its 'BH_Write_EIO' and then set it uptodate in
         __ocfs2_journal_access():
      
         ocfs2_truncate_log_worker
           ocfs2_flush_truncate_log
             __ocfs2_flush_truncate_log
               ocfs2_replay_truncate_records
                 ocfs2_journal_access_di
                   __ocfs2_journal_access // here we clear io_error and set 'tl_bh' uptodata.
      
      5. Then jbd2 will flush the bh of truncate log to disk, but the bh of
         extent rec is still in error state, and unfortunately nobody will
         take care of it.
      
      6. At last the space of extent rec was not reduced, but truncate log
         flush worker have given it back to globalalloc. That will cause
         duplicate cluster problem which could be identified by fsck.ocfs2.
      
      Sadly we can hardly revert this but set fs read-only in case of ruining
      atomicity and consistency of space reclaim.
      
      Link: http://lkml.kernel.org/r/5A6E8092.8090701@huawei.com
      Fixes: acf8fdbe ("ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access")
      Signed-off-by: default avatarJun Piao <piaojun@huawei.com>
      Reviewed-by: default avatarYiwen Jiang <jiangyiwen@huawei.com>
      Reviewed-by: default avatarChangwei Ge <ge.changwei@h3c.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d278dc22
    • piaojun's avatar
      ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attribute · b35bb8a4
      piaojun authored
      [ Upstream commit 16c8d569 ]
      
      The race between *set_acl and *get_acl will cause getting incomplete
      xattr data as below:
      
        processA                                    processB
      
        ocfs2_set_acl
          ocfs2_xattr_set
            __ocfs2_xattr_set_handle
      
                                                    ocfs2_get_acl_nolock
                                                      ocfs2_xattr_get_nolock:
      
      processB may get incomplete xattr data if processA hasn't set_acl done.
      
      So we should use 'ip_xattr_sem' to protect getting extended attribute in
      ocfs2_get_acl_nolock(), as other processes could be changing it
      concurrently.
      
      Link: http://lkml.kernel.org/r/5A5DDCFF.7030001@huawei.comSigned-off-by: default avatarJun Piao <piaojun@huawei.com>
      Reviewed-by: default avatarAlex Chen <alex.chen@huawei.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Changwei Ge <ge.changwei@h3c.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b35bb8a4
    • piaojun's avatar
      ocfs2: return -EROFS to mount.ocfs2 if inode block is invalid · d55cd574
      piaojun authored
      [ Upstream commit 025bcbde ]
      
      If metadata is corrupted such as 'invalid inode block', we will get
      failed by calling 'mount()' and then set filesystem readonly as below:
      
        ocfs2_mount
          ocfs2_initialize_super
            ocfs2_init_global_system_inodes
              ocfs2_iget
                ocfs2_read_locked_inode
                  ocfs2_validate_inode_block
      	      ocfs2_error
      	        ocfs2_handle_error
      	          ocfs2_set_ro_flag(osb, 0);  // set readonly
      
      In this situation we need return -EROFS to 'mount.ocfs2', so that user
      can fix it by fsck.  And then mount again.  In addition, 'mount.ocfs2'
      should be updated correspondingly as it only return 1 for all errno.
      And I will post a patch for 'mount.ocfs2' too.
      
      Link: http://lkml.kernel.org/r/5A4302FA.2010606@huawei.comSigned-off-by: default avatarJun Piao <piaojun@huawei.com>
      Reviewed-by: default avatarAlex Chen <alex.chen@huawei.com>
      Reviewed-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Reviewed-by: default avatarChangwei Ge <ge.changwei@h3c.com>
      Reviewed-by: default avatarGang He <ghe@suse.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d55cd574
    • Logan Gunthorpe's avatar
      ntb_transport: Fix bug with max_mw_size parameter · ddda6e24
      Logan Gunthorpe authored
      [ Upstream commit cbd27448 ]
      
      When using the max_mw_size parameter of ntb_transport to limit the size of
      the Memory windows, communication cannot be established and the queues
      freeze.
      
      This is because the mw_size that's reported to the peer is correctly
      limited but the size used locally is not. So the MW is initialized
      with a buffer smaller than the window but the TX side is using the
      full window. This means the TX side will be writing to a region of the
      window that points nowhere.
      
      This is easily fixed by applying the same limit to tx_size in
      ntb_transport_init_queue().
      
      Fixes: e26a5843 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
      Signed-off-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Acked-by: default avatarAllen Hubbe <Allen.Hubbe@dell.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: default avatarJon Mason <jdmason@kudzu.us>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ddda6e24
    • Leon Romanovsky's avatar
      RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure · 3ad85b3b
      Leon Romanovsky authored
      [ Upstream commit b081808a ]
      
      Failure in XRCD FW deallocation command leaves memory leaked and
      returns error to the user which he can't do anything about it.
      
      This patch changes behavior to always free memory and always return
      success to the user.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Reviewed-by: default avatarMajd Dibbiny <majd@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: default avatarYuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ad85b3b
    • Michael Bringmann's avatar
      powerpc/numa: Ensure nodes initialized for hotplug · b4e84e5a
      Michael Bringmann authored
      [ Upstream commit ea05ba7c ]
      
      This patch fixes some problems encountered at runtime with
      configurations that support memory-less nodes, or that hot-add CPUs
      into nodes that are memoryless during system execution after boot. The
      problems of interest include:
      
      * Nodes known to powerpc to be memoryless at boot, but to have CPUs in
        them are allowed to be 'possible' and 'online'. Memory allocations
        for those nodes are taken from another node that does have memory
        until and if memory is hot-added to the node.
      
      * Nodes which have no resources assigned at boot, but which may still
        be referenced subsequently by affinity or associativity attributes,
        are kept in the list of 'possible' nodes for powerpc. Hot-add of
        memory or CPUs to the system can reference these nodes and bring
        them online instead of redirecting the references to one of the set
        of nodes known to have memory at boot.
      
      Note that this software operates under the context of CPU hotplug. We
      are not doing memory hotplug in this code, but rather updating the
      kernel's CPU topology (i.e. arch_update_cpu_topology /
      numa_update_cpu_topology). We are initializing a node that may be used
      by CPUs or memory before it can be referenced as invalid by a CPU
      hotplug operation. CPU hotplug operations are protected by a range of
      APIs including cpu_maps_update_begin/cpu_maps_update_done,
      cpus_read/write_lock / cpus_read/write_unlock, device locks, and more.
      Memory hotplug operations, including try_online_node, are protected by
      mem_hotplug_begin/mem_hotplug_done, device locks, and more. In the
      case of CPUs being hot-added to a previously memoryless node, the
      try_online_node operation occurs wholly within the CPU locks with no
      overlap. Using HMC hot-add/hot-remove operations, we have been able to
      add and remove CPUs to any possible node without failures. HMC
      operations involve a degree self-serialization, though.
      Signed-off-by: default avatarMichael Bringmann <mwb@linux.vnet.ibm.com>
      Reviewed-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b4e84e5a
    • Michael Bringmann's avatar
      powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes · bf5b1813
      Michael Bringmann authored
      [ Upstream commit a346137e ]
      
      On powerpc systems which allow 'hot-add' of CPU or memory resources,
      it may occur that the new resources are to be inserted into nodes that
      were not used for these resources at bootup. In the kernel, any node
      that is used must be defined and initialized. These empty nodes may
      occur when,
      
      * Dedicated vs. shared resources. Shared resources require information
        such as the VPHN hcall for CPU assignment to nodes. Associativity
        decisions made based on dedicated resource rules, such as
        associativity properties in the device tree, may vary from decisions
        made using the values returned by the VPHN hcall.
      
      * memoryless nodes at boot. Nodes need to be defined as 'possible' at
        boot for operation with other code modules. Previously, the powerpc
        code would limit the set of possible nodes to those which have
        memory assigned at boot, and were thus online. Subsequent add/remove
        of CPUs or memory would only work with this subset of possible
        nodes.
      
      * memoryless nodes with CPUs at boot. Due to the previous restriction
        on nodes, nodes that had CPUs but no memory were being collapsed
        into other nodes that did have memory at boot. In practice this
        meant that the node assignment presented by the runtime kernel
        differed from the affinity and associativity attributes presented by
        the device tree or VPHN hcalls. Nodes that might be known to the
        pHyp were not 'possible' in the runtime kernel because they did not
        have memory at boot.
      
      This patch ensures that sufficient nodes are defined to support
      configuration requirements after boot, as well as at boot. This patch
      set fixes a couple of problems.
      
      * Nodes known to powerpc to be memoryless at boot, but to have CPUs in
        them are allowed to be 'possible' and 'online'. Memory allocations
        for those nodes are taken from another node that does have memory
        until and if memory is hot-added to the node. * Nodes which have no
        resources assigned at boot, but which may still be referenced
        subsequently by affinity or associativity attributes, are kept in
        the list of 'possible' nodes for powerpc. Hot-add of memory or CPUs
        to the system can reference these nodes and bring them online
        instead of redirecting to one of the set of nodes that were known to
        have memory at boot.
      
      This patch extracts the value of the lowest domain level (number of
      allocable resources) from the device tree property
      "ibm,max-associativity-domains" to use as the maximum number of nodes
      to setup as possibly available in the system. This new setting will
      override the instruction:
      
          nodes_and(node_possible_map, node_possible_map, node_online_map);
      
      presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
      
      If the "ibm,max-associativity-domains" property is not present at
      boot, no operation will be performed to define or enable additional
      nodes, or enable the above 'nodes_and()'.
      Signed-off-by: default avatarMichael Bringmann <mwb@linux.vnet.ibm.com>
      Reviewed-by: default avatarNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bf5b1813
    • Jake Daryll Obina's avatar
      jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path · b222aa1c
      Jake Daryll Obina authored
      [ Upstream commit 5bdd0c6f ]
      
      If jffs2_iget() fails for a newly-allocated inode, jffs2_do_clear_inode()
      can get called twice in the error handling path, the first call in
      jffs2_iget() itself and the second through iget_failed(). This can result
      to a use-after-free error in the second jffs2_do_clear_inode() call, such
      as shown by the oops below wherein the second jffs2_do_clear_inode() call
      was trying to free node fragments that were already freed in the first
      jffs2_do_clear_inode() call.
      
      [   78.178860] jffs2: error: (1904) jffs2_do_read_inode_internal: CRC failed for read_inode of inode 24 at physical location 0x1fc00c
      [   78.178914] Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b7b
      [   78.185871] pgd = ffffffc03a567000
      [   78.188794] [6b6b6b6b6b6b6b7b] *pgd=0000000000000000, *pud=0000000000000000
      [   78.194968] Internal error: Oops: 96000004 [#1] PREEMPT SMP
      ...
      [   78.513147] PC is at rb_first_postorder+0xc/0x28
      [   78.516503] LR is at jffs2_kill_fragtree+0x28/0x90 [jffs2]
      [   78.520672] pc : [<ffffff8008323d28>] lr : [<ffffff8000eb1cc8>] pstate: 60000105
      [   78.526757] sp : ffffff800cea38f0
      [   78.528753] x29: ffffff800cea38f0 x28: ffffffc01f3f8e80
      [   78.532754] x27: 0000000000000000 x26: ffffff800cea3c70
      [   78.536756] x25: 00000000dc67c8ae x24: ffffffc033d6945d
      [   78.540759] x23: ffffffc036811740 x22: ffffff800891a5b8
      [   78.544760] x21: 0000000000000000 x20: 0000000000000000
      [   78.548762] x19: ffffffc037d48910 x18: ffffff800891a588
      [   78.552764] x17: 0000000000000800 x16: 0000000000000c00
      [   78.556766] x15: 0000000000000010 x14: 6f2065646f6e695f
      [   78.560767] x13: 6461657220726f66 x12: 2064656c69616620
      [   78.564769] x11: 435243203a6c616e x10: 7265746e695f6564
      [   78.568771] x9 : 6f6e695f64616572 x8 : ffffffc037974038
      [   78.572774] x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008
      [   78.576775] x5 : 002f91d85bd44a2f x4 : 0000000000000000
      [   78.580777] x3 : 0000000000000000 x2 : 000000403755e000
      [   78.584779] x1 : 6b6b6b6b6b6b6b6b x0 : 6b6b6b6b6b6b6b6b
      ...
      [   79.038551] [<ffffff8008323d28>] rb_first_postorder+0xc/0x28
      [   79.042962] [<ffffff8000eb5578>] jffs2_do_clear_inode+0x88/0x100 [jffs2]
      [   79.048395] [<ffffff8000eb9ddc>] jffs2_evict_inode+0x3c/0x48 [jffs2]
      [   79.053443] [<ffffff8008201ca8>] evict+0xb0/0x168
      [   79.056835] [<ffffff8008202650>] iput+0x1c0/0x200
      [   79.060228] [<ffffff800820408c>] iget_failed+0x30/0x3c
      [   79.064097] [<ffffff8000eba0c0>] jffs2_iget+0x2d8/0x360 [jffs2]
      [   79.068740] [<ffffff8000eb0a60>] jffs2_lookup+0xe8/0x130 [jffs2]
      [   79.073434] [<ffffff80081f1a28>] lookup_slow+0x118/0x190
      [   79.077435] [<ffffff80081f4708>] walk_component+0xfc/0x28c
      [   79.081610] [<ffffff80081f4dd0>] path_lookupat+0x84/0x108
      [   79.085699] [<ffffff80081f5578>] filename_lookup+0x88/0x100
      [   79.089960] [<ffffff80081f572c>] user_path_at_empty+0x58/0x6c
      [   79.094396] [<ffffff80081ebe14>] vfs_statx+0xa4/0x114
      [   79.098138] [<ffffff80081ec44c>] SyS_newfstatat+0x58/0x98
      [   79.102227] [<ffffff800808354c>] __sys_trace_return+0x0/0x4
      [   79.106489] Code: d65f03c0 f9400001 b40000e1 aa0103e0 (f9400821)
      
      The jffs2_do_clear_inode() call in jffs2_iget() is unnecessary since
      iget_failed() will eventually call jffs2_do_clear_inode() if needed, so
      just remove it.
      
      Fixes: 5451f79f ("iget: stop JFFS2 from using iget() and read_inode()")
      Reviewed-by: default avatarRichard Weinberger <richard@nod.at>
      Signed-off-by: default avatarJake Daryll Obina <jake.obina@gmail.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b222aa1c
    • Dan Carpenter's avatar
      HID: roccat: prevent an out of bounds read in kovaplus_profile_activated() · ba6ff62a
      Dan Carpenter authored
      [ Upstream commit 7ad81482 ]
      
      We get the "new_profile_index" value from the mouse device when we're
      handling raw events.  Smatch taints it as untrusted data and complains
      that we need a bounds check.  This seems like a reasonable warning
      otherwise there is a small read beyond the end of the array.
      
      Fixes: 0e70f97f ("HID: roccat: Add support for Kova[+] mouse")
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarSilvan Jegen <s.jegen@gmail.com>
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba6ff62a
    • Arnd Bergmann's avatar
      scsi: fas216: fix sense buffer initialization · c92e6b9b
      Arnd Bergmann authored
      [ Upstream commit 96d5eaa9 ]
      
      While testing with the ARM specific memset() macro removed, I ran into a
      compiler warning that shows an old bug:
      
      drivers/scsi/arm/fas216.c: In function 'fas216_rq_sns_done':
      drivers/scsi/arm/fas216.c:2014:40: error: argument to 'sizeof' in 'memset' call is the same expression as the destination; did you mean to provide an explicit length? [-Werror=sizeof-pointer-memaccess]
      
      It turns out that the definition of the scsi_cmd structure changed back
      in linux-2.6.25, so now we clear only four bytes (sizeof(pointer))
      instead of 96 (SCSI_SENSE_BUFFERSIZE). I did not check whether we
      actually need to initialize the buffer here, but it's clear that if we
      do it, we should use the correct size.
      
      Fixes: de25deb1 ("[SCSI] use dynamically allocated sense buffer")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c92e6b9b
    • Liu Bo's avatar
      Btrfs: fix scrub to repair raid6 corruption · 95b286da
      Liu Bo authored
      [ Upstream commit 762221f0 ]
      
      The raid6 corruption is that,
      suppose that all disks can be read without problems and if the content
      that was read out doesn't match its checksum, currently for raid6
      btrfs at most retries twice,
      
      - the 1st retry is to rebuild with all other stripes, it'll eventually
        be a raid5 xor rebuild,
      - if the 1st fails, the 2nd retry will deliberately fail parity p so
        that it will do raid6 style rebuild,
      
      however, the chances are that another non-parity stripe content also
      has something corrupted, so that the above retries are not able to
      return correct content.
      
      We've fixed normal reads to rebuild raid6 correctly with more retries
      in Patch "Btrfs: make raid6 rebuild retry more"[1], this is to fix
      scrub to do the exactly same rebuild process.
      
      [1]: https://patchwork.kernel.org/patch/10091755/Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95b286da
    • Nikolay Borisov's avatar
      btrfs: Fix out of bounds access in btrfs_search_slot · 72af2c8c
      Nikolay Borisov authored
      [ Upstream commit 9ea2c7c9 ]
      
      When modifying a tree where the root is at BTRFS_MAX_LEVEL - 1 then
      the level variable is going to be 7 (this is the max height of the
      tree). On the other hand btrfs_cow_block is always called with
      "level + 1" as an index into the nodes and slots arrays. This leads to
      an out of bounds access. Admittdely this will be benign since an OOB
      access of the nodes array will likely read the 0th element from the
      slots array, which in this case is going to be 0 (since we start CoW at
      the top of the tree). The OOB access into the slots array in turn will
      read the 0th and 1st values of the locks array, which would both be 0
      at the time. However, this benign behavior relies on the fact that the
      path being passed hasn't been initialised, if it has already been used to
      query a btree then it could potentially have populated the nodes/slots arrays.
      
      Fix it by explicitly checking if we are at level 7 (the maximum allowed
      index in nodes/slots arrays) and explicitly call the CoW routine with
      NULL for parent's node/slot.
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Fixes-coverity-id: 711515
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72af2c8c
    • Liu Bo's avatar
      Btrfs: set plug for fsync · 51f12a06
      Liu Bo authored
      [ Upstream commit 343e4fc1 ]
      
      Setting plug can merge adjacent IOs before dispatching IOs to the disk
      driver.
      
      Without plug, it'd not be a problem for single disk usecases, but for
      multiple disks using raid profile, a large IO can be split to several
      IOs of stripe length, and plug can be helpful to bring them together
      for each disk so that we can save several disk access.
      
      Moreover, fsync issues synchronous writes, so plug can really take
      effect.
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51f12a06
    • Wei Yongjun's avatar
      ipmi/powernv: Fix error return code in ipmi_powernv_probe() · a89f66de
      Wei Yongjun authored
      [ Upstream commit e749d328 ]
      
      Fix to return a negative error code from the request_irq() error
      handling case instead of 0, as done elsewhere in this function.
      
      Fixes: dce143c3 ("ipmi/powernv: Convert to irq event interface")
      Signed-off-by: default avatarWei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: default avatarCorey Minyard <cminyard@mvista.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a89f66de
    • weiyongjun (A)'s avatar
      mac80211_hwsim: fix possible memory leak in hwsim_new_radio_nl() · 3fc6adc3
      weiyongjun (A) authored
      [ Upstream commit 0ddcff49 ]
      
      'hwname' is malloced in hwsim_new_radio_nl() and should be freed
      before leaving from the error handling cases, otherwise it will cause
      memory leak.
      
      Fixes: ff4dd73d ("mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length")
      Signed-off-by: default avatarWei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3fc6adc3
    • Ulf Magnusson's avatar
      kconfig: Fix expr_free() E_NOT leak · 5bfe39f1
      Ulf Magnusson authored
      [ Upstream commit 5b1374b3 ]
      
      Only the E_NOT operand and not the E_NOT node itself was freed, due to
      accidentally returning too early in expr_free(). Outline of leak:
      
      	switch (e->type) {
      	...
      	case E_NOT:
      		expr_free(e->left.expr);
      		return;
      	...
      	}
      	*Never reached, 'e' leaked*
      	free(e);
      
      Fix by changing the 'return' to a 'break'.
      
      Summary from Valgrind on 'menuconfig' (ARCH=x86) before the fix:
      
      	LEAK SUMMARY:
      	   definitely lost: 44,448 bytes in 1,852 blocks
      	   ...
      
      Summary after the fix:
      
      	LEAK SUMMARY:
      	   definitely lost: 1,608 bytes in 67 blocks
      	   ...
      Signed-off-by: default avatarUlf Magnusson <ulfalizer@gmail.com>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5bfe39f1
    • Ulf Magnusson's avatar
      kconfig: Fix automatic menu creation mem leak · 921fcd94
      Ulf Magnusson authored
      [ Upstream commit ae7440ef ]
      
      expr_trans_compare() always allocates and returns a new expression,
      giving the following leak outline:
      
      	...
      	*Allocate*
      	basedep = expr_trans_compare(basedep, E_UNEQUAL, &symbol_no);
      	...
      	for (menu = parent->next; menu; menu = menu->next) {
      		...
      		*Copy*
      		dep2 = expr_copy(basedep);
      		...
      		*Free copy*
      		expr_free(dep2);
      	}
      	*basedep lost!*
      
      Fix by freeing 'basedep' after the loop.
      
      Summary from Valgrind on 'menuconfig' (ARCH=x86) before the fix:
      
      	LEAK SUMMARY:
      	   definitely lost: 344,376 bytes in 14,349 blocks
      	   ...
      
      Summary after the fix:
      
      	LEAK SUMMARY:
      	   definitely lost: 44,448 bytes in 1,852 blocks
      	   ...
      Signed-off-by: default avatarUlf Magnusson <ulfalizer@gmail.com>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      921fcd94
    • Ulf Magnusson's avatar
      kconfig: Don't leak main menus during parsing · 608b42a3
      Ulf Magnusson authored
      [ Upstream commit 0724a7c3 ]
      
      If a 'mainmenu' entry appeared in the Kconfig files, two things would
      leak:
      
      	- The 'struct property' allocated for the default "Linux Kernel
      	  Configuration" prompt.
      
      	- The string for the T_WORD/T_WORD_QUOTE prompt after the
      	  T_MAINMENU token, allocated on the heap in zconf.l.
      
      To fix it, introduce a new 'no_mainmenu_stmt' nonterminal that matches
      if there's no 'mainmenu' and adds the default prompt. That means the
      prompt only gets allocated once regardless of whether there's a
      'mainmenu' statement or not, and managing it becomes simple.
      
      Summary from Valgrind on 'menuconfig' (ARCH=x86) before the fix:
      
      	LEAK SUMMARY:
      	   definitely lost: 344,568 bytes in 14,352 blocks
      	   ...
      
      Summary after the fix:
      
      	LEAK SUMMARY:
      	   definitely lost: 344,440 bytes in 14,350 blocks
      	   ...
      Signed-off-by: default avatarUlf Magnusson <ulfalizer@gmail.com>
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      608b42a3
    • Guenter Roeck's avatar
      watchdog: sp5100_tco: Fix watchdog disable bit · 278480bd
      Guenter Roeck authored
      [ Upstream commit f541c09e ]
      
      According to all published information, the watchdog disable bit for SB800
      compatible controllers is bit 1 of PM register 0x48, not bit 2. For the
      most part that doesn't matter in practice, since the bit has to be cleared
      to enable watchdog address decoding, which is the default setting, but it
      still needs to be fixed.
      
      Cc: Zoltán Böszörményi <zboszor@pr.hu>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarWim Van Sebroeck <wim@iguana.be>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      278480bd
    • Jan Chochol's avatar
      nfs: Do not convert nfs_idmap_cache_timeout to jiffies · 929d4510
      Jan Chochol authored
      [ Upstream commit cbebc6ef ]
      
      Since commit 57e62324 ("NFS: Store the legacy idmapper result in the
      keyring") nfs_idmap_cache_timeout changed units from jiffies to seconds.
      Unfortunately sysctl interface was not updated accordingly.
      
      As a effect updating /proc/sys/fs/nfs/idmap_cache_timeout with some
      value will incorrectly multiply this value by HZ.
      Also reading /proc/sys/fs/nfs/idmap_cache_timeout will show real value
      divided by HZ.
      
      Fixes: 57e62324 ("NFS: Store the legacy idmapper result in the keyring")
      Signed-off-by: default avatarJan Chochol <jan@chochol.info>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      929d4510