1. 10 Nov, 2018 40 commits
    • sparc64: Fix regression in pmdp_invalidate(). · 24b71d18
      David S. Miller authored
      [ Upstream commit cfb61b5e ]
      
      pmdp_invalidate() was changed to update the pmd atomically
      (to not lose dirty/access bits) and return the original pmd
      value.
      
      However, in doing so, we lost a lot of the essential work that
      set_pmd_at() does, namely updating hugepage mapping counts and
      queuing up the batched TLB flush entry.
      
      Thus we were not flushing entries out of the TLB when making
      such PMD changes.
      
      Fix this by abstracting the accounting work of set_pmd_at() out into a
      separate function, and calling it from pmdp_establish().
      
      Fixes: a8e654f0 ("sparc64: update pmdp_invalidate() to return old pmd value")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xen-netfront: Update features after registering netdev · 43bbab66
      Ross Lagerwall authored
      [ Upstream commit 45c8184c ]
      
      Update the features after calling register_netdev(); otherwise the
      device features are not set up correctly and it is not possible to change
      the MTU of the device. After this change, the features reported by
      ethtool match the device's features before the commit which introduced
      the issue and it is possible to change the device's MTU.
      
      Fixes: f599c64f ("xen-netfront: Fix race between device setup and open")
      Reported-by: Liam Shepherd <liam@dancer.es>
      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other arches · acfbd286
      Thadeu Lima de Souza Cascardo authored
      [ Upstream commit 52fda36d ]
      
      Function bpf_fill_maxinsns11 is designed so that it cannot be JITed on
      x86_64. So, it fails when CONFIG_BPF_JIT_ALWAYS_ON=y, and
      commit 09584b40 ("bpf: fix selftests/bpf test_kmod.sh failure when
      CONFIG_BPF_JIT_ALWAYS_ON=y") makes sure that failure is detected in that
      case.
      
      However, it does not fail on other architectures, which have a different
      JIT compiler design. So, test_bpf has started to fail to load on those.
      
      After this fix, test_bpf loads fine on both x86_64 and ppc64el.
      
      Fixes: 09584b40 ("bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y")
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Reviewed-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ALSA: hda - Fix incorrect usage of IS_REACHABLE() · dba8e960
      Takashi Iwai authored
      [ Upstream commit 6a30abaa ]
      
      The commit c469652b ("ALSA: hda - Use IS_REACHABLE() for
      dependency on input") simplified the dependencies with IS_REACHABLE()
      macro, but it broke due to its incorrect usage: it should have been
      IS_REACHABLE(CONFIG_INPUT) instead of IS_REACHABLE(INPUT).
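      
      The effect of the missing CONFIG_ prefix can be sketched in plain C.
      The macros below are a simplified userspace stand-in modeled on the
      kernel's kconfig.h preprocessor trick. OPTION_ENABLED and its helper
      macros are hypothetical names, and only the "defined to 1" case is
      modeled, not the module-aware logic of the real IS_REACHABLE().

```c
#include <assert.h>

/*
 * Simplified stand-in for the kernel's Kconfig test macros.
 * Assumption: only "symbol is defined to 1" is modeled here.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define OPTION_ENABLED(option) IS_DEFINED_2(option)

/* What the build system would define for CONFIG_INPUT=y. */
#define CONFIG_INPUT 1

/* Correct spelling: the full CONFIG_ symbol is tested. */
static int input_enabled_with_prefix(void)
{
	return OPTION_ENABLED(CONFIG_INPUT);
}

/* Broken spelling: INPUT is never defined, so this is always 0. */
static int input_enabled_without_prefix(void)
{
	return OPTION_ENABLED(INPUT);
}

/* Self-check of both spellings. */
static void check_spellings(void)
{
	assert(input_enabled_with_prefix() == 1);
	assert(input_enabled_without_prefix() == 0);
}
```

      Because the unprefixed symbol silently evaluates to 0 instead of
      failing to compile, a mistake of this kind disables the guarded code
      without any diagnostic.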
      
      Fixes: c469652b ("ALSA: hda - Use IS_REACHABLE() for dependency on input")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • futex: futex_wake_op, do not fail on invalid op · da92e8a2
      Jiri Slaby authored
      [ Upstream commit e78c38f6 ]
      
      In commit 30d6e0a4 ("futex: Remove duplicated code and fix undefined
      behaviour"), I made FUTEX_WAKE_OP fail on an invalid op, namely when
      oparg should be treated as a shift and the shift is out of range
      (< 0 or > 31).
      
      But strace's test suite does this madness:
      
        futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xa0caffee);
        futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xbadfaced);
        futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xffffffff);
      
      When I pick the first 0xa0caffee, it decodes as:
      
        0x80000000 & 0xa0caffee: oparg is shift
        0x70000000 & 0xa0caffee: op is FUTEX_OP_OR
        0x0f000000 & 0xa0caffee: cmp is FUTEX_OP_CMP_EQ
        0x00fff000 & 0xa0caffee: oparg is sign-extended 0xcaf = -849
        0x00000fff & 0xa0caffee: cmparg is sign-extended 0xfee = -18
      
      That means the op tries to do this:
      
        (futex |= (1 << (-849))) == -18
      
      which is completely bogus. The new check of op in the code is:
      
              if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) {
                      if (oparg < 0 || oparg > 31)
                              return -EINVAL;
                      oparg = 1 << oparg;
              }
      
      which results obviously in the "Invalid argument" errno:
      
        FAIL: futex
        ===========
      
        futex(0x7fabd78bcffc, 0x5, 0xfacefeed, 0xb, 0x7fabd78bcffc, 0xa0caffee) = -1: Invalid argument
        futex.test: failed test: ../futex failed with code 1
      
      So let us soften the failure to print only a (ratelimited) message, crop
      the value and continue as if it were right.  When userspace keeps up, we
      can switch this to return -EINVAL again.
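      
      The field decoding walked through above can be reproduced with a
      small userspace sketch. The struct and function names are
      hypothetical and this is not the kernel's implementation; the bit
      layout is taken from the masks listed in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of a FUTEX_WAKE_OP operand (hypothetical helper type). */
struct futex_op_fields {
	int shift_flag;	/* bit 31: oparg is a shift count */
	int op;		/* bits 28-30 */
	int cmp;	/* bits 24-27 */
	int oparg;	/* bits 12-23, sign-extended */
	int cmparg;	/* bits 0-11, sign-extended */
};

/* Sign-extend the low `bits` bits of `value`. */
static int sign_extend(uint32_t value, int bits)
{
	uint32_t sign;

	assert(bits > 0 && bits < 32);
	sign = 1u << (bits - 1);
	value &= (1u << bits) - 1;
	return (int)(value ^ sign) - (int)sign;
}

static struct futex_op_fields decode_futex_op(uint32_t encoded_op)
{
	struct futex_op_fields f;

	f.shift_flag = (encoded_op >> 31) & 0x1;
	f.op = (encoded_op >> 28) & 0x7;
	f.cmp = (encoded_op >> 24) & 0xf;
	f.oparg = sign_extend(encoded_op >> 12, 12);
	f.cmparg = sign_extend(encoded_op, 12);
	return f;
}

/* Returns 1 unless oparg is used as a shift and lies outside 0..31. */
static int shift_is_valid(const struct futex_op_fields *f)
{
	return !f->shift_flag || (f->oparg >= 0 && f->oparg <= 31);
}
```

      For the first strace value, 0xa0caffee, this yields op 2
      (FUTEX_OP_OR), cmp 0 (FUTEX_OP_CMP_EQ), oparg -849 and cmparg -18,
      and shift_is_valid() reports the shift as out of range.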
      
      [v2] Do not return 0 immediately, proceed with the cropped value.
      
      Fixes: 30d6e0a4 ("futex: Remove duplicated code and fix undefined behaviour")
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Darren Hart <dvhart@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • cifs: Use ULL suffix for 64-bit constant · a2b9a0de
      Geert Uytterhoeven authored
      [ Upstream commit 3995bbf5 ]
      
      On 32-bit (e.g. with m68k-linux-gnu-gcc-4.1):
      
          fs/cifs/inode.c: In function ‘simple_hashstr’:
          fs/cifs/inode.c:713: warning: integer constant is too large for ‘long’ type
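      
      A minimal sketch of the kind of code that triggers this warning and
      how the suffix avoids it. The function name, the multiplier value and
      the seed are illustrative assumptions, not taken from this commit
      text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical multiplier that needs more than 32 bits. Without the ULL
 * suffix, older compilers on 32-bit targets try to fit it into 'long'
 * and warn; with the suffix the constant is unsigned long long.
 */
#define HASH_MULT 1125899906842597ULL

/* Simple multiplicative string hash (illustrative only). */
static uint64_t hash_str_sketch(const char *str)
{
	uint64_t hash = 7;

	assert(str != NULL);
	while (*str)
		hash = (hash + (uint64_t)(unsigned char)*str++) * HASH_MULT;
	return hash;
}
```

      The arithmetic is done in 64 bits on both 32-bit and 64-bit targets,
      so the result does not depend on the width of 'long'.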
      
      Fixes: 7ea884c7 ("smb3: Fix root directory when server returns inode number of zero")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Aurelien Aptel <aaptel@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • perf/core: Fix locking for children siblings group read · 6279fc56
      Jiri Olsa authored
      [ Upstream commit 2aeb1883 ]
      
      We're missing the ctx lock when iterating children siblings
      within the perf_read path for group reading. The following
      race and crash can happen:
      
      User space doing read syscall on event group leader:
      
      T1:
        perf_read
          lock event->ctx->mutex
          perf_read_group
            lock leader->child_mutex
            __perf_read_group_add(child)
              list_for_each_entry(sub, &leader->sibling_list, group_entry)
      
      ---->   sub might be invalid at this point, because it could
              get removed via perf_event_exit_task_context in T2
      
      Child exiting and cleaning up its events:
      
      T2:
        perf_event_exit_task_context
          lock ctx->mutex
          list_for_each_entry_safe(child_event, next, &child_ctx->event_list,...
            perf_event_exit_event(child)
              lock ctx->lock
              perf_group_detach(child)
              unlock ctx->lock
      
      ---->   child is removed from sibling_list without any sync
              with T1 path above
      
              ...
              free_event(child)
      
      Before the child is removed from the leader's child_list,
      (and thus is omitted from perf_read_group processing), we
      need to ensure that perf_read_group touches child's
      siblings under its ctx->lock.
      
      Peter further notes:
      
      | One additional note; this bug got exposed by commit:
      |
      |   ba5213ae ("perf/core: Correct event creation with PERF_FORMAT_GROUP")
      |
      | which made it possible to actually trigger this code-path.
      
      Tested-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: ba5213ae ("perf/core: Correct event creation with PERF_FORMAT_GROUP")
      Link: http://lkml.kernel.org/r/20170720141455.2106-1-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • macsec: fix memory leaks when skb_to_sgvec fails · 5f6590d6
      Sabrina Dubroca authored
      [ Upstream commit 5aba2ba5 ]
      
      Fixes: cda7ea69 ("macsec: check return value of skb_to_sgvec always")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • l2tp: remove configurable payload offset · 65b05d03
      James Chapman authored
      [ Upstream commit 900631ee ]
      
      If L2TP_ATTR_OFFSET is set to a non-zero value in L2TPv3 tunnels, it
      results in L2TPv3 packets being transmitted which might not be
      compliant with the L2TPv3 RFC. This patch has l2tp ignore the offset
      setting and send all packets with no offset.
      
      In more detail:
      
      L2TPv2 supports a variable offset from the L2TPv2 header to the
      payload. The offset value is indicated by an optional field in the
      L2TP header.  Our L2TP implementation already detects the presence of
      the optional offset and skips that many bytes when handling received
      data packets. All packets are always transmitted with
      no offset.
      
      L2TPv3 has no optional offset field in the L2TPv3 packet
      header. Instead, L2TPv3 defines optional fields in a "Layer-2 Specific
      Sublayer". At the time when the original L2TP code was written, there
      was talk at IETF of offset being implemented in a new Layer-2 Specific
      Sublayer. A L2TP_ATTR_OFFSET netlink attribute was added so that this
      offset could be configured and the intention was to allow it to be
      also used to set the tx offset for L2TPv2. However, no L2TPv3 offset
      was ever specified and the L2TP_ATTR_OFFSET parameter was forgotten
      about.
      
      Setting L2TP_ATTR_OFFSET results in L2TPv3 packets being transmitted
      with the specified number of bytes padding between L2TPv3 header and
      payload. This is not compliant with L2TPv3 RFC3931. This change
      removes the configurable offset altogether while retaining
      L2TP_ATTR_OFFSET for backwards compatibility. Any L2TP_ATTR_OFFSET
      value is ignored.
      
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • iio: pressure: zpa2326: Remove always-true check which confuses gcc · 1680ca36
      Geert Uytterhoeven authored
      [ Upstream commit f61dfff2 ]
      
      With gcc 4.1.2:
      
          drivers/iio/pressure/zpa2326.c: In function ‘zpa2326_wait_oneshot_completion’:
          drivers/iio/pressure/zpa2326.c:868: warning: ‘ret’ may be used uninitialized in this function
      
      When testing for "timeout < 0", timeout is already guaranteed to be
      strictly negative, so the branch is always taken, and ret is thus always
      initialized.  But (some versions of) gcc are not smart enough to notice.
      
      Remove the check to fix this.
      As there is no other code in between assigning the error codes and
      returning them, the error codes can be returned immediately, and the
      intermediate variable can be dropped.
      Drop the "else" to please checkpatch.
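      
      The resulting control flow can be sketched as follows. This is a
      userspace illustration with hypothetical names, not the driver code:
      the real function waits on a completion, while here the wait result
      is passed in as a parameter, and ERESTARTSYS_SKETCH stands in for the
      kernel-internal ERESTARTSYS.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for the kernel-internal ERESTARTSYS (not in userspace errno.h). */
#define ERESTARTSYS_SKETCH 512

/*
 * timeout > 0: the wait completed in time.
 * timeout == 0: the wait timed out.
 * timeout < 0: the wait was interrupted; no further test is needed,
 * because both other cases have already returned.
 */
static int oneshot_wait_result(long timeout)
{
	if (timeout > 0)
		return 0;
	if (timeout == 0)
		return -ETIME;
	return -ERESTARTSYS_SKETCH;
}

/* Self-check covering the three cases. */
static void check_wait_results(void)
{
	assert(oneshot_wait_result(5) == 0);
	assert(oneshot_wait_result(0) == -ETIME);
	assert(oneshot_wait_result(-1) == -ERESTARTSYS_SKETCH);
}
```

      With every path ending in a return, there is no intermediate
      variable left for the compiler to consider uninitialized.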
      
      Fixes: e7215fe4 ("iio: pressure: zpa2326: report interrupted case as failure")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • module: fix DEBUG_SET_MODULE_RONX typo · dfdf8be7
      Arnd Bergmann authored
      [ Upstream commit 4d217a5a ]
      
      The newly added 'rodata_enabled' global variable is protected by
      the wrong #ifdef, leading to a link error when CONFIG_DEBUG_SET_MODULE_RONX
      is turned on:
      
      kernel/module.o: In function `disable_ro_nx':
      module.c:(.text.unlikely.disable_ro_nx+0x88): undefined reference to `rodata_enabled'
      kernel/module.o: In function `module_disable_ro':
      module.c:(.text.module_disable_ro+0x8c): undefined reference to `rodata_enabled'
      kernel/module.o: In function `module_enable_ro':
      module.c:(.text.module_enable_ro+0xb0): undefined reference to `rodata_enabled'
      
      CONFIG_SET_MODULE_RONX does not exist, so use the correct one instead.
      
      Fixes: 39290b38 ("module: extend 'rodata=off' boot cmdline parameter to module mappings")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jessica Yu <jeyu@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/msm: Fix possible null dereference on failure of get_pages() · c87dd592
      Ben Hutchings authored
      [ Upstream commit 3976626e ]
      
      Commit 62e3a3e3 changed get_pages() to initialise
      msm_gem_object::pages before trying to initialise msm_gem_object::sgt,
      so that put_pages() would properly clean up pages in the failure
      case.
      
      However, this means that put_pages() now needs to check that
      msm_gem_object::sgt is not null before trying to clean it up, and
      this check was only applied to part of the cleanup code.  Move
      it all into the conditional block.  (Strictly speaking we don't
      need to make the kfree() conditional, but since we can't avoid
      checking for null ourselves we may as well do so.)
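      
      The shape of the cleanup after this change can be sketched in
      userspace C. The struct and function names are hypothetical, and
      free() stands in for the kernel's unmap and free calls; the point is
      only that everything touching sgt sits inside one null check.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the scatter/gather table and GEM object. */
struct sg_table_sketch {
	int nents;
};

struct gem_object_sketch {
	void **pages;
	struct sg_table_sketch *sgt;
};

/* Release pages; sgt may be NULL if get_pages() failed half-way. */
static void put_pages_sketch(struct gem_object_sketch *obj)
{
	assert(obj != NULL);

	if (!obj->pages)
		return;

	if (obj->sgt) {
		/* All sgt cleanup lives inside this block. */
		free(obj->sgt);
		obj->sgt = NULL;
	}

	free(obj->pages);
	obj->pages = NULL;
}
```

      Calling put_pages_sketch() on an object whose pages are set but
      whose sgt is still NULL releases the pages without dereferencing the
      missing table.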
      
      Fixes: 62e3a3e3 ("drm/msm: fix leak in failed get_pages")
      Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
      Signed-off-by: Rob Clark <robdclark@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Btrfs: incremental send, fix invalid memory access · 97d65c1b
      Filipe Manana authored
      [ Upstream commit 24e52b11 ]
      
      When doing an incremental send, while processing an extent that changed
      between the parent and send snapshots and that extent was an inline extent
      in the parent snapshot, it's possible to access a memory region beyond
      the end of leaf if the inline extent is very small and it is the first
      item in a leaf.
      
      An example scenario is described below.
      
      The send snapshot has the following leaf:
      
       leaf 33865728 items 33 free space 773 generation 46 owner 5
       fs uuid ab7090d8-dafd-4fb9-9246-723b6d2e2fb7
       chunk uuid 2d16478c-c704-4ab9-b574-68bff2281b1f
              (...)
              item 14 key (335 EXTENT_DATA 0) itemoff 3052 itemsize 53
                      generation 36 type 1 (regular)
                      extent data disk byte 12791808 nr 4096
                      extent data offset 0 nr 4096 ram 4096
                      extent compression 0 (none)
              item 15 key (335 EXTENT_DATA 8192) itemoff 2999 itemsize 53
                      generation 36 type 1 (regular)
                      extent data disk byte 138170368 nr 225280
                      extent data offset 0 nr 225280 ram 225280
                      extent compression 0 (none)
              (...)
      
      And the parent snapshot has the following leaf:
      
       leaf 31272960 items 17 free space 17 generation 31 owner 5
       fs uuid ab7090d8-dafd-4fb9-9246-723b6d2e2fb7
       chunk uuid 2d16478c-c704-4ab9-b574-68bff2281b1f
              item 0 key (335 EXTENT_DATA 0) itemoff 3951 itemsize 44
                      generation 31 type 0 (inline)
                      inline extent data size 23 ram_bytes 613 compression 1 (zlib)
              (...)
      
      When computing the send stream, it is detected that the extent of inode
      335 at file offset 0 changed, and in fs/btrfs/send.c:is_extent_unchanged()
      we grab the leaf from the parent snapshot and access the inline extent item.
      However, before jumping to the 'out' label, we access the 'offset' and
      'disk_bytenr' fields of the extent item, which should not be done for
      inline extents since the inlined data starts at the offset of the
      'disk_bytenr' field and can be very small. For example accessing the
      'offset' field of the file extent item results in the following trace:
      
      [  599.705368] general protection fault: 0000 [#1] PREEMPT SMP
      [  599.706296] Modules linked in: btrfs psmouse i2c_piix4 ppdev acpi_cpufreq serio_raw parport_pc i2c_core evdev tpm_tis tpm_tis_core sg pcspkr parport tpm button su$
      [  599.709340] CPU: 7 PID: 5283 Comm: btrfs Not tainted 4.10.0-rc8-btrfs-next-46+ #1
      [  599.709340] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
      [  599.709340] task: ffff88023eedd040 task.stack: ffffc90006658000
      [  599.709340] RIP: 0010:read_extent_buffer+0xdb/0xf4 [btrfs]
      [  599.709340] RSP: 0018:ffffc9000665ba00 EFLAGS: 00010286
      [  599.709340] RAX: db73880000000000 RBX: 0000000000000000 RCX: 0000000000000001
      [  599.709340] RDX: ffffc9000665ba60 RSI: db73880000000000 RDI: ffffc9000665ba5f
      [  599.709340] RBP: ffffc9000665ba30 R08: 0000000000000001 R09: ffff88020dc5e098
      [  599.709340] R10: 0000000000001000 R11: 0000160000000000 R12: 6db6db6db6db6db7
      [  599.709340] R13: ffff880000000000 R14: 0000000000000000 R15: ffff88020dc5e088
      [  599.709340] FS:  00007f519555a8c0(0000) GS:ffff88023f3c0000(0000) knlGS:0000000000000000
      [  599.709340] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  599.709340] CR2: 00007f1411afd000 CR3: 0000000235f8e000 CR4: 00000000000006e0
      [  599.709340] Call Trace:
      [  599.709340]  btrfs_get_token_64+0x93/0xce [btrfs]
      [  599.709340]  ? printk+0x48/0x50
      [  599.709340]  btrfs_get_64+0xb/0xd [btrfs]
      [  599.709340]  process_extent+0x3a1/0x1106 [btrfs]
      [  599.709340]  ? btree_read_extent_buffer_pages+0x5/0xef [btrfs]
      [  599.709340]  changed_cb+0xb03/0xb3d [btrfs]
      [  599.709340]  ? btrfs_get_token_32+0x7a/0xcc [btrfs]
      [  599.709340]  btrfs_compare_trees+0x432/0x53d [btrfs]
      [  599.709340]  ? process_extent+0x1106/0x1106 [btrfs]
      [  599.709340]  btrfs_ioctl_send+0x960/0xe26 [btrfs]
      [  599.709340]  btrfs_ioctl+0x181b/0x1fed [btrfs]
      [  599.709340]  ? trace_hardirqs_on_caller+0x150/0x1ac
      [  599.709340]  vfs_ioctl+0x21/0x38
      [  599.709340]  ? vfs_ioctl+0x21/0x38
      [  599.709340]  do_vfs_ioctl+0x611/0x645
      [  599.709340]  ? rcu_read_unlock+0x5b/0x5d
      [  599.709340]  ? __fget+0x6d/0x79
      [  599.709340]  SyS_ioctl+0x57/0x7b
      [  599.709340]  entry_SYSCALL_64_fastpath+0x18/0xad
      [  599.709340] RIP: 0033:0x7f51945eec47
      [  599.709340] RSP: 002b:00007ffc21c13e98 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
      [  599.709340] RAX: ffffffffffffffda RBX: ffffffff81096459 RCX: 00007f51945eec47
      [  599.709340] RDX: 00007ffc21c13f20 RSI: 0000000040489426 RDI: 0000000000000004
      [  599.709340] RBP: ffffc9000665bf98 R08: 00007f519450d700 R09: 00007f519450d700
      [  599.709340] R10: 00007f519450d9d0 R11: 0000000000000202 R12: 0000000000000046
      [  599.709340] R13: ffffc9000665bf78 R14: 0000000000000000 R15: 00007f5195574040
      [  599.709340]  ? trace_hardirqs_off_caller+0x43/0xb1
      [  599.709340] Code: 29 f0 49 39 d8 4c 0f 47 c3 49 03 81 58 01 00 00 44 89 c1 4c 01 c2 4c 29 c3 48 c1 f8 03 49 0f af c4 48 c1 e0 0c 4c 01 e8 48 01 c6 <f3> a4 31 f6 4$
      [  599.709340] RIP: read_extent_buffer+0xdb/0xf4 [btrfs] RSP: ffffc9000665ba00
      [  599.762057] ---[ end trace fe00d7af61b9f49e ]---
      
      This is because the 'offset' field starts at an offset of 37 bytes
      (offsetof(struct btrfs_file_extent_item, offset)), has a length of 8
      bytes and therefore attempting to read it causes a 1 byte access beyond
      the end of the leaf, as the first item's content in a leaf is located
      at the tail of the leaf, the item size is 44 bytes and the offset of
      that field plus its length (37 + 8 = 45) goes beyond the item's size
      by 1 byte.
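      
      The arithmetic above can be checked with a small sketch. The struct
      below is a userspace model of the on-disk file extent item layout
      (an assumption; it is chosen so that the 'offset' field lands at byte
      37, as stated in this message), and the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of the packed on-disk file extent item (assumption). */
struct file_extent_item_sketch {
	uint64_t generation;
	uint64_t ram_bytes;
	uint8_t compression;
	uint8_t encryption;
	uint16_t other_encoding;
	uint8_t type;
	/* For inline extents, the inline data starts here. */
	uint64_t disk_bytenr;
	uint64_t disk_num_bytes;
	uint64_t offset;
	uint64_t num_bytes;
} __attribute__((packed));

/* Returns 1 if a field lies entirely within an item of item_size bytes. */
static int field_within_item(size_t item_size, size_t field_offset,
			     size_t field_len)
{
	assert(field_len > 0);
	return field_offset + field_len <= item_size;
}
```

      For the 44-byte inline item in the parent leaf,
      field_within_item(44, 37, 8) is false because 37 + 8 = 45, while the
      53-byte regular items in the send leaf hold the same field without
      overrun.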
      
      So fix this by accessing the 'offset' and 'disk_bytenr' fields after
      jumping to the 'out' label if we are processing an inline extent. We
      move the reading operation of the 'disk_bytenr' field too because we
      have the same problem as for the 'offset' field explained above when
      the inline data is less than 8 bytes. The access to the 'generation'
      field is also moved but just for the sake of grouping access to all
      the fields.
      
      Fixes: e1cbfd7b ("Btrfs: send, fix file hole not being preserved due to inline extent")
      Cc: <stable@vger.kernel.org>  # v4.12+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Revert "IB/ipoib: Update broadcast object if PKey value was changed in index 0" · 9629cf8e
      Alex Estrin authored
      [ Upstream commit 612601d0 ]
      
      Commit 9a9b8112 causes the core to fail to destroy the UD QP
      on ipoib unload, and therefore leaks resources.
      On a pkey change event, that patch modifies the mgid before calling the
      underlying driver to detach it from the QP. The driver's detach_mcast()
      will fail to find the modified mgid, since it was never given to attach
      in the first place.
      The core's qp->usecnt will never go down, so ib_destroy_qp() will fail.
      
      The IPoIB driver actually does take care of the new broadcast mgid based
      on the new pkey by destroying the old mcast object in ipoib_mcast_dev_flush():
      ....
      	if (priv->broadcast) {
      		rb_erase(&priv->broadcast->rb_node, &priv->multicast_tree);
      		list_add_tail(&priv->broadcast->list, &remove_list);
      		priv->broadcast = NULL;
      	}
      ...
      
      and then, in the restarted ipoib_mcast_join_task(), creating a new broadcast
      mcast object, sending a join request and, on completion, telling the driver
      to attach to the reinitialized QP:
      ...
      if (!priv->broadcast) {
      ...
      	broadcast = ipoib_mcast_alloc(dev, 0);
      ...
      	memcpy(broadcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
      	       sizeof (union ib_gid));
      	priv->broadcast = broadcast;
      ...
      
      Fixes: 9a9b8112 ("IB/ipoib: Update broadcast object if PKey value was changed in index 0")
      Cc: stable@vger.kernel.org
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Alex Estrin <alex.estrin@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Reviewed-by: Feras Daoud <ferasda@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • i40e: avoid NVM acquire deadlock during NVM update · 78ba91de
      Anjali Singhai Jain authored
      [ Upstream commit 09f79fd4 ]
      
      X722 devices use the AdminQ to access the NVM, and this requires taking
      the AdminQ lock. Because of this, we lock the AdminQ during
      i40e_read_nvm(), which is also called in places where the lock is
      already held, such as the firmware update path which wants to lock once
      and then unlock when finished after performing several tasks.
      
      Although this should have only affected X722 devices, commit
      96a39aed ("i40e: Acquire NVM lock before reads on all devices",
      2016-12-02) added locking for all NVM reads, regardless of device
      family.
      
      This resulted in us accidentally causing NVM acquire timeouts on all
      devices, causing failed firmware updates which left the eeprom in
      a corrupt state.
      
      Create unsafe non-locked variants of i40e_read_nvm_word and
      i40e_read_nvm_buffer, __i40e_read_nvm_word and __i40e_read_nvm_buffer
      respectively. These variants will not take the NVM lock and are expected
      to only be called in places where the NVM lock is already held if
      needed.
      
      Since the only caller of i40e_read_nvm_buffer() was in such a path,
      remove it entirely in favor of the unsafe version. If necessary we can
      always add it back in the future.
      
      Additionally, we now need to hold the NVM lock in i40e_validate_checksum
      because the call to i40e_calc_nvm_checksum now assumes that the NVM lock
      is held. We can further move the call to read I40E_SR_SW_CHECKSUM_WORD
      up a bit so that we do not need to acquire the NVM lock twice.
      
      This should resolve firmware updates and also fix a potential race that
      could have caused the driver to report an invalid NVM checksum upon
      driver load.
      
      Reported-by: Stefan Assmann <sassmann@kpanic.de>
      Fixes: 96a39aed ("i40e: Acquire NVM lock before reads on all devices", 2016-12-02)
      Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm: bochs: Don't remove uninitialized fbdev framebuffer · 1e6abb88
      Gabriel Krisman Bertazi authored
      [ Upstream commit 4fa13dbe ]
      
      In the same spirit as the fix for QXL in commit 86107838 ("drm: qxl:
      Don't alloc fbdev if emulation is not supported"), prevent the Oops in
      the unbind path of Bochs if fbdev emulation is disabled.
      
      [  112.176009] Oops: 0002 [#1] SMP
      [  112.176009] Modules linked in: bochs_drm
      [  112.176009] CPU: 0 PID: 3002 Comm: bash Not tainted 4.11.0-rc1+ #111
      [  112.176009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
      [  112.176009] task: ffff8800743bbac0 task.stack: ffffc90000b5c000
      [  112.176009] RIP: 0010:mutex_lock+0x18/0x30
      [  112.176009] RSP: 0018:ffffc90000b5fc78 EFLAGS: 00010246
      [  112.176009] RAX: 0000000000000000 RBX: 0000000000000260 RCX: 0000000000000000
      [  112.176009] RDX: ffff8800743bbac0 RSI: ffff8800787176e0 RDI: 0000000000000260
      [  112.176009] RBP: ffffc90000b5fc80 R08: ffffffff00000000 R09: 00000000ffffffff
      [  112.176009] R10: ffff88007b463650 R11: 0000000000000000 R12: 0000000000000260
      [  112.176009] R13: ffff8800787176e0 R14: ffffffffa0003068 R15: 0000000000000060
      [  112.176009] FS:  00007f20564c7b40(0000) GS:ffff88007ce00000(0000) knlGS:0000000000000000
      [  112.176009] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  112.176009] CR2: 0000000000000260 CR3: 000000006b89c000 CR4: 00000000000006f0
      [  112.176009] Call Trace:
      [  112.176009]  drm_mode_object_unregister+0x1e/0x50
      [  112.176009]  drm_framebuffer_unregister_private+0x15/0x20
      [  112.176009]  bochs_fbdev_fini+0x57/0x70 [bochs_drm]
      [  112.176009]  bochs_unload+0x16/0x50 [bochs_drm]
      [  112.176009]  drm_dev_unregister+0x37/0xd0
      [  112.176009]  drm_put_dev+0x31/0x60
      [  112.176009]  bochs_pci_remove+0x10/0x20 [bochs_drm]
      [  112.176009]  pci_device_remove+0x34/0xb0
      [  112.176009]  device_release_driver_internal+0x150/0x200
      [  112.176009]  device_release_driver+0xd/0x10
      [  112.176009]  unbind_store+0x108/0x150
      [  112.176009]  drv_attr_store+0x20/0x30
      [  112.176009]  sysfs_kf_write+0x32/0x40
      [  112.176009]  kernfs_fop_write+0x10b/0x190
      [  112.176009]  __vfs_write+0x23/0x120
      [  112.176009]  ? security_file_permission+0x36/0xb0
      [  112.176009]  ? rw_verify_area+0x49/0xb0
      [  112.176009]  vfs_write+0xb0/0x190
      [  112.176009]  SyS_write+0x41/0xa0
      [  112.176009]  entry_SYSCALL_64_fastpath+0x1a/0xa9
      [  112.176009] RIP: 0033:0x7f2055bd5620
      [  112.176009] RSP: 002b:00007ffed2f487d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  112.176009] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2055bd5620
      [  112.176009] RDX: 000000000000000d RSI: 0000000000ee0008 RDI: 0000000000000001
      [  112.176009] RBP: 0000000000000001 R08: 00007f2055e94760 R09: 00007f20564c7b40
      [  112.176009] R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000000
      [  112.176009] R13: 00007ffed2f48d70 R14: 0000000000000000 R15: 0000000000000000
      [  112.176009] Code: 00 00 00 55 be 02 00 00 00 48 89 e5 e8 62 fb ff ff 5d c3 55 48 89 e5 53 48 89 fb e8 53 e9 ff ff 65 48 8b 14 25 40 c4 00 00 31 c0 <f0> 48 0f b1 13 48 85 c0 74 08 48 89 df e8c6 ff ff ff 5b 5d c3
      [  112.176009] RIP: mutex_lock+0x18/0x30 RSP: ffffc90000b5fc78
      [  112.176009] CR2: 0000000000000260
      [  112.205622] ---[ end trace 76189cd7a9bdd155 ]---
      
      Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170317181409.4183-1-krisman@collabora.co.uk
      Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • scsi: qla2xxx: Avoid double completion of abort command · 9388bd23
      Ben Hutchings authored
      [ Upstream commit 3a9910d7 ]
      
      qla2x00_tmf_sp_done() now deletes the timer that will run
      qla2x00_tmf_iocb_timeout(), but doesn't check whether the timer already
      expired.  Check the return value from del_timer() to avoid calling
      complete() a second time.
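      
      The idea can be sketched in userspace C. This is only a model under stated assumptions: `model_timer`, `model_completion` and the `model_*` helpers are hypothetical stand-ins, not the kernel timer or completion API. The one property borrowed from the kernel is that del_timer() reports whether the timer was still pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; not the kernel's timer or completion API. */
struct model_timer { bool pending; };
struct model_completion { int done; };

/* Mirrors the del_timer() contract: returns true only if the timer was
 * still pending, i.e. its handler has not run yet. */
static bool model_del_timer(struct model_timer *t)
{
    bool was_pending = t->pending;

    t->pending = false;
    return was_pending;
}

/* Timeout path: the timer fired, so it is no longer pending and the
 * handler signals the waiter itself. */
static void model_timeout(struct model_timer *t, struct model_completion *c)
{
    t->pending = false;
    c->done++;
}

/* Done path: complete only when a pending timer was actually cancelled. */
static void model_sp_done(struct model_timer *t, struct model_completion *c)
{
    if (!model_del_timer(t))
        return;                 /* timeout handler already completed */
    c->done++;
    assert(c->done == 1);       /* the waiter must be signalled exactly once */
}

/* Returns how many times the waiter was completed. */
static int run_scenario(bool timer_expires_first)
{
    struct model_timer t = { .pending = true };
    struct model_completion c = { .done = 0 };

    if (timer_expires_first)
        model_timeout(&t, &c);
    model_sp_done(&t, &c);
    return c.done;
}
```

      In both orderings the waiter is completed once; without the return-value check the expired-timer case would complete it twice.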
      
      Fixes: 4440e46d ("[SCSI] qla2xxx: Add IOCB Abort command asynchronous ...")
      Fixes: 1514839b ("scsi: qla2xxx: Fix NULL pointer crash due to active ...")
      Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9388bd23
    • Noa Osherovich's avatar
      IB/mlx5: Avoid passing an invalid QP type to firmware · d287f1da
      Noa Osherovich authored
      [ Upstream commit e7b169f3 ]
      
      During QP creation, the mlx5 driver translates the QP type to an
      internal value which is passed on to FW. There was no check to make
      sure that the translated value is valid, and -EINVAL was coerced into
      the mailbox command.
      
      Current firmware refuses this as an invalid QP type, but future/past
      firmware may do something else.
      
      Fixes: 09a7d9ec ('{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc')
      Reviewed-by: Ilya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: Noa Osherovich <noaos@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d287f1da
    • Christophe JAILLET's avatar
      mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()' · a96406d4
      Christophe JAILLET authored
      [ Upstream commit 1f704fd0 ]
      
      A semaphore is acquired before this check, so we must release it before
      leaving.
      
      Link: http://lkml.kernel.org/r/20171211211009.4971-1-christophe.jaillet@wanadoo.fr
      Fixes: b7f0554a ("mm: fail get_vaddr_frames() for filesystem-dax mappings")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David Sterba <dsterba@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a96406d4
    • Josef Bacik's avatar
      nbd: only set MSG_MORE when we have more to send · 4b7c09a5
      Josef Bacik authored
      [ Upstream commit d61b7f97 ]
      
      A user noticed that write performance was horrible over loopback and we
      traced it to an inversion of when we need to set MSG_MORE.  It should be
      set when we have more bvec's to send, not when we are on the last bvec.
      This patch made the test go from 20 iops to 78k iops.
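      
      A minimal sketch of the corrected condition, assuming a Linux userspace build where MSG_MORE comes from <sys/socket.h>. The helper name `send_flags_for_segment` is hypothetical and is not taken from the nbd driver.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>   /* MSG_MORE (Linux-specific send flag) */

/* Flags for sending segment `idx` out of `count` segments. MSG_MORE tells
 * the stack that more data follows, so every segment except the last one
 * carries it and the last one leaves it clear. */
static int send_flags_for_segment(size_t idx, size_t count)
{
    assert(idx < count);
    return (idx + 1 < count) ? MSG_MORE : 0;
}
```

      The inverted form would set the flag only on the last segment, so the final piece of each request would wait for data that never arrives.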
      
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Fixes: 429a787b ("nbd: fix use-after-free of rq/bio in the xmit path")
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4b7c09a5
    • Doug Ledford's avatar
      IB/rxe: put the pool on allocation failure · 150b28b4
      Doug Ledford authored
      [ Upstream commit 6b9f8970 ]
      
      If the allocation of elem fails, it is not sufficient to simply check
      for NULL and return.  We need to also put our reference on the pool or
      else we will leave the pool with a permanent ref count and we will never
      be able to free it.
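      
      The reference-count balance described above can be modelled in plain C. This is a sketch only: `model_pool` and its helpers are hypothetical and do not reproduce the rxe pool code, and the `fail` parameter exists purely to force the failure path.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pool with a plain reference counter. */
struct model_pool { int refs; size_t elem_size; };

static void model_pool_get(struct model_pool *p) { p->refs++; }

static void model_pool_put(struct model_pool *p)
{
    assert(p->refs > 0);
    p->refs--;
}

/* Each live element holds one reference on its pool. A non-zero `fail`
 * simulates the allocation returning NULL. */
static void *model_alloc(struct model_pool *p, int fail)
{
    void *elem;

    model_pool_get(p);
    elem = fail ? NULL : malloc(p->elem_size);
    if (!elem) {
        model_pool_put(p);      /* drop the reference taken above */
        return NULL;
    }
    return elem;
}

static void model_free(struct model_pool *p, void *elem)
{
    free(elem);
    model_pool_put(p);
}
```

      With the put on the failure path, the counter returns to zero once every element is gone, so the pool can be released.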
      
      Fixes: 4831ca9e ("IB/rxe: check for allocation failure on elem")
      Suggested-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      150b28b4
    • Alex Vesker's avatar
      IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush · 2e6474b2
      Alex Vesker authored
      [ Upstream commit 1f80bd6a ]
      
      The locking order of vlan_rwsem (LOCK A) followed by rtnl (LOCK B)
      contradicts other flows such as ipoib_open, possibly causing a deadlock.
      To prevent this deadlock, heavy flush is now called with RTNL locked and
      only then tries to acquire vlan_rwsem.
      This deadlock is possible only when there are child interfaces.
      
      [  140.941758] ======================================================
      [  140.946276] WARNING: possible circular locking dependency detected
      [  140.950950] 4.15.0-rc1+ #9 Tainted: G           O
      [  140.954797] ------------------------------------------------------
      [  140.959424] kworker/u32:1/146 is trying to acquire lock:
      [  140.963450]  (rtnl_mutex){+.+.}, at: [<ffffffffc083516a>] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib]
      [  140.970006]
      but task is already holding lock:
      [  140.975141]  (&priv->vlan_rwsem){++++}, at: [<ffffffffc0834ee1>] __ipoib_ib_dev_flush+0x51/0x4e0 [ib_ipoib]
      [  140.982105]
      which lock already depends on the new lock.
      [  140.990023]
      the existing dependency chain (in reverse order) is:
      [  140.998650]
      -> #1 (&priv->vlan_rwsem){++++}:
      [  141.005276]        down_read+0x4d/0xb0
      [  141.009560]        ipoib_open+0xad/0x120 [ib_ipoib]
      [  141.014400]        __dev_open+0xcb/0x140
      [  141.017919]        __dev_change_flags+0x1a4/0x1e0
      [  141.022133]        dev_change_flags+0x23/0x60
      [  141.025695]        devinet_ioctl+0x704/0x7d0
      [  141.029156]        sock_do_ioctl+0x20/0x50
      [  141.032526]        sock_ioctl+0x221/0x300
      [  141.036079]        do_vfs_ioctl+0xa6/0x6d0
      [  141.039656]        SyS_ioctl+0x74/0x80
      [  141.042811]        entry_SYSCALL_64_fastpath+0x1f/0x96
      [  141.046891]
      -> #0 (rtnl_mutex){+.+.}:
      [  141.051701]        lock_acquire+0xd4/0x220
      [  141.055212]        __mutex_lock+0x88/0x970
      [  141.058631]        __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib]
      [  141.063160]        __ipoib_ib_dev_flush+0x71/0x4e0 [ib_ipoib]
      [  141.067648]        process_one_work+0x1f5/0x610
      [  141.071429]        worker_thread+0x4a/0x3f0
      [  141.074890]        kthread+0x141/0x180
      [  141.078085]        ret_from_fork+0x24/0x30
      [  141.081559]
      
      other info that might help us debug this:
      [  141.088967]  Possible unsafe locking scenario:
      [  141.094280]        CPU0                    CPU1
      [  141.097953]        ----                    ----
      [  141.101640]   lock(&priv->vlan_rwsem);
      [  141.104771]                                lock(rtnl_mutex);
      [  141.109207]                                lock(&priv->vlan_rwsem);
      [  141.114032]   lock(rtnl_mutex);
      [  141.116800]
       *** DEADLOCK ***
      
      Fixes: b4b678b0 ("IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop")
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2e6474b2
    • Sabrina Dubroca's avatar
      ipv6: fix cleanup ordering for ip6_mr failure · 4a4d2c93
      Sabrina Dubroca authored
      [ Upstream commit afe49de4 ]
      
      Commit 15e66807 ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
      moved the cleanup label for ipmr_fail, but should have changed the
      contents of the cleanup labels as well. Now we can end up cleaning up
      icmpv6 even though it hasn't been initialized (jump to icmp_fail or
      ipmr_fail).
      
      Simply undo things in the reverse order of their initialization.
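      
      The reverse-order unwinding pattern can be sketched as follows. The three numbered steps and the `fail_at` parameter are hypothetical placeholders and do not correspond to the actual inet6_init() stages.

```c
#include <assert.h>

/* Bit i of `state` is set while hypothetical subsystem i is initialized. */
static unsigned int state;

/* Initializes step i, or fails when i equals `fail_at`. */
static int step_init(int i, int fail_at)
{
    if (i == fail_at)
        return -1;
    state |= 1u << i;
    return 0;
}

static void step_cleanup(int i)
{
    assert(state & (1u << i));  /* never tear down what was not set up */
    state &= ~(1u << i);
}

/* Each failure label undoes only the steps that completed before it,
 * in the reverse order of their initialization. */
static int init_all(int fail_at)
{
    int err;

    err = step_init(0, fail_at);
    if (err)
        goto out;
    err = step_init(1, fail_at);
    if (err)
        goto fail_1;
    err = step_init(2, fail_at);
    if (err)
        goto fail_2;
    return 0;

fail_2:
    step_cleanup(1);
fail_1:
    step_cleanup(0);
out:
    return err;
}
```

      If a label cleaned up a step that had not run yet, the assertion in step_cleanup() would fire, which is the userspace analogue of the panic shown below.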
      
      Example of panic (triggered by faking a failure of icmpv6_init):
      
          kasan: GPF could be caused by NULL-ptr deref or user memory access
          general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
          [...]
          RIP: 0010:__list_del_entry_valid+0x79/0x160
          [...]
          Call Trace:
           ? lock_release+0x8a0/0x8a0
           unregister_pernet_operations+0xd4/0x560
           ? ops_free_list+0x480/0x480
           ? down_write+0x91/0x130
           ? unregister_pernet_subsys+0x15/0x30
           ? down_read+0x1b0/0x1b0
           ? up_read+0x110/0x110
           ? kmem_cache_create_usercopy+0x1b4/0x240
           unregister_pernet_subsys+0x1d/0x30
           icmpv6_cleanup+0x1d/0x30
           inet6_init+0x1b5/0x23f
      
      Fixes: 15e66807 ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4a4d2c93
    • Kalle Valo's avatar
      ath10k: convert warning about non-existent OTP board id to debug message · 16bcf48a
      Kalle Valo authored
      [ Upstream commit 7be52c03 ]
      
      Currently ath10k unnecessarily warns about board id not available from OTP:
      
      ath10k_pci 0000:02:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
      ath10k_pci 0000:02:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
      ath10k_pci 0000:02:00.0: kconfig debug 1 debugfs 1 tracing 1 dfs 1 testmode 1
      ath10k_pci 0000:02:00.0: firmware ver 10.2.4.70.9-2 api 5 features no-p2p,raw-mode crc32 b8d50af5
      ath10k_pci 0000:02:00.0: board id is not exist in otp, ignore it
      ath10k_pci 0000:02:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
      ath10k_pci 0000:02:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal otp max-sta 128 raw 0 hwcrypto 1
      
      But not all boards have the board id in OTP so this is not a problem and no
      need to confuse the user with that info. So this can be safely changed to a
      debug message.
      
      Also fix grammar in the debug message.
      
      Fixes: d2e202c0 ("ath10k: ignore configuring the incorrect board_id")
      Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      16bcf48a
    • Takashi Iwai's avatar
      ALSA: hda - No loopback on ALC299 codec · ddf39e0f
      Takashi Iwai authored
      [ Upstream commit fa16b69f ]
      
      ALC299 has no loopback mixer, but the driver still tries to add a beep
      control over the mixer NID which leads to the error at accessing it.
      This patch fixes it by properly declaring mixer_nid=0 for this codec.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195775
      Fixes: 28f1f9b2 ("ALSA: hda/realtek - Add new codec ID ALC299")
      Cc: stable@vger.kernel.org
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ddf39e0f
    • Xin Long's avatar
      sctp: use right member as the param of list_for_each_entry · 54ad2bbe
      Xin Long authored
      [ Upstream commit a8dd3979 ]
      
      Commit d04adf1b ("sctp: reset owner sk for data chunks on out queues
      when migrating a sock") mistakenly used 'list' as the param of
      list_for_each_entry to traverse the retransmit, sacked and abandoned
      queues, while chunks are using 'transmitted_list' to link into these
      queues.
      
      It could cause a NULL dereference panic if there are chunks in any of these
      queues when peeling off one asoc.
      
      So use the chunk member 'transmitted_list' instead in this patch.
      
      Fixes: d04adf1b ("sctp: reset owner sk for data chunks on out queues when migrating a sock")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      54ad2bbe
    • Bjørn Mork's avatar
      net: cdc_ncm: GetNtbFormat endian fix · 0ef75f51
      Bjørn Mork authored
      [ Upstream commit 6314dab4 ]
      
      The GetNtbFormat and SetNtbFormat requests operate on 16 bit little
      endian values. We get away with ignoring this most of the time, because
      we only care about USB_CDC_NCM_NTB16_FORMAT which is 0x0000.  This
      fails for USB_CDC_NCM_NTB32_FORMAT.
      
      Fix comparison between LE value from device and constant by converting
      the constant to LE.
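      
      A portable sketch of comparing a little-endian wire value against a constant by converting the constant. `MODEL_NTB32_FORMAT` is an illustrative value and `model_cpu_to_le16()` is a hypothetical userspace helper playing the role of the kernel's cpu_to_le16(); neither is copied from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative constant only; the real definitions live in the kernel's
 * USB CDC headers. */
#define MODEL_NTB32_FORMAT 0x0001

/* Returns a uint16_t whose in-memory byte order is little endian,
 * whatever the host byte order is. */
static uint16_t model_cpu_to_le16(uint16_t v)
{
    unsigned char b[2] = { (unsigned char)(v & 0xff), (unsigned char)(v >> 8) };
    uint16_t le;

    memcpy(&le, b, sizeof(le));
    return le;
}

/* `wire` holds the two raw bytes returned by the device (little endian). */
static int wire_is_ntb32(const unsigned char wire[2])
{
    uint16_t raw;

    assert(wire != NULL);
    memcpy(&raw, wire, sizeof(raw));    /* raw stays in wire (LE) order */
    return raw == model_cpu_to_le16(MODEL_NTB32_FORMAT);
}
```

      Comparing `raw` directly against the constant would still work for a value of 0x0000 on any host, which is why the bug went unnoticed, but it fails for non-zero values on big-endian machines.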
      
      Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Fixes: 2b02c20c ("cdc_ncm: Set NTB format again after altsetting switch for Huawei devices")
      Cc: Enrico Mioso <mrkiko.rs@gmail.com>
      Cc: Christian Panton <christian@panton.org>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Acked-By: Enrico Mioso <mrkiko.rs@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0ef75f51
    • Eric Ren's avatar
      ocfs2: fix deadlock caused by recursive locking in xattr · 855bf147
      Eric Ren authored
      [ Upstream commit 8818efaa ]
      
      Another deadlock path caused by recursive locking has been reported.  This
      kind of issue was introduced by commit 743b5f14 ("ocfs2: take
      inode lock in ocfs2_iop_set/get_acl()").  Two deadlock paths have been
      fixed by commit b891fa50 ("ocfs2: fix deadlock issue when taking
      inode lock at vfs entry points").  Yes, we intend to fix this kind of
      case in an incremental way, because it's hard to find all possible
      paths at once.
      
      This one can be reproduced like this.  On node1, cp a large file from
      the home directory to the ocfs2 mountpoint.  Meanwhile, on node2, run
      setfacl/getfacl.  Both nodes will hang.  The backtraces:
      
      On node1:
        __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
        ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
        ocfs2_write_begin+0x43/0x1a0 [ocfs2]
        generic_perform_write+0xa9/0x180
        __generic_file_write_iter+0x1aa/0x1d0
        ocfs2_file_write_iter+0x4f4/0xb40 [ocfs2]
        __vfs_write+0xc3/0x130
        vfs_write+0xb1/0x1a0
        SyS_write+0x46/0xa0
      
      On node2:
        __ocfs2_cluster_lock.isra.39+0x357/0x740 [ocfs2]
        ocfs2_inode_lock_full_nested+0x17d/0x840 [ocfs2]
        ocfs2_xattr_set+0x12e/0xe80 [ocfs2]
        ocfs2_set_acl+0x22d/0x260 [ocfs2]
        ocfs2_iop_set_acl+0x65/0xb0 [ocfs2]
        set_posix_acl+0x75/0xb0
        posix_acl_xattr_set+0x49/0xa0
        __vfs_setxattr+0x69/0x80
        __vfs_setxattr_noperm+0x72/0x1a0
        vfs_setxattr+0xa7/0xb0
        setxattr+0x12d/0x190
        path_setxattr+0x9f/0xb0
        SyS_setxattr+0x14/0x20
      
      Fix this one by using ocfs2_inode_{lock|unlock}_tracker, which is
      exported by commit 439a36b8 ("ocfs2/dlmglue: prepare tracking logic
      to avoid recursive cluster lock").
      
      Link: http://lkml.kernel.org/r/20170622014746.5815-1-zren@suse.com
      Fixes: 743b5f14 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
      Signed-off-by: Eric Ren <zren@suse.com>
      Reported-by: Thomas Voegtle <tv@lio96.de>
      Tested-by: Thomas Voegtle <tv@lio96.de>
      Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      855bf147
    • Mintz, Yuval's avatar
      qed: Warn PTT usage by wrong hw-function · 5f300509
      Mintz, Yuval authored
      [ Upstream commit 3a50d351 ]
      
      PTT entries are per-hwfn; if some erroneous flow is trying
      to use a PTT belonging to a different hwfn, warn the user, as this
      can break every register-accessing flow later and is very hard
      to root-cause.
      
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5f300509
    • Hans de Goede's avatar
      iio: adc: Revert "axp288: Drop bogus AXP288_ADC_TS_PIN_CTRL register modifications" · d0136b8a
      Hans de Goede authored
      [ Upstream commit 631b010a ]
      
      Inheriting the ADC BIAS current settings from the BIOS instead of
      hardcoding them causes the AXP288 to disable charging (I think it
      mis-detects an overheated battery) on at least one model tablet.
      
      So let's go back to hard coding the values, this reverts
      commit fa2849e9 ("iio: adc: axp288: Drop bogus
      AXP288_ADC_TS_PIN_CTRL register modifications"), fixing charging not
      working on the model tablet in question.
      
      The exact cause is not fully understood, hence the revert to a known working
      state.
      
      Cc: stable@vger.kernel.org
      Reported-by: Umberto Ixxo <sfumato1977@gmail.com>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d0136b8a
    • Dag Moxnes's avatar
      rds: ib: Fix missing call to rds_ib_dev_put in rds_ib_setup_qp · 271b4a84
      Dag Moxnes authored
      [ Upstream commit 91a82529 ]
      
      The function rds_ib_setup_qp is calling rds_ib_get_client_data and
      should correspondingly call rds_ib_dev_put. This call was lost in
      the non-error path with the introduction of error handling done in
      commit 3b12f73a ("rds: ib: add error handle").
      
      Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
      Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      271b4a84
    • Aditya Shankar's avatar
      staging: wilc1000: Fix problem with wrong vif index · c2c87f5b
      Aditya Shankar authored
      [ Upstream commit 0e490657 ]
      
      The vif->idx value is always 0 for two interfaces.
      
      wl->vif_num = 0;
      
      loop {
           ...
      
           vif->idx = wl->vif_num;
           ...
           wl->vif_num = i;
            ....
           i++;
           ...
      }
      
      At present, vif->idx is assigned the value of wl->vif_num
      at the beginning of this block and the device is initialized
      based on this index value.
      In the next iteration, wl->vif_num is still 0, as it is only updated
      later, but it is assigned to vif->idx at the beginning. This causes problems
      later when we try to reference a particular interface and also while
      configuring the firmware.
      
      This patch moves the assignment to vif->idx from the beginning
      of the block to after wl->vif_num is updated with latest value of i.
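      
      The ordering problem can be reproduced with a small standalone model of the loop sketched above. `model_wl`, `MODEL_NUM_VIFS` and both functions are hypothetical and keep only the two assignments that matter.

```c
#include <assert.h>

#define MODEL_NUM_VIFS 2

/* Hypothetical reduction of the driver state to the one field involved. */
struct model_wl { int vif_num; };

/* Old ordering: the index is read before wl.vif_num is updated, so each
 * pass sees the value left by the previous pass. */
static void assign_buggy(int idx[MODEL_NUM_VIFS])
{
    struct model_wl wl = { .vif_num = 0 };

    for (int i = 0; i < MODEL_NUM_VIFS; i++) {
        idx[i] = wl.vif_num;
        wl.vif_num = i;
    }
}

/* New ordering: update wl.vif_num first, then take the index from it. */
static void assign_fixed(int idx[MODEL_NUM_VIFS])
{
    struct model_wl wl = { .vif_num = 0 };

    for (int i = 0; i < MODEL_NUM_VIFS; i++) {
        wl.vif_num = i;
        idx[i] = wl.vif_num;
        assert(idx[i] == i);    /* every interface gets its own index */
    }
}
```

      With two interfaces the old ordering yields indexes 0 and 0, while the new ordering yields 0 and 1.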
      
      Fixes: commit 735bb39c ("staging: wilc1000: simplify vif[i]->ndev accesses")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Aditya Shankar <aditya.shankar@microchip.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c2c87f5b
    • Michael S. Tsirkin's avatar
      ptr_ring: fix up after recent ptr_ring changes · 7334e285
      Michael S. Tsirkin authored
      [ Upstream commit 5790eabc ]
      
      Add more stubs to make it build.
      
      Fixes: 81fbfe8a ("ptr_ring: use kmalloc_array()")
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7334e285
    • Andrzej Pietrasiewicz's avatar
      clk: samsung: Fix m2m scaler clock on Exynos542x · 503c3941
      Andrzej Pietrasiewicz authored
      [ Upstream commit c07c1a0f ]
      
      The TOP "aclk400_mscl" clock should be kept enabled all the time
      to allow proper access to power management control for MSC power
      domain and devices that are a part of it. This change is required
      for the scaler to work properly after domain power on/off sequence.
      
      Fixes: 318fa46c ("clk/samsung: exynos542x: mark some clocks as critical")
      Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      503c3941
    • Vignesh R's avatar
      usb: dwc3: omap: remove IRQ_NOAUTOEN used with shared irq · a34dcf51
      Vignesh R authored
      [ Upstream commit ee249b45 ]
      
      IRQ_NOAUTOEN cannot be used with shared IRQs since commit 04c848d3
      ("genirq: Warn when IRQ_NOAUTOEN is used with shared interrupts"), and
      the kernel now emits a warning dump. But the OMAP DWC3 driver uses this
      flag. As per commit 12a7f17f ("usb: dwc3: omap: fix race of pm runtime with
      irq handler in probe") that introduced this flag, PM runtime can race
      with the IRQ handler when deferred probing happens due to extcon,
      therefore IRQ_NOAUTOEN needs to be set so that the irq is not enabled until
      extcon is registered.
      
      Remove setting of IRQ_NOAUTOEN and move the registration of
      shared irq to a point after dwc3_omap_extcon_register() and
      of_platform_populate(). This avoids possibility of probe deferring and
      above said race condition.
      
      Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: Vignesh R <vigneshr@ti.com>
      Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a34dcf51
    • Yoshihiro Shimoda's avatar
      usb: renesas_usbhs: gadget: fix unused-but-set-variable warning · 7bbba613
      Yoshihiro Shimoda authored
      [ Upstream commit b7d44c36 ]
      
      Commit b8b9c974 ("usb: renesas_usbhs: gadget: disable all eps
      when the driver stops") causes an unused-but-set-variable warning.
      However, if usbhsg_ep_disable() returns a non-zero value, udc/core.c
      doesn't clear the ep->enabled flag. So this driver should not return
      a non-zero value when the pipe is zero, because that means the pipe is
      already disabled. Otherwise, the ep->enabled flag is never cleared
      when usbhsg_ep_disable() is called by the renesas_usbhs driver first.
      
      Fixes: b8b9c974 ("usb: renesas_usbhs: gadget: disable all eps when the driver stops")
      Fixes: 11432050 ("usb: renesas_usbhs: gadget: fix NULL pointer dereference in ep_disable()")
      Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7bbba613
    • Yoshihiro Shimoda's avatar
      usb: renesas_usbhs: gadget: fix spin_lock_init() for &uep->lock · 3dd952b4
      Yoshihiro Shimoda authored
      [ Upstream commit 14a8d4bf ]
      
      This patch fixes an issue where spin_lock_init() is not called
      for almost all pipes. Without this fix, lockdep outputs the following
      message when we connect a usb cable using g_ncm:
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
      
      Reported-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
      Fixes: b8b9c974 ("usb: renesas_usbhs: gadget: disable all eps when the driver stops")
      Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Tested-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
      Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3dd952b4
    • Moshe Shemesh's avatar
      net/mlx5: Fix health work queue spin lock to IRQ safe · 3dde6c97
      Moshe Shemesh authored
      [ Upstream commit 6377ed0b ]
      
      spin_lock/unlock of health->wq_lock should be IRQ safe.
      It was changed to spin_lock_irqsave since adding commit 0179720d
      ("net/mlx5: Introduce trigger_health_work function") which uses
      spin_lock from asynchronous event (IRQ) context.
      Thus, all spin_lock/unlock of health->wq_lock should have been moved
      to IRQ safe mode.
      However, one occurrence on new code using this lock missed that
      change, resulting in possible deadlock:
        kernel: Possible unsafe locking scenario:
        kernel:       CPU0
        kernel:       ----
        kernel:  lock(&(&health->wq_lock)->rlock);
        kernel:  <Interrupt>
        kernel:    lock(&(&health->wq_lock)->rlock);
        kernel: #012 *** DEADLOCK ***
      
      Fixes: 2a0165a0 ("net/mlx5: Cancel delayed recovery work when unloading the driver")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3dde6c97
    • Björn Töpel's avatar
      perf probe: Fix probe definition for inlined functions · 2cdc70f3
      Björn Töpel authored
      [ Upstream commit 7598f8bc ]
      
      In commit 613f050d ("perf probe: Fix to probe on gcc generated
      functions in modules"), the offset from symbol is, incorrectly, added
      to the trace point address. This leads to incorrect probe trace points
      for inlined functions and when using relative line number on symbols.
      
      Prior to this patch:
        $ perf probe -m nf_nat -D in_range
        p:probe/in_range nf_nat:in_range.isra.9+0
        $ perf probe -m i40e -D i40e_clean_rx_irq
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2212
        $ perf probe -m i40e -D i40e_clean_rx_irq:16
        p:probe/i40e_clean_rx_irq i40e:i40e_lan_xmit_frame+626
      
      After:
        $ perf probe -m nf_nat -D in_range
        p:probe/in_range nf_nat:in_range.isra.9+0
        $ perf probe -m i40e -D i40e_clean_rx_irq
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+1106
        $ perf probe -m i40e -D i40e_clean_rx_irq:16
        p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2665
      
      Committer testing:
      
      Using 'pfunct', a tool found in the 'dwarves' package [1], one can ask which
      functions, while not being explicitly marked as inline, were inlined
      by the compiler:
      
        # pfunct --cc_inlined /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko | head
        __ew32
        e1000_regdump
        e1000e_dump_ps_pages
        e1000_desc_unused
        e1000e_systim_to_hwtstamp
        e1000e_rx_hwtstamp
        e1000e_update_rdt_wa
        e1000e_update_tdt_wa
        e1000_put_txbuf
        e1000_consume_page
      
      Then ask 'perf probe' to produce the kprobe_tracer probe definitions for two of
      them:
      
        # perf probe -m e1000e -D e1000e_rx_hwtstamp
        p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+74
      
        # perf probe -m e1000e -D e1000_consume_page
        p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+876
        p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+1506
        p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074
      
      Now let's concentrate on 'e1000_consume_page', which was inlined twice in
      e1000_clean_jumbo_rx_irq(), and see what readelf says about the DWARF tags for
      that function:
      
        $ readelf -wi /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
        <SNIP>
        <1><13e27b>: Abbrev Number: 121 (DW_TAG_subprogram)
          <13e27c>   DW_AT_name        : (indirect string, offset: 0xa8945): e1000_clean_jumbo_rx_irq
          <13e287>   DW_AT_low_pc      : 0x17a30
        <3><13e6ef>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13e6f0>   DW_AT_abstract_origin: <0x13ed2c>
          <13e6f4>   DW_AT_low_pc      : 0x17be6
        <SNIP>
        <1><13ed2c>: Abbrev Number: 142 (DW_TAG_subprogram)
           <13ed2e>   DW_AT_name        : (indirect string, offset: 0xa54c3): e1000_consume_page
      
      So, the first time in e1000_clean_jumbo_rx_irq() where e1000_consume_page() is
      inlined is at PC 0x17be6, which subtracted from e1000_clean_jumbo_rx_irq()'s
      address, gives us the offset we should use in the probe definition:
      
        0x17be6 - 0x17a30 = 438
      
      but above we have 876, which is twice as much.
      
      Let's see the second inline expansion of e1000_consume_page() in
      e1000_clean_jumbo_rx_irq():
      
        <3><13e86e>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13e86f>   DW_AT_abstract_origin: <0x13ed2c>
          <13e873>   DW_AT_low_pc      : 0x17d21
      
        0x17d21 - 0x17a30 = 753
      
      So we were placing the probe at twice the correct offset from the
      containing function.
      
      And then after this patch:
      
        # perf probe -m e1000e -D e1000e_rx_hwtstamp
        p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+37
      
        # perf probe -m e1000e -D e1000_consume_page
        p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+438
        p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+753
        p:probe/e1000_consume_page_2 e1000e:e1000_clean_jumbo_rx_irq+1353
        #
      
      Which matches the two first expansions and shows that because we were
      doubling the offset it would spill over the next function:
      
        readelf -sw /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
         673: 0000000000017a30  1626 FUNC    LOCAL  DEFAULT    2 e1000_clean_jumbo_rx_irq
         674: 0000000000018090  2013 FUNC    LOCAL  DEFAULT    2 e1000_clean_rx_irq_ps
      
      This is the 3rd inline expansion of e1000_consume_page() in
      e1000_clean_jumbo_rx_irq():
      
         <3><13ec77>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
          <13ec78>   DW_AT_abstract_origin: <0x13ed2c>
          <13ec7c>   DW_AT_low_pc      : 0x17f79
      
        0x17f79 - 0x17a30 = 1353
      
       So:
      
         0x17a30 + 2 * 1353 = 0x184c2
      
        And:
      
         0x184c2 - 0x18090 = 1074
      
      Which explains the bogus third expansion for e1000_consume_page() to end up at:
      
         p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074
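      
      The arithmetic above can be rechecked with a few lines of C. The addresses are the ones quoted from readelf in this message; the helper names `correct_offset` and `doubled_offset` are hypothetical and only restate the calculation.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of an inlined instance from the start of its containing function. */
static uint64_t correct_offset(uint64_t func_low_pc, uint64_t inline_low_pc)
{
    assert(inline_low_pc >= func_low_pc);
    return inline_low_pc - func_low_pc;
}

/* Offset produced before the fix: the symbol offset was added twice. */
static uint64_t doubled_offset(uint64_t func_low_pc, uint64_t inline_low_pc)
{
    return 2 * correct_offset(func_low_pc, inline_low_pc);
}

/* Offset of an absolute address relative to some other function's start,
 * used to show where a doubled offset lands after spilling over. */
static uint64_t offset_in_next(uint64_t addr, uint64_t next_func_low_pc)
{
    assert(addr >= next_func_low_pc);
    return addr - next_func_low_pc;
}
```

      For example, `offset_in_next(0x17a30 + doubled_offset(0x17a30, 0x17f79), 0x18090)` reproduces the bogus e1000_clean_rx_irq_ps+1074 location.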
      
      All fixed now :-)
      
      [1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
      
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 613f050d ("perf probe: Fix to probe on gcc generated functions in modules")
      Link: http://lkml.kernel.org/r/20170621164134.5701-1-bjorn.topel@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2cdc70f3
    • Arnaldo Carvalho de Melo's avatar
      perf evsel: Fix probing of precise_ip level for default cycles event · 4d19a505
      Arnaldo Carvalho de Melo authored
      [ Upstream commit 7a1ac110 ]
      
      Commit 18e7a45a ("perf/x86: Reject non sampling events with
      precise_ip") makes sys_perf_event_open() return -EINVAL for an attribute
      with (attr.precise_ip > 0 && attr.sample_period == 0), which is exactly
      what the routine used to probe the max precise level passes when no
      events are given to 'perf record' or 'perf top', i.e.:
      
      	perf_evsel__new_cycles()
      		perf_event_attr__set_max_precise_ip()
      
      Starting with the aforementioned commit, the x86 code in
      x86_pmu_hw_config(), which is reached from sys_perf_event_open(), does:
      
                      /* There's no sense in having PEBS for non sampling events: */
                      if (!is_sampling_event(event))
                              return -EINVAL;
      
      Which makes it fail for cycles:ppp, cycles:pp and cycles:p, always using
      just the non precise cycles variant.
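
      As a rough illustration of why every precise level is rejected, here is
      a simplified user-space model of that branch. struct attr_model and
      hw_config_model() are hypothetical names, max_precise stands in for the
      level the PMU supports, and the sketch assumes that a sampling event is
      one with a non-zero sample_period.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct perf_event_attr. */
struct attr_model {
	unsigned int precise_ip;   /* requested precise level, 0..3 */
	uint64_t sample_period;    /* 0 means "not a sampling event" */
};

/* Simplified model of the precise_ip branch shown above. */
int hw_config_model(const struct attr_model *attr, unsigned int max_precise)
{
	if (attr->precise_ip) {
		if (attr->precise_ip > max_precise)
			return -EOPNOTSUPP;
		/* No sense in having PEBS for non sampling events. */
		if (attr->sample_period == 0)
			return -EINVAL;
	}
	return 0;
}
```

      In this model any attribute with precise_ip in 1..3 and sample_period
      left at 0 yields -EINVAL, so only the precise_ip = 0 request can
      succeed.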
      
      To make sure that this is the case, I tested it, before this patch,
      with:
      
        # perf probe -L x86_pmu_hw_config
        <x86_pmu_hw_config@/home/acme/git/linux/arch/x86/events/core.c:0>
              0  int x86_pmu_hw_config(struct perf_event *event)
              1  {
              2         if (event->attr.precise_ip) {
      <SNIP>
             17                 if (event->attr.precise_ip > precise)
             18                         return -EOPNOTSUPP;
      
                                /* There's no sense in having PEBS for non sampling events: */
             21                 if (!is_sampling_event(event))
             22                         return -EINVAL;
                        }
      <SNIP>
        # perf probe x86_pmu_hw_config:22
        Added new events:
          probe:x86_pmu_hw_config (on x86_pmu_hw_config:22)
          probe:x86_pmu_hw_config_1 (on x86_pmu_hw_config:22)
      
        You can now use it in all perf tools, such as:
      
              perf record -e probe:x86_pmu_hw_config_1 -aR sleep 1
      
        # perf trace -e perf_event_open,probe:x86_pmu_hwconfig*/max-stack=16/ perf record usleep 1
           0.000 ( 0.015 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.015 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.000 ( 0.021 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
           0.023 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.025 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.023 ( 0.004 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
           0.028 ( 0.002 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8ba110, cpu: -1, group_fd: -1      ) ...
           0.030 (         ): probe:x86_pmu_hw_config:(ffffffff9c0065e1))
                                             x86_pmu_hw_config ([kernel.kallsyms])
                                             hsw_hw_config ([kernel.kallsyms])
                                             x86_pmu_event_init ([kernel.kallsyms])
                                             perf_try_init_event ([kernel.kallsyms])
                                             perf_event_alloc ([kernel.kallsyms])
                                             SYSC_perf_event_open ([kernel.kallsyms])
                                             sys_perf_event_open ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
                                             perf_evsel__new_cycles (/home/acme/bin/perf)
                                             perf_evlist__add_default (/home/acme/bin/perf)
                                             cmd_record (/home/acme/bin/perf)
                                             run_builtin (/home/acme/bin/perf)
                                             handle_internal_command (/home/acme/bin/perf)
           0.028 ( 0.004 ms): perf/4150  ... [continued]: perf_event_open()) = -1 EINVAL Invalid argument
          41.018 ( 0.012 ms): perf/4150 perf_event_open(attr_uptr: 0x7ffebc8b5dd0, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.065 ( 0.011 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.080 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c7db78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
          41.103 ( 0.010 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
          41.115 ( 0.006 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
          41.122 ( 0.004 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
          41.128 ( 0.008 ms): perf/4150 perf_event_open(attr_uptr: 0x3c4e748, pid: 4151 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]
        #
      
      I.e. that return -EINVAL in x86_pmu_hw_config() is hit three times.
      
      So fix it by just setting attr.sample_period.
      
      Now, after this patch:
      
        # perf trace --max-stack=2 -e perf_event_open,probe:x86_pmu_hw_config* perf record usleep 1
        [ perf record: Woken up 1 times to write data ]
           0.000 ( 0.017 ms): perf/8469 perf_event_open(attr_uptr: 0x7ffe36c27d10, pid: -1, cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_open_cloexec_flag (/home/acme/bin/perf)
           0.050 ( 0.031 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evlist__config (/home/acme/bin/perf)
           0.092 ( 0.040 ms): perf/8469 perf_event_open(attr_uptr: 0x24ebb78, pid: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evlist__config (/home/acme/bin/perf)
           0.143 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, cpu: -1, group_fd: -1           ) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_event_attr__set_max_precise_ip (/home/acme/bin/perf)
           0.161 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), group_fd: -1, flags: FD_CLOEXEC) = 4
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.171 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 1, group_fd: -1, flags: FD_CLOEXEC) = 5
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.180 ( 0.007 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 2, group_fd: -1, flags: FD_CLOEXEC) = 6
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
           0.190 ( 0.005 ms): perf/8469 perf_event_open(attr_uptr: 0x24bc748, pid: 8470 (perf), cpu: 3, group_fd: -1, flags: FD_CLOEXEC) = 8
                                             syscall (/usr/lib64/libc-2.24.so)
                                             perf_evsel__open (/home/acme/bin/perf)
        [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
        #
      
      The probing call made from perf_event_attr__set_max_precise_ip() now
      succeeds the first time, with attr.precise_ip = 3, and the following
      calls are the per-CPU ones for the cycles:ppp event.
      
      And here is the text from a report and alternative proposed patch by
      Thomas-Mich Richter:
      
       ---
      
      On s390 the counter and sampling facility do not support a precise IP
      skid level and sometimes return EOPNOTSUPP when structure member
      precise_ip in struct perf_event_attr is not set to zero.
      
      On s390 the command 'perf record -- true' fails with error EOPNOTSUPP.  This
      happens only when no events are specified on the command line.
      
      The functions called are
      ...
        --> perf_evlist__add_default
            --> perf_evsel__new_cycles
                --> perf_event_attr__set_max_precise_ip
      
      The last function determines the value of structure member precise_ip by
      invoking the perf_event_open() system call and checking the return code.
      The level used by the first successful open becomes the value for
      precise_ip.
      
      However, the value is determined without setting member sample_period,
      which indicates no sampling.
      
      On s390 the counter facility and sampling facility are different.  The
      above procedure determines a precise_ip value of 3 using the counter
      facility. Later it uses the sampling facility with a value of 3 and
      fails with EOPNOTSUPP.
      
       ---
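
      The mismatch described in the report can be sketched as follows. This is
      a model under stated assumptions, not the perf or s390 code: struct
      probe_attr, set_max_precise_ip() and s390_like_open() are hypothetical
      names, the probing loop is assumed to try levels 3, 2, 1 in turn, and
      the backend is assumed to accept any precise_ip when counting but to
      reject a non-zero precise_ip when sampling.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the two perf_event_attr fields involved. */
struct probe_attr {
	unsigned int precise_ip;
	uint64_t sample_period;
};

/* Stand-in for the open call: >= 0 on success, negative errno on failure. */
typedef int (*open_fn)(const struct probe_attr *attr);

/* Keep the first level whose open succeeds; end at 0 if none does. */
void set_max_precise_ip(struct probe_attr *attr, open_fn try_open)
{
	attr->precise_ip = 3;
	while (attr->precise_ip != 0) {
		if (try_open(attr) >= 0)
			break;
		--attr->precise_ip;
	}
}

/* Assumed s390-like behaviour: two facilities with different rules. */
int s390_like_open(const struct probe_attr *attr)
{
	if (attr->sample_period == 0)
		return 0;                           /* counter facility */
	return attr->precise_ip ? -EOPNOTSUPP : 0;  /* sampling facility */
}
```

      Probing with sample_period left at 0 settles on precise_ip = 3, and a
      later sampling open with that level fails with EOPNOTSUPP. Probing with
      a non-zero sample_period exercises the sampling path and settles on 0.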
      
      v2: Older compilers (e.g. gcc 4.4.7) don't support referencing members
          of unnamed union members in the container struct initialization, so
          move from:
      
      	struct perf_event_attr attr = {
      		...
      		.sample_period = 1,
      	};
      
      to right after it as:
      
      	struct perf_event_attr attr = {
      		...
      	};
      
      	attr.sample_period = 1;
      
      v3: We need to reset .sample_period to 0 so that the users of
      perf_evsel__new_cycles() can properly set up attr.sample_period or
      attr.sample_freq. Reported by Ingo Molnar.
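
      Taken together, v2 and v3 amount to the sequence sketched below. It is
      an illustration only: struct cycles_attr, probe_sampling_only() and
      setup_cycles_attr() are hypothetical stand-ins, and the probe is passed
      in as a function pointer so the sketch does not depend on the real
      system call.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct perf_event_attr; only the fields used. */
struct cycles_attr {
	unsigned int precise_ip;
	union {                     /* unnamed union, as described in v2 */
		uint64_t sample_period;
		uint64_t sample_freq;
	};
};

typedef void (*probe_fn)(struct cycles_attr *attr);

/* Hypothetical probe: grants level 3 only to sampling attributes. */
void probe_sampling_only(struct cycles_attr *attr)
{
	attr->precise_ip = attr->sample_period ? 3 : 0;
}

void setup_cycles_attr(struct cycles_attr *attr, probe_fn probe)
{
	attr->precise_ip = 0;
	/* v2: plain assignment instead of a designated initializer. */
	attr->sample_period = 1;    /* only so the probe counts as sampling */
	probe(attr);
	/* v3: hand back 0 so callers pick a period or a frequency. */
	attr->sample_period = 0;
}
```

      After setup_cycles_attr(&attr, probe_sampling_only), attr.precise_ip
      holds the probed level while attr.sample_period is back at 0.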

      Reported-and-Acked-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: 18e7a45a ("perf/x86: Reject non sampling events with precise_ip")
      Link: http://lkml.kernel.org/n/tip-yv6nnkl7tzqocrm0hl3x7vf1@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      4d19a505