1. 03 Feb, 2018 11 commits
    • HID: wacom: EKR: ensure devres groups at higher indexes are released · ddba3c67
      Aaron Armstrong Skomra authored
      commit 791ae273 upstream.
      
      Background: ExpressKey Remotes (EKRs) communicate their events via a USB dongle.
      Each dongle can hold up to 5 pairings at one time and one EKR (identified
      by its serial number) can unfortunately be paired with its dongle
      more than once. The pairing takes place in a round-robin fashion.
      
      Input devices are only created once per EKR, when a new serial number
      is seen in the list of pairings. However, if a device is created for
      a "higher" pairing index and subsequently a second pairing occurs at a
      lower pairing index, unpairing the remote with that serial number from
      any pairing index will currently cause a driver crash. This occurs
      infrequently, as two remotes are necessary to trigger this bug and most
      users have only one remote.
      
      As an illustration, to trigger the bug you need to have two remotes,
      and pair them in this order:
      
      1. slot 0 -> remote 1 (input device created for remote 1)
      2. slot 1 -> remote 1 (duplicate pairing - no device created)
      3. slot 2 -> remote 1 (duplicate pairing - no device created)
      4. slot 3 -> remote 1 (duplicate pairing - no device created)
      5. slot 4 -> remote 2 (input device created for remote 2)
      
      6. slot 0 -> remote 2 (1 destroyed and recreated at slot 1)
      7. slot 1 -> remote 2 (1 destroyed and recreated at slot 2)
      8. slot 2 -> remote 2 (1 destroyed and recreated at slot 3)
      9. slot 3 -> remote 2 (1 destroyed and not recreated)
      10. slot 4 -> remote 2 (2 was already in this slot so no changes)
      
      11. slot 0 -> remote 1 (The current code sees remote 2 was paired over in
                              one of the dongle slots it occupied and attempts
                              to remove all information about remote 2 [1]. It
                              calls wacom_remote_destroy_one for remote 2, but
                              the destroy function assumes the lowest index is
                              where the remote's input device was created. The
                              code "cleans up" the other remote 2 pairings
                              including the one which the input device was based
                              on, assuming they were just duplicate
                              pairings. However, the cleanup doesn't call the
                              devres release function for the input device that
                              was created in slot 4).
      
      This issue is fixed by this commit.
      
      [1] Remote 2 should subsequently be re-created on the next packet from the
      EKR at the lowest numbered slot that it occupies (here slot 1).
      
      Fixes: f9036bd4 ("HID: wacom: EKR: use devres groups to manage resources")
      Signed-off-by: Aaron Armstrong Skomra <aaron.skomra@wacom.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: af_alg - whitelist mask and type · b7edc45f
      Stephan Mueller authored
      commit bb30b884 upstream.
      
      The user space interface allows specifying the type and mask field used
      to allocate the cipher. Only a subset of the possible flags are intended
      for user space. Therefore, white-list the allowed flags.
      
      In case the user space caller uses at least one non-allowed flag, EINVAL
      is returned.
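
The whitelist check described above can be sketched roughly as follows. This is a minimal illustration only, not the kernel's code: `FLAG_A`, `FLAG_B`, `ALLOWED_FLAGS` and `check_user_flags()` are hypothetical names and values chosen for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits that user space is allowed to set. */
#define FLAG_A        0x00000001u
#define FLAG_B        0x00000100u
#define ALLOWED_FLAGS (FLAG_A | FLAG_B)

/* Return 0 if type and mask use only whitelisted bits, -EINVAL otherwise. */
int check_user_flags(uint32_t type, uint32_t mask)
{
    if ((type | mask) & ~ALLOWED_FLAGS)
        return -EINVAL;
    return 0;
}
```

Checking `type | mask` in one expression means a single non-allowed bit in either field is enough to reject the request.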

      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Stephan Mueller <smueller@chronox.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: sha3-generic - fixes for alignment and big endian operation · 1ce8e52f
      Ard Biesheuvel authored
      commit c013cee9 upstream.
      
      Ensure that the input is byte swabbed before injecting it into the
      SHA3 transform. Use the get_unaligned() accessor for this so that
      we don't perform unaligned access inadvertently on architectures
      that do not support that.
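
As a rough user-space illustration of the idea (not the kernel implementation), the sketch below reads 64-bit little-endian words byte by byte, which works at any alignment and on either endianness. `load_le64()` and `absorb_block()` are hypothetical helper names standing in for the kernel's unaligned accessors and the absorb step.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 64-bit little-endian value from a possibly unaligned pointer. */
uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* XOR `lanes` little-endian 64-bit words from src into the state array. */
void absorb_block(uint64_t *state, const uint8_t *src, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        state[i] ^= load_le64(src + 8 * i);
}
```

Because the value is assembled from individual bytes, the result is the same on big-endian and little-endian hosts, and no misaligned 64-bit load is ever issued.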
      
      Fixes: 53964b9e ("crypto: sha3 - Add SHA-3 hash algorithm")
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: aesni - handle zero length dst buffer · 95259cb0
      Stephan Mueller authored
      commit 9c674e1e upstream.
      
      GCM can be invoked with a zero-length destination buffer. This is possible
      if the AAD and the ciphertext have zero lengths and only the tag exists in
      the source buffer (i.e. the source buffer cannot have zero length). In this
      case, the GCM cipher only performs the authentication and no decryption
      operation.
      
      When the destination buffer has zero length, it is possible that no page
      is mapped to the SG pointing to the destination. In this case,
      sg_page(req->dst) is an invalid access. Therefore, page accesses should
      only be allowed if the req->dst->length is non-zero which is the
      indicator that a page must exist.
      
      This fixes a crash that can be triggered by user space via AF_ALG.
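
The guard described above can be sketched as follows. This is an assumption-laden illustration: `struct sg_entry` and `dst_page_or_null()` are hypothetical stand-ins and not the kernel's scatterlist API.

```c
#include <stddef.h>

/* Hypothetical stand-in for a scatterlist entry. */
struct sg_entry {
    void *page;           /* may be invalid when length == 0 */
    unsigned int length;  /* number of bytes described by this entry */
};

/* Return the destination page only if the entry actually holds data. */
void *dst_page_or_null(const struct sg_entry *dst)
{
    if (dst->length == 0)
        return NULL;      /* authentication only: nothing to write */
    return dst->page;
}
```

The length field is consulted before the page pointer is ever used, which mirrors the rule that a non-zero length is the indicator that a page exists.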

      Signed-off-by: Stephan Mueller <smueller@chronox.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: ecdh - fix typo in KPP dependency of CRYPTO_ECDH · f1803207
      Hauke Mehrtens authored
      commit b5b90077 upstream.
      
      This fixes a typo in the CRYPTO_KPP dependency of CRYPTO_ECDH.
      
      Fixes: 3c4b2390 ("crypto: ecdh - Add ECDH software support")
      Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gpio: Fix kernel stack leak to userspace · cc1fa4a7
      Linus Walleij authored
      commit 24bd3efc upstream.
      
      The GPIO event descriptor was leaking kernel stack to
      userspace because we don't zero the variable before
      use. Ooops. Fix this.
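
The general pattern behind this kind of fix can be sketched as below. The struct layout and the names `struct event_data` and `fill_event()` are hypothetical and chosen for illustration; they are not the GPIO UAPI definitions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical event record; the compiler may insert padding after `id`. */
struct event_data {
    uint32_t id;
    uint64_t timestamp;
};

/* Fill an event record, zeroing it first so no stale bytes survive. */
void fill_event(struct event_data *ev, uint32_t id, uint64_t ts)
{
    memset(ev, 0, sizeof(*ev));
    ev->id = id;
    ev->timestamp = ts;
}
```

Without the `memset()`, any padding or unassigned member would still hold whatever was previously on the stack, and copying the whole struct out would expose it.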

      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • gpio: stmpe: i2c transfer are forbiden in atomic context · 241c04f7
      Patrice Chotard authored
      commit b888fb6f upstream.
      
      Move the workaround from stmpe_gpio_irq_unmask() which is executed
      in atomic context to stmpe_gpio_irq_sync_unlock() which is not.
      
      It fixes the following issue:
      
      [    1.500000] BUG: scheduling while atomic: swapper/1/0x00000002
      [    1.500000] CPU: 0 PID: 1 Comm: swapper Not tainted 4.15.0-rc2-00020-gbd4301f-dirty #28
      [    1.520000] Hardware name: STM32 (Device Tree Support)
      [    1.520000] [<0000bfc9>] (unwind_backtrace) from [<0000b347>] (show_stack+0xb/0xc)
      [    1.530000] [<0000b347>] (show_stack) from [<0001fc49>] (__schedule_bug+0x39/0x58)
      [    1.530000] [<0001fc49>] (__schedule_bug) from [<00168211>] (__schedule+0x23/0x2b2)
      [    1.550000] [<00168211>] (__schedule) from [<001684f7>] (schedule+0x57/0x64)
      [    1.550000] [<001684f7>] (schedule) from [<0016a513>] (schedule_timeout+0x137/0x164)
      [    1.550000] [<0016a513>] (schedule_timeout) from [<00168b91>] (wait_for_common+0x8d/0xfc)
      [    1.570000] [<00168b91>] (wait_for_common) from [<00139753>] (stm32f4_i2c_xfer+0xe9/0xfe)
      [    1.580000] [<00139753>] (stm32f4_i2c_xfer) from [<00138545>] (__i2c_transfer+0x111/0x148)
      [    1.590000] [<00138545>] (__i2c_transfer) from [<001385cf>] (i2c_transfer+0x53/0x70)
      [    1.590000] [<001385cf>] (i2c_transfer) from [<001388a5>] (i2c_smbus_xfer+0x12f/0x36e)
      [    1.600000] [<001388a5>] (i2c_smbus_xfer) from [<00138b49>] (i2c_smbus_read_byte_data+0x1f/0x2a)
      [    1.610000] [<00138b49>] (i2c_smbus_read_byte_data) from [<00124fdd>] (__stmpe_reg_read+0xd/0x24)
      [    1.620000] [<00124fdd>] (__stmpe_reg_read) from [<001252b3>] (stmpe_reg_read+0x19/0x24)
      [    1.630000] [<001252b3>] (stmpe_reg_read) from [<0002c4d1>] (unmask_irq+0x17/0x22)
      [    1.640000] [<0002c4d1>] (unmask_irq) from [<0002c57f>] (irq_startup+0x6f/0x78)
      [    1.650000] [<0002c57f>] (irq_startup) from [<0002b7a1>] (__setup_irq+0x319/0x47c)
      [    1.650000] [<0002b7a1>] (__setup_irq) from [<0002bad3>] (request_threaded_irq+0x6b/0xe8)
      [    1.660000] [<0002bad3>] (request_threaded_irq) from [<0002d0b9>] (devm_request_threaded_irq+0x3b/0x6a)
      [    1.670000] [<0002d0b9>] (devm_request_threaded_irq) from [<001446e7>] (mmc_gpiod_request_cd_irq+0x49/0x8a)
      [    1.680000] [<001446e7>] (mmc_gpiod_request_cd_irq) from [<0013d45d>] (mmc_start_host+0x49/0x60)
      [    1.690000] [<0013d45d>] (mmc_start_host) from [<0013e40b>] (mmc_add_host+0x3b/0x54)
      [    1.700000] [<0013e40b>] (mmc_add_host) from [<00148119>] (mmci_probe+0x4d1/0x60c)
      [    1.710000] [<00148119>] (mmci_probe) from [<000f903b>] (amba_probe+0x7b/0xbe)
      [    1.720000] [<000f903b>] (amba_probe) from [<001170e5>] (driver_probe_device+0x169/0x1f8)
      [    1.730000] [<001170e5>] (driver_probe_device) from [<001171b7>] (__driver_attach+0x43/0x5c)
      [    1.740000] [<001171b7>] (__driver_attach) from [<0011618d>] (bus_for_each_dev+0x3d/0x46)
      [    1.740000] [<0011618d>] (bus_for_each_dev) from [<001165cd>] (bus_add_driver+0xcd/0x124)
      [    1.740000] [<001165cd>] (bus_add_driver) from [<00117713>] (driver_register+0x4d/0x7a)
      [    1.760000] [<00117713>] (driver_register) from [<001fc765>] (do_one_initcall+0xbd/0xe8)
      [    1.770000] [<001fc765>] (do_one_initcall) from [<001fc88b>] (kernel_init_freeable+0xfb/0x134)
      [    1.780000] [<001fc88b>] (kernel_init_freeable) from [<00167ee3>] (kernel_init+0x7/0x9c)
      [    1.790000] [<00167ee3>] (kernel_init) from [<00009b65>] (ret_from_fork+0x11/0x2c)

      Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
      Signed-off-by: Patrice Chotard <patrice.chotard@st.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tools/gpio: Fix build error with musl libc · efe3f94f
      Joel Stanley authored
      commit 1696784e upstream.
      
      The GPIO tools build fails when using a buildroot toolchain that uses musl
      as its C library:
      
      arm-broomstick-linux-musleabi-gcc -Wp,-MD,./.gpio-event-mon.o.d \
       -Wp,-MT,gpio-event-mon.o -O2 -Wall -g -D_GNU_SOURCE \
       -Iinclude -D"BUILD_STR(s)=#s" -c -o gpio-event-mon.o gpio-event-mon.c
      gpio-event-mon.c:30:6: error: unknown type name ‘u_int32_t’; did you mean ‘uint32_t’?
            u_int32_t handleflags,
            ^~~~~~~~~
            uint32_t
      
      The glibc headers installed on my laptop include sys/types.h in
      unistd.h, but it appears that musl does not.
      
      Fixes: 97f69747 ("tools/gpio: add the gpio-event-mon tool")
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • RDMA/mlx5: set UMR wqe fence according to HCA cap · 2a7076e7
      Max Gurtovoy authored
      commit 6e8484c5 upstream.
      
      Cache the needed umr_fence and set the wqe ctrl segment
      accordingly.

      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Acked-by: Leon Romanovsky <leon@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Cc: Marta Rybczynska <mrybczyn@kalray.eu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/mlx5: Define interface bits for fencing UMR wqe · 20e6f5bd
      Max Gurtovoy authored
      commit 1410a90a upstream.
      
      HW can implement UMR wqe re-transmission in various ways.
      Thus, add HCA cap to distinguish the needed fence for UMR to make
      sure that the wqe wouldn't fail on mkey checks.

      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Acked-by: Leon Romanovsky <leon@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Cc: Marta Rybczynska <mrybczyn@kalray.eu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • loop: fix concurrent lo_open/lo_release · 56bc0863
      Linus Torvalds authored
      commit ae665016 upstream.
      
      范龙飞 reports that KASAN can report a use-after-free in __lock_acquire.
      The cause is insufficient serialization in lo_release(), which
      will continue to use the loop device even after it has decremented the
      lo_refcnt to zero.
      
      In the meantime, another process can come in and open the loop device
      again as it is being shut down. Confusion ensues.

      Reported-by: 范龙飞 <long7573@126.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 31 Jan, 2018 29 commits
    • Linux 4.9.79 · 6c6f924f
      Greg Kroah-Hartman authored
    • nfsd: auth: Fix gid sorting when rootsquash enabled · f12d0602
      Ben Hutchings authored
      commit 19952667 upstream.
      
      Commit bdcf0a42 ("kernel: make groups_sort calling a responsibility
      group_info allocators") appears to break nfsd rootsquash in a pretty
      major way.
      
      It adds a call to groups_sort() inside the loop that copies/squashes
      gids, which means the valid gids are sorted along with the following
      garbage.  The net result is that the highest numbered valid gids are
      replaced with any lower-valued garbage gids, possibly including 0.
      
      We should sort only once, after filling in all the gids.
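
The "fill everything, then sort once" ordering can be sketched as follows. This is a simplified user-space illustration; `squash_and_sort()`, `cmp_gid()` and the plain `unsigned int` gid representation are assumptions for the example, not the nfsd code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparison of two gids for qsort(). */
static int cmp_gid(const void *a, const void *b)
{
    unsigned int x = *(const unsigned int *)a;
    unsigned int y = *(const unsigned int *)b;
    return (x > y) - (x < y);
}

/* Copy gids, replacing root (0) with anon_gid, then sort once at the end. */
void squash_and_sort(const unsigned int *in, unsigned int *out,
                     size_t n, unsigned int anon_gid)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (in[i] == 0) ? anon_gid : in[i];

    /* Sorting inside the loop would also sort the not-yet-filled tail. */
    qsort(out, n, sizeof(out[0]), cmp_gid);
}
```

Sorting only after the loop guarantees that every element taking part in the sort has already been written.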
      
      Fixes: bdcf0a42 ("kernel: make groups_sort calling a responsibility ...")
      Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Acked-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Wolfgang Walter <linux@stwm.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: reject stores into ctx via st and xadd · f531fbb0
      Daniel Borkmann authored
      [ upstream commit f37a8cb8 ]
      
      Alexei found that the verifier does not reject stores into context
      via BPF_ST instead of BPF_STX. And while looking at it, we
      should also not allow the XADD variant of BPF_STX.
      
      The context rewriter is only assuming either BPF_LDX_MEM- or
      BPF_STX_MEM-type operations, thus reject anything other than
      that so that assumptions in the rewriter properly hold. Add
      test cases as well for BPF selftests.
      
      Fixes: d691f9e8 ("bpf: allow programs to write to certain skb fields")
      Reported-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: fix 32-bit divide by zero · 265d7657
      Alexei Starovoitov authored
      [ upstream commit 68fda450 ]
      
      Because some JITs perform the if (src_reg == 0) check in 64-bit mode
      for div/mod operations, mask the upper 32 bits of the src register
      before doing the check.
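
To see why the masking matters, consider a source register such as 0x100000000: as a 64-bit value it is non-zero, but its low 32 bits, which are what a 32-bit division actually uses, are zero. The sketch below is a plain C illustration of that check; `div32_checked()` is a hypothetical helper, not verifier or JIT code.

```c
#include <stdint.h>

/* 32-bit style division: only the low 32 bits of src act as the divisor. */
uint32_t div32_checked(uint64_t dst, uint64_t src, int *div_by_zero)
{
    uint32_t divisor = (uint32_t)src;  /* mask off the upper 32 bits first */

    if (divisor == 0) {
        *div_by_zero = 1;
        return 0;
    }
    *div_by_zero = 0;
    return (uint32_t)dst / divisor;
}
```

A check written as `src == 0` on the full 64-bit value would let 0x100000000 through and then divide by zero in the 32-bit operation.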
      
      Fixes: 62258278 ("net: filter: x86: internal BPF JIT")
      Fixes: 7a12b503 ("sparc64: Add eBPF JIT.")
      Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: fix divides by zero · 46060778
      Eric Dumazet authored
      [ upstream commit c366287e ]
      
      Divides by zero are not nice, let's avoid them if possible.
      
      Also do_div() seems not needed when dealing with 32bit operands,
      but this seems a minor detail.
      
      Fixes: bd4cf0ed ("net: filter: rework/optimize internal BPF interpreter's instruction set")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: avoid false sharing of map refcount with max_entries · 5cb917aa
      Daniel Borkmann authored
      [ upstream commit be95a845 ]
      
      In addition to commit b2157399 ("bpf: prevent out-of-bounds
      speculation") also change the layout of struct bpf_map such that
      false sharing of fast-path members like max_entries is avoided
      when the maps reference counter is altered. Therefore enforce
      them to be placed into separate cachelines.
      
      pahole dump after change:
      
        struct bpf_map {
              const struct bpf_map_ops  * ops;                 /*     0     8 */
              struct bpf_map *           inner_map_meta;       /*     8     8 */
              void *                     security;             /*    16     8 */
              enum bpf_map_type          map_type;             /*    24     4 */
              u32                        key_size;             /*    28     4 */
              u32                        value_size;           /*    32     4 */
              u32                        max_entries;          /*    36     4 */
              u32                        map_flags;            /*    40     4 */
              u32                        pages;                /*    44     4 */
              u32                        id;                   /*    48     4 */
              int                        numa_node;            /*    52     4 */
              bool                       unpriv_array;         /*    56     1 */
      
              /* XXX 7 bytes hole, try to pack */
      
              /* --- cacheline 1 boundary (64 bytes) --- */
              struct user_struct *       user;                 /*    64     8 */
              atomic_t                   refcnt;               /*    72     4 */
              atomic_t                   usercnt;              /*    76     4 */
              struct work_struct         work;                 /*    80    32 */
              char                       name[16];             /*   112    16 */
              /* --- cacheline 2 boundary (128 bytes) --- */
      
              /* size: 128, cachelines: 2, members: 17 */
              /* sum members: 121, holes: 1, sum holes: 7 */
        };
      
      Now all entries in the first cacheline are read only throughout
      the life time of the map, set up once during map creation. Overall
      struct size and number of cachelines doesn't change from the
      reordering. struct bpf_map is usually first member and embedded
      in map structs in specific map implementations, so also avoid those
      members to sit at the end where it could potentially share the
      cacheline with first map values e.g. in the array since remote
      CPUs could trigger map updates just as well for those (easily
      dirtying members like max_entries intentionally as well) while
      having subsequent values in cache.
      
      Quoting from Google's Project Zero blog [1]:
      
        Additionally, at least on the Intel machine on which this was
        tested, bouncing modified cache lines between cores is slow,
        apparently because the MESI protocol is used for cache coherence
        [8]. Changing the reference counter of an eBPF array on one
        physical CPU core causes the cache line containing the reference
        counter to be bounced over to that CPU core, making reads of the
        reference counter on all other CPU cores slow until the changed
        reference counter has been written back to memory. Because the
        length and the reference counter of an eBPF array are stored in
        the same cache line, this also means that changing the reference
        counter on one physical CPU core causes reads of the eBPF array's
        length to be slow on other physical CPU cores (intentional false
        sharing).
      
      While this doesn't 'control' the out-of-bounds speculation through
      masking the index as in commit b2157399, triggering a manipulation
      of the map's reference counter is really trivial, so let's not allow
      it to easily affect max_entries.
      
      Splitting to separate cachelines also generally makes sense from
      a performance perspective anyway in that fast-path won't have a
      cache miss if the map gets pinned, reused in other progs, etc out
      of control path, thus also avoids unintentional false sharing.
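
The cacheline separation discussed above can be illustrated in portable C11 as follows. This is a sketch under stated assumptions: `struct demo_map`, `same_cacheline()` and the 64-byte `CACHELINE` constant are hypothetical, and the kernel uses its own alignment annotations rather than `alignas`.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64  /* assumed cache line size in bytes */

struct demo_map {
    /* Read-mostly fields, set once at creation. */
    uint32_t key_size;
    uint32_t value_size;
    uint32_t max_entries;

    /* Frequently written fields start on their own cache line. */
    alignas(CACHELINE) int refcnt;
    int usercnt;
};

/* Return nonzero if the two byte offsets fall into the same cache line. */
int same_cacheline(size_t a, size_t b)
{
    return a / CACHELINE == b / CACHELINE;
}
```

With this layout, a write to `refcnt` dirties only the second cache line, so readers of `max_entries` on other cores are not forced to refetch the first one.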
      
        [1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html

      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: arsh is not supported in 32 bit alu thus reject it · fcabc6d0
      Daniel Borkmann authored
      [ upstream commit 7891a87e ]
      
      The following snippet was throwing an 'unknown opcode cc' warning
      in BPF interpreter:
      
        0: (18) r0 = 0x0
        2: (7b) *(u64 *)(r10 -16) = r0
        3: (cc) (u32) r0 s>>= (u32) r0
        4: (95) exit
      
      Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X}
      generation, not all of them do and interpreter does neither. We can
      leave existing ones and implement it later in bpf-next for the
      remaining ones, but reject this properly in verifier for the time
      being.
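
The opcode `cc` in the snippet above decodes as the 32-bit ALU class (0x04) combined with the ARSH operation (0xc0) and a register source (0x08). A rejection check along those lines might look like the sketch below; the constants follow the BPF opcode encoding, while `check_alu_opcode()` is a hypothetical helper and not the verifier's actual function.

```c
#include <errno.h>
#include <stdint.h>

/* Opcode encoding constants as used by the BPF instruction set. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_OP(code)    ((code) & 0xf0)
#define BPF_ALU         0x04
#define BPF_ALU64       0x07
#define BPF_ARSH        0xc0

/* Hypothetical helper: reject arithmetic right shift in the 32-bit ALU class. */
int check_alu_opcode(uint8_t code)
{
    if (BPF_CLASS(code) == BPF_ALU && BPF_OP(code) == BPF_ARSH)
        return -EINVAL;
    return 0;
}
```

The 64-bit variant (class `BPF_ALU64`) passes this check, since only the 32-bit form lacks interpreter support according to the text above.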
      
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: introduce BPF_JIT_ALWAYS_ON config · a3d6dd6a
      Alexei Starovoitov authored
      [ upstream commit 290af866 ]
      
      The BPF interpreter has been used as part of the Spectre variant 2 attack (CVE-2017-5715).
      
      A quote from the Google Project Zero blog:
      "At this point, it would normally be necessary to locate gadgets in
      the host kernel code that can be used to actually leak data by reading
      from an attacker-controlled location, shifting and masking the result
      appropriately and then using the result of that as offset to an
      attacker-controlled address for a load. But piecing gadgets together
      and figuring out which ones work in a speculation context seems annoying.
      So instead, we decided to use the eBPF interpreter, which is built into
      the host kernel - while there is no legitimate way to invoke it from inside
      a VM, the presence of the code in the host kernel's text section is sufficient
      to make it usable for the attack, just like with ordinary ROP gadgets."
      
      To make the attacker's job harder, introduce the BPF_JIT_ALWAYS_ON config
      option, which removes the interpreter from the kernel in favor of JIT-only mode.
      So far eBPF JIT is supported by:
      x64, arm64, arm32, sparc64, s390, powerpc64, mips64
      
      The start of a JITed program is randomized and the code page is marked as read-only.
      In addition, "constant blinding" can be turned on with net.core.bpf_jit_harden.
      
      v2->v3:
      - move __bpf_prog_ret0 under ifdef (Daniel)
      
      v1->v2:
      - fix init order, test_bpf and cBPF (Daniel's feedback)
      - fix offloaded bpf (Jakub's feedback)
      - add 'return 0' dummy in case something can invoke prog->bpf_func
      - retarget bpf tree. For bpf-next the patch would need one extra hunk.
        It will be sent when the trees are merged back to net-next
      
      Considered doing:
        int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
      but it seems better to land the patch as-is and in bpf-next remove
      bpf_jit_enable global variable from all JITs, consolidate in one place
      and remove this jit_init() function.

      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bpf: fix bpf_tail_call() x64 JIT · 5226bb3b
      Alexei Starovoitov authored
      [ upstream commit 90caccdd ]
      
      - bpf prog_array just like all other types of bpf array accepts 32-bit index.
        Clarify that in the comment.
      - fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes
      - tighten corresponding check in the interpreter to stay consistent
      
      The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag
      in commit 96eabe7a in 4.14. Before that the map_flags would stay zero and
      though JIT code is wrong it will check bounds correctly.
      Hence two fixes tags. All other JITs don't have this problem.
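
The effect of loading 8 bytes instead of 4 can be illustrated with the following user-space sketch. It assumes a hypothetical layout in which a 32-bit flags field directly follows the 32-bit bound; `struct demo_array`, `in_bounds_wide()` and `in_bounds_u32()` are illustrative names, not the JIT or map code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a 32-bit bound directly followed by another field. */
struct demo_array {
    uint32_t max_entries;
    uint32_t map_flags;
};

/* Buggy variant: reads 8 bytes, so map_flags becomes part of the bound. */
int in_bounds_wide(const struct demo_array *a, uint32_t index)
{
    uint64_t bound;

    memcpy(&bound, a, sizeof(bound));  /* covers max_entries and map_flags */
    return index < bound;
}

/* Fixed variant: compares against exactly the 4 bytes of max_entries. */
int in_bounds_u32(const struct demo_array *a, uint32_t index)
{
    return index < a->max_entries;
}
```

As long as the neighbouring field is zero, the wide read happens to yield the right bound on a little-endian machine, which matches the observation above that the bug only became triggerable once map_flags could be non-zero.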

      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Fixes: 96eabe7a ("bpf: Allow selecting numa node during map creation")
      Fixes: b52f00e6 ("x86: bpf_jit: implement bpf_tail_call() helper")
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86: bpf_jit: small optimization in emit_bpf_tail_call() · c964ad34
      Eric Dumazet authored
      [ upstream commit 84ccac6e ]
      
      Saves 4 bytes replacing following instructions :
      
      lea rax, [rsi + rdx * 8 + offsetof(...)]
      mov rax, qword ptr [rax]
      cmp rax, 0
      
      by :
      
      mov rax, [rsi + rdx * 8 + offsetof(...)]
      test rax, rax

      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • hrtimer: Reset hrtimer cpu base proper on CPU hotplug · c98ff729
      Thomas Gleixner authored
      commit d5421ea4 upstream.
      
      The hrtimer interrupt code contains a hang detection and mitigation
      mechanism, which prevents a long-delayed hrtimer interrupt from causing a
      continuous retriggering of interrupts that keeps the system from making
      progress. If a hang is detected then the timer hardware is programmed with
      a certain delay into the future and a flag is set in the hrtimer cpu base
      which prevents newly enqueued timers from reprogramming the timer hardware
      prior to the chosen delay. The subsequent hrtimer interrupt after the delay
      clears the flag and resumes normal operation.
      
      If such a hang happens in the last hrtimer interrupt before a CPU is
      unplugged then the hang_detected flag is set and stays that way when the
      CPU is plugged in again. At that point the timer hardware is not armed and
      it cannot be armed because the hang_detected flag is still active, so
      nothing clears that flag. As a consequence the CPU does not receive hrtimer
      interrupts and no timers expire on that CPU which results in RCU stalls and
      other malfunctions.
      
      Clear the flag along with some other less critical members of the hrtimer
      cpu base to ensure starting from a clean state when a CPU is plugged in.
      
      Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
      root cause of that hard to reproduce heisenbug. Once understood it's
      trivial and certainly justifies a brown paperbag.
      
      Fixes: 41d2e494 ("hrtimer: Tune hrtimer_interrupt hang logic")
      Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Sewior <bigeasy@linutronix.de>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/microcode/intel: Extend BDW late-loading further with LLC size check · 9f3a6cad
      Jia Zhang authored
      commit 7e702d17 upstream.
      
      Commit b94b7373 ("x86/microcode/intel: Extend BDW late-loading with a
      revision check") reduced the impact of erratum BDF90 for Broadwell model
      79.
      
      The impact can be reduced further by checking the size of the last level
      cache portion per core.
      
      Tony: "The erratum says the problem only occurs on the large-cache SKUs.
      So we only need to avoid the update if we are on a big cache SKU that is
      also running old microcode."
      
      For more details, see erratum BDF90 in document #334165 (Intel Xeon
      Processor E7-8800/4800 v4 Product Family Specification Update) from
      September 2017.
      
      Fixes: b94b7373 ("x86/microcode/intel: Extend BDW late-loading with a revision check")
      Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Link: https://lkml.kernel.org/r/1516321542-31161-1-git-send-email-zhang.jia@linux.alibaba.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9f3a6cad
    • Xiao Liang's avatar
      perf/x86/amd/power: Do not load AMD power module on !AMD platforms · dc1932c6
      Xiao Liang authored
      commit 40d4071c upstream.
      
      The AMD power module can be loaded on non AMD platforms, but unload fails
      with the following Oops:
      
       BUG: unable to handle kernel NULL pointer dereference at           (null)
       IP: __list_del_entry_valid+0x29/0x90
       Call Trace:
        perf_pmu_unregister+0x25/0xf0
        amd_power_pmu_exit+0x1c/0xd23 [power]
        SyS_delete_module+0x1a8/0x2b0
        ? exit_to_usermode_loop+0x8f/0xb0
        entry_SYSCALL_64_fastpath+0x20/0x83
      
      Return -ENODEV instead of 0 from the module init function if the CPU does
      not match.
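
      A small userspace model of why the return value matters, with
      hypothetical names throughout: if init reports failure on a non-matching
      CPU, the module never loads, so the exit path can never run against a
      PMU that was not registered.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag standing in for "the PMU was registered". */
static bool pmu_registered;

/* Hypothetical init function: cpu_matches models the CPU match check. */
static int power_init_sketch(bool cpu_matches)
{
	if (!cpu_matches)
		return -ENODEV;  /* load fails, so the exit function never runs */

	pmu_registered = true;   /* registration happens only on a match */
	return 0;
}
```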
      
      Fixes: c7ab62bf ("perf/x86/amd/power: Add AMD accumulated power reporting mechanism")
      Signed-off-by: Xiao Liang <xiliang@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20180122061252.6394-1-xiliang@redhat.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      dc1932c6
    • Eric Dumazet's avatar
      flow_dissector: properly cap thoff field · eecfa2ee
      Eric Dumazet authored
      
      [ Upstream commit d0c081b4 ]
      
      syzbot reported yet another crash [1] that is caused by
      insufficient validation of DODGY packets.
      
      Two bugs are happening here to trigger the crash.
      
      1) Flow dissection leaves with incorrect thoff field.
      
      2) skb_probe_transport_header() sets transport header to this invalid
      thoff, even if pointing after skb valid data.
      
      3) qdisc_pkt_len_init() reads out-of-bound data because it
      trusts tcp_hdrlen(skb)
      
      Possible fixes :
      
      - Full flow dissector validation before injecting bad DODGY packets in
      the stack.
        This approach was attempted here : https://patchwork.ozlabs.org/patch/861874/
      
      - Have more robust functions in the core.
        This might be needed anyway for stable versions.
      
      This patch fixes the flow dissection issue.
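
      Capping the offset can be pictured with the following sketch. The helper
      name and its parameters are hypothetical; it only shows the clamping
      idea, namely that the reported transport header offset never points past
      the valid data and always fits in 16 bits.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: nhoff is the offset the dissector arrived at,
 * data_len is the amount of valid packet data.
 */
static uint16_t cap_thoff(size_t nhoff, size_t data_len)
{
	size_t capped = nhoff < data_len ? nhoff : data_len;

	/* The result is stored in a 16-bit field, so clamp to that too. */
	return capped > UINT16_MAX ? UINT16_MAX : (uint16_t)capped;
}
```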
      
      [1]
      CPU: 1 PID: 3144 Comm: syzkaller271204 Not tainted 4.15.0-rc4-mm1+ #49
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:53
       print_address_description+0x73/0x250 mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:355 [inline]
       kasan_report+0x23b/0x360 mm/kasan/report.c:413
       __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:432
       __tcp_hdrlen include/linux/tcp.h:35 [inline]
       tcp_hdrlen include/linux/tcp.h:40 [inline]
       qdisc_pkt_len_init net/core/dev.c:3160 [inline]
       __dev_queue_xmit+0x20d3/0x2200 net/core/dev.c:3465
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3554
       packet_snd net/packet/af_packet.c:2943 [inline]
       packet_sendmsg+0x3ad5/0x60a0 net/packet/af_packet.c:2968
       sock_sendmsg_nosec net/socket.c:628 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:638
       sock_write_iter+0x31a/0x5d0 net/socket.c:907
       call_write_iter include/linux/fs.h:1776 [inline]
       new_sync_write fs/read_write.c:469 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:482
       vfs_write+0x189/0x510 fs/read_write.c:544
       SYSC_write fs/read_write.c:589 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:581
       entry_SYSCALL_64_fastpath+0x1f/0x96
      
      Fixes: 34fad54c ("net: __skb_flow_dissect() must cap its return value")
      Fixes: a6e544b0 ("flow_dissector: Jump to exit code in __skb_flow_dissect")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      eecfa2ee
    • Cong Wang's avatar
      tun: fix a memory leak for tfile->tx_array · 18717ee2
      Cong Wang authored
      
      [ Upstream commit 4df0bfc7 ]
      
      tfile->tun could be detached before we close the tun fd,
      via tun_detach_all(), so it should not be used to check for
      tfile->tx_array.
      
      As Jason suggested, we probably have to clean it up
      unconditionally both in __tun_detach() and tun_detach_all(),
      but this requires checking whether it is initialized.
      Currently skb_array_cleanup() doesn't have such a check,
      so I check it in the caller and introduce a helper function.
      It is a bit ugly, but we can always improve it in net-next.

      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Fixes: 1576d986 ("tun: switch to use skb array for tx")
      Cc: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      18717ee2
    • Yuval Mintz's avatar
      mlxsw: spectrum_router: Don't log an error on missing neighbor · 1105145c
      Yuval Mintz authored
      
      [ Upstream commit 1ecdaea0 ]
      
      Driver periodically samples all neighbors configured in device
      in order to update the kernel regarding their state. When finding
      an entry configured in HW that doesn't show in neigh_lookup()
      driver logs an error message.
      This introduces a race when removing multiple neighbors -
      it's possible that a given entry would still be configured in HW
      as its removal is still being processed but is already removed
      from the kernel's neighbor tables.
      
      Simply remove the error message and gracefully accept such events.
      
      Fixes: c723c735 ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
      Fixes: 60f040ca ("mlxsw: spectrum_router: Periodically dump active IPv6 neighbours")
      Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1105145c
    • Willem de Bruijn's avatar
      gso: validate gso_type in GSO handlers · 3110e213
      Willem de Bruijn authored
      
      [ Upstream commit 121d57af ]
      
      Validate gso_type during segmentation as SKB_GSO_DODGY sources
      may pass packets where the gso_type does not match the contents.
      
      Syzkaller was able to enter the SCTP gso handler with a packet of
      gso_type SKB_GSO_TCPV4.
      
      On entry of transport layer gso handlers, verify that the gso_type
      matches the transport protocol.
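
      A rough sketch of such an entry check follows. The flag values and
      function names are hypothetical and do not correspond to the kernel's
      SKB_GSO_* constants; the point is only that a handler bails out early
      when the advertised type is not its own.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
enum gso_flag_sketch {
	GSO_SKETCH_TCPV4 = 1 << 0,
	GSO_SKETCH_TCPV6 = 1 << 1,
	GSO_SKETCH_UDP   = 1 << 2,
	GSO_SKETCH_SCTP  = 1 << 3,
	GSO_SKETCH_DODGY = 1 << 4,  /* packet came from an untrusted source */
};

/* True if the packet advertises at least one type the handler accepts. */
static bool gso_type_matches(unsigned int gso_type, unsigned int accepted)
{
	return (gso_type & accepted) != 0;
}

/* Hypothetical SCTP handler entry: reject anything not marked SCTP. */
static int sctp_gso_segment_sketch(unsigned int gso_type)
{
	if (!gso_type_matches(gso_type, GSO_SKETCH_SCTP))
		return -EINVAL;
	return 0;  /* segmentation would proceed here */
}
```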
      
      Fixes: 90017acc ("sctp: Add GSO support")
      Link: http://lkml.kernel.org/r/<001a1137452496ffc305617e5fe0@google.com>
      Reported-by: syzbot+fee64147a25aecd48055@syzkaller.appspotmail.com
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3110e213
    • Alexey Kodanev's avatar
      ip6_gre: init dev->mtu and dev->hard_header_len correctly · cc99c6d5
      Alexey Kodanev authored
      
      [ Upstream commit 128bb975 ]
      
      Commit b05229f4 ("gre6: Cleanup GREv6 transmit path,
      call common GRE functions") moved dev->mtu initialization
      from ip6gre_tunnel_setup() to ip6gre_tunnel_init(), as a
      result, the previously set values, before ndo_init(), are
      reset in the following cases:
      
      * rtnl_create_link() can update dev->mtu from IFLA_MTU
        parameter.
      
      * ip6gre_tnl_link_config() is invoked before ndo_init() in
        netlink and ioctl setup, so ndo_init() can reset MTU
        adjustments with the lower device MTU as well, dev->mtu
        and dev->hard_header_len.
      
        Not applicable for ip6gretap because it has one more call
        to ip6gre_tnl_link_config(tunnel, 1) in ip6gre_tap_init().
      
      Fix the first case by updating dev->mtu with 'tb[IFLA_MTU]'
      parameter if a user sets it manually on a device creation,
      and fix the second one by moving ip6gre_tnl_link_config()
      call after register_netdevice().
      
      Fixes: b05229f4 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
      Fixes: db2ec95d ("ip6_gre: Fix MTU setting")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc99c6d5
    • Ivan Vecera's avatar
      be2net: restore properly promisc mode after queues reconfiguration · 1711ba16
      Ivan Vecera authored
      
      [ Upstream commit 52acf064 ]
      
      The commit 62219066 ("be2net: Request RSS capability of Rx interface
      depending on number of Rx rings") modified be_update_queues() so the
      IFACE (HW representation of the netdevice) is destroyed and then
      re-created. This causes a regression because potential promiscuous mode
      is not restored properly during be_open() because the driver thinks
      that the HW has promiscuous mode already enabled.
      
      Note that Lancer is not affected by this bug because RX-filter flags are
      disabled during be_close() for this chipset.
      
      Cc: Sathya Perla <sathya.perla@broadcom.com>
      Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
      Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Cc: Somnath Kotur <somnath.kotur@broadcom.com>
      
      Fixes: 62219066 ("be2net: Request RSS capability of Rx interface depending on number of Rx rings")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1711ba16
    • Guillaume Nault's avatar
      ppp: unlock all_ppp_mutex before registering device · 00f9e47c
      Guillaume Nault authored
      
      [ Upstream commit 0171c418 ]
      
      ppp_dev_uninit(), which is the .ndo_uninit() handler of PPP devices,
      needs to lock pn->all_ppp_mutex. Therefore we mustn't call
      register_netdevice() with pn->all_ppp_mutex already locked, or we'd
      deadlock in case register_netdevice() fails and calls .ndo_uninit().
      
      Fortunately, we can unlock pn->all_ppp_mutex before calling
      register_netdevice(). This lock protects pn->units_idr, which isn't
      used in the device registration process.
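
      The lock ordering problem can be modelled in userspace as below. All
      names are hypothetical, and a boolean stands in for the non-recursive
      mutex so that a second acquisition is recorded rather than actually
      hanging. Dropping the lock before registration avoids the re-entry.

```c
#include <stdbool.h>

static bool all_ppp_locked;  /* models a non-recursive mutex */
static bool deadlocked;      /* set if the lock is taken while already held */

static void lock_sketch(void)
{
	if (all_ppp_locked)
		deadlocked = true;  /* a real mutex would hang here */
	all_ppp_locked = true;
}

static void unlock_sketch(void)
{
	all_ppp_locked = false;
}

/* Models the uninit handler, which needs the lock. */
static void uninit_sketch(void)
{
	lock_sketch();
	unlock_sketch();
}

/* Models device registration: on failure it calls the uninit handler. */
static int register_sketch(bool fail)
{
	if (fail) {
		uninit_sketch();
		return -1;
	}
	return 0;
}

/* unlock_first selects the fixed ordering; false models the old code. */
static int unit_register_sketch(bool fail, bool unlock_first)
{
	int ret;

	lock_sketch();           /* unit id is allocated under the lock */
	if (unlock_first)
		unlock_sketch();
	ret = register_sketch(fail);
	if (!unlock_first)
		unlock_sketch();
	return ret;
}
```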
      
      However, keeping pn->all_ppp_mutex locked during device registration
      did ensure that no device in transient state would be published in
      pn->units_idr. In practice, unlocking it before calling
      register_netdevice() doesn't change this property: ppp_unit_register()
      is called with 'ppp_mutex' locked and all searches done in
      pn->units_idr hold this lock too.
      
      Fixes: 8cb775bc ("ppp: fix device unregistration upon netns deletion")
      Reported-and-tested-by: syzbot+367889b9c9e279219175@syzkaller.appspotmail.com
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      00f9e47c
    • Jim Westfall's avatar
      ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY · 260eb694
      Jim Westfall authored
      
      [ Upstream commit cd9ff4de ]
      
      Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
      to avoid making an entry for every remote ip the device needs to talk to.
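
      As an illustration of the key mapping, with hypothetical flag constants
      and function name (these are not the kernel's IFF_* values or API): every
      destination on a loopback or point-to-point device collapses to one
      key, so only a single neighbour entry is needed per device.

```c
#include <stdint.h>

/* Hypothetical device flags for this sketch. */
#define DEV_LOOPBACK_SKETCH    0x1u
#define DEV_POINTOPOINT_SKETCH 0x2u
/* The "any" address, 0.0.0.0. */
#define ADDR_ANY_SKETCH        0u

/* Choose the neighbour lookup key for a next hop on a given device. */
static uint32_t neigh_lookup_key(unsigned int dev_flags, uint32_t next_hop)
{
	if (dev_flags & (DEV_LOOPBACK_SKETCH | DEV_POINTOPOINT_SKETCH))
		return ADDR_ANY_SKETCH;  /* one shared entry per device */
	return next_hop;                 /* per-destination entry otherwise */
}
```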
      
      This used to be the old behavior but became broken in a263b309
      (ipv4: Make neigh lookups directly in output packet path) and later removed
      in 0bb4087c (ipv4: Fix neigh lookup keying over loopback/point-to-point
      devices) because it was broken.

      Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      260eb694
    • Jim Westfall's avatar
      net: Allow neigh contructor functions ability to modify the primary_key · 014510b1
      Jim Westfall authored
      
      [ Upstream commit 096b9854 ]
      
      Use n->primary_key instead of pkey to account for the possibility that a neigh
      constructor function may have modified the primary_key value.

      Signed-off-by: Jim Westfall <jwestfall@surrealistic.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      014510b1
    • Neil Horman's avatar
      vmxnet3: repair memory leak · 66c16a22
      Neil Horman authored
      
      [ Upstream commit 848b1598 ]
      
      With the introduction of commit b0eb57cb, it appears that rq->buf_info
      is improperly handled.  While it is heap allocated when an rx queue is
      set up, and freed when torn down, an old line of code in
      vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0]
      being set to NULL prior to its being freed, causing a memory leak, which
      eventually exhausts the system on repeated create/destroy operations
      (for example, when the mtu of a vmxnet3 interface is changed
      frequently).
      
      Fix is pretty straight forward, just move the NULL set to after the
      free.
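
      The ordering issue, reduced to a userspace sketch with hypothetical
      struct and function names: clearing the pointer before freeing it turns
      the free into a no-op and leaks the buffer, so the free has to come
      first.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the rx queue's buffer bookkeeping. */
struct rx_queue_sketch {
	void *buf_info[2];
};

/* Allocate the buffer info when the queue is set up. */
static int rq_create_sketch(struct rx_queue_sketch *rq, size_t bytes)
{
	rq->buf_info[0] = calloc(1, bytes);
	rq->buf_info[1] = NULL;
	return rq->buf_info[0] ? 0 : -1;
}

/* Tear down: free while the pointer is still valid, then clear it. */
static void rq_destroy_sketch(struct rx_queue_sketch *rq)
{
	free(rq->buf_info[0]);
	rq->buf_info[0] = NULL;
	rq->buf_info[1] = NULL;
}
```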
      
      Tested by myself with successful results
      
      Applies to net, and should likely be queued for stable, please.

      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-By: boyang@redhat.com
      CC: boyang@redhat.com
      CC: Shrikrishna Khare <skhare@vmware.com>
      CC: "VMware, Inc." <pv-drivers@vmware.com>
      CC: David S. Miller <davem@davemloft.net>
      Acked-by: Shrikrishna Khare <skhare@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      66c16a22
    • Cong Wang's avatar
      tipc: fix a memory leak in tipc_nl_node_get_link() · 0e52703d
      Cong Wang authored
      
      [ Upstream commit 59b36613 ]
      
      When tipc_node_find_by_name() fails, the nlmsg is not
      freed.
      
      While on it, switch to a goto label to properly
      free it.
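
      The goto-label pattern could look like the following sketch. Every name
      here is hypothetical, and a counter of live buffers replaces the real
      netlink message so that the leak is observable: the failure path jumps
      to a single label that frees the message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static int live_msgs;  /* message buffers allocated and not yet freed */

static void *msg_alloc_sketch(void)
{
	void *m = malloc(64);

	if (m)
		live_msgs++;
	return m;
}

static void msg_free_sketch(void *m)
{
	if (m) {
		free(m);
		live_msgs--;
	}
}

/* Hypothetical lookup: only the node called "known" exists. */
static bool node_exists_sketch(const char *name)
{
	return name && strcmp(name, "known") == 0;
}

static int get_link_sketch(const char *name)
{
	void *msg = msg_alloc_sketch();
	int err;

	if (!msg)
		return -ENOMEM;

	if (!node_exists_sketch(name)) {
		err = -EINVAL;
		goto err_free;  /* previously a bare return leaked msg */
	}

	/* A real reply path would take ownership; the sketch just frees. */
	msg_free_sketch(msg);
	return 0;

err_free:
	msg_free_sketch(msg);
	return err;
}
```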
      
      Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0e52703d
    • Xin Long's avatar
      sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf · 2f056e7d
      Xin Long authored
      
      [ Upstream commit a0ff6600 ]
      
      After commit cea0cc80 ("sctp: use the right sk after waking up from
      wait_buf sleep"), it may change to lock another sk if the asoc has been
      peeled off in sctp_wait_for_sndbuf.
      
      However, the asoc's new sk could be already closed elsewhere, as it's in
      the sendmsg context of the old sk that can't avoid the new sk's closing.
      If the sk's last refcnt is held by this asoc, then later on, after putting
      this asoc, the new sk will be freed while under its own lock.
      
      This patch is to revert that commit, but fix the old issue by returning
      error under the old sk's lock.
      
      Fixes: cea0cc80 ("sctp: use the right sk after waking up from wait_buf sleep")
      Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f056e7d
    • Xin Long's avatar
      sctp: do not allow the v4 socket to bind a v4mapped v6 address · 8e3534ea
      Xin Long authored
      
      [ Upstream commit c5006b8a ]
      
      The check in sctp_sockaddr_af is not robust enough to forbid binding a
      v4mapped v6 addr on a v4 socket.
      
      The worse thing is that v4 socket's bind_verify would not convert this
      v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4
      socket bound a v6 addr.
      
      This patch is to fix it by doing the common sa.sa_family check first,
      then AF_INET check for v4mapped v6 addrs.
      
      Fixes: 7dab83de ("sctp: Support ipv6only AF_INET6 sockets.")
      Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e3534ea
    • Francois Romieu's avatar
      r8169: fix memory corruption on retrieval of hardware statistics. · 0f51492d
      Francois Romieu authored
      
      [ Upstream commit a78e9366 ]
      
      Hardware statistics retrieval hurts in tight invocation loops.
      
      Avoid extraneous write and enforce strict ordering of writes targeted to
      the tally counters dump area address registers.

      Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
      Tested-by: Oliver Freyermuth <o.freyermuth@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f51492d
    • Guillaume Nault's avatar
      pppoe: take ->needed_headroom of lower device into account on xmit · 1bd21b15
      Guillaume Nault authored
      
      [ Upstream commit 02612bb0 ]
      
      In pppoe_sendmsg(), reserving dev->hard_header_len bytes of headroom
      was probably fine before the introduction of ->needed_headroom in
      commit f5184d26 ("net: Allow netdevices to specify needed head/tailroom").
      
      But now, virtual devices typically advertise the size of their overhead
      in dev->needed_headroom, so we must also take it into account in
      skb_reserve().
      Allocation size of skb is also updated to take dev->needed_tailroom
      into account and replace the arbitrary 32 bytes with the real size of
      a PPPoE header.
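
      The size arithmetic described above can be sketched as follows. The
      struct and helper names are hypothetical, and alignment of the reserved
      space is deliberately ignored; the 6 bytes are the fixed PPPoE header
      (version/type, code, session id, length).

```c
#include <stddef.h>

#define PPPOE_HDR_LEN_SKETCH 6  /* 1 + 1 + 2 + 2 bytes */

/* Hypothetical subset of the lower device's advertised sizes. */
struct dev_sketch {
	size_t hard_header_len;
	size_t needed_headroom;
	size_t needed_tailroom;
};

/* Headroom to reserve in front of the PPPoE header. */
static size_t pppoe_headroom(const struct dev_sketch *dev)
{
	return dev->hard_header_len + dev->needed_headroom;
}

/* Total buffer size: headroom, PPPoE header, payload, then tailroom. */
static size_t pppoe_alloc_size(const struct dev_sketch *dev, size_t payload_len)
{
	return pppoe_headroom(dev) + PPPOE_HDR_LEN_SKETCH + payload_len +
	       dev->needed_tailroom;
}
```

      With a device that reports hard_header_len == 0, as in the case described
      in this commit message, the reserved headroom then comes entirely from
      needed_headroom instead of being zero.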
      
      This issue was discovered by syzbot, who connected a pppoe socket to a
      gre device which had dev->header_ops->create == ipgre_header and
      dev->hard_header_len == 0. Therefore, PPPoE didn't reserve any
      headroom, and dev_hard_header() crashed when ipgre_header() tried to
      prepend its header to skb->data.
      
      skbuff: skb_under_panic: text:000000001d390b3a len:31 put:24
      head:00000000d8ed776f data:000000008150e823 tail:0x7 end:0xc0 dev:gre0
      ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:104!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
          (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 3670 Comm: syzkaller801466 Not tainted
      4.15.0-rc7-next-20180115+ #97
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100
      RSP: 0018:ffff8801d9bd7840 EFLAGS: 00010282
      RAX: 0000000000000083 RBX: ffff8801d4f083c0 RCX: 0000000000000000
      RDX: 0000000000000083 RSI: 1ffff1003b37ae92 RDI: ffffed003b37aefc
      RBP: ffff8801d9bd78a8 R08: 1ffff1003b37ae8a R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff86200de0
      R13: ffffffff84a981ad R14: 0000000000000018 R15: ffff8801d2d34180
      FS:  00000000019c4880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000208bc000 CR3: 00000001d9111001 CR4: 00000000001606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        skb_under_panic net/core/skbuff.c:114 [inline]
        skb_push+0xce/0xf0 net/core/skbuff.c:1714
        ipgre_header+0x6d/0x4e0 net/ipv4/ip_gre.c:879
        dev_hard_header include/linux/netdevice.h:2723 [inline]
        pppoe_sendmsg+0x58e/0x8b0 drivers/net/ppp/pppoe.c:890
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:640
        sock_write_iter+0x31a/0x5d0 net/socket.c:909
        call_write_iter include/linux/fs.h:1775 [inline]
        do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653
        do_iter_write+0x154/0x540 fs/read_write.c:932
        vfs_writev+0x18a/0x340 fs/read_write.c:977
        do_writev+0xfc/0x2a0 fs/read_write.c:1012
        SYSC_writev fs/read_write.c:1085 [inline]
        SyS_writev+0x27/0x30 fs/read_write.c:1082
        entry_SYSCALL_64_fastpath+0x29/0xa0
      
      Admittedly PPPoE shouldn't be allowed to run on non Ethernet-like
      interfaces, but reserving space for ->needed_headroom is a more
      fundamental issue that needs to be addressed first.
      
      Same problem exists for __pppoe_xmit(), which also needs to take
      dev->needed_headroom into account in skb_cow_head().
      
      Fixes: f5184d26 ("net: Allow netdevices to specify needed head/tailroom")
      Reported-by: syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d@syzkaller.appspotmail.com
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1bd21b15
    • Dan Streetman's avatar
      net: tcp: close sock if net namespace is exiting · cf67be7a
      Dan Streetman authored
      
      [ Upstream commit 4ee806d5 ]
      
      When a tcp socket is closed, if it detects that its net namespace is
      exiting, close immediately and do not wait for FIN sequence.
      
      For normal sockets, a reference is taken to their net namespace, so it will
      never exit while the socket is open.  However, kernel sockets do not take a
      reference to their net namespace, so it may begin exiting while the kernel
      socket is still open.  In this case if the kernel socket is a tcp socket,
      it will stay open trying to complete its close sequence.  The sock's dst(s)
      hold a reference to their interface, which are all transferred to the
      namespace's loopback interface when the real interfaces are taken down.
      When the namespace tries to take down its loopback interface, it hangs
      waiting for all references to the loopback interface to release, which
      results in messages like:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      These messages continue until the socket finally times out and closes.
      Since the net namespace cleanup holds the net_mutex while calling its
      registered pernet callbacks, any new net namespace initialization is
      blocked until the current net namespace finishes exiting.
      
      After this change, the tcp socket notices the exiting net namespace, and
      closes immediately, releasing its dst(s) and their reference to the
      loopback interface, which lets the net namespace continue exiting.
      
      Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
      Signed-off-by: Dan Streetman <ddstreet@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf67be7a