1. 30 Nov, 2017 1 commit
    • s390: fix transactional execution control register handling · c9d0db61
      Heiko Carstens authored
      commit a1c5befc upstream.
      
      Dan Horák reported the following crash related to transactional execution:
      
      User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000]
      CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
      Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
      task: 00000000fafc8000 task.stack: 00000000fafc4000
      User PSW : 0705200180000000 000003ff93c14e70
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
      User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e
                 0000000000000000 0000000000000002 0000000000000000 000003ff00000000
                 0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770
                 000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0
      User Code: 000003ff93c14e62: 60e0b030            std     %f14,48(%r11)
                 000003ff93c14e66: 60f0b038            std     %f15,56(%r11)
                #000003ff93c14e6a: e5600000ff0e        tbegin  0,65294
                >000003ff93c14e70: a7740006            brc     7,3ff93c14e7c
                 000003ff93c14e74: a7080000            lhi     %r0,0
                 000003ff93c14e78: a7f40023            brc     15,3ff93c14ebe
                 000003ff93c14e7c: b2220000            ipm     %r0
                 000003ff93c14e80: 8800001c            srl     %r0,28
      
      There are several bugs with control register handling with respect to
      transactional execution:
      
      - on task switch update_per_regs() is only called if the next task has
        an mm (is not a kernel thread). This is incorrect: it breaks, e.g.,
        user mode helper handling, where the kernel creates a kernel thread
        and then execve's a user space program. Control register contents
        related to transactional execution won't be updated on execve. If
        the previous task ran with transactional execution disabled, then
        the new task will also run with transactional execution disabled,
        which is incorrect. Therefore call update_per_regs()
        unconditionally within switch_to().
      
      - on startup the transactional execution facility is not enabled for
        the idle thread. This is not really a bug, but an inconsistency with
        other facilities. Therefore enable the facility if it is available.
      
      - on fork the new thread's per_flags field is not cleared. This means
        that a child process inherits the PER_FLAG_NO_TE flag. This flag can
        be set with a ptrace request to disable transactional execution for
        the current process. It should not be inherited by new child
        processes in order to be consistent with the handling of all other
        PER related debugging options. Therefore clear the per_flags field in
        copy_thread_tls().
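The per_flags rule in the last bullet can be sketched in a few lines of plain C. This is a userspace model, not the s390 code: struct thread_model, copy_thread_model() and the numeric value given to PER_FLAG_NO_TE are assumptions chosen for illustration. The point is only the order of operations: copy the parent's state, then reset the PER debugging flags for the child.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical value; the real PER_FLAG_NO_TE is defined in arch/s390 headers. */
#define PER_FLAG_NO_TE 0x1UL

/* Hypothetical, heavily simplified stand-in for the per-thread state. */
struct thread_model {
	unsigned long per_flags; /* PER debugging flags, e.g. PER_FLAG_NO_TE */
	unsigned long ksp;       /* other state that *is* inherited */
};

/* Model of the fork path: duplicate the parent's state, then clear the
 * PER debugging flags so the child does not inherit them. */
static void copy_thread_model(struct thread_model *child,
			      const struct thread_model *parent)
{
	memcpy(child, parent, sizeof(*child));
	child->per_flags = 0;
}
```

A child created from a parent that had transactional execution disabled via ptrace starts with per_flags == 0, while all other copied state is kept.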
      Reported-and-tested-by: Dan Horák <dan@danny.cz>
      Fixes: d35339a4 ("s390: add support for transactional memory")
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 24 Nov, 2017 27 commits
    • Linux 4.9.65 · 133e6ccf
      Greg Kroah-Hartman authored
    • mm/pagewalk.c: report holes in hugetlb ranges · ceaec6e8
      Jann Horn authored
      commit 373c4557 upstream.
      
      This matters at least for the mincore syscall, which will otherwise copy
      uninitialized memory from the page allocator to userspace.  It is
      probably also a correctness error for /proc/$pid/pagemap, but I haven't
      tested that.
      
      Removing the `walk->hugetlb_entry` condition in walk_hugetlb_range() has
      no effect because the caller already checks for that.
      
      This only reports holes in hugetlb ranges to callers who have specified
      a hugetlb_entry callback.
      
      This issue was found using an AFL-based fuzzer.
      
      v2:
       - don't crash on ->pte_hole==NULL (Andrew Morton)
       - add Cc stable (Andrew Morton)
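A userspace model of the hole reporting described above, assuming a hypothetical struct walk_ops loosely modelled on the kernel's page-walk callbacks (hugetlb_entry, pte_hole). A mincore-style caller fills a result vector; if holes were silently skipped, the vector would keep whatever bytes it already held (0xAA in the usage below), which is the uninitialized-data leak the fix closes. The NULL check mirrors the v2 note that ->pte_hole is optional.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback table, loosely modelled on the page walker's. */
struct walk_ops {
	int (*hugetlb_entry)(size_t idx, void *priv);
	int (*pte_hole)(size_t start, size_t end, void *priv);
	void *priv;
};

/* Walk n slots; present[i] says whether slot i is mapped. Holes are
 * reported to pte_hole, but only when the caller supplied one. */
static int walk_range_model(const int *present, size_t n,
			    const struct walk_ops *ops)
{
	for (size_t i = 0; i < n; i++) {
		int err = 0;

		if (present[i])
			err = ops->hugetlb_entry(i, ops->priv);
		else if (ops->pte_hole)
			err = ops->pte_hole(i, i + 1, ops->priv);
		if (err)
			return err;
	}
	return 0;
}

/* mincore-style callbacks: mapped slots become 1, holes become 0. */
static int mark_present(size_t idx, void *priv)
{
	((unsigned char *)priv)[idx] = 1;
	return 0;
}

static int mark_hole(size_t start, size_t end, void *priv)
{
	for (size_t i = start; i < end; i++)
		((unsigned char *)priv)[i] = 0;
	return 0;
}
```

With both callbacks set, every byte of the result vector is written; with pte_hole left NULL, the walk still completes but hole slots keep their previous contents.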
      
      Changed for 4.4/4.9 stable backport:
       - fix up conflict in the huge_pte_offset() call
      
      Fixes: 1e25a271 ("mincore: apply page table walker on do_mincore()")
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • coda: fix 'kernel memory exposure attempt' in fsync · fae59471
      Jan Harkes authored
      commit d337b66a upstream.
      
      When an application called fsync on a file in Coda, a small request with
      just the file identifier was allocated, but the declared length was set
      to the size of the union of all possible upcall requests.
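To make the size mismatch concrete, here is a sketch with hypothetical, simplified request types (the real definitions live in the Coda UAPI header and differ). The fsync request only needs a header and a file identifier, so the length declared for it should be the size of that request, not the size of the union of all requests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified upcall request layouts. */
struct fid_model { unsigned int opaque[4]; };
struct in_hdr_model { unsigned int opcode, unique; };

struct fsync_in_model {
	struct in_hdr_model ih;
	struct fid_model fid;
};

struct open_in_model {
	struct in_hdr_model ih;
	struct fid_model fid;
	int flags;
	char name[256];
};

/* Union of all request types: as large as its largest member. */
union inputArgs_model {
	struct in_hdr_model ih;
	struct fsync_in_model fsync;
	struct open_in_model open;
};

/* Length to declare for an fsync upcall: the size of the buffer actually
 * allocated for the fsync request, not sizeof(union inputArgs_model). */
static size_t fsync_upcall_len(void)
{
	return sizeof(struct fsync_in_model);
}
```

Declaring the union size for a buffer that only holds the fsync request tells the reader to copy more bytes than were allocated, which is what the usercopy hardening catches.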
      
      This bug has been around for a very long time and is now caught by the
      extra checking in usercopy that was introduced in Linux-4.8.
      
      The exposure happens when the Coda cache manager process reads the fsync
      upcall request, at which point it is killed. As a result, nobody is left
      servicing further upcalls, trapping any processes that try to access
      the mounted Coda filesystem.
      Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm/page_alloc.c: broken deferred calculation · 9980b827
      Pavel Tatashin authored
      commit d135e575 upstream.
      
      In reset_deferred_meminit() we determine the number of pages that must
      not be deferred.  We initialize pages for at least 2G of memory, but
      also pages for reserved memory in this node.

      The reserved memory is determined by
      memblock_reserved_memory_within(), which operates on physical
      addresses and returns a size in bytes.  However,
      reset_deferred_meminit() assumes that this function operates on pfns
      and returns a page count.

      The result is that in the best case the machine boots slower than
      expected due to initializing more pages than needed in a single
      thread, and in the worst case it panics because fewer pages than
      needed are initialized early.
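A worked sketch of the unit mismatch, assuming 4 KiB pages (a page shift of 12) and a hypothetical helper eager_init_pages(); it shows only the byte-to-page conversion, not the kernel's full sizing logic. With 64 MiB reserved, the correct extra count is 16384 pages on top of the 524288 pages that cover 2 GiB.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_MODEL 12 /* assumption: 4 KiB pages */

/* Convert a byte count (what memblock_reserved_memory_within() returns)
 * into a page count before mixing it with page-based quantities. */
static uint64_t bytes_to_pages(uint64_t bytes)
{
	return bytes >> PAGE_SHIFT_MODEL;
}

/* Hypothetical helper: pages to initialize eagerly = 2 GiB worth of
 * pages plus the pages that cover the reserved memory. */
static uint64_t eager_init_pages(uint64_t reserved_bytes)
{
	uint64_t base = bytes_to_pages(2ULL << 30); /* 524288 pages */

	return base + bytes_to_pages(reserved_bytes);
}
```

Adding the raw byte count instead would request 67108864 extra pages for the same 64 MiB, i.e. 256 GiB worth of 4 KiB pages, which is the "more pages than needed" case.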
      
      Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
      Fixes: 864b9a39 ("mm: consider memblock reservations for deferred memory initialization sizing")
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipmi: fix unsigned long underflow · 55b06b0f
      Corey Minyard authored
      commit 392a17b1 upstream.
      
      When I set the timeout to a specific value such as 500ms, the timeout
      event will not happen in time due to the underflow in
      check_msg_timeout():
      ...
      	ent->timeout -= timeout_period;
      	if (ent->timeout > 0)
      		return;
      ...
      
      The type of timeout_period is long, but ent->timeout is unsigned long.
      This patch makes the type consistent.
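The underflow is easy to reproduce outside the kernel. The two helpers below are hypothetical and only mimic the `timeout -= period; if (timeout > 0)` pattern from check_msg_timeout(): with an unsigned counter, 500 - 1000 wraps to a huge positive value and the entry never expires, while a signed counter goes to -500 as intended. Using a signed type on both sides is one way to make the types consistent.

```c
#include <assert.h>
#include <stdbool.h>

/* Buggy pattern: the long period is converted to unsigned long, so the
 * subtraction wraps instead of going negative and "> 0" stays true. */
static bool still_pending_buggy(unsigned long *timeout, long period)
{
	*timeout -= period;
	return *timeout > 0;
}

/* Consistent signed types: expiry drives the counter to <= 0. */
static bool still_pending_fixed(long *timeout, long period)
{
	*timeout -= period;
	return *timeout > 0;
}
```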
      Reported-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Tested-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ocfs2: should wait dio before inode lock in ocfs2_setattr() · 8af77738
      alex chen authored
      commit 28f5a8a7 upstream.
      
      We should wait for dio requests to finish before taking the inode lock
      in ocfs2_setattr(); otherwise the following deadlock will happen:
      
      process 1                  process 2                    process 3
      truncate file 'A'          end_io of writing file 'A'   receiving the bast messages
      ocfs2_setattr
       ocfs2_inode_lock_tracker
        ocfs2_inode_lock_full
       inode_dio_wait
        __inode_dio_wait
        -->waiting for all dio
        requests finish
                                                              dlm_proxy_ast_handler
                                                               dlm_do_local_bast
                                                                ocfs2_blocking_ast
                                                                 ocfs2_generic_handle_bast
                                                                  set OCFS2_LOCK_BLOCKED flag
                              dio_end_io
                               dio_bio_end_aio
                                dio_complete
                                 ocfs2_dio_end_io
                                  ocfs2_dio_end_io_write
                                   ocfs2_inode_lock
                                    __ocfs2_cluster_lock
                                     ocfs2_wait_for_mask
                                     -->waiting for OCFS2_LOCK_BLOCKED
                                     flag to be cleared, that is waiting
                                     for 'process 1' unlocking the inode lock
                                 inode_dio_end
                                 -->here dec the i_dio_count, but will never
                                 be called, so a deadlock happened.
      
      Link: http://lkml.kernel.org/r/59F81636.70508@huawei.com
      Signed-off-by: Alex Chen <alex.chen@huawei.com>
      Reviewed-by: Jun Piao <piaojun@huawei.com>
      Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
      Acked-by: Changwei Ge <ge.changwei@h3c.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ocfs2: fix cluster hang after a node dies · a8356445
      Changwei Ge authored
      commit 1c019671 upstream.
      
      When a node dies, other live nodes have to choose a new master for an
      existing lock resource mastered by the dead node.

      In the ocfs2/dlm implementation, this is done by
      dlm_move_lockres_to_recovery_list(), which marks those lock resources
      as DLM_LOCK_RES_RECOVERING and manages them via a list from which DLM
      later changes each lock resource's master.

      So without invoking dlm_move_lockres_to_recovery_list, no master will
      be chosen after DLM recovery completes, since no lock resource can be
      found through the ::resource list.

      What's worse, if DLM_LOCK_RES_RECOVERING is not set for lock
      resources mastered by a dead node, synchronization among nodes breaks
      down.
      
      So invoke dlm_move_lockres_to_recovery_list again.
      
      Fixes: ee8f7fcb ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")
      Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-EX.srv.huawei-3com.com
      Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
      Reported-by: Vitaly Mayatskih <v.mayatskih@gmail.com>
      Tested-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • dmaengine: dmatest: warn user when dma test times out · 2bd38ece
      Adam Wallis authored
      commit a9df21e3 upstream.
      
      Commit adfa543e ("dmatest: don't use set_freezable_with_signal()")
      introduced a bug (which is in fact documented in that patch's commit
      text) that leaves behind a dangling pointer. Since the done_wait
      structure is allocated on the stack, future invocations of dmatest can
      produce undesirable results (e.g., corrupted spinlocks). Ideally, this
      would be cleaned up in the thread handler; at the very least, the user
      should be warned, because the kernel is left in a precarious state
      that can lead to long debug sessions when the crash comes later.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=197605
      Signed-off-by: Adam Wallis <awallis@codeaurora.org>
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • serial: 8250_fintek: Fix finding base_port with activated SuperIO · e6d4a078
      Ji-Ze Hong (Peter Hong) authored
      commit fd97e66c upstream.
      
      The SuperIO is configured at boot time by the BIOS, but some BIOSes do
      not deactivate the SuperIO at the end of configuration. This leads to
      a mismatch for pdata->base_port in probe_setup_port(). Therefore,
      deactivate all SuperIO devices before activating the specific
      base_port in fintek_8250_enter_key().
      
      Tested on iBASE MI802.
      Tested-by: Ji-Ze Hong (Peter Hong) <hpeter+linux_kernel@gmail.com>
      Signed-off-by: Ji-Ze Hong (Peter Hong) <hpeter+linux_kernel@gmail.com>
      Reviewed-by: Alan Cox <alan@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • serial: omap: Fix EFR write on RTS deassertion · 70eb4608
      Lukas Wunner authored
      commit 2a71de2f upstream.
      
      Commit 348f9bb3 ("serial: omap: Fix RTS handling") sought to enable
      auto RTS upon manual RTS assertion and disable it on deassertion.
      However, the latter seems to have been done incorrectly: it clears all
      bits in the Extended Features Register *except* auto RTS.
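The difference between keeping a bit and clearing it is a single `~`. This sketch assumes UART_EFR_RTS is 0x40, as in include/uapi/linux/serial_reg.h; the helper names are hypothetical and the register is modelled as a plain byte. With bits 0x80, 0x40 and 0x10 set (0xD0), the buggy mask leaves only 0x40, while the intended mask yields 0x90.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of the auto-RTS enable bit in the EFR. */
#define UART_EFR_RTS 0x40

/* Buggy: keeps only the auto-RTS bit and clears everything else. */
static uint8_t efr_clear_rts_buggy(uint8_t efr)
{
	return efr & UART_EFR_RTS;
}

/* Intended: clear only the auto-RTS bit and preserve the others. */
static uint8_t efr_clear_rts_fixed(uint8_t efr)
{
	return efr & (uint8_t)~UART_EFR_RTS;
}
```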
      
      Fixes: 348f9bb3 ("serial: omap: Fix RTS handling")
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ima: do not update security.ima if appraisal status is not INTEGRITY_PASS · 2cfbb32f
      Roberto Sassu authored
      commit 020aae3e upstream.
      
      Commit b65a9cfc ("Untangling ima mess, part 2: deal with counters")
      moved the call of ima_file_check() from may_open() to do_filp_open() at a
      point where the file descriptor is already opened.
      
      This breaks the assumption made by IMA that file descriptors being closed
      belong to files whose access was granted by ima_file_check(). The
      consequence is that security.ima and security.evm are updated with good
      values, regardless of the current appraisal status.
      
      For example, if a file does not have security.ima, IMA will create it after
      opening the file for writing, even if access is denied. Access to the file
      will be allowed afterwards.
      
      Avoid this issue by checking the appraisal status before updating
      security.ima.
      Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: dh - Fix double free of ctx->p · aa15fe4d
      Eric Biggers authored
      commit 12d41a02 upstream.
      
      When setting the secret with the software Diffie-Hellman implementation,
      if allocating 'g' failed (e.g. if it was longer than
      MAX_EXTERN_MPI_BITS), then 'p' was freed twice: once immediately, and
      once later when the crypto_kpp tfm was destroyed.
      
      Fix it by using dh_free_ctx() (renamed to dh_clear_ctx()) in the error
      paths, as that correctly sets the pointers to NULL.
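A minimal model of the "free and reset to NULL" pattern behind dh_clear_ctx(), assuming a hypothetical context whose members are plain malloc() buffers rather than MPIs, and a hypothetical dh_set_secret_model() whose fail_on_g flag simulates the failed allocation of 'g'. Because the error path resets the pointers, a later cleanup from the tfm exit path only sees NULL, and free(NULL) is a no-op, so nothing is freed twice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical context with two heap-allocated parameters. */
struct dh_ctx_model {
	void *p;
	void *g;
};

/* Free both members and reset them to NULL, so any later cleanup call
 * is harmless. */
static void dh_clear_ctx_model(struct dh_ctx_model *ctx)
{
	free(ctx->p);
	free(ctx->g);
	ctx->p = NULL;
	ctx->g = NULL;
}

/* On failure, clear the whole context instead of freeing 'p' directly
 * and leaving a dangling pointer behind. */
static int dh_set_secret_model(struct dh_ctx_model *ctx, int fail_on_g)
{
	ctx->p = malloc(16);
	if (!ctx->p)
		return -1;
	ctx->g = fail_on_g ? NULL : malloc(16);
	if (!ctx->g) {
		dh_clear_ctx_model(ctx);
		return -1;
	}
	return 0;
}
```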
      
      KASAN report:
      
          MPI: mpi too large (32760 bits)
          ==================================================================
          BUG: KASAN: use-after-free in mpi_free+0x131/0x170
          Read of size 4 at addr ffff88006c7cdf90 by task reproduce_doubl/367
      
          CPU: 1 PID: 367 Comm: reproduce_doubl Not tainted 4.14.0-rc7-00040-g05298abde6fe #7
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Call Trace:
           dump_stack+0xb3/0x10b
           ? mpi_free+0x131/0x170
           print_address_description+0x79/0x2a0
           ? mpi_free+0x131/0x170
           kasan_report+0x236/0x340
           ? akcipher_register_instance+0x90/0x90
           __asan_report_load4_noabort+0x14/0x20
           mpi_free+0x131/0x170
           ? akcipher_register_instance+0x90/0x90
           dh_exit_tfm+0x3d/0x140
           crypto_kpp_exit_tfm+0x52/0x70
           crypto_destroy_tfm+0xb3/0x250
           __keyctl_dh_compute+0x640/0xe90
           ? kasan_slab_free+0x12f/0x180
           ? dh_data_from_key+0x240/0x240
           ? key_create_or_update+0x1ee/0xb20
           ? key_instantiate_and_link+0x440/0x440
           ? lock_contended+0xee0/0xee0
           ? kfree+0xcf/0x210
           ? SyS_add_key+0x268/0x340
           keyctl_dh_compute+0xb3/0xf1
           ? __keyctl_dh_compute+0xe90/0xe90
           ? SyS_add_key+0x26d/0x340
           ? entry_SYSCALL_64_fastpath+0x5/0xbe
           ? trace_hardirqs_on_caller+0x3f4/0x560
           SyS_keyctl+0x72/0x2c0
           entry_SYSCALL_64_fastpath+0x1f/0xbe
          RIP: 0033:0x43ccf9
          RSP: 002b:00007ffeeec96158 EFLAGS: 00000246 ORIG_RAX: 00000000000000fa
          RAX: ffffffffffffffda RBX: 000000000248b9b9 RCX: 000000000043ccf9
          RDX: 00007ffeeec96170 RSI: 00007ffeeec96160 RDI: 0000000000000017
          RBP: 0000000000000046 R08: 0000000000000000 R09: 0248b9b9143dc936
          R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000000
          R13: 0000000000409670 R14: 0000000000409700 R15: 0000000000000000
      
          Allocated by task 367:
           save_stack_trace+0x16/0x20
           kasan_kmalloc+0xeb/0x180
           kmem_cache_alloc_trace+0x114/0x300
           mpi_alloc+0x4b/0x230
           mpi_read_raw_data+0xbe/0x360
           dh_set_secret+0x1dc/0x460
           __keyctl_dh_compute+0x623/0xe90
           keyctl_dh_compute+0xb3/0xf1
           SyS_keyctl+0x72/0x2c0
           entry_SYSCALL_64_fastpath+0x1f/0xbe
      
          Freed by task 367:
           save_stack_trace+0x16/0x20
           kasan_slab_free+0xab/0x180
           kfree+0xb5/0x210
           mpi_free+0xcb/0x170
           dh_set_secret+0x2d7/0x460
           __keyctl_dh_compute+0x623/0xe90
           keyctl_dh_compute+0xb3/0xf1
           SyS_keyctl+0x72/0x2c0
           entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: 802c7f1c ("crypto: dh - Add DH software implementation")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: dh - fix memleak in setkey · 4a7e0231
      Tudor-Dan Ambarus authored
      commit ee34e264 upstream.
      
      setkey can be called multiple times during the existence
      of the transformation object. In case of multiple setkey calls,
      the old key was not freed and we leaked memory.
      Free the old MPI key if any.
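One way to avoid the leak, sketched with a hypothetical context that stores a plain heap buffer instead of an MPI: allocate and copy the new key first, then release the old one and install the new pointer. Allocating first also keeps the old key intact if the allocation fails.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical transform context holding one heap-allocated key. */
struct tfm_ctx_model {
	unsigned char *key;
	size_t keylen;
};

/* setkey may be called several times on the same object, so release
 * the previous key (if any) before installing the new one. */
static int setkey_model(struct tfm_ctx_model *ctx,
			const unsigned char *key, size_t len)
{
	unsigned char *copy = malloc(len);

	if (!copy)
		return -1;
	memcpy(copy, key, len);
	free(ctx->key); /* free(NULL) is a no-op on the first call */
	ctx->key = copy;
	ctx->keylen = len;
	return 0;
}
```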
      Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/sctp: Always set scope_id in sctp_inet6_skb_msgname · 67b718fc
      Eric W. Biederman authored
      
      [ Upstream commit 7c8a61d9 ]
      
      Alexander Potapenko, while testing the kernel with KMSAN and
      syzkaller, discovered that in some configurations sctp would leak 4
      bytes of kernel stack.

      Working with his reproducer I discovered that the 4 leaked bytes are
      the scope id of an ipv6 address returned by recvmsg.
      
      With a little code inspection and a shrewd guess I discovered that
      sctp_inet6_skb_msgname only initializes the scope_id field for link
      local ipv6 addresses to the interface index the link local address
      pertains to instead of initializing the scope_id field for all ipv6
      addresses.
      
      That is almost reasonable, as scope_ids are meaningful only for link
      local addresses.  Set the scope_id in all other cases to 0, which is
      not a valid interface index, to make it clear there is nothing useful
      in the scope_id field.
      
      There should be no danger of breaking userspace as the stack leak
      guaranteed that previously meaningless random data was being returned.
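The intended behaviour can be sketched with the POSIX sockaddr_in6 type and the IN6_IS_ADDR_LINKLOCAL macro from <netinet/in.h>; the helper name set_scope_id() and its ifindex parameter are hypothetical and do not reproduce sctp_inet6_skb_msgname() itself. Every path writes sin6_scope_id, so no stale bytes remain in the field.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Link-local addresses get the receiving interface index; every other
 * address gets 0 (not a valid ifindex), so the field is always set. */
static void set_scope_id(struct sockaddr_in6 *sin6, unsigned int ifindex)
{
	if (IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr))
		sin6->sin6_scope_id = ifindex;
	else
		sin6->sin6_scope_id = 0;
}
```

In the usage below, the structure is first filled with 0xff bytes to stand in for stale stack contents; after the call, a non-link-local address such as ::1 reports scope_id 0.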
      
      Fixes: 372f525b ("SCTP:  Resync with LKSCTP tree.")
      History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      Reported-by: Alexander Potapenko <glider@google.com>
      Tested-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fealnx: Fix building error on MIPS · f0ae7a1b
      Huacai Chen authored
      
      [ Upstream commit cc54c1d3 ]
      
      This patch fixes the build error on MIPS. The cause is that MIPS
      already defines a LONG macro, which conflicts with the LONG enum in
      drivers/net/ethernet/fealnx.c.
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sctp: do not peel off an assoc from one netns to another one · 362d2ce0
      Xin Long authored
      
      [ Upstream commit df80cd9b ]
      
      Currently, when an association is peeled off to a sock in another
      netns, none of the transports in this assoc are rehashed; they keep
      using the old key in the hashtable.

      As a transport uses sk->net as the hash key when it is inserted into
      the hashtable, these transports would not be removed from the
      hashtable (because of the new netns) when the sock is closed and all
      transports are freed. A later lookup of an asoc could then
      dereference those freed transports, causing a use-after-free.

      This issue has existed since the very beginning. ChunYu found it with
      syzkaller fuzz testing using this sequence:
      
        socket$inet6_sctp()
        bind$inet6()
        sendto$inet6()
        unshare(0x40000000)
        getsockopt$inet_sctp6_SCTP_GET_ASSOC_ID_LIST()
        getsockopt$inet_sctp6_SCTP_SOCKOPT_PEELOFF()
      
      This patch blocks peeling an assoc off from one netns to another, so
      that the netns of the transports cannot go out of sync with their keys
      in the hashtable.

      Note that this patch doesn't fix it by rehashing transports, as it's
      difficult to handle the situation when the tuple is already in use in
      the new netns. Besides, nobody is likely to want to peel off an assoc
      to another netns, considering that ipaddrs, ifaces, etc. are usually
      different.
      Reported-by: ChunYu Wang <chunwang@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • af_netlink: ensure that NLMSG_DONE never fails in dumps · 99aa74ce
      Jason A. Donenfeld authored
      
      [ Upstream commit 0642840b ]
      
      The way people generally use netlink_dump is that they fill in the skb
      as much as possible, breaking when nla_put returns an error. Then, they
      get called again and start filling out the next skb, and again, and so
      forth. The mechanism at work here is the ability for the iterative
      dumping function to detect when the skb is filled up and not fill it
      past the brim, waiting for a fresh skb for the rest of the data.
      
      However, if the attributes are small and nicely packed, it is possible
      that a dump callback function successfully fills in attributes until the
      skb is of size 4080 (libmnl's default page-sized receive buffer size).
      The dump function completes, satisfied, and then, if it happens to be
      that this is actually the last skb, and no further ones are to be sent,
      then netlink_dump will add on the NLMSG_DONE part:
      
        nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
      
      It is very important that netlink_dump does this, of course. However, in
      this example, that call to nlmsg_put_answer will fail, because the
      previous filling by the dump function did not leave it enough room. And
      how could it possibly have done so? All of the nla_put variety of
      functions simply check to see if the skb has enough tailroom,
      independent of the context it is in.
      
      In order to keep the important assumptions of all netlink dump users, it
      is therefore important to give them an skb that has this end part of the
      tail already reserved, so that the call to nlmsg_put_answer does not
      fail. Otherwise, library authors are forced to find some bizarre sized
      receive buffer that has a large modulo relative to the common sizes of
      messages received, which is ugly and buggy.
      
      This patch thus saves the NLMSG_DONE for an additional message, for the
      case that things are dangerously close to the brim. This requires
      keeping track of the errno from ->dump() across calls.
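The arithmetic behind "dangerously close to the brim": with 4-byte alignment and a 16-byte struct nlmsghdr, an NLMSG_DONE message carrying an int error code needs 20 bytes. The sketch below uses local copies of those framing rules and a hypothetical done_fits() check; the 4096-byte buffer in the usage is an assumption for illustration. Filled to 4080 bytes, only 16 bytes remain, so DONE has to go into an additional message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies of the netlink framing rules: 4-byte alignment and a
 * 16-byte struct nlmsghdr (len, type, flags, seq, pid). */
#define NL_ALIGN(len)  (((len) + 3) & ~(size_t)3)
#define NL_HDRLEN      NL_ALIGN((size_t)16)
#define NL_SPACE(len)  NL_ALIGN(NL_HDRLEN + (len))

/* Space needed by an NLMSG_DONE message whose payload is an int errno. */
static size_t done_msg_space(void)
{
	return NL_SPACE(sizeof(int));
}

/* Hypothetical check: can DONE still be appended to this buffer, or
 * must it be sent in an additional message? `used` and `cap` stand in
 * for the bytes already filled and the buffer capacity. */
static bool done_fits(size_t used, size_t cap)
{
	return cap >= used && cap - used >= done_msg_space();
}
```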
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vlan: fix a use-after-free in vlan_device_event() · 080ecd2b
      Cong Wang authored
      
      [ Upstream commit 052d41c0 ]
      
      After refcnt reaches zero, vlan_vid_del() could free
      dev->vlan_info via RCU:
      
      	RCU_INIT_POINTER(dev->vlan_info, NULL);
      	call_rcu(&vlan_info->rcu, vlan_info_rcu_free);
      
      However, the pointer 'grp' still points to that memory
      since it is set before vlan_vid_del():
      
              vlan_info = rtnl_dereference(dev->vlan_info);
              if (!vlan_info)
                      goto out;
              grp = &vlan_info->grp;
      
      Depending on when that RCU callback is scheduled, we could
      trigger a use-after-free in vlan_group_for_each_dev()
      right after this vlan_vid_del().
      
      Fix it by moving vlan_vid_del() before setting grp. This
      is also symmetric to the vlan_vid_add() we call in
      vlan_device_event().
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Fixes: efc73f4b ("net: Fix memory leak - vlan_info struct")
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Girish Moodalbail <girish.moodalbail@oracle.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
      Tested-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: usb: asix: fill null-ptr-deref in asix_suspend · 58baa36d
      Andrey Konovalov authored
      
      [ Upstream commit 8f562462 ]
      
      When asix_suspend() is called, dev->driver_priv might not have been
      assigned a value yet, so we need to check that it's not NULL.

      A similar issue is present in asix_resume(); this patch fixes it as well.
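A sketch of the guard, assuming hypothetical, simplified driver structures (the real usbnet and asix types differ) and a hypothetical fake_suspend() hook for the usage example. The pattern is to treat driver_priv as optional in every power-management callback and skip the private hook when it is absent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical private data with an optional suspend hook. */
struct priv_model {
	int (*suspend)(void *dev);
};

/* Hypothetical device: driver_priv may still be NULL at suspend time. */
struct usbnet_model {
	struct priv_model *driver_priv;
};

/* Check driver_priv (and the hook) before dereferencing them. */
static int suspend_model(struct usbnet_model *dev)
{
	struct priv_model *priv = dev->driver_priv;

	if (priv && priv->suspend)
		return priv->suspend(dev);
	return 0;
}

/* Example hook used only to show that the private path still runs. */
static int fake_suspend(void *dev)
{
	(void)dev;
	return 7;
}
```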
      
      Found by syzkaller.
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      Modules linked in:
      CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.14.0-rc4-43422-geccacdd69a8c #400
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Workqueue: usb_hub_wq hub_event
      task: ffff88006bb36300 task.stack: ffff88006bba8000
      RIP: 0010:asix_suspend+0x76/0xc0 drivers/net/usb/asix_devices.c:629
      RSP: 0018:ffff88006bbae718 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff880061ba3b80 RCX: 1ffff1000c34d644
      RDX: 0000000000000001 RSI: 0000000000000402 RDI: 0000000000000008
      RBP: ffff88006bbae738 R08: 1ffff1000d775cad R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800630a8b40
      R13: 0000000000000000 R14: 0000000000000402 R15: ffff880061ba3b80
      FS:  0000000000000000(0000) GS:ffff88006c600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ff33cf89000 CR3: 0000000061c0a000 CR4: 00000000000006f0
      Call Trace:
       usb_suspend_interface drivers/usb/core/driver.c:1209
       usb_suspend_both+0x27f/0x7e0 drivers/usb/core/driver.c:1314
       usb_runtime_suspend+0x41/0x120 drivers/usb/core/driver.c:1852
       __rpm_callback+0x339/0xb60 drivers/base/power/runtime.c:334
       rpm_callback+0x106/0x220 drivers/base/power/runtime.c:461
       rpm_suspend+0x465/0x1980 drivers/base/power/runtime.c:596
       __pm_runtime_suspend+0x11e/0x230 drivers/base/power/runtime.c:1009
       pm_runtime_put_sync_autosuspend ./include/linux/pm_runtime.h:251
       usb_new_device+0xa37/0x1020 drivers/usb/core/hub.c:2487
       hub_port_connect drivers/usb/core/hub.c:4903
       hub_port_connect_change drivers/usb/core/hub.c:5009
       port_event drivers/usb/core/hub.c:5115
       hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195
       process_one_work+0xc7f/0x1db0 kernel/workqueue.c:2119
       worker_thread+0x221/0x1850 kernel/workqueue.c:2253
       kthread+0x3a1/0x470 kernel/kthread.c:231
       ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
      Code: 8d 7c 24 20 48 89 fa 48 c1 ea 03 80 3c 02 00 75 5b 48 b8 00 00
      00 00 00 fc ff df 4d 8b 6c 24 20 49 8d 7d 08 48 89 fa 48 c1 ea 03 <80>
      3c 02 00 75 34 4d 8b 6d 08 4d 85 ed 74 0b e8 26 2b 51 fd 4c
      RIP: asix_suspend+0x76/0xc0 RSP: ffff88006bbae718
      ---[ end trace dfc4f5649284342c ]---
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Kristian Evensen's avatar
      qmi_wwan: Add missing skb_reset_mac_header-call · 4ad82095
      Kristian Evensen authored
      
      [ Upstream commit 0de0add1 ]
      
      When we receive a packet on a QMI device in raw IP mode, we should call
      skb_reset_mac_header() to ensure that skb->mac_header contains a valid
      offset in the packet. This shouldn't really matter, since the packets
      have no MAC header and the interface is configured as such. However,
      certain parts of the network stack appear to expect a "good" value in
      skb->mac_header.
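What "resetting" the MAC header means can be modelled with a hypothetical, heavily simplified buffer struct (example_skb, not struct sk_buff). The helper records the current data position as the MAC header offset. In the kernel's offset-based sk_buff layout this is roughly skb->mac_header = skb->data - skb->head. The 0xFFFF "unset" marker below is an assumption for illustration.

```c
/* Hypothetical, heavily simplified packet buffer (not struct sk_buff). */
struct example_skb {
	unsigned char *head;       /* start of the allocated buffer */
	unsigned char *data;       /* start of the current payload */
	unsigned short mac_header; /* offset of the MAC header from head */
};

/* Record the current data position as the "MAC header" offset. A raw IP
 * frame has no link-layer header, so the offset simply points at the IP
 * header, but it is now a valid offset rather than a leftover value. */
static void example_reset_mac_header(struct example_skb *skb)
{
	skb->mac_header = (unsigned short)(skb->data - skb->head);
}
```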
      
      Without the skb_reset_mac_header() call added in this patch, shaping
      traffic (using tc), for example, triggers the following oops on the
      first received packet:
      
      [  303.642957] skbuff: skb_under_panic: text:8f137918 len:177 put:67 head:8e4b0f00 data:8e4b0eff tail:0x8e4b0fb0 end:0x8e4b1520 dev:wwan0
      [  303.655045] Kernel bug detected[#1]:
      [  303.658622] CPU: 1 PID: 1002 Comm: logd Not tainted 4.9.58 #0
      [  303.664339] task: 8fdf05e0 task.stack: 8f15c000
      [  303.668844] $ 0   : 00000000 00000001 0000007a 00000000
      [  303.674062] $ 4   : 8149a2fc 8149a2fc 8149ce20 00000000
      [  303.679284] $ 8   : 00000030 3878303a 31623465 20303235
      [  303.684510] $12   : ded731e3 2626a277 00000000 03bd0000
      [  303.689747] $16   : 8ef62b40 00000043 8f137918 804db5fc
      [  303.694978] $20   : 00000001 00000004 8fc13800 00000003
      [  303.700215] $24   : 00000001 8024ab10
      [  303.705442] $28   : 8f15c000 8fc19cf0 00000043 802cc920
      [  303.710664] Hi    : 00000000
      [  303.713533] Lo    : 74e58000
      [  303.716436] epc   : 802cc920 skb_panic+0x58/0x5c
      [  303.721046] ra    : 802cc920 skb_panic+0x58/0x5c
      [  303.725639] Status: 11007c03 KERNEL EXL IE
      [  303.729823] Cause : 50800024 (ExcCode 09)
      [  303.733817] PrId  : 0001992f (MIPS 1004Kc)
      [  303.737892] Modules linked in: rt2800pci rt2800mmio rt2800lib qcserial ppp_async option usb_wwan rt2x00pci rt2x00mmio rt2x00lib rndis_host qmi_wwan ppp_generic nf_nat_pptp nf_conntrack_pptp nf_conntrack_ipv6 mt76x2i
      Process logd (pid: 1002, threadinfo=8f15c000, task=8fdf05e0, tls=77b3eee4)
      [  303.962509] Stack : 00000000 80408990 8f137918 000000b1 00000043 8e4b0f00 8e4b0eff 8e4b0fb0
      [  303.970871]         8e4b1520 8fec1800 00000043 802cd2a4 6e000045 00000043 00000000 8ef62000
      [  303.979219]         8eef5d00 8ef62b40 8fea7300 8f137918 00000000 00000000 0002bb01 793e5664
      [  303.987568]         8ef08884 00000001 8fea7300 00000002 8fc19e80 8eef5d00 00000006 00000003
      [  303.995934]         00000000 8030ba90 00000003 77ab3fd0 8149dc80 8004d1bc 8f15c000 8f383700
      [  304.004324]         ...
      [  304.006767] Call Trace:
      [  304.009241] [<802cc920>] skb_panic+0x58/0x5c
      [  304.013504] [<802cd2a4>] skb_push+0x78/0x90
      [  304.017783] [<8f137918>] 0x8f137918
      [  304.021269] Code: 00602825  0c02a3b4  24842888 <000c000d> 8c870060  8c8200a0  0007382b  00070336  8c88005c
      [  304.031034]
      [  304.032805] ---[ end trace b778c482b3f0bda9 ]---
      [  304.041384] Kernel panic - not syncing: Fatal exception in interrupt
      [  304.051975] Rebooting in 3 seconds..
      
      While the oops is from a 4.9 kernel, I was able to trigger the same oops
      with net-next as of yesterday.
      
      Fixes: 32f7adf6 ("net: qmi_wwan: support "raw IP" mode")
      Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
      Acked-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Bjørn Mork's avatar
      net: qmi_wwan: fix divide by 0 on bad descriptors · 02a0c063
      Bjørn Mork authored
      
      [ Upstream commit 7fd07833 ]
      
      A CDC Ethernet functional descriptor with wMaxSegmentSize = 0 will
      cause a divide error in usbnet_probe:
      
      divide error: 0000 [#1] PREEMPT SMP KASAN
      Modules linked in:
      CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.14.0-rc8-44453-g1fdc1a82c34f #56
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Workqueue: usb_hub_wq hub_event
      task: ffff88006bef5c00 task.stack: ffff88006bf60000
      RIP: 0010:usbnet_update_max_qlen+0x24d/0x390 drivers/net/usb/usbnet.c:355
      RSP: 0018:ffff88006bf67508 EFLAGS: 00010246
      RAX: 00000000000163c8 RBX: ffff8800621fce40 RCX: ffff8800621fcf34
      RDX: 0000000000000000 RSI: ffffffff837ecb7a RDI: ffff8800621fcf34
      RBP: ffff88006bf67520 R08: ffff88006bef5c00 R09: ffffed000c43f881
      R10: ffffed000c43f880 R11: ffff8800621fc406 R12: 0000000000000003
      R13: ffffffff85c71de0 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88006ca00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffe9c0d6dac CR3: 00000000614f4000 CR4: 00000000000006f0
      Call Trace:
       usbnet_probe+0x18b5/0x2790 drivers/net/usb/usbnet.c:1783
       qmi_wwan_probe+0x133/0x220 drivers/net/usb/qmi_wwan.c:1338
       usb_probe_interface+0x324/0x940 drivers/usb/core/driver.c:361
       really_probe drivers/base/dd.c:413
       driver_probe_device+0x522/0x740 drivers/base/dd.c:557
      
      Fix by simply ignoring the bogus descriptor, as it is optional
      for QMI devices anyway.
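This sketch shows the failing division and the "ignore the bogus value" fix. It makes two assumptions: a zero wMaxSegmentSize falls back to a default MTU, and a queue length is computed by dividing a memory budget by hard_mtu. The constants and function names are hypothetical, chosen only to illustrate the arithmetic.

```c
#include <stdint.h>

/* Hypothetical constants for illustration only. */
#define EXAMPLE_QUEUE_MEMORY (60u * 1518u)
#define EXAMPLE_DEFAULT_MTU 1514u

/* Adopt the descriptor's segment size only if it is non-zero; a zero value
 * is treated as a bogus (optional) descriptor and ignored. */
static uint32_t example_pick_hard_mtu(uint16_t w_max_segment_size)
{
	return w_max_segment_size ? w_max_segment_size : EXAMPLE_DEFAULT_MTU;
}

/* The kind of division that faults when hard_mtu ends up as 0. */
static uint32_t example_tx_qlen(uint32_t hard_mtu)
{
	return EXAMPLE_QUEUE_MEMORY / hard_mtu;
}
```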
      
      Fixes: 423ce8ca ("net: usb: qmi_wwan: New driver for Huawei QMI based WWAN devices")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Bjørn Mork's avatar
      net: cdc_ether: fix divide by 0 on bad descriptors · f3766218
      Bjørn Mork authored
      
      [ Upstream commit 2cb80187 ]
      
      Setting dev->hard_mtu to 0 will cause a divide error in
      usbnet_probe. Protect against devices with bogus CDC Ethernet
      functional descriptors by ignoring a zero wMaxSegmentSize.
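A sketch of reading wMaxSegmentSize from a raw CDC Ethernet functional descriptor and ignoring a zero value, so the existing hard_mtu is kept. The byte offset follows the descriptor layout (four 1-byte fields, then the 4-byte bmEthernetStatistics). The helper names are hypothetical, and this is not the cdc_ether code itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of wMaxSegmentSize: bLength, bDescriptorType, bDescriptorSubtype,
 * iMACAddress (1 byte each), then bmEthernetStatistics (4 bytes). */
#define EXAMPLE_WMAXSEG_OFFSET 8
#define EXAMPLE_CDC_ETHER_DESC_LEN 13

/* USB descriptor fields are little-endian regardless of host byte order. */
static uint16_t example_get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Update *hard_mtu from the descriptor, ignoring a zero wMaxSegmentSize
 * (and a truncated descriptor) so the existing value is kept. */
static void example_apply_cdc_ether_desc(const uint8_t *desc, size_t len,
					 uint32_t *hard_mtu)
{
	uint16_t seg;

	if (len < EXAMPLE_CDC_ETHER_DESC_LEN)
		return;
	seg = example_get_le16(desc + EXAMPLE_WMAXSEG_OFFSET);
	if (seg)
		*hard_mtu = seg;
}
```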
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Acked-by: Oliver Neukum <oneukum@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Hangbin Liu's avatar
      bonding: discard lowest hash bit for 802.3ad layer3+4 · 6f239c06
      Hangbin Liu authored
      
      [ Upstream commit b5f86218 ]
      
      After commit 07f4c900 ("tcp/dccp: try to not exhaust ip_local_port_range
      in connect()"), we will try to use even ports for connect(). Then if an
      application (seen clearly with iperf) opens multiple streams to the same
      destination IP and port, each stream will be given an even source port.
      
      As a result, the bonding driver's simple layer3+4 xmit_hash_policy will
      always hash all these streams to the same interface, and the total
      throughput will be limited to a single slave.
      
      Changing the TCP code would affect TCP behavior as a whole just to help
      bonding. Paolo Abeni suggested fixing this by changing only the bonding
      code, which is more reasonable and has less impact.
      
      Fix this by discarding the lowest hash bit, because it contains little
      entropy. After the fix, traffic can be re-balanced between slaves.
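The effect can be seen with a deliberately simplified, hypothetical hash (a plain XOR of addresses and ports, not the bonding driver's real hash). With it, an even source port fixes the lowest bit. With two slaves, "hash % 2" then always picks the same slave, while "(hash >> 1) % 2" spreads the streams.

```c
#include <stdint.h>

/* Hypothetical, simplified layer3+4 hash: the lowest bit depends only on
 * the parity of the inputs, so even source ports fix it. */
static uint32_t example_l34_hash(uint32_t saddr, uint32_t daddr,
				 uint16_t sport, uint16_t dport)
{
	return saddr ^ daddr ^ sport ^ dport;
}

/* Before: slave chosen directly from the hash. */
static unsigned int example_pick_slave_old(uint32_t hash, unsigned int n)
{
	return hash % n;
}

/* After: discard the low-entropy lowest bit first. */
static unsigned int example_pick_slave_new(uint32_t hash, unsigned int n)
{
	return (hash >> 1) % n;
}
```

For example, streams from even source ports 40000, 40002, ... to the same destination all land on slave 0 with the old selection but alternate between slaves with the new one.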
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ye Yin's avatar
      netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed · afd9fa66
      Ye Yin authored
      
      [ Upstream commit 2b5ec1a5 ]
      
      Suppose ipvs runs in two different network namespaces on the same host,
      and one ipvs instance forwards traffic to the ipvs instance in the other
      namespace. In that case the 'ipvs_property' flag makes the second ipvs
      instance take no effect. So we should clear 'ipvs_property' when the
      SKB's network namespace changes.
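A minimal sketch of the rule, using a hypothetical, simplified packet struct (not struct sk_buff). When a packet crosses into a different namespace, the IPVS marker set in the old namespace is cleared, so the IPVS instance in the new namespace handles it again. In the kernel this clearing belongs in skb_scrub_packet() (named in the Fixes tag); the function below only models the idea.

```c
#include <stdbool.h>

/* Hypothetical, simplified packet metadata (not struct sk_buff). */
struct example_pkt {
	bool ipvs_property; /* set once IPVS has handled the packet */
	int netns_id;       /* network namespace that currently owns it */
};

/* On a namespace crossing, drop state that only made sense in the old
 * namespace, such as the IPVS marker. */
static void example_move_to_netns(struct example_pkt *pkt, int new_netns)
{
	bool xnet = pkt->netns_id != new_netns;

	pkt->netns_id = new_netns;
	if (xnet)
		pkt->ipvs_property = false;
}
```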
      
      Fixes: 621e84d6 ("dev: introduce skb_scrub_packet()")
      Signed-off-by: Ye Yin <hustcat@gmail.com>
      Signed-off-by: Wei Zhou <chouryzhou@gmail.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Eric Dumazet's avatar
      tcp: do not mangle skb->cb[] in tcp_make_synack() · 3920a5bd
      Eric Dumazet authored
      
      [ Upstream commit 3b117750 ]
      
      Christoph Paasch sent a patch to address the following issue:
      
      tcp_make_synack() leaves some TCP private info in skb->cb[],
      then sends the packet by means other than tcp_transmit_skb().
      
      tcp_transmit_skb() makes sure to clear skb->cb[] so as not to confuse
      the IPv4/IPv6 stacks, but we have no such cleanup for SYNACK.
      
      tcp_make_synack() should not use tcp_init_nondata_skb():
      
      tcp_init_nondata_skb() really should be limited to skbs put in write/rtx
      queues (the ones that are only sent via tcp_transmit_skb()).
      
      This patch fixes the issue and should even save a few CPU cycles ;)
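The invariant tcp_transmit_skb() maintains can be sketched with a hypothetical, simplified buffer (example_skb, with a 48-byte control area like skb->cb[]). Before the buffer reaches a layer that interprets cb[] its own way, the area is zeroed so no TCP private state leaks through. This only illustrates the invariant. The patch described here takes a different route: it stops tcp_make_synack() from calling tcp_init_nondata_skb() in the first place.

```c
#include <string.h>

/* Hypothetical, simplified buffer with a private control area
 * (48 bytes, like skb->cb[] in struct sk_buff). */
struct example_skb {
	unsigned char cb[48];
};

/* Stand-in for TCP scribbling its private per-packet state into cb[]. */
static void example_tcp_fill_cb(struct example_skb *skb)
{
	memset(skb->cb, 0xab, 16);
}

/* Clear cb[] before a lower layer, which uses cb[] differently, sees it. */
static void example_hand_to_ip(struct example_skb *skb)
{
	memset(skb->cb, 0, sizeof(skb->cb));
}
```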
      
      Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Reviewed-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Jeff Barnhill's avatar
      net: vrf: correct FRA_L3MDEV encode type · 58b21b02
      Jeff Barnhill authored
      
      [ Upstream commit 18129a24 ]
      
      FRA_L3MDEV is defined as U8, but is being added as a U32 attribute. On
      big-endian architectures, this results in the l3mdev entry not being
      added to the FIB rules.
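Why only big-endian hosts break can be shown by laying out a 32-bit payload in each byte order and reading it back as a U8, which looks only at the first payload byte. This assumes the attribute carries the value 1. The helper names are hypothetical, and the byte order is chosen explicitly so the example behaves the same on any host.

```c
#include <stdint.h>

/* Lay out a 32-bit payload as a little- or big-endian host would store it. */
static void example_put_u32(uint8_t out[4], uint32_t v, int big_endian)
{
	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;
		out[i] = (uint8_t)(v >> shift);
	}
}

/* A U8 attribute reader looks only at the first payload byte. */
static uint8_t example_get_u8(const uint8_t *payload)
{
	return payload[0];
}
```

On little-endian the value 1 sits in the first byte, so the mismatch goes unnoticed. On big-endian the first byte is 0, so the U8 reader sees 0.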
      
      Fixes: 1aa6c4f6 ("net: vrf: Add l3mdev rules on first device create")
      Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Konstantin Khlebnikov's avatar
      tcp_nv: fix division by zero in tcpnv_acked() · b0e50c4e
      Konstantin Khlebnikov authored
      
      [ Upstream commit 4eebff27 ]
      
      The average RTT can become zero, leading to a division by zero in
      tcpnv_acked(); this has happened in real life at least twice.
      This patch treats zero as 1 us.
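A sketch of the "treat zero as 1 us" clamp applied before dividing. The rate formula (bytes acked per second from an RTT in microseconds) is a hypothetical simplification, not tcp_nv's actual computation.

```c
#include <stdint.h>

/* Clamp a zero average RTT (in microseconds) to 1 us before dividing.
 * The rate formula is a hypothetical simplification. */
static uint64_t example_rate_bytes_per_sec(uint64_t bytes_acked,
					    uint32_t avg_rtt_us)
{
	if (avg_rtt_us == 0)
		avg_rtt_us = 1;
	return bytes_acked * 1000000u / avg_rtt_us;
}
```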
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: Lawrence Brakmo <Brakmo@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 21 Nov, 2017 12 commits