1. 30 Apr, 2017 3 commits
    • Eric Biggers's avatar
      KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings · 6efda250
      Eric Biggers authored
      commit c9f838d1 upstream.
      
      This fixes CVE-2017-7472.
      
      Running the following program as an unprivileged user exhausts kernel
      memory by leaking thread keyrings:
      
      	#include <keyutils.h>
      
      	int main()
      	{
      		for (;;)
      			keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_THREAD_KEYRING);
      	}
      
      Fix it by only creating a new thread keyring if there wasn't one before.
      To make things more consistent, make install_thread_keyring_to_cred()
      and install_process_keyring_to_cred() both return 0 if the corresponding
      keyring is already present.
      
      Fixes: d84f4f99 ("CRED: Inaugurate COW credentials")
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6efda250
    • David Howells's avatar
      KEYS: Change the name of the dead type to ".dead" to prevent user access · 44d6e10f
      David Howells authored
      commit c1644fe0 upstream.
      
      This fixes CVE-2017-6951.
      
      Userspace should not be able to do things with the "dead" key type as it
      doesn't have some of the helper functions set upon it that the kernel
      needs.  Attempting to use it may cause the kernel to crash.
      
      Fix this by changing the name of the type to ".dead" so that it's rejected
      up front on userspace syscalls by key_get_type_from_user().
      
      Though this doesn't seem to affect recent kernels, it does affect older
      ones, certainly those prior to:
      
      	commit c06cfb08
      	Author: David Howells <dhowells@redhat.com>
      	Date:   Tue Sep 16 17:36:06 2014 +0100
      	KEYS: Remove key_type::match in favour of overriding default by match_preparse
      
      which went in before 3.18-rc1.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      44d6e10f
    • David Howells's avatar
      KEYS: Disallow keyrings beginning with '.' to be joined as session keyrings · 44c03782
      David Howells authored
      commit ee8f844e upstream.
      
      This fixes CVE-2016-9604.
      
      Keyrings whose name begin with a '.' are special internal keyrings and so
      userspace isn't allowed to create keyrings by this name to prevent
      shadowing.  However, the patch that added the guard didn't fix
      KEYCTL_JOIN_SESSION_KEYRING.  Not only can that create dot-named keyrings,
      it can also subscribe to them as a session keyring if they grant SEARCH
      permission to the user.
      
      This, for example, allows a root process to set .builtin_trusted_keys as
      its session keyring, at which point it has full access because now the
      possessor permissions are added.  This permits root to add extra public
      keys, thereby bypassing module verification.
      
      This also affects kexec and IMA.
      
      This can be tested by (as root):
      
      	keyctl session .builtin_trusted_keys
      	keyctl add user a a @s
      	keyctl list @s
      
      which on my test box gives me:
      
      	2 keys in keyring:
      	180010936: ---lswrv     0     0 asymmetric: Build time autogenerated kernel key: ae3d4a31b82daa8e1a75b49dc2bba949fd992a05
      	801382539: --alswrv     0     0 user: a
      
      
      Fix this by rejecting names beginning with a '.' in the keyctl.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      cc: linux-ima-devel@lists.sourceforge.net
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      44c03782
  2. 22 Apr, 2017 37 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.18.50 · 630b59cd
      Greg Kroah-Hartman authored
      630b59cd
    • Linus Torvalds's avatar
      give up on gcc ilog2() constant optimizations · 2143e71a
      Linus Torvalds authored
      commit 474c9015 upstream.
      
      gcc-7 has an "optimization" pass that completely screws up, and
      generates the code expansion for the (impossible) case of calling
      ilog2() with a zero constant, even when the code gcc compiles does not
      actually have a zero constant.
      
      And we try to generate a compile-time error for anybody doing ilog2() on
      a constant where that doesn't make sense (be it zero or negative).  So
      now gcc7 will fail the build due to our sanity checking, because it
      created that constant-zero case that didn't actually exist in the source
      code.
      
      There's a whole long discussion on the kernel mailing about how to work
      around this gcc bug.  The gcc people themselevs have discussed their
      "feature" in
      
         https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785
      
      but it's all water under the bridge, because while it looked at one
      point like it would be solved by the time gcc7 was released, that was
      not to be.
      
      So now we have to deal with this compiler braindamage.
      
      And the only simple approach seems to be to just delete the code that
      tries to warn about bad uses of ilog2().
      
      So now "ilog2()" will just return 0 not just for the value 1, but for
      any non-positive value too.
      
      It's not like I can recall anybody having ever actually tried to use
      this function on any invalid value, but maybe the sanity check just
      meant that such code never made it out in public.
      Reported-by: default avatarLaura Abbott <labbott@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>,
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2143e71a
    • James Hogan's avatar
      metag/usercopy: Add missing fixups · 9e5397c7
      James Hogan authored
      commit b884a190 upstream.
      
      The rapf copy loops in the Meta usercopy code is missing some extable
      entries for HTP cores with unaligned access checking enabled, where
      faults occur on the instruction immediately after the faulting access.
      
      Add the fixup labels and extable entries for these cases so that corner
      case user copy failures don't cause kernel crashes.
      
      Fixes: 373cd784 ("metag: Memory handling")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: linux-metag@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e5397c7
    • James Hogan's avatar
      metag/usercopy: Fix src fixup in from user rapf loops · 4740da3f
      James Hogan authored
      commit 2c0b1df8 upstream.
      
      The fixup code to rewind the source pointer in
      __asm_copy_from_user_{32,64}bit_rapf_loop() always rewound the source by
      a single unit (4 or 8 bytes), however this is insufficient if the fault
      didn't occur on the first load in the loop, as the source pointer will
      have been incremented but nothing will have been stored until all 4
      register [pairs] are loaded.
      
      Read the LSM_STEP field of TXSTATUS (which is already loaded into a
      register), a bit like the copy_to_user versions, to determine how many
      iterations of MGET[DL] have taken place, all of which need rewinding.
      
      Fixes: 373cd784 ("metag: Memory handling")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: linux-metag@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4740da3f
    • James Hogan's avatar
      metag/usercopy: Set flags before ADDZ · 5620eb31
      James Hogan authored
      commit fd40eee1 upstream.
      
      The fixup code for the copy_to_user rapf loops reads TXStatus.LSM_STEP
      to decide how far to rewind the source pointer. There is a special case
      for the last execution of an MGETL/MGETD, since it leaves LSM_STEP=0
      even though the number of MGETLs/MGETDs attempted was 4. This uses ADDZ
      which is conditional upon the Z condition flag, but the AND instruction
      which masked the TXStatus.LSM_STEP field didn't set the condition flags
      based on the result.
      
      Fix that now by using ANDS which does set the flags, and also marking
      the condition codes as clobbered by the inline assembly.
      
      Fixes: 373cd784 ("metag: Memory handling")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: linux-metag@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5620eb31
    • James Hogan's avatar
      metag/usercopy: Zero rest of buffer from copy_from_user · c02ee85c
      James Hogan authored
      commit 563ddc10 upstream.
      
      Currently we try to zero the destination for a failed read from userland
      in fixup code in the usercopy.c macros. The rest of the destination
      buffer is then zeroed from __copy_user_zeroing(), which is used for both
      copy_from_user() and __copy_from_user().
      
      Unfortunately we fail to zero in the fixup code as D1Ar1 is set to 0
      before the fixup code entry labels, and __copy_from_user() shouldn't even
      be zeroing the rest of the buffer.
      
      Move the zeroing out into copy_from_user() and rename
      __copy_user_zeroing() to raw_copy_from_user() since it no longer does
      any zeroing. This also conveniently matches the name needed for
      RAW_COPY_USER support in a later patch.
      
      Fixes: 373cd784 ("metag: Memory handling")
      Reported-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: linux-metag@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c02ee85c
    • James Hogan's avatar
      metag/usercopy: Add early abort to copy_to_user · 753c05dc
      James Hogan authored
      commit fb8ea062 upstream.
      
      When copying to userland on Meta, if any faults are encountered
      immediately abort the copy instead of continuing on and repeatedly
      faulting, and worse potentially copying further bytes successfully to
      subsequent valid pages.
      
      Fixes: 373cd784 ("metag: Memory handling")
      Reported-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: linux-metag@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      753c05dc
    • James Hogan's avatar
      metag/usercopy: Fix alignment error checking · 63916ca6
      James Hogan authored
      commit 22572119 upstream.
      
      Fix the error checking of the alignment adjustment code in
      raw_copy_from_user(), which mistakenly considers it safe to skip the
      error check when aligning the source buffer on a 2 or 4 byte boundary.
      
      If the destination buffer was unaligned it may have started to copy
      using byte or word accesses, which could well be at the start of a new
      (valid) source page. This would result in it appearing to have copied 1
      or 2 bytes at the end of the first (invalid) page rather than none at
      all.
      
      Fixes: 373cd784 ("metag: Memory handling")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: linux-metag@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      63916ca6
    • James Hogan's avatar
      metag/usercopy: Drop unused macros · f80e0324
      James Hogan authored
      commit ef62a2d8 upstream.
      
      Metag's lib/usercopy.c has a bunch of copy_from_user macros for larger
      copies between 5 and 16 bytes which are completely unused. Before fixing
      zeroing lets drop these macros so there is less to fix.
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-metag@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f80e0324
    • Wei Yongjun's avatar
      ring-buffer: Fix return value check in test_ringbuffer() · 814051bd
      Wei Yongjun authored
      commit 62277de7 upstream.
      
      In case of error, the function kthread_run() returns ERR_PTR()
      and never returns NULL. The NULL test in the return value check
      should be replaced with IS_ERR().
      
      Link: http://lkml.kernel.org/r/1466184839-14927-1-git-send-email-weiyj_lk@163.com
      
      Fixes: 6c43e554 ("ring-buffer: Add ring buffer startup selftest")
      Signed-off-by: default avatarWei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      814051bd
    • Chris Salls's avatar
      mm/mempolicy.c: fix error handling in set_mempolicy and mbind. · c01cf958
      Chris Salls authored
      commit cf01fb99 upstream.
      
      In the case that compat_get_bitmap fails we do not want to copy the
      bitmap to the user as it will contain uninitialized stack data and leak
      sensitive data.
      Signed-off-by: default avatarChris Salls <salls@cs.ucsb.edu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c01cf958
    • Rafał Miłecki's avatar
      mtd: bcm47xxpart: fix parsing first block after aligned TRX · c6a9a73e
      Rafał Miłecki authored
      commit bd5d2131 upstream.
      
      After parsing TRX we should skip to the first block placed behind it.
      Our code was working only with TRX with length not aligned to the
      blocksize. In other cases (length aligned) it was missing the block
      places right after TRX.
      
      This fixes calculation and simplifies the comment.
      Signed-off-by: default avatarRafał Miłecki <rafal@milecki.pl>
      Signed-off-by: default avatarBrian Norris <computersforpeace@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c6a9a73e
    • Naoya Horiguchi's avatar
      mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd() · 4f6625fa
      Naoya Horiguchi authored
      commit c9d398fa upstream.
      
      I found the race condition which triggers the following bug when
      move_pages() and soft offline are called on a single hugetlb page
      concurrently.
      
          Soft offlining page 0x119400 at 0x700000000000
          BUG: unable to handle kernel paging request at ffffea0011943820
          IP: follow_huge_pmd+0x143/0x190
          PGD 7ffd2067
          PUD 7ffd1067
          PMD 0
              [61163.582052] Oops: 0000 [#1] SMP
          Modules linked in: binfmt_misc ppdev virtio_balloon parport_pc pcspkr i2c_piix4 parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk 8139too crc32c_intel ata_piix serio_raw libata virtio_pci 8139cp virtio_ring virtio mii floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: cap_check]
          CPU: 0 PID: 22573 Comm: iterate_numa_mo Tainted: P           OE   4.11.0-rc2-mm1+ #2
          Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
          RIP: 0010:follow_huge_pmd+0x143/0x190
          RSP: 0018:ffffc90004bdbcd0 EFLAGS: 00010202
          RAX: 0000000465003e80 RBX: ffffea0004e34d30 RCX: 00003ffffffff000
          RDX: 0000000011943800 RSI: 0000000000080001 RDI: 0000000465003e80
          RBP: ffffc90004bdbd18 R08: 0000000000000000 R09: ffff880138d34000
          R10: ffffea0004650000 R11: 0000000000c363b0 R12: ffffea0011943800
          R13: ffff8801b8d34000 R14: ffffea0000000000 R15: 000077ff80000000
          FS:  00007fc977710740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: ffffea0011943820 CR3: 000000007a746000 CR4: 00000000001406f0
          Call Trace:
           follow_page_mask+0x270/0x550
           SYSC_move_pages+0x4ea/0x8f0
           SyS_move_pages+0xe/0x10
           do_syscall_64+0x67/0x180
           entry_SYSCALL64_slow_path+0x25/0x25
          RIP: 0033:0x7fc976e03949
          RSP: 002b:00007ffe72221d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000117
          RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc976e03949
          RDX: 0000000000c22390 RSI: 0000000000001400 RDI: 0000000000005827
          RBP: 00007ffe72221e00 R08: 0000000000c2c3a0 R09: 0000000000000004
          R10: 0000000000c363b0 R11: 0000000000000246 R12: 0000000000400650
          R13: 00007ffe72221ee0 R14: 0000000000000000 R15: 0000000000000000
          Code: 81 e4 ff ff 1f 00 48 21 c2 49 c1 ec 0c 48 c1 ea 0c 4c 01 e2 49 bc 00 00 00 00 00 ea ff ff 48 c1 e2 06 49 01 d4 f6 45 bc 04 74 90 <49> 8b 7c 24 20 40 f6 c7 01 75 2b 4c 89 e7 8b 47 1c 85 c0 7e 2a
          RIP: follow_huge_pmd+0x143/0x190 RSP: ffffc90004bdbcd0
          CR2: ffffea0011943820
          ---[ end trace e4f81353a2d23232 ]---
          Kernel panic - not syncing: Fatal exception
          Kernel Offset: disabled
      
      This bug is triggered when pmd_present() returns true for non-present
      hugetlb, so fixing the present check in follow_huge_pmd() prevents it.
      Using pmd_present() to determine present/non-present for hugetlb is not
      correct, because pmd_present() checks multiple bits (not only
      _PAGE_PRESENT) for historical reason and it can misjudge hugetlb state.
      
      Fixes: e66f17ff ("mm/hugetlb: take page table lock in follow_huge_pmd()")
      Link: http://lkml.kernel.org/r/1490149898-20231-1-git-send-email-n-horiguchi@ah.jp.nec.comSigned-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: default avatarHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f6625fa
    • Bjorn Andersson's avatar
      pinctrl: qcom: Don't clear status bit on irq_unmask · 08b1ade0
      Bjorn Andersson authored
      commit a6566710 upstream.
      
      Clearing the status bit on irq_unmask will discard any pending interrupt
      that did arrive after the irq_ack, i.e. while the IRQ handler function
      was executing.
      
      Fixes: f365be09 ("pinctrl: Add Qualcomm TLMM driver")
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Reported-by: default avatarTimur Tabi <timur@codeaurora.org>
      Signed-off-by: default avatarBjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      08b1ade0
    • Ladi Prosek's avatar
      virtio_balloon: init 1st buffer in stats vq · bcfe0f5b
      Ladi Prosek authored
      commit fc865322 upstream.
      
      When init_vqs runs, virtio_balloon.stats is either uninitialized or
      contains stale values. The host updates its state with garbage data
      because it has no way of knowing that this is just a marker buffer
      used for signaling.
      
      This patch updates the stats before pushing the initial buffer.
      
      Alternative fixes:
      * Push an empty buffer in init_vqs. Not easily done with the current
        virtio implementation and violates the spec "Driver MUST supply the
        same subset of statistics in all buffers submitted to the statsq".
      * Push a buffer with invalid tags in init_vqs. Violates the same
        spec clause, plus "invalid tag" is not really defined.
      
      Note: the spec says:
      	When using the legacy interface, the device SHOULD ignore all values in
      	the first buffer in the statsq supplied by the driver after device
      	initialization. Note: Historically, drivers supplied an uninitialized
      	buffer in the first buffer.
      
      Unfortunately QEMU does not seem to implement the recommendation
      even for the legacy interface.
      Signed-off-by: default avatarLadi Prosek <lprosek@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bcfe0f5b
    • Mauricio Faria de Oliveira's avatar
      block: allow WRITE_SAME commands with the SG_IO ioctl · f142c511
      Mauricio Faria de Oliveira authored
      commit 25cdb645 upstream.
      
      The WRITE_SAME commands are not present in the blk_default_cmd_filter
      write_ok list, and thus are failed with -EPERM when the SG_IO ioctl()
      is executed without CAP_SYS_RAWIO capability (e.g., unprivileged users).
      [ sg_io() -> blk_fill_sghdr_rq() > blk_verify_command() -> -EPERM ]
      
      The problem can be reproduced with the sg_write_same command
      
        # sg_write_same --num 1 --xferlen 512 /dev/sda
        #
      
        # capsh --drop=cap_sys_rawio -- -c \
          'sg_write_same --num 1 --xferlen 512 /dev/sda'
          Write same: pass through os error: Operation not permitted
        #
      
      For comparison, the WRITE_VERIFY command does not observe this problem,
      since it is in that list:
      
        # capsh --drop=cap_sys_rawio -- -c \
          'sg_write_verify --num 1 --ilen 512 --lba 0 /dev/sda'
        #
      
      So, this patch adds the WRITE_SAME commands to the list, in order
      for the SG_IO ioctl to finish successfully:
      
        # capsh --drop=cap_sys_rawio -- -c \
          'sg_write_same --num 1 --xferlen 512 /dev/sda'
        #
      
      That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
      (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2]),
      which employs the SG_IO ioctl() and runs as an unprivileged user (libvirt-qemu).
      
      In that scenario, when a filesystem (e.g., ext4) performs its zero-out calls,
      which are translated to write-same calls in the guest kernel, and then into
      SG_IO ioctls to the host kernel, SCSI I/O errors may be observed in the guest:
      
        [...] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
        [...] sd 0:0:0:0: [sda] tag#0 Sense Key : Aborted Command [current]
        [...] sd 0:0:0:0: [sda] tag#0 Add. Sense: I/O process terminated
        [...] sd 0:0:0:0: [sda] tag#0 CDB: Write Same(10) 41 00 01 04 e0 78 00 00 08 00
        [...] blk_update_request: I/O error, dev sda, sector 17096824
      
      Links:
      [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
      [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
      Signed-off-by: default avatarMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: default avatarBrahadambal Srinivasan <latha@linux.vnet.ibm.com>
      Reported-by: default avatarManjunatha H R <manjuhr1@in.ibm.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f142c511
    • Henrik Ingo's avatar
      uvcvideo: uvc_scan_fallback() for webcams with broken chain · 68fc744f
      Henrik Ingo authored
      commit e950267a upstream.
      
      Some devices have invalid baSourceID references, causing uvc_scan_chain()
      to fail, but if we just take the entities we can find and put them
      together in the most sensible chain we can think of, turns out they do
      work anyway. Note: This heuristic assumes there is a single chain.
      
      At the time of writing, devices known to have such a broken chain are
        - Acer Integrated Camera (5986:055a)
        - Realtek rtl157a7 (0bda:57a7)
      Signed-off-by: default avatarHenrik Ingo <henrik.ingo@avoinelama.fi>
      Signed-off-by: default avatarLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      68fc744f
    • Gabriel Krisman Bertazi's avatar
      serial: 8250_pci: Detach low-level driver during PCI error recovery · c7293aed
      Gabriel Krisman Bertazi authored
      commit f209fa03 upstream.
      
      During a PCI error recovery, like the ones provoked by EEH in the ppc64
      platform, all IO to the device must be blocked while the recovery is
      completed.  Current 8250_pci implementation only suspends the port
      instead of detaching it, which doesn't prevent incoming accesses like
      TIOCMGET and TIOCMSET calls from reaching the device.  Those end up
      racing with the EEH recovery, crashing it.  Similar races were also
      observed when opening the device and when shutting it down during
      recovery.
      
      This patch implements a more robust IO blockage for the 8250_pci
      recovery by unregistering the port at the beginning of the procedure and
      re-adding it afterwards.  Since the port is detached from the uart
      layer, we can be sure that no request will make through to the device
      during recovery.  This is similar to the solution used by the JSM serial
      driver.
      
      I thank Peter Hurley <peter@hurleysoftware.com> for valuable input on
      this one over one year ago.
      Signed-off-by: default avatarGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c7293aed
    • Joerg Roedel's avatar
      ACPI: Do not create a platform_device for IOAPIC/IOxAPIC · d233e2ef
      Joerg Roedel authored
      commit 08f63d97 upstream.
      
      No platform-device is required for IO(x)APICs, so don't even
      create them.
      
      [ rjw: This fixes a problem with leaking platform device objects
        after IOAPIC/IOxAPIC hot-removal events.]
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d233e2ef
    • Josh Poimboeuf's avatar
      ACPI: Fix incompatibility with mcount-based function graph tracing · 2ed2f05e
      Josh Poimboeuf authored
      commit 61b79e16 upstream.
      
      Paul Menzel reported a warning:
      
        WARNING: CPU: 0 PID: 774 at /build/linux-ROBWaj/linux-4.9.13/kernel/trace/trace_functions_graph.c:233 ftrace_return_to_handler+0x1aa/0x1e0
        Bad frame pointer: expected f6919d98, received f6919db0
          from func acpi_pm_device_sleep_wake return to c43b6f9d
      
      The warning means that function graph tracing is broken for the
      acpi_pm_device_sleep_wake() function.  That's because the ACPI Makefile
      unconditionally sets the '-Os' gcc flag to optimize for size.  That's an
      issue because mcount-based function graph tracing is incompatible with
      '-Os' on x86, thanks to the following gcc bug:
      
        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109
      
      I have another patch pending which will ensure that mcount-based
      function graph tracing is never used with CONFIG_CC_OPTIMIZE_FOR_SIZE on
      x86.
      
      But this patch is needed in addition to that one because the ACPI
      Makefile overrides that config option for no apparent reason.  It has
      had this flag since the beginning of git history, and there's no related
      comment, so I don't know why it's there.  As far as I can tell, there's
      no reason for it to be there.  The appropriate behavior is for it to
      honor CONFIG_CC_OPTIMIZE_FOR_{SIZE,PERFORMANCE} like the rest of the
      kernel.
      Reported-by: default avatarPaul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ed2f05e
    • Darrick J. Wong's avatar
      xfs: clear _XBF_PAGES from buffers when readahead page · 51d70cc7
      Darrick J. Wong authored
      commit 2aa6ba7b upstream.
      
      If we try to allocate memory pages to back an xfs_buf that we're trying
      to read, it's possible that we'll be so short on memory that the page
      allocation fails.  For a blocking read we'll just wait, but for
      readahead we simply dump all the pages we've collected so far.
      
      Unfortunately, after dumping the pages we neglect to clear the
      _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
      thinks that b_pages still points to pages we own.  It then double-frees
      the b_pages pages.
      
      This results in screaming about negative page refcounts from the memory
      manager, which xfs oughtn't be triggering.  To reproduce this case,
      mount a filesystem where the size of the inodes far outweighs the
      availalble memory (a ~500M inode filesystem on a VM with 300MB memory
      did the trick here) and run bulkstat in parallel with other memory
      eating processes to put a huge load on the system.  The "check summary"
      phase of xfs_scrub also works for this purpose.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      51d70cc7
    • Eric Sandeen's avatar
      xfs: fix up xfs_swap_extent_forks inline extent handling · 20022ad6
      Eric Sandeen authored
      commit 4dfce57d upstream.
      
      There have been several reports over the years of NULL pointer
      dereferences in xfs_trans_log_inode during xfs_fsr processes,
      when the process is doing an fput and tearing down extents
      on the temporary inode, something like:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
          [exception RIP: xfs_trans_log_inode+0x10]
       #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
      #10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
      #11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
      #12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
      #13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
      #14 [ffff8800a57bbe00] evict at ffffffff811e1b67
      #15 [ffff8800a57bbe28] iput at ffffffff811e23a5
      #16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
      #17 [ffff8800a57bbe88] dput at ffffffff811dd06c
      #18 [ffff8800a57bbea8] __fput at ffffffff811c823b
      #19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
      #20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
      #21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
      #22 [ffff8800a57bbf50] int_signal at ffffffff8161405d
      
      As it turns out, this is because the i_itemp pointer, along
      with the d_ops pointer, has been overwritten with zeros
      when we tear down the extents during truncate.  When the in-core
      inode fork on the temporary inode used by xfs_fsr was originally
      set up during the extent swap, we mistakenly looked at di_nextents
      to determine whether all extents fit inline, but this misses extents
      generated by speculative preallocation; we should be using if_bytes
      instead.
      
      This mistake corrupts the in-memory inode, and code in
      xfs_iext_remove_inline eventually gets bad inputs, causing
      it to memmove and memset incorrect ranges; this became apparent
      because the two values in ifp->if_u2.if_inline_ext[1] contained
      what should have been in d_ops and i_itemp; they were memmoved due
      to incorrect array indexing and then the original locations
      were zeroed with memset, again due to an array overrun.
      
      Fix this by properly using i_df.if_bytes to determine the number
      of extents, not di_nextents.
      
      Thanks to dchinner for looking at this with me and spotting the
      root cause.
      
      [nborisov: backported to 4.4]
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      20022ad6
    • Darrick J. Wong's avatar
      xfs: don't allow di_size with high bit set · 0f27d9dc
      Darrick J. Wong authored
      commit ef388e20 upstream.
      
      The on-disk field di_size is used to set i_size, which is a signed
      integer of loff_t.  If the high bit of di_size is set, we'll end up with
      a negative i_size, which will cause all sorts of problems.  Since the
      VFS won't let us create a file with such length, we should catch them
      here in the verifier too.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f27d9dc
    • Todd Fujinaka's avatar
      igb: add i211 to i210 PHY workaround · 230a372f
      Todd Fujinaka authored
      commit 5bc8c230 upstream.
      
      i210 and i211 share the same PHY but have different PCI IDs. Don't
      forget i211 for any i210 workarounds.
      Signed-off-by: default avatarTodd Fujinaka <todd.fujinaka@intel.com>
      Tested-by: default avatarAaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      230a372f
    • Chris J Arges's avatar
      igb: Workaround for igb i210 firmware issue · 6446e0ba
      Chris J Arges authored
      commit 4e684f59 upstream.
      
      Sometimes firmware may not properly initialize I347AT4_PAGE_SELECT causing
      the probe of an igb i210 NIC to fail. This patch adds an addition zeroing
      of this register during igb_get_phy_id to workaround this issue.
      
      Thanks for Jochen Henneberg for the idea and original patch.
      Signed-off-by: default avatarChris J Arges <christopherarges@gmail.com>
      Tested-by: default avatarAaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6446e0ba
    • Koos Vriezen's avatar
      iommu/vt-d: Fix NULL pointer dereference in device_to_iommu · d1fad823
      Koos Vriezen authored
      commit 5003ae1e upstream.
      
      The function device_to_iommu() in the Intel VT-d driver
      lacks a NULL-ptr check, resulting in this oops at boot on
      some platforms:
      
       BUG: unable to handle kernel NULL pointer dereference at 00000000000007ab
       IP: [<ffffffff8132234a>] device_to_iommu+0x11a/0x1a0
       PGD 0
      
       [...]
      
       Call Trace:
         ? find_or_alloc_domain.constprop.29+0x1a/0x300
         ? dw_dma_probe+0x561/0x580 [dw_dmac_core]
         ? __get_valid_domain_for_dev+0x39/0x120
         ? __intel_map_single+0x138/0x180
         ? intel_alloc_coherent+0xb6/0x120
         ? sst_hsw_dsp_init+0x173/0x420 [snd_soc_sst_haswell_pcm]
         ? mutex_lock+0x9/0x30
         ? kernfs_add_one+0xdb/0x130
         ? devres_add+0x19/0x60
         ? hsw_pcm_dev_probe+0x46/0xd0 [snd_soc_sst_haswell_pcm]
         ? platform_drv_probe+0x30/0x90
         ? driver_probe_device+0x1ed/0x2b0
         ? __driver_attach+0x8f/0xa0
         ? driver_probe_device+0x2b0/0x2b0
         ? bus_for_each_dev+0x55/0x90
         ? bus_add_driver+0x110/0x210
         ? 0xffffffffa11ea000
         ? driver_register+0x52/0xc0
         ? 0xffffffffa11ea000
         ? do_one_initcall+0x32/0x130
         ? free_vmap_area_noflush+0x37/0x70
         ? kmem_cache_alloc+0x88/0xd0
         ? do_init_module+0x51/0x1c4
         ? load_module+0x1ee9/0x2430
         ? show_taint+0x20/0x20
         ? kernel_read_file+0xfd/0x190
         ? SyS_finit_module+0xa3/0xb0
         ? do_syscall_64+0x4a/0xb0
         ? entry_SYSCALL64_slow_path+0x25/0x25
       Code: 78 ff ff ff 4d 85 c0 74 ee 49 8b 5a 10 0f b6 9b e0 00 00 00 41 38 98 e0 00 00 00 77 da 0f b6 eb 49 39 a8 88 00 00 00 72 ce eb 8f <41> f6 82 ab 07 00 00 04 0f 85 76 ff ff ff 0f b6 4d 08 88 0e 49
       RIP  [<ffffffff8132234a>] device_to_iommu+0x11a/0x1a0
        RSP <ffffc90001457a78>
       CR2: 00000000000007ab
       ---[ end trace 16f974b6d58d0aad ]---
      
      Add the missing pointer check.
      
      Fixes: 1c387188 ("iommu/vt-d: Fix IOMMU lookup for SR-IOV Virtual Functions")
      Signed-off-by: default avatarKoos Vriezen <koos.vriezen@gmail.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1fad823
    • Adrian Hunter's avatar
      mmc: sdhci: Do not disable interrupts while waiting for clock · fc09e721
      Adrian Hunter authored
      commit e2ebfb21 upstream.
      
      Disabling interrupts for even a millisecond can cause problems for some
      devices. That can happen when sdhci changes clock frequency because it
      waits for the clock to become stable under a spin lock.
      
      The spin lock is not necessary here. Anything that is racing with changes
      to the I/O state is already broken. The mmc core already provides
      synchronization via "claiming" the host.
      
      Although the spin lock probably should be removed from the code paths that
      lead to this point, such a patch would touch too much code to be suitable
      for stable trees. Consequently, for this patch, just drop the spin lock
      while waiting.
      Signed-off-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Tested-by: default avatarLudovic Desroches <ludovic.desroches@microchip.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc09e721
    • Eric Biggers's avatar
      ext4: mark inode dirty after converting inline directory · d4925dd3
      Eric Biggers authored
      commit b9cf625d upstream.
      
      If ext4_convert_inline_data() was called on a directory with inline
      data, the filesystem was left in an inconsistent state (as considered by
      e2fsck) because the file size was not increased to cover the new block.
      This happened because the inode was not marked dirty after i_disksize
      was updated.  Fix this by marking the inode dirty at the end of
      ext4_finish_convert_inline_dir().
      
      This bug was probably not noticed before because most users mark the
      inode dirty afterwards for other reasons.  But if userspace executed
      FS_IOC_SET_ENCRYPTION_POLICY with invalid parameters, as exercised by
      'kvm-xfstests -c adv generic/396', then the inode was never marked dirty
      after updating i_disksize.
      
      Fixes: 3c47d541Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4925dd3
    • Michael Engl's avatar
      iio: adc: ti_am335x_adc: fix fifo overrun recovery · 75a06cc2
      Michael Engl authored
      commit e83bb3e6 upstream.
      
      The tiadc_irq_h(int irq, void *private) function is handling FIFO
      overruns by clearing flags, disabling and enabling the ADC to
      recover.
      
      If the ADC is running in continuous mode a FIFO overrun happens
      regularly. If the disabling of the ADC happens concurrently with
      a new conversion. It might happen that the enabling of the ADC
      is ignored by the hardware. This stops the ADC permanently. No
      more interrupts are triggered.
      
      According to the AM335x Reference Manual (SPRUH73H October 2011 -
      Revised April 2013 - Chapter 12.4 and 12.5) it is necessary to
      check the ADC FSM bits in REG_ADCFSM before enabling the ADC
      again. Because the disabling of the ADC is done right after the
      current conversion has been finished.
      
      To trigger this bug it is necessary to run the ADC in continuous
      mode. The ADC values of all channels need to be read in an endless
      loop. The bug appears within the first 6 hours (~5.4 million
      handled FIFO overruns). The user space application will hang on
      reading new values from the character device.
      
      Fixes: ca9a5638 ("iio: ti_am335x_adc: Add continuous sampling
      support")
      Signed-off-by: default avatarMichael Engl <michael.engl@wjw-solutions.com>
      Signed-off-by: default avatarJonathan Cameron <jic23@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      75a06cc2
    • Johan Hovold's avatar
      USB: usbtmc: add missing endpoint sanity check · 7c83fd79
      Johan Hovold authored
      commit 687e0687 upstream.
      
      USBTMC devices are required to have a bulk-in and a bulk-out endpoint,
      but the driver failed to verify this, something which could lead to the
      endpoint addresses being taken from uninitialised memory.
      
      Make sure to zero all private data as part of allocation, and add the
      missing endpoint sanity check.
      
      Note that this also addresses a more recently introduced issue, where
      the interrupt-in-presence flag would also be uninitialised whenever the
      optional interrupt-in endpoint is not present. This in turn could lead
      to an interrupt urb being allocated, initialised and submitted based on
      uninitialised values.
      
      Fixes: dbf3e7f6 ("Implement an ioctl to support the USMTMC-USB488 READ_STATUS_BYTE operation.")
      Fixes: 5b775f67 ("USB: add USB test and measurement class driver")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      [ johan: backport to v4.4 ]
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c83fd79
    • Johan Hovold's avatar
      uwb: i1480-dfu: fix NULL-deref at probe · e2410dd8
      Johan Hovold authored
      commit 4ce36271 upstream.
      
      Make sure to check the number of endpoints to avoid dereferencing a
      NULL-pointer should a malicious device lack endpoints.
      
      Note that the dereference happens in the cmd and wait_init_done
      callbacks which are called during probe.
      
      Fixes: 1ba47da5 ("uwb: add the i1480 DFU driver")
      Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
      Cc: David Vrabel <david.vrabel@csr.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2410dd8
    • Johan Hovold's avatar
      uwb: hwa-rc: fix NULL-deref at probe · 14d75cda
      Johan Hovold authored
      commit daf229b1 upstream.
      
      Make sure to check the number of endpoints to avoid dereferencing a
      NULL-pointer should a malicious device lack endpoints.
      
      Note that the dereference happens in the start callback which is called
      during probe.
      
      Fixes: de520b8b ("uwb: add HWA radio controller driver")
      Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
      Cc: David Vrabel <david.vrabel@csr.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      14d75cda
    • Johan Hovold's avatar
      mmc: ushc: fix NULL-deref at probe · 87ddf660
      Johan Hovold authored
      commit 181302dc upstream.
      
      Make sure to check the number of endpoints to avoid dereferencing a
      NULL-pointer should a malicious device lack endpoints.
      
      Fixes: 53f3a9e2 ("mmc: USB SD Host Controller (USHC) driver")
      Cc: David Vrabel <david.vrabel@csr.com>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87ddf660
    • Eric Dumazet's avatar
      tcp: initialize icsk_ack.lrcvtime at session start time · 29d30b77
      Eric Dumazet authored
      commit 15bb7745 upstream.
      
      icsk_ack.lrcvtime has a 0 value at socket creation time.
      
      tcpi_last_data_recv can have bogus value if no payload is ever received.
      
      This patch initializes icsk_ack.lrcvtime for active sessions
      in tcp_finish_connect(), and for passive sessions in
      tcp_create_openreq_child()
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29d30b77
    • Daniel Borkmann's avatar
      socket, bpf: fix sk_filter use after free in sk_clone_lock · 2a5b25a8
      Daniel Borkmann authored
      commit 95aa915c upstream.
      
      In sk_clone_lock(), we create a new socket and inherit most of the
      parent's members via sock_copy() which memcpy()'s various sections.
      Now, in case the parent socket had a BPF socket filter attached,
      then newsk->sk_filter points to the same instance as the original
      sk->sk_filter.
      
      sk_filter_charge() is then called on the newsk->sk_filter to take a
      reference and should that fail due to hitting max optmem, we bail
      out and release the newsk instance.
      
      The issue is that commit 278571ba ("net: filter: simplify socket
      charging") wrongly combined the dismantle path with the failure path
      of xfrm_sk_clone_policy(). This means, even when charging failed, we
      call sk_free_unlock_clone() on the newsk, which then still points to
      the same sk_filter as the original sk.
      
      Thus, sk_free_unlock_clone() calls into __sk_destruct() eventually
      where it tests for present sk_filter and calls sk_filter_uncharge()
      on it, which potentially lets sk_omem_alloc wrap around and releases
      the eBPF prog and sk_filter structure from the (still intact) parent.
      
      Fix it by making sure that when sk_filter_charge() failed, we reset
      newsk->sk_filter back to NULL before passing to sk_free_unlock_clone(),
      so that we don't mess with the parents sk_filter.
      
      Only if xfrm_sk_clone_policy() fails, we did reach the point where
      either the parent's filter was NULL and as a result newsk's as well
      or where we previously had a successful sk_filter_charge(), thus for
      that case, we do need sk_filter_uncharge() to release the prior taken
      reference on sk_filter.
      
      Fixes: 278571ba ("net: filter: simplify socket charging")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2a5b25a8
    • Andrey Ulanov's avatar
      net: unix: properly re-increment inflight counter of GC discarded candidates · af60665c
      Andrey Ulanov authored
      commit 7df9c246 upstream.
      
      Dmitry has reported that a BUG_ON() condition in unix_notinflight()
      may be triggered by a simple code that forwards unix socket in an
      SCM_RIGHTS message.
      That is caused by incorrect unix socket GC implementation in unix_gc().
      
      The GC first collects list of candidates, then (a) decrements their
      "children's" inflight counter, (b) checks which inflight counters are
      now 0, and then (c) increments all inflight counters back.
      (a) and (c) are done by calling scan_children() with inc_inflight or
      dec_inflight as the second argument.
      
      Commit 6209344f ("net: unix: fix inflight counting bug in garbage
      collector") changed scan_children() such that it no longer considers
      sockets that do not have UNIX_GC_CANDIDATE flag. It also added a block
      of code that that unsets this flag _before_ invoking
      scan_children(, dec_iflight, ). This may lead to incorrect inflight
      counters for some sockets.
      
      This change fixes this bug by changing order of operations:
      UNIX_GC_CANDIDATE is now unset only after all inflight counters are
      restored to the original state.
      
        kernel BUG at net/unix/garbage.c:149!
        RIP: 0010:[<ffffffff8717ebf4>]  [<ffffffff8717ebf4>]
        unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149
        Call Trace:
         [<ffffffff8716cfbf>] unix_detach_fds.isra.19+0xff/0x170 net/unix/af_unix.c:1487
         [<ffffffff8716f6a9>] unix_destruct_scm+0xf9/0x210 net/unix/af_unix.c:1496
         [<ffffffff86a90a01>] skb_release_head_state+0x101/0x200 net/core/skbuff.c:655
         [<ffffffff86a9808a>] skb_release_all+0x1a/0x60 net/core/skbuff.c:668
         [<ffffffff86a980ea>] __kfree_skb+0x1a/0x30 net/core/skbuff.c:684
         [<ffffffff86a98284>] kfree_skb+0x184/0x570 net/core/skbuff.c:705
         [<ffffffff871789d5>] unix_release_sock+0x5b5/0xbd0 net/unix/af_unix.c:559
         [<ffffffff87179039>] unix_release+0x49/0x90 net/unix/af_unix.c:836
         [<ffffffff86a694b2>] sock_release+0x92/0x1f0 net/socket.c:570
         [<ffffffff86a6962b>] sock_close+0x1b/0x20 net/socket.c:1017
         [<ffffffff81a76b8e>] __fput+0x34e/0x910 fs/file_table.c:208
         [<ffffffff81a771da>] ____fput+0x1a/0x20 fs/file_table.c:244
         [<ffffffff81483ab0>] task_work_run+0x1a0/0x280 kernel/task_work.c:116
         [<     inline     >] exit_task_work include/linux/task_work.h:21
         [<ffffffff8141287a>] do_exit+0x183a/0x2640 kernel/exit.c:828
         [<ffffffff8141383e>] do_group_exit+0x14e/0x420 kernel/exit.c:931
         [<ffffffff814429d3>] get_signal+0x663/0x1880 kernel/signal.c:2307
         [<ffffffff81239b45>] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807
         [<ffffffff8100666a>] exit_to_usermode_loop+0x1ea/0x2d0
        arch/x86/entry/common.c:156
         [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
         [<ffffffff81009693>] syscall_return_slowpath+0x4d3/0x570
        arch/x86/entry/common.c:259
         [<ffffffff881478e6>] entry_SYSCALL_64_fastpath+0xc4/0xc6
      
      Link: https://lkml.org/lkml/2017/3/6/252Signed-off-by: default avatarAndrey Ulanov <andreyu@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Fixes: 6209344f ("net: unix: fix inflight counting bug in garbage collector")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af60665c
    • Eric Dumazet's avatar
      net: properly release sk_frag.page · 38ab6ca2
      Eric Dumazet authored
      commit 22a0e18e upstream.
      
      I mistakenly added the code to release sk->sk_frag in
      sk_common_release() instead of sk_destruct()
      
      TCP sockets using sk->sk_allocation == GFP_ATOMIC do no call
      sk_common_release() at close time, thus leaking one (order-3) page.
      
      iSCSI is using such sockets.
      
      Fixes: 5640f768 ("net: use a per task frag allocator")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      38ab6ca2