1. 18 May, 2017 2 commits
    • Daniel Borkmann's avatar
      bpf: adjust verifier heuristics · 3c2ce60b
      Daniel Borkmann authored
      Current limits with regards to processing program paths do not
      really reflect today's needs anymore due to programs becoming
      more complex and verifier smarter, keeping track of more data
      such as const ALU operations, alignment tracking, spilling of
      PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for
      smarter matching of what LLVM generates.
      
      This also comes with the side-effect that we result in fewer
      opportunities to prune search states and thus often need to do
      more work to prove safety than in the past due to different
      register states and stack layout where we mismatch. Generally,
      it's quite hard to determine what caused a sudden increase in
      complexity, it could be caused by something as trivial as a
      single branch somewhere at the beginning of the program where
      LLVM assigned a stack slot that is marked differently throughout
      other branches and thus causing a mismatch, where verifier
      then needs to prove safety for the whole rest of the program.
      Subsequently, programs with even less than half the insn size
      limit can get rejected. We noticed that while some programs
      load fine under pre 4.11, they get rejected due to hitting
      limits on more recent kernels. We saw that in the vast majority
      of cases (90+%) pruning failed due to register mismatches. In
      case of stack mismatches, majority of cases failed due to
      different stack slot types (invalid, spill, misc) rather than
      differences in spilled registers.
      
      This patch makes pruning more aggressive by also adding markers
      that sit at conditional jumps as well. Currently, we only mark
      jump targets for pruning. For example in direct packet access,
      these are usually error paths where we bail out. We found that
      adding these markers, it can reduce number of processed insns
      by up to 30%. Another option is to ignore reg->id in probing
      PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning
      slightly as well by up to 7% observed complexity reduction as
      stand-alone. Meaning, if a previous path with register type
      PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then
      in the current state a PTR_TO_MAP_VALUE_OR_NULL register for
      the same map X must be safe as well. Last but not least the
      patch also adds a scheduling point and bumps the current limit
      for instructions to be processed to a more adequate value.
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3c2ce60b
    • David S. Miller's avatar
      ipv6: Check ip6_find_1stfragopt() return value properly. · 7dd7eb95
      David S. Miller authored
      Do not use unsigned variables to see if it returns a negative
      error or not.
      
      Fixes: 2423496a ("ipv6: Prevent overrun when parsing v6 header options")
      Reported-by: default avatarJulia Lawall <julia.lawall@lip6.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7dd7eb95
  2. 17 May, 2017 8 commits
    • Yonghong Song's avatar
      selftests/bpf: fix broken build due to types.h · 579f1d92
      Yonghong Song authored
      Commit 0a5539f6 ("bpf: Provide a linux/types.h override
      for bpf selftests.") caused a build failure for tools/testing/selftest/bpf
      because of some missing types:
          $ make -C tools/testing/selftests/bpf/
          ...
          In file included from /home/yhs/work/net-next/tools/testing/selftests/bpf/test_pkt_access.c:8:
          ../../../include/uapi/linux/bpf.h:170:3: error: unknown type name '__aligned_u64'
                          __aligned_u64   key;
          ...
          /usr/include/linux/swab.h:160:8: error: unknown type name '__always_inline'
          static __always_inline __u16 __swab16p(const __u16 *p)
          ...
      The type __aligned_u64 is defined in linux:include/uapi/linux/types.h.
      
      The fix is to copy missing type definition into
      tools/testing/selftests/bpf/include/uapi/linux/types.h.
      Adding additional include "string.h" resolves __always_inline issue.
      
      Fixes: 0a5539f6 ("bpf: Provide a linux/types.h override for bpf selftests.")
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      579f1d92
    • David S. Miller's avatar
      Merge branch 'bnxt_en-DCBX-fixes' · f917174c
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: DCBX fixes.
      
      2 bug fixes for the case where the NIC's firmware DCBX agent is enabled.
      With these fixes, we will return the proper information to lldpad.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f917174c
    • Michael Chan's avatar
      bnxt_en: Check status of firmware DCBX agent before setting DCB_CAP_DCBX_HOST. · f667724b
      Michael Chan authored
      Otherwise, all the host based DCBX settings from lldpad will fail if the
      firmware DCBX agent is running.
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f667724b
    • Michael Chan's avatar
      bnxt_en: Call bnxt_dcb_init() after getting firmware DCBX configuration. · 87fe6032
      Michael Chan authored
      In the current code, bnxt_dcb_init() is called too early before we
      determine if the firmware DCBX agent is running or not.  As a result,
      we are not setting the DCB_CAP_DCBX_HOST and DCB_CAP_DCBX_LLD_MANAGED
      flags properly to report to DCBNL.
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      87fe6032
    • Eric Dumazet's avatar
      net: fix compile error in skb_orphan_partial() · 9142e900
      Eric Dumazet authored
      If CONFIG_INET is not set, net/core/sock.c can not compile :
      
      net/core/sock.c: In function ‘skb_orphan_partial’:
      net/core/sock.c:1810:2: error: implicit declaration of function
      ‘skb_is_tcp_pure_ack’ [-Werror=implicit-function-declaration]
        if (skb_is_tcp_pure_ack(skb))
        ^
      
      Fix this by always including <net/tcp.h>
      
      Fixes: f6ba8d33 ("netem: fix skb_orphan_partial()")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Reported-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Reported-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9142e900
    • Craig Gallek's avatar
      ipv6: Prevent overrun when parsing v6 header options · 2423496a
      Craig Gallek authored
      The KASAN warning repoted below was discovered with a syzkaller
      program.  The reproducer is basically:
        int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP);
        send(s, &one_byte_of_data, 1, MSG_MORE);
        send(s, &more_than_mtu_bytes_data, 2000, 0);
      
      The socket() call sets the nexthdr field of the v6 header to
      NEXTHDR_HOP, the first send call primes the payload with a non zero
      byte of data, and the second send call triggers the fragmentation path.
      
      The fragmentation code tries to parse the header options in order
      to figure out where to insert the fragment option.  Since nexthdr points
      to an invalid option, the calculation of the size of the network header
      can made to be much larger than the linear section of the skb and data
      is read outside of it.
      
      This fix makes ip6_find_1stfrag return an error if it detects
      running out-of-bounds.
      
      [   42.361487] ==================================================================
      [   42.364412] BUG: KASAN: slab-out-of-bounds in ip6_fragment+0x11c8/0x3730
      [   42.365471] Read of size 840 at addr ffff88000969e798 by task ip6_fragment-oo/3789
      [   42.366469]
      [   42.366696] CPU: 1 PID: 3789 Comm: ip6_fragment-oo Not tainted 4.11.0+ #41
      [   42.367628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
      [   42.368824] Call Trace:
      [   42.369183]  dump_stack+0xb3/0x10b
      [   42.369664]  print_address_description+0x73/0x290
      [   42.370325]  kasan_report+0x252/0x370
      [   42.370839]  ? ip6_fragment+0x11c8/0x3730
      [   42.371396]  check_memory_region+0x13c/0x1a0
      [   42.371978]  memcpy+0x23/0x50
      [   42.372395]  ip6_fragment+0x11c8/0x3730
      [   42.372920]  ? nf_ct_expect_unregister_notifier+0x110/0x110
      [   42.373681]  ? ip6_copy_metadata+0x7f0/0x7f0
      [   42.374263]  ? ip6_forward+0x2e30/0x2e30
      [   42.374803]  ip6_finish_output+0x584/0x990
      [   42.375350]  ip6_output+0x1b7/0x690
      [   42.375836]  ? ip6_finish_output+0x990/0x990
      [   42.376411]  ? ip6_fragment+0x3730/0x3730
      [   42.376968]  ip6_local_out+0x95/0x160
      [   42.377471]  ip6_send_skb+0xa1/0x330
      [   42.377969]  ip6_push_pending_frames+0xb3/0xe0
      [   42.378589]  rawv6_sendmsg+0x2051/0x2db0
      [   42.379129]  ? rawv6_bind+0x8b0/0x8b0
      [   42.379633]  ? _copy_from_user+0x84/0xe0
      [   42.380193]  ? debug_check_no_locks_freed+0x290/0x290
      [   42.380878]  ? ___sys_sendmsg+0x162/0x930
      [   42.381427]  ? rcu_read_lock_sched_held+0xa3/0x120
      [   42.382074]  ? sock_has_perm+0x1f6/0x290
      [   42.382614]  ? ___sys_sendmsg+0x167/0x930
      [   42.383173]  ? lock_downgrade+0x660/0x660
      [   42.383727]  inet_sendmsg+0x123/0x500
      [   42.384226]  ? inet_sendmsg+0x123/0x500
      [   42.384748]  ? inet_recvmsg+0x540/0x540
      [   42.385263]  sock_sendmsg+0xca/0x110
      [   42.385758]  SYSC_sendto+0x217/0x380
      [   42.386249]  ? SYSC_connect+0x310/0x310
      [   42.386783]  ? __might_fault+0x110/0x1d0
      [   42.387324]  ? lock_downgrade+0x660/0x660
      [   42.387880]  ? __fget_light+0xa1/0x1f0
      [   42.388403]  ? __fdget+0x18/0x20
      [   42.388851]  ? sock_common_setsockopt+0x95/0xd0
      [   42.389472]  ? SyS_setsockopt+0x17f/0x260
      [   42.390021]  ? entry_SYSCALL_64_fastpath+0x5/0xbe
      [   42.390650]  SyS_sendto+0x40/0x50
      [   42.391103]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.391731] RIP: 0033:0x7fbbb711e383
      [   42.392217] RSP: 002b:00007ffff4d34f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   42.393235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbbb711e383
      [   42.394195] RDX: 0000000000001000 RSI: 00007ffff4d34f60 RDI: 0000000000000003
      [   42.395145] RBP: 0000000000000046 R08: 00007ffff4d34f40 R09: 0000000000000018
      [   42.396056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400aad
      [   42.396598] R13: 0000000000000066 R14: 00007ffff4d34ee0 R15: 00007fbbb717af00
      [   42.397257]
      [   42.397411] Allocated by task 3789:
      [   42.397702]  save_stack_trace+0x16/0x20
      [   42.398005]  save_stack+0x46/0xd0
      [   42.398267]  kasan_kmalloc+0xad/0xe0
      [   42.398548]  kasan_slab_alloc+0x12/0x20
      [   42.398848]  __kmalloc_node_track_caller+0xcb/0x380
      [   42.399224]  __kmalloc_reserve.isra.32+0x41/0xe0
      [   42.399654]  __alloc_skb+0xf8/0x580
      [   42.400003]  sock_wmalloc+0xab/0xf0
      [   42.400346]  __ip6_append_data.isra.41+0x2472/0x33d0
      [   42.400813]  ip6_append_data+0x1a8/0x2f0
      [   42.401122]  rawv6_sendmsg+0x11ee/0x2db0
      [   42.401505]  inet_sendmsg+0x123/0x500
      [   42.401860]  sock_sendmsg+0xca/0x110
      [   42.402209]  ___sys_sendmsg+0x7cb/0x930
      [   42.402582]  __sys_sendmsg+0xd9/0x190
      [   42.402941]  SyS_sendmsg+0x2d/0x50
      [   42.403273]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.403718]
      [   42.403871] Freed by task 1794:
      [   42.404146]  save_stack_trace+0x16/0x20
      [   42.404515]  save_stack+0x46/0xd0
      [   42.404827]  kasan_slab_free+0x72/0xc0
      [   42.405167]  kfree+0xe8/0x2b0
      [   42.405462]  skb_free_head+0x74/0xb0
      [   42.405806]  skb_release_data+0x30e/0x3a0
      [   42.406198]  skb_release_all+0x4a/0x60
      [   42.406563]  consume_skb+0x113/0x2e0
      [   42.406910]  skb_free_datagram+0x1a/0xe0
      [   42.407288]  netlink_recvmsg+0x60d/0xe40
      [   42.407667]  sock_recvmsg+0xd7/0x110
      [   42.408022]  ___sys_recvmsg+0x25c/0x580
      [   42.408395]  __sys_recvmsg+0xd6/0x190
      [   42.408753]  SyS_recvmsg+0x2d/0x50
      [   42.409086]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.409513]
      [   42.409665] The buggy address belongs to the object at ffff88000969e780
      [   42.409665]  which belongs to the cache kmalloc-512 of size 512
      [   42.410846] The buggy address is located 24 bytes inside of
      [   42.410846]  512-byte region [ffff88000969e780, ffff88000969e980)
      [   42.411941] The buggy address belongs to the page:
      [   42.412405] page:ffffea000025a780 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
      [   42.413298] flags: 0x100000000008100(slab|head)
      [   42.413729] raw: 0100000000008100 0000000000000000 0000000000000000 00000001800c000c
      [   42.414387] raw: ffffea00002a9500 0000000900000007 ffff88000c401280 0000000000000000
      [   42.415074] page dumped because: kasan: bad access detected
      [   42.415604]
      [   42.415757] Memory state around the buggy address:
      [   42.416222]  ffff88000969e880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   42.416904]  ffff88000969e900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   42.417591] >ffff88000969e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   42.418273]                    ^
      [   42.418588]  ffff88000969ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   42.419273]  ffff88000969ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   42.419882] ==================================================================
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarCraig Gallek <kraig@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2423496a
    • Ihar Hrachyshka's avatar
      neighbour: update neigh timestamps iff update is effective · 77d71233
      Ihar Hrachyshka authored
      It's a common practice to send gratuitous ARPs after moving an
      IP address to another device to speed up healing of a service. To
      fulfill service availability constraints, the timing of network peers
      updating their caches to point to a new location of an IP address can be
      particularly important.
      
      Sometimes neigh_update calls won't touch neither lladdr nor state, for
      example if an update arrives in locktime interval. The neigh->updated
      value is tested by the protocol specific neigh code, which in turn
      will influence whether NEIGH_UPDATE_F_OVERRIDE gets set in the
      call to neigh_update() or not. As a result, we may effectively ignore
      the update request, bailing out of touching the neigh entry, except that
      we still bump its timestamps inside neigh_update.
      
      This may be a problem for updates arriving in quick succession. For
      example, consider the following scenario:
      
      A service is moved to another device with its IP address. The new device
      sends three gratuitous ARP requests into the network with ~1 seconds
      interval between them. Just before the first request arrives to one of
      network peer nodes, its neigh entry for the IP address transitions from
      STALE to DELAY.  This transition, among other things, updates
      neigh->updated. Once the kernel receives the first gratuitous ARP, it
      ignores it because its arrival time is inside the locktime interval. The
      kernel still bumps neigh->updated. Then the second gratuitous ARP
      request arrives, and it's also ignored because it's still in the (new)
      locktime interval. Same happens for the third request. The node
      eventually heals itself (after delay_first_probe_time seconds since the
      initial transition to DELAY state), but it just wasted some time and
      require a new ARP request/reply round trip. This unfortunate behaviour
      both puts more load on the network, as well as reduces service
      availability.
      
      This patch changes neigh_update so that it bumps neigh->updated (as well
      as neigh->confirmed) only once we are sure that either lladdr or entry
      state will change). In the scenario described above, it means that the
      second gratuitous ARP request will actually update the entry lladdr.
      
      Ideally, we would update the neigh entry on the very first gratuitous
      ARP request. The locktime mechanism is designed to ignore ARP updates in
      a short timeframe after a previous ARP update was honoured by the kernel
      layer. This would require tracking timestamps for state transitions
      separately from timestamps when actual updates are received. This would
      probably involve changes in neighbour struct. Therefore, the patch
      doesn't tackle the issue of the first gratuitous APR ignored, leaving
      it for a follow-up.
      Signed-off-by: default avatarIhar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      77d71233
    • Ihar Hrachyshka's avatar
      arp: honour gratuitous ARP _replies_ · 23d268eb
      Ihar Hrachyshka authored
      When arp_accept is 1, gratuitous ARPs are supposed to override matching
      entries irrespective of whether they arrive during locktime. This was
      implemented in commit 56022a8f ("ipv4: arp: update neighbour address
      when a gratuitous arp is received and arp_accept is set")
      
      There is a glitch in the patch though. RFC 2002, section 4.6, "ARP,
      Proxy ARP, and Gratuitous ARP", defines gratuitous ARPs so that they can
      be either of Request or Reply type. Those Reply gratuitous ARPs can be
      triggered with standard tooling, for example, arping -A option does just
      that.
      
      This patch fixes the glitch, making both Request and Reply flavours of
      gratuitous ARPs to behave identically.
      
      As per RFC, if gratuitous ARPs are of Reply type, their Target Hardware
      Address field should also be set to the link-layer address to which this
      cache entry should be updated. The field is present in ARP over Ethernet
      but not in IEEE 1394. In this patch, I don't consider any broadcasted
      ARP replies as gratuitous if the field is not present, to conform the
      standard. It's not clear whether there is such a thing for IEEE 1394 as
      a gratuitous ARP reply; until it's cleared up, we will ignore such
      broadcasts. Note that they will still update existing ARP cache entries,
      assuming they arrive out of locktime time interval.
      Signed-off-by: default avatarIhar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      23d268eb
  3. 16 May, 2017 6 commits
  4. 15 May, 2017 14 commits
  5. 14 May, 2017 5 commits
  6. 13 May, 2017 5 commits
    • Linus Torvalds's avatar
      Linux 4.12-rc1 · 2ea659a9
      Linus Torvalds authored
      2ea659a9
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · cd636458
      Linus Torvalds authored
      Pull some more input subsystem updates from Dmitry Torokhov:
       "An updated xpad driver with a few more recognized device IDs, and a
        new psxpad-spi driver, allowing connecting Playstation 1 and 2 joypads
        via SPI bus"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: cros_ec_keyb - remove extraneous 'const'
        Input: add support for PlayStation 1/2 joypads connected via SPI
        Input: xpad - add USB IDs for Mad Catz Brawlstick and Razer Sabertooth
        Input: xpad - sync supported devices with xboxdrv
        Input: xpad - sort supported devices by USB ID
      cd636458
    • Linus Torvalds's avatar
      Merge tag 'upstream-4.12-rc1' of git://git.infradead.org/linux-ubifs · b53c4d5e
      Linus Torvalds authored
      Pull UBI/UBIFS updates from Richard Weinberger:
      
       - new config option CONFIG_UBIFS_FS_SECURITY
      
       - minor improvements
      
       - random fixes
      
      * tag 'upstream-4.12-rc1' of git://git.infradead.org/linux-ubifs:
        ubi: Add debugfs file for tracking PEB state
        ubifs: Fix a typo in comment of ioctl2ubifs & ubifs2ioctl
        ubifs: Remove unnecessary assignment
        ubifs: Fix cut and paste error on sb type comparisons
        ubi: fastmap: Fix slab corruption
        ubifs: Add CONFIG_UBIFS_FS_SECURITY to disable/enable security labels
        ubi: Make mtd parameter readable
        ubi: Fix section mismatch
      b53c4d5e
    • Linus Torvalds's avatar
      Merge branch 'for-linus-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml · ec059019
      Linus Torvalds authored
      Pull UML fixes from Richard Weinberger:
       "No new stuff, just fixes"
      
      * 'for-linus-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
        um: Add missing NR_CPUS include
        um: Fix to call read_initrd after init_bootmem
        um: Include kbuild.h instead of duplicating its macros
        um: Fix PTRACE_POKEUSER on x86_64
        um: Set number of CPUs
        um: Fix _print_addr()
      ec059019
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · 1251704a
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "15 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm, docs: update memory.stat description with workingset* entries
        mm: vmscan: scan until it finds eligible pages
        mm, thp: copying user pages must schedule on collapse
        dax: fix PMD data corruption when fault races with write
        dax: fix data corruption when fault races with write
        ext4: return to starting transaction in ext4_dax_huge_fault()
        mm: fix data corruption due to stale mmap reads
        dax: prevent invalidation of mapped DAX entries
        Tigran has moved
        mm, vmalloc: fix vmalloc users tracking properly
        mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin
        gcov: support GCC 7.1
        mm, vmstat: Remove spurious WARN() during zoneinfo print
        time: delete current_fs_time()
        hwpoison, memcg: forcibly uncharge LRU pages
      1251704a