1. 06 May, 2014 15 commits
    • fanotify: fix -EOVERFLOW with large files on 64-bit · 1e2ee49f
      Will Woods authored
      On 64-bit systems, O_LARGEFILE is automatically added to flags inside
      the open() syscall (also openat(), blkdev_open(), etc).  Userspace
      therefore defines O_LARGEFILE to be 0 - you can use it, but it's a
      no-op.  Everything should be O_LARGEFILE by default.
      
      But: when fanotify does create_fd() it uses dentry_open(), which skips
      all that.  And userspace can't set O_LARGEFILE in fanotify_init()
      because it's defined to 0.  So if fanotify gets an event regarding a
      large file, the read() will just fail with -EOVERFLOW.
      
      This patch adds O_LARGEFILE to fanotify_init()'s event_f_flags on 64-bit
      systems, using the same test as open()/openat()/etc.
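
The effect of the fix can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel patch: `SKETCH_O_LARGEFILE` is a placeholder bit (not the real flag value) and `effective_event_f_flags()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit for illustration only; NOT the real O_LARGEFILE value. */
#define SKETCH_O_LARGEFILE 0x8000u

/*
 * Hypothetical helper: compute the flags used when opening the event fd.
 * On a 64-bit system the large-file bit is forced on, mirroring what the
 * commit message says open()/openat() already do for ordinary opens.
 */
static unsigned int effective_event_f_flags(unsigned int requested,
                                            bool is_64bit)
{
    if (is_64bit)
        requested |= SKETCH_O_LARGEFILE;
    return requested;
}
```

With this shape, a caller that passes 0 on a 64-bit system still ends up with the large-file bit set, which is the behaviour the message describes.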
      
      Addresses https://bugzilla.redhat.com/show_bug.cgi?id=696821

      Signed-off-by: Will Woods <wwoods@redhat.com>
      Acked-by: Eric Paris <eparis@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slub: use sysfs'es release mechanism for kmem_cache · 41a21285
      Christoph Lameter authored
      debugobjects warning during netfilter exit:
      
          ------------[ cut here ]------------
          WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0()
          ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
          Modules linked in:
          CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G        W 3.11.0-next-20130906-sasha #3984
          Workqueue: netns cleanup_net
          Call Trace:
            dump_stack+0x52/0x87
            warn_slowpath_common+0x8c/0xc0
            warn_slowpath_fmt+0x46/0x50
            debug_print_object+0x8d/0xb0
            __debug_check_no_obj_freed+0xa5/0x220
            debug_check_no_obj_freed+0x15/0x20
            kmem_cache_free+0x197/0x340
            kmem_cache_destroy+0x86/0xe0
            nf_conntrack_cleanup_net_list+0x131/0x170
            nf_conntrack_pernet_exit+0x5d/0x70
            ops_exit_list+0x5e/0x70
            cleanup_net+0xfb/0x1c0
            process_one_work+0x338/0x550
            worker_thread+0x215/0x350
            kthread+0xe7/0xf0
            ret_from_fork+0x7c/0xb0
      
      Also during dcookie cleanup:
      
          WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0()
          ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
          Modules linked in:
          CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408
          Call Trace:
            dump_stack (lib/dump_stack.c:52)
            warn_slowpath_common (kernel/panic.c:430)
            warn_slowpath_fmt (kernel/panic.c:445)
            debug_print_object (lib/debugobjects.c:262)
            __debug_check_no_obj_freed (lib/debugobjects.c:697)
            debug_check_no_obj_freed (lib/debugobjects.c:726)
            kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717)
            kmem_cache_destroy (mm/slab_common.c:363)
            dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343)
            event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153)
            __fput (fs/file_table.c:217)
            ____fput (fs/file_table.c:253)
            task_work_run (kernel/task_work.c:125 (discriminator 1))
            do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751)
            int_signal (arch/x86/kernel/entry_64.S:807)
      
      Sysfs has a release mechanism.  Use that to release the kmem_cache
      structure if CONFIG_SYSFS is enabled.
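
The general idea of a release mechanism (free the object only when the last reference is dropped) can be sketched as follows. This is a user-space illustration with hypothetical names (`struct ref_obj`, `ref_get()`, `ref_put()`); it is not the sysfs or slub code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reference-counted object with a release callback. */
struct ref_obj {
    int refcount;
    void (*release)(struct ref_obj *obj);
    bool released;
};

/* Example release callback: only records that the release has run. */
static void mark_released(struct ref_obj *obj)
{
    obj->released = true;
}

static void ref_get(struct ref_obj *obj)
{
    obj->refcount++;
}

/* Drop one reference; the release callback runs only on the last put. */
static void ref_put(struct ref_obj *obj)
{
    assert(obj->refcount > 0);
    if (--obj->refcount == 0)
        obj->release(obj);
}
```

The point of the pattern is that teardown is deferred until no other user still holds the object, rather than being done unconditionally by whoever asks for destruction first.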
      
      Only slub is changed - slab currently only supports /proc/slabinfo and
      not /sys/kernel/slab/*.  We talked about adding that and someone was
      working on it.
      
      [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build]
      [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more]
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Acked-by: Greg KH <greg@kroah.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • revert "mm: vmscan: do not swap anon pages just because free+file is low" · 62376251
      Johannes Weiner authored
      This reverts commit 0bf1457f ("mm: vmscan: do not swap anon pages
      just because free+file is low") because it introduced a regression in
      mostly-anonymous workloads, where reclaim would become ineffective and
      trap every allocating task in direct reclaim.
      
      The problem is that there is a runaway feedback loop in the scan balance
      between file and anon, where the balance tips heavily towards a tiny
      thrashing file LRU and anonymous pages are no longer being looked at.
      The commit in question removed the safeguard that would detect such
      situations and respond with forced anonymous reclaim.
      
      This commit was part of a series to fix premature swapping in loads with
      relatively little cache, and while it made a small difference, the cure
      is obviously worse than the disease.  Revert it.

      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Rafael Aquini <aquini@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: <stable@kernel.org>		[3.12+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • autofs: fix lockref lookup · 6b6751f7
      Ian Kent authored
      autofs needs to be able to see private data dentry flags for its
      dentries that are being created but not yet hashed and for its dentries
      that have been rmdir()ed but not yet freed.  It needs to do this so it
      can block processes in these states until a status has been returned to
      indicate the given operation is complete.

      It does this by keeping two lists, active and expiring, of dentries in
      this state and uses ->d_release() to keep them stable while it checks
      the reference count to determine if they should be used.

      But with the recent lockref changes, dentries being freed sometimes
      don't transition to a reference count of 0 before being freed, so autofs
      can occasionally use a dentry that is invalid, which can lead to a panic.

      Signed-off-by: Ian Kent <raven@themaw.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: filemap: update find_get_pages_tag() to deal with shadow entries · 139b6a6f
      Johannes Weiner authored
      Dave Jones reports the following crash when find_get_pages_tag() runs
      into an exceptional entry:
      
        kernel BUG at mm/filemap.c:1347!
        RIP: find_get_pages_tag+0x1cb/0x220
        Call Trace:
          find_get_pages_tag+0x36/0x220
          pagevec_lookup_tag+0x21/0x30
          filemap_fdatawait_range+0xbe/0x1e0
          filemap_fdatawait+0x27/0x30
          sync_inodes_sb+0x204/0x2a0
          sync_inodes_one_sb+0x19/0x20
          iterate_supers+0xb2/0x110
          sys_sync+0x44/0xb0
          ia32_do_call+0x13/0x13
      
        1343                         /*
        1344                          * This function is never used on a shmem/tmpfs
        1345                          * mapping, so a swap entry won't be found here.
        1346                          */
        1347                         BUG();
      
      After commit 0cd6144a ("mm + fs: prepare for non-page entries in
      page cache radix trees") this comment and BUG() are out of date because
      exceptional entries can now appear in all mappings - as shadows of
      recently evicted pages.
      
      However, as Hugh Dickins notes,
      
        "it is truly surprising for a PAGECACHE_TAG_WRITEBACK (and probably
         any other PAGECACHE_TAG_*) to appear on an exceptional entry.
      
         I expect it comes down to an occasional race in RCU lookup of the
         radix_tree: lacking absolute synchronization, we might sometimes
         catch an exceptional entry, with the tag which really belongs with
         the unexceptional entry which was there an instant before."
      
      And indeed, not only is the tree walk lockless, the tags are also read
      in chunks, one radix tree node at a time.  There is plenty of time for
      page reclaim to swoop in and replace a page that was already looked up
      as tagged with a shadow entry.
      
      Remove the BUG() and update the comment.  While reviewing all other
      lookup sites for whether they properly deal with shadow entries of
      evicted pages, update all the comments and fix memcg file charge moving
      to not miss shmem/tmpfs swapcache pages.
      
      Fixes: 0cd6144a ("mm + fs: prepare for non-page entries in page cache radix trees")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Dave Jones <davej@redhat.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/compaction: make isolate_freepages start at pageblock boundary · 49e068f0
      Vlastimil Babka authored
      The compaction freepage scanner implementation in isolate_freepages()
      starts by taking the current cc->free_pfn value as the first pfn.  In a
      for loop, it scans from this first pfn to the end of the pageblock, and
      then subtracts pageblock_nr_pages from the first pfn to obtain the first
      pfn for the next for loop iteration.
      
      This means that when cc->free_pfn starts at offset X rather than being
      aligned on a pageblock boundary, the scanner will start at offset X in
      every scanned pageblock, ignoring potentially many free pages.
      Currently this can happen when
      
       a) zone's end pfn is not pageblock aligned, or
      
       b) through zone->compact_cached_free_pfn with CONFIG_HOLES_IN_ZONE
          enabled and a hole spanning the beginning of a pageblock
      
      This patch fixes the problem by aligning the initial pfn in
      isolate_freepages() to pageblock boundary.  This also permits replacing
      the end-of-pageblock alignment within the for loop with a simple
      pageblock_nr_pages increment.
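
The alignment step can be sketched in plain C. This is a minimal illustration, assuming the pageblock size is a power of two; `pageblock_start_sketch()` and `skipped_per_pageblock()` are hypothetical names, not functions from mm/compaction.c.

```c
#include <assert.h>

/* Round pfn down to the start of its pageblock (nr_pages: power of two). */
static unsigned long pageblock_start_sketch(unsigned long pfn,
                                            unsigned long nr_pages)
{
    assert(nr_pages != 0 && (nr_pages & (nr_pages - 1)) == 0);
    return pfn & ~(nr_pages - 1);
}

/*
 * Number of pfns at the front of each pageblock that a scan starting at
 * the unaligned offset would never look at.
 */
static unsigned long skipped_per_pageblock(unsigned long free_pfn,
                                           unsigned long nr_pages)
{
    return free_pfn - pageblock_start_sketch(free_pfn, nr_pages);
}
```

For example, with 512-page pageblocks and a starting pfn of 1000, the aligned start is 512, so an unaligned scan would skip 488 pfns in every pageblock it visits.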

      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Heesub Shin <heesub.shin@samsung.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Dongjun Shin <d.j.shin@samsung.com>
      Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • MAINTAINERS: zswap/zbud: change maintainer email address · 0e3b7e54
      Seth Jennings authored
      sjenning@linux.vnet.ibm.com is no longer a viable entity.

      Signed-off-by: Seth Jennings <sjennings@variantweb.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page-writeback.c: fix divide by zero in pos_ratio_polynom · d5c9fde3
      Rik van Riel authored
      It is possible for "limit - setpoint + 1" to equal zero, after getting
      truncated to a 32 bit variable, and resulting in a divide by zero error.
      
      Using the fully 64 bit divide functions avoids this problem.  It also
      will cause pos_ratio_polynom() to return the correct value when
      (setpoint - limit) exceeds 2^32.
      
      Also uninline pos_ratio_polynom, at Andrew's request.
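
The truncation hazard described above can be reproduced in user space. This is a sketch, not the kernel code: `divisor_truncated_to_32()` and `divide_full_64()` are hypothetical helpers that only model a 32-bit versus a 64-bit divisor.

```c
#include <assert.h>
#include <stdint.h>

/* The divisor as a routine taking a 32-bit divisor would see it. */
static uint32_t divisor_truncated_to_32(uint64_t limit, uint64_t setpoint)
{
    return (uint32_t)(limit - setpoint + 1);
}

/* Full 64-bit division: the divisor keeps all of its bits. */
static int64_t divide_full_64(int64_t numerator, uint64_t limit,
                              uint64_t setpoint)
{
    int64_t divisor = (int64_t)(limit - setpoint + 1);

    assert(divisor != 0);
    return numerator / divisor;
}
```

When `limit - setpoint + 1` is exactly 2^32 (for instance limit = 0x100000004 and setpoint = 5), the truncated divisor is 0, while the 64-bit divisor is still 2^32 and the division is well defined.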

      Signed-off-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: ensure hugepage access is denied if hugepages are not supported · 457c1b27
      Nishanth Aravamudan authored
      Currently, I am seeing the following when I `mount -t hugetlbfs /none
      /dev/hugetlbfs`, and then simply do a `ls /dev/hugetlbfs`.  I think it's
      related to the fact that hugetlbfs is probably not correctly setting
      itself up in this state:
      
        Unable to handle kernel paging request for data at address 0x00000031
        Faulting instruction address: 0xc000000000245710
        Oops: Kernel access of bad area, sig: 11 [#1]
        SMP NR_CPUS=2048 NUMA pSeries
        ....
      
      In KVM guests on Power, in a guest not backed by hugepages, we see the
      following:
      
        AnonHugePages:         0 kB
        HugePages_Total:       0
        HugePages_Free:        0
        HugePages_Rsvd:        0
        HugePages_Surp:        0
        Hugepagesize:         64 kB
      
      HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages
      are not supported at boot-time, but this is only checked in
      hugetlb_init().  Extract the check to a helper function, and use it in a
      few relevant places.
      
      This does make hugetlbfs not supported (not registered at all) in this
      environment.  I believe this is fine, as there are no valid hugepages
      and that won't change at runtime.
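
The "extract the check to a helper and call it early" idea might look roughly like the sketch below. All names here (`hugepages_supported_sketch()`, `hugetlbfs_init_sketch()`) and the -1 return value are assumptions made for illustration; they are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: a zero shift means hugepages are unsupported. */
static bool hugepages_supported_sketch(unsigned int hpage_shift)
{
    return hpage_shift != 0;
}

/*
 * Hypothetical init path: skip registration entirely when unsupported.
 * Returns 0 when registered, -1 (placeholder error) when skipped.
 */
static int hugetlbfs_init_sketch(unsigned int hpage_shift, bool *registered)
{
    assert(registered != NULL);
    if (!hugepages_supported_sketch(hpage_shift)) {
        *registered = false;
        return -1;
    }
    *registered = true;
    return 0;
}
```

Putting the test in one helper means every entry point can ask the same question instead of each repeating, or forgetting, the shift check.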
      
      [akpm@linux-foundation.org: use pr_info(), per Mel]
      [akpm@linux-foundation.org: fix build when HPAGE_SHIFT is undefined]
      Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slub: fix memcg_propagate_slab_attrs · 93030d83
      Vladimir Davydov authored
      After creating a cache for a memcg we should initialize its sysfs attrs
      with the values from its parent.  That's what memcg_propagate_slab_attrs
      is for.  Currently it's broken - we clearly muddled root-vs-memcg caches
      there.  Let's fix it up.

      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/rtc/rtc-pcf8523.c: fix month definition · 35738392
      Chris Cui authored
      PCF8523 uses 1-12 to represent month according to datasheet.
      link: www.nxp.com/documents/data_sheet/PCF8523.pdf.
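
A month conversion of this kind can be sketched as below. It assumes a zero-based software month (0-11, as in `tm_mon`), a BCD-encoded device register holding 1-12, and a 5-bit month field (mask 0x1f); the helper names are hypothetical and this is not the driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a value 0..99 as packed BCD. */
static uint8_t bin_to_bcd(unsigned int value)
{
    assert(value < 100);
    return (uint8_t)(((value / 10) << 4) | (value % 10));
}

/* Decode a packed BCD byte. */
static unsigned int bcd_to_bin(uint8_t bcd)
{
    return (unsigned int)(bcd >> 4) * 10 + (bcd & 0x0f);
}

/* Zero-based month (0..11) to a 1..12 BCD register value. */
static uint8_t month_to_reg(int tm_mon)
{
    assert(tm_mon >= 0 && tm_mon <= 11);
    return bin_to_bcd((unsigned int)tm_mon + 1);
}

/* 1..12 BCD register value back to a zero-based month. */
static int reg_to_month(uint8_t reg)
{
    return (int)bcd_to_bin(reg & 0x1f) - 1;
}
```

Under these assumptions December (month index 11) is written as 0x12 and read back as 11; dropping the +1/-1 adjustment shifts every month by one.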

      Signed-off-by: Chris Cui <chris.wei.cui@gmail.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 256cf4c4
      Linus Torvalds authored
      Pull fuse fixes from Miklos Szeredi:
       "This adds ctime update in the new cached writeback mode and also
        fixes/simplifies the mtime update handling.  Support for rename flags
        (aka renameat2) is also added to the userspace API"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
        fuse: add renameat2 support
        fuse: clear MS_I_VERSION
        fuse: clear FUSE_I_CTIME_DIRTY flag on setattr
        fuse: trust kernel i_ctime only
        fuse: remove .update_time
        fuse: allow ctime flushing to userspace
        fuse: fuse: add time_gran to INIT_OUT
        fuse: add .write_inode
        fuse: clean up fsync
        fuse: fuse: fallocate: use file_update_time()
        fuse: update mtime on open(O_TRUNC) in atomic_o_trunc mode
        fuse: update mtime on truncate(2)
        fuse: do not use uninitialized i_mode
        fuse: fix mtime update error in fsync
        fuse: check fallocate mode
        fuse: add __exit to fuse_ctl_cleanup
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 9bd29c56
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "I've been auditing the THP support on sparc64 and found several bugs,
        hopefully most of which are fixed completely here.
      
        Also an RT kernel locking fix from Kirill Tkhai"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Give more detailed information in {pgd,pmd}_ERROR() and kill pte_ERROR().
        sparc64: Add basic validations to {pud,pmd}_bad().
        sparc64: Use 'ILOG2_4MB' instead of constant '22'.
        sparc64: Fix range check in kern_addr_valid().
        sparc64: Fix top-level fault handling bugs.
        sparc64: Handle 32-bit tasks properly in compute_effective_address().
        sparc64: Don't use _PAGE_PRESENT in pte_modify() mask.
        sparc64: Fix hex values in comment above pte_modify().
        sparc64: Fix bugs in get_user_pages_fast() wrt. THP.
        sparc64: Fix huge PMD invalidation.
        sparc64: Fix executable bit testing in set_pmd_at() paths.
        sparc64: Normalize NMI watchdog logging and behavior.
        sparc64: Make itc_sync_lock raw
        sparc64: Fix argument sign extension for compat_sys_futex().
    • slab: Fix off by one in object max number tests. · 30321c7b
      David Miller authored
      If freelist_idx_t is a byte, SLAB_OBJ_MAX_NUM should be 255 not 256, and
      likewise if freelist_idx_t is a short, then it should be 65535 not
      65536.
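
The arithmetic behind this can be written out directly: an unsigned index of n bytes has 2^(8n) distinct values, so the largest value it can hold is 2^(8n) - 1. The helper name below is hypothetical and the code is only a sketch of that calculation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Largest value representable by an unsigned index of index_bytes bytes. */
static unsigned long max_index_value(size_t index_bytes)
{
    /* Keep the shift strictly below the width of unsigned long. */
    assert(index_bytes > 0 && index_bytes < sizeof(unsigned long));
    return (1UL << (index_bytes * CHAR_BIT)) - 1;
}
```

This gives 255 for a one-byte index and 65535 for a two-byte index, matching the corrected limits named in the message.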
      
      This was leading to all kinds of random crashes on sparc64 where
      PAGE_SIZE is 8192.  One problem shown was that if spinlock debugging was
      enabled, we'd get deadlocks in copy_pte_range() or do_wp_page() with the
      same cpu already holding a lock it shouldn't hold, or the lock belonging
      to a completely unrelated process.
      
      Fixes: a41adfaa ("slab: introduce byte sized index for the freelist of a slab")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slab: fix the type of the index on freelist index accessor · 7cc68973
      Joonsoo Kim authored
      Commit a41adfaa ("slab: introduce byte sized index for the freelist
      of a slab") changes the size of the freelist index and also changes the
      prototype of the accessor functions for the freelist index.  In doing
      so, it made a mistake.

      The mistake is that although it changes the size of the freelist index
      correctly, it changes the type of the index used to look up a freelist
      index incorrectly.  With that patch, a freelist index can be 1 byte or
      2 bytes, which means that the number of objects on a slab can be more
      than 255.  So we need more than 1 byte for the index used to find the
      entry of a free object on the freelist.  But the above patch makes this
      index type 1 byte, so a slab which has more than 255 objects cannot
      work properly and, as a consequence, the system cannot boot.
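
The failure mode can be shown with two accessors that differ only in the type of their index parameter. This is a user-space sketch with hypothetical names (`freelist_entry_t`, `get_entry_narrow()`, `get_entry_wide()`), not the slab code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t freelist_entry_t;   /* hypothetical 2-byte freelist entry */

/* Fill the freelist so that slot i holds the value i. */
static void fill_identity(freelist_entry_t *freelist, unsigned int count)
{
    for (unsigned int i = 0; i < count; i++)
        freelist[i] = (freelist_entry_t)i;
}

/* Index parameter is one byte wide: positions above 255 wrap around. */
static freelist_entry_t get_entry_narrow(const freelist_entry_t *freelist,
                                         uint8_t idx)
{
    return freelist[idx];
}

/* Index parameter is wide enough for every position on the slab. */
static freelist_entry_t get_entry_wide(const freelist_entry_t *freelist,
                                       unsigned int idx)
{
    return freelist[idx];
}
```

A caller that wants slot 300 gets slot 300 from the wide accessor, but the narrow accessor sees 300 converted to 44 (300 modulo 256) and silently returns the wrong entry.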
      
      This issue was reported by Steven King on m68knommu which would use
      2 bytes freelist index:
      
        https://lkml.org/lkml/2014/4/16/433
      
      The fix is easy.  Changing the type of the index argument of the
      freelist index accessor functions is enough to fix this bug.  Although
      2 bytes would be enough, I use 4 bytes since it has no bad effect and
      makes things easier.  This fix was suggested and tested by Steven in his
      original report.

      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Reported-and-acked-by: Steven King <sfking@fdwdc.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Tested-by: James Hogan <james.hogan@imgtec.com>
      Tested-by: David Miller <davem@davemloft.net>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 05 May, 2014 25 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 2080cee4
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) e1000e computes header length incorrectly wrt vlans, fix from Vlad
          Yasevich.
      
       2) ns_capable() check in sock_diag netlink code, from Andrew
          Lutomirski.
      
       3) Fix invalid queue pairs handling in virtio_net, from Amos Kong.
      
       4) Checksum offloading busted in sxgbe driver due to incorrect
          descriptor layout, fix from Byungho An.
      
       5) Fix build failure with SMC_DEBUG set to 2 or larger, from Zi Shen
          Lim.
      
       6) Fix uninitialized A and X registers in BPF interpreter, from Alexei
          Starovoitov.
      
       7) Fix arch dependencies of cadence driver.
      
       8) Fix netlink capabilities checking tree-wide, from Eric W Biederman.
      
       9) Don't dump IFLA_VF_PORTS if netlink request didn't ask for it in
          IFLA_EXT_MASK, from David Gibson.
      
      10) IPV6 FIB dump restart doesn't handle table changes that happen
          meanwhile, causing the code to loop forever or emit dups, fix from
          Kumar Sandararajan.
      
      11) Memory leak on VF removal in bnx2x, from Yuval Mintz.
      
      12) Bug fixes for new Altera TSE driver from Vince Bridgers.
      
      13) Fix route lookup key in SCTP, from Xugeng Zhang.
      
      14) Use BH blocking spinlocks in SLIP, as per a similar fix to CAN/SLCAN
          driver.  From Oliver Hartkopp.
      
      15) TCP doesn't bump retransmit counters in some code paths, fix from
          Eric Dumazet.
      
      16) Clamp delayed_ack in tcp_cubic to prevent theoretical divides by
          zero.  Fix from Liu Yu.
      
      17) Fix locking imbalance in error paths of HHF packet scheduler, from
          John Fastabend.
      
      18) Properly reference the transport module when vsock_core_init() runs,
          from Andy King.
      
      19) Fix buffer overflow in cdc_ncm driver, from Bjørn Mork.
      
      20) IP_ECN_decapsulate() doesn't see a correct SKB network header in
          ip_tunnel_rcv(), fix from Ying Cai.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits)
        net: macb: Fix race between HW and driver
        net: macb: Remove 'unlikely' optimization
        net: macb: Re-enable RX interrupt only when RX is done
        net: macb: Clear interrupt flags
        net: macb: Pass same size to DMA_UNMAP as used for DMA_MAP
        ip_tunnel: Set network header properly for IP_ECN_decapsulate()
        e1000e: Restrict MDIO Slow Mode workaround to relevant parts
        e1000e: Fix issue with link flap on 82579
        e1000e: Expand workaround for 10Mb HD throughput bug
        e1000e: Workaround for dropped packets in Gig/100 speeds on 82579
        net/mlx4_core: Don't issue PCIe speed/width checks for VFs
        net/mlx4_core: Load the Eth driver first
        net/mlx4_core: Fix slave id computation for single port VF
        net/mlx4_core: Adjust port number in qp_attach wrapper when detaching
        net: cdc_ncm: fix buffer overflow
        Altera TSE: ALTERA_TSE should depend on HAS_DMA
        vsock: Make transport the proto owner
        net: sched: lock imbalance in hhf qdisc
        net: mvmdio: Check for a valid interrupt instead of an error
        net phy: Check for aneg completion before setting state to PHY_RUNNING
        ...
    • Merge tag 'usb-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 783e9e8e
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small fixes and device ids for 3.15-rc4.
      
        All have been in linux-next just fine"
      
      * tag 'usb-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        USB: Nokia 5300 should be treated as unusual dev
        USB: Nokia 305 should be treated as unusual dev
        fsl-usb: do not test for PHY_CLK_VALID bit on controller version 1.6
        usb: storage: shuttle_usbat: fix discs being detected twice
        usb: qcserial: add a number of Dell devices
        USB: OHCI: fix problem with global suspend on ATI controllers
        usb: gadget: at91-udc: fix irq and iomem resource retrieval
        usb: phy: fsm: change "|" to "||" for condition OTG_STATE_A_WAIT_BCON at statemachine
        usb: phy: fsm: update OTG HNP state transition
    • Merge tag 'tty-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · df862f62
      Linus Torvalds authored
      Pull tty/serial fixes from Greg KH:
       "Here are some tty and serial driver fixes for things reported
        recently"
      
      * tag 'tty-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        tty: Fix lockless tty buffer race
        Revert "tty: Fix race condition between __tty_buffer_request_room and flush_to_ldisc"
        drivers/tty/hvc: don't free hvc_console_setup after init
        n_tty: Fix n_tty_write crash when echoing in raw mode
        tty: serial: 8250_core.c Bug fix for Exar chips.
    • Merge tag 'staging-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · a1e74464
      Linus Torvalds authored
      Pull staging / iio fixes from Greg KH:
       "Here are some small IIO driver fixes for 3.15-rc4 that resolve some
        reported issues"
      
      * tag 'staging-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        iio: adc: Nothing in ADC should be a bool CONFIG
        iio: exynos_adc: use indio_dev->dev structure to handle child nodes
        iio:imu:mpu6050: Fixed segfault in Invensens MPU driver due to null dereference
        staging:iio:ad2s1200 fix missing parenthesis in a for statment.
    • Merge tag 'xtensa-next-20140503' of git://github.com/czankel/xtensa-linux · 03787ff6
      Linus Torvalds authored
      Pull Xtensa fixes from Chris Zankel:
       - Fixes allmodconfig, allnoconfig builds
       - Adds highmem support
       - Enables build-time exception table sorting.
      
      * tag 'xtensa-next-20140503' of git://github.com/czankel/xtensa-linux:
        xtensa: ISS: don't depend on CONFIG_TTY
        xtensa: xt2000: drop redundant sysmem initialization
        xtensa: add support for KC705
        xtensa: xtfpga: introduce SoC I/O bus
        xtensa: add HIGHMEM support
        xtensa: optimize local_flush_tlb_kernel_range
        xtensa: dump sysmem from the bootmem_init
        xtensa: handle memmap kernel option
        xtensa: keep sysmem banks ordered in mem_reserve
        xtensa: keep sysmem banks ordered in add_sysmem_bank
        xtensa: split bootparam and kernel meminfo
        xtensa: enable sorting extable at build time
        xtensa: export __{invalidate,flush}_dcache_range
        xtensa: Export __invalidate_icache_range
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 5575eeb7
      Linus Torvalds authored
      Pull Ceph fixes from Sage Weil:
       "First, there is a critical fix for the new primary-affinity function
        that went into -rc1.
      
        The second batch of patches from Zheng fix a range of problems with
        directory fragmentation, readdir, and a few odds and ends for cephfs"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
        ceph: reserve caps for file layout/lock MDS requests
        ceph: avoid releasing caps that are being used
        ceph: clear directory's completeness when creating file
        libceph: fix non-default values check in apply_primary_affinity()
        ceph: use fpos_cmp() to compare dentry positions
        ceph: check directory's completeness before emitting directory entry
    • net: macb: Fix race between HW and driver · c8ea5a22
      Soren Brinkmann authored
      Under "heavy" RX load, the driver cannot handle the descriptors fast
      enough. In detail, when a descriptor is consumed, its used flag is
      cleared and once the RX budget is consumed all descriptors with a
      cleared used flag are prepared to receive more data. Under load though,
      the HW may constantly receive more data and use those descriptors with a
      cleared used flag before they are actually prepared for next usage.
      
      The head and tail pointers into the RX-ring should always be valid and
      we can omit clearing and checking of the used flag.
      Signed-off-by: default avatarSoren Brinkmann <soren.brinkmann@xilinx.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c8ea5a22
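A minimal sketch of the idea behind c8ea5a22, assuming a ring whose slot ownership is tracked purely by two free-running indices, so that no per-descriptor "used" flag has to be cleared or checked. The names `rx_ring`, `prepared_head`, `tail`, `RING_SIZE` and the helper functions are hypothetical and are not taken from the macb driver.

```c
#include <assert.h>

#define RING_SIZE 8u	/* hypothetical ring size */

struct rx_ring {
	unsigned int prepared_head;	/* next slot the driver will hand to the HW */
	unsigned int tail;		/* next slot the driver will process */
};

/*
 * Slots in [tail, prepared_head) have been handed to the hardware;
 * anything outside that window still needs a fresh buffer.
 */
static unsigned int ring_prepared(const struct rx_ring *r)
{
	return r->prepared_head - r->tail;	/* unsigned wrap-around is fine */
}

/* Driver side: the frame in slot `tail` has been processed. */
static void ring_consume(struct rx_ring *r)
{
	assert(ring_prepared(r) > 0);
	r->tail++;
}

/* Driver side: hand every free slot back to the hardware. */
static unsigned int ring_refill(struct rx_ring *r)
{
	unsigned int refilled = 0;

	while (ring_prepared(r) < RING_SIZE) {
		r->prepared_head++;
		refilled++;
	}
	return refilled;
}
```

In this model the hardware can never pick up a slot that has not been refilled, because a slot only enters the window when `ring_refill()` advances `prepared_head`.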
    • Soren Brinkmann's avatar
      net: macb: Remove 'unlikely' optimization · 504ad98d
      Soren Brinkmann authored
      Coverage data suggests that the unlikely case of receiving data while
      the receive handler is running may not be that unlikely.
      Coverage data after running iperf for a while:
          91320:  891:	work_done = bp->macbgem_ops.mog_rx(bp, budget);
          91320:  892:	if (work_done < budget) {
           2362:  893:		napi_complete(napi);
              -:  894:
              -:  895:		/* Packets received while interrupts were disabled */
           4724:  896:		status = macb_readl(bp, RSR);
           2362:  897:		if (unlikely(status)) {
            762:  898:			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
            762:  899:				macb_writel(bp, ISR, MACB_BIT(RCOMP));
              -:  900:			napi_reschedule(napi);
              -:  901:		} else {
           1600:  902:			macb_writel(bp, IER, MACB_RX_INT_FLAGS);
              -:  903:		}
              -:  904:	}
      Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      504ad98d
    • Soren Brinkmann's avatar
      net: macb: Re-enable RX interrupt only when RX is done · 02f7a34f
      Soren Brinkmann authored
      When data is received while the driver is processing received data,
      NAPI is rescheduled. In that case the RX interrupt should not be
      re-enabled.
      Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      02f7a34f
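The control flow described in 504ad98d and 02f7a34f can be sketched in user space as follows. This is a toy model under stated assumptions, not the driver's code: `fake_nic`, `fake_rx()` and `fake_poll()` are hypothetical stand-ins for the device state, the RX handler and the NAPI poll callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not the macb driver's types. */
struct fake_nic {
	int pending;		/* frames waiting in the RX ring */
	int late_arrivals;	/* frames that arrive while RX is being processed */
	bool rx_irq_enabled;
	bool rescheduled;
};

/* Process up to `budget` frames; late frames show up after processing. */
static int fake_rx(struct fake_nic *nic, int budget)
{
	int done = nic->pending < budget ? nic->pending : budget;

	nic->pending -= done;
	nic->pending += nic->late_arrivals;
	nic->late_arrivals = 0;
	return done;
}

static int fake_poll(struct fake_nic *nic, int budget)
{
	int work_done;

	assert(budget > 0);
	nic->rescheduled = false;
	work_done = fake_rx(nic, budget);

	if (work_done < budget) {
		if (nic->pending > 0)
			nic->rescheduled = true;	/* poll again, IRQ stays off */
		else
			nic->rx_irq_enabled = true;	/* RX really is done */
	}
	return work_done;
}
```

The point of the `else` branch is that the interrupt is re-enabled only on the path where no further polling has been requested.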
    • Soren Brinkmann's avatar
      net: macb: Clear interrupt flags · 6a027b70
      Soren Brinkmann authored
      A few interrupt flags were not cleared in the ISR, resulting in a system
      trapped in the ISR in case one of those interrupts occurred. Clear all
      flags to avoid such situations.
      Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6a027b70
    • Soren Brinkmann's avatar
      net: macb: Pass same size to DMA_UNMAP as used for DMA_MAP · ccd6d0a9
      Soren Brinkmann authored
      Just as in commit "net: macb: DMA-unmap full rx-buffer" (48330e08),
      pass the size that was used for mapping the memory to the unmap
      routine as well, to avoid warnings from the DMA_API.
      Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ccd6d0a9
    • Ying Cai's avatar
      ip_tunnel: Set network header properly for IP_ECN_decapsulate() · e96f2e7c
      Ying Cai authored
      In ip_tunnel_rcv(), set skb->network_header to inner IP header
      before IP_ECN_decapsulate().
      
      Without the fix, IP_ECN_decapsulate() takes outer IP header as
      inner IP header, possibly causing error messages or packet drops.
      
      Note that this skb_reset_network_header() call was in this spot when
      the original feature for checking consistency of ECN bits through
      tunnels was added in eccc1bb8 ("tunnel: drop packet if ECN present
      with not-ECT"). It was only removed from this spot in 3d7b46cd
      ("ip_tunnel: push generic protocol handling to ip_tunnel module.").
      
      Fixes: 3d7b46cd ("ip_tunnel: push generic protocol handling to ip_tunnel module.")
      Reported-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Ying Cai <ycai@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e96f2e7c
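To illustrate why the ECN check in e96f2e7c needs the true inner header, here is a user-space sketch of an outer/inner ECN consistency check, loosely following the decapsulation rules of RFC 6040. The helper name `ecn_decapsulate()` and its return-value convention (0 = accept, 1 = accept but worth logging, 2 = drop) are assumptions made for this example and are not the kernel's API.

```c
#include <assert.h>
#include <stdint.h>

/* ECN codepoints: the two low bits of the IPv4 TOS byte (RFC 3168). */
enum {
	ECN_NOT_ECT = 0x0,
	ECN_ECT_1   = 0x1,
	ECN_ECT_0   = 0x2,
	ECN_CE      = 0x3,
	ECN_MASK    = 0x3,
};

/*
 * Hypothetical helper. Returns 0 if the packet is fine, 1 if the
 * outer/inner combination is worth logging, 2 if the packet should
 * be dropped. May upgrade *inner_tos to CE.
 */
static int ecn_decapsulate(uint8_t outer_tos, uint8_t *inner_tos)
{
	uint8_t outer, inner;

	assert(inner_tos);
	outer = outer_tos & ECN_MASK;
	inner = *inner_tos & ECN_MASK;

	if (inner == ECN_NOT_ECT) {
		if (outer == ECN_CE)
			return 2;	/* congestion marked on a non-ECN flow */
		if (outer != ECN_NOT_ECT)
			return 1;	/* inconsistent, but harmless */
		return 0;
	}

	if (outer == ECN_CE)
		*inner_tos |= ECN_CE;	/* propagate the congestion mark inward */
	return 0;
}
```

If both arguments are derived from the same (outer) header, the comparison no longer says anything about the encapsulated packet, which is consistent with the spurious error messages or drops described in the commit message.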
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net · 780ce3a2
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates
      
      This series contains updates to e1000e only.
      
      David provides four fixes for e1000e. The first is a workaround for a
      hardware erratum on 82579 devices, which experience packet loss at gigabit
      and 100 speeds when the interconnect between the PHY and MAC is exiting the
      K1 power saving state.  The second expands the scope of a workaround to
      include i217 and i218 parts, to address overaggressive transmit behavior
      when connecting at 10Mbs half-duplex.  The third resolves a reported link
      flap issue on 82579 parts, which was root-caused as an interoperability
      problem between 82579 and at least some Broadcom PHYs in the Energy
      Efficient Ethernet wake mechanism.  The last restricts the workaround of
      putting the PHY into MDIO slow mode to access the PHY id to relevant
      parts, since this issue has been fixed on the newer hardware.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      780ce3a2
    • David Ertman's avatar
      e1000e: Restrict MDIO Slow Mode workaround to relevant parts · 2c982624
      David Ertman authored
      It has been determined that the workaround of putting the PHY into MDIO
      slow mode to access the PHY id is not necessary with Lynx Point and newer
      parts.  The issue that necessitated the workaround has been fixed on the
      newer hardware.
      
      We will maintain, as a last-ditch attempt, the conversion to MDIO Slow
      Mode in the failure branch when attempting to access the PHY id so as to
      cover all contingencies.
      Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      2c982624
    • David Ertman's avatar
      e1000e: Fix issue with link flap on 82579 · 7142a55c
      David Ertman authored
      Several customers have reported a link flap issue on 82579. The symptoms
      are random and intermittent link losses when 82579 is connected to specific
      link partners. The issue has been root-caused as an interoperability
      problem between 82579 and at least some Broadcom PHYs in the Energy
      Efficient Ethernet wake mechanism.
      
      To fix the issue, we are disabling the Phase Locked Loop shutdown in 100M
      Low Power Idle.  This solution increases power consumption on a 100M EEE
      link: it costs an additional 28mW in this specific mode.
      
      Cc: Lukasz Adamczuk <lukasz.adamczuk@intel.com>
      Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      7142a55c
    • David Ertman's avatar
      e1000e: Expand workaround for 10Mb HD throughput bug · fbb9ab10
      David Ertman authored
      In commit 772d05c5 "e1000e: slow performance
      between two 82579 connected via 10Mbit hub", a workaround was put into place
      to address the overaggressive transmit behavior of 82579 parts when connecting
      at 10Mbs half-duplex.
      
      This same behavior is seen on i217 and i218 parts as well.  This patch expands
      the original workaround to encompass these parts.
      Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      fbb9ab10
    • David Ertman's avatar
      e1000e: Workaround for dropped packets in Gig/100 speeds on 82579 · 77e61146
      David Ertman authored
      This is a workaround for a HW erratum on 82579 devices.
      Erratum is #23 in Intel 6 Series Chipset and Intel C200 Series Chipset
      specification Update June 2013.
      
      Problem: 82579 parts experience packet loss in Gig and 100 speeds
      when interconnect between PHY and MAC is exiting K1 power saving state.
      This was previously believed to only affect 1Gig speed, but has been observed
      at 100Mbs also.
      
      Workaround: Disable K1 for 82579 devices at Gig and 100 speeds.
      Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      77e61146
    • David S. Miller's avatar
      Merge branch 'mlx4' · eaff8292
      David S. Miller authored
      Or Gerlitz says:
      
      ====================
      This series contains fixes for 3.15-rc, mostly around SRIOV. The patches by Jack,
      Matan and myself fix a few issues related to mlx4 SRIOV support for RoCE and single
      port VFs, and the patch from Eyal eliminates checking PCI caps for VFs, which is misleading.
      
      Patches done against the net tree, commit 014f1b20 "net: bonding: Fix format string
      mismatch in bond_sysfs.c"
      
      We'd be happy to get Eyal's patch queued in your -stable list for 3.14.y
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eaff8292
    • Eyal Perry's avatar
      net/mlx4_core: Don't issue PCIe speed/width checks for VFs · 83d3459a
      Eyal Perry authored
      Carrying out PCI speed/width checks through pcie_get_minimum_link()
      on VFs yields wrong results, so remove them.
      
      Fixes: b912b2f8 ('net/mlx4_core: Warn if device doesn't have enough PCI bandwidth')
      Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      83d3459a
    • Or Gerlitz's avatar
      net/mlx4_core: Load the Eth driver first · f24f790f
      Or Gerlitz authored
      When running in SRIOV mode, VMs that are assigned non-provisioned
      Ethernet VFs get a random MAC when the Eth driver starts. In this
      case, if the IB driver startup code that deals with RoCE runs first,
      it will use a zero MAC as the source MAC for the Para-Virtual CM MADs,
      which is buggy. To handle that, we change the order of loading.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f24f790f
    • Matan Barak's avatar
      net/mlx4_core: Fix slave id computation for single port VF · 0254bc82
      Matan Barak authored
      The code that deals with computing the slave id based on a given GID
      gave wrong results when the number of single port VFs wasn't the
      same for port 1 vs. port 2 and the relevant VF is single ported on
      port 2. As a result, incoming CM MADs were dispatched to the wrong VF.
      Fixed that and added documentation to clarify the computation steps.
      
      Fixes: 449fc488 ('net/mlx4: Adapt code for N-Port VF')
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0254bc82
    • Jack Morgenstein's avatar
      net/mlx4_core: Adjust port number in qp_attach wrapper when detaching · 531d9014
      Jack Morgenstein authored
      When using single ported VFs and the VF is using port 2, we need
      to adjust the port accordingly (change it from 1 to 2).
      
      Fixes: 449fc488 ('net/mlx4: Adapt code for N-Port VF')
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      531d9014
    • Bjørn Mork's avatar
      net: cdc_ncm: fix buffer overflow · 9becd707
      Bjørn Mork authored
      Commit 4d619f62 ("net: cdc_ncm: no point in filling up the NTBs
      if we send ZLPs") changed the padding logic for devices with the ZLP
      flag set.  This meant that frames of any size will be sent without
      additional padding, except for the single byte added if the size is
      a multiple of the USB packet size. But if the unpadded size is
      identical to the maximum frame size, and the maximum size is a
      multiple of the USB packet size, then this one-byte padding will
      overflow the buffer.
      
      Prevent padding if already at maximum frame size, letting usbnet
      transmit a ZLP instead in this case.
      
      Fixes: 4d619f62 ("net: cdc_ncm: no point in filling up the NTBs if we send ZLPs")
      Reported-by: Yu-an Shih <yshih@nvidia.com>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9becd707
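The padding rule described in 9becd707 can be sketched as a small decision function. This is an illustration under stated assumptions, not the cdc_ncm code: `zlp_pad_bytes()` and its parameters `len`, `max_size` and `usb_pkt_size` are hypothetical names, and the sketch only covers devices for which the ZLP flag is set.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the driver's code: returns how many padding
 * bytes to append to a frame of `len` bytes in a buffer of `max_size`
 * bytes.
 */
static size_t zlp_pad_bytes(size_t len, size_t max_size, size_t usb_pkt_size)
{
	assert(usb_pkt_size > 0 && len <= max_size);

	if (len % usb_pkt_size != 0)
		return 0;	/* the transfer already ends in a short packet */
	if (len == max_size)
		return 0;	/* buffer is full; a ZLP is sent instead */
	return 1;		/* one pad byte makes the last packet short */
}
```

The second check is the case the commit message describes: without it, a frame that exactly fills the buffer and is also a multiple of the USB packet size would receive one pad byte past the end of the buffer.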
    • Geert Uytterhoeven's avatar
      Altera TSE: ALTERA_TSE should depend on HAS_DMA · 9d4619c4
      Geert Uytterhoeven authored
      If NO_DMA=y:
      
      drivers/built-in.o: In function `altera_tse_probe':
      altera_tse_main.c:(.text+0x25ec2e): undefined reference to `dma_set_mask'
      altera_tse_main.c:(.text+0x25ec78): undefined reference to `dma_supported'
      altera_tse_main.c:(.text+0x25ecb6): undefined reference to `dma_supported'
      drivers/built-in.o: In function `sgdma_async_read':
      altera_sgdma.c:(.text+0x25f620): undefined reference to `dma_sync_single_for_cpu'
      drivers/built-in.o: In function `sgdma_uninitialize':
      (.text+0x25f678): undefined reference to `dma_unmap_single'
      drivers/built-in.o: In function `sgdma_uninitialize':
      (.text+0x25f696): undefined reference to `dma_unmap_single'
      drivers/built-in.o: In function `sgdma_initialize':
      (.text+0x25f6f0): undefined reference to `dma_map_single'
      drivers/built-in.o: In function `sgdma_initialize':
      (.text+0x25f702): undefined reference to `dma_mapping_error'
      drivers/built-in.o: In function `sgdma_tx_buffer':
      (.text+0x25f92a): undefined reference to `dma_sync_single_for_cpu'
      drivers/built-in.o: In function `sgdma_rx_status':
      (.text+0x25fa24): undefined reference to `dma_sync_single_for_cpu'
      make[3]: *** [vmlinux] Error 1
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Vince Bridgers <vbridgers2013@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d4619c4
    • Andy King's avatar
      vsock: Make transport the proto owner · 2c4a336e
      Andy King authored
      Right now the core vsock module is the owner of the proto family. This
      means there's nothing preventing the transport module from unloading if
      there are open sockets, which results in a panic. Fix that by allowing
      the transport to be the owner, which will refcount it properly.
      
      Includes version bump to 1.0.1.0-k
      
      Passes checkpatch this time, I swear...
      Acked-by: Dmitry Torokhov <dtor@vmware.com>
      Signed-off-by: Andy King <acking@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2c4a336e