1. 14 Jan, 2013 5 commits
    • Merge tag 'driver-core-3.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core · 3441f0d2
      Linus Torvalds authored
      Pull driver core fixes from Greg Kroah-Hartman:
       "Here are two patches for 3.8-rc3.
      
        One removes the __dev* defines from init.h now that all usages of
        them are gone from your tree.  The other fixes a debugfs parameter
        that was parsed using the wrong base.
      
        Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
      
      * tag 'driver-core-3.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        debugfs: convert gid= argument from decimal, not octal
        Remove __dev* markings from init.h
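
      A minimal sketch of why the base matters when parsing a numeric option
      such as gid=. The helper parse_id_option() is hypothetical and written
      for illustration only; it is not the debugfs code. It shows that the
      same digit string yields a different ID depending on the base handed
      to strtol().

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not the debugfs implementation): parse a numeric
 * mount-option value in the given base. Returns -1 when the string is
 * empty, negative, or contains characters that are invalid in that base.
 */
long parse_id_option(const char *value, int base)
{
    char *end;
    long id;

    if (*value == '\0')
        return -1;

    id = strtol(value, &end, base);
    if (*end != '\0' || id < 0)
        return -1;

    return id;
}
```

      With this helper, "100" parsed in base 10 gives 100, while the same
      string parsed in base 8 gives 64, so an ID parsed with the wrong base
      silently refers to a different group.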
    • Merge tag 'char-misc-3.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · f6a0e2ca
      Linus Torvalds authored
      Pull char/misc fix from Greg Kroah-Hartman:
       "Here is a single fix for the mei driver that resolves a reported
        issue.
      
        Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
      
      * tag 'char-misc-3.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        mei: fix mismatch in mutex unlock-lock in mei_amthif_read()
    • Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 7f1825da
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Nothing too astounding
      
         - nouveau: bunch of regression fixes and oops fixes
         - radeon: UMS fixes, rn50 fix, dma fix
         - udl: fix EDID retrieval for large EDIDs."
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        udldrmfb: udl_get_edid: drop unneeded i--
        udldrmfb: udl_get_edid: usb_control_msg buffer must not be on the stack
        udldrmfb: Fix EDID not working with monitors with EDID extension blocks
        drm/nvc0/fb: fix crash when different mutex is used to protect same list
        drm/nouveau/clock: fix support for more than 2 monitors on nve0
        drm/nv50/disp: fix selection of bios script for analog outputs
        drm/nv17-50: restore fence buffer on resume
        drm/nouveau: fix blank LVDS screen regression on pre-nv50 cards
        drm/nouveau: fix nouveau_client allocation failure path
        drm/nouveau: don't return freed object from nouveau_handle_create
        drm/nouveau/vm: fix memory corruption when pgt allocation fails
        drm/nouveau: add locking around instobj list operations
        drm/nouveau: do not forcibly power on lvds panels
        drm/nouveau/devinit: ensure legacy vga control is enabled during post
        radeon/kms: fix dma relocation checking
        radeon/kms: force rn50 chip to always report connected on analog output
        drm/radeon: fix error path in kpage allocation
        drm/radeon: fix a bogus kfree
        drm/radeon: fix NULL pointer dereference in UMS mode
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 6843cc0e
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix regression allowing IP_TTL setting of zero, fix from Cong Wang.
      
       2) Fix leak regressions in tuntap, from Jason Wang.
      
       3) be2net driver always returns IRQ_HANDLED in INTx handler, fix from
          Sathya Perla.
      
       4) qlge doesn't really support NETIF_F_TSO6, don't set that flag.  Fix
          from Amerigo Wang.
      
       5) Add 802.11ad Atheros wil6210 driver, from Vladimir Kondratiev.
      
       6) Fix MTU calculations in mac80211 layer, from T Krishna Chaitanya.
      
       7) Station info layer of mac80211 needs to use del_timer_sync(), from
          Johannes Berg.
      
       8) tcp_read_sock() can loop forever, because we don't immediately stop
          when recv_actor() returns zero.  Fix from Eric Dumazet.
      
       9) Fix WARN_ON() in tcp_cleanup_rbuf().  We have to use sk_eat_skb() in
          tcp_recv_skb() to handle the case where a large GRO packet is split
          up while it is in use by a splice() operation.  Fix also from Eric
          Dumazet.
      
      10) addrconf_get_prefix_route() in ipv6 tests flags incorrectly, it
          does:
      
              if (X && (p->flags & Y) != 0)
      
          when it really meant to go:
      
              if (X && (p->flags & X) != 0)
      
          fix from Romain Kuntz.
      
      11) Fix lost Kconfig dependency for bfin_mac driver hardware
          timestamping.  From Lars-Peter Clausen.
      
      12) Fix regression in handling of RST without ACK in TCP, from Eric
          Dumazet.
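
      Item 10 above can be sketched as follows. This is an illustrative
      model, not the ipv6 source: the function and parameter names are
      hypothetical, and mapping X to "noflags" (bits that must be clear)
      and Y to "flags" (bits that must be set) is an assumption based on
      the commit title "ipv6: fix the noflags test in
      addrconf_get_prefix_route".

```c
#include <stdbool.h>

/*
 * Hypothetical model of the route filter. "flags" are bits a route must
 * have; "noflags" are bits a route must NOT have. Each function returns
 * true when the route should be skipped because of the noflags rule.
 */

/* Buggy form: tests the wrong mask, i.e. (p->flags & Y). */
bool skip_route_buggy(unsigned int route_flags, unsigned int flags,
                      unsigned int noflags)
{
    return noflags != 0 && (route_flags & flags) != 0;
}

/* Fixed form: tests the mask that was checked for non-zero, (p->flags & X). */
bool skip_route_fixed(unsigned int route_flags, unsigned int flags,
                      unsigned int noflags)
{
    (void)flags; /* not part of the noflags rule */
    return noflags != 0 && (route_flags & noflags) != 0;
}
```

      For a route with only bit 0x1 set, flags = 0x1 and noflags = 0x2, the
      buggy form skips a route that satisfies both requirements, while the
      fixed form keeps it.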
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (37 commits)
        be2net: fix unconditionally returning IRQ_HANDLED in INTx
        tuntap: fix leaking reference count
        tuntap: forbid calling TUNSETIFF when detached
        tuntap: switch to use rtnl_dereference()
        net, wireless: overwrite default_ethtool_ops
        qlge: remove NETIF_F_TSO6 flag
        tcp: accept RST without ACK flag
        net: ethernet: xilinx: Do not use NO_IRQ in axienet
        net: ethernet: xilinx: Do not use axienet on PPC
        bnx2x: Allow management traffic after boot from SAN
        bnx2x: Fix fastpath structures when memory allocation fails
        bfin_mac: Restore hardware time-stamping dependency on BF518
        tun: avoid owner checks on IFF_ATTACH_QUEUE
        bnx2x: move debugging code before the return
        tuntap: refuse to re-attach to different tun_struct
        ipv6: use addrconf_get_prefix_route for prefix route lookup [v2]
        ipv6: fix the noflags test in addrconf_get_prefix_route
        tcp: fix splice() and tcp collapsing interaction
        tcp: splice: fix an infinite loop in tcp_read_sock()
        net: prevent setting ttl=0 via IP_TTL
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 7c8284c3
      Linus Torvalds authored
      Pull sparc updates from David Miller:
      
       1) Add finit_module syscall entry.
      
       2) Remove stray __dev{init,exit} references, from Sam Ravnborg.
      
      Fix up conflicts in the sparc PCI code due to whitespace differences in
      the __dev{init,exit} removal (which also came in through Greg).
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc: remove __devinit, __devexit annotations
        sparc: Hook up finit_module syscall.
  2. 13 Jan, 2013 16 commits
  3. 12 Jan, 2013 5 commits
    • be2net: fix unconditionally returning IRQ_HANDLED in INTx · d0b9cec3
      Sathya Perla authored
      commit e49cc34f introduced an unconditional IRQ_HANDLED return in be_intx()
      to work around Lancer and BE2 HW issues. This is bad as it prevents the kernel
      from detecting interrupt storms due to broken HW.
      
      The BE2/Lancer HW issues are:
      1) In Lancer, there is no means for the driver to detect if the interrupt
      belonged to device, other than counting and notifying events.
      2) In Lancer de-asserting INTx takes a while, causing the INTx irq handler
      to be called multiple times till the de-assert happens.
      3) In BE2, we see an occasional interrupt even when EQs are unarmed.
      
      Issue (1) can cause the notified events to be orphaned, if NAPI was already
      running.
      This patch fixes this issue by scheduling NAPI only if it is not scheduled
      already. Doing this also takes care of a possible events_get() race that may
      be caused by issues (2) and (3). Also, IRQ_HANDLED is returned only the first
      time zero events are detected.
      (Thanks Ben H. for the feedback and suggestions.)
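
      The return-value policy described above can be modelled in user space
      roughly as follows. This is a sketch under stated assumptions, not the
      be2net handler: struct eq_model, intx_model() and the MODEL_IRQ_*
      values are hypothetical stand-ins for the driver's event queue, its
      INTx handler and the kernel's irqreturn_t.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for IRQ_NONE / IRQ_HANDLED. */
enum irq_result { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Hypothetical model of one event queue. */
struct eq_model {
    bool napi_scheduled;        /* polling already pending? */
    int pending_events;         /* events waiting in the queue */
    unsigned int spurious_intr; /* zero-event interrupts since last valid one */
};

enum irq_result intx_model(struct eq_model *eq)
{
    int num_evts = 0;

    /* Count events and schedule polling only if it is not scheduled yet,
     * so events are never consumed behind a running poller. */
    if (!eq->napi_scheduled) {
        num_evts = eq->pending_events;
        eq->pending_events = 0;
        eq->napi_scheduled = true;
        if (num_evts)
            eq->spurious_intr = 0;
    }

    /* Report "handled" for real events, and for the first zero-event
     * interrupt only; later ones are reported as not ours. */
    if (num_evts || eq->spurious_intr++ == 0)
        return MODEL_IRQ_HANDLED;
    return MODEL_IRQ_NONE;
}
```

      Returning "not handled" for repeated zero-event interrupts is what
      lets the kernel's spurious-interrupt accounting notice a storm from
      broken hardware.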
      Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc: remove __devinit, __devexit annotations · b7c13f76
      Sam Ravnborg authored
      __devinit, __devexit annotations are nops - so drop them.
      Likewise for __devexit_p.
      
      Adjusted alignment of arguments when needed.
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tuntap: fix leaking reference count · dd38bd85
      Jason Wang authored
      Reference count leaks of both the module and the sock were found:
      
      - When a detached file was closed, its sock refcnt from the device was not
        released. Solve this by adding the missing sock_put().
      - The module was held or dropped unconditionally in TUNSETPERSIST, which
        means that if we set the persist flag N times, we need to unset it
        another N times. Solve this by only holding or dropping a reference when
        the flag changes, and also drop the reference when the persist device is
        deleted.
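
      The TUNSETPERSIST part of the fix can be sketched with a small
      user-space model. struct tun_model and set_persist() are hypothetical
      names for illustration; the real driver works on its own structures
      and the module reference count.

```c
#include <stdbool.h>

/* Hypothetical model: one persist flag and a module reference counter. */
struct tun_model {
    bool persist;
    int module_refs;
};

/* Take or drop a reference only when the flag actually changes, so that
 * setting the flag N times still needs just one unset. */
void set_persist(struct tun_model *tun, bool on)
{
    if (on && !tun->persist) {
        tun->persist = true;
        tun->module_refs++;
    } else if (!on && tun->persist) {
        tun->persist = false;
        tun->module_refs--;
    }
}
```

      Repeated calls with the same value leave the counter untouched, which
      is the property the unconditional hold/drop lacked.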
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tuntap: forbid calling TUNSETIFF when detached · 7c0c3b1a
      Jason Wang authored
      Michael points out that even after Stefan's fix, TUNSETIFF is still allowed
      to create a new tap device. This is because we only check tfile->tun, but
      tfile->detached has since been introduced. Fix this by failing early in
      tun_set_iff() if the file is detached. After this fix, there's no need to do
      the check again in tun_set_iff(), so this patch removes it.
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tuntap: switch to use rtnl_dereference() · b8deabd3
      Jason Wang authored
      Switch to rtnl_dereference() instead of the open-coded equivalent, as
      suggested by Eric.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 11 Jan, 2013 14 commits
    • net, wireless: overwrite default_ethtool_ops · d07d7507
      Stanislaw Gruszka authored
      Since:
      
      commit 2c60db03
      Author: Eric Dumazet <edumazet@google.com>
      Date:   Sun Sep 16 09:17:26 2012 +0000
      
          net: provide a default dev->ethtool_ops
      
      wireless core does not correctly assign ethtool_ops.
      
      After the alloc_netdev*() call, some cfg80211 drivers provide their own
      ethtool_ops, but some do not. For them, the wireless core provides the generic
      cfg80211_ethtool_ops, which is assigned in the NETDEV_REGISTER notify call:
      
              if (!dev->ethtool_ops)
                      dev->ethtool_ops = &cfg80211_ethtool_ops;
      
      But after Eric's commit, dev->ethtool_ops is no longer NULL (on cfg80211
      drivers without custom ethtool_ops), but points to &default_ethtool_ops.
      
      In order to fix the problem, provide a function that overwrites
      default_ethtool_ops, and use it in the wireless core.
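
      A user-space sketch of the idea, assuming simplified stand-in types.
      The names netdev_model, ethtool_ops_model and
      set_default_ethtool_ops() are hypothetical and only illustrate why a
      NULL test no longer works and a comparison against the default
      pointer does.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct ethtool_ops and struct net_device. */
struct ethtool_ops_model { const char *name; };
struct netdev_model { const struct ethtool_ops_model *ethtool_ops; };

static const struct ethtool_ops_model default_ops = { "default" };

/* Models allocation after the change: ops are never NULL any more. */
void netdev_model_init(struct netdev_model *dev)
{
    dev->ethtool_ops = &default_ops;
}

/* Replace the ops only if the driver did not install its own. */
void set_default_ethtool_ops(struct netdev_model *dev,
                             const struct ethtool_ops_model *ops)
{
    if (dev->ethtool_ops == &default_ops)
        dev->ethtool_ops = ops;
}
```

      A device still carrying the default pointer picks up the wireless
      ops, while a driver that installed custom ops keeps them.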
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Acked-by: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qlge: remove NETIF_F_TSO6 flag · f7e9e230
      Amerigo Wang authored
      It is weird that the qlge driver supports NETIF_F_TSO6 but
      not NETIF_F_IPV6_CSUM. This also causes a kernel warning [1]
      when a VLAN device is set up on a qlge interface.
      
      I think the qlge hardware doesn't support NETIF_F_IPV6_CSUM,
      so we have to just remove the NETIF_F_TSO6 flag.
      
      After this patch, the TCP/IPv6 traffic becomes normal again,
      no kernel warnings any more.
      
      NOTE: I only tested it on 2.6.32 kernel, even if the upstream
      kernel could fix this automatically (it is hard to track NETIF*
      flags), removing it is also safe.
      
      1. https://bugzilla.redhat.com/show_bug.cgi?id=891839
      
      Cc: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
      Cc: Ron Mercer <ron.mercer@qlogic.com>
      Cc: linux-driver@qlogic.com
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · b719f430
      Linus Torvalds authored
      Pull a hwmon patch from Guenter Roeck:
       "Fix build error in vexpress driver"
      
      * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (vexpress) Fix build error seen if CONFIG_OF_DEVICE is not set
    • Merge branch 'akpm' (incoming fixes from Andrew) · c727b4c6
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "The audit fixes have been floating around for a while - Al and Eric
        aren't responding to either myself or Kees so I asked Kees to
        re-review them and here they are."
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
        lib/rbtree.c: avoid the use of non-static __always_inline
        MAINTAINERS: Omar had moved
        mm: compaction: partially revert capture of suitable high-order page
        linux/audit.h: move ptrace.h include to kernel header
        kernel/audit.c: avoid negative sleep durations
        audit: catch possible NULL audit buffers
        audit: create explicit AUDIT_SECCOMP event type
        MAINTAINERS: fix a status pattern
        MAINTAINERS: fix arch/arm/plat-omap/include/plat/omap_hwmod.h
        mm: thp: acquire the anon_vma rwsem for write during split
        mm: mmap: annotate vm_lock_anon_vma locking properly for lockdep
        lockdep, rwsem: provide down_write_nest_lock()
        arch/mn10300/Kconfig: select CONFIG_GENERIC_ATOMIC64
        mm: bootmem: fix free_all_bootmem_core() with odd bitmap alignment
        mm: use aligned zone start for pfn_to_bitidx calculation
        fs/exec.c: work around icc miscompilation
        mm: compaction: fix echo 1 > compact_memory return error issue
        mm: memblock: fix wrong memmove size in memblock_merge_regions()
        drivers/video/ssd1307fb.c: fix bit order bug in the byte translation function
        mm: migrate: check page_count of THP before migrating
        ...
    • lib/rbtree.c: avoid the use of non-static __always_inline · 3cb7a563
      Michel Lespinasse authored
      lib/rbtree.c declared __rb_erase_color() as __always_inline void, and
      then exported it with EXPORT_SYMBOL.
      
      This was because __rb_erase_color() must be exported for augmented
      rbtree users, but it must also be inlined into rb_erase() so that the
      dummy callback can get optimized out of that call site.
      
      (Actually with a modern compiler, none of the dummy callback functions
      should even be generated as separate text functions).
      
      The above usage is legal C, but it was unusual enough for some compilers
      to warn about it.  This change makes things more explicit, with a static
      __always_inline ____rb_erase_color function for use in rb_erase(), and a
      separate non-inline __rb_erase_color function for use in
      rb_erase_augmented call sites.
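
      The pattern described above can be sketched as follows. This is a toy
      illustration, not lib/rbtree.c: every name is hypothetical, the
      "work" is a placeholder decrement, and __attribute__((always_inline))
      is the GCC/Clang spelling assumed here in place of the kernel's
      __always_inline macro.

```c
typedef void (*rotate_cb)(int *counter);

/* Static, always-inlined worker: call sites that pass a constant dummy
 * callback give the compiler the chance to optimize the call away. */
static inline __attribute__((always_inline))
int erase_color_inline(int value, rotate_cb cb, int *counter)
{
    cb(counter);      /* stands in for the augmented-rotate callback */
    return value - 1; /* stands in for the real rebalancing work */
}

static inline void dummy_cb(int *counter)
{
    (void)counter; /* intentionally does nothing */
}

/* Example callback an augmented user might supply. */
void count_cb(int *counter)
{
    (*counter)++;
}

/* Plain entry point: worker inlined with the dummy callback. */
int erase_model(int value)
{
    int unused = 0;
    return erase_color_inline(value, dummy_cb, &unused);
}

/* Separate non-inline entry point, suitable for exporting to users that
 * bring their own callback. */
int erase_augmented_model(int value, rotate_cb cb, int *counter)
{
    return erase_color_inline(value, cb, counter);
}
```

      Splitting the roles this way keeps the inlined worker file-local and
      leaves only an ordinary function as the externally visible symbol.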
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reported-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • MAINTAINERS: Omar had moved · a8906b0b
      Chen Gang authored
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Cc: Omar Ramirez Luna <omar.ramirez@ti.com>
      Cc: Omar Ramirez Luna <omar.ramirez@copitl.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: compaction: partially revert capture of suitable high-order page · 8fb74b9f
      Mel Gorman authored
      Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when
      waiting for POLLIN on a local TCP socket.  It was easier to trigger if
      there was disk IO and dirty pages at the same time and he bisected it to
      commit 1fb3f8ca ("mm: compaction: capture a suitable high-order page
      immediately when it is made available").
      
      The intention of that patch was to improve high-order allocations under
      memory pressure after changes made to reclaim in 3.6 drastically hurt
      THP allocations but the approach was flawed.  For Eric, the problem was
      that page->pfmemalloc was not being cleared for captured pages leading
      to a poor interaction with swap-over-NFS support causing the packets to
      be dropped.  However, I identified a few more problems with the patch
      including the fact that it can increase contention on zone->lock in some
      cases which could result in async direct compaction being aborted early.
      
      In retrospect the capture patch took the wrong approach.  What it should
      have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it
      was allocating for THP and avoided races that way.  While the patch was
      shown to improve allocation success rates at the time, the benefit is
      marginal given the relative complexity and it should be revisited from
      scratch in the context of the other reclaim-related changes that have
      taken place since the patch was first written and tested.  This patch
      partially reverts commit 1fb3f8ca ("mm: compaction: capture a
      suitable high-order page immediately when it is made available").
      Reported-and-tested-by: Eric Wong <normalperson@yhbt.net>
      Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • linux/audit.h: move ptrace.h include to kernel header · c0a3a20b
      Mike Frysinger authored
      While the kernel internals want pt_regs (and so it includes
      linux/ptrace.h), the user version of audit.h does not need it.  So move
      the include out of the uapi version.
      
      This avoids issues where people want the audit defines and userland
      ptrace api.  Including both the kernel ptrace and the userland ptrace
      headers can easily lead to failure.
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/audit.c: avoid negative sleep durations · 82919919
      Andrew Morton authored
      audit_log_start() performs the same jiffies comparison in two places.
      If sufficient time has elapsed between the two comparisons, the second
      one produces a negative sleep duration:
      
        schedule_timeout: wrong timeout value fffffffffffffff0
        Pid: 6606, comm: trinity-child1 Not tainted 3.8.0-rc1+ #43
        Call Trace:
          schedule_timeout+0x305/0x340
          audit_log_start+0x311/0x470
          audit_log_exit+0x4b/0xfb0
          __audit_syscall_exit+0x25f/0x2c0
          sysret_audit+0x17/0x21
      
      Fix it by performing the comparison a single time.
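
      A minimal sketch of the "compare once" approach, using plain integers
      in place of jiffies. remaining_sleep() and its parameters are
      hypothetical names for illustration; this is not the audit code. The
      value fffffffffffffff0 in the log above is -16 when read as a signed
      64-bit number, which is what an unsigned subtraction produces once
      the deadline has already passed.

```c
/*
 * Hypothetical helper: compute the remaining wait a single time and clamp
 * it, so a deadline that has already passed yields 0 instead of a huge
 * "negative" timeout. The cast relies on the usual two's-complement
 * conversion from unsigned long to long.
 */
long remaining_sleep(unsigned long start, unsigned long wait,
                     unsigned long now)
{
    long sleep_time = (long)(start + wait - now);

    return sleep_time > 0 ? sleep_time : 0;
}
```

      Because the subtraction happens once and its result is the only
      thing tested, there is no window in which the clock can advance
      between the check and the use.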
      Reported-by: Dave Jones <davej@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@redhat.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • audit: catch possible NULL audit buffers · 0644ec0c
      Kees Cook authored
      It's possible for audit_log_start() to return NULL.  Handle it in the
      various callers.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Julien Tinnes <jln@google.com>
      Cc: Will Drewry <wad@google.com>
      Cc: Steve Grubb <sgrubb@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • audit: create explicit AUDIT_SECCOMP event type · 7b9205bd
      Kees Cook authored
      The seccomp path was using AUDIT_ANOM_ABEND from when seccomp mode 1
      could only kill a process.  While we still want to make sure an audit
      record is forced on a kill, this should use a separate record type since
      seccomp mode 2 introduces other behaviors.
      
      In the case of "handled" behaviors (process wasn't killed), only emit a
      record if the process is under inspection.  This change also fixes
      userspace examination of seccomp audit events, since it was considered
      malformed due to missing fields of the AUDIT_ANOM_ABEND event type.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Julien Tinnes <jln@google.com>
      Acked-by: Will Drewry <wad@chromium.org>
      Acked-by: Steve Grubb <sgrubb@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • MAINTAINERS: fix a status pattern · 56ca9d98
      Zhang Yanfei authored
      Change MAINTAINED to Maintained.
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • MAINTAINERS: fix arch/arm/plat-omap/include/plat/omap_hwmod.h · 8fc8b12b
      Zhang Yanfei authored
      This file was moved to arch/arm/mach-omap2/omap_hwmod.h by commit
      2a296c8f ("ARM: OMAP: Make plat/omap_hwmod.h local to
      mach-omap2").
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: thp: acquire the anon_vma rwsem for write during split · 062f1af2
      Mel Gorman authored
      Zhouping Liu reported the following against 3.8-rc1 when running a mmap
      testcase from LTP.
      
        mapcount 0 page_mapcount 3
        ------------[ cut here ]------------
        kernel BUG at mm/huge_memory.c:1798!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: ip6table_filter ip6_tables ebtable_nat ebtables bnep bluetooth rfkill iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfat fat dm_mirror dm_region_hash dm_log dm_mod cdc_ether iTCO_wdt i7core_edac coretemp usbnet iTCO_vendor_support mii crc32c_intel edac_core lpc_ich shpchp ioatdma mfd_core i2c_i801 pcspkr serio_raw bnx2 microcode dca vhost_net tun macvtap macvlan kvm_intel kvm uinput mgag200 sr_mod cdrom i2c_algo_bit sd_mod drm_kms_helper crc_t10dif ata_generic pata_acpi ttm ata_piix drm libata i2c_core megaraid_sas
        CPU 1
        Pid: 23217, comm: mmap10 Not tainted 3.8.0-rc1mainline+ #17 IBM IBM System x3400 M3 Server -[7379I08]-/69Y4356
        RIP: __split_huge_page+0x677/0x6d0
        RSP: 0000:ffff88017a03fc08  EFLAGS: 00010293
        RAX: 0000000000000003 RBX: ffff88027a6c22e0 RCX: 00000000000034d2
        RDX: 000000000000748b RSI: 0000000000000046 RDI: 0000000000000246
        RBP: ffff88017a03fcb8 R08: ffffffff819d2440 R09: 000000000000054a
        R10: 0000000000aaaaaa R11: 00000000ffffffff R12: 0000000000000000
        R13: 00007f4f11a00000 R14: ffff880179e96e00 R15: ffffea0005c08000
        FS:  00007f4f11f4a740(0000) GS:ffff88017bc20000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 00000037e9ebb404 CR3: 000000017a436000 CR4: 00000000000007e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process mmap10 (pid: 23217, threadinfo ffff88017a03e000, task ffff880172dd32e0)
        Stack:
         ffff88017a540ec8 ffff88017a03fc20 ffffffff816017b5 ffff88017a03fc88
         ffffffff812fa014 0000000000000000 ffff880279ebd5c0 00000000f4f11a4c
         00000007f4f11f49 00000007f4f11a00 ffff88017a540ef0 ffff88017a540ee8
        Call Trace:
          split_huge_page+0x68/0xb0
          __split_huge_page_pmd+0x134/0x330
          split_huge_page_pmd_mm+0x51/0x60
          split_huge_page_address+0x3b/0x50
          __vma_adjust_trans_huge+0x9c/0xf0
          vma_adjust+0x684/0x750
          __split_vma.isra.28+0x1fa/0x220
          do_munmap+0xf9/0x420
          vm_munmap+0x4e/0x70
          sys_munmap+0x2b/0x40
          system_call_fastpath+0x16/0x1b
      
      Alexander Beregalov and Alex Xu reported similar bugs and Hillf Danton
      identified that commit 5a505085 ("mm/rmap: Convert the struct
      anon_vma::mutex to an rwsem") and commit 4fc3f1d6 ("mm/rmap,
      migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable")
      were likely the problem.  Reverting these commits was reported to solve
      the problem for Alexander.
      
      Despite the reason for these commits, NUMA balancing is not the direct
      source of the problem.  split_huge_page() expects the anon_vma lock to
      be exclusive to serialise the whole split operation.  Ordinarily it is
      expected that the anon_vma lock would only be required when updating the
      avcs but THP also uses the anon_vma rwsem for collapse and split
      operations where the page lock or compound lock cannot be used (as the
      page is changing from base to THP or vice versa) and the page table
      locks are insufficient.
      
      This patch takes the anon_vma lock for write to serialise against parallel
      split_huge_page as THP expected before the conversion to rwsem.
      Reported-and-tested-by: Zhouping Liu <zliu@redhat.com>
      Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
      Reported-by: Alex Xu <alex_y_xu@yahoo.ca>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>