1. 18 May, 2012 2 commits
    • Linus Torvalds's avatar
      proc: move fd symlink i_mode calculations into tid_fd_revalidate() · 30a08bf2
      Linus Torvalds authored
      Instead of doing the i_mode calculations at proc_fd_instantiate() time,
      move them into tid_fd_revalidate(), which is where the other inode state
      (notably uid/gid information) is updated too.
      
      Otherwise we'll end up with stale i_mode information if an fd is re-used
      while the dentry still hangs around.  Not that anything really *cares*
      (symlink permissions don't really matter), but Tetsuo Handa noticed that
      the owner read/write bits don't always match the state of the
      readability of the file descriptor, and we _used_ to get this right a
      long time ago in a galaxy far, far away.
      
      Besides, aside from fixing an ugly detail (that has apparently been this
      way since commit 61a28784: "proc: Remove the hard coded inode
      numbers" in 2006), this removes more lines of code than it adds.  And it
      just makes sense to update i_mode in the same place we update i_uid/gid.
      
      Al Viro correctly points out that we could just do the inode fill in the
      inode iops ->getattr() function instead.  However, that does require
      somewhat slightly more invasive changes, and adds yet *another* lookup
      of the file descriptor.  We need to do the revalidate() for other
      reasons anyway, and have the file descriptor handy, so we might as well
      fill in the information at this point.
      Reported-by: default avatarTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: default avatarEric Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      30a08bf2
    • Linus Torvalds's avatar
      Merge tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · 3d994497
      Linus Torvalds authored
      Pull a machine check recovery fix from Tony Luck.
      
      I really don't like how the MCE code does some of the things it does,
      but this does seem to be an improvement.
      
      * tag 'linus-mce-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        x86/mce: Only restart instruction after machine check recovery if it is safe
      3d994497
  2. 17 May, 2012 20 commits
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm · 42ea7d7f
      Linus Torvalds authored
      Pull ARM fixes from Russell King:
       "Small set of fixes again."
      
      * 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
        ARM: 7419/1: vfp: fix VFP flushing regression on sigreturn path
        ARM: 7418/1: LPAE: fix access flag setup in mem_type_table
        ARM: prevent VM_GROWSDOWN mmaps extending below FIRST_USER_ADDRESS
        ARM: 7417/1: vfp: ensure preemption is disabled when enabling VFP access
      42ea7d7f
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 39c20285
      Linus Torvalds authored
      Pull two networking fixes from David S. Miller:
      
      1) Thanks to Willy Tarreau and Eric Dumazet, we've unlocked a bug that's
         been present in do_tcp_sendpages() since that function was written in
         2002.
      
         When we block to wait for memory we have to unconditionally try and
         push out pending TCP data, otherwise we can block for an unreasonably
         long amount of time.
      
      2) Fix deadlock in e1000, fixes kernel bugzilla 43132
      
         From Tushar Dave.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        e1000: Prevent reset task killing itself.
        tcp: do_tcp_sendpages() must try to push data out on oom conditions
      39c20285
    • Rafael J. Wysocki's avatar
      ACPI / PCI / PM: Fix device PM regression related to D3hot/D3cold · 5c7dd710
      Rafael J. Wysocki authored
      Commit 1cc0c998 ("ACPI: Fix D3hot v D3cold confusion") introduced a
      bug in __acpi_bus_set_power() and changed the behavior of
      acpi_pci_set_power_state() in such a way that it generally doesn't work
      as expected if PCI_D3hot is passed to it as the second argument.
      
      First off, if ACPI_STATE_D3 (equal to ACPI_STATE_D3_COLD) is passed to
      __acpi_bus_set_power() and the explicit_set flag is set for the D3cold
      state, the function will try to execute AML method called "_PS4", which
      doesn't exist.
      
      Fix this by adding a check to ensure that the name of the AML method
      to execute for transitions to ACPI_STATE_D3_COLD is correct in
      __acpi_bus_set_power().  Also make sure that the explicit_set flag
      for ACPI_STATE_D3_COLD will be set if _PS3 is present and modify
      acpi_power_transition() to avoid accessing power resources for
      ACPI_STATE_D3_COLD, because they don't exist.
      
      Second, if PCI_D3hot is passed to acpi_pci_set_power_state() as the
      target state, the function will request a transition to
      ACPI_STATE_D3_HOT instead of ACPI_STATE_D3.  However,
      ACPI_STATE_D3_HOT is now only marked as supported if the _PR3 AML
      method is defined for the given device, which is rare.  This causes
      problems to happen on systems where devices were successfully put
      into ACPI D3 by pci_set_power_state(PCI_D3hot) which doesn't work
      now.  In particular, some unused graphics adapters are not turned
      off as a result.
      
      To fix this issue restore the old behavior of
      acpi_pci_set_power_state(), which is to request a transition to
      ACPI_STATE_D3 (equal to ACPI_STATE_D3_COLD) if either PCI_D3hot or
      PCI_D3cold is passed to it as the argument.
      
      This approach is not ideal, because generally power should not
      be removed from devices if PCI_D3hot is the target power state,
      but since this behavior is relied on, we have no choice but to
      restore it at the moment and spend more time on designing a
      better solution in the future.
      
      References: https://bugzilla.kernel.org/show_bug.cgi?id=43228Reported-by: default avatarrocko <rockorequin@hotmail.com>
      Reported-by: default avatarCristian Rodríguez <crrodriguez@opensuse.org>
      Reported-and-tested-by: default avatarPeter <lekensteyn@gmail.com>
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5c7dd710
    • Tushar Dave's avatar
      e1000: Prevent reset task killing itself. · 8ce6909f
      Tushar Dave authored
      Killing reset task while adapter is resetting causes deadlock.
      Only kill reset task if adapter is not resetting.
      Ref bug #43132 on bugzilla.kernel.org
      
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarTushar Dave <tushar.n.dave@intel.com>
      Tested-by: default avatarAaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ce6909f
    • Willy Tarreau's avatar
      tcp: do_tcp_sendpages() must try to push data out on oom conditions · bad115cf
      Willy Tarreau authored
      Since recent changes on TCP splicing (starting with commits 2f533844
      "tcp: allow splice() to build full TSO packets" and 35f9c09f "tcp:
      tcp_sendpages() should call tcp_push() once"), I started seeing
      massive stalls when forwarding traffic between two sockets using
      splice() when pipe buffers were larger than socket buffers.
      
      Latest changes (net: netdev_alloc_skb() use build_skb()) made the
      problem even more apparent.
      
      The reason seems to be that if do_tcp_sendpages() fails on out of memory
      condition without being able to send at least one byte, tcp_push() is not
      called and the buffers cannot be flushed.
      
      After applying the attached patch, I cannot reproduce the stalls at all
      and the data rate it perfectly stable and steady under any condition
      which previously caused the problem to be permanent.
      
      The issue seems to have been there since before the kernel migrated to
      git, which makes me think that the stalls I occasionally experienced
      with tux during stress-tests years ago were probably related to the
      same issue.
      
      This issue was first encountered on 3.0.31 and 3.2.17, so please backport
      to -stable.
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: <stable@vger.kernel.org>
      bad115cf
    • Linus Torvalds's avatar
      Merge branch '3.4-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · eea03647
      Linus Torvalds authored
      Pull two more target-core updates from Nicholas Bellinger:
       "The first patch addresses a SPC-2 reservations RELEASE bug in a
        special (iscsi specific) multi-ISID setup case that was allowing the
        same initiator to be able to incorrect release it's own reservation on
        a different SCSI path with enforce_pr_isid=1 operation.  This bug was
        caught by Bernhard Kohl.
      
        The second patch is to address a bug with FILEIO backends where the
        incorrect number of blocks for READ_CAPACITY was being reported after
        an underlying device-mapper block_device size change.  This patch uses
        now i_size_read() in fd_get_blocks() for FILEIO backends with an
        underlying block_device, instead of trying to determine this value at
        setup time during fd_create_virtdevice().  (hch CC'ed)
      
        Both are CC'ed to stable."
      
      * '3.4-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
        target: Fix bug in handling of FILEIO + block_device resize ops
        target: Fix SPC-2 RELEASE bug for multi-session iSCSI client setups
      eea03647
    • Nicholas Bellinger's avatar
      target: Fix bug in handling of FILEIO + block_device resize ops · cd9323fd
      Nicholas Bellinger authored
      This patch fixes a bug in the handling of FILEIO w/ underlying block_device
      resize operations where the original fd_dev->fd_dev_size was incorrectly being
      used in fd_get_blocks() for READ_CAPACITY response payloads.
      
      This patch avoids using fd_dev->fd_dev_size for FILEIO devices with
      an underlying block_device, and instead changes fd_get_blocks() to
      get the sector count directly from i_size_read() as recommended by hch.
      Reported-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      cd9323fd
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma · 1be5f0b7
      Linus Torvalds authored
      Pull slave-dmaengine fixes fromVinod Koul:
       "fixes of cylic dma usages in slave dma drivers"
      
      * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: fix cyclic dma usage
        dmaengine: pl330: dont complete descriptor for cyclic dma
      1be5f0b7
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 42e8b9c0
      Linus Torvalds authored
      Pull last minute virtio fixes from Michael S. Tsirkin:
       "Here are a couple of last minute virtio fixes for 3.4.  Hope it's not
        too late yes - I might have tried too hard to make sure the fix is
        well tested.
      
        Fixes are by Amit and myself.  One fixes module removal and one
        suspend of a VM, the last one the handling of out of memory condition.
      
        They are thus very low risk as most people never hit these paths, but
        do fix very annoying problems for people that do use the feature.
      
        Signed-off-by: Michael S. Tsirkin <mst@redhat.com>"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio_net: invoke softirqs after __napi_schedule
        virtio: balloon: let host know of updated balloon size before module removal
        virtio: console: tell host of open ports after resume from s3/s4
      42e8b9c0
    • Linus Torvalds's avatar
      Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 674ff517
      Linus Torvalds authored
      Pull ARM: SoC fixes from Olof Johansson:
       "I will stop trying to predict when we're done with fixes for a
        release.
      
        Here's another small batch of three patches for arm-soc:
      
         - A fix for a boot time WARN_ON() due to irq domain conversion on
           PRIMA2
         - Fix for a regression in Tegra SMP spinup code due to swapped
           register offsets
         - Fixed config dependency for mv_cesa crypto driver to avoid build
           breakage"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        ARM: PRIMA2: fix irq domain size and IRQ mask of internal interrupt controller
        crypto: mv_cesa requires on CRYPTO_HASH to build
        ARM: tegra: Fix flow controller accesses
      674ff517
    • Linus Torvalds's avatar
      Merge tag 'md-3.4-fixes' of git://neil.brown.name/md · 36a1987c
      Linus Torvalds authored
      Pull two md fixes from NeilBrown:
       "One fixes a bug in the new raid10 resize code so is relevant to 3.4
        only.
      
        The other fixes a bug in the use of md by dm-raid, so is relevant to
        any kernel with dm-raid support"
      
      * tag 'md-3.4-fixes' of git://neil.brown.name/md:
        MD: Add del_timer_sync to mddev_suspend (fix nasty panic)
        md/raid10: set dev_sectors properly when resizing devices in array.
      36a1987c
    • Linus Torvalds's avatar
      Merge branches 'perf-urgent-for-linus', 'x86-urgent-for-linus' and... · 31ae9835
      Linus Torvalds authored
      Merge branches 'perf-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
      
      Pull perf, x86 and scheduler updates from Ingo Molnar.
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tracing: Do not enable function event with enable
        perf stat: handle ENXIO error for perf_event_open
        perf: Turn off compiler warnings for flex and bison generated files
        perf stat: Fix case where guest/host monitoring is not supported by kernel
        perf build-id: Fix filename size calculation
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, kvm: KVM paravirt kernels don't check for CPUID being unavailable
        x86: Fix section annotation of acpi_map_cpu2node()
        x86/microcode: Ensure that module is only loaded on supported Intel CPUs
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched: Fix KVM and ia64 boot crash due to sched_groups circular linked list assumption
      31ae9835
    • Will Deacon's avatar
      ARM: 7419/1: vfp: fix VFP flushing regression on sigreturn path · 56cb2484
      Will Deacon authored
      Commit ff9a184c ("ARM: 7400/1: vfp: clear fpscr length and stride bits
      on entry to sig handler") flushes the VFP state prior to entering a
      signal handler so that a VFP operation inside the handler will trap and
      force a restore of ABI-compliant registers. Reflushing and disabling VFP
      on the sigreturn path is predicated on the saved thread state indicating
      that VFP was used by the handler -- however for SMP platforms this is
      only set on context-switch, making the check unreliable and causing VFP
      register corruption in userspace since the register values are not
      necessarily those restored from the sigframe.
      
      This patch unconditionally flushes the VFP state after a signal handler.
      Since we already perform the flush before the handler and the flushing
      itself happens lazily, the redundant flush when VFP is not used by the
      handler is essentially a nop.
      Reported-by: default avatarJon Medhurst <tixy@linaro.org>
      Signed-off-by: default avatarJon Medhurst <tixy@linaro.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      56cb2484
    • Vitaly Andrianov's avatar
      ARM: 7418/1: LPAE: fix access flag setup in mem_type_table · 1a3abcf4
      Vitaly Andrianov authored
      A zero value for prot_sect in the memory types table implies that
      section mappings should never be created for the memory type in question.
      This is checked for in alloc_init_section().
      
      With LPAE, we set a bit to mask access flag faults for kernel mappings.
      This breaks the aforementioned (!prot_sect) check in alloc_init_section().
      
      This patch fixes this bug by first checking for a non-zero
      prot_sect before setting the PMD_SECT_AF flag.
      Signed-off-by: default avatarVitaly Andrianov <vitalya@ti.com>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      1a3abcf4
    • Michael S. Tsirkin's avatar
      virtio_net: invoke softirqs after __napi_schedule · ec13ee80
      Michael S. Tsirkin authored
      __napi_schedule might raise softirq but nothing
      causes do_softirq to trigger, so it does not in fact
      run. As a result,
      the error message "NOHZ: local_softirq_pending 08"
      sometimes occurs during boot of a KVM guest when the network service is
      started and we are oom:
      
        ...
        Bringing up loopback interface:  [  OK  ]
        Bringing up interface eth0:
        Determining IP information for eth0...NOHZ: local_softirq_pending 08
         done.
        [  OK  ]
        ...
      
      Further, receive queue processing might get delayed
      indefinitely until some interrupt triggers:
      virtio_net expected napi to be run immediately.
      
      One way to cause do_softirq to be executed is by
      invoking local_bh_enable(). As __napi_schedule is
      normally called from bh or irq context, this
      seems to make sense: disable bh before __napi_schedule
      and enable afterwards.
      
      In fact it's a very complicated way of calling do_softirq(),
      and works since this function is only used when we are not
      in interrupt context.  It's not hot at all, in any ideal scenario.
      Reported-by: default avatarUlrich Obergfell <uobergfe@redhat.com>
      Tested-by: default avatarUlrich Obergfell <uobergfe@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      ec13ee80
    • Amit Shah's avatar
      virtio: balloon: let host know of updated balloon size before module removal · b8ae0eb3
      Amit Shah authored
      When the balloon module is removed, we deflate the balloon, reclaiming
      all the pages that were given to the host.  However, we don't update the
      config values for the new balloon size, resulting in the host showing
      outdated balloon values.
      
      The size update is done after each leak and fill operation, only the
      module removal case was left out.
      Signed-off-by: default avatarAmit Shah <amit.shah@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      b8ae0eb3
    • Amit Shah's avatar
      virtio: console: tell host of open ports after resume from s3/s4 · fa8b66cc
      Amit Shah authored
      If a port was open before going into one of the sleep states, the port
      can continue normal operation after restore.  However, the host has to
      be told that the guest side of the connection is open to restore
      pre-suspend state.
      
      This wasn't noticed so far due to a bug in qemu that was fixed recently
      (which marked the guest-side connection as always open).
      
      CC: stable@vger.kernel.org   # Only for 3.3
      Signed-off-by: default avatarAmit Shah <amit.shah@redhat.com>
      Signed-off-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      fa8b66cc
    • Barry Song's avatar
      ARM: PRIMA2: fix irq domain size and IRQ mask of internal interrupt controller · ad3b8a83
      Barry Song authored
      the old codes will cause 3.4 kernel warning as irq domain size is wrong:
      ------------[ cut here ]------------
      WARNING: at kernel/irq/irqdomain.c:74 irq_domain_legacy_revmap+0x24/0x48()
      Modules linked in:
      [<c0013f50>] (unwind_backtrace+0x0/0xf8) from [<c001e7d8>] (warn_slowpath_common+0x54/0x64)
      [<c001e7d8>] (warn_slowpath_common+0x54/0x64) from [<c001e804>] (warn_slowpath_null+0x1c/0x24)
      [<c001e804>] (warn_slowpath_null+0x1c/0x24) from [<c005c3c4>] (irq_domain_legacy_revmap+0x24/0x48)
      [<c005c3c4>] (irq_domain_legacy_revmap+0x24/0x48) from [<c005c704>] (irq_create_mapping+0x20/0x120)
      [<c005c704>] (irq_create_mapping+0x20/0x120) from [<c005c880>] (irq_create_of_mapping+0x7c/0xf0)
      [<c005c880>] (irq_create_of_mapping+0x7c/0xf0) from [<c01a6c48>] (irq_of_parse_and_map+0x2c/0x34)
      [<c01a6c48>] (irq_of_parse_and_map+0x2c/0x34) from [<c01a6c68>] (of_irq_to_resource+0x18/0x74)
      [<c01a6c68>] (of_irq_to_resource+0x18/0x74) from [<c01a6ce8>] (of_irq_count+0x24/0x34)
      [<c01a6ce8>] (of_irq_count+0x24/0x34) from [<c01a7220>] (of_device_alloc+0x58/0x158)
      [<c01a7220>] (of_device_alloc+0x58/0x158) from [<c01a735c>] (of_platform_device_create_pdata+0x3c/0x80)
      [<c01a735c>] (of_platform_device_create_pdata+0x3c/0x80) from [<c01a7468>] (of_platform_bus_create+0xc8/0x190)
      [<c01a7468>] (of_platform_bus_create+0xc8/0x190) from [<c01a74cc>] (of_platform_bus_create+0x12c/0x190)
      ---[ end trace 1b75b31a2719ed32 ]---
      Signed-off-by: default avatarBarry Song <Baohua.Song@csr.com>
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      ad3b8a83
    • Jonathan Brassow's avatar
      MD: Add del_timer_sync to mddev_suspend (fix nasty panic) · 0d9f4f13
      Jonathan Brassow authored
      Use del_timer_sync to remove timer before mddev_suspend finishes.
      
      We don't want a timer going off after an mddev_suspend is called.  This is
      especially true with device-mapper, since it can call the destructor function
      immediately following a suspend.  This results in the removal (kfree) of the
      structures upon which the timer depends - resulting in a very ugly panic.
      Therefore, we add a del_timer_sync to mddev_suspend to prevent this.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      0d9f4f13
    • NeilBrown's avatar
      md/raid10: set dev_sectors properly when resizing devices in array. · 6508fdbf
      NeilBrown authored
      raid10 stores dev_sectors in 'conf' separately from the one in
      'mddev' because it can have a very significant effect on block
      addressing and so need to be updated carefully.
      
      However raid10_resize isn't updating it at all!
      
      To update it correctly, we need to make sure it is a proper
      multiple of the chunksize taking various details of the layout
      in to account.
      This calculation is currently done in setup_conf.   So split it
      out from there and call it from raid10_resize as well.
      Then set conf->dev_sectors properly.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      6508fdbf
  3. 16 May, 2012 18 commits