Commits · 892a925e42adb8192a3c832ad29cbc780fc466f6 · Kirill Smelkov / linux

02 Dec, 2012 2 commits

8139cp: fix coherent mapping leak in error path. · 892a925e

françois romieu authored Dec 01, 2012

cp_open
[...]
        rc = cp_alloc_rings(cp);
        if (rc)
                return rc;

cp_alloc_rings
[...]
        mem = dma_alloc_coherent(&cp->pdev->dev, CP_RING_BYTES,
                                 &cp->ring_dma, GFP_KERNEL);

- cp_alloc_rings never frees the coherent mapping it allocates
- neither do cp_open when cp_alloc_rings fails
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

892a925e

tcp: fix crashes in do_tcp_sendpages() · 64022d0b

Eric Dumazet authored Dec 01, 2012

Recent network changes allowed high order pages being used
for skb fragments.

This uncovered a bug in do_tcp_sendpages() which was assuming its caller
provided an array of order-0 page pointers.

We only have to deal with a single page in this function, and its order
is irrelevant.
Reported-by: Willy Tarreau <w@1wt.eu>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

64022d0b

30 Nov, 2012 1 commit

Merge branch 'master' of... · 9f8933e9

John W. Linville authored Nov 30, 2012

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem

9f8933e9

29 Nov, 2012 4 commits

bonding: fix race condition in bonding_store_slaves_active · e196c0e5

nikolay@redhat.com authored Nov 29, 2012

Race between bonding_store_slaves_active() and slave manipulation
 functions. The bond_for_each_slave use in bonding_store_slaves_active()
 is not protected by any synchronization mechanism.
 NULL pointer dereference is easy to reach.
 Fixed by acquiring the bond->lock for the slave walk.

 v2: Make description text < 75 columns
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e196c0e5

bonding: make arp_ip_target parameter checks consistent with sysfs · 90fb6250

nikolay@redhat.com authored Nov 29, 2012

The module can be loaded with arp_ip_target="255.255.255.255" which makes
 it impossible to remove as the function in sysfs checks for that value,
 so we make the parameter checks consistent with sysfs.

 v2: Fix formatting
 v3: Make description text < 75 columns
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

90fb6250

bonding: fix miimon and arp_interval delayed work race conditions · fbb0c41b

nikolay@redhat.com authored Nov 29, 2012

First I would give three observations which will be used later.
Observation 1: if (delayed_work_pending(wq)) cancel_delayed_work(wq)
 This usage is wrong because the pending bit is cleared just before the
 work's fn is executed and if the function re-arms itself we might end up
 with the work still running. It's safe to call cancel_delayed_work_sync()
 even if the work is not queued at all.
Observation 2: Use of INIT_DELAYED_WORK()
 Work needs to be initialized only once prior to (de/en)queueing.
Observation 3: IFF_UP is set only after ndo_open is called

Related race conditions:
1. Race between bonding_store_miimon() and bonding_store_arp_interval()
 Because of Obs.1 we can end up having both works enqueued.
2. Multiple races with INIT_DELAYED_WORK()
 Since the works are not protected by anything between INIT_DELAYED_WORK()
 and calls to (en/de)queue it is possible for races between the following
 functions:
 (races are also possible between the calls to INIT_DELAYED_WORK()
  and workqueue code)
 bonding_store_miimon() - bonding_store_arp_interval(), bond_close(),
			  bond_open(), enqueued functions
 bonding_store_arp_interval() - bonding_store_miimon(), bond_close(),
				bond_open(), enqueued functions
3. By Obs.1 we need to change bond_cancel_all()

Bugs 1 and 2 are fixed by moving all work initializations in bond_open
which by Obs. 2 and Obs. 3 and the fact that we make sure that all works
are cancelled in bond_close(), is guaranteed not to have any work
enqueued.
Also RTNL lock is now acquired in bonding_store_miimon/arp_interval so
they can't race with bond_close and bond_open. The opposing work is
cancelled only if the IFF_UP flag is set and it is cancelled
unconditionally. The opposing work is already cancelled if the interface
is down so no need to cancel it again. This way we don't need new
synchronizations for the bonding workqueue. These bugs (and fixes) are
tied together and belong in the same patch.
Note: I have left 1 line intentionally over 80 characters (84) because I
      didn't like how it looks broken down. If you'd prefer it otherwise,
      then simply break it.

 v2: Make description text < 75 columns
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fbb0c41b

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · e9296e89

Linus Torvalds authored Nov 28, 2012

Pull networking fixes from David Miller:
 "Some more fixes trickled in over the past few days:

   1) PIM device names can overflow the IFNAMSIZ buffer unless we
      properly limit the allowed indexes, fix from Eric Dumazet.

   2) Under heavy load we can OOPS in icmp reply processing due to an
      unchecked inet_putpeer() call.  Fix from Neal Cardwell.

   3) SCTP round trip calculations need to use 64-bit math to avoid
      overflows, fix from Schoch Christian.

   4) Fix a memory leak and an error return flub in SCTP and IRDA
      triggerable by userspace.  Fix from Tommi Rantala and found by the
      syscall fuzzer (trinity).

   5) MLX4 driver gives bogus size to memcpy() call, fix from Amir
      Vadai.

   6) Fix length calculation in VHOST descriptor translation, from
      Michael S Tsirkin.

   7) Ambassador ATM driver loops forever while loading firmware, fix
      from Dan Carpenter.

   8) Over MTU packets in openvswitch warn about wrong device, fix from
      Jesse Gross.

   9) Netfilter IPSET's netlink code can overrun a string buffer because
      it's not properly limited to IFNAMSIZ.  Fix from Florian Westphal.

  10) PCAN USB driver sets wrong timestamp in SKB, from Oliver Hartkopp.

  11) Make sure the RX ifindex always has a valid value in the CAN BCM
      driver, even if we haven't received a frame yet.  Fix also from
      Oliver Hartkopp."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  team: fix hw_features setup
  atm: forever loop loading ambassador firmware
  vhost: fix length for cross region descriptor
  irda: irttp: fix memory leak in irttp_open_tsap() error path
  net: qmi_wwan: add Huawei E173
  net/mlx4_en: Can set maxrate only for TC0
  sctp: Error in calculation of RTTvar
  sctp: fix -ENOMEM result with invalid user space pointer in sendto() syscall
  sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails
  net: ipmr: limit MRT_TABLE identifiers
  ipv4: avoid passing NULL to inet_putpeer() in icmpv4_xrlim_allow()
  can: bcm: initialize ifindex for timeouts without previous frame reception
  can: peak_usb: fix hwtstamp assignment
  netfilter: ipset: fix netiface set name overflow
  openvswitch: Store flow key len if ARP opcode is not request or reply.
  openvswitch: Print device when warning about over MTU packets.

e9296e89

28 Nov, 2012 12 commits

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch · a45085f6
David S. Miller authored Nov 28, 2012
```
Two small openswitch fixes from Jesse Gross.
Signed-off-by: David S. Miller <davem@davemloft.net>
```
a45085f6

team: fix hw_features setup · 3ed71471

Jiri Pirko authored Nov 28, 2012

Do this in the same way bonding does. This fixed setup resolves performance
issues when using some cards with certain offloading.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ed71471

atm: forever loop loading ambassador firmware · fcdc90b0

Dan Carpenter authored Nov 27, 2012

There was a forever loop introduced here when we converted this to
request_firmware() back in 2008.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>

fcdc90b0

Merge branch 'master' of git://1984.lsi.us.es/nf · 52f2ede1

David S. Miller authored Nov 28, 2012

An interface name overflow fix in netfilter via Pablo Neira Ayuso.
Signed-off-by: David S. Miller <davem@davemloft.net>

52f2ede1

vhost: fix length for cross region descriptor · bd97120f

Michael S. Tsirkin authored Nov 26, 2012

If a single descriptor crosses a region, the
second chunk length should be decremented
by size translated so far, instead it includes
the full descriptor length.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd97120f

irda: irttp: fix memory leak in irttp_open_tsap() error path · c3b2c258

Tommi Rantala authored Nov 26, 2012

Cleanup the memory we allocated earlier in irttp_open_tsap() when we hit
this error path. The leak goes back to at least 1da177e4
("Linux-2.6.12-rc2").

Discovered with Trinity (the syscall fuzzer).
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3b2c258

net: qmi_wwan: add Huawei E173 · ba695af0

Bjørn Mork authored Nov 25, 2012

The Huawei E173 is a QMI/wwan device which normally appear
as 12d1:1436 in Linux. The descriptors displayed in that
mode will be picked up by cdc_ether.  But the modem has
another mode with a different device ID and a slightly
different set of descriptors. This is the mode used by
Windows like this:

3Modem:      USB\VID_12D1&PID_140C&MI_00\6&3A1D2012&0&0000
Networkcard: USB\VID_12D1&PID_140C&MI_01\6&3A1D2012&0&0001
Appli.Inter: USB\VID_12D1&PID_140C&MI_02\6&3A1D2012&0&0002
PC UI Inter: USB\VID_12D1&PID_140C&MI_03\6&3A1D2012&0&0003
Reported-by: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

ba695af0

net/mlx4_en: Can set maxrate only for TC0 · 29bb8f4a

Amir Vadai authored Nov 28, 2012

Had a typo in memcpy.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29bb8f4a

sctp: Error in calculation of RTTvar · 92d64c26

Schoch Christian authored Nov 28, 2012

The calculation of RTTVAR involves the subtraction of two unsigned
numbers which
may causes rollover and results in very high values of RTTVAR when RTT > SRTT.
With this patch it is possible to set RTOmin = 1 to get the minimum of RTO at
4 times the clock granularity.

Change Notes:

v2)
        *Replaced abs() by abs64() and long by __s64, changed patch
description.
Signed-off-by: Christian Schoch <e0326715@student.tuwien.ac.at>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: Neil Horman <nhorman@tuxdriver.com>
CC: linux-sctp@vger.kernel.org
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

92d64c26

sctp: fix -ENOMEM result with invalid user space pointer in sendto() syscall · 6e51fe75

Tommi Rantala authored Nov 22, 2012

Consider the following program, that sets the second argument to the
sendto() syscall incorrectly:

 #include <string.h>
 #include <arpa/inet.h>
 #include <sys/socket.h>

 int main(void)
 {
         int fd;
         struct sockaddr_in sa;

         fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
         if (fd < 0)
                 return 1;

         memset(&sa, 0, sizeof(sa));
         sa.sin_family = AF_INET;
         sa.sin_addr.s_addr = inet_addr("127.0.0.1");
         sa.sin_port = htons(11111);

         sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));

         return 0;
 }

We get -ENOMEM:

 $ strace -e sendto ./demo
 sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ENOMEM (Cannot allocate memory)

Propagate the error code from sctp_user_addto_chunk(), so that we will
tell user space what actually went wrong:

 $ strace -e sendto ./demo
 sendto(3, NULL, 1, 0, {sa_family=AF_INET, sin_port=htons(11111), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EFAULT (Bad address)

Noticed while running Trinity (the syscall fuzzer).
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6e51fe75

sctp: fix memory leak in sctp_datamsg_from_user() when copy from user space fails · be364c8c

Tommi Rantala authored Nov 27, 2012

Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
reproducible e.g. with the sendto() syscall by passing invalid
user space pointer in the second argument:

 #include <string.h>
 #include <arpa/inet.h>
 #include <sys/socket.h>

 int main(void)
 {
         int fd;
         struct sockaddr_in sa;

         fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
         if (fd < 0)
                 return 1;

         memset(&sa, 0, sizeof(sa));
         sa.sin_family = AF_INET;
         sa.sin_addr.s_addr = inet_addr("127.0.0.1");
         sa.sin_port = htons(11111);

         sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));

         return 0;
 }

As far as I can tell, the leak has been around since ~2003.
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be364c8c

percpu-rwsem: use synchronize_sched_expedited · 4b05a1c7

Mikulas Patocka authored Nov 27, 2012

Use synchronize_sched_expedited() instead of synchronize_sched()
to improve mount speed.

This patch improves mount time from 0.500s to 0.013s for Jeff's
test-case.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4b05a1c7

27 Nov, 2012 14 commits

Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · e23739b4

Linus Torvalds authored Nov 27, 2012

Pull media fixes from Mauro Carvalho Chehab:
 "For some media fixes:
   - dvb_usb_v2: some fixes at the core
   - Some fixes on some embedded drivers: soc_camera, adv7604, omap3isp,
     exynos/s5p
   - Several Exynos4/5 camera fixes
   - a fix at stv0900 driver
   - a few USB ID additions to detect more variants of rtl28xxu-based
     sticks"

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (25 commits)
  [media] rtl28xxu: 0ccd:00d7 TerraTec Cinergy T Stick+
  [media] rtl28xxu: 1d19:1102 Dexatek DK mini DVB-T Dongle
  [media] mt9v022: fix the V4L2_CID_EXPOSURE control
  [media] mx2_camera: fix missing unlock on error in mx2_start_streaming()
  [media] media: omap1_camera: fix const cropping related warnings
  [media] media: mx1_camera: use the default .set_crop() implementation
  [media] media: mx2_camera: fix const cropping related warnings
  [media] media: mx3_camera: fix const cropping related warnings
  [media] media: pxa_camera: fix const cropping related warnings
  [media] media: sh_mobile_ceu_camera: fix const cropping related warnings
  [media] media: sh_vou: fix const cropping related warnings
  [media] adv7604: restart STDI once if format is not found
  [media] adv7604: use presets where possible
  [media] adv7604: Replace prim_mode by mode
  [media] adv7604: cleanup references
  [media] dvb_usb_v2: switch interruptible mutex to normal
  [media] dvb_usb_v2: fix pid_filter callback error logging
  [media] exynos-gsc: change driver compatible string
  [media] omap3isp: Fix warning caused by bad subdev events operations prototypes
  [media] omap3isp: video: Fix warning caused by bad vidioc_s_crop prototype
  ...

e23739b4

Merge branch 'akpm' (Fixes from Andrew) · 2844a487

Linus Torvalds authored Nov 26, 2012

Merge misc fixes from Andrew Morton:
 "8 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (8 patches)
  futex: avoid wake_futex() for a PI futex_q
  watchdog: using u64 in get_sample_period()
  writeback: put unused inodes to LRU after writeback completion
  mm: vmscan: check for fatal signals iff the process was throttled
  Revert "mm: remove __GFP_NO_KSWAPD"
  proc: check vma->vm_file before dereferencing
  UAPI: strip the _UAPI prefix from header guards during header installation
  include/linux/bug.h: fix sparse warning related to BUILD_BUG_ON_INVALID

2844a487

Merge tag 'tty-3.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 5687100a

Linus Torvalds authored Nov 26, 2012

Pull TTY fix from Greg Kroah-Hartman:
 "Here is a single fix for a reported regression in 3.7-rc5 for the tty
  layer.  This fix has been in the linux-next tree and solves the
  reported problem.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'tty-3.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty vt: Fix a regression in command line edition

5687100a

Merge tag 'mfd-for-linus-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 · c854539d

Linus Torvalds authored Nov 26, 2012

Pull MFD fixes from Samuel Ortiz:

 - A twl fix preventing a buffer overflow.

 - A wm5102 register patch fix.

 - A wm5110 error misreport fix.

 - Arizona fixes: Use the right array size when adding subdevices,
   correctly report underclocked events, synchronize register cache
   after reset.

 - A twl4030 fix for preventing the system to hang from an interrupt
   flood.

* tag 'mfd-for-linus-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
  mfd: twl4030: Fix chained irq handling on resume from suspend
  mfd: arizona: Sync regcache after reset
  mfd: arizona: Correctly report when AIF2/AIF1 is underclocked
  mfd: arizona: Use correct array for ARRAY_SIZE in mfd_add_devices call
  mfd: wm5110: Disable control interface error report for WM5110 rev B
  mfd: wm5102: Update register patch for latest evaluation
  mfd: twl-core: Fix chip ID for the twl6030-pwm module

c854539d

Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm · 33057692

Linus Torvalds authored Nov 26, 2012

Pull ARM fixes from Russell King:
 "Not much here, just a couple minor/cosmetic fixes and a patch for the
  decompressor which fixes problems with modern GCC and CPUs."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7583/1: decompressor: Enable unaligned memory access for v6 and above
  ARM: 7572/1: proc-v6.S: fix comment
  ARM: 7570/1: quiet down the non make -s output

33057692

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · 87726c33

Linus Torvalds authored Nov 26, 2012

Pull ext3 regression fix from Jan Kara:
 "Fix an ext3 regression introduced during 3.7 merge window.  It leads
  to deadlock if you stress the filesystem in the right way (luckily
  only if blocksize < pagesize)."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  jbd: Fix lock ordering bug in journal_unmap_buffer()

87726c33

futex: avoid wake_futex() for a PI futex_q · aa10990e

Darren Hart authored Nov 26, 2012

Dave Jones reported a bug with futex_lock_pi() that his trinity test
exposed.  Sometime between queue_me() and taking the q.lock_ptr, the
lock_ptr became NULL, resulting in a crash.

While futex_wake() is careful to not call wake_futex() on futex_q's with
a pi_state or an rt_waiter (which are either waiting for a
futex_unlock_pi() or a PI futex_requeue()), futex_wake_op() and
futex_requeue() do not perform the same test.

Update futex_wake_op() and futex_requeue() to test for q.pi_state and
q.rt_waiter and abort with -EINVAL if detected.  To ensure any future
breakage is caught, add a WARN() to wake_futex() if the same condition
is true.

This fix has seen 3 hours of testing with "trinity -c futex" on an
x86_64 VM with 4 CPUS.

[akpm@linux-foundation.org: tidy up the WARN()]
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Reported-by: Dave Jones <davej@redat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: John Kacur <jkacur@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aa10990e

watchdog: using u64 in get_sample_period() · 8ffeb9b0

Chuansheng Liu authored Nov 26, 2012

In get_sample_period(), unsigned long is not enough:

  watchdog_thresh * 2 * (NSEC_PER_SEC / 5)

case1:
  watchdog_thresh is 10 by default, the sample value will be: 0xEE6B2800

case2:
 set watchdog_thresh is 20, the sample value will be: 0x1 DCD6 5000

In case2, we need use u64 to express the sample period.  Otherwise,
changing the threshold thru proc often can not be successful.
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8ffeb9b0

writeback: put unused inodes to LRU after writeback completion · 4eff96dd

Jan Kara authored Nov 26, 2012

Commit 169ebd90 ("writeback: Avoid iput() from flusher thread")
removed iget-iput pair from inode writeback.  As a side effect, inodes
that are dirty during iput_final() call won't be ever added to inode LRU
(iput_final() doesn't add dirty inodes to LRU and later when the inode
is cleaned there's noone to add the inode there).  Thus inodes are
effectively unreclaimable until someone looks them up again.

The practical effect of this bug is limited by the fact that inodes are
pinned by a dentry for long enough that the inode gets cleaned.  But
still the bug can have nasty consequences leading up to OOM conditions
under certain circumstances.  Following can easily reproduce the
problem:

  for (( i = 0; i < 1000; i++ )); do
    mkdir $i
    for (( j = 0; j < 1000; j++ )); do
      touch $i/$j
      echo 2 > /proc/sys/vm/drop_caches
    done
  done

then one needs to run 'sync; ls -lR' to make inodes reclaimable again.

We fix the issue by inserting unused clean inodes into the LRU after
writeback finishes in inode_sync_complete().
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: <stable@vger.kernel.org>		[3.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4eff96dd

mm: vmscan: check for fatal signals iff the process was throttled · 50694c28

Mel Gorman authored Nov 26, 2012

Commit 5515061d ("mm: throttle direct reclaimers if PF_MEMALLOC
reserves are low and swap is backed by network storage") introduced a
check for fatal signals after a process gets throttled for network
storage.  The intention was that if a process was throttled and got
killed that it should not trigger the OOM killer.  As pointed out by
Minchan Kim and David Rientjes, this check is in the wrong place and too
broad.  If a system is in am OOM situation and a process is exiting, it
can loop in __alloc_pages_slowpath() and calling direct reclaim in a
loop.  As the fatal signal is pending it returns 1 as if it is making
forward progress and can effectively deadlock.

This patch moves the fatal_signal_pending() check after throttling to
throttle_direct_reclaim() where it belongs.  If the process is killed
while throttled, it will return immediately without direct reclaim
except now it will have TIF_MEMDIE set and will use the PFMEMALLOC
reserves.

Minchan pointed out that it may be better to direct reclaim before
returning to avoid using the reserves because there may be pages that
can easily reclaim that would avoid using the reserves.  However, we do
no such targetted reclaim and there is no guarantee that suitable pages
are available.  As it is expected that this throttling happens when
swap-over-NFS is used there is a possibility that the process will
instead swap which may allocate network buffers from the PFMEMALLOC
reserves.  Hence, in the swap-over-nfs case where a process can be
throtted and be killed it can use the reserves to exit or it can
potentially use reserves to swap a few pages and then exit.  This patch
takes the option of using the reserves if necessary to allow the process
exit quickly.

If this patch passes review it should be considered a -stable candidate
for 3.6.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

50694c28

Revert "mm: remove __GFP_NO_KSWAPD" · 82b212f4

Mel Gorman authored Nov 26, 2012

With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
based on failures" reverted, Zdenek Kabelac reported the following

  Hmm,  so it's just took longer to hit the problem and observe
  kswapd0 spinning on my CPU again - it's not as endless like before -
  but still it easily eats minutes - it helps to	turn off  Firefox
  or TB  (memory hungry apps) so kswapd0 stops soon - and restart
  those apps again.  (And I still have like >1GB of cached memory)

  kswapd0         R  running task        0    30      2 0x00000000
  Call Trace:
    preempt_schedule+0x42/0x60
    _raw_spin_unlock+0x55/0x60
    put_super+0x31/0x40
    drop_super+0x22/0x30
    prune_super+0x149/0x1b0
    shrink_slab+0xba/0x510

The sysrq+m indicates the system has no swap so it'll never reclaim
anonymous pages as part of reclaim/compaction.  That is one part of the
problem but not the root cause as file-backed pages could also be
reclaimed.

The likely underlying problem is that kswapd is woken up or kept awake
for each THP allocation request in the page allocator slow path.

If compaction fails for the requesting process then compaction will be
deferred for a time and direct reclaim is avoided.  However, if there
are a storm of THP requests that are simply rejected, it will still be
the the case that kswapd is awake for a prolonged period of time as
pgdat->kswapd_max_order is updated each time.  This is noticed by the
main kswapd() loop and it will not call kswapd_try_to_sleep().  Instead
it will loopp, shrinking a small number of pages and calling
shrink_slab() on each iteration.

The temptation is to supply a patch that checks if kswapd was woken for
THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
backed up by proper testing.  As 3.7 is very close to release and this
is not a bug we should release with, a safer path is to revert "mm:
remove __GFP_NO_KSWAPD" for now and revisit it with the view to ironing
out the balance_pgdat() logic in general.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

82b212f4

proc: check vma->vm_file before dereferencing · 05f56484

Stanislav Kinsbursky authored Nov 26, 2012

Commit 7b540d06 ("proc_map_files_readdir(): don't bother with
grabbing files") switched proc_map_files_readdir() to use @f_mode
directly instead of grabbing @file reference, but same time the test for
@vm_file presence was lost leading to nil dereference.  The patch brings
the test back.

The all proc_map_files feature is CONFIG_CHECKPOINT_RESTORE wrapped
(which is set to 'n' by default) so the bug doesn't affect regular
kernels.

The regression is 3.7-rc1 only as far as I can tell.

[gorcunov@openvz.org: provided changelog]
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

05f56484

UAPI: strip the _UAPI prefix from header guards during header installation · 56c176c9

David Howells authored Nov 26, 2012

Strip the _UAPI prefix from header guards during header installation so
that any userspace dependencies aren't affected.  glibc, for example,
checks for linux/types.h, linux/kernel.h, linux/compiler.h and
linux/list.h by their guards - though the last two aren't actually
exported.

  libtool: compile:  gcc -std=gnu99 -DHAVE_CONFIG_H -I. -Wall -Werror -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -fno-delete-null-pointer-checks -fstack-protector -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -c child.c  -fPIC -DPIC -o .libs/child.o
  In file included from cli.c:20:0:
  common.h:152:8: error: redefinition of 'struct sysinfo'
  In file included from /usr/include/linux/kernel.h:4:0,
  		 from /usr/include/linux/sysctl.h:25,
  		 from /usr/include/sys/sysctl.h:43,
  		 from common.h:50,
  		 from cli.c:20:
  /usr/include/linux/sysinfo.h:7:8: note: originally defined here
Reported-by: Tomasz Torcz <tomek@pipebreaker.pl>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

56c176c9

include/linux/bug.h: fix sparse warning related to BUILD_BUG_ON_INVALID · c5782e9f

Tushar Behera authored Nov 26, 2012

Commit baf05aa9 ("bug: introduce BUILD_BUG_ON_INVALID() macro")
introduces this macro only when _CHECKER_ is not defined. Define a
silent macro in the else condition to fix following sparse warning:

mm/filemap.c:395:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
mm/filemap.c:396:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
mm/filemap.c:397:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
include/linux/mm.h:419:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
include/linux/mm.h:419:9: error: not a function <noident>
Signed-off-by: Tushar Behera <tushar.behera@linaro.org>
Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c5782e9f

26 Nov, 2012 7 commits

net: ipmr: limit MRT_TABLE identifiers · b49d3c1e

Eric Dumazet authored Nov 25, 2012

Name of pimreg devices are built from following format :

char name[IFNAMSIZ]; // IFNAMSIZ == 16

sprintf(name, "pimreg%u", mrt->id);

We must therefore limit mrt->id to 9 decimal digits
or risk a buffer overflow and a crash.

Restrict table identifiers in [0 ... 999999999] interval.
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b49d3c1e

ipv4: avoid passing NULL to inet_putpeer() in icmpv4_xrlim_allow() · e1a67642

Neal Cardwell authored Nov 24, 2012

inet_getpeer_v4() can return NULL under OOM conditions, and while
inet_peer_xrlim_allow() is OK with a NULL peer, inet_putpeer() will
crash.

This code path now uses the same idiom as the others from:
1d861aa4 ("inet: Minimize use of
cached route inetpeer.").
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1a67642

can: bcm: initialize ifindex for timeouts without previous frame reception · 81b40110

Oliver Hartkopp authored Nov 26, 2012

Set in the rx_ifindex to pass the correct interface index in the case of a
message timeout detection. Usually the rx_ifindex value is set at receive
time. But when no CAN frame has been received the RX_TIMEOUT notification
did not contain a valid value.

Cc: linux-stable <stable@vger.kernel.org>
Reported-by: Andre Naujoks <nautsch2@googlemail.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

81b40110

can: peak_usb: fix hwtstamp assignment · c9faaa09

Oliver Hartkopp authored Nov 21, 2012

The skb->tstamp is set to the hardware timestamp when available in the USB
urb message. This leads to user visible timestamps which contain the 'uptime'
of the USB adapter - and not the usual system generated timestamp.

Fix this wrong assignment by applying the available hardware timestamp to the
skb_shared_hwtstamps data structure - which is intended for this purpose.

Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

c9faaa09

mac80211: fix remain-on-channel (non-)cancelling · 6bdd253f

Johannes Berg authored Nov 24, 2012

Felix Liao reported that when an interface is set DOWN
while another interface is executing a ROC, the warning
in ieee80211_start_next_roc() (about the first item on
the list having started already) triggers.

This is because ieee80211_roc_purge() calls it even if
it never actually changed the list of ROC items. To fix
this, simply remove the function call. If it is needed
then it will be done by the ieee80211_sw_roc_work()
function when the ROC item that is being removed while
active is cleaned up.

Cc: stable@vger.kernel.org
Reported-by: Felix Liao <Felix.Liao@watchguard.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

6bdd253f

Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes · 53c52513
John W. Linville authored Nov 26, 2012

53c52513
Linux 3.7-rc7 · 9489e9dc
Linus Torvalds authored Nov 25, 2012

9489e9dc