Commits · 75118a82e21cafb4a82b53bb85d1c7689787e046 · Kirill Smelkov / linux

19 Jun, 2008 9 commits

x86: fix NULL pointer deref in __switch_to · 75118a82

Suresh Siddha authored Jun 13, 2008

Patrick McHardy reported a crash:

> > I get this oops once a day, its apparently triggered by something
> > run by cron, but the process is a different one each time.
> >
> > Kernel is -git from yesterday shortly before the -rc6 release
> > (last commit is the usb-2.6 merge, the x86 patches are missing),
> > .config is attached.
> >
> > I'll retry with current -git, but the patches that have gone in
> > since I last updated don't look related.
> >
> > [62060.043009] BUG: unable to handle kernel NULL pointer dereference at
> > 000001ff
> > [62060.043009] IP: [<c0102a9b>] __switch_to+0x2f/0x118
> > [62060.043009] *pde = 00000000
> > [62060.043009] Oops: 0002 [#1] PREEMPT

Vegard Nossum analyzed it:

> This decodes to
>
>    0:   0f ae 00                fxsave (%eax)
>
> so it's related to the floating-point context. This is the exact
> location of the crash:
>
> $ addr2line -e arch/x86/kernel/process_32.o -i ab0
> include/asm/i387.h:232
> include/asm/i387.h:262
> arch/x86/kernel/process_32.c:595
>
> ...so it looks like prev_task->thread.xstate->fxsave has become NULL.
> Or maybe it never had any other value.

Somehow (as described below) TS_USEDFPU is set but the fpu is not
allocated or freed.

Another possible FPU pre-emption issue with the sleazy FPU optimization
which was benign before but not so anymore, with the dynamic FPU allocation
patch.

New task is getting exec'd and it is prempted at the below point.

flush_thread() {
	...
	/*
	* Forget coprocessor state..
	*/
	clear_fpu(tsk);
		<----- Preemption point
	clear_used_math();
	...
}

Now when it context switches in again, as the used_math() is still set
and fpu_counter can be > 5, we will do a math_state_restore() which sets
the task's TS_USEDFPU. After it continues from the above preemption point
it does clear_used_math() and much later free_thread_xstate().

Now, at the next context switch, it is quite possible that xstate is
null, used_math() is not set and TS_USEDFPU is still set. This will
trigger unlazy_fpu() causing kernel oops.

Fix this  by clearing tsk's fpu_counter before clearing task's fpu.
Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

75118a82

x86: set PAE PHYSICAL_MASK_SHIFT to 44 bits. · ad524d46

Jeremy Fitzhardinge authored Jun 06, 2008

When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
potentially have the same number of physical address bits as the
64-bit host ("Enhanced Legacy PAE Paging").  This means, in theory,
we could have up to 52 bits of physical address in a pte.

The 32-bit kernel uses a 32-bit unsigned long to represent a pfn.
This means that it can only represent physical addresses up to 32+12=44
bits wide.  Rather than widening pfns everywhere, just set 2^44 as the
Linux x86_32-PAE architectural limit for physical address size.

This is a bugfix for two cases:
1. running a 32-bit PAE kernel on a machine with
  more than 64GB RAM.
2. running a 32-bit PAE Xen guest on a host machine with
  more than 64GB RAM

In both cases, a pte could need to have more than 36 bits of physical,
and masking it to 36-bits will cause fairly severe havoc.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

ad524d46

agp: brown paper bag patch - put back two lines that got lost · 9bedbcb2

Dave Airlie authored Jun 19, 2008

Commit 62c96b9d ("agp/intel: cleanup
some serious whitespace badness") didn't just fix whitespace.  It also
lost two lines.

Noticed by Linus. No more whitespace diffs for me.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9bedbcb2

Merge branch 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6 · 3506ba7b

Linus Torvalds authored Jun 18, 2008

* 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
  agp/intel: cleanup some serious whitespace badness
  [AGP] intel_agp: Add support for Intel 4 series chipsets
  [AGP] intel_agp: extra stolen mem size available for IGD_GM chipset
  agp: more boolean conversions.
  drivers/char/agp - use bool
  agp: two-stage page destruction issue
  agp/via: fixup pci ids

3506ba7b

agp/intel: cleanup some serious whitespace badness · 62c96b9d
Dave Airlie authored Jun 19, 2008
```
Signed-off-by: Dave Airlie <airlied@redhat.com>
```
62c96b9d
[AGP] intel_agp: Add support for Intel 4 series chipsets · 25ce77ab
Zhenyu Wang authored Jun 19, 2008
```
Signed-off-by: Zhenyu Wang <zhenyu.z.wang@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
```
25ce77ab

[AGP] intel_agp: extra stolen mem size available for IGD_GM chipset · 598d1448

Zhenyu Wang authored Jun 19, 2008

This adds missing stolen memory size detect for IGD_GM, be sure to
detect right size as current X intel driver (2.3.2) which has already
worked out.
Signed-off-by: Zhenyu Wang <zhenyu.z.wang@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

598d1448

agp: more boolean conversions. · 9516b030
Dave Airlie authored Jun 19, 2008
```
Signed-off-by: Dave Airlie <airlied@redhat.com>
```
9516b030

drivers/char/agp - use bool · c7258012

Joe Perches authored Mar 26, 2008

Use boolean in AGP instead of having own TRUE/FALSE

--
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

c7258012

18 Jun, 2008 30 commits

agp: two-stage page destruction issue · da503fa6

Jan Beulich authored Jun 18, 2008

besides it apparently being useful only in 2.6.24 (the changes in 2.6.25
really mean that it could be converted back to a single-stage mechanism),
I'm seeing an issue in Xen Dom0 kernels, which is caused by the calling
of gart_to_virt() in the second stage invocations of the destroy function.
I think that besides this being a real issue with Xen (where
unmap_page_from_agp() is not just a page table attribute change), this
also is invalid from a theoretical perspective: One should not assume that
gart_to_virt() is still valid after unmapping a page. So minimally (keeping
the 2-stage mechanism) a patch like the one below would be needed.

Jan
Signed-off-by: Dave Airlie <airlied@redhat.com>

da503fa6

agp/via: fixup pci ids · dcd981a7

Greg KH authored Jun 19, 2008

add a new PCI ID and remove an old dodgy one, include the explaination
in the commented code so nobody readds later.

(davej also sent the pci id addition).
Signed-off-by: Dave Airlie <airlied@redhat.com>

dcd981a7

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband · f9d1c6ca

Linus Torvalds authored Jun 18, 2008

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/uverbs: Fix check of is_closed flag check in ib_uverbs_async_handler()
  RDMA/nes: Fix off-by-one in nes_reg_user_mr() error path

f9d1c6ca

IB/uverbs: Fix check of is_closed flag check in ib_uverbs_async_handler() · fb77bcef

Jack Morgenstein authored Jun 18, 2008

Commit 1ae5c187 ("IB/uverbs: Don't store struct file * for event
files") changed the way that closed files are handled in the uverbs
code.  However, after the conversion, is_closed flag is checked
incorrectly in ib_uverbs_async_handler().  As a result, no async
events are ever passed to applications.

Found by: Ronni Zimmerman <ronniz@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

fb77bcef

Merge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog · a8051fde

Linus Torvalds authored Jun 18, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  Revert "[WATCHDOG] hpwdt: Fix NMI handling."
  [WATCHDOG] hpwdt: Add CFLAGS to get driver working
  Revert "[WATCHDOG] make watchdog/hpwdt.c:asminline_call() static"

a8051fde

Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 · 5dfd0621

Linus Torvalds authored Jun 18, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] dpt_i2o: Add PROC_IA64 define
  [SCSI] scsi_host regression: fix scsi host leak
  [SCSI] sr: fix corrupt CD data after media change and delay

5dfd0621

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc · f32c23f5

Linus Torvalds authored Jun 18, 2008

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [POWERPC] Clear sub-page HPTE present bits when demoting page size
  [POWERPC] 4xx: Clear new TLB cache attribute bits in Data Storage vector

f32c23f5

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 · e8995364
Linus Torvalds authored Jun 18, 2008
```
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
  udf: restore UDFFS_DEBUG to being undefined by default
```
e8995364

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · d83b14c0

Linus Torvalds authored Jun 18, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (43 commits)
  netlink: genl: fix circular locking
  Revert "mac80211: Use skb_header_cloned() on TX path."
  af_unix: fix 'poll for write'/ connected DGRAM sockets
  tun: Proper handling of IPv6 header in tun driver when TUN_NO_PI is set
  atl1: relax eeprom mac address error check
  net/enc28j60: low power mode
  net/enc28j60: section fix
  sky2: 88E8040T pci device id
  netxen: download firmware in pci probe
  netxen: cleanup debug messages
  netxen: remove global physical_port array
  netxen: fix portnum for hp mezz cards
  ibm_newemac: select CRC32 in Kconfig
  xfrm: fix fragmentation for ipv4 xfrm tunnel
  netfilter: nf_conntrack_h323: fix module unload crash
  netfilter: nf_conntrack_h323: fix memory leak in module initialization error path
  netfilter: nf_nat: fix RCU races
  atm: [he] send idle cells instead of unassigned when in SDH mode
  atm: [he] limit queries to the device's register space
  atm: [br2864] fix routed vcmux support
  ...

d83b14c0

Revert "[WATCHDOG] hpwdt: Fix NMI handling." · fdf7be6f

Wim Van Sebroeck authored Jun 18, 2008

The old setup works better.
Signed-off-by: Thomas Mingarelli <Thomas.Mingarelli@hp.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>

fdf7be6f

[POWERPC] Clear sub-page HPTE present bits when demoting page size · 65ba6cdc

Paul Mackerras authored Jun 18, 2008

When we demote a slice from 64k to 4k, and we are about to insert an
HPTE for a 4k subpage and we notice that there is an existing 64k
HPTE, we first invalidate that HPTE before inserting the new 4k
subpage HPTE. Since the bits that encode which hash bucket the old
HPTE was in overlap with the bits that encode which of the 16 subpages
have HPTEs, we need to clear out the subpage HPTE-present bits before
starting to insert HPTEs for the 4k subpages. If we don't do that, we
can erroneously think that a subpage already has an HPTE when it
doesn't.

That in itself wouldn't be such a problem except that when we go to
update the HPTE that we think is present on machines with a
hypervisor, the hypervisor can tell us that the HPTE we think is there
is actually there even though it isn't, which can lead to a process
getting stuck in a loop, continually faulting. The reason for the
confusion is that the AVPN (abbreviated virtual page number) we are
looking for in the HPTE for a 4k subpage can actually match the AVPN
in a stale HPTE for another 64k page. For example, the HPTE for
the 4k subpage at 0x84000f000 will be in the same hash bucket and have
the same AVPN as the HPTE for the 64k page at 0x8400f0000.

This fixes the code to clear out the subpage HPTE-present bits.
Signed-off-by: Paul Mackerras <paulus@samba.org>

65ba6cdc

[POWERPC] 4xx: Clear new TLB cache attribute bits in Data Storage vector · b17879f7

Josh Boyer authored Jun 18, 2008

A recent commit added support for the new 440x6 and 464 cores that have the
added WL1, IL1I, IL1D, IL2I, and ILD2 bits for the caching attributes in the
TLBs. The new bits were cleared in the finish_tlb_load function, however a
similar bit of code was missed in the DataStorage interrupt vector.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>

b17879f7

netlink: genl: fix circular locking · 6d1a3fb5

Patrick McHardy authored Jun 18, 2008

genetlink has a circular locking dependency when dumping the registered
families:

- dump start:
genl_rcv()            : take genl_mutex
genl_rcv_msg()        : call netlink_dump_start() while holding genl_mutex
netlink_dump_start(),
netlink_dump()        : take nlk->cb_mutex
ctrl_dumpfamily()     : try to detect this case and not take genl_mutex a
                        second time

- dump continuance:
netlink_rcv()         : call netlink_dump
netlink_dump          : take nlk->cb_mutex
ctrl_dumpfamily()     : take genl_mutex

Register genl_lock as callback mutex with netlink to fix this. This slightly
widens an already existing module unload race, the genl ops used during the
dump might go away when the module is unloaded. Thomas Graf is working on a
seperate fix for this.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d1a3fb5

Revert "mac80211: Use skb_header_cloned() on TX path." · 3a5be7d4

David S. Miller authored Jun 18, 2008

This reverts commit 608961a5.

The problem is that the mac80211 stack not only needs to be able to
muck with the link-level headers, it also might need to mangle all of
the packet data if doing sw wireless encryption.

This fixes kernel bugzilla #10903.  Thanks to Didier Raboud (for the
bugzilla report), Andrew Prince (for bisecting), Johannes Berg (for
bringing this bisection analysis to my attention), and Ilpo (for
trying to analyze this purely from the TCP side).

In 2.6.27 we can take another stab at this, by using something like
skb_cow_data() when the TX path of mac80211 ends up with a non-NULL
tx->key.  The ESP protocol code in the IPSEC stack can be used as a
model for implementation.
Signed-off-by: David S. Miller <davem@davemloft.net>

3a5be7d4

af_unix: fix 'poll for write'/ connected DGRAM sockets · 3c73419c

Rainer Weikusat authored Jun 17, 2008

The unix_dgram_sendmsg routine implements a (somewhat crude)
form of receiver-imposed flow control by comparing the length of the
receive queue of the 'peer socket' with the max_ack_backlog value
stored in the corresponding sock structure, either blocking
the thread which caused the send-routine to be called or returning
EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET
sockets. The poll-implementation for these socket types is
datagram_poll from core/datagram.c. A socket is deemed to be writeable
by this routine when the memory presently consumed by datagrams
owned by it is less than the configured socket send buffer size. This
is always wrong for connected PF_UNIX non-stream sockets when the
abovementioned receive queue is currently considered to be full.
'poll' will then return, indicating that the socket is writeable, but
a subsequent write result in EAGAIN, effectively causing an
(usual) application to 'poll for writeability by repeated send request
with O_NONBLOCK set' until it has consumed its time quantum.

The change below uses a suitably modified variant of the datagram_poll
routines for both type of PF_UNIX sockets, which tests if the
recv-queue of the peer a socket is connected to is presently
considered to be 'full' as part of the 'is this socket
writeable'-checking code. The socket being polled is additionally
put onto the peer_wait wait queue associated with its peer, because the
unix_dgram_sendmsg routine does a wake up on this queue after a
datagram was received and the 'other wakeup call' is done implicitly
as part of skb destruction, meaning, a process blocked in poll
because of a full peer receive queue could otherwise sleep forever
if no datagram owned by its socket was already sitting on this queue.
Among this change is a small (inline) helper routine named
'unix_recvq_full', which consolidates the actual testing code (in three
different places) into a single location.
Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c73419c

Merge branch 'davem-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 · 4552e119
David S. Miller authored Jun 17, 2008

4552e119

tun: Proper handling of IPv6 header in tun driver when TUN_NO_PI is set · f09f7ee2

Ang Way Chuang authored Jun 17, 2008

By default, tun.c running in TUN_TUN_DEV mode will set the protocol of
packet to IPv4 if TUN_NO_PI is set. My program failed to work when I
assumed that the driver will check the first nibble of packet,
determine IP version and set the appropriate protocol.
Signed-off-by: Ang Way Chuang <wcang@nav6.org>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f09f7ee2

atl1: relax eeprom mac address error check · 58c7821c

Radu Cristescu authored Jun 12, 2008

The atl1 driver tries to determine the MAC address thusly:

	- If an EEPROM exists, read the MAC address from EEPROM and
	  validate it.
	- If an EEPROM doesn't exist, try to read a MAC address from
	  SPI flash.
	- If that fails, try to read a MAC address directly from the
	  MAC Station Address register.
	- If that fails, assign a random MAC address provided by the
	  kernel.

We now have a report of a system fitted with an EEPROM containing all
zeros where we expect the MAC address to be, and we currently handle
this as an error condition.  Turns out, on this system the BIOS writes
a valid MAC address to the NIC's MAC Station Address register, but we
never try to read it because we return an error when we find the all-
zeros address in EEPROM.

This patch relaxes the error check and continues looking for a MAC
address even if it finds an illegal one in EEPROM.
Signed-off-by: Radu Cristescu <advantis@gmx.net>
Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

58c7821c

net/enc28j60: low power mode · 7dac6f8d

David Brownell authored Jun 12, 2008

Keep enc28j60 chips in low-power mode when they're not in use.
At typically 120 mA, these chips run hot even when idle; this
low power mode cuts that power usage by a factor of around 100.

This version provides a generic routine to poll a register until
its masked value equals some value ... e.g. bit set or cleared.
It's basically what the previous wait_phy_ready() did, but this
version is generalized to support the handshaking needed to
enter and exit low power mode.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Claudio Lanconelli <lanconelli.claudio@eptar.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

7dac6f8d

net/enc28j60: section fix · 6fd65882

David Brownell authored Jun 12, 2008

Minor bugfixes to the enc28j60 driver ... wrong section marking,
indentation, and bogus use of spi_bus_type.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Claudio Lanconelli <lanconelli.claudio@eptar.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

6fd65882

sky2: 88E8040T pci device id · a3b4fced

Stephen Hemminger authored Jun 14, 2008

Missed one pci id for 88E8040T.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

a3b4fced

netxen: download firmware in pci probe · 439b454e

Dhananjay Phadke authored Jun 15, 2008

Downloading firmware in pci probe allows recovery in case of
firmware failure by reloading the driver.

Also reduced delays in firmware load.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

439b454e

netxen: cleanup debug messages · dcd56fdb

Dhananjay Phadke authored Jun 15, 2008

o Remove unnecessary debug prints and functions.
o Explicitly specify pci class (0x020000) to avoid enabling
  management function.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

dcd56fdb

netxen: remove global physical_port array · 3276fbad

Dhananjay Phadke authored Jun 15, 2008

Store physical port number in netxen_adapter structure.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

3276fbad

netxen: fix portnum for hp mezz cards · dc515f2e

Dhananjay Phadke authored Jun 15, 2008

This fixes a the issue where logical port number is set incorrectly
for HP blade mezz cards.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

dc515f2e

ibm_newemac: select CRC32 in Kconfig · 8b8091fb

Josh Boyer authored Jun 17, 2008

The ibm_newemac driver requires ether_crc to be defined.  Apparently it is
possible to generate a .config without CONFIG_CRC32 set which causes the
following link errors if IBM_NEW_EMAC is selected:

  LD      .tmp_vmlinux1
drivers/built-in.o: In function `emac_hash_mc':
core.c:(.text+0x2f524): undefined reference to `crc32_le'
core.c:(.text+0x2f528): undefined reference to `bitrev32'
make: *** [.tmp_vmlinux1] Error 1

This patch has IBM_NEW_EMAC select CRC32 so we don't hit this error.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

8b8091fb

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 952f4a0a

Linus Torvalds authored Jun 17, 2008

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: appletouch - implement reset-resume logic
  Input: i8042 - retry failed CTR writes when resuming
  Input: i8042 - add Fujitsu-Siemens Amilo Pro V2030 to nomux table
  Input: pcspkr - remove negative dependency on snd-pcsp

Manually fixed up trivial conflict in drivers/usb/core/quirks.c

952f4a0a

fuse: fix thinko in max I/O size calucation · f948d564

Miklos Szeredi authored Jun 17, 2008

Use max not min to enforce a lower limit on the max I/O size.

This bug was introduced by "fuse: fix max i/o size calculation" (commit
e5d9a0df).

Thanks to Brian Wang for noticing.
Reported-by: Brian Wang <ywang221@hotmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Szabolcs Szakacsits <szaka@ntfs-3g.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f948d564

Unignore vmlinux.lds.h from Git. · cd50e892

Eduard - Gabriel Munteanu authored Jun 15, 2008

Added !vmlinux.lds.h to .gitignore because it would otherwise be ignored.
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cd50e892

x86-64: Fix "bytes left to copy" return value for copy_from_user() · 42a886af

Linus Torvalds authored Jun 17, 2008

Most users by far do not care about the exact return value (they only
really care about whether the copy succeeded in its entirety or not),
but a few special core routines actually care deeply about exactly how
many bytes were copied from user space.

And the unrolled versions of the x86-64 user copy routines would
sometimes report that it had copied more bytes than it actually had.

Very few uses actually have partial copies to begin with, but to make
this bug even harder to trigger, most x86 CPU's use the "rep string"
instructions for normal user copies, and that version didn't have this
issue.

To make it even harder to hit, the one user of this that really cared
about the return value (and used the uncached version of the copy that
doesn't use the "rep string" instructions) was the generic write
routine, which pre-populated its source, once more hiding the problem by
avoiding the exception case that triggers the bug.

In other words, very special thanks to Bron Gondwana who not only
triggered this, but created a test-program to show it, and bisected the
behavior down to commit 08291429 ("mm:
fix pagecache write deadlocks") which changed the access pattern just
enough that you can now trigger it with 'writev()' with multiple
iovec's.

That commit itself was not the cause of the bug, it just allowed all the
stars to align just right that you could trigger the problem.

[ Side note: this is just the minimal fix to make the copy routines
  (with __copy_from_user_inatomic_nocache as the particular version that
  was involved in showing this) have the right return values.

  We really should improve on the exceptional case further - to make the
  copy do a byte-accurate copy up to the exact page limit that causes it
  to fail.  As it is, the callers have to do extra work to handle the
  limit case gracefully. ]
Reported-by: Bron Gondwana <brong@fastmail.fm>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

 (which didn't have this problem), and since
most users that do the carethis was very hard to trigger, but

42a886af

17 Jun, 2008 1 commit

xfrm: fix fragmentation for ipv4 xfrm tunnel · fe833fca

Steffen Klassert authored Jun 17, 2008

When generating the ip header for the transformed packet we just copy
the frag_off field of the ip header from the original packet to the ip
header of the new generated packet. If we receive a packet as a chain
of fragments, all but the last of the new generated packets have the
IP_MF flag set. We have to mask the frag_off field to only keep the
IP_DF flag from the original packet. This got lost with git commit
36cf9acf ("[IPSEC]: Separate
inner/outer mode processing on output")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe833fca