Commits · 5dae92a718570e6a942e0b882e53d25cab03b40f · Kirill Smelkov / linux

26 Feb, 2013 1 commit

ghes_edac: fix to use list_for_each_entry_safe() when delete list items · 5dae92a7

Wei Yongjun authored Feb 26, 2013

Since we will remove items off the list using list_del() we need
to use a safe version of the list_for_each_entry() macro aptly named
list_for_each_entry_safe().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

5dae92a7

25 Feb, 2013 8 commits

ghes_edac: Fix RAS tracing · 8ae8f50a

Mauro Carvalho Chehab authored Feb 19, 2013

With the current version of CPER, there's no way to associate an
error with the memory error. So, the error location in EDAC
layers is unused.

As CPER has its own idea about memory architectural layers, just
output whatever is there inside the driver's detail at the RAS
tracepoint.

The EDAC location keeps untouched, in the case that, in some future,
we could actually map the error into the dimm labels.

Now, the error message:

[   72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[   72.396627] {1}[Hardware Error]: APEI generic hardware error status
[   72.396628] {1}[Hardware Error]: severity: 2, corrected
[   72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected
[   72.396632] {1}[Hardware Error]: flags: 0x01
[   72.396634] {1}[Hardware Error]: primary
[   72.396635] {1}[Hardware Error]: section_type: memory error
[   72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400
[   72.396638] {1}[Hardware Error]: node: 3
[   72.396639] {1}[Hardware Error]: card: 0
[   72.396640] {1}[Hardware Error]: module: 0
[   72.396641] {1}[Hardware Error]: device: 0
[   72.396643] {1}[Hardware Error]: error_type: 18, unknown
[   72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)

Is properly represented on the trace event:

     kworker/0:2-584   [000] ....    72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location:-1:-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory)

Tested on a 4 sockets E5-4650 Sandy Bridge machine.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

8ae8f50a

ghes_edac: Make it compliant with UEFI spec 2.3.1 · 689c9cd8

Mauro Carvalho Chehab authored Feb 19, 2013

The UEFI spec defines the memory error types ans the bits that
validate each field on the memory error record, at
Appendix N om items N.2.5 (Memory Error Section) and
N.2.11 (Error Status). Make the error description compliant with
it, only showing the valid fields.

The EDAC error log is now properly reporting the error:

[  281.556854] mce: [Hardware Error]: Machine check events logged
[  281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[  281.557044] {2}[Hardware Error]: APEI generic hardware error status
[  281.557046] {2}[Hardware Error]: severity: 2, corrected
[  281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected
[  281.557050] {2}[Hardware Error]: flags: 0x01
[  281.557052] {2}[Hardware Error]: primary
[  281.557053] {2}[Hardware Error]: section_type: memory error
[  281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400
[  281.557056] {2}[Hardware Error]: node: 3
[  281.557057] {2}[Hardware Error]: card: 0
[  281.557058] {2}[Hardware Error]: module: 1
[  281.557059] {2}[Hardware Error]: device: 0
[  281.557061] {2}[Hardware Error]: error_type: 18, unknown
[  281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9
[  281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)

Tested on a 4 CPUs E5-4650 Sandy Bridge machine.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

689c9cd8

ghes_edac: Improve driver's printk messages · d2a68566

Mauro Carvalho Chehab authored Feb 15, 2013

Provide a better infrastructure for printk's inside the driver:
	- use edac_dbg() for debug messages;
	- standardize the usage of pr_info();
	- provide warning about the risk of relying on this
	  driver.

While here, changes the size of a fake memory to 1 page. This is
as good or as bad as 1000 pages, but it is easier for userspace to
detect, as I don't expect that any machine implementing GHES would
provide just 1 page available ;)
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

Conflicts:
	drivers/edac/ghes_edac.c

d2a68566

ghes_edac: Don't credit the same memory dimm twice · 5ee726db

Mauro Carvalho Chehab authored Feb 15, 2013

On my tests on a 4xE5-4650 CPU's system, the GHES
EDAC driver is called twice. As the SMBIOS DMI enumeration
call will seek for the entire DIMM sockets in the system, on
this machine, equipped with 128 GB of RAM, the memory is
displayed twice:

          +-----------------------+
          |    mc0    |    mc1    |
----------+-----------------------+
memory45: |  8192 MB  |  8192 MB  |
memory44: |     0 MB  |     0 MB  |
----------+-----------------------+
memory43: |     0 MB  |     0 MB  |
memory42: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory41: |     0 MB  |     0 MB  |
memory40: |     0 MB  |     0 MB  |
----------+-----------------------+
memory39: |  8192 MB  |  8192 MB  |
memory38: |     0 MB  |     0 MB  |
----------+-----------------------+
memory37: |     0 MB  |     0 MB  |
memory36: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory35: |     0 MB  |     0 MB  |
memory34: |     0 MB  |     0 MB  |
----------+-----------------------+
memory33: |  8192 MB  |  8192 MB  |
memory32: |     0 MB  |     0 MB  |
----------+-----------------------+
memory31: |     0 MB  |     0 MB  |
memory30: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory29: |     0 MB  |     0 MB  |
memory28: |     0 MB  |     0 MB  |
----------+-----------------------+
memory27: |  8192 MB  |  8192 MB  |
memory26: |     0 MB  |     0 MB  |
----------+-----------------------+
memory25: |     0 MB  |     0 MB  |
memory24: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory23: |     0 MB  |     0 MB  |
memory22: |     0 MB  |     0 MB  |
----------+-----------------------+
memory21: |  8192 MB  |  8192 MB  |
memory20: |     0 MB  |     0 MB  |
----------+-----------------------+
memory19: |     0 MB  |     0 MB  |
memory18: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory17: |     0 MB  |     0 MB  |
memory16: |     0 MB  |     0 MB  |
----------+-----------------------+
memory15: |  8192 MB  |  8192 MB  |
memory14: |     0 MB  |     0 MB  |
----------+-----------------------+
memory13: |     0 MB  |     0 MB  |
memory12: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory11: |     0 MB  |     0 MB  |
memory10: |     0 MB  |     0 MB  |
----------+-----------------------+
memory9:  |  8192 MB  |  8192 MB  |
memory8:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory7:  |     0 MB  |     0 MB  |
memory6:  |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory5:  |     0 MB  |     0 MB  |
memory4:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory3:  |  8192 MB  |  8192 MB  |
memory2:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory1:  |     0 MB  |     0 MB  |
memory0:  |  8192 MB  |  8192 MB  |
----------+-----------------------+

Total sum of 256 GB.

As there's no reliable way to credit DIMMS to the right memory
controller, just put everything on memory controller 0 (with should
always exist).
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

5ee726db

ghes_edac: do a better job of filling EDAC DIMM info · 32fa1f53

Mauro Carvalho Chehab authored Feb 14, 2013

Instead of just faking a random value for the DIMM data, get
the information that it is available via DMI table.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

32fa1f53

ghes_edac: add support for reporting errors via EDAC · f04c62a7

Mauro Carvalho Chehab authored Feb 15, 2013

Now that the EDAC core is capable of just forward the errors via
the userspace API, add a report mechanism for the GHES errors.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

f04c62a7

ghes_edac: Register at EDAC core the BIOS report · 77c5f5d2

Mauro Carvalho Chehab authored Feb 15, 2013

Register GHES at EDAC MC core, in order to avoid other
drivers to also handle errors and mangle with error data.

The edac core will warrant that just one driver will be used,
so the first one to register (BIOS first) will be the one that
will be reporting the hardware errors.

For now, the EDAC driver does nothing but to register at the
EDAC core, preventing the hardware-driven mechanism to
interfere with GHES.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

77c5f5d2

ghes: add the needed hooks for EDAC error report · 21480547

Mauro Carvalho Chehab authored Feb 15, 2013

In order to allow reporting errors via EDAC, add hooks for:

1) register an EDAC driver;
2) unregister an EDAC driver;
3) report errors via EDAC.

As the EDAC driver will need to access the ghes structure, adds it
as one of the parameters for ghes_do_proc.
Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

21480547

21 Feb, 2013 16 commits

ghes: move structures/enum to a header file · 40e06415

Mauro Carvalho Chehab authored Feb 15, 2013

As a ghes_edac driver will need to access ghes structures, in order
to properly handle the errors, move those structures to a separate
header file. No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

40e06415

edac: add support for error type "Info" · 8dd93d45

Mauro Carvalho Chehab authored Feb 19, 2013

The CPER spec defines a forth type of error: informational
logs. Add support for it at the edac API and at the
trace event interface.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

8dd93d45

edac: add support for raw error reports · e7e24830

Mauro Carvalho Chehab authored Oct 31, 2012

That allows APEI GHES driver to report errors directly, using
the EDAC error report API.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

e7e24830

edac: reduce stack pressure by using a pre-allocated buffer · c7ef7645

Mauro Carvalho Chehab authored Feb 21, 2013

The number of variables at the stack is too big.
Reduces the stack usage by using a pre-allocated error
buffer.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

c7ef7645

edac: lock module owner to avoid error report conflicts · 80cc7d87

Mauro Carvalho Chehab authored Oct 31, 2012

APEI GHES and i7core_edac/sb_edac currently can be loaded at
the same time, but those are Highlander modules:
	"There can be only one".

There are two reasons for that:

1) Each driver assumes that it is the only one registering at
   the EDAC core, as it is driver's responsibility to number
   the memory controllers, and all of them start from 0;

2) If BIOS is handling the memory errors, the OS can't also be
   doing it, as one will mangle with the other.

So, we need to add an module owner's lock at the EDAC core,
in order to avoid having two different modules handling memory
errors at the same time. The best way for doing this lock seems
to use the driver's name, as this is unique, and won't require
changes on every driver.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

80cc7d87

edac: remove proc_name from mci structure · c2c93dbc

Mauro Carvalho Chehab authored Feb 19, 2013

proc_name isn't used anywhere. Remove it.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

c2c93dbc

edac: add a new memory layer type · c66b5a79

Mauro Carvalho Chehab authored Feb 15, 2013

There are some cases where the memory controller layout is
completely hidden. This is the case of firmware-driven error
code, like the one provided by GHES. Add a new layer to be
used on such memory error report mechanisms.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

c66b5a79

edac: initialize the core earlier · 4ab19b06

Mauro Carvalho Chehab authored Feb 15, 2013

In order for it to work with it builtin, the EDAC core should
be initialized earlier, otherwise the ghes_edac driver initializes
before edac_mc_sysfs_init() being called:

...
[    4.998373] EDAC MC0: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
...
[    4.998373] EDAC MC1: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes
[    6.519495] EDAC MC: Ver: 3.0.0
[    6.523749] EDAC DEBUG: edac_mc_sysfs_init: device mc created

The net result is that no EDAC sysfs nodes will appear.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

4ab19b06

edac: better report error conditions in debug mode · 3d958823

Mauro Carvalho Chehab authored Feb 15, 2013

It is hard to find what's wrong without a proper error
report. Improve it, in debug mode.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

3d958823

i5100_edac: Remove two checkpatch warnings · 59b9796d

Mauro Carvalho Chehab authored Feb 21, 2013

The last changeset introduced a few checkpatch warnings:

WARNING: debugfs_remove_recursive(NULL) is safe this check is probably not required
261: FILE: drivers/edac/i5100_edac.c:1207:
+       if (priv->debugfs)
+               debugfs_remove_recursive(priv->debugfs);

WARNING: debugfs_remove(NULL) is safe this check is probably not required
290: FILE: drivers/edac/i5100_edac.c:1250:
+       if (i5100_debugfs)
+               debugfs_remove(i5100_debugfs);

Get rid of them.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

59b9796d

i5100_edac: connect fault injection to debugfs node · 9cbc6d38

Niklas Söderlund authored Aug 08, 2012

Create a debugfs direcotry i5100_edac/mcX for each memory controller and
add nodes to control how fault injection is preformed.

After configuring an injection using inject_channel, inject_deviceptr1,
inject_deviceptr2, inject_eccmask1, inject_eccmask2 and inject_hlinesel
trigger the injection by writing anything to inject_enable.

Example of a CE injection:

echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 61440 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable

Example of UE injection:

echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel
echo 2 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1
echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask2
echo 17 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr1
echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr2
echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable

Sometimes it is needed to enable the injection more then once (echo to
the inject_enable node) for the injection to happen, I am not sure why.
Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

9cbc6d38

i5100_edac: add fault injection code · 53ceafd6

Niklas Söderlund authored Aug 08, 2012

Add fault injection based on information datasheet for i5100, see 1. In
addition to the i5100 datasheet some missing information on injection
functions where found through experimentation and the i7300 datasheet,
see 2.

[1] Intel 5100 Memory Controller Hub Chipset
    Doc.Nr: 318378
    http://www.intel.com/content/dam/doc/datasheet/5100-
    memory-controller-hub-chipset-datasheet.pdf

[2] Intel 7300 Chipset MemoryController Hub (MCH)
    Doc.Nr: 318082
	http://www.intel.com/assets/pdf/datasheet/318082.pdfSigned-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

53ceafd6

i5100_edac: probe for device 19 function 0 · 52608ba2

Niklas Söderlund authored Aug 08, 2012

Probe and store the device handle for the device 19 function 0 during
driver initialization. The device is used during fault injection.
Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

52608ba2

edac: only create sdram_scrub_rate where supported · e7100478

Mauro Carvalho Chehab authored Feb 19, 2013

Currently, sdram_scrub_rate sysfs node is created even if the device
doesn't support get/set the scub rate. Change the logic to only
create this device node when the operation is supported.
Reported-by: Felipe Balbi <balbi@ti.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

e7100478

i3200_edac: Fix the logic that detects filled memories · 61734e18

Mauro Carvalho Chehab authored Jan 10, 2013

After running a series of tests on an HP DL320, filled with different
memory sizes, it was noticed that, when filled with just one DIMM
on such hardware, the driver wrongly detects twice the memory, and
thinks that both channels 0 and 1 are filled.

It seems to be partially caused by the BIOS and partially by the driver.

The i3200_edac current logic would be working fine if the BIOS were
disabling the unused second channel when just one DIMM is connected,
in order to do power-saving, as recommended on this chipset's datasheet.

However, the BIOS on this particular machine doesn't do it:

[   16.741421] EDAC DEBUG: how_many_channels: In dual channel mode
[   16.741424] EDAC DEBUG: how_many_channels: 2 DIMMS per channel enabled

So, the driver were assuming that 2 channels are enabled (well, they are,
but the second is unused).

Combined with that, I found two issues at the logic that creates the
EDAC data, that were failing when the two channels are not equally
filled (AFAICT, that happens only when just 1 DIMM is plugged).

The first one is that a 0 at DRB means that nothing is filled. The
driver's logic, however, do some calculation with that.

The second one is that the logic that fills the DIMM data currently
assumes that both channels are equally filled.

I tested the system already with the current configuration and my
patch and it is now working fine. So, for a 2R single DIMM 2Gb memory
at dimm slot 01 (channel 0), it is now displaying:

[   16.741406] EDAC DEBUG: i3200_get_drbs: drb[0][0] = 16, drb[1][0] = 0
[   16.741410] EDAC DEBUG: i3200_get_drbs: drb[0][1] = 32, drb[1][1] = 0
[   16.741413] EDAC DEBUG: i3200_get_drbs: drb[0][2] = 32, drb[1][2] = 0
[   16.741416] EDAC DEBUG: i3200_get_drbs: drb[0][3] = 32, drb[1][3] = 0
...
[   16.741896] EDAC DEBUG: i3200_probe1: csrow 0, channel 0, size = 1024 Mb
[   16.741899] EDAC DEBUG: i3200_probe1: csrow 1, channel 0, size = 1024 Mb

and the corresponding sysfs nodes are now properly filled.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

61734e18

i3200_edac: Add more debug to the driver · 5f466cb0

Mauro Carvalho Chehab authored Jan 10, 2013

Currently, it is not possible to know, when debug is enabled,
if the driver is using 2 DIMMS per channel mode or not. It is
not possible to know the values of the drbs registers, used
to identify the memory rank sizes.

Add debug for both, as it helps to track issues on the driver.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

5f466cb0

20 Feb, 2013 1 commit

Merge tag 'v3.8-rc7' into next · 1339730e

Mauro Carvalho Chehab authored Feb 20, 2013

Linux 3.8-rc7

* tag 'v3.8-rc7': (12052 commits)
  Linux 3.8-rc7
  net: sctp: sctp_endpoint_free: zero out secret key data
  net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
  atm/iphase: rename fregt_t -> ffreg_t
  ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned
  ARM: DMA mapping: fix bad atomic test
  ARM: realview: ensure that we have sufficient IRQs available
  ARM: GIC: fix GIC cpumask initialization
  net: usb: fix regression from FLAG_NOARP code
  l2tp: dont play with skb->truesize
  net: sctp: sctp_auth_key_put: use kzfree instead of kfree
  netback: correct netbk_tx_err to handle wrap around.
  xen/netback: free already allocated memory on failure in xen_netbk_get_requests
  xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
  xen/netback: shutdown the ring if it contains garbage.
  drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try
  virtio_console: Don't access uninitialized data.
  net: qmi_wwan: add more Huawei devices, including E320
  net: cdc_ncm: add another Huawei vendor specific device
  ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
  ...

1339730e

08 Feb, 2013 14 commits

Linux 3.8-rc7 · 836dc9e3
Linus Torvalds authored Feb 09, 2013

836dc9e3

Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm · 39923134

Linus Torvalds authored Feb 09, 2013

Pull ARM fixes from Russell King:
 "I was going to hold these off until v3.8 was out, and send them with a
  stable tag, but as everyone else is pushing much bigger fixes which
  Linus is accepting, let's save people from the hastle of having to
  patch v3.8 back into working or use a stable kernel.

  Looking at the diffstat, this really is high value for its size; this
  is miniscule compared to how the -rc6 to tip diffstat currently looks.

  So, four patches in this set:
   - Punit Agrawal reports that the kernel no longer boots on MPCore due
     to a new assumption made in the GIC code which isn't true of
     earlier GIC designs.  This is the biggest change in this set.
   - Punit's boot log also revealed a bunch of WARN_ON() dumps caused by
     the DT-ification of the GIC support without fixing up non-DT
     Realview - which now sees a greater number of interrupts than it
     did before.
   - A fix for the DMA coherent code from Marek which uses the wrong
     check for atomic allocations; this can result in spinlock lockups
     or other nasty effects.
   - A fix from Will, which will affect all Android based platforms if
     not applied (which use the 2G:2G VM split) - this causes
     particularly 'make' to misbehave unless this bug is fixed."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned
  ARM: DMA mapping: fix bad atomic test
  ARM: realview: ensure that we have sufficient IRQs available
  ARM: GIC: fix GIC cpumask initialization

39923134

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · e06b8405

Linus Torvalds authored Feb 09, 2013

Pull networking fixes from David Miller:

 1) Revert iwlwifi reclaimed packet tracking, it causes problems for a
    bunch of folks.  From Emmanuel Grumbach.

 2) Work limiting code in brcmsmac wifi driver can clear tx status
    without processing the event.  From Arend van Spriel.

 3) rtlwifi USB driver processes wrong SKB, fix from Larry Finger.

 4) l2tp tunnel delete can race with close, fix from Tom Parkin.

 5) pktgen_add_device() failures are not checked at all, fix from Cong
    Wang.

 6) Fix unintentional removal of carrier off from tun_detach(),
    otherwise we confuse userspace, from Michael S.  Tsirkin.

 7) Don't leak socket reference counts and ubufs in vhost-net driver,
    from Jason Wang.

 8) vmxnet3 driver gets it's initial carrier state wrong, fix from Neil
    Horman.

 9) Protect against USB networking devices which spam the host with 0
    length frames, from Bjørn Mork.

10) Prevent neighbour overflows in ipv6 for locally destined routes,
    from Marcelo Ricardo.  This is the best short-term fix for this, a
    longer term fix has been implemented in net-next.

11) L2TP uses ipv4 datagram routines in it's ipv6 code, whoops.  This
    mistake is largely because the ipv6 functions don't even have some
    kind of prefix in their names to suggest they are ipv6 specific.
    From Tom Parkin.

12) Check SYN packet drops properly in tcp_rcv_fastopen_synack(), from
    Yuchung Cheng.

13) Fix races and TX skb freeing bugs in via-rhine's NAPI support, from
    Francois Romieu and your's truly.

14) Fix infinite loops and divides by zero in TCP congestion window
    handling, from Eric Dumazet, Neal Cardwell, and Ilpo Järvinen.

15) AF_PACKET tx ring handling can leak kernel memory to userspace, fix
    from Phil Sutter.

16) Fix error handling in ipv6 GRE tunnel transmit, from Tommi Rantala.

17) Protect XEN netback driver against hostile frontend putting garbage
    into the rings, don't leak pages in TX GOP checking, and add proper
    resource releasing in error path of xen_netbk_get_requests().  From
    Ian Campbell.

18) SCTP authentication keys should be cleared out and released with
    kzfree(), from Daniel Borkmann.

19) L2TP is a bit too clever trying to maintain skb->truesize, and ends
    up corrupting socket memory accounting to the point where packet
    sending is halted indefinitely.  Just remove the adjustments
    entirely, they aren't really needed.  From Eric Dumazet.

20) ATM Iphase driver uses a data type with the same name as the S390
    headers, rename to fix the build.  From Heiko Carstens.

21) Fix a typo in copying the inner network header offset from one SKB
    to another, from Pravin B Shelar.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
  net: sctp: sctp_endpoint_free: zero out secret key data
  net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
  atm/iphase: rename fregt_t -> ffreg_t
  net: usb: fix regression from FLAG_NOARP code
  l2tp: dont play with skb->truesize
  net: sctp: sctp_auth_key_put: use kzfree instead of kfree
  netback: correct netbk_tx_err to handle wrap around.
  xen/netback: free already allocated memory on failure in xen_netbk_get_requests
  xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
  xen/netback: shutdown the ring if it contains garbage.
  net: qmi_wwan: add more Huawei devices, including E320
  net: cdc_ncm: add another Huawei vendor specific device
  ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
  tcp: fix for zero packets_in_flight was too broad
  brcmsmac: rework of mac80211 .flush() callback operation
  ssb: unregister gpios before unloading ssb
  bcma: unregister gpios before unloading bcma
  rtlwifi: Fix scheduling while atomic bug
  net: usbnet: fix tx_dropped statistics
  tcp: ipv6: Update MIB counters for drops
  ...

e06b8405

Merge branch 'sctp_keys' · a1c83b05

David S. Miller authored Feb 08, 2013

Daniel Borkmann says:

====================
Cryptographically used keys should be zeroed out when our session
ends resp. memory is freed, thus do not leave them somewhere in the
memory.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a1c83b05

net: sctp: sctp_endpoint_free: zero out secret key data · b5c37fe6

Daniel Borkmann authored Feb 08, 2013

On sctp_endpoint_destroy, previously used sensitive keying material
should be zeroed out before the memory is returned, as we already do
with e.g. auth keys when released.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5c37fe6

net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree · 6ba542a2

Daniel Borkmann authored Feb 08, 2013

In sctp_setsockopt_auth_key, we create a temporary copy of the user
passed shared auth key for the endpoint or association and after
internal setup, we free it right away. Since it's sensitive data, we
should zero out the key before returning the memory back to the
allocator. Thus, use kzfree instead of kfree, just as we do in
sctp_auth_key_put().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6ba542a2

atm/iphase: rename fregt_t -> ffreg_t · ab54ee80

Heiko Carstens authored Feb 08, 2013

We have conflicting type qualifiers for "freg_t" in s390's ptrace.h and the
iphase atm device driver, which causes the compile error below.
Unfortunately the s390 typedef can't be renamed, since it's a user visible api,
nor can I change the include order in s390 code to avoid the conflict.

So simply rename the iphase typedef to a new name. Fixes this compile error:

In file included from drivers/atm/iphase.c:66:0:
drivers/atm/iphase.h:639:25: error: conflicting type qualifiers for 'freg_t'
In file included from next/arch/s390/include/asm/ptrace.h:9:0,
                 from next/arch/s390/include/asm/lowcore.h:12,
                 from next/arch/s390/include/asm/thread_info.h:30,
                 from include/linux/thread_info.h:54,
                 from include/linux/preempt.h:9,
                 from include/linux/spinlock.h:50,
                 from include/linux/seqlock.h:29,
                 from include/linux/time.h:5,
                 from include/linux/stat.h:18,
                 from include/linux/module.h:10,
                 from drivers/atm/iphase.c:43:
next/arch/s390/include/uapi/asm/ptrace.h:197:3: note: previous declaration of 'freg_t' was here
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab54ee80

ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned · 79d1f5c9

Will Deacon authored Feb 08, 2013

We have received multiple reports of mmap failures when running with a
2:2 vm split. These manifest as either -EINVAL with a non page-aligned
address (ending 0xaaa) or a SEGV, depending on the application. The
issue is commonly observed in children of make, which appears to use
bottom-up mmap (assumedly because it changes the stack rlimit).

Further investigation reveals that this regression was triggered by
394ef640 ("mm: use vm_unmapped_area() on arm architecture"), whereby
TASK_UNMAPPED_BASE is no longer page-aligned for bottom-up mmap, causing
get_unmapped_area to choke on misaligned addressed.

This patch fixes the problem by defining TASK_UNMAPPED_BASE in terms of
TASK_SIZE and explicitly aligns the result to 16M, matching the other
end of the heap.
Acked-by: Nicolas Pitre <nico@linaro.org>
Reported-by: Steve Capper <steve.capper@arm.com>
Reported-by: Jean-Francois Moine <moinejf@free.fr>
Reported-by: Christoffer Dall <cdall@cs.columbia.edu>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

79d1f5c9

ARM: DMA mapping: fix bad atomic test · 633dc92a

Russell King authored Jan 30, 2013

Realview fails to boot with this warning:
BUG: spinlock lockup suspected on CPU#0, init/1
 lock: 0xcf8bde10, .magic: dead4ead, .owner: init/1, .owner_cpu: 0
Backtrace:
[<c00185d8>] (dump_backtrace+0x0/0x10c) from [<c03294e8>] (dump_stack+0x18/0x1c) r6:cf8bde10 r5:cf83d1c0 r4:cf8bde10 r3:cf83d1c0
[<c03294d0>] (dump_stack+0x0/0x1c) from [<c018926c>] (spin_dump+0x84/0x98)
[<c01891e8>] (spin_dump+0x0/0x98) from [<c0189460>] (do_raw_spin_lock+0x100/0x198)
[<c0189360>] (do_raw_spin_lock+0x0/0x198) from [<c032cbac>] (_raw_spin_lock+0x3c/0x44)
[<c032cb70>] (_raw_spin_lock+0x0/0x44) from [<c01c9224>] (pl011_console_write+0xe8/0x11c)
[<c01c913c>] (pl011_console_write+0x0/0x11c) from [<c002aea8>] (call_console_drivers.clone.7+0xdc/0x104)
[<c002adcc>] (call_console_drivers.clone.7+0x0/0x104) from [<c002b320>] (console_unlock+0x2e8/0x454)
[<c002b038>] (console_unlock+0x0/0x454) from [<c002b8b4>] (vprintk_emit+0x2d8/0x594)
[<c002b5dc>] (vprintk_emit+0x0/0x594) from [<c0329718>] (printk+0x3c/0x44)
[<c03296dc>] (printk+0x0/0x44) from [<c002929c>] (warn_slowpath_common+0x28/0x6c)
[<c0029274>] (warn_slowpath_common+0x0/0x6c) from [<c0029304>] (warn_slowpath_null+0x24/0x2c)
[<c00292e0>] (warn_slowpath_null+0x0/0x2c) from [<c0070ab0>] (lockdep_trace_alloc+0xd8/0xf0)
[<c00709d8>] (lockdep_trace_alloc+0x0/0xf0) from [<c00c0850>] (kmem_cache_alloc+0x24/0x11c)
[<c00c082c>] (kmem_cache_alloc+0x0/0x11c) from [<c00bb044>] (__get_vm_area_node.clone.24+0x7c/0x16c)
[<c00bafc8>] (__get_vm_area_node.clone.24+0x0/0x16c) from [<c00bb7b8>] (get_vm_area_caller+0x48/0x54)
[<c00bb770>] (get_vm_area_caller+0x0/0x54) from [<c0020064>] (__alloc_remap_buffer.clone.15+0x38/0xb8)
[<c002002c>] (__alloc_remap_buffer.clone.15+0x0/0xb8) from [<c0020244>] (__dma_alloc+0x160/0x2c8)
[<c00200e4>] (__dma_alloc+0x0/0x2c8) from [<c00204d8>] (arm_dma_alloc+0x88/0xa0)[<c0020450>] (arm_dma_alloc+0x0/0xa0) from [<c00beb00>] (dma_pool_alloc+0xcc/0x1a8)
[<c00bea34>] (dma_pool_alloc+0x0/0x1a8) from [<c01a9d14>] (pl08x_fill_llis_for_desc+0x28/0x568)
[<c01a9cec>] (pl08x_fill_llis_for_desc+0x0/0x568) from [<c01aab8c>] (pl08x_prep_slave_sg+0x258/0x3b0)
[<c01aa934>] (pl08x_prep_slave_sg+0x0/0x3b0) from [<c01c9f74>] (pl011_dma_tx_refill+0x140/0x288)
[<c01c9e34>] (pl011_dma_tx_refill+0x0/0x288) from [<c01ca748>] (pl011_start_tx+0xe4/0x120)
[<c01ca664>] (pl011_start_tx+0x0/0x120) from [<c01c54a4>] (__uart_start+0x48/0x4c)
[<c01c545c>] (__uart_start+0x0/0x4c) from [<c01c632c>] (uart_start+0x2c/0x3c)
[<c01c6300>] (uart_start+0x0/0x3c) from [<c01c795c>] (uart_write+0xcc/0xf4)
[<c01c7890>] (uart_write+0x0/0xf4) from [<c01b0384>] (n_tty_write+0x1c0/0x3e4)
[<c01b01c4>] (n_tty_write+0x0/0x3e4) from [<c01acfe8>] (tty_write+0x144/0x240)
[<c01acea4>] (tty_write+0x0/0x240) from [<c01ad17c>] (redirected_tty_write+0x98/0xac)
[<c01ad0e4>] (redirected_tty_write+0x0/0xac) from [<c00c371c>] (vfs_write+0xbc/0x150)
[<c00c3660>] (vfs_write+0x0/0x150) from [<c00c39c0>] (sys_write+0x4c/0x78)
[<c00c3974>] (sys_write+0x0/0x78) from [<c0014460>] (ret_fast_syscall+0x0/0x3c)

This happens because the DMA allocation code is not respecting atomic
allocations correctly.

GFP flags should not be tested for GFP_ATOMIC to determine if an
atomic allocation is being requested.  GFP_ATOMIC is not a flag but
a value.  The GFP bitmask flags are all prefixed with __GFP_.

The rest of the kernel tests for __GFP_WAIT not being set to indicate
an atomic allocation.  We need to do the same.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

633dc92a

ARM: realview: ensure that we have sufficient IRQs available · e210101d

Russell King authored Jan 30, 2013

Realview EB with a rev B MPcore tile results in lots of warnings at
boot because it can't allocate enough IRQs.  Fix this by increasing
the number of available IRQs.

WARNING: at /home/rmk/git/linux-rmk/arch/arm/common/gic.c:757 gic_init_bases+0x12c/0x2ec()
Cannot allocate irq_descs @ IRQ96, assuming pre-allocated
Modules linked in:
Backtrace:
[<c00185d8>] (dump_backtrace+0x0/0x10c) from [<c03294e8>] (dump_stack+0x18/0x1c) r6:000002f5 r5:c042c62c r4:c044ff40 r3:c045f240
[<c03294d0>] (dump_stack+0x0/0x1c) from [<c00292c8>] (warn_slowpath_common+0x54/0x6c)
[<c0029274>] (warn_slowpath_common+0x0/0x6c) from [<c0029384>] (warn_slowpath_fmt+0x38/0x40)
[<c002934c>] (warn_slowpath_fmt+0x0/0x40) from [<c042c62c>] (gic_init_bases+0x12c/0x2ec)
[<c042c500>] (gic_init_bases+0x0/0x2ec) from [<c042cdc8>] (gic_init_irq+0x8c/0xd8)
[<c042cd3c>] (gic_init_irq+0x0/0xd8) from [<c042827c>] (init_IRQ+0x1c/0x24)
[<c0428260>] (init_IRQ+0x0/0x24) from [<c04256c8>] (start_kernel+0x1a4/0x300)
[<c0425524>] (start_kernel+0x0/0x300) from [<70008070>] (0x70008070)
---[ end trace 1b75b31a2719ed1c ]---
------------[ cut here ]------------
WARNING: at /home/rmk/git/linux-rmk/kernel/irq/irqdomain.c:234 irq_domain_add_legacy+0x80/0x140()
Modules linked in:
Backtrace:
[<c00185d8>] (dump_backtrace+0x0/0x10c) from [<c03294e8>] (dump_stack+0x18/0x1c) r6:000000ea r5:c0081a38 r4:00000000 r3:c045f240
[<c03294d0>] (dump_stack+0x0/0x1c) from [<c00292c8>] (warn_slowpath_common+0x54/0x6c)
[<c0029274>] (warn_slowpath_common+0x0/0x6c) from [<c0029304>] (warn_slowpath_null+0x24/0x2c)
[<c00292e0>] (warn_slowpath_null+0x0/0x2c) from [<c0081a38>] (irq_domain_add_legacy+0x80/0x140)
[<c00819b8>] (irq_domain_add_legacy+0x0/0x140) from [<c042c64c>] (gic_init_bases+0x14c/0x2ec)
[<c042c500>] (gic_init_bases+0x0/0x2ec) from [<c042cdc8>] (gic_init_irq+0x8c/0xd8)
[<c042cd3c>] (gic_init_irq+0x0/0xd8) from [<c042827c>] (init_IRQ+0x1c/0x24)
[<c0428260>] (init_IRQ+0x0/0x24) from [<c04256c8>] (start_kernel+0x1a4/0x300)
[<c0425524>] (start_kernel+0x0/0x300) from [<70008070>] (0x70008070)
---[ end trace 1b75b31a2719ed1d ]---
------------[ cut here ]------------
WARNING: at /home/rmk/git/linux-rmk/arch/arm/common/gic.c:762 gic_init_bases+0x170/0x2ec()
Modules linked in:
Backtrace:
[<c00185d8>] (dump_backtrace+0x0/0x10c) from [<c03294e8>] (dump_stack+0x18/0x1c) r6:000002fa r5:c042c670 r4:00000000 r3:c045f240
[<c03294d0>] (dump_stack+0x0/0x1c) from [<c00292c8>] (warn_slowpath_common+0x54/0x6c)
[<c0029274>] (warn_slowpath_common+0x0/0x6c) from [<c0029304>] (warn_slowpath_null+0x24/0x2c)
[<c00292e0>] (warn_slowpath_null+0x0/0x2c) from [<c042c670>] (gic_init_bases+0x170/0x2ec)
[<c042c500>] (gic_init_bases+0x0/0x2ec) from [<c042cdc8>] (gic_init_irq+0x8c/0xd8)
[<c042cd3c>] (gic_init_irq+0x0/0xd8) from [<c042827c>] (init_IRQ+0x1c/0x24)
[<c0428260>] (init_IRQ+0x0/0x24) from [<c04256c8>] (start_kernel+0x1a4/0x300)
[<c0425524>] (start_kernel+0x0/0x300) from [<70008070>] (0x70008070)
---[ end trace 1b75b31a2719ed1e ]---
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

e210101d

ARM: GIC: fix GIC cpumask initialization · 2bb31351

Russell King authored Jan 30, 2013

Punit Agrawal reports:
> I was trying to boot 3.8-rc5 on Realview EB 11MPCore using
> realview-smp_defconfig as a starting point but the kernel failed to
> progress past the log below (config attached).
>
> Pawel suggested I try reverting 384a2902 - "ARM: gic: use a private
> mapping for CPU target interfaces" that you've authored. With this
> commit reverted the kernel boots.
>
> I am not quite sure why the commit breaks 11MPCore but Pawel (cc'd)
> might be able to shed light on that.

Some early GIC implementations return zero for the first distributor
CPU routing register.  This means we can't rely on that telling us
which CPU interface we're connected to.  We know that these platforms
implement PPIs for IRQs 29-31 - but we shouldn't assume that these
will always be populated.

So, instead, scan for a non-zero CPU routing register in the first
32 IRQs and use that as our CPU mask.
Reported-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

2bb31351

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 2a1a6e7a

Linus Torvalds authored Feb 08, 2013

Pull drm regression fix from Dave Airlie:
 "This one fixes a sleep while locked regression that was introduced
  earlier in 3.8."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try

2a1a6e7a

net: usb: fix regression from FLAG_NOARP code · 9c79330d

Lucas Stach authored Feb 07, 2013

In commit 6509141f ("usbnet: add new
flag FLAG_NOARP for usb net devices"), the newly added flag NOARP was
using an already defined value, which broke drivers using flag
MULTI_PACKET.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c79330d

l2tp: dont play with skb->truesize · 87c084a9

Eric Dumazet authored Feb 07, 2013

Andrew Savchenko reported a DNS failure and we diagnosed that
some UDP sockets were unable to send more packets because their
sk_wmem_alloc was corrupted after a while (tx_queue column in
following trace)

$ cat /proc/net/udp
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
...
  459: 00000000:0270 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4507 2 ffff88003d612380 0
  466: 00000000:0277 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4802 2 ffff88003d613180 0
  470: 076A070A:007B 00000000:0000 07 FFFF4600:00000000 00:00000000 00000000   123        0 5552 2 ffff880039974380 0
  470: 010213AC:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4986 2 ffff88003dbd3180 0
  470: 010013AC:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 4985 2 ffff88003dbd2e00 0
  470: 00FCA8C0:007B 00000000:0000 07 FFFFFB00:00000000 00:00000000 00000000     0        0 4984 2 ffff88003dbd2a80 0
...

Playing with skb->truesize is tricky, especially when
skb is attached to a socket, as we can fool memory charging.

Just remove this code, its not worth trying to be ultra
precise in xmit path.
Reported-by: Andrew Savchenko <bircoph@gmail.com>
Tested-by: Andrew Savchenko <bircoph@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

87c084a9