- 07 Nov, 2014 40 commits
-
-
David S. Miller authored
To make the page tables compact, we were using 32-bit PGDs and PMDs. We only had to support <= 43 bits of physical addresses so this was quite feasible. In order to support larger physical addresses we have to move to 64-bit PGDs and PMDs. Most of the changes are straight-forward: 1) {pgd,pmd}_t --> unsigned long 2) Anything that tries to use plain "unsigned int" types with pgd/pmd values needs to be adjusted. In particular things like "0U" become "0UL". 3) {PGDIR,PMD}_BITS decrease by one. 4) In the assembler page table walkers, use "ldxa" instead of "lduwa" and adjust the low bit masks to clear out the low 3 bits instead of just the low 2 bits during pgd/pmd address formation. Also, use PTRS_PER_PGD and PTRS_PER_PMD in the sizing of the swapper_{pg_dir,low_pmd_dir} arrays. This patch does not try to take advantage of having 64-bits in the PMDs to simplify the hugepage code, that will come in a subsequent change. Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 2b77933c) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
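A minimal sketch of the type widening described above, assuming the usual struct-wrapped entry types; this is illustrative, not the exact sparc64 headers:

    /* After the change, page table entries are full 64-bit words, so any
     * constant compared against them must be unsigned long ("0UL", not "0U"). */
    typedef struct { unsigned long pgd; } pgd_t;
    typedef struct { unsigned long pmd; } pmd_t;

    #define pgd_val(x)  ((x).pgd)
    #define pmd_val(x)  ((x).pmd)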
-
David S. Miller authored
The impetus for this is that we would like to move to 64-bit PMDs and PGDs, but that would result in only supporting a 42-bit address space with the current page table layout. It'd be nice to support at least 43-bits. The reason we'd end up with only 42-bits after making PMDs and PGDs 64-bit is that we only use half-page sized PTE tables in order to make PMDs line up to 4MB, the hardware huge page size we use. So what we do here is we make huge pages 8MB, and fabricate them using 4MB hw TLB entries. Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in places that really need to operate on hardware 4MB pages. Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT, PGD_SHIFT, and the build time CPP test as needed. Use a CPP test to make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up. This makes the pgtable cache completely unused, so remove the code managing it and the state used in mm_context_t. Now we have less spinlocks taken in the page table allocation path. The technique we use to fabricate the 8MB pages is to transfer bit 22 from the missing virtual address into the PTEs physical address field. That takes care of the transparent huge pages case. For hugetlb, we fill things in at the PTE level and that code already puts the sub huge page physical bits into the PTEs, based upon the offset, so there is nothing special we need to do. It all just works out. So, a small amount of complexity in the THP case, but this code is about to get much simpler when we move the 64-bit PMDs as we can move away from the fancy 32-bit huge PMD encoding and just put a real PTE value in there. With bug fixes and help from Bob Picco. Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 37b3a8ff) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
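A rough sketch of the bit-22 trick described above, assuming REAL_HPAGE_SHIFT is 22 (4MB hardware pages); the helper name is made up for illustration and is not the actual TSB-miss code:

    #define REAL_HPAGE_SHIFT 22   /* 4MB hardware huge page */

    /* For the upper 4MB half of an 8MB software huge page, fold the missing
     * virtual address bit 22 into the PTE's physical address, so a single
     * software PTE yields two 4MB hardware TLB entries. */
    static unsigned long hw_tte_for_8mb_page(unsigned long pte, unsigned long vaddr)
    {
            return pte | (vaddr & (1UL << REAL_HPAGE_SHIFT));
    }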
-
David S. Miller authored
Choose PAGE_OFFSET dynamically based upon cpu type. Original UltraSPARC-I (spitfire) chips only supported a 44-bit virtual address space. Newer chips (T4 and later) support 52-bit virtual addresses and up to 47 bits of physical memory space. Therefore we have to adjust PAGE_OFFSET dynamically based upon the capabilities of the chip. Note that this change alone does not allow us to support > 43-bit physical memory; to do that we need to re-arrange our page table support. The current encodings of the pmd_t and pgd_t pointers restrict us to "32 + 11" == 43 bits. This change can waste quite a bit of memory for the various tables. In particular, a future change should work to size and allocate kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically. This isn't easy as we really cannot take a TLB miss when accessing kern_linear_bitmap[]. We'd have to lock it into the TLB or similar. Signed-off-by:
David S. Miller <davem@davemloft.net> Acked-by:
Bob Picco <bob.picco@oracle.com> (cherry picked from commit b2d43834) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
Some parts of the code use '41', others use '42'; make them all use the same value. Signed-off-by:
David S. Miller <davem@davemloft.net> Acked-by:
Bob Picco <bob.picco@oracle.com> (cherry picked from commit f998c9c0) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
This way we can see exactly what they are derived from, and in particular how they would change if we were to use a different PAGE_OFFSET value. Signed-off-by:
David S. Miller <davem@davemloft.net> Acked-by:
Bob Picco <bob.picco@oracle.com> (cherry picked from commit bb7b4353) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
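A sketch of what "defined in terms of physical address bits" means here; the macro names follow this patch series but treat the snippet as illustrative:

    /* The kernel linear area starts at the negative of the maximum supported
     * physical address size, so changing the bit count moves PAGE_OFFSET. */
    #define PAGE_OFFSET_BY_BITS(X)  (-(_AC(1, UL) << (X)))
    #define PAGE_OFFSET             PAGE_OFFSET_BY_BITS(MAX_PHYS_ADDRESS_BITS)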
-
David S. Miller authored
This makes clearer the implications of a given chosen value. Based upon patches by Bob Picco. Signed-off-by:
David S. Miller <davem@davemloft.net> Acked-by:
Bob Picco <bob.picco@oracle.com> (cherry picked from commit e0a45e35) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
This pertains to all of the computations of the kernel fast TLB miss xor values. Based upon a patch by Bob Picco. Signed-off-by:
David S. Miller <davem@davemloft.net> Acked-by:
Bob Picco <bob.picco@oracle.com> (cherry picked from commit 922631b9) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
Older UltraSPARC chips had an address space hole due to the MMU only supporting 44-bit virtual addresses. The top end of this hole also has the same value as the current definition of PAGE_OFFSET, so this can be confusing. Consolidate the defines for the userspace mmap exclusion range into page_64.h and use them in sys_sparc_64.c and hugetlbpage.c Signed-off-by:
David S. Miller <davem@davemloft.net> Acked-by:
Bob Picco <bob.picco@oracle.com> (cherry picked from commit c920745e) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> (cherry picked from commit b61625d2) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Sanjeev Sharma authored
On some architectures spin_is_locked() always returns false in uniprocessor configurations, and therefore it is advisable to replace it with lockdep_assert_held(). Signed-off-by:
Sanjeev Sharma <Sanjeev_Sharma@mentor.com> Acked-by:
Hans de Goede <hdegoede@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit ab945eff) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
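A minimal sketch of the substitution, using a made-up driver structure: lockdep_assert_held() checks real ownership when lockdep is enabled and compiles away otherwise, whereas an assertion built on spin_is_locked() can misfire on uniprocessor builds.

    #include <linux/lockdep.h>
    #include <linux/spinlock.h>

    struct example_dev {
            spinlock_t lock;
            int pending;
    };

    static void example_queue_cmd(struct example_dev *dev)
    {
            /* Before: WARN_ON(!spin_is_locked(&dev->lock)); -- may be false on UP */
            lockdep_assert_held(&dev->lock);

            dev->pending++;     /* state that must only change under dev->lock */
    }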
-
Mark Knibbs authored
Castlewood Systems supplied various models of USB-SCSI converter with their ORB external removable-media drive. The ORB Windows and Macintosh drivers support six USB IDs: 084B:A001 [VID 084B is Castlewood Systems] 04E6:0002 (*) ORB USB Smart Cable P/N 88205-001 (generic SCM ID) 2027:A001 Double-H Technology DH-2000SC 1822:0001 (*) Ariston iConnect/iSCSI 07AF:0004 (*) Microtech XpressSCSI (25-pin) 07AF:0005 (*) Microtech XpressSCSI (50-pin) *: quirk already in unusual-devs.h [Apparently the official VID for Double-H Technology is 0x07EB = 2027 decimal. That's another hex/decimal mix-up with these SCM-based products (in addition to the Ariston and Entrega ones). Perhaps the USB-IF informed companies of their allocated VID in decimal, but they assumed it was hex? It seems all Entrega products used VID 0x1645, not just the USB-SCSI converter.] Double-H Technology Co., Ltd. produced a USB-SCSI converter, model DH-2000SC, which is probably the one supported by the ORB drivers. Perhaps the Castlewood-bundled product had a different label or PID though? Castlewood mentioned Conmate as being one type of USB-SCSI converter. Conmate and Double-H seem related somehow; both company addresses in the same road, and at one point the Conmate web site mentioned DH-2000H4, DH-200D4/DH-2000C4 as models of USB hub (DH short for Double-H presumably). Conmate did show a USB-SCSI converter model CM-660 on their web site at one point. My guess is that was identical to the DH-2000SC. Mention of the Double-H product: http://web.archive.org/web/20010221010141/http://www.doubleh.com.tw/dh-2000sc.htm The only picture I could find is at http://jp.acesuppliers.com/catalog/j64/component/page03.html The casing design looks the same as my ORB USB Smart Cable which has ID 04E6:0002. Anyway, that's enough rambling. Here's the patch. storage: Add quirks for Castlewood and Double-H USB-SCSI converters Add quirks for two SCM-based USB-SCSI converters which were bundled with some Castlewood ORB removable drives. Without the quirk only the (single) drive with SCSI ID 0 can be accessed. Signed-off-by:
Mark Knibbs <markk@clara.co.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 57cde01a) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
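The added quirks take roughly this shape in drivers/usb/storage/unusual_devs.h; the device IDs come from the list above, but the strings and bcdDevice ranges here are only a sketch of the entry format:

    /* US_FL_SCM_MULT_TARG lets targets with SCSI IDs other than 0 be scanned. */
    UNUSUAL_DEV(0x084b, 0xa001, 0x0000, 0x9999,
                    "Castlewood Systems",
                    "USB to SCSI cable",
                    USB_SC_DEVICE, USB_PR_DEVICE, NULL,
                    US_FL_SCM_MULT_TARG),

    UNUSUAL_DEV(0x2027, 0xa001, 0x0000, 0x9999,
                    "Double-H Technology",
                    "USB to SCSI intelligent cable",
                    USB_SC_DEVICE, USB_PR_DEVICE, NULL,
                    US_FL_SCM_MULT_TARG),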
-
Mark Knibbs authored
There is apparently another SCM USB-SCSI converter with ID 04E6:000F. It is listed along with 04E6:000B in the Windows INF file for the Startech ICUSBSCSI2 as "eUSB SCSI Adapter (Bus Powered)". The quirk allows devices with SCSI ID other than 0 to be accessed. Also make a couple of existing SCM product IDs lower case to be consistent with other entries. Signed-off-by:
Mark Knibbs <markk@clara.co.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 3512e7bf) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Lu Baolu authored
This full-speed USB device generates spurious remote wakeup events as soon as the USB_DEVICE_REMOTE_WAKEUP feature is set. As a result, Linux can't enter system suspend and S0ix power saving modes once this keyboard is used. This patch introduces the USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk. With this quirk set, wakeup capability will be ignored during device configuration. This patch could be back-ported to kernels as old as 2.6.39. Signed-off-by:
Lu Baolu <baolu.lu@linux.intel.com> Acked-by:
Alan Stern <stern@rowland.harvard.edu> Cc: stable <stable@vger.kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit ddbe1fca) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
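Entries of this kind live in the usb_quirk_list of drivers/usb/core/quirks.c; a sketch of the format, with placeholder vendor/product IDs rather than the keyboard this patch targets:

    /* Placeholder IDs -- the real entry names the affected keyboard. */
    { USB_DEVICE(0x1234, 0x5678),
      .driver_info = USB_QUIRK_IGNORE_REMOTE_WAKEUP },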
-
Eliad Peller authored
Upstream commit a5fe8e76 (regulatory: add NUL to alpha2) contained a hunk that was supposed to be applied to struct ieee80211_reg_rule. However in stable 3.12 (3.12.31 in particular), it ended up in struct regulatory_request. Fix that now. Signed-off-by:
Jiri Slaby <jslaby@suse.cz> Cc: Eliad Peller <eliadx.peller@intel.com> Cc: Johannes Berg <johannes.berg@intel.com> (cherry picked from commit 84197d64) (cherry picked from commit HEAD) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Pali Rohár authored
Without this patch, dell-wmi tries to access elements of a dynamically allocated array without checking the array size. This can lead to memory corruption or a kernel panic. This patch adds the missing array size checks. Signed-off-by:
Pali Rohár <pali.rohar@gmail.com> Signed-off-by:
Darren Hart <dvhart@linux.intel.com> (cherry picked from commit a666b6ff) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Benjamin Tissoires authored
Commit "HID: logitech: perform bounds checking on device_id early enough" unfortunately leaks some errors to dmesg which are not real ones: - if the report is not a DJ one, then there is not point in checking the device_id - the receiver (index 0) can also receive some notifications which can be safely ignored given the current implementation Move out the test regarding the report_id and also discards printing errors when the receiver got notified. Fixes: ad3e14d7 Cc: stable@vger.kernel.org Reported-and-tested-by:
Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by:
Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by:
Jiri Kosina <jkosina@suse.cz> (cherry picked from commit 5abfe85c) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Jiri Kosina authored
device_index is a char type and the size of paired_dj_devices is 7 elements, therefore proper bounds checking has to be applied to device_index before it is used. We are currently performing the bounds checking in logi_dj_recv_add_djhid_device(), which is too late, as a malicious device could send REPORT_TYPE_NOTIF_DEVICE_UNPAIRED early enough and trigger the problem in one of the report forwarding functions called from logi_dj_raw_event(). Fix this by performing the check at the earliest possible occasion in logi_dj_raw_event(). Cc: stable@vger.kernel.org Reported-by:
Ben Hawkes <hawkes@google.com> Reviewed-by:
Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by:
Jiri Kosina <jkosina@suse.cz> (cherry picked from commit ad3e14d7) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
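A sketch of the early check described above, performed in logi_dj_raw_event() before any report is forwarded; the constant names are illustrative (paired_dj_devices has 7 slots, with index 0 reserved for the receiver):

    if (dj_report->device_index < DJ_DEVICE_INDEX_MIN ||
        dj_report->device_index > DJ_DEVICE_INDEX_MAX) {
            dev_err(&hdev->dev, "%s: invalid device index: %d\n",
                    __func__, dj_report->device_index);
            return false;
    }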
-
David S. Miller authored
Meelis Roos reported that kernels built with gcc-4.9 do not boot; we eventually narrowed this down to only impacting machines using UltraSPARC-III and derivative cpus. The crash happens right when the first user process is spawned: [ 54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 [ 54.451346] [ 54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab7 #96 [ 54.666431] Call Trace: [ 54.698453] [0000000000762f8c] panic+0xb0/0x224 [ 54.759071] [000000000045cf68] do_exit+0x948/0x960 [ 54.823123] [000000000042cbc0] fault_in_user_windows+0xe0/0x100 [ 54.902036] [0000000000404ad0] __handle_user_windows+0x0/0x10 [ 54.978662] Press Stop-A (L1-A) to return to the boot prom [ 55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 Further investigation showed that compiling only per_cpu_patch() with an older compiler fixes the boot. Detailed analysis showed that the function is not being miscompiled by gcc-4.9, but it is using a different register allocation ordering. With the gcc-4.9 compiled function, something during the code patching causes some of the %i* input registers to get corrupted. Perhaps we have a TLB miss path into the firmware that is deep enough to cause a register window spill and subsequent restore when we get back from the TLB miss trap. Let's plug this up by doing two things: 1) Stop using the firmware stack for client interface calls into the firmware. Just use the kernel's stack. 2) As soon as we can, call into a new function "start_early_boot()" to put a one-register-window buffer between the firmware's deepest stack frame and the top-most initial kernel one. Reported-by:
Meelis Roos <mroos@linux.ee> Tested-by:
Meelis Roos <mroos@linux.ee> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit ef3e035c) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
bob picco authored
This patch attempts to do a few things. The highlights are: 1) enable SPARSE_IRQ unconditionally, 2) kill off !SPARSE_IRQ code, 3) allocate ivector_table at boot time, and 4) default to the cookie only VIRQ mechanism for supported firmware. The first firmware with cookie only support for me appears on T5. You can optionally force the HV firmware to non-cookie-only mode, which is the sysino support. The sysino is a deprecated HV mechanism according to the most recent SPARC Virtual Machine Specification. HV_GRP_INTR is what controls the cookie/sysino firmware versioning. The history of this interface is: 1) Major version 1.0 only supported sysino based interrupt interfaces. 2) Major version 2.0 added cookie based VIRQs, however due to the fact that OSs were using the VIRQs without negotiating major version 2.0 (Linux and Solaris are both guilty), the VIRQ calls were allowed even with major version 1.0. To complicate things even further, the VIRQ interfaces were only actually hooked up in the hypervisor for LDC interrupt sources. VIRQ calls on other device types would result in HV_EINVAL errors. So effectively, major version 2.0 is unusable. 3) Major version 3.0 was created to signal use of VIRQs and the fact that the hypervisor has these calls hooked up for all interrupt sources, not just those for LDC devices. A new boot option is provided should cookie only HV support have issues. hvirq - this is the version for HV_GRP_INTR. This is related to HV API versioning. The code attempts major=3 first by default. The option can be used to override this default. I've tested with SPARSE_IRQ on T5-8, M7-4 and T4-X and Jalapeño. Signed-off-by:
Bob Picco <bob.picco@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit ee6a9333) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Bryan O'Donoghue authored
This patch enables the USB gadget device for Intel Quark X1000. Signed-off-by:
Bryan O'Donoghue <bryan.odonoghue@intel.com> Signed-off-by:
Bing Niu <bing.niu@intel.com> Signed-off-by:
Alvin (Weike) Chen <alvin.chen@intel.com> Signed-off-by:
Felipe Balbi <balbi@ti.com> (cherry picked from commit a68df706) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Steffen Klassert authored
Currently we generate a blackhole route whenever we have matching policies but cannot resolve the states. Here we assume that dst_output() is called to kill the blackholed packets. Unfortunately this assumption is not true in all cases, so it is possible that these packets leave the system unwanted. We fix this by generating blackhole routes only from the route lookup functions, where we can guarantee a call to dst_output() afterwards. Fixes: 2774c131 ("xfrm: Handle blackhole route creation via afinfo.") Reported-by:
Konstantinos Kolelis <k.kolelis@sirrix.com> Signed-off-by:
Steffen Klassert <steffen.klassert@secunet.com> (cherry picked from commit f92ee619) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Felipe Balbi authored
Currently, we disable pm_runtime before all register accesses are done, this is dangerous and might lead to abort exceptions due to the driver trying to access a register which is clocked by a clock which was long gated. Fix that by moving pm_runtime_put_sync() and pm_runtime_disable() as the last thing we do before returning from our ->remove() method. Fixes: 72246da4 (usb: Introduce DesignWare USB3 DRD Driver) Cc: <stable@vger.kernel.org> # v3.2+ Signed-off-by:
Felipe Balbi <balbi@ti.com> (cherry picked from commit fed33afc) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
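The general shape of the fix, sketched with a generic platform driver rather than the dwc3 code itself: the runtime-PM put/disable pair moves to the very end of ->remove(), after the last register access.

    #include <linux/platform_device.h>
    #include <linux/pm_runtime.h>

    /* example_teardown_registers() is a stand-in for the driver's teardown
     * work that still touches device registers (and so needs clocks running). */
    static int example_remove(struct platform_device *pdev)
    {
            example_teardown_registers(platform_get_drvdata(pdev));

            /* Only now is it safe to let the device's clocks be gated. */
            pm_runtime_put_sync(&pdev->dev);
            pm_runtime_disable(&pdev->dev);
            return 0;
    }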
-
bob picco authored
The T5 (niagara5) has different PCR related HV fast trap values and a new HV API Group. This patch utilizes these and shares when possible with niagara4. We use the same sparc_pmu niagara4_pmu. Should there be new effort to obtain the MCU perf statistics then this would have to be changed. Cc: sparclinux@vger.kernel.org Signed-off-by:
Bob Picco <bob.picco@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 05aa1651) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Alexei Starovoitov authored
- fix BPF_LD|ABS|IND from negative offsets: make sure to sign extend the lower 32 bits in the 64-bit register before calling C helpers from JITed code, otherwise the 'int k' argument of the bpf_internal_load_pointer_neg_helper() function will be added as a large unsigned integer, causing the packet size check to trigger and abort the program. It's worth noting that JITed code for 'A = A op K' will affect the upper 32 bits differently depending on whether K is simm13 or not, since small constants are sign extended, whereas large constants are stored in a temp register and zero extended. That is ok and we don't have to pay a penalty of sign extension for every sethi, since all classic BPF instructions have 32-bit semantics and we only need to set correct upper bits when transitioning from JITed code into C. - though instructions 'A &= 0' and 'A *= 0' are odd, the JIT compiler should not optimize them out Signed-off-by:
Alexei Starovoitov <ast@plumgrid.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 35607b02) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
Christopher reports that perf_event_print_debug() can crash in uniprocessor builds. The crash is due to pcr_ops being NULL. This happens because pcr_arch_init() is only invoked by smp_cpus_done() which only executes in SMP builds. init_hw_perf_events() is closely intertwined with pcr_ops being set up properly, therefore: 1) Call pcr_arch_init() early on from init_hw_perf_events(), instead of from smp_cpus_done(). 2) Do not hook up a PMU type if pcr_ops is NULL after pcr_arch_init(). 3) Move init_hw_perf_events to a later initcall so that we will be sure to invoke pcr_arch_init() after all cpus are brought up. Finally, guard the one naked sequence of pcr_ops dereferences in __global_pmu_self() with an appropriate NULL check. Reported-by:
Christopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 8bccf5b3) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
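A sketch of the NULL guard mentioned at the end; the function body is heavily simplified here:

    static void __global_pmu_self(int this_cpu)
    {
            if (!pcr_ops)   /* PMU support was never initialized */
                    return;

            /* ... snapshot the PCR/PIC registers through pcr_ops ... */
    }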
-
David S. Miller authored
nmi_cpu_busy() is an SMP function call that just makes sure that all of the cpus are spinning using cpu cycles while the NMI test runs. It does not need to disable IRQs because we just care about NMIs executing, which will happen even with 'normal' IRQs disabled. It is not legal to enable hard IRQs in an SMP cross call; in fact this bug triggers the BUG check in irq_work_run_list(): BUG_ON(!irqs_disabled()); because now irq_work_run() is invoked from the tail of generic_smp_call_function_single_interrupt(). Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 58556104) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
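A sketch of the resulting shape (flag name illustrative): the cross-call handler only burns cycles and, crucially, no longer calls local_irq_enable().

    static volatile int nmi_test_running = 1;   /* cleared when the test ends */

    /* Runs on every cpu via an SMP cross call; hardirqs stay disabled. */
    static void nmi_cpu_busy(void *data)
    {
            while (nmi_test_running)
                    cpu_relax();
    }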
-
Bryan O'Donoghue authored
This patch enables the USB gadget device for Intel Quark X1000. Signed-off-by:
Bryan O'Donoghue <bryan.odonoghue@intel.com> Signed-off-by:
Bing Niu <bing.niu@intel.com> Signed-off-by:
Alvin (Weike) Chen <alvin.chen@intel.com> Signed-off-by:
Felipe Balbi <balbi@ti.com> (cherry picked from commit a68df706) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Steffen Klassert authored
Currently we generate a blackhole route whenever we have matching policies but cannot resolve the states. Here we assume that dst_output() is called to kill the blackholed packets. Unfortunately this assumption is not true in all cases, so it is possible that these packets leave the system unwanted. We fix this by generating blackhole routes only from the route lookup functions, where we can guarantee a call to dst_output() afterwards. Fixes: 2774c131 ("xfrm: Handle blackhole route creation via afinfo.") Reported-by:
Konstantinos Kolelis <k.kolelis@sirrix.com> Signed-off-by:
Steffen Klassert <steffen.klassert@secunet.com> (cherry picked from commit f92ee619) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
Vlad Yasevich authored
TG3 appears to have an issue performing TSO and checksum offloading correctly when the frame has been vlan encapsulated (non-accelerated). In these cases, the TCP checksum is not correctly updated. This patch attempts to work around this issue. After the patch, 802.1ad vlans start working correctly over tg3 devices. CC: Prashant Sreedharan <prashant@broadcom.com> CC: Michael Chan <mchan@broadcom.com> Signed-off-by:
Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 476c1885) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
It is not sufficient to only implement get_user_pages_fast(), you must also implement the atomic version __get_user_pages_fast() otherwise you end up using the weak symbol fallback implementation which simply returns zero. This is dangerous, because it causes the futex code to loop forever if transparent hugepages are supported (see get_futex_key()). Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 06090e8e) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
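The hazard comes from the generic weak fallback, which looks roughly like this; without an architecture override, get_futex_key() sees zero pages pinned every time and loops:

    #include <linux/mm.h>

    /* Generic weak stub (simplified): always reports zero pages pinned. */
    int __weak __get_user_pages_fast(unsigned long start, int nr_pages,
                                     int write, struct page **pages)
    {
            return 0;
    }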
-
Dave Kleikamp authored
This is the longest boot string that silo supports. Signed-off-by:
Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Bob Picco <bob.picco@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 1cef94c3) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
As currently coded the KTSB accesses in the kernel only support up to 47 bits of physical addressing. Adjust the instruction and patching sequence in order to support arbitrary 64-bit addresses. Signed-off-by:
David S. Miller <davem@davemloft.net> Acked-by:
Bob Picco <bob.picco@oracle.com> (cherry picked from commit 8c82dc0e) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
bob picco authored
The T5 (niagara5) has different PCR related HV fast trap values and a new HV API Group. This patch utilizes these and shares when possible with niagara4. We use the same sparc_pmu niagara4_pmu. Should there be new effort to obtain the MCU perf statistics then this would have to be changed. Cc: sparclinux@vger.kernel.org Signed-off-by:
Bob Picco <bob.picco@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 05aa1651) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
This breaks the stack end corruption detection facility. What that facility does is write a magic value to "end_of_stack()" and check to see if it gets overwritten. "end_of_stack()" is "task_thread_info(p) + 1", which for sparc64 is the beginning of the FPU register save area. So once the user uses the FPU, the magic value is overwritten and the debug checks trigger. Fix this by making the size explicit. Due to the size we use for the fpsaved[], gsr[], and xfsr[] arrays we are limited to 7 levels of FPU state saves. Each FPU register set is 256 bytes, so allocate 256 * 7 for the fpregs area. Reported-by:
Meelis Roos <mroos@linux.ee> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit e2653143) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
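A sketch of the explicit sizing described above, with made-up names: seven save levels of 256 bytes each instead of an open-ended array that silently overlaps end_of_stack().

    #define EXAMPLE_FP_SAVE_LEVELS  7
    #define EXAMPLE_FP_REGS_BYTES   256     /* one full FPU register set */

    struct example_thread_info {
            unsigned long fpregs[(EXAMPLE_FP_SAVE_LEVELS * EXAMPLE_FP_REGS_BYTES) /
                                 sizeof(unsigned long)]
                          __attribute__ ((aligned(64)));
    };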
-
David S. Miller authored
The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the key material is preloaded into the FPU registers, and then we loop over and over doing the crypt operation, reusing those pre-cooked key registers. There are intervening blkcipher*() calls between the crypt operation calls. And those might perform memcpy() and thus also try to use the FPU. The sparc64 kernel FPU usage mechanism is designed to allow such recursive uses, but with a catch. There has to be a trap between the two FPU using threads of control. The mechanism works by, when the FPU is already in use by the kernel, allocating a slot for FPU saving at trap time. Then if, within the trap handler, we try to use the FPU registers, the pre-trap FPU register state is saved into the slot. Then at trap return time we notice this and restore the pre-trap FPU state. Over the long term there are various more involved ways we can make this work, but for a quick fix let's take advantage of the fact that the situation where this happens is very limited. All sparc64 chips that support the crypto instructions are also using the Niagara4 memcpy routine, and that routine only uses the FPU for large copies where we can't get the source aligned properly to a multiple of 8 bytes. We look to see if the FPU is already in use in this context, and if so we use the non-large copy path which only uses integer registers. Furthermore, we also limit this special logic to when we are doing a kernel copy, rather than a user copy. Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit f4da3628) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-
David S. Miller authored
Inconsistently, the raw_* IRQ routines do not interact with and update the irqflags tracing and lockdep state, whereas the raw_* spinlock interfaces do. This causes problems in p1275_cmd_direct() because we disable hardirqs by hand using raw_local_irq_restore() and then do a raw_spin_lock() which triggers a lockdep trace because the CPU's hw IRQ state doesn't match IRQ tracing's internal software copy of that state. The CPU's irqs are disabled, yet current->hardirqs_enabled is true. ==================== reboot: Restarting system ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3536 check_flags+0x7c/0x240() DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) Modules linked in: openpromfs CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: G W 3.17.0-dirty #145 Call Trace: [000000000045919c] warn_slowpath_common+0x5c/0xa0 [0000000000459210] warn_slowpath_fmt+0x30/0x40 [000000000048f41c] check_flags+0x7c/0x240 [0000000000493280] lock_acquire+0x20/0x1c0 [0000000000832b70] _raw_spin_lock+0x30/0x60 [000000000068f2fc] p1275_cmd_direct+0x1c/0x60 [000000000068ed28] prom_reboot+0x28/0x40 [000000000043610c] machine_restart+0x4c/0x80 [000000000047d2d4] kernel_restart+0x54/0x80 [000000000047d618] SyS_reboot+0x138/0x200 [00000000004060b4] linux_sparc_syscall32+0x34/0x60 ---[ end trace 5c439fe81c05a100 ]--- possible reason: unannotated irqs-off. irq event stamp: 2010267 hardirqs last enabled at (2010267): [<000000000049a358>] vprintk_emit+0x4b8/0x580 hardirqs last disabled at (2010266): [<0000000000499f08>] vprintk_emit+0x68/0x580 softirqs last enabled at (2010046): [<000000000045d278>] __do_softirq+0x378/0x4a0 softirqs last disabled at (2010039): [<000000000042bf08>] do_softirq_own_stack+0x28/0x40 Resetting ... ==================== Use local_* variables of the hw IRQ interfaces so that IRQ tracing sees all of our changes. Reported-by:
Meelis Roos <mroos@linux.ee> Tested-by:
Meelis Roos <mroos@linux.ee> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit bdcf81b6) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
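A simplified sketch of the corrected sequence in p1275_cmd_direct(); using the local_* variants keeps lockdep's software copy of the IRQ state in step with the hardware state:

    static DEFINE_RAW_SPINLOCK(prom_entry_lock);

    void p1275_cmd_direct(unsigned long *args)
    {
            unsigned long flags;

            local_save_flags(flags);                /* not raw_local_save_flags() */
            local_irq_restore((unsigned long)PIL_NMI);
            raw_spin_lock(&prom_entry_lock);

            prom_world(1);
            prom_cif_direct(args);
            prom_world(0);

            raw_spin_unlock(&prom_entry_lock);
            local_irq_restore(flags);               /* IRQ tracing sees the change */
    }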
-
David S. Miller authored
When we have to split up a flush request into multiple pieces (in order to avoid the firmware range) we don't specify the arguments in the right order for the second piece. Fix the order, or else we get hangs as the code tries to flush "a lot" of entries and we get lockups like this: [ 4422.981276] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [expect:117032] [ 4422.996130] Modules linked in: ipv6 loop usb_storage igb ptp sg sr_mod ehci_pci ehci_hcd pps_core n2_rng rng_core [ 4423.016617] CPU: 12 PID: 117032 Comm: expect Not tainted 3.17.0-rc4+ #1608 [ 4423.030331] task: fff8003cc730e220 ti: fff8003d99d54000 task.ti: fff8003d99d54000 [ 4423.045282] TSTATE: 0000000011001602 TPC: 00000000004521e8 TNPC: 00000000004521ec Y: 00000000 Not tainted [ 4423.064905] TPC: <__flush_tlb_kernel_range+0x28/0x40> [ 4423.074964] g0: 000000000052fd10 g1: 00000001295a8000 g2: ffffff7176ffc000 g3: 0000000000002000 [ 4423.092324] g4: fff8003cc730e220 g5: fff8003dfedcc000 g6: fff8003d99d54000 g7: 0000000000000006 [ 4423.109687] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000003 o3: 00000000f0000000 [ 4423.127058] o4: 0000000000000080 o5: 00000001295a8000 sp: fff8003d99d56d01 ret_pc: 000000000052ff54 [ 4423.145121] RPC: <__purge_vmap_area_lazy+0x314/0x3a0> [ 4423.155185] l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000a38040 l3: 0000000000000000 [ 4423.172559] l4: fff8003dae8965e0 l5: ffffffffffffffff l6: 0000000000000000 l7: 00000000f7e2b138 [ 4423.189913] i0: fff8003d99d576a0 i1: fff8003d99d576a8 i2: fff8003d99d575e8 i3: 0000000000000000 [ 4423.207284] i4: 0000000000008008 i5: fff8003d99d575c8 i6: fff8003d99d56df1 i7: 0000000000530c24 [ 4423.224640] I7: <free_vmap_area_noflush+0x64/0x80> [ 4423.234193] Call Trace: [ 4423.239051] [0000000000530c24] free_vmap_area_noflush+0x64/0x80 [ 4423.251029] [0000000000531a7c] remove_vm_area+0x5c/0x80 [ 4423.261628] [0000000000531b80] __vunmap+0x20/0x120 [ 4423.271352] [000000000071cf18] n_tty_close+0x18/0x40 [ 4423.281423] [00000000007222b0] tty_ldisc_close+0x30/0x60 [ 4423.292183] [00000000007225a4] tty_ldisc_reinit+0x24/0xa0 [ 4423.303120] [0000000000722ab4] tty_ldisc_hangup+0xd4/0x1e0 [ 4423.314232] [0000000000719aa0] __tty_hangup+0x280/0x3c0 [ 4423.324835] [0000000000724cb4] pty_close+0x134/0x1a0 [ 4423.334905] [000000000071aa24] tty_release+0x104/0x500 [ 4423.345316] [00000000005511d0] __fput+0x90/0x1e0 [ 4423.354701] [000000000047fa54] task_work_run+0x94/0xe0 [ 4423.365126] [0000000000404b44] __handle_signal+0xc/0x2c Fixes: 4ca9a237 ("sparc64: Guard against flushing openfirmware mappings.") Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 473ad7f4) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
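The corrected split looks roughly like this; the sketch is based on the commit description, with LOW_OBP_ADDRESS/HI_OBP_ADDRESS bounding the firmware mappings that must be skipped:

    static void flush_kernel_range_skipping_obp(unsigned long start, unsigned long end)
    {
            if (start < LOW_OBP_ADDRESS && end > HI_OBP_ADDRESS) {
                    __flush_tlb_kernel_range(start, LOW_OBP_ADDRESS);
                    /* These arguments were previously passed in the wrong order. */
                    __flush_tlb_kernel_range(HI_OBP_ADDRESS, end);
            } else {
                    __flush_tlb_kernel_range(start, end);
            }
    }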
-
Andreas Larsson authored
This makes memset follow the standard (instead of returning 0 on success). This is needed because certain versions of gcc optimize around memset calls and assume that the address argument is preserved in %o0. Signed-off-by:
Andreas Larsson <andreas@gaisler.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 74cad25c) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
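Why the return value matters (an illustrative C example, not the assembler fix itself): gcc may turn code like this into a memset call and keep using the returned pointer, i.e. it assumes the destination address is still in %o0 on return.

    #include <string.h>

    struct packet { int hdr; int payload[16]; };

    static struct packet *clear_packet(struct packet *p)
    {
            /* Per the C standard, memset() must return its first argument. */
            return memset(p, 0, sizeof(*p));
    }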
-
bob picco authored
We have seen an issue with guest boot into LDOM that causes early boot failures because of no matching rules for node identitity of the memory. I analyzed this on my T4 and concluded there might not be a solution. I saw the issue in mainline too when booting into the control/primary domain - with guests configured. Note, this could be a firmware bug on some older machines. I'll provide a full explanation of the issues below. Should we not find a matching BEST latency group for a real address (RA) then we will assume node 0. On the T4-2 here with the information provided I can't see an alternative. Technically the LDOM shown below should match the MBLOCK to the favorable latency group. However other factors must be considered too. Were the memory controllers configured "fine" grained interleave or "coarse" grain interleaved - T4. Also should a "group" MD node be considered a NUMA node? There has to be at least one Machine Description (MD) "group" and hence one NUMA node. The group can have one or more latency groups (lg) - more than one memory controller. The current code chooses the smallest latency as the most favorable per group. The latency and lg information is in MLGROUP below. MBLOCK is the base and size of the RAs for the machine as fetched from OBP /memory "available" property. My machine has one MBLOCK but more would be possible - with holes? For a T4-2 the following information has been gathered: with LDOM guest MEMBLOCK configuration: memory size = 0x27f870000 memory.cnt = 0x3 memory[0x0] [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes memory[0x1] [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes memory[0x2] [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes reserved.cnt = 0x2 reserved[0x0] [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes reserved[0x1] [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes MBLOCK[0]: base[20000000] size[280000000] offset[0] (note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values) (note: (RA + offset) & mask = val is the formula to detect a match for the memory controller. should there be no match for find_node node, a return value of -1 resulted for the node - BAD) There is one group. 
It has these forward links MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000] MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000] NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8]) (note: "val" is the best lg's (smallest latency) "match") no LDOM guest - bare metal MEMBLOCK configuration: memory size = 0xfdf2d0000 memory.cnt = 0x3 memory[0x0] [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes memory[0x1] [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes memory[0x2] [0x00000fff766000-0x00000fff771fff], 0xc000 bytes reserved.cnt = 0x2 reserved[0x0] [0x00000020800000-0x00000021a04580], 0x1204581 bytes reserved[0x1] [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes MBLOCK[0]: base[20000000] size[fe0000000] offset[0] there are two groups group node[16d5] MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000] MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000] NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8]) group node[171d] MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000] MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000] NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8]) (note: for this two "group" bare metal machine, 1/2 memory is in group one's lg and 1/2 memory is in group two's lg). Cc: sparclinux@vger.kernel.org Signed-off-by:
Bob Picco <bob.picco@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 3dee9df5) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
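The match rule quoted above, "(RA + offset) & mask = val", in code form (a sketch for illustration, not the mainline implementation):

    #include <linux/types.h>

    /* A real address belongs to a latency group when this test holds. */
    static bool ra_matches_lgroup(u64 ra, u64 offset, u64 mask, u64 val)
    {
            return ((ra + offset) & mask) == val;
    }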
-
David S. Miller authored
Every path that ends up at do_sparc64_fault() must install a valid FAULT_CODE_* bitmask in the per-thread fault code byte. Two paths leading to the label winfix_trampoline (which expects the FAULT_CODE_* mask in register %g4) were not doing so: 1) For pre-hypervisor TLB protection violation traps, if we took the 'winfix_trampoline' path we wouldn't have %g4 initialized with the FAULT_CODE_* value yet, resulting in the TLB_TAG_ACCESS register address value being used instead. 2) In the TSB miss path, when we notice that we are going to use a hugepage mapping, but we haven't allocated the hugepage TSB yet, we still have to take the window fixup case into consideration and in that particular path we leave %g4 not set up properly. Errors of this sort were largely invisible previously, but after commit 4ccb9272 ("sparc64: sun4v TLB error power off events") we now have a fault_code mask bit (FAULT_CODE_BAD_RA) that triggers due to this bug. FAULT_CODE_BAD_RA triggers because this bit is set in TLB_TAG_ACCESS (see #1 above) and thus we get seemingly random bus errors triggered for user processes. Fixes: 4ccb9272 ("sparc64: sun4v TLB error power off events") Reported-by:
Meelis Roos <mroos@linux.ee> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from commit 84bd6d8b) Signed-off-by:
Sasha Levin <sasha.levin@oracle.com>
-