Commits · 554726d33c20791653dbd1c047c83f93459bc586 · nexedi / linux

29 Jul, 2015 17 commits

Merge tag 'kvm-s390-next-20150728' of... · 554726d3

Paolo Bonzini authored Jul 29, 2015

Merge tag 'kvm-s390-next-20150728' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

KVM: s390: Fixes and features for kvm/next (4.3)

1. Rework logging infrastructure (s390dbf) to integrate feedback learned
when debugging performance and test issues
2. Some cleanups and simplifications for CMMA handling
3. Fix gdb debugging and single stepping on some instructions
4. Error handling for storage key setup

554726d3

KVM: s390: log capability enablement and vm attribute changes · c92ea7b9

Christian Borntraeger authored Jul 22, 2015

Depending on user space, some capabilities and vm attributes are
enabled at runtime. Let's log those events and while we're at it,
log querying the vm attributes as well.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

c92ea7b9

KVM: s390: Provide global debug log · 78f26131

Christian Borntraeger authored Jul 22, 2015

In addition to the per VM debug logs, let's provide a global
one for KVM-wide events, like new guests or fatal errors.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>

78f26131

KVM: s390: adapt debug entries for instruction handling · 7cbde76b

Christian Borntraeger authored Jul 21, 2015

Use the default log level 3 for state changing and/or seldom events,
use 4 for others. Also change some numbers from %x to %d and vice versa
to match documentation. If hex, let's prepend the numbers with 0x.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

7cbde76b

KVM: s390: improve debug feature usage · 1cb9cf72

Christian Borntraeger authored Jul 20, 2015

We do not use the exception logger, so the 2nd area is unused.
Just have one area that is bigger (32 pages).
At the same time we can limit the debug feature size to 7
longs, as the largest user has 3 parameters + string + boiler
plate (vCPU, PSW mask, PSW addr)
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

1cb9cf72

KVM: s390: remove outdated documentation · f2d2eb9d

Christian Borntraeger authored Jul 20, 2015

The old Documentation/s390/kvm.txt file is either
outdated or described in Documentation/virtual/kvm/api.txt.
Let's get rid of it.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>

f2d2eb9d

KVM: s390: more irq names for trace events · a37281b6

David Hildenbrand authored Nov 21, 2014

This patch adds names for missing irq types to the trace events.
In order to identify adapter irqs, the define is moved from
interrupt.c to the other basic irq defines in uapi/linux/kvm.h.
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

a37281b6

KVM: s390: Fixup interrupt vcpu event messages and levels · 3f24ba15

Christian Borntraeger authored Jul 09, 2015

This reworks the debug logging for interrupt related logs.
Several changes:
- unify program int/irq
- improve decoding (e.g. use mcic instead of parm64 for machine
  check injection)
- remove useless interrupt type number (the name is enough)
- rename "interrupt:" to "deliver:" as the other side is called "inject"
- use log level 3 for state changing and/or seldom events (like machine
  checks, restart..)
- use log level 4 for frequent events
- use 0x prefix for hex numbers
- add pfault done logging
- move some tracing outside spinlock
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>

3f24ba15

KVM: s390: add more debug data for the pfault diagnoses · ab7090a6

Christian Borntraeger authored Jul 16, 2015

We're not only interested in the address of the control block, but
also in the requested subcommand and for the token subcommand, in the
specified token address and masks.
Suggested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>

ab7090a6

KVM: s390: remove "from (user|kernel)" from irq injection messages · ed2afcfa

David Hildenbrand authored Jul 20, 2015

The "from user"/"from kernel" part of the log/trace messages is not
always correct anymore and therefore not really helpful.

Let's remove that part from the log + trace messages. For program
interrupts, we can now move the logging/tracing part into the real
injection function, as already done for the other injection functions.
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

ed2afcfa

KVM: s390: VCPU_EVENT cleanup for prefix changes · 71db35d2

Christian Borntraeger authored Jul 10, 2015

SPX (SET PREFIX)  and SIGP (Set prefix) can change the prefix
register of a CPU. As sigp set prefix may be handled in user
space (KVM_CAP_S390_USER_SIGP), we would not log the changes
triggered via SIGP in that case. Let's have just one VCPU_EVENT
at the central location that tracks prefix changes.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>

71db35d2

KVM: s390: Improve vcpu event debugging for diagnoses · 15e8b5da

Christian Borntraeger authored Jul 09, 2015

Let's add a vcpu event for the page reference handling and change
the default debugging level for the ipl diagnose. Both are not
frequent AND change the global state, so lets log them always.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

15e8b5da

KVM: s390: add kvm stat counter for all diagnoses · 175a5c9e

Christian Borntraeger authored Jul 07, 2015

Sometimes kvm stat counters are the only performance metric to check
after something went wrong. Let's add additional counters for some
diagnoses.

In addition do the count for diag 10 all the time, even if we inject
a program interrupt.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>

175a5c9e

KVM: s390: only reset CMMA state if it was enabled before · c3489155

Dominik Dingel authored Jun 18, 2015

There is no point in resetting the CMMA state if it was never enabled.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

c3489155

KVM: s390: clean up cmma_enable check · e6db1d61

Dominik Dingel authored May 07, 2015

As we already only enable CMMA when userspace requests it, we can
safely move the additional checks to the request handler and avoid
doing them multiple times. This also tells userspace if CMMA is
available.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

e6db1d61

KVM: s390: filter space-switch events when PER is enforced · 0df30abc

David Hildenbrand authored Jun 23, 2015

When guest debugging is active, space-switch events might be enforced
by PER. While the PER events are correctly filtered out,
space-switch-events could be forwarded to the guest, although from a
guest point of view, they should not have been reported.

Therefore we have to filter out space-switch events being concurrently
reported with a PER event, if the PER event got filtered out. To do so,
we theoretically have to know which instruction was responsible for the
event. As the applicable instructions modify the PSW address, the
address space set in the PSW and even the address space in cr1, we
can't figure out the instruction that way.

For this reason, we have to rely on the information about the old and
new address space, in order to guess the responsible instruction type
and do appropriate checks for space-switch events.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

0df30abc

KVM: s390: propagate error from enable storage key · 14d4a425

Dominik Dingel authored May 07, 2015

As enabling storage keys might fail, we should forward the error.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

14d4a425

23 Jul, 2015 12 commits

KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask · 54928303

Paolo Bonzini authored Jul 10, 2015

We can disable CD unconditionally when there is no assigned device.
KVM now forces guest PAT to all-writeback in that case, so it makes
sense to also force CR0.CD=0.

When there are assigned devices, emulate cache-disabled operation
through the page tables.  This behavior is consistent with VMX
microcode, where CD/NW are not touched by vmentry/vmexit.  However,
keep this dependent on the quirk because OVMF enables the caches
too late.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

54928303

kvm/x86: add support for MONITOR_TRAP_FLAG · 5f3d45e7

Mihai Donțu authored Jul 05, 2015

Allow a nested hypervisor to single step its guests.
Signed-off-by: Mihai Donțu <mihai.dontu@gmail.com>
[Fix overlong line. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5f3d45e7

kvm/x86: add sending hyper-v crash notification to user space · 2ce79189

Andrey Smetanin authored Jul 03, 2015

Sending of notification is done by exiting vcpu to user space
if KVM_REQ_HV_CRASH is enabled for vcpu. At exit to user space
the kvm_run structure contains system_event with type
KVM_SYSTEM_EVENT_CRASH to notify about guest crash occurred.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2ce79189

kvm/x86: added hyper-v crash msrs into kvm hyperv context · e7d9513b

Andrey Smetanin authored Jul 03, 2015

Added kvm Hyper-V context hv crash variables as storage
of Hyper-V crash msrs.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e7d9513b

kvm: introduce vcpu_debug = kvm_debug + vcpu context · ee86dbc6

Andrey Smetanin authored Jul 03, 2015

vcpu_debug is useful macro like kvm_debug but additionally
includes vcpu context inside output.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ee86dbc6

kvm/x86: move Hyper-V MSR's/hypercall code into hyperv.c file · e83d5887

Andrey Smetanin authored Jul 03, 2015

This patch introduce Hyper-V related source code file - hyperv.c and
per vm and per vcpu hyperv context structures.
All Hyper-V MSR's and hypercall code moved into hyperv.c.
All Hyper-V kvm/vcpu fields moved into appropriate hyperv context
structures. Copyrights and authors information copied from x86.c
to hyperv.c.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Peter Hornyack <peterhornyack@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e83d5887

KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions · f9eb4af6

Eugene Korenevsky authored Apr 17, 2015

According to Intel SDM several checks must be applied for memory operands
of VMX instructions.

Long mode: #GP(0) or #SS(0) depending on the segment must be thrown
if the memory address is in a non-canonical form.

Protected mode, checks in chronological order:
- The segment type must be checked with access type (read or write) taken
into account.
	For write access: #GP(0) must be generated if the destination operand
		is located in a read-only data segment or any code segment.
	For read access: #GP(0) must be generated if if the source operand is
		located in an execute-only code segment.
- Usability of the segment must be checked. #GP(0) or #SS(0) depending on the
	segment must be thrown if the segment is unusable.
- Limit check. #GP(0) or #SS(0) depending on the segment must be
	thrown if the memory operand effective address is outside the segment
	limit.
Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f9eb4af6

KVM: x86: rename quirk constants to KVM_X86_QUIRK_* · 0da029ed

Paolo Bonzini authored Jul 23, 2015

Make them clearly architecture-dependent; the capability is valid for
all architectures, but the argument is not.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0da029ed

KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED · fb279950

Xiao Guangrong authored Jul 16, 2015

OVMF depends on WB to boot fast, because it only clears caches after
it has set up MTRRs---which is too late.

Let's do writeback if CR0.CD is set to make it happy, similar to what
SVM is already doing.
Signed-off-by: Xiao Guangrong <guangrong.xiao@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fb279950

KVM: x86: introduce kvm_check_has_quirk · 41dbc6bc

Paolo Bonzini authored Jul 23, 2015

The logic of the disabled_quirks field usually results in a double
negation.  Wrap it in a simple function that checks the bit and
negates it.

Based on a patch from Xiao Guangrong.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

41dbc6bc

KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type · 3e5d2fdc

Xiao Guangrong authored Jul 16, 2015

kvm_mtrr_get_guest_memory_type never returns -1 which is implied
in the current code since if @type = -1 (means no MTRR contains the
range), iter.partial_map must be true

Simplify the code to indicate this fact
Signed-off-by: Xiao Guangrong <guangrong.xiao@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3e5d2fdc

KVM: MTRR: fix memory type handling if MTRR is completely disabled · 10dc331f

Xiao Guangrong authored Jul 16, 2015

Currently code uses default memory type if MTRR is fully disabled,
fix it by using UC instead.
Signed-off-by: Xiao Guangrong <guangrong.xiao@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

10dc331f

22 Jul, 2015 11 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c5dfd654

Linus Torvalds authored Jul 22, 2015

Pull networking fixes from David Miller:

 1) Don't use shared bluetooth antenna in iwlwifi driver for management
    frames, from Emmanuel Grumbach.

 2) Fix device ID check in ath9k driver, from Felix Fietkau.

 3) Off by one in xen-netback BUG checks, from Dan Carpenter.

 4) Fix IFLA_VF_PORT netlink attribute validation, from Daniel Borkmann.

 5) Fix races in setting peeked bit flag in SKBs during datagram
    receive.  If it's shared we have to clone it otherwise the value can
    easily be corrupted.  Fix from Herbert Xu.

 6) Revert fec clock handling change, causes regressions.  From Fabio
    Estevam.

 7) Fix use after free in fq_codel and sfq packet schedulers, from WANG
    Cong.

 8) ipvlan bug fixes (memory leaks, missing rcu_dereference_bh, etc.)
    from WANG Cong and Konstantin Khlebnikov.

 9) Memory leak in act_bpf packet action, from Alexei Starovoitov.

10) ARM bpf JIT bug fixes from Nicolas Schichan.

11) Fix backwards compat of ANY_LAYOUT in virtio_net driver, from
    Michael S Tsirkin.

12) Destruction of bond with different ARP header types not handled
    correctly, fix from Nikolay Aleksandrov.

13) Revert GRO receive support in ipv6 SIT tunnel driver, causes
    regressions because the GRO packets created cannot be processed
    properly on the GSO side if we forward the frame.  From Herbert Xu.

14) TCCR update race and other fixes to ravb driver from Sergei
    Shtylyov.

15) Fix SKB leaks in caif_queue_rcv_skb(), from Eric Dumazet.

16) Fix panics on packet scheduler filter replace, from Daniel Borkmann.

17) Make sure AF_PACKET sees properly IP headers in defragmented frames
    (via PACKET_FANOUT_FLAG_DEFRAG option), from Edward Hyunkoo Jee.

18) AF_NETLINK cannot hold mutex in RCU callback, fix from Florian
    Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits)
  ravb: fix ring memory allocation
  net: phy: dp83867: Fix warning check for setting the internal delay
  openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
  netlink: don't hold mutex in rcu callback when releasing mmapd ring
  ARM: net: fix vlan access instructions in ARM JIT.
  ARM: net: handle negative offsets in BPF JIT.
  ARM: net: fix condition for load_order > 0 when translating load instructions.
  tcp: suppress a division by zero warning
  drivers: net: cpsw: remove tx event processing in rx napi poll
  inet: frags: fix defragmented packet's IP header for af_packet
  net: mvneta: fix refilling for Rx DMA buffers
  stmmac: fix setting of driver data in stmmac_dvr_probe
  sched: cls_flow: fix panic on filter replace
  sched: cls_flower: fix panic on filter replace
  sched: cls_bpf: fix panic on filter replace
  net/mdio: fix mdio_bus_match for c45 PHY
  net: ratelimit warnings about dst entry refcount underflow or overflow
  caif: fix leaks and race in caif_queue_rcv_skb()
  qmi_wwan: add the second QMI/network interface for Sierra Wireless MC7305/MC7355
  ravb: fix race updating TCCR
  ...

c5dfd654

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 5a5ca73a

Linus Torvalds authored Jul 22, 2015

Pull ARM64 fixes from Catalin Marinas:

 - arm64 build fix following the move of the thread_struct to the end of
   task_struct and the asm offsets becoming too large for the AArch64
   ISA

 - preparatory patch for moving irq_data struct members (applied now to
   reduce dependency for the next merging window)

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  ARM64/irq: Use access helper irq_data_get_affinity_mask()
  arm64: switch_to: calculate cpu context pointer using separate register

5a5ca73a

ARM64/irq: Use access helper irq_data_get_affinity_mask() · 3bc38fc1

Jiang Liu authored Jul 13, 2015

This is a preparatory patch for moving irq_data struct members.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

3bc38fc1

arm64: switch_to: calculate cpu context pointer using separate register · c0d3fce5

Will Deacon authored Jul 20, 2015

Commit 0c8c0f03 ("x86/fpu, sched: Dynamically allocate 'struct fpu'")
moved the thread_struct to the bottom of task_struct. As a result, the
offset is now too large to be used in an immediate add on arm64 with
some kernel configs:

arch/arm64/kernel/entry.S: Assembler messages:
arch/arm64/kernel/entry.S:588: Error: immediate out of range
arch/arm64/kernel/entry.S:597: Error: immediate out of range

This patch calculates the offset using an additional register instead of
an immediate offset.

Fixes: 0c8c0f03 ("x86/fpu, sched: Dynamically allocate 'struct fpu'")
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

c0d3fce5

ravb: fix ring memory allocation · d8b48911

Sergei Shtylyov authored Jul 22, 2015

The driver is written as if it can adapt to a low memory situation allocating
less RX skbs and TX aligned buffers than the respective RX/TX ring sizes. In
reality though the driver would malfunction in this case. Stop being overly
smart and just fail in such situation -- this is achieved by moving the memory
allocation from ravb_ring_format() to ravb_ring_init().

We leave dma_map_single() calls in place but make their failure non-fatal
by marking the corresponding RX descriptors with zero data size which should
prevent DMA to an invalid addresses.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d8b48911

net: phy: dp83867: Fix warning check for setting the internal delay · a46fa260

Dan Murphy authored Jul 21, 2015

Fix warning: logical ‘or’ of collectively exhaustive tests is always true

Change the internal delay check from an 'or' condition to an 'and'
condition.
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a46fa260

openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes · bac541e4

Chris J Arges authored Jul 21, 2015

Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].

Use nr_node_ids to allocate a maximal sparse array instead of
num_possible_nodes().

The crash was noticed after 3af229f2 was applied as it changed the
node_possible_map to match node_online_map on boot.
Fixes: 3af229f2Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bac541e4

netlink: don't hold mutex in rcu callback when releasing mmapd ring · 0470eb99

Florian Westphal authored Jul 21, 2015

Kirill A. Shutemov says:

This simple test-case trigers few locking asserts in kernel:

int main(int argc, char **argv)
{
        unsigned int block_size = 16 * 4096;
        struct nl_mmap_req req = {
                .nm_block_size          = block_size,
                .nm_block_nr            = 64,
                .nm_frame_size          = 16384,
                .nm_frame_nr            = 64 * block_size / 16384,
        };
        unsigned int ring_size;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
                exit(1);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
                exit(1);

	ring_size = req.nm_block_nr * req.nm_block_size;
	mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	return 0;
}

+++ exited with 0 +++
BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
3 locks held by init/1:
 #0:  (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
 #1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
 #2:  (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20

CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
 ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
 0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
 ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
Call Trace:
 <IRQ>  [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
 [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270
 [<ffffffff81085bed>] __might_sleep+0x4d/0x90
 [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430
 [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
 [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
 [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
 [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
 [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
 [<ffffffff817e484d>] __sk_free+0x1d/0x160
 [<ffffffff817e49a9>] sk_free+0x19/0x20
[..]

Cong Wang says:

We can't hold mutex lock in a rcu callback, [..]

Thomas Graf says:

The socket should be dead at this point. It might be simpler to
add a netlink_release_ring() function which doesn't require
locking at all.
Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Diagnosed-by: Cong Wang <cwang@twopensource.com>
Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

0470eb99

Merge branch 'arm-bpf-fixes' · 7c8cbaca

David S. Miller authored Jul 21, 2015

Nicolas Schichan says:

====================
BPF JIT fixes for ARM

These patches are fixing bugs in the ARM JIT and should probably find
their way to a stable kernel. All 60 test_bpf tests in Linux 4.1 release
are now passing OK (was 54 out of 60 before).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7c8cbaca

ARM: net: fix vlan access instructions in ARM JIT. · c18fe54b

Nicolas Schichan authored Jul 21, 2015

This makes BPF_ANC | SKF_AD_VLAN_TAG and BPF_ANC | SKF_AD_VLAN_TAG_PRESENT
have the same behaviour as the in kernel VM and makes the test_bpf LD_VLAN_TAG
and LD_VLAN_TAG_PRESENT tests pass.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

c18fe54b

ARM: net: handle negative offsets in BPF JIT. · 6d715e30

Nicolas Schichan authored Jul 21, 2015

Previously, the JIT would reject negative offsets known during code
generation and mishandle negative offsets provided at runtime.

Fix that by calling bpf_internal_load_pointer_neg_helper()
appropriately in the jit_get_skb_{b,h,w} slow path helpers and by forcing
the execution flow to the slow path helpers when the offset is
negative.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d715e30