1. 12 Dec, 2016 13 commits
    • rcu: Fix soft lockup for rcu_nocb_kthread · b6c52222
      Ding Tianhong authored
      commit bedc1969 upstream.
      
      Carrying out the following steps results in a softlockup in the
      RCU callback-offload (rcuo) kthreads:
      
      1. Connect to ixgbevf, and set the speed to 10Gb/s.
      2. Use ifconfig to bring the nic up and down repeatedly.
      
      [  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
      [  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
      [  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
      [  368.106005] RIP: 0010:[<ffffffff81579e04>]  [<ffffffff81579e04>] fib_table_lookup+0x14/0x390
      [  368.106005] RSP: 0018:ffff88061fc83ce8  EFLAGS: 00000286
      [  368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
      [  368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
      [  368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
      [  368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
      [  368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
      [  368.106005] FS:  0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
      [  368.106005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
      [  368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  368.106005] Stack:
      [  368.106005]  00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
      [  368.106005]  ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
      [  368.106005]  ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
      [  368.106005] Call Trace:
      [  368.106005]  <IRQ>
      [  368.106005]
      [  368.106005]  [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0
      [  368.106005]  [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110
      [  368.106005]  [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0
      [  368.106005]  [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350
      [  368.106005]  [<ffffffff81537034>] ip_rcv+0x234/0x380
      [  368.106005]  [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870
      [  368.106005]  [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60
      [  368.106005]  [<ffffffff814fe4de>] process_backlog+0xae/0x180
      [  368.106005]  [<ffffffff814fdcb2>] net_rx_action+0x152/0x240
      [  368.106005]  [<ffffffff81077b3f>] __do_softirq+0xef/0x280
      [  368.106005]  [<ffffffff8161619c>] call_softirq+0x1c/0x30
      [  368.106005]  <EOI>
      [  368.106005]
      [  368.106005]  [<ffffffff81015d95>] do_softirq+0x65/0xa0
      [  368.106005]  [<ffffffff81077174>] local_bh_enable+0x94/0xa0
      [  368.106005]  [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370
      [  368.106005]  [<ffffffff81098250>] ? wake_up_bit+0x30/0x30
      [  368.106005]  [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40
      [  368.106005]  [<ffffffff8109728f>] kthread+0xcf/0xe0
      [  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
      [  368.106005]  [<ffffffff816147d8>] ret_from_fork+0x58/0x90
      [  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
      
      ==================================cut here==============================
      
      It turns out that the rcuos callback-offload kthread is busy processing
      a very large quantity of RCU callbacks, and it is not relinquishing the
      CPU while doing so.  This commit therefore adds a cond_resched_rcu_qs()
      within the loop to allow other tasks to run.
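
The idea of yielding inside a long callback loop can be illustrated with a user-space analogue. This is only a sketch, not the kernel patch: `struct cb_node`, `run_callbacks()`, `count_cb()` and the `YIELD_EVERY` batch size are hypothetical names, and `sched_yield()` stands in for the kernel's `cond_resched()`.

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical callback node; this is not the kernel's struct rcu_head. */
struct cb_node {
    struct cb_node *next;
    void (*func)(struct cb_node *);
};

/* Yield after this many callbacks; the batch size is an arbitrary choice. */
#define YIELD_EVERY 64

/* Example callback that only counts how often it was invoked. */
static size_t invoked;
static void count_cb(struct cb_node *node)
{
    (void)node;
    invoked++;
}

/*
 * Invoke every callback on the list, giving up the CPU periodically so
 * that a very long list cannot monopolise it.  Returns the number of
 * callbacks invoked.
 */
static size_t run_callbacks(struct cb_node *list)
{
    size_t done = 0;

    while (list) {
        struct cb_node *next = list->next; /* func may reuse the node */

        list->func(list);
        list = next;
        if (++done % YIELD_EVERY == 0)
            sched_yield(); /* user-space stand-in for cond_resched() */
    }
    return done;
}
```

Yielding once per batch rather than after every callback keeps the overhead low while still bounding how long the loop can hold the CPU.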
      
      [js] use only cond_resched() in 3.12
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Dhaval Giani <dhaval.giani@oracle.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • x86/traps: Ignore high word of regs->cs in early_fixup_exception() · e59f8eb4
      Andy Lutomirski authored
      commit fc0e81b2 upstream.
      
      On the 80486 DX, it seems that some exceptions may leave garbage in
      the high bits of CS.  This causes sporadic failures in which
      early_fixup_exception() refuses to fix up an exception.
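
A minimal sketch of the kind of check the subject line describes: compare only the low 16 bits of the saved CS value. The names `SKETCH_KERNEL_CS` and `cs_is_kernel()` are hypothetical, and the selector value is illustrative rather than taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative kernel code-segment selector; an assumption for this sketch. */
#define SKETCH_KERNEL_CS 0x60u

/*
 * A segment selector is 16 bits wide.  If the saved CS slot is 32 bits
 * and the CPU leaves garbage in the upper half, a full-width comparison
 * fails even though the selector itself is correct.  Masking first
 * avoids that.
 */
static bool cs_is_kernel(uint32_t saved_cs)
{
    return (saved_cs & 0xffffu) == SKETCH_KERNEL_CS;
}
```

With this check, `cs_is_kernel(0xdead0060)` is true, whereas an unmasked comparison against `0x60` would reject it.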
      
      As far as I can tell, this has been buggy for a long time, but the
      problem seems to have been exacerbated by commits:
      
        1e02ce4c ("x86: Store a per-cpu shadow copy of CR4")
        e1bfc11c ("x86/init: Fix cr4_init_shadow() on CR4-less machines")
      
      This appears to have been broken for as long as we've had early
      exception handling.
      
      [ This backport should apply to kernels from 3.4 - 4.5. ]
      
      Fixes: 4c5023a3 ("x86-32: Handle exception table entries during early boot")
      Cc: H. Peter Anvin <hpa@zytor.com>
      Reported-by: Matthew Whitehead <tedheadster@gmail.com>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • drm/radeon: Ensure vblank interrupt is enabled on DPMS transition to on · 78a9c69d
      Michel Dänzer authored
      NOTE: This patch only applies to 4.5.y or older kernels. With newer
      kernels, this problem cannot happen because the driver now uses
      drm_crtc_vblank_on/off instead of drm_vblank_pre/post_modeset[0]. I
      consider this patch safer for older kernels than backporting the API
      change, because drm_crtc_vblank_on/off had various issues in older
      kernels, and I'm not sure all fixes for those have been backported to
      all stable branches where this patch could be applied.
      
          ---------------------
      
      Fixes the vblank interrupt being disabled when it should be on, which
      can cause at least the following symptoms:
      
      * Hangs when running 'xset dpms force off' in a GNOME session with
        gnome-shell using DRI2.
      * RandR 1.4 slave outputs freezing with garbage displayed using
        xf86-video-ati 7.8.0 or newer.
      
      [0] See upstream commit:
      
      commit 777e3cbc
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Thu Jan 21 11:08:57 2016 +0100
      
          drm/radeon: Switch to drm_vblank_on/off
      Reported-and-Tested-by: Max Staudt <mstaudt@suse.de>
      Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
    • mpi: Fix NULL ptr dereference in mpi_powm() [ver #3] · 18fb7a8f
      Andrey Ryabinin authored
      commit f5527fff upstream.
      
      This fixes CVE-2016-8650.
      
      If mpi_powm() is given a zero exponent, it wants to immediately return
      either 1 or 0, depending on the modulus.  However, if the result was
      initialised with zero limb space, no limb space is allocated and a
      NULL-pointer exception ensues.
      
      Fix this by allocating a minimal amount of limb space for the result in
      the 0-exponent case when the result is 1 and not touching the limb space
      when the result is 0.
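
The zero-exponent handling described above can be sketched in plain C. This is not the kernel's MPI code: `struct bignum`, `bignum_reserve()` and `powm_zero_exponent()` are hypothetical names, and a value with `nlimbs == 0` is assumed to represent zero.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical big number: value is limbs[0..nlimbs-1]; nlimbs == 0 is zero. */
struct bignum {
    uint64_t *limbs;
    size_t alloced;
    size_t nlimbs;
};

/* Ensure at least `want` limbs are allocated.  Returns 0, or -1 on failure. */
static int bignum_reserve(struct bignum *n, size_t want)
{
    uint64_t *p;

    if (n->alloced >= want)
        return 0;
    p = realloc(n->limbs, want * sizeof(*p));
    if (!p)
        return -1;
    n->limbs = p;
    n->alloced = want;
    return 0;
}

/*
 * Result of base^0 mod m: 0 when m == 1, otherwise 1.
 * The limb buffer is only written after it is known to exist.
 */
static int powm_zero_exponent(struct bignum *res, const struct bignum *mod)
{
    int mod_is_one = (mod->nlimbs == 1 && mod->limbs[0] == 1);

    if (mod_is_one) {
        res->nlimbs = 0;          /* result is 0: do not touch the limbs */
        return 0;
    }
    if (bignum_reserve(res, 1) < 0)
        return -1;                /* allocation failed */
    res->limbs[0] = 1;            /* result is 1: needs exactly one limb */
    res->nlimbs = 1;
    return 0;
}
```

The key point is ordering: the store to `limbs[0]` happens only after the reservation succeeds, so a result that started with no limb space never dereferences a NULL pointer.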
      
      This affects the use of RSA keys and X.509 certificates that carry them.
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
      PGD 0
      Oops: 0002 [#1] SMP
      Modules linked in:
      CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      task: ffff8804011944c0 task.stack: ffff880401294000
      RIP: 0010:[<ffffffff8138ce5d>]  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
      RSP: 0018:ffff880401297ad8  EFLAGS: 00010212
      RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0
      RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0
      RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000
      R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50
      FS:  00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0
      Stack:
       ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4
       0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30
       ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8
      Call Trace:
       [<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66
       [<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d
       [<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd
       [<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146
       [<ffffffff8132a95c>] rsa_verify+0x9d/0xee
       [<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb
       [<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1
       [<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228
       [<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4
       [<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1
       [<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1
       [<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61
       [<ffffffff812fc9f3>] key_create_or_update+0x145/0x399
       [<ffffffff812fe227>] SyS_add_key+0x154/0x19e
       [<ffffffff81001c2b>] do_syscall_64+0x80/0x191
       [<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25
      Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f
      RIP  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
       RSP <ffff880401297ad8>
      CR2: 0000000000000000
      ---[ end trace d82015255d4a5d8d ]---
      
      Basically, this is a backport of a libgcrypt patch:
      
      	http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526
      
      Fixes: cdec9cb5 ("crypto: GnuPG based MPI lib - source files (part 1)")
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
      cc: linux-ima-devel@lists.sourceforge.net
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • apparmor: fix change_hat not finding hat after policy replacement · 6d7bc8a8
      John Johansen authored
      commit 3d40658c upstream.
      
      After a policy replacement, the task cred may be out of date and need
      to be updated. However, change_hat is using the stale profiles from
      the out-of-date cred, resulting in either a stale profile being applied
      or an incorrect failure when searching for a hat profile, as it has been
      migrated to the new parent profile.
      
      Fixes: 01e2b670 (failure to find hat)
      Fixes: 898127c3 (stale policy being applied)
      Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1000287
      Signed-off-by: John Johansen <john.johansen@canonical.com>
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • cfg80211: limit scan results cache size · e525af57
      Johannes Berg authored
      commit 9853a55e upstream.
      
      It's possible to make scanning consume almost arbitrary amounts
      of memory, e.g. by sending beacon frames with random BSSIDs at
      high rates while somebody is scanning.
      
      Limit the number of BSS table entries we're willing to cache to
      1000, limiting maximum memory usage to maybe 4-5MB, but lower
      in practice - that would be the case for having both full-sized
      beacon and probe response frames for each entry; this seems not
      possible in practice, so a limit of 1000 entries will likely be
      closer to 0.5 MB.
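
One way to picture a capped cache is a fixed-size ring that drops its oldest entry when full. This is only a sketch under assumptions: the eviction policy (oldest first), `struct scan_cache`, `scan_cache_add()` and `scan_cache_oldest()` are hypothetical and not taken from cfg80211; only the limit of 1000 comes from the text above.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LIMIT 1000 /* the cap named in the commit message */

/* Hypothetical cache: a ring of entry identifiers standing in for BSS entries. */
struct scan_cache {
    uint64_t ids[CACHE_LIMIT];
    size_t head;  /* index of the oldest entry */
    size_t count; /* number of valid entries */
};

/*
 * Add an entry.  When the cache is already full, the oldest entry is
 * dropped first, so memory use stays bounded no matter how many
 * entries arrive.  Returns 1 if an entry was evicted, 0 otherwise.
 */
static int scan_cache_add(struct scan_cache *c, uint64_t id)
{
    int evicted = 0;

    if (c->count == CACHE_LIMIT) {
        c->head = (c->head + 1) % CACHE_LIMIT;
        c->count--;
        evicted = 1;
    }
    c->ids[(c->head + c->count) % CACHE_LIMIT] = id;
    c->count++;
    return evicted;
}

/* Return the oldest entry still cached (only valid when count > 0). */
static uint64_t scan_cache_oldest(const struct scan_cache *c)
{
    return c->ids[c->head];
}
```

Feeding 1500 distinct identifiers into a zero-initialised cache leaves exactly 1000 entries, the most recent ones.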
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • tile: avoid using clocksource_cyc2ns with absolute cycle count · 2323e1a9
      Chris Metcalf authored
      commit e658a6f1 upstream.
      
      For large values of "mult" and long uptimes, the intermediate
      result of "cycles * mult" can overflow 64 bits.  For example,
      the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
      we have mult = 853, and after 208.5 days, we overflow 64 bits.
      
      Since clocksource_cyc2ns() is intended to be used for relative
      cycle counts, not absolute cycle counts, performance is more
      important than accepting a wider range of cycle values.  So,
      just use mult_frac() directly in tile's sched_clock().
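
The overflow and the split-multiply remedy can be checked with a small user-space sketch. The function names are hypothetical, and the shift of 10 is an assumption (853/1024 is roughly the 0.833 ns per cycle of a 1.2 GHz clock); only mult = 853 and the 1.2 GHz rate come from the text above. The second function follows the same quotient/remainder split that mult_frac() uses.

```c
#include <stdint.h>

/* Naive conversion: the 64-bit product cycles * mult can wrap around. */
static uint64_t cyc2ns_naive(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}

/*
 * Split conversion: divide first, then multiply the quotient and the
 * remainder separately.  With cycles = quot * 2^shift + rem this gives
 * quot * mult + (rem * mult) / 2^shift, which is the same value as the
 * exact product shifted right, without the large intermediate.
 */
static uint64_t cyc2ns_split(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    uint64_t quot = cycles >> shift;
    uint64_t rem = cycles & ((UINT64_C(1) << shift) - 1);

    return quot * mult + ((rem * mult) >> shift);
}
```

For one second of cycles (1,200,000,000) both functions agree. For 209 days of cycles (21,669,120,000,000,000) the product with 853 exceeds 2^64, so the naive version wraps while the split version still returns the expected nanosecond count.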
      
      Commit 4cecf6d4 ("sched, x86: Avoid unnecessary overflow
      in sched_clock") by Salman Qazi results in essentially the same
      generated code for x86 as this change does for tile.  In fact,
      a follow-on change by Salman introduced mult_frac() and switched
      to using it, so the C code was largely identical at that point too.
      
      Peter Zijlstra then added mul_u64_u32_shr() and switched x86
      to use it.  This is, in principle, better; by optimizing the
      64x64->64 multiplies to be 32x32->64 multiplies we can potentially
      save some time.  However, the compiler pipelines the 64x64->64
      multiplies pretty well, and the conditional branch in the generic
      mul_u64_u32_shr() causes some bubbles in execution, with the
      result that it's pretty much a wash.  If tilegx provided its own
      implementation of mul_u64_u32_shr() without the conditional branch,
      we could potentially save 3 cycles, but that seems like a small gain
      for a fair amount of additional build scaffolding; no other platform
      currently provides a mul_u64_u32_shr() override, and tile doesn't
      currently have an <asm/div64.h> header to put the override in.
      
      Additionally, gcc currently has an optimization bug that prevents
      it from recognizing the opportunity to use a 32x32->64 multiply,
      and so the result would be no better than the existing mult_frac()
      until such time as the compiler is fixed.
      
      For now, just using mult_frac() seems like the right answer.
      Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • scsi: mpt3sas: Fix secure erase premature termination · 940aeba5
      Andrey Grodzovsky authored
      commit 18f6084a upstream.
      
      This is a workaround for a bug with LSI Fusion MPT SAS2 when performing
      secure erase. Due to the very long time the operation takes, commands
      issued during the erase will time out and will trigger execution of the
      abort hook. Even though the abort hook is called for the specific
      command which timed out, this leads to the entire device halting
      (scsi_state terminated) and premature termination of the secure erase.
      
      Set device state to busy while ATA passthrough commands are in progress.
      
      [mkp: hand applied to 4.9/scsi-fixes, tweaked patch description]
      Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com>
      Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
      Cc: <linux-scsi@vger.kernel.org>
      Cc: Sathya Prakash <sathya.prakash@broadcom.com>
      Cc: Chaitra P B <chaitra.basappa@broadcom.com>
      Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
      Cc: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y · af5bc71b
      Petr Vandrovec authored
      commit 2ce9d227 upstream.
      
      Some code (all error handling) submits CDBs that are allocated
      on the stack.  This breaks with CB/CBI code that tries to create
      a URB directly from the SCSI command buffer, which happens to be in
      vmalloced memory with vmalloced kernel stacks.
      
      Let's make a copy of the command in usb_stor_CB_transport.
      Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • USB: serial: ftdi_sio: add support for TI CC3200 LaunchPad · 9a64b7b1
      Doug Brown authored
      commit 9bfef729 upstream.
      
      This patch adds support for the TI CC3200 LaunchPad board, which uses a
      custom USB vendor ID and product ID. Channel A is used for JTAG, and
      channel B is used for a UART.
      Signed-off-by: Doug Brown <doug@schmorgal.com>
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • USB: serial: cp210x: add ID for the Zone DPMX · b2f43aa4
      Paul Jakma authored
      commit 2ab13292 upstream.
      
      The BRIM Brothers Zone DPMX is a bicycle powermeter. This ID is for the USB
      serial interface in its charging dock for the control pods, via which some
      settings for the pods can be modified.
      Signed-off-by: Paul Jakma <paul@jakma.org>
      Cc: Barry Redmond <barry@brimbrothers.com>
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • usb: chipidea: move the lock initialization to core file · 11a35448
      Peter Chen authored
      commit a5d906bb upstream.
      
      This fixes the dump below, which occurs when the lock is accessed in
      host mode before it has been initialized.
      
      [   46.119638] INFO: trying to register non-static key.
      [   46.124643] the code is fine but needs lockdep annotation.
      [   46.130144] turning off the locking correctness validator.
      [   46.135659] CPU: 0 PID: 690 Comm: cat Not tainted 4.9.0-rc3-00079-g4b75f1d #1210
      [   46.143075] Hardware name: Freescale i.MX6 SoloX (Device Tree)
      [   46.148923] Backtrace:
      [   46.151448] [<c010c460>] (dump_backtrace) from [<c010c658>] (show_stack+0x18/0x1c)
      [   46.159038]  r7:edf52000
      [   46.161412]  r6:60000193
      [   46.163967]  r5:00000000
      [   46.165035]  r4:c0e25c2c
      
      [   46.169109] [<c010c640>] (show_stack) from [<c03f58a4>] (dump_stack+0xb4/0xe8)
      [   46.176362] [<c03f57f0>] (dump_stack) from [<c016d690>] (register_lock_class+0x4fc/0x56c)
      [   46.184554]  r10:c0e25d24
      [   46.187014]  r9:edf53e70
      [   46.189569]  r8:c1642444
      [   46.190637]  r7:ee9da024
      [   46.193191]  r6:00000000
      [   46.194258]  r5:00000000
      [   46.196812]  r4:00000000
      [   46.199185]  r3:00000001
      
      [   46.203259] [<c016d194>] (register_lock_class) from [<c0171294>] (__lock_acquire+0x80/0x10f0)
      [   46.211797]  r10:c0e25d24
      [   46.214257]  r9:edf53e70
      [   46.216813]  r8:ee9da024
      [   46.217880]  r7:c1642444
      [   46.220435]  r6:edcd1800
      [   46.221502]  r5:60000193
      [   46.224057]  r4:00000000
      
      [   46.227953] [<c0171214>] (__lock_acquire) from [<c01726c0>] (lock_acquire+0x74/0x94)
      [   46.235710]  r10:00000001
      [   46.238169]  r9:edf53e70
      [   46.240723]  r8:edf53f80
      [   46.241790]  r7:00000001
      [   46.244344]  r6:00000001
      [   46.245412]  r5:60000193
      [   46.247966]  r4:00000000
      
      [   46.251866] [<c017264c>] (lock_acquire) from [<c096c8fc>] (_raw_spin_lock_irqsave+0x40/0x54)
      [   46.260319]  r7:ee1c6a00
      [   46.262691]  r6:c062a570
      [   46.265247]  r5:20000113
      [   46.266314]  r4:ee9da014
      
      [   46.270393] [<c096c8bc>] (_raw_spin_lock_irqsave) from [<c062a570>] (ci_port_test_show+0x2c/0x70)
      [   46.279280]  r6:eebd2000
      [   46.281652]  r5:ee9da010
      [   46.284207]  r4:ee9da014
      
      [   46.286810] [<c062a544>] (ci_port_test_show) from [<c0248d04>] (seq_read+0x1ac/0x4f8)
      [   46.294655]  r9:edf53e70
      [   46.297028]  r8:edf53f80
      [   46.299583]  r7:ee1c6a00
      [   46.300650]  r6:00000001
      [   46.303205]  r5:00000000
      [   46.304273]  r4:eebd2000
      [   46.306850] [<c0248b58>] (seq_read) from [<c039e864>] (full_proxy_read+0x54/0x6c)
      [   46.314348]  r10:00000000
      [   46.316808]  r9:c0a6ad30
      [   46.319363]  r8:edf53f80
      [   46.320430]  r7:00020000
      [   46.322986]  r6:b6de3000
      [   46.324053]  r5:ee1c6a00
      [   46.326607]  r4:c0248b58
      
      [   46.330505] [<c039e810>] (full_proxy_read) from [<c021ec98>] (__vfs_read+0x34/0x118)
      [   46.338262]  r9:edf52000
      [   46.340635]  r8:c0107fc4
      [   46.343190]  r7:00020000
      [   46.344257]  r6:edf53f80
      [   46.346812]  r5:c039e810
      [   46.347879]  r4:ee1c6a00
      [   46.350447] [<c021ec64>] (__vfs_read) from [<c021fbd0>] (vfs_read+0x8c/0x11c)
      [   46.357597]  r9:edf52000
      [   46.359969]  r8:c0107fc4
      [   46.362524]  r7:edf53f80
      [   46.363592]  r6:b6de3000
      [   46.366147]  r5:ee1c6a00
      [   46.367214]  r4:00020000
      [   46.369782] [<c021fb44>] (vfs_read) from [<c0220a4c>] (SyS_read+0x4c/0xa8)
      [   46.376672]  r8:c0107fc4
      [   46.379045]  r7:00020000
      [   46.381600]  r6:b6de3000
      [   46.382667]  r5:ee1c6a00
      [   46.385222]  r4:ee1c6a00
      
      [   46.387817] [<c0220a00>] (SyS_read) from [<c0107e20>] (ret_fast_syscall+0x0/0x1c)
      [   46.395314]  r7:00000003
      [   46.397687]  r6:b6de3000
      [   46.400243]  r5:00020000
      [   46.401310]  r4:00020000
      
      Fixes: 26c696c6 ("USB: Chipidea: rename struct ci13xxx variables from udc to ci")
      Signed-off-by: Peter Chen <peter.chen@nxp.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • KVM: x86: drop error recovery in em_jmp_far and em_ret_far · 53093fc1
      Radim Krčmář authored
      commit 2117d539 upstream.
      
      em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
      bit mode, but syzkaller proved otherwise (and SDM agrees).
      Code segment was restored upon failure, but it was left uninitialized
      outside of long mode, which could lead to a leak of host kernel stack.
      We could have fixed that by always saving and restoring the CS, but we
      take a simpler approach and just break any guest that manages to fail
      as the error recovery is error-prone and modern CPUs don't need the
      emulator for this.
      
      Found by syzkaller:
      
        WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
        Kernel panic - not syncing: panic_on_warn set ...
      
        CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
         [...]
        Call Trace:
         [...] __dump_stack lib/dump_stack.c:15
         [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
         [...] panic+0x1b7/0x3a3 kernel/panic.c:179
         [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
         [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
         [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
         [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
         [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
         [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
         [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
         [...] complete_emulated_io arch/x86/kvm/x86.c:6870
         [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
         [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
         [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
         [...] vfs_ioctl fs/ioctl.c:43
         [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
         [...] SYSC_ioctl fs/ioctl.c:694
         [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
         [...] entry_SYSCALL_64_fastpath+0x1f/0xc2
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Fixes: d1442d85 ("KVM: x86: Handle errors when RIP is set during far jumps")
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
  2. 01 Dec, 2016 1 commit
  3. 28 Nov, 2016 26 commits
    • tty: audit: Fix audit source · ef99a35d
      Peter Hurley authored
      commit 6b2a3d62 upstream.
      
      The data to audit/record is in the 'from' buffer (i.e., the input
      read buffer).
      
      Fixes: 72586c60 ("n_tty: Fix auditing support for cannonical mode")
      Cc: Miloslav Trmač <mitr@redhat.com>
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Acked-by: Laura Abbott <labbott@fedoraproject.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • kernel/panic.c: turn off locks debug before releasing console lock · fd81c458
      Vitaly Kuznetsov authored
      commit 7625b3a0 upstream.
      
      Commit 08d78658 ("panic: release stale console lock to always get the
      logbuf printed out") introduced an unwanted bad unlock balance report when
      panic() is called directly and not from OOPS (e.g.  from out_of_memory()).
      The difference is that in case of OOPS we disable locks debug in
      oops_enter() and on direct panic call nobody does that.
      
      Fixes: 08d78658 ("panic: release stale console lock to always get the logbuf printed out")
      Reported-by: kernel test robot <ying.huang@linux.intel.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • mtd: blkdevs: fix potential deadlock + lockdep warnings · 727cf403
      Brian Norris authored
      commit f3c63795 upstream.
      
      Commit 073db4a5 ("mtd: fix: avoid race condition when accessing
      mtd->usecount") fixed a race condition but due to poor ordering of the
      mutex acquisition, introduced a potential deadlock.
      
      The deadlock can occur, for example, when rmmod'ing the m25p80 module, which
      will delete one or more MTDs, along with any corresponding mtdblock
      devices. This could potentially race with an acquisition of the block
      device as follows.
      
       -> blktrans_open()
          ->  mutex_lock(&dev->lock);
          ->  mutex_lock(&mtd_table_mutex);
      
       -> del_mtd_device()
          ->  mutex_lock(&mtd_table_mutex);
          ->  blktrans_notify_remove() -> del_mtd_blktrans_dev()
             ->  mutex_lock(&dev->lock);
      
      This is a classic (potential) ABBA deadlock, which can be fixed by
      making the A->B ordering consistent everywhere. There was no real
      purpose to the ordering in the original patch, AFAIR, so this shouldn't
      be a problem. This ordering was actually already present in
      del_mtd_blktrans_dev(), for one, where the function tried to ensure that
      its caller already held mtd_table_mutex before it acquired &dev->lock:
      
              if (mutex_trylock(&mtd_table_mutex)) {
                      mutex_unlock(&mtd_table_mutex);
                      BUG();
              }
      
      So, reverse the ordering of acquisition of &dev->lock and &mtd_table_mutex so
      we always acquire mtd_table_mutex first.
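
The consistent ordering can be sketched with POSIX mutexes in user space. This is not the MTD code: `table_mutex`, `struct blk_dev`, `dev_open()` and `dev_remove()` are hypothetical stand-ins for mtd_table_mutex, the per-device lock, and the open and remove paths.

```c
#include <pthread.h>

/* Stand-in for the global table lock; always taken first. */
static pthread_mutex_t table_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical device with its own lock; always taken second. */
struct blk_dev {
    pthread_mutex_t lock;
    int open_count;
    int removed;
};

/* Open path: table lock, then device lock.  Returns 0, or -1 if removed. */
static int dev_open(struct blk_dev *dev)
{
    int ret = 0;

    pthread_mutex_lock(&table_mutex);
    pthread_mutex_lock(&dev->lock);
    if (dev->removed)
        ret = -1;
    else
        dev->open_count++;
    pthread_mutex_unlock(&dev->lock);
    pthread_mutex_unlock(&table_mutex);
    return ret;
}

/* Remove path: the same order, so the two paths cannot deadlock ABBA-style. */
static void dev_remove(struct blk_dev *dev)
{
    pthread_mutex_lock(&table_mutex);
    pthread_mutex_lock(&dev->lock);
    dev->removed = 1;
    pthread_mutex_unlock(&dev->lock);
    pthread_mutex_unlock(&table_mutex);
}
```

Because every path acquires the table lock before the device lock, no thread can hold the device lock while waiting for the table lock, which is the condition the ABBA scenario needs.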
      
      Snippets of the lockdep output follow:
      
        # modprobe -r m25p80
        [   53.419251]
        [   53.420838] ======================================================
        [   53.427300] [ INFO: possible circular locking dependency detected ]
        [   53.433865] 4.3.0-rc6 #96 Not tainted
        [   53.437686] -------------------------------------------------------
        [   53.444220] modprobe/372 is trying to acquire lock:
        [   53.449320]  (&new->lock){+.+...}, at: [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
        [   53.457271]
        [   53.457271] but task is already holding lock:
        [   53.463372]  (mtd_table_mutex){+.+.+.}, at: [<c0439994>] del_mtd_device+0x18/0x100
        [   53.471321]
        [   53.471321] which lock already depends on the new lock.
        [   53.471321]
        [   53.479856]
        [   53.479856] the existing dependency chain (in reverse order) is:
        [   53.487660]
        -> #1 (mtd_table_mutex){+.+.+.}:
        [   53.492331]        [<c043fc5c>] blktrans_open+0x34/0x1a4
        [   53.497879]        [<c01afce0>] __blkdev_get+0xc4/0x3b0
        [   53.503364]        [<c01b0bb8>] blkdev_get+0x108/0x320
        [   53.508743]        [<c01713c0>] do_dentry_open+0x218/0x314
        [   53.514496]        [<c0180454>] path_openat+0x4c0/0xf9c
        [   53.519959]        [<c0182044>] do_filp_open+0x5c/0xc0
        [   53.525336]        [<c0172758>] do_sys_open+0xfc/0x1cc
        [   53.530716]        [<c000f740>] ret_fast_syscall+0x0/0x1c
        [   53.536375]
        -> #0 (&new->lock){+.+...}:
        [   53.540587]        [<c063f124>] mutex_lock_nested+0x38/0x3cc
        [   53.546504]        [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
        [   53.552606]        [<c043f164>] blktrans_notify_remove+0x7c/0x84
        [   53.558891]        [<c04399f0>] del_mtd_device+0x74/0x100
        [   53.564544]        [<c043c670>] del_mtd_partitions+0x80/0xc8
        [   53.570451]        [<c0439aa0>] mtd_device_unregister+0x24/0x48
        [   53.576637]        [<c046ce6c>] spi_drv_remove+0x1c/0x34
        [   53.582207]        [<c03de0f0>] __device_release_driver+0x88/0x114
        [   53.588663]        [<c03de19c>] device_release_driver+0x20/0x2c
        [   53.594843]        [<c03dd9e8>] bus_remove_device+0xd8/0x108
        [   53.600748]        [<c03dacc0>] device_del+0x10c/0x210
        [   53.606127]        [<c03dadd0>] device_unregister+0xc/0x20
        [   53.611849]        [<c046d878>] __unregister+0x10/0x20
        [   53.617211]        [<c03da868>] device_for_each_child+0x50/0x7c
        [   53.623387]        [<c046eae8>] spi_unregister_master+0x58/0x8c
        [   53.629578]        [<c03e12f0>] release_nodes+0x15c/0x1c8
        [   53.635223]        [<c03de0f8>] __device_release_driver+0x90/0x114
        [   53.641689]        [<c03de900>] driver_detach+0xb4/0xb8
        [   53.647147]        [<c03ddc78>] bus_remove_driver+0x4c/0xa0
        [   53.652970]        [<c00cab50>] SyS_delete_module+0x11c/0x1e4
        [   53.658976]        [<c000f740>] ret_fast_syscall+0x0/0x1c
        [   53.664621]
        [   53.664621] other info that might help us debug this:
        [   53.664621]
        [   53.672979]  Possible unsafe locking scenario:
        [   53.672979]
        [   53.679169]        CPU0                    CPU1
        [   53.683900]        ----                    ----
        [   53.688633]   lock(mtd_table_mutex);
        [   53.692383]                                lock(&new->lock);
        [   53.698306]                                lock(mtd_table_mutex);
        [   53.704658]   lock(&new->lock);
        [   53.707946]
        [   53.707946]  *** DEADLOCK ***
      
      Fixes: 073db4a5 ("mtd: fix: avoid race condition when accessing mtd->usecount")
      Reported-by: Felipe Balbi <balbi@ti.com>
      Tested-by: Felipe Balbi <balbi@ti.com>
      Signed-off-by: Brian Norris <computersforpeace@gmail.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      727cf403
    • Cyrille Pitchen's avatar
      i2c: at91: fix write transfers by clearing pending interrupt first · 087ff3d1
      Cyrille Pitchen authored
      commit 6f6ddbb0 upstream.
      
      In some cases a NACK interrupt may be pending in the Status Register (SR)
      as a result of a previous transfer. However at91_do_twi_transfer() did not
      read the SR to clear pending interrupts before starting a new transfer.
      Hence a NACK interrupt rose as soon as it was enabled again at the I2C
      controller level, resulting in a wrong sequence of operations and strange
      patterns of behaviour on the I2C bus, such as a clock stretch followed by
      a restart of the transfer.
      
      This first issue occurred with both DMA and PIO write transfers.
      
      Also when a NACK error was detected during a PIO write transfer, the
      interrupt handler used to wrongly start a new transfer by writing into the
      Transmit Holding Register (THR). Then the I2C slave was likely to reply
      with a second NACK.
      
      This second issue is fixed in atmel_twi_interrupt() by handling the TXRDY
      status bit only if both the TXCOMP and NACK status bits are cleared.
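
      The condition described above can be sketched as a small predicate.
      This is an illustration only, not the driver's code: the bit
      positions and the helper name may_write_next_byte() are assumptions
      made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative status-register bits; the values are assumptions for this sketch. */
#define SR_TXCOMP (1u << 0)  /* transfer complete */
#define SR_TXRDY  (1u << 2)  /* transmit holding register is empty */
#define SR_NACK   (1u << 8)  /* slave did not acknowledge */

/* Hypothetical helper: may the interrupt handler write the next byte into THR? */
bool may_write_next_byte(uint32_t status)
{
    /* A finished or NACKed transfer must never be continued. */
    if (status & (SR_TXCOMP | SR_NACK))
        return false;
    /* Otherwise continue only when the holding register is ready. */
    return (status & SR_TXRDY) != 0;
}
```

      With this check, a status word that carries both TXRDY and NACK no
      longer triggers a write into the THR.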
      
      Tested with an at24 EEPROM on a sama5d36ek board running a linux-4.1-at91
      kernel image. Adapted to linux-next.
      Reported-by: Peter Rosin <peda@lysator.liu.se>
      Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
      Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
      Tested-by: Peter Rosin <peda@lysator.liu.se>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Fixes: 93563a6a ("i2c: at91: fix a race condition when using the DMA controller")
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      087ff3d1
    • Alex Williamson's avatar
      PCI: Use function 0 VPD for identical functions, regular VPD for others · 7d1e51af
      Alex Williamson authored
      commit da2d03ea upstream.
      
      932c435c ("PCI: Add dev_flags bit to access VPD through function 0")
      added PCI_DEV_FLAGS_VPD_REF_F0.  Previously, we set the flag on every
      non-zero function of quirked devices.  If a function turned out to be
      different from function 0, i.e., it had a different class, vendor ID, or
      device ID, the flag remained set but we didn't make VPD accessible at all.
      
      Flip this around so we only set PCI_DEV_FLAGS_VPD_REF_F0 for functions that
      are identical to function 0, and allow regular VPD access for any other
      functions.
      
      [bhelgaas: changelog, stable tag]
      Fixes: 932c435c ("PCI: Add dev_flags bit to access VPD through function 0")
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
      Acked-by: Myron Stowe <myron.stowe@redhat.com>
      Acked-by: Mark Rustad <mark.d.rustad@intel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      7d1e51af
    • Alex Williamson's avatar
      PCI: Fix devfn for VPD access through function 0 · eabe51b6
      Alex Williamson authored
      commit 9d924075 upstream.
      
      Commit 932c435c ("PCI: Add dev_flags bit to access VPD through function
      0") passes PCI_SLOT(devfn) for the devfn parameter of pci_get_slot().
      Generally this works because we're fairly well guaranteed that a PCIe
      device is at slot address 0, but for the general case, including
      conventional PCI, it's incorrect.  We need to get the slot and then convert
      it back into a devfn.
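
      A minimal sketch of the slot/devfn arithmetic involved. The three
      macros use the same encoding as the kernel's PCI_DEVFN(), PCI_SLOT()
      and PCI_FUNC(); the helper function0_devfn() is a hypothetical name
      used only for this illustration.

```c
#include <assert.h>

/* devfn packs a 5-bit slot number and a 3-bit function number. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)

/* Hypothetical helper: devfn of function 0 in the same slot as devfn. */
unsigned int function0_devfn(unsigned int devfn)
{
    /* Extract the slot, then convert it back into a devfn. */
    return PCI_DEVFN(PCI_SLOT(devfn), 0);
}
```

      For a device in slot 0 the bare slot number and the devfn of
      function 0 are both zero, which is why passing PCI_SLOT(devfn)
      directly happened to work in the common PCIe case.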
      
      Fixes: 932c435c ("PCI: Add dev_flags bit to access VPD through function 0")
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
      Acked-by: Myron Stowe <myron.stowe@redhat.com>
      Acked-by: Mark Rustad <mark.d.rustad@intel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      eabe51b6
    • Jisheng Zhang's avatar
      x86/idle: Restore trace_cpu_idle to mwait_idle() calls · 15034b96
      Jisheng Zhang authored
      commit e43d0189 upstream.
      
      Commit b253149b ("sched/idle/x86: Restore mwait_idle() to fix boot
      hangs, to improve power savings and to improve performance") restores
      mwait_idle(), but the trace_cpu_idle related calls are missing. This
      causes powertop on my old desktop powered by Intel Core2 E6550 to
      report zero wakeups and zero events.
      
      Add them back to restore the proper behaviour.
      
      Fixes: b253149b ("sched/idle/x86: Restore mwait_idle() to ...")
      Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
      Cc: <len.brown@intel.com>
      Link: http://lkml.kernel.org/r/1440046479-4262-1-git-send-email-jszhang@marvell.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      15034b96
    • Jiri Slaby's avatar
      Linux 3.12.68 · f19260ac
      Jiri Slaby authored
      f19260ac
    • Takashi Iwai's avatar
      ALSA: usb-audio: Fix runtime PM unbalance · dbb41290
      Takashi Iwai authored
      commit 9003ebb1 upstream.
      
      The fix for deadlock in PM in commit [1ee23fe0: ALSA: usb-audio:
      Fix deadlocks at resuming] introduced a new check of in_pm flag.
      However, the brainless patch author evaluated it in a wrong way
      (logical AND instead of logical OR), thus usb_autopm_get_interface()
      is wrongly called at probing, leading to unbalance of runtime PM
      refcount.
      
      This patch fixes it by correcting the logic.
      Reported-by: Hans Yang <hansy@nvidia.com>
      Fixes: 1ee23fe0 ('ALSA: usb-audio: Fix deadlocks at resuming')
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      dbb41290
    • Ben Hutchings's avatar
      xen-pciback: Add name prefix to global 'permissive' variable · 6417d6f2
      Ben Hutchings authored
      commit 8014bcc8 upstream.
      
      The variable for the 'permissive' module parameter used to be static
      but was recently changed to be extern.  This puts it in the kernel
      global namespace if the driver is built-in, so its name should begin
      with a prefix identifying the driver.
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Fixes: af6fc858 ("xen-pciback: limit guest control of command register")
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      6417d6f2
    • Myron Stowe's avatar
      PCI: Handle read-only BARs on AMD CS553x devices · f183980f
      Myron Stowe authored
      commit 06cf35f9 upstream.
      
      Some AMD CS553x devices have read-only BARs because of a firmware or
      hardware defect.  There's a workaround in quirk_cs5536_vsa(), but it no
      longer works after 36e81648 ("PCI: Restore detection of read-only
      BARs").  Prior to 36e81648, we filled in res->start; afterwards we
      leave it zeroed out.  The quirk only updated the size, so the driver tried
      to use a region starting at zero, which didn't work.
      
      Expand quirk_cs5536_vsa() to read the base addresses from the BARs and
      hard-code the sizes.
      
      On Nix's system BAR 2's read-only value is 0x6200.  Prior to 36e81648,
      we interpreted that as a 512-byte BAR based on the lowest-order bit set.  Per
      datasheet sec 5.6.1, that BAR (MFGPT) requires only 64 bytes; use that to
      avoid clearing any address bits if a platform uses only 64-byte alignment.
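
      The size arithmetic above can be checked with a short sketch. The
      helper names are hypothetical, BAR flag bits are ignored, and the
      value 0x6240 in the usage note is an invented example of a base that
      is only 64-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Size implied by the lowest-order set bit of a BAR value. */
uint32_t size_from_lowest_bit(uint32_t bar)
{
    /* bar & -bar isolates the lowest set bit. */
    return bar & (~bar + 1u);
}

/* Base address that remains after aligning to a power-of-two size. */
uint32_t base_for_size(uint32_t bar, uint32_t size)
{
    return bar & ~(size - 1u);
}
```

      size_from_lowest_bit(0x6200) yields 512. A base such as 0x6240
      would lose its 0x40 bit when aligned to 512 bytes but is preserved
      when the size is 64 bytes.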
      
      [js] pcibios_bus_to_resource takes pdev, not bus in 3.12
      
      [bhelgaas: changelog, reduce BAR 2 size to 64]
      Fixes: 36e81648 ("PCI: Restore detection of read-only BARs")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=85991#c4
      Link: http://support.amd.com/TechDocs/31506_cs5535_databook.pdf
      Link: http://support.amd.com/TechDocs/33238G_cs5536_db.pdf
      Reported-and-tested-by: Nix <nix@esperi.org.uk>
      Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      f183980f
    • Peter Zijlstra's avatar
      perf: Tighten (and fix) the grouping condition · 5e08a111
      Peter Zijlstra authored
      commit c3c87e77 upstream.
      
      The fix from 9fc81d87 ("perf: Fix events installation during
      moving group") was incomplete in that it failed to recognise that
      creating a group with events for different CPUs is semantically
      broken -- they cannot be co-scheduled.
      
      Furthermore, it leads to real breakage where, when we create an event
      for CPU Y and then migrate it to form a group on CPU X, the code gets
      confused where the counter is programmed -- triggered in practice
      as well by me via the perf fuzzer.
      
      Fix this by tightening the rules for creating groups. Only allow
      grouping of counters that can be co-scheduled in the same context.
      This means for the same task and/or the same cpu.
      
      Fixes: 9fc81d87 ("perf: Fix events installation during moving group")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      5e08a111
    • Sebastian Andrzej Siewior's avatar
      usb: musb: musb_cppi41: recognize HS devices in hostmode · 1af405b1
      Sebastian Andrzej Siewior authored
      commit 1eec34e9 upstream.
      
      There is a poll loop for max 25us for HS devices. Now guess what, I
      tested it in gadget mode and forgot about the little detail. Nobody seems
      to have noticed…
      This patch adds the missing logic for hostmode so it is recognized in
      host and device mode properly.
      
      Fixes: 50aea6fc ("usb: musb: cppi41: fire hrtimer according to programmed channel length")
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Felipe Balbi <balbi@ti.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      1af405b1
    • Ben Hutchings's avatar
      drivers/net: Disable UFO through virtio · a8767147
      Ben Hutchings authored
      commit 3d0ad094 upstream.
      
      IPv6 does not allow fragmentation by routers, so there is no
      fragmentation ID in the fixed header.  UFO for IPv6 requires the ID to
      be passed separately, but there is no provision for this in the virtio
      net protocol.
      
      Until recently our software implementation of UFO/IPv6 generated a new
      ID, but this was a bug.  Now we will use ID=0 for any UFO/IPv6 packet
      passed through a tap, which is even worse.
      
      Unfortunately there is no distinction between UFO/IPv4 and v6
      features, so disable UFO on taps and virtio_net completely until we
      have a proper solution.
      
      We cannot depend on VM managers respecting the tap feature flags, so
      keep accepting UFO packets but log a warning the first time we do
      this.
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Fixes: 916e4cf4 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      a8767147
    • Ard Biesheuvel's avatar
      KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn() · 79024d0f
      Ard Biesheuvel authored
      commit 85c8555f upstream.
      
      Read-only memory ranges may be backed by the zero page, so avoid
      misidentifying it as an MMIO pfn.
      
      This fixes another issue I identified when testing QEMU+KVM_UEFI, where
      a read to an uninitialized emulated NOR flash brought in the zero page,
      but mapped as a read-write device region, because kvm_is_mmio_pfn()
      misidentifies it as a MMIO pfn due to its PG_reserved bit being set.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Fixes: b8865767 ("ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      79024d0f
    • Ard Biesheuvel's avatar
      mm: export symbol dependencies of is_zero_pfn() · 571a8e0b
      Ard Biesheuvel authored
      commit 0b70068e upstream.
      
      In order to make the static inline function is_zero_pfn() callable by
      modules, export its symbol dependencies 'zero_pfn' and (for s390 and
      mips) 'zero_page_mask'.
      
      We need this for KVM, as CONFIG_KVM is a tristate for all supported
      architectures except ARM and arm64, and testing a pfn whether it refers
      to the zero page is required to correctly distinguish the zero page
      from other special RAM ranges that may also have the PG_reserved bit
      set, but need to be treated as MMIO memory.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      571a8e0b
    • Johannes Weiner's avatar
      mm: filemap: update find_get_pages_tag() to deal with shadow entries · 820c7391
      Johannes Weiner authored
      commit 139b6a6f upstream.
      
      Dave Jones reports the following crash when find_get_pages_tag() runs
      into an exceptional entry:
      
        kernel BUG at mm/filemap.c:1347!
        RIP: find_get_pages_tag+0x1cb/0x220
        Call Trace:
          find_get_pages_tag+0x36/0x220
          pagevec_lookup_tag+0x21/0x30
          filemap_fdatawait_range+0xbe/0x1e0
          filemap_fdatawait+0x27/0x30
          sync_inodes_sb+0x204/0x2a0
          sync_inodes_one_sb+0x19/0x20
          iterate_supers+0xb2/0x110
          sys_sync+0x44/0xb0
          ia32_do_call+0x13/0x13
      
        1343                         /*
        1344                          * This function is never used on a shmem/tmpfs
        1345                          * mapping, so a swap entry won't be found here.
        1346                          */
        1347                         BUG();
      
      After commit 0cd6144a ("mm + fs: prepare for non-page entries in
      page cache radix trees") this comment and BUG() are out of date because
      exceptional entries can now appear in all mappings - as shadows of
      recently evicted pages.
      
      However, as Hugh Dickins notes,
      
        "it is truly surprising for a PAGECACHE_TAG_WRITEBACK (and probably
         any other PAGECACHE_TAG_*) to appear on an exceptional entry.
      
         I expect it comes down to an occasional race in RCU lookup of the
         radix_tree: lacking absolute synchronization, we might sometimes
         catch an exceptional entry, with the tag which really belongs with
         the unexceptional entry which was there an instant before."
      
      And indeed, not only is the tree walk lockless, the tags are also read
      in chunks, one radix tree node at a time.  There is plenty of time for
      page reclaim to swoop in and replace a page that was already looked up
      as tagged with a shadow entry.
      
      Remove the BUG() and update the comment.  While reviewing all other
      lookup sites for whether they properly deal with shadow entries of
      evicted pages, update all the comments and fix memcg file charge moving
      to not miss shmem/tmpfs swapcache pages.
      
      Fixes: 0cd6144a ("mm + fs: prepare for non-page entries in page cache radix trees")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Dave Jones <davej@redhat.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      820c7391
    • David S. Miller's avatar
      sparc64: Handle extremely large kernel TLB range flushes more gracefully. · 58c190c8
      David S. Miller authored
      [ Upstream commit a74ad5e6 ]
      
      When the vmalloc area gets fragmented, and because the firmware
      mapping area sits between where modules live and the vmalloc area, we
      can sometimes receive requests for enormous kernel TLB range flushes.
      
      When this happens the cpu just spins flushing billions of pages and
      this triggers the NMI watchdog and other problems.
      
      We took care of this on the TSB side by doing a linear scan of the
      table once we pass a certain threshold.
      
      Do something similar for the TLB flush, however we are limited by
      the TLB flush facilities provided by the different chip variants.
      
      First of all we use a (mostly arbitrary) cut-off of 256K which is
      about 32 pages.  This can be tuned in the future.
      
      The huge range code path for each chip works as follows:
      
      1) On spitfire we flush all non-locked TLB entries using diagnostic
         accesses.
      
      2) On cheetah we use the "flush all" TLB flush.
      
      3) On sun4v/hypervisor we do a TLB context flush on context 0, which
         unlike previous chips does not remove "permanent" or locked
         entries.
      
      We could probably do something better on spitfire, such as limiting
      the flush to kernel TLB entries or even doing range comparisons.
      However that probably isn't worth it since those chips are old and
      the TLB only had 64 entries.
      Reported-by: James Clarke <jrtc27@jrtc27.com>
      Tested-by: James Clarke <jrtc27@jrtc27.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      58c190c8
    • David S. Miller's avatar
      sparc64: Fix illegal relative branches in hypervisor patched TLB cross-call code. · 5d43827d
      David S. Miller authored
      [ Upstream commit a236441b ]
      
      Just like the non-cross-call TLB flush handlers, the cross-call ones need
      to avoid doing PC-relative branches outside of their code blocks.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      5d43827d
    • David S. Miller's avatar
      sparc64: Fix instruction count in comment for __hypervisor_flush_tlb_pending. · 3febdf4c
      David S. Miller authored
      [ Upstream commit 830cda3f ]
      
      Noticed by James Clarke.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      3febdf4c
    • David S. Miller's avatar
      sparc64: Fix illegal relative branches in hypervisor patched TLB code. · 6246428d
      David S. Miller authored
      [ Upstream commit b429ae4d ]
      
      When we copy code over to patch another piece of code, we can only use
      PC-relative branches that target code within that piece of code.
      
      Such PC-relative branches cannot be made to external symbols because
      the patch moves the location of the code and thus modifies the
      relative address of external symbols.
      
      Use an absolute jmpl to fix this problem.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      6246428d
    • David S. Miller's avatar
      sparc64: Handle extremely large kernel TSB range flushes sanely. · fbc1defa
      David S. Miller authored
      [ Upstream commit 849c4987 ]
      
      If the number of pages we are flushing is more than twice the number
      of entries in the TSB, just scan the TSB table for matches rather
      than probing each and every page in the range.
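
      The decision rule above can be sketched as follows. This is an
      illustration under assumptions, not the kernel's code: the helper
      name and its parameters are hypothetical, and PAGE_SHIFT is set to
      13 for 8 KiB base pages.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 13  /* 8 KiB base pages, assumed for this sketch */

/* Hypothetical helper: is a full table scan cheaper than per-page probes? */
bool should_scan_whole_tsb(unsigned long start, unsigned long end,
                           unsigned long tsb_entries)
{
    unsigned long pages = (end - start) >> PAGE_SHIFT;

    /* Scan the table once the range exceeds twice its entry count. */
    return pages > 2 * tsb_entries;
}
```

      With a 512-entry table, a range of 1024 pages is still probed page
      by page, while a range of 1025 pages switches to the table scan.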
      
      Based upon a patch and report by James Clarke.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      fbc1defa
    • James Clarke's avatar
      sparc: Handle negative offsets in arch_jump_label_transform · 6d6262c5
      James Clarke authored
      [ Upstream commit 9d9fa230 ]
      
      Additionally, if the offset will overflow the immediate for a ba,pt
      instruction, fall back on a standard ba to get an extra 3 bits.
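
      The "extra 3 bits" come from the displacement field widths: ba,pt
      carries a signed 19-bit word displacement and a plain ba carries 22
      bits. A minimal range-check sketch follows; the helper names are
      hypothetical and the sketch does not encode the instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if byte offset off fits a signed word displacement of the given width. */
bool fits_disp(int32_t off, unsigned int bits)
{
    int32_t words = off / 4;                  /* instructions are 4 bytes */
    int32_t limit = (int32_t)1 << (bits - 1); /* signed range is [-limit, limit) */

    return (off % 4) == 0 && words >= -limit && words < limit;
}

/* Returns 19 if ba,pt can reach the target, 22 if only ba can, 0 if neither. */
unsigned int pick_disp_bits(int32_t off)
{
    if (fits_disp(off, 19))
        return 19;
    if (fits_disp(off, 22))
        return 22;
    return 0;
}
```

      Negative offsets are handled by comparing against the signed range
      rather than only checking an upper bound.
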
      Signed-off-by: James Clarke <jrtc27@jrtc27.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      6d6262c5
    • Mike Kravetz's avatar
      sparc64 mm: Fix base TSB sizing when hugetlb pages are used · 2f9cb540
      Mike Kravetz authored
      [ Upstream commit af1b1a9b ]
      
      do_sparc64_fault() calculates both the base and huge page RSS sizes and
      uses this information in calls to tsb_grow().  The calculation for base
      page TSB size is not correct if the task uses hugetlb pages.  hugetlb
      pages are not accounted for in RSS, therefore the call to get_mm_rss(mm)
      does not include hugetlb pages.  However, the number of pages based on
      huge_pte_count (which does include hugetlb pages) is subtracted from
      this value.  This will result in an artificially small and often negative
      RSS calculation.  The base TSB size is then often set to max_tsb_size
      as the passed RSS is unsigned, so a negative value looks really big.
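
      A small sketch of the wrap-around described above. The function
      names and the page counts in the usage note are hypothetical; the
      sketch only shows why subtracting pages that were never counted in
      RSS produces a huge unsigned value.

```c
#include <assert.h>
#include <limits.h>

/* Buggy shape: subtracts every huge page, including hugetlb pages
 * that RSS never counted, so the unsigned result can wrap. */
unsigned long base_rss_buggy(unsigned long rss, unsigned long all_huge_pages)
{
    return rss - all_huge_pages;
}

/* With separate counters, only THP pages (which RSS does count)
 * are subtracted from RSS. */
unsigned long base_rss_split(unsigned long rss, unsigned long thp_pages)
{
    return rss - thp_pages;
}
```

      For example, an RSS of 100 base pages with 512 uncounted hugetlb
      pages wraps to a value close to ULONG_MAX in the buggy shape.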
      
      THP pages are also accounted for in huge_pte_count, and THP pages are
      accounted for in RSS so the calculation in do_sparc64_fault() is correct
      if a task only uses THP pages.
      
      A single huge_pte_count is not sufficient for TSB sizing if both hugetlb
      and THP pages can be used.  Instead of a single counter, use two:  one
      for hugetlb and one for THP.
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      2f9cb540
    • David S. Miller's avatar
      sparc: Don't leak context bits into thread->fault_address · 2d5cba50
      David S. Miller authored
      [ Upstream commit 4f6deb8c ]
      
      On pre-Niagara systems, we fetch the fault address on data TLB
      exceptions from the TLB_TAG_ACCESS register.  But this register also
      contains the context ID associated with the fault in the low 13 bits
      of the register value.
      
      This propagates into current_thread_info()->fault_address and can
      cause trouble later on.
      
      So clear the low 13-bits out of the TLB_TAG_ACCESS value in the cases
      where it matters.
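
      The masking step can be sketched in C, although the actual fix
      lives in the trap handling code. The macro and function names below
      are hypothetical, and the register value in the usage note is an
      invented example.

```c
#include <assert.h>
#include <stdint.h>

#define CONTEXT_BITS 13
#define CONTEXT_MASK ((UINT64_C(1) << CONTEXT_BITS) - 1)

/* Drop the context ID held in the low 13 bits of the tag value. */
uint64_t fault_address_from_tag(uint64_t tag_access)
{
    return tag_access & ~CONTEXT_MASK;
}

/* Extract the context ID on its own. */
uint64_t context_from_tag(uint64_t tag_access)
{
    return tag_access & CONTEXT_MASK;
}
```

      For a tag value of 0x12346123 this gives a fault address of
      0x12346000 and a context ID of 0x123.
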
      Reported-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      2d5cba50
    • Eric Dumazet's avatar
      tcp: take care of truncations done by sk_filter() · 9edbcfdc
      Eric Dumazet authored
      [ Upstream commit ac6e7800 ]
      
      With syzkaller's help, Marco Grassi found a bug in the TCP stack that
      crashes in tcp_collapse().

      The root cause is that sk_filter() can truncate the incoming skb,
      but the TCP stack was not really expecting this to happen.
      It probably was expecting a simple DROP or ACCEPT behavior.
      
      We first need to make sure no part of TCP header could be removed.
      Then we need to adjust TCP_SKB_CB(skb)->end_seq
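
      The two steps can be sketched with a stand-in structure. This is
      not the kernel's code: struct pkt and trim_with_cap() are
      hypothetical, and the sketch only shows trimming that never cuts
      into the header plus the matching end_seq adjustment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of an skb that matter here. */
struct pkt {
    uint32_t len;      /* bytes in the packet, header included */
    uint32_t end_seq;  /* sequence number just past the payload */
};

/* Trim p to want_len bytes, but never below hdr_len, and shrink
 * end_seq by the number of bytes that were removed. */
void trim_with_cap(struct pkt *p, uint32_t want_len, uint32_t hdr_len)
{
    uint32_t new_len = want_len < hdr_len ? hdr_len : want_len;
    uint32_t eaten;

    if (new_len >= p->len)
        return;  /* nothing to trim */

    eaten = p->len - new_len;
    p->len = new_len;
    p->end_seq -= eaten;
}
```

      A filter asking for zero bytes still leaves the full header in
      place, and end_seq stays consistent with the remaining payload.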
      
      Many thanks to syzkaller team and Marco for giving us a reproducer.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Marco Grassi <marco.gra@gmail.com>
      Reported-by: Vladis Dronov <vdronov@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      9edbcfdc