1. 21 Nov, 2014 40 commits
    • Cong Wang's avatar
      freezer: Do not freeze tasks killed by OOM killer · 5f422bea
      Cong Wang authored
      Since f660daac (oom: thaw threads if oom killed thread is frozen
      before deferring) OOM killer relies on being able to thaw a frozen task
      to handle OOM situation but a3201227 (freezer: make freezing() test
      freeze conditions in effect instead of TIF_FREEZE) has reorganized the
      code and stopped clearing freeze flag in __thaw_task. This means that
      the target task only wakes up and goes into the fridge again because the
      freezing condition hasn't changed for it. This reintroduces the bug
      fixed by f660daac.
      
      Fix the issue by checking for TIF_MEMDIE thread flag in
      freezing_slow_path and exclude the task from freezing completely. If a
      task was already frozen it would get woken by __thaw_task from OOM killer
      and get out of freezer after rechecking freezing().
      
      Changes since v1
      - put TIF_MEMDIE check into freezing_slowpath rather than in __refrigerator
        as per Oleg
      - return __thaw_task into oom_scan_process_thread because
        oom_kill_process will not wake task in the fridge because it is
        sleeping uninterruptible
      
      [mhocko@suse.cz: rewrote the changelog]
      Fixes: a3201227 (freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE)
      Cc: 3.3+ <stable@vger.kernel.org> # 3.3+
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      
      (cherry picked from commit 51fae6da)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5f422bea
    • Krzysztof Kozlowski's avatar
      power: charger-manager: Fix NULL pointer exception with missing cm-fuel-gauge · 80b4e30f
      Krzysztof Kozlowski authored
      NULL pointer exception happens during charger-manager probe if
      'cm-fuel-gauge' property is not present.
      
      [    2.448536] Unable to handle kernel NULL pointer dereference at virtual address 00000000
      [    2.456572] pgd = c0004000
      [    2.459217] [00000000] *pgd=00000000
      [    2.462759] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
      [    2.468047] Modules linked in:
      [    2.471089] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-rc6-00251-ge44cf96cd525-dirty #969
      [    2.479765] task: ea890000 ti: ea87a000 task.ti: ea87a000
      [    2.485161] PC is at strcmp+0x4/0x30
      [    2.488719] LR is at power_supply_match_device_by_name+0x10/0x1c
      [    2.494695] pc : [<c01f4220>]    lr : [<c030fe38>]    psr: a0000113
      [    2.494695] sp : ea87bde0  ip : 00000000  fp : eaa97010
      [    2.506150] r10: 00000004  r9 : ea97269c  r8 : ea3bbfd0
      [    2.511360] r7 : eaa97000  r6 : c030fe28  r5 : 00000000  r4 : ea3b0000
      [    2.517869] r3 : 0000006d  r2 : 00000000  r1 : 00000000  r0 : c057c195
      [    2.524381] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
      [    2.531671] Control: 10c5387d  Table: 4000404a  DAC: 00000015
      [    2.537399] Process swapper/0 (pid: 1, stack limit = 0xea87a240)
      [    2.543388] Stack: (0xea87bde0 to 0xea87c000)
      [    2.547733] bde0: ea3b0210 c026b1c8 eaa97010 eaa97000 eaa97010 eabb60a8 ea3b0210 00000000
      [    2.555891] be00: 00000008 ea2db210 ea1a3410 c030fee0 ea3bbf90 c03138fc c068969c c013526c
      [    2.564050] be20: eaa040c0 00000000 c068969c 00000000 eaa040c0 ea2da300 00000002 00000000
      [    2.572208] be40: 00000001 ea2da3c0 00000000 00000001 00000000 eaa97010 c068969c 00000000
      [    2.580367] be60: 00000000 c068969c 00000000 00000002 00000000 c026b71c c026b6f0 eaa97010
      [    2.588527] be80: c0e82530 c026a330 00000000 eaa97010 c068969c eaa97044 00000000 c061df50
      [    2.596686] bea0: ea87a000 c026a4dc 00000000 c068969c c026a448 c0268b5c ea8054a8 eaa8fd50
      [    2.604845] bec0: c068969c ea2db180 c06801f8 c0269b18 c0590f68 c068969c c0656c98 c068969c
      [    2.613004] bee0: c0656c98 ea3bbe40 c06988c0 c026aaf0 00000000 c0656c98 c0656c98 c00088a4
      [    2.621163] bf00: 00000000 c0055f48 00000000 00000004 00000000 ea890000 c05dbc54 c062c178
      [    2.629323] bf20: c0603518 c005f674 00000001 ea87a000 eb7ff83b c0476440 00000091 c003d41c
      [    2.637482] bf40: c05db344 00000007 eb7ff858 00000007 c065a76c c0647d24 00000007 c062c170
      [    2.645642] bf60: c06988c0 00000091 c062c178 c0603518 00000000 c0603cc4 00000007 00000007
      [    2.653801] bf80: c0603518 c0c0c0c0 00000000 c0453948 00000000 00000000 00000000 00000000
      [    2.661959] bfa0: 00000000 c0453950 00000000 c000e728 00000000 00000000 00000000 00000000
      [    2.670118] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    2.678277] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 c0c0c0c0 c0c0c0c0
      [    2.686454] [<c01f4220>] (strcmp) from [<c030fe38>] (power_supply_match_device_by_name+0x10/0x1c)
      [    2.695303] [<c030fe38>] (power_supply_match_device_by_name) from [<c026b1c8>] (class_find_device+0x54/0xac)
      [    2.705106] [<c026b1c8>] (class_find_device) from [<c030fee0>] (power_supply_get_by_name+0x1c/0x30)
      [    2.714137] [<c030fee0>] (power_supply_get_by_name) from [<c03138fc>] (charger_manager_probe+0x3d8/0xe58)
      [    2.723683] [<c03138fc>] (charger_manager_probe) from [<c026b71c>] (platform_drv_probe+0x2c/0x5c)
      [    2.732532] [<c026b71c>] (platform_drv_probe) from [<c026a330>] (driver_probe_device+0x10c/0x224)
      [    2.741384] [<c026a330>] (driver_probe_device) from [<c026a4dc>] (__driver_attach+0x94/0x98)
      [    2.749813] [<c026a4dc>] (__driver_attach) from [<c0268b5c>] (bus_for_each_dev+0x54/0x88)
      [    2.757969] [<c0268b5c>] (bus_for_each_dev) from [<c0269b18>] (bus_add_driver+0xd4/0x1d0)
      [    2.766123] [<c0269b18>] (bus_add_driver) from [<c026aaf0>] (driver_register+0x78/0xf4)
      [    2.774110] [<c026aaf0>] (driver_register) from [<c00088a4>] (do_one_initcall+0x80/0x1bc)
      [    2.782276] [<c00088a4>] (do_one_initcall) from [<c0603cc4>] (kernel_init_freeable+0x100/0x1cc)
      [    2.790952] [<c0603cc4>] (kernel_init_freeable) from [<c0453950>] (kernel_init+0x8/0xec)
      [    2.799029] [<c0453950>] (kernel_init) from [<c000e728>] (ret_from_fork+0x14/0x2c)
      [    2.806572] Code: e12fff1e e1a03000 eafffff7 e4d03001 (e4d12001)
      [    2.812832] ---[ end trace 7f12556111b9e7ef ]---
      Signed-off-by: default avatarKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Cc: <stable@vger.kernel.org>
      Fixes: 856ee611 ("charger-manager: Support deivce tree in charger manager driver")
      Signed-off-by: default avatarSebastian Reichel <sre@kernel.org>
      
      (cherry picked from commit 661a8886)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      80b4e30f
    • Daniel Borkmann's avatar
      random: add and use memzero_explicit() for clearing data · f144b26f
      Daniel Borkmann authored
      zatimend has reported that in his environment (3.16/gcc4.8.3/corei7)
      memset() calls which clear out sensitive data in extract_{buf,entropy,
      entropy_user}() in random driver are being optimized away by gcc.
      
      Add a helper memzero_explicit() (similarly as explicit_bzero() variants)
      that can be used in such cases where a variable with sensitive data is
      being cleared out in the end. Other use cases might also be in crypto
      code. [ I have put this into lib/string.c though, as it's always built-in
      and doesn't need any dependencies then. ]
      
      Fixes kernel bugzilla: 82041
      
      Reported-by: zatimend@hotmail.co.uk
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit d4c5efdb)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f144b26f
    • Artem Bityutskiy's avatar
      UBIFS: fix a race condition · 655f8f28
      Artem Bityutskiy authored
      Hu (hujianyang@huawei.com) discovered a race condition which may lead to a
      situation when UBIFS is unable to mount the file-system after an unclean
      reboot. The problem is theoretical, though.
      
      In UBIFS, we have the log, which basically a set of LEBs in a certain area. The
      log has the tail and the head.
      
      Every time user writes data to the file-system, the UBIFS journal grows, and
      the log grows as well, because we append new reference nodes to the head of the
      log. So the head moves forward all the time, while the log tail stays at the
      same position.
      
      At any time, the UBIFS master node points to the tail of the log. When we mount
      the file-system, we scan the log, and we always start from its tail, because
      this is where the master node points to. The only occasion when the tail of the
      log changes is the commit operation.
      
      The commit operation has 2 phases - "commit start" and "commit end". The former
      is relatively short, and does not involve much I/O. During this phase we mostly
      just build various in-memory lists of the things which have to be written to
      the flash media during "commit end" phase.
      
      During the commit start phase, what we do is we "clean" the log. Indeed, the
      commit operation will index all the data in the journal, so the entire journal
      "disappears", and therefore the data in the log become unneeded. So we just
      move the head of the log to the next LEB, and write the CS node there. This LEB
      will be the tail of the new log when the commit operation finishes.
      
      When the "commit start" phase finishes, users may write more data to the
      file-system, in parallel with the ongoing "commit end" operation. At this point
      the log tail was not changed yet, it is the same as it had been before we
      started the commit. The log head keeps moving forward, though.
      
      The commit operation now needs to write the new master node, and the new master
      node should point to the new log tail. After this the LEBs between the old log
      tail and the new log tail can be unmapped and re-used again.
      
      And here is the possible problem. We do 2 operations: (a) We first update the
      log tail position in memory (see 'ubifs_log_end_commit()'). (b) And then we
      write the master node (see the big lock of code in 'do_commit()').
      
      But nothing prevents the log head from moving forward between (a) and (b), and
      the log head may "wrap" now to the old log tail. And when the "wrap" happens,
      the contends of the log tail gets erased. Now a power cut happens and we are in
      trouble. We end up with the old master node pointing to the old tail, which was
      erased. And replay fails because it expects the master node to point to the
      correct log tail at all times.
      
      This patch merges the abovementioned (a) and (b) operations by moving the master
      node change code to the 'ubifs_log_end_commit()' function, so that it runs with
      the log mutex locked, which will prevent the log from being changed benween
      operations (a) and (b).
      
      Cc: stable@vger.kernel.org # 07e19dff UBIFS: remove mst_mutex
      Cc: stable@vger.kernel.org
      Reported-by: default avatarhujianyang <hujianyang@huawei.com>
      Tested-by: default avatarhujianyang <hujianyang@huawei.com>
      Signed-off-by: default avatarArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
      
      (cherry picked from commit 052c2807)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      655f8f28
    • Artem Bityutskiy's avatar
      UBIFS: remove mst_mutex · 6e1d6f45
      Artem Bityutskiy authored
      commit 07e19dff upstream.
      
      The 'mst_mutex' is not needed since because 'ubifs_write_master()' is only
      called on the mount path and commit path. The mount path is sequential and
      there is no parallelism, and the commit path is also serialized - there is only
      one commit going on at a time.
      Signed-off-by: default avatarArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 855d89e8)
      6e1d6f45
    • Guenter Roeck's avatar
      Revert "percpu: free percpu allocation info for uniprocessor system" · 53f5fa07
      Guenter Roeck authored
      This reverts commit 3189eddb ("percpu: free percpu allocation info for
      uniprocessor system").
      
      The commit causes a hang with a crisv32 image. This may be an architecture
      problem, but at least for now the revert is necessary to be able to boot a
      crisv32 image.
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Honggang Li <enjoymindful@gmail.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Fixes: 3189eddb ("percpu: free percpu allocation info for uniprocessor system")
      Cc: stable@vger.kernel.org # Please don't apply 3189eddb
      
      (cherry picked from commit bb2e226b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      53f5fa07
    • Trond Myklebust's avatar
      SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT · 8ad6d979
      Trond Myklebust authored
      The flag RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT was intended introduced in
      order to allow NFSv4 clients to disable resend timeouts. Since those
      cause the RPC layer to break the connection, they mess up the duplicate
      reply caches that remain indexed on the port number in NFSv4..
      
      This patch includes the code that was missing in the original to
      set the appropriate flag in struct rpc_clnt, when the caller of
      rpc_create() sets RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT.
      
      Fixes: 8a19a0b6 (SUNRPC: Add RPC task and client level options to...)
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      
      (cherry picked from commit 2aca5b86)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8ad6d979
    • bob picco's avatar
      sparc64: sparse irq · 83fa64dc
      bob picco authored
      This patch attempts to do a few things. The highlights are: 1) enable
      SPARSE_IRQ unconditionally, 2) kills off !SPARSE_IRQ code 3) allocates
      ivector_table at boot time and 4) default to cookie only VIRQ mechanism
      for supported firmware. The first firmware with cookie only support for
      me appears on T5. You can optionally force the HV firmware to not cookie
      only mode which is the sysino support.
      
      The sysino is a deprecated HV mechanism according to the most recent
      SPARC Virtual Machine Specification. HV_GRP_INTR is what controls the
      cookie/sysino firmware versioning.
      
      The history of this interface is:
      
      1) Major version 1.0 only supported sysino based interrupt interfaces.
      
      2) Major version 2.0 added cookie based VIRQs, however due to the fact
         that OSs were using the VIRQs without negoatiating major version
         2.0 (Linux and Solaris are both guilty), the VIRQs calls were
         allowed even with major version 1.0
      
         To complicate things even further, the VIRQ interfaces were only
         actually hooked up in the hypervisor for LDC interrupt sources.
         VIRQ calls on other device types would result in HV_EINVAL errors.
      
         So effectively, major version 2.0 is unusable.
      
      3) Major version 3.0 was created to signal use of VIRQs and the fact
         that the hypervisor has these calls hooked up for all interrupt
         sources, not just those for LDC devices.
      
      A new boot option is provided should cookie only HV support have issues.
      hvirq - this is the version for HV_GRP_INTR. This is related to HV API
      versioning.  The code attempts major=3 first by default. The option can
      be used to override this default.
      
      I've tested with SPARSE_IRQ on T5-8, M7-4 and T4-X and Jalap?no.
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit ee6a9333)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      83fa64dc
    • Xiubo Li's avatar
      regmap: fix possible ZERO_SIZE_PTR pointer dereferencing error. · 72cbfc9a
      Xiubo Li authored
      Since we cannot make sure the 'val_count' will always be none zero
      here, and then if it equals to zero, the kmemdup() will return
      ZERO_SIZE_PTR, which equals to ((void *)16).
      
      So this patch fix this with just doing the zero check before calling
      kmemdup().
      Signed-off-by: default avatarXiubo Li <Li.Xiubo@freescale.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit d6b41cb0)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      72cbfc9a
    • Bryan O'Donoghue's avatar
      usb: pch_udc: usb gadget device support for Intel Quark X1000 · e92b4a4f
      Bryan O'Donoghue authored
      This patch is to enable the USB gadget device for Intel Quark X1000
      Signed-off-by: default avatarBryan O'Donoghue <bryan.odonoghue@intel.com>
      Signed-off-by: default avatarBing Niu <bing.niu@intel.com>
      Signed-off-by: default avatarAlvin (Weike) Chen <alvin.chen@intel.com>
      Signed-off-by: default avatarFelipe Balbi <balbi@ti.com>
      
      (cherry picked from commit a68df706)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      e92b4a4f
    • Steffen Klassert's avatar
      xfrm: Generate blackhole routes only from route lookup functions · 4089b712
      Steffen Klassert authored
      Currently we genarate a blackhole route route whenever we have
      matching policies but can not resolve the states. Here we assume
      that dst_output() is called to kill the balckholed packets.
      Unfortunately this assumption is not true in all cases, so
      it is possible that these packets leave the system unwanted.
      
      We fix this by generating blackhole routes only from the
      route lookup functions, here we can guarantee a call to
      dst_output() afterwards.
      
      Fixes: 2774c131 ("xfrm: Handle blackhole route creation via afinfo.")
      Reported-by: default avatarKonstantinos Kolelis <k.kolelis@sirrix.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      
      (cherry picked from commit f92ee619)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4089b712
    • Felipe Balbi's avatar
      usb: dwc3: core: fix order of PM runtime calls · 5b3b1aeb
      Felipe Balbi authored
      Currently, we disable pm_runtime before all register
      accesses are done, this is dangerous and might lead
      to abort exceptions due to the driver trying to access
      a register which is clocked by a clock which was long
      gated.
      
      Fix that by moving pm_runtime_put_sync() and pm_runtime_disable()
      as the last thing we do before returning from our ->remove()
      method.
      
      Fixes: 72246da4 (usb: Introduce DesignWare USB3 DRD Driver)
      Cc: <stable@vger.kernel.org> # v3.2+
      Signed-off-by: default avatarFelipe Balbi <balbi@ti.com>
      
      (cherry picked from commit fed33afc)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5b3b1aeb
    • Honggang Li's avatar
      Revert "percpu: free percpu allocation info for uniprocessor system" · 05e558b7
      Honggang Li authored
      This reverts commit 3189eddb ("percpu: free percpu allocation info for
      uniprocessor system").
      
      The commit causes a hang with a crisv32 image. This may be an architecture
      problem, but at least for now the revert is necessary to be able to boot a
      crisv32 image.
      
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Honggang Li <enjoymindful@gmail.com>
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Fixes: 3189eddb ("percpu: free percpu allocation info for uniprocessor system")
      Cc: stable@vger.kernel.org # Please don't apply 3189eddb
      
      percpu-refcount: make percpu_ref based on longs instead of ints
      
      percpu_ref is currently based on ints and the number of refs it can
      cover is (1 << 31).  This makes it impossible to use a percpu_ref to
      count memory objects or pages on 64bit machines as it may overflow.
      This forces those users to somehow aggregate the references before
      contributing to the percpu_ref which is often cumbersome and sometimes
      challenging to get the same level of performance as using the
      percpu_ref directly.
      
      While using ints for the percpu counters makes them pack tighter on
      64bit machines, the possible gain from using ints instead of longs is
      extremely small compared to the overall gain from per-cpu operation.
      This patch makes percpu_ref based on longs so that it can be used to
      directly count memory objects or pages.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      
      percpu-refcount: improve WARN messages
      
      percpu_ref's WARN messages can be a lot more helpful by indicating
      who's the culprit.  Make them report the release function that the
      offending percpu-refcount is associated with.  This should make it a
      lot easier to track down the reported invalid refcnting operations.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      
      percpu: fix locking regression in the failure path of pcpu_alloc()
      
      While updating locking, b38d08f3 ("percpu: restructure locking")
      broke pcpu_create_chunk() creation path in pcpu_alloc().  It returns
      without releasing pcpu_alloc_mutex.  Fix it.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarJulia Lawall <julia.lawall@lip6.fr>
      
      percpu-refcount: add @gfp to percpu_ref_init()
      
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_ref_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_refs too.
      
      This patch doesn't make any functional difference.
      
      v2: blk-mq conversion was missing.  Updated.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      
      proportions: add @gfp to init functions
      
      Percpu allocator now supports allocation mask.  Add @gfp to
      [flex_]proportions init functions so that !GFP_KERNEL allocation masks
      can be used with them too.
      
      This patch doesn't make any functional difference.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      
      percpu_counter: add @gfp to percpu_counter_init()
      
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_counters too.
      
      We could have left percpu_counter_init() alone and added
      percpu_counter_init_gfp(); however, the number of users isn't that
      high and introducing _gfp variants to all percpu data structures would
      be quite ugly, so let's just do the conversion.  This is the one with
      the most users.  Other percpu data structures are a lot easier to
      convert.
      
      This patch doesn't make any functional difference.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarJan Kara <jack@suse.cz>
      Acked-by: default avatar"David S. Miller" <davem@davemloft.net>
      Cc: x86@kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      
      percpu_counter: make percpu_counters_lock irq-safe
      
      percpu_counter is scheduled to grow @gfp support to allow atomic
      initialization.  This patch makes percpu_counters_lock irq-safe so
      that it can be safely used from atomic contexts.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      percpu: implement asynchronous chunk population
      
      The percpu allocator now supports atomic allocations by only
      allocating from already populated areas but the mechanism to ensure
      that there's adequate amount of populated areas was missing.
      
      This patch expands pcpu_balance_work so that in addition to freeing
      excess free chunks it also populates chunks to maintain an adequate
      level of populated areas.  pcpu_alloc() schedules pcpu_balance_work if
      the amount of free populated areas is too low or after an atomic
      allocation failure.
      
      * PERPCU_DYNAMIC_RESERVE is increased by two pages to account for
        PCPU_EMPTY_POP_PAGES_LOW.
      
      * pcpu_async_enabled is added to gate both async jobs -
        chunk->map_extend_work and pcpu_balance_work - so that we don't end
        up scheduling them while the needed subsystems aren't up yet.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      percpu: rename pcpu_reclaim_work to pcpu_balance_work
      
      pcpu_reclaim_work will also be used to populate chunks asynchronously.
      Rename it to pcpu_balance_work in preparation.  pcpu_reclaim() is
      renamed to pcpu_balance_workfn() and some of its local variables are
      renamed too.
      
      This is pure rename.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated
      
      pcpu_nr_empty_pop_pages counts the number of empty populated pages
      across all chunks and chunk->nr_populated counts the number of
      populated pages in a chunk.  Both will be used to implement pre/async
      population for atomic allocations.
      
      pcpu_chunk_[de]populated() are added to update chunk->populated,
      chunk->nr_populated and pcpu_nr_empty_pop_pages together.  All
      successful chunk [de]populations should be followed by the
      corresponding pcpu_chunk_[de]populated() calls.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      percpu: make sure chunk->map array has available space
      
      An allocation attempt may require extending chunk->map array which
      requires GFP_KERNEL context which isn't available for atomic
      allocations.  This patch ensures that chunk->map array usually keeps
      some amount of available space by directly allocating buffer space
      during GFP_KERNEL allocations and scheduling async extension during
      atomic ones.  This should make atomic allocation failures from map
      space exhaustion rare.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      percpu: implement [__]alloc_percpu_gfp()
      
      Now that pcpu_alloc_area() can allocate only from populated areas,
      it's easy to add atomic allocation support to [__]alloc_percpu().
      Update pcpu_alloc() so that it accepts @gfp and skips all the blocking
      operations and allocates only from the populated areas if @gfp doesn't
      contain GFP_KERNEL.  New interface functions [__]alloc_percpu_gfp()
      are added.
      
      While this means that atomic allocations are possible, this isn't
      complete yet as there's no mechanism to ensure that certain amount of
      populated areas is kept available and atomic allocations may keep
      failing under certain conditions.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      percpu: indent the population block in pcpu_alloc()
      
      The next patch will conditionalize the population block in
      pcpu_alloc() which will end up making a rather large indentation
      change obfuscating the actual logic change.  This patch puts the block
      under "if (true)" so that the next patch can avoid indentation
      changes.  The defintions of the local variables which are used only in
      the block are moved into the block.
      
      This patch is purely cosmetic.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      percpu: make pcpu_alloc_area() capable of allocating only from populated areas
      
      Update pcpu_alloc_area() so that it can skip unpopulated areas if the
      new parameter @pop_only is true.  This is implemented by a new
      function, pcpu_fit_in_area(), which determines the amount of head
      padding considering the alignment and populated state.
      
      @pop_only is currently always false but this will be used to implement
      atomic allocation.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      percpu: restructure locking
      
      At first, the percpu allocator required a sleepable context for both
      alloc and free paths and used pcpu_alloc_mutex to protect everything.
      Later, pcpu_lock was introduced to protect the index data structure so
      that the free path can be invoked from atomic contexts.  The
      conversion only updated what's necessary and left most of the
      allocation path under pcpu_alloc_mutex.
      
      The percpu allocator is planned to add support for atomic allocation
      and this patch restructures locking so that the coverage of
      pcpu_alloc_mutex is further reduced.
      
      * pcpu_alloc() now grab pcpu_alloc_mutex only while creating a new
        chunk and populating the allocated area.  Everything else is now
        protected soley by pcpu_lock.
      
        After this change, multiple instances of pcpu_extend_area_map() may
        race but the function already implements sufficient synchronization
        using pcpu_lock.
      
        This also allows multiple allocators to arrive at new chunk
        creation.  To avoid creating multiple empty chunks back-to-back, a
        new chunk is created iff there is no other empty chunk after
        grabbing pcpu_alloc_mutex.
      
      * pcpu_lock is now held while modifying chunk->populated bitmap.
        After this, all data structures are protected by pcpu_lock.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      percpu: make percpu-km set chunk->populated bitmap properly
      
      percpu-km instantiates the whole chunk on creation and doesn't make
      use of chunk->populated bitmap and leaves it as zero.  While this
      currently doesn't cause any problem, the inconsistency makes it
      difficult to build further logic on top of chunk->populated.  This
      patch makes percpu-km fill chunk->populated on creation so that the
      bitmap is always consistent.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      
      percpu: move region iterations out of pcpu_[de]populate_chunk()
      
      Previously, pcpu_[de]populate_chunk() were called with the range which
      may contain multiple target regions in it and
      pcpu_[de]populate_chunk() iterated over the regions.  This has the
      benefit of batching up cache flushes for all the regions; however,
      we're planning to add more bookkeeping logic around [de]population to
      support atomic allocations and this delegation of iterations gets in
      the way.
      
      This patch moves the region iterations out of
      pcpu_[de]populate_chunk() into its callers - pcpu_alloc() and
      pcpu_reclaim() - so that we can later add logic to track more states
      around them.  This change may make cache and tlb flushes more frequent
      but multi-region [de]populations are rare anyway and if this actually
      becomes a problem, it's not difficult to factor out cache flushes as
      separate callbacks which are directly invoked from percpu.c.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      percpu: move common parts out of pcpu_[de]populate_chunk()
      
      percpu-vm and percpu-km implement separate versions of
      pcpu_[de]populate_chunk() and some part which is or should be common
      are currently in the specific implementations.  Make the following
      changes.
      
      * Allocate area clearing is moved from the pcpu_populate_chunk()
        implementations to pcpu_alloc().  This makes percpu-km's version
        noop.
      
      * Quick exit tests in pcpu_[de]populate_chunk() of percpu-vm are moved
        to their respective callers so that they are applied to percpu-km
        too.  This doesn't make any meaningful difference as both functions
        are noop for percpu-km; however, this is more consistent and will
        help implementing atomic allocation support.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      percpu: remove @may_alloc from pcpu_get_pages()
      
      pcpu_get_pages() creates the temp pages array if not already allocated
      and returns the pointer to it.  As the function is called from both
      [de]population paths and depopulation can only happen after at least
      one successful population, the param doesn't make any difference - the
      allocation will always happen on the population path anyway.
      
      Remove @may_alloc from pcpu_get_pages().  Also, add an lockdep
      assertion pcpu_alloc_mutex instead of vaguely stating that the
      exclusion is the caller's responsibility.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      
      percpu: remove the usage of separate populated bitmap in percpu-vm
      
      percpu-vm uses pcpu_get_pages_and_bitmap() to acquire temp pages array
      and populated bitmap and uses the two during [de]population.  The temp
      bitmap is used only to build the new bitmap that is copied to
      chunk->populated after the operation succeeds; however, the new bitmap
      can be trivially set after success without using the temp bitmap.
      
      This patch removes the temp populated bitmap usage from percpu-vm.c.
      
      * pcpu_get_pages_and_bitmap() is renamed to pcpu_get_pages() and no
        longer hands out the temp bitmap.
      
      * @populated arugment is dropped from all the related functions.
        @populated updates in pcpu_[un]map_pages() are dropped.
      
      * Two loops in pcpu_map_pages() are merged.
      
      * pcpu_[de]populated_chunk() modify chunk->populated bitmap directly
        from @page_start and @page_end after success.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      
      percpu: free percpu allocation info for uniprocessor system
      
      Currently, only SMP system free the percpu allocation info.
      Uniprocessor system should free it too. For example, one x86 UML
      virtual machine with 256MB memory, UML kernel wastes one page memory.
      Signed-off-by: default avatarHonggang Li <enjoymindful@gmail.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit bb2e226b
      3189eddb)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      05e558b7
    • bob picco's avatar
      sparc64: sparse irq · 050da6ad
      bob picco authored
      This patch attempts to do a few things. The highlights are: 1) enable
      SPARSE_IRQ unconditionally, 2) kills off !SPARSE_IRQ code 3) allocates
      ivector_table at boot time and 4) default to cookie only VIRQ mechanism
      for supported firmware. The first firmware with cookie only support for
      me appears on T5. You can optionally force the HV firmware to not cookie
      only mode which is the sysino support.
      
      The sysino is a deprecated HV mechanism according to the most recent
      SPARC Virtual Machine Specification. HV_GRP_INTR is what controls the
      cookie/sysino firmware versioning.
      
      The history of this interface is:
      
      1) Major version 1.0 only supported sysino based interrupt interfaces.
      
      2) Major version 2.0 added cookie based VIRQs, however due to the fact
         that OSs were using the VIRQs without negoatiating major version
         2.0 (Linux and Solaris are both guilty), the VIRQs calls were
         allowed even with major version 1.0
      
         To complicate things even further, the VIRQ interfaces were only
         actually hooked up in the hypervisor for LDC interrupt sources.
         VIRQ calls on other device types would result in HV_EINVAL errors.
      
         So effectively, major version 2.0 is unusable.
      
      3) Major version 3.0 was created to signal use of VIRQs and the fact
         that the hypervisor has these calls hooked up for all interrupt
         sources, not just those for LDC devices.
      
      A new boot option is provided should cookie only HV support have issues.
      hvirq - this is the version for HV_GRP_INTR. This is related to HV API
      versioning.  The code attempts major=3 first by default. The option can
      be used to override this default.
      
      I've tested with SPARSE_IRQ on T5-8, M7-4 and T4-X and Jalap?no.
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit ee6a9333)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      050da6ad
    • bob picco's avatar
      sparc64: T5 PMU · ad24a619
      bob picco authored
      The T5 (niagara5) has different PCR related HV fast trap values and a new
      HV API Group. This patch utilizes these and shares when possible with niagara4.
      
      We use the same sparc_pmu niagara4_pmu. Should there be new effort to
      obtain the MCU perf statistics then this would have to be changed.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 05aa1651)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ad24a619
    • Xiubo Li's avatar
      regmap: fix possible ZERO_SIZE_PTR pointer dereferencing error. · 25cd2783
      Xiubo Li authored
      Since we cannot make sure the 'val_count' will always be none zero
      here, and then if it equals to zero, the kmemdup() will return
      ZERO_SIZE_PTR, which equals to ((void *)16).
      
      So this patch fix this with just doing the zero check before calling
      kmemdup().
      Signed-off-by: default avatarXiubo Li <Li.Xiubo@freescale.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit d6b41cb0)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      25cd2783
    • Bryan O'Donoghue's avatar
      usb: pch_udc: usb gadget device support for Intel Quark X1000 · 70a8623a
      Bryan O'Donoghue authored
      This patch is to enable the USB gadget device for Intel Quark X1000
      Signed-off-by: default avatarBryan O'Donoghue <bryan.odonoghue@intel.com>
      Signed-off-by: default avatarBing Niu <bing.niu@intel.com>
      Signed-off-by: default avatarAlvin (Weike) Chen <alvin.chen@intel.com>
      Signed-off-by: default avatarFelipe Balbi <balbi@ti.com>
      
      (cherry picked from commit a68df706)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      70a8623a
    • Stanislaw Gruszka's avatar
      asus-nb-wmi: Add wapf4 quirk for the X550VB · 4915c987
      Stanislaw Gruszka authored
      X550VB as many others Asus laptops need wapf4 quirk to make RFKILL
      switch be functional. Otherwise system boots with wireless card
      disabled and is only possible to enable it by suspend/resume.
      
      Bug report:
      http://bugzilla.redhat.com/show_bug.cgi?id=1089731#c23Reported-and-tested-by: default avatarVratislav Podzimek <vpodzime@redhat.com>
      Signed-off-by: default avatarStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: default avatarDarren Hart <dvhart@linux.intel.com>
      
      (cherry picked from commit 4ec7a45b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      4915c987
    • Daniel Borkmann's avatar
      net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks · 6aed7535
      Daniel Borkmann authored
      Commit 6f4c618d ("SCTP : Add paramters validity check for
      ASCONF chunk") added basic verification of ASCONF chunks, however,
      it is still possible to remotely crash a server by sending a
      special crafted ASCONF chunk, even up to pre 2.6.12 kernels:
      
      skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
       head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
       end:0x440 dev:<NULL>
       ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:129!
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff8144fb1c>] skb_put+0x5c/0x70
       [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
       [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
       [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
       [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
       [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
       [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
       [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
       [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
       [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
       [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
       [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
       [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
       [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
       [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
       [<ffffffff81497078>] ip_local_deliver+0x98/0xa0
       [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
       [<ffffffff81496ac5>] ip_rcv+0x275/0x350
       [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
       [<ffffffff81460588>] netif_receive_skb+0x58/0x60
      
      This can be triggered e.g., through a simple scripted nmap
      connection scan injecting the chunk after the handshake, for
      example, ...
      
        -------------- INIT[ASCONF; ASCONF_ACK] ------------->
        <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
        -------------------- COOKIE-ECHO -------------------->
        <-------------------- COOKIE-ACK ---------------------
        ------------------ ASCONF; UNKNOWN ------------------>
      
      ... where ASCONF chunk of length 280 contains 2 parameters ...
      
        1) Add IP address parameter (param length: 16)
        2) Add/del IP address parameter (param length: 255)
      
      ... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
      Address Parameter in the ASCONF chunk is even missing, too.
      This is just an example and similarly-crafted ASCONF chunks
      could be used just as well.
      
      The ASCONF chunk passes through sctp_verify_asconf() as all
      parameters passed sanity checks, and after walking, we ended
      up successfully at the chunk end boundary, and thus may invoke
      sctp_process_asconf(). Parameter walking is done with
      WORD_ROUND() to take padding into account.
      
      In sctp_process_asconf()'s TLV processing, we may fail in
      sctp_process_asconf_param() e.g., due to removal of the IP
      address that is also the source address of the packet containing
      the ASCONF chunk, and thus we need to add all TLVs after the
      failure to our ASCONF response to remote via helper function
      sctp_add_asconf_response(), which basically invokes a
      sctp_addto_chunk() adding the error parameters to the given
      skb.
      
      When walking to the next parameter this time, we proceed
      with ...
      
        length = ntohs(asconf_param->param_hdr.length);
        asconf_param = (void *)asconf_param + length;
      
      ... instead of the WORD_ROUND()'ed length, thus resulting here
      in an off-by-one that leads to reading the follow-up garbage
      parameter length of 12336, and thus throwing an skb_over_panic
      for the reply when trying to sctp_addto_chunk() next time,
      which implicitly calls the skb_put() with that length.
      
      Fix it by using sctp_walk_params() [ which is also used in
      INIT parameter processing ] macro in the verification *and*
      in ASCONF processing: it will make sure we don't spill over,
      that we walk parameters WORD_ROUND()'ed. Moreover, we're being
      more defensive and guard against unknown parameter types and
      missized addresses.
      
      Joint work with Vlad Yasevich.
      
      Fixes: b896b82b ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 9de7922b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      6aed7535
    • Daniel Borkmann's avatar
      net: sctp: fix panic on duplicate ASCONF chunks · 5d8ce09e
      Daniel Borkmann authored
      When receiving a e.g. semi-good formed connection scan in the
      form of ...
      
        -------------- INIT[ASCONF; ASCONF_ACK] ------------->
        <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
        -------------------- COOKIE-ECHO -------------------->
        <-------------------- COOKIE-ACK ---------------------
        ---------------- ASCONF_a; ASCONF_b ----------------->
      
      ... where ASCONF_a equals ASCONF_b chunk (at least both serials
      need to be equal), we panic an SCTP server!
      
      The problem is that good-formed ASCONF chunks that we reply with
      ASCONF_ACK chunks are cached per serial. Thus, when we receive a
      same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
      not need to process them again on the server side (that was the
      idea, also proposed in the RFC). Instead, we know it was cached
      and we just resend the cached chunk instead. So far, so good.
      
      Where things get nasty is in SCTP's side effect interpreter, that
      is, sctp_cmd_interpreter():
      
      While incoming ASCONF_a (chunk = event_arg) is being marked
      !end_of_packet and !singleton, and we have an association context,
      we do not flush the outqueue the first time after processing the
      ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
      queued up, although we set local_cork to 1. Commit 2e3216cd
      changed the precedence, so that as long as we get bundled, incoming
      chunks we try possible bundling on outgoing queue as well. Before
      this commit, we would just flush the output queue.
      
      Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
      continue to process the same ASCONF_b chunk from the packet. As
      we have cached the previous ASCONF_ACK, we find it, grab it and
      do another SCTP_CMD_REPLY command on it. So, effectively, we rip
      the chunk->list pointers and requeue the same ASCONF_ACK chunk
      another time. Since we process ASCONF_b, it's correctly marked
      with end_of_packet and we enforce an uncork, and thus flush, thus
      crashing the kernel.
      
      Fix it by testing if the ASCONF_ACK is currently pending and if
      that is the case, do not requeue it. When flushing the output
      queue we may relink the chunk for preparing an outgoing packet,
      but eventually unlink it when it's copied into the skb right
      before transmission.
      
      Joint work with Vlad Yasevich.
      
      Fixes: 2e3216cd ("sctp: Follow security requirement of responding with 1 packet")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit b69040d8)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5d8ce09e
    • Daniel Borkmann's avatar
      net: sctp: fix remote memory pressure from excessive queueing · 7f301435
      Daniel Borkmann authored
      This scenario is not limited to ASCONF, just taken as one
      example triggering the issue. When receiving ASCONF probes
      in the form of ...
      
        -------------- INIT[ASCONF; ASCONF_ACK] ------------->
        <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
        -------------------- COOKIE-ECHO -------------------->
        <-------------------- COOKIE-ACK ---------------------
        ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------>
        [...]
        ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------>
      
      ... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed
      ASCONFs and have increasing serial numbers, we process such
      ASCONF chunk(s) marked with !end_of_packet and !singleton,
      since we have not yet reached the SCTP packet end. SCTP does
      only do verification on a chunk by chunk basis, as an SCTP
      packet is nothing more than just a container of a stream of
      chunks which it eats up one by one.
      
      We could run into the case that we receive a packet with a
      malformed tail, above marked as trailing JUNK. All previous
      chunks are here goodformed, so the stack will eat up all
      previous chunks up to this point. In case JUNK does not fit
      into a chunk header and there are no more other chunks in
      the input queue, or in case JUNK contains a garbage chunk
      header, but the encoded chunk length would exceed the skb
      tail, or we came here from an entirely different scenario
      and the chunk has pdiscard=1 mark (without having had a flush
      point), it will happen, that we will excessively queue up
      the association's output queue (a correct final chunk may
      then turn it into a response flood when flushing the
      queue ;)): I ran a simple script with incremental ASCONF
      serial numbers and could see the server side consuming
      excessive amount of RAM [before/after: up to 2GB and more].
      
      The issue at heart is that the chunk train basically ends
      with !end_of_packet and !singleton markers and since commit
      2e3216cd ("sctp: Follow security requirement of responding
      with 1 packet") therefore preventing an output queue flush
      point in sctp_do_sm() -> sctp_cmd_interpreter() on the input
      chunk (chunk = event_arg) even though local_cork is set,
      but its precedence has changed since then. In the normal
      case, the last chunk with end_of_packet=1 would trigger the
      queue flush to accommodate possible outgoing bundling.
      
      In the input queue, sctp_inq_pop() seems to do the right thing
      in terms of discarding invalid chunks. So, above JUNK will
      not enter the state machine and instead be released and exit
      the sctp_assoc_bh_rcv() chunk processing loop. It's simply
      the flush point being missing at loop exit. Adding a try-flush
      approach on the output queue might not work as the underlying
      infrastructure might be long gone at this point due to the
      side-effect interpreter run.
      
      One possibility, albeit a bit of a kludge, would be to defer
      invalid chunk freeing into the state machine in order to
      possibly trigger packet discards and thus indirectly a queue
      flush on error. It would surely be better to discard chunks
      as in the current, perhaps better controlled environment, but
      going back and forth, it's simply architecturally not possible.
      I tried various trailing JUNK attack cases and it seems to
      look good now.
      
      Joint work with Vlad Yasevich.
      
      Fixes: 2e3216cd ("sctp: Follow security requirement of responding with 1 packet")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      
      (cherry picked from commit 26b87c78)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      7f301435
    • Nadav Amit's avatar
      KVM: x86: Don't report guest userspace emulation error to userspace · 2d699aea
      Nadav Amit authored
      Commit fc3a9157 ("KVM: X86: Don't report L2 emulation failures to
      user-space") disabled the reporting of L2 (nested guest) emulation failures to
      userspace due to race-condition between a vmexit and the instruction emulator.
      The same rational applies also to userspace applications that are permitted by
      the guest OS to access MMIO area or perform PIO.
      
      This patch extends the current behavior - of injecting a #UD instead of
      reporting it to userspace - also for guest userspace code.
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      
      (cherry picked from commit a2b9e6c1)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2d699aea
    • Florian Westphal's avatar
      netfilter: nfnetlink_log: fix maximum packet length logged to userspace · 49957680
      Florian Westphal authored
      don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work.
      The nla length includes the size of the nla struct, so anything larger
      results in u16 integer overflow.
      
      This patch is similar to
      9cefbbc9 (netfilter: nfnetlink_queue: cleanup copy_range usage).
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      
      (cherry picked from commit c1e7dc91)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      49957680
    • Florian Westphal's avatar
      netfilter: nf_log: account for size of NLMSG_DONE attribute · 5c9dff67
      Florian Westphal authored
      We currently neither account for the nlattr size, nor do we consider
      the size of the trailing NLMSG_DONE when allocating nlmsg skb.
      
      This can result in nflog to stop working, as __nfulnl_send() re-tries
      sending forever if it failed to append NLMSG_DONE (which will never
      work if buffer is not large enough).
      Reported-by: default avatarHoucheng Lin <houcheng@gmail.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      
      (cherry picked from commit 9dfa1dfe)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      5c9dff67
    • Andrey Vagin's avatar
      ipc: always handle a new value of auto_msgmni · 8c63c6cb
      Andrey Vagin authored
      proc_dointvec_minmax() returns zero if a new value has been set.  So we
      don't need to check all charecters have been handled.
      
      Below you can find two examples.  In the new value has not been handled
      properly.
      
      $ strace ./a.out
      open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
      write(3, "0\n\0", 3)                    = 2
      close(3)                                = 0
      exit_group(0)
      $ cat /sys/kernel/debug/tracing/trace
      
      $strace ./a.out
      open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
      write(3, "0\n", 2)                      = 2
      close(3)                                = 0
      
      $ cat /sys/kernel/debug/tracing/trace
      a.out-697   [000] ....  3280.998235: unregister_ipcns_notifier <-proc_ipcauto_dointvec_minmax
      
      Fixes: 9eefe520 ("ipc: do not use a negative value to re-enable msgmni automatic recomputin")
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      
      (cherry picked from commit 1195d94e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8c63c6cb
    • Devesh Sharma's avatar
      IB/core: Clear AH attr variable to prevent garbage data · 929fc772
      Devesh Sharma authored
      During create-ah from userspace, uverbs is sending garbage data in
      attr.dmac and attr.vlan_id.  This patch sets attr.dmac and
      attr.vlan_id to zero.
      
      Fixes: dd5f03be ("IB/core: Ethernet L2 attributes in verbs/cm structures")
      Signed-off-by: default avatarDevesh Sharma <devesh.sharma@emulex.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      
      (cherry picked from commit 8b0f93d9)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      929fc772
    • Bjorn Helgaas's avatar
      clocksource: Remove "weak" from clocksource_default_clock() declaration · f2cb0281
      Bjorn Helgaas authored
      kernel/time/jiffies.c provides a default clocksource_default_clock()
      definition explicitly marked "weak".  arch/s390 provides its own definition
      intended to override the default, but the "weak" attribute on the
      declaration applied to the s390 definition as well, so the linker chose one
      based on link order (see 10629d71 ("PCI: Remove __weak annotation from
      pcibios_get_phb_of_node decl")).
      
      Remove the "weak" attribute from the clocksource_default_clock()
      declaration so we always prefer a non-weak definition over the weak one,
      independent of link order.
      
      Fixes: f1b82746 ("clocksource: Cleanup clocksource selection")
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      CC: Daniel Lezcano <daniel.lezcano@linaro.org>
      CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
      
      (cherry picked from commit 96a2adbc)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f2cb0281
    • Bjorn Helgaas's avatar
      kgdb: Remove "weak" from kgdb_arch_pc() declaration · f990e77d
      Bjorn Helgaas authored
      kernel/debug/debug_core.c provides a default kgdb_arch_pc() definition
      explicitly marked "weak".  Several architectures provide their own
      definitions intended to override the default, but the "weak" attribute on
      the declaration applied to the arch definitions as well, so the linker
      chose one based on link order (see 10629d71 ("PCI: Remove __weak
      annotation from pcibios_get_phb_of_node decl")).
      
      Remove the "weak" attribute from the declaration so we always prefer a
      non-weak definition over the weak one, independent of link order.
      
      Fixes: 688b744d ("kgdb: fix signedness mixmatches, add statics, add declaration to header")
      Tested-by: Vineet Gupta <vgupta@synopsys.com>	# for ARC build
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: default avatarHarvey Harrison <harvey.harrison@gmail.com>
      
      (cherry picked from commit 107bcc6d)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f990e77d
    • Dan Carpenter's avatar
      media: ttusb-dec: buffer overflow in ioctl · 1d7c70fc
      Dan Carpenter authored
      commit f2e323ec upstream.
      
      We need to add a limit check here so we don't overflow the buffer.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      (cherry picked from commit 20cd3408)
      
      (cherry picked from commit HEAD)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      1d7c70fc
    • Jan Kara's avatar
      nfs: Fix use of uninitialized variable in nfs_getattr() · 59046e95
      Jan Kara authored
      Variable 'err' needn't be initialized when nfs_getattr() uses it to
      check whether it should call generic_fillattr() or not. That can result
      in spurious error returns. Initialize 'err' properly.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      
      (cherry picked from commit 16caf5b6)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      59046e95
    • Trond Myklebust's avatar
      NFS: Don't try to reclaim delegation open state if recovery failed · 41194f78
      Trond Myklebust authored
      If state recovery failed, then we should not attempt to reclaim delegated
      state.
      
      http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      
      (cherry picked from commit f8ebf7a8)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      41194f78
    • Trond Myklebust's avatar
      NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired · 27bd73a9
      Trond Myklebust authored
      NFSv4.0 does not have TEST_STATEID/FREE_STATEID functionality, so
      unlike NFSv4.1, the recovery procedure when stateids have expired or
      have been revoked requires us to just forget the delegation.
      
      http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      
      (cherry picked from commit 4dfd4f7a)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      27bd73a9
    • Pali Rohár's avatar
      Input: alps - allow up to 2 invalid packets without resetting device · a6ccfcae
      Pali Rohár authored
      On some Dell Latitude laptops ALPS device or Dell EC send one invalid byte
      in 6 bytes ALPS packet. In this case psmouse driver enter out of sync
      state. It looks like that all other bytes in packets are valid and also
      device working properly. So there is no need to do full device reset, just
      need to wait for byte which match condition for first byte (start of
      packet). Because ALPS packets are bigger (6 or 8 bytes) default limit is
      small.
      
      This patch increase number of invalid bytes to size of 2 ALPS packets which
      psmouse driver can drop before do full reset.
      
      Resetting ALPS devices take some time and when doing reset on some Dell
      laptops touchpad, trackstick and also keyboard do not respond. So it is
      better to do it only if really necessary.
      Signed-off-by: default avatarPali Rohár <pali.rohar@gmail.com>
      Tested-by: default avatarPali Rohár <pali.rohar@gmail.com>
      Reviewed-by: default avatarHans de Goede <hdegoede@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      
      (cherry picked from commit 9d720b34)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      a6ccfcae
    • Pali Rohár's avatar
      Input: alps - ignore potential bare packets when device is out of sync · cc975e28
      Pali Rohár authored
      5th and 6th byte of ALPS trackstick V3 protocol match condition for first
      byte of PS/2 3 bytes packet. When driver enters out of sync state and ALPS
      trackstick is sending data then driver match 5th, 6th and next 1st bytes as
      PS/2.
      
      It basically means if user is using trackstick when driver is in out of
      sync state driver will never resync. Processing these bytes as 3 bytes PS/2
      data cause total mess (random cursor movements, random clicks) and make
      trackstick unusable until psmouse driver decide to do full device reset.
      
      Lot of users reported problems with ALPS devices on Dell Latitude E6440,
      E6540 and E7440 laptops. ALPS device or Dell EC for unknown reason send
      some invalid ALPS PS/2 bytes which cause driver out of sync. It looks like
      that i8042 and psmouse/alps driver always receive group of 6 bytes packets
      so there are no missing bytes and no bytes were inserted between valid
      ones.
      
      This patch does not fix root of problem with ALPS devices found in Dell
      Latitude laptops but it does not allow to process some (invalid)
      subsequence of 6 bytes ALPS packets as 3 bytes PS/2 when driver is out of
      sync.
      
      So with this patch trackstick input device does not report bogus data when
      also driver is out of sync, so trackstick should be usable on those
      machines.
      Signed-off-by: default avatarPali Rohár <pali.rohar@gmail.com>
      Tested-by: default avatarPali Rohár <pali.rohar@gmail.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      
      (cherry picked from commit 4ab8f7f3)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      cc975e28
    • Heinz Mauelshagen's avatar
      dm raid: ensure superblock's size matches device's logical block size · 2a61f025
      Heinz Mauelshagen authored
      The dm-raid superblock (struct dm_raid_superblock) is padded to 512
      bytes and that size is being used to read it in from the metadata
      device into one preallocated page.
      
      Reading or writing this on a 512-byte sector device works fine but on
      a 4096-byte sector device this fails.
      
      Set the dm-raid superblock's size to the logical block size of the
      metadata device, because IO at that size is guaranteed too work.  Also
      add a size check to avoid silent partial metadata loss in case the
      superblock should ever grow past the logical block size or PAGE_SIZE.
      
      [includes pointer math fix from Dan Carpenter]
      Reported-by: default avatar"Liuhua Wang" <lwang@suse.com>
      Signed-off-by: default avatarHeinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      
      (cherry picked from commit 40d43c4b)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2a61f025
    • Jan Kara's avatar
      block: Fix computation of merged request priority · 8210609e
      Jan Kara authored
      Priority of a merged request is computed by ioprio_best(). If one of the
      requests has undefined priority (IOPRIO_CLASS_NONE) and another request
      has priority from IOPRIO_CLASS_BE, the function will return the
      undefined priority which is wrong. Fix the function to properly return
      priority of a request with the defined priority.
      
      Fixes: d58cdfb8
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      
      (cherry picked from commit ece9c72a)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      8210609e
    • Christoph Hellwig's avatar
      scsi: only re-lock door after EH on devices that were reset · d5ea2362
      Christoph Hellwig authored
      Setups that use the blk-mq I/O path can lock up if a host with a single
      device that has its door locked enters EH.  Make sure to only send the
      command to re-lock the door to devices that actually were reset and thus
      might have lost their state.  Otherwise the EH code might be get blocked
      on blk_get_request as all requests for non-reset devices might be in use.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reported-by: default avatarMeelis Roos <meelis.roos@ut.ee>
      Tested-by: default avatarMeelis Roos <meelis.roos@ut.ee>
      Reviewed-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      
      (cherry picked from commit 48379270)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      d5ea2362
    • Peng Tao's avatar
      nfs: fix pnfs direct write memory leak · f7ff0611
      Peng Tao authored
      For pNFS direct writes, layout driver may dynamically allocate ds_cinfo.buckets.
      So we need to take care to free them when freeing dreq.
      
      Ideally this needs to be done inside layout driver where ds_cinfo.buckets
      are allocated. But buckets are attached to dreq and reused across LD IO iterations.
      So I feel it's OK to free them in the generic layer.
      
      Cc: stable@vger.kernel.org [v3.4+]
      Signed-off-by: default avatarPeng Tao <tao.peng@primarydata.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      
      (cherry picked from commit 8c393f9a)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      f7ff0611
    • Stefan Richter's avatar
      firewire: cdev: prevent kernel stack leaking into ioctl arguments · 2d615eda
      Stefan Richter authored
      Found by the UC-KLEE tool:  A user could supply less input to
      firewire-cdev ioctls than write- or write/read-type ioctl handlers
      expect.  The handlers used data from uninitialized kernel stack then.
      
      This could partially leak back to the user if the kernel subsequently
      generated fw_cdev_event_'s (to be read from the firewire-cdev fd)
      which notably would contain the _u64 closure field which many of the
      ioctl argument structures contain.
      
      The fact that the handlers would act on random garbage input is a
      lesser issue since all handlers must check their input anyway.
      
      The fix simply always null-initializes the entire ioctl argument buffer
      regardless of the actual length of expected user input.  That is, a
      runtime overhead of memset(..., 40) is added to each firewirew-cdev
      ioctl() call.  [Comment from Clemens Ladisch:  This part of the stack is
      most likely to be already in the cache.]
      
      Remarks:
        - There was never any leak from kernel stack to the ioctl output
          buffer itself.  IOW, it was not possible to read kernel stack by a
          read-type or write/read-type ioctl alone; the leak could at most
          happen in combination with read()ing subsequent event data.
        - The actual expected minimum user input of each ioctl from
          include/uapi/linux/firewire-cdev.h is, in bytes:
          [0x00] = 32, [0x05] =  4, [0x0a] = 16, [0x0f] = 20, [0x14] = 16,
          [0x01] = 36, [0x06] = 20, [0x0b] =  4, [0x10] = 20, [0x15] = 20,
          [0x02] = 20, [0x07] =  4, [0x0c] =  0, [0x11] =  0, [0x16] =  8,
          [0x03] =  4, [0x08] = 24, [0x0d] = 20, [0x12] = 36, [0x17] = 12,
          [0x04] = 20, [0x09] = 24, [0x0e] =  4, [0x13] = 40, [0x18] =  4.
      Reported-by: default avatarDavid Ramos <daramos@stanford.edu>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarStefan Richter <stefanr@s5r6.in-berlin.de>
      
      (cherry picked from commit eaca2d8e)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      2d615eda
    • Kyle McMartin's avatar
      arm64: __clear_user: handle exceptions on strb · ca961f0e
      Kyle McMartin authored
      ARM64 currently doesn't fix up faults on the single-byte (strb) case of
      __clear_user... which means that we can cause a nasty kernel panic as an
      ordinary user with any multiple PAGE_SIZE+1 read from /dev/zero.
      i.e.: dd if=/dev/zero of=foo ibs=1 count=1 (or ibs=65537, etc.)
      
      This is a pretty obscure bug in the general case since we'll only
      __do_kernel_fault (since there's no extable entry for pc) if the
      mmap_sem is contended. However, with CONFIG_DEBUG_VM enabled, we'll
      always fault.
      
      if (!down_read_trylock(&mm->mmap_sem)) {
      	if (!user_mode(regs) && !search_exception_tables(regs->pc))
      		goto no_context;
      retry:
      	down_read(&mm->mmap_sem);
      } else {
      	/*
      	 * The above down_read_trylock() might have succeeded in
      	 * which
      	 * case, we'll have missed the might_sleep() from
      	 * down_read().
      	 */
      	might_sleep();
      	if (!user_mode(regs) && !search_exception_tables(regs->pc))
      		goto no_context;
      }
      
      Fix that by adding an extable entry for the strb instruction, since it
      touches user memory, similar to the other stores in __clear_user.
      Signed-off-by: default avatarKyle McMartin <kyle@redhat.com>
      Reported-by: default avatarMiloš Prchlík <mprchlik@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      
      (cherry picked from commit 97fc1543)
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      ca961f0e