1. 03 Mar, 2016 40 commits
    • dm btree: fix bufio buffer leaks in dm_btree_del() error path · 859ac050
      Joe Thornber authored
      commit ed8b45a3 upstream.
      
      If dm_btree_del()'s call to push_frame() fails, e.g. due to
      btree_node_validator finding invalid metadata, the dm_btree_del() error
      path must unlock all frames (which have active dm-bufio buffers) that
      were pushed onto the del_stack.
      
      Otherwise, dm_bufio_client_destroy() will BUG_ON() because dm-bufio
      buffers have leaked, e.g.:
        device-mapper: bufio: leaked buffer 3, hold count 1, list 0
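      A minimal userspace sketch of the error-path rule above, assuming a toy
      model in which each pushed frame pins one buffer and a global counter
      plays the role of dm-bufio's leak check. All names (frame_stack,
      push_frame, unlock_all_frames, walk_and_delete, held_buffers) are
      hypothetical; this is not the dm-btree code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_FRAMES 16

/* Hypothetical model: every pushed frame pins one buffer. */
struct frame_stack {
	int top;                  /* number of frames currently pushed */
};

static int held_buffers;          /* global hold count, like a leak check */

static bool push_frame(struct frame_stack *s, bool node_valid)
{
	if (!node_valid || s->top == MAX_FRAMES)
		return false;     /* e.g. the validator rejected the node */
	s->top++;
	held_buffers++;           /* the new frame now holds a buffer */
	return true;
}

static void unlock_all_frames(struct frame_stack *s)
{
	while (s->top > 0) {
		s->top--;
		held_buffers--;   /* release the buffer pinned by this frame */
	}
}

/* Push n frames; the frame at index bad_at fails validation (-1: none). */
static int walk_and_delete(int n, int bad_at)
{
	struct frame_stack s = { 0 };

	for (int i = 0; i < n; i++) {
		if (!push_frame(&s, i != bad_at)) {
			unlock_all_frames(&s);  /* error path: unwind, don't leak */
			return -1;
		}
	}
	unlock_all_frames(&s);
	return 0;
}
```

      With the unwind in place, walk_and_delete(5, 3) fails on the fourth
      push and still leaves held_buffers at 0; without the unlock_all_frames()
      call on the error path, 3 buffers would stay held, the analogue of the
      "leaked buffer" report.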
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sata_sil: disable trim · d6eab6f7
      Mikulas Patocka authored
      commit d98f1cd0 upstream.
      
      When I connect an Intel SSD to a SATA SIL controller (PCI ID 1095:3114),
      any TRIM command results in I/O errors being reported in the log. Another
      similar error with TRIM and the SIL controller has been reported:
      https://bugs.centos.org/view.php?id=5880
      
      Apparently the controller doesn't support TRIM commands. This patch
      disables TRIM support on the SATA SIL controller.
      
      ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
      ata7.00: BMDMA2 stat 0x50001
      ata7.00: failed command: DATA SET MANAGEMENT
      ata7.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 0 dma 512 out
               res 51/04:01:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
      ata7.00: status: { DRDY ERR }
      ata7.00: error: { ABRT }
      ata7.00: device reported invalid CHS sector 0
      sd 8:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
      sd 8:0:0:0: [sdb] tag#0 Sense Key : Illegal Request [current] [descriptor]
      sd 8:0:0:0: [sdb] tag#0 Add. Sense: Unaligned write command
      sd 8:0:0:0: [sdb] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 21 95 88 00 20 00 00 00 00
      blk_update_request: I/O error, dev sdb, sector 2200968
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sched/core: Remove false-positive warning from wake_up_process() · 6bf97b05
      Sasha Levin authored
      commit 119d6f6a upstream.
      
      Because wakeups can (fundamentally) be late, a task might not be in
      the expected state. Therefore testing against a task's state is racy,
      and can yield false positives.
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: oleg@redhat.com
      Fixes: 9067ac85 ("wake_up_process() should be never used to wakeup a TASK_STOPPED/TRACED task")
      Link: http://lkml.kernel.org/r/1448933660-23082-1-git-send-email-sasha.levin@oracle.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • can: sja1000: clear interrupts on start · 4af5f249
      Mirza Krak authored
      commit 7cecd9ab upstream.
      
      According to the SJA1000 data sheet, the error-warning (EI) interrupt is
      not cleared by putting the controller into reset mode.
      
      Consider the following case:
      - the system is suspended (echo mem > /sys/power/state) and the SJA1000
        is left in operating state
      - a bus error condition occurs which activates the EI interrupt; the
        system is still suspended, so the EI interrupt will be neither handled
        nor cleared.
      
      If the above two events occur, on resume there is no way to return the
      SJA1000 to operating state, except to cycle power to it.
      
      By simply reading the IR register on start we will clear any previous
      conditions that could be present.
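      A small model of the read-to-clear behaviour described above, assuming
      a fake register block in which reading IR returns and clears the pending
      bits while entering reset mode leaves them alone. The struct, the
      functions and the FAKE_IR_EI bit value are illustrative assumptions, not
      the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_IR_EI 0x04  /* illustrative error-warning bit */

/* Hypothetical register model, not the real SJA1000 interface. */
struct fake_sja1000 {
	uint8_t ir;      /* pending interrupt bits */
	int reset_mode;  /* 1 while in reset mode */
};

/* Reading IR returns the pending bits and clears them. */
static uint8_t read_ir(struct fake_sja1000 *c)
{
	uint8_t v = c->ir;

	c->ir = 0;
	return v;
}

/* Entering reset mode does NOT clear pending interrupt bits. */
static void enter_reset(struct fake_sja1000 *c)
{
	c->reset_mode = 1;
}

static void chip_start(struct fake_sja1000 *c)
{
	enter_reset(c);
	(void)read_ir(c);   /* discard conditions left over from suspend */
	c->reset_mode = 0;  /* back to operating mode */
}
```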
      Signed-off-by: Mirza Krak <mirza.krak@hostmobility.com>
      Reported-by: Christian Magnusson <Christian.Magnusson@semcon.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • RDS: fix race condition when sending a message on unbound socket · e2f3d505
      Quentin Casasnovas authored
      commit 8c7188b2 upstream.
      
      Sasha found a NULL pointer dereference in the RDS connection code when
      sending a message to an apparently unbound socket.  The problem is caused
      by the code checking if the socket is bound in rds_sendmsg(), which checks
      the rs_bound_addr field without taking a lock on the socket.  This opens a
      race where rds_bind() has temporarily set rs_bound_addr but has not yet
      set the transport, leading to a NULL pointer dereference when trying to
      dereference 'trans' in __rds_conn_create().
      
      Vegard wrote a reproducer for this issue, so kindly ask him to share it
      if you're interested.
      
      I cannot reproduce the NULL pointer dereference using Vegard's reproducer
      with this patch, whereas I could without it.
      
      This completes an earlier, incomplete fix for CVE-2015-6937:
      
        74e98eb0 ("RDS: verify the underlying transport exists before creating a connection")
      Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
      Reviewed-by: Sasha Levin <sasha.levin@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mac80211: mesh: fix call_rcu() usage · 1467bec9
      Johannes Berg authored
      commit c2e703a5 upstream.
      
      When using call_rcu(), the called function may be delayed quite
      significantly, and without a matching rcu_barrier() there's no
      way to be sure it has finished.
      Therefore, global state that could be gone/freed/reused should
      never be touched in the callback.
      
      Fix this in mesh by moving the atomic_dec() into the caller;
      that's not really a problem since we already unlinked the path
      and it will be destroyed anyway.
      
      This fixes a crash Jouni observed when running certain tests in
      a certain order, in which the mesh interface was torn down, the
      memory reused for a function pointer (work struct) and running
      that then crashed since the pointer had been decremented by 1,
      resulting in an invalid instruction byte stream.
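      A userspace sketch of the pattern, assuming a hypothetical queue of
      deferred callbacks that stands in for call_rcu(), with run_deferred()
      standing in for the end of a grace period: the caller updates the shared
      counter synchronously, and the callback only frees the object it was
      handed. None of these names come from mac80211.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: deferred callbacks run "later", like call_rcu(). */
struct path { int id; };
struct path_table { int entries; };

#define MAX_DEFERRED 8
static struct path *deferred[MAX_DEFERRED];
static int n_deferred;

/* The callback touches only the object it was handed, never the table,
 * which may already be freed or reused by the time it runs. */
static void free_path_cb(struct path *p)
{
	free(p);
}

/* Stand-in for a grace period ending: run all queued callbacks. */
static void run_deferred(void)
{
	for (int i = 0; i < n_deferred; i++)
		free_path_cb(deferred[i]);
	n_deferred = 0;
}

static void defer_free(struct path *p)
{
	if (n_deferred == MAX_DEFERRED)
		run_deferred();
	deferred[n_deferred++] = p;
}

static void del_path(struct path_table *tbl, struct path *p)
{
	tbl->entries--;  /* shared state updated synchronously by the caller */
	defer_free(p);   /* only the unlinked object is released later */
}
```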
      
      Fixes: eb2b9311 ("mac80211: mesh path table implementation")
      Reported-by: Jouni Malinen <j@w1.fi>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • virtio: fix memory leak of virtio ida cache layers · 963e1625
      Suman Anna authored
      commit c13f99b7 upstream.
      
      The virtio core uses a static ida named virtio_index_ida for
      assigning index numbers to virtio devices during registration.
      The ida core may allocate some internal idr cache layers and
      an ida bitmap upon any ida allocation, and all these layers are
      truly freed only upon the ida destruction. The virtio_index_ida
      is not destroyed at present, leading to a memory leak when the
      virtio core is used as a module and at least one virtio device is
      registered and unregistered.
      
      Fix this by invoking ida_destroy() in the virtio core module
      exit.
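      A toy model of the leak, assuming a hypothetical index allocator that
      lazily allocates an internal bitmap on first use and frees it only in
      its destroy function, much as the text describes for the ida cache
      layers. Releasing every index is not enough: the memory stays allocated
      until index_destroy() runs, which is why the module exit path needs a
      destroy call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical index allocator with a lazily allocated internal bitmap. */
struct index_alloc {
	unsigned char *bitmap;  /* allocated on first get, freed only by destroy */
	int size;
};

static int index_get(struct index_alloc *a)
{
	if (!a->bitmap) {
		a->bitmap = calloc(64, 1);
		if (!a->bitmap)
			return -1;
		a->size = 64;
	}
	for (int i = 0; i < a->size; i++) {
		if (!a->bitmap[i]) {
			a->bitmap[i] = 1;
			return i;
		}
	}
	return -1;  /* allocator full */
}

/* Releasing an index does not free the bitmap. */
static void index_put(struct index_alloc *a, int id)
{
	if (a->bitmap && id >= 0 && id < a->size)
		a->bitmap[id] = 0;
}

/* The only place the internal memory is released (cf. ida_destroy()). */
static void index_destroy(struct index_alloc *a)
{
	free(a->bitmap);
	a->bitmap = NULL;
	a->size = 0;
}
```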
      Signed-off-by: Suman Anna <s-anna@ti.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ring-buffer: Update read stamp with first real commit on page · 6039f028
      Steven Rostedt (Red Hat) authored
      commit b81f472a upstream.
      
      Do not update the read stamp after swapping out the reader page from the
      write buffer. If the reader page is swapped out of the buffer before an
      event is written to it, then the read_stamp may get an out of date
      timestamp, as the page timestamp is updated on the first commit to that
      page.
      
      rb_get_reader_page() only returns a page if it has an event on it;
      otherwise it returns NULL. At that point, check whether the page being
      returned has events and has not been read yet, and if so, update the
      read_stamp to match the time stamp of the reader page.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vfs: Avoid softlockups with sendfile(2) · b63a96fa
      Jan Kara authored
      commit c2489e07 upstream.
      
      The following test program from Dmitry can cause softlockups or RCU
      stalls as it copies 1GB from tmpfs into eventfd, and there is no
      scheduling point on that path in the sendfile(2) implementation:
      
              int r1 = eventfd(0, 0);
              int r2 = memfd_create("", 0);
              unsigned long n = 1<<30;
              fallocate(r2, 0, 0, n);
              sendfile(r1, r2, 0, n);
      
      Add cond_resched() into __splice_from_pipe() to fix the problem.
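      A userspace analogue of the fix, assuming POSIX sched_yield() as a
      stand-in for the kernel's cond_resched(): a long copy loop gets one
      voluntary scheduling point per chunk. copy_in_chunks() is hypothetical
      and only counts chunks instead of moving data; unlike cond_resched(),
      sched_yield() yields unconditionally, so this is only an approximation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Userspace analogue only: process total bytes in chunk-sized pieces and
 * offer the scheduler a chance to run something else after each piece. */
static size_t copy_in_chunks(size_t total, size_t chunk, size_t *yields)
{
	size_t done = 0;

	*yields = 0;
	if (chunk == 0)
		return 0;
	while (done < total) {
		size_t n = total - done < chunk ? total - done : chunk;

		/* ... move n bytes here ... */
		done += n;
		sched_yield();  /* stands in for cond_resched() */
		(*yields)++;
	}
	return done;
}
```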
      
      CC: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ARC: dw2 unwind: Remove falllback linear search thru FDE entries · 677eea66
      Vineet Gupta authored
      commit 2e22502c upstream.
      
      Fixes STAR 9000953410: "perf callgraph profiling causing RCU stalls"
      
      | perf record -g -c 15000 -e cycles /sbin/hackbench
      |
      | INFO: rcu_preempt self-detected stall on CPU
      | 1: (1 GPs behind) idle=609/140000000000002/0 softirq=2914/2915 fqs=603
      | Task dump for CPU 1:
      
      The in-kernel DWARF unwinder has a fast binary lookup and a fallback
      linear search, which iterates through each of ~11K entries and thus
      takes 2 orders of magnitude longer (~3 million cycles vs. 2000).
      Routines written in hand assembler lack DWARF info (as we don't support
      assembler CFI pseudo-ops yet), so they fail the unwinder binary lookup,
      hit the linear search, and fail nevertheless in the end.
      
      However, the linear search is pointless, as the binary lookup tables are
      created from the same entries in the first place. It is impossible for
      the binary lookup to fail while the linear search succeeds. It is a pure
      waste of cycles and is thus removed by this patch.
      
      This manifested as RCU stalls / NMI watchdog splats when running
      hackbench under perf with callgraph profiling. The triggering condition
      was the perf counter overflowing in a routine lacking DWARF info (like
      memset), leading to the pathetic 3 million cycle unwinder slow path; by
      the time it returned, new interrupts (timer, IPI) were already pending
      and were taken right away. The original memset didn't make forward
      progress, and the system kept accruing more interrupts and more unwinder
      delays in a vicious feedback loop, ultimately triggering the NMI
      diagnostic.
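      A sketch of the binary lookup the argument relies on, assuming a
      hypothetical table of sorted, non-overlapping address ranges built from
      every FDE entry. Because the table covers all entries, a miss from the
      binary search already proves that no entry contains the address, so a
      linear rescan of the same entries cannot find anything. fde_range and
      fde_lookup() are illustrative names, not the unwinder's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup table: sorted, non-overlapping [start, start+len). */
struct fde_range {
	uintptr_t start;
	uintptr_t len;
};

/* Returns the index of the range containing pc, or -1 if none does. */
static long fde_lookup(const struct fde_range *t, size_t n, uintptr_t pc)
{
	size_t lo = 0, hi = n;  /* search the half-open interval [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pc < t[mid].start)
			hi = mid;
		else if (pc >= t[mid].start + t[mid].len)
			lo = mid + 1;
		else
			return (long)mid;
	}
	return -1;  /* no linear fallback: it would scan the same entries */
}
```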
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mac: validate mac_partition is within sector · 2a27f61b
      Kees Cook authored
      commit 02e2a5bf upstream.
      
      If md->signature == MAC_DRIVER_MAGIC and md->block_size == 1023, a single
      512 byte sector would be read (secsize / 512). However the partition
      structure would be located past the end of the buffer (secsize % 512).
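      The arithmetic can be checked with a small helper, assuming the
      partition structure sits at byte offset secsize % 512 inside the single
      512-byte sector that was read; partition_fits() and the structure sizes
      passed to it are hypothetical. With block_size 1023 the offset is 511,
      so any structure larger than one byte runs past the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512

/* Does a structure of struct_size bytes, located at byte offset
 * secsize % 512, fit inside the one 512-byte sector that was read? */
static bool partition_fits(unsigned int secsize, size_t struct_size)
{
	size_t off = secsize % SECTOR_SIZE;

	return struct_size <= SECTOR_SIZE - off;
}
```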
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mmc: remove bondage between REQ_META and reliable write · e3dda035
      Luca Porzio authored
      commit d3df0465 upstream.
      
      Any time a write operation is performed with the Reliable Write flag
      enabled, the JEDEC specification requires the eMMC device to bypass the
      cache and write to the underlying NVM device; this causes a performance
      penalty since write operations can't be optimized by the device cache.
      
      In our tests, we replayed a typical mobile daily trace pattern and found
      a ~9% overall time reduction in trace replay by using this patch. Also,
      writes in the 4KB~64KB chunk size range get a 40~60% performance
      improvement with the patch (as this range of write chunks is the one
      affected by REQ_META).
      
      This patch has been discussed in the Mobile & Embedded Linux Storage Forum
      and is the result of feedback from many people. We also checked with
      fsdevel and f2fs mailing list developers that this change in the usage of
      REQ_META does not affect FS behavior, and we got positive feedback.
      The feedback threads are here:
      http://comments.gmane.org/gmane.linux.file-systems/97219
      http://thread.gmane.org/gmane.linux.file-systems.f2fs/3178/focus=3183
      Signed-off-by: Bruce Ford <bford@micron.com>
      Signed-off-by: Luca Porzio <lporzio@micron.com>
      Fixes: ce39f9d1 ("mmc: support packed write command for eMMC4.5 devices")
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • megaraid_sas : SMAP restriction--do not access user memory from IOCTL code · eef12556
      sumit.saxena@avagotech.com authored
      commit 323c4a02 upstream.
      
      This is an issue on SMAP-enabled CPUs with 32-bit apps running on a
      64-bit OS. The SMAP bit restricts kernel code from accessing user memory,
      so do not access user memory from the IOCTL code.
      Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
      Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
      Reviewed-by: Tomas Henzl <thenzl@redhat.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • megaraid_sas: Do not use PAGE_SIZE for max_sectors · 64896131
      sumit.saxena@avagotech.com authored
      commit 357ae967 upstream.
      
      Do not use the PAGE_SIZE macro to calculate max_sectors per I/O
      request. The driver code assumes PAGE_SIZE will always be 4096, which
      can lead to a wrongly calculated value if PAGE_SIZE is not 4096. This
      issue was reported in Ubuntu Bugzilla Bug #1475166.
      Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
      Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
      Reviewed-by: Tomas Henzl <thenzl@redhat.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • wm831x_power: Use IRQF_ONESHOT to request threaded IRQs · 621264c8
      Valentin Rothberg authored
      commit 90adf98d upstream.
      
      Since commit 1c6c6952 ("genirq: Reject bogus threaded irq requests")
      threaded IRQs without a primary handler need to be requested with
      IRQF_ONESHOT, otherwise the request will fail.
      
      scripts/coccinelle/misc/irqf_oneshot.cocci detected this issue.
      
      Fixes: b5874f33 ("wm831x_power: Use genirq")
      Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
      Signed-off-by: Sebastian Reichel <sre@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • devres: fix a for loop bounds check · 73665216
      Dan Carpenter authored
      commit 1f35d04a upstream.
      
      The iomap[] array has PCIM_IOMAP_MAX (6) elements and not
      DEVICE_COUNT_RESOURCE (16).  This bug was found using a static checker.
      It may be that the "if (!(mask & (1 << i)))" check means we never
      actually go past the end of the array in real life.
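      A sketch of a bounds-safe version of such a loop, assuming a
      hypothetical IOMAP_MAX of 6 that mirrors PCIM_IOMAP_MAX. The loop is
      bounded by the length of the iomap array rather than by the larger
      resource count, so mask bits above bit 5 are simply ignored.
      count_selected() is hypothetical and not the devres code.

```c
#include <assert.h>

#define IOMAP_MAX 6  /* hypothetical stand-in for PCIM_IOMAP_MAX */

/* Count mapped entries selected by mask. The loop bound is the length of
 * iomap[] (IOMAP_MAX), not the larger resource count (16), so mask bits
 * 6..15 can never cause an out-of-bounds read. */
static int count_selected(void *const iomap[IOMAP_MAX], unsigned int mask)
{
	int n = 0;

	for (unsigned int i = 0; i < IOMAP_MAX; i++) {
		if (!(mask & (1u << i)))
			continue;
		if (iomap[i])
			n++;
	}
	return n;
}
```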
      
      Fixes: ec04b075 ('iomap: implement pcim_iounmap_regions()')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • lockd: create NSM handles per net namespace · db13625b
      Andrey Ryabinin authored
      commit 0ad95472 upstream.
      
      Commit cb7323ff ("lockd: create and use per-net NSM
      RPC clients on MON/UNMON requests") introduced per-net
      NSM RPC clients. Unfortunately this doesn't make any sense
      without a per-net nsm_handle.
      
      For example, the following scenario could happen.
      Two hosts (X and Y) in different namespaces (A and B) share
      the same nsm struct.
      
      1. nsm_monitor(host_X) called => NSM rpc client created,
      	nsm->sm_monitored bit set.
      2. nsm_monitor(host_Y) called => nsm->sm_monitored already set,
      	we just exit. Thus in namespace B ln->nsm_clnt == NULL.
      3. host X destroyed => nsm->sm_count decremented to 1
      4. host Y destroyed => nsm_unmonitor() => nsm_mon_unmon() => NULL-ptr
      	dereference of *ln->nsm_clnt
      
      This can be fixed by making the nsm_handles list per-net
      instead of global, so that different net namespaces will not be
      able to share the same nsm_handle.
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • clocksource/drivers/vt8500: Increase the minimum delta · 1644fe6c
      Roman Volkov authored
      commit f9eccf24 upstream.
      
      The vt8500 clocksource driver declares itself capable of handling a
      minimum delay of 4 cycles by passing that value into
      clockevents_config_and_register(). However, vt8500_timer_set_next_event()
      requires the passed cycles value to be at least 16. The impact is that
      userspace hangs in nanosleep() calls with small delay intervals.
      
      This problem is reproducible in Linux 4.2 starting from:
      c6eb3f70 ('hrtimer: Get rid of hrtimer softirq')
      
      From Russell King, more detailed explanation:
      
      "It's a speciality of the StrongARM/PXA hardware. It takes a certain
      number of OSCR cycles for the value written to hit the compare registers.
      So, if a very small delta is written (eg, the compare register is written
      with a value of OSCR + 1), the OSCR will have incremented past this value
      before it hits the underlying hardware. The result is, that you end up
      waiting a very long time for the OSCR to wrap before the event fires.
      
      So, we introduce a check in set_next_event() to detect this and return
      -ETIME if the calculated delta is too small, which causes the generic
      clockevents code to retry after adding the min_delta specified in
      clockevents_config_and_register() to the current time value.
      
      min_delta must be sufficient that we don't re-trip the -ETIME check - if
      we do, we will return -ETIME, forward the next event time, try to set it,
      return -ETIME again, and basically lock the system up. So, min_delta
      must be larger than the check inside set_next_event(). A factor of two
      was chosen to ensure that this situation would never occur.
      
      The PXA code worked on PXA systems for years, and I'd suggest no one
      changes this mechanism without access to a wide range of PXA systems,
      otherwise they're risking breakage."
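      A simplified model of the retry loop described above, assuming a
      hypothetical `elapsed` parameter for the cycles that pass before each
      delta is checked, and the threshold of 16 from the text. It shows why
      min_delta must be larger than the check inside set_next_event(): with
      min_delta equal to the threshold every retry re-trips the check, while
      twice the threshold leaves slack. The functions only model the
      behaviour and are not the clockevents code.

```c
#include <assert.h>
#include <errno.h>

#define MIN_OSCR_DELTA 16ul  /* from the text: the hardware needs >= 16 cycles */

/* Model of set_next_event(): refuse deltas too small to hit reliably. */
static int set_next_event(unsigned long delta)
{
	return delta < MIN_OSCR_DELTA ? -ETIME : 0;
}

/* Model of the generic retry. `elapsed` (hypothetical) is the number of
 * cycles that pass before each delta is checked. On -ETIME the event is
 * pushed out to min_delta and retried. Returns the attempt that succeeded,
 * or -1 if max_tries attempts all failed (the "lock up" case). */
static int program_event(unsigned long delta, unsigned long min_delta,
			 unsigned long elapsed, int max_tries)
{
	for (int i = 1; i <= max_tries; i++) {
		unsigned long left = delta > elapsed ? delta - elapsed : 0;

		if (set_next_event(left) == 0)
			return i;
		delta = min_delta;
	}
	return -1;
}
```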
      
      Cc: Russell King <linux@arm.linux.org.uk>
      Acked-by: Alexey Charkov <alchark@gmail.com>
      Signed-off-by: Roman Volkov <rvolkov@v1ros.org>
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • genirq: Prevent chip buslock deadlock · bf5cd0c6
      Thomas Gleixner authored
      commit abc7e40c upstream.
      
      If an interrupt chip utilizes chip->buslock, then free_irq() can
      deadlock in the following way:
      
      CPU0				CPU1
      				interrupt(X) (Shared or spurious)
      free_irq(X)			interrupt_thread(X)
      chip_bus_lock(X)
      				   irq_finalize_oneshot(X)
      				     chip_bus_lock(X)
      synchronize_irq(X)
      
      synchronize_irq() waits for the interrupt thread to complete,
      i.e. forever.
      
      The solution is simple: drop chip_bus_lock() before calling
      synchronize_irq(), as we do with the irq_desc lock. There is nothing to
      be protected after the point where the irq_desc lock has been released.
      
      This adds chip_bus_lock/unlock() to the remove_irq() code path, but
      that's actually correct in the case where remove_irq() is called on
      such an interrupt. The current users of remove_irq() are not affected
      as none of those interrupts is on a chip which requires buslock.
      Reported-by: Fredrik Markström <fredrik.markstrom@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • unix: correctly track in-flight fds in sending process user_struct · 341f09c0
      Hannes Frederic Sowa authored
      commit 415e3d3e upstream.
      
      The commit referenced in the Fixes tag incorrectly accounted the number
      of in-flight fds over a unix domain socket to the original opener
      of the file descriptor. This allows another process to arbitrarily
      deplete the original file opener's resource limit for the maximum of
      open files. Instead, the sending process and its struct cred should
      be credited.
      
      To do so, we add a reference counted struct user_struct pointer to the
      scm_fp_list and use it to account for the number of inflight unix fds.
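      A toy model of the accounting change, assuming hypothetical user_acct
      and fd_batch structures: the batch of in-flight fds takes a reference on
      the sender's accounting object and charges it, so the original opener's
      count is never touched. These are not the kernel's struct user_struct or
      struct scm_fp_list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of per-user accounting for fds in flight. */
struct user_acct {
	int refcount;  /* references held on this accounting object */
	int inflight;  /* fds in flight charged to this user */
};

struct fd_batch {
	struct user_acct *user;  /* who gets charged: the sender */
	int count;
};

static void batch_attach(struct fd_batch *b, struct user_acct *sender, int nfds)
{
	sender->refcount++;  /* keep the charged user alive until release */
	b->user = sender;
	b->count = nfds;
	sender->inflight += nfds;
}

static void batch_release(struct fd_batch *b)
{
	b->user->inflight -= b->count;  /* uncharge the user that was charged */
	b->user->refcount--;
	b->user = NULL;
	b->count = 0;
}
```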
      
      Fixes: 712f4aad ("unix: properly account for FDs passed over unix sockets")
      Reported-by: David Herrmann <dh.herrmann@gmail.com>
      Cc: David Herrmann <dh.herrmann@gmail.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount · 59ae7b1c
      Olga Kornievskaia authored
      commit a41cbe86 upstream.
      
      The test case is as the subject line describes:
      open(foobar, O_WRONLY);
      sleep()  --> reboot the server
      close(foobar)
      
      The bug is that in nfs4state.c, in nfs4_reclaim_open_state(), a few
      lines before going to restart, there is
      clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
      
      NFS4CLNT_RECLAIM_NOGRACE is a flag for the client state, not for open
      owner state. The value of NFS4CLNT_RECLAIM_NOGRACE is 4, which is the
      value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
      out the state, and when we go to close it, "call_close" doesn't get set
      because the state flag is not set, and CLOSE doesn't go on the wire.
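      A short illustration of the bit collision, assuming two hypothetical
      constants that both have the value 4, as the text states for
      NFS4CLNT_RECLAIM_NOGRACE and NFS_O_WRONLY_STATE: clearing the
      client-state bit number in the open state's flag word wipes out the
      write-only bit. clear_bit_ul() and test_bit_ul() are simple stand-ins
      for the kernel bitops, not the kernel functions themselves.

```c
#include <assert.h>

/* Hypothetical bit numbers; both equal 4, as noted for the real flags. */
enum { MODEL_RECLAIM_NOGRACE = 4 };  /* belongs in the client's flag word */
enum { MODEL_O_WRONLY_STATE = 4 };   /* belongs in the open state's flag word */

static void clear_bit_ul(int nr, unsigned long *word)
{
	*word &= ~(1ul << nr);
}

static int test_bit_ul(int nr, const unsigned long *word)
{
	return (int)((*word >> nr) & 1ul);
}
```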
      Signed-off-by: Olga Kornievskaia <aglo@umich.edu>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • splice: sendfile() at once fails for big files · 4c67196f
      Christophe Leroy authored
      commit 0ff28d9f upstream.
      
      Using sendfile() with the small program below to get MD5 sums of some
      files, it appears that big files (over 64 KB on a system with 4K pages)
      get a wrong MD5 sum while small files get the correct sum.
      This program uses sendfile() to send a file to an AF_ALG socket
      for hashing.
      
      /* md5sum2.c */
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      #include <string.h>
      #include <fcntl.h>
      #include <sys/sendfile.h>
      #include <sys/socket.h>
      #include <sys/stat.h>
      #include <sys/types.h>
      #include <linux/if_alg.h>
      
      int main(int argc, char **argv)
      {
      	int sk = socket(AF_ALG, SOCK_SEQPACKET, 0);
      	struct stat st;
      	struct sockaddr_alg sa = {
      		.salg_family = AF_ALG,
      		.salg_type = "hash",
      		.salg_name = "md5",
      	};
      	int n;
      
      	if (sk < 0 || bind(sk, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
      		perror("AF_ALG md5");
      		exit(1);
      	}
      
      	for (n = 1; n < argc; n++) {
      		ssize_t size;
      		off_t offset = 0;		/* sendfile() takes an off_t * */
      		unsigned char buf[4096];	/* unsigned, so each byte prints as 2 hex digits */
      		int fd;
      		int sko;
      		ssize_t i;
      
      		fd = open(argv[n], O_RDONLY);
      		if (fd < 0) {
      			perror(argv[n]);
      			continue;
      		}
      		sko = accept(sk, NULL, 0);
      		fstat(fd, &st);
      		size = st.st_size;
      		sendfile(sko, fd, &offset, size);
      		size = read(sko, buf, sizeof(buf));
      		for (i = 0; i < size; i++)
      			printf("%2.2x", buf[i]);
      		printf("  %s\n", argv[n]);
      		close(fd);
      		close(sko);
      	}
      	exit(0);
      }
      
      The test below is done using official Linux patch files. The first result
      is from a software-based md5sum; the second is from the program above.
      
      root@vgoip:~# ls -l patch-3.6.*
      -rw-r--r--    1 root     root         64011 Aug 24 12:01 patch-3.6.2.gz
      -rw-r--r--    1 root     root         94131 Aug 24 12:01 patch-3.6.3.gz
      
      root@vgoip:~# md5sum patch-3.6.*
      b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
      c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz
      
      root@vgoip:~# ./md5sum2 patch-3.6.*
      b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
      5fd77b24e68bb24dcc72d6e57c64790e  patch-3.6.3.gz
      
      After investigation, it appears that sendfile() sends the files in blocks
      of 64 KB (16 times PAGE_SIZE). The problem is that at the end of each
      block, the SPLICE_F_MORE flag is missing, so the hashing operation
      is reset as if it were the end of the file.
      
      This patch adds SPLICE_F_MORE to the flags when more data is pending.
      
      With the patch applied, we get the correct sums:
      
      root@vgoip:~# md5sum patch-3.6.*
      b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
      c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz
      
      root@vgoip:~# ./md5sum2 patch-3.6.*
      b3ffb9848196846f31b2ff133d2d6443  patch-3.6.2.gz
      c5e8f687878457db77cb7158c38a7e43  patch-3.6.3.gz
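      A sketch of the flag logic described above, assuming a hypothetical
      MODEL_F_MORE constant standing in for SPLICE_F_MORE and a 64 KB
      (65536-byte) block size: every block except the last is marked as
      having more data pending. With the 94131-byte patch-3.6.3.gz from the
      test, the first block gets the flag and the second does not; the
      64011-byte patch-3.6.2.gz fits in a single block and never needs it,
      which matches only the bigger file being affected. plan_chunks() is
      hypothetical and is not the splice code.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_F_MORE 0x4u  /* illustrative; stands in for SPLICE_F_MORE */

/* Split total bytes into blocks of at most chunk bytes and record the flags
 * each block would be sent with. Returns the number of blocks recorded. */
static size_t plan_chunks(size_t total, size_t chunk, unsigned int flags[], size_t max)
{
	size_t n = 0, done = 0;

	if (chunk == 0)
		return 0;
	while (done < total && n < max) {
		size_t len = total - done < chunk ? total - done : chunk;

		done += len;
		/* signal "more data follows" on every block except the last */
		flags[n++] = done < total ? MODEL_F_MORE : 0u;
	}
	return n;
}
```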
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: KVM: Uninit VCPU in vcpu_create error path · 23a9a7dc
      James Hogan authored
      commit 585bb8f9 upstream.
      
      If either of the memory allocations in kvm_arch_vcpu_create() fails, the
      vcpu which has been allocated and kvm_vcpu_init'd doesn't get uninit'd
      in the error handling path. Add a call to kvm_vcpu_uninit() to fix this.
      
      Fixes: 669e846e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: KVM: Fix CACHE immediate offset sign extension · 7b0fc451
      James Hogan authored
      commit c5c2a3b9 upstream.
      
      The immediate field of the CACHE instruction is signed, so ensure that
      it gets sign extended by casting it to an int16_t rather than just
      masking the low 16 bits.
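      A minimal illustration of the difference, assuming the offset occupies
      the low 16 bits of a 32-bit instruction word; the helper names and the
      sample instruction word are hypothetical. Converting an out-of-range
      value to int16_t is implementation-defined before C23, though common
      compilers produce the two's-complement result relied on here.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: masking keeps the 16 bits but zero-extends them. */
static int32_t cache_offset_masked(uint32_t insn)
{
	return (int32_t)(insn & 0xffff);
}

/* Fixed: the cast to int16_t makes the value sign-extend on widening. */
static int32_t cache_offset_sign_extended(uint32_t insn)
{
	return (int16_t)(insn & 0xffff);
}
```

      For a word whose low 16 bits are 0xfff0, the masked version yields
      65520 while the sign-extended version yields the intended -16.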
      
      Fixes: e685c689 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: KVM: Fix ASID restoration logic · 5525dd65
      James Hogan authored
      commit 002374f3 upstream.
      
      ASID restoration on guest resume should determine the guest execution
      mode based on the guest Status register rather than bit 30 of the guest
      PC.
      
      Fix the two places in locore.S that do this, loading the guest status
      from the cop0 area. Note, this assembly is specific to the trap &
      emulate implementation of KVM, so it doesn't need to check the
      supervisor bit as that mode is not implemented in the guest.
      
      Fixes: b680f70f ("KVM/MIPS32: Entry point for trampolining to...")
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Hariprasad S
      iw_cxgb3: Fix incorrectly returning error on success · 1630624d
      Hariprasad S authored
      commit 67f1aee6 upstream.
      
      The cxgb3_*_send() functions return NET_XMIT_ values, which are
      non-negative integers, so don't treat positive return values as
      errors.
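Below is a minimal sketch of the return-value rule. It assumes a send helper that returns either a negative errno or a non-negative NET_XMIT_* code. normalize_send_result() is a hypothetical name; the constant values match the kernel's <linux/netdevice.h>.

```c
#include <errno.h>

/* Values as defined in the kernel's <linux/netdevice.h>. */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01
#define NET_XMIT_CN      0x02

/* Negative values are errno-style failures and are propagated.
 * Zero or positive NET_XMIT_* codes are not errors and map to 0. */
static int normalize_send_result(int ret)
{
    return ret < 0 ? ret : 0;
}
```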
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      [a pox on developers and maintainers who do not cc: stable for bug fixes like this - gregkh]
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Corey Wright
      proc: Fix ptrace-based permission checks for accessing task maps · b048b93f
      Corey Wright authored
      Modify mm_access() calls in fs/proc/task_mmu.c and fs/proc/task_nommu.c to
      have the mode include PTRACE_MODE_FSCREDS so accessing /proc/pid/maps and
      /proc/pid/pagemap is not denied to all users.
      
      In backporting upstream commit caaee623 to pre-3.18 kernel versions it was
      overlooked that mm_access() is used in fs/proc/task_*mmu.c as those calls
      were removed in 3.18 (by upstream commit 29a40ace) and did not exist at the
      time of the original commit.
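A plain PTRACE_MODE_READ ends up denying everyone. A simplified model of the sanity check in __ptrace_may_access() shows why: exactly one of PTRACE_MODE_FSCREDS and PTRACE_MODE_REALCREDS must be set. The flag values match include/linux/ptrace.h. ptrace_mode_check() is a hypothetical, cut-down stand-in that stops before any real credential comparison.

```c
#include <errno.h>

/* Flag values as in include/linux/ptrace.h. */
#define PTRACE_MODE_READ      0x01
#define PTRACE_MODE_FSCREDS   0x08
#define PTRACE_MODE_REALCREDS 0x10
#define PTRACE_MODE_READ_FSCREDS (PTRACE_MODE_READ | PTRACE_MODE_FSCREDS)

/* Exactly one *CREDS flag must be present; otherwise access is refused
 * outright, regardless of who is asking. */
static int ptrace_mode_check(unsigned int mode)
{
    if (!(mode & PTRACE_MODE_FSCREDS) == !(mode & PTRACE_MODE_REALCREDS))
        return -EPERM;
    return 0; /* the real function would go on to compare credentials */
}
```

Under this model, the pre-fix mm_access() calls in fs/proc/task_*mmu.c passed PTRACE_MODE_READ and hit the -EPERM branch. PTRACE_MODE_READ_FSCREDS gets past it.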
      Signed-off-by: Corey Wright <undefined@pobox.com>
      Acked-by: Jann Horn <jann@thejh.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Bjørn Mork
      USB: option: add "4G LTE usb-modem U901" · db9b6792
      Bjørn Mork authored
      commit d061c1ca upstream.
      
      Thomas reports:
      
      T:  Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#=  4 Spd=480 MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=05c6 ProdID=6001 Rev=00.00
      S:  Manufacturer=USB Modem
      S:  Product=USB Modem
      S:  SerialNumber=1234567890ABCDEF
      C:  #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
      I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      I:  If#= 4 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
      Reported-by: Thomas Schäfer <tschaefer@t-online.de>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Andrey Skvortsov
      USB: option: add support for SIM7100E · b920f51b
      Andrey Skvortsov authored
      commit 3158a8d4 upstream.
      
      $ lsusb:
      Bus 001 Device 101: ID 1e0e:9001 Qualcomm / Option
      
      $ usb-devices:
      T:  Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#=101 Spd=480  MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  2
      P:  Vendor=1e0e ProdID=9001 Rev= 2.32
      S:  Manufacturer=SimTech, Incorporated
      S:  Product=SimTech, Incorporated
      S:  SerialNumber=0123456789ABCDEF
      C:* #Ifs= 7 Cfg#= 1 Atr=80 MxPwr=500mA
      I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
      I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
      
      The last interface (6) is used for the Android Composite ADB interface.
      
      Serial port layout:
      0: QCDM/DIAG
      1: NMEA
      2: AT
      3: AT/PPP
      4: audio
      Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com>
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ken Lin
      USB: cp210x: add IDs for GE B650V3 and B850V3 boards · 9195b273
      Ken Lin authored
      commit 6627ae19 upstream.
      
      Add USB IDs for the cp2104/5 devices on GE B650v3 and B850v3 boards.
      Signed-off-by: Ken Lin <ken.lin@advantech.com.tw>
      Signed-off-by: Akshay Bhat <akshay.bhat@timesys.com>
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Gerhard Uttenthaler
      can: ems_usb: Fix possible tx overflow · fadc5c17
      Gerhard Uttenthaler authored
      commit 90cfde46 upstream.
      
      This patch fixes the problem that more CAN messages could be sent to the
      interface than could be sent on the CAN bus. This was more likely at slow
      baud rates. The sleeping _start_xmit was woken up in _write_bulk_callback.
      Under heavy TX load this triggered another bulk transfer without checking
      the free_slots variable and hence caused the overflow in the interface.
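The flow-control idea can be sketched as a small credit counter:

- The transmit path refuses when no slot is free.
- Completion only reopens the queue when the interface reports free slots.

This is an assumed, simplified model, not the ems_usb code. struct tx_state and the tx_* functions are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of an adapter with a limited number of TX buffers. */
struct tx_state {
    int free_slots;     /* buffers the interface can still accept */
    bool queue_stopped; /* true while the stack must not call xmit */
};

/* start_xmit analogue: check free_slots before submitting another
 * transfer, and stop the queue once the last slot is used. */
static bool tx_start_xmit(struct tx_state *s)
{
    if (s->free_slots <= 0) {
        s->queue_stopped = true;
        return false; /* a driver would report "busy" and retry later */
    }
    s->free_slots--;
    if (s->free_slots == 0)
        s->queue_stopped = true;
    return true;
}

/* Completion analogue: wake the queue only if the interface reports free
 * slots, not merely because a bulk transfer completed. */
static void tx_complete(struct tx_state *s, int slots_reported_free)
{
    s->free_slots = slots_reported_free;
    if (s->free_slots > 0)
        s->queue_stopped = false;
}
```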
      Signed-off-by: Gerhard Uttenthaler <uttenthaler@ems-wuensche.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Nikolay Borisov
      dm thin: fix race condition when destroying thin pool workqueue · 86d80ecd
      Nikolay Borisov authored
      commit 18d03e8c upstream.
      
      When a thin pool is being destroyed, delayed work items are
      cancelled using cancel_delayed_work(), which doesn't guarantee that the
      delayed item isn't running on return. This can cause the work item to
      requeue itself on an already destroyed workqueue. Fix this by using
      cancel_delayed_work_sync(), which guarantees that on return the work item
      is no longer running.
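The difference between the two cancel calls can be shown with a userspace analogy built on POSIX threads. This is not the kernel workqueue API, and all names are hypothetical:

- periodic_cancel() only requests cancellation, like cancel_delayed_work().
- periodic_cancel_sync() also waits for the worker to finish, like cancel_delayed_work_sync().

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A worker that keeps doing its job and "requeueing" itself until stopped. */
struct periodic_worker {
    pthread_t thread;
    atomic_bool stop;
    atomic_int runs;
};

static void *periodic_fn(void *arg)
{
    struct periodic_worker *w = arg;

    while (!atomic_load(&w->stop)) {
        atomic_fetch_add(&w->runs, 1); /* the periodic job */
        sched_yield();                 /* then "requeue" itself */
    }
    return NULL;
}

static int periodic_start(struct periodic_worker *w)
{
    atomic_init(&w->stop, false);
    atomic_init(&w->runs, 0);
    return pthread_create(&w->thread, NULL, periodic_fn, w);
}

/* Like cancel_delayed_work(): only requests cancellation. The worker may
 * still be running (and may requeue itself) when this returns. */
static void periodic_cancel(struct periodic_worker *w)
{
    atomic_store(&w->stop, true);
}

/* Like cancel_delayed_work_sync(): requests cancellation, then waits until
 * the worker has finished. */
static int periodic_cancel_sync(struct periodic_worker *w)
{
    periodic_cancel(w);
    return pthread_join(w->thread, NULL);
}
```

Only after periodic_cancel_sync() returns is it safe to free anything the worker touches. In the thin pool case, that corresponds to destroying the pool's workqueue.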
      
      Fixes: 905e51b3 ("dm thin: commit outstanding data every second")
      Fixes: 85ad643b ("dm thin: add timeout to stop out-of-data-space mode holding IO forever")
      Signed-off-by: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Joe Thornber
      dm thin metadata: fix bug when taking a metadata snapshot · 9bb86db1
      Joe Thornber authored
      commit 49e99fc7 upstream.
      
      When you take a metadata snapshot the btree roots for the mapping and
      details tree need to have their reference counts incremented so they
      persist for the lifetime of the metadata snap.
      
      The roots being incremented were those currently written in the
      superblock, which could possibly be out of date if concurrent IO is
      triggering new mappings, breaking of sharing, etc.
      
      Fix this by performing a commit with the metadata lock held while taking
      a metadata snapshot.
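The corrected ordering can be sketched with a toy metadata structure. The commit and the root references are taken under one lock, so the snapshot pins the live roots rather than possibly stale superblock ones. The structure, the use of small block indices as roots, and take_metadata_snap() are hypothetical simplifications, not dm-thin's data structures.

```c
#include <pthread.h>
#include <stdint.h>

#define MAX_BLOCKS 16

/* Hypothetical, heavily simplified pool metadata. */
struct pool_metadata {
    pthread_mutex_t lock;
    uint32_t incore_mapping_root; /* live roots, changed by new mappings */
    uint32_t incore_details_root;
    uint32_t sb_mapping_root;     /* roots as last written to the superblock */
    uint32_t sb_details_root;
    uint32_t refcount[MAX_BLOCKS];
};

/* Caller holds md->lock: write the live roots into the superblock. */
static void commit_locked(struct pool_metadata *md)
{
    md->sb_mapping_root = md->incore_mapping_root;
    md->sb_details_root = md->incore_details_root;
}

/* Commit first, then take references on the (now current) superblock roots,
 * all while holding the lock so no concurrent update can slip in between. */
static void take_metadata_snap(struct pool_metadata *md,
                               uint32_t *snap_mapping, uint32_t *snap_details)
{
    pthread_mutex_lock(&md->lock);
    commit_locked(md);
    md->refcount[md->sb_mapping_root]++;
    md->refcount[md->sb_details_root]++;
    *snap_mapping = md->sb_mapping_root;
    *snap_details = md->sb_details_root;
    pthread_mutex_unlock(&md->lock);
}
```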
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Ingo Molnar
      efi: Disable interrupts around EFI calls, not in the epilog/prolog calls · 9ee0d9ad
      Ingo Molnar authored
      commit 23a0d4e8 upstream.
      
      Tapasweni Pathak reported that we do a kmalloc() in efi_call_phys_prolog()
      on x86-64 while having interrupts disabled, which is a big no-no, as
      kmalloc() can sleep.
      
      Solve this by removing the irq disabling from the prolog/epilog calls
      around EFI calls: it's unnecessary, as in this stage we are single
      threaded in the boot thread, and we don't ever execute this from
      interrupt contexts.
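The ordering change can be checked with a userspace model in which a stub allocator records whether it was called while "interrupts" were off. Every function here is a hypothetical stand-in:

- may_sleep_alloc() stands in for kmalloc().
- irq_save() and irq_restore() stand in for local_irq_save() and local_irq_restore().

The new order, with interrupts disabled only around the firmware call, follows the commit title.

```c
#include <stdbool.h>
#include <stdlib.h>

static bool irqs_disabled;
static int sleeping_alloc_with_irqs_off; /* counts the "big no-no" */

/* Plays the role of kmalloc(), which may sleep. */
static void *may_sleep_alloc(size_t n)
{
    if (irqs_disabled)
        sleeping_alloc_with_irqs_off++;
    return malloc(n);
}

static void irq_save(bool *flags) { *flags = irqs_disabled; irqs_disabled = true; }
static void irq_restore(bool flags) { irqs_disabled = flags; }

static void *call_prolog(void) { return may_sleep_alloc(64); }
static void call_epilog(void *save) { free(save); }
static long firmware_call(void) { return 0; } /* pretend EFI service */

/* Old order (modelled): interrupts off before the prolog allocates. */
static long efi_call_old_order(void)
{
    bool flags;
    irq_save(&flags);
    void *save = call_prolog();
    long status = firmware_call();
    call_epilog(save);
    irq_restore(flags);
    return status;
}

/* New order: interrupts off only around the firmware call itself. */
static long efi_call_wrapped(void)
{
    bool flags;
    void *save = call_prolog();
    irq_save(&flags);
    long status = firmware_call();
    irq_restore(flags);
    call_epilog(save);
    return status;
}
```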
      Reported-by: Tapasweni Pathak <tapaswenipathak@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      [ luis: backported to 3.10: adjusted context ]
      Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Dave Airlie
      drm/radeon: fix hotplug race at startup · f2419de6
      Dave Airlie authored
      commit 7f98ca45 upstream.
      
      We apparently get a hotplug irq before we've initialised
      modesetting:
      
      [drm] Loading R100 Microcode
      BUG: unable to handle kernel NULL pointer dereference at   (null)
      IP: [<c125f56f>] __mutex_lock_slowpath+0x23/0x91
      *pde = 00000000
      Oops: 0002 [#1]
      Modules linked in: radeon(+) drm_kms_helper ttm drm i2c_algo_bit backlight pcspkr psmouse evdev sr_mod input_leds led_class cdrom sg parport_pc parport floppy intel_agp intel_gtt lpc_ich acpi_cpufreq processor button mfd_core agpgart uhci_hcd ehci_hcd rng_core snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm usbcore usb_common i2c_i801 i2c_core snd_timer snd soundcore thermal_sys
      CPU: 0 PID: 15 Comm: kworker/0:1 Not tainted 4.2.0-rc7-00015-gbf674028 #111
      Hardware name: MicroLink                               /D850MV                         , BIOS MV85010A.86A.0067.P24.0304081124 04/08/2003
      Workqueue: events radeon_hotplug_work_func [radeon]
      task: f6ca5900 ti: f6d3e000 task.ti: f6d3e000
      EIP: 0060:[<c125f56f>] EFLAGS: 00010282 CPU: 0
      EIP is at __mutex_lock_slowpath+0x23/0x91
      EAX: 00000000 EBX: f5e900fc ECX: 00000000 EDX: fffffffe
      ESI: f6ca5900 EDI: f5e90100 EBP: f5e90000 ESP: f6d3ff0c
       DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
      CR0: 8005003b CR2: 00000000 CR3: 36f61000 CR4: 000006d0
      Stack:
       f5e90100 00000000 c103c4c1 f6d2a5a0 f5e900fc f6df394c c125f162 f8b0faca
       f6d2a5a0 c138ca00 f6df394c f7395600 c1034741 00d40000 00000000 f6d2a5a0
       c138ca00 f6d2a5b8 c138ca10 c1034b58 00000001 f6d40000 f6ca5900 f6d0c940
      Call Trace:
       [<c103c4c1>] ? dequeue_task_fair+0xa4/0xb7
       [<c125f162>] ? mutex_lock+0x9/0xa
       [<f8b0faca>] ? radeon_hotplug_work_func+0x17/0x57 [radeon]
       [<c1034741>] ? process_one_work+0xfc/0x194
       [<c1034b58>] ? worker_thread+0x18d/0x218
       [<c10349cb>] ? rescuer_thread+0x1d5/0x1d5
       [<c103742a>] ? kthread+0x7b/0x80
       [<c12601c0>] ? ret_from_kernel_thread+0x20/0x30
       [<c10373af>] ? init_completion+0x18/0x18
      Code: 42 08 e8 8e a6 dd ff c3 57 56 53 83 ec 0c 8b 35 48 f7 37 c1 8b 10 4a 74 1a 89 c3 8d 78 04 8b 40 08 89 63
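The message does not describe the fix itself. One straightforward way to close this kind of startup race is for the hotplug work handler to return early until modesetting has been initialised. This is shown as an assumption, not as the actual radeon change, and the structure and field names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, simplified device state. */
struct gpu_device {
    bool mode_config_initialized; /* set once modesetting is set up */
    int hotplug_events_handled;
};

/* An interrupt can queue this work before modesetting exists. Bail out
 * instead of touching uninitialised state (the oops above faults in
 * mutex_lock() called from the hotplug work function). */
static void hotplug_work_func(struct gpu_device *dev)
{
    if (!dev->mode_config_initialized)
        return;
    dev->hotplug_events_handled++; /* stands in for the real probing */
}
```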
      Reported-and-Tested-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Kamal Mostafa
      tools: Add a "make all" rule · 8719b00b
      Kamal Mostafa authored
      commit f6ba98c5 upstream.
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Pali Rohar <pali.rohar@gmail.com>
      Cc: Roberta Dobrescu <roberta.dobrescu@gmail.com>
      Link: http://lkml.kernel.org/r/1447280736-2161-2-git-send-email-kamal@canonical.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      [ kamal: backport to 3.10-stable: build all tools for this version ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Zheng Liu
      bcache: unregister reboot notifier if bcache fails to unregister device · 9fac6600
      Zheng Liu authored
      commit 2ecf0cdb upstream.
      
      bcache_init() forgot to unregister the reboot notifier if bcache
      failed to unregister a block device. This commit fixes that.
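The usual kernel idiom for this class of bug is goto-based unwinding, where each failure undoes the earlier init steps in reverse order. Below is a minimal sketch in which hypothetical stubs stand in for the reboot notifier registration and the later block-device step.

```c
#include <errno.h>
#include <stdbool.h>

static bool reboot_notifier_registered;

static void reboot_notifier_register_stub(void)   { reboot_notifier_registered = true; }
static void reboot_notifier_unregister_stub(void) { reboot_notifier_registered = false; }

/* Hypothetical later init step; returns a negative errno on failure. */
static int block_device_setup_stub(bool fail) { return fail ? -EBUSY : 0; }

static int init_with_unwind(bool block_setup_fails)
{
    int ret;

    reboot_notifier_register_stub();

    ret = block_device_setup_stub(block_setup_fails);
    if (ret < 0)
        goto err_unregister_notifier;

    return 0;

err_unregister_notifier:
    reboot_notifier_unregister_stub(); /* the cleanup that was missing */
    return ret;
}
```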
      Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
      Tested-by: Joshua Schmid <jschmid@suse.com>
      Tested-by: Eric Wheeler <bcache@linux.ewheeler.net>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Andrey Vagin
      netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get · f79d019c
      Andrey Vagin authored
      commit c6825c09 upstream.
      
      Let's look at destroy_conntrack:
      
      hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
      ...
      nf_conntrack_free(ct)
      	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
      
      net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.
      
      The hash is protected by RCU, so readers look up conntracks without
      locks.
      A conntrack is removed from the hash, but at this moment a few readers
      can still use it. Then this conntrack is released and another thread
      creates a conntrack at the same address with an equal tuple.
      After this a reader starts to validate the conntrack:
      * It's not dying, because a new conntrack was created
      * nf_ct_tuple_equal() returns true.
      
      But this conntrack is not initialized yet, so it cannot be used by two
      threads concurrently. In this case a BUG_ON may be triggered from
      nf_nat_setup_info().
      
      Florian Westphal suggested checking the confirmed bit too. I think that's
      right.
      
      task 1			task 2			task 3
      			nf_conntrack_find_get
      			 ____nf_conntrack_find
      destroy_conntrack
       hlist_nulls_del_rcu
       nf_conntrack_free
       kmem_cache_free
      						__nf_conntrack_alloc
      						 kmem_cache_alloc
      						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
      			 if (nf_ct_is_dying(ct))
      			 if (!nf_ct_tuple_equal()
      
      I'm not sure that I have ever seen this race condition in real life.
      We are currently investigating a bug which is reproduced on a few nodes.
      In our case one conntrack is initialized from a few tasks concurrently;
      we don't have any other explanation for this.
      
      <2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
      ...
      <4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>]  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat]
      ...
      <4>[46267.085549] Call Trace:
      <4>[46267.085622]  [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat]
      <4>[46267.085697]  [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
      <4>[46267.085770]  [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat]
      <4>[46267.085843]  [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat]
      <4>[46267.085919]  [<ffffffff814841b9>] nf_iterate+0x69/0xb0
      <4>[46267.085991]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
      <4>[46267.086063]  [<ffffffff81484374>] nf_hook_slow+0x74/0x110
      <4>[46267.086133]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
      <4>[46267.086207]  [<ffffffff814b5890>] ? dst_output+0x0/0x20
      <4>[46267.086277]  [<ffffffff81495204>] ip_output+0xa4/0xc0
      <4>[46267.086346]  [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910
      <4>[46267.086419]  [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0
      <4>[46267.086491]  [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50
      <4>[46267.086562]  [<ffffffff81444d67>] sock_sendmsg+0x117/0x140
      <4>[46267.086638]  [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20
      <4>[46267.086712]  [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40
      <4>[46267.086785]  [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80
      <4>[46267.086858]  [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20
      <4>[46267.086936]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
      <4>[46267.087006]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
      <4>[46267.087081]  [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0
      <4>[46267.087151]  [<ffffffff81445599>] sys_sendto+0x139/0x190
      <4>[46267.087229]  [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0
      <4>[46267.087303]  [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200
      <4>[46267.087378]  [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290
      <4>[46267.087454]  [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210
      <4>[46267.087531]  [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210
      <4>[46267.087607]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
      <4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
      <1>[46267.088023] RIP  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590
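The reader-side re-validation, including the suggested confirmed check, can be sketched as follows. struct conn and its fields are hypothetical simplifications of a conntrack entry, where key stands in for the tuple and zone. The helper names are made up for the example. A NULL return means the entry must not be used; the caller would typically restart the lookup.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified entry. With SLAB_DESTROY_BY_RCU its memory may be
 * reused for a new entry while a lockless reader still points at it. */
struct conn {
    atomic_int use;   /* reference count; 0 means being freed */
    bool dying;
    bool confirmed;   /* set only once the entry is fully set up and hashed */
    unsigned int key; /* stands in for the tuple (and zone) */
};

static bool conn_get_not_zero(struct conn *c)
{
    int old = atomic_load(&c->use);
    while (old != 0)
        if (atomic_compare_exchange_weak(&c->use, &old, old + 1))
            return true;
    return false;
}

static void conn_put(struct conn *c) { atomic_fetch_sub(&c->use, 1); }

/* Take a reference, then re-check identity. The confirmed check rejects an
 * object recycled into a new, not yet initialised entry with an equal key. */
static struct conn *conn_validate(struct conn *c, unsigned int key)
{
    if (c->dying || !conn_get_not_zero(c))
        return NULL;
    if (c->key != key || !c->confirmed) {
        conn_put(c);
        return NULL;
    }
    return c;
}
```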
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Egbert Eich
      drm/ast: Initialized data needed to map fbdev memory · 9f5bc010
      Egbert Eich authored
      commit 28fb4cb7 upstream.
      
      Due to a missing initialization there was no way to map fbdev memory.
      Thus for example using the Xserver with the fbdev driver failed.
      This fix adds initialization for fix.smem_start and fix.smem_len
      in the fb_info structure, which fixes this problem.
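As a rough illustration of why the missing fields matter, the sketch below models an mmap bounds check that only accepts requests inside the first smem_len bytes of framebuffer memory. With smem_len left at 0, every request is refused. struct fb_fix_sketch, fb_mmap_allowed() and fb_fix_init() are hypothetical simplifications, not the fbdev core code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-in for the fix part of struct fb_info; only the two
 * fields named in the commit are modelled. */
struct fb_fix_sketch {
    unsigned long smem_start; /* physical start of framebuffer memory */
    unsigned int smem_len;    /* length of framebuffer memory in bytes */
};

/* Simplified bounds check: the request [offset, offset + len) must lie
 * inside [0, smem_len). */
static bool fb_mmap_allowed(const struct fb_fix_sketch *fix,
                            size_t offset, size_t len)
{
    return len != 0 && offset <= fix->smem_len &&
           len <= fix->smem_len - offset;
}

/* Hypothetical init step: fill in the fields from the VRAM aperture. */
static void fb_fix_init(struct fb_fix_sketch *fix,
                        unsigned long vram_base, unsigned int vram_size)
{
    fix->smem_start = vram_base;
    fix->smem_len = vram_size;
}
```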
      Requested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Egbert Eich <eich@suse.de>
      [pulled from SuSE tree by me - airlied]
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Steven Rostedt (Red Hat)
      tracepoints: Do not trace when cpu is offline · c3e07084
      Steven Rostedt (Red Hat) authored
      commit f3775549 upstream.
      
      The tracepoint infrastructure uses RCU sched protection to enable and
      disable tracepoints safely. There are some instances where tracepoints are
      used in infrastructure code (like kfree()) that get called after a CPU is
      going offline, and perhaps when it is coming back online but hasn't been
      registered yet.
      
      This can produce the following warning:
      
       [ INFO: suspicious RCU usage. ]
       4.4.0-00006-g0fe53e8-dirty #34 Tainted: G S
       -------------------------------
       include/trace/events/kmem.h:141 suspicious rcu_dereference_check() usage!
      
       other info that might help us debug this:
      
       RCU used illegally from offline CPU!  rcu_scheduler_active = 1, debug_locks = 1
       no locks held by swapper/8/0.
      
       stack backtrace:
        CPU: 8 PID: 0 Comm: swapper/8 Tainted: G S              4.4.0-00006-g0fe53e8-dirty #34
        Call Trace:
        [c0000005b76c78d0] [c0000000008b9540] .dump_stack+0x98/0xd4 (unreliable)
        [c0000005b76c7950] [c00000000010c898] .lockdep_rcu_suspicious+0x108/0x170
        [c0000005b76c79e0] [c00000000029adc0] .kfree+0x390/0x440
        [c0000005b76c7a80] [c000000000055f74] .destroy_context+0x44/0x100
        [c0000005b76c7b00] [c0000000000934a0] .__mmdrop+0x60/0x150
        [c0000005b76c7b90] [c0000000000e3ff0] .idle_task_exit+0x130/0x140
        [c0000005b76c7c20] [c000000000075804] .pseries_mach_cpu_die+0x64/0x310
        [c0000005b76c7cd0] [c000000000043e7c] .cpu_die+0x3c/0x60
        [c0000005b76c7d40] [c0000000000188d8] .arch_cpu_idle_dead+0x28/0x40
        [c0000005b76c7db0] [c000000000101e6c] .cpu_startup_entry+0x50c/0x560
        [c0000005b76c7ed0] [c000000000043bd8] .start_secondary+0x328/0x360
        [c0000005b76c7f90] [c000000000008a6c] start_secondary_prolog+0x10/0x14
      
      This warning is not a false positive either. RCU is not protecting code that
      is being executed while the CPU is offline.
      
      Instead of playing "whack-a-mole(TM)" and adding conditional statements to
      the tracepoints we find that are used in this instance, simply add a
      cpu_online() test to the tracepoint code where the tracepoint will be
      ignored if the CPU is offline.
      
      Use of raw_smp_processor_id() is fine, as there should never be a case where
      the tracepoint code goes from running on a CPU that is online and suddenly
      gets migrated to a CPU that is offline.
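Below is a userspace sketch of the added guard: the tracepoint body returns early when the current CPU is not online. cpu_online_sketch() and raw_smp_processor_id_sketch() are hypothetical stand-ins for cpu_online() and raw_smp_processor_id(). The counter stands in for calling the registered probes.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH 4

static bool cpu_online_map[NR_CPUS_SKETCH] = { true, true, true, true };
static int current_cpu;
static int trace_events_emitted;

static bool cpu_online_sketch(int cpu) { return cpu_online_map[cpu]; }
static int raw_smp_processor_id_sketch(void) { return current_cpu; }

/* Besides the enabled check, skip the probes entirely when running on an
 * offline CPU, where RCU does not protect the tracepoint code. */
static void trace_event_sketch(bool tracepoint_enabled)
{
    if (!tracepoint_enabled)
        return;
    if (!cpu_online_sketch(raw_smp_processor_id_sketch()))
        return;
    trace_events_emitted++;
}
```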
      
      Link: http://lkml.kernel.org/r/1455387773-4245-1-git-send-email-kda@linux-powerpc.org
      Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
      Fixes: 97e1c18e ("tracing: Kernel Tracepoints")
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>