1. 16 Mar, 2017 7 commits
    • Steven Rostedt (VMware)'s avatar
      sched/rt: Add comments describing the RT IPI pull method · 3e777f99
      Steven Rostedt (VMware) authored
      While looking into optimizations for the RT scheduler IPI logic, I realized
      that the comments are lacking to describe it efficiently. It deserves a
      lengthy description describing its design.
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170228155030.30c69068@gandalf.local.home
      [ Small typographical edits. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3e777f99
    • Steven Rostedt (VMware)'s avatar
      sched/deadline: Use deadline instead of period when calculating overflow · 2317d5f1
      Steven Rostedt (VMware) authored
      I was testing Daniel's changes with his test case, and tweaked it a
      little. Instead of having the runtime equal to the deadline, I
      increased the deadline ten fold.
      
      Daniel's test case had:
      
      	attr.sched_runtime  = 2 * 1000 * 1000;		/* 2 ms */
      	attr.sched_deadline = 2 * 1000 * 1000;		/* 2 ms */
      	attr.sched_period   = 2 * 1000 * 1000 * 1000;	/* 2 s */
      
      To make it more interesting, I changed it to:
      
      	attr.sched_runtime  =  2 * 1000 * 1000;		/* 2 ms */
      	attr.sched_deadline = 20 * 1000 * 1000;		/* 20 ms */
      	attr.sched_period   =  2 * 1000 * 1000 * 1000;	/* 2 s */
      
      The results were rather surprising. The behavior that Daniel's patch
      was fixing came back. The task started using much more than .1% of the
      CPU. More like 20%.
      
      Looking into this I found that it was due to the dl_entity_overflow()
      constantly returning true. That's because it uses the relative period
      against relative runtime vs the absolute deadline against absolute
      runtime.
      
        runtime / (deadline - t) > dl_runtime / dl_period
      
      There's even a comment mentioning this, and saying that when relative
      deadline equals relative period, that the equation is the same as using
      deadline instead of period. That comment is backwards! What we really
      want is:
      
        runtime / (deadline - t) > dl_runtime / dl_deadline
      
      We care about if the runtime can make its deadline, not its period. And
      then we can say "when the deadline equals the period, the equation is
      the same as using dl_period instead of dl_deadline".
      
      After correcting this, now when the task gets enqueued, it can throttle
      correctly, and Daniel's fix to the throttling of sleeping deadline
      tasks works even when the runtime and deadline are not the same.
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarDaniel Bristot de Oliveira <bristot@redhat.com>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@santannapisa.it>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
      Link: http://lkml.kernel.org/r/02135a27f1ae3fe5fd032568a5a2f370e190e8d7.1488392936.git.bristot@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2317d5f1
    • Daniel Bristot de Oliveira's avatar
      sched/deadline: Throttle a constrained deadline task activated after the deadline · df8eac8c
      Daniel Bristot de Oliveira authored
      During the activation, CBS checks if it can reuse the current task's
      runtime and period. If the deadline of the task is in the past, CBS
      cannot use the runtime, and so it replenishes the task. This rule
      works fine for implicit deadline tasks (deadline == period), and the
      CBS was designed for implicit deadline tasks. However, a task with
      constrained deadline (deadine < period) might be awakened after the
      deadline, but before the next period. In this case, replenishing the
      task would allow it to run for runtime / deadline. As in this case
      deadline < period, CBS enables a task to run for more than the
      runtime / period. In a very loaded system, this can cause a domino
      effect, making other tasks miss their deadlines.
      
      To avoid this problem, in the activation of a constrained deadline
      task after the deadline but before the next period, throttle the
      task and set the replenishing timer to the begin of the next period,
      unless it is boosted.
      
      Reproducer:
      
       --------------- %< ---------------
        int main (int argc, char **argv)
        {
      	int ret;
      	int flags = 0;
      	unsigned long l = 0;
      	struct timespec ts;
      	struct sched_attr attr;
      
      	memset(&attr, 0, sizeof(attr));
      	attr.size = sizeof(attr);
      
      	attr.sched_policy   = SCHED_DEADLINE;
      	attr.sched_runtime  = 2 * 1000 * 1000;		/* 2 ms */
      	attr.sched_deadline = 2 * 1000 * 1000;		/* 2 ms */
      	attr.sched_period   = 2 * 1000 * 1000 * 1000;	/* 2 s */
      
      	ts.tv_sec = 0;
      	ts.tv_nsec = 2000 * 1000;			/* 2 ms */
      
      	ret = sched_setattr(0, &attr, flags);
      
      	if (ret < 0) {
      		perror("sched_setattr");
      		exit(-1);
      	}
      
      	for(;;) {
      		/* XXX: you may need to adjust the loop */
      		for (l = 0; l < 150000; l++);
      		/*
      		 * The ideia is to go to sleep right before the deadline
      		 * and then wake up before the next period to receive
      		 * a new replenishment.
      		 */
      		nanosleep(&ts, NULL);
      	}
      
      	exit(0);
        }
        --------------- >% ---------------
      
      On my box, this reproducer uses almost 50% of the CPU time, which is
      obviously wrong for a task with 2/2000 reservation.
      Signed-off-by: default avatarDaniel Bristot de Oliveira <bristot@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@santannapisa.it>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
      Link: http://lkml.kernel.org/r/edf58354e01db46bf42df8d2dd32418833f68c89.1488392936.git.bristot@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      df8eac8c
    • Daniel Bristot de Oliveira's avatar
      sched/deadline: Make sure the replenishment timer fires in the next period · 5ac69d37
      Daniel Bristot de Oliveira authored
      Currently, the replenishment timer is set to fire at the deadline
      of a task. Although that works for implicit deadline tasks because the
      deadline is equals to the begin of the next period, that is not correct
      for constrained deadline tasks (deadline < period).
      
      For instance:
      
      f.c:
       --------------- %< ---------------
      int main (void)
      {
      	for(;;);
      }
       --------------- >% ---------------
      
        # gcc -o f f.c
      
        # trace-cmd record -e sched:sched_switch                              \
      				   -e syscalls:sys_exit_sched_setattr   \
         chrt -d --sched-runtime  490000000					\
                 --sched-deadline 500000000					\
      	   --sched-period  1000000000 0 ./f
      
        # trace-cmd report | grep "{pid of ./f}"
      
      After setting parameters, the task is replenished and continue running
      until being throttled:
      
               f-11295 [003] 13322.113776: sys_exit_sched_setattr: 0x0
      
      The task is throttled after running 492318 ms, as expected:
      
               f-11295 [003] 13322.606094: sched_switch:   f:11295 [-1] R ==> watchdog/3:32 [0]
      
      But then, the task is replenished 500719 ms after the first
      replenishment:
      
          <idle>-0     [003] 13322.614495: sched_switch:   swapper/3:0 [120] R ==> f:11295 [-1]
      
      Running for 490277 ms:
      
               f-11295 [003] 13323.104772: sched_switch:   f:11295 [-1] R ==>  swapper/3:0 [120]
      
      Hence, in the first period, the task runs 2 * runtime, and that is a bug.
      
      During the first replenishment, the next deadline is set one period away.
      So the runtime / period starts to be respected. However, as the second
      replenishment took place in the wrong instant, the next replenishment
      will also be held in a wrong instant of time. Rather than occurring in
      the nth period away from the first activation, it is taking place
      in the (nth period - relative deadline).
      Signed-off-by: default avatarDaniel Bristot de Oliveira <bristot@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarLuca Abeni <luca.abeni@santannapisa.it>
      Reviewed-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Reviewed-by: default avatarJuri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
      Link: http://lkml.kernel.org/r/ac50d89887c25285b47465638354b63362f8adff.1488392936.git.bristot@redhat.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5ac69d37
    • Matt Fleming's avatar
      sched/loadavg: Use {READ,WRITE}_ONCE() for sample window · caeb5882
      Matt Fleming authored
      'calc_load_update' is accessed without any kind of locking and there's
      a clear assumption in the code that only a single value is read or
      written.
      
      Make this explicit by using READ_ONCE() and WRITE_ONCE(), and avoid
      unintentionally seeing multiple values, or having the load/stores
      split.
      
      Technically the loads in calc_global_*() don't require this since
      those are the only functions that update 'calc_load_update', but I've
      added the READ_ONCE() for consistency.
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Link: http://lkml.kernel.org/r/20170217120731.11868-3-matt@codeblueprint.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      caeb5882
    • Matt Fleming's avatar
      sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting · 6e5f32f7
      Matt Fleming authored
      If we crossed a sample window while in NO_HZ we will add LOAD_FREQ to
      the pending sample window time on exit, setting the next update not
      one window into the future, but two.
      
      This situation on exiting NO_HZ is described by:
      
        this_rq->calc_load_update < jiffies < calc_load_update
      
      In this scenario, what we should be doing is:
      
        this_rq->calc_load_update = calc_load_update		     [ next window ]
      
      But what we actually do is:
      
        this_rq->calc_load_update = calc_load_update + LOAD_FREQ   [ next+1 window ]
      
      This has the effect of delaying load average updates for potentially
      up to ~9seconds.
      
      This can result in huge spikes in the load average values due to
      per-cpu uninterruptible task counts being out of sync when accumulated
      across all CPUs.
      
      It's safe to update the per-cpu active count if we wake between sample
      windows because any load that we left in 'calc_load_idle' will have
      been zero'd when the idle load was folded in calc_global_load().
      
      This issue is easy to reproduce before,
      
        commit 9d89c257 ("sched/fair: Rewrite runnable load and utilization average tracking")
      
      just by forking short-lived process pipelines built from ps(1) and
      grep(1) in a loop. I'm unable to reproduce the spikes after that
      commit, but the bug still seems to be present from code review.
      Signed-off-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Fixes: commit 5167e8d5 ("sched/nohz: Rewrite and fix load-avg computation -- again")
      Link: http://lkml.kernel.org/r/20170217120731.11868-2-matt@codeblueprint.co.ukSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6e5f32f7
    • Wanpeng Li's avatar
      sched/deadline: Add missing update_rq_clock() in dl_task_timer() · dcc3b5ff
      Wanpeng Li authored
      The following warning can be triggered by hot-unplugging the CPU
      on which an active SCHED_DEADLINE task is running on:
      
       ------------[ cut here ]------------
       WARNING: CPU: 7 PID: 0 at kernel/sched/sched.h:833 replenish_dl_entity+0x71e/0xc40
       rq->clock_update_flags < RQCF_ACT_SKIP
       CPU: 7 PID: 0 Comm: swapper/7 Tainted: G    B           4.11.0-rc1+ #24
       Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
       Call Trace:
        <IRQ>
        dump_stack+0x85/0xc4
        __warn+0x172/0x1b0
        warn_slowpath_fmt+0xb4/0xf0
        ? __warn+0x1b0/0x1b0
        ? debug_check_no_locks_freed+0x2c0/0x2c0
        ? cpudl_set+0x3d/0x2b0
        replenish_dl_entity+0x71e/0xc40
        enqueue_task_dl+0x2ea/0x12e0
        ? dl_task_timer+0x777/0x990
        ? __hrtimer_run_queues+0x270/0xa50
        dl_task_timer+0x316/0x990
        ? enqueue_task_dl+0x12e0/0x12e0
        ? enqueue_task_dl+0x12e0/0x12e0
        __hrtimer_run_queues+0x270/0xa50
        ? hrtimer_cancel+0x20/0x20
        ? hrtimer_interrupt+0x119/0x600
        hrtimer_interrupt+0x19c/0x600
        ? trace_hardirqs_off+0xd/0x10
        local_apic_timer_interrupt+0x74/0xe0
        smp_apic_timer_interrupt+0x76/0xa0
        apic_timer_interrupt+0x93/0xa0
      
      The DL task will be migrated to a suitable later deadline rq once the DL
      timer fires and currnet rq is offline. The rq clock of the new rq should
      be updated. This patch fixes it by updating the rq clock after holding
      the new rq's rq lock.
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1488865888-15894-1-git-send-email-wanpeng.li@hotmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      dcc3b5ff
  2. 15 Mar, 2017 6 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 69eea5a4
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Four small fixes for this cycle:
      
         - followup fix from Neil for a fix that went in before -rc2, ensuring
           that we always see the full per-task bio_list.
      
         - fix for blk-mq-sched from me that ensures that we retain similar
           direct-to-issue behavior on running the queue.
      
         - fix from Sagi fixing a potential NULL pointer dereference in blk-mq
           on spurious CPU unplug.
      
         - a memory leak fix in writeback from Tahsin, fixing a case where
           device removal of a mounted device can leak a struct
           wb_writeback_work"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        blk-mq-sched: don't run the queue async from blk_mq_try_issue_directly()
        writeback: fix memory leak in wb_queue_work()
        blk-mq: Fix tagset reinit in the presence of cpu hot-unplug
        blk: Ensure users for current->bio_list can see the full list.
      69eea5a4
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 95422dec
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "This is a rather large set of fixes. The bulk are for lpfc correcting
        a lot of issues in the new NVME driver code which just went in in the
        merge window.
      
        The others are:
      
         - fix a hang in the vmware paravirt driver caused by incorrect
           handling of the new MSI vector allocation
      
         - long standing bug in storvsc, which recent block changes turned
           from being a harmless annoyance into a hang
      
         - yet more fallout (in mpt3sas) from the changes to device blocking
      
        The remainder are small fixes and updates"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (34 commits)
        scsi: lpfc: Add shutdown method for kexec
        scsi: storvsc: Workaround for virtual DVD SCSI version
        scsi: lpfc: revise version number to 11.2.0.10
        scsi: lpfc: code cleanups in NVME initiator discovery
        scsi: lpfc: code cleanups in NVME initiator base
        scsi: lpfc: correct rdp diag portnames
        scsi: lpfc: remove dead sli3 nvme code
        scsi: lpfc: correct double print
        scsi: lpfc: Rename LPFC_MAX_EQ_DELAY to LPFC_MAX_EQ_DELAY_EQID_CNT
        scsi: lpfc: Rework lpfc Kconfig for NVME options
        scsi: lpfc: add transport eh_timed_out reference
        scsi: lpfc: Fix eh_deadline setting for sli3 adapters.
        scsi: lpfc: add NVME exchange aborts
        scsi: lpfc: Fix nvme allocation bug on failed nvme_fc_register_localport
        scsi: lpfc: Fix IO submission if WQ is full
        scsi: lpfc: Fix NVME CMD IU byte swapped word 1 problem
        scsi: lpfc: Fix RCTL value on NVME LS request and response
        scsi: lpfc: Fix crash during Hardware error recovery on SLI3 adapters
        scsi: lpfc: fix missing spin_unlock on sql_list_lock
        scsi: lpfc: don't dereference dma_buf->iocbq before null check
        ...
      95422dec
    • Linus Torvalds's avatar
      Merge tag 'gfs2-4.11-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · aabcf5fc
      Linus Torvalds authored
      Pull gfs2 fix from Bob Peterson:
       "This is an emergency patch for 4.11-rc3
      
        The GFS2 developers uncovered a really nasty problem that can lead to
        random corruption and kernel panic, much like the last one. Andreas
        Gruenbacher wrote a simple one-line patch to fix the problem."
      
      * tag 'gfs2-4.11-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: Avoid alignment hole in struct lm_lockname
      aabcf5fc
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · defc7d75
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
      
       - self-test failure of crc32c on powerpc
      
       - regressions of ecb(aes) when used with xts/lrw in s5p-sss
      
       - a number of bugs in the omap RNG driver
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: s5p-sss - Fix spinlock recursion on LRW(AES)
        hwrng: omap - Do not access INTMASK_REG on EIP76
        hwrng: omap - use devm_clk_get() instead of of_clk_get()
        hwrng: omap - write registers after enabling the clock
        crypto: s5p-sss - Fix completing crypto request in IRQ handler
        crypto: powerpc - Fix initialisation of crc32c context
      defc7d75
    • Andreas Gruenbacher's avatar
      gfs2: Avoid alignment hole in struct lm_lockname · 28ea06c4
      Andreas Gruenbacher authored
      Commit 88ffbf3e switches to using rhashtables for glocks, hashing over
      the entire struct lm_lockname instead of its individual fields.  On some
      architectures, struct lm_lockname contains a hole of uninitialized
      memory due to alignment rules, which now leads to incorrect hash values.
      Get rid of that hole.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      CC: <stable@vger.kernel.org> #v4.3+
      28ea06c4
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ae50dfd6
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Ensure that mtu is at least IPV6_MIN_MTU in ipv6 VTI tunnel driver,
          from Steffen Klassert.
      
       2) Fix crashes when user tries to get_next_key on an LPM bpf map, from
          Alexei Starovoitov.
      
       3) Fix detection of VLAN fitlering feature for bnx2x VF devices, from
          Michal Schmidt.
      
       4) We can get a divide by zero when TCP socket are morphed into
          listening state, fix from Eric Dumazet.
      
       5) Fix socket refcounting bugs in skb_complete_wifi_ack() and
          skb_complete_tx_timestamp(). From Eric Dumazet.
      
       6) Use after free in dccp_feat_activate_values(), also from Eric
          Dumazet.
      
       7) Like bonding team needs to use ETH_MAX_MTU as netdev->max_mtu, from
          Jarod Wilson.
      
       8) Fix use after free in vrf_xmit(), from David Ahern.
      
       9) Don't do UDP Fragmentation Offload on IPComp ipsec packets, from
          Alexey Kodanev.
      
      10) Properly check napi_complete_done() return value in order to decide
          whether to re-enable IRQs or not in amd-xgbe driver, from Thomas
          Lendacky.
      
      11) Fix double free of hwmon device in marvell phy driver, from Andrew
          Lunn.
      
      12) Don't crash on malformed netlink attributes in act_connmark, from
          Etienne Noss.
      
      13) Don't remove routes with a higher metric in ipv6 ECMP route replace,
          from Sabrina Dubroca.
      
      14) Don't write into a cloned SKB in ipv6 fragmentation handling, from
          Florian Westphal.
      
      15) Fix routing redirect races in dccp and tcp, basically the ICMP
          handler can't modify the socket's cached route in it's locked by the
          user at this moment. From Jon Maxwell.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (108 commits)
        qed: Enable iSCSI Out-of-Order
        qed: Correct out-of-bound access in OOO history
        qed: Fix interrupt flags on Rx LL2
        qed: Free previous connections when releasing iSCSI
        qed: Fix mapping leak on LL2 rx flow
        qed: Prevent creation of too-big u32-chains
        qed: Align CIDs according to DORQ requirement
        mlxsw: reg: Fix SPVMLR max record count
        mlxsw: reg: Fix SPVM max record count
        net: Resend IGMP memberships upon peer notification.
        dccp: fix memory leak during tear-down of unsuccessful connection request
        tun: fix premature POLLOUT notification on tun devices
        dccp/tcp: fix routing redirect race
        ucc/hdlc: fix two little issue
        vxlan: fix ovs support
        net: use net->count to check whether a netns is alive or not
        bridge: drop netfilter fake rtable unconditionally
        ipv6: avoid write to a possibly cloned skb
        net: wimax/i2400m: fix NULL-deref at probe
        isdn/gigaset: fix NULL-deref at probe
        ...
      ae50dfd6
  3. 14 Mar, 2017 22 commits
  4. 13 Mar, 2017 5 commits
    • Nicolas Dichtel's avatar
      vxlan: fix ovs support · c80498e3
      Nicolas Dichtel authored
      The required changes in the function vxlan_dev_create() were missing
      in commit 8bcdc4f3.
      The vxlan device is not registered anymore after this patch and the error
      path causes an stack dump:
       WARNING: CPU: 3 PID: 1498 at net/core/dev.c:6713 rollback_registered_many+0x9d/0x3f0
      
      Fixes: 8bcdc4f3 ("vxlan: add changelink support")
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c80498e3
    • Andrey Vagin's avatar
      net: use net->count to check whether a netns is alive or not · 91864f58
      Andrey Vagin authored
      The previous idea was to check whether a net namespace is in
      net_exit_list or not. It doesn't work, because net->exit_list is used in
      __register_pernet_operations and __unregister_pernet_operations where
      all namespaces are added to a temporary list to make cleanup in a error
      case, so list_empty(&net->exit_list) always returns false.
      Reported-by: default avatarMantas Mikulėnas <grawity@gmail.com>
      Fixes: 002d8a1a ("net: skip genenerating uevents for network namespaces that are exiting")
      Signed-off-by: default avatarAndrei Vagin <avagin@openvz.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      91864f58
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v4.11-2' of git://git.infradead.org/linux-platform-drivers-x86 · 065f3e49
      Linus Torvalds authored
      Pull x86 platform driver updates from Darren Hart:
       "Asus fixes for the airplane LED and a long awaited fujitsu cleanup.
      
        asus-wmi:
         - Remove quirk_no_rfkill
         - Detect quirk_no_rfkill from the DSDT
      
        fujitsu-laptop:
         - remove redundant MODULE_ALIAS entries
         - autodetect LCD interface on all models
         - simplify acpi_bus_register_driver() error handling
         - remove redundant forward declarations
         - replace numeric values with constants
         - rename FUNC_RFKILL to FUNC_FLAGS
         - make platform-related variables match naming convention
         - replace "hotkey" with "laptop" in symbol names
         - clearly denote backlight-related symbols"
      
      * tag 'platform-drivers-x86-v4.11-2' of git://git.infradead.org/linux-platform-drivers-x86:
        platform/x86: asus-wmi: Remove quirk_no_rfkill
        platform/x86: asus-wmi: Detect quirk_no_rfkill from the DSDT
        platform/x86: fujitsu-laptop: remove redundant MODULE_ALIAS entries
        platform/x86: fujitsu-laptop: autodetect LCD interface on all models
        platform/x86: fujitsu-laptop: simplify acpi_bus_register_driver() error handling
        platform/x86: fujitsu-laptop: remove redundant forward declarations
        platform/x86: fujitsu-laptop: replace numeric values with constants
        platform/x86: fujitsu-laptop: rename FUNC_RFKILL to FUNC_FLAGS
        platform/x86: fujitsu-laptop: make platform-related variables match naming convention
        platform/x86: fujitsu-laptop: replace "hotkey" with "laptop" in symbol names
        platform/x86: fujitsu-laptop: clearly denote backlight-related symbols
      065f3e49
    • Florian Westphal's avatar
      bridge: drop netfilter fake rtable unconditionally · a13b2082
      Florian Westphal authored
      Andreas reports kernel oops during rmmod of the br_netfilter module.
      Hannes debugged the oops down to a NULL rt6info->rt6i_indev.
      
      Problem is that br_netfilter has the nasty concept of adding a fake
      rtable to skb->dst; this happens in a br_netfilter prerouting hook.
      
      A second hook (in bridge LOCAL_IN) is supposed to remove these again
      before the skb is handed up the stack.
      
      However, on module unload hooks get unregistered which means an
      skb could traverse the prerouting hook that attaches the fake_rtable,
      while the 'fake rtable remove' hook gets removed from the hooklist
      immediately after.
      
      Fixes: 34666d46 ("netfilter: bridge: move br_netfilter out of the core")
      Reported-by: default avatarAndreas Karis <akaris@redhat.com>
      Debugged-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Acked-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a13b2082
    • Florian Westphal's avatar
      ipv6: avoid write to a possibly cloned skb · 79e49503
      Florian Westphal authored
      ip6_fragment, in case skb has a fraglist, checks if the
      skb is cloned.  If it is, it will move to the 'slow path' and allocates
      new skbs for each fragment.
      
      However, right before entering the slowpath loop, it updates the
      nexthdr value of the last ipv6 extension header to NEXTHDR_FRAGMENT,
      to account for the fragment header that will be inserted in the new
      ipv6-fragment skbs.
      
      In case original skb is cloned this munges nexthdr value of another
      skb.  Avoid this by doing the nexthdr update for each of the new fragment
      skbs separately.
      
      This was observed with tcpdump on a bridge device where netfilter ipv6
      reassembly is active:  tcpdump shows malformed fragment headers as
      the l4 header (icmpv6, tcp, etc). is decoded as a fragment header.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: default avatarAndreas Karis <akaris@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      79e49503