1. 10 Sep, 2015 18 commits
  2. 09 Sep, 2015 17 commits
    • net: ipv6: use common fib_default_rule_pref · f53de1e9
      Phil Sutter authored
      This switches IPv6 policy routing to use the shared
      fib_default_rule_pref() function of IPv4 and DECnet. It is also used in
      multicast routing for IPv4 as well as IPv6.
      
      The motivation for this patch is a complaint about iproute2 behaving
      inconsistently between IPv4 and IPv6 when adding policy rules: Formerly,
      IPv6 rules were assigned a fixed priority of 0x3FFF whereas for IPv4 the
      assigned priority value was decreased with each rule added.
      
      Since all users of the default_pref field have now been converted to
      assign the generic function fib_default_rule_pref(), fib_nl_newrule()
      may just use it directly instead. Therefore get rid of the function
      pointer altogether and make fib_default_rule_pref() static, as it's not
      used outside fib_rules.c anymore.
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f53de1e9
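The decreasing priority described for IPv4 above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: the array-based `rule_list` and the helpers `default_pref()` and `add_rule()` are hypothetical names, not the kernel's data structures, and the policy shown (one less than the preference of the second rule in the sorted list, else 0) is an assumption about how the shared function picks its value.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RULES 16

/* Hypothetical stand-in for the kernel's rule list, kept sorted by pref. */
struct rule_list {
	uint32_t pref[MAX_RULES];
	size_t count;
};

/* Assumed policy: one less than the pref of the second rule, else 0. */
static uint32_t default_pref(const struct rule_list *rl)
{
	if (rl->count >= 2 && rl->pref[1] != 0)
		return rl->pref[1] - 1;
	return 0;
}

/* Insert a rule with the default pref, keeping the list sorted. */
static uint32_t add_rule(struct rule_list *rl)
{
	uint32_t pref = default_pref(rl);
	size_t i = rl->count;

	if (rl->count == MAX_RULES)
		return pref; /* list full: report the pref, insert nothing */

	/* Shift larger prefs up by one to open a slot for the new rule. */
	while (i > 0 && rl->pref[i - 1] > pref) {
		rl->pref[i] = rl->pref[i - 1];
		i--;
	}
	rl->pref[i] = pref;
	rl->count++;
	return pref;
}
```

Starting from three rules with preferences 0, 32766 and 32767, successive calls to `add_rule()` in this model hand out 32765, then 32764, so each added rule receives a lower value than the previous one.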
    • net: ethoc: Remove unnecessary #ifdef CONFIG_OF · 444c5f92
      Tobias Klauser authored
      For !CONFIG_OF of_get_property() is defined to always return NULL. Thus
      there's no need to protect the call to of_get_property() with #ifdef
      CONFIG_OF.
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      444c5f92
    • net: dsa: bcm_sf2: Fix 64-bits register writes · 03679a14
      Florian Fainelli authored
      The macro to write 64-bit quantities to the 32-bit registers swapped
      the value and offset arguments. We want to preserve the ordering of the
      arguments with respect to how writel() is implemented, for instance:
      value first, offset/base second.
      
      Fixes: 246d7f77 ("net: dsa: add Broadcom SF2 switch driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      03679a14
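The writel()-style argument order mentioned above (value first, offset second) can be illustrated with a small user-space sketch. The register file `regs`, the helpers `reg_writel()` and `reg_write64()`, and the low-word-first layout are all assumptions made for illustration; the driver's actual macro and register layout are not shown in the commit message.

```c
#include <stdint.h>

/* Hypothetical register file standing in for memory-mapped I/O. */
static uint32_t regs[8];

/* 32-bit write, writel()-style: value first, offset second. */
static void reg_writel(uint32_t val, unsigned int off)
{
	regs[off / 4] = val;
}

/*
 * 64-bit write split across two 32-bit registers. The argument order
 * follows reg_writel(): value first, offset second. Assumes the low
 * word lives at 'off' and the high word at 'off + 4'.
 */
static void reg_write64(uint64_t val, unsigned int off)
{
	reg_writel((uint32_t)val, off);
	reg_writel((uint32_t)(val >> 32), off + 4);
}
```

Keeping both helpers in the same order means a caller cannot silently pass an offset where a value is expected when moving between the 32-bit and 64-bit variants.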
    • bpf: fix out of bounds access in verifier log · 687f0715
      Alexei Starovoitov authored
      When the verifier log is enabled, print_bpf_insn() does
      bpf_alu_string[BPF_OP(insn->code) >> 4]
      and
      bpf_jmp_string[BPF_OP(insn->code) >> 4]
      where BPF_OP is a 4-bit instruction opcode.
      Malformed insns can cause out of bounds access.
      Fix it by sizing arrays appropriately.
      
      The bug was found by clang address sanitizer with libfuzzer.
      Reported-by: Yonghong Song <yhs@plumgrid.com>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      687f0715
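The idea of sizing a lookup table to cover every value a 4-bit index can take might look like the sketch below. The table name `alu_string`, the helper `alu_op_name()` and the "(unknown)" fallback are illustrative assumptions, not the verifier's code; the opcode strings are only sample entries.

```c
#include <stdint.h>

/* Operation bits of a BPF opcode: the upper four bits of the code byte. */
#define BPF_OP(code) ((code) & 0xf0)

/*
 * Sized to 16 so that every possible value of BPF_OP(code) >> 4 is a
 * valid index. Entries without an initializer are NULL.
 */
static const char *const alu_string[16] = {
	[0x00 >> 4] = "+=",
	[0x10 >> 4] = "-=",
	[0x20 >> 4] = "*=",
	[0x30 >> 4] = "/=",
	[0x40 >> 4] = "|=",
	[0x50 >> 4] = "&=",
	[0x60 >> 4] = "<<=",
	[0x70 >> 4] = ">>=",
	[0x80 >> 4] = "neg",
	[0x90 >> 4] = "%=",
	[0xa0 >> 4] = "^=",
	[0xb0 >> 4] = "=",
	[0xc0 >> 4] = "s>>=",
	[0xd0 >> 4] = "endian",
};

/* Look up an operation name; malformed opcodes stay inside the array. */
static const char *alu_op_name(uint8_t code)
{
	const char *s = alu_string[BPF_OP(code) >> 4];

	return s ? s : "(unknown)";
}
```

With only 14 named entries and an implicitly sized array, indices 14 and 15 would read past the end; declaring the size explicitly turns those reads into NULL entries that the caller can handle.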
    • ipv6: fix multipath route replace error recovery · 6b9ea5a6
      Roopa Prabhu authored
      Problem:
      The ECMP route replace support for IPv6 in the kernel deletes the
      existing ECMP route too early, i.e. when it installs the first nexthop.
      If there is an error in installing the subsequent nexthops, it's too late
      to recover the already deleted existing route, leaving the FIB
      in an inconsistent state.
      
      This patch reduces the possibility of this by doing the following:
      a) Changes the existing multipath route add code to a two-stage process:
         build rt6_infos + insert them. The rt6_info creation code of
         ip6_route_add is moved into ip6_route_info_create.
      b) This ensures that most errors are caught while building the rt6_infos
         and we fail early.
      c) Separates the multipath add and del code, because add needs the
         special two-stage mode in a) and delete essentially does not care.
      d) In any event, if the code fails while inserting a route again, a
         warning is printed (this should be unlikely).
      
      Before the patch:
      $ip -6 route show
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
      
      /* Try replacing the route with a duplicate nexthop */
      $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
      fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
      swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
      RTNETLINK answers: File exists
      
      $ip -6 route show
      /* previously added ecmp route 3000:1000:1000:1000::2 disappears from
       * kernel */
      
      After the patch:
      $ip -6 route show
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
      
      /* Try replacing the route with a duplicate nexthop */
      $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
      fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
      swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
      RTNETLINK answers: File exists
      
      $ip -6 route show
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
      3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
      
      Fixes: 27596472 ("ipv6: fix ECMP route replacement")
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6b9ea5a6
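The two-stage "build, then insert" idea from this commit message can be sketched in user-space C as follows. Everything here is hypothetical: `struct mp_route`, `mp_build()` and `mp_replace()` are illustrative names, next hops are reduced to integers, and the duplicate check merely stands in for any error that can be caught while building.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NH 8

/* Hypothetical multipath route: a set of next-hop identifiers. */
struct mp_route {
	int nh[MAX_NH];
	size_t count;
};

/* Stage 1: build the new route, failing before anything is changed. */
static bool mp_build(struct mp_route *staged, const int *nh, size_t n)
{
	if (n == 0 || n > MAX_NH)
		return false;
	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < i; j++)
			if (nh[i] == nh[j])
				return false; /* duplicate next hop */
		staged->nh[i] = nh[i];
	}
	staged->count = n;
	return true;
}

/* Stage 2: replace the installed route only once staging succeeded. */
static bool mp_replace(struct mp_route *installed, const int *nh, size_t n)
{
	struct mp_route staged = {0};

	if (!mp_build(&staged, nh, n))
		return false; /* installed route left untouched */
	*installed = staged;
	return true;
}
```

In this model a replace request containing a duplicate next hop is rejected during staging, so the previously installed three-way route is still present afterwards, mirroring the "After the patch" output above.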
    • ebpf: fix fd refcount leaks related to maps in bpf syscall · 592867bf
      Daniel Borkmann authored
      We may already have gotten a proper fd struct through fdget(), so
      whenever we return at the end of a map operation, we need to call
      fdput(). However, each map operation from syscall side first probes
      CHECK_ATTR() to verify that unused fields in the bpf_attr union are
      zero.
      
      In case of malformed input, we return with error, but the lookup to
      the map_fd was already performed at that time, so that we return
      without a corresponding fdput(). Fix it by performing an fdget()
      only right before bpf_map_get(). The fdget() invocation on maps in
      the verifier is not affected.
      
      Fixes: db20fd2b ("bpf: add lookup/update/delete/iterate methods to BPF maps")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      592867bf
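The ordering fix described above (validate the request first, take the reference only right before it is needed) can be modelled with a plain counter. The names `handle_get()`, `handle_put()`, `struct request` and `map_op()` are hypothetical stand-ins for illustration, not the syscall's real helpers.

```c
#include <errno.h>

/* Hypothetical reference count standing in for the fd reference. */
static int handle_refs;

static void handle_get(void) { handle_refs++; }
static void handle_put(void) { handle_refs--; }

/* Hypothetical request: 'unused' must be zero, like an unused attr field. */
struct request {
	int map_fd;
	int unused;
};

static int map_op(const struct request *req)
{
	/* Validate first: an early return here holds no reference. */
	if (req->unused != 0)
		return -EINVAL;

	/* Take the reference only right before it is needed. */
	handle_get();
	/* ... the map operation itself would run here ... */
	handle_put();
	return 0;
}
```

Because the validation happens before `handle_get()`, a malformed request returns `-EINVAL` with the counter unchanged, so there is no path that acquires a reference without releasing it.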
    • RDS: verify the underlying transport exists before creating a connection · 74e98eb0
      Sasha Levin authored
      There was no verification that an underlying transport exists when creating
      a connection, which would cause a NULL pointer dereference.
      
      It might happen on sockets that weren't properly bound before attempting to
      send a message, which will cause a NULL ptr deref:
      
      [135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
      [135546.051270] Modules linked in:
      [135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
      [135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
      [135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
      [135546.055666] RSP: 0018:ffff8800bc70fab0  EFLAGS: 00010202
      [135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
      [135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
      [135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
      [135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
      [135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
      [135546.061668] FS:  00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
      [135546.062836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
      [135546.064723] Stack:
      [135546.065048]  ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
      [135546.066247]  0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
      [135546.067438]  1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
      [135546.068629] Call Trace:
      [135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
      [135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
      [135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
      [135546.071981] rds_sendmsg (net/rds/send.c:1058)
      [135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
      [135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
      [135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
      [135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
      [135546.076349] ? __might_fault (mm/memory.c:3795)
      [135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
      [135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
      [135546.078856] SYSC_sendto (net/socket.c:1657)
      [135546.079596] ? SYSC_connect (net/socket.c:1628)
      [135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
      [135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
      [135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
      [135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
      [135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
      [135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
      [135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      74e98eb0
    • xen-netback: require fewer guest Rx slots when not using GSO · 1d5d4852
      David Vrabel authored
      Commit f48da8b1 (xen-netback: fix
      unlimited guest Rx internal queue and carrier flapping) introduced a
      regression.
      
      The PV frontend in IPXE only places 4 requests on the guest Rx ring.
      Since netback required at least (MAX_SKB_FRAGS + 1) slots, IPXE could
      not receive any packets.
      
      a) If GSO is not enabled on the VIF, fewer guest Rx slots are required
         for the largest possible packet.  Calculate the required slots
         based on the maximum GSO size or the MTU.
      
         This calculation of the number of required slots relies on
         1650d545 (xen-netback: always fully coalesce guest Rx packets)
         which is present in 4.0-rc1 and later.
      
      b) Reduce the Rx stall detection to checking for at least one
         available Rx request.  This is fine since we're predominately
         concerned with detecting interfaces which are down and thus have
         zero available Rx requests.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Reviewed-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1d5d4852
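The slot calculation in point a) could be sketched as below. This is an assumption-laden model rather than the driver's code: `struct vif_cfg`, `pages_for()` and `rx_slots_needed()` are hypothetical names, a 4096-byte page is assumed, and the rule "one slot per page of the largest possible packet" is inferred from the description of fully coalesced Rx packets.

```c
#include <stdbool.h>

#define PAGE_SIZE_BYTES 4096u /* assumed page size */

/* Hypothetical view of the interface settings that matter here. */
struct vif_cfg {
	bool gso_enabled;
	unsigned int gso_max_size; /* bytes */
	unsigned int mtu;          /* bytes */
};

/* Round-up division: pages needed to hold 'bytes'. */
static unsigned int pages_for(unsigned int bytes)
{
	return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Assumed rule: one slot per page of the largest possible packet. */
static unsigned int rx_slots_needed(const struct vif_cfg *cfg)
{
	if (cfg->gso_enabled)
		return pages_for(cfg->gso_max_size);
	return pages_for(cfg->mtu);
}
```

Under these assumptions a non-GSO interface with a 1500-byte MTU needs a single slot, which fits comfortably within the 4 requests that the frontend mentioned above places on the ring, whereas a 64 KiB GSO size would need 16.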
    • Merge branch 'cxgb4-fixes' · 9b57ab8b
      David S. Miller authored
      Hariprasad Shenai says:
      
      ====================
      cxgb4: Fix tx flit calculation and wc stat configuration
      
      This patch series fixes the following:
      Patch 1/2 fixes tx flit calculation, which if wrong can lead to
      stall, hang, data corruption, write combining failure. Patch 2/2 fixes
      PCI-E write combining stats configuration.
      
      This patch series has been created against net tree and includes
      patches on cxgb4 driver.
      
      We have included all the maintainers of respective drivers. Kindly review
      the change and let us know in case of any review comments.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b57ab8b
    • cxgb4: Fix for write-combining stats configuration · 2a485cf7
      Hariprasad Shenai authored
      The write-combining configuration register SGE_STAT_CFG_A needs to
      be configured after FW initializes the adapter, else FW will reset
      the configuration.
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a485cf7
    • cxgb4: Fix tx flit calculation · fd1754fb
      Hariprasad Shenai authored
      Commit 0aac3f56 ("cxgb4: Add comment for calculate tx flits
      and sge length code") introduced a regression where the tx flit
      calculation goes wrong, which can lead to data corruption, hang, stall
      and write-combining failure. Fix it.
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd1754fb
    • net: eth: altera: Fix the initial device operstate · d43cefcd
      Atsushi Nemoto authored
      Call netif_carrier_off() prior to register_netdev(), otherwise
      userspace can see incorrect link state.
      Signed-off-by: Atsushi Nemoto <nemoto@toshiba-tops.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d43cefcd
    • net: tipc: fix stall during bclink wakeup procedure · 7845989c
      Kolmakov Dmitriy authored
      If an attempt to wake up users of the broadcast link is made when there
      is not enough space in the send queue, then it may hang inside the
      tipc_sk_rcv() function, since the loop breaks only after the wakeup
      queue becomes empty. This can lead to a complete CPU stall with the
      following message generated by RCU:
      
      INFO: rcu_sched self-detected stall on CPU { 0}  (t=2101 jiffies
      					g=54225 c=54224 q=11465)
      Task dump for CPU 0:
      tpch            R  running task        0 39949  39948 0x0000000a
       ffffffff818536c0 ffff88181fa037a0 ffffffff8106a4be 0000000000000000
       ffffffff818536c0 ffff88181fa037c0 ffffffff8106d8a8 ffff88181fa03800
       0000000000000001 ffff88181fa037f0 ffffffff81094a50 ffff88181fa15680
      Call Trace:
       <IRQ>  [<ffffffff8106a4be>] sched_show_task+0xae/0x120
       [<ffffffff8106d8a8>] dump_cpu_task+0x38/0x40
       [<ffffffff81094a50>] rcu_dump_cpu_stacks+0x90/0xd0
       [<ffffffff81097c3b>] rcu_check_callbacks+0x3eb/0x6e0
       [<ffffffff8106e53f>] ? account_system_time+0x7f/0x170
       [<ffffffff81099e64>] update_process_times+0x34/0x60
       [<ffffffff810a84d1>] tick_sched_handle.isra.18+0x31/0x40
       [<ffffffff810a851c>] tick_sched_timer+0x3c/0x70
       [<ffffffff8109a43d>] __run_hrtimer.isra.34+0x3d/0xc0
       [<ffffffff8109aa95>] hrtimer_interrupt+0xc5/0x1e0
       [<ffffffff81030d52>] ? native_smp_send_reschedule+0x42/0x60
       [<ffffffff81032f04>] local_apic_timer_interrupt+0x34/0x60
       [<ffffffff810335bc>] smp_apic_timer_interrupt+0x3c/0x60
       [<ffffffff8165a3fb>] apic_timer_interrupt+0x6b/0x70
       [<ffffffff81659129>] ? _raw_spin_unlock_irqrestore+0x9/0x10
       [<ffffffff8107eb9f>] __wake_up_sync_key+0x4f/0x60
       [<ffffffffa313ddd1>] tipc_write_space+0x31/0x40 [tipc]
       [<ffffffffa313dadf>] filter_rcv+0x31f/0x520 [tipc]
       [<ffffffffa313d699>] ? tipc_sk_lookup+0xc9/0x110 [tipc]
       [<ffffffff81659259>] ? _raw_spin_lock_bh+0x19/0x30
       [<ffffffffa314122c>] tipc_sk_rcv+0x2dc/0x3e0 [tipc]
       [<ffffffffa312e7ff>] tipc_bclink_wakeup_users+0x2f/0x40 [tipc]
       [<ffffffffa313ce26>] tipc_node_unlock+0x186/0x190 [tipc]
       [<ffffffff81597c1c>] ? kfree_skb+0x2c/0x40
       [<ffffffffa313475c>] tipc_rcv+0x2ac/0x8c0 [tipc]
       [<ffffffffa312ff58>] tipc_l2_rcv_msg+0x38/0x50 [tipc]
       [<ffffffff815a76d3>] __netif_receive_skb_core+0x5a3/0x950
       [<ffffffff815a98d3>] __netif_receive_skb+0x13/0x60
       [<ffffffff815a993e>] netif_receive_skb_internal+0x1e/0x90
       [<ffffffff815aa138>] napi_gro_receive+0x78/0xa0
       [<ffffffffa07f93f4>] tg3_poll_work+0xc54/0xf40 [tg3]
       [<ffffffff81597c8c>] ? consume_skb+0x2c/0x40
       [<ffffffffa07f9721>] tg3_poll_msix+0x41/0x160 [tg3]
       [<ffffffff815ab0f2>] net_rx_action+0xe2/0x290
       [<ffffffff8104b92a>] __do_softirq+0xda/0x1f0
       [<ffffffff8104bc26>] irq_exit+0x76/0xa0
       [<ffffffff81004355>] do_IRQ+0x55/0xf0
       [<ffffffff8165a12b>] common_interrupt+0x6b/0x6b
       <EOI>
      
      The issue occurs only when tipc_sk_rcv() is used to wake up postponed
      senders:
      
      	tipc_bclink_wakeup_users()
      		// wakeupq - is a queue which consists of special
      		// 		 messages with SOCK_WAKEUP type.
      		tipc_sk_rcv(wakeupq)
      			...
      			while (skb_queue_len(inputq)) {
      				filter_rcv(skb)
      					// Here the type of message is checked
      					// and if it is SOCK_WAKEUP then
      					// it tries to wake up a sender.
      					tipc_write_space(sk)
      						wake_up_interruptible_sync_poll()
      			}
      
      After the sender thread is woken up it can gain control and attempt
      to send a message. But if there is not enough space in the send
      queue, it will call the link_schedule_user() function, which puts a
      message of type SOCK_WAKEUP on the wakeup queue and puts the sender to
      sleep. Thus the size of the queue does not actually change and the
      while() loop never exits.

      The approach I propose is to wake up only senders for which there is
      enough space in the send queue, so the described issue can't occur.
      Moreover, the same approach is already used to wake up senders on
      unicast links.
      
      I ran into the issue in our product code, but to reproduce it
      I changed a benchmark test application (from
      tipcutils/demos/benchmark) to perform the following scenario:
      	1. Run 64 instances of the test application (nodes). This can be
      	   done on one physical machine.
      	2. Each application connects to all others using TIPC sockets in
      	   RDM mode.
      	3. When setup is done, all nodes simultaneously start sending
      	   broadcast messages.
      	4. Everything hangs.

      The issue is reproducible only when congestion on the broadcast link
      occurs. For example, when there are only 8 nodes it works fine since
      congestion doesn't occur. The send queue limit is 40 in my case (I use a
      critical importance level) and when 64 nodes send a message at the
      same moment, congestion occurs every time.
      Signed-off-by: Dmitry S Kolmakov <kolmakov.dmitriy@huawei.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7845989c
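The proposed approach, waking only those senders for which the send queue has room, can be modelled with a bounded loop. This is a simplified sketch: `struct waiter` and `wake_senders()` are hypothetical names, and queue space is reduced to a single counter instead of per-importance limits.

```c
#include <stddef.h>

/* Hypothetical pending sender: how many queue slots its message needs. */
struct waiter {
	unsigned int needed;
	int woken;
};

/*
 * Wake waiters in order, but only while the send queue has room for
 * their message. Each waiter is visited at most once, so the loop
 * ends even if no waiter can be woken. Returns the number woken.
 */
static size_t wake_senders(struct waiter *w, size_t n, unsigned int free_slots)
{
	size_t woken = 0;

	for (size_t i = 0; i < n; i++) {
		if (w[i].needed > free_slots)
			break; /* no room: leave this and later waiters asleep */
		free_slots -= w[i].needed;
		w[i].woken = 1;
		woken++;
	}
	return woken;
}
```

Unlike the loop shown in the call chain above, termination here does not depend on the wakeup queue becoming empty: a sender that would immediately re-queue itself is simply not woken.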
    • dm9000: fix a typo · 7b901873
      Barry Song authored
      Signed-off-by: Barry Song <Baohua.Song@csr.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7b901873
    • net: bridge: remove unnecessary switchdev include · 7a577f01
      Vivien Didelot authored
      Remove the unnecessary switchdev.h include from br_netlink.c.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7a577f01
    • net: bridge: check __vlan_vid_del for error · bf361ad3
      Vivien Didelot authored
      Since __vlan_del can return an error code, change its inner function
      __vlan_vid_del to return any error from switchdev_port_obj_del.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bf361ad3
    • net: dsa: bcm_sf2: Fix ageing conditions and operation · 39797a27
      Florian Fainelli authored
      The comparison check between cur_hw_state and hw_state is currently
      invalid because cur_hw_state is right shifted by G_MISTP_SHIFT, while
      hw_state is not, so we end up comparing bits 2:0 with bits 7:5, which is
      going to cause an additional ageing to occur. Fix this by not shifting
      cur_hw_state while reading it, but instead, mask the value with the
      appropriately shifted bitmask.
      
      The other problem with the fast-ageing process is that we did not set
      the EN_AGE_DYNAMIC bit to request the ageing to occur for dynamically
      learned MAC addresses. Finally, write back 0 to the FAST_AGE_CTRL
      register to avoid leaving spurious bits set from one operation to the
      other.
      
      Fixes: 12f460f2 ("net: dsa: bcm_sf2: add HW bridging support")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      39797a27
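The shifted-versus-unshifted comparison described in the first paragraph of this commit can be reproduced in isolation. The macro names `STATE_SHIFT` and `STATE_MASK` and both helper functions are hypothetical; the only facts taken from the message are that the field occupies bits 7:5 and that the current state was shifted down before being compared with an unshifted value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field layout: a 3-bit state held in bits 7:5 of the register. */
#define STATE_SHIFT 5
#define STATE_MASK  0x7u

/* Buggy form: shifts the register down, then compares with bits 7:5. */
static bool state_changed_buggy(uint32_t reg, uint32_t hw_state)
{
	uint32_t cur = (reg >> STATE_SHIFT) & STATE_MASK;

	return cur != hw_state;
}

/* Fixed form: mask in place so both operands use bits 7:5. */
static bool state_changed_fixed(uint32_t reg, uint32_t hw_state)
{
	uint32_t cur = reg & (STATE_MASK << STATE_SHIFT);

	return cur != hw_state;
}
```

For a register already holding the requested state, the buggy form still reports a change because it compares a value in bits 2:0 against one in bits 7:5, which is what would trigger the additional ageing.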
  3. 08 Sep, 2015 2 commits
    • device property: Don't overwrite addr when failing in device_get_mac_address · 5b902d6f
      Julien Grall authored
      The function device_get_mac_address tries different property names
      in order to get the MAC address. To check the return value, the variable
      addr (which contains the buffer passed by the caller) is re-used. This
      means that if the previous property is not found, the next property will
      be read using a NULL buffer.

      Therefore it's only possible to retrieve the MAC address if the node
      contains a property "mac-address". Fix it by using a temporary buffer
      for the return value.

      This was introduced by commit 4c96b7dc
      "Add a matching set of device_ functions for determining mac/phy"
      Signed-off-by: Julien Grall <julien.grall@citrix.com>
      Cc: Jeremy Linton <jeremy.linton@arm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5b902d6f
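The fix pattern, keeping the lookup result in a separate variable so the caller's buffer pointer is never overwritten, might look like this in user-space C. The property store, `lookup_mac()` and `get_mac()` are hypothetical, and the fallback property names other than "mac-address" are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical property store: only "local-mac-address" is present. */
static const unsigned char stored_mac[ETH_ALEN] = {0x02, 0, 0, 0, 0, 0x01};

/* Copy the property into 'buf' and return it, or NULL if not found. */
static unsigned char *lookup_mac(const char *name, unsigned char *buf)
{
	if (buf == NULL || strcmp(name, "local-mac-address") != 0)
		return NULL;
	memcpy(buf, stored_mac, ETH_ALEN);
	return buf;
}

/* Try each property name in turn without clobbering the caller's buffer. */
static unsigned char *get_mac(unsigned char *addr)
{
	static const char *const names[] = {
		"mac-address", "local-mac-address", "address",
	};
	unsigned char *res; /* temporary result; 'addr' is never reassigned */

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		res = lookup_mac(names[i], addr);
		if (res)
			return res;
	}
	return NULL;
}
```

Had the result of the first failed lookup been assigned back to `addr`, the second lookup would have received NULL and failed too, which matches the symptom that only "mac-address" ever worked.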
    • usbnet: Fix a race between usbnet_stop() and the BH · fcb0bb6a
      Eugene Shatokhin authored
      The race may happen when a device (e.g. YOTA 4G LTE Modem) is
      unplugged while the system is downloading a large file from the Net.
      
      Hardware breakpoints and Kprobes with delays were used to confirm that
      the race does actually happen.
      
      The race is on skb_queue ('next' pointer) between usbnet_stop()
      and rx_complete(), which, in turn, calls usbnet_bh().
      
      Here is a part of the call stack with the code where the changes to the
      queue happen. The line numbers are for the kernel 4.1.0:
      
      *0 __skb_unlink (skbuff.h:1517)
          prev->next = next;
      *1 defer_bh (usbnet.c:430)
          spin_lock_irqsave(&list->lock, flags);
          old_state = entry->state;
          entry->state = state;
          __skb_unlink(skb, list);
          spin_unlock(&list->lock);
          spin_lock(&dev->done.lock);
          __skb_queue_tail(&dev->done, skb);
          if (dev->done.qlen == 1)
              tasklet_schedule(&dev->bh);
          spin_unlock_irqrestore(&dev->done.lock, flags);
      *2 rx_complete (usbnet.c:640)
          state = defer_bh(dev, skb, &dev->rxq, state);
      
      At the same time, the following code repeatedly checks if the queue is
      empty and reads these values concurrently with the above changes:
      
      *0  usbnet_terminate_urbs (usbnet.c:765)
          /* maybe wait for deletions to finish. */
          while (!skb_queue_empty(&dev->rxq)
              && !skb_queue_empty(&dev->txq)
              && !skb_queue_empty(&dev->done)) {
                  schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
                  set_current_state(TASK_UNINTERRUPTIBLE);
                  netif_dbg(dev, ifdown, dev->net,
                        "waited for %d urb completions\n", temp);
          }
      *1  usbnet_stop (usbnet.c:806)
          if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
              usbnet_terminate_urbs(dev);
      
      As a result, it is possible, for example, that the skb is removed from
      dev->rxq by __skb_unlink() before the check
      "!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is
      also possible in this case that the skb is added to dev->done queue
      after "!skb_queue_empty(&dev->done)" is checked. So
      usbnet_terminate_urbs() may stop waiting and return while dev->done
      queue still has an item.
      
      Locking in defer_bh() and usbnet_terminate_urbs() was revisited to avoid
      this race.
      Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
      Reviewed-by: Bjørn Mork <bjorn@mork.no>
      Acked-by: Oliver Neukum <oneukum@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fcb0bb6a
  4. 07 Sep, 2015 3 commits