1. 07 Mar, 2013 21 commits
  2. 06 Mar, 2013 19 commits
    • tun: add a missing nf_reset() in tun_net_xmit() · f8af75f3
      Eric Dumazet authored
      Dave reported the following crash:
      
      general protection fault: 0000 [#1] SMP
      CPU 2
      Pid: 25407, comm: qemu-kvm Not tainted 3.7.9-205.fc18.x86_64 #1 Hewlett-Packard HP Z400 Workstation/0B4Ch
      RIP: 0010:[<ffffffffa0399bd5>]  [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack]
      RSP: 0018:ffff880276913d78  EFLAGS: 00010206
      RAX: 50626b6b7876376c RBX: ffff88026e530d68 RCX: ffff88028d158e00
      RDX: ffff88026d0d5470 RSI: 0000000000000011 RDI: 0000000000000002
      RBP: ffff880276913d88 R08: 0000000000000000 R09: ffff880295002900
      R10: 0000000000000000 R11: 0000000000000003 R12: ffffffff81ca3b40
      R13: ffffffff8151a8e0 R14: ffff880270875000 R15: 0000000000000002
      FS:  00007ff3bce38a00(0000) GS:ffff88029fc40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00007fd1430bd000 CR3: 000000027042b000 CR4: 00000000000027e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process qemu-kvm (pid: 25407, threadinfo ffff880276912000, task ffff88028c369720)
      Stack:
       ffff880156f59100 ffff880156f59100 ffff880276913d98 ffffffff815534f7
       ffff880276913db8 ffffffff8151a74b ffff880270875000 ffff880156f59100
       ffff880276913dd8 ffffffff8151a5a6 ffff880276913dd8 ffff88026d0d5470
      Call Trace:
       [<ffffffff815534f7>] nf_conntrack_destroy+0x17/0x20
       [<ffffffff8151a74b>] skb_release_head_state+0x7b/0x100
       [<ffffffff8151a5a6>] __kfree_skb+0x16/0xa0
       [<ffffffff8151a666>] kfree_skb+0x36/0xa0
       [<ffffffff8151a8e0>] skb_queue_purge+0x20/0x40
       [<ffffffffa02205f7>] __tun_detach+0x117/0x140 [tun]
       [<ffffffffa022184c>] tun_chr_close+0x3c/0xd0 [tun]
       [<ffffffff8119669c>] __fput+0xec/0x240
       [<ffffffff811967fe>] ____fput+0xe/0x10
       [<ffffffff8107eb27>] task_work_run+0xa7/0xe0
       [<ffffffff810149e1>] do_notify_resume+0x71/0xb0
       [<ffffffff81640152>] int_signal+0x12/0x17
      Code: 00 00 04 48 89 e5 41 54 53 48 89 fb 4c 8b a7 e8 00 00 00 0f 85 de 00 00 00 0f b6 73 3e 0f b7 7b 2a e8 10 40 00 00 48 85 c0 74 0e <48> 8b 40 28 48 85 c0 74 05 48 89 df ff d0 48 c7 c7 08 6a 3a a0
      RIP  [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack]
       RSP <ffff880276913d78>
      
      This is because tun_net_xmit() needs to call nf_reset()
      before queuing the skb into receive_queue.
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8af75f3
    • Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless · 930df2df
      David S. Miller authored
      John W. Linville says:
      
      ====================
      This time just passing along a big batch of fixes from Johannes...
      
      For the mac80211 bits:
      
      "Here I have fixes from Ben Greear for stray work items when deleting
      interfaces, another idle handling fix from Felix, a fix from Marco for a
      mesh PS buffering crash and I have a fix for the VHT MCS calculation in
      association request frames and more nl80211 feature advertising removal
      as well as a workaround to increase the dump size if the SKB overhead is
      too large. For 3.10 I already have a complete fix queued, but that also
      requires (simple) userspace changes."
      
      And for the iwlwifi bits:
      
      "The patches from Dor fix a bunch of calibration issues in the new MVM
      driver, and Emmanuel has a number of fixes there as well. Also, we
      decided to disable 8k A-MSDU by default, so that's in there. My own
      patches are addressing an issue we found with the new devices but that
      seems to also exist on older ones, the DMA writeback the devices do can
      be delayed and cause issues. The fix is unfortunately relatively large
      and depends on two other changes (to not be hugely conflicting), but I
      think it's still worth it at this point."
      
      As Johannes says, it is a bit large.  But I hope it is still early
      enough in the cycle to make that worthwhile.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      930df2df
    • be2net: use CSR-BAR SEMAPHORE reg for BE2/BE3 · c5b3ad4c
      Sathya Perla authored
      The SLIPORT_SEMAPHORE register shadowed in the
      config-space may not reflect the correct POST stage after
      an EEH reset in BE2/3; it may return FW_READY state even though
      FW is not ready. This causes the driver to prematurely
      poll the FW mailbox and fail.
      
      For BE2/3 use the CSR-BAR/0xac instead.
      Reported-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5b3ad4c
    • Jason Wang's avatar
      f422d2a0
    • Merge branch 'sfc-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc · 70e21fe4
      David S. Miller authored
      Ben Hutchings says:
      
      ====================
      Fix regressions introduced by the last set of fixes (sorry):
      
      1. Potential deadlock when disabling TX queues.
      2. RX was broken on architectures other than x86 and powerpc.
      
      I still expect to send one more bug fix for 3.9, but as it sometimes
      takes days to reproduce the bug it's going to take a couple of weeks of
      testing to be confident that it's really fixed.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      70e21fe4
    • sfc: Correct efx_rx_buffer::page_offset when EFX_PAGE_IP_ALIGN != 0 · c73e787a
      Ben Hutchings authored
      RX DMA buffers start at an offset of EFX_PAGE_IP_ALIGN bytes from the
      start of a cache line.  This offset obviously needs to be included in
      the virtual address, but this was missed in commit b590ace0
      ('sfc: Fix efx_rx_buf_offset() in the presence of swiotlb') since
      EFX_PAGE_IP_ALIGN is equal to 0 on both x86 and powerpc.
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      c73e787a
    • sfc: Disable soft interrupt handling during efx_device_detach_sync() · 35205b21
      Ben Hutchings authored
      efx_device_detach_sync() locks all TX queues before marking the device
      detached and thus disabling further TX scheduling.  But it can still
      be interrupted by TX completions which then result in TX scheduling in
      soft interrupt context.  This will deadlock when it tries to acquire
      a TX queue lock that efx_device_detach_sync() already acquired.
      
      To avoid deadlock, we must use netif_tx_{,un}lock_bh().
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      35205b21
    • Merge branch 'master' of... · 32cdd592
      John W. Linville authored
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
      32cdd592
    • benet: Wait f/w POST until timeout · 66d29cbc
      Gavin Shan authored
      When a PCI card hits EEH errors, a reset (usually a hot reset) is
      expected to recover it. After the EEH core finishes the reset, the
      driver callback (be_eeh_reset) is called and waits for the firmware
      to complete POST successfully. The original code returned an error
      as soon as it detected a failure during the POST stage. That does
      not seem to be sufficient.

      This patch makes the driver (be_eeh_reset) wait until the timeout
      expires for the firmware to complete POST, instead of returning an
      error immediately on detecting a POST failure. This should improve
      the reliability of the driver's EEH functionality.
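
The "keep polling until the timeout" behaviour described above can be sketched in user-space C as follows. This is only an illustration under stated assumptions: `wait_for_post()`, `enum post_stage`, and the `fake_fw` test double are hypothetical names, not the driver's real code, and a real driver would sleep between polls rather than count iterations.

```c
#include <stddef.h>

/* Hypothetical POST stages; the real driver reads these from a register. */
enum post_stage { POST_IN_PROGRESS, POST_ERROR, POST_READY };

/* Callback that reports the current stage (assumption for this sketch). */
typedef enum post_stage (*read_stage_fn)(void *ctx);

/* Poll until the firmware reports POST_READY or max_polls is exhausted.
 * A POST_ERROR reading does NOT end the wait early: the firmware may
 * still recover before the timeout. Returns 0 on ready, -1 on timeout. */
static int wait_for_post(read_stage_fn read_stage, void *ctx,
                         unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (read_stage(ctx) == POST_READY)
            return 0;
        /* A real driver would sleep here between polls. */
    }
    return -1;
}

/* Test double: reports POST_ERROR for the first error_polls reads,
 * then POST_READY. Purely illustrative. */
struct fake_fw {
    unsigned int error_polls;
    unsigned int polls_seen;
};

static enum post_stage fake_read_stage(void *ctx)
{
    struct fake_fw *fw = ctx;

    if (fw->polls_seen++ < fw->error_polls)
        return POST_ERROR;
    return POST_READY;
}
```

With a firmware that reports an error three times and then becomes ready, `wait_for_post()` with a budget of ten polls succeeds; code that returned on the first error reading would have failed.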
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Acked-by: Sathya Perla <sathya.perla@emulex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      66d29cbc
    • net/ipv4: Timestamp option cannot overflow with prespecified addresses · fa2b04f4
      David Ward authored
      When a router forwards a packet that contains the IPv4 timestamp option,
      if there is no space left in the option for the router to add its own
      timestamp, then the router increments the Overflow value in the option.
      
      However, if the addresses of the routers are prespecified in the option,
      then the overflow condition cannot happen: the option is structured so
      that each prespecified router has a place to write its timestamp. Other
      routers do not add a timestamp, so there will never be a lack of space.
      
      This fix ensures that the Overflow value in the IPv4 timestamp option is
      not incremented when the addresses of the routers are prespecified, even
      if the Pointer value is greater than the Length value.
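
The rule described above can be sketched as a small user-space C function over the first four octets of the option, using the RFC 791 layout (type, length, pointer, then a 4-bit overflow counter and a 4-bit flag, with flag value 3 meaning prespecified addresses). The function name `ts_account_full()` and the return convention are assumptions for illustration, not the kernel's code.

```c
#include <stdint.h>

/* Flag values of the IPv4 Internet Timestamp option (RFC 791). */
enum {
    TS_FLAG_TSONLY  = 0,  /* timestamps only */
    TS_FLAG_TSADDR  = 1,  /* each timestamp preceded by an address */
    TS_FLAG_PRESPEC = 3   /* addresses prespecified by the sender */
};

/* opt[0] = type, opt[1] = length, opt[2] = pointer,
 * opt[3] = overflow (high nibble) | flag (low nibble).
 *
 * Account for a router that finds no room for its timestamp.
 * Returns 0 if the option is (still) valid, -1 if the overflow
 * counter itself would overflow. */
static int ts_account_full(uint8_t opt[4])
{
    uint8_t length   = opt[1];
    uint8_t pointer  = opt[2];
    uint8_t flag     = opt[3] & 0x0F;
    uint8_t overflow = opt[3] >> 4;

    if (pointer <= length)
        return 0;               /* room left, nothing to count */

    if (flag == TS_FLAG_PRESPEC)
        return 0;               /* every listed router has a slot:
                                 * never count an overflow */

    if (overflow == 15)
        return -1;              /* 4-bit counter is already saturated */

    opt[3] = (uint8_t)(((overflow + 1) << 4) | flag);
    return 0;
}
```

For a full option with flag 1 the overflow nibble goes from 0 to 1, while the same option with flag 3 is left untouched.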
      Signed-off-by: David Ward <david.ward@ll.mit.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fa2b04f4
    • net: reduce net_rx_action() latency to 2 HZ · d1f41b67
      Eric Dumazet authored
      We should use time_after_eq() to get maximum latency of two ticks,
      instead of three.
      
      Bug added in commit 24f8b238 (net: increase receive packet quantum)
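
The off-by-one can be seen in a small user-space model. The sketch below assumes a loop that sets its limit to "start + 2 ticks" and re-checks the limit once per tick. The helpers `tick_after()` and `tick_after_eq()` are local stand-ins written in the style of the kernel's wraparound-safe jiffies comparisons, and `ticks_spanned()` is a hypothetical name.

```c
#include <stdbool.h>

/* Wraparound-safe tick comparisons, defined locally so the sketch
 * builds in user space. */
static bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;      /* true if a is strictly later than b */
}

static bool tick_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;     /* true if a is later than or equal to b */
}

/* Count the distinct tick values during which a polling loop keeps
 * running when it starts at `start` with a limit of start + 2. */
static unsigned int ticks_spanned(unsigned long start, bool use_after_eq)
{
    unsigned long limit = start + 2;
    unsigned long now = start;
    unsigned int spanned = 0;

    for (;;) {
        bool expired = use_after_eq ? tick_after_eq(now, limit)
                                    : tick_after(now, limit);
        if (expired)
            break;
        spanned++;   /* the loop keeps working during tick `now` */
        now++;       /* model assumption: one tick per iteration */
    }
    return spanned;
}
```

In this model the strict comparison lets the loop run during three tick values (start, start + 1, start + 2), while the "after or equal" comparison stops it after two. The result is the same when the tick counter wraps around.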
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d1f41b67
    • net: fix new kernel-doc warnings in net core · 691b3b7e
      Randy Dunlap authored
      Fix new kernel-doc warnings in net/core/dev.c:
      
      Warning(net/core/dev.c:4788): No description found for parameter 'new_carrier'
      Warning(net/core/dev.c:4788): Excess function parameter 'new_carries' description in 'dev_change_carrier'
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      691b3b7e
    • reset nf before xmit vxlan encapsulated packet · 88c4c066
      Zang MingJie authored
      We should reset the nf settings bound to the skb, as ipip/ipgre do.
      
      If not, the conntrack/NAT info bound to the original packet may continually
      redirect the packet to the vxlan interface, causing a routing loop.
      
      This is the scenario:
      
           VTEP     VXLAN Gateway
          /----\  /---------------\
          |    |  |               |
          |  vx+--+vx --NAT-> eth0+--> Internet
          |    |  |               |
          \----/  \---------------/
      
      When any packets come from the Internet to the VTEP, lots of garbage
      packets come out of the gateway's vxlan interface, but none are actually
      sent to the physical interface, because they are redirected back to the vxlan
      interface in the postrouting chain of the NAT rule, and dmesg complains:
      
          Mar  1 21:52:53 debian kernel: [ 8802.997699] Dead loop on virtual device vxlan0, fix it urgently!
          Mar  1 21:52:54 debian kernel: [ 8804.004907] Dead loop on virtual device vxlan0, fix it urgently!
          Mar  1 21:52:55 debian kernel: [ 8805.012189] Dead loop on virtual device vxlan0, fix it urgently!
          Mar  1 21:52:56 debian kernel: [ 8806.020593] Dead loop on virtual device vxlan0, fix it urgently!
      
      This patch should fix the problem.
      Signed-off-by: Zang MingJie <zealot0630@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      88c4c066
    • pkt_sched: sch_qfq: remove a useless invocation of qfq_update_eligible · 76e4cb0d
      Paolo Valente authored
      QFQ+ can select for service only 'eligible' aggregates, i.e.,
      aggregates that would have started to be served also in the emulated
      ideal system.  As a consequence, for QFQ+ to be work conserving, at
      least one of the active aggregates must be eligible when it is time to
      choose the next aggregate to serve.
      
      The set of eligible aggregates is updated through the function
      qfq_update_eligible(), which does guarantee that, after its
      invocation, at least one of the active aggregates is eligible.
      Because of this property, this function is invoked in
      qfq_deactivate_agg() to guarantee that at least one of the active
      aggregates is still eligible after an aggregate has been deactivated.
      In particular, the critical case is when there are other active
      aggregates, but the aggregate being deactivated happens to be the only
      one eligible.
      
      However, this precaution is not needed for QFQ+ to be work conserving,
      because qfq_update_eligible() is also always invoked at the beginning of
      qfq_choose_next_agg(). This patch removes the additional invocation of
      qfq_update_eligible() in qfq_deactivate_agg().
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      76e4cb0d
    • pkt_sched: sch_qfq: do not allow virtual time to jump if an aggregate is in service · 40dd2d54
      Paolo Valente authored
      By definition of (the algorithm of) QFQ+, the system virtual time must
      be pushed up only if there is no 'eligible' aggregate, i.e. no
      aggregate that would have started to be served also in the ideal
      system emulated by QFQ+.  QFQ+ serves only eligible aggregates, hence
      the aggregate currently in service is eligible.  As a consequence, to
      decide whether there is no eligible aggregate, QFQ+ must also check
      whether there is no aggregate in service.
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      40dd2d54
    • pkt_sched: sch_qfq: prevent budget from wrapping around after a dequeue · a0143efa
      Paolo Valente authored
      Aggregate budgets are computed so as to guarantee that, after an
      aggregate has been selected for service, that aggregate has enough
      budget to serve at least one maximum-size packet for the classes it
      contains. For this reason, after a new aggregate has been selected
      for service, its next packet is immediately dequeued, without any
      further control.
      
      The maximum packet size for a class, lmax, can be changed through
      qfq_change_class(). If the user sets lmax to a value lower than
      the size of some of the still-to-arrive packets, QFQ+ will
      automatically push up lmax as it enqueues these packets.  This
      automatic push up is likely to happen with TSO/GSO.
      
      In any case, if lmax is assigned a lower value than the size of some
      of the packets already enqueued for the class, then the following
      problem may occur: the size of the next packet to dequeue for the
      class may happen to be larger than lmax, after the aggregate to which
      the class belongs has been just selected for service. In this case,
      even the budget of the aggregate, which is an unsigned value, may be
      lower than the size of the next packet to dequeue. After dequeueing
      this packet and subtracting its size from the budget, the latter would
      wrap around.
      
      This fix prevents the budget from wrapping around after any packet
      dequeue.
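
The wrap-around is ordinary unsigned arithmetic and can be reproduced in user space. The sketch below is an illustration only: `charge_budget()` and `charge_budget_naive()` are hypothetical helper names, and clamping at zero is one way to avoid the wrap, assumed here for the example rather than taken from the patch.

```c
#include <stdint.h>

/* Naive charge: unsigned subtraction wraps modulo 2^32 when the
 * packet is larger than the remaining budget. */
static uint32_t charge_budget_naive(uint32_t budget, uint32_t pkt_len)
{
    return budget - pkt_len;
}

/* Guarded charge: never lets the budget wrap; an oversized packet
 * simply exhausts it. */
static uint32_t charge_budget(uint32_t budget, uint32_t pkt_len)
{
    return pkt_len > budget ? 0 : budget - pkt_len;
}
```

With a remaining budget of 1000 and a 1500-byte packet, the naive version yields 4294966796, which would look like an almost unlimited budget, whereas the guarded version yields 0.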
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a0143efa
    • pkt_sched: sch_qfq: serve activated aggregates immediately if the scheduler is empty · 2f3b89a1
      Paolo Valente authored
      If no aggregate is in service, then the function qfq_dequeue() does
      not dequeue any packet. For this reason, to guarantee QFQ+ to be work
      conserving, a just-activated aggregate must be set as in service
      immediately if it happens to be the only active aggregate.
      This is done by the function qfq_enqueue().
      
      Unfortunately, the function qfq_add_to_agg(), used to add a class to
      an aggregate, does not perform this important additional operation.
      In particular, if: 1) qfq_add_to_agg() is invoked to complete the move
      of a class from a source aggregate, which becomes inactive as a result,
      to a destination aggregate, which becomes active instead, and 2) the
      destination aggregate becomes the only active aggregate, then this
      aggregate is nevertheless not set as in service. QFQ+ then remains in a
      non-work-conserving state until a new invocation of qfq_enqueue()
      recovers the situation.
      
      This fix solves the problem by moving the logic for setting an
      aggregate as in service directly into the function qfq_activate_agg().
      Hence, from whatever point qfq_activate_agg() is invoked, QFQ+
      remains work conserving.  Since the more complex logic of this new
      version of qfq_activate_agg() is not necessary in qfq_dequeue() to
      reschedule an aggregate that exhausts its budget, the aggregate
      is now rescheduled there by directly invoking the functions needed.
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f3b89a1
    • pkt_sched: sch_qfq: fix the update of eligible-group sets · 624b85fb
      Paolo Valente authored
      Between two invocations of make_eligible, the system virtual time may
      happen to grow enough that, in its binary representation, a bit with
      higher order than 31 flips. This happens especially with
      TSO/GSO. Before this fix, the mask used in make_eligible was computed
      as (1UL<<index_of_last_flipped_bit)-1, whose value is well defined on
      a 64-bit architecture, because index_of_last_flipped_bit <= 63, but is in
      general undefined on a 32-bit architecture if index_of_last_flipped_bit > 31.
      The fix just replaces 1UL with 1ULL.
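
The portability issue comes from a C rule: a shift is only defined when the shift count is smaller than the width of the promoted left operand. The sketch below computes the mask in a fixed 64-bit type so that it does not depend on the width of `unsigned long`. The helper names `low_bits_mask()` and `shift_is_defined()` are hypothetical, and the index is taken as a plain parameter rather than derived from the virtual time.

```c
#include <stdint.h>

/* Mask with the `index` lowest bits set, for index in 0..63.
 * The constant is 64 bits wide on every architecture, so the shift
 * is well defined regardless of the width of unsigned long. */
static uint64_t low_bits_mask(unsigned int index)
{
    return (UINT64_C(1) << index) - 1;
}

/* A left shift is defined only if the count is smaller than the
 * width (in bits) of the type being shifted. */
static int shift_is_defined(unsigned int index, unsigned int type_bits)
{
    return index < type_bits;
}
```

For index 32, a 32-bit `unsigned long` makes `1UL << 32` undefined (`shift_is_defined(32, 32)` is 0), while the 64-bit computation gives the expected mask `0xFFFFFFFF`.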
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      624b85fb
    • pkt_sched: sch_qfq: properly cap timestamps in charge_actual_service · 9b99b7e9
      Paolo Valente authored
      QFQ+ schedules the active aggregates in a group using a bucket list
      (one list per group). The bucket in which each aggregate is inserted
      depends on the aggregate's timestamps, and the number
      of buckets in a group is enough to accommodate the possible (range of)
      values of the timestamps of all the aggregates in the group. For this
      property to hold, timestamps must however be computed correctly.  One
      necessary condition for computing timestamps correctly is that the
      number of bits dequeued for each aggregate, while the aggregate is in
      service, does not exceed the maximum budget budgetmax assigned to the
      aggregate.
      
      For each aggregate, budgetmax is proportional to the number of classes
      in the aggregate. If the number of classes of the aggregate is
      decreased through qfq_change_class(), then budgetmax is decreased
      automatically as well.  Problems may occur if the aggregate is in
      service when budgetmax is decreased, because the current remaining
      budget of the aggregate and/or the service already received by the
      aggregate may happen to be larger than the new value of budgetmax.  In
      this case, when the aggregate is eventually deselected and its
      timestamps are updated, the aggregate may happen to have received an
      amount of service larger than budgetmax.  This may cause the aggregate
      to be assigned a higher virtual finish time than the maximum
      acceptable value for the last bucket in the bucket list of the group.
      
      This fix introduces a cap that addresses this issue.
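
One way to picture such a cap: when the finish time is computed, the service charged to the aggregate is limited to budgetmax. The user-space sketch below assumes a simplified aggregate with a start time S, a finish time F, an inverse weight inv_w, and the budget fields named in the text. The struct `agg_sketch` and the function `charge_capped_service()` are hypothetical and not the scheduler's actual data structures.

```c
#include <stdint.h>

/* Simplified stand-in for an aggregate (illustrative only). */
struct agg_sketch {
    uint64_t S;               /* virtual start time */
    uint64_t F;               /* virtual finish time */
    uint32_t inv_w;           /* inverse of the aggregate's weight */
    uint32_t budgetmax;       /* current maximum budget */
    uint32_t initial_budget;  /* budget when service started */
    uint32_t budget;          /* budget still unused */
};

/* Update F from the service actually received, but never charge more
 * than budgetmax, which may have shrunk while the aggregate was in
 * service. Assumes budget <= initial_budget. */
static void charge_capped_service(struct agg_sketch *agg)
{
    uint32_t used = agg->initial_budget - agg->budget;
    uint32_t service = used < agg->budgetmax ? used : agg->budgetmax;

    agg->F = agg->S + (uint64_t)service * agg->inv_w;
}
```

If an aggregate started with a budget of 3000, has 500 left, and budgetmax has meanwhile dropped to 1000, the charged service is 1000 rather than 2500, which keeps the finish time within the range covered by the group's buckets in this model.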
      Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
      Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b99b7e9