1. 22 Jul, 2010 2 commits
    • net: Fix corruption of skb csum field in pskb_expand_head() of net/core/skbuff.c · 00c5a983
      Andrea Shepard authored
      Make pskb_expand_head() check ip_summed to make sure csum_start is really
      csum_start and not csum before adjusting it.
      
      This fixes a bug I encountered using a Sun Quad-Fast Ethernet card and VLANs.
      On my configuration, the sunhme driver produces skbs with differing amounts
      of headroom on receive depending on the packet size.  See line 2030 of
      drivers/net/sunhme.c; packets smaller than RX_COPY_THRESHOLD have 52 bytes
      of headroom but packets larger than that cutoff have only 20 bytes.
      
      When these packets reach the VLAN driver, vlan_check_reorder_header()
      calls skb_cow(), which, if the packet has less than NET_SKB_PAD (== 32) bytes
      of headroom, uses pskb_expand_head() to make more.
      
      Then, pskb_expand_head() needs to adjust a lot of offsets into the skb,
      including csum_start.  Since csum_start is a union with csum, if the packet
      has a valid csum value this will corrupt it, which was the effect I observed.
      The sunhme hardware computes receive checksums, so the skbs would be created
      by the driver with ip_summed == CHECKSUM_COMPLETE and a valid csum field, and
      then pskb_expand_head() would corrupt the csum field, leading to an "hw csum
      error" message later on, for example in icmp_rcv() for pings larger than the
      sunhme RX_COPY_THRESHOLD.
      
      On the basis of the comment at the beginning of include/linux/skbuff.h,
      I believe that the csum_start skb field is only meaningful if ip_summed is
      CHECKSUM_PARTIAL, so this patch makes pskb_expand_head() adjust it only in
      that case to avoid corrupting a valid csum value.
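
The guard described above can be sketched in user-space C. This is a toy model under stated assumptions, not the kernel's code: `struct toy_skb`, `toy_adjust_offsets()`, and the `TOY_CHECKSUM_*` values are hypothetical names invented for illustration. Only two ideas come from the commit message: `csum` and `csum_start` share storage, and the offset adjustment is limited to the partial-checksum case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's ip_summed states. */
enum toy_ip_summed {
    TOY_CHECKSUM_NONE,
    TOY_CHECKSUM_COMPLETE,
    TOY_CHECKSUM_PARTIAL
};

struct toy_skb {
    union {
        uint32_t csum;              /* meaningful for TOY_CHECKSUM_COMPLETE */
        struct {
            uint16_t csum_start;    /* meaningful for TOY_CHECKSUM_PARTIAL */
            uint16_t csum_offset;
        } partial;
    } u;
    enum toy_ip_summed ip_summed;
};

/*
 * Shift csum_start by the change in headroom, but only when the union
 * really holds an offset. For any other ip_summed value the storage may
 * hold a checksum, so it is left untouched.
 */
static void toy_adjust_offsets(struct toy_skb *skb, int headroom_delta)
{
    if (skb->ip_summed == TOY_CHECKSUM_PARTIAL)
        skb->u.partial.csum_start =
            (uint16_t)(skb->u.partial.csum_start + headroom_delta);
}
```

Without the `ip_summed` test, the addition would land on the low or high half of `csum` (depending on byte order) and silently change a hardware-computed checksum.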
      
      Please see my more in-depth discussion of tracking down this bug for
      more details if you like:
      
      http://puellavulnerata.livejournal.com/112186.html
      http://puellavulnerata.livejournal.com/112567.html
      http://puellavulnerata.livejournal.com/112891.html
      http://puellavulnerata.livejournal.com/113096.html
      http://puellavulnerata.livejournal.com/113591.html
      
      I am not subscribed to this list, so please CC me on replies.
      Signed-off-by: Andrea Shepard <andrea@persephoneslair.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macvtap: Limit packet queue length · 8a35747a
      Herbert Xu authored
      Mark Wagner reported OOM symptoms when sending UDP traffic over
      a macvtap link to a kvm receiver.
      
      This appears to be caused by the fact that macvtap packet queues
      are unlimited in length.  This means that if the receiver can't
      keep up with the rate of flow, then we will hit OOM. Of course
      it gets worse if the OOM killer then decides to kill the receiver.
      
      This patch imposes a cap on the packet queue length, in the same
      way as the tuntap driver, using the device TX queue length.
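
A minimal sketch of the cap, assuming a queue that only counts packets. `struct toy_queue` and `toy_enqueue()` are hypothetical names, not macvtap or tuntap functions; the sketch shows only the drop-when-full rule keyed to the device TX queue length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet queue: only the length matters for this sketch. */
struct toy_queue {
    size_t len;           /* packets currently queued */
    size_t tx_queue_len;  /* cap taken from the device TX queue length */
};

/*
 * Returns 0 if the packet was queued, -1 if it was dropped because the
 * queue already holds tx_queue_len packets. A cap of 0 drops every packet.
 */
static int toy_enqueue(struct toy_queue *q)
{
    if (q->len >= q->tx_queue_len)
        return -1;        /* queue full: the caller frees the packet */
    q->len++;
    return 0;
}
```

Because a full queue drops rather than blocks, memory use stays bounded even when the receiver never drains the queue.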
      
      Please note that macvtap currently has no way of giving congestion
      notification, which means the software device TX queue cannot be
      used and packets will always be dropped once the macvtap driver
      queue fills up.
      
      This shouldn't be a great problem for the scenario where macvtap
      is used to feed a kvm receiver, as the traffic is most likely
      external in origin so congestion notification can't be applied
      anyway.
      
      Of course, if anybody decides to complain about guest-to-guest
      UDP packet loss down the track, then we may have to revisit this.
      
      Incidentally, this patch also fixes a real memory leak when
      macvtap_get_queue fails.
      
      Chris Wright noticed that for this patch to work, we need a
      non-zero TX queue length.  This patch includes his work to change
      the default macvtap TX queue length to 500.
      Reported-by: Mark Wagner <mwagner@redhat.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: Chris Wright <chrisw@sous-sol.org>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 21 Jul, 2010 4 commits
  3. 20 Jul, 2010 1 commit
  4. 19 Jul, 2010 2 commits
  5. 18 Jul, 2010 1 commit
    • IPv6: fix CoA check in RH2 input handler (mip6_rthdr_input()) · d9a9dc66
      Arnaud Ebalard authored
      The input handler for Type 2 Routing Header (mip6_rthdr_input())
      checks if the CoA in the packet matches the CoA in the XFRM state.
      
      The current check is buggy: it compares the address in the Type 2
      Routing Header, i.e. the HoA, against the expected CoA in the state.
      The comparison should be made against the address in the destination
      field of the IPv6 header.
      
      The bug remained unnoticed because the main (and possibly only current)
      user of the code (UMIP MIPv6 Daemon) initializes the XFRM state with the
      unspecified address, i.e. explicitly allows everything.
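
The corrected comparison can be sketched as follows. The types and helpers (`struct toy_in6_addr`, `toy_rh2_coa_ok()`) are hypothetical and stand in for the kernel's own; the sketch assumes, as the message describes, that an unspecified (all-zero) CoA in the state accepts any packet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit address type for illustration. */
struct toy_in6_addr {
    uint8_t b[16];
};

static int toy_addr_equal(const struct toy_in6_addr *a,
                          const struct toy_in6_addr *b)
{
    return memcmp(a->b, b->b, sizeof(a->b)) == 0;
}

static int toy_addr_any(const struct toy_in6_addr *a)
{
    static const struct toy_in6_addr zero; /* all-zero: unspecified address */
    return toy_addr_equal(a, &zero);
}

/*
 * Returns 1 to accept, 0 to reject. The address compared against the
 * state's CoA is the IPv6 header destination, not the address carried
 * inside the Type 2 Routing Header (which is the HoA).
 */
static int toy_rh2_coa_ok(const struct toy_in6_addr *ipv6_dst,
                          const struct toy_in6_addr *state_coa)
{
    return toy_addr_any(state_coa) || toy_addr_equal(ipv6_dst, state_coa);
}
```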
      
      Yoshifuji-san, can you ack that one?
      Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 16 Jul, 2010 5 commits
    • ibmveth: lost IRQ while closing/opening device leads to service loss · ee2e6114
      Robert Jennings authored
      The order of freeing the IRQ and freeing the device in firmware
      in ibmveth_close can cause the adapter to become unusable after a
      subsequent ibmveth_open.  Only a reboot of the OS will make the
      network device usable again. This is seen when cycling the adapter
      up and down while there is network activity.
      
      There is a window where an IRQ will be left unserviced (H_EOI will not
      be called).  The solution is to make a VIO_IRQ_DISABLE h_call, free the
      device with firmware, and then call free_irq.
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller's avatar
    • rt2x00: Fix lockdep warning in rt2x00lib_probe_dev() · 9acd56d3
      Stephen Boyd authored
      The rt2x00dev->intf_work workqueue is never initialized when a driver is
      probed for a non-existent device (in this case rt2500usb). On such a
      path we call rt2x00lib_remove_dev() to free any resources initialized
      during the probe before we use INIT_WORK to initialize the workqueue.
      This causes lockdep to get confused since the lock used in the workqueue
      hasn't been initialized yet but is now being acquired during
      cancel_work_sync() called by rt2x00lib_remove_dev().
      
      Fix this by initializing the workqueue first before we attempt to probe
      the device. This should make lockdep happy and avoid breaking any
      assumptions about how the library cleans up after a probe fails.
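
The ordering fix can be illustrated with a small user-space model. Everything here is hypothetical (`struct toy_work`, `toy_probe_dev()`, and the `initialized` flag standing in for the lock that lockdep inspects); it only shows why the work item must be initialized before any path that can reach the cleanup routine.

```c
#include <assert.h>

/* Hypothetical work item; 'initialized' stands in for its internal lock. */
struct toy_work {
    int initialized;
};

struct toy_dev {
    struct toy_work intf_work;
    int cleanup_clean;  /* 1 if cleanup only touched initialized state */
};

static void toy_init_work(struct toy_work *w)
{
    w->initialized = 1;
}

/* Returns 0 if the work item was safe to cancel, -1 otherwise. */
static int toy_cancel_work_sync(const struct toy_work *w)
{
    return w->initialized ? 0 : -1;
}

static void toy_remove_dev(struct toy_dev *d)
{
    d->cleanup_clean = (toy_cancel_work_sync(&d->intf_work) == 0);
}

/* Returns 0 on success, -1 when the device cannot be probed. */
static int toy_probe_dev(struct toy_dev *d, int device_present)
{
    toy_init_work(&d->intf_work);  /* first, before anything can fail */

    if (!device_present) {
        toy_remove_dev(d);         /* common cleanup on the error path */
        return -1;
    }
    return 0;
}
```

With the initialization moved below the `device_present` check, the error path would cancel a work item that was never set up, which is the situation the trace below reports.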
      
      phy0 -> rt2x00lib_probe_dev: Error - Failed to allocate device.
      INFO: trying to register non-static key.
      the code is fine but needs lockdep annotation.
      turning off the locking correctness validator.
      Pid: 2027, comm: modprobe Not tainted 2.6.35-rc5+ #60
      Call Trace:
       [<ffffffff8105fe59>] register_lock_class+0x152/0x31f
       [<ffffffff81344a00>] ? usb_control_msg+0xd5/0x111
       [<ffffffff81061bde>] __lock_acquire+0xce/0xcf4
       [<ffffffff8105f6fd>] ? trace_hardirqs_off+0xd/0xf
       [<ffffffff81492aef>] ?  _raw_spin_unlock_irqrestore+0x33/0x41
       [<ffffffff810628d5>] lock_acquire+0xd1/0xf7
       [<ffffffff8104f037>] ? __cancel_work_timer+0x99/0x17e
       [<ffffffff8104f06e>] __cancel_work_timer+0xd0/0x17e
       [<ffffffff8104f037>] ? __cancel_work_timer+0x99/0x17e
       [<ffffffff8104f136>] cancel_work_sync+0xb/0xd
       [<ffffffffa0096675>] rt2x00lib_remove_dev+0x25/0xb0 [rt2x00lib]
       [<ffffffffa0096bf7>] rt2x00lib_probe_dev+0x380/0x3ed [rt2x00lib]
       [<ffffffff811d78a7>] ? __raw_spin_lock_init+0x31/0x52
       [<ffffffffa00bbd2c>] ? T.676+0xe/0x10 [rt2x00usb]
       [<ffffffffa00bbe4f>] rt2x00usb_probe+0x121/0x15e [rt2x00usb]
       [<ffffffff813468bd>] usb_probe_interface+0x151/0x19e
       [<ffffffff812ea08e>] driver_probe_device+0xa7/0x136
       [<ffffffff812ea167>] __driver_attach+0x4a/0x66
       [<ffffffff812ea11d>] ? __driver_attach+0x0/0x66
       [<ffffffff812e96ca>] bus_for_each_dev+0x54/0x89
       [<ffffffff812e9efd>] driver_attach+0x19/0x1b
       [<ffffffff812e9b64>] bus_add_driver+0xb4/0x204
       [<ffffffff812ea41b>] driver_register+0x98/0x109
       [<ffffffff813465dd>] usb_register_driver+0xb2/0x173
       [<ffffffffa00ca000>] ? rt2500usb_init+0x0/0x20 [rt2500usb]
       [<ffffffffa00ca01e>] rt2500usb_init+0x1e/0x20 [rt2500usb]
       [<ffffffff81000203>] do_one_initcall+0x6d/0x17a
       [<ffffffff8106cae8>] sys_init_module+0x9c/0x1e0
       [<ffffffff8100296b>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Stephen Boyd <bebarino@gmail.com>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
    • vhost: avoid pr_err on condition guest can trigger · 95c0ec6a
      Michael S. Tsirkin authored
      A guest can trigger packet truncation by posting
      a very short buffer and disabling buffer merging.
      Convert pr_err to pr_debug to keep the log from filling
      up when this happens.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • ipmr: Don't leak memory if fib lookup fails. · e40dbc51
      Ben Greear authored
      This was detected using two mcast router tables.  The
      pimreg for the second interface did not have a specific
      mrule, so packets received by it were handled by the
      default table, which had nothing configured.
      
      This caused the ipmr_fib_lookup to fail, causing
      the memory leak.
      Signed-off-by: Ben Greear <greearb@candelatech.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 15 Jul, 2010 3 commits
    • vhost-net: avoid flush under lock · 1680e906
      Michael S. Tsirkin authored
      We flush under the vq mutex when changing backends.
      This creates a deadlock, as the workqueue being flushed
      needs this lock as well.
      
      https://bugzilla.redhat.com/show_bug.cgi?id=612421
      
      Drop the vq mutex before flush: we have the device mutex
      which is sufficient to prevent another ioctl from touching
      the vq.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • net: fix problem in reading sock TX queue · b0f77d0e
      Tom Herbert authored
      Fix problem in reading the tx_queue recorded in a socket.  In
      dev_pick_tx, the TX queue is read by doing a check with
      sk_tx_queue_recorded on the socket, followed by a sk_tx_queue_get.
      The problem is that there is no mutual exclusion across these
      calls in the socket, so it is possible that the queue in the
      sock can be invalidated after sk_tx_queue_recorded is called so
      that sk_tx_queue_get returns -1, which sets 65535 in queue_index
      and thus dev_pick_tx returns 65536 which is a bogus queue and
      can cause crash in dev_queue_xmit.
      
      We fix this by only calling sk_tx_queue_get which does the proper
      checks.  The interface is that sk_tx_queue_get returns the TX queue
      if the sock argument is non-NULL and TX queue is recorded, else it
      returns -1.  sk_tx_queue_recorded is no longer used so it can be
      completely removed.
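
A sketch of the single-accessor interface described above, in user-space C. `struct toy_sock`, `toy_sk_tx_queue_get()`, and `toy_pick_tx()` are hypothetical stand-ins; the fallback queue parameter is an assumption made for illustration. A single-threaded sketch cannot reproduce the race itself. It only shows that the recorded value is read once and that -1 is handled by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical socket: -1 means no TX queue is recorded. */
struct toy_sock {
    int tx_queue_mapping;
};

/* Returns the recorded TX queue, or -1 if sk is NULL or nothing is recorded. */
static int toy_sk_tx_queue_get(const struct toy_sock *sk)
{
    return sk ? sk->tx_queue_mapping : -1;
}

/*
 * One read of the recorded queue, then a check of that same value.
 * There is no separate "is it recorded?" call whose answer could go
 * stale before the value is fetched.
 */
static int toy_pick_tx(const struct toy_sock *sk, int fallback_queue)
{
    int queue_index = toy_sk_tx_queue_get(sk);

    if (queue_index < 0)
        queue_index = fallback_queue;
    return queue_index;
}
```
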
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/core: neighbour update Oops · 91a72a70
      Doug Kehn authored
      When configuring DMVPN (GRE + openNHRP) and a GRE remote
      address is configured a kernel Oops is observed.  The
      observed Oops is caused by a NULL header_ops pointer
      (neigh->dev->header_ops) in neigh_update_hhs() when
      
      void (*update)(struct hh_cache*, const struct net_device*, const unsigned char *)
      = neigh->dev->header_ops->cache_update;
      
      is executed.  The dev associated with the NULL header_ops is
      the GRE interface.  This patch guards against the
      possibility that header_ops is NULL.
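
The guard can be sketched as below. The structures and the `int *hits` callback argument are hypothetical simplifications, not the kernel's `struct net_device` or `struct header_ops`; the sketch only shows testing the ops pointer before reading `cache_update` through it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ops table and device. */
struct toy_header_ops {
    void (*cache_update)(int *hits);
};

struct toy_net_device {
    const struct toy_header_ops *header_ops;  /* may be NULL, e.g. a tunnel */
};

/* Example callback used to observe whether the update ran. */
static void toy_cache_update(int *hits)
{
    (*hits)++;
}

/* Returns 1 if the cache update ran, 0 if it was skipped. */
static int toy_update_hhs(const struct toy_net_device *dev, int *hits)
{
    void (*update)(int *) = NULL;

    if (dev->header_ops)                       /* the added guard */
        update = dev->header_ops->cache_update;

    if (!update)
        return 0;

    update(hits);
    return 1;
}
```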
      
      This Oops was first observed in kernel version 2.6.26.8.
      Signed-off-by: Doug Kehn <rdkehn@yahoo.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 14 Jul, 2010 4 commits
  9. 13 Jul, 2010 3 commits
  10. 12 Jul, 2010 2 commits
  11. 09 Jul, 2010 1 commit
  12. 08 Jul, 2010 3 commits
  13. 07 Jul, 2010 5 commits
  14. 06 Jul, 2010 3 commits
  15. 04 Jul, 2010 1 commit