1. 24 Jul, 2013 9 commits
  2. 23 Jul, 2013 11 commits
    • Merge branch 'team' ("add support for peer notifications and igmp rejoins for team") · 45c91490
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      The middle patch adjusts core infrastructure so the bonding code can be
      generalized and reused by team.
      
      v1->v2: using msecs_to_jiffies() as suggested by Eric
      
      Jiri Pirko (3):
        team: add peer notification
        net: convert resend IGMP to notifier event
        team: add support for sending multicast rejoins
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • team: add support for sending multicast rejoins · 492b200e
      Jiri Pirko authored
      Similar to what is implemented in bonding. The user can ask the team
      driver to send IGMP rejoins when a port is enabled or disabled. This uses
      the previously introduced netdev notifier.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: convert resend IGMP to notifier event · 4aa5dee4
      Jiri Pirko authored
      Until now, bond_resend_igmp_join_requests() manually looked for vlans
      attached to the bonding device and for a bridge in which the bonding
      device acts as a port. It did not handle other scenarios, such as stacked
      bonds or a team device above. Make this more generic and use a netdev
      notifier to propagate the event to upper devices and to actually call
      ip_mc_rejoin_groups().
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • team: add peer notification · fc423ff0
      Jiri Pirko authored
      When a port is enabled or disabled, allow notifying peers by unsolicited
      NAs or gratuitous ARPs. Disabled by default.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macvlan fdb replace support · ab2cfbb2
      Thomas Richter authored
      Add support for iproute2 command 'bridge fdb replace ...'.
      The rtnetlink callback function ndo_fdb_add will be called
      with the NLM_F_REPLACE flag set.
      Simply return -EOPNOTSUPP.
      
      Resubmitted because net-next was closed last week.
      Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan fdb replace an existing entry · 906dc186
      Thomas Richter authored
      Add support to replace an existing entry found in the
      vxlan fdb database. The entry in question is identified
      by its unicast mac address and the destination information
      is changed. If the entry is not found, it is added in the
      forwarding database. This is similar to changing an entry
      in the neighbour table.
      
      Multicast mac addresses cannot be changed with the replace
      option.
      
      This is useful for virtual machine migration when the
      destination of a target virtual machine changes. The replace
      feature can be used instead of delete followed by add.
      
      Resubmitted because net-next was closed last week.
      Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp' · 20ff44aa
      David S. Miller authored
      Yuchung Cheng says:
      
      ====================
      This patch series improves RTT sampling in three ways:
      1. Sample RTT during fast recovery and reordering events.
      2. Favor ACK-based RTT over timestamps because of broken TS ECR fields.
      3. Consolidate the RTT measurement logic.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: use RTT from SACK for RTO · ed08495c
      Yuchung Cheng authored
      If RTT is not available because Karn's check has failed or no
      new packet is acked, use the RTT measured from SACK to estimate
      the RTO. The sender can continue to estimate the RTO during loss
      recovery or reordering events upon receiving non-partial ACKs.
      
      This also changes when the RTO is re-armed. Previously it was
      only re-armed when some data was cumulatively acknowledged (i.e.,
      SND.UNA advances), but now it is re-armed whenever the RTT estimator
      is updated. This is particularly useful for reducing spurious
      timeouts on paths with buffer bloat, including cellular carriers [1],
      and for RTT estimation during reordering events.
      
      [1] "An In-depth Study of LTE: Effect of Network Protocol and
       Application Behavior on Performance", In Proc. of SIGCOMM 2013
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
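As a rough illustration of how an RTT sample, whether it comes from a cumulative ACK or from a SACK block, feeds the retransmission timer, here is a minimal sketch of the standard smoothed estimator described in RFC 6298. It is not the kernel's implementation, which uses fixed-point arithmetic and its own bounds. The names `rto_est` and `rto_est_sample`, and the use of milliseconds stored in a `double`, are assumptions made for this example.

```c
/* Illustrative RFC 6298 style estimator; NOT the kernel's code.
 * Clock granularity and the minimum/maximum RTO bounds are omitted. */
struct rto_est {
    int    have_sample;  /* 0 until the first RTT sample arrives */
    double srtt;         /* smoothed RTT, in ms */
    double rttvar;       /* RTT variation, in ms */
    double rto;          /* current retransmission timeout, in ms */
};

/* Feed one RTT sample (ms) into the estimator and return the new RTO (ms). */
static double rto_est_sample(struct rto_est *e, double rtt)
{
    if (!e->have_sample) {
        /* First measurement: SRTT = R, RTTVAR = R / 2 */
        e->srtt = rtt;
        e->rttvar = rtt / 2.0;
        e->have_sample = 1;
    } else {
        /* RTTVAR is updated first, using the old SRTT */
        double err = e->srtt > rtt ? e->srtt - rtt : rtt - e->srtt;
        e->rttvar = 0.75 * e->rttvar + 0.25 * err;
        e->srtt = 0.875 * e->srtt + 0.125 * rtt;
    }
    e->rto = e->srtt + 4.0 * e->rttvar;
    return e->rto;
}
```

In the terms of the commit message, the timer would be re-armed with the returned value each time a function like `rto_est_sample()` runs, not only when SND.UNA advances.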
    • tcp: measure RTT from new SACK · 59c9af42
      Yuchung Cheng authored
      Take an RTT sample if an ACK selectively acks some sequences that
      have never been retransmitted. Karn's algorithm does not apply
      even if that ACK (s)acks other retransmitted sequences, because it
      must have been generated by an original but perhaps out-of-order packet.
      There is no ambiguity. When multiple blocks are newly
      sacked because of ACK losses, the earliest block is used to
      measure RTT, similar to cumulative ACKs.
      
      Such RTT samples allow the sender to estimate the RTO during loss
      recovery and packet reordering events. It is still useful even with
      TCP timestamps. That's because during these events the SND.UNA may
      not advance preventing RTT samples from TS ECR (thus the FLAG_ACKED
      check before calling tcp_ack_update_rtt()).  Therefore this new
      RTT source is complementary to existing ACK and TS RTT mechanisms.
      
      This patch does not update the RTO. It is done in the next patch.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: prefer packet timing to TS-ECR for RTT · 5b08e47c
      Yuchung Cheng authored
      Prefer packet timings to TS ECR for RTT measurements when both
      sources are available. That's because broken middle-boxes and remote
      peers can return packets with corrupted TS ECR fields. Similarly, most
      congestion controls that require RTT signals favor timing-based
      sources as well. Also check for bad TS ECR values to avoid RTT
      blow-ups. This has happened on production Web servers.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
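The preference order described above can be sketched as a small selection function. This is only an illustration under stated assumptions: the name `pick_rtt_sample`, the `RTT_NONE` sentinel, and the specific definition of a "bad" TS ECR value (zero, or apparently in the future) are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

#define RTT_NONE (-1L)  /* hypothetical sentinel: no usable RTT sample */

/*
 * timing_rtt: RTT from packet send/ACK timing, or RTT_NONE if unavailable.
 * has_ts:     non-zero if the ACK carried a timestamp option.
 * now, tsecr: local timestamp clock and the echoed value (same units).
 */
static long pick_rtt_sample(long timing_rtt, int has_ts,
                            uint32_t now, uint32_t tsecr)
{
    /* Packet timing wins whenever it is available. */
    if (timing_rtt >= 0)
        return timing_rtt;

    /* Fall back to the timestamp echo, rejecting suspicious values. */
    if (has_ts && tsecr != 0) {
        uint32_t delta = now - tsecr;   /* wraps safely in unsigned math */
        if (delta < 0x80000000u)        /* echo is not "in the future" */
            return (long)delta;
    }
    return RTT_NONE;
}
```

For example, `pick_rtt_sample(30, 1, 1000, 900)` returns the timing-based value 30 even though the timestamp echo would suggest 100.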
    • tcp: consolidate SYNACK RTT sampling · 375fe02c
      Yuchung Cheng authored
      The first patch consolidates SYNACK and other RTT measurements to use a
      central function, tcp_ack_update_rtt(). A (small) bonus is that SYNACK
      RTT measurement now happens after the PAWS check, potentially reducing the
      impact of RTO seeding on bad TCP timestamp values.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 22 Jul, 2013 9 commits
  4. 20 Jul, 2013 3 commits
  5. 19 Jul, 2013 3 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ecb2cf1a
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "A couple interesting SKB fragment handling fixes, plus the usual small
        bits here and there:
      
         1) Fix 64-bit divide build failure on 32-bit platforms in mlx5, from
            Tim Gardner.
      
         2) Get rid of a stupid reimplementation of "%*phC" in our sysfs MAC
            address printing helper.
      
         3) Fix NETIF_F_SG capability advertisement in hyperv driver, if the
            device can't do checksumming offloads then it shouldn't say it can
            do SG either.  From Haiyang Zhang.
      
         4) bgmac needs to depend on PHYLIB, from Hauke Mehrtens.
      
         5) Don't leak DMA mappings on mapping failures, from Neil Horman.
      
         6) We need to reset the transport header of SKBs in ipv4 before we
            attempt to perform early socket demux, just like ipv6 does.  From
            Eric Dumazet.
      
         7) Add missing locking on vxlan device removal, from Stephen
            Hemminger.
      
         8) xen-netfront has to make two passes over an SKB to prepare it for
            transfer.  One pass calculates the number of slots needed, the
            second massages the SKB and fills the slots.  Unfortunately, the
            first pass doesn't calculate the number of slots properly so we
            can end up trying to build a MAX_SKB_FRAGS + 1 SKB which doesn't
            work out so well.  Fix from Jan Beulich with help and discussion
            with several others.
      
         9) Fix a similar problem in tun and macvtap, which have to split up
            scatter-gather elements at PAGE_SIZE boundaries.  Don't do
            zerocopy if it would result in a > MAX_SKB_FRAGS skb.  Fixes from
            Jason Wang.
      
        10) On receive, once we've decoded the VLAN state completely, clear
            skb->vlan_tci.  Otherwise demuxed tunnels underneath can trigger
            the VLAN code again, corrupting the packet.  Fix from Eric
            Dumazet"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        vlan: fix a race in egress prio management
        vlan: mask vlan prio bits
        macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
        tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
        pkt_sched: sch_qfq: remove a source of high packet delay/jitter
        xen-netfront: pull on receive skb may need to happen earlier
        vxlan: add necessary locking on device removal
        hyperv: Fix the NETIF_F_SG flag setting in netvsc
        net: Fix sysfs_format_mac() code duplication.
        be2net: Fix to avoid hardware workaround when not needed
        macvtap: do not assume 802.1Q when send vlan packets
        macvtap: fix the missing ret value of TUNSETQUEUE
        ipv4: set transport header earlier
        mlx5 core: Fix __udivdi3 when compiling for 32 bit arches
        bgmac: add dependency to phylib
        net/irda: fixed style issues in irlan_eth
        ethtool: fixed trailing statements in ethtool
        ndisc: bool initializations should use true and false
        atl1e: unmap partially mapped skb on dma error and free skb
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ee114b97
      Linus Torvalds authored
      Pull x86 fixes from Peter Anvin:
       "Trying again to get the fixes queue, including the fixed IDT alignment
        patch.
      
        The UEFI patch is by far the biggest issue at hand: it is currently
        causing quite a few machines to fail to boot.  Which is sad, because the only
        reason they would is because their BIOSes touch memory that has
        already been freed.  The other major issue is that we finally have
        tracked down the root cause of a significant number of machines
        failing to suspend/resume"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Make sure IDT is page aligned
        x86, suspend: Handle CPUs which fail to #GP on RDMSR
        x86/platform/ce4100: Add header file for reboot type
        Revert "UEFI: Don't pass boot services regions to SetVirtualAddressMap()"
        efivars: check for EFI_RUNTIME_SERVICES
    • Merge tag 'md-3.11-fixes' of git://neil.brown.name/md · 4b8b8a4a
      Linus Torvalds authored
      Pull md bug fixes from NeilBrown:
       "Sorry boss, back at work now boss.  Here's them nice shiny patches ya
        wanted.  All nicely tagged and justified for -stable and everyfing:
      
        Three bug fixes for md in 3.10
      
        3.10 wasn't a good release for md.  The bio changes left a couple of
        bugs, and an md "fix" created another one.
      
        These three patches appear to fix the issues and have been tagged for
        -stable"
      
      * tag 'md-3.11-fixes' of git://neil.brown.name/md:
        md/raid1: fix bio handling problems in process_checks()
        md: Remove recent change which allows devices to skip recovery.
        md/raid10: fix two problems with RAID10 resync.
  6. 18 Jul, 2013 5 commits
    • Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 0a693ab6
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "You'll be terribly disappointed in this, I'm not trying to sneak any
        features in or anything, it's mostly radeon and intel fixes, a couple
        of ARM driver fixes"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (34 commits)
        drm/radeon/dpm: add debugfs support for RS780/RS880 (v3)
        drm/radeon/dpm/atom: fix broken gcc harder
        drm/radeon/dpm/atom: restructure logic to work around a compiler bug
        drm/radeon/dpm: fix atom vram table parsing
        drm/radeon: fix an endian bug in atom table parsing
        drm/radeon: add a module parameter to disable aspm
        drm/rcar-du: Use the GEM PRIME helpers
        drm/shmobile: Use the GEM PRIME helpers
        uvesafb: Really allow mtrr being 0, as documented and warn()ed
        radeon kms: do not flush uninitialized hotplug work
        drm/radeon/dpm/sumo: handle boost states properly when forcing a perf level
        drm/radeon: align VM PTBs (Page Table Blocks) to 32K
        drm/radeon: allow selection of alignment in the sub-allocator
        drm/radeon: never unpin UVD bo v3
        drm/radeon: fix UVD fence emit
        drm/radeon: add fault decode function for CIK
        drm/radeon: add fault decode function for SI (v2)
        drm/radeon: add fault decode function for cayman/TN (v2)
        drm/radeon: use radeon device for request firmware
        drm/radeon: add missing ttm_eu_backoff_reservation to radeon_bo_list_validate
        ...
    • vlan: fix a race in egress prio management · 3e3aac49
      Eric Dumazet authored
      egress_priority_map[] hash table updates are protected by rtnl,
      and we never remove elements until device is dismantled.
      
      We have to make sure that before inserting a new element in the hash table,
      all its fields are committed to memory or else another cpu could
      find corrupt values and crash.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
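The ordering requirement described above (fill in every field of a new element before making it reachable to lockless readers) can be sketched in portable C11 atomics. This is a userspace analogy only: the kernel uses its own barrier primitives, and the names `prio_node`, `bucket_head`, `publish_node` and `lookup_prio` are hypothetical. A single writer is assumed, mirroring the rtnl protection mentioned in the message.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node standing in for one priority mapping entry. */
struct prio_node {
    uint32_t priority;
    uint16_t vlan_qos;
    struct prio_node *next;
};

/* One bucket head; readers traverse it without taking a lock. */
static _Atomic(struct prio_node *) bucket_head;

/* Writer side (assumed serialized by an external lock). */
static void publish_node(struct prio_node *n, uint32_t prio, uint16_t qos)
{
    /* 1. Initialize every field while the node is still private. */
    n->priority = prio;
    n->vlan_qos = qos;
    n->next = atomic_load_explicit(&bucket_head, memory_order_relaxed);

    /* 2. Publish with release ordering so the stores above become
     *    visible before the pointer does. */
    atomic_store_explicit(&bucket_head, n, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above. */
static const struct prio_node *lookup_prio(uint32_t prio)
{
    const struct prio_node *p =
        atomic_load_explicit(&bucket_head, memory_order_acquire);

    for (; p != NULL; p = p->next)
        if (p->priority == prio)
            return p;
    return NULL;
}
```

Because elements are never removed until teardown, a reader that observes the new head pointer can follow `next` safely; the sketch relies on that same property.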
    • vlan: mask vlan prio bits · d4b812de
      Eric Dumazet authored
      In commit 48cc32d3
      ("vlan: don't deliver frames for unknown vlans to protocols")
      Florian made sure we set pkt_type to PACKET_OTHERHOST
      if the vlan id is set and we could find a vlan device for this
      particular id.
      
      But we also have a problem if prio bits are set.
      
      Steinar reported an issue on a router receiving IPv6 frames with a
      vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
      because skb->vlan_tci is set.
      
      Forwarded frame is completely corrupted : We can see (8100:4000)
      being inserted in the middle of IPv6 source address :
      
      16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
      9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
             0x0000:  0000 0029 8000 c7c3 7103 0001 a0ae e651
             0x0010:  0000 0000 ccce 0b00 0000 0000 1011 1213
             0x0020:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
             0x0030:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233
      
      It seems we are not really ready to properly cope with this right now.
      
      We can probably do better in future kernels :
      vlan_get_ingress_priority() should be a netdev property instead of
      a per vlan_dev one.
      
      For stable kernels, let's clear vlan_tci to fix the bugs.
      Reported-by: Steinar H. Gunderson <sesse@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
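To make the "vlan tag of 4000 (id 0, prio 2)" reading concrete, the following sketch decodes a 16-bit 802.1Q Tag Control Information value into its VLAN id and priority fields. The macro and function names are made up for this example and are not the kernel's identifiers; only the bit layout (12-bit id in the low bits, 3-bit priority in the top bits) comes from the 802.1Q tag format.

```c
#include <stdint.h>

/* Illustrative names; the layout follows the 802.1Q TCI field. */
#define TCI_VID_MASK    0x0fffu  /* bits 0-11: VLAN id */
#define TCI_PRIO_SHIFT  13       /* bits 13-15: priority */

/* Extract the VLAN id from a host-order TCI value. */
static inline unsigned tci_vid(uint16_t tci)
{
    return tci & TCI_VID_MASK;
}

/* Extract the priority from a host-order TCI value. */
static inline unsigned tci_prio(uint16_t tci)
{
    return (unsigned)tci >> TCI_PRIO_SHIFT;
}

/* Hypothetical helper: the tag carries a priority but no VLAN id,
 * which is the case reported in the message above. */
static inline int tci_is_prio_only(uint16_t tci)
{
    return tci_vid(tci) == 0 && tci_prio(tci) != 0;
}
```

With `tci = 0x4000`, `tci_vid()` yields 0 and `tci_prio()` yields 2, matching the reported frame.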
    • macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS · ece793fc
      Jason Wang authored
      We try to linearize part of the skb when the number of iovs is greater than
      MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than
      one page, so zerocopy_sg_fromiovec() may still fail and may break the guest
      network.
      
      Solve this problem by calculating the pages needed for the iov before trying to do
      zerocopy, and switch to copy instead of zerocopy if it needs more than
      MAX_SKB_FRAGS.
      
      This is done by introducing a new helper to count the pages for the iov, and
      calling uarg->callback() manually when switching from zerocopy to copy to notify
      vhost.
      
      We can do further optimization on top.
      
      This bug was introduced by commit b92946e2
      (macvtap: zerocopy: validate vectors before building skb).
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
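The idea of counting pages per vector, rather than counting vectors, can be sketched as follows. This is a standalone illustration and not the helper added by the patch: `sketch_iovec`, `SKETCH_PAGE_SIZE`, `iov_pages_sketch` and `can_zerocopy_sketch` are hypothetical names, the page size is fixed at 4096 for the example, and the fragment limit is passed in as a parameter instead of using MAX_SKB_FRAGS.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this example */

/* Hypothetical stand-in for an I/O vector: start address and length. */
struct sketch_iovec {
    uintptr_t base;
    size_t    len;
};

/* Number of pages touched by the byte range [base, base + len). */
static size_t pages_spanned(uintptr_t base, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first = base / SKETCH_PAGE_SIZE;
    uintptr_t last  = (base + len - 1) / SKETCH_PAGE_SIZE;
    return (size_t)(last - first + 1);
}

/* Total pages needed by all vectors; one vector may span several pages. */
static size_t iov_pages_sketch(const struct sketch_iovec *iov, size_t count)
{
    size_t pages = 0;
    for (size_t i = 0; i < count; i++)
        pages += pages_spanned(iov[i].base, iov[i].len);
    return pages;
}

/* Zerocopy is attempted only if every page fits in the fragment limit;
 * otherwise the caller would fall back to copying. */
static int can_zerocopy_sketch(const struct sketch_iovec *iov, size_t count,
                               size_t max_frags)
{
    return iov_pages_sketch(iov, count) <= max_frags;
}
```

A two-byte vector that starts one byte before a page boundary already needs two pages, which is why comparing only the vector count against the limit is not sufficient.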
    • tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS · 88529176
      Jason Wang authored
      We try to linearize part of the skb when the number of iovs is greater than
      MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than
      one page, so zerocopy_sg_fromiovec() may still fail and may break the guest
      network.
      
      Solve this problem by calculating the pages needed for the iov before trying to do
      zerocopy, and switch to copy instead of zerocopy if it needs more than
      MAX_SKB_FRAGS.
      
      This is done by introducing a new helper to count the pages for the iov, and
      calling uarg->callback() manually when switching from zerocopy to copy to notify
      vhost.
      
      We can do further optimization on top.
      
      This bug was introduced by commit 0690899b
      (tun: experimental zero copy tx support).
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>