1. 16 Feb, 2013 12 commits
  2. 15 Feb, 2013 21 commits
  3. 14 Feb, 2013 7 commits
    • David S. Miller's avatar
      net: Don't write to current task flags on every packet received. · 9754e293
      David S. Miller authored
      Even for non-pfmalloc SKBs, __netif_receive_skb() will do a
      tsk_restore_flags() on current unconditionally.
      
      Make __netif_receive_skb() a shim around the existing code, renamed to
      __netif_receive_skb_core().  Let __netif_receive_skb() wrap the
      __netif_receive_skb_core() call with the task flag modifications, if
      necessary.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9754e293
    • Claudiu Manoil's avatar
      gianfar: Fix and cleanup Rx FCB indication · ba779711
      Claudiu Manoil authored
      This fixes a less obvious error on one hand, and prevents futher
      similar errors by disambiguating and optimizing RxFCB indication,
      on the other hand.
      
      The error consists in NETIF_F_HW_VLAN_TX flag being used as an
      indication of Rx FCB insertion. This happened as soon gfar_uses_fcb(),
      which despite its name indicates Rx FCB insertion, started
      incorporating is_vlan_on().
      is_vlan_on(), on the other hand, is also a misleading construct because
      we need to differentiate b/w hw VLAN extraction/VLEX (marked by VLAN_RX
      flag) and hw VLAN insertion/VLINS (VLAN_TX flag), which are different
      mechanisms using different types of FCBs.
      
      The hw spec for the RxFCB feature is as follows:
      In the case of RxBD rings, FCBs (Frame Control Block) are inserted by
      the eTSEC whenever RCTRL[PRSDEP] is set to a non-zero value. Only one
      FCB is inserted per frame (in the buffer pointed to by the RxBD with
      bit F set). TOE acceleration for receive is enabled for all rx frames
      in this case.
      
      This patch introduces priv->uses_rxfcb field to quickly signal RxFCB
      insertion in accordance with the specification above.
      
      The dependency on FSL_GIANFAR_DEV_HAS_TIMER was also eliminated as
      another source of confusion. The actual dependency is to priv->hwts_rx_en.
      Upon changing priv->hwts_rx_en via IOCTL, the gfar device is being
      restarted and on init_mac() the priv->hwts_rx_en flag determines RxFCB
      insertion, and rctrl is programmed accordingly. The patch takes care
      of this case too.
      
      Though maybe not as self documenting as the inlining version uses_fcb(),
      priv->uses_rxfcb has the main purpose to quickly signal, on the hot path,
      that the incoming frame has a *Rx* FCB block inserted which needs to be
      pulled out before passing the skb to the stack. This is a performance
      critical operation, it needs to happen fast, that's why uses_rxfcb is
      placed in the first cacheline of gfar_private.
      This is also why a cached rctrl cannot be used instead: 1) because
      we don't have 32 bits available in the first cacheline of gfar_priv
      (but only 16); 2) bit operations are expensive on the hot path.
      Signed-off-by: default avatarClaudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ba779711
    • Claudiu Manoil's avatar
      gianfar: Remove wrong buffer size conditioning to VLAN h/w offload · 13f228da
      Claudiu Manoil authored
      The controller's ref manual states clearly that when the hw Rx vlan
      offload feature is enabled, meaning that the VLEX bit from RCTRL is
      correctly enabled, then the hw performs automatic VLAN tag extraction
      and deletion from the ethernet frames. So there's no point in trying to
      increase the rx buff size when rxvlan is on, as the frame is actually
      smaller.
      And the Tx vlan hw accel feature (VLINS) has nothing to do with rx buff
      size computation.
      Signed-off-by: default avatarClaudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      13f228da
    • Claudiu Manoil's avatar
      gianfar: gfar_process_frame returns void · 61db26c6
      Claudiu Manoil authored
      No return code is expected from gfar_process_frame(), hence
      change it to return void.
      Signed-off-by: default avatarClaudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61db26c6
    • Claudiu Manoil's avatar
      gianfar: GRO_DROP is unlikely · bd9e89f2
      Claudiu Manoil authored
      The change is significant since it affects the rx hot path.
      Paul observed and documented the effects at asm level, see
      below:
      
      "It turns out that it does make a difference, since gfar_process_frame
      gets inlined, and so the increment code gets moved out of line (I have
      marked the if statment with * and the increment code within "-----"):
      
        ------------------------- as is currently ------------------
           4d14:       80 61 00 18     lwz     r3,24(r1)
           4d18:       7f c4 f3 78     mr      r4,r30
           4d1c:       48 00 00 01     bl      4d1c <gfar_clean_rx_ring+0x10c>
        *  4d20:       2f 83 00 04     cmpwi   cr7,r3,4
           4d24:       40 9e 00 1c     bne-    cr7,4d40
      <gfar_clean_rx_ring+0x130>
              ----------------------------
           4d28:       81 3c 01 f8     lwz     r9,504(r28)
           4d2c:       81 5c 01 fc     lwz     r10,508(r28)
           4d30:       31 4a 00 01     addic   r10,r10,1
           4d34:       7d 29 01 94     addze   r9,r9
           4d38:       91 3c 01 f8     stw     r9,504(r28)
           4d3c:       91 5c 01 fc     stw     r10,508(r28)
              ----------------------------
           4d40:       a0 1f 00 24     lhz     r0,36(r31)
           4d44:       81 3f 00 00     lwz     r9,0(r31)
           4d48:       7f a4 eb 78     mr      r4,r29
           4d4c:       7f e3 fb 78     mr      r3,r31
      
        -------------------------- unlikely ------------------------
           4d14:       80 61 00 18     lwz     r3,24(r1)
           4d18:       7f c4 f3 78     mr      r4,r30
           4d1c:       48 00 00 01     bl      4d1c <gfar_clean_rx_ring+0x10c>
        *  4d20:       2f 83 00 04     cmpwi   cr7,r3,4
           4d24:       41 9e 03 94     beq-    cr7,50b8
      <gfar_clean_rx_ring+0x4a8>
           4d28:       a0 1f 00 24     lhz     r0,36(r31)
           4d2c:       81 3f 00 00     lwz     r9,0(r31)
           4d30:       7f a4 eb 78     mr      r4,r29
           4d34:       7f e3 fb 78     mr      r3,r31
      [...]
           50b8:       81 3c 01 f8     lwz     r9,504(r28)
           50bc:       81 5c 01 fc     lwz     r10,508(r28)
           50c0:       31 4a 00 01     addic   r10,r10,1
           50c4:       7d 29 01 94     addze   r9,r9
           50c8:       91 3c 01 f8     stw     r9,504(r28)
           50cc:       91 5c 01 fc     stw     r10,508(r28)
           50d0:       4b ff fc 58     b       4d28 <gfar_clean_rx_ring+0x118>
      
      So, the increment does actually get moved ~1k away."
      
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarClaudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bd9e89f2
    • Claudiu Manoil's avatar
      gianfar: Cleanup and optimize struct gfar_private · b597d20d
      Claudiu Manoil authored
      Group run-time critical fields within the 1st cacheline (32B)
      followed by the tx|rx_queue reference arrays and the interrupt
      group instances (gfargrp), all cacheline aligned.
      
      This has several benefits. Firstly comes the performance benefit
      by having the members required by the driver's hot path re-grouped
      in the structure's first cache lines, whereas the unimportant
      members were pushed towards the end of the struct.
      Another benefit comes from eliminating a 24 byte memory hole that
      was rendering gfar_priv's 2nd cacheline useless. The default gcc
      layout of gfar_private leaves an implicit 24 byte hole after the
      errata (enum) member. This patch fixes it.
      
      The uchar bitfields were pushed towards the end of the struct
      as these are not run-time performance critical (used for init
      time operations). Because there is no other 2 byte member
      around to couple the uchar bitfields memeber with, we will
      have an addititnal 2 byte hole after the bitfields. This is
      unsignificant however, and it doesn't influence gfar_priv's
      size, because the whole structure is padded to be a 32B multiple.
      Signed-off-by: default avatarClaudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b597d20d
    • Claudiu Manoil's avatar
      gianfar: Add device ref (dev) in gfar_private · 369ec162
      Claudiu Manoil authored
      Use device pointer (dev) to simplify the code and to
      avoid double indirections, especially on the hot path.
      
      Basically, instead of accessing priv to get the ofdev
      reference and then accessing the ofdev structure to
      dereference the needed dev pointer, we will get the
      dev pointer directly from priv.
      
      The dev pointer is required on the hot path, see gfar_new_rxbdp
      or gfar_clean_rx_ring (or xmit), and this patch makes
      it available directly from priv's 1st cacheline.
      
      This change is reflected at asm level too, taking (the hot)
      gfar_new_rxbdp():
      initial version -
          18c0:	7c 7e 1b 78 	mr      r30,r3
      
          18d0:	81 69 04 3c 	lwz     r11,1084(r9)
      
          18d8:	34 6b 00 10 	addic.  r3,r11,16
          18dc:	41 82 00 08 	beq-    18e4
      
      patched version -
          18d0:	80 69 04 38 	lwz     r3,1080(r9)
      
          18d8:	2f 83 00 00 	cmpwi   cr7,r3,0
          18dc:	41 9e 00 08 	beq-    cr7,18e4
      Signed-off-by: default avatarClaudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      369ec162