1. 19 Nov, 2010 3 commits
    • Eric Dumazet's avatar
      filter: use reciprocal divide · c26aed40
      Eric Dumazet authored
      At compile time, we can replace the DIV_K instruction (divide by a
      constant value) by a reciprocal divide.
      
      At exec time, the expensive divide is replaced by a multiply, a less
      expensive operation on most processors.
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: default avatarChangli Gao <xiaosuo@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c26aed40
    • Eric Dumazet's avatar
      filter: cleanup codes[] init · 8c1592d6
      Eric Dumazet authored
      Starting the translated instruction to 1 instead of 0 allows us to
      remove one descrement at check time and makes codes[] array init
      cleaner.
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: default avatarChangli Gao <xiaosuo@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8c1592d6
    • Eric Dumazet's avatar
      filter: optimize sk_run_filter · 93aaae2e
      Eric Dumazet authored
      Remove pc variable to avoid arithmetic to compute fentry at each filter
      instruction. Jumps directly manipulate fentry pointer.
      
      As the last instruction of filter[] is guaranteed to be a RETURN, and
      all jumps are before the last instruction, we dont need to check filter
      bounds (number of instructions in filter array) at each iteration, so we
      remove it from sk_run_filter() params.
      
      On x86_32 remove f_k var introduced in commit 57fe93b3
      (filter: make sure filters dont read uninitialized memory)
      
      Note : We could use a CONFIG_ARCH_HAS_{FEW|MANY}_REGISTERS in order to
      avoid too many ifdefs in this code.
      
      This helps compiler to use cpu registers to hold fentry and A
      accumulator.
      
      On x86_32, this saves 401 bytes, and more important, sk_run_filter()
      runs much faster because less register pressure (One less conditional
      branch per BPF instruction)
      
      # size net/core/filter.o net/core/filter_pre.o
         text    data     bss     dec     hex filename
         2948       0       0    2948     b84 net/core/filter.o
         3349       0       0    3349     d15 net/core/filter_pre.o
      
      on x86_64 :
      # size net/core/filter.o net/core/filter_pre.o
         text    data     bss     dec     hex filename
         5173       0       0    5173    1435 net/core/filter.o
         5224       0       0    5224    1468 net/core/filter_pre.o
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: default avatarChangli Gao <xiaosuo@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      93aaae2e
  2. 18 Nov, 2010 9 commits
  3. 17 Nov, 2010 28 commits