1. 07 Nov, 2007 40 commits
    • SCTP: Allow ADD_IP to work with AUTH for backward compatibility. · 73d9c4fd
      Vlad Yasevich authored
      This patch adds a tunable that will allow ADD_IP to work without
      AUTH for backward compatibility.  The default value is off since
      the default value for ADD_IP is off as well.  People who need
      to use ADD-IP with older implementations risk connection
      hijacking and should consider upgrading or turning this tunable on.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
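      A minimal sketch of the acceptance rule described above, written as
      plain userspace C. The names asconf_acceptable, addip_enabled,
      addip_noauth and chunk_authenticated are hypothetical, chosen for
      illustration; this is not the kernel's SCTP code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the ADD_IP/AUTH policy (not the kernel's code).
 * addip_enabled:       ADD_IP support switched on at all (off by default)
 * addip_noauth:        the backward-compatibility tunable (off by default)
 * chunk_authenticated: the ASCONF chunk arrived covered by SCTP-AUTH
 */
static bool asconf_acceptable(bool addip_enabled, bool addip_noauth,
                              bool chunk_authenticated)
{
    if (!addip_enabled)
        return false;          /* ADD_IP itself is disabled */
    if (chunk_authenticated)
        return true;           /* the normal, safe case */
    /* Unauthenticated ASCONF: accepted only when the admin has explicitly
     * opted into the hijacking risk to talk to older peers. */
    return addip_noauth;
}
```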
    • Vlad Yasevich authored
    • SCTP: Update RCU handling during the ADD-IP case · 0ed90fb0
      Vlad Yasevich authored
      After learning more about RCU, it looks like the ADD-IP handling
      doesn't need to call call_rcu_bh().  All the RCU critical sections
      use rcu_read_lock(), so using call_rcu_bh() is wrong here.
      Now, restore the local_bh_disable() code blocks and use normal
      call_rcu() calls.  Also restore the missing return statement.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • SCTP: Fix difference cases of retransmit. · b6157d8e
      Vlad Yasevich authored
      Commit d0ce9291 broke several retransmit
      cases, including fast retransmit.  The reason is that we should
      only delay by rto while doing retransmits as a result of a timeout.
      Retransmits as a result of path MTU discovery, fast retransmit, or
      other events that should trigger immediate retransmissions got broken.
      
      Also, since rto is doubled prior to marking packets eligible for
      retransmission, we never marked the correct chunks anyway.
      
      The fix is to provide a reason for a given retransmission, so that we
      can mark chunks appropriately, and to save the old rto value to do
      comparisons against.
      
      All regression tests passed with this code.
      
      Spotted by Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
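      A sketch of the "retransmit reason" idea, assuming simple tick
      counters in place of jiffies. The enum values, struct chunk and
      should_mark() are hypothetical names; only the logic (delay by the
      pre-doubling RTO for timeouts, mark immediately otherwise) follows
      the commit text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical retransmission reasons, modeled on the commit's idea. */
enum rtx_reason {
    RTX_T3_TIMEOUT,   /* retransmission timer expired */
    RTX_FAST,         /* fast retransmit */
    RTX_PMTUD,        /* path MTU shrank: resend with smaller packets */
};

/* Hypothetical outstanding-chunk record. */
struct chunk {
    unsigned long sent_at;   /* tick at which the chunk was last sent */
    bool fast_rtx_pending;   /* flagged by the fast-retransmit logic */
};

/* Should this outstanding chunk be marked for retransmission?
 * old_rto is the RTO saved *before* the timeout handler doubled it. */
static bool should_mark(const struct chunk *c, enum rtx_reason reason,
                        unsigned long now, unsigned long old_rto)
{
    switch (reason) {
    case RTX_T3_TIMEOUT:
        /* Only chunks that have been in flight for at least one RTO. */
        return now - c->sent_at >= old_rto;
    case RTX_FAST:
        /* Only chunks flagged for fast retransmit, with no RTO delay. */
        return c->fast_rtx_pending;
    case RTX_PMTUD:
        /* Everything outstanding is resent immediately. */
        return true;
    }
    return false;
}
```

      Passing the already-doubled RTO as old_rto would skip chunks that have
      in fact been outstanding for a full RTO, which is the second problem
      the commit describes.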
    • SCTP : Fix to process bundled ASCONF chunk correctly · f3830ccc
      Wei Yongjun authored
      If an ASCONF chunk is bundled with other chunks as the first chunk, then
      when the ASCONF parameters are processed, the full packet data is
      treated as parameters of the ASCONF chunk, not only its real
      parameters. So if you send an ASCONF chunk bundled with other chunks,
      you will get an unexpected result.
      This problem also exists when an ASCONF-ACK chunk is bundled with other chunks.

      This patch fixes this problem.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
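      A sketch of the bounds rule behind this fix, assuming the standard
      SCTP layout (4-byte chunk header with a 16-bit length, TLV parameters
      padded to 4 bytes). count_chunk_params() is a hypothetical helper: it
      walks parameters only up to the end of the chunk's own length, never
      to the end of the packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit field stored in network (big-endian) order. */
static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: count the TLV parameters of one chunk.
 * chunk   - start of the chunk header (type, flags, 16-bit length)
 * avail   - bytes from chunk to the end of the whole packet
 * hdr_len - fixed bytes before the parameter list (for ASCONF: the
 *           4-byte chunk header plus the 4-byte serial number)
 * Returns the parameter count, or -1 if the chunk is malformed. */
static int count_chunk_params(const uint8_t *chunk, size_t avail,
                              size_t hdr_len)
{
    if (avail < 4)
        return -1;
    size_t chunk_len = get_be16(chunk + 2);
    if (chunk_len < hdr_len || chunk_len > avail)
        return -1;

    int n = 0;
    size_t off = hdr_len;
    /* Bounded by chunk_len, NOT avail: bundled chunks that follow are
     * never mistaken for parameters of this chunk. */
    while (off + 4 <= chunk_len) {
        size_t plen = get_be16(chunk + off + 2);
        if (plen < 4 || off + plen > chunk_len)
            return -1;
        n++;
        off += (plen + 3) & ~(size_t)3;   /* skip value and padding */
    }
    return n;
}
```

      With an ASCONF chunk carrying two parameters followed by another
      8-byte chunk, bounding the walk by the packet end instead would also
      count the trailing chunk as a third parameter.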
    • SCTP : Fix bad formatted comment in outqueue.c · 64b0812b
      Wei Yongjun authored
      Just fix the bad format of the comment in outqueue.c.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • [NETLINK]: Fix unicast timeouts · c3d8d1e3
      Patrick McHardy authored
      Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts
      by moving the schedule_timeout() call to a new function that doesn't
      propagate the remaining timeout back to the caller. This means on each
      retry we start with the full timeout again.
      
      ipc/mqueue.c seems to actually want to wait indefinitely so this
      behaviour is retained.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
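      A small simulation of why dropping the remaining timeout matters.
      fake_schedule_timeout() only mimics the return convention of the
      kernel's schedule_timeout() (ticks left, or 0 on expiry); total_wait()
      and its parameters are hypothetical.

```c
#include <assert.h>

/* Simulated schedule_timeout(): the sleeper is woken after wake_after
 * ticks unless the timeout expires first.  Like the kernel function, it
 * returns the ticks still left (0 if the timeout expired). */
static long fake_schedule_timeout(long timeo, long wake_after)
{
    return wake_after >= timeo ? 0 : timeo - wake_after;
}

/* Hypothetical retry loop: a sender waits for room in a receive queue
 * that never drains.  Returns the total ticks spent sleeping.
 * propagate = 0: the remaining time is dropped (the bug), so every retry
 *                starts with the full timeout and only max_tries ends it.
 * propagate = 1: the remaining time is carried forward (the fix), so the
 *                total wait is bounded by the original timeout. */
static long total_wait(long timeo, long wake_after, int propagate,
                       int max_tries)
{
    long waited = 0;
    for (int i = 0; i < max_tries && timeo > 0; i++) {
        long left = fake_schedule_timeout(timeo, wake_after);
        waited += timeo - left;   /* time actually slept this round */
        if (propagate)
            timeo = left;
    }
    return waited;
}
```

      With a 10-tick timeout and a wakeup every 3 ticks, the propagating
      loop sleeps 10 ticks in total, while the non-propagating loop keeps
      sleeping until max_tries stops it.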
    • [INET]: Remove per bucket rwlock in tcp/dccp ehash table. · 230140cf
      Eric Dumazet authored
      As was done two years ago for the IP route cache table (commit
      22c047cc), we can avoid using one
      lock per hash bucket for the huge TCP/DCCP hash tables.

      On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, for
      little performance difference. (We hit a different cache line for the
      rwlock, but the bucket cache line then has a better sharing factor
      among CPUs, since we dirty it less often.) For netstat or ss commands
      that want a full scan of the hash table, we perform fewer memory accesses.
      
      Using a 'small' table of hashed rwlocks should be more than enough to
      provide correct SMP concurrency between different buckets, without
      using too much memory. Sizing of this table depends on
      num_possible_cpus() and various CONFIG settings.
      
      This patch provides some locking abstraction that may ease future
      work using a different model for the TCP/DCCP table.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
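      A sketch of the bucket-to-lock mapping, assuming a power-of-two hash
      table and a power-of-two lock count. The sizing rule in
      ehash_nr_locks() (locks_per_cpu times CPUs, capped at the bucket
      count) is a hypothetical simplification of "depends on
      num_possible_cpus() and various CONFIG settings"; the kernel's actual
      numbers are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next power of two (n >= 1). */
static size_t roundup_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical sizing rule: some locks per possible CPU, rounded up to a
 * power of two, never more than the (power-of-two) number of buckets. */
static size_t ehash_nr_locks(size_t ncpus, size_t locks_per_cpu,
                             size_t nbuckets)
{
    size_t n = roundup_pow2(ncpus * locks_per_cpu);
    return n < nbuckets ? n : nbuckets;
}

/* Map a bucket to the lock protecting it.  With a power-of-two lock count
 * this is a simple mask, and many buckets share each lock. */
static size_t ehash_lock_index(size_t bucket, size_t nr_locks)
{
    return bucket & (nr_locks - 1);
}
```

      Because the lock index is just bucket & (nr_locks - 1), buckets
      nr_locks apart share a lock, and lock memory grows with the CPU count
      rather than with the table size.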
    • [IPVS]: Synchronize closing of Connections · efac5276
      Rumen G. Bogdanovski authored
      This patch makes the master daemon sync the connection when it is about
      to close.  This makes the connections on the backup close or time out
      according to their state.  Previously the sync was performed only if the
      connection was in the ESTABLISHED state, which always made the
      connections time out after the hard-coded 3 minutes. However, Andy
      Gospodarek's patch ([IPVS]: use proper timeout instead of fixed value)
      effectively did nothing more than increase this to 15 minutes (the
      ESTABLISHED state timeout).  So this patch makes use of the proper
      timeout, since it syncs the connections on status changes to FIN_WAIT
      (2 min timeout) and CLOSE (10 s timeout).
      However, if the backup misses CLOSE, hopefully it did not miss FIN_WAIT.
      Otherwise we will just have to wait for the ESTABLISHED state timeout,
      as happens without this patch.  This way the number of hanging
      connections on the backup is kept to a minimum, and very few of them
      will be left to time out with a long timeout.
      
      This is important if we want to make use of the fix for the real server
      overcommit on master/backup fail-over.
      Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
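      A sketch of the old and new sync rules, using the timeouts quoted in
      the message. The enum and function names are hypothetical, and real
      IPVS tracks more TCP states than the three modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the TCP states an IPVS director tracks. */
enum conn_state { ST_ESTABLISHED, ST_FIN_WAIT, ST_CLOSE };

/* Timeouts quoted in the commit message, in seconds. */
static int state_timeout_secs(enum conn_state s)
{
    switch (s) {
    case ST_ESTABLISHED: return 15 * 60;
    case ST_FIN_WAIT:    return 2 * 60;
    case ST_CLOSE:       return 10;
    }
    return 0;
}

/* Old rule: only ESTABLISHED connections were synced to the backup. */
static bool should_sync_old(enum conn_state new_s)
{
    return new_s == ST_ESTABLISHED;
}

/* New rule: additionally sync on a change into FIN_WAIT or CLOSE, so the
 * backup can apply the shorter timeout for that state. */
static bool should_sync_new(enum conn_state old_s, enum conn_state new_s)
{
    if (new_s == ST_ESTABLISHED)
        return true;
    return old_s != new_s && (new_s == ST_FIN_WAIT || new_s == ST_CLOSE);
}
```

      If the backup last heard ESTABLISHED, the entry lingers for 900 s;
      one synced CLOSE brings that down to 10 s.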
    • [IPVS]: Bind connections on stanby if the destination exists · 1e356f9c
      Rumen G. Bogdanovski authored
      This patch fixes the problem with node overload on director fail-over.
      Consider this scenario: 2 nodes, each accepting 3 connections at a
      time, and 2 directors.  Director failover occurs when the nodes are
      fully loaded (6 connections to the cluster).  In this case the new
      director will assign another 6 connections to the cluster if the same
      real servers exist there.
      
      The problem turned out to be that the inherited connections were not
      bound to the real servers (destinations) on the backup director.
      Therefore "ipvsadm -l" reports 0 connections:
      root@test2:~# ipvsadm -l
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
      TCP  test2.local:5999 wlc
        -> node473.local:5999           Route   1000   0          0
        -> node484.local:5999           Route   1000   0          0
      
      while "ipvsadm -lnc" is right:
      root@test2:~# ipvsadm -lnc
      IPVS connection entries
      pro expire state       source             virtual            destination
      TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999 192.168.0.51:5999
      TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999 192.168.0.52:5999
      
      So the patch I am sending fixes the problem by binding the received
      connections to the appropriate service on the backup director, if it
      exists, else the connection will be handled the old way. So if the
      master and the backup directors are synchronized in terms of real
      services there will be no problem with server over-committing since
      new connections will not be created on the nonexistent real services
      on the backup. However if the service is created later on the backup,
      the binding will be performed when the next connection update is
      received. With this patch the inherited connections will show as
      inactive on the backup:
      
      root@test2:~# ipvsadm -l
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
      TCP  test2.local:5999 wlc
        -> node473.local:5999           Route   1000   0          1
        -> node484.local:5999           Route   1000   0          1
      
      rumen@test2:~$ cat /proc/net/ip_vs
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port Forward Weight ActiveConn InActConn
      TCP  C0A800DE:176F wlc
        -> C0A80033:176F      Route   1000   0          1
        -> C0A80032:176F      Route   1000   0          1
      
      Regards,
      Rumen Bogdanovski
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • [NET]: Remove Documentation/networking/pt.txt · c183783e
      Adrian Bunk authored
      There's no point in keeping documentation for a driver that was
      removed many years ago.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Acked-by: Alan Cox <alan@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Remove Documentation/networking/routing.txt · e8b2cadd
      Adrian Bunk authored
      This file is so outdated that I can't see any value in keeping it.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Remove Documentation/networking/ncsa-telnet · 17a83c75
      Adrian Bunk authored
      Newsflash: There once was a version of NCSA telnet that had some bug.
      
      Spotted by Pekka Pietikainen.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Remove comx driver docs. · 915590cf
      Adrian Bunk authored
      The drivers were already removed 3.5 years ago.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Acked-by: Alan Cox <alan@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Remove Documentation/networking/Configurable · 240e5464
      Adrian Bunk authored
      After more than 11 years, this file no longer contains much useful
      information.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Clean proto_(un)register from in-code ifdefs · b733c007
      Pavel Emelyanov authored
      struct proto has a per-CPU "inuse" counter, which is handled
      with special care. All the handling code hides under #ifdef
      CONFIG_SMP, which introduces some code duplication and makes the
      code look worse than it could.

      Clean this up.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPSEC]: Fix crypto_alloc_comp error checking · 4999f362
      Herbert Xu authored
      The function crypto_alloc_comp() returns an errno encoded as an error
      pointer instead of NULL to indicate an error, so its result needs to
      be tested with IS_ERR().

      This is based on a patch by Vicenç Beltran Querol.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
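      A userspace re-creation of the error-pointer convention, modeled on
      ERR_PTR()/IS_ERR()/PTR_ERR() from include/linux/err.h (MAX_ERRNO is
      4095 there). fake_alloc_comp() is a hypothetical stand-in for an
      allocator such as crypto_alloc_comp() that returns ERR_PTR(-errno) on
      failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace copy of the kernel's error-pointer convention: the last
 * MAX_ERRNO addresses of the address space encode -errno values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical allocator with crypto_alloc_comp()-style semantics:
 * it returns ERR_PTR(-errno) on failure, never NULL. */
static void *fake_alloc_comp(bool fail)
{
    static int object;
    return fail ? ERR_PTR(-ENOENT) : &object;
}
```

      A `!ptr` test never fires for such a value, so a failed allocation
      would be used as if it were valid; IS_ERR() catches it.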
    • [VLAN]: Fix SET_VLAN_INGRESS_PRIORITY_CMD ioctl · fffe470a
      Patrick McHardy authored
      Based on a report and patch by Doug Kehn <rdkehn@yahoo.com>:
      
      vconfig returns the following error when attempting to execute the
      set_ingress_map command:
      
      vconfig: socket or ioctl error for set_ingress_map: Operation not permitted
      
      In vlan.c, vlan_ioctl_handler for SET_VLAN_INGRESS_PRIORITY_CMD
      sets err = -EPERM and calls vlan_dev_set_ingress_priority.
      vlan_dev_set_ingress_priority is a void function so err remains
      at -EPERM and results in the vconfig error (even though the ingress
      map was set).
      
      Fix by setting err = 0 after the vlan_dev_set_ingress_priority call.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
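      A sketch of the error-handling shape this fix restores. The function
      names, the capability flag and the 8-entry map are hypothetical; only
      the "preset err to -EPERM, call a void helper, then set err to 0"
      pattern comes from the description.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical 8-entry VLAN-priority -> skb-priority map. */
static unsigned int ingress_map[8];

/* Void helper, like vlan_dev_set_ingress_priority(): cannot report errors. */
static void set_ingress_priority(unsigned int skb_prio, unsigned int vlan_prio)
{
    ingress_map[vlan_prio & 7] = skb_prio;
}

/* Hypothetical ioctl branch.  capable stands in for a permission check. */
static int vlan_ioctl_set_ingress(int capable, unsigned int skb_prio,
                                  unsigned int vlan_prio)
{
    int err = -EPERM;
    if (!capable)
        goto out;
    set_ingress_priority(skb_prio, vlan_prio);
    err = 0;   /* the fix: without this, success is reported as -EPERM */
out:
    return err;
}
```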
    • [NETNS]: Fix compiler error in net_namespace.c · 45a19b0a
      Johann Felix Soden authored
      Because net_free is called by copy_net_ns before its declaration, the
      compiler gives an error. This patch puts net_free before copy_net_ns
      to fix this.
      
      The compiler error:
      net/core/net_namespace.c: In function 'copy_net_ns':
      net/core/net_namespace.c:97: error: implicit declaration of function 'net_free'
      net/core/net_namespace.c: At top level:
      net/core/net_namespace.c:104: warning: conflicting types for 'net_free'
      net/core/net_namespace.c:104: error: static declaration of 'net_free' follows non-static declaration
      net/core/net_namespace.c:97: error: previous implicit declaration of 'net_free' was here
      
      The error was introduced by the '[NET]: Hide the dead code in the
      net_namespace.c' patch (6a1a3b9f).
      Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TTY]: Use tty_mode_ioctl() in network drivers. · d0127539
      Alan Cox authored
      We consciously make a change here: we permit mode and speed setting to
      be done in things like SLIP mode. There isn't actually a technical
      reason to disallow this. It's usually a silly thing to do, but we can
      do it and someone might wish to do so.
      Signed-off-by: Alan Cox <alan@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TTY]: Fix network driver interactions with TCGET/SET calls. · 0fc00e24
      Alan Cox authored
      Dave Miller noted various cases where line disciplines for things like
      ppp go poking around in termios themselves in ways that broke with the
      new termios code. Rather than having them all learn about termios
      internals, provide proper methods for this:

      - tty_mode_ioctl()

        This handles all the terminal mode handling for speed, carrier,
        etc., and none of the methods are ldisc dependent, so they can be
        called by any user.

      - tty_perform_flush()

        This extracts the flush functionality and lets the ppp layer used
        by pppd share it cleanly.
      
      The existing n_tty_ioctl code is refactored in this patch to provide
      the new functions and to call them itself appropriately. This patch
      has no (intended) behaviour changes and simply prepares for the other
      fixes.
      Signed-off-by: Alan Cox <alan@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks. · 543821c6
      Radu Rendec authored
      While trying to implement u32 hashes in my shaping machine I ran into
      a possible bug in the u32 hash/bucket computing algorithm
      (net/sched/cls_u32.c).
      
      The problem occurs only with hash masks that extend over the octet
      boundary, on little endian machines (where htonl() actually does
      something).
      
      Let's say that I would like to use 0x3fc0 as the hash mask. This means
      8 contiguous "1" bits starting at b6. With such a mask, the expected
      (and logical) behavior is to hash any address in, for instance,
      192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in
      bucket 1, then 192.168.0.128/26 in bucket 2 and so on.
      
      This is exactly what would happen on a big endian machine, but on
      little endian machines, what would actually happen with current
      implementation is 0x3fc0 being reversed (into 0xc03f0000) by htonl()
      in the userspace tool and then applied to 192.168.x.x in the u32
      classifier. When shifting right by 16 bits (rank of first "1" bit in
      the reversed mask) and applying the divisor mask (0xff for divisor
      256), what would actually remain is 0x3f applied on the "168" octet of
      the address.
      
      One could say that this can easily be worked around by taking endianness
      into account in userspace and supplying an appropriate mask (0xfc03)
      that would be turned into contiguous "1" bits when reversed
      (0x03fc0000). But the actual problem is the network address (inside
      the packet) not being converted to host order, but used as a
      host-order value when computing the bucket.
      
      Let's say the network address is written as n31 n30 ... n0, with n0
      being the least significant bit. When used directly (without any
      conversion) on a little endian machine, it becomes n7 ... n0 n15 ... n8
      etc in the machine's registers. Thus bits n7 and n8 would no longer be
      adjacent and 192.168.64.0/26 and 192.168.128.0/26 would no longer be
      consecutive.
      
      The fix is to apply ntohl() on the hmask before computing fshift,
      and in u32_hash_fold() convert the packet data to host order before
      shifting down by fshift.
      
      With helpful feedback from Jamal Hadi Salim and Jarek Poplawski.
      Signed-off-by: David S. Miller <davem@davemloft.net>
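      A userspace sketch of the fixed bucket computation, following the
      description: derive fshift from the mask converted with ntohl(), and
      convert the masked packet word to host order before shifting.
      u32_fshift() and u32_bucket() are hypothetical names, and the bucket
      mask assumes a power-of-two divisor such as 256.

```c
#include <assert.h>
#include <arpa/inet.h>   /* htonl(), ntohl() */
#include <stdint.h>

/* 0-based index of the lowest set bit; mask must be non-zero. */
static unsigned lowest_bit(uint32_t mask)
{
    unsigned n = 0;
    while (!(mask & 1u)) {
        mask >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical helper: the shift comes from the mask in HOST order.
 * hmask_be is the mask in network byte order, as the tc tool supplies it. */
static unsigned u32_fshift(uint32_t hmask_be)
{
    return lowest_bit(ntohl(hmask_be));
}

/* Hypothetical helper: key_be is the 32-bit word copied straight from the
 * packet (network order).  Convert to host order BEFORE shifting, so
 * adjacent address bits stay adjacent on little endian hosts too. */
static unsigned u32_bucket(uint32_t key_be, uint32_t hmask_be,
                           unsigned fshift, unsigned divisor)
{
    unsigned h = ntohl(key_be & hmask_be) >> fshift;
    return h & (divisor - 1);   /* divisor: power of two, e.g. 256 */
}
```

      With hmask = htonl(0x3fc0), fshift is 6, and 192.168.0.10,
      192.168.0.64, 192.168.0.128 and 192.168.1.0 land in buckets 0, 1, 2
      and 4 on hosts of either endianness, since ntohl() undoes the byte
      order before any shifting happens.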
    • [NET]: Removing duplicit #includes · 40208d71
      Jiri Olsa authored
      Remove duplicate #includes in net/.
      Signed-off-by: Jiri Olsa <olsajiri@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Let USB_USBNET always select MII. · 4aa92cd9
      Adrian Bunk authored
      All this USB_USBNET_MII trickery is simply not worth it considering how
      little code it saves.

      As a side effect, this also fixes the following compile error reported
      by Toralf Förster:
      
      <--  snip  -->
      
      ...
        LD      .tmp_vmlinux1
      drivers/built-in.o: In function `usbnet_set_settings':
      (.text+0xf1876): undefined reference to `mii_ethtool_sset'
      drivers/built-in.o: In function `usbnet_get_settings':
      (.text+0xf1836): undefined reference to `mii_ethtool_gset'
      drivers/built-in.o: In function `usbnet_get_link':
      (.text+0xf18d6): undefined reference to `mii_link_ok'
      drivers/built-in.o: In function `usbnet_nway_reset':
      (.text+0xf18f6): undefined reference to `mii_nway_restart'
      make: *** [.tmp_vmlinux1] Error 1
      
      <--  snip  -->
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [RRUNNER]: Do not muck with sysctl_{r,w}mem_max · df1e6e54
      David S. Miller authored
      Drivers have no business changing these values.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [DLM] lowcomms: Do not muck with sysctl_rmem_max. · df61c952
      David S. Miller authored
      Use SO_RCVBUFFORCE instead.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPV4]: Compact some ifdefs in the fib code. · c3e9a353
      Pavel Emelyanov authored
      There are places that check for CONFIG_IP_MULTIPLE_TABLES
      twice in the same file, and the contents of these #ifdefs
      can be merged.

      As a side effect, remove one ifdef from inside a function.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [VETH]: Clarify "virtual ethernet device" to "virtual ethernet pair device". · 6a9a0250
      Rusty Russell authored
      It'd also be nice to mention "containers" somewhere in the help text
      (I'm assuming that's what it's for?).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Kill proc_net_create() · 44656ba1
      David S. Miller authored
      There are no more users.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPV6]: Convert /proc/net/ipv6_route to seq_file interface · 33120b30
      Alexey Dobriyan authored
      This removes the last proc_net_create() user. Kudos to Benjamin Thery and
      Stephen Hemminger for comments on previous version.
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [PKT_SCHED]: Fix OOPS when removing devices from a teql queuing discipline · 4f9f8311
      Evgeniy Polyakov authored
      teql_reset() is called from deactivate and the qdisc is already set to
      noop, but the subsequent teql_xmit() does not know about it,
      dereferences the private data as a teql qdisc, and thus oopses.
      not catch it first :)
      Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • c62cf5cb
      David S. Miller authored
    • [SCTP]: Use the {DEFINE|REF}_PROTO_INUSE infrastructure · 8295b6d9
      Eric Dumazet authored
      Trivial patch to make the "sctp,sctpv6" protocols use the fast "inuse
      sockets" infrastructure.

      Each protocol then uses a static per-CPU variable instead of a dynamic
      one. This saves some RAM and some CPU cycles.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPV6]: Use the {DEFINE|REF}_PROTO_INUSE infrastructure · c5a432f1
      Eric Dumazet authored
      Trivial patch to make the "tcpv6,udpv6,udplitev6,rawv6" protocols use the
      fast "inuse sockets" infrastructure.

      Each protocol then uses a static per-CPU variable instead of a dynamic
      one. This saves some RAM and some CPU cycles.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPV4]: Use the {DEFINE|REF}_PROTO_INUSE infrastructure · 47a31a6f
      Eric Dumazet authored
      Trivial patch to make the "tcp,udp,udplite,raw" protocols use the fast
      "inuse sockets" infrastructure.

      Each protocol then uses a static per-CPU variable instead of a dynamic
      one. This saves some RAM and some CPU cycles.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Define infrastructure to keep 'inuse' changes in an efficent SMP/NUMA way. · 286ab3d4
      Eric Dumazet authored
      "struct proto" currently uses an array stats[NR_CPUS] to track changes in
      'inuse' sockets per protocol.
      
      If NR_CPUS is big, this means we use a big memory area for this.
      Moreover, all this memory area is located on a single node on NUMA
      machines, increasing memory pressure on the boot node.
      
      In this patch, I tried to:
      
      - Keep a fast !CONFIG_SMP implementation
      - Keep a fast CONFIG_SMP implementation for often used protocols
      (tcp,udp,raw,...)
      - Introduce a NUMA efficient implementation
      
      Some helper macros are defined in include/net/sock.h.
      These macros take CONFIG_SMP into account.

      If a "struct proto" is declared without using the DEFINE_PROTO_INUSE /
      REF_PROTO_INUSE macros, it will automatically use a default
      implementation, using a dynamically allocated percpu zone.
      This default implementation will be NUMA efficient, but might use
      32/64 bytes per possible CPU because of the current alloc_percpu()
      implementation. However, it still should be better than the previous
      implementation based on the stats[NR_CPUS] field.

      When a "struct proto" is changed to use the new macros, we use a single
      static "int" percpu variable, lowering the memory and CPU costs while
      still preserving NUMA efficiency.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
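      A sketch of the per-CPU counting idea, with a plain array standing in
      for real per-CPU variables and an explicit cpu argument in place of
      the current CPU id. SKETCH_NR_CPUS, inuse_add() and inuse_sum() are
      hypothetical names, not the helpers defined in include/net/sock.h.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 4   /* hypothetical CPU count for the sketch */

/* One counter slot per CPU; stands in for a per-CPU "int" variable. */
struct proto_inuse {
    int count[SKETCH_NR_CPUS];
};

/* Fast path: each CPU only touches its own slot, so there is no shared
 * cache line and no atomic operation. */
static void inuse_add(struct proto_inuse *p, int cpu, int val)
{
    p->count[cpu] += val;
}

/* Slow path for readers: sum every slot.  Individual slots can go
 * negative (socket opened on one CPU, closed on another); only the
 * total is meaningful. */
static int inuse_sum(const struct proto_inuse *p)
{
    int total = 0;
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        total += p->count[cpu];
    return total;
}
```

      A socket created on CPU 0 and closed on CPU 2 leaves +1 and -1 in
      those slots, which is why readers have to add up every slot.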
    • [PPP]: L2TP: Fix oops in transmit and receive paths · 91781004
      James Chapman authored
      Changes made on 18-sep to fix skb handling in the pppol2tp driver
      broke the transmit and receive paths. Users are only running into this
      now because distros are now using 2.6.23 and I must have messed up
      when I tested the change.
      
      For receive, we now do our own calculation of how much to pull from
      the skb (variable length L2TP header) rather than using
      skb_transport_offset(). Also, if the skb isn't a data packet, it must
      be passed back to UDP with skb->data pointing to the UDP header.
      
      For transmit, make sure skb->sk is set up because ip_queue_xmit()
      needs it.
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
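      A sketch of computing the variable-length L2TP header size directly,
      assuming the L2TPv2 header layout from RFC 2661 (optional Length,
      Ns/Nr and Offset fields selected by flag bits).
      l2tp_data_hdr_len() is a hypothetical helper; it returns -1 for
      control messages, which the commit says must go back to UDP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* L2TPv2 flag bits in the first 16-bit word (RFC 2661). */
#define L2TP_FLAG_T 0x8000   /* control message */
#define L2TP_FLAG_L 0x4000   /* Length field present */
#define L2TP_FLAG_S 0x0800   /* Ns and Nr fields present */
#define L2TP_FLAG_O 0x0200   /* Offset Size field present */

static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: bytes to pull to reach the PPP payload of a data
 * message, or -1 for a control message or a truncated header.
 * The version bits are not checked in this sketch. */
static int l2tp_data_hdr_len(const uint8_t *hdr, size_t len)
{
    if (len < 2)
        return -1;
    uint16_t flags = be16(hdr);
    if (flags & L2TP_FLAG_T)
        return -1;                      /* control: leave it to UDP */

    size_t off = 2;                     /* flags + version */
    if (flags & L2TP_FLAG_L)
        off += 2;                       /* Length */
    off += 4;                           /* Tunnel ID + Session ID */
    if (flags & L2TP_FLAG_S)
        off += 4;                       /* Ns + Nr */
    if (flags & L2TP_FLAG_O) {
        if (len < off + 2)
            return -1;
        size_t pad = be16(hdr + off);   /* Offset Size */
        off += 2 + pad;                 /* field itself + padding */
    }
    return off <= len ? (int)off : -1;
}
```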
    • [IPV4]: Clean the ip_sockglue.c from some ugly ifdefs · 6a9fb947
      Pavel Emelyanov authored
      The #ifdef CONFIG_IP_MROUTE is sometimes placed inside if statements,
      which looks really bad. Similar ifdefs inside functions look a bit
      better, but they are also discouraged.
      
      Provide an ifdef-ed ip_mroute_opt() helper to cleanup the code.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
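      A sketch of the "ifdef-ed helper" pattern. ip_mroute_opt() is the
      helper name from the commit; its body, the SKETCH_MRT_* range and
      dispatch_ip_option() are hypothetical illustrations, not the kernel's
      definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the multicast-routing option numbers. */
#define SKETCH_MRT_BASE 200
#define SKETCH_MRT_LAST (SKETCH_MRT_BASE + 10)

/* Pretend multicast routing is configured in; delete this line to
 * compile the stub branch instead. */
#define CONFIG_IP_MROUTE 1

/* The #ifdef lives in exactly one place, around a tiny helper ... */
#ifdef CONFIG_IP_MROUTE
static inline int ip_mroute_opt(int opt)
{
    return opt >= SKETCH_MRT_BASE && opt <= SKETCH_MRT_LAST;
}
#else
static inline int ip_mroute_opt(int opt)
{
    (void)opt;
    return 0;   /* multicast routing compiled out: never a routing option */
}
#endif

/* ... so callers need no #ifdef inside their if statements.
 * Hypothetical caller: 1 = multicast routing code, 0 = generic IP code. */
static int dispatch_ip_option(int optname)
{
    if (ip_mroute_opt(optname))
        return 1;
    return 0;
}
```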
    • [DECNET]: "addr" module param can't be __initdata · 4e058063
      Alexey Dobriyan authored
      sysfs keeps references to module parameters via /sys/module/*/parameters,
      so marking them as __initdata can't work.
      
      Steps to reproduce:
      
      	modprobe decnet
      	cat /sys/module/decnet/parameters/addr
      
      BUG: unable to handle kernel paging request at virtual address f88cd410
      printing eip: c043dfd1 *pdpt = 0000000000004001 *pde = 0000000004408067 *pte = 0000000000000000
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: decnet sunrpc af_packet ipv6 binfmt_misc dm_mirror dm_multipath dm_mod sbs sbshc fan dock battery backlight ac power_supply parport loop rtc_cmos serio_raw rtc_core rtc_lib button amd_rng sr_mod cdrom shpchp pci_hotplug ehci_hcd ohci_hcd uhci_hcd usbcore
      Pid: 2099, comm: cat Not tainted (2.6.24-rc1-b1d08ac0-bloat #6)
      EIP: 0060:[<c043dfd1>] EFLAGS: 00210286 CPU: 1
      EIP is at param_get_int+0x6/0x20
      EAX: c5c87000 EBX: 00000000 ECX: 000080d0 EDX: f88cd410
      ESI: f8a108f8 EDI: c5c87000 EBP: 00000000 ESP: c5c97f00
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process cat (pid: 2099, ti=c5c97000 task=c641ee10 task.ti=c5c97000)
      Stack: 00000000 f8a108f8 c5c87000 c043db6b f8a108f1 00000124 c043de1a c043db2f
             f88cd410 ffffffff c5c87000 f8a16bc8 f8a16bc8 c043dd69 c043dd54 c5dd5078
             c043dbc8 c5cc7580 c06ee64c c5d679f8 c04c431f c641f480 c641f484 00001000
      Call Trace:
       [<c043db6b>] param_array_get+0x3c/0x62
       [<c043de1a>] param_array_set+0x0/0xdf
       [<c043db2f>] param_array_get+0x0/0x62
       [<c043dd69>] param_attr_show+0x15/0x2d
       [<c043dd54>] param_attr_show+0x0/0x2d
       [<c043dbc8>] module_attr_show+0x1a/0x1e
       [<c04c431f>] sysfs_read_file+0x7c/0xd9
       [<c04c42a3>] sysfs_read_file+0x0/0xd9
       [<c048d4b2>] vfs_read+0x88/0x134
       [<c042090b>] do_page_fault+0x0/0x7d5
       [<c048d920>] sys_read+0x41/0x67
       [<c04080fa>] sysenter_past_esp+0x6b/0xc1
       =======================
      Code: 00 83 c4 0c c3 83 ec 0c 8b 52 10 8b 12 c7 44 24 04 27 dd 6c c0 89 04 24 89 54 24 08 e8 ea 01 0c 00 83 c4 0c c3 83 ec 0c 8b 52 10 <8b> 12 c7 44 24 04 58 8c 6a c0 89 04 24 89 54 24 08 e8 ca 01 0c
      EIP: [<c043dfd1>] param_get_int+0x6/0x20 SS:ESP 0068:c5c97f00
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPv6] SNMP: Restore Udp6InErrors incrementation · 7a0ff716
      Mitsuru Chinen authored
      As the checksum verification is postponed until the user calls recv or
      poll, the incrementing of the Udp6InErrors counter should also be
      postponed. Currently, it is postponed only in the non-blocking case.
      However, it should be postponed in all cases, as in the IPv4 code.
      Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>