1. 17 Jun, 2013 8 commits
    • tipc: introduce new TIPC server infrastructure · c5fa7b3c
      Ying Xue authored
      TIPC has two internal servers, one providing a subscription
      service for topology events, and another providing the
      configuration interface. These servers have previously been running
      in BH context, accessing the TIPC-port (aka native) API directly.
      Apart from these servers, even the TIPC socket implementation is
      partially built on this API.
      
      As this API may simultaneously be called via different paths and in
      different contexts, a complex and costly lock policy is required
      in order to protect TIPC internal resources.
      
      To eliminate the need for this complex lock policy, we introduce
      a new, generic service API that uses kernel sockets for message
      passing instead of the native API. Once the topology and configuration
      servers are converted to use this new service, all code pertaining
      to the native API can be removed. This entails a significant
      reduction in code size and complexity, and opens the way for a
      complete rework of the locking policy in TIPC.
      
      The new service also solves another problem:
      
      As the current topology server works in BH context, it cannot easily
      be blocked when sending of events fails due to congestion. In such
      cases events may have to be silently dropped, something that is
      unacceptable. Therefore, the new service keeps a dedicated outbound
      queue receiving messages from BH context. Once messages are
      inserted into this queue, we immediately schedule a work item on a
      dedicated workqueue. This way, messages/events from the topology
      server are in reality sent in process context, and the server can
      block if necessary.
      
      Analogously, there is a new workqueue for receiving messages. Once a
      notification about an arriving message is received in BH context, we
      schedule a work item on the receive workqueue to do the job of
      receiving the message in process context.
      
      As both sending and receiving of messages are now completed in
      process context, subscribed events can no longer be dropped.
      
      As of this commit, the new server infrastructure is built but not
      yet called by the existing TIPC code. Since the conversion changes
      required in order to use it are significant, the addition is kept
      here as a separate commit.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
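The deferral described in the commit above (queue in BH context, send later from a worker that may block) can be sketched in userspace C. This is a simplified model, not the kernel code: `struct outqueue`, `outqueue_post()` and `outqueue_drain()` are hypothetical names, the `work_pending` flag merely stands in for scheduling a work item, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical event record; the real server queues TIPC messages. */
struct event {
    int id;
    struct event *next;
};

/* Hypothetical outbound queue: filled where blocking is forbidden,
 * drained later by a worker that is allowed to block. */
struct outqueue {
    struct event *head;
    struct event *tail;
    int work_pending;   /* stands in for a scheduled work item */
};

/* Producer side: only links the event and marks work as pending.
 * Returns 0 on success, -1 if the allocation fails. */
int outqueue_post(struct outqueue *q, int id)
{
    struct event *ev = malloc(sizeof(*ev));

    if (!ev)
        return -1;
    ev->id = id;
    ev->next = NULL;
    if (q->tail)
        q->tail->next = ev;
    else
        q->head = ev;
    q->tail = ev;
    q->work_pending = 1;
    return 0;
}

/* Worker side: removes up to max events in FIFO order and stores their
 * ids in sent_ids, modelling the (possibly blocking) send.
 * Returns the number of events handled. */
int outqueue_drain(struct outqueue *q, int *sent_ids, int max)
{
    int sent = 0;

    assert(sent_ids != NULL);
    while (q->head && sent < max) {
        struct event *ev = q->head;

        q->head = ev->next;
        if (!q->head)
            q->tail = NULL;
        sent_ids[sent++] = ev->id;
        free(ev);
    }
    /* Work stays pending only if something is still queued. */
    q->work_pending = (q->head != NULL);
    return sent;
}
```

In this model the producer never drops an event because the consumer is busy; events simply wait in the queue until the worker runs.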
    • tipc: allow implicit connect for stream sockets · 5d21cb70
      Erik Hugne authored
      TIPC's implied connect feature, aka piggyback connect, allows
      applications to save one syscall and all SYN/SYN-ACK signalling
      overhead when setting up a connection.  Until now, this has only
      been supported for SEQPACKET sockets.  Here, we make it possible
      to use this feature even with stream sockets.
      
      At the connecting side, the connection is completed when the
      first data message arrives from the accepting peer.  This means
      that we must allow the connecting user to call blocking recv()
      before the socket has reached state SS_CONNECTED.  So we must
      relax the state machine check at recv_stream(), and allow the
      recv() call even if the socket is in state SS_CONNECTING.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
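The relaxed state check described in the commit above can be illustrated with a small userspace sketch. The enum values are hypothetical stand-ins for the kernel's SS_* states, and the exact shape of the old check is an assumption; only the new behaviour (recv() permitted while connecting) comes from the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel's SS_* socket states. */
enum sock_state {
    ST_UNCONNECTED,
    ST_CONNECTING,
    ST_CONNECTED,
    ST_DISCONNECTING
};

/* Assumed old behaviour: a socket that is still connecting is refused. */
int recv_check_strict(enum sock_state st)
{
    assert(st <= ST_DISCONNECTING);
    return (st == ST_UNCONNECTED || st == ST_CONNECTING) ? -ENOTCONN : 0;
}

/* Relaxed behaviour: a connecting socket may block in recv(), because
 * the first data message from the peer is what completes the connection. */
int recv_check_relaxed(enum sock_state st)
{
    assert(st <= ST_DISCONNECTING);
    return (st == ST_UNCONNECTED) ? -ENOTCONN : 0;
}
```

The only state whose outcome differs between the two functions is the connecting one, which is exactly the case an implicit connect needs.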
    • tipc: change socket buffer overflow control to respect sk_rcvbuf · cc79dd1b
      Ying Xue authored
      As per feedback from the netdev community, we change the buffer
      overflow protection algorithm in receiving sockets so that it
      always respects the nominal upper limit set in sk_rcvbuf.
      
      Instead of scaling up from a small sk_rcvbuf value, which leads to
      violation of the configured sk_rcvbuf limit, we now calculate the
      weighted per-message limit by scaling down from a much bigger value,
      still in the same field, according to the importance priority of the
      received message.
      
      To allow for administrative tunability of the socket receive buffer
      size, we create a tipc_rmem sysctl variable to allow the user to
      configure an even bigger value via sysctl command.  It is an array
      of three values (min/default/max), consistent with tcp_rmem.
      
      By default, the value initialized in tipc_rmem[1] is equal to the
      receive socket size needed by a TIPC_CRITICAL_IMPORTANCE message.
      This value is also set as the default value of sk_rcvbuf.
      Originally-by: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      [Ying: added sysctl variation to Jon's original patch]
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      [PG: don't compile sysctl.c if not config'd; add Documentation]
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
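To make "scaling down from a bigger value according to importance" concrete, here is a minimal sketch. It assumes the limit halves for each step below the highest importance level; that factor, the function name `msg_limit()` and the `IMP_*` names are illustrative assumptions, not taken from the commit. The level numbering (low = 0 up to critical = 3) follows TIPC's user API.

```c
#include <assert.h>

/* Importance levels, numbered low = 0 up to critical = 3. */
enum { IMP_LOW = 0, IMP_MEDIUM = 1, IMP_HIGH = 2, IMP_CRITICAL = 3 };

/* Per-message limit derived from the configured receive buffer size.
 * The full rcvbuf applies only to critical messages; each lower level
 * gets half of the next one, so the result never exceeds rcvbuf. */
unsigned int msg_limit(unsigned int rcvbuf, unsigned int importance)
{
    assert(importance <= IMP_CRITICAL);
    return (rcvbuf >> IMP_CRITICAL) << importance;
}
```

With a receive buffer of 8192 bytes this gives 1024, 2048, 4096 and 8192 bytes for the four levels, so the nominal upper limit is respected for every message class.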
    • tipc: update code comments to reflect new uapi header path · 8941bbcd
      Ying Xue authored
      Files tipc.h and tipc_config.h were moved to uapi directory, but
      the corresponding comments were not updated at the same time.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add socket option for low latency polling · dafcc438
      Eliezer Tamir authored
      Add a socket option for low latency polling.
      This allows overriding the global sysctl value with a per-socket one.
      Unexport sysctl_net_ll_poll since for now it's not needed in modules.
      Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove NET_LL_RX_POLL config menu · 89bf1b5a
      Eliezer Tamir authored
      Remove NET_LL_RX_POLL from the config menu.
      Change default to y.
      Busy polling still needs to be enabled at run time.
      Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: convert low latency sockets to sched_clock() · 9a3c71aa
      Eliezer Tamir authored
      Use sched_clock() instead of get_cycles().
      We can use sched_clock() because we don't care much about accuracy.
      Remove the dependency on X86_TSC.
      Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
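A time-based polling budget of the kind this commit relies on can be sketched in userspace. Here `clock_gettime(CLOCK_MONOTONIC)` stands in for sched_clock(), and `now_ns()`, `poll_deadline()` and `may_poll()` are hypothetical helpers; how the kernel actually computes its deadline is not shown in the commit message and is not claimed here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanosecond timestamp from a monotonic clock
 * (userspace stand-in for sched_clock()). */
uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Deadline for a polling budget given in microseconds. */
uint64_t poll_deadline(uint64_t start_ns, unsigned int budget_us)
{
    return start_ns + (uint64_t)budget_us * 1000u;
}

/* Nonzero while polling may continue. Comparing the signed difference
 * keeps the test correct even if the counter wraps around. */
int may_poll(uint64_t now, uint64_t deadline)
{
    return (int64_t)(deadline - now) > 0;
}
```

A polling loop would compute the deadline once, then call `may_poll(now_ns(), deadline)` on each iteration; coarse accuracy is acceptable because the budget only bounds how long to spin.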
    • net: change sysctl_net_ll_poll into an unsigned int · eb6db622
      Eliezer Tamir authored
      There is no reason for sysctl_net_ll_poll to be an unsigned long.
      Change it into an unsigned int.
      Fix the proc handler.
      Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 14 Jun, 2013 18 commits
  3. 13 Jun, 2013 10 commits
  4. 12 Jun, 2013 4 commits
    • net: add doc for ip_early_demux sysctl · e3d73bce
      Cong Wang authored
      commit 6648bd7e (ipv4: Add sysctl knob to control
      early socket demux) introduced this sysctl, but did not document
      it in Documentation/networking/ip-sysctl.txt. This patch adds the
      missing documentation.
      
      The text is taken from the description of commit 41063e9d
      (ipv4: Early TCP socket demux.) and the above commit.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: Turn tun_flow_init() into void fn · 944a1376
      Pavel Emelyanov authored
      This routine cannot fail since 9fdc6bef (tuntap: dont use a private kmem_cache),
      so it makes sense to compact the code a little bit.
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: Report "persist" flag to userspace · 274038f8
      Pavel Emelyanov authored
      The TUN_PERSIST flag is not reported at all: both TUNGETIFF and the sysfs
      "flags" attribute skip it. Knowing whether a device is persistent or not
      is critical for checkpoint-restore, thus I propose to add a read-only
      IFF_PERSIST flag for this.
      
      Making the new IFF_PERSIST flag settable is hardly possible, as TUNSETIFF
      doesn't check that unknown flags are zero, so callers may pass garbage in it.
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
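From userspace, a checkpoint tool would test the reported flag roughly as sketched below. The `SKETCH_` constants repeat the values of IFF_TUN, IFF_TAP and IFF_PERSIST from linux/if_tun.h so that the example builds without kernel headers, and `tun_is_persistent()` is a hypothetical helper. The flags value is assumed to come from `ifr.ifr_flags` after a TUNGETIFF ioctl on the tun file descriptor.

```c
#include <assert.h>

/* Values mirror linux/if_tun.h; repeated here so the sketch is
 * self-contained. */
#define SKETCH_IFF_TUN     0x0001u
#define SKETCH_IFF_TAP     0x0002u
#define SKETCH_IFF_PERSIST 0x0800u

/* Returns 1 if the reported flags mark the device as persistent.
 * ifr_flags is a short, so only the low 16 bits are meaningful. */
int tun_is_persistent(unsigned int flags)
{
    assert((flags & ~0xffffu) == 0);
    return (flags & SKETCH_IFF_PERSIST) != 0;
}
```

Because the flag is read-only, a restore tool would use the result to decide whether to issue TUNSETPERSIST afterwards, rather than passing the bit back through TUNSETIFF.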
    • udp: fix two sparse errors · 7c0cadc6
      Eric Dumazet authored
      commit ba418fa3 ("soreuseport: UDP/IPv4 implementation")
      introduced the following sparse errors:
      
      net/ipv4/udp.c:433:60: warning: cast from restricted __be16
      net/ipv4/udp.c:433:60: warning: incorrect type in argument 1 (different base types)
      net/ipv4/udp.c:433:60:    expected unsigned short [unsigned] [usertype] val
      net/ipv4/udp.c:433:60:    got restricted __be16 [usertype] sport
      net/ipv4/udp.c:433:60: warning: cast from restricted __be16
      net/ipv4/udp.c:433:60: warning: cast from restricted __be16
      net/ipv4/udp.c:514:60: warning: cast from restricted __be16
      net/ipv4/udp.c:514:60: warning: incorrect type in argument 1 (different base types)
      net/ipv4/udp.c:514:60:    expected unsigned short [unsigned] [usertype] val
      net/ipv4/udp.c:514:60:    got restricted __be16 [usertype] sport
      net/ipv4/udp.c:514:60: warning: cast from restricted __be16
      net/ipv4/udp.c:514:60: warning: cast from restricted __be16
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>