1. 04 Sep, 2008 40 commits
    • dccp ccid-3: Remove redundant 'options_received' struct · ce177ae2
      Gerrit Renker authored
      The `options_received' struct is redundant, since it duplicates the existing
      `p' and `x_recv' fields. This patch removes the sub-struct and migrates the
      format conversion operations (cf. below) to ccid3_hc_tx_parse_options().
      
                           Why the fields are redundant
                           ----------------------------
      The Loss Event Rate p and the Receive Rate x_recv are initially 0 when first 
      loading CCID-3, as ccid_new() zeroes out the entire ccid3_hc_tx_sock. 
      
      When Loss Event Rate or Receive Rate options are received, they are stored by
      ccid3_hc_tx_parse_options() into the fields `ccid3or_loss_event_rate' and
      `ccid3or_receive_rate' of the sub-struct `options_received' in ccid3_hc_tx_sock.
      
      After parsing (considering only the established state - dccp_rcv_established()),
      the packet is passed on to ccid_hc_tx_packet_recv(). This calls the CCID-3
      specific routine ccid3_hc_tx_packet_recv(), which performs the following copy
      operations between fields of ccid3_hc_tx_sock:
      
       * hctx->options_received.ccid3or_receive_rate is copied into hctx->x_recv,
         after scaling it for fixpoint arithmetic, by 2^6;
       * hctx->options_received.ccid3or_loss_event_rate is copied into hctx->p,
         considering the above special cases; in addition, a value of 0 here needs to
         be mapped into p=0 (when no Loss Event Rate option has been received yet).
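
      The two copy operations can be sketched as below. This is only an
      illustration, not the kernel code: `struct tx_state' and
      copy_received_options() are hypothetical names, the left shift by 6
      assumes a fixpoint scaling factor of 2^6, and the scaling of p by 10^6
      is likewise an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of ccid3_hc_tx_sock. */
struct tx_state {
	uint64_t x_recv;	/* receive rate, scaled by 2^6 (assumed) */
	uint32_t p;		/* loss event rate, scaled by 10^6 (assumed) */
};

/* Copy the raw option values into the sender state. */
static void copy_received_options(struct tx_state *tx,
				  uint32_t receive_rate, uint32_t pinv)
{
	/* Receive Rate: widen first, then scale for fixpoint arithmetic. */
	tx->x_recv = (uint64_t)receive_rate << 6;

	/* Loss Event Rate: 0 (nothing received yet) and ~0U both mean p = 0. */
	if (pinv == UINT32_MAX || pinv == 0)
		tx->p = 0;
	else
		tx->p = 1000000 / pinv;
}
```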
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp tfrc/ccid-3: Computing Loss Rate from Loss Event Rate · 535c55df
      Gerrit Renker authored
      This adds a function to take care of the following cases occurring in the
      computation of the Loss Rate p:
      
       * 1/(2^32-1) is mapped into 0% as per RFC 4342, 8.5;
       * 1/0        is mapped into the maximum of 100%;
       * we want to avoid that p = 1/x is rounded down to 0 when x is very large,
         since this means accidentally re-entering slow-start (indicated by p==0).
      
      In the last case, the minimum-resolution value of p is returned.
      
      Furthermore, a bug in ccid3_hc_rx_getsockopt is fixed (1/0 was mapped into ~0U),
      which now makes it possible to print the scaled p-values consistently as
      
              printf("Loss Event Rate = %u.%04u %%\n", rx_info.tfrcrx_p / 10000, 
                                                       rx_info.tfrcrx_p % 10000);
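
      The three cases can be sketched as follows. The function name
      invert_loss_event_rate() and the constant P_SMALLEST are hypothetical;
      the scaling of p by 10^6 is inferred from the printf() format above, and
      the real minimum-resolution value may differ from the 1 assumed here.

```c
#include <stdint.h>

#define P_SCALE		1000000u	/* p is scaled by 10^6: 1000000 == 100% */
#define P_SMALLEST	1u		/* assumed minimum-resolution value of p */

/* Hypothetical helper: map a Loss Event Rate value x (meaning 1/x) to p. */
static uint32_t invert_loss_event_rate(uint32_t x)
{
	uint32_t p;

	if (x == UINT32_MAX)	/* 1/(2^32-1) is mapped into 0% */
		return 0;
	if (x == 0)		/* 1/0 is mapped into the maximum of 100% */
		return P_SCALE;

	p = P_SCALE / x;
	return p < P_SMALLEST ? P_SMALLEST : p;	/* never round down to 0 */
}
```

      With this scaling, a received value of 4 (one loss event in four
      packets) yields p = 250000, which the printf() above shows as 25.0000 %.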
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp: Add packet type information to CCID-specific option parsing · 3306c781
      Gerrit Renker authored
      This patch ...
       1. adds packet type information to ccid_hc_{rx,tx}_parse_options(). This is 
          necessary, since table 3 in RFC 4340, 5.8 leaves it to the CCIDs to state
          which options may (not) appear on what packet type.
       
       2. adds such a check for CCID-3's {Loss Event, Receive} Rate as specified in
          RFC 4342, 8.3 ("Receive Rate options MUST NOT be sent on DCCP-Data packets")
          and 8.5 ("Loss Event Rate options MUST NOT be sent on DCCP-Data packets").
      
       3. removes an unused argument `idx' from ccid_hc_{rx,tx}_parse_options(). This
          is also no longer necessary, since the CCID-specific option-parsing routines
          are passed every single parameter of the type-length-value option encoding.
      
      Also added documentation and made argument naming scheme consistent.
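
      A check of the kind described in (2) might look like the sketch below.
      The function name ccid3_option_allowed() is hypothetical; the numeric
      values are the DCCP-Data packet type and the CCID-3 option numbers as
      assigned in the RFCs.

```c
#include <stdbool.h>
#include <stdint.h>

enum { PKT_DATA = 2 };				/* DCCP-Data packet type */
enum { OPT_LOSS_EVENT_RATE = 192, OPT_RECEIVE_RATE = 194 };

/* Hypothetical check: may option `opt' appear on packet type `pkt_type'? */
static bool ccid3_option_allowed(uint8_t pkt_type, uint8_t opt)
{
	switch (opt) {
	case OPT_LOSS_EVENT_RATE:
	case OPT_RECEIVE_RATE:
		/* MUST NOT be sent on DCCP-Data packets */
		return pkt_type != PKT_DATA;
	default:
		return true;	/* no restriction modelled in this sketch */
	}
}
```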
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: Simplify and consolidate tx_parse_options · 47a61e7b
      Gerrit Renker authored
      This simplifies and consolidates the TX option-parsing code:
      
       1. The Loss Intervals option is not currently used, so dead code related to
          this option is removed. I am aware of no plans to support the option, but
          if someone wants to implement it (e.g. for inter-op tests), it is better
          to start afresh than having to also update currently unused code.
      
       2. The Loss Event and Receive Rate options have a lot of code in common (both
          are 32 bit, both have same length etc.), so this is consolidated.
      
       3. The test against GSR is not necessary, because
          - on first loading CCID3, ccid_new() zeroes out all fields in the socket; 
          - ccid3_hc_tx_packet_recv() treats 0 and ~0U equivalently, due to
      
      	pinv = opt_recv->ccid3or_loss_event_rate;
      	if (pinv == ~0U || pinv == 0)
      		hctx->p = 0;
      
          - as a result, the sequence number field is removed from opt_recv.
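
      The consolidation in (2) rests on both options carrying a 32-bit value
      of the same length. A shared decoding step could be sketched as follows;
      parse_u32_option() is a hypothetical helper, not a function from the
      patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper shared by both options: decode a 4-byte value in
 * network byte order. Returns 0 on success, -1 if the length is wrong.
 */
static int parse_u32_option(const uint8_t *val, size_t len, uint32_t *out)
{
	if (len != 4)
		return -1;

	*out = ((uint32_t)val[0] << 24) | ((uint32_t)val[1] << 16) |
	       ((uint32_t)val[2] << 8)  |  (uint32_t)val[3];
	return 0;
}
```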
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: Remove ugly RTT-sampling history lookup · 63b3a73b
      Gerrit Renker authored
      This removes the RTT-sampling function tfrc_tx_hist_rtt(), since
      
       1. it suffered from complex passing of return values (the return value indicated
          a successful lookup and at the same time doubled as the RTT sample);
      
       2. when for some odd reason the sample value equalled 0, this triggered a bug
          warning about "bogus Ack", due to the ambiguity of the return value;
      
       3. on a passive host which has not sent anything the TX history is empty and
          thus will lead to unwanted "bogus Ack" warnings such as
          ccid3_hc_tx_packet_recv: server(e7b7d518): DATAACK with bogus ACK-28197148
          ccid3_hc_tx_packet_recv: server(e7b7d518): DATAACK with bogus ACK-26641606.
      
      The fix is to replace the implicit encoding by performing the steps manually.					       
      
      Furthermore, the "bogus Ack" warning has been removed, since it can actually be
      triggered for several reasons (network reordering, old packet, (3) above),
      hence it is not very useful.
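
      One way to avoid the ambiguous return value is to separate the lookup
      from the computation of the sample, as in the sketch below. The struct
      and both function names are hypothetical and do not reproduce the
      kernel's TX history.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TX history entry: sequence number and send time (usec). */
struct tx_hist_entry {
	uint64_t seqno;
	uint64_t stamp_us;
	struct tx_hist_entry *next;
};

/* Step 1: look up the entry for the acknowledged sequence number. */
static struct tx_hist_entry *tx_hist_find(struct tx_hist_entry *head,
					   uint64_t ackno)
{
	while (head != NULL && head->seqno != ackno)
		head = head->next;
	return head;	/* NULL: nothing to sample (e.g. empty history) */
}

/* Step 2: compute the sample; a result of 0 is a legitimate value. */
static uint64_t rtt_sample_us(const struct tx_hist_entry *e, uint64_t now_us)
{
	return now_us - e->stamp_us;
}
```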
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: Bug fix for the inter-packet scheduling algorithm · de6f2b59
      Gerrit Renker authored
      This fixes a subtle bug in the calculation of the inter-packet gap and shows
      that t_delta, as it is currently used, is not needed; it is hence replaced.
      
      The algorithm from RFC 3448, 4.6 below continually computes a send time t_nom,
      which is initialised with the current time t_now; t_gran = 1E6 / HZ specifies
      the scheduling granularity, s the packet size, and X the sending rate:
      
        t_distance = t_nom - t_now;              // in microseconds
        t_delta    = min(t_ipi, t_gran) / 2;     // `delta' parameter in microseconds

        if (t_distance >= t_delta) {
              reschedule after (t_distance / 1000) milliseconds;
        } else {
              t_ipi  = s / X;                    // inter-packet interval in usec
              t_nom += t_ipi;                    // compute the next send time
              send packet now;
        }
      
      
      1) Description of the bug
      -------------------------
      Rescheduling requires a conversion into milliseconds, due to this call chain:
      
       * ccid3_hc_tx_send_packet() returns a timeout in milliseconds,
       * this value is converted by msecs_to_jiffies() in dccp_write_xmit(),
       * and finally used as jiffy-expires-value for sk_reset_timer().
      
      The highest jiffy resolution with HZ=1000 is 1 millisecond, so using a finer
      granularity does not make much sense here.
      
      As a consequence, values of t_distance < 1000 are truncated to 0. This issue 
      has so far been resolved by using instead
      
        if (t_distance >= t_delta + 1000)
      	reschedule after (t_distance / 1000) milliseconds;
      
      The bug is in artificially inflating t_delta to t_delta' = t_delta + 1000. This
      is unnecessarily large; a more adequate value is t_delta' = max(t_delta, 1000).
      
      
      2) Consequences of using the corrected t_delta'
      -----------------------------------------------
      Since t_delta <= t_gran/2 = 10^6/(2*HZ), we have t_delta <= 1000 as long as
      HZ >= 500. This means that t_delta' = max(1000, t_delta) is constant at 1000.
      
      On the other hand, when using a coarse HZ value of HZ < 500, we have three
      sub-cases that can all be reduced to using another constant of t_gran/2.
      
       (a) The first case arises when t_ipi > t_gran. Here t_delta' is the constant
           t_delta' = max(1000, t_gran/2) = t_gran/2.
      
       (b) If t_ipi <= 2000 < t_gran = 10^6/HZ usec, then t_delta = t_ipi/2 <= 1000,
           so that t_delta' = max(1000, t_delta) = 1000 < t_gran/2. 
      
       (c) If 2000 < t_ipi <= t_gran, we have t_delta' = max(t_delta, 1000) = t_ipi/2.
      
      In the second and third cases we have delay values less than t_gran/2, that is,
      less than half a jiffy.
      
      How these are treated depends on how fractions of a jiffy are handled: they
      are either always rounded down to 0, or always rounded up to 1 jiffy (assuming
      non-zero values). In both cases the error is on average in the order of 50%.
      
      Thus we are not increasing the error when in the second/third case we replace
      a value less than t_gran/2 with 0, by setting t_delta' to the constant t_gran/2.
      
      
      3) Summary
      ----------
      Fixing (1) and considering (2), the patch replaces t_delta with a constant,
      whose value depends on CONFIG_HZ, changing the above algorithm to:
       
        if (t_distance >= t_delta')
      	reschedule after (t_distance / 1000) milliseconds;
      
      where t_delta' = 10^6/(2*HZ) if HZ < 500, and t_delta' = 1000 otherwise.
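
      The resulting rule can be written down as a small sketch. Both function
      names are hypothetical, hz is passed as a parameter rather than taken
      from CONFIG_HZ, and hz > 0 is assumed.

```c
#include <stdint.h>

/* Constant t_delta' in microseconds, for a given timer frequency hz. */
static uint32_t t_delta_prime_us(uint32_t hz)
{
	return hz < 500 ? 1000000u / (2 * hz) : 1000u;
}

/*
 * Hypothetical decision helper: returns the delay in milliseconds before
 * the next attempt, or 0 if the packet can be sent now.
 */
static uint32_t send_delay_ms(int64_t t_distance_us, uint32_t hz)
{
	if (t_distance_us >= (int64_t)t_delta_prime_us(hz))
		return (uint32_t)(t_distance_us / 1000);
	return 0;
}
```

      Since t_delta' is never below 1000 microseconds, a rescheduling delay
      computed this way is always at least 1 millisecond, so the truncation to
      0 described in (1) cannot occur.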
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: No more CCID control blocks in LISTEN state · b2e317f4
      Gerrit Renker authored
      The CCIDs are activated as the last of the features, at the end of the handshake,
      where the LISTEN state of the master socket is inherited into the server
      state of the child socket. Thus, the only states visible to CCIDs now are
      OPEN/PARTOPEN, and the closing states.
      
      This makes it possible to remove tests which were previously necessary to protect
      against referencing a socket in the listening state (in CCID3), but which
      now have become redundant.
      
      As a further byproduct of enabling the CCIDs only after the connection has been
      fully established, several typecast-initialisations of ccid3_hc_{rx,tx}_sock
      can now be eliminated:
       * the CCID is loaded, so it is not necessary to test if it is NULL,
       * if it is possible to load a CCID and leave the private area NULL, then this
         is a bug, which should crash loudly - and earlier,
       * the test for state==OPEN || state==PARTOPEN now reduces only to the closing
         phase (e.g. when the node has received an unexpected Reset).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp ccid-3: Remove ccid3hc{tx,rx}_ prefixes · 842d1ef1
      Gerrit Renker authored
      This patch does the same for CCID-3 as the previous patch for CCID-2:
      
              s#ccid3hctx_##g;
              s#ccid3hcrx_##g;
      
      plus manual editing to retain consistency.
      
      Please note: expanded the fields of the `struct tfrc_tx_info' in the hc_tx_sock,
      since using short #define identifiers is not a good idea. The only place where
      this embedded struct was used is ccid3_hc_tx_getsockopt().
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-2: Remove ccid2hc{tx,rx}_ prefixes · 1fb87509
      Gerrit Renker authored
      This patch fixes two problems caused by the ubiquitous long "hctx->ccid2hctx_"
      and "hcrx->ccid2hcrx_" prefixes:
       * code becomes hard to read;
       * multiple-line statements are almost inevitable even for simple expressions.
      The prefixes are not really necessary (compare with "struct tcp_sock").
      
      There had been previous discussion of this on dccp@vger, but so far this was
      not followed up (most people agreed that the prefixes are too long). 
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Leandro Melo de Sales <leandroal@gmail.com>
    • dccp: Special case of the MPS for client-PARTOPEN with DataAcks · 88ddac51
      Gerrit Renker authored
      To increase robustness, it is necessary to resend Confirm feature-negotiation
      options, even though the RFC does not mandate it. But feature negotiation
      options can take (much) more room than the options on common DataAck packets.
      
      Instead of always reducing the MPS for a case which only applies to the three
      messages sent during the initial handshake, this patch devises a special case:
      
         if the payload length of the DataAck in PARTOPEN is too large, an Ack is sent
         to carry the options, and the feature-negotiation list is then flushed.
      
         This means that the server gets two Acks for one Response. If both Acks get
         lost, it is probably better to restart the connection anyway, and devising yet
         another special case does not seem worth the extra complexity.
      
      The patch (over-)estimates the expected overhead to be 32*4 bytes -- commonly
      seen values were 20-90 bytes for initial feature-negotiation options. 
      
      It uses sizeof(u32) to mean "aligned units of 4 bytes". For consistency,
      another use of sizeof is modified.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp: Leave headroom for options when calculating the MPS · 55ebe3ab
      Gerrit Renker authored
      The Maximum Packet Size (MPS) is of interest for applications which want
      to transfer data, so it is only relevant to the data transfer phase of a
      connection (unless one wants to send data on the DCCP-Request, but that is
      not considered here).
      
      The strategy chosen to deal with this requirement is to leave room for only 
      such options that may appear on data packets.
      
      A special consideration applies to Ack Vectors: this is purely guesswork,
      since these can have any length between 3 and 1020 bytes. The strategy
      chosen here is to subtract a configurable minimum; the value of 16 bytes
      (2 bytes for type/length plus 14 Ack Vector cells) has been found by
      experimentation. If people experience this as too much or too little,
      this could later be turned into a Kconfig option.
      
      There are currently no CCID-specific header options which may appear on data
      packets, hence it is not necessary to define a corresponding CCID field.
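
      The headroom arithmetic can be illustrated by the following sketch. It is
      not the patch's code: mps_with_headroom() and all of its parameters are
      hypothetical, the header length is supplied by the caller, and padding
      the option space to a multiple of 4 bytes is an assumption of this
      sketch.

```c
#include <stdint.h>

#define ACKVEC_HEADROOM	16u	/* 2 bytes type/length + 14 Ack Vector cells */

/*
 * Hypothetical helper: subtract headers and option headroom from the MTU.
 * Returns 0 if nothing is left for payload.
 */
static uint32_t mps_with_headroom(uint32_t mtu, uint32_t header_len,
				  uint32_t data_options_len, int uses_ackvec)
{
	uint32_t opt_len = data_options_len;

	if (uses_ackvec)
		opt_len += ACKVEC_HEADROOM;
	opt_len = (opt_len + 3u) & ~3u;	/* pad to a multiple of 4 bytes */

	if (mtu <= header_len + opt_len)
		return 0;
	return mtu - header_len - opt_len;
}
```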
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp ccid-2: Use feature-negotiation to report Ack Ratio changes · 2faae558
      Gerrit Renker authored
      This uses the new feature-negotiation framework to signal Ack Ratio changes,
      as required by RFC 4341, sec. 6.1.2.
      
      This raises some problems for CCID-2, since at the moment it cannot cope
      gracefully with an Ack Ratio of e.g. 2. A FIXME has thus been added which
      reverts to the existing policy of bypassing the Ack Ratio sysctl.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Support for exchanging of NN options in established state · 4861a354
      Gerrit Renker authored
      This patch provides support for the reception of NN options in (PART)OPEN state. 
      
      It is a combination of change_recv() and confirm_recv(), specifically geared
      towards receiving the `fast-path' NN options.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Support for the exchange of NN options in established state · 624a965a
      Gerrit Renker authored
      In contrast to static feature negotiation at the beginning of a connection, which
      establishes the capabilities of both endpoints, this patch introduces support
      for dynamic exchange of feature negotiation options.
      
      Such a dynamic exchange is necessary in at least two cases:
       * CCID-2's Ack Ratio (RFC 4341, 6.1.2) which changes during the connection;
       * Sequence Window values that, as per RFC 4340, 7.5.2, should be sent "as
         the connection progresses".
      
      Both are NN (non-negotiable) features. Hence dynamic feature "negotiation" is
      distinguished from static/pre-connection negotiation by the following:
       * no new capabilities are negotiated (those that matter for the connection
         are negotiated prior to setting up the connection, comparable to SIP);
       * features must be understood by each endpoint: as per RFC 4340, 6.4, 
         Sequence Window is "Req'd" and Ack Ratio must be understood when CCID-2
         is used as per the note underneath Table 4.
      
      These characteristics are reflected in the implementation:
       * only NN options can be exchanged after connection setup;
       * NN options are activated directly after validating them. The rationale is
         that a peer must accept every valid NN value (RFC 4340, 6.3.2), hence it
         will either accept the value and send a "Confirm R", or it will send an
         empty Confirm (which will reset the connection according to FN rules). 
       * An Ack is scheduled directly after activation to accelerate communicating
         the update to the peer.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Debugging functions for feature negotiation · 76f738a7
      Gerrit Renker authored
      Since all feature-negotiation processing now takes place in feat.c, functions
      for producing verbose debugging output are concentrated there.
      
      New functions to print out values, entry records, and options are provided,
      and a macro is defined so that the function name does not always appear in
      the output line.
      
      Thanks a lot to Wei Yongjun and Giuseppe Galeota for help with errors in an
      earlier revision of this patch.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Initialisation and type-checking of feature sysctls · 0a482267
      Gerrit Renker authored
      This patch takes care of initialising and type-checking sysctls related to
      feature negotiation. Type checking is important since some of the sysctls
      now directly act on the feature-negotiation process.
      
      The sysctls are initialised with the known default values for each feature.
      For the type-checking the value constraints from RFC 4340 are used:
      
       * Sequence Window uses the specified Wmin=32; the maximum is that of a ulong
         (4 bytes), tested and confirmed to work up to 4294967295 - for Gbps speed;
       * Ack Ratio is between 0 .. 0xffff (2-byte unsigned integer);
       * CCIDs are between 0 .. 255;
       * request_retries, retries1, retries2 also between 0..255 for good measure;
       * tx_qlen is checked to be non-negative;
       * sync_ratelimit remains as before.
      
      Further changes:
      ----------------
      Performed s@sysctl_dccp_feat@sysctl_dccp@g since the sysctls are now in feat.c.
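
      The value constraints listed above amount to a table of inclusive ranges.
      The sketch below illustrates this in plain C; `struct range', the table
      names, and in_range() are hypothetical and unrelated to the kernel's
      sysctl interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive range, mirroring the constraints listed above. */
struct range {
	uint64_t min, max;
};

static const struct range seq_window_range = { 32, 4294967295u };
static const struct range ack_ratio_range  = { 0, 0xffff };
static const struct range ccid_range       = { 0, 255 };
static const struct range retries_range    = { 0, 255 };

/* Returns true if val lies within the inclusive range r. */
static bool in_range(const struct range *r, uint64_t val)
{
	return val >= r->min && val <= r->max;
}
```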
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Implement both feature-local and feature-remote Sequence Window feature · 51c7d4fa
      Gerrit Renker authored
      This adds full support for local/remote Sequence Window feature, from which the 
        * sequence-number-validity (W) and 
        * acknowledgment-number-validity (W') windows 
      derive as specified in RFC 4340, 7.5.3. 
      
      Specifically, the following changes are introduced:
        * integrated new socket fields into dccp_sk;
        * updated the update_gsr/gss routines with regard to these fields;
        * updated handler code: the Sequence Window feature is located at the TX side,
          so the local feature is meant if the handler-rx flag is false;
        * the initialisation of `rcv_wnd' in reqsk is removed, since
          - rcv_wnd is not used by the code anywhere;
          - sequence number checks are not done in the LISTEN state (cf. 7.5.3);
          - dccp_check_req checks the Ack number validity more rigorously;
        * the `struct dccp_minisock' became empty and is now removed.
      
      Until the handshake completes with activating negotiated values, the local/remote
      Sequence-Window values are undefined and thus can not reliably be estimated.
      This issue is addressed in a separate patch.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Auto-load (when supported) CCID plugins for negotiation · 09856c10
      Gerrit Renker authored
      This adds auto-loading of CCIDs (when module loading is enabled) 
      for the purpose of feature negotiation. 
      
      The problem with loading the CCIDs at the end of feature negotiation is
      that this would happen in software interrupt context. Besides, if the host
      advertises CCIDs during negotiation, it should have them ready to use, in
      case an agreeing peer wants to use one of them for the connection.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Initialisation framework for feature negotiation · 5d3dac26
      Gerrit Renker authored
      This initialises feature negotiation from two tables, which are initialised
      from sysctls. 
      
      As a novel feature, specifics of the implementation (e.g. currently short
      seqnos and ECN are not supported) are advertised for robustness.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp ccid-2: Phase out the use of boolean Ack Vector sysctl · b235dc4a
      Gerrit Renker authored
      This removes the use of the sysctl and the minisock variable for the Send Ack
      Vector feature, which is now handled fully dynamically via feature negotiation;
      i.e. when CCID2 is enabled, Ack Vectors are automatically enabled (as per
      RFC 4341, 4.).
      
      Using a sysctl in parallel to this implementation would open the door to
      crashes, since much of the code relies on tests of the boolean minisock /
      sysctl variable. Thus, this patch replaces all tests of type
      
      	if (dccp_msk(sk)->dccpms_send_ack_vector)
      		/* ... */
      with
      	if (dp->dccps_hc_rx_ackvec != NULL)
      		/* ... */
      
      The dccps_hc_rx_ackvec is allocated by dccp_hdlr_ackvec() when feature
      negotiation has concluded that Ack Vectors are to be used on the half-connection.
      Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
      so that the test is a valid one.
      
      The activation handler for Ack Vectors is called as soon as the feature
      negotiation has concluded at the
       * server when the Ack marking the transition RESPOND => OPEN arrives;
       * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.
      
      Adding the sequence number of the Response packet to the Ack Vector has been 
      removed, since
       (a) connection establishment implies that the Response has been received;
       (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
           this entry will always be ignored;
       (c) it can not be used for anything useful - to detect loss for instance, only
           packets received after the loss can serve as pseudo-dupacks.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Remove manual influence on NDP Count feature · 68e074bf
      Gerrit Renker authored
      Updating the NDP count feature is handled automatically now:
       * for CCID-2 it is disabled, since the code does not use NDP counts;
       * for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.
      
      Allowing the user to change NDP values leads to unpredictable and failing
      behaviour, since it is then possible to disable NDP counts even when they
      are needed (e.g. in CCID-3).
      
      This means that only those user settings are sensible that agree with the
      values for Send NDP Count implied by the choice of CCID. But those settings
      are already activated by the feature negotiation (CCID dependency tracking),
      hence this form of support is redundant.
      
      At startup the NDP count feature is initialised with the default
      value of 0, which is done implicitly by the zeroing-out of the socket when
      it is allocated. If the choice of CCID or feature negotiation enables NDP
      count, this will then be updated via the NDP activation handler.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Remove obsolete parts of the old CCID interface · 78673e24
      Gerrit Renker authored
      The TX/RX CCIDs of the minisock are now redundant: similar to the Ack Vector
      case, their value equals initially that of the sysctl, but at the end of
      feature negotiation may be something different.
      
      The old interface removed by this patch thus has been replaced by the newer
      interface to dynamically query the currently loaded CCIDs earlier in this
      patch set.
      
      Also removed the constructors for the TX CCID and the RX CCID, since the
      switch rx/non-rx is done by the handler in minisocks.c (and the handler is
      the only place in the code where CCIDs are loaded).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Clean up old feature-negotiation infrastructure · 23479cbf
      Gerrit Renker authored
      The code removed by this patch is no longer referenced or used; the added
      lines update documentation and copyrights.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Integration of dynamic feature activation - part 3 (client side) · c49b2272
      Gerrit Renker authored
      This integrates feature-activation in the client, with these details:
      
       1. When dccp_parse_options() fails, the reset code is already set;
          request_sent_state_process() currently overrides this with `Packet Error',
          which is not intended - so changed to use the reset code set in
          dccp_parse_options();
      
       2. There was a FIXME to change the error code when dccp_ackvec_add() fails.
          I have looked this up and found that: 
          * the check whether ackno < ISN is already made earlier,
          * this Response is likely the 1st packet with an Ackno that the client gets,
          * so when dccp_ackvec_add() fails, the reason is likely not a packet error.
      
       3. When feature negotiation fails, the socket should be marked as not usable,
          so that the application is notified that an error has occurred. This is achieved
          by a new label, which uses an error code of `Aborted' and which sets the
          socket state to CLOSED, as well as sk_err.
      
       4. Avoids parsing the Ack twice in Respond state by not doing option processing
          again in dccp_rcv_respond_partopen_state_process (as option processing has
          already been done on the request_sock in dccp_check_req).    
      
      Since this addresses congestion-control initialisation, a corresponding
      FIXME has been removed.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Integration of dynamic feature activation - part 2 (server side) · e70cacb9
      Gerrit Renker authored
      This patch integrates the activation of features at the end of negotiation
      into the server-side code.
      
      Note: 
        In dccp_create_openreq_child the request_sock argument is no longer constant,
        since dccp_activate_values() uses the feature-negotiation list on dreq to sort
        out the initialisation values for the different features of the child socket;
        and purges this queue after use (but the `req' argument to openreq_child
        can and does still remain constant).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Integration of dynamic feature activation - part 1 (socket setup) · 3a53a9ad
      Gerrit Renker authored
      This first patch out of three replaces the hardcoded default settings with
      initialisation code for the dynamic feature negotiation.
      
      Note on retransmitting Confirm options:
      ---------------------------------------
      This patch also defers flushing the client feature-negotiation queue,
      due to the following considerations.
      
      As long as the client is in PARTOPEN, it needs to retransmit the Confirm
      options for the Change options received on the DCCP-Response from the server.
      
      Otherwise, if the packet containing the Confirm options gets dropped in the 
      network, the connection aborts due to undefined feature negotiation state.
      
      Thanks to Leandro Melo de Sales who reported a bug in an earlier revision
      of the patch set, resulting from not retransmitting the Confirm options.
      
      The patch now ensures that the client feature-negotiation queue is flushed only
      when entering the OPEN state. Since confirmed Change options are removed as
      soon as they are confirmed (in the DCCP-Response), this ensures that Confirm
      options are retransmitted.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      3a53a9ad
    • Gerrit Renker's avatar
      dccp: Feature activation handlers · c926c6ae
      Gerrit Renker authored
      This patch provides the post-processing of feature negotiation state, after
      the negotiation has completed.
      
      For this purpose, handlers are added to the dccp_feat_table. Each handler
      is passed a boolean flag indicating whether the RX or the TX side of the
      feature is meant.
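
      The handler mechanism described above can be pictured roughly as follows.
      This is only a minimal user-space sketch, not the kernel code: the struct
      `conn', the handler names, and the table layout are hypothetical; only the
      feature numbers (1 = CCID, 6 = Send Ack Vector) come from RFC 4340, 6.4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection state; the kernel keeps this in dccp_sock. */
struct conn {
    uint8_t tx_ccid, rx_ccid;
    bool    tx_ackvec, rx_ackvec;
};

/* Each handler receives the negotiated value and an RX/TX selector. */
typedef int (*feat_hdlr)(struct conn *c, uint64_t val, bool rx);

static int hdlr_ccid(struct conn *c, uint64_t val, bool rx)
{
    if (val > UINT8_MAX)
        return -1;
    if (rx)
        c->rx_ccid = (uint8_t)val;
    else
        c->tx_ccid = (uint8_t)val;
    return 0;
}

static int hdlr_ackvec(struct conn *c, uint64_t val, bool rx)
{
    if (rx)
        c->rx_ackvec = (val != 0);
    else
        c->tx_ackvec = (val != 0);
    return 0;
}

/* Hypothetical table mapping feature numbers to activation handlers. */
static const struct {
    uint8_t   feat_num;
    feat_hdlr activate;
} feat_table[] = {
    { 1, hdlr_ccid   },   /* CCID (RFC 4340, 6.4)            */
    { 6, hdlr_ackvec },   /* Send Ack Vector (RFC 4340, 6.4) */
};

/* Look up the handler for `feat' and run it; -1 if none is registered. */
static int activate_feature(struct conn *c, uint8_t feat, uint64_t val, bool rx)
{
    assert(c != NULL);
    for (size_t i = 0; i < sizeof(feat_table) / sizeof(feat_table[0]); i++)
        if (feat_table[i].feat_num == feat)
            return feat_table[i].activate(c, val, rx);
    return -1;
}
```

      Adding a new handler in this sketch means writing one function and adding
      one table row, which is the extensibility property the patch refers to.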
      
      Several handlers are provided already; new handlers can easily be added.
      
      The initialisation is now fully dynamic, i.e. CCIDs are activated only
      after the feature negotiation. The integration of this dynamic activation
      is done in the subsequent patches.
      
      Thanks to Wei Yongjun for pointing out the necessity of skipping over empty
      Confirm options while copying the negotiated feature values.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      c926c6ae
    • Gerrit Renker's avatar
      dccp: Processing Confirm options · d2150b7b
      Gerrit Renker authored
      Analogous to the previous patch, this adds code to interpret incoming Confirm
      feature-negotiation options. Both functions operate on the feature-negotiation
      list of either the request_sock (server) or the dccp_sock (client).
      
      Thanks to Wei Yongjun for pointing out that it is overly restrictive to check
      the entire list of confirmed SP values.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      d2150b7b
    • Gerrit Renker's avatar
      dccp: Process incoming Change feature-negotiation options · 5a146b97
      Gerrit Renker authored
      This adds/replaces code for processing incoming ChangeL/R options.
      The main difference is that:
       * mandatory FN options are now interpreted inside the function
         (there are too many individual cases to do this externally);
       * the function returns an appropriate Reset code or 0,
         which is then used to fill in the data for the Reset packet.
      
      Old code, which is no longer used or referenced, has been removed.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      5a146b97
    • Gerrit Renker's avatar
      dccp: Preference list reconciliation · c664d4f4
      Gerrit Renker authored
      This provides two functions to
       * reconcile preference lists (with appropriate return codes) and
       * reorder the preference list if successful reconciliation changed the
         preferred value.
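
      As an illustration of the two operations listed above, here is a minimal
      sketch assuming the server-priority rule of RFC 4340, 6.3.1 (the first
      entry in the server's list that also occurs in the client's list wins).
      The function names and the move-to-front reordering are assumptions made
      for this example; they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Server-priority reconciliation: return the index into serv[] of the
 * first server preference that the client also lists, or -1 if the two
 * lists have no value in common.
 */
static int reconcile_sp(const uint8_t *serv, size_t slen,
                        const uint8_t *cli, size_t clen)
{
    for (size_t s = 0; s < slen; s++)
        for (size_t c = 0; c < clen; c++)
            if (serv[s] == cli[c])
                return (int)s;
    return -1;
}

/*
 * Move list[idx] to the front so that the agreed value becomes the
 * preferred one; the relative order of the other entries is kept.
 */
static void prefer_value(uint8_t *list, size_t len, size_t idx)
{
    uint8_t v;

    assert(list != NULL);
    if (idx == 0 || idx >= len)
        return;                      /* already preferred, or out of range */
    v = list[idx];
    memmove(list + 1, list, idx);    /* shift the entries before idx right */
    list[0] = v;
}
```

      A caller would reorder only when reconcile_sp() returns an index greater
      than 0, i.e. when reconciliation actually changed the preferred value.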
      
      The patch also removes the old code for processing SP/NN Change options, since
      new code to process these is mostly there already; related references have been
      commented out.
      
      The code for processing Change options follows in the next patch.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      c664d4f4
    • Gerrit Renker's avatar
      dccp: Integrate feature-negotiation insertion code · f8a644c0
      Gerrit Renker authored
      The patch implements insertion of feature negotiation at the server (listening
      and request socket) and the client (connecting socket).
      
      In dccp_insert_options(), several statements have been grouped together now
      to achieve (I hope) better efficiency by reducing the number of tests each
      packet has to go through:
       - Ack Vectors are sent if the packet is neither a Data nor a Request packet;
       - a previous issue is corrected - feature negotiation options are allowed
         on DataAck packets (5.8).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      f8a644c0
    • Gerrit Renker's avatar
      dccp: Insert feature-negotiation options into skb · 0ef118a0
      Gerrit Renker authored
      This patch replaces the earlier insertion routine from options.c, so that
      code specific to feature negotiation can remain in feat.c. This is possible
      by calling a function already existing in options.c.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      0ef118a0
    • Gerrit Renker's avatar
      dccp: Header option insertion routine for feature-negotiation · cf9ddf73
      Gerrit Renker authored
      The patch extends existing code:
       * Confirm options divide into the confirmed value plus an optional preference
         list for SP values. Previously only the preference list was echoed for SP
         values, now the confirmed value is added as per RFC 4340, 6.1;
       * length and sanity checks are added to avoid illegal memory (or NULL) access.
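
      The Confirm layout described above (confirmed value first, then the
      optional preference list) can be sketched as follows. This is a
      user-space illustration under the option format of RFC 4340, 6.1 (type,
      length, feature number, values); the function name and its parameters are
      hypothetical and do not mirror the kernel routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    OPT_CONFIRM_L = 33,   /* Confirm L option type (RFC 4340, 5.8) */
    OPT_CONFIRM_R = 35,   /* Confirm R option type (RFC 4340, 5.8) */
};

/*
 * Write a Confirm option for a server-priority (SP) feature into buf:
 *   type | length | feature number | confirmed value | [preference list]
 * The length byte covers the whole option, including type and length.
 * Returns the number of bytes written, or -1 if the arguments are
 * inconsistent or the option does not fit.
 */
static int build_confirm_sp(uint8_t *buf, size_t buflen, uint8_t opt_type,
                            uint8_t feat, uint8_t confirmed,
                            const uint8_t *prefs, size_t nprefs)
{
    size_t len = 4 + nprefs;

    assert(buf != NULL);
    if (prefs == NULL && nprefs > 0)
        return -1;                    /* sanity check: avoid NULL access */
    if (len > UINT8_MAX || len > buflen)
        return -1;                    /* length check: option must fit   */

    buf[0] = opt_type;
    buf[1] = (uint8_t)len;
    buf[2] = feat;
    buf[3] = confirmed;
    if (nprefs > 0)
        memcpy(buf + 4, prefs, nprefs);
    return (int)len;
}
```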
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      cf9ddf73
    • Gerrit Renker's avatar
      dccp: Support for Mandatory options · d0440ee6
      Gerrit Renker authored
      Support for Mandatory options is provided by this patch, which will
      be used by subsequent feature-negotiation patches.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d0440ee6
    • Gerrit Renker's avatar
      dccp: Increase the scope of variable-length htonl/ntohl functions · b9aaac1c
      Gerrit Renker authored
      This extends the scope of two available functions, encode|decode_value_var,
      to work up to 6 (8) bytes, to match maximum requirements in the RFC.
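
      To illustrate what a variable-length htonl/ntohl pair does, here is a
      minimal sketch that stores and reads an integer in network byte order
      using 1 to 6 bytes. The names encode_var/decode_var and their signatures
      are chosen for this example only and are not claimed to match the kernel
      functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low `len' bytes of value at `to', most significant byte first. */
static void encode_var(uint8_t *to, uint64_t value, size_t len)
{
    assert(to != NULL && len >= 1 && len <= 6);
    for (size_t i = 0; i < len; i++)
        to[i] = (uint8_t)(value >> (8 * (len - 1 - i)));
}

/* Read a big-endian integer of `len' bytes starting at `from'. */
static uint64_t decode_var(const uint8_t *from, size_t len)
{
    uint64_t value = 0;

    assert(from != NULL && len >= 1 && len <= 6);
    for (size_t i = 0; i < len; i++)
        value = (value << 8) | from[i];
    return value;
}
```

      Because the loops work byte by byte, the result does not depend on the
      endianness of the host.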
      
      These functions are going to be used both by general option processing and 
      feature negotiation code, hence declarations have been put into feat.h.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b9aaac1c
    • Gerrit Renker's avatar
      dccp: API to query the current TX/RX CCID · c8041e26
      Gerrit Renker authored
      This provides a function to query the current TX/RX CCID dynamically, without
      reliance on the minisock value, using dynamic information available in the
      currently loaded CCID module.
      
      This query function is then used to 
       (a) provide the getsockopt part for getting/setting CCIDs via sockopts;
       (b) replace the current test for "which CCID is in use" in probe.c.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      c8041e26
    • Gerrit Renker's avatar
      dccp: Set per-connection CCIDs via socket options · fade756f
      Gerrit Renker authored
      With this patch, TX/RX CCIDs can now be changed on a per-connection basis, which
      overrides the defaults set by the global sysctl variables for TX/RX CCIDs.
      
      To make full use of this facility, the remaining patches of this patch set are
      needed, which track dependencies and activate negotiated feature values.
      
      Note on the maximum number of CCIDs that can be registered:
      -----------------------------------------------------------
      The maximum number of CCIDs that can be registered on the socket is constrained
      by the space in a Confirm/Change feature negotiation option. 
      
      The space in these in turn depends on the size of header options as defined
      in RFC 4340, 5.8. Since this is a recurring constant, it has been moved from
      ackvec.h into linux/dccp.h, clarifying its purpose.
      
      Relative to this size, the maximum number of CCID identifiers that can be 
      present in a Confirm option (which always consumes 1 byte more than a Change
      option, cf. 6.1) is 2 bytes less than the maximum TLV size: one for the
      CCID-feature-type and one for the selected value.
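
      The arithmetic above can be written out as a small worked example. It
      assumes that "maximum TLV size" means the option payload without the type
      and length bytes, and that the one-byte length field allows at most 255
      bytes per option; the constant and function names are hypothetical.

```c
#include <assert.h>

enum {
    /* Largest option size expressible in the one-byte length field. */
    OPT_TOTAL_MAXLEN    = 255,
    /* Payload left after the type and length bytes (assumed TLV size). */
    OPT_DATA_MAXLEN     = OPT_TOTAL_MAXLEN - 2,
    /* Change option: payload minus the feature number. */
    CHANGE_MAX_SP_VALS  = OPT_DATA_MAXLEN - 1,
    /* Confirm option: payload minus feature number and confirmed value. */
    CONFIRM_MAX_SP_VALS = OPT_DATA_MAXLEN - 2
};

/* Checked at compile time: a Confirm holds one value fewer than a Change. */
static_assert(CONFIRM_MAX_SP_VALS == CHANGE_MAX_SP_VALS - 1,
              "Confirm consumes one byte more than Change");

/* Returns 1 if a list of nccids CCID identifiers fits into a Confirm option. */
static int ccid_list_fits(unsigned int nccids)
{
    return nccids >= 1 && nccids <= CONFIRM_MAX_SP_VALS;
}
```

      Under these assumptions the limit works out to 251 CCID identifiers.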
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      fade756f
    • Gerrit Renker's avatar
      dccp: Tidy up setsockopt calls · 73bbe095
      Gerrit Renker authored
      This splits the setsockopt calls into two groups, depending on whether an
      integer argument (val) is required and whether routines being called do
      their own locking.
      
      Some options (such as setting the CCID) use u8 rather than int, so that for
      these the test with regard to integer-sizeof cannot be used.
      
      The second switch-case statement now only has those statements which need
      locking and which make use of `val'.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Reviewed-by: Eugene Teo <eugeneteo@kernel.sg>
      73bbe095
    • Gerrit Renker's avatar
      dccp: Deprecate Ack Ratio sysctl · 17c30b40
      Gerrit Renker authored
      This patch deprecates the Ack Ratio sysctl, since
       * Ack Ratio is entirely ignored by CCID-3 and CCID-4;
       * Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
       * even if it did work in CCID-2, there is no point for a user to change it:
         - Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
         - if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts
           (since it is waiting for Acks which will never arrive in this window),
         - cwnd is not a user-configurable value.
      
      The only reasonable place for Ack Ratio is to print it for debugging. It is
      planned to do this later on, as part of e.g. dccp_probe.
      
      With this patch Ack Ratio is now under full control of feature negotiation:
       * Ack Ratio is resolved as a dependency of the selected CCID;
       * if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
         the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
         Ratio 2 for both endpoints";
       * what happens then is part of another patch set, since it concerns the 
         dynamic update of Ack Ratio while the connection is in full flight.
      
      Thanks to Tomasz Grobelny for discussion leading up to this patch.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      17c30b40
    • Gerrit Renker's avatar
      dccp: Feature negotiation for minimum-checksum-coverage · 20f41eee
      Gerrit Renker authored
      This provides feature negotiation for server minimum checksum coverage,
      which so far has been missing.
      
      Since sender/receiver coverage values range only from 0...15, their
      type has also been reduced in size from u16 to u4.
      
      Feature-negotiation options are now generated for both sender and receiver
      coverage, i.e. when the peer has `forgotten' to enable partial coverage
      then feature negotiation will automatically enable (negotiate) the partial
      coverage value for this connection.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      20f41eee