1. 28 Dec, 2015 1 commit
  2. 14 Dec, 2015 1 commit
  3. 10 Dec, 2015 2 commits
  4. 09 Dec, 2015 3 commits
  5. 20 Nov, 2015 12 commits
  6. 15 Nov, 2015 1 commit
    • Andrew Jeffery's avatar
      strgrp: Cache cosine popcounts to reduce computation · 4a439212
      Andrew Jeffery authored
      Neatly, this reduces some of the OpenMP-related noise in the
      implementation by removing the need for the include. The loop body of
      grp_for() is also clearer as a result of the change; the changes to the
      function interfaces motivated a re-think of the complexity here.
      4a439212
  7. 13 Nov, 2015 1 commit
  8. 12 Nov, 2015 1 commit
  9. 08 Nov, 2015 1 commit
    • Andrew Jeffery's avatar
      strgrp: Add cosine similarity filter · e95575c4
      Andrew Jeffery authored
      The cosine similarity measure[1] (O(m + n)) contributes a decent runtime
      reduction when used as a filter prior to execution of more expensive
      algorithms such as LCS[2] (O(m * n)).
      
      A private test set of 3500 strings was used to quantify the improvement.
      The shape of the test set is described by Python's Pandas module as:
      
          >>> frames.apply(len).describe()
          count    3500.000000
          mean       47.454286
          std        14.980197
          min        10.000000
          25%        37.000000
          50%        45.000000
          75%        61.000000
          max       109.000000
          dtype: float64
          >>>
      
      The tests were performed on a lightly loaded Lenovo X201s stocked with a
      Intel Core i7 L 640 @ 2.13GHz CPU with 3.7 GiB of memory. The test was
      compiled with GCC 4.9.3:
      
          $ gcc --version
          gcc (Gentoo 4.9.3 p1.0, pie-0.6.2) 4.9.3
          Copyright (C) 2015 Free Software Foundation, Inc.
          This is free software; see the source for copying conditions.  There is NO
          warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
      
      Using the test program outlined below, ten runs for input set sizes
      incrementing in batches of 500 were taken prior to filtering with cosine
      similarity:
      
          500:  0.61, 0.25, 0.08, 0.07, 0.07, 0.07, 0.09, 0.07, 0.07, 0.07
          1000: 0.33, 0.32, 0.34, 0.32, 0.32, 0.33, 0.32, 0.32, 0.34, 0.32
          1500: 0.72, 1.53, 0.72, 0.70, 0.72, 0.70, 0.72, 0.71, 1.46, 0.71
          2000: 1.22, 1.20, 1.22, 1.98, 1.20, 1.20, 1.22, 1.94, 1.20, 1.20
          2500: 1.97, 2.72, 1.94, 2.33, 2.44, 1.94, 2.82, 1.93, 1.94, 2.69
          3000: 2.69, 3.41, 2.66, 3.38, 2.67, 3.42, 2.63, 3.44, 2.65, 3.39
          3500: 4.18, 3.65, 4.21, 4.21, 3.56, 4.21, 4.16, 3.63, 4.20, 4.13
      
      After adding the cosine similarity filter the runtimes became:
      
          500:  0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02
          1000: 0.08, 0.07, 0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.07, 0.07
          1500: 0.16, 0.16, 0.15, 0.16, 0.16, 0.15, 0.15, 0.15, 0.16, 0.16
          2000: 0.26, 0.26, 0.25, 0.26, 0.26, 0.26, 0.25, 0.26, 0.26, 0.26
          2500: 0.41, 0.41, 0.41, 0.40, 0.42, 0.42, 0.42, 0.41, 0.41, 0.41
          3000: 0.58, 0.56, 0.57, 0.56, 0.58, 0.57, 0.56, 0.56, 0.57, 0.55
          3500: 0.75, 0.74, 0.73, 0.74, 0.74, 0.73, 0.72, 0.75, 0.75, 0.75
      
      As such, on average the runtime improvements are:
      
          N       Avg Pre         Avg Post        Improvement (Pre / Post)
          500     0.145           0.02            7.25
          1000    0.326           0.073           4.47
          1500    0.869           0.156           5.57
          2000    1.358           0.258           5.26
          2500    2.272           0.412           5.51
          3000    3.034           0.566           5.36
          3500    4.014           0.74            5.42
      
      The test driver is as below, where both it and its dependencies were
      compiled with 'CFLAGS=-O2 -fopenmp' and linked with 'LDFLAGS=-fopenmp',
      'LDLIBS=-lm':
      
          $ cat test.c
          #include "config.h"
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include "ccan/strgrp/strgrp.h"
      
          int main(void) {
              FILE *f;
              char *buf;
              struct strgrp *ctx;
              f = fdopen(0, "r");
          #define BUF_SIZE 512
              buf = malloc(BUF_SIZE);
              ctx = strgrp_new(0.85);
              while(fgets(buf, BUF_SIZE, f)) {
                  buf[strcspn(buf, "\r\n")] = '\0';
                  if (!strgrp_add(ctx, buf, NULL)) {
                      printf("Failed to classify %s\n", buf);
                  }
              }
              strgrp_free(ctx);
              free(buf);
              fclose(f);
              return 0;
          }
      
      [1] https://en.wikipedia.org/wiki/Cosine_similarity
      [2] https://en.wikipedia.org/wiki/Longest_common_subsequence_problem
      e95575c4
  10. 03 Nov, 2015 2 commits
    • David Gibson's avatar
      aga: Error codes · afae9c82
      David Gibson authored
      Add an enum to record error codes for aga routines.  The current
      algorithms, dfs and bfs don't have any error conditions except those
      reported by callbacks.  So, for now, the only code is "no error", but this
      will be expanded in future.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      afae9c82
    • David Gibson's avatar
      aga: Fix initialization bug in aga_for_each_edge_info · 2ab26c62
      David Gibson authored
      The aga_for_each_edge_info macro is constructed so that if the edge_info
      callback returns an error, the for loop terminates early and leaves the
      _err parameter set to the error.  On successful completion of the loop,
      _err should be zero.
      
      However, on a node with no edges, _err will not be initialized, meaning
      that it could be non-zero even on successful (trivial) completion of the
      loop.  This fixes the bug.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      2ab26c62
  11. 02 Nov, 2015 1 commit
  12. 27 Oct, 2015 1 commit
  13. 25 Oct, 2015 8 commits
    • David Gibson's avatar
      lqueue: Streamline interface with TCON_CONTAINER · 42b7ab48
      David Gibson authored
      The interfaces the lqueue module currently has implement (partial) type
      safety in a somewhat clunky way - types and member names need to be passed
      to a number of entry points.
      
      This patch uses the new TCON_CONTAINER magic to stash the typing
      information into the declaration of the queue object, so it no longer needs
      to be explicitly passed to later calls.
      
      This does alter the lqueue interface incompatibly.  I think the module
      is young enough that we can reasonably do that.  One other module,
      'aga', was using lqueue, and this also includes fixes so that it still
      works.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      42b7ab48
    • David Gibson's avatar
      lstack: Streamline interface with TCON_CONTAINER · e47d78e9
      David Gibson authored
      The interfaces the lstack module currently has implement (partial) type
      safety in a somewhat clunk way - types and member names need to be passed
      to a number of entry points.
      
      This patch uses the new TCON_CONTAINER magic to stash the typing
      information into the declaration of the stack object, so it no longer needs
      to be explicitly passed to later calls.
      
      This does alter the lstack interface incompatibly.  I think the module
      is young enough that we can reasonably do that.  One other module,
      'aga', was using lstack, and this also includes fixes so that it still
      works.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      e47d78e9
    • David Gibson's avatar
      tcon: Encode information on container members in "type" canaries · 0ddea413
      David Gibson authored
      Add "container canaries" to tcon.  This allows information about a specific
      member of a container structure to be encoded with TCON or TCON_WRAP.  Once
      that's done, tcon_container_of() and tcon_member_of() can be used to
      translate between member and container pointers based on the canary
      information, without having to repeat the type and member details.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      0ddea413
    • David Gibson's avatar
      tcon: Encode integer values into "type" canaries · fd48aee6
      David Gibson authored
      Add the ability to encode, as well as types, integer values (must be
      positive, compile time constant and in the range of size_t) into TCON
      constructed "type" canaries.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      fd48aee6
    • David Gibson's avatar
      tcon: Add tcon_sizeof() helper · cc3db07e
      David Gibson authored
      Add a new tcon_sizeof() helper macro which returns the size of a type for
      which a canary has been constructed with TCON or TCON_WRAP.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      cc3db07e
    • David Gibson's avatar
      tcon: Add an alternate way of building type canaries · 43992de4
      David Gibson authored
      The tcon module allows you to add "type canaries" to a structures, which
      can be used for later typechecks.  The canaries are implemented using
      a flexible array member, to avoid them taking any actual space at runtime.
      That means the canaries must go at the end of your structure.
      
      That doesn't seem like a big limitation, except that it also means the
      structure containing the canaries must be at the end of any structure it
      is embedded in in turn, which is a rather more serious limitation.
      
      This patch adds a TCON_WRAP() macro which wraps a given type in a new type
      which also contains type canaries, and doesn't suffer the last member
      limitation.  The drawback is that if the wrapped type has smaller size than
      a pointer, the type canaries will pad the wrapped type out to the size of a
      pointer.
      
      By constructing the wrappers carefully, the existing tcon macros will work
      on both wrapper types constructed with TCON_WRAP and on structures
      explicitly including TCON type canaries.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      43992de4
    • David Gibson's avatar
      lstack, lqueue: Move from MODS_WITH_SRC to MODS_NO_SRC · fced03d3
      David Gibson authored
      The lstack and lqueue modules are entirely implemented in a single header.
      However, in Makefile-ccan they're listed in MODS_WITH_SRC, instead of
      MODS_NO_SRC.  This appears to be harmless, but this patch moves them to
      the correct category anyway.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      fced03d3
    • David Gibson's avatar
      ptrint: ptr2int and int2ptr are constant functions · 14addbab
      David Gibson authored
      By construction these functions depend only on their arguments, so declare
      them as CONST_FUNCTION using the helper from ccan/compiler.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      14addbab
  14. 21 Oct, 2015 2 commits
  15. 20 Oct, 2015 1 commit
  16. 18 Oct, 2015 2 commits