1. 02 Feb, 2016 5 commits
    • David Gibson's avatar
      idtree: Fix undefined behaviour (left shift of signed value) · 36a15d7d
      David Gibson authored
      ~0 will be signed and negative on any 2s complement system, and
      left shifting a negative value has undefined behaviour in C.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      36a15d7d
    • David Gibson's avatar
      idtree: Fix misindented statement · fb33661a
      David Gibson authored
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      fb33661a
    • David Gibson's avatar
      idtree: Fix comparison is always false warning · 2b48199f
      David Gibson authored
      idtree.c:146 triggers a "comparison is always false" warning on some
      compiler configurations, since the 'id' variable is unsigned.
      
      Elsewhere in the module ids seem to be represented by (signed) ints, so
      use the same convention here, suppressing the warning and also maybe being
      more correct in other ways.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      2b48199f
    • David Gibson's avatar
      htable: Mark functions constructed by HTABLE_DEFINE_TYPE as UNNEEDED · 71b4e3ad
      David Gibson authored
      The HTABLE_DEFINE_TYPE macro builds a type-specific hash table by
      constructing a bunch of simple wrapper functions.  The user of the hash
      table may not end up using all of these.  With gcc the fact that the
      functions are inline stops an warnings about unused functions, but that's
      not the case with clang.
      
      Suppress these warnings by marking all the constructed functions except
      for name##_add() as UNNEEDED (using the macro from ccan/compiler).  _add
      is left alone on the grounds that a hash table you never add anything to
      isn't much use, so this will help you to spot an entirely redundant
      HTABLE_DEFINE_TYPE invocation.  *_init() would be a more obvious choice,
      except that there is both *_init() and *_init_sized() and you only need
      to use one of these.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      71b4e3ad
    • David Gibson's avatar
      strmap: Convert to using TCON_WRAP() instead of plain TCON() · ab244401
      David Gibson authored
      The usual way of construction strmap objects is to use the STRMAP_MEMBERS()
      macro which expands to both a raw strmap structure and a tcon type canary.
      However, the tcon type canary involves a flexible array member which means
      that in standard C99 STRMAP_MEMBERS() must appear only at the end of a
      structure definition.  But worse, that structure can then only appear at
      the end of any other structure it is included in, which is pretty
      inconvenient for the intended purpose of creating type specific strmaps.
      
      gcc extensions allow this to work (somehow), but clang complains loudly
      about it.
      
      The tcon module already includes the TCON_WRAP() mechanism, which already
      provides this same sort of type-specific definitions in a more general way.
      So convert strmap (and its users) to that approach.
      
      This removes STRMAP_MEMBERS() entirely, breaking compatibility.  I'm hoping
      strmap is used in few enough places that we can get away with that.
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      ab244401
  2. 29 Jan, 2016 2 commits
  3. 27 Jan, 2016 2 commits
  4. 26 Jan, 2016 2 commits
  5. 21 Jan, 2016 1 commit
  6. 19 Jan, 2016 3 commits
  7. 18 Jan, 2016 1 commit
  8. 05 Jan, 2016 1 commit
    • Dan Good's avatar
      deque: check HAVE_STATEMENT_EXPR, tweaks from feedback · 3022d16d
      Dan Good authored
      Thanks to the detailed feedback from David Gibson, I made the
      following improvements:
      
      * add missing includes
      * check for statement expression support, give an error if absent
      * ccanlint directive to skip "without features" steps
      * add license ref to top of source files
      * rename run1.c test to api1.c
      Signed-off-by: default avatarDan Good <dan@dancancode.com>
      3022d16d
  9. 28 Dec, 2015 1 commit
  10. 14 Dec, 2015 1 commit
  11. 10 Dec, 2015 2 commits
  12. 09 Dec, 2015 3 commits
  13. 20 Nov, 2015 12 commits
  14. 15 Nov, 2015 1 commit
    • Andrew Jeffery's avatar
      strgrp: Cache cosine popcounts to reduce computation · 4a439212
      Andrew Jeffery authored
      Neatly, this reduces some of the OpenMP-related noise in the
      implementation by removing the need for the include. The loop body of
      grp_for() is also clearer as a result of the change; the changes to the
      function interfaces motivated a re-think of the complexity here.
      4a439212
  15. 13 Nov, 2015 1 commit
  16. 12 Nov, 2015 1 commit
  17. 08 Nov, 2015 1 commit
    • Andrew Jeffery's avatar
      strgrp: Add cosine similarity filter · e95575c4
      Andrew Jeffery authored
      The cosine similarity measure[1] (O(m + n)) contributes a decent runtime
      reduction when used as a filter prior to execution of more expensive
      algorithms such as LCS[2] (O(m * n)).
      
      A private test set of 3500 strings was used to quantify the improvement.
      The shape of the test set is described by Python's Pandas module as:
      
          >>> frames.apply(len).describe()
          count    3500.000000
          mean       47.454286
          std        14.980197
          min        10.000000
          25%        37.000000
          50%        45.000000
          75%        61.000000
          max       109.000000
          dtype: float64
          >>>
      
      The tests were performed on a lightly loaded Lenovo X201s stocked with a
      Intel Core i7 L 640 @ 2.13GHz CPU with 3.7 GiB of memory. The test was
      compiled with GCC 4.9.3:
      
          $ gcc --version
          gcc (Gentoo 4.9.3 p1.0, pie-0.6.2) 4.9.3
          Copyright (C) 2015 Free Software Foundation, Inc.
          This is free software; see the source for copying conditions.  There is NO
          warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
      
      Using the test program outlined below, ten runs for input set sizes
      incrementing in batches of 500 were taken prior to filtering with cosine
      similarity:
      
          500:  0.61, 0.25, 0.08, 0.07, 0.07, 0.07, 0.09, 0.07, 0.07, 0.07
          1000: 0.33, 0.32, 0.34, 0.32, 0.32, 0.33, 0.32, 0.32, 0.34, 0.32
          1500: 0.72, 1.53, 0.72, 0.70, 0.72, 0.70, 0.72, 0.71, 1.46, 0.71
          2000: 1.22, 1.20, 1.22, 1.98, 1.20, 1.20, 1.22, 1.94, 1.20, 1.20
          2500: 1.97, 2.72, 1.94, 2.33, 2.44, 1.94, 2.82, 1.93, 1.94, 2.69
          3000: 2.69, 3.41, 2.66, 3.38, 2.67, 3.42, 2.63, 3.44, 2.65, 3.39
          3500: 4.18, 3.65, 4.21, 4.21, 3.56, 4.21, 4.16, 3.63, 4.20, 4.13
      
      After adding the cosine similarity filter the runtimes became:
      
          500:  0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02
          1000: 0.08, 0.07, 0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.07, 0.07
          1500: 0.16, 0.16, 0.15, 0.16, 0.16, 0.15, 0.15, 0.15, 0.16, 0.16
          2000: 0.26, 0.26, 0.25, 0.26, 0.26, 0.26, 0.25, 0.26, 0.26, 0.26
          2500: 0.41, 0.41, 0.41, 0.40, 0.42, 0.42, 0.42, 0.41, 0.41, 0.41
          3000: 0.58, 0.56, 0.57, 0.56, 0.58, 0.57, 0.56, 0.56, 0.57, 0.55
          3500: 0.75, 0.74, 0.73, 0.74, 0.74, 0.73, 0.72, 0.75, 0.75, 0.75
      
      As such, on average the runtime improvements are:
      
          N       Avg Pre         Avg Post        Improvement (Pre / Post)
          500     0.145           0.02            7.25
          1000    0.326           0.073           4.47
          1500    0.869           0.156           5.57
          2000    1.358           0.258           5.26
          2500    2.272           0.412           5.51
          3000    3.034           0.566           5.36
          3500    4.014           0.74            5.42
      
      The test driver is as below, where both it and its dependencies were
      compiled with 'CFLAGS=-O2 -fopenmp' and linked with 'LDFLAGS=-fopenmp',
      'LDLIBS=-lm':
      
          $ cat test.c
          #include "config.h"
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include "ccan/strgrp/strgrp.h"
      
          int main(void) {
              FILE *f;
              char *buf;
              struct strgrp *ctx;
              f = fdopen(0, "r");
          #define BUF_SIZE 512
              buf = malloc(BUF_SIZE);
              ctx = strgrp_new(0.85);
              while(fgets(buf, BUF_SIZE, f)) {
                  buf[strcspn(buf, "\r\n")] = '\0';
                  if (!strgrp_add(ctx, buf, NULL)) {
                      printf("Failed to classify %s\n", buf);
                  }
              }
              strgrp_free(ctx);
              free(buf);
              fclose(f);
              return 0;
          }
      
      [1] https://en.wikipedia.org/wiki/Cosine_similarity
      [2] https://en.wikipedia.org/wiki/Longest_common_subsequence_problem
      e95575c4