1. 06 Jul, 2016 1 commit
      Fix build for Python 3.5 · e6beab19
      Kirill Smelkov authored
      @kazuhiko reports that wendelin.core build is currently broken on Python 3.5.
      Indeed it was:
          In file included from bigfile/_bigfile.c:37:0:
          ./include/wendelin/compat_py2.h: In function ‘_PyThreadState_UncheckedGetx’:
          ./include/wendelin/compat_py2.h:66:28: warning: implicit declaration of function ‘_Py_atomic_load_relaxed’ [-Wimplicit-function-declaration]
               return (PyThreadState*)_Py_atomic_load_relaxed(&_PyThreadState_Current);
          ./include/wendelin/compat_py2.h:66:53: error: ‘_PyThreadState_Current’ undeclared (first use in this function)
               return (PyThreadState*)_Py_atomic_load_relaxed(&_PyThreadState_Current);
          ./include/wendelin/compat_py2.h:66:53: note: each undeclared identifier is reported only once for each function it appears in
          ./include/wendelin/compat_py2.h:67:1: warning: control reaches end of non-void function [-Wreturn-type]
      The story here is that in 3.5 the CPython developers removed direct access
      to _PyThreadState_Current and the atomic implementations - because those
      might semantically conflict with other headers implementing atomics - and
      now provide access only via a function.
      Starting from Python 3.5.2rc1 the function to get current thread state
      without asserting it is !NULL - _PyThreadState_UncheckedGet() - was added:
      so for those python versions we can directly use it.
      After the fix, wendelin.core tox tests pass under all of python2.7, python3.4 and python3.5.
      More context here:
      Fixes: #1
  2. 01 Jul, 2016 1 commit
  3. 13 Jun, 2016 2 commits
  4. 15 Dec, 2015 1 commit
  5. 24 Sep, 2015 2 commits
      bigfile/zodb: Format #1 which is optimized for small changes · 13c0c17c
      Kirill Smelkov authored
      Our current approach is that each file block is represented by 1 zodb
      object, with block size being 2M. Even with trailing \0 trimming, which
      halves the overhead on average, DB size grows very fast if we do a lot
      of small appends or changes. So another format needs to be introduced
      which has lower overhead for storing small changes:
      In general, to represent BigFile as ZODB objects, each file block could
      be represented separately either as
          1) one ZODB object, or          (ZBlk0 - this is what we have already)
          2) group of ZODB objects        (ZBlk1 - this is what we introduce)
      with top-level BTree directory #blk -> objects representing block.
      For "1" we have
          - low-overhead access time (only 1 object loaded from DB), but
          - high-overhead in terms of ZODB size (with FileStorage / ZEO, every change
            to a block causes it to be written into DB in full again)
      For "2" we have
          - low-overhead in terms of ZODB size (only part of a block is overwritten
            in DB on single change), but
          - high-overhead in terms of access time
            (several objects need to be loaded for 1 block)
      In general it is not possible to have low overhead for both i) access time and
      ii) DB size with an approach where block object representation / management
      is done on the *client* side.
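
      The trade-off above can be put in rough numbers. The following is only a
      back-of-the-envelope sketch, not code from wendelin.core: the function
      names are hypothetical, BTree node overhead is ignored, and a change is
      assumed to be chunk-aligned. Block and chunk sizes are the ones quoted in
      this commit message.

```python
BLOCK_SIZE = 2 * 1024 * 1024   # 2M file block, as stated in this message
CHUNK_SIZE = 4 * 1024          # 4K chunk, the size chosen for ZBlk1 in this patch


def zblk0_cost(data_len):
    """Cost of one small change when a block is a single ZODB object.

    data_len is the block data length after trailing-zero trimming.
    Returns (objects_loaded, bytes_written): the whole object is rewritten.
    """
    return 1, data_len


def zblk1_cost(data_len, changed_bytes):
    """Cost of one small change when a block is a group of chunk objects.

    Assumptions (not from the source): BTree nodes are not counted and the
    change is chunk-aligned, so it touches ceil(changed_bytes / CHUNK_SIZE)
    chunks. Returns (objects_loaded, bytes_written).
    """
    nchunks_total = -(-data_len // CHUNK_SIZE)        # ceil division
    nchunks_dirty = -(-changed_bytes // CHUNK_SIZE)
    return nchunks_total, nchunks_dirty * CHUNK_SIZE


if __name__ == "__main__":
    print("ZBlk0:", zblk0_cost(BLOCK_SIZE))
    print("ZBlk1:", zblk1_cost(BLOCK_SIZE, 100))
```

      Under these assumptions a 100-byte change to a full block rewrites the
      whole 2M object in case "1", but only one 4K chunk in case "2", at the
      price of loading many chunk objects to read the block back.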
      On the other hand, if object management is moved to DB *server* side, it is
      possible to deduplicate them there and this way have low-overhead for both
      access-time and DB size with just client storing 1 object per file block. This
      will be our future approach after we teach NEO about object deduplication.
      As shown above, it is not possible to perform optimally on the client
      side. Thus ZBlk1 should be only an intermediate solution until we move
      data management to the DB server side, with the main criterion for ZBlk1
      being to keep it simple.
      In this patch a simple scheme is used, where every block is divided into
      chunks organized via a BTree. When part of a block changes, only the
      corresponding chunk is updated. Chunk size is chosen to be 4K, which gives
      a fanout of 512 for a 2M block.
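
      The chunk scheme can be illustrated with a toy model. This is a minimal
      sketch under stated assumptions, not the actual ZBlk1 implementation: the
      class name ChunkedBlock is hypothetical, a plain dict stands in for the
      BTree, and trimming trailing zeros per chunk (and skipping all-zero
      chunks) is an assumption made here for illustration.

```python
CHUNK_SIZE = 4 * 1024


class ChunkedBlock:
    """Toy stand-in for a block stored as chunks (not the real ZBlk1 class)."""

    def __init__(self):
        self.chunks = {}    # chunk offset -> bytes; a BTree in the real scheme
        self.writes = 0     # how many chunk objects were (re)written so far

    def setdata(self, buf):
        """Store buf, rewriting only the chunks whose content changed."""
        seen = set()
        for off in range(0, len(buf), CHUNK_SIZE):
            # assumption: trailing zeros are trimmed per chunk
            data = bytes(buf[off:off + CHUNK_SIZE]).rstrip(b"\0")
            if not data:
                continue                    # all-zero chunk: store nothing
            seen.add(off)
            if self.chunks.get(off) != data:
                self.chunks[off] = data     # only this chunk is written
                self.writes += 1
        for off in list(self.chunks):       # drop chunks that became all-zero
            if off not in seen:
                del self.chunks[off]

    def getdata(self, size):
        """Reassemble the block; holes and trimmed tails read back as zeros."""
        buf = bytearray(size)
        for off, data in self.chunks.items():
            buf[off:off + len(data)] = data
        return bytes(buf)


if __name__ == "__main__":
    blk = ChunkedBlock()
    buf = bytearray(2 * 1024 * 1024)
    buf[0:5] = b"hello"
    blk.setdata(buf)
    buf[8192:8195] = b"abc"     # small change in a different chunk
    blk.setdata(buf)
    print("chunk writes:", blk.writes)
```

      In this model the second setdata() call writes a single chunk, because
      the chunk holding "hello" compares equal to what is already stored.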
      DB size after tests changes as follows:
                  bigfile     bigarray
      ZBlk0         24K        6200K
      ZBlk1         36K          36K
      ( the slight size increase for bigfile tests is because of btree
        structures overhead )
      Time to run tests stays approximately the same.
      /cc @Tyagov, @klaus
      bigfile/zodb: Prepare to have several ZBlk formats · 70ea8573
      Kirill Smelkov authored
      - current ZBlk becomes format 0
      - write format can be selected via WENDELIN_CORE_ZBLK_FMT env var
      - upon writing a block we always make sure we write it in current write
        format - so if a block was previously written in one format, it could
        be changed on the next write.
      - tox is prepared to test all write formats (so far only ZBlk0 there).
      The reason is - in the next patch we'll introduce another format for
      blocks which is optimized for small changes.
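
      The write-format selection described above might look roughly like the
      sketch below. Only the WENDELIN_CORE_ZBLK_FMT variable name comes from
      this message; the accepted values, the "ZBlk0" default, the placeholder
      second format ZBlkX and the store_block() helper are all assumptions made
      for illustration, not the actual wendelin.core code.

```python
import os


class ZBlk0:
    """Placeholder for the existing one-object-per-block format."""
    def __init__(self):
        self.data = b""


class ZBlkX(ZBlk0):
    """Hypothetical second format, present only to show a format switch."""


# assumption: the env var holds the format name; the real values may differ
FORMATS = {"ZBlk0": ZBlk0, "ZBlkX": ZBlkX}


def write_format(environ=os.environ):
    """Return the class for the current write format (default is assumed)."""
    name = environ.get("WENDELIN_CORE_ZBLK_FMT", "ZBlk0")
    try:
        return FORMATS[name]
    except KeyError:
        raise RuntimeError("unknown ZBlk format: %r" % name)


def store_block(blktab, blk, data, environ=os.environ):
    """Write data for block #blk, always in the current write format.

    If the block was previously stored in another format, it is replaced by
    an object of the current format on this write.
    """
    cls = write_format(environ)
    zblk = blktab.get(blk)
    if type(zblk) is not cls:
        zblk = cls()
        blktab[blk] = zblk
    zblk.data = data
    return zblk


if __name__ == "__main__":
    blktab = {}
    store_block(blktab, 0, b"a", environ={})
    store_block(blktab, 0, b"b", environ={"WENDELIN_CORE_ZBLK_FMT": "ZBlkX"})
    print(type(blktab[0]).__name__)
```

      Reads are unaffected by the setting in this model: a block stays in
      whatever format it was last written in until the next write.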
  6. 06 Aug, 2015 1 commit
  7. 26 Jun, 2015 1 commit
  8. 28 May, 2015 1 commit
  9. 03 Apr, 2015 1 commit