1. 02 Jun, 2015 1 commit
    • *: It is not safe to use multiply.reduce() - it overflows · 73926487
      Kirill Smelkov authored
      e.g.
      
          In [1]: multiply.reduce((1<<30, 1<<30, 1<<30))
          Out[1]: 0
      
      instead of
      
          In [2]: (1<<30) * (1<<30) * (1<<30)
          Out[2]: 1237940039285380274899124224
      
          In [3]: 1<<90
          Out[3]: 1237940039285380274899124224
      
      Also, multiply.reduce() returns int64 instead of a Python int:
      
          In [4]: type( multiply.reduce([1,2,3]) )
          Out[4]: numpy.int64
      
      which also leads to overflow-related problems if we compute further
      with this value and other integers and the result exceeds int64 - it
      silently becomes a float:
      
          In [5]: idx0_stop = 18446744073709551615
      
          In [6]: stride0   = numpy.int64(1)
      
          In [7]: byte0_stop = idx0_stop * stride0
      
          In [8]: byte0_stop
          Out[8]: 1.8446744073709552e+19
      
      and then it becomes a real problem for BigArray.__getitem__()
      
          wendelin.core/bigarray/__init__.py:326: RuntimeWarning: overflow encountered in long_scalars
            page0_min  = min(byte0_start, byte0_stop+byte0_stride) // pagesize # TODO -> fileh.pagesize
      
      and then
      
          >           vma0 = self._fileh.mmap(page0_min, page0_max-page0_min+1)
          E           TypeError: integer argument expected, got float
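
      Staying in pure Python ints at that step avoids the degradation; an
      illustrative continuation of the session above:

          In [9]: int(stride0) * idx0_stop
          Out[9]: 18446744073709551615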
      
      ~~~~
      
      So just avoid multiply.reduce() and implement our own mul() properly,
      the same way sum() is built into Python, and we avoid the
      overflow-related problems.
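
      A minimal sketch of such a mul(), assuming it mirrors the semantics of
      the builtin sum(); the helper name follows the text above, but the
      exact signature is an assumption:

          def mul(seq, initial=1):
              """Multiply all elements of seq together, like builtin sum().

              As long as the inputs are Python ints the product stays a
              Python int, which has arbitrary precision - so unlike
              numpy.multiply.reduce() it can neither overflow nor silently
              degrade to float.
              """
              res = initial
              for x in seq:
                  res *= x
              return res

          assert mul((1<<30, 1<<30, 1<<30)) == 1<<90
          assert type(mul([1, 2, 3])) is int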
  2. 03 Apr, 2015 2 commits
    • Demo program that shows how to work with ZBigArrays bigger than RAM in size · 1ee72371
      Kirill Smelkov authored
      This shows how to first generate such arrays (in steps, as every
      transaction's change should fit in memory; see the sketch below), and
      then gather data from the whole array using C/Fortran/etc. code.
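
      A rough sketch of such step-by-step generation, assuming wendelin.core's
      ZBigArray slicing returns a writable ndarray view; populate(), nblocks
      and blocksize are hypothetical names:

          import transaction
          import numpy as np

          def populate(zarray, nblocks, blocksize):
              for i in range(nblocks):
                  # a slice of the big array - an ndarray view of mapped memory
                  blk = zarray[i*blocksize:(i+1)*blocksize]
                  blk[:] = np.arange(i*blocksize, (i+1)*blocksize,
                                     dtype=zarray.dtype)
                  # commit after each block, so that every transaction's
                  # change stays small enough to fit in RAM
                  transaction.commit()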
      
      It shows how to compute the mean via NumPy's ndarray.mean().
      
      It also shows that e.g. ndarray.var() wants to create temporaries the
      size of the original ndarray, and that fails because they do not fit
      into RAM.
      
      In principle ndarray.var() should not need to create such temporaries:
      all it has to do is first compute the mean, and then compute

          sum (Xi - <X>)^2

      in a loop.

      <X> is a scalar; Xi is just an access to the original array.
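
      A hypothetical sketch of that loop, computing the variance chunk by
      chunk so temporaries never exceed chunksize elements (a 1-D array and
      the names var_outofcore/chunksize are assumptions here):

          import numpy as np

          def var_outofcore(a, chunksize=1<<20):
              mean = a.mean()                  # <X>; needs no big temporaries
              acc = 0.0
              for i in range(0, a.size, chunksize):
                  d = a[i:i+chunksize] - mean  # temporary of at most chunksize
                  acc += float(np.dot(d, d))   # sum (Xi - <X>)^2 over the chunk
              return acc / a.size              # matches ndarray.var() (ddof=0)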
      
      ~~~~
      
      So this also shows that NumPy can be incrementally improved to avoid
      creating such temporaries, and then it will work.
    • lib/mem: Python utilities to zero, set & copy memory · 699b1375
      Kirill Smelkov authored
      Like C bzero / memset & memcpy - but working on Python buffers. We
      leverage NumPy for doing the actual work, and this way NumPy becomes
      a dependency.
      
      Having NumPy as a dependency is ok - we'll for sure need it later as we
      are trying to build out-of-core ndarrays.
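
      A minimal sketch of what such helpers could look like with NumPy doing
      the work; the exact signatures here are assumptions:

          import numpy as np

          def bzero(buf):
              """Zero whole buf memory."""
              np.frombuffer(buf, dtype=np.uint8)[:] = 0

          def memset(buf, c):
              """Set whole buf memory to byte c."""
              np.frombuffer(buf, dtype=np.uint8)[:] = c

          def memcpy(dst, src):
              """Copy src buffer into dst buffer (sizes must match here)."""
              adst = np.frombuffer(dst, dtype=np.uint8)
              asrc = np.frombuffer(src, dtype=np.uint8)
              adst[:] = asrc

          buf = bytearray(4)      # a writable Python buffer
          memset(buf, 0xff)
          assert buf == b'\xff\xff\xff\xff'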