1. 02 Jun, 2015 1 commit
    • *: It is not safe to use multiply.reduce() - it overflows · 73926487
      Kirill Smelkov authored
          In [1]: multiply.reduce((1<<30, 1<<30, 1<<30))
          Out[1]: 0
      instead of
          In [2]: (1<<30) * (1<<30) * (1<<30)
          Out[2]: 1237940039285380274899124224
          In [3]: 1<<90
          Out[3]: 1237940039285380274899124224
      Also, multiply.reduce() returns numpy.int64 instead of a Python int:
          In [4]: type( multiply.reduce([1,2,3]) )
          Out[4]: numpy.int64
      which also leads to overflow-related problems if we further compute with
      this value and other integers and the result exceeds int64 - it silently becomes float:
          In [5]: idx0_stop = 18446744073709551615
          In [6]: stride0   = numpy.int64(1)
          In [7]: byte0_stop = idx0_stop * stride0
          In [8]: byte0_stop
          Out[8]: 1.8446744073709552e+19
      and then it becomes a real problem for BigArray.__getitem__()
          wendelin.core/bigarray/__init__.py:326: RuntimeWarning: overflow encountered in long_scalars
            page0_min  = min(byte0_start, byte0_stop+byte0_stride) // pagesize # TODO -> fileh.pagesize
      and then
          >           vma0 = self._fileh.mmap(page0_min, page0_max-page0_min+1)
          E           TypeError: integer argument expected, got float
      So just avoid multiply.reduce() and do our own mul() properly, the same
      way sum() is built into Python, and we avoid overflow-related problems.
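      The fix described above can be sketched as a small helper (a minimal
      illustration, not necessarily the exact wendelin.core implementation): a
      mul() that multiplies with Python's arbitrary-precision integers instead
      of NumPy's fixed-width int64.

      ```python
      def mul(iterable, initial=1):
          """Product of iterable, like builtin sum() but for multiplication.

          Accumulates with Python ints (arbitrary precision), so unlike
          numpy.multiply.reduce() it never silently overflows int64.
          """
          r = initial
          for x in iterable:
              r *= x
          return r
      ```

      With this helper, mul((1<<30, 1<<30, 1<<30)) gives the exact value
      1<<90, where multiply.reduce() returned 0.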
  2. 03 Apr, 2015 2 commits
    • Demo program that shows how to work with ZBigArrays bigger than RAM in size · 1ee72371
      Kirill Smelkov authored
      This shows how to first generate such arrays (in steps, as every
      transaction change should fit in memory), and then gather data from
      whole array using C/Fortran/etc code.
      It shows how to compute the mean via NumPy's ndarray.mean().
      It also shows that e.g. ndarray.var() wants to create temporaries the
      size of the original ndarray, and that fails because they do not fit
      into RAM.
      ndarray.var() should not need to create such temporaries in principle -
      all it has to do is to first compute mean, and then compute
          sum (Xi - <X>)^2
      in a loop.
      <X> is a scalar, and Xi is just an access into the original array.
      So this also shows that NumPy could be incrementally improved to avoid
      creating such temporaries, and then it would work.
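      The incremental approach suggested above can be sketched as follows (an
      illustrative helper, var_chunked and its chunksize parameter are my own
      names, not from the commit): a two-pass variance that first computes the
      mean, then accumulates sum((Xi - <X>)^2) over fixed-size chunks, so the
      only temporaries are of chunk size, never of full-array size.

      ```python
      import numpy as np

      def var_chunked(a, chunksize=1024):
          """Population variance of 1-d array `a` without full-size temporaries.

          Pass 1: compute the mean <X>.
          Pass 2: accumulate sum((Xi - <X>)^2) chunk by chunk; each chunk
          allocates only O(chunksize) scratch memory.
          """
          mean = a.mean()
          ssq = 0.0
          n = a.shape[0]
          for i in range(0, n, chunksize):
              d = a[i:i + chunksize] - mean   # temporary of at most chunksize
              ssq += np.dot(d, d)             # += sum(d**2)
          return ssq / n
      ```

      On an in-RAM array this agrees with ndarray.var(); the point is that the
      same loop would keep working when `a` is an out-of-core ZBigArray much
      bigger than RAM.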
    • lib/mem: Python utilities to zero, set & copy memory · 699b1375
      Kirill Smelkov authored
      Like C bzero / memset & memcpy - but working on Python buffers. We
      leverage NumPy for doing the actual work, and this way NumPy becomes a
      dependency.
      Having NumPy as a dependency is ok - we'll for sure need it later, as we
      are trying to build out-of-core ndarrays.
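      The idea of the commit above can be sketched like this (a minimal
      illustration of the technique, not necessarily the exact lib/mem code):
      view the buffer as a uint8 ndarray via numpy.frombuffer(), then let
      NumPy's vectorized assignment do the zeroing, filling, and copying.

      ```python
      import numpy as np

      def bzero(buf):
          """Zero whole writable buffer, like C bzero."""
          a = np.frombuffer(buf, np.uint8)
          a[:] = 0

      def memset(buf, c):
          """Fill whole writable buffer with byte value c, like C memset."""
          a = np.frombuffer(buf, np.uint8)
          a[:] = c

      def memcpy(dst, src):
          """Copy bytes of src into the beginning of writable buffer dst."""
          a = np.frombuffer(dst, np.uint8)
          b = np.frombuffer(src, np.uint8)
          a[:len(b)] = b
      ```

      Note that np.frombuffer() makes a view, not a copy, so the assignments
      mutate the underlying buffer in place; dst must expose a writable buffer
      (e.g. bytearray or mmap), since a view of a read-only buffer such as
      bytes cannot be assigned to.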