Commit e44bd761 authored by Kirill Smelkov

bigarray: Explicitly reject dtypes with object inside

From time to time people try to use wendelin.core with dtype=object
arrays and get segfaults, with nothing in the logs or anywhere else to
explain what went wrong.

Wendelin.core does not support this, because with dtype=object the
array elements are really pointers, and the data for each object is
stored in a separate place in RAM, with a different per-object size.
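
This is visible in the dtype metadata itself: the itemsize of an object
dtype is just the pointer size, and dtype.hasobject flags object
references even when they are nested inside a structured dtype. A small
illustrative check (independent of wendelin.core):

    import numpy

    # elements of an object array are PyObject* pointers: itemsize is the
    # platform pointer size (8 on 64-bit), not the size of the objects
    print(numpy.dtype(object).itemsize)                      # 8

    # dtype.hasobject detects object references, including nested ones
    print(numpy.dtype(object).hasobject)                     # True
    print(numpy.dtype([('x', 'i4'), ('y', 'O')]).hasobject)  # True
    print(numpy.dtype('i4').hasobject)                       # False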

As we are memory-mapping arrays, this cannot work. It essentially does
not work for numpy.memmap either, for the same reason:

    (z4+numpy) kirr@mini:~/src/wendelin$ dd if=/dev/zero of=zero.dat bs=128 count=1
    1+0 records in
    1+0 records out
    128 bytes copied, 0.000209873 s, 610 kB/s
    (z4+numpy) kirr@mini:~/src/wendelin$ dd if=/dev/urandom of=random.dat bs=128 count=1
    1+0 records in
    1+0 records out
    128 bytes copied, 0.000225726 s, 567 kB/s
    (z4+numpy) kirr@mini:~/src/wendelin$ ipython
    ...

    In [1]: import numpy as np

    In [2]: np.memmap('zero.dat', dtype=np.object)
    Out[2]:
    memmap([None, None, None, None, None, None, None, None, None, None, None,
           None, None, None, None, None], dtype=object)

    In [3]: np.memmap('random.dat', dtype=np.object)
    Out[3]: Segmentation fault

(With zero.dat the object pointers are all NULL, which numpy prints as
None; with random.dat they are wild pointers, and dereferencing them
segfaults.)

So let's make this explicit to users by raising an exception, with a
descriptive explanation also logged, whenever a BigArray is about to be
created with an object-containing dtype.

/reviewed-on !4
parent b5de3f1e
@@ -76,7 +76,19 @@ class BigArray(object):
     # __init__ part without fileh
     def _init0(self, shape, dtype_, order):
-        self._dtype = dtype(dtype_)
+        _dtype = dtype(dtype_)
+        if _dtype.hasobject:
+            logging.warn("You tried to use dtype containing object (%r) with out-of-core array ..." % _dtype)
+            logging.warn("... wendelin.core does not support it, because in case of dtype=object elements are")
+            logging.warn("... really pointers and data for each object is stored in separate place in RAM")
+            logging.warn("... with different per-object size.")
+            logging.warn("... ")
+            logging.warn("... As out-of-core arrays are really memory-mapping of data in external storage")
+            logging.warn("... this won't work. It also does not essentially work with numpy.memmap() for the")
+            logging.warn("... same reason.")
+            raise TypeError("dtypes with object are not supported", _dtype)
+
+        self._dtype = _dtype
         self._shape = shape
         self._order = order
         # TODO +offset ?
...
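
With this check in place, attempting to create a BigArray with an
object dtype now fails immediately with a clear error instead of
segfaulting later. An illustrative session (Zh stands for some
already-opened file handle):

    >>> from wendelin.bigarray import BigArray
    >>> BigArray((10,), object, Zh)
    ...
    TypeError: ('dtypes with object are not supported', dtype('O'))
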
@@ -64,6 +64,19 @@ class BigFile_Data_RO(BigFile_Data):

 PS = 2*1024*1024   # FIXME hardcoded, TODO -> ram.pagesize

+# make sure we don't let dtype with object to be used with BigArray
+def test_bigarray_noobject():
+    Z  = BigFile_Zero(PS)
+    Zh = Z.fileh_open()
+
+    # NOTE str & unicode are fixed-size types - if size is not explicitly given
+    # it will become S0 or U0
+    obj_dtypev = [numpy.object, 'O', 'i4, O', [('x', 'i4'), ('y', 'i4, O')]]
+    for dtype_ in obj_dtypev:
+        print dtype_
+        raises(TypeError, "BigArray((1,), dtype_, Zh)")
+
+
 # basic ndarray-compatibility attributes of BigArray
 def test_bigarray_basic():
     Z = BigFile_Zero(PS)
...
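
A side note on the test: it uses the old string form of pytest's
raises() and the numpy.object alias, both of which are deprecated or
removed in current pytest/numpy. A present-day equivalent of one
iteration of the loop could look like this (a sketch, assuming the same
Zh handle as in the test):

    import numpy
    import pytest
    from wendelin.bigarray import BigArray

    # the context-manager form replaces raises(TypeError, "expr-string"),
    # and numpy.dtype('O') replaces the removed numpy.object alias
    with pytest.raises(TypeError):
        BigArray((1,), numpy.dtype('O'), Zh)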