Commit cbe4938b authored by Klaus Wölfel's avatar Klaus Wölfel

make sure to return view

parent 59535549
...@@ -72,7 +72,6 @@ class DataArrayViewLine(DataArray): ...@@ -72,7 +72,6 @@ class DataArrayViewLine(DataArray):
createExpressionContext(None, portal=getindex), dtype_expression)) createExpressionContext(None, portal=getindex), dtype_expression))
if name_list: if name_list:
dtype.names = name_list dtype.names = name_list
array_view = array_view.reshape(array_view.size,).view(dtype) return np.ndarray((array_view.shape[0],), dtype, zbigarray[:], 0, array_view.strides[0])
else: else:
array_view = array_view return array_view
return array_view
  • @kirr the previous way zbigarray[index].reshape().view() OOM killed zope on big arrays. So it seems it did not give back a view which was surprising to me because I expected a view for reshape. So I change to this way. Do you think it's ok ?

  • @klaus, I think something else is going here and array_view = zbigarray[index] gives a view but you got OOM due to some other problem. numpy.reshape says:

    reshaped_array : ndarray This will be a new view object if possible; otherwise, it will be a copy. Note there is no guarantee of the memory layout (C- or Fortran- contiguous) of the returned array.

    maybe it is the case, because reshape indeed copies data on non-trivial strides:

    In [22]: c
    Out[22]: 
    array([[10,  1,  2,  3,  4],
           [ 5,  6,  7,  8,  9]])
    
    In [23]: c.strides
    Out[23]: (40, 8)
    
    In [24]: c.strides = (30, 8)
    
    In [25]: c
    Out[25]: 
    array([[    10,      1,      2,      3,      4],
           [262144, 327680, 393216, 458752, 524288]])
    
    In [28]: cc = c.reshape(10)
    
    In [29]: c
    Out[29]: 
    array([[    10,      1,      2,      3,      4],
           [262144, 327680, 393216, 458752, 524288]])
    
    In [30]: cc
    Out[30]: 
    array([    10,      1,      2,      3,      4, 262144, 327680, 393216,
           458752, 524288])
    
    In [31]: c[0][0] = 100
    
    In [32]: c
    Out[32]: 
    array([[   100,      1,      2,      3,      4],
           [262144, 327680, 393216, 458752, 524288]])
    
    In [33]: cc
    Out[33]: 
    array([    10,      1,      2,      3,      4, 262144, 327680, 393216,
           458752, 524288])

    (note the difference in first element)

    reshape docs also says:

    Notes It is not always possible to change the shape of an array without copying the data. If you want an error to be raised when the data is copied, you should assign the new shape to the shape attribute of the array:

    So I suggest to switch to that way as working with strides directly is easy to get wrong and worst to corrupt memory.

    I suggest to change to .shape = ... so that if non-copy mode is possible we get the error and try to understand what is going on.

  • @kirr thanks for feedback. The context of this is to view a normal array as a structured array, so I must set shape and dtype at the same time, beause else:

    >>> a = np.ones((3,3))
    >>> a
    array([[ 1.,  1.,  1.],
           [ 1.,  1.,  1.],
           [ 1.,  1.,  1.]])
    >>> a.shape=(3,)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: cannot reshape array of size 9 into shape (3,)
  • @kirr sorry, it works of course:

    >>> a.shape=(a.size,)
    >>> a.view([('a', 'float64'),('b', 'float64'),('c', 'float64'),])
    array([( 1.,  1.,  1.), ( 1.,  1.,  1.), ( 1.,  1.,  1.)], 
          dtype=[('a', '<f8'), ('b', '<f8'), ('c', '<f8')])
  • Yes, I was goint to reply the same - first change dtype, then shape:

    In [14]: dtxyz = np.dtype([('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
    
    In [15]: a
    Out[15]: 
    array([[1., 1., 1.],
           [1., 1., 1.],
           [1., 1., 1.]])
    
    In [16]: a.view(dtype=dtxyz)
    Out[16]: 
    array([[(1., 1., 1.)],
           [(1., 1., 1.)],
           [(1., 1., 1.)]], dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
    
    In [17]: b = a.view(dtype=dtxyz)
    
    In [18]: b
    Out[18]: 
    array([[(1., 1., 1.)],
           [(1., 1., 1.)],
           [(1., 1., 1.)]], dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
    
    In [19]: b.shape
    Out[19]: (3, 1)
    
    In [20]: b.shape = (3,)
    
    In [21]: b
    Out[21]: 
    array([(1., 1., 1.), (1., 1., 1.), (1., 1., 1.)],
          dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8')])
  • (not exactly the same, but sitll)

  • @kirr I tried what you propose:

    self.log(dtype)
    self.log(array_view.dtype)
    self.log(array_view.shape)
    array_view_2 = array_view.view(dtype)
    array_view_2.shape = (array_view.size,)
    return array_view_2

    Then I get this error:

      File "<portal_components/document.erp5.DataArrayViewLine>", line 78, in getArray
        array_view_2 = array_view.view(dtype)
    ValueError: new type not compatible with array.

    new dtype and original shape of plain from log:

    ------
    2018-10-24 13:21:53,833 INFO getArray, 75: [] [('0', '<f8'), ('1', '<f8'), ('2,', '<f8'), ('3', '<f8'), ('4', '<f8'), ('5', '<f8'), ('6', '<f8'), ('7', '<f8'), ('8', '<f8'), ('9', '<f8'), ('10,, '<f8'), ('11', '<f8'), ('12', '<f8'), ('13', '<f8'), ('14', '<f8'), ('15', '<f8'), ('16', '<f8'), ('17', '<f8'), ('18', '<f8'), ('19', '<f8'), ('20', '<f8'), ('21', '<f8'), ('22', '<f8'), ('23', '<f8'), ('24', '<f8'), ('25', '<f8'), ('26', '<f8'), ('27', '<f8'), ('28', '<f8'), ('29', '<f8'), ('30', '<f8'), ('31', '<f8'), ('32', '<f8'), ('33', '<f8'), ('34', '<f8'), ('35', '<f8'), ('36, '<f8'), ('37', '<f8'), ('38', '<f8')]
    ------
    2018-10-24 13:21:53,837 INFO getArray, 76: [] float64
    ------
    2018-10-24 13:21:53,841 INFO getArray, 77: [] (1438671, 39)
    ------

    And if I do the other way round (first set shape, then set dtype) then zope gets OOM killed again.

  • @kirr here is same problem with plain numpy. It seems this way works for arrays but not for views created by indexing the arrays:

    >>> a=np.ones((3,3))
    >>> a
    array([[ 1.,  1.,  1.],
           [ 1.,  1.,  1.],
           [ 1.,  1.,  1.]])
    >>> b=a[:,:-1]
    >>> b
    array([[ 1.,  1.],
           [ 1.,  1.],
           [ 1.,  1.]])
    >>> dtxy = np.dtype([('x', '<f8'), ('y', '<f8')])
    >>> b.view(dtxy)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: new type not compatible with array.
  • @klaus, ok, if your way, as described in cbe4938b (comment 68181), works, - then it should be fine to use it. Or am I'm missing something and it does not work too?

  • @kirr the cbe4938b works. but cbe4938b (comment 68181) does not work because it raises OOM

  • ValueError: new type not compatible with array.

    @klaus, first of all sorry for delay with replying. I'm not sure what is going on on your side, but I've just tried (with correcting a bit in your dtype print):

    #!/usr/bin/env python
    from numpy import *
    
    d = [('0', '<f8'), ('1', '<f8'), ('2,', '<f8'), ('3', '<f8'), ('4', '<f8'), ('5', '<f8'), ('6', '<f8'), ('7', '<f8'), ('8', '<f8'), ('9', '<f8'), ('10', '<f8'), ('11', '<f8'), ('12', '<f8'), ('13', '<f8'), ('14', '<f8'), ('15', '<f8'), ('16', '<f8'), ('17', '<f8'), ('18', '<f8'), ('19', '<f8'), ('20', '<f8'), ('21', '<f8'), ('22', '<f8'), ('23', '<f8'), ('24', '<f8'), ('25', '<f8'), ('26', '<f8'), ('27', '<f8'), ('28', '<f8'), ('29', '<f8'), ('30', '<f8'), ('31', '<f8'), ('32', '<f8'), ('33', '<f8'), ('34', '<f8'), ('35', '<f8'), ('36', '<f8'), ('37', '<f8'), ('38', '<f8')]
    
    assert len(d) == 39
    a = array([[0]*39, [1]*39], float64)
    print 'a.shape:', a.shape
    
    b = a.view(d)
    print 'b.shape: %s\t # after .view' % (b.shape,)
    b.shape = (a.size / (dtype(d).itemsize / a.dtype.itemsize),)
    print 'b.shape: %s\t # after .shape =' % (b.shape,)
    print 'b:', b

    and it works:

    kirr@deco:~$ ./x.py 
    a.shape: (2, 39)
    b.shape: (2, 1)  # after .view
    b.shape: (2,)    # after .shape =
    b: [(0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.)
     (1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.)]

    was there any copy-paste error in cbe4938b (comment 68186) ?

    If it was not a copy-paste error, please log objects with their repr.

    I suggest we don't blindly "fix" it and understand the cause of the errror.

  • @kirr your code works, but it does not reflect what happens in Data Array View Line: Here is your code modified to reflect the Data Array View Line functionality:

    >>> from numpy import *
    >>> d = [('0', '<f8'), ('1', '<f8'), ('2,', '<f8'), ('3', '<f8'), ('4', '<f8'), ('5', '<f8'), ('6', '<f8'), ('7', '<f8'), ('8', '<f8'), ('9', '<f8'), ('10', '<f8'), ('11', '<f8'), ('12', '<f8'), ('13', '<f8'), ('14', '<f8'), ('15', '<f8'), ('16', '<f8'), ('17', '<f8'), ('18', '<f8'), ('19', '<f8'), ('20', '<f8'), ('21', '<f8'), ('22', '<f8'), ('23', '<f8'), ('24', '<f8'), ('25', '<f8'), ('26', '<f8'), ('27', '<f8'), ('28', '<f8'), ('29', '<f8'), ('30', '<f8'), ('31', '<f8'), ('32', '<f8'), ('33', '<f8'), ('34', '<f8'), ('35', '<f8'), ('36', '<f8'), ('37', '<f8'), ('38', '<f8')]
    >>> a = array([[0]*41, [1]*41], float64)
    >>> print 'a.shape:', a.shape
    a.shape: (2, 41)
    >>> a_slice = a[:,:-2]
    >>> print 'a_slice.shape:', a_slice.shape
    a_slice.shape: (2, 39)
    >>> b = a_slice.view(d)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: new type not compatible with array.

    You see ?

  • @klaus I see, thanks for pointing the :-2 thing. I'm currently flying back. I will look into this once again when back home. This topic is important to get right and uniform and I do not want to inspect it on mobile or in rush. Hope it can wait a day or two.

  • @kirr n/p

  • ok @klaus, I see the problem now (sorry before I was distracting all time and missed your cbe4938b (comment 68188) example).

    I propose we put the conversion code into one function and then use this function everywhere. It is important to keep the tricky details and the complexity in one place instead of scatterring it all over. Would something like:

    # restructure creates view of the array interpreting its minor axis as fully covered by dtype.
    #
    # The minor axis of the array must be C-contiguous and be fully covered by dtype in size.
    #
    # Restructure is similar to arr.view(dtype) + corresponding reshape, but does
    # not have limitations of ndarray.view(). For example:
    #
    # In [1]: a = np.arange(3*3, dtype=np.int32).reshape((3,3))
    #
    # In [2]: a
    # Out[2]:
    # array([[0, 1, 2],
    #        [3, 4, 5],
    #        [6, 7, 8]], dtype=int32)
    #
    # In [3]: b = a[:2,:2]
    #
    # In [4]: b
    # Out[4]:
    # array([[0, 1],
    #        [3, 4]], dtype=int32)
    #
    # In [5]: b.view(np.int64)
    # ---------------------------------------------------------------------------
    # ValueError                                Traceback (most recent call last)
    # <ipython-input-66-af98529aa150> in <module>()
    # ----> 1 b.view(np.int64)
    #
    # ValueError: To change to a dtype of a different size, the array must be C-contiguous
    #
    # In [6]: restructure(b, np.int64)
    # TODO
    #
    # restructure always creates view and never copies data.
    #
    # TODO tests
    def restructure(arr, dtype):
        dtype = np.dtype(dtype) # convenience
    
        atype = arr.dtype
        # m* denotes minor *
        maxis   = np.argmin(np.abs(arr.strides))
        mstride = arr.strides[maxis]
        if mstride < 0:
            raise ValueError("minor-axis is not C-contiguous: stride (%d) < 0" % mstride)
        if mstride != atype.itemsize:
            raise ValueError("minor-axis is not C-contiguous: stride (%d) != itemsize (%d)" % (mstride, atype.itemsize))
    
        # verify dtype fully covers whole minor axis
        mnelem = arr.shape[maxis]
        msize  = mnelem * atype.itemsize
        if dtype.itemsize != msize:
            raise ValueError("dtype.itemsize (%d) != sizeof(minor-axis) (%d)" % (dtype.itemsize, msize))
        
        # ok to go
        shape   = arr.shape[:maxis] + arr.shape[maxis+1:]
        stridev = arr.strides[:maxis] + arr.strides[maxis+1:]
        return np.ndarray.__new__(type(arr), shape, dtype, buffer(arr), 0, stridev)

    (kirr/wendelin.core@2569b175)

    work for you?

    It currently fails with

    /home/kirr/src/wendelin/wendelin.core/lib/xnumpy.py in restructure(arr, dtype)
         82     print 'stridev:', stridev
         83     #return np.ndarray.__new__(type(arr), shape, dtype, buffer(arr), 0, stridev)
    ---> 84     return np.ndarray(shape, dtype, buffer(arr), 0, stridev)
    
    TypeError: expected a single-segment buffer object

    but I will (hopefully) fix it after the weekend.

  • @klaus, the "single-segment buffer object" problem is gone: kirr/wendelin.core@fbebef94

    If you agree on the interface I will polish the patch (tests, deduplicate wrt ArrayRef etc).

  • @kirr thanks for the effort! I agree with the interface and with the direction of putting complicated code into reusable functions

  • @klaus, thanks for feedback. Ok, I finish the patch and will let you know.

  • mentioned in commit kirr/wendelin.core@d9eecd12

    Toggle commit list
  • @klaus, ok, I renamed restructure to structured and created nexedi/wendelin.core!10 (merged).

    I think structured(a) better tells that it creates a view, whereas restructure(a) might be interpreted sometimes as working on a directly. I hope this is ok. Please feel free to comment on naming and on other aspects of the MR.

  • mentioned in commit nexedi/wendelin.core@32ca80e2

    Toggle commit list
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment