Commit 00db08d6 authored by Kirill Smelkov

bigarray: Teach it how to automatically convert to ndarray (if enough address space is available)

BigArrays can be big - up to 2^64 bytes - and thus in general it is not
possible to represent a whole BigArray as an ndarray view, because the
virtual address space is usually much smaller than 2^64, even on 64-bit
architectures.

However users often try to pass BigArrays to numpy functions as-is, and
numpy finds a way to convert, or start converting, a BigArray to an
ndarray: it detects it as a sequence and extracts elements one by one,
which is slooooow.

Because of the above, we provide users a well-defined service (a usage
sketch follows after this list):
- if enough virtual address space is available - we succeed at creating an
  ndarray view of the whole BigArray, without delay and without copying.
- if not - we properly report the error and hint that BigArrays larger than
  the address space have to be processed in chunks.
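
For illustration, a minimal usage sketch of that service, assuming `A` is an
already-constructed BigArray; the placeholder construction, the processing of
the result and the fallback comment are illustrative only, not part of this
commit:

    import numpy as np

    A = ...                 # some already-constructed BigArray (construction not shown)
    try:
        a = np.asarray(A)   # ndarray view of the whole BigArray - no copy, no delay
        print(a.mean())     # from here on it is regular numpy
    except MemoryError:
        # not enough virtual address space to map the whole array at once;
        # it has to be processed in chunks instead (see the sketch at the
        # end of this diff)
        pass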

Verifying that big BigArrays cannot be converted to ndarray also exercises
the behaviour and issues fixed in the last 5 patches.

/cc @Tyagov
/cc @klaus
parent 73926487
@@ -37,6 +37,7 @@ of physical RAM.
from __future__ import print_function
from wendelin.lib.calc import mul
from numpy import ndarray, dtype, multiply, sign, newaxis
import logging
pagesize = 2*1024*1024 # FIXME hardcoded, TODO -> fileh.ram.pagesize
@@ -359,5 +360,36 @@ class BigArray(object):
        a[:] = v

    # XXX __array__(self) = self[:] ?
    # (for numpy functions to accept bigarray as-is (if size permits))
    # BigArray -> ndarray  (if enough address space available)
    #
    # BigArrays can be big - up to 2^64 bytes - and thus in general it is not
    # possible to represent a whole BigArray as an ndarray view, because the
    # virtual address space is usually much smaller than 2^64, even on 64-bit
    # architectures.
    #
    # However users often try to pass BigArrays to numpy functions as-is, and
    # numpy finds a way to convert, or start converting, a BigArray to an
    # ndarray: it detects it as a sequence and extracts elements one by one,
    # which is slooooow.
    #
    # Because of the above, we provide users a well-defined service:
    # - if enough virtual address space is available - we succeed at creating an
    #   ndarray view of the whole BigArray, without delay and without copying.
    # - if not - we properly report the error and hint that BigArrays larger
    #   than the address space have to be processed in chunks.
    def __array__(self):
        # NOTE numpy also sometimes uses optional arguments |dtype,context,
        #      but specifying dtype means the result should be a copy.
        #
        #      Copying BigArray data is not a good idea in all cases,
        #      so we don't support accepting dtype.
        try:
            return self[:]
        except MemoryError:
            logging.warning('You tried to map BigArray (~ %.1f GB) and it failed ...' %
                            (float(self.nbytes) / (1<<30)))
            logging.warning('... because there is not enough memory or not enough virtual')
            logging.warning('... address space available. BigArrays larger than the available')
            logging.warning('... virtual address space can not be mapped at once and have to')
            logging.warning('... be processed in chunks.')
            raise
@@ -435,3 +435,20 @@ def test_bigarray_to_ndarray():
    # - would work with numpy-1.8
    # - would loop forever eating memory with numpy-1.9
    a = asarray(A)
    assert array_equal(a, A[:])

    # "medium"-sized array of 1TB. converting it to ndarray should work here
    # without hanging, because initially all data are unmapped and we don't
    # touch mapped memory.
    B = BigArray((1<<40,), uint8, Zh)
    b = asarray(B)
    assert isinstance(b, ndarray)
    assert b.nbytes == 1<<40

    # array of size larger than the virtual address space (~ 2^47 on linux/amd64) -
    # converting it to ndarray should not be possible
    for i in range(48, 65):
        C = BigArray(((1<<i)-1,), uint8, Zh)
        raises(MemoryError, asarray, C)
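
A hedged sketch of the chunk-wise processing the MemoryError path hints at
(not part of the commit; the chunk size is an arbitrary choice, and `A` again
stands for a one-dimensional BigArray whose slices map as ordinary ndarray
views, as in the tests above):

    blk = 128*1024*1024                  # elements per step - an arbitrary choice
    total = 0
    for start in range(0, A.shape[0], blk):
        chunk = A[start:start+blk]       # maps only this window as an ndarray view
        total += int(chunk.sum())        # regular numpy work on the chunk
    print(total)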