Hello All,

I've noticed some incompatibilities between numpy and cctbx, which is understandable. However, the incompatibilities manifest either as nonsense errors or, worse, not at all. Following are a couple of examples I have found.

1. The first example should really throw an exception, because silent failures like this can be catastrophic:

py> from numpy import random
py> r = random.randint(2, size=10)
py> r
array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
py> list(flex.bool(r))
[True, False, False, False, False, False, False, False, False, False]

This example is clearly due to incorrect assumptions about the internal representations of numpy ndarrays. Much better behavior can be seen when using a sequence of python `int` objects:

py> list(flex.bool(range(10)))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/jstroud/Unison/Code/radialx/testdata/<ipython-input-754-dce96b02979e> in <module>()
----> 1 list(flex.bool(range(10)))

TypeError: No registered converter was able to produce a C++ rvalue of type bool from this Python object of type int

2. Although not as potentially catastrophic as the first, the second example should (in a perfect world) either work or throw a more meaningful exception:

py> flex.int(range(10))[r[0]]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/jstroud/Unison/Code/radialx/testdata/<ipython-input-753-f14587f0ceeb> in <module>()
----> 1 flex.int(range(10))[r[0]]

TypeError: 'numpy.int64' object is not iterable

This example is obviously a result of type checking for a python `int`. The usual python approach of duck-type checking would avoid this problem:

def __getitem__(self, i):
    try:
        i = int(i)
    except:
        pass
    return do_whatever_with_i(i)

James
Hi James,
1. The first example should really throw an exception, because silent failures like this can be catastrophic:
py> from numpy import random
py> r = random.randint(2, size=10)
py> r
array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
py> list(flex.bool(r))
[True, False, False, False, False, False, False, False, False, False]
This example is clearly due to incorrect assumptions about the internal representations of numpy ndarrays.
As far as I can tell, this is much worse than that: the numpy-to-flex conversion assumes the same element type in the source and target arrays. Thus converting a numpy array of ints into a flex array of bools is illegal, but unfortunately this precondition is not asserted. Nasty indeed. I am afraid I do not understand numpy well enough to propose a solution.
2. Although not as potentially catastrophic as the first, t
py> flex.int(range(10))[r[0]]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/jstroud/Unison/Code/radialx/testdata/<ipython-input-753-f14587f0ceeb> in <module>()
----> 1 flex.int(range(10))[r[0]]

TypeError: 'numpy.int64' object is not iterable
For flex arrays, the following __getitem__ variants are tried in this order:

1. one that takes a tuple
2. one that takes a slice
3. one that takes a single index, of type long in C++

The problem I think is that we are missing a registered conversion from the type numpy.int64 to the type long, and therefore all 3 variants fail. Somehow the first failure is reported.

Best wishes,

Luc
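The dispatch order described above can be imitated in pure Python. This is only a sketch: `FlexLike` and `Int64Like` are hypothetical stand-ins invented for illustration (they are not cctbx or numpy classes), and the tuple variant is simplified to one lookup per entry. The point is that converting the key with `operator.index()` accepts any object that implements `__index__`, which numpy integer scalars do, and that a failure can be reported with a message that names the real problem.

```python
import operator


class FlexLike:
    """Hypothetical stand-in for a flex array; not the cctbx implementation."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, key):
        # Variant 1: a tuple (simplified here to one lookup per entry).
        if isinstance(key, tuple):
            return [self[k] for k in key]
        # Variant 2: a slice.
        if isinstance(key, slice):
            return FlexLike(self._values[key])
        # Variant 3: a single index. operator.index() accepts any object
        # that implements __index__, not only the built-in int.
        try:
            i = operator.index(key)
        except TypeError:
            raise TypeError(
                "index must be an integer, slice or tuple, not %s"
                % type(key).__name__)
        return self._values[i]


class Int64Like:
    """Hypothetical integer-like scalar standing in for numpy.int64."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value


a = FlexLike(range(10))
first = a[Int64Like(1)]  # the integer-like key is accepted by variant 3
```

With this ordering, an unsupported key such as a string fails in the last variant and the error names the offending type, rather than surfacing the unrelated failure of the tuple variant.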
On Apr 4, 2013, at 3:15 AM, Luc Bourhis wrote:
1. The first example should really throw an exception, because silent failures like this can be catastrophic:
py> from numpy import random
py> r = random.randint(2, size=10)
py> r
array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
py> list(flex.bool(r))
[True, False, False, False, False, False, False, False, False, False]
This example is clearly due to incorrect assumptions about the internal representations of numpy ndarrays.
As far as I can tell, this is much worse than that: the numpy-to-flex conversion assumes the same element type in the source and target arrays. Thus converting a numpy array of ints into a flex array of bools is illegal, but unfortunately this precondition is not asserted. Nasty indeed. I am afraid I do not understand numpy well enough to propose a solution.
On the python side, the correct types for which to check have names that mirror the python names:

py> numpy.complex is type(complex())
True
py> numpy.bool is type(bool())
True
py> numpy.int is type(int())
True
py> numpy.float is type(float())
True
py> numpy.long is type(long())
True

It is unfortunate that array creation by numpy does not respect this naming symmetry:

py> numpy.array([1, 2, 3]).dtype is int
False
py> numpy.array([1, 2, 3], dtype=int).dtype is int
False
py> # same as
py> numpy.array([1, 2, 3], dtype=numpy.int).dtype is int
False

So, even if numpy-to-flex conversion catches wrongly [1] typed arrays, users still need to know which numpy dtypes to use for which flex constructors. Here is what I have determined:

flex.bool   : numpy.bool
flex.int    : numpy.int32
flex.long   : numpy.int64, numpy.int
flex.float  : numpy.float32
flex.double : numpy.float, numpy.double

In all cases, if a wrongly typed numpy array is used then flex will fail silently as of build 2013_02_27_0005. Here's one more example:

py> numpy.arange(4)
array([0, 1, 2, 3])
py> list(flex.int(numpy.arange(4)))
[0, 0, 1, 0]

James

[1] The concept of "wrongly" typed objects doesn't truly exist in python.
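One plausible reading of the two silent failures reported in this thread is that the raw memory of a 64-bit integer array is being read as if it held the smaller target type. The sketch below reproduces both outputs with the standard `struct` module only. It assumes little-endian byte order and a 64-bit numpy default integer; the helper `reinterpret` is a hypothetical name, and nothing here is taken from the cctbx converter itself.

```python
import struct


def reinterpret(values, src_fmt, dst_fmt, count):
    """Pack values as src_fmt, then read the first `count` items as dst_fmt.

    Formats are struct codes with little-endian standard sizes:
    "q" = 8-byte int, "i" = 4-byte int, "B" = 1-byte unsigned.
    """
    raw = struct.pack("<%d%s" % (len(values), src_fmt), *values)
    size = struct.calcsize("<" + dst_fmt)
    return list(struct.unpack("<%d%s" % (count, dst_fmt), raw[:count * size]))


# numpy.arange(4) stored as int64, read back as four int32 values
as_int32 = reinterpret([0, 1, 2, 3], "q", "i", 4)

# the random 0/1 array stored as int64, read back as ten one-byte bools
as_bool = [b != 0
           for b in reinterpret([1, 0, 1, 1, 0, 1, 0, 1, 1, 1], "q", "B", 10)]
```

Under those assumptions `as_int32` is `[0, 0, 1, 0]` and `as_bool` is `True` followed by nine `False` values, which matches the output shown for `flex.int(numpy.arange(4))` and `flex.bool(r)`.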
On Apr 4, 2013, at 12:42 PM, James Stroud wrote:
So, even if numpy-to-flex conversion catches wrongly [1] typed arrays, users still need to know which numpy dtypes to use for which flex constructors. Here is what I have determined:
I realized that it is possible that unqualified specifiers like "numpy.float", etc, may have different meanings on different systems. Therefore, I think a better (i.e. more explicit) mapping is

flex.bool   : numpy.bool8
flex.int    : numpy.int32
flex.long   : numpy.int64
flex.float  : numpy.float32
flex.double : numpy.float64

James
Hi Luc, On Apr 4, 2013, at 2:34 PM, Luc Bourhis wrote:
Hi James,
On the python side, the correct types for which to check have names that mirror the python names:
Ideally, I would need to know how to get at those types on the C++ side...
Best wishes,
Luc
I think the relevant part of the numpy documentation is here: http://docs.scipy.org/doc/numpy/reference/c-api.array.html#dealing-with-type... James
On Apr 4, 2013, at 2:56 PM, James Stroud wrote:
Hi Luc,
On Apr 4, 2013, at 2:34 PM, Luc Bourhis wrote:
Ideally, I would need to know how to get at those types on the C++ side...
Best wishes,
Luc
I think the relevant part of the numpy documentation is here:
http://docs.scipy.org/doc/numpy/reference/c-api.array.html#dealing-with-type...
This may be more generally useful: http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_DTYPE The relevant fields of the returned PyArray_Descr seem to be kind, byteorder, and elsize. James
Hi James,
This may be more generally useful:
http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_DTYPE
The relevant fields of the returned PyArray_Descr seem to be kind, byteorder, and elsize.
Thanks for digging that out. I can now envision how to compare the numpy array and the flex array element types. Lots of boring bookkeeping but definitely possible. However, we haven't addressed the important question: what behaviour do we really want?

(i) the numpy array and the flex array must have the same element type, and an exception is thrown if this precondition is violated;
(ii) we use a fast copy of the numpy array to the flex array if the element types are the same; otherwise we fall back to a slower conversion.

Best wishes,

Luc
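The bookkeeping for option (i) might look roughly like the following when written in Python rather than C++. It compares a (kind, element size) pair, as numpy exposes through `dtype.kind` and `dtype.itemsize`, against the layout expected for each flex element type, using the explicit mapping James proposed earlier in the thread. The names `EXPECTED_LAYOUT` and `check_element_type` are hypothetical, byte order is ignored, and the 8-byte size for flex.long is an assumption that holds only where a C++ long is 64 bits.

```python
# (kind, itemsize) per flex element type, following the mapping proposed
# in this thread. Assumption: C++ long is 8 bytes on the target platform.
EXPECTED_LAYOUT = {
    "bool": ("b", 1),
    "int": ("i", 4),
    "long": ("i", 8),
    "float": ("f", 4),
    "double": ("f", 8),
}


def check_element_type(flex_name, kind, itemsize):
    """Raise TypeError unless (kind, itemsize) matches the flex element type."""
    expected = EXPECTED_LAYOUT[flex_name]
    if (kind, itemsize) != expected:
        raise TypeError(
            "cannot build flex.%s from elements of kind %r and size %d "
            "(expected kind %r, size %d)"
            % (flex_name, kind, itemsize, expected[0], expected[1]))


check_element_type("int", "i", 4)       # int32 into flex.int: accepted

try:
    check_element_type("bool", "i", 8)  # int64 into flex.bool: rejected
    rejected = False
except TypeError:
    rejected = True
```

The same table could serve option (ii) as well: a match selects the fast copy, and a mismatch selects the slower element-by-element conversion instead of raising.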
Hi Luc,

On Apr 5, 2013, at 6:46 AM, Luc Bourhis wrote:
However, we haven't addressed the important question: what behaviour do we really want? (i) the numpy array and the flex array must have the same element type, and an exception is thrown if this precondition is violated; (ii) we use a fast copy of the numpy array to the flex array if the element types are the same; otherwise we fall back to a slower conversion.
My understanding of the "python way" would be that the desired behavior is (ii). I usually base these types of conclusions on how the python interpreter behaves, and, if that isn't specific enough, I look to the standard library. Consider the behavior of python's built-in array, which works like lists:

py> import array
py> array.array('f', [1, 2, 3])
array('f', [1.0, 2.0, 3.0])
py> array.array('f', flex.bool([True, False, False]))
array('f', [1.0, 0.0, 0.0])
py> array.array('i', numpy.arange(4))
array('i', [0, 1, 2, 3])

Python arrays and lists don't care what type of elements comprise the data structures they are built from. For example, with array, the constructor doesn't care what the elements of the second argument are as long as they behave as specified in the first: integers can be converted to floats, bools to ints, numpy.int64s can be converted to ints, etc. The "python way" is concerned with behavior rather than type (usually at the sacrifice of performance, which is purposefully an afterthought).

Besides being the "python way", I actually think that such behavior is often easier to implement. Numpy arrays already contain all of the information that describes how they are built. And furthermore, they are able to be converted to any other type without the need to inspect their innards. So the programmer only really needs to know what type of array he wants, and to avoid decisions based on what he has (which would be required in order to throw an error for every "illegal" type).

Taking the "pythonic" approach only requires

http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_CastToTyp...

For each of the element types that could comprise flex arrays, one only needs to write one PyArray_Descr

http://docs.scipy.org/doc/numpy/reference/c-api.types-and-structures.html#Py...

which becomes the second argument for PyArray_CastToType.

James
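Behaviour (ii) can be sketched with the standard library's `array` type standing in for both numpy and flex arrays. The function name `to_array` is hypothetical, and the example only illustrates the control flow (direct copy when the element types agree, per-element conversion otherwise), not how cctbx would do it in C++.

```python
from array import array

FLOAT_CODES = "fd"  # array typecodes that hold floating-point values


def to_array(typecode, source):
    """Build array(typecode) from source, copying directly when types match."""
    if isinstance(source, array) and source.typecode == typecode:
        # Same element type: a plain slice copy, no per-element conversion.
        return source[:]
    # Different element type: convert one element at a time.
    convert = float if typecode in FLOAT_CODES else int
    return array(typecode, [convert(x) for x in source])


same = to_array("i", array("i", [0, 1, 2, 3]))       # direct copy
converted = to_array("d", array("i", [0, 1, 2, 3]))  # int to double
from_bools = to_array("i", [True, False, False])     # bool to int
```

One design point to settle in a real implementation: `int()` truncates floating-point values without complaint, so a float-to-int conversion on the slow path would lose information silently unless it is checked.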
On 5 Apr 2013, at 19:38, James Stroud wrote:
Hi Luc,
On Apr 5, 2013, at 6:46 AM, Luc Bourhis wrote:
However, we haven't addressed the important question: what behaviour do we really want? (i) the numpy array and the flex array must have the same element type, and an exception is thrown if this precondition is violated; (ii) we use a fast copy of the numpy array to the flex array if the element types are the same; otherwise we fall back to a slower conversion.
My understanding of the "python way" would be that the desired behavior is (ii). [...]
Good demonstration: I fully agree with you. We will try to implement (ii) then.
Taking the "pythonic" approach only requires
http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_CastToTyp...
For each of the element types that could comprise flex arrays, one only needs to write one PyArray_Descr
http://docs.scipy.org/doc/numpy/reference/c-api.types-and-structures.html#Py...
which becomes the second argument for PyArray_CastToType.
Thanks again for looking into this for me. Unfortunately the real difficulty is that a flex array does not have a descriptor as a numpy array does. In some sense a flex array does not know the type of its elements: only the C++ compiler does. Thus some template trickery is still needed to map each numpy array element type to the right flex element type. But thanks for your research; it has helped a great deal.

Best wishes,

Luc
participants (2): James Stroud, Luc Bourhis