Issue
When I specify 'dtype' when creating an array, I get a different shape than when I don't specify it.
In the code below, I want np.sort() to sort using only 'factor' and 'var' as keys, not 'data', because 'data' contains 'None' values. When I run this code I get: TypeError: '<' not supported between instances of 'int' and 'NoneType'. When I replace 'None' with any 'int', the code runs as expected.
How can I make the sort ignore 'data' and/or not break on 'None' values inside the list?
Code:
>>> import numpy as np
>>> hzhDtype = [('factor', float), ('var', int), ('data', list)]
>>> values = [
... (3.82, 21, ['foo1', None, None, 1, 'g1', 'x', None, 0]),
... (1.91, 21, ['foo1_GT', 'foo2', 3, 1, 'g1', 'x', None, 0]),
... (1.91, 21, ['foo1_GT', 'foo3', 1, 1, 'g1', 'x', None, 0]),
... (1.91, 21, ['foo1_GT', 'foo4', 1, 1, 'g1', 'x', None, 0]),
... (1.91, 21, ['foo1_GT', 'foo5', 2, 1, 'g1', 'x', None, 0]),
... (2.55, 21, ['foo1_GT', 'foo6', 1, 1, 'g1', 'x', None, 0]),
... (0.5, 1, ['foo2_GT', 'foo1', 2, 3, 'g2', 'x', None, 1]),
... (0.5, 1, ['foo2_GT', 'foo1', 2, 3, 'g1', 'x', 0, 1]),
... (0.5, 1, ['foo2_GT', 'foo1', 2, 3, 'g2', 'x', None, 0]),
... (0.5, 1, ['foo2_GT', 'foo1', 2, 3, 'g2', 'x', None, 0]),
... (2.0, 2, ['foo2_GT', 'foo1', 2, 1, 'g2', 'x', None, 0]),
... (0.5, 1, ['foo2_GT', 'foo1', 2, 2, 'g1', 'x', None, 1]),
... (0.5, 1, ['foo2_GT', 'foo1', 2, 2, 'g2', 'x', 0, 1]),
... (0.5, 1, ['foo2_GT', 'foo1', 2, 2, 'g1', 'x', None, 0]),
... (0.5, 1, ['foo2_GT', 'foo1', 2, 2, 'g1', 'x', None, 0]),
... (0.5, 1, ['foo2_GT', 'foo6', 1, 2, 'g1', 'x', None, 1]),
... (2.0, 4, ['foo3_GT', 'foo1', 2, 1, 'g2', 'x', None, 0]),
... (2.0, 4, ['foo3_GT', 'foo1', 2, 1, 'g2', 'x', None, 0]),
... (0.5, 1, ['foo3_GT', 'foo1', 2, 2, 'g1', 'x', None, 0]),
... (0.5, 1, ['foo3_GT', 'foo1', 2, 2, 'g1', 'x', None, 0]),
... (5.0, 5, ['foo3_GT', 'foo1', 2, 3, 'g2', 'x', None, 0]),
... (5.0, 5, ['foo3_GT', 'foo1', 2, 3, 'g2', 'x', None, 0]),
... (2.0, 1, ['foo4_GT', 'foo1', 2, 1, 'g1', 'x', None, 1]),
... (2.0, 1, ['foo4_GT', 'foo1', 2, 1, 'g2', 'x', 0, 1]),
... (2.0, 1, ['foo4_GT', 'foo1', 2, 1, 'g1', 'x', None, 0]),
... (2.0, 1, ['foo4_GT', 'foo1', 2, 1, 'g1', 'x', None, 0]),
... (1.0, 1, ['foo4_GT', 'foo2', 3, 1, 'g1', 'x', None, 1]),
... (1.0, 1, ['foo4_GT', 'foo2', 3, 1, 'g2', 'x', 0, 1]),
... (1.0, 1, ['foo4_GT', 'foo2', 3, 1, 'g1', 'x', None, 0]),
... (1.0, 1, ['foo4_GT', 'foo2', 3, 1, 'g1', 'x', None, 0]),
... (1.0, 1, ['foo4_GT', 'foo5', 2, 1, 'g1', 'x', 1, 0]),
... (1.0, 1, ['foo4_GT', 'foo5', 2, 1, 'g1', 'x', None, 1]),
... (1.0, 1, ['foo4_GT', 'foo6', 1, 1, 'g1', 'x', None, 1]),
... (4.0, 3, ['foo5_GT', 'foo1', 2, 2, 'g2', 'x', None, 1]),
... (4.0, 3, ['foo5_GT', 'foo1', 2, 2, 'g1', 'x', 0, 1]),
... (4.0, 3, ['foo5_GT', 'foo1', 2, 2, 'g2', 'x', None, 0]),
... (4.0, 3, ['foo5_GT', 'foo1', 2, 2, 'g2', 'x', None, 0]),
... (3.0, 3, ['foo5_GT', 'foo2', 3, 1, 'g1', 'x', 1, 0]),
... (3.0, 3, ['foo5_GT', 'foo2', 3, 1, 'g1', 'x', None, 1]),
... (3.0, 3, ['foo5_GT', 'foo3', 1, 1, 'g1', 'x', 1, 0]),
... (3.0, 3, ['foo5_GT', 'foo3', 1, 1, 'g1', 'x', None, 1]),
... (3.0, 3, ['foo5_GT', 'foo6', 1, 1, 'g1', 'x', None, 1]),
... (3.0, 3, ['foo5_GT', 'foo6', 1, 1, 'g2', 'x', 0, 1]),
... (3.0, 3, ['foo5_GT', 'foo6', 1, 1, 'g1', 'x', None, 0]),
... (2.0, 3, ['foo5_GT', 'foo6', 1, 1, 'g1', 'x', None, 0]),
... (3.0, 3, ['foo5_GT', 'foo6', 1, 1, 'g1', 'x', None, 0]),
... (3.0, 3, ['foo5_GT', 'foo6', 1, 1, 'g1', 'x', None, 0]),
... (4.0, 3, ['foo6_GT', 'foo1', 2, 1, 'g2', 'x', None, 0]),
... (4.0, 3, ['foo6_GT', 'foo1', 2, 1, 'g2', 'x', None, 0]),
... (3.0, 3, ['foo6_GT', 'foo2', 3, 2, 'g1', 'x', 1, 0]),
... (3.0, 3, ['foo6_GT', 'foo2', 3, 2, 'g1', 'x', None, 1]),
... (3.0, 3, ['foo6_GT', 'foo3', 1, 2, 'g1', 'x', 1, 0]),
... (3.0, 3, ['foo6_GT', 'foo3', 1, 2, 'g1', 'x', None, 1]),
... (3.0, 3, ['foo6_GT', 'foo5', 2, 2, 'g1', 'x', 1, 0]),
... (3.0, 3, ['foo6_GT', 'foo5', 2, 2, 'g1', 'x', None, 1])
... ]
>>> # This is the shape I want, but I can't sort it.
>>> arr = np.array(values)
>>> arr.shape
(55, 3)
>>> np.sort(arr, axis=0)[::-1]
*** TypeError: '<' not supported between instances of 'NoneType' and 'int'
>>>
>>> # Here I have the dtype I want, but not the shape I want.
>>> arr2 = np.array(values, dtype=hzhDtype)
>>> arr2.shape
(55,)
>>> np.sort(arr2, order=['factor', 'var'])[::-1]
*** TypeError: '<' not supported between instances of 'int' and 'NoneType'
>>> # I don't know why it looks at the third column ('data') if I only want it to order by 'factor' and 'var'.
Solution
The 2nd case makes a structured array, with 3 fields, not 3 columns. It is 1d, not 2d. You may need to read the structured array docs a bit more carefully.
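To make the fields-versus-columns distinction concrete, here is a minimal sketch on a shortened, made-up sample in the same layout as `values`. The names `sample`, `plain`, `records` and `rec_dtype` are illustrative only. `dtype=object` is passed explicitly in the first case on the assumption that recent NumPy versions will not build a ragged array implicitly.

```python
import numpy as np

# Shortened, hypothetical sample laid out like `values` in the question.
sample = [
    (3.82, 21, ['foo1', None, 0]),
    (0.5, 1, ['foo2_GT', 'foo1', 1]),
]

# No structured dtype: each tuple becomes a row of 3 Python objects (2d).
plain = np.array(sample, dtype=object)
print(plain.shape)           # (2, 3)

# Structured dtype: each tuple becomes ONE record with 3 named fields (1d).
rec_dtype = [('factor', float), ('var', int), ('data', object)]
records = np.array(sample, dtype=rec_dtype)
print(records.shape)         # (2,)
print(records.dtype.names)   # ('factor', 'var', 'data')

# A field is accessed by name, not by column index.
print(records['factor'])
```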
As for the sort problem, (re)read the np.sort docs:
order : str or list of str, optional
When `a` is an array with fields defined, this argument specifies
which fields to compare first, second, etc. A single field can
be specified as a string, and not all fields need be specified,
but unspecified fields will still be used, in the order in which
they come up in the dtype, to break ties.
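It says "unspecified fields will still be used", so whenever two records tie on 'factor' and 'var', their 'data' lists are compared, and that comparison can reach a None.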
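The tie-break comparison can be reproduced in plain Python, without NumPy. The sketch below takes two 'data' lists from the question whose records share the same 'factor' and 'var' (3.0, 3) and compares them directly. The names `a`, `b`, `raised` and `message` are illustrative only.

```python
# Two 'data' lists from records that tie on ('factor', 'var') == (3.0, 3).
a = ['foo5_GT', 'foo2', 3, 1, 'g1', 'x', 1, 0]
b = ['foo5_GT', 'foo2', 3, 1, 'g1', 'x', None, 1]

# Lists compare element by element; the first differing pair is 1 vs None,
# and Python 3 cannot order an int against None.
try:
    a < b
    raised = False
    message = ''
except TypeError as exc:
    raised = True
    message = str(exc)

print(raised, message)
```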
Looks like we can get around the None comparison problem by using argsort (here arr is the structured array, arr2 in the question):
In [35]: idx=np.argsort(arr[['factor','var']])
In [36]: idx
Out[36]:
array([19, 14, 13, 12, 11, 9, 15, 8, 6, 7, 18, 28, 26, 29, 30, 31, 32,
27, 1, 2, 3, 4, 22, 23, 24, 25, 10, 44, 16, 17, 5, 51, 52, 49,
46, 45, 42, 43, 50, 41, 54, 39, 38, 37, 53, 40, 0, 36, 35, 34, 47,
48, 33, 21, 20])
In [37]: arr1=arr[idx]
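As an alternative sketch (not the method used in the answer above), np.lexsort can build the same kind of index from the two key fields alone, so 'data' is never compared. The sample data and the names `sample`, `rec`, `idx` and `descending` are made up for illustration. Note that lexsort treats the last key as the primary one.

```python
import numpy as np

# Shortened, hypothetical sample; 'data' holds lists that contain None.
sample = [
    (3.82, 21, ['a', None]),
    (0.5, 1, ['b', None]),
    (0.5, 1, ['b', 0]),
    (2.0, 4, ['d', None]),
    (2.0, 2, ['e', 1]),
]
rec = np.array(sample, dtype=[('factor', float), ('var', int), ('data', object)])

# lexsort sorts by the LAST key first: 'factor' is primary, 'var' breaks ties.
# Only the two numeric fields are passed, so 'data' plays no part.
idx = np.lexsort((rec['var'], rec['factor']))
print(idx)                  # [1 2 4 3 0]

# Reverse the sorted records for descending order, as in the question.
descending = rec[idx][::-1]
print(descending['factor'])
```

Because lexsort is stable, records that tie on both keys keep their original relative order instead of falling back to a comparison of 'data'.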
Answered By - hpaulj