Issue
Python 2.7:
When attempting to add a column (arr_date) of dtype datetime64[D] from a 1-dimensional NumPy array to an existing multidimensional NumPy array (data), the following errors are raised:
- 'TypeError: invalid type promotion'
- 'numpy.AxisError: axis 1 is out of bounds for array of dimension 1'
The column I created and want to append:
>> arr_date
<<
[['2019-04-21']
['2019-04-21']
['2019-04-21']]
I tried to build a datetime object from the three date columns in the source (data), store it in a new NumPy array (arr_date), and add that array to the original one (data) using the methods below:
- np.c_
- np.append
- np.hstack
- np.column_stack
- np.concatenate
data = [(2019, 4, 21, 4.9, -16.5447, -177.1961, 22.4, 'US')
(2019, 4, 21, 4.8, -9.5526, 109.6003, 10. , 'UK')
(2019, 4, 21, 4.6, -7.2737, 124.0192, 554.9, 'FR')]
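import datetime as dt
import numpy as np

arr_date = np.zeros((len(data),1), dtype='datetime64[D]')
i = 0
while i < len(data):
    # Build a date from the year, month, and day values of row i
    date = dt.date(data[i][0], data[i][1], data[i][2])
    arr_date[i][0] = date
    i += 1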
test1 = np.column_stack((data,arr_date))
np.c_[data, np.zeros(len(data))]
test2 = np.concatenate(data.reshape(-1,1), arr_date.reshape(-1,1), axis=1)
np.append(data, arr_date, axis = 1)
np.stack((data, arr_date), axis=-1)
np.hstack((data, arr_date))
test3 = np.column_stack((data, arr_date))
Solution
Until you answer my question about data.dtype, I'm going to add commas and make data a list of tuples:
In [117]: data = [(2019, 4, 21, 4.9, -16.5447, -177.1961, 22.4, 'US'),
...: (2019, 4, 21, 4.8, -9.5526, 109.6003, 10. , 'UK'),
...: (2019, 4, 21, 4.6, -7.2737, 124.0192, 554.9, 'FR')]
In [118]: arr_date = np.zeros((len(data),1), dtype='datetime64[D]')
...:
...: i = 0
...:
...: while i < len(data):
...:     date = dt.date(data[i][0], data[i][1], data[i][2])
...:     arr_date[i][0] = date
...:     i += 1
...:
In [119]: arr_date
Out[119]:
array([['2019-04-21'],
['2019-04-21'],
['2019-04-21']], dtype='datetime64[D]')
So arr_date is a (3,1) array with datetime64[D] dtype.
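The while loop can also be collapsed into a list comprehension. This is a minimal sketch, assuming data is the list of tuples from In [117] with year, month, and day in the first three positions; the name dates is introduced here only for illustration:

```python
import datetime as dt
import numpy as np

data = [(2019, 4, 21, 4.9, -16.5447, -177.1961, 22.4, 'US'),
        (2019, 4, 21, 4.8, -9.5526, 109.6003, 10.0, 'UK'),
        (2019, 4, 21, 4.6, -7.2737, 124.0192, 554.9, 'FR')]

# Build one datetime.date per row and let NumPy convert to datetime64[D].
# `dates` is an illustrative name, not one used in the question.
dates = np.array([dt.date(row[0], row[1], row[2]) for row in data],
                 dtype='datetime64[D]')

# Reshape to (3, 1) only if a column shape like the original arr_date is wanted.
arr_date = dates.reshape(-1, 1)
```

The flat (3,) form is usually the more convenient one when working with a 1d structured array, since it avoids a later ravel().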
===
I'm guessing that your data is actually a structured array, with a compound dtype. For example:
In [121]: data1 = np.array(data, dtype='i,i,i,f,f,f,f,U2')
In [122]: data1
Out[122]:
array([(2019, 4, 21, 4.9, -16.5447, -177.1961, 22.4, 'US'),
(2019, 4, 21, 4.8, -9.5526, 109.6003, 10. , 'UK'),
(2019, 4, 21, 4.6, -7.2737, 124.0192, 554.9, 'FR')],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<f4'), ('f4', '<f4'), ('f5', '<f4'), ('f6', '<f4'), ('f7', '<U2')])
In [123]: data1.shape
Out[123]: (3,)
In [124]: data1.dtype
Out[124]: dtype([('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<f4'), ('f4', '<f4'), ('f5', '<f4'), ('f6', '<f4'), ('f7', '<U2')])
Your date iteration works with this. But the fields (not columns) of data1 can be accessed by name:
In [127]: data1['f0']
Out[127]: array([2019, 2019, 2019], dtype=int32)
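To make the difference between fields and columns concrete, here is a small sketch that rebuilds data1 as above and inspects it. The names labels and deep are illustrative only:

```python
import numpy as np

data = [(2019, 4, 21, 4.9, -16.5447, -177.1961, 22.4, 'US'),
        (2019, 4, 21, 4.8, -9.5526, 109.6003, 10.0, 'UK'),
        (2019, 4, 21, 4.6, -7.2737, 124.0192, 554.9, 'FR')]
data1 = np.array(data, dtype='i,i,i,f,f,f,f,U2')

# The array is 1d: each element is one record with eight named fields.
print(data1.shape, data1.ndim)
print(data1.dtype.names)

# A field is selected by name and comes back as an ordinary 1d array.
labels = data1['f7']

# Field values can drive a boolean mask over the records.
deep = data1[data1['f6'] > 20.0]
```

Indexing with an integer, such as data1[0], returns a whole record rather than a row of a 2d array, which is why axis=1 operations fail on it.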
column_stack can join a (3,) array with a (3,1) to produce a (3,2), but:
In [130]: np.column_stack((data, arr_date))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-130-5c8e6a103474> in <module>
----> 1 np.column_stack((data, arr_date))
/usr/local/lib/python3.6/dist-packages/numpy/lib/shape_base.py in column_stack(tup)
638 arr = array(arr, copy=False, subok=True, ndmin=2).T
639 arrays.append(arr)
--> 640 return _nx.concatenate(arrays, 1)
TypeError: invalid type promotion
First note that the error occurs when trying to do concatenate. I bet all the other random tries produced a similar error (if they got past the axis error). The error is telling us that it can't combine a compound dtype as in Out[124] with the datetime64 dtype of arr_date. The dtypes don't match, and can't be made to match.
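The promotion failure can be reproduced in isolation. The sketch below uses a cut-down structured array (fewer fields than the real data, purely for brevity) and catches the TypeError; the exact message text varies between NumPy versions, so it is printed rather than assumed:

```python
import numpy as np

# Reduced stand-ins for the real arrays; the field layout is illustrative.
data1 = np.array([(2019, 4, 21, 'US'), (2019, 4, 21, 'UK')],
                 dtype='i,i,i,U2')
arr_date = np.array(['2019-04-21', '2019-04-21'], dtype='datetime64[D]')

try:
    np.concatenate((data1, arr_date))
    promoted = True
except TypeError as exc:
    # NumPy cannot find a common dtype for a compound dtype and datetime64.
    promoted = False
    print('concatenate failed:', exc)

# Arrays that share a dtype concatenate without complaint.
joined = np.concatenate((arr_date, arr_date))
```

Both arrays here are 1d with matching lengths, so the failure comes from the dtypes alone, not from the shapes.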
Basically this isn't a concatenation problem. You are not trying to add a 'column' to a 2d array, or even trying to create a 2d array. data is not 2d. It is 1d. What you need to do is add a field to a structured array.
There is a module of functions that makes it easier to work with structured arrays:
In [131]: import numpy.lib.recfunctions as rf
append_fields should do the trick, but it can be a bit tricky to use:
In [137]: rf.append_fields(data1, 'date', arr_date.ravel(), usemask=False)
Out[137]:
array([(2019, 4, 21, 4.9, -16.5447, -177.1961, 22.4, 'US', '2019-04-21'),
(2019, 4, 21, 4.8, -9.5526, 109.6003, 10. , 'UK', '2019-04-21'),
(2019, 4, 21, 4.6, -7.2737, 124.0192, 554.9, 'FR', '2019-04-21')],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<f4'), ('f4', '<f4'), ('f5', '<f4'), ('f6', '<f4'), ('f7', '<U2'), ('date', '<M8[D]')])
This is still a 1d array, but with one more field, which I called date.
===
In my answer to "Add and access object-type field of a numpy structured array" I show how to construct a new structured array with fields from two arrays, which gives an idea of what append_fields is doing.
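The general idea can be sketched by hand: define a dtype that has the old fields plus the new one, allocate an empty array of that dtype, and copy the fields across by name. This is not the code from the linked answer, and not necessarily how append_fields is implemented internally; new_dtype and out are names chosen for this illustration:

```python
import numpy as np

data = [(2019, 4, 21, 4.9, -16.5447, -177.1961, 22.4, 'US'),
        (2019, 4, 21, 4.8, -9.5526, 109.6003, 10.0, 'UK'),
        (2019, 4, 21, 4.6, -7.2737, 124.0192, 554.9, 'FR')]
data1 = np.array(data, dtype='i,i,i,f,f,f,f,U2')
arr_date = np.array(['2019-04-21'] * 3, dtype='datetime64[D]').reshape(-1, 1)

# Extend the existing field list with one datetime64[D] field named 'date'.
new_dtype = np.dtype(data1.dtype.descr + [('date', 'datetime64[D]')])

# Allocate the target array, then copy each field across by name.
out = np.zeros(data1.shape, dtype=new_dtype)
for name in data1.dtype.names:
    out[name] = data1[name]

# arr_date is (3, 1), so flatten it to match the (3,) structured array.
out['date'] = arr_date.ravel()
```

Copying field by field avoids any dtype promotion between the two arrays, because each assignment involves only one simple dtype at a time.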
Answered By - hpaulj