Issue
I get an error when concatenating two dataframes by row. With the previous pandas version I used, pd.concat([df1, df2], axis=0) worked, but in pandas 2.1.0 it fails. Does anybody know how to solve this error?
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[47], line 2
1 print(real_last.shape, real_exp.shape) #(59202, 34) (4583, 34)
----> 2 real_out = pd.concat([real_exp, real_last], axis=0)
3 print(real_out.shape)
File c:\Users\sarud\anaconda3\envs\ETLupdate\Lib\site-packages\pandas\core\reshape\concat.py:393, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
378 copy = False
380 op = _Concatenator(
381 objs,
382 axis=axis,
(...)
390 sort=sort,
391 )
--> 393 return op.get_result()
File c:\Users\sarud\anaconda3\envs\ETLupdate\Lib\site-packages\pandas\core\reshape\concat.py:680, in _Concatenator.get_result(self)
676 indexers[ax] = obj_labels.get_indexer(new_labels)
678 mgrs_indexers.append((obj._mgr, indexers))
...
--> 230 return super()._concat_same_type(to_concat, axis=axis)
File arrays.pyx:190, in pandas._libs.arrays.NDArrayBacked._concat_same_type()
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 4583 and the array at index 1 has size 59202
These are the versions I have installed:
print(sys.version, pd.__version__, np.__version__, sep='\n')
3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:26:23) [MSC v.1916 64 bit (AMD64)]
2.1.0
1.26.0
The dataframes have the same structure. Here is a sample of each:
print(real_last.sample(2).T.to_markdown())
| 43597 | 9338 |
---|---|---|
Orden | 006710000 | 006781111 |
Operacion | 0010 | 0020 |
Operacion.text | XXXXXX | YYYYYYY |
Cl.orden | NP | NP |
Cl.actividad | 030 | 035 |
Ubic.tecnica | XXXX-XX-LAS-DES-BAP19 | XXXX-XX-S13-MBA |
Status.sistema | CTEC NOTI IMOP KKMP PREC | LIB. IMOP KKMP PREC |
Status.sistema.op | NOTI CONT CTEC NLIQ | LIB. NLIQ |
Stat.Usuario | TRAT | TRAT |
Fe.Entrada | 2023-06-25 00:00:00 | 2023-07-23 00:00:00 |
Fe.Lib | 2023-07-06 00:00:00 | 2023-07-23 00:00:00 |
Fe.Ini.real.ot | 2023-07-01 00:00:00 | NaT |
Fe.Ini.real.op | 2023-07-01 00:00:00 | NaT |
Fe.Ini.temp | 2023-07-06 00:00:00 | 2023-07-23 00:00:00 |
Aviso | 00120100 | 11194911 |
Modif.por | XXXXX005 | XXXXX011 |
Fe.Modif | 2023-07-06 00:00:00 | 2023-07-23 00:00:00 |
Autor | XXXXX003 | XXXXX021 |
Grupo.planif | XXT | XX1 |
G.hojas.ruta | nan | nan |
CGH | nan | nan |
Plan.mant.prev | nan | nan |
Pos.PM | nan | nan |
Pto.tbjo.resp | XXXXXXXX | XXXXXXXX |
Pto.tbjo.op | XXXXXXXX | XXXXXXXX |
Cantidad | 1 | 0 |
Duracion.normal | 1.0 | 0.0 |
Trabajo | 1.0 | 0.0 |
Trabajo.real | 1.0 | 0.0 |
Costos tot.reales | 147.03 | 0.0 |
Sum.costo.plan | 147.03 | 479.96 |
Tot.plan.general | 147.03 | 479.96 |
Total.real.general | 147.03 | 0.0 |
Costo.dist | 0.0 | 0.0 |
print(real_exp.sample(2).T.to_markdown())
| 926 | 990 |
---|---|---|
Orden | 222212222 | 333323333 |
Operacion | 0120 | 0040 |
Operacion.text | XXXXXXXXXX | YYYYYYYYY |
Cl.orden | PL | PL |
Cl.actividad | 010 | 010 |
Ubic.tecnica | XXXX-XX-S07-ALI-CTR7B | XXXX-XX-SCA-AL2-AOG1C |
Status.sistema | CTEC NOTI IMPR FMAT IMOP MOVM NLIQ PREC* | LIB. NOTI IMPR DOCU IMOP KKMP NLIQ PREC* |
Status.sistema.op | NOTI CTEC IMPR NLIQ | NOTI CONT IMPR LIB. NLIQ PLAN |
Stat.Usuario | TBTR | TRAT |
Fe.Entrada | 2023-08-02 00:00:00 | 2023-08-02 00:00:00 |
Fe.Lib | 2023-08-23 00:00:00 | 2023-08-21 00:00:00 |
Fe.Ini.real.ot | 2023-09-04 00:00:00 | 2023-09-05 00:00:00 |
Fe.Ini.real.op | 2023-09-05 00:00:00 | 2023-09-06 00:00:00 |
Fe.Ini.temp | 2023-09-07 00:00:00 | 2023-09-04 00:00:00 |
Aviso | 33333333 | 44444444 |
Modif.por | XXXXX009 | XXXXX003 |
Fe.Modif | 2023-09-10 00:00:00 | 2023-09-07 00:00:00 |
Autor | XXXXXXXXXXXX | XXXXXXXXXXXX |
Grupo.planif | XX0 | XXC |
G.hojas.ruta | 1886 | 76326 |
CGH | 3 | 3 |
Plan.mant.prev | 8763 | 191111 |
Pos.PM | 95475 | 357140 |
Pto.tbjo.resp | XXXXXXXX | XXXXXXXX |
Pto.tbjo.op | XXXXXXXX | XXXXXXXX |
Cantidad | 4 | 2 |
Duracion.normal | 4.0 | 1.0 |
Trabajo | 16.0 | 2.0 |
Trabajo.real | 16.0 | 0.5 |
Costos tot.reales | 1627.5 | 0.04 |
Sum.costo.plan | 2336.45 | 0.09 |
Tot.plan.general | 2336.45 | 0.09 |
Total.real.general | 1627.5 | 0.04 |
Costo.dist | nan | nan |
Solution
I can't reproduce the ValueError with the given examples, but since your dataframes hold datetime values, it may be caused by a dtype and/or datetime-resolution mismatch, as in this Q/A. You can also check GH55067, which discusses a similar issue.
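To check whether a dtype mismatch is actually present before changing anything, the dtypes of the two frames can be compared column by column. The following is a minimal sketch, not part of the original question: the helper name `dtype_mismatches` and the toy frames `a` and `b` are made up for illustration.

```python
import pandas as pd


def dtype_mismatches(left, right):
    """Return a table of the columns whose dtypes differ between two frames."""
    # Compare dtypes as strings so numpy and extension dtypes compare uniformly.
    table = pd.concat(
        [left.dtypes.astype(str), right.dtypes.astype(str)],
        axis=1,
        keys=["left", "right"],
    )
    return table[table["left"] != table["right"]]


# Toy frames (made-up values): same column names, but Fe.Lib is stored
# as datetimes in one frame and as plain strings in the other.
a = pd.DataFrame({"Orden": ["006710000"], "Fe.Lib": pd.to_datetime(["2023-07-06"])})
b = pd.DataFrame({"Orden": ["222212222"], "Fe.Lib": ["2023-08-23"]})

diff = dtype_mismatches(a, b)
print(diff)
```

On the real data this would be called as `dtype_mismatches(real_exp, real_last)`. An empty result would suggest the dtypes already agree and the cause lies elsewhere.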
Try this:
real_out = pd.concat([real_exp, real_last.astype(real_exp.dtypes)], axis=0)
Output:
print(real_out)
Orden Operacion ... Total.real.general Costo.dist
926 222212222 0120 ... 1627.5 NaN
990 333323333 0040 ... 0.04 NaN
43597 006710000 0010 ... 147.03 0.0
9338 006781111 0020 ... 0.0 0.0
[4 rows x 34 columns]
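Casting every column with `astype` can fail when a column cannot be converted, for example NaN values going to an integer dtype. If only the datetime columns turn out to differ, a narrower alternative is to cast just those columns to one common resolution in both frames. This is a sketch under that assumption, not the method used above: the helper `align_datetime_columns`, the choice of `"ns"` as the target unit, and the toy frames are all illustrative.

```python
import pandas as pd


def align_datetime_columns(reference, other, unit="ns"):
    """Cast the columns that are datetime in `reference` to one resolution in both frames."""
    target = f"datetime64[{unit}]"
    # Timezone-naive datetime columns only; tz-aware columns are not selected here.
    dt_cols = reference.select_dtypes(include=["datetime64"]).columns
    casts = {col: target for col in dt_cols if col in other.columns}
    return reference.astype(casts), other.astype(casts)


# Toy frames (made-up values): Fe.Lib is a datetime column in one frame
# and a column of date strings in the other.
exp = pd.DataFrame({"Orden": ["222212222"], "Fe.Lib": pd.to_datetime(["2023-08-23"])})
last = pd.DataFrame({"Orden": ["006710000"], "Fe.Lib": ["2023-07-06"]})

exp_aligned, last_aligned = align_datetime_columns(exp, last)
out = pd.concat([exp_aligned, last_aligned], axis=0)
print(out.dtypes)
```

All other columns are left untouched, so any remaining non-datetime mismatch would still need to be handled separately.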
Answered By - Timeless