Saturday, October 8, 2022

[FIXED] Fill in empty space by merging columns using Pandas Python

Issue

I am a Python noob. I have an unstructured text file that I'm trying to capture in a DataFrame and export to Excel. I need to merge column 38 into 36, 45 into 43, and 78 into 79, filling the empty cells with the data from the column being merged.

Dummy Dataset

	0	5	36	38	43	45	78	79
1	A	01JUN2022		1.2		B		1.2
2	C	01JUN2022		1.4		D		1.4
3	E	01JUN2022	1.5		F		1.6
4	G	01JUN2022	1.7		H		1.7
5	I	01JUN2022	1.4		J		1.8
6	K	01JUN2022	1.7		L		1.3
1	A	01JUN2022		1.2		B		1.2
2	C	01JUN2022		1.4	D		1.4
3	E	01JUN2022		1.5	F		1.6
4	G	01JUN2022	1.7		H		1.7
5	I	01JUN2022	1.4			J		1.8
6	K	01JUN2022	1.7		L			1.3

Required output

	0	5	36	43	79
1	A	01JUN2022	1.2	B	1.2
2	C	01JUN2022	1.4	D	1.4
3	E	01JUN2022	1.5	F	1.6
4	G	01JUN2022	1.7	H	1.7
5	I	01JUN2022	1.4	J	1.8
6	K	01JUN2022	1.7	L	1.3
1	A	01JUN2022	1.2	B	1.2
2	C	01JUN2022	1.4	D	1.4
3	E	01JUN2022	1.5	F	1.6
4	G	01JUN2022	1.7	H	1.7
5	I	01JUN2022	1.4	J	1.8
6	K	01JUN2022	1.7	L	1.3

Solution

One would start by converting empty strings ('') and whitespace-only cells to NaN as follows

import numpy as np

df = df.replace(r'^\s*$', np.nan, regex=True)
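
To see what that pattern matches, here is a minimal sketch on a small hand-made frame. The frame `sample` and its values are assumptions for illustration only; the column names simply mirror the ones in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; only the column names come from the question.
sample = pd.DataFrame({
    "36": ["", "1.5", "   "],
    "38": ["1.2", "", "1.4"],
})

# ^\s*$ matches a cell that is empty or holds only whitespace,
# so both "" and "   " become NaN while "1.5" is left untouched.
cleaned = sample.replace(r"^\s*$", np.nan, regex=True)

print(cleaned.isna())
```

Converting to NaN first matters because combine_first only fills missing values; an empty string counts as a present value and would not be filled.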

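Then one can use pandas.Series.combine_first, which keeps the values of the calling column and fills its NaN cells with the values at the same index in the other column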

df['36'] = df['36'].combine_first(df['38'])
df['43'] = df['43'].combine_first(df['45'])
df['79'] = df['79'].combine_first(df['78'])

[Out]:
    id  0          5   36   38 43   45   78   79
0    1  A  01JUN2022  1.2  1.2  B    B  NaN  1.2
1    2  C  01JUN2022  1.4  1.4  D    D  NaN  1.4
2    3  E  01JUN2022  1.5  NaN  F  NaN  1.6  1.6
3    4  G  01JUN2022  1.7  NaN  H  NaN  1.7  1.7
4    5  I  01JUN2022  1.4  NaN  J  NaN  1.8  1.8
5    6  K  01JUN2022  1.7  NaN  L  NaN  1.3  1.3
6    1  A  01JUN2022  1.2  1.2  B    B  NaN  1.2
7    2  C  01JUN2022  1.4  1.4  D  NaN  1.4  1.4
8    3  E  01JUN2022  1.5  1.5  F  NaN  1.6  1.6
9    4  G  01JUN2022  1.7  NaN  H  NaN  1.7  1.7
10   5  I  01JUN2022  1.4  NaN  J    J  NaN  1.8
11   6  K  01JUN2022  1.7  NaN  L  NaN  NaN  1.3
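
The behaviour of combine_first can be checked in isolation. The sketch below uses two made-up Series (`primary` and `fallback` are hypothetical names, and the value 9.9 is invented to show precedence); it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: `primary` plays the role of column 36, `fallback` of column 38.
primary = pd.Series([np.nan, "1.5", np.nan, "1.7"])
fallback = pd.Series(["1.2", np.nan, np.nan, "9.9"])

merged = primary.combine_first(fallback)

# index 0: NaN in primary, so the value is taken from fallback
# index 1: primary has a value, so it is kept
# index 2: NaN in both, so the result stays NaN
# index 3: both have values, and primary takes precedence
print(merged)
```

Note that the other column is not modified, which is why 38, 45 and 78 still appear unchanged in the output above.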

Finally, one can drop the columns that one doesn't want, or select the ones to display, as follows

df = df[['0', '5', '36', '43', '79']]

[Out]:

    0          5   36 43   79
0   A  01JUN2022  1.2  B  1.2
1   C  01JUN2022  1.4  D  1.4
2   E  01JUN2022  1.5  F  1.6
3   G  01JUN2022  1.7  H  1.7
4   I  01JUN2022  1.4  J  1.8
5   K  01JUN2022  1.7  L  1.3
6   A  01JUN2022  1.2  B  1.2
7   C  01JUN2022  1.4  D  1.4
8   E  01JUN2022  1.5  F  1.6
9   G  01JUN2022  1.7  H  1.7
10  I  01JUN2022  1.4  J  1.8
11  K  01JUN2022  1.7  L  1.3

and this gives the desired output.
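
Putting the three steps together, a self-contained sketch might look like the following. It uses only the first and third rows of the dummy dataset, typed in by hand, and assumes every cell was read as a string with missing cells as empty strings (including the trailing cell of the third row). The `pairs` list is a hypothetical convenience, not something from the original answer.

```python
import numpy as np
import pandas as pd

# Two rows of the dummy dataset, entered by hand (assumption: all cells are strings).
df = pd.DataFrame(
    [
        ["A", "01JUN2022", "", "1.2", "", "B", "", "1.2"],
        ["E", "01JUN2022", "1.5", "", "F", "", "1.6", ""],
    ],
    columns=["0", "5", "36", "38", "43", "45", "78", "79"],
)

# Step 1: empty or whitespace-only cells become NaN.
df = df.replace(r"^\s*$", np.nan, regex=True)

# Step 2: for each (column to keep, column to merge in), fill the gaps.
pairs = [("36", "38"), ("43", "45"), ("79", "78")]
for keep, other in pairs:
    df[keep] = df[keep].combine_first(df[other])

# Step 3: keep only the merged columns.
df = df[["0", "5", "36", "43", "79"]]

print(df)
```

Looping over pairs keeps the merge rules in one place, which may be easier to maintain if more column pairs turn up in the real file.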

Notes:

There are additional ways to remove NaN and combine the columns. For that, see: How to remove nan value while combining two column in Panda Data frame?
There are also additional ways to delete columns in Python. If one wants to learn more about it, see: Delete a column from a Pandas DataFrame
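
As one illustration of such alternatives (not necessarily the ones discussed in the linked posts), Series.fillna accepts another Series and DataFrame.drop removes columns by name. The two-row frame below is made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one pair of columns to merge.
df = pd.DataFrame({
    "36": [np.nan, "1.5"],
    "38": ["1.2", np.nan],
})

# fillna with a Series fills each NaN in "36" with the value
# at the same index in "38".
df["36"] = df["36"].fillna(df["38"])

# drop removes the now-redundant column instead of selecting the ones to keep.
df = df.drop(columns=["38"])

print(df)
```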

Answered By - Gonçalo Peres

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
