Wednesday, July 27, 2022

[FIXED] Pandas: Based on values in column A, populate values in column B accordingly without iteration (for loop)

Tags: dataframe, pandas, python, regex

Issue

Here is my problem statement.

I have a columnA with data like this:

import pandas as pd
df = pd.DataFrame({
    'columnA': ['DD22HAHTL1NXX---', 'DD22HATNT1N--D3F', 'DD22HATNT1N--B3F', 'DD22HAHTL1N--A3F', 'DD22HATNT1N--C1F', 'DD22HAHTL1N--A3F', 'DD22HATNT1N--B3F', 'DD22HAHTL1N--A3F', 'DD22HAHTL1N--A3E', 'DD22HAHTL1N--A3F', 'DD22HAHTL1N--B3F', 'DD22HAHTL1N--A3F', 'DD22HAHTL1N--A3F', 'DD22HAHTL1NZZ---', 'DD22HAHTL1N--A3E']})

I am trying to create a new columnB from a substring of columnA, but with a catch. The condition is:

If the last 3 characters of a row are ---, I need to extract the 12th and 13th characters (XX in the first row) and put them in the new columnB; otherwise I need to capture the last 3 characters and put them in columnB. For example, DD22HAHTL1NXX--- should give XX and DD22HATNT1N--D3F should give D3F.

How can I create this new columnB efficiently without using any loops?

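I tried the following two approaches, but each handles only one of the two cases:

# Attempt 1: last 3 characters of every row (wrong for rows ending in '---')
df['columnB'] = df['columnA'].str[-3:]
# Attempt 2: regex match (only matches rows ending in '---')
df['columnB'] = df['columnA'].str.extract('([A-Za-z]{2}---)', expand=False)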

The str[-3:] slice gave me the last 3 characters for all the rows that do not end with '---', but for the rows ending with '---' I got '---' as is, which is wrong. I also tried str.extract with a regex, which worked for the rows ending with '---' but not for the others. Please help.

Solution

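You could use np.where. Its first argument is a boolean mask (here, rows ending with '---'); the result takes characters 12 and 13 where the mask is True and the last 3 characters everywhere else.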

import numpy as np

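# Rows ending in '---' take characters 12-13 (slice 11:13); all others take the last 3 characters.
df['columnB'] = np.where(df['columnA'].str.endswith('---'), df['columnA'].str.slice(11, 13), df['columnA'].str.slice(-3))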
print(df)

Result

             columnA columnB
0   DD22HAHTL1NXX---      XX
1   DD22HATNT1N--D3F     D3F
2   DD22HATNT1N--B3F     B3F
3   DD22HAHTL1N--A3F     A3F
4   DD22HATNT1N--C1F     C1F
5   DD22HAHTL1N--A3F     A3F
6   DD22HATNT1N--B3F     B3F
7   DD22HAHTL1N--A3F     A3F
8   DD22HAHTL1N--A3E     A3E
9   DD22HAHTL1N--A3F     A3F
10  DD22HAHTL1N--B3F     B3F
11  DD22HAHTL1N--A3F     A3F
12  DD22HAHTL1N--A3F     A3F
13  DD22HAHTL1NZZ---      ZZ
14  DD22HAHTL1N--A3E     A3E
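
As a side note that is not part of the original answer, the same rule can probably be written without NumPy. The sketch below shows two possible variants on a shortened copy of the sample data: one using Series.where with the same endswith mask, and one using a single str.extract regex with two named groups. The group names code and tail and the column name columnB_regex are made up for illustration, and the regex assumes the wanted pair is always the two characters directly before the trailing '---'.

```python
import pandas as pd

# Shortened copy of the sample data from the question.
df = pd.DataFrame({
    'columnA': ['DD22HAHTL1NXX---', 'DD22HATNT1N--D3F',
                'DD22HAHTL1NZZ---', 'DD22HAHTL1N--A3E']})

col = df['columnA']

# Variant 1: Series.where keeps the last 3 characters where the row does
# NOT end with '---' and falls back to characters 12-13 everywhere else.
ends_with_dashes = col.str.endswith('---')
df['columnB'] = col.str[-3:].where(~ends_with_dashes, col.str[11:13])

# Variant 2: one regex with two alternatives anchored at the end of the string.
# 'code' (hypothetical name) captures the 2 characters before a trailing '---';
# 'tail' (hypothetical name) captures the last 3 characters otherwise.
parts = col.str.extract(r'(?:(?P<code>.{2})---|(?P<tail>.{3}))$')
df['columnB_regex'] = parts['code'].fillna(parts['tail'])

print(df)
```

Variant 1 returns a pandas Series, so the index of df is preserved, whereas np.where returns a plain NumPy array. Variant 2 does not depend on the fixed position 12-13, which may or may not be what you want.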

Answered By - jch

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
