Saturday, December 23, 2023

[FIXED] Replace cell values efficiently in Dataframe based on another column data

Labels: dataframe, numpy, pandas, python-3.x

Issue

I am currently running a for loop with an if statement to check and replace cell values based on the value of another column in the same row.

Simply put, my dataframe has 9000 rows and 27 columns.

I am using the code below to compare the first 6 characters (e.g. CONT-1) of the Package ID column with the first 6 characters of the Status ID column. If they match, the entire row is copied to a new dataframe; if not, the Status ID value is replaced with NaN before the row is copied.

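import numpy as np
import pandas as pd

# pr_compliant is the source dataframe (loaded elsewhere)
new_dr = pd.DataFrame(columns = pr_compliant.columns)
for index, row in pr_compliant.iterrows():
    col1 = row['Package ID'][:6]  # first 6 characters of Package ID
    col2 = row['Status ID'][:6]   # first 6 characters of Status ID
    if col1 == col2:
        # Prefixes match: keep the row unchanged
        new_dr = new_dr._append(row, ignore_index=True)
    else:
        # Prefixes differ: blank out the status, then keep the row
        row['Status ID'] = np.nan
        new_dr = new_dr._append(row, ignore_index=True)
new_dr = new_dr.drop_duplicates()
print(new_dr)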

Here pr_compliant is the source dataframe and new_dr is the output dataframe.

Currently it takes more than 30 seconds to compare 9000 rows and produce the output. I am looking for a more efficient approach, because the master file I will run this code on is a 100000x27 dataframe.

Any thoughts on efficiency?

Solution

Try this:

pr_compliant['Status ID'] = ((pr_compliant['Status ID'].str[:6] == pr_compliant['Package ID'].str[:6])
               *pr_compliant['Status ID']).replace({'': np.nan})

This is kind of a slick one-liner, so I'll break it down. Adding .str to a dataframe column gives access to vectorized string operations, so pr_compliant['Status ID'].str[:6] will give us a column of just the first 6 characters of each entry. Then when we do the comparison

pr_compliant['Status ID'].str[:6] == pr_compliant['Package ID'].str[:6]

That gives us a column of True and False values, with the Trues indicating the rows where the first 6 characters match. When we multiply that by the original pr_compliant['Status ID'] column, each True keeps the status ID (True * 'abc' is 'abc') and each False produces an empty string (False * 'abc' is ''). Finally we replace those empty strings with NaN using .replace({'': np.nan}).
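
For comparison, here is a minimal sketch of the same masking idea on made-up data. The sample values are hypothetical; only the column names come from the question. It passes the boolean mask to Series.where, which keeps values where the mask is True and writes NaN elsewhere. This is an alternative to the multiplication trick, not the code from the answer.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration.
pr_compliant = pd.DataFrame({
    "Package ID": ["CONT-1-A", "CONT-1-B", "CONT-2-A"],
    "Status ID": ["CONT-1-X", "CONT-3-Y", "CONT-2-Z"],
})

# Why the multiplication works: a bool times a str repeats the string 1 or 0 times.
kept = True * "abc"      # the string itself
dropped = False * "abc"  # an empty string

# Boolean mask: True where the first 6 characters of both columns match.
mask = pr_compliant["Status ID"].str[:6] == pr_compliant["Package ID"].str[:6]

# Series.where keeps entries where mask is True and fills NaN elsewhere,
# so no intermediate empty strings need to be replaced.
result = pr_compliant["Status ID"].where(mask)

# Write the result back and drop duplicate rows, as the original loop did.
pr_compliant["Status ID"] = result
pr_compliant = pr_compliant.drop_duplicates()
print(pr_compliant)
```

Either form works on whole columns at once, so it avoids the per-row append that makes the loop slow.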

Answered By - Jacob H

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
