Issue
Need help in repeating rows if found in list.
- If found value in list increment count
- If more than one instance found then repeat the row by incrementing count
DF
data = {'First_Name':['Tom', 'Nick', 'Daniel', 'Jack'],
'oter_col':['other_value1', 'other_value2', 'other_value3', 'other_value4']}
df = pd.DataFrame(data)
df["repeation"] = 0
DF looks like
First_Name oter_col count
0 Tom other_value1 0
1 Nick other_value2 0
2 Daniel other_value3 0
3 Jack other_value4 0
Input list:
Full_Name = ["Tom Cruise", "Tom Holland", "Tom Hardy", "Jack black", "Chris Hemsworth"]
In need output like:
First_Name Full_Name oter_col count
0 Tom Tom Cruise other_value1 1
1 Tom Tom Holland other_value1 2
2 Tom Tom Hardy other_value1 3
3 Jack Jack black other_value4 1
4 Nick other_value2 1
5 Daniel other_value3 0
Tried something like this to get first matching index so I can repeat the row but not sure how to do it.
for name in Full_Name:
m = df.First_Name.str.contains(name.split()[0])
first_index_found = m.idxmax() if m.any() else None
if type(first_index_found) == int:
print(first_index_found)
Solution
You can use:
# Create a regex pattern to extract First_Name from Full_Name
pat = fr"\b({'|'.join(df['First_Name'])})\b"
# Create a dataframe from Full_Name
df1 = pd.DataFrame({'Full_Name': Full_Name})
df1['First_Name'] = df1['Full_Name'].str.extract(pat)
# Merge them on First_Name column
out = df.merge(df1, on='First_Name', how='left')
# Count (choose one)
out['repeation'] = out.groupby('First_Name').cumcount().add(1)
# OR
out['repeation2'] = (out.dropna().groupby('First_Name').cumcount().add(1)
.reindex(out.index, fill_value=0))
Output:
>>> out
First_Name oter_col repeation Full_Name repeation2
0 Tom other_value1 1 Tom Cruise 1
1 Tom other_value1 2 Tom Holland 2
2 Tom other_value1 3 Tom Hardy 3
3 Nick other_value2 1 NaN 0
4 Daniel other_value3 1 NaN 0
5 Jack other_value4 1 Jack black 1
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.