Issue
import pandas as pd
mydata = {"Key": [567, 568, 569, 570, 571, 572],
          "Sprint": ["Max1;Max2", "Max2", "DI001 2", "DI001 25", "DAS 100", "DI001 101"]}
df = pd.DataFrame(mydata)
df["sprintlist"] = df["Sprint"].str.split(";")
print(df)
From this dataframe, I want to extract only the number that appears at the end of each string in the lists of column "sprintlist", and collect those numbers into a new list column "Sprintnumb", as shown below.
Expected output:
I tried using a lambda function to achieve the desired output, but I am getting the error "'str' object has no attribute 'str'". From one of my previous questions, I know how to extract the number when only one value is present in the "Sprint" column:
df["Sprint Number"] = df.Sprint.str.extract(r"(\d+)$").astype(int)
Solution
Use Series.explode with Series.str.extractall, then convert the matches to integers and aggregate them back into lists:
df["Sprint Number"] = (df["sprintlist"].explode()
                                       .str.extractall(r"(\d+)$")[0]
                                       .astype(int)
                                       .groupby(level=0)
                                       .agg(list))
print(df)
   Key     Sprint    sprintlist  Sprint Number
0  567  Max1;Max2  [Max1, Max2]         [1, 2]
1  568       Max2        [Max2]            [2]
2  569    DI001 2     [DI001 2]            [2]
3  570   DI001 25    [DI001 25]           [25]
4  571    DAS 100     [DAS 100]          [100]
5  572  DI001 101   [DI001 101]          [101]
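To make the chained expression easier to follow, here is a minimal sketch that splits it into its three steps on a shortened version of the data. The intermediate names `exploded`, `matches`, and `numbers` are illustrative only and do not appear in the original answer.

```python
import pandas as pd

# Shortened sample data (an assumption for illustration).
df = pd.DataFrame({
    "Key": [567, 568, 569],
    "Sprint": ["Max1;Max2", "Max2", "DI001 25"],
})
df["sprintlist"] = df["Sprint"].str.split(";")

# Step 1: one row per list element; the original row label is repeated.
exploded = df["sprintlist"].explode()

# Step 2: extractall returns a DataFrame indexed by (row label, match number);
# column 0 holds the first capture group as strings.
matches = exploded.str.extractall(r"(\d+)$")[0]

# Step 3: convert to int and regroup by the original row label (level 0).
numbers = matches.astype(int).groupby(level=0).agg(list)

df["Sprint Number"] = numbers
print(df)
```

Because the grouped result keeps the original row labels, assigning it back to the DataFrame lines each list up with the row it came from.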
Or use a list comprehension with a regex:
import re

# group(0) is the run of digits matched at the end of each string
df["Sprint Number"] = [[int(re.search(r"(\d+)$", y).group(0)) for y in x]
                       for x in df["sprintlist"]]
print(df)
   Key     Sprint    sprintlist  Sprint Number
0  567  Max1;Max2  [Max1, Max2]         [1, 2]
1  568       Max2        [Max2]            [2]
2  569    DI001 2     [DI001 2]            [2]
3  570   DI001 25    [DI001 25]           [25]
4  571    DAS 100     [DAS 100]          [100]
5  572  DI001 101   [DI001 101]          [101]
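The list comprehension uses re.search rather than the .str accessor because each element inside a "sprintlist" list is a plain Python string, not a Series. The sketch below assumes the lambda in the question called .str on such an element; the original lambda is not shown, so this is only a guess at the cause of the error.

```python
import re
import pandas as pd

s = pd.Series([["Max1", "Max2"], ["DI001 25"]])

error_message = None
try:
    # Each `item` is a plain Python str, which has no `.str` accessor;
    # `.str` exists only on a pandas Series or Index.
    s.apply(lambda items: [item.str.extract(r"(\d+)$") for item in items])
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# Using the re module on each plain string avoids the accessor entirely.
fixed = s.apply(
    lambda items: [int(re.search(r"(\d+)$", item).group(0)) for item in items]
)
print(fixed)
```

This apply-based variant does the same per-element work as the list comprehension; which one to use is mostly a matter of taste.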
If some strings may not end with a number, use the assignment expression operator := (Python 3.8+) and test the match for None:
import re

mydata = {"Key": [567, 568, 569, 570, 571, 572],
          "Sprint": ["Max1;Max", "Max2", "DI001 2", "DI001 25", "DAS 100", "DI001 101"]}
df = pd.DataFrame(mydata)
df["sprintlist"] = df["Sprint"].str.split(";")
# := binds the match to m; elements without a trailing number are skipped
df["Sprint Number"] = [[int(m.group(0))
                        for y in x if (m := re.search(r"(\d+)$", y)) is not None]
                       for x in df["sprintlist"]]
print(df)
   Key     Sprint   sprintlist  Sprint Number
0  567   Max1;Max  [Max1, Max]            [1]
1  568       Max2       [Max2]            [2]
2  569    DI001 2    [DI001 2]            [2]
3  570   DI001 25   [DI001 25]           [25]
4  571    DAS 100    [DAS 100]          [100]
5  572  DI001 101  [DI001 101]          [101]
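The two approaches can differ when a row contains no trailing number at all. The following sketch uses made-up data and the hypothetical column names `via_extractall` and `via_comprehension` to compare them side by side.

```python
import re
import pandas as pd

# Row 1 has no element ending in a number (sample data is an assumption).
df = pd.DataFrame({"Key": [567, 568], "Sprint": ["Max1;Max", "Max"]})
df["sprintlist"] = df["Sprint"].str.split(";")

# extractall drops non-matching elements; a row with no match at all is
# missing from the grouped result and becomes NaN after index alignment.
df["via_extractall"] = (
    df["sprintlist"].explode()
    .str.extractall(r"(\d+)$")[0]
    .astype(int)
    .groupby(level=0)
    .agg(list)
)

# The filtered list comprehension yields an empty list for such a row.
df["via_comprehension"] = [
    [int(m.group(0)) for y in x if (m := re.search(r"(\d+)$", y)) is not None]
    for x in df["sprintlist"]
]
print(df)
```

If later code expects every cell to be a list, the list comprehension may be the safer choice, since it never leaves NaN in the column.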
Answered By - jezrael