Issue
def split_it(email):
    return re.findall(r"[^@]+@[^@]+(?:\.com|\.se|\.br|\.org)", s)
df['email_list'] = df['email'].apply(lambda x: split_it(x))
This code seems to work for the first row of the df, but every other row then shows the result from the first row.
Is it not iterating through all rows? Or does it print the result of row 1 on all rows?
Solution
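The function never uses its email argument: it searches a variable named s, so every row gets the matches from whatever s holds rather than from its own value. You do not need to use apply here anyway; use Series.str.findall directly: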
df['email_list'] = df['email'].str.findall(r"[^@]+@[^@]+(?:\.com|\.se|\.br|\.org)")
If there are several emails per row, you can join the results:
df['email_list'] = df['email'].str.findall(r"[^@]+@[^@]+(?:\.com|\.se|\.br|\.org)").str.join(", ")
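As a minimal sketch of the two calls above, the following runs them on a small DataFrame. The sample addresses and the email_joined column name are made up for illustration and are not from the original question.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "email": [
        "alice@example.com",
        "bob@example.org",
        "no address here",
    ]
})

pattern = r"[^@]+@[^@]+(?:\.com|\.se|\.br|\.org)"

# str.findall applies the regex to each row separately and
# returns one list of matches per row (empty if nothing matches).
df["email_list"] = df["email"].str.findall(pattern)

# str.join turns each per-row list into a single string.
df["email_joined"] = df["email_list"].str.join(", ")

print(df[["email_list", "email_joined"]])
```

Each row is evaluated on its own value, so the third row yields an empty list and an empty string instead of repeating the first row's result.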
Note that the email pattern can be enhanced in many ways, but I would add \s into the negated character classes to exclude whitespace matching, and move \. outside the group to avoid repetition:
r"[^\s@]+@[^\s@]+\.(?:com|se|br|org)"
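To see what excluding whitespace changes, here is a small comparison of the two patterns using re.findall, which behaves like Series.str.findall on a single string. The input string and the variable names are assumptions chosen for illustration.

```python
import re

# The pattern from the question and the refined pattern from the answer.
original = r"[^@]+@[^@]+(?:\.com|\.se|\.br|\.org)"
refined = r"[^\s@]+@[^\s@]+\.(?:com|se|br|org)"

# Hypothetical input with two space-separated addresses.
text = "a@x.com b@y.org"

original_matches = re.findall(original, text)
refined_matches = re.findall(refined, text)

# The original pattern lets [^@]+ swallow the separating space,
# so the second match starts with a blank.
print(original_matches)  # ['a@x.com', ' b@y.org']

# The refined pattern cannot match whitespace, so both matches are clean.
print(refined_matches)   # ['a@x.com', 'b@y.org']
```

With the refined pattern, joining several matches per row with ", " should not pick up stray separators from the source text.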
Answered By - Wiktor Stribiżew