Issue
I have the following dataframe
    id  text
0  100  Donut/cookie, ; penguin
1  101  donut cake! penguinz
2  102  pizza!!!? donut cakepeanut
----------------------------------
import pandas as pd

data = [[100, 'Donut/cookie, ; penguin'],
        [101, 'donut cake! penguinz'],
        [102, 'pizza!!!? donut cakepeanut']]
df = pd.DataFrame(data, columns=['id', 'text'])
I would like to add several columns to my df, depending on whether specific substrings exist in the text column.
So something like this:
    id  text                         donut  cookie  penguin  pizza
0  100  Donut/cookie, ; penguin      yes    yes     yes      no
1  101  donut cake! penguinz         yes    no      yes      no
2  102  pizza!!!? donut cakepenguin  yes    no      yes      yes
I just need a yes/no value indicating whether the substring exists; delimiters and whitespace don't really matter. It would also be really helpful if the match weren't case-sensitive.
Solution
import pandas as pd

data = [[100, 'Donut/cookie, ; penguin'],
        [101, 'donut cake! penguinz'],
        [102, 'pizza!!!? donut cakepeanut']]
df = pd.DataFrame(data, columns=['id', 'text'])

def check(df, key):
    # Lowercase each text value so the substring test ignores case
    return df['text'].apply(lambda x: "yes" if key in x.lower() else "no")

# One yes/no column per substring of interest
df['donut'] = check(df, 'donut')
df['cookie'] = check(df, 'cookie')
df['penguin'] = check(df, 'penguin')
df['pizza'] = check(df, 'pizza')
Answered By - Mazhar
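As a possible alternative to the apply-based answer above, pandas' built-in Series.str.contains can do the case-insensitive substring test without a Python lambda. This is only a sketch on the same sample data; the keywords list and the loop are illustrative choices and not part of the original answer.

```python
import pandas as pd

data = [[100, 'Donut/cookie, ; penguin'],
        [101, 'donut cake! penguinz'],
        [102, 'pizza!!!? donut cakepeanut']]
df = pd.DataFrame(data, columns=['id', 'text'])

# Hypothetical list of substrings to look for (an assumption for this sketch)
keywords = ['donut', 'cookie', 'penguin', 'pizza']

for key in keywords:
    # case=False ignores case; regex=False treats the key as a literal substring
    found = df['text'].str.contains(key, case=False, regex=False)
    # Convert the boolean result to the yes/no strings the question asks for
    df[key] = found.map({True: 'yes', False: 'no'})

print(df)
```

Note that with the sample data exactly as given, the third row's text ends in "cakepeanut", so its penguin column comes out as "no". If the text column could contain missing values, passing na=False to str.contains is one way to have them count as "no".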