Issue
My DataFrame looks like this:
    name                   type   info
90  Sizeer - Annopol 2     shoe   duplicate SIZEER
91  InterSport - Arkadia   sport  duplicate INTERSPORT
92  InterSport - Złota 59  sport  NaN
...
What I want to do is remove all rows where the value in the info column starts with the word "duplicate". It's a bit tricky because this column holds not only strings but also booleans. Moreover, the values I want to delete are not just 'duplicate'; they have more text after it.
I tried doing this:
duplicates = []
for i in range(df.shape[0]):
    if str(df['info'])[i][:10] == 'duplicate':
        duplicates.append(i)
to collect their indices so I can delete them later, but it doesn't do anything. If I remove str()
from if str(df['info'])[i][:10] == 'duplicate':
I get an error:
TypeError: 'float' object is not subscriptable
I also tried this:
dupli = df[df['info'] np.where('duplicate' in df['info'])]
but it's just a syntax error. I don't really know how to do this properly :D
Solution
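The simplest way to drop rows where the text starts with 'duplicate' (use df['info'] rather than df.info, because df.info is the built-in DataFrame.info method, not the column):
df = df[~df['info'].str.startswith('duplicate', na=False)]
To match 'duplicate' anywhere in the text instead:
df = df[~df['info'].str.contains('duplicate', na=False)]
In both cases na=False treats NaN and other non-string values as non-matches, so those rows are kept.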
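As a rough, self-contained sketch of the two filters above: the rows with index 93 and 94 are hypothetical additions (a boolean value and a value with 'duplicate' in the middle of the text), included only to show how non-string values and the startswith/contains difference behave. The names starts_with_dup, contains_dup, cleaned_start and cleaned_contains are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question. Rows 93 and 94 are hypothetical,
# added to exercise a boolean value and a mid-text 'duplicate'.
df = pd.DataFrame(
    {
        "name": [
            "Sizeer - Annopol 2",
            "InterSport - Arkadia",
            "InterSport - Złota 59",
            "Example - Store A",
            "Example - Store B",
        ],
        "type": ["shoe", "sport", "sport", "shoe", "sport"],
        "info": [
            "duplicate SIZEER",
            "duplicate INTERSPORT",
            np.nan,
            True,
            "possible duplicate EXAMPLE",
        ],
    },
    index=[90, 91, 92, 93, 94],
)

# Bracket access is needed here: df.info is the DataFrame.info method,
# so attribute access would not return the 'info' column.
starts_with_dup = df["info"].str.startswith("duplicate", na=False)
contains_dup = df["info"].str.contains("duplicate", na=False)

# Keep only the rows that do NOT match.
cleaned_start = df[~starts_with_dup]
cleaned_contains = df[~contains_dup]

print(cleaned_start)
print(cleaned_contains)
```

On why the loop in the question never matched: str(df['info']) converts the whole Series into one long string, so indexing it with [i] picks a single character rather than a row's value. In addition, 'duplicate' has nine characters, so a [:10] slice of 'duplicate SIZEER' includes the trailing space and would not equal 'duplicate' either.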
Answered By - gtomer