Issue
Thought this would be straight forward but had some trouble tracking down an elegant way to search all columns in a dataframe at same time for a partial string match. Basically how would I apply df['col1'].str.contains('^')
to an entire dataframe at once and filter down to any rows that have records containing the match?
Solution
The Series.str.contains
method expects a regex pattern (by default), not a literal string. Therefore str.contains("^")
matches the beginning of any string. Since every string has a beginning, everything matches. Instead use str.contains("\^")
to match the literal ^
character.
To check every column, you could use for col in df
to iterate through the column names, and then call str.contains
on each column:
mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])
df.loc[mask.any(axis=1)]
Alternatively, you could pass regex=False
to str.contains
to make the test use the Python in
operator; but (in general) using regex is faster.
Answered By - unutbu
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.