Issue
I have a dataframe named df that has an ID column with several rows. I am reading in a file and want to quickly check if any of the IDs in the ID column of my dataframe are in the file contents. Below is what I'm currently doing. It works but I am sure there is a more efficient method for checking whether any row in a dataframe column exists in a file.
# Read in and store the file contents
with open('myfile.txt', 'r') as file:
    contents = file.read()
# Loop through each ID in the ID column and look for it in the file
for id in df['ID']:
    if id in contents:
        print(f"found {id}")
Solution
Perhaps df.isin() (see the pandas docs) would be useful? You can then add a new column that's a boolean mask of whether each ID is in the file.
df["id_found"] = df['ID'].isin(contents)
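For isin to work, it needs a list-like of candidate values rather than one long string. Here is a minimal sketch, assuming the IDs appear in the file as whitespace-separated tokens; the sample DataFrame and file text below are hypothetical stand-ins, not the asker's data.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df and file contents
df = pd.DataFrame({"ID": ["A1", "B2", "C3"]})
contents = "A1 something\nC3 another line\nZ9"

# Assumption: IDs appear in the file as whitespace-separated tokens
tokens = set(contents.split())

# isin accepts a list-like (here a set) and returns a boolean Series
df["id_found"] = df["ID"].isin(tokens)

# Keep only the IDs whose mask value is True
found_ids = df.loc[df["id_found"], "ID"].tolist()
print(found_ids)
```

Note that this is an exact-token match, so an ID like A1 would not match a token like A10. A plain substring test behaves differently on that point.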
EDIT: Given that contents is a string, isin() will not work here, since it expects a list-like of values rather than a single string. You can instead use a lambda function in conjunction with df.apply like this:
df["id_found"] = df.apply(lambda row: row['ID'] in contents, axis=1)
You can then view the IDs that are found by applying the mask:
print(df["ID"][df["id_found"]])
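Putting the substring approach together, a self-contained sketch might look like the following. The sample DataFrame and the contents string are hypothetical stand-ins for the real df and file.read(). It applies the check to the ID column alone (Series.apply) rather than row by row, which should behave the same for this test while avoiding building a row object for every record.

```python
import pandas as pd

# Hypothetical sample data; in practice contents would come from file.read()
df = pd.DataFrame({"ID": ["A1", "B2", "C3"]})
contents = "header A1;C3 footer"

# Substring test per ID; str() guards against non-string IDs such as integers
df["id_found"] = df["ID"].apply(lambda value: str(value) in contents)

# Apply the boolean mask to list the IDs that were found
matched = df.loc[df["id_found"], "ID"].tolist()
print(matched)
```

Because this is a substring test, a short ID can also match inside a longer one in the file, so it is only as strict as the original loop in the question.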
Answered By - Tom Wagg