Issue
I have two data frames. One (df1) contains a list of file paths along with some data about each file, and the other (df2) contains the list of files that I want. I want to create a third data frame that contains only the wanted files and their data.
Any help is appreciated:))
df1:
df2:
Wanted_df
Solution
The following code has been modified to work with the updated df1. It is rather clunky, with all the intermediate steps visible, but it works. I have not had time to make it sleeker.
import pandas as pd
df1 = pd.DataFrame({'File path': ['<filepath>/File 1', '<Filepath>/File 2', '<Filepath>/File 3', '<Filepath>/File 4', '<Filepath>/File 5'],
                    'Data about that file': ['apple', 'orange', 'strawberry', 'pineapple', 'starfruit']})
df2 = pd.DataFrame({'List of Wanted Files': ['File 1', 'File 3', 'File 5']})
a = df1['File path'].values
b = df2['List of Wanted Files'].values
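# Start with (False,) for every row position in df1; overwrite it when a wanted name is found
mydict = {x: (False,) for x in range(len(a))}
for count, i in enumerate(a):
    for j in b:
        # Substring test: does the wanted file name appear in this path?
        if j in i:
            mydict[count] = (True, j)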
info = list(mydict.values())
mask = [x[0] for x in info]
wanted_df = pd.DataFrame([x[1] for x in info if x[0]], columns = ['File Name'])
s = df1['Data about that file'][mask].reset_index(drop = True)
wanted_df['Data about that file'] = s
print(wanted_df)
which produces:
  File Name Data about that file
0    File 1                apple
1    File 3           strawberry
2    File 5            starfruit
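For anyone wanting something shorter, here is a possible vectorised sketch of the same idea. It rests on an assumption that is not stated in the question: that the file name is everything after the last '/' in the path. It also matches names exactly rather than by substring, so 'File 1' would not match a path ending in 'File 10', which the loop version above would. The variable name file_names is introduced here purely for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'File path': ['<filepath>/File 1', '<Filepath>/File 2', '<Filepath>/File 3',
                  '<Filepath>/File 4', '<Filepath>/File 5'],
    'Data about that file': ['apple', 'orange', 'strawberry', 'pineapple', 'starfruit'],
})
df2 = pd.DataFrame({'List of Wanted Files': ['File 1', 'File 3', 'File 5']})

# Assumption: the file name is the part of the path after the last '/'.
file_names = df1['File path'].str.rsplit('/', n=1).str[-1]

# True for rows of df1 whose file name appears in df2 (exact match).
mask = file_names.isin(df2['List of Wanted Files'])

# Keep the matching rows and put the bare file name in front of the data column.
wanted_df = (
    df1.loc[mask, ['Data about that file']]
       .assign(**{'File Name': file_names[mask]})
       [['File Name', 'Data about that file']]
       .reset_index(drop=True)
)
print(wanted_df)
```

If the paths can contain the wanted name somewhere other than at the end, the substring loop above is the safer choice.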
Answered By - user19077881