Issue
Given the following example dataframe:
import pandas as pd
import pathlib
from pathlib import Path
cwd = Path('Path/to/somewhere')
df = pd.DataFrame(
    {
        'var1': [0, 5, 10, 15, 20, 25],
        'var2': ['A', 'B']*3,
        'var3': ['A', 'B']*3,
        'path_col': [cwd / 'a.dat', cwd / 'b.dat', cwd / 'c.dat', cwd / 'd.dat', cwd / 'e.dat', cwd / 'f.dat'],
    }
)
Each path in path_col points to a data file, which I have a function to convert into a dataframe, e.g.:
def open_and_convert_to_df(filepath: pathlib.Path):
    # do things
    return pd.DataFrame(...)
data_df = pd.DataFrame(
    {
        'var4': [10, 20, 30],
        'var5': [100, 200, 300],
        'obs': [1000, 2000, 3000],
    }
)
I'd like to generate a data_df from each path in path_col and merge it into df such that the final df looks like:
     var1 var2  var3  var4  var5   obs
0       0    A     1    10   100  1000
1       0    A     1    10   100  2000
2       0    A     1    10   100  3000
3       0    A     1    10   200  1000
4       0    A     1    10   200  2000
5       0    A     1    10   200  3000
6       0    A     1    10   300  1000
...
n-3    25    B     2    30   200  3000
n-2    25    B     2    30   300  1000
n-1    25    B     2    30   300  2000
n      25    B     2    30   300  3000
In other words, variables 1 to 3 of the first df are indexes of the data contained in path_col. Inside this data, var 4 and 5 are indexes of obs. I'm trying to index obs with all variables from 1 to 5.
The best I've come up with so far is using the .map() method like so:
df['path_col'] = df['path_col'].map(open_and_convert_to_df)
I end up with the right dataframe in each path_col element, but I'm lacking the next steps to "un-nest" those and obtain the desired df.
Solution
Assuming you want some kind of join of each row with the output of the function, you could use concat:
out = df.join(
    pd.concat(
        {k: open_and_convert_to_df(v) for k, v in df['path_col'].items()}
    ).droplevel(1)
)
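
To see why this works, here is a minimal sketch on two made-up frames (the names `left` and `pieces` are illustrative only and are not part of the question's data). Passing a dict to `pd.concat` stacks the frames under a two-level index whose outer level is the dict key, `droplevel(1)` discards the inner level, and `join` then aligns on the remaining index, repeating each row of the left frame once per matching row.

```python
import pandas as pd

# Toy stand-ins, not the data from the question.
left = pd.DataFrame({'var1': [0, 5]})          # index 0, 1
pieces = {
    0: pd.DataFrame({'obs': [1000, 2000]}),    # rows belonging to left row 0
    1: pd.DataFrame({'obs': [3000]}),          # rows belonging to left row 1
}

stacked = pd.concat(pieces)     # MultiIndex: (dict key, original row index)
flat = stacked.droplevel(1)     # keep only the dict key as the index
result = left.join(flat)        # left join on the index

print(result)
```

Here `left` row 0 appears twice and row 1 once, because that is how many rows each nested frame contributes.
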
Used input:
df = pd.DataFrame(
    {
        'var1': [0, 5, 10, 15, 20, 25],
        'var2': ['A', 'B']*3,
        'var3': [1, 2]*3,
        'path_col': [cwd / 'a.dat', cwd / 'b.dat', cwd / 'c.dat', cwd / 'd.dat', cwd / 'e.dat', cwd / 'f.dat'],
    }
)
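
For an end-to-end run without any files on disk, the sketch below swaps in a stub for `open_and_convert_to_df` that ignores its argument and returns the sample `data_df` from the question. The stub, and the extra `drop` and `reset_index` steps, are assumptions added for illustration; the real reader would parse each file.

```python
from pathlib import Path

import pandas as pd

cwd = Path('Path/to/somewhere')


def open_and_convert_to_df(filepath: Path) -> pd.DataFrame:
    # Hypothetical stub: ignores the file and returns fixed data so the
    # example runs without touching the filesystem.
    return pd.DataFrame({
        'var4': [10, 20, 30],
        'var5': [100, 200, 300],
        'obs': [1000, 2000, 3000],
    })


df = pd.DataFrame({
    'var1': [0, 5, 10, 15, 20, 25],
    'var2': ['A', 'B'] * 3,
    'var3': [1, 2] * 3,
    'path_col': [cwd / f'{name}.dat' for name in 'abcdef'],
})

# One frame per row of df, keyed by that row's index label.
nested = pd.concat(
    {k: open_and_convert_to_df(v) for k, v in df['path_col'].items()}
).droplevel(1)

out = (
    df.drop(columns='path_col')    # optional: the paths are no longer needed
      .join(nested)                # repeat each row once per nested row
      .reset_index(drop=True)      # optional: replace the repeated index
)

print(out.head())
```

With this stub, each of the six rows is repeated three times, so `out` has 18 rows and the columns var1, var2, var3, var4, var5 and obs. If path_col has already been converted with `.map()` as in the question, `pd.concat(df['path_col'].to_dict())` should produce the same stacked frame without reading the files again.
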
Answered By - mozway