Issue
I have a data structure of the following form:
**********DATA:0************
name_A name_B
0.16561919 0.03640960
0.39564838 0.66708115
0.60828075 0.95785214
0.68716186 0.92803331
0.80615505 0.96219926
**********data:0************
**********DATA:1************
name_A name_B
0.32474381 0.82506909
0.30934914 0.60406956
0.99519513 0.23425607
0.72210821 0.61141751
0.47362605 0.09892009
**********data:1************
**********DATA:2************
name_A name_B
0.46561919 0.13640960
0.29564838 0.66708115
0.40828075 0.35785214
0.08716186 0.52803331
0.70615505 0.96219926
**********data:2************
I would like to read each block into a separate pandas DataFrame with the appropriate column headers. With the simple function below, only a single data block ends up in the output list. However, when I comment out the data.append(pd.read_table(file, nrows=5)) line, the function prints every block header. The pandas read_table call seems to break out of the loop.
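The likely culprit is that pandas does not stop consuming the file handle after nrows rows: its C parser pulls text from the handle in large chunks, much larger than this file, so by the time read_table returns, the file position is already at the end and the for loop has nothing left to iterate over. Below is a minimal sketch that reproduces this with an in-memory io.StringIO standing in for data_file.txt (SAMPLE is a trimmed copy of the data above). It passes sep=" " because read_table splits on tabs by default, which is a second problem with the original call: each space-separated row would land in a single column.

```python
import io

import pandas as pd

# Trimmed, in-memory stand-in for data_file.txt (two blocks, two rows each)
SAMPLE = (
    "**********DATA:0************\n"
    "name_A name_B\n"
    "0.16561919 0.03640960\n"
    "0.39564838 0.66708115\n"
    "**********data:0************\n"
    "**********DATA:1************\n"
    "name_A name_B\n"
    "0.32474381 0.82506909\n"
    "0.30934914 0.60406956\n"
    "**********data:1************\n"
)

buf = io.StringIO(SAMPLE)
buf.readline()  # consume the "DATA:0" marker, as the for loop would

# Ask for only two rows, exactly like nrows=5 in the question
first = pd.read_table(buf, sep=" ", nrows=2)
print(first.shape)  # (2, 2)

# Whatever is left is all the for loop would still see
rest = buf.read()
print(repr(rest))
```

On a small input like this, rest comes back empty: the second block was swallowed by the first read_table call.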
import pandas as pd

def read_data(filename):
    data = []
    with open(filename) as file:
        for line in file:
            if "**********DATA:" in line:
                print(line)
                data.append(pd.read_table(file, nrows=5))
    return data

read_data("data_file.txt")
How should I change the function to read all blocks?
Solution
I suggest a slightly different approach that avoids read_table and puts the DataFrames in a dict instead of a list, like this:
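import pandas as pd

def read_data(filename):
    data = {}
    i = 0
    with open(filename) as file:
        for line in file:
            if "**********DATA:" in line:
                # Opening marker: start an empty row list for block i
                data[i] = []
                continue
            if "**********data:" in line:
                # Closing marker: advance to the next block
                i += 1
                data[i] = []
                continue
            if line.strip():
                # Split on any run of whitespace; the first row is the header
                data[i].append(line.split())
    # One DataFrame per non-empty block (the empty list created after
    # the last closing marker is skipped)
    return {
        f"data_{k}": pd.DataFrame(data=v[1:], columns=v[0])
        for k, v in data.items()
        if v
    }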
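Because every row comes straight from line.split(), the cells of these DataFrames are Python strings rather than floats. If you need numeric columns, one option is pd.to_numeric applied column by column. A minimal sketch on rows copied from the first block:

```python
import pandas as pd

# Rows as read_data collects them: header first, every cell a string
rows = [
    ["name_A", "name_B"],
    ["0.16561919", "0.03640960"],
    ["0.39564838", "0.66708115"],
]
df = pd.DataFrame(data=rows[1:], columns=rows[0])
print(df.dtypes)  # string columns (object dtype in pandas 2.x)

# Convert each column; with the default errors="raise", any cell that
# is not a number raises instead of being silently kept as text
numeric = df.apply(pd.to_numeric)
print(numeric.dtypes)  # both columns are float64
```

Inside read_data, the same conversion could be chained onto the DataFrame in the return expression, e.g. pd.DataFrame(data=v[1:], columns=v[0]).apply(pd.to_numeric). pandas would then print the values rounded to its default display precision of six decimals instead of the raw strings.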
So, with the text file you gave as input:
dfs = read_data("data_file.txt")
print(dfs["data_0"])
# Output
name_A name_B
0 0.16561919 0.03640960
1 0.39564838 0.66708115
2 0.60828075 0.95785214
3 0.68716186 0.92803331
4 0.80615505 0.96219926
print(dfs["data_1"])
# Output
name_A name_B
0 0.32474381 0.82506909
1 0.30934914 0.60406956
2 0.99519513 0.23425607
3 0.72210821 0.61141751
4 0.47362605 0.09892009
print(dfs["data_2"])
# Output
name_A name_B
0 0.46561919 0.13640960
1 0.29564838 0.66708115
2 0.40828075 0.35785214
3 0.08716186 0.52803331
4 0.70615505 0.96219926
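If you would rather keep pandas' own parser, another option is to collect each block's lines yourself and hand pandas a separate in-memory io.StringIO per block, so it never reads from the real file handle. This is a different technique from the answer above: read_blocks is a hypothetical helper name, and sep=r"\s+" assumes the columns are separated by whitespace. Since pandas parses the numbers itself, the columns come back as float64.

```python
import io

import pandas as pd


def read_blocks(lines):
    """Parse marker-delimited blocks into DataFrames (hypothetical helper).

    `lines` is any iterable of text lines; an open file works.
    Blocks are numbered in order of appearance: data_0, data_1, ...
    """
    dfs = {}
    block = None  # lines of the current block, or None between blocks
    for line in lines:
        if "**********DATA:" in line:
            block = []
        elif "**********data:" in line:
            if block:
                # Parse from memory, so the loop over `lines` is unaffected
                text = "".join(block)
                dfs[f"data_{len(dfs)}"] = pd.read_csv(io.StringIO(text), sep=r"\s+")
            block = None
        elif block is not None:
            block.append(line)
    return dfs


SAMPLE = """**********DATA:0************
name_A name_B
0.16561919 0.03640960
0.39564838 0.66708115
**********data:0************
**********DATA:1************
name_A name_B
0.32474381 0.82506909
**********data:1************
"""

dfs = read_blocks(io.StringIO(SAMPLE))
print(dfs["data_1"])
```

With the real file, pass the open handle instead: with open("data_file.txt") as file: dfs = read_blocks(file).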
Answered By - Laurent