Monday, April 4, 2022

[FIXED] Using pd.read_table() multiple times on same open file

Labels: pandas, python-3.x

Issue

I have a data structure of the following form:

**********DATA:0************
name_A  name_B
0.16561919  0.03640960
0.39564838  0.66708115
0.60828075  0.95785214
0.68716186  0.92803331
0.80615505  0.96219926
**********data:0************

**********DATA:1************
name_A  name_B
0.32474381  0.82506909
0.30934914  0.60406956
0.99519513  0.23425607
0.72210821  0.61141751
0.47362605  0.09892009
**********data:1************

**********DATA:2************
name_A  name_B
0.46561919  0.13640960
0.29564838  0.66708115
0.40828075  0.35785214
0.08716186  0.52803331
0.70615505  0.96219926
**********data:2************

I would like to read each block into a separate pandas DataFrame with the appropriate column headers. With the simple function below, only a single data block ends up in the output list. However, when I comment out the data.append(pd.read_table(file, nrows=5)) line, the function prints all the individual block headers. The pd.read_table call seems to break out of the loop.

import pandas as pd

def read_data(filename):
    data = []
    with open(filename) as file:
        for line in file:
            if "**********DATA:" in line:
                print(line)
                data.append(pd.read_table(file, nrows=5))
    return data

read_data("data_file.txt")

How should I change the function to read all blocks?

Solution

I suggest a slightly different approach: avoid read_table, split each line yourself, and store the DataFrames in a dict instead of a list, like this:

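import pandas as pd

def read_data(filename):
    data = {}
    i = 0
    with open(filename) as file:
        for line in file:
            if "**********DATA:" in line:
                # Opening marker: start collecting rows for block i.
                data[i] = []
                continue
            if "**********data:" in line:
                # Closing marker: the next block gets the next index.
                i += 1
                continue
            if line.strip() and i in data:
                # Header or data row inside a block: split on whitespace.
                data[i].append(line.split())
    # The first row of each block holds the column names.
    return {
        f"data_{k}": pd.DataFrame(data=v[1:], columns=v[0])
        for k, v in data.items()
        if v
    }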

And so, with the text file you gave as input:

dfs = read_data("data_file.txt")

print(dfs["data_0"])
# Output
       name_A      name_B
0  0.16561919  0.03640960
1  0.39564838  0.66708115
2  0.60828075  0.95785214
3  0.68716186  0.92803331
4  0.80615505  0.96219926

print(dfs["data_1"])
# Output
       name_A      name_B
0  0.32474381  0.82506909
1  0.30934914  0.60406956
2  0.99519513  0.23425607
3  0.72210821  0.61141751
4  0.47362605  0.09892009

print(dfs["data_2"])
# Output
       name_A      name_B
0  0.46561919  0.13640960
1  0.29564838  0.66708115
2  0.40828075  0.35785214
3  0.08716186  0.52803331
4  0.70615505  0.96219926
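
One thing to keep in mind with the output above: because each row comes from splitting a line of text, the cells hold strings rather than floats, which is why the trailing zeros are preserved. Below is a minimal sketch of converting them, assuming every column is numeric. The small frame is only a stand-in that mirrors the first two rows of data_0.

```python
import pandas as pd

# Stand-in for dfs["data_0"]: string cells, as produced by splitting text lines.
df = pd.DataFrame(
    data=[["0.16561919", "0.03640960"], ["0.39564838", "0.66708115"]],
    columns=["name_A", "name_B"],
)

# Convert every column to float so arithmetic and aggregation work numerically.
numeric = df.astype(float)

print(numeric.dtypes)
```

For the whole dict, something like {k: v.astype(float) for k, v in dfs.items()} should apply the same conversion to each block.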

Answered By - Laurent
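
As a possible alternative for readers who would rather keep pd.read_table: the original loop most likely stops early because read_table reads ahead on the shared file handle and leaves nothing for the for loop to iterate over. That is an interpretation of the symptom, not something the answer states. The sketch below avoids sharing a handle by reading the whole file once, cutting it into blocks with a regular expression, and giving each block to read_table through its own io.StringIO buffer. The names read_blocks and BLOCK_RE are hypothetical, and the pattern assumes the marker lines look exactly like those in the question.

```python
import io
import re

import pandas as pd

# Hypothetical pattern: an opening DATA:<n> marker line, the block body,
# then the closing data:<n> marker line carrying the same number.
BLOCK_RE = re.compile(
    r"^\*+DATA:(\d+)\*+\n(.*?)^\*+data:\1\*+$",
    re.DOTALL | re.MULTILINE,
)


def read_blocks(filename):
    """Return {"data_<n>": DataFrame} for every marked block in the file."""
    with open(filename) as file:
        text = file.read()
    return {
        # Each block body gets its own in-memory buffer, so one parse
        # cannot consume input that belongs to another block.
        f"data_{m.group(1)}": pd.read_table(io.StringIO(m.group(2)), sep=r"\s+")
        for m in BLOCK_RE.finditer(text)
    }
```

Calling read_blocks("data_file.txt") should give a dict with the same keys as above, except that read_table parses the values as floats instead of leaving them as strings. Reading the whole file into memory is a trade-off that suits small files like the one in the question.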

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
