Tuesday, January 9, 2024

[FIXED] Reshape a stacked style data file into a dataframe using pandas

January 09, 2024 csv, dataframe, pandas, python, reshape No comments

Issue

I have a csv input file with the below format and i'm looking for a fairly easy way to convert to normal shape dataframe in pandas. the csv data file has all data stacked up into two columns with each data block separated by an empty row like below. note for the ease of explanation, i made the timestamp values the same for the three blockset, but in reality they can be different:

Trace Name,SignalName1
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757193 EDT,13
2023-10-04 15:36:43.829083 EDT,14
2023-10-04 15:36:43.895651 EDT,17
2023-10-04 15:36:43.931145 EDT,11
,
Trace Name,SignalName2
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757193 EDT,131
2023-10-04 15:36:43.829083 EDT,238
2023-10-04 15:36:43.895651 EDT,413
2023-10-04 15:36:43.931145 EDT,689
,
Trace Name,SignalName3
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757193 EDT,9867
2023-10-04 15:36:43.829083 EDT,1257
2023-10-04 15:36:43.895651 EDT,5736
2023-10-04 15:36:43.931145 EDT,4935

the desired output after reshaping should look like the following:

    Timestamp           SignalName1 SignalName2 SignalName3
0   10/4/2023 15:36:43  13          131         9867
1   10/4/2023 15:36:43  14          238         1257
2   10/4/2023 15:36:43  17          413         5736
3   10/4/2023 15:36:43  11          689         4935

Solution

You could split the file using the multi newlines, then read_csv on each chunk and concat:

import re
import io

with open('csv_file.csv') as f:
    out = (pd.concat([pd.read_csv(io.StringIO(chunk),
                                  header=0, skiprows=[1,2])
                        .set_index('Trace Name')
                      for chunk in re.split('(?:\n,)+\n', f.read())
                      if chunk], axis=1)
             .rename_axis('Timestamp').reset_index()
          )

Assumptions (can be tweaked if needed):

there are 3 rows in each header
the first row of each header is "Trace Name" then the name to be used as column

Output:

             Timestamp  SignalName1  SignalName2  SignalName3
0  2023-10-04 15:36:43           13          131         9867
1  2023-10-04 15:36:43           14          238         1257
2  2023-10-04 15:36:43           17          413         5736
3  2023-10-04 15:36:43           11          689         4935

Answered By - mozway

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Tuesday, January 9, 2024

[FIXED] Reshape a stacked style data file into a dataframe using pandas

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels