Issue
I have a csv input file with the below format and i'm looking for a fairly easy way to convert to normal shape dataframe in pandas. the csv data file has all data stacked up into two columns with each data block separated by an empty row like below. note for the ease of explanation, i made the timestamp values the same for the three blockset, but in reality they can be different:
Trace Name,SignalName1
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757193 EDT,13
2023-10-04 15:36:43.829083 EDT,14
2023-10-04 15:36:43.895651 EDT,17
2023-10-04 15:36:43.931145 EDT,11
,
Trace Name,SignalName2
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757193 EDT,131
2023-10-04 15:36:43.829083 EDT,238
2023-10-04 15:36:43.895651 EDT,413
2023-10-04 15:36:43.931145 EDT,689
,
Trace Name,SignalName3
Signal,<signal info>
Timestamp,Value
2023-10-04 15:36:43.757193 EDT,9867
2023-10-04 15:36:43.829083 EDT,1257
2023-10-04 15:36:43.895651 EDT,5736
2023-10-04 15:36:43.931145 EDT,4935
the desired output after reshaping should look like the following:
Timestamp SignalName1 SignalName2 SignalName3
0 10/4/2023 15:36:43 13 131 9867
1 10/4/2023 15:36:43 14 238 1257
2 10/4/2023 15:36:43 17 413 5736
3 10/4/2023 15:36:43 11 689 4935
Solution
You could split the file using the multi newlines, then read_csv
on each chunk and concat
:
import re
import io
with open('csv_file.csv') as f:
out = (pd.concat([pd.read_csv(io.StringIO(chunk),
header=0, skiprows=[1,2])
.set_index('Trace Name')
for chunk in re.split('(?:\n,)+\n', f.read())
if chunk], axis=1)
.rename_axis('Timestamp').reset_index()
)
Assumptions (can be tweaked if needed):
- there are 3 rows in each header
- the first row of each header is "Trace Name" then the name to be used as column
Output:
Timestamp SignalName1 SignalName2 SignalName3
0 2023-10-04 15:36:43 13 131 9867
1 2023-10-04 15:36:43 14 238 1257
2 2023-10-04 15:36:43 17 413 5736
3 2023-10-04 15:36:43 11 689 4935
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.