Issue
I need to read a .dat file in Python. The file has 3 columns in total and hundreds of rows. The second and third column contain two characters followed by a float that I would like to extract--the second column always starts with "SA"
and the third column always starts with "SC"
. I am currently loading in the data and looping through each row to extract the values, but is there a better way to do it?
Once the data is cleaned, I want to perform some calculations on the result, namely computing the average.
Here is an example of two lines from the .dat file:
9:01:15 SA7.998 SC7.968
9:01:16 SA7.998 SC7.968
Here is the code I am currently using.
import numpy as np
import os.path
from statistics import mean
time=[]
s_1=[]
s_2=[]
s1=[]
s2=[]
r1=[]
r2=[]
avgg=[]
# Reading data from file
with open('serial_2.dat','r') as f:
dat=f.readlines()
for i in dat:
y=i.split() # cleaning and getting columns without spaces
time.append(y[0])
s1.append(y[1])
s2.append(y[2])
#getting only numbers without strings (SA and SC)
for counter in (range(0,len(s1))):
S_1=s1[counter]
r1.append(S_1[2:])
r1_f=np.array(r1, dtype='float32')
S_2=s2[counter]
r2.append(S_2[2:])
r2_f=np.array(r2, dtype='float32')
avgg=r1_f+r2_f/2
print(np.mean(avgg))
Solution
You can use pandas
to do that:
#! pip install pandas
import pandas as pd
import numpy as np
df = pd.read_csv('serial_2.dat', sep='\s+', header=None, names=['time', 's1', 's2'])
df['s1'] = df['s1'].str.extract('^[\D]+(.*)').astype(float)
df['s2'] = df['s2'].str.extract('^[\D]+(.*)').astype(float)
Output:
>>> df
time s1 s2
0 9:01:15 7.998 7.968
1 9:01:16 7.998 7.968
>>> df.dtypes
time object
s1 float64
s2 float64
dtype: object
If you have always 2 characters in s1 and s2 columns, you can avoid regex and strip the first two characters:
df['s1'] = df['s1'].str[2:].astype(float)
df['s2'] = df['s2'].str[2:].astype(float)
Then compute the global average:
# With pandas
>>> df[['s1', 's2']].mean().mean()
7.983
# With numpy
>>> np.mean(df[['s1', 's2']])
7.983
You can also compute the row average:
df['avg'] = df[['s1', 's2']].mean(axis=1)
print(df)
# Output
time s1 s2 avg
0 9:01:15 7.998 7.968 7.983
1 9:01:16 7.998 7.968 7.983
Answered By - Corralien
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.