Issue
I have data that has some outliers that need to be ignored, but I am struggling to find out how to do this. I need data that is over the value of 500 to be removed/ignored. Below is my code so far:
import pandas as pd
import matplotlib
#convert the files to make sure that only the data needed is selected
INPUT_FILE = 'data.csv'
OUTPUT_FILE = 'machine_data.csv'
PACKET_ID = 'machine'
with open(INPUT_FILE, 'r') as f:
data = f.readlines()
with open(OUTPUT_FILE, 'w') as f:
for datum in data:
if datum.startswith(PACKET_ID):
f.write(datum)
#read the data file
df = pd.read_csv(OUTPUT_FILE, header=None, usecols=[2,10,11,12,13,14])
#plotting the conc
fig,conc = plt.subplots(1,1)
lns1 = conc.plot(df[2],df[11],color="g", label='Concentration')
As you can see, I have selected certain columns that I need, but within [11] I only need the data that is less than 500.
Would anyone be able to advise? Sorry if this has been posted before.
Solution
You just have to filter your dataframe by that column like :
df = df[(df[11] <= 500)]
Your code will then look like this:
import pandas as pd
import matplotlib
#convert the files to make sure that only the data needed is selected
INPUT_FILE = 'data.csv'
OUTPUT_FILE = 'machine_data.csv'
PACKET_ID = 'machine'
with open(INPUT_FILE, 'r') as f:
data = f.readlines()
with open(OUTPUT_FILE, 'w') as f:
for datum in data:
if datum.startswith(PACKET_ID):
f.write(datum)
#read the data file
df = pd.read_csv(OUTPUT_FILE, header=None, usecols=[2,10,11,12,13,14])
# filter your data HERE:
df = df[(df[11] <= 500)]
#plotting the conc
fig,conc = plt.subplots(1,1)
lns1 = conc.plot(df[2],df[11],color="g", label='Concentration')
Answered By - mrCopiCat
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.