Issue
I have a large pandas DataFrame of around 210,711 rows. Currently I am using a for loop to compute the integral of a particular column, and Plotly to plot the data.
Following is the code used:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

DF = pd.read_csv(r"C:\Users\hsr4ban\Desktop\Temp\Temp.csv")
DF_01 = DF[["timestamps", "Data01", "Data02"]]

# Cumulative integral: re-integrate from the first row up to each row
Integral_List = []
for ind in range(len(DF_01)):
    Integral = np.trapz(y=DF_01.loc[:ind, "Data02"], x=DF_01.loc[:ind, "timestamps"])
    Integral_List.append(Integral)
DF_01["Integral"] = Integral_List
Fig = make_subplots(rows=2, cols=1, shared_xaxes=True)
Fig.add_trace(go.Scatter(x=DF_01["timestamps"], y=DF_01["Data02"],
                         name="Data", mode="lines"),
              row=1, col=1)
Fig.add_trace(go.Scatter(x=DF_01["timestamps"], y=DF_01["Integral"],
                         name="Integral", mode="lines"),
              row=2, col=1)
Fig.show()
When the above code is run, I get the following graph as output:
However, the for loop used to generate the "Integral" column consumes a lot of time, since it re-integrates from the first row for every row. Is there a more efficient way to do this? Any suggestions would be helpful.
Solution
You can replace the loop with scipy.integrate.cumulative_trapezoid.
from scipy.integrate import cumulative_trapezoid

DF_01["Integral"] = cumulative_trapezoid(y=DF_01["Data02"],
                                         x=DF_01["timestamps"],
                                         initial=0)
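As a quick sanity check, here is a minimal sketch (using synthetic data in place of the CSV, and `scipy.integrate.trapezoid` in the loop as the modern equivalent of the deprecated `np.trapz`) showing that the single cumulative pass matches the per-row loop. `initial=0` makes the output the same length as the input, with the integral up to the first row defined as zero.

```python
import numpy as np
import pandas as pd
from scipy.integrate import cumulative_trapezoid, trapezoid

# Synthetic stand-in for the CSV columns used in the question
t = np.linspace(0.0, 10.0, 50)
df = pd.DataFrame({"timestamps": t, "Data02": np.sin(t)})

# Original approach: O(n^2) loop that re-integrates 0..i for every row i
loop_result = [trapezoid(y=df.loc[:i, "Data02"], x=df.loc[:i, "timestamps"])
               for i in range(len(df))]

# Single O(n) pass; initial=0 prepends the zero-length integral for row 0
fast_result = cumulative_trapezoid(y=df["Data02"], x=df["timestamps"],
                                   initial=0)

print(np.allclose(loop_result, fast_result))
```

The speedup comes from the loop's quadratic cost: row i re-sums i trapezoids, while cumulative_trapezoid accumulates each trapezoid exactly once.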
Answered By - Matt Haberland