Issue
I have a Python 3.7 (Spyder) script that collects data from a given .xlsx file and uses that data to build a cubic spline function.
The script then goes through all the files in a given directory, makes some calculations/adjustments to each source file (using the cubic spline function), and finally saves the results as new files.
I tried to export the script so that it can be run on another computer (I used "Auto Py to Exe"), which seemed to work great, however:
- The .exe file is huge (300 MB+)
- It runs extremely slowly

What am I doing wrong here? Since it's literally a few lines of code, the file shouldn't be that big, and it really should run in 1-2 seconds.
These are the imported modules:
import numpy as np
import pandas as pd
from scipy import interpolate
from scipy.interpolate import CubicSpline
import os
Here is the full code:
import numpy as np
import pandas as pd
from scipy import interpolate
from scipy.interpolate import CubicSpline
import os

# build the calibration spline from the baseline workbook
OESbaseline = pd.read_excel('OES_CubicSplineBaseline.xlsx')
x_baseline = OESbaseline['pre-CS']
y_baseline = OESbaseline['sample_known_ppm_with_flux']

cs = CubicSpline(x_baseline, y_baseline)
tck = interpolate.splrep(x_baseline, y_baseline)

def f(x):
    return interpolate.splev(x, tck)

basepath = "some_path"
for filename in os.listdir(basepath):
    file = os.path.join(basepath, filename)
    if os.path.isfile(file):
        OESrun = pd.read_csv(file, skiprows=2)
        CorrectedData = f(OESrun['Concentration'])
        CorrectedData[CorrectedData < 0] = 0
        CorrectedDF = pd.DataFrame({'SampleID': OESrun['Label'],
                                    'Recvd Wt. (kgs)': np.nan,
                                    'AQR (ppm)': np.around(CorrectedData, 3),
                                    'Grav (ppm)': np.nan,
                                    'FA Notes': np.nan,
                                    'AQR (R1) (ppm)': np.nan,
                                    'Grav (R1) (ppm)': np.nan,
                                    'Assay Wt.(R1) (gr)': np.nan,
                                    'AQR (R2) (ppm)': np.nan,
                                    'Grav (R2) (ppm)': np.nan,
                                    'Assay Wt.(R2) (gr)': np.nan,
                                    'CN (ppm)': np.nan,
                                    'CN R (ppm)': np.nan,
                                    '': np.nan,
                                    'Run Assay Wt.': OESrun['Weight'],
                                    'Grav. (OPT)': np.nan,
                                    # 'OES Conc. (ppm)': np.around(OESrun['Concentration'], 3),
                                    # 'CS Conc. (OPT)': np.around(CorrectedData * (1 / 34.285), 4),
                                    'Recvd Wt. (lbs)': np.nan})
        # (OPTIONAL) remove the second (267) wavelength readings
        # CorrectedDF_1wave = CorrectedDF.iloc[::2, :]
        oldname = os.path.splitext(filename)[0]
        oldext = os.path.splitext(filename)[1]
        new_filename = oldname + '_corrected' + oldext
        # export to CSV
        CorrectedDF.to_csv(os.path.join(basepath, new_filename))
        # (OPTIONAL) export to CSV (1 wave only)
        # CorrectedDF_1wave.to_csv(OESrun_filename + '_Corrected' + '.csv')
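For reference, the spline-correction step the script performs can be sketched in isolation. The numbers below are made up (the real baseline comes from OES_CubicSplineBaseline.xlsx); the sketch fits a CubicSpline to the baseline, evaluates it at raw readings, and clamps negatives to zero, just as the script does:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# hypothetical baseline values standing in for the Excel columns
x_baseline = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # 'pre-CS'
y_baseline = np.array([-0.1, 0.9, 2.1, 2.9, 4.2])  # 'sample_known_ppm_with_flux'
cs = CubicSpline(x_baseline, y_baseline)

raw = np.array([0.5, 1.5, 3.5, 0.05])  # stand-in for OESrun['Concentration']
corrected = cs(raw)
corrected[corrected < 0] = 0           # clamp negatives, as in the script
print(np.around(corrected, 3))
```

The script builds both a CubicSpline (`cs`, which it never uses) and a `splrep`/`splev` pair; for this purpose they are interchangeable, so one of the two could be dropped.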
Solution
Create your package in a virtual environment and pip install only the packages you need to run your script. This will shrink your .exe from 300 MB+ to roughly 30 MB (with pandas in it), and it will boot up much faster.
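To see why the global-environment build balloons, you can list everything installed in the environment the packager snapshots. A minimal sketch (importlib.metadata requires Python 3.8+; on 3.7, `pip list` gives the same picture):

```python
from importlib import metadata

# every distribution visible in the current environment -- all of it
# is a candidate for bundling when you package from this environment
installed = sorted(
    (dist.metadata["Name"], dist.version)
    for dist in metadata.distributions()
    if dist.metadata["Name"]
)
for name, version in installed:
    print(f"{name}=={version}")
```

In a fresh venv this list is nearly empty; in a typical global Anaconda/Spyder environment it runs to hundreds of packages.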
My personal go-to is virtualenv, but you can use whatever virtual-environment tool you like:
step 1 - in your global environment pip install virtualenv
pip install virtualenv
step 2 - create your venv
python -m virtualenv venv
step 3 - activate your venv and pip install your packages
source venv/Scripts/activate
(that path is for Windows; on macOS/Linux use source venv/bin/activate)
pip install pandas scipy numpy
step 4 - run your script within the venv to ensure it works
python script.py
step 5 - package it (I am not familiar with "Auto Py to Exe", so I don't know whether you can point it at a venv; in case you cannot, here is the step with PyInstaller)
pip install pyinstaller
pyinstaller script.py --onefile
Answered By - PydPiper