I have a Python 3.7 (Spyder) script that collects data from a given .xlsx file and uses it to build a cubic spline function.
The script then goes through all the files in a given directory, makes some calculations/adjustments to the source data (using the cubic spline function), and saves the results as new files.
I tried to export the script so that it can be run on another computer (I used "Auto Py to Exe"). That seemed to work great, however:
- The .exe file is super big (300 MB+)
- It is super, super slow
What am I doing wrong here? It's literally a few lines of code, so the file shouldn't be that big, and it really should run in 1-2 seconds.
These are the imported modules:
```python
import numpy as np
import pandas as pd
from scipy import interpolate
from scipy.interpolate import CubicSpline
import os
```
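Since the script itself is short, most of the startup time (and the bundle size) probably comes from the libraries it imports rather than from the script. A rough way to check the import cost is a sketch like the one below (`time_imports` is a hypothetical helper, not part of any library). Python 3.7+ also has the built-in `python -X importtime your_script.py` flag (`your_script.py` being a placeholder), which prints a per-module import breakdown to stderr.

```python
import importlib
import time


def time_imports(module_names):
    """Return {module_name: seconds} for importing each module.

    Modules already present in sys.modules come back from the cache and
    report close to 0 s, so run this in a fresh interpreter.
    """
    timings = {}
    for name in module_names:
        start = time.perf_counter()
        importlib.import_module(name)  # actually performs the import
        timings[name] = time.perf_counter() - start
    return timings
```

For example, `time_imports(["numpy", "pandas", "scipy.interpolate"])` in a fresh interpreter gives an idea of how much of the run time is spent before any of the script's own code executes.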
Here's the full code:
```python
OESbaseline = pd.read_excel('OES_CubicSplineBaseline.xlsx')
x_baseline = OESbaseline['pre-CS']
y_baseline = OESbaseline['sample_known_ppm_with_flux']

cs = CubicSpline(x_baseline, y_baseline)
tck = interpolate.splrep(x_baseline, y_baseline)

def f(x_baseline):
    return interpolate.splev(x_baseline, tck)

basepath = "some_path"
for filename in os.listdir(basepath):
    file = os.path.join(basepath, filename)
    if os.path.isfile(file):
        OESrun = pd.read_csv(file, skiprows=2)
        CorrectedData = f(OESrun['Concentration'])
        CorrectedData[CorrectedData < 0] = 0
        CorrectedDF = pd.DataFrame({'SampleID': OESrun['Label'],
                                    'Recvd Wt. (kgs)': np.nan,
                                    'AQR (ppm)': np.around(CorrectedData, 3),
                                    'Grav (ppm)': np.nan,
                                    'FA Notes': np.nan,
                                    'AQR (R1) (ppm)': np.nan,
                                    'Grav (R1) (ppm)': np.nan,
                                    'Assay Wt.(R1) (gr)': np.nan,
                                    'AQR (R2) (ppm)': np.nan,
                                    'Grav (R2) (ppm)': np.nan,
                                    'Assay Wt.(R2) (gr)': np.nan,
                                    'CN (ppm)': np.nan,
                                    'CN R (ppm)': np.nan,
                                    '': np.nan,
                                    'Run Assay Wt.': OESrun['Weight'],
                                    'Grav. (OPT)': np.nan,
                                    # 'OES Conc. (ppm)': np.around(OESrun['Concentration'],3),
                                    # 'CS Conc. (OPT)': np.around(CorrectedData*(1/34.285),4),
                                    'Recvd Wt. (lbs)': np.nan})
        # (OPTIONAL) remove the second (267) wavelength readings
        # CorrectedDF_1wave = CorrectedDF.iloc[::2, :]
        oldname = os.path.splitext(filename)[0]
        oldext = os.path.splitext(filename)[1]
        new_filename = str(oldname + '_corrected' + oldext)
        # export to CSV
        CorrectedDF.to_csv(os.path.join(basepath, new_filename))
        # (OPTIONAL) export to CSV (1 wave only)
```
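The core of the script can be sketched in isolation as below. The script builds both `cs` (a `CubicSpline`) and `tck` (via `splrep`) but only ever uses `tck`; this sketch uses the `CubicSpline` object alone, which should give essentially the same interpolant as `splrep`/`splev` with the default `s=0`, since both use not-a-knot end conditions. The baseline values and the helper names (`build_correction`, `correct`, `corrected_name`) are hypothetical, for illustration only.

```python
import os

import numpy as np
from scipy.interpolate import CubicSpline


def build_correction(x_baseline, y_baseline):
    """Fit a cubic spline through the baseline points (x must be strictly increasing)."""
    return CubicSpline(x_baseline, y_baseline)


def correct(spline, concentrations, decimals=3):
    """Evaluate the spline, clip negative results to 0, and round."""
    corrected = spline(np.asarray(concentrations, dtype=float))
    return np.around(np.clip(corrected, 0, None), decimals)


def corrected_name(filename):
    """Insert '_corrected' before the extension, as the loop in the script does."""
    stem, ext = os.path.splitext(filename)
    return stem + "_corrected" + ext


# Hypothetical baseline data; in the real script these are the
# 'pre-CS' and 'sample_known_ppm_with_flux' columns from the .xlsx file.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 2.0, 4.0, 6.0, 8.0]
cs = build_correction(x, y)
print(correct(cs, [0.5, 2.5, -1.0]))
print(corrected_name("run1.csv"))
```

`np.clip` does the same job as the script's `CorrectedData[CorrectedData < 0] = 0`, just without mutating the array in place.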
Create your package in a virtual environment and pip install only the packages you need to run your script. This should shrink your .exe from 300 MB+ to roughly 30 MB (with pandas in it), and it will be much faster to start up.
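To see which packages the script actually imports before installing them into the venv, a small sketch using the standard-library `ast` module can list the top-level imports (`top_level_imports` is a hypothetical helper). Keep in mind that import names don't always match pip package names, that `os` is part of the standard library, and that optional dependencies loaded internally by a library (such as the Excel reader pandas uses for `read_excel`) won't show up this way.

```python
import ast


def top_level_imports(source):
    """Return the sorted top-level package names imported in `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import scipy.interpolate" -> "scipy"
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # absolute "from scipy.interpolate import X" -> "scipy";
            # relative imports (level > 0) are skipped
            names.add(node.module.split(".")[0])
    return sorted(names)


script = """
import numpy as np
import pandas as pd
from scipy import interpolate
from scipy.interpolate import CubicSpline
import os
"""
print(top_level_imports(script))
```

For a real file, something like `top_level_imports(open("your_script.py").read())` would work, with `your_script.py` replaced by the actual file name.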
My personal go-to is virtualenv, but you can use whatever virtual environment tool you like:
step 1 - in your global environment, install virtualenv:

```
pip install virtualenv
```
step 2 - create your venv:

```
python -m virtualenv venv
```
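step 3 - activate your venv and pip install your packages (recent pandas versions, 1.2 and later, also need openpyxl to read .xlsx files with `pd.read_excel`):

```
source venv/Scripts/activate
pip install pandas scipy numpy openpyxl
```

(`source venv/Scripts/activate` is for Git Bash on Windows; in cmd.exe use `venv\Scripts\activate`, and on Linux/macOS `source venv/bin/activate`.)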
step 4 - run your script inside the venv to make sure it works
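step 5 - package it (I'm not familiar with "Auto Py to Exe", so I don't know whether you can point it at a venv; in case you can't, here is the step with PyInstaller, run from inside the activated venv):

```
pip install pyinstaller
pyinstaller --onefile your_script.py
```

Replace `your_script.py` with the file name of your script; the .exe ends up in the `dist` folder.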
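For step 4, it may be worth confirming that the script (and therefore `pip` and `pyinstaller`) really is running from the venv and not from the global interpreter. A minimal sketch (`in_virtualenv` is a hypothetical helper):

```python
import sys


def in_virtualenv():
    """True if this interpreter is running inside a virtual environment.

    venv (and virtualenv 20+) point sys.prefix at the environment while
    sys.base_prefix keeps pointing at the base Python installation;
    older virtualenv releases set sys.real_prefix instead.
    """
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


print("inside a virtual environment:", in_virtualenv())
print("interpreter:", sys.executable)
```

With the venv activated, this should print `True` and an interpreter path inside the `venv` folder.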
