Issue
I'm loading a batch of CSV files into a SQL Server table using Python one row at a time. The files each contain a number of free text fields and erroneous data which I trim and rename before attempting to insert.
In general (about 95% of the time), the code seems to work however exceptions appear with the error message described below.
I'm confused as a) I only have four columns in my table, and can't understand why it would be looking for Parameter 7, and b) the text columns are being loaded into nvarchar(max) formatted columns, so I wouldn't expect a data type error.
I've checked the source files to see which rows threw an error, there seems to be no discernible difference between the problem rows and others that are successfully loaded.
I've trimmed the process right back to only insert the JobID (as a bigint) and it works without issue, but as soon as I bring in the text fields, it causes an error.
I'm using Python 3.7.0 and loading into SQL Server 14.0
import numpy as np
import pyodbc
import os
import glob
import pandas as pd
import csv
import config
import urllib
import shutil
import codecs
path = "C:\\myFilePath"
allFiles = glob.glob(os.path.join(path, "*.csv"))
for file_ in allFiles:
df = pd.concat((pd.read_csv(f, encoding='utf8') for f in allFiles))
cnxn = pyodbc.connect("Driver={ODBC Driver 13 for SQL Server};"
"Server=myServer;"
"Database=myDatabase;"
"Trusted_Connection=yes;"
"SelectMethod=cursor;"
)
df2 = df[['JobID', 'NPS_score', 'Obtuse_Column_Name_1', 'Obtuse_Column_Name_2']].copy()
df2.columns = ['JobID', 'Score','Q1', 'Q2']
cursor = cnxn.cursor()
for index,row in df2.iterrows():
try:
counter = counter + 1
cursor.execute("""insert into [myDB].[dbo].[input_test]( [JobID], [Score], [Q1], [Q2]) VALUES (?, ?, ?, ?)""", row['JobID'],row['Score'],row['Q1'], row['Q2'])
cursor.commit()
print(counter)
except Exception as e:
print(e)
continue
cursor.close()
cnxn.close()
I expect the data to be loaded but on some lines get the following error code:
('42000', '[42000] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 7 (""): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision. (8023) (SQLExecDirectW)')
Solution
I have found and solved the issue. Following @TomRon's advise, I printed the insert statement into the error block. This showed that despite me assuming Q2 was populated (as it was in the CSV) when I loaded it into a dataframe, for some reason it was NaN.
I simply included the function to replace NaN with 0 and it now loads all rows succesfully using the following line of code right after I create the dataframe I want to load (df2)
df2 = df2.fillna(value=0)
Answered By - Ron Sibayan
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.