Issue
We need to create several SQL reports, each of which depends upon the results of the previous report. All queries are in the following form: Select 1, Report 1, Select 2 from Select 1, Report 2, Select 3 from Select 2, Report 3, ...and so on to... Report N.
Right now "N" separate SQL queries are used to generate the complete set of reports. Every query contains the SQL code for all previous queries - which results in Report 1 being done "N" times, Report 2 being done "N - 1" times, etc. The resulting mindless repetition of the same select statements in multiple reports is causing performance issues.
How do we export the intermediate reports, ideally as .csv files, as they are generated by report "N", thereby eliminating the need for all other reports?
Some complicating factors that are specific to our case:
- A turnkey approach must be used, making any approach that uses any form of manual intervention unacceptable. Plain text output is required.
- OBDC (specifically Python's pyobdc package) is used to pass the SQL queries to the SQL Server (as a cursor). Python uses the results to create the report as a .csv file.
- Stored SQL procedures are prohibited but the existing SQL code does use temporary tables.
- Results must be written to my (client) machine. Our IM department might allow us to use a temporary folder on their server.
- It seems pyobdc can only accept one result set (hence the need for "N" queries). Ensuring the proper result set gets passed back requires that the SQL query begin with "SET NOCOUNT ON" (as advised by the pyobdc mailing list). I know of no other way to return/select the right result set from the multiple result sets returned by SQL. I have tried a pyobdc method (cursor.nextset) to skip over result sets but nothing was returned.
- I considered passing the results of report "n - 1" into report "N" - but the quantity of data involved probably makes this impractical.
- The Python (3.2.2) & SQL code is well proven, production code. Changing languages is not a practical option. Changing OBDC packages is possible but unlikely (a very strong case would have to be made and the other OBDC package must be easily portable across platforms and also be able to connect to Microsoft SQL Server 2008 Management Studio).
- Eclipse (Helios) with the pydev plugin is being used to launch the Python application that launches the SQL queries.
- The client O/S is XP Pro Sp 3, the server is believed to be the same. The company is planning to migrate our machines to Windows 7/8 "sometime soon" so portability to that O/S is a factor.
Solution
The following is a complete solution to my own question (which would have been better entitled as "Need to export MULTIPLE results from a SQL query script").
Also see this question that uses the same general approach except it uses a hardcoded SQL string - and fetchall() instead of fetchone() -- also that question
A closer look at the many lines of (inherited) SQL code showed that the intermediate reports were not being saved. Only the results of the last SQL select/report were being returned (to python).
The SQL side of the solution consists of creating of new tables to hold the result sets (reports), then returning all result sets at the very end of the SQL code.
The SQL code now looks like:
SET NOCOUNT ON -- required by pyobdc (and other OBDC packages?) at start of code
SET @year = ? -- get OBDC (python) parameter 1
SET @month = ? -- get parameter 2
SET @day = ? -- get parameter 3
DECLARE @ReportX TABLE -- a new table, one of these for each report
-- Repeated for each report (1 to N):
INSERT INTO @ReportX -- NEW LINE, it preserves the report
SELECT ..... -- the original report, uses the passed-in parameters
-- At the very bottom of the SQL code, add one of these lines for each report:
Select * from @ReportX -- where X is 1 to N
The Python 3.x side of the solution looks like:
import pyodbc # contains cursor.execute, the interface to SQL
import csv # creates csv.writer, used to create the CSV file
Textfile = open( "FileContainingSqlCode", 'r' )
SQL_COMMANDS = Textfile.read(); # Get SQL code for all reports
cursor.execute( SQL_COMMANDS, Year, Month, Day ) # do all reports using 3 parameters
# Create first report from the first result set
ReportID = 1
filename = "D:\\Report" + str( ReportID ) + ".csv"
OutFile = open( filename, 'w', newline= '' )
Results = csv.writer( OutFile, delimiter = ',', quotechar = '"',
quoting = csv.QUOTE_MINIMAL )
while True:
row = cursor.fetchone()
if not row:
break
Results.writerow( row )
OutFile.close()
# Create the 2nd through Nth reports
while ( cursor.nextset() ) :
ReportID += 1
filename = "D:\\Report" + str( ReportID ) + ".csv"
OutFile = open( filename, 'w', newline= '' )
Results = csv.writer( OutFile, delimiter = ',', quotechar = '"',
quoting = csv.QUOTE_MINIMAL )
while True:
row = cursor.fetchone()
if not row:
break
Results.writerow( row )
OutFile.close()
# end of Python 3.x code
Eliminating the first report section should be possible by using something along the lines of: while ( ?? in cursor.nextset() ) in the second while loop.
Answered By - user1459519
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.