SQL Server Import/Export to Excel using Python script

Introduction

In previous article, we saw how to import/export excel to/from SQL Server by executing R script within T-SQL
In this post, let us see another similar approach to import excel into SQL Server and export SQL server data to excel by executing Python script within T-SQL.

There are some existing methods to do this using BCP, Bulk Insert, Import & Export wizard from SSMS, SSIS, Azure data factory, Linked server & OPENROWSET query and SQLCMD.

BULK INSERT statement, the BCP tool, or Azure Data Factory can't read Excel files directly   
BCP - Workaround has to be done to include the header while exporting
SSIS - Though it supports exporting to excel, with dynamic source & destination, handling mapping between source to target increases the complexity of the package
SQLCMD - Cannot export output in Excel file format


R & Python language extension was introduced in SQL Server 2016 & 2017 as part of machine learning. With support of R in Azure SQL database   and Java language extension support in SQL Server 2019 , this new approach can be used extensively as it easy, fast and flexible.




Pre-requisites


Below scripts will work starting from SQL Server 2017 and above (as execution of Python language using T-SQL was introduced in SQL Server 2017).


Provide folder permission to access the excel files during import / export process, right-click on folder -> Properties -> Security -> (Full Control) to "ALL_APPLICATION_PACKAGES".



pandas.DataFrame.to_excel & pandas.read_excel are used to export and import excel which are installed by default


We can check that by running below statement in SSMS:

EXECUTE sp_execute_external_script
@language =N'Python',
@script=N'import pip
for i in pip.get_installed_distributions():
    print(i)';

GO


Exporting SQL Server data (list of tables, views) into Excel file



For this approach, we have created a Stored procedure named "usp_ExportExcel" in "WideWorldImporters" database. Executing the Stored procedure based on input parameters exports SQL Server data to excel files

Stored procedure  -  Has below three input parameters and writes the output to excel file


Parameter Description
@ExportPath  Path for exporting excel files 
@SchemaName List of objects under this schema to be exported. Can have multiple values separated by comma 
@ObjectlisttoExport List of tables, views to be exported. Can have multiple values separated by comma


CREATE OR ALTER PROC usp_ExportExcel (@ExportPath NVARCHAR(MAX),
                                      @SchemaName NVARCHAR(MAX),
                                      @ObjectlisttoExport NVARCHAR(MAX)
                                      )
AS
BEGIN
SET NOCOUNT ON;
BEGIN TRY
  
IF ISNULL(@ExportPath,'') <> ''
 BEGIN
 SELECT @ExportPath = CASE WHEN RIGHT(@ExportPath,1) = '\' THEN @ExportPath ELSE CONCAT(@ExportPath,'\') END
 DECLARE @ValidPath TABLE (ValidPathCheck BIT)
INSERT @ValidPath
EXEC sp_execute_external_script
@language =N'Python',
@script=N'
import pandas as pd
d = os.path.isdir(ExportFilePath)
OutputDataSet = pd.DataFrame(d,columns=["Filename"])'
,@params = N'@ExportFilePath NVARCHAR(MAX)'
,@ExportFilePath = @ExportPath
        IF (SELECT ValidPathCheck FROM @ValidPath) = 1
BEGIN
  
 IF ISNULL(@SchemaName,'') <> ''  OR ISNULL(@ObjectlisttoExport,'') <> ''
 BEGIN
    
DROP TABLE IF EXISTS #ExportTablesList, #FinalExportList
  
CREATE TABLE #ExportTablesList(TableCount INT IDENTITY(1,1),Cols NVARCHAR(MAX),TableName NVARCHAR(200))
--Get the list of objects to be exported
INSERT #ExportTablesList (Cols,TableName)
SELECT  CASE WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp','decimal','bit','int','bigint')
               THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME))
          ELSE C.name END Cols  -- To cover poor data type conversions b/n Python & SQL Server
  ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE Schema_name(T.schema_id) IN (SELECT value FROM STRING_SPLIT(@SchemaName, ','))
-- Ignore the datatypes that are not required to be exported
AND TY.name NOT IN ('geography','varbinary','binary','text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant') 
  
INSERT #ExportTablesList (Cols,TableName)
SELECT  CASE WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp','decimal','bit','int','bigint')
               THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME))
          ELSE C.name END Cols  -- To cover poor data type conversions b/n Python & SQL Server
  ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
WHERE CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) IN (SELECT value FROM STRING_SPLIT(@ObjectlisttoExport, ','))
-- Ignore the datatypes that are not required to be exported
AND TY.name NOT IN ('geography','varbinary','binary','text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant') 
  
--Dedup of object list
;WITH dedup
AS
(
SELECT *,ROW_NUMBER()OVER(PARTITION BY TableName,Cols ORDER BY Cols) Rn FROM #ExportTablesList
)
DELETE FROM dedup
WHERE Rn > 1
--Forming columns list as comma separated
SELECT TableName,IDENTITY(INT,1,1) AS TableCount
    , STUFF(
        (
        SELECT ', ' + C.Cols
        From #ExportTablesList As C
        WHERE C.TableName = T.TableName
        FOR XML PATH('')
        ), 1, 2, '') AS Cols
INTO #FinalExportList
From #ExportTablesList As T
GROUP BY TableName
  
  
DECLARE @I INT = 1
       ,@TableName NVARCHAR(200)
       ,@SQL NVARCHAR(MAX) = N''
       ,@PythonScript NVARCHAR(MAX) = N''
       ,@ExportFilePath NVARCHAR(MAX) = N''
--Loop through the object list to export as excel   
WHILE @I <= (SELECT COUNT(TableName) FROM #FinalExportList)
BEGIN
   
 -- Just for testing purpose top 10 records are selected
SELECT @SQL = CONCAT('SELECT TOP 10 ',Cols,' FROM ',TableName,';')
      ,@TableName = TableName
FROM #FinalExportList WHERE TableCount = @I
  
   
SET @PythonScript = N'
FullFilePath = ExcelFilePath+TableName+".xlsx"
InputDataSet.to_excel(FullFilePath,sheet_name=TableName.split(".")[-1],index=False)'
   
EXEC   sp_execute_external_script
      @language = N'Python'
     ,@script = @PythonScript
     ,@input_data_1 = @SQL
     ,@params = N'@ExcelFilePath NVARCHAR(MAX), @TableName NVARCHAR(200)'
     ,@ExcelFilePath = @ExportPath -- file path where Excel files are placed
     ,@TableName = @TableName
  
   
SET @I = @I + 1
END
  
 END ELSE PRINT 'Schema name of objects or list of objects (separated by comma) to be exported need to be mentioned'
 END ELSE PRINT 'Invalid folder path'
 END ELSE PRINT 'Export folder path need to be mentioned'
 END TRY
BEGIN CATCH
PRINT 'Issue while executing this SP, please check whether there is permission to execute the script / to access the folder and input params are valid'
END CATCH
  
END




Importing Excel file into SQL Server


For this approach, we have created a Stored procedure named "usp_ImportExcel" in "WideWorldImporters" database. Executing the Stored procedure based on input parameters imports excel files to SQL Server table


Stored procedure - Has below six input parameters 


Parameter Description
 @ImportPath Path where excel files are placed for importing into SQL Server 
 @DBConnectionString Target SQL Server database connection string where files are imported. Can be
Database=DB name; Trusted_Connection=True
or
Database=DB name;Uid= user name;Pwd=Password
 @ImportAll If set to 1 then all files in the mentioned path are imported. If set to 0 then only mentioned files are imported
 @CombineTarget Flag to decide single target table for each source file (files with same structure) or separate target table for each source file 
 @ExcelFileName If @ImportAll =  0 then excel file name needs to be passed to this parameter
 @ExcelSheetName If @ImportAll =  0 then corresponding sheet name of the excel file needs to be passed to this parameter 

CREATE OR ALTER PROC usp_ImportExcel (@ImportPath NVARCHAR(MAX),
                                      @DBConnectionString NVARCHAR(MAX),
                                      @ImportAll BIT,
                      @CombineTarget BIT,
                                      @ExcelFileName NVARCHAR(200),
                                      @ExcelSheetName NVARCHAR(50)
                                      )
AS
BEGIN
SET NOCOUNT ON;
BEGIN TRY
  
IF ISNULL(@ImportPath,'') <> '' AND ISNULL(@DBConnectionString,'') <> ''
 BEGIN
 SELECT @ImportPath = CASE WHEN RIGHT(@ImportPath,1) = '\' THEN @ImportPath ELSE CONCAT(@ImportPath,'\') END
 DECLARE @Serv NVARCHAR(200) = CONCAT(CHAR(39),CHAR(39),@@SERVERNAME,CHAR(39),CHAR(39))
  DECLARE @ValidPath TABLE (ValidPathCheck BIT)
INSERT @ValidPath
EXEC sp_execute_external_script
@language =N'Python',
@script=N'
import pandas as pd
d = os.path.isdir(ImportFilePath)
OutputDataSet = pd.DataFrame(d,columns=["Filename"])'
,@params = N'@ImportFilePath NVARCHAR(MAX)'
,@ImportFilePath = @ImportPath
        IF (SELECT ValidPathCheck FROM @ValidPath) = 1
BEGIN
   
  IF (@ImportAll = 0 AND (ISNULL(@ExcelFileName,'') <> '' AND ISNULL(@ExcelSheetName,'') <> ''))
  OR (@ImportAll = 1 AND (ISNULL(@ExcelFileName,'') = '' AND ISNULL(@ExcelSheetName,'') = ''))
  
  BEGIN
        
     
DECLARE @PythonScript NVARCHAR(MAX) =CONCAT('
import pandas as pd
import os
import glob
from revoscalepy import RxSqlServerData, rx_data_step
sqlConnString = "Driver=SQL Server;Server=Serv; ',@DBConnectionString,'"
Filefolderepath = ImportFilePath+"*.xlsx"
if ImportAll ==1 and CombineTarget==0:
    for FullFilePath in glob.glob(Filefolderepath):
       Filename = os.path.basename(FullFilePath).replace(".xlsx","")
       xl = pd.ExcelFile(FullFilePath)
       for sheetname in xl.sheet_names:
          Output = pd.read_excel(FullFilePath, sheetname=sheetname, na_filter=False).astype(str)
       if not Output.empty:
          sqlDS = RxSqlServerData(connection_string = sqlConnString,table = "".join(fl for fl in Filename if fl.isalnum())+"_"+"".join(sh for sh in sheetname if sh.isalnum()))
          rx_data_step(input_data = Output, output_file = sqlDS,overwrite = True)
if ImportAll ==1 and CombineTarget==1:
   df2=pd.DataFrame()
   for FullFilePath in glob.glob(Filefolderepath):
       Filename = os.path.basename(FullFilePath).replace(".xlsx","")
       xl = pd.ExcelFile(FullFilePath)
       for sheetname in xl.sheet_names:
           Output = pd.read_excel(FullFilePath, sheetname=sheetname).columns.astype(str)
           Output = ",".join(list(Output))
           df1 = pd.DataFrame([[Filename,sheetname,FullFilePath,Output]],columns=["Filename","sheetname","FullFilePath","Headers"])
           df2=df2.append(df1,ignore_index=True)
   sqlDS = RxSqlServerData(connection_string = sqlConnString,table = "Tbl_PyImpExp1")
   rx_data_step(input_data = df2, output_file = sqlDS,overwrite = True)
if ImportAll ==0:
   Filename =ImportFilePath+ExcelFileName+".xlsx"
   exists = os.path.isfile(Filename)
   if exists and ExcelSheetName in pd.ExcelFile(Filename).sheet_names:
         Output = pd.read_excel(Filename, sheetname=ExcelSheetName, na_filter=False).astype(str)
         if not Output.empty:
             sqlDS = RxSqlServerData(connection_string = sqlConnString,table = "".join(fl for fl in ExcelFileName if fl.isalnum())+"_"+"".join(sh for sh in ExcelSheetName if sh.isalnum()))
             rx_data_step(input_data = Output, output_file = sqlDS,overwrite = True)
   else:
      print("Invalid Excel file or sheet name")')
  
EXEC   sp_execute_external_script
      @language = N'Python'
     ,@script = @PythonScript
     ,@params = N'@ImportFilePath NVARCHAR(MAX),@ImportAll BIT,@CombineTarget BIT,@ExcelFileName NVARCHAR(200),@ExcelSheetName NVARCHAR(50),@Serv NVARCHAR(200)'
     ,@ImportFilePath = @ImportPath
     ,@ImportAll = @ImportAll
     ,@CombineTarget = @CombineTarget
     ,@ExcelFileName = @ExcelFileName
     ,@ExcelSheetName = @ExcelSheetName
     ,@Serv = @Serv
     IF @ImportAll =1 AND @CombineTarget =1
     BEGIN
        IF OBJECT_ID('Tbl_PyImpExp1') IS NOT NULL
            BEGIN
                    DROP TABLE IF EXISTS Tbl_PyImpExp2
                    ;WITH FileList
                    As(
                    SELECT [Filename]
                          ,[sheetname]
                          ,[Headers]
                          ,[FullFilePath]
                          ,ROW_NUMBER()OVER(ORDER BY (SELECT 1)) Rn
                          ,ROW_NUMBER()OVER(PARTITION BY [Headers] ORDER BY [Headers]) Grp
                          ,DENSE_RANK()OVER(ORDER BY [Headers]) Grp1
                      FROM [dbo].[Tbl_PyImpExp1]
                      )
                      SELECT  *,FIRST_VALUE([Filename]) OVER (PARTITION BY Grp1 ORDER BY Grp ASC) AS TableName
                      INTO Tbl_PyImpExp2
                      FROM FileList
            END
                IF EXISTS (SELECT 1 FROM Tbl_PyImpExp2)
                BEGIN
                     DECLARE @I INT = 1
                            ,@SQL NVARCHAR(MAX) =N''
SET @PythonScript = CONCAT('
import pandas as pd
from revoscalepy import RxSqlServerData, rx_data_step
sqlConnString = "Driver=SQL Server;Server=Serv; ',@DBConnectionString,'"
if ImportAll ==1 and CombineTarget==1:
   FinalImport=pd.DataFrame()
   for index, row in InputDataSet.iterrows():
       Tbl = "".join(T for T in row["TableName"] if T.isalnum())
       Import = pd.read_excel(row["FullFilePath"], sheetname=row["sheetname"], na_filter=False).astype(str)
       Import["ImportKey"] = row["TableName"]+"_"+row["sheetname"]
       FinalImport=FinalImport.append(Import,ignore_index=True)
       if not FinalImport.empty:
          sqlDS = RxSqlServerData(connection_string = sqlConnString,table = Tbl)
          rx_data_step(input_data = FinalImport, output_file = sqlDS,overwrite = True)')
                     WHILE @I <= (SELECT MAX(Grp1) FROM Tbl_PyImpExp2)
                     BEGIN
                     SET @SQL =  CONCAT('SELECT FullFilePath,sheetname,TableName FROM Tbl_PyImpExp2 WHERE Grp1 = ',@I)
                         EXEC   sp_execute_external_script
                          @language = N'Python'
                         ,@script = @PythonScript
                         ,@input_data_1 = @SQL
                         ,@params = N'@ImportAll BIT,@CombineTarget BIT,@Serv NVARCHAR(200)'
                         ,@Serv = @Serv
                         ,@ImportAll = @ImportAll
                         ,@CombineTarget = @CombineTarget
                     SET @I = @I + 1
                     END
                 END
     DROP TABLE IF EXISTS Tbl_PyImpExp1,Tbl_PyImpExp2
     END
      
     END ELSE PRINT 'Invalid parameters: If ImportAll = 0 then pass Excel file & Sheet Name as input. If ImportAll = 1 then pass Excel file & Sheet Name blank'
   
  END ELSE PRINT 'Invalid folder path'
END ELSE PRINT 'Import folder path or database connection string need to be mentioned'
  
 END TRY
BEGIN CATCH
PRINT 'Issue while executing this SP, please check whether there is permission to execute the script / to access the folder and input params are valid'
END CATCH
END




Please see my technet wiki article for detailed explanation on this topic with examples and screenshots and you can also download the scripts and SSRS report from technet gallery site.

We need to be careful with indentations in python script as formatting them without understanding the code can result in error.



No comments: