SQL Server Export/Import excel using R script - Part 4

In continuation to my previous posts on SQL Server Export/Import excel using R script - Part 1 ,Part 2 & Part3 where we saw about exporting to Excel and importing to SQL Server table using "xlsx" package which is a powerful package but it has external dependency on Java ("rJava" package).

In this post, let us see how to import excel with multiple sheets using "readxl" package into different SQL Server tables.


I have done below example using SQL Server 2019 on windows 10 (64 bit) and R (3.4.4).
Refer my previous post on how to install R services

I have also used Wideworldimporters sample database for data setup in this example. 
Before trying to import excel with multiple sheets using "readxl" package into SQL Server tables, I tried to re-create the Excel file that was created using script provided in part 2

I got below error :








Error in .jnew("java/io/FileOutputStream", jFile) : 

  java.io.FileNotFoundException:  (Access is denied)


This is because of Java language extension support in SQL Server 2019, So I had to provide folder access (read, write) to "ALL_APPLICATION_PACKAGES".




Step 1:

Now let us install the required packages for this example:

Open R.exe from below path:
C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\bin

From R.exe, execute install.packages("readxl") 

Step 2:

After successful installation of above package, Copy & paste all the packages downloaded from above step into default R library  to SQL Server R_services library.

From C:\Users\Sathy\Documents\R\win-library\3.4
to C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\R_SERVICES\library

Step 3:

Below script will import Excel file with six sheets into six different table on database called "Staging" that I have created before executing this script

Sheet 1 - CustomerTransactions
Sheet 2 - InvoiceLines
Sheet 3 - Invoices
Sheet 4 - OrderLines
Sheet 5 - Orders
Sheet 6 - SpecialDeals


Please note: In the below script, we need to configure the values for 
sqlConnString variable with database connection details

filepath_var variable with folder path details



DECLARE @RScript NVARCHAR(MAX) =N' 
    library(readxl);
library(foreach);
    filepath_var <- "H:/Temp/Blog/ImportExportExcel/Sales.xlsx"
    tab_names <- excel_sheets(path = filepath_var)
sqlConnString <- "Driver=SQL Server;Server=''LAPTOP-LCUB6HBB''; Database=Staging;Uid=sa;Pwd=***"

foreach(n = 1:length(tab_names)) %do%
{
sqlTable <- tab_names[n]
Output <- read_excel(path = filepath_var, sheet = n)
sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)
}'

EXEC   sp_execute_external_script
      @language = N'R'
     ,@script = @RScript



I have updated above script to make sure script doesn't error out if the file does not exist in the path or if any of the sheet is empty. Below is the updated script:

To test the script I have opened the "Sales.xlsx" file and added an empty sheet and saved it.



Please note: In the below script, we need to configure the values for 
sqlConnString variable with database connection details
filepath_var variable with folder path details


DECLARE @RScript NVARCHAR(MAX) =N' 
    library(readxl);
    library(foreach);
    
filepath_var <- "H:/Temp/Blog/ImportExportExcel/Sales.xlsx"
sqlConnString <- "Driver=SQL Server;Server=''LAPTOP-LCUB6HBB''; Database=Staging;Uid=sa;Pwd=***"

#Check file exists in path
if (file.exists(filepath_var)) {

    tab_names <- excel_sheets(path = filepath_var)
    
   foreach(n = 1:length(tab_names)) %do%
   {
  sqlTable <- tab_names[n]
  Output <- read_excel(path = filepath_var, sheet = n)
   
   #check  if the sheet has header 
   if (dim(Output)[2] != 0) {

  sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = sqlTable)
  rxDataStep(inData = Output, outFile = sqlDS,overwrite = TRUE)
  }
  }
  }'

EXEC   sp_execute_external_script
      @language = N'R'
     ,@script = @RScript




dim(Output)[1] -> gives us no.of rows
dim(Output)[2] -> gives us no.of columns

In this example, if there are no columns I am assuming the sheet is empty and not doing any further processing.


More information on this topic, to be continued in part 5 ...

See Also:

No comments: