How to load data from Azure Data Lake Store into Azure SQL Data Warehouse

In my previous post, I wrote about getting started with Azure data lake analytics and data lake store. There I have mentioned how to create Azure data lake analytics and data lake store.

Then I had shared a basic example for U-SQL, the U-SQL output in this post is saved into data lake store folder which I am going to use in this example for loading into Azure SQL data warehouse.

Also have a look at this article for establishing Azure data lake store authentication, which has to be done as mandatory step to access the files in data lake store and load into Azure datawarehouse.

Creating Azure SQL datawarehouse from Azure portal as shown below:





Connect to the created Azure datawarehouse from SSMS, need to enable the firewall to access the server:

Create Master key:

CREATE MASTER KEY


CREATE DATABASE SCOPED CREDENTIAL


CREATE DATABASE SCOPED CREDENTIAL ADLSCredential
WITH IDENTITY = '8a4740d2-80c4-4475-a158-a072025a37d5@https://login.microsoftonline.com/3de8b9d9-c82d-4bd4-94c7-79e4e795ce76/oauth2/token',
SECRET = 'eEF1nvcKzwy4ahNg7tJ80P94gN1Jgx5L545vrEt0wZM=';



CREATE EXTERNAL DATA SOURCE

Get the ADL URI from Azure data lake store -> Overview

CREATE EXTERNAL DATA SOURCE AzureDataLakeStore
WITH (
    TYPE = HADOOP,
    LOCATION = 'adl://msazdlaadls.azuredatalakestore.net',
    CREDENTIAL = ADLSCredential
);





CREATE EXTERNAL FILE FORMAT


CREATE EXTERNAL FILE FORMAT CSVFileFormat
WITH (  
       FORMAT_TYPE = DELIMITEDTEXT,   
       FORMAT_OPTIONS (
         FIELD_TERMINATOR =',',
         STRING_DELIMITER = '"',   
         USE_TYPE_DEFAULT = TRUE
       )
);


CREATE EXTERNAL TABLE



CREATE EXTERNAL TABLE [dbo].[shift_ext_ADLS] (  
        [Name] VARCHAR(20) ,   
        [Id] int 
)  
WITH (LOCATION='/myoutput/shift.csv',   
        DATA_SOURCE = AzureDataLakeStore,  
        FILE_FORMAT = CSVFileFormat 
);  


Query the above created external table (data from data lake store) from datawarehouse


To load the data into datawarehouse from above created external table, execute below CREATE TABLE AS SELECT (CTAS) query



CREATE TABLE [dbo].[Shift]
WITH (DISTRIBUTION = HASH([Id]  ) )
AS
SELECT * FROM [dbo].[shift_ext_ADLS]
OPTION (LABEL = 'CTAS : Load [dbo].[Shift]');




Reference: 


See Also: 


No comments: