In my previous posts, we saw how to copy data from Azure blob storage to Azure Cosmos DB using the Azure data factory copy wizard. In this post, let us see how we can perform the same copy operation by creating JSON definitions for the Linked service, Dataset, Pipeline & Activity from the Azure portal.
First we need to create Azure data factory from Azure portal:
Click New -> Data + Analytics -> Data Factory
After creating the Azure data factory, click on it -> Author and deploy to create the JSON definitions for the Linked service, Dataset, Pipeline & Activity from the Azure portal.
I created the Azure blob storage account and the Azure Cosmos DB SQL API account in my previous posts; they are the source and destination for this Azure data factory copy activity example.
Step 1: Create & deploy Linked services
We can easily get the key for Azure blob storage from Storage Explorer (right-click on the storage account -> Copy Primary Key).
Azure data factory -> Author and deploy -> New data store -> Azure storage
Edit the account name & key in the below JSON and deploy:
{ "name": "AzureStorageLinkedService", "properties": { "description": "", "hubName": "azdatafacv1_hub", "type": "AzureStorage", "typeProperties": { "connectionString": "DefaultEndpointsProtocol=https;AccountName=azblobstore;AccountKey=**********" } } }
To get the Cosmos DB key, copy it from the Azure portal (Cosmos DB account -> Keys).
Azure data factory -> Author and deploy -> New data store -> Azure DocumentDB
Edit the accountendpoint, accountkey & database in the below JSON and deploy:
{ "name": "DocumentDbLinkedService", "properties": { "hubName": "azdatafacv1_hub", "type": "DocumentDb", "typeProperties": { "connectionString": "accountendpoint=https://azcosmosdbsqlapi.documents.azure.com:443/;accountkey=**********;database=SQLDocDB" } } }
Step 2: Create & deploy Datasets
In the below dataset, we are not going to define the structure of the data or any column mapping, as this is an as-is copy of the JSON document.
Azure data factory -> Author and deploy -> ...More -> New dataset -> Azure blob storage
Edit the file name and folder path in the below JSON and deploy:
{ "name": "AzureBlobDataset", "properties": { "published": false, "type": "AzureBlob", "linkedServiceName": "AzureStorageLinkedService", "typeProperties": { "fileName": "Doc3.Json", "folderPath": "azblobcontainer", "format": { "type": "JsonFormat" } }, "availability": { "frequency": "Minute", "interval": 15 }, "external": true } }
Key properties to be noted in the above JSON:
external: Boolean flag to specify whether a dataset is explicitly produced by a data factory pipeline. If an activity's input dataset is not produced by the current pipeline, set this flag to true.
availability: Defines the processing window (for example, hourly or daily) or the slicing model for dataset production. Each unit of data consumed and produced by an activity run is called a data slice.
An interval of 15 minutes is the least we can set for data slicing, as the sketch below illustrates.
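To get a feel for how the availability settings translate into data slices, here is a small illustrative Python sketch (standard library only) that divides a one-day active period into 15-minute slices; the start and end dates simply mirror the times used later in the pipeline JSON in Step 3, so this is an illustration rather than anything Data Factory itself runs.

from datetime import datetime, timedelta

# Active period taken from the pipeline JSON in Step 3 below
start = datetime(2017, 12, 28, 11, 0)
end = datetime(2017, 12, 29, 11, 0)

# Dataset availability: frequency = Minute, interval = 15
interval = timedelta(minutes=15)

slices = []
slice_start = start
while slice_start < end:
    slice_end = min(slice_start + interval, end)
    slices.append((slice_start, slice_end))
    slice_start = slice_end

print(len(slices), "slices")  # 96 slices for the one-day window
print(slices[0])              # first slice: 11:00 to 11:15 on 2017-12-28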
Azure data factory -> Author and deploy -> ...More -> New dataset -> Azure DocumentDB
Edit the Cosmos DB collection name in the below JSON and deploy:
{ "name": "DocumentDbTable", "properties": { "published": false, "type": "DocumentDbCollection", "linkedServiceName": "DocumentDbLinkedService", "typeProperties": { "collectionName": "JsonDocs" }, "availability": { "frequency": "Minute", "interval": 15 }, "external": false } }
Step 3: Create & deploy Pipeline & Activity
Azure data factory -> Author and deploy -> ...More -> New pipeline
Edit the start & end times and the copy activity name in the below JSON and deploy:
{ "name": "AzureBlobtoCosmos", "properties": { "description": "Copy JSON file from Azure blob to Azure Cosmos document DB", "activities": [ { "type": "Copy", "typeProperties": { "source": { "type": "BlobSource" }, "sink": { "type": "DocumentDbCollectionSink", "writeBatchSize": 0, "writeBatchTimeout": "00:00:00" } }, "inputs": [ { "name": "AzureBlobDataset" } ], "outputs": [ { "name": "DocumentDbTable" } ], "name": "Activity-Blob-Doc3_Json->JsonDocs" } ], "start": "2017-12-28T11:00:00.00000Z", "end": "2017-12-29T11:00:00.00000Z", "isPaused": false, "hubName": "azdatafacv1_hub" } }
A pipeline is active only between its start time and end time.
It is not executed before the start time or after the end time.
If the pipeline is paused, it does not get executed irrespective of its start and end time.
Once all the JSON definitions are deployed successfully, go to Azure data factory -> Monitor & Manage (we can change the start and end time, click Apply, and right-click on the pipeline -> Resume to resume it).
Azure data factory -> Diagram
If we double-click the input dataset, we can see the data slicing details.
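Finally, once a slice has run successfully, we can check that the document actually landed in Cosmos DB. Here is a minimal sketch using the azure-cosmos Python package (again outside the portal workflow shown in this post; endpoint and key are placeholders).

from azure.cosmos import CosmosClient

endpoint = "https://azcosmosdbsqlapi.documents.azure.com:443/"
key = "**********"  # placeholder

client = CosmosClient(endpoint, credential=key)
container = client.get_database_client("SQLDocDB").get_container_client("JsonDocs")

# List the documents copied from the blob; a cross-partition query keeps this generic
for item in container.query_items("SELECT * FROM c", enable_cross_partition_query=True):
    print(item["id"])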