Deploy Azure Data Factory Using PowerShell

I recently came across a scenario where a customer had to create hundreds of Azure Data Factory pipelines, each performing a similar function, to be used across different business domains.

Creating hundreds of pipelines manually is a tiresome job. Imagine we had a script that could do it for us; our lives would be much easier, wouldn't they?

So I put together a few PowerShell scripts that can create, and even delete, Azure Data Factory pipelines in the cloud. Here we go:

To perform any action, we first need to connect to Azure, either interactively or with a service principal credential.

Connect to Azure Account (interactive login)

Connect-AzAccount

Connect to Azure Account using Service Principal

$password = "<Client Secret>" | ConvertTo-SecureString -AsPlainText -Force
$username = "<Application (Client) ID>"
$credential = New-Object System.Management.Automation.PSCredential($username, $password)
Connect-AzAccount -Credential $credential -ServicePrincipal -Tenant "<Tenant ID>"

Each account might have access to multiple subscriptions, so it is essential to select the appropriate one.

Get Azure Subscription

Get-AzSubscription

Select Azure Subscription

Select-AzSubscription -SubscriptionId "<Subscription ID>"

Create a Resource group

This is an optional step; you can use an existing resource group if you prefer.

$resourceGroupName = "ADFDemoRG";
$ResGrp = New-AzResourceGroup -Name $resourceGroupName -Location 'East US'

Create Azure Data Factory

The data factory name must be globally unique, so adjust it if the cmdlet reports a naming conflict.

$dataFactoryName = "ADFactoryDemo";
$DataFactory = Set-AzDataFactoryV2 -ResourceGroupName $ResGrp.ResourceGroupName -Location $ResGrp.Location -Name $dataFactoryName

To create the Azure Storage linked service for the datasets, use the template BlobLinkedService.json:

{
    "name": "AzureStorageLinkedService",
    "properties": {
        "annotations": [],
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<Account Name>;AccountKey=<Account Key>;EndpointSuffix=core.windows.net"
        }
    }
}

Create Azure Storage Account Linked Service

Set-AzDataFactoryV2LinkedService -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "AzureStorageLinkedService" -DefinitionFile ".\BlobLinkedService.json"

Create the input dataset using the template InputDataset.json:

{
    "name": "InputDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": "emp.txt",
                "folderPath": "input",
                "container": "adftutorial"
            }
        }
    }
}

Create Input Dataset

Set-AzDataFactoryV2Dataset -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "InputDataset" -DefinitionFile ".\InputDataset.json"

Create the output dataset using the template OutputDataset.json:

{
    "name": "OutputDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "folderPath": "output",
                "container": "adftutorial"
            }
        }
    }
}

Create Output Dataset

Set-AzDataFactoryV2Dataset -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "OutputDataset" -DefinitionFile ".\OutputDataset.json"

Create the Azure Data Factory pipeline using the template Adfv2QuickStartPipeline.json:

{
    "name": "Adfv2QuickStartPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToBlob",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "BinarySource",
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSettings",
                            "recursive": true
                        }
                    },
                    "sink": {
                        "type": "BinarySink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings"
                        }
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "InputDataset",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "OutputDataset",
                        "type": "DatasetReference"
                    }
                ]
            }
        ],
        "annotations": []
    }
}

Create a Pipeline

$DFPipeLine = Set-AzDataFactoryV2Pipeline -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "Adfv2QuickStartPipeline" -DefinitionFile ".\Adfv2QuickStartPipeline.json"
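Since the whole point was to create hundreds of similar pipelines, the same cmdlet can be driven from a loop. Here is a minimal sketch; the domain list and per-domain pipeline names are hypothetical, and it reuses the single JSON definition created above:

# Hypothetical list of business domains, each needing its own copy of the pipeline
$domains = @("Sales", "Finance", "HR")

foreach ($domain in $domains) {
    # Only the pipeline name changes per domain; the definition file stays the same
    $pipelineName = "CopyPipeline-$domain"
    Set-AzDataFactoryV2Pipeline -DataFactoryName $DataFactory.DataFactoryName `
        -ResourceGroupName $ResGrp.ResourceGroupName `
        -Name $pipelineName `
        -DefinitionFile ".\Adfv2QuickStartPipeline.json"
}

In practice you would read the domain names (and any per-domain settings, such as folder paths) from a CSV or JSON file and substitute them into the template before each call.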

Create Azure Data Factory Pipeline Run

$RunId = Invoke-AzDataFactoryV2Pipeline -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -PipelineName $DFPipeLine.Name

Monitor the pipeline run for Success or Failure

while ($true) {
    $Run = Get-AzDataFactoryV2PipelineRun -ResourceGroupName $ResGrp.ResourceGroupName `
        -DataFactoryName $DataFactory.DataFactoryName -PipelineRunId $RunId
    if ($Run) {
        if ($Run.Status -ne 'InProgress') {
            Write-Output ("Pipeline run finished. The status is: " + $Run.Status)
            $Run
            break
        }
    }
    Start-Sleep -Seconds 10
}

Remove Data Factory Pipeline

Remove-AzDataFactoryV2Pipeline -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "Adfv2QuickStartPipeline" -Force

Remove Output dataset

Remove-AzDataFactoryV2Dataset -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "OutputDataset" -Force

Remove Input dataset

Remove-AzDataFactoryV2Dataset -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "InputDataset" -Force

Remove Linked Service

Remove-AzDataFactoryV2LinkedService -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "AzureStorageLinkedService" -Force

Give these scripts a try and get your hands dirty.

Happy coding!



Categories: Automation, Azure, Azure Data Factory, Powershell

