I recently came across a customer scenario where hundreds of Azure Data Factory pipelines, all performing similar functions, had to be created across different business domains.
Creating hundreds of pipelines manually is a tiresome job. Imagine if a script could do this for us; life would be much easier, wouldn't it?
So I put together a few PowerShell scripts that can create, run, and even delete Azure Data Factory pipelines. Here we go:
Before performing any action, we need to connect to Azure, either interactively or with a service principal credential.
Connect to Azure Account (Interactive Login)
Connect-AzAccount
Connect to Azure Account using Service Principal
$password = "<Password>" | ConvertTo-SecureString -AsPlainText -Force
$username = "<Application (client) ID>"   # for a service principal, the user name is the application ID
$credential = New-Object System.Management.Automation.PSCredential($username, $password)
Connect-AzAccount -Credential $credential -ServicePrincipal -Tenant "<Tenant ID>"
An account might have access to multiple subscriptions, so it is essential to select the appropriate one before proceeding.
Get Azure Subscriptions
Get-AzSubscription
Select Azure Subscription
Select-AzSubscription -SubscriptionId "<Subscription ID>"
Create a Resource group
This is an optional step; you can use an existing resource group if you want.
$resourceGroupName = "ADFDemoRG";
$ResGrp = New-AzResourceGroup $resourceGroupName -location 'East US'
Create Azure Data Factory
$dataFactoryName = "ADFactoryDemo";   # data factory names must be globally unique
$DataFactory = Set-AzDataFactoryV2 -ResourceGroupName $ResGrp.ResourceGroupName -Location $ResGrp.Location -Name $dataFactoryName
To create the Azure Storage linked service for the datasets, save the following template as BlobLinkedService.json:
{
    "name": "AzureStorageLinkedService",
    "properties": {
        "annotations": [],
        "type": "AzureBlobStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<Account Name>;AccountKey=<Account Key>;EndpointSuffix=core.windows.net"
        }
    }
}
Create Azure Storage Account Linked Service
Set-AzDataFactoryV2LinkedService -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "AzureStorageLinkedService" -DefinitionFile ".\BlobLinkedService.json"
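Since the goal is to stamp out many similar resources, it also helps to fill the template's placeholders from a script instead of hand-editing the JSON. A minimal sketch, assuming the placeholder tokens shown above; the account name, variable names, and generated file path are illustrative, not from the original post:

```powershell
# Sketch: substitute the <Account Name>/<Account Key> placeholders in the
# template and write a generated definition file. Values are illustrative.
$accountName = "mystorageaccount"   # assumption: your storage account name
$accountKey  = "<Account Key>"      # keep the real key out of source control

(Get-Content ".\BlobLinkedService.json" -Raw) `
    -replace '<Account Name>', $accountName `
    -replace '<Account Key>',  $accountKey |
    Set-Content ".\BlobLinkedService.generated.json"

# Register the linked service from the generated file.
Set-AzDataFactoryV2LinkedService -DataFactoryName $DataFactory.DataFactoryName `
    -ResourceGroupName $ResGrp.ResourceGroupName `
    -Name "AzureStorageLinkedService" `
    -DefinitionFile ".\BlobLinkedService.generated.json"
```

The same token-substitution trick works for the dataset and pipeline templates below.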
To create the input dataset, save the following template as InputDataset.json:
{
    "name": "InputDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": "emp.txt",
                "folderPath": "input",
                "container": "adftutorial"
            }
        }
    }
}
Create Input Dataset
Set-AzDataFactoryV2Dataset -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "InputDataset" -DefinitionFile ".\InputDataset.json"
To create the output dataset, save the following template as OutputDataset.json:
{
    "name": "OutputDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "Binary",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "folderPath": "output",
                "container": "adftutorial"
            }
        }
    }
}
Create Output Dataset
Set-AzDataFactoryV2Dataset -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "OutputDataset" -DefinitionFile ".\OutputDataset.json"
To create the Azure Data Factory pipeline, save the following template as Adfv2QuickStartPipeline.json:
{
    "name": "Adfv2QuickStartPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToBlob",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "BinarySource",
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSettings",
                            "recursive": true
                        }
                    },
                    "sink": {
                        "type": "BinarySink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings"
                        }
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "InputDataset",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "OutputDataset",
                        "type": "DatasetReference"
                    }
                ]
            }
        ],
        "annotations": []
    }
}
Create a Pipeline
$DFPipeLine = Set-AzDataFactoryV2Pipeline -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "Adfv2QuickStartPipeline" -DefinitionFile ".\Adfv2QuickStartPipeline.json"
Start an Azure Data Factory Pipeline Run
$RunId = Invoke-AzDataFactoryV2Pipeline -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -PipelineName $DFPipeLine.Name
Monitor the pipeline run for Success or Failure
while ($True) {
    $Run = Get-AzDataFactoryV2PipelineRun -ResourceGroupName $ResGrp.ResourceGroupName -DataFactoryName $DataFactory.DataFactoryName -PipelineRunId $RunId
    if ($Run -and $Run.Status -ne 'InProgress') {
        Write-Output ("Pipeline run finished. The status is: " + $Run.Status)
        $Run
        break
    }
    Start-Sleep -Seconds 10
}
Remove Data Factory Pipeline
Remove-AzDataFactoryV2Pipeline -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "Adfv2QuickStartPipeline" -Force
Remove Output dataset
Remove-AzDataFactoryV2Dataset -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "OutputDataset" -Force
Remove Input dataset
Remove-AzDataFactoryV2Dataset -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "InputDataset" -Force
Remove Linked Service
Remove-AzDataFactoryV2LinkedService -DataFactoryName $DataFactory.DataFactoryName -ResourceGroupName $ResGrp.ResourceGroupName -Name "AzureStorageLinkedService" -Force
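Coming back to the original scenario of hundreds of near-identical pipelines: once the steps above work for one pipeline, you can wrap the creation step in a loop over your business domains. A rough sketch, where the domain list and the naming pattern are made up for illustration:

```powershell
# Sketch: create one copy pipeline per business domain by rewriting the
# quick-start template. Domain names and the naming pattern are illustrative.
$domains = @("Sales", "Finance", "HR")   # assumption: your business domains

foreach ($domain in $domains) {
    $pipelineName = "CopyPipeline_$domain"

    # Reuse the quick-start definition, swapping in a per-domain pipeline name.
    (Get-Content ".\Adfv2QuickStartPipeline.json" -Raw) `
        -replace 'Adfv2QuickStartPipeline', $pipelineName |
        Set-Content ".\$pipelineName.json"

    Set-AzDataFactoryV2Pipeline -DataFactoryName $DataFactory.DataFactoryName `
        -ResourceGroupName $ResGrp.ResourceGroupName `
        -Name $pipelineName `
        -DefinitionFile ".\$pipelineName.json"
}
```

In a real setup you would likely also vary the dataset folder paths or linked services per domain, using the same template-substitution approach.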
Give these scripts a try and get your hands dirty.
Happy Coding!!!
Categories: Automation, Azure, Azure Data Factory, Powershell