Highlight
If you need to set up CI/CD for your Data Factory, here is how to do it in less than 30 minutes!
Summary
Many of us use Data Factory almost daily in our data engineering jobs, but few of us do it with proper DevOps continuous-delivery standards. In this short tutorial I'm sharing templates that you can add to your Data Factory repository in about 30 minutes to get clean, automated deployments to your environments.
Setup guide
Step 1 - Azure Data Factory
- Create a folder /devops/ and, in that folder, create the following files
- Create file package.json
{
  "scripts": {
    "build": "node node_modules/@microsoft/azure-data-factory-utilities/lib/index"
  },
  "dependencies": {
    "@microsoft/azure-data-factory-utilities": "^0.1.5"
  }
}
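The build script above wraps the ADF validate/export utilities, so you can also try them locally before wiring up the pipeline. A sketch (the paths and the resource ID are placeholders you must replace; assumes Node.js 14+ and npm are installed):

```shell
# Install the package defined in package.json (run from the /devops/ folder):
npm install

# Validate every Data Factory resource in the repo.
# Arguments: <root folder of ADF resources> <full ADF resource ID>
npm run build validate /path/to/repo/data-factory \
  "/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.DataFactory/factories/<factory_name>"

# Validate and export the ARM template into ./data-factory-arm:
npm run build export /path/to/repo/data-factory \
  "/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.DataFactory/factories/<factory_name>" \
  "data-factory-arm"
```

These are the same two commands the build job below runs via `npm run build`.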
- Create file adf-build-job.yml
parameters:
  - name: subscriptionId
    type: string
  - name: resourceGroupName
    type: string
  - name: dataFactoryName
    type: string
  - name: repoRootFolder
    type: string
    default: /
  - name: packageJsonFolder
    type: string
    default: /
  - name: artifactName
    type: string
    default: data-factory

jobs:
  - job: BUILD
    displayName: 'Build ARM Template'
    variables:
      workingDirectory: $(Build.Repository.LocalPath)${{ parameters.repoRootFolder }}
      packageJsonFolder: $(Build.Repository.LocalPath)${{ parameters.packageJsonFolder }}
      dataFactoryResourceId: /subscriptions/${{ parameters.subscriptionId }}/resourceGroups/${{ parameters.resourceGroupName }}/providers/Microsoft.DataFactory/factories/${{ parameters.dataFactoryName }}
      artifactTempDirectory: data-factory-arm
    steps:
      - task: NodeTool@0
        inputs:
          versionSpec: '14.x'
        displayName: 'Install Node.js'
      - task: Npm@1
        inputs:
          command: 'install'
          workingDir: $(packageJsonFolder)
          verbose: true
        displayName: 'Install npm package'
      - task: Npm@1
        inputs:
          command: 'custom'
          workingDir: $(packageJsonFolder)
          customCommand: 'run build validate $(workingDirectory) $(dataFactoryResourceId)'
        displayName: 'Validate'
      - task: Npm@1
        inputs:
          command: 'custom'
          workingDir: $(packageJsonFolder)
          customCommand: 'run build export $(workingDirectory) $(dataFactoryResourceId) "$(artifactTempDirectory)"'
        displayName: 'Validate and Generate ARM template'
      - script: 'mv -v $(packageJsonFolder)$(artifactTempDirectory)/${{ parameters.dataFactoryName }}_GlobalParameters.json $(packageJsonFolder)$(artifactTempDirectory)/GlobalParameters.json'
        displayName: 'Rename Global Parameters file'
      - task: PublishPipelineArtifact@1
        inputs:
          targetPath: '$(packageJsonFolder)$(artifactTempDirectory)'
          artifact: ${{ parameters.artifactName }}
- Create a new pipeline at /devops/adf-azure-pipelines.yml and paste in the sample code
trigger:
  - main

pool:
  vmImage: ubuntu-latest

stages:
  - stage: BUILD
    jobs:
      - template: <path_to_adf-build-job.yml_file>
        parameters:
          subscriptionId: <subscription_id>
          resourceGroupName: <resource_group_name>
          dataFactoryName: <data_factory_name>
          repoRootFolder: <absolute_path_to_datafactory_folder, ex. /data-factory/>
          packageJsonFolder: <absolute_path_to_package_json, ex. /devops/>
- Replace the placeholder values with your environment values
- Run and test the pipeline
- In the /devops/ folder, create the file adf-deploy-job.yml
parameters:
  - name: environmentName
    type: string
  - name: serviceConnectionName
    type: string
  - name: subscriptionId
    type: string
  - name: resourceGroupName
    type: string
  - name: dataFactoryName
    type: string
  - name: location
    type: string
    default: westeurope
  - name: artifactName
    type: string
    default: data-factory
  - name: overrideParameters
    type: string
    default: ''

jobs:
  - deployment: ${{ parameters.environmentName }}
    displayName: Deployment to ${{ parameters.environmentName }}
    variables:
      - name: artifactsDirectory
        value: $(System.ArtifactsDirectory)/data-factory
    environment: '${{ parameters.environmentName }}'
    strategy:
      runOnce:
        deploy:
          steps:
            - script: echo Deploying to ${{ parameters.environmentName }}
              displayName: 'Script - Display Environment Stage Name'
            - task: DownloadPipelineArtifact@2
              displayName: 'Download ADF ARM Template'
              inputs:
                source: current
                artifact: ${{ parameters.artifactName }}
                downloadPath: $(artifactsDirectory)
            - script: 'ls $(artifactsDirectory)'
              displayName: 'List Artifact contents'
            - task: AzurePowerShell@5
              displayName: 'Stop Triggers'
              inputs:
                azureSubscription: '${{ parameters.serviceConnectionName }}'
                ScriptPath: '$(artifactsDirectory)/PrePostDeploymentScript.ps1'
                ScriptArguments: "-armTemplate $(artifactsDirectory)/ARMTemplateForFactory.json \
                  -ResourceGroupName ${{ parameters.resourceGroupName }} \
                  -DataFactoryName ${{ parameters.dataFactoryName }} \
                  -predeployment $true \
                  -deleteDeployment $false"
                azurePowerShellVersion: LatestVersion
            - task: AzureResourceManagerTemplateDeployment@3
              displayName: 'ARM Template deployment'
              inputs:
                azureResourceManagerConnection: '${{ parameters.serviceConnectionName }}'
                subscriptionId: '${{ parameters.subscriptionId }}'
                resourceGroupName: '${{ parameters.resourceGroupName }}'
                location: ${{ parameters.location }}
                csmFile: '$(artifactsDirectory)/ARMTemplateForFactory.json'
                csmParametersFile: '$(artifactsDirectory)/ARMTemplateParametersForFactory.json'
                overrideParameters: >
                  -factoryName ${{ parameters.dataFactoryName }}
                  ${{ parameters.overrideParameters }}
            - task: AzurePowerShell@5
              displayName: 'Deploy Global Parameters'
              inputs:
                azureSubscription: '${{ parameters.serviceConnectionName }}'
                ScriptPath: '$(artifactsDirectory)/GlobalParametersUpdateScript.ps1'
                ScriptArguments: "-globalParametersFilePath $(artifactsDirectory)/GlobalParameters.json \
                  -resourceGroupName ${{ parameters.resourceGroupName }} \
                  -dataFactoryName ${{ parameters.dataFactoryName }}"
                azurePowerShellVersion: LatestVersion
            - task: AzurePowerShell@5
              displayName: 'Start Triggers'
              inputs:
                azureSubscription: '${{ parameters.serviceConnectionName }}'
                ScriptPath: '$(artifactsDirectory)/PrePostDeploymentScript.ps1'
                ScriptArguments: "-armTemplate $(artifactsDirectory)/ARMTemplateForFactory.json \
                  -ResourceGroupName ${{ parameters.resourceGroupName }} \
                  -DataFactoryName ${{ parameters.dataFactoryName }} \
                  -predeployment $false \
                  -deleteDeployment $false"
                azurePowerShellVersion: LatestVersion
- In Azure DevOps, navigate to Project Settings » Service Connections and create a new connection to Azure using a Service Principal. Grant it at least the Data Factory Contributor role on every data factory you will deploy to
- In the Azure Portal, navigate to Azure Active Directory and create a new App Registration
- For ADF-only pipelines, grant the Data Factory Contributor role on the Azure Data Factory resource; for full CI/CD in Azure, grant the Contributor role on the entire resource group
- Copy the service principal and subscription details into Azure DevOps
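If you prefer the CLI over the portal, the app registration and role assignment can be sketched with the Azure CLI (the service principal name is hypothetical; replace the bracketed IDs with your own):

```shell
# ADF-only pipelines: service principal with Data Factory Contributor
# scoped to a single factory.
az ad sp create-for-rbac \
  --name "sp-adf-cicd" \
  --role "Data Factory Contributor" \
  --scopes "/subscriptions/<subscription_id>/resourceGroups/<resource_group_name>/providers/Microsoft.DataFactory/factories/<data_factory_name>"

# Full CI/CD: Contributor on the whole resource group instead.
az ad sp create-for-rbac \
  --name "sp-adf-cicd" \
  --role "Contributor" \
  --scopes "/subscriptions/<subscription_id>/resourceGroups/<resource_group_name>"
```

The command prints the appId, password, and tenant values that the Azure DevOps service connection form asks for.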
- Create an Environment (Pipelines » Environments)
- Create a Variable Group (Pipelines » Library » Variable Groups)
- Add the variables that need to be overridden in the ADF template (factoryName is overridden by default)
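ADF derives ARM parameter names from the resource name and property path, so the names to override are whatever appears in your generated ARMTemplateParametersForFactory.json. A hypothetical example, using linked-service names and variable-group variables that are placeholders, of what you might pass through overrideParameters in the stage below:

```yaml
# Hypothetical override string; $(...) values come from the variable group.
# factoryName is already passed by the deploy template, so list only extras.
overrideParameters: >
  -LS_AzureKeyVault_properties_typeProperties_baseUrl $(keyVaultBaseUrl)
  -LS_AzureSqlDatabase_properties_typeProperties_connectionString $(sqlConnectionString)
```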
- Edit your adf-azure-pipelines.yml file and add this stage
- stage: <stage_name>
  variables:
    - group: <variable_group_name>
  jobs:
    - template: <path_to_adf-deploy-job.yml_file>
      parameters:
        environmentName: <environment_name>
        subscriptionId: <subscription_id>
        resourceGroupName: <resource_group_name>
        dataFactoryName: <data_factory_name>
        location: <data_factory_region>
        serviceConnectionName: <service_connection_name>
        overrideParameters: >
          -<param_1> <val_1>
- Replace the placeholder values with your environment values
- Run and test the pipeline
- Add any extra deployment stages you need by repeating the environment, variable group, and deploy-stage steps above
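Putting it all together, a finished adf-azure-pipelines.yml with one build stage and two deployment stages might look like this sketch (all bracketed values, and the TEST/PROD stage names, are placeholders):

```yaml
trigger:
  - main

pool:
  vmImage: ubuntu-latest

stages:
  # Export and publish the ARM template from the dev factory.
  - stage: BUILD
    jobs:
      - template: adf-build-job.yml
        parameters:
          subscriptionId: <subscription_id>
          resourceGroupName: <dev_resource_group>
          dataFactoryName: <dev_data_factory>
          repoRootFolder: /data-factory/
          packageJsonFolder: /devops/

  # Deploy the published artifact to each environment in turn.
  - stage: TEST
    variables:
      - group: <test_variable_group>
    jobs:
      - template: adf-deploy-job.yml
        parameters:
          environmentName: TEST
          subscriptionId: <subscription_id>
          resourceGroupName: <test_resource_group>
          dataFactoryName: <test_data_factory>
          location: westeurope
          serviceConnectionName: <service_connection_name>

  - stage: PROD
    variables:
      - group: <prod_variable_group>
    jobs:
      - template: adf-deploy-job.yml
        parameters:
          environmentName: PROD
          subscriptionId: <subscription_id>
          resourceGroupName: <prod_resource_group>
          dataFactoryName: <prod_data_factory>
          location: westeurope
          serviceConnectionName: <service_connection_name>
```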
Hope this helps! Good luck!
Source Code