Highlight
Hitting a limits of 256 paramters in Data Factory, or maybe you are tired to parametrizing dozens of pipelines. If yes, this blog entry is for you.
Intro
In practice, large ADF estates hit a scaling problem: every pipeline needs settings, yet ARM templates only allow 256 parameters. This might sound like a lot, but in reality with just 20 unique pipelines this might already become a problem.
This blog explains why that happens and how an external metadata/config table solves it.
Solution
The clean solution is to move pipeline configuration into an external table. That table becomes the single source of truth for pipeline options, dataset mappings, file locations, and environment-specific values.
Using an external mapping/config table lets you:
- keep ARM templates lean and stable
- avoid 256 parameter limits
- update configuration without redeploying entire data factory
- reuse metadata across multiple pipelines
Implementation
The high-level flow is:
- create an external table for pipeline metadata
- populate it with pipeline/parameter mappings (add config)
- read the table from ADF using a lookup activity
- map returned values into pipeline parameters and activities
Details
Table Creation and Adding metadata
Create a simple metadata table.
CREATE TABLE dbo.PipelineConfig
(
PipelineName VARCHAR(100) NOT NULL,
Parameter VARCHAR(100) NOT NULL,
Value VARCHAR(255) NULL
);
GO
INSERT INTO dbo.PipelineConfig (PipelineName, Parameter, Value)
VALUES
('Pipeline A', 'Region', 'US'),
('Pipeline A', 'Timeout', '30'),
('Pipeline B', 'Enabled', 'true');
GO
Example usage
Alternatives
You can use different metadata storage options like Table Storage or Cosmos DB, it’s really yp to you. I like SQL as it provides a bit more control over the data and schema + more rich query language.
Adding ADF MI to read it
Ensure ADF can connect to SQL and read the data. You can technically use SQL auth, but it’s not recommended these days.
CREATE USER [adf_name] FROM EXTERNAL PROVIDER;
USE db_name;
ALTER ROLE DB_DATAREADER ADD MEMBER [adf_name];
Example usage
Lookup
Now add a new linkedservice to Azure SQL using managed identity as authentication. Ensure database name is set, otherwise it will fail trying to connect to master, which it has no permissions to do so.
Lookup query
SELECT *
FROM
(
SELECT PipelineName, Parameter, Value
FROM dbo.PipelineConfig
WHERE PipelineName = 'Pipeline A'
) src
PIVOT
(
MAX(Value)
FOR Parameter IN ([Region], [Timeout])
) p;
If you want you can use query which automatically prepares the pivot.
DECLARE @PipelineName sysname = 'Pipeline A';
DECLARE @cols nvarchar(max);
DECLARE @sql nvarchar(max);
SELECT @cols = STRING_AGG(QUOTENAME(Parameter), ',')
FROM
(
SELECT DISTINCT Parameter
FROM dbo.PipelineConfig
WHERE PipelineName = @PipelineName
) p;
SET @sql = N'
SELECT PipelineName, ' + @cols + N'
FROM
(
SELECT PipelineName, Parameter, Value
FROM dbo.PipelineConfig
WHERE PipelineName = @PipelineName
) src
PIVOT
(
MAX(Value)
FOR Parameter IN (' + @cols + N')
) p;';
EXEC sp_executesql
@sql,
N'@PipelineName sysname',
@PipelineName = @PipelineName;
Testing
Now just testing remains
Using the values in your pipelines
Now it’s really as simple as
@activity('Get Params').output.firstRow.<paramName>
Like so
@activity('Get Params').output.firstRow.Region
Conclusion
This is a very scalable and mature approach to metadata driven pipeline management. Instead of thousands of parameters all over, a single config table to manage them all. You can also define global parameters like this too!
Related Reading
- Azure Data Factory parameterize your pipeline
- Use Azure managed identity with Azure Data Factory
- Azure SQL Database authentication with managed identity
- Deploy resources with Azure Resource Manager templates
- Create metadata-driven data integration solutions in Azure Data Factory
Happy building! 🚀