Data Factory pipeline parametrization via external mapping/config tables

Highlight

Are your tired of parametrizing dozens of pipelines via templates, maybe you want a more metadata driven approach, or simply you are hitting ADF temmplate parameter limits. If yes, this blog entry is for you.

Intro

If you manage large data factories, and you’ve been hit by any of following challenges,

Azure Data Factory 256 template parameters limit
Hundreds of parameters to override when deploying your factories
Dozens of global or shared parameters
A lot of change management effort to change many parameters in many places
Changing parameter without redeploying entire data factory

then this is for you.

Solution

The clean solution is to move pipeline configuration into an external table. That table becomes the single source of truth for pipeline options, dataset mappings, file locations, and environment-specific values.

Using an external mapping/config table lets you:

keep ARM templates lean and stable
avoid 256 parameter limits
update configuration without redeploying entire data factory
reuse metadata across multiple pipelines

Alternative metadata storage options

Microsoft Fabric Lakehouse / DB
Azure Storage Tables
Azure Cosmos DB

Implementation

The high-level flow is:

create an external table for pipeline metadata
populate it with pipeline/parameter mappings (add config)
read the table from ADF using a lookup activity
map returned values into pipeline parameters and activities

Details

Table Creation and Adding metadata

Create a simple metadata table.

CREATE TABLE dbo.PipelineConfig
(
    PipelineName VARCHAR(100) NOT NULL,
    Parameter    VARCHAR(100) NOT NULL,
    Value        VARCHAR(255) NULL
);
GO

INSERT INTO dbo.PipelineConfig (PipelineName, Parameter, Value)
VALUES
    ('Pipeline A', 'Region',  'US'),
    ('Pipeline A', 'Timeout', '30'),
    ('Pipeline B', 'Enabled', 'true');
GO

Example usage (Azure SQL)

Example usage (Fabric SQL)

Alternatives

You can use different metadata storage options like Table Storage or Cosmos DB, it’s really yp to you. I like SQL as it provides a bit more control over the data and schema + more rich query language.

(ADF) Adding Managed Identity reader permissions

Note: This step is NOT required in Microsoft Fabric.

Ensure ADF can connect to SQL and read the data. You can technically use SQL auth, but it’s not recommended these days.

CREATE USER [adf_name] FROM EXTERNAL PROVIDER;
USE db_name;
ALTER ROLE DB_DATAREADER ADD MEMBER [adf_name];

Example usage

Now add a new linkedservice to Azure SQL using managed identity as authentication. Ensure database name is set, otherwise it will fail trying to connect to master, which it has no permissions to do so.

Azure Data Factory linked service

Lookup

Lookup query

SELECT *
FROM
(
    SELECT PipelineName, Parameter, Value
    FROM dbo.PipelineConfig
    WHERE PipelineName = 'Pipeline A'
) src
PIVOT
(
    MAX(Value)
    FOR Parameter IN ([Region], [Timeout])
) p;

Azure Data Factory

Microsoft Fabric

If you want you can use query which automatically prepares the pivot.

DECLARE @PipelineName sysname = 'Pipeline A';
DECLARE @cols nvarchar(max);
DECLARE @sql  nvarchar(max);

SELECT @cols = STRING_AGG(QUOTENAME(Parameter), ',')
FROM
(
    SELECT DISTINCT Parameter
    FROM dbo.PipelineConfig
    WHERE PipelineName = @PipelineName
) p;

SET @sql = N'
SELECT PipelineName, ' + @cols + N'
FROM
(
    SELECT PipelineName, Parameter, Value
    FROM dbo.PipelineConfig
    WHERE PipelineName = @PipelineName
) src
PIVOT
(
    MAX(Value)
    FOR Parameter IN (' + @cols + N')
) p;';

EXEC sp_executesql
    @sql,
    N'@PipelineName sysname',
    @PipelineName = @PipelineName;

Testing

Now just testing remains

Azure Data Factory

Microsoft Fabric

Using the values in your pipelines

Now it’s really as simple as

@activity('Get Params').output.firstRow.<paramName>

Like so

@activity('Get Params').output.firstRow.Region

CICD?

If you plan to deploy this via CICD, remember to also deploy metadata of this table alongside the Data Factory to avoid human mistakes.

Deploying Azure SQL DB is not part of this blog because it’s fairly common practice, but it also depends heavily on the choosen DevOps tool.

Conclusion

This is a very scalable and mature approach to metadata driven pipeline management. Instead of thousands of parameters all over, a single config table to manage them all. You can also define global parameters like this too!

Happy building! 🚀