Highlight

Are your tired of parametrizing dozens of pipelines via templates, maybe you want a more metadata driven approach, or simply you are hitting ADF temmplate parameter limits. If yes, this blog entry is for you.

Intro

If you manage large data factories, and you’ve been hit by any of following challenges,

  • Azure Data Factory 256 template parameters limit
  • Hundreds of parameters to override when deploying your factories
  • Dozens of global or shared parameters
  • A lot of change management effort to change many parameters in many places
  • Changing parameter without redeploying entire data factory

then this is for you.

Solution

The clean solution is to move pipeline configuration into an external table. That table becomes the single source of truth for pipeline options, dataset mappings, file locations, and environment-specific values.

Using an external mapping/config table lets you:

  • keep ARM templates lean and stable
  • avoid 256 parameter limits
  • update configuration without redeploying entire data factory
  • reuse metadata across multiple pipelines

Alternative metadata storage options

  • Microsoft Fabric Lakehouse / DB
  • Azure Storage Tables
  • Azure Cosmos DB

Implementation

The high-level flow is:

  1. create an external table for pipeline metadata
  2. populate it with pipeline/parameter mappings (add config)
  3. read the table from ADF using a lookup activity
  4. map returned values into pipeline parameters and activities

Details

Table Creation and Adding metadata

Create a simple metadata table.

CREATE TABLE dbo.PipelineConfig
(
    PipelineName VARCHAR(100) NOT NULL,
    Parameter    VARCHAR(100) NOT NULL,
    Value        VARCHAR(255) NULL
);
GO

INSERT INTO dbo.PipelineConfig (PipelineName, Parameter, Value)
VALUES
    ('Pipeline A', 'Region',  'US'),
    ('Pipeline A', 'Timeout', '30'),
    ('Pipeline B', 'Enabled', 'true');
GO

Example usage (Azure SQL)

Example usage (Fabric SQL)

Alternatives

You can use different metadata storage options like Table Storage or Cosmos DB, it’s really yp to you. I like SQL as it provides a bit more control over the data and schema + more rich query language.

(ADF) Adding Managed Identity reader permissions

Note: This step is NOT required in Microsoft Fabric.

Ensure ADF can connect to SQL and read the data. You can technically use SQL auth, but it’s not recommended these days.

CREATE USER [adf_name] FROM EXTERNAL PROVIDER;
USE db_name;
ALTER ROLE DB_DATAREADER ADD MEMBER [adf_name];

Example usage

Now add a new linkedservice to Azure SQL using managed identity as authentication. Ensure database name is set, otherwise it will fail trying to connect to master, which it has no permissions to do so.

Azure Data Factory linked service

Lookup

Lookup query

SELECT *
FROM
(
    SELECT PipelineName, Parameter, Value
    FROM dbo.PipelineConfig
    WHERE PipelineName = 'Pipeline A'
) src
PIVOT
(
    MAX(Value)
    FOR Parameter IN ([Region], [Timeout])
) p;

Azure Data Factory

Microsoft Fabric

If you want you can use query which automatically prepares the pivot.

DECLARE @PipelineName sysname = 'Pipeline A';
DECLARE @cols nvarchar(max);
DECLARE @sql  nvarchar(max);

SELECT @cols = STRING_AGG(QUOTENAME(Parameter), ',')
FROM
(
    SELECT DISTINCT Parameter
    FROM dbo.PipelineConfig
    WHERE PipelineName = @PipelineName
) p;

SET @sql = N'
SELECT PipelineName, ' + @cols + N'
FROM
(
    SELECT PipelineName, Parameter, Value
    FROM dbo.PipelineConfig
    WHERE PipelineName = @PipelineName
) src
PIVOT
(
    MAX(Value)
    FOR Parameter IN (' + @cols + N')
) p;';

EXEC sp_executesql
    @sql,
    N'@PipelineName sysname',
    @PipelineName = @PipelineName;

Testing

Now just testing remains

Azure Data Factory

Microsoft Fabric

Using the values in your pipelines

Now it’s really as simple as

@activity('Get Params').output.firstRow.<paramName>

Like so

@activity('Get Params').output.firstRow.Region

CICD?

If you plan to deploy this via CICD, remember to also deploy metadata of this table alongside the Data Factory to avoid human mistakes.

Deploying Azure SQL DB is not part of this blog because it’s fairly common practice, but it also depends heavily on the choosen DevOps tool.

Conclusion

This is a very scalable and mature approach to metadata driven pipeline management. Instead of thousands of parameters all over, a single config table to manage them all. You can also define global parameters like this too!

Happy building! 🚀

Adam Marczak

I've spent most of my career working with software and cloud technologies, but at heart I'm simply someone who loves learning new things and sharing what I discover. Through this blog and my Azure 4 Everyone YouTube channel, I try to make Azure and cloud computing more approachable for developers, architects, and anyone curious about technology.

Did you enjoy the article?

Support me

Join as member

Share it

More tagged posts