Source Code

Highlight

Check out how to leverage Azure Blob Storage and Logic Apps for a simple scenario of loading data from CSV files into Azure SQL in less than 30 minutes and with almost no coding.

Intro

Just about any developer out there has, at some point or another, had to automate an ETL process for data loading. This article presents a fast and convenient way to create a data loading workflow for CSV files using Azure SQL and blob storage.

Data Factory is also an option

If you are interested in loading data, there is now an alternative path available. On my channel I show an introduction to Azure Data Factory in which I showcase the scenario of loading CSVs from Blob Storage into SQL. Everything in just under 10 minutes. Feel free to check it out!

Prerequisites

In order to execute this scenario a developer needs the following tools:

  1. Azure Subscription

That’s it!

Design

Preparation of the design is a crucial step in delivering a complete architecture. In this example, the following design choices were made:

  1. Azure Blob Storage for the purpose of uploading and storing input data files
  2. The storage account will have a container called input which will store all unprocessed input files
  3. The storage account will have a container called output where all processed files will be moved and renamed to match the date of processing
  4. An Azure Logic App which will be triggered whenever a new file appears in the input container of the blob storage
  5. Azure SQL which will store the data model, the loading procedure and the data itself

Process flow

It only takes three very small components for this scenario to be implemented. With that said, let's now discuss what the process will look like. The diagram below presents the overall architecture of the solution.

The diagram covers the following high-level steps:

  1. Uploading a CSV file to blob storage (be it via the portal or in an automated way in the future)
  2. Triggering the logic app whenever there is a new file in the storage
  3. The logic app connects to the SQL database and triggers the data loading procedure
  4. The SQL database runs the procedure, which uses the BULK INSERT command to connect to blob storage and load the data
  5. The logic app moves the processed file to the output container on successful load

Provisioning

Log into Azure Portal and start creating resources.

Resource Group

  1. Click on “+” sign and type in Resource Group in Search window
  2. Click on the result and click on Create button
  3. Give it a name and hit Create again

Blob Storage

  1. Click on “+” sign and type in Storage account in Search window
  2. Click on the result and click on Create button
  3. Give it any name and leave everything else at its defaults. For the resource group select Use existing and select the resource group from the previous step
  4. Hit Create (optionally select pin to dashboard before clicking)
  5. When the resource is created go to its options and click on Containers blade
  6. Create two new private containers called input and output

SQL Server & Database

  1. Click on “+” sign and type in SQL Database in Search window
  2. Click on the result and click on Create button
  3. Provide the following values in the prompt
  4. Name for the database
  5. Select existing resource group from previous step
  6. Since there is no SQL Server yet to host our database, it is also required to create one at this point. For this, click on the server field and then the Create a new server button.
  7. Provide server name (this will be public DNS address of created server), admin login name and password
  8. Hit Select button
  9. Hit Create (optionally select pin to dashboard before clicking)

Logic App

  1. Click on “+” sign and type in Logic App in Search window
  2. Click on the result and click on Create button
  3. Give it any name, for the resource group select Use existing and select the resource group from the previous step
  4. Hit Create (optionally select pin to dashboard before clicking)

Development

I did mention previously that there will be almost no coding.

Before setting up the database and the loading procedure, a special access token must be generated so that the SQL database can connect to blob storage and read files from there. This token is called a SAS (Shared Access Signature) token.

Blob Storage

  1. Navigate to previously created blob storage in Azure Portal
  2. Select Shared Access Signature blade
  3. Create a SAS token with the following settings
    Ensure that the End Date is far enough in the future and replace the token once it expires. In a production system an operations team can handle this, or simply create a very small logic app that will generate a new token and update the database credential (see the credential-update sketch after this list).
  4. Click Generate SAS and connection string and copy the value of the generated SAS token for further use. Once this screen is closed the token cannot be retrieved again, so a new one will have to be created if it is lost.
  5. Navigate to Containers blade
  6. Click on the input container
  7. Click on the properties blade
  8. Copy the URL value from the properties tab
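
If the token rotation is automated as suggested above, the only change needed on the database side is updating the stored secret. A minimal sketch, assuming the credential is named BlobCredential (as created later in this article) and NEW_SAS_TOKEN_HERE is a placeholder for the freshly generated token:

    -- Rotate the SAS token stored in the database-scoped credential
    ALTER DATABASE SCOPED CREDENTIAL BlobCredential
    WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
         SECRET   = 'sv=NEW_SAS_TOKEN_HERE';  -- new token, again without the leading question mark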

SQL Server & Database

Now, finally, for the database.

  1. Navigate to previously created SQL database in Azure Portal
  2. Click on Query Editor
    Optionally, and this is the recommended way, this can be done via SQL Server Management Studio, but for such a simple scenario the query editor is good enough.
  3. Click on the Login button and provide previously set password
  4. In the query window execute the following SQL statement, replacing the unique string with a secret key for the database. This creates a master key which is used to encrypt all secrets in the database.
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'UNIQUE_STRING_HERE'
  5. Create a new credential object that will be used by the database to connect to blob storage. Replace the SAS token value with the value from the previous steps. Note that there is no question mark at the beginning of the secret.
    CREATE DATABASE SCOPED CREDENTIAL BlobCredential
    WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
    SECRET = 'sv=SAS_TOKEN_HERE';
  6. Create a new connection to the blob container using the previously created credential
    CREATE EXTERNAL DATA SOURCE AzureBlob
    WITH ( 
        TYPE       = BLOB_STORAGE,
        LOCATION   = 'https://marczakiocsvstorage.blob.core.windows.net/input',
        CREDENTIAL = BlobCredential
    );
  7. In this example, a CSV file with two columns, ID and VALUE, will be loaded for simplicity. So the next step is to create a table with those columns in the database using the following SQL.
    CREATE TABLE MyData (
        ID int,
        VALUE nvarchar(400)
    )
    The file will look like this and will be named input.csv
    ID,VALUE
    1,Adam
    2,Tom
    3,Max
    4,LJ
    5,Jason
    6,Thomas
    7,Piotr
    8,Anna
    9,Jessabelle
  8. Now the last step is to create a procedure which will load data from the input.csv file into the MyData table.
    CREATE PROCEDURE LoadData
    AS
    BEGIN
    
    DELETE FROM [dbo].[MyData];
    
    BULK INSERT [dbo].[MyData]
    FROM 'input.csv'
    WITH ( 
        DATA_SOURCE = 'AzureBlob',
        FORMAT      = 'CSV',
        FIRSTROW    = 2
    );
    
    END
    A few important things to note here:
  9. In a proper scenario the delete and the insert should be encapsulated in a transaction which will roll back in case of errors during the load.
  10. The file name should be taken as a parameter instead of a static value and the statement executed as dynamic SQL. A sketch covering both points follows this list.
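
A minimal sketch of both suggestions; the parameter name and error handling here are illustrative rather than part of the article's sample code. The file name is supplied by the caller, the statement is built as dynamic SQL (BULK INSERT does not accept a variable as the file name), and the delete plus load are wrapped in a transaction that rolls back on failure:

    CREATE OR ALTER PROCEDURE LoadData
        @FileName nvarchar(400)   -- e.g. 'input.csv', supplied by the logic app
    AS
    BEGIN
        SET NOCOUNT ON;

        -- BULK INSERT does not accept a variable for the file name, hence dynamic SQL
        DECLARE @sql nvarchar(max) = N'
            BULK INSERT [dbo].[MyData]
            FROM ''' + REPLACE(@FileName, '''', '''''') + N'''
            WITH (
                DATA_SOURCE = ''AzureBlob'',
                FORMAT      = ''CSV'',
                FIRSTROW    = 2
            );';

        BEGIN TRY
            BEGIN TRANSACTION;

            DELETE FROM [dbo].[MyData];
            EXEC sp_executesql @sql;

            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            -- roll back the delete if the load fails, then surface the error
            -- so the logic app run fails visibly
            IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
            THROW;
        END CATCH
    END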

Logic App

The best for last. Since the setup is now complete, creating the logic app is very pleasant.

  1. Navigate to the previously created logic app in the Azure Portal
  2. Select Edit button
  3. Select Blank Logic App
  4. Type in Blob in the search window and select Blob Storage trigger that appears
  5. Define a new blob connection and give it a name. Select the storage account created earlier so that Azure automatically saves the connection info.
  6. Select input container
  7. Add new action
  8. Find SQL server and select Execute Stored Procedure in Actions Tab
  9. Define new connection to SQL server, select previously created database and provide credentials for connectivity
  10. Once connected select previously created procedure
  11. Add new action
  12. Select the Blob Storage Copy Blob action - because there is no ‘move’ action, the blob has to be copied to the new location and then the old one removed. In the source URL select the Path object from the trigger step.
  13. Define the destination path as a concatenation of the static path /output/, the dynamic name of the file from the trigger step (DisplayName), a static underscore _, finished with the expression utcNow(), so that uploading new input files will create different output files with a date suffix (see the expression sketch after this list).
  14. The Copy blob action should look like this.
  15. Add new action
  16. Select the Blob Storage Delete Blob action and add the Path object from the trigger.
    WARNING Make sure to select Path from the trigger. If the Path from the Copy Blob action is selected, the newly created file will be removed instead.
  17. Click save
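
For reference, the destination path from step 13 can also be written as a single expression in the designer's expression editor. This is only a sketch of one possible form, assuming the trigger exposes DisplayName as in the standard blob trigger; adjust it to whatever dynamic content names the designer shows:

    concat('/output/', triggerBody()?['DisplayName'], '_', utcNow())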

Test it!

  1. Navigate to blob storage input container
  2. Click on the upload button and upload input.csv file
  3. Check if the logic app ran by going to logic app interface
  4. Check out the output container
  5. And finally check the data in the database from the query editor; a quick query is sketched below
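
For that last check, any simple query against the target table will do, for example:

    -- verify that the rows from input.csv landed in the table
    SELECT * FROM [dbo].[MyData];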

What’s next?

There are of course some limitations and drawbacks to this design, no doubt. With that said, this is a good template to start from and expand.

  1. Read more about BULK INSERT and the bcp utility, for instance to match columns and types using format files (a brief sketch follows this list).
  2. Consider using OPENROWSET when working with more complex data.
  3. Change the loading procedure to accept a file name parameter so that input files in blob storage do not have to follow a naming convention, only be in the proper location (as sketched earlier in the Development section). Logic Apps allows for discovery of procedure parameters, so it is very easy to integrate.
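
As a sketch of the first point, BULK INSERT can read a bcp-style format file from the same blob container instead of relying on default column mapping. The file name mydata.fmt here is hypothetical; the format file itself would still have to be authored (for example with bcp) and uploaded next to the data file:

    BULK INSERT [dbo].[MyData]
    FROM 'input.csv'
    WITH (
        DATA_SOURCE            = 'AzureBlob',   -- external data source created earlier
        FORMATFILE             = 'mydata.fmt',  -- hypothetical format file describing columns and types
        FORMATFILE_DATA_SOURCE = 'AzureBlob',   -- read the format file from the same container
        FIRSTROW               = 2
    );
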
Source Code

Adam Marczak

Programmer, architect, trainer, blogger, evangelist are just a few of my titles. What I really am, is a passionate technology enthusiast. I take great pleasure in learning new technologies and finding ways in which this can aid people every day. My latest passion is running an Azure 4 Everyone YouTube channel, where I show that Azure really is for everyone!
