Highlight

Excel files are one of the most commonly used file formats on the market. The tool's popularity among business users, business analysts and data engineers is driven by its flexibility, ease of use, powerful integration features and low price.

Intro

This is why every data engineer should understand the advantages and disadvantages of this format, the variety of internal formats such as XLS, XLSX, XLSB and XLSM, and which tools to use to process those files effectively in the cloud.
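
To make the difference between those internal formats a bit more concrete, here is a minimal Python sketch that looks at the first bytes of a file. It relies on two well-known signatures: legacy XLS workbooks are OLE2 compound files, while XLSX, XLSM and XLSB are ZIP containers. The function names are my own for illustration, and the check only identifies the container, not the exact Excel flavour (a password-protected XLSX, for example, is also stored as an OLE2 file).

```python
# Container signatures for the Excel formats mentioned above.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy compound file (XLS)
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP container (XLSX, XLSM, XLSB)


def sniff_excel_container(header: bytes) -> str:
    """Return 'ole2', 'zip' or 'unknown' based on the leading bytes."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_excel_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify its container."""
    with open(path, "rb") as fh:
        return sniff_excel_container(fh.read(8))
```

A check like this can be handy before choosing a reader, because a wrong file extension is a common reason for failed loads.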

Today I bring you a quick introduction to building ETL solutions with Excel files in Azure using the Data Factory and Databricks services.

Code samples: https://github.com/MarczakIO/azure4everyone-samples/tree/master/azure-excel-file-processing-with-data-factory-and-databricks

Agenda

  • 00:00 Introduction
  • 00:25 Excel Business Justification
  • 01:22 Excel Challenges
  • 02:20 Supported Services
  • 04:30 Data Factory Introduction
  • 05:35 Demo Setup
  • 07:13 Demo using Data Factory
  • 13:36 Databricks Introduction
  • 14:44 Databricks Setup
  • 18:14 Databricks Demo - Reading Excels
  • 20:55 Databricks Demo - Reading Excels using References
  • 25:56 Databricks Demo - Workbook Metadata
  • 28:05 Databricks Demo - Defining Schema
  • 30:03 Databricks Demo - Defining Schema
  • 32:53 Additional Options
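
The Databricks items in the agenda (reading Excels, reading by reference and defining a schema) can be sketched roughly as below. This is a minimal sketch and not the code from the video or the samples repository. It assumes the Spark Excel library by Crealytics is attached to the cluster and uses its `com.crealytics.spark.excel` format with the `dataAddress`, `header` and `inferSchema` options. The helper names `excel_options` and `read_excel` are hypothetical.

```python
from typing import Dict, Optional


def excel_options(
    sheet: str,
    start_cell: str = "A1",
    end_cell: Optional[str] = None,
    header: bool = True,
    infer_schema: bool = False,
) -> Dict[str, str]:
    """Build reader options for a sheet reference such as 'Sales'!B3:C35."""
    address = f"'{sheet}'!{start_cell}"
    if end_cell:
        address += f":{end_cell}"
    return {
        "dataAddress": address,
        "header": str(header).lower(),
        "inferSchema": str(infer_schema).lower(),
    }


def read_excel(spark, path: str, schema=None, **kwargs):
    """Read one sheet (or range) of a workbook into a DataFrame.

    `spark` is an existing SparkSession; `schema` may be a StructType or a
    DDL string and, when given, is applied instead of relying on inference.
    """
    reader = spark.read.format("com.crealytics.spark.excel").options(
        **excel_options(**kwargs)
    )
    if schema is not None:
        reader = reader.schema(schema)
    return reader.load(path)
```

In a notebook this could be called as `read_excel(spark, "/mnt/demo/sample.xlsx", sheet="Sales", start_cell="B3")`, where the path and the sheet name are placeholders. Passing an explicit schema avoids the extra pass over the data that inference needs and keeps column types stable between runs.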

Video

Next steps for you after watching the video

  1. Excel format in Data Factory
  2. Spark Excel by Crealytics documentation

Adam Marczak

I've spent most of my career working with software and cloud technologies, but at heart I'm simply someone who loves learning new things and sharing what I discover. Through this blog and my Azure 4 Everyone YouTube channel, I try to make Azure and cloud computing more approachable for developers, architects, and anyone curious about technology.
