Skills Learned

What is Big Data?

Big Data is a field of technology that helps with the extraction, processing and analysis of information that is too large or complex to be dealt with by traditional software.

The three V’s rule

Big data typically has one or more of the following characteristics:

  • Velocity - how fast the data is coming in or how fast we are processing it
    • Batch
    • Periodic
    • Near Real Time
    • Real Time
  • Volume - how much data we are processing
    • Megabytes
    • Gigabytes
    • Terabytes
    • Petabytes
  • Variety - how structured/complex the data is
    • Tables
    • Databases
    • Photo, Audio
    • Video, Social Media
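Each step in the Volume scale above is 1,000 times larger than the previous one. The minimal Python sketch below labels a byte count with the largest tier it reaches. It assumes decimal (SI) units, where 1 MB = 10^6 bytes; binary units such as MiB (2^20 bytes) would give slightly different results. The helper name `describe_volume` is hypothetical.

```python
# Volume tiers from the list above, largest first.
# ASSUMPTION: decimal (SI) units, 1 MB = 10**6 bytes; binary units (MiB = 2**20) differ.
TIERS = [
    ("Petabytes", 10**15),
    ("Terabytes", 10**12),
    ("Gigabytes", 10**9),
    ("Megabytes", 10**6),
]


def describe_volume(num_bytes: int) -> str:
    """Express a byte count in the largest tier it reaches (hypothetical helper)."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    for name, size in TIERS:
        if num_bytes >= size:
            return f"{num_bytes / size:.1f} {name}"
    # Anything under a megabyte falls below the scale listed above.
    return f"{num_bytes} bytes"


if __name__ == "__main__":
    for n in (5 * 10**6, 3 * 10**9, 2_500_000_000_000, 7 * 10**15):
        print(f"{n:>22,} bytes -> {describe_volume(n)}")
```

The same dataset moves up one tier every time it grows by a factor of 1,000, which is why Terabytes and Petabytes sit at the "big" end of the scale.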

Azure Synapse Analytics

  • Big data analytics platform (PaaS)
  • Multiple components
    • Spark
    • Synapse SQL
      • SQL pools (dedicated – pay for provisioned performance)
      • SQL on-demand (ad-hoc – pay for TB processed)
    • Synapse Pipelines (Data Factory – ETL)
    • Studio (unified experience)
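The two Synapse SQL billing models above can be compared with simple arithmetic: dedicated pools cost provisioned hours × price per hour whether or not queries run, while on-demand costs TB processed × price per TB. The sketch below is illustrative only. Both prices are hypothetical placeholders rather than Azure list prices, and the function names are made up for this example.

```python
# Illustrative comparison of the two Synapse SQL billing models.
# ASSUMPTION: both prices are hypothetical placeholders, not Azure list prices.
HYPOTHETICAL_PRICE_PER_TB = 5.00    # on-demand: currency units per TB processed
HYPOTHETICAL_PRICE_PER_HOUR = 1.50  # dedicated: currency units per provisioned hour


def on_demand_cost(tb_processed: float,
                   price_per_tb: float = HYPOTHETICAL_PRICE_PER_TB) -> float:
    """Ad-hoc model: cost scales with the amount of data the queries process."""
    return tb_processed * price_per_tb


def dedicated_cost(hours_provisioned: float,
                   price_per_hour: float = HYPOTHETICAL_PRICE_PER_HOUR) -> float:
    """Dedicated model: cost scales with how long capacity is provisioned, not with usage."""
    return hours_provisioned * price_per_hour


if __name__ == "__main__":
    # 20 TB of occasional queries in a month vs. a pool left running all month (~730 hours).
    print(f"On-demand: {on_demand_cost(20):.2f}")
    print(f"Dedicated: {dedicated_cost(730):.2f}")
```

The break-even point is where TB processed × price per TB equals provisioned hours × price per hour. Below it, paying per TB processed is cheaper; above it, the fixed cost of provisioned performance starts to pay off.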

Azure HDInsight

  • Flexible multi-purpose big data platform (PaaS)
  • Multiple technologies supported (Hadoop, Spark, Kafka, HBase, Hive, Storm, Machine Learning)

Azure Databricks

  • Big data collaboration platform (PaaS)
  • Unified workspace for notebooks, clusters, data, access management and collaboration
  • Based on Apache Spark
  • Integrates well with common Azure data services (for example, Azure Data Lake Storage and Azure Synapse Analytics)

Adam Marczak

Programmer, architect, trainer, blogger and evangelist are just a few of my titles. What I really am is a passionate technology enthusiast. I take great pleasure in learning new technologies and finding ways in which they can help people every day. My latest passion is running the Azure 4 Everyone YouTube channel, where I show that Azure really is for everyone!
