Big Data
Big Data describes the large volume of data, either structured or unstructured, that inundates a business on a daily basis. The term also covers the ways to analyse, extract information from, or otherwise handle data sets that are too large or complex for traditional data-processing software.
Big Data has the following characteristics:
- Volume: The quantity of generated and stored data
- Variety: The type and nature of the data
- Velocity: The speed at which data is generated and processed
- Veracity: The quality of the data and, with it, the value of the data
The Influence of Azure on Big Data
Microsoft Azure provides machine learning tools for turning data into actionable insights. It lets you combine data of many kinds at large scale, and build and deploy machine learning models at scale.
With the following Azure products, advanced analytics can be performed on Big Data:
- SQL Data Warehouse
- Data Factory
- Azure BLOB Storage
- Azure Databricks
- Azure Cosmos DB
- Power BI
Let’s have a look at each of them individually.
SQL Data Warehouse
SQL Data Warehouse is a cloud-based Enterprise Data Warehouse (EDW) that uses Massively Parallel Processing (MPP), in which a large number of processors perform a set of computations in parallel, to run complex queries across petabytes of data.
You import big data into SQL Data Warehouse with PolyBase T-SQL queries (queries that read data from external sources such as Hadoop or Azure BLOB storage), then use the power of MPP to run high-performance analytics. The data warehouse then becomes the single version of truth that you can count on for insights.
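To make the MPP idea more concrete, here is a minimal sketch in Python of the scatter-gather pattern it relies on: the data is split into partitions, each partition is aggregated independently, and the partial results are combined. This is only an illustration and not how SQL Data Warehouse is implemented. The function names and sample data are hypothetical, and threads stand in for the separate compute nodes a real MPP system would use.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Distribute rows round-robin across n_parts partitions."""
    return [rows[i::n_parts] for i in range(n_parts)]


def partial_sum(rows):
    """Work done by one 'node': aggregate only its own partition."""
    return sum(amount for _, amount in rows)


def parallel_total(rows, n_parts=4):
    """Scatter the rows, aggregate each partition in parallel, gather the results."""
    parts = partition(rows, n_parts)
    # Threads are a stand-in for independent compute nodes (an assumption
    # made to keep the sketch small and self-contained).
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_sum, parts))
    return sum(partials)


# Hypothetical sample data: (region, sales amount)
sales = [("north", 120), ("south", 80), ("east", 50), ("west", 150), ("north", 100)]
total = parallel_total(sales)
print(total)
```

The point of the pattern is that no single worker ever needs to see the whole data set, which is what lets this style of processing scale to very large volumes.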
Data Factory
Data Factory is a cloud-based, hybrid data integration service that composes data storage, movement, and processing services into automated data pipelines. It allows you to create, schedule, and orchestrate ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) workflows.
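As a rough illustration of what an ETL workflow does, the sketch below chains three small steps in plain Python. It does not use Data Factory or any Azure API. The CSV content, the table name, and the function names are all hypothetical, and an in-memory SQLite database stands in for the target data store.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; the second order has no amount.
RAW_CSV = "order_id,amount\n1,19.99\n2,\n3,5.00\n"


def extract(text):
    """Extract: read raw rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert the types."""
    return [(int(r["order_id"]), float(r["amount"])) for r in rows if r["amount"]]


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(conn, text):
    """Run the three steps in ETL order."""
    load(conn, transform(extract(text)))


conn = sqlite3.connect(":memory:")  # stand-in for a real data store
run_pipeline(conn, RAW_CSV)
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(row_count)
```

In an ELT workflow the order of the last two steps is swapped: the raw rows are loaded first, and the transformation runs inside the target system.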
Azure BLOB Storage
Azure BLOB storage is massively scalable object storage for unstructured content such as documents, images, videos, and audio. It is optimized for storing massive amounts of unstructured data (data that does not adhere to a particular data model or definition), such as text or binary data.
Azure BLOB storage is designed for the following uses:
- Serving documents or images directly to a browser
- Storing files for distributed access
- Streaming audio and video
- Writing to log files
- Storing data for disaster recovery, backup and restore, and archiving
Azure Databricks
Azure Databricks is an easy, fast, and collaborative analytics platform based on Apache Spark, an open-source, general-purpose distributed cluster-computing framework that provides an interface for programming clusters with implicit data parallelism.
Azure Cosmos DB
Azure Cosmos DB is a globally distributed database service. It is designed to provide low latency, elastic scalability of throughput, well-defined semantics for data consistency, and high availability.
Power BI
Power BI is a suite of business analytics tools that deliver insights. It enables you to connect to scores of data sources, simplify data preparation, drive ad hoc analysis, and produce reports for consumption on the web and across mobile devices.
Conclusion
Big Data has evolved, and it keeps evolving. The Azure tools covered here, from storage and integration through to analytics and reporting, make Big Data more manageable.