Understanding BI Components and Data

Introduction

In the first article of this series, we discussed the fundamentals of Business Intelligence (BI) along with various data warehouse design approaches. In this article, we'll look at BI's various components, its architecture, data modeling, and why they matter. We'll also cover Extract (E), Transform (T), and Load (L) in detail.

BI is a platform or system built by combining multiple components, and these components may come from different technologies and tools. Before starting to develop any BI system, it's important to understand the various components in detail and to have a solid architecture design in place for a successful implementation.

Components of Business Intelligence

An architect should be very careful before starting to design and develop any BI solution. A BI system is important to an organization for decision making, but at the same time it demands a large investment that can go to waste if the right components and approaches are not considered or a good design is not in place.

A BI system is all about data: first, identify the right data and sources; second, store and analyze the data to produce the desired output; third, visualize the data as information in various ways, such as reporting, online queries, alerts, and so forth.

To support all three stages, there are various options available in the industry. All the major players who influence industry trends have a strong presence in BI technology and related tools, including IBM, Oracle, SAP, Microsoft, and many more. There are also several open source suites available for designing and developing BI solutions.

When designing a BI system, the selection of technology and tools is very important. You may choose a complete suite from one vendor, such as IBM, Oracle, or Microsoft, or you can take a hybrid approach where the data gathering tool comes from a different technology suite than the data analysis and data visualization tools.

Sometimes, the choice of technology and tools is completely situational. For instance, if you are building an end-to-end BI system from scratch, you need to think about technical feasibility, the available skill set, and, most importantly, the licensing cost with support. If you are building a BI solution on top of a legacy system, you may consider a hybrid design and use the best available licensed or open source tool that is compatible with the legacy system.

In the next section, we'll discuss how these different BI components participate and interact with each other when building the system.

Architecture Design for a BI Solution

Architecture design plays a key role in any system implementation, but it's even more critical when you are working on BI solutions. As discussed, multiple components interact with each other; if there is any gap between them, we may not get the required information or meet the objective.

The following logical architecture of a BI solution (referenced from the first article) has three major layers: a Data Integration Layer, a Data Analysis Layer, and an Information Delivery Layer. These are detailed below.

  • Data Integration Layer: Collects data from various data sources using an ETL tool and stores it in centralized data storage (a database) called a Data Warehouse (DW) or Data Mart (DM). The architecture does not cover the design of DW or DM databases in detail, but it does consider design principles and patterns that are specific to DW systems, such as source systems feeding a staging/landing area, the data warehouse as the back room using a normalized ERD, the data mart as the front room using dimensional modeling, and an analytical cube built with one of the M/R/H OLAP methodologies. I'll discuss M/R/H OLAP further in the next article.
  • Data Analysis Layer: Once data is integrated and becomes a single source of truth, analytical tools (such as an OLAP cube) can consume this data and build aggregations to provide decision-support information.
  • Information Delivery or Data Distribution Layer: This is the most visible layer of a BI system; it delivers real insight and shows stakeholders exactly what they want to see. It helps business users with decision making and related activities, such as exploring information, slicing and dicing data, sending alerts, and predictive analysis. Any data visualization tool (reporting tools, MS Excel) can consume data from the OLAP cube or from the integrated data sources (DW or DM), offering an opportunity to understand the data and show the power of BI.
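
To make the flow between these three layers concrete, here is a minimal, hypothetical Python sketch that uses an in-memory SQLite database as a stand-in for the warehouse. The table, column, and value names are illustrative assumptions, not part of any particular BI tool.

```python
import sqlite3

# Stand-in "data warehouse": an in-memory SQLite database.
dw = sqlite3.connect(":memory:")

# --- Data Integration Layer: land source data in the warehouse (illustrative). ---
dw.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
source_rows = [("East", "Widget", 120.0), ("West", "Widget", 80.0), ("East", "Gadget", 50.0)]
dw.executemany("INSERT INTO sales VALUES (?, ?, ?)", source_rows)

# --- Data Analysis Layer: build an aggregation, the kind of summary a cube would hold. ---
summary = dw.execute(
    "SELECT region, SUM(amount) AS total_sales FROM sales GROUP BY region"
).fetchall()

# --- Information Delivery Layer: present the result to a business user. ---
for region, total in summary:
    print(f"{region}: {total:.2f}")
```

In a real system the integration step would be an ETL job, the aggregation would live in an OLAP cube, and the delivery step would be a report or dashboard; the sketch only shows how the output of each layer feeds the next.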

Data Modeling

Data modeling is an integral part of BI system design. There are three basic types of data models: the Conceptual Data Model, the Logical Data Model, and the Physical Data Model.

  • Conceptual Data Model: Represents the highest level of relationships between different entities. The audience for this model is business owners, business analysts (BAs), and architects.
  • Logical Data Model: Represents the entities, attributes, and relationships involved in a business function and serves as the basis for creating the physical data model.
  • Physical Data Model: Represents an application- and database-specific implementation of the logical data model and defines the actual database objects.

Here is the comparative analysis:

Objects                    | Conceptual | Logical | Physical
Entity Name                |     X      |    X    |
Entity Relationship        |     X      |    X    |
Attributes Identification  |            |    X    |
Primary Key Relationship   |            |    X    |    X
Foreign Key Relationship   |            |    X    |    X
Database Table Name        |            |         |    X
Table Column Name          |            |         |    X
Table Column Data Type     |            |         |    X

Each of the models above is designed for a different level of audience. Having these models available is important for any DW or DM design and development.
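
As a small illustration of how one business concept moves through the three levels, the sketch below records each level's concerns in plain Python and applies only the physical definition to a SQLite database. The Customer/Order entities and all column names are hypothetical examples, not taken from any specific project.

```python
import sqlite3

# Conceptual level: entities and a relationship, no attributes yet.
conceptual = {"entities": ["Customer", "Order"],
              "relationship": "Customer places Order"}

# Logical level: attributes plus primary/foreign key relationships.
logical = {
    "Customer": {"attributes": ["CustomerID (PK)", "Name", "Email"]},
    "Order": {"attributes": ["OrderID (PK)", "CustomerID (FK)", "OrderDate", "Amount"]},
}

# Physical level: database-specific tables, columns, and data types.
physical_ddl = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    order_date  TEXT,
    amount      REAL
);
"""

db = sqlite3.connect(":memory:")
db.executescript(physical_ddl)  # only the physical model is executable
print(conceptual["relationship"], "->", list(logical))
```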

Apart from these, the database design of the DW and DM should be well defined. It depends entirely on the methodology being used for development: Inmon, Kimball, or another. Fundamentally, Inmon favors third normal form (3NF), which can be represented by an entity-relationship diagram, whereas Kimball favors dimensional modeling, also known as a star or snowflake design. A star or snowflake design is simply an arrangement of fact and dimension tables in a specific pattern. I'll discuss all these terms in detail in the next article, along with OLAP methodologies.
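
For example, a minimal star design places one fact table at the center, holding measures and foreign keys, surrounded by dimension tables that describe the context. The sketch below is a hypothetical illustration using SQLite; the table and column names are assumptions, not a prescribed design.

```python
import sqlite3

star_schema = """
-- Dimension tables describe the "who, what, when" context.
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, month INTEGER);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);

-- The fact table holds measures plus foreign keys to each dimension,
-- forming the "star" around which data is sliced and diced.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity     INTEGER,
    sales_amount REAL
);
"""

dw = sqlite3.connect(":memory:")
dw.executescript(star_schema)

# A typical dimensional query joins the fact table to dimensions and aggregates the measures,
# which is essentially what an OLAP cube pre-computes.
dw.execute("""
    SELECT d.year, p.category, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.year, p.category
""")
```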

Once a well-defined database design is in place, the next step is to extract and load the data from the required data sources.

ETL: Extract, Transform, and Load

ETL is a key term in the data world. It is used not only in BI implementations, but also in all types of data migration and other data-related activities, such as improving data quality, data profiling, and the like.

ETL is a combination of three activities: Data Extraction, Transformation (including cleansing), and Loading.

  • Data Extraction: Data is consumed from various sources and loaded into a staging area for further processing. It is important to know whether we need to pull all of the available data from the source or only part of it. In the ETL world, these two modes of processing are known as a full pull and a delta pull.

    With a full pull, all available data is pulled from the source every time. This is time consuming and may present other limitations. With a delta pull, the ETL is designed so that whenever it pulls data from a source, it pulls only modified and newly entered records. This is highly recommended, but it depends on what change information is available at the source (a minimal delta-pull sketch follows this list). There are various techniques to achieve these data pull scenarios, but they are out of scope for now.

    So, data extraction is the activity that consumes data from a source and makes it available to the next ETL step; for instance, data transformation.

  • Data Transformation: The intermediate stage of ETL; it applies transformations to the data available in the staging area. It is important to cleanse and transform your data into the required format before pushing it to its final destination.
  • Data Loading: The last step of ETL. Once data is cleansed and transformed into the required format, it is loaded into the destination, known as the DW or DM.

    This DW or DM becomes the single source of truth. Other components of the BI system can now consume data from this central repository.
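
To make the full-pull versus delta-pull distinction concrete, here is a minimal, hypothetical Python sketch of a delta pull. It assumes the source table carries a modified_date column and that the last loaded timestamp (the watermark) is kept on the warehouse side; all table and column names are illustrative, and real ETL tools such as SSIS implement the same idea with control tables and merge steps.

```python
import sqlite3

# --- Minimal setup: a toy source system and warehouse (illustrative schemas). ---
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT, email TEXT, modified_date TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
    (1, " Alice ", "ALICE@EXAMPLE.COM", "2024-01-02"),
    (2, "Bob",     "bob@example.com",   "2024-01-05"),
])

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT, modified_date TEXT)")
dw.execute("CREATE TABLE etl_watermark (last_modified TEXT)")
dw.execute("INSERT INTO etl_watermark VALUES ('2024-01-01')")  # everything before this is already loaded

# --- Extract: pull only rows changed since the last run (the delta). ---
(watermark,) = dw.execute("SELECT last_modified FROM etl_watermark").fetchone()
rows = source.execute(
    "SELECT customer_id, name, email, modified_date FROM customers WHERE modified_date > ?",
    (watermark,),
).fetchall()

# --- Transform: trivial cleansing -- trim names, lowercase emails. ---
cleaned = [(cid, name.strip(), email.lower(), mod) for cid, name, email, mod in rows]

# --- Load: upsert into the warehouse table, then move the watermark forward. ---
dw.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?, ?)", cleaned)
if cleaned:
    dw.execute("UPDATE etl_watermark SET last_modified = ?", (max(r[3] for r in cleaned),))

print(f"Loaded {len(cleaned)} changed row(s) since {watermark}")
```

The next run would start from the new watermark, so unchanged rows are never re-extracted; a full pull would simply drop the WHERE clause and reload everything.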

There are many ETL tools available to achieve this, including IBM's DataStage, Microsoft's SSIS, and Informatica. There are also many open source ETL tools, such as Pentaho and Talend.

We can use any of these tools, depending on availability and need. But keep one thing in mind: the core concept behind all of these tools is the same; the differences lie in usability and look and feel.

Summary

In this article, we talked about the basic building blocks of BI systems and how they are tied to each other; the output of one component becomes the input to another. For any successful implementation of a BI system, all components should interact smoothly with each other.

Also, we discussed the design of a central repository of data, known as DW or DM, and how an ETL tool helps load data in the desired format to make use of it.

I’ll discuss more about BI and Big Data components in upcoming articles.


About the Author

Anoop Kumar worked for Microsoft for almost six and a half years and has 12+ years of IT experience. Currently, he is working as a DWBI Architect at a top Fortune company. He has worked on end-to-end delivery of enterprise-scale DWBI projects and has strong knowledge of database, data warehouse, and business intelligence application design and development. He has also worked extensively on SQL Server, ETL design using SSIS, SSAS, SSRS, and SQL Azure. Anoop is a Microsoft Certified IT Professional (MCITP) in Microsoft SQL Server – Database Development 2008 and Business Intelligence 2005, and a Microsoft Certified Technology Specialist (MCTS) in Microsoft SQL Server 2008 – Implementation and Maintenance.

Disclaimer: I help people and businesses make better use of technology to realize their full potential. The opinions mentioned herein are solely mine and do not reflect those of my current employer or previous employers.
