Data Modeling Is Like Building A Boat
Have you built ever built a boat? It is a singular objective for those who build a pleasure craft in their garage. It is a major business for those who build ships and boats for others.
Data modeling is like building a boat. A small application will have far different data modeling needs than those required for an enterprise warehouse. The small application is akin to building a boat in your garage. Enterprise development has project plans approaching those of shipbuilders. Here is why.
Think of data modeling as laying the keel for a boat. The keel provides the basic foundation for the hull. Three kinds of data models are successively developed in creating the foundation for a data mart (or warehouse or application). Conceptual, Logical, and Physical modeling are the progression that lead to a proper data layer.
The Conceptual Data Model
Conceptual models are used to group similar data concepts. The conceptual model is built and used in discussions with stakeholders and business analysts. Conceptual models are shown as subject areas without regard for a specific technology. The eventual use of a Teradata or Microsoft or Oracle engine is not a consideration when building a conceptual model. Entities such as primary and foreign keys are not yet considered. Attributes and table structures are not considered. For example, a conceptual model for a cruise ship organization might start with these key topics in the conceptual model:
Conceptual Data Model for Cruise Ship Lines
The model is incomplete. There is no reference to individual ships or their maintenance schedules. Equipment that is on the ships should be represented to account for beds and dining room outfitting and lifeboats. Documents needed for passengers and crew are missing. Operating costs and returns to the financial well-being of the cruise ship owner will be needed. Here is the point: The conceptual data model is a high-level representation of the data that is available or needed to address each of the subject areas. It is certain that multiple line-of-business systems are in place for these needs. The baggage handling system is a different application than the reservation system. Some of the systems may be owned and operated by third parties, and the eventual data from them might be provided as a file. Seldom will there be real-time access granted to the third party production system, so data integration will occur based on a process that makes data available outside of their transactional database. Those details become more evident in Logical and Physical modeling.
The Logical Data Model
After a Conceptual Data Model is determined to cover the business requirements of the sponsors and stakeholders, it is developed as a logical data model. At this stage, the model is shown as structures that can be implemented in databases. The logical model adds all of the entities for each of the subject areas, including primary and foreign keys. Where the Conceptual Data Model is meant to help analysts and developers and subject matter experts (SMEs) understand the subject areas of the organization and how they relate to one another, the logical data model is the first step in codifying those subject areas for implementation as tables and databases. Dimensional modeling, in the case of data warehouse development, begins to be a consumer of the logical data model.
The Physical Data Model
Actual code that defines data structures is created for the Physical Data Model. Here, all of the preceding work with conceptual and logical modeling results in table structures, including column names, column data types, column constraints, primary keys, foreign keys, and relationships between tables. This is where capabilities differ among the target engine for which the database is being deployed. Data types are not the same in some cases. Teradata’s Integer and Decimal data types are Numbers in Oracle. The differences range from slight to problematic, so knowing the target engine is required.
A data definition language is used to define CREATE table statements with all of the attributes for each of the table structures. Indexing is introduced for performance. ALTER statements maintain constraints, among other purposes. Security might be introduced on certain columns for systems which support such an option.
Physical Data Model is expressed in data definition language (DDL) from Logical Data Model.
Finally, the data model is produced in the target system by executing a file of DDL statements. Then, the effort of loading data from external databases and files, in the case of a warehouse, begins. Or, if the resulting database is for a transaction-oriented application, the front-end user interface can begin storing data through communication via a data access construct.
Data modeling and boat building are analogous. The effort in each case is relative to the size of the project. Both start with an idea (subject areas) and progress to material requirements (logical data model) and the blueprint (physical data model). Both ideas have flexibility, as data models are seldom finished when initially implemented. Similarly, cruise boats undergo refinishing and maintenance on a regular basis. Starting with a proper set of data models allows enhancements as the business needs grow and change.