In this article, I will analyze different ways to organize data in a database. In my examples, I will be utilizing Oracle, but any other relational database would suffice. There are both advantages and disadvantages to these methods of relational database data organization and I will distinguish between them. However, I will leave it up to the reader to make a decision on which data organization is right for them.
Depending on the business requirements and development needs, various ways of storing date can be implemented. The first data layout is horizontal. This is a traditional way of storing data in a table. Each new data record is inserted as a row and table columns run horizontally; hence, the name. The second method is vertical. This is a specific way of storing data vertically in a table that only has two real data columns and one other (but there could me more) identifier column. The data is stored as key/value pairs vertically; hence, the name.
Vertical vs. Horizontal
Here is an example of traditional Horizontal Table Structure:
Table HR
ID | First_Name | Last_Name | Dept | Sex |
---|---|---|---|---|
123 | Vlad | Kofman | IT | M |
234 | Jim | Cramer | Marketing | M |
456 | Olimpia | Dukakis | Accounting | F |
Here is an example of the same data in a Vertical Table Structure:
Table VR
ID | Key | Value |
---|---|---|
123 | First_Name | Vlad |
234 | First_Name | Jim |
456 | First_Name | Olimpia |
123 | Last_Name | Kofman |
234 | Last_Name | Cramer |
456 | Last_Name | Dukakis |
123 | Dept | IT |
234 | Dept | Marketing |
456 | Dept | Accounting |
123 | Sex | M |
234 | Sex | M |
456 | Sex | F |
As you can see, the data is transformed from a 3 rows x 4 data columns matrix (table HR) to a 12 rows x 2 data columns matrix (table VR). The number of rows in the vertical table can be assumed to be equal to the number of records in the horizontal table multiplied by the number of columns in the horizontal table; for example, 3×4=12. However, this assumption is not correct, as I will discuss later in this article.
Advantages and Disadvantages of Vertical Data Storage
There are numerous advantages of storing data in the traditional horizontal fashion, with predetermined structure and columns, but there are also shortcomings. Imagine for a second that the application you are designing has a form screen, and this screen is completely dynamic, and there is no concrete field number or names. Users can instantly create a new field, and give it a name and a value. How can such a form be persisted in the database?
Now, imagine a different scenario where a form is created based on business requirements and it has some specified business-required fields. The data per form is stored in a horizontal table where each field is inserted in a specific column and each form is stored as a row. After a month, because the application is in production, business realizes that a new field is needed. To add new field to the form, UI and persistence logic needs to be changed and the actual table needs to have an extra column added to it. The application also needs to be retested and redeployed.
In an environment where developers do not have direct access to the database, the DBA group needs to be involved to actually change the table.
Now, what if in a month another field is needed?
The solution to both of these scenarios is to use a vertical data store and a different, dynamic logic on application layer and the UI. Because the data can be stored as key/value pairs in only two columns and is tied together in a logical form by a unique id, there is no limitation on how many fields can be in a form. Subsequently, each logical row in vertical table can have various numbers of fields. Therefore, the biggest advantage of the vertical data organization is its flexibility, but there are still numerous disadvantages.
With flexibility, the control over data is lost, which means data normalization is very hard to maintain. Attributes of a logical row that does not belong in a single table can potentially be persisted there. For example, information on a Mutual Fund in a Vertical Table can be amended with analyst name and contact info, whereas this type of information normally resides in a separate table and should be linked by its Foreign Key id to the Fund.
Strong data typing is also lost with a vertical table. Because the value column is of only one type (for example, varchar) and all values need to be either of that type or need to be converted back and forth during save and retrieve operations. Storing special data types like Blobs or Clobs is impossible in the vertical table.
Another drawback of the Vertical Table is data consistency. The fact that all column names are entries in the key column makes it easy for users (or applications) to store potentially identical data with differently called keys. For example, one user can create a new field and save it as “Company” with a value of “Oracle”, whereas another user can do the same with “Organization” and “Oracle”.
Also, working with a vertical table is much harder. To find specific a logical row, multiple self joins need to be done. For this reason, a very little number of commercial reporting software can work with the vertical table to generate any kind of meaningful reports.
For example, to get all records from the horizontal table where people are male, one can write the following select:
Select * from HR where sex like 'M'
To get the same data from a vertical table, one would need to do a self-join, by first getting ids and then the data:
Select * from VR where id in (Select id from VR where key = 'sex' and value = 'M')
Many developers write special functions or stored procedures to convert from vertical to horizontal structure, only so that they can do reports and work with the data easier.
Vertical Table Grouping
The variations of vertical data storage can include custom algorithms for aggregation of logical data rows. What this means is that more then on id column can be used to group properties. For example:
ID | Grouping | KEY | VALUE |
---|---|---|---|
123 | 1 | name | Vlad |
123 | 1 | phone | 555-555-5555 |
123 | 2 | machine | Dell 560 |
123 | 2 | ISP | Verizon |
234 | 1 | name | Mariya |
234 | 1 | phone | 555-456-8392 |
234 | 1 | m@mail.com | |
243 | 2 | machine | Dell 1750 |
When a custom report is generated, it can be split easily into pages or sections based on the second “grouping” column. Therefore, a Vertical Table above can become this report:
Report 1 | ||||
Page 1 | ||||
Name | Phone | |||
Vlad | 555-555-5555 | |||
Page 2 | ||||
Machine: Dell 560 | ||||
ISP: Verizon |
Report 2 | ||||
Page 1 | ||||
Name | Phone | |||
Mariya | 555-456-8392 | m@mail.com | ||
Page 2 | ||||
Machine: Dell 1750 |
Note how different reports and pages have a diverse number of entries in them.
Conclusion
In this article, I have shown you different ways to persist data in a database table. The benefits and flexibility offered by a vertical data layout are often negated by its shortcomings. But, its flexibility is often the only way to design data organization. On the other hand, horizontal tables are better suited for data normalization and are a traditional way to persist data in RBMS. It will be up to software and database architects to decide what type of database table design is the right choice for a specific application need.
About the Author
Vlad Kofman works on enterprise-scale projects for the major Wall Street firms. He has also worked on defense contracts for the U.S. government. His main interests are object-oriented programming methodologies, UI, and design patterns.