Neural networks (with and without external optimization) and online analytical processing (OLAP) are two mainstream tools for the efficient discovery of valuable non-obvious information from a large collection of data. Naturally, to be efficient, the information obtained must be worth more than the cost of processing the raw data. This article will focus on some ways these tools can be used to make better business decision. You will apply neural networks and OLAP separately to several business problems using a single data set.
Figure 1 shows possible paths between this raw data (shown at the left) and the tools (shown at the right). Between these extrema lies a data warehouse.
Figure 1: Overview of the system to be discussed throughout this article.
Data warehouses are designed to overcome problems encountered when an organization attempts to perform strategic analysis using the same database that is used for online transaction processing (OLTP).
Data warehousing is the technique of extracting archived operational data and overcoming inconsistencies between different legacy data formats. As well as integrating data throughout an enterprise, regardless of location, format, or communication requirements, a data warehouse can incorporate additional or expert information.
It is the logical link between what the managers see in their decision support applications and the company’s operational activities.
After the setup of a data warehouse, attention is usually switched to data mining (a term given many different definitions that I will not attempt to reconcile), which aims to extract new and meaningful information. In other words, a pool of ‘useful information’ that has been stored in a company data warehouse becomes ‘intelligent information.’
Microsoft Excel with Add-Ins from Palisade
For the implementation of the examples discussed throughout this article, I have chosen NeuralTools and Evolver from Palisade, two plug-ins to Microsoft Office Excel. There are a number of commercial alternatives and even some well-regarded and free open-source alternatives to these choices. However, NeuralTools and Evolver are widely used throughout academia and industry, can be combined with other Excel add-ins, and are complemented by the very flexible functionality of Excel itself.
I also cite Microsoft Project Server supported by SQL Server and Analysis Services. The data created by Microsoft Project Professional can be saved in Microsoft Project Server and accessed as user-defined OLAP cubes, as discussed in Appendix 5.
It is not always necessary to create a data warehouse for OLAP analysis. Data stored by operational systems, such as point-of-sales, are in types of databases called OLTPs. OLTP, Online Transaction Process, databases do not have any difference from a structural perspective from any other databases. The main difference is the way in which data is stored.
OLTPs are designed for optimal transaction speed. When a consumer makes a purchase online, they expect the transactions to occur instantaneously. With a database design, called data modeling, optimized for transactions, the record ‘Consumer name, Address, Telephone, Order Number, Order Name, Price, Payment Method’ is created quickly on the database and the results can be recalled by managers equally quickly if needed.
Examples of OLTPs can include ERP, CRM, SCM, Point-of-Sale applications, and Call Center.
Data are not typically stored for an extended period on OLTPs for storage cost and transaction speed reasons.
OLAPs have a different mandate from OLTPs. OLAPs are designed to give an overview analysis of what happened. Hence, the data storage (in other words, data modeling) has to be set up differently. The most common method is called the star design, as shown in the right-hand side of Figure 2.
Figure 2: Typical organization of data seen in OLTP databases (left) and OLAP data warehouse (right).
The central table in an OLAP start data model is called the fact table. The surrounding tables are called the dimensions. By using the above data model, it is possible to build reports that answer questions such as:
- Which supervisor gave the most discounts?
- What was the quantity shipped on a particular date, month, year, or quarter?
- In which zip code did product A sell the most?
To obtain answers, such as the ones above from a data model, OLAP cubes are created. OLAP cubes are not strictly cuboids—it is the name given to the process of linking data from the different dimensions. The cubes can be developed along business units, such as sales or marketing. Or, a giant cube can be formed with all the dimensions.
A business analyst often wants to get a big picture of the business, to see broader trends based on aggregated data, and to see these trends broken down by any number of variables. Business intelligence (another term given many different definitions that I will not attempt to reconcile) is, in many instances, the process of extracting data from an OLAP database and then analyzing that data for information that you can use to make informed business decisions and take action.
Because the OLAP storage unit is multidimensional, it is called a cube as opposed to a table. With OLAP, the user directs the analysis and explores hypotheses or relationships. It is important to understand that OLAP requires the user to know what information he or she is searching for; that is, OLAP techniques do not process enterprise data for hidden or unknown intelligence.
Figure 3 illustrates the OLAP tool that’s built into Microsoft Excel being applied to a data set that will be used throughout this article.
Figure 3: Excel 2007 with native OLAP tool (pivot table). OLAP cubes are designed for ad hoc reporting.
Although OLAP is able to provide numerical and statistical analysis of data in an efficient and timely way, it lacks the ability to make predictions.
In contrast, neural networks involve the automated process of finding relationships and patterns in data. For example, a company might want to know what pattern of behaviors predicts that a customer might leave for a competitor. Using computationally complex algorithms, the software finds relationships that were previously unknown.
Neural networks are capable of learning complex relationships in data. They can discern patterns in data, and then extrapolate predictions when given new data. The problems neural networks are used for can be divided in two general groups:
- Classification Problems: Problems in which you are trying to determine what type of category an unknown item falls into. Examples include medical diagnoses and prediction of credit repayment ability.
- Numeric Problems: Situations where you need to predict a specific numeric outcome. Examples include stock price forecasting and predicting the level of sales during a future time period.
When using NeuralTools, neural networks are developed and used in four steps:
- Data Preparation: The data you use in NeuralTools is defined in data sets. A Data Set Manager is used to set up data sets so they can be used over and over again with your neural networks.
- Training: With training, a neural network is generated from a data set comprised of cases with known output values. This data often consists of historical cases for which you know the values of output/dependent variable.
- Testing: With testing, a trained neural network is tested to see how well it does at predicting known output values. The data used for testing is usually a subset of your historical data. This subset was not used in training the network. After testing, the performance of the network is measured by statistics such as the % of the known answers it correctly predicted.
- Prediction: A trained neural network is used to predict unknown output values. Once trained and tested, the network can be used as needed to predict outputs for new case data.
Training and testing are an iterative, sometimes time-intensive process. Typically, you may train several different times with different settings to generate a neural network that tests best. Once you have your “best net,” you can use it quickly for predicting.
Two Neural Network Examples with External Optimization: Auto Loans
NeuralTools can be used to predict unknown values of a category-dependent variable from known values of numeric and category independent variables. In this example, shown in Figure 4, a neural network learns to predict whether an auto loan applicant will be making timely payments, late payments, or default on the loan.
Figure 4: Excel 2007 with the NeuralTools and Evolver plugins applied to the same data set used in Figure 3.
The data set contains information on applicants who took car loans in the past, except for the first five rows, which contains data on new applicants. A neural net has been trained, with options to Automatically Test and Automatically Predict after training, as configured in the Application Settings dialog shown in Figure 5.
Figure 5: Configuration of Detailed Report, Type of Net, and so forth.
In the Detailed Report to the right of the data set, Auto-Prediction provides the predicted outcome for the new applicants, along with the probabilities of those outcomes. With Live Prediction (discussed in Appendix 3) enabled, the prediction and the probability change as independent values change. For example, with a lower loan amount, the predicted probability of this applicant falling within the “timely payments” classification typically will be higher. The Detailed Report to the right of the data set shows not only the probability of the predicted category, but also the probability of every possible classification for each applicant. Those can be obtained by making appropriate selections in the Application Setting dialog: by clicking “Columns in Detailed Reports” and in the table that shows selecting “Probabilities of All Categories (for PNN)” in the Prediction column. Refer to Figure 5.
By using another Excel add-in from Palisade, Evolver, with NeuralTools, you can solve optimization problems, as seen in the Evolver model definition shown in Figure 6. A bank employee wants to allocate $75,000 in loans to the five applicants so as to minimize the probability of a default occurring. That probability is calculated by the formula in cell P22. The Evolver model has been set up as follows: The optimization goal was defined by specifying cell P22 and selecting “Minimum” as Optimization Goal. Range F26:F30 was specified as the adjustable range, with $0 and $30,000 as minimum and maximum values. Also, the Budget solving method was selected (by clicking Group button, selecting Edit, and selecting Budget), to ensure the total of cells F26:F30 will be kept constant. To run the example, select Evolver Start Optimization icon shown at the top right in Figure 4.
Figure 6: A bank wants to allocate $75,000 in loans to five applicants so as to minimize the probability of a default occurring.
Figure 7: Decision making tools discussed in this and the follow-uparticle (NeuralTools with Evolver and NeuralTools with StatTools, respectively).
Here’s another optimization that can be solved by NeuralTools and Evolver applied to the very same data set used in the previous example:
A lending institution can determine how much to loan a new applicant. The bank wants a repayment probability of 90%. Less than 90% is too risky, and over 90% means the bank is being too cautious and not lending enough. The amount to loan the applicant is an adjustable cell (K28 in Figure 8). Evolver will try different loan amounts and NeuralTools will predict the probability of repayment live, during optimization, while Evolver will determine the best loan amount to achieve 90% repayment.
Figure 8: A bank wants a repayment probability of 90%.
The Appendices and References sections provide further information on the NeuralTools types of net and the Evolver solving method (Budget and Recipe) used above.
Note: The example given above requires Evolver and the Industrial version of NeuralTools (to support Live Prediction).
OLAP and neural network users frequently have different characteristics. Those working with OLAP may employ software to access predefined reports, manipulate the data using available dimensions and measures and (in the case of power users) create queries and reports for themselves and others.
Neural network analysts typically work with specialized software to find the relationships that are important to the business. These analysts may be either highly skilled professionals or businesspeople with good analytical and problem-solving skills who work with packaged software in applications such as fraud detection. Of course, these analysts and the work they do can differ considerably.
A recent article in the Boston Globe newspaper discussed a case of the improper deployment of the kind of software discussed above. The Commonwealth of Massachusetts had planned to distribute sophisticated management software to 20,000 state government users, many of whom, according to industry analysts, would lack either the skills or the need to use it.
According to the Globe article, only top managers and policy-makers with a background in technology in state agencies as well as the staff of the Information Technology Division should have access to the specialized software.
The software, a so-called “business intelligence” program, would allow state officials to view dozens of state expenditures, caseloads, expenses, and other indicators at once, providing a high-level snapshot of all operations.
Appendix 1: Online Analytical Processing
Online Analytical Processing (OLAP) databases facilitate business-intelligence queries. OLAP is a database technology that has been optimized for querying and reporting, instead of processing transactions. The source data for OLAP is OLTP databases that are commonly stored in data warehouses. OLAP data is derived from this historical data, and aggregated into structures that permit sophisticated analysis. OLAP data is also organized hierarchically and stored in cubes instead of tables. It is a technology that uses multidimensional structures to provide rapid access to data for analysis. This organization makes it easy for a PivotTable report or PivotChart report to display high-level summaries, such as sales totals across an entire country or region, and also display the details for sites where sales are particularly strong or weak.
OLAP databases are designed to speed up the retrieval of data. Because the OLAP server, rather than Microsoft Office Excel, computes the summarized values, less data needs to be sent to Excel when you create or change a report. This approach enables you to work with much larger amounts of source data than you could if the data were organized in a traditional database, where Excel retrieves all of the individual records and then calculates the summarized values.
OLAP databases contain two basic types of data: measures, which are numeric data, the quantities and averages that you use to make informed business decisions, and dimensions, which are the categories that you use to organize these measures. OLAP databases help organize data by many levels of detail, using the same categories that you are familiar with to analyze the data.
Appendix 2: Types of Neural Networks
There are various types of neural networks, differing in structure, kinds of computations performed inside neurons, and training algorithms. One type offered in NeuralTools is the Multi-Layer Feedforward Network. With MLF nets, a NeuralTools user can specify whether there should be one or two layers of hidden neurons, and how many neurons the hidden layers should contain (NeuralTools provides help with making appropriate selections). NeuralTools also offers Generalized Regression Neural Nets and Probabilistic Neural Nets; these are closely related, with the former used for numeric prediction, and the latter for category prediction/classification. With GRN/PN nets, there is no need for the user to make decisions about the structure of a net. These nets always have two hidden layers of neurons, with one neuron per training case in the first hidden layer, and the size of the second layer determined by some facts about training data.
Figure 9: From the introduction of The Next Generation of Neural Networks, one of the videos in the References.
Appendix 3: Live Prediction
Live Prediction is a powerful capability of NeuralTools (Industrial version only) that allows you to perform predictions automatically in Excel without going through a specific Predict operation. With Live Prediction, NeuralTools places formulas in the cells where the predicted dependent variable values are shown. These formulas use a custom NeuralTools function to calculate the predicted values, such as:
=NetOutputPrediction(_PALDS_DG25B8C82B!$A$140, "DG25B8C82B", "VG1DD83AF2", 'Prediction Data'!$A$6:$I$6, A7:I7)
The actual formula is added to your worksheet by NeuralTools; you do not need to enter it. The arguments let NeuralTools identify the trained network in use, along with the location of the independent values in the worksheet. When the input independent variable values for a case are added or changed, NeuralTools will automatically return a new predicted value. This makes it simple to add and generate predictions for new cases using an existing trained net.
Appendix 4: Optimization Methods and Concepts
Optimization: The “recipe” solving method
The “recipe” solving method is the simplest and most popular type of solving method. Use recipe whenever the set of variables that are to be adjusted can be varied independently of one another. Think of each variable as the amount of an ingredient in a cake; when you use the “recipe” solving method, you are telling Evolver to generate numbers for those variables in an effort to find the best mix. The only constraint you place on recipe variables is to set the range (the highest and lowest value) that those values must fall between. Set these values in the Min and Max fields in the Adjustable Cells dialog (for example, 1 to 100), and also indicate whether or not Evolver should be trying integers (1, 2, 7) or real numbers (1.4230024, 63.72442). Refer to Figure 8.
Optimization: The “budget” solving method
The “budget” solving method works like the “recipe” solving method, in that it is trying to find the right “mix” of the chosen variables. When you use the budget method, however, you add the constraint that all variables must sum up to the same number as they did before Evolver started optimizing.
A “budget” is similar to a “recipe” except that all of the variables’ values must total to a certain number. That number is the total of the variables’ values at the time an optimization is started. For example, you might want to find the best way to distribute an annual budget among a number of departments. The “budget” solving method will take the total of the current values for the departments, and use that sum as the total budget to be optimally distributed.
Optimization: Basic concepts
Hill climbing will always find the best answer if a) the function being explored is smooth, and b) the initial variable values place you on the side of the highest mountain. If either condition is not met, hill climbing can end up in a local solution, rather than the global solution.
Highly non-linear problems, the kind often seen in practice, have many possible solutions across a complicated landscape. If a problem has many variables, and/or if the formulas involved are very noisy or curvy, the best answer will probably not be found with hill climbing, even after trying hundreds of times with different starting points. Most likely, a sub-optimal, and extremely local solution will be found (see Figure 10).
Figure 10: Simple and complicated landscapes.
Evolver does not use hill climbing. Rather, it uses a stochastic, directed search technique, dubbed a genetic algorithm. This lets Evolver jump around in the solution space of a problem, examining many combinations of input values without getting stuck in local optima. In addition, Evolver lets good scenarios “communicate” with each other to gain valuable information as to what the overall solution landscape looks like, and then uses that information to better guess which scenarios are likely to be successful. If you have a complex or highly non-linear problem, you should, and often must, use Evolver.
Figure 11: Evolver generates many possible scenarios, then refines the search based on the feedback it receives.
Appendix 5: OLAP in Project Management
Microsoft Project Server 2007 provides a number of preconfigured OLAP cubes that you can use to explore project, resource, and task information within your data analysis views. Additionally, you can add custom fields to customize the cubes that enable you to extend the Project Server 2007 OLAP cube. For example, you might choose to add an employer or language dimension for the resources cube, or you might want to show more corporate data by adding measure fields for non-project costs to the OLAP cube.
Figure 12: The process of building the Microsoft Project Server 2007 OLAP cube database.
Figure 13: The Cube Configuration page (web-based access to Project Server) allows you to customize OLAP cubes for your Project Server Instance.
Figure 14: OLAP cubes generated by Project Server 2007 (using Web-based access) can be accessed by other tools, such as Excel 2007.
The Data Analysis feature of Project Server 2007 makes use of Microsoft Office Web Components, which is a collection of Microsoft ActiveX components. Project Server 2007 uses these components to access OLAP cube data that is in the Analysis Services database. Users can interact with this data in PWA (Project Web Access) and Project Professional 2007 by using fully interactive PivotTables and PivotChart reports.
- Berghage, T. et al Predicting the Unpredictable: How Neural Networks Enhance Predictability and Performance in Portfolio Management, Acacia (2007)
- Heaton, J. Introduction to Neural Networks with Java, Heaton (2005)
- McNelis, P. Neural Networks in Finance: Gaining Predictive Edge in the Market, Elsevier (2005)
- Samarasinghe, S. Neural Networks for Applied Sciences and Engineering: From Fundamentals to Complex Pattern, Auerbach (2007)
- Jelen, B. et al Customize a pivot table in Microsoft Office Excel 2007, Que (2006)
- Jelen, B. et al Pivot Table Data Crunching for Microsoft Office Excel 2007, Que (2006)
- Berson, A. et al Data Warehousing, Data Mining, and OLAP, McGraw-Hill (1997)
- QuantumPM Microsoft Office Project Server 2007 Unleashed, SAMS (2008)
- Basic Concepts for Neural Networks: http://www.cheshireeng.com/Neuralyst/nnbg.htm
- Basic Introduction to Neural Networks: http://www.cs.duke.edu/brd/Teaching/Previous/AI/Lectures/NN/neural.html
- Linear Neural networks: http://youtube.com/watch?v=cM1CF7gUm-U
- Introduction to Neural Networks: http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html
- Multi layered neural networks: http://youtube.com/watch?v=188qdP70124
- Neural Networks: http://youtube.com/watch?v=6ixqKw7uK6o
- The Next Generation of Neural Networks: http://youtube.com/watch?v=AyzOUbkUf3M
- Business Intelligence Demonstration: http://youtube.com/watch?v=-j5J7lXav7Y&feature=related
- Data Mining Techniques: http://www.cs.sunysb.edu/~cse634/students’_presentations/neural.ppt
About the Author
Marcia Gulesian is an IT strategist, hands-on practitioner, and advocate for business-driven architectures. She has served as software developer, project manager, CTO, and CIO. Marcia is the author of well more than 100 feature articles on IT, its economics, and its management.