Architecture & DesignMerging Data Files with Statistica, Part 2

Merging Data Files with Statistica, Part 2

Developer.com content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More.

In an earlier tutorial, “Merging Data Files with Statistica, Part 1,” we introduced using Statistica for merging spreadsheets. We discussed the concatenation merge mode. In this tutorial, we shall discuss two other modes: using case names and variable names. This tutorial has the following sections:

Using Case Names to Merge Data Files

Next, we shall merge data files (spreadsheets) by matching the rows (also called cases). If the rows have the same case names, the data in the rows from the two data files gets merged. The example data files we used in the preceding article do not include a case name. The case name is specified in the 1st column, the column before the data columns. Using the same data as for concatenating data files, add the case names (log1 to log6) to rows in the wlslog1.sta spreadsheet, as shown in Figure 1.

Spreadsheet wlslog1
Figure 1: Spreadsheet wlslog1

Similarly, add case names (log1 to log6) to each row in wlslog2.sta, as shown in Figure 2.

Spreadsheet wlslog2
Figure 2: Spreadsheet wlslog2

Select Data>Merge and, in Merge Options, select Mode as Match casenames, as shown in Figure 3. Click OK.

Merging wlslog1 and wlslog2
Figure 3: Merging wlslog1 and wlslog2

The data in the wlslog1.sta spreadsheet gets merged with the data in the wlslog2.sta spreadsheet, as shown in the resulting spreadsheet in Figure 4.

Merged File
Figure 4: Merged File

When merging by matching casenames, each of the data files to merge must include case names, or the error shown in Figure 5 gets displayed.

Case names are required when merging by matching casenames
Figure 5: Case names are required when merging by matching casenames

One spreadsheet may have more cases (or rows) than the other. As an example, add a 7th row to wlslog1.sta (see Figure 6). Click Merge to merge the spreadsheets.

Merge with a 7th Row in wlslog1.sta
Figure 6: Merge with a 7th Row in wlslog1.sta

Merge by matching casenames with wlslog2.sta, which is the same as before with 6 cases (rows), as shown in Figure 28. The spreadsheets to merge have unmatched cases (one spreadsheet has more cases than the other). Unmatched cases are merged by filling with missing data by default, which implies the data values are empty. The resulting spreadsheet has empty missing data for unmatched cases, as shown in Figure 7.

Resulting Spreadsheet has Empty Missing Data
Figure 7: Resulting Spreadsheet has Empty Missing Data

Merge Options provides some options for Unmatched Cases other than fill with missing data. To demonstrate, use a spreadsheet, wlslog1.sta, with an extra row and also a duplicate case name (log2), as shown in Figure 8.

Spreadsheet with Duplicate Case Name
Figure 8: Spreadsheet with Duplicate Case Name

The unmatched cases may be deleted by selecting Delete cases in File 1 Unmatched Cases, as shown in Figure 9. Multiple cases are fixed by selecting “Drop File 1 multiples”. With Merge Mode as Match Casenames, click OK.

File 1 Unmatched Cases>Delete cases
Figure 9: File 1 Unmatched Cases>Delete cases

The resulting spreadsheet has both issues fixed. The unmatched case is deleted and the duplicate case is dropped, as shown in Figure 10.

Resulting Spreadsheet with Unmatched case Deleted and the Duplicate case Dropped
Figure 10: Resulting Spreadsheet with Unmatched case Deleted and the Duplicate case Dropped

Using Variable Names to Merge Data Files

Next, we shall merge spreadsheets by matching variable names. Start with two spreadsheets, wlslog1.sta and wlslog2.sta, each with the column names shown in Figure 11.

Columns Names in wlslog1 and wlslog2
Figure 11: Columns Names in wlslog1 and wlslog2

Add the following data to wlslog1.sta.

4-8-2014-7:06:16,Notice,WebLogicServer,AdminServer,BEA-000365,
   STANDBY
4-8-2014-7:06:17,Notice,WebLogicServer,AdminServer,BEA-000365,
   RESUMING
4-8-2014-7:06:18,Notice,WebLogicServer,AdminServer,BEA-000365,
   ADMIN

The wlslog1.sta spreadsheet is shown in Figure 12.

Spreadsheet wlslog1.sta
Figure 12: Spreadsheet wlslog1.sta

Add the following data to wlslog2.sta.

4-8-2014-7:06:20,Notice,WebLogicServer,AdminServer,BEA-000331,
   STARTING
4-8-2014-7:06:21,Notice,WebLogicServer,AdminServer,BEA-000365,
   STARTED
4-8-2014-7:06:22,Notice,WebLogicServer,AdminServer,BEA-000360,
   RUNNING

The wlslog2.sta is shown in Figure 13. Select Data>Merge as before.

Spreadsheet wlslog2.sta
Figure 13: Spreadsheet wlslog2.sta

In Merge Options, select Mode as Match variables, as shown in Figure 14. Select File 1 as wlslog1.sta and File 2 as wlslog2.sta. The order is important because the spreadsheet to be added to the bottom of the other must be File 2. Keep the Match Criteria as By Auto, which automatically chooses the most appropriate merge criteria. The other options for Match Criteria are By Text, which compares data by comparing text; and By Numeric, which compares data by comparing the numeric values. Next, click Select to select the variables to match.

Merge Mode as Match Variables
Figure 14: Merge Mode as Match Variables

First, select matching variables for the current file (File 1). Click Select All and click OK, as shown in Figure 15.

Selecting Variables in the Current File
Figure 15: Selecting Variables in the Current File

Similarly, select all variables for merge file (File 2) and click OK (see Figure 16).

Selecting Variables in Merge File
Figure 16: Selecting Variables in Merge File

Click OK in Merge Options, as shown in Figure 17.

Merging with Mode as Match Variables
Figure 17: Merging with Mode as Match Variables

The two spreadsheets get merged by matching variable names, as shown in Figure 18.

Resulting Spreadsheet from Merging by Matching Variable Names
Figure 18: Resulting Spreadsheet from Merging by Matching Variable Names

When merging spreadsheets by matching variable names, the data values are sorted numerically and textually. As an example, merge two spreadsheets with the 1st spreadsheet, shown in Figure 19.

First Spreadsheet to Merge
Figure 19: First Spreadsheet to Merge

The 2nd spreadsheet is shown in Figure 20. A modification added is that the variable name has been modified slightly in File 1: “ServerType” instead of “servername”, “MessageCode” instead of “code”, and “Message” instead of “msg”.

Second Spreadsheet to Merge
Figure 20: Second Spreadsheet to Merge

Click Select to select the variables to be used for matching. In File 1, select all the variables (see Figure 21).

Selecting Matching Variables for Current File
Figure 21: Selecting Matching Variables for Current File

In File 2, also select all the variables, as shown in Figure 22.

Selecting Matching Variables for Merge file
Figure 22: Selecting Matching Variables for Merge file

Merge the two spreadsheets as before. The “servername” or “ServerType” is the same for all rows and does not contribute to the sorting of data in the resulting spreadsheet. The “code” or “MessageCode” column data values are sorted as Text case insensitive; BEA-000331 is sorted before BEA-000360, which is sorted before BEA-000365. For the same value for code BEA-000365, the “msg” or “Message” column data is sorted by Text also—ADMIN->RESUMING->STANDBY>STARTING—as shown in Figure 23.

Resulting Spreadsheet
Figure 23: Resulting Spreadsheet

Certain conditions must be applied when selecting variables. At least one variable must be selected for matching, or the error shown in Figure 24 gets generated.

A minimum of 1 Variable must be selected
Figure 24: A minimum of 1 Variable must be selected

The number of variables selected must be the same in File 1 and File 2, or the error shown in Figure 25 gets generated.

Same Number of Variables must be selected in Spreadsheets to Merge
Figure 25: Same Number of Variables must be selected in Spreadsheets to Merge

The data type of the variables selected must be the same for the variables selected. As an example, the “servername” and “ServerType” variables in File 1 and File 2 respectively must have the same data type, or the error shown in Figure 26 gets generated.

Variable Types must be the same when merging by matching Variables
Figure 26: Variable Types must be the same when merging by matching Variables

Conclusion

In this tutorial, we discussed merging data files (also called spreadsheets) in Statistica Platform using modes: Match casenames and Match variables.

Get the Free Newsletter!

Subscribe to Developer Insider for top news, trends & analysis

Latest Posts

Related Stories