In an earlier tutorial, “Merging Data Files with Statistica, Part 1,” we introduced using Statistica for merging spreadsheets. We discussed the concatenation merge mode. In this tutorial, we shall discuss two other modes: using case names and variable names. This tutorial has the following sections:
Using Case Names to Merge Data Files
Next, we shall merge data files (spreadsheets) by matching the rows (also called cases). If the rows have the same case names, the data in the rows from the two data files gets merged. The example data files we used in the preceding article do not include a case name. The case name is specified in the 1st column, the column before the data columns. Using the same data as for concatenating data files, add the case names (log1 to log6) to rows in the wlslog1.sta spreadsheet, as shown in Figure 1.
Figure 1: Spreadsheet wlslog1
Similarly, add case names (log1 to log6) to each row in wlslog2.sta, as shown in Figure 2.
Figure 2: Spreadsheet wlslog2
Select Data>Merge and, in Merge Options, select Mode as Match casenames, as shown in Figure 3. Click OK.
Figure 3: Merging wlslog1 and wlslog2
The data in the wlslog1.sta spreadsheet gets merged with the data in the wlslog2.sta spreadsheet, as shown in the resulting spreadsheet in Figure 4.
Figure 4: Merged File
When merging by matching casenames, each of the data files to merge must include case names, or the error shown in Figure 5 gets displayed.
Figure 5: Case names are required when merging by matching casenames
One spreadsheet may have more cases (or rows) than the other. As an example, add a 7th row to wlslog1.sta (see Figure 6). Click Merge to merge the spreadsheets.
Figure 6: Merge with a 7th Row in wlslog1.sta
Merge by matching casenames with wlslog2.sta, which is the same as before with 6 cases (rows), as shown in Figure 28. The spreadsheets to merge have unmatched cases (one spreadsheet has more cases than the other). Unmatched cases are merged by filling with missing data by default, which implies the data values are empty. The resulting spreadsheet has empty missing data for unmatched cases, as shown in Figure 7.
Figure 7: Resulting Spreadsheet has Empty Missing Data
Merge Options provides some options for Unmatched Cases other than fill with missing data. To demonstrate, use a spreadsheet, wlslog1.sta, with an extra row and also a duplicate case name (log2), as shown in Figure 8.
Figure 8: Spreadsheet with Duplicate Case Name
The unmatched cases may be deleted by selecting Delete cases in File 1 Unmatched Cases, as shown in Figure 9. Multiple cases are fixed by selecting “Drop File 1 multiples”. With Merge Mode as Match Casenames, click OK.
Figure 9: File 1 Unmatched Cases>Delete cases
The resulting spreadsheet has both issues fixed. The unmatched case is deleted and the duplicate case is dropped, as shown in Figure 10.
Figure 10: Resulting Spreadsheet with Unmatched case Deleted and the Duplicate case Dropped
Using Variable Names to Merge Data Files
Next, we shall merge spreadsheets by matching variable names. Start with two spreadsheets, wlslog1.sta and wlslog2.sta, each with the column names shown in Figure 11.
Figure 11: Columns Names in wlslog1 and wlslog2
Add the following data to wlslog1.sta.
4-8-2014-7:06:16,Notice,WebLogicServer,AdminServer,BEA-000365, STANDBY 4-8-2014-7:06:17,Notice,WebLogicServer,AdminServer,BEA-000365, RESUMING 4-8-2014-7:06:18,Notice,WebLogicServer,AdminServer,BEA-000365, ADMIN
The wlslog1.sta spreadsheet is shown in Figure 12.
Figure 12: Spreadsheet wlslog1.sta
Add the following data to wlslog2.sta.
4-8-2014-7:06:20,Notice,WebLogicServer,AdminServer,BEA-000331, STARTING 4-8-2014-7:06:21,Notice,WebLogicServer,AdminServer,BEA-000365, STARTED 4-8-2014-7:06:22,Notice,WebLogicServer,AdminServer,BEA-000360, RUNNING
The wlslog2.sta is shown in Figure 13. Select Data>Merge as before.
Figure 13: Spreadsheet wlslog2.sta
In Merge Options, select Mode as Match variables, as shown in Figure 14. Select File 1 as wlslog1.sta and File 2 as wlslog2.sta. The order is important because the spreadsheet to be added to the bottom of the other must be File 2. Keep the Match Criteria as By Auto, which automatically chooses the most appropriate merge criteria. The other options for Match Criteria are By Text, which compares data by comparing text; and By Numeric, which compares data by comparing the numeric values. Next, click Select to select the variables to match.
Figure 14: Merge Mode as Match Variables
First, select matching variables for the current file (File 1). Click Select All and click OK, as shown in Figure 15.
Figure 15: Selecting Variables in the Current File
Similarly, select all variables for merge file (File 2) and click OK (see Figure 16).
Figure 16: Selecting Variables in Merge File
Click OK in Merge Options, as shown in Figure 17.
Figure 17: Merging with Mode as Match Variables
The two spreadsheets get merged by matching variable names, as shown in Figure 18.
Figure 18: Resulting Spreadsheet from Merging by Matching Variable Names
When merging spreadsheets by matching variable names, the data values are sorted numerically and textually. As an example, merge two spreadsheets with the 1st spreadsheet, shown in Figure 19.
Figure 19: First Spreadsheet to Merge
The 2nd spreadsheet is shown in Figure 20. A modification added is that the variable name has been modified slightly in File 1: “ServerType” instead of “servername”, “MessageCode” instead of “code”, and “Message” instead of “msg”.
Figure 20: Second Spreadsheet to Merge
Click Select to select the variables to be used for matching. In File 1, select all the variables (see Figure 21).
Figure 21: Selecting Matching Variables for Current File
In File 2, also select all the variables, as shown in Figure 22.
Figure 22: Selecting Matching Variables for Merge file
Merge the two spreadsheets as before. The “servername” or “ServerType” is the same for all rows and does not contribute to the sorting of data in the resulting spreadsheet. The “code” or “MessageCode” column data values are sorted as Text case insensitive; BEA-000331 is sorted before BEA-000360, which is sorted before BEA-000365. For the same value for code BEA-000365, the “msg” or “Message” column data is sorted by Text also—ADMIN->RESUMING->STANDBY>STARTING—as shown in Figure 23.
Figure 23: Resulting Spreadsheet
Certain conditions must be applied when selecting variables. At least one variable must be selected for matching, or the error shown in Figure 24 gets generated.
Figure 24: A minimum of 1 Variable must be selected
The number of variables selected must be the same in File 1 and File 2, or the error shown in Figure 25 gets generated.
Figure 25: Same Number of Variables must be selected in Spreadsheets to Merge
The data type of the variables selected must be the same for the variables selected. As an example, the “servername” and “ServerType” variables in File 1 and File 2 respectively must have the same data type, or the error shown in Figure 26 gets generated.
Figure 26: Variable Types must be the same when merging by matching Variables
Conclusion
In this tutorial, we discussed merging data files (also called spreadsheets) in Statistica Platform using modes: Match casenames and Match variables.