Architecture & DesignUsing RapidMiner with MySQL Tables

Using RapidMiner with MySQL Tables

In an earlier article, “Merging Database Tables with RapidMiner,” we started with introducing the different process operators for merging database tables. We created the database tables to merge and created a RapidMiner process. Subsequently, we created a connection with MySQL database and used the UNION operator to merge two database tables. In this article, we shall discuss two other process operators: APPEND and UPDATE. This article has the following sections:

Appending Database Tables with the APPEND Operator

Next, we shall use the Append operator to append ExampleSet generated from one database table with ExampleSet generated from the 2nd ExampleSet and output the resultant ExampleSet to the wlslog table. For the Append example, we shall create the database table “wlslog” automatically when the Process is run. Delete the wlslog table created for the previous article, as shown in Figure 1.

Deleting the wlslog table
Figure 1: Deleting the wlslog table

Create another RapidMiner Process or use the same one, but to start with, only the Read Database and Read Database (2) operators should be included in the Process. Next, add the Append operator. Right-click the Process and select Insert Operator>Blending>Table>Joins>Append, as shown in Figure 2.

Insert Operator>Blending>Table>Joins>Append
Figure 2: Insert Operator>Blending>Table>Joins>Append

The Append operator gets added, as shown in Figure 3.

Append Operator added
Figure 3: Append Operator added

Arrange the operators with Process>Arrange Operators and add connections to connect the two Read Database operators with the Append operator, as shown in Figure 4.

Connecting Read Database and Append Operators
Figure 4: Connecting Read Database and Append Operators

Next, add a Write Database operator. Right-click the Process and select Insert Operator>Data Access>Database>Write Database, as shown in Figure 5.

Insert Operator>Data Access>Database>Write Database
Figure 5: Insert Operator>Data Access>Database>Write Database

Connect the output from the Append to the input of the Write Database, as shown in Figure 6.

Connecting output from the Append to Write Database
Figure 6: Connecting output from the Append to Write Database

Configure the Write Database as before except that you need to specify (instead of selecting a pre-existing table) the “table name” as wlslog and select “overwrite mode” as “none,” as shown in Figure 7.

Configuring the Write Database
Figure 7: Configuring the Write Database

Optionally, specify a batch size other than the default setting of 1, as shown in Figure 8.

Specifying batch size
Figure 8: Specifying batch size

Run the Process with Process>Run Process Locally, as shown in Figure 9.

Process>Run Process Locally
Figure 9: Process>Run Process Locally

The Process runs to completion, as shown in Figure 10.

Process runs to completion
Figure 10: Process runs to completion

The wlslog table, which we had deleted, gets created and a SQL query lists the merged table data, as shown in Figure 11.

Merged table
Figure 11: Merged table

Differences Between APPEND and UNION

Next, we shall demonstrate the difference between Append and Union. As discussed before, Append requires the ExampleSets that are to be combined have the same number of attributes and the same attribute names. The Union operator does not have any such requirement. Drop the wlslog table because we shall be creating the database table for the combined table data automatically when the Process is run, as shown in Figure 12.

Dropping the wlslog table
Figure 12: Dropping the wlslog table

Modify the table definition for the wlslog1 table to include another attribute. First, drop the wlslog1 table and subsequently create the wlslog1 table with an additional attribute.

CREATE TABLE WLSLOG1(LOGID VARCHAR(255) PRIMARY KEY,
   SERVERNAME VARCHAR(255),CODE VARCHAR(255),
   MSG VARCHAR(255));

The WLSLOG1 table gets created, as shown in Figure 13.

Creating wlslog1 table
Figure 13: Creating wlslog1 table

Add table data to WLSLOG1.

INSERT INTO WLSLOG1(LOGID,SERVERNAME,CODE, MSG)
   values("log1","BEA-000365","Server state changed to STANDBY" );
INSERT INTO WLSLOG1(LOGID,SERVERNAME,CODE, MSG)
   values("log2","BEA-000365","Server state changed to STARTING");
INSERT INTO WLSLOG1(LOGID,SERVERNAME,CODE, MSG)
   values("log3","BEA-000365","Server state changed to ADMIN");

Table data gets added to WLSLOG1, as shown in Figure 14.

Adding data to WLSLOG1
Figure 14: Adding data to WLSLOG1

We shall use the same Process as was used for the previous section, with some modifications. The Read Database operator keeps data in memory and, if the underlying table is changed, we need to replace the operator, too. To replace the Read Database operator, right-click Read Database and select Replace Operator>Data Access>Database>Read Database, as shown in Figure 15.

Read Database>Replace Operator>Data Access>Database>Read Database
Figure 15: Read Database>Replace Operator>Data Access>Database>Read Database

Similarly, replace the Append operator with a new Append operator, as shown in Figure 16.

Replacing the Append Operator
Figure 16: Replacing the Append Operator

Set the table name for Read Database as wlslog1, as shown in Figure 17.

Setting Table name for Read Database Operator
Figure 17: Setting Table name for Read Database Operator

The Metadata for Read Database should display an additional attribute, SERVERNAME, as shown in Figure 18.

Setting Table name for Read Database Operator
Figure 18: Setting Table name for Read Database Operator

Run the process with Process>Run Process Locally, as shown in Figure 19.

Process>Run Process Locally
Figure 19: Process>Run Process Locally

An error message gets displayed, “Example sets are not compatible,” as shown in Figure 20. Because we had only added the additional column SERVERNAME to wlslog1 and not to wlslog2, the number of columns in the two database tables are different and, as a result, the number of attributes in the two ExampleSets are different.

Example sets are not compatible
Figure 20: Example sets are not compatible

The Union operator would have combined the two ExampleSets with a different number of attributes. To demonstrate, replace the Append operator with the Union operator, as shown in Figure 21.

Replacing the Append operator with the Union operator
Figure 21: Replacing the Append operator with the Union operator

Set the table name for the Write Database as wlslog, as shown in Figure 22.

Setting Table Name of Write Database Operator
Figure 22: Setting Table Name of Write Database Operator

Run the process with Process>Run Process Locally, as shown in Figure 23.

Process>Run Process Locally
Figure 23: Process>Run Process Locally

The process runs to completion, as shown in Figure 24.

Process runs to completion
Figure 24: Process runs to completion

Run a SQL query in the wlslog table, as shown in Figure 25. Because we had only added the additional column SERVERNAME to wlslog1 and not to wlslog2, the ExampleSet generated from the wlslog table does not include the SERVERNAME attribute. As a result, the wlslog table generated from combining the ExampleSets does not include the SERVERNAME column in some of the table rows, the table rows generated from the wlslog2 table data. Although Append won’t combine the two ExampleSets with different number of attributes, Union does and adds a NULL value for the missing column value.

UNION adds NULL values for missing data
Figure 25: UNION adds NULL values for missing data

To add data to the columns with a NULL value, drop the WLSLOG2 table and create the table again with the SERVERNAME column included.

CREATE TABLE WLSLOG2(LOGID VARCHAR(255) PRIMARY KEY,
   SERVERNAME VARCHAR(255),CODE VARCHAR(255),
   MSG VARCHAR(255));

The WLSLOG2 table gets created, which you can see in Figure 26.

Creating WLSLOG2 table
Figure 26: Creating WLSLOG2 table

Add data to the WLSLOG2 table. The log4 id data has two of the column data switched. The column data has been switched to demonstrate the use of another operator, the Update Database operator, in the next section.

INSERT INTO WLSLOG2(LOGID,SERVERNAME,CODE, MSG)
   values("log4","BEA-000331","AdminServer",
   "Server state changed to RESUMING");
INSERT INTO WLSLOG2(LOGID,SERVERNAME,CODE, MSG)
   values("log5","AdminServer","BEA-000365",
   "Started WebLogic AdminServer");
INSERT INTO WLSLOG2(LOGID,SERVERNAME,CODE, MSG)
   values("log6","AdminServer","BEA-000365",
   "Server state changed to RUNNING");
INSERT INTO WLSLOG2(LOGID,SERVERNAME,CODE, MSG)
   values("log7","AdminServer","BEA-000360",
   "Server started in RUNNING mode");

The WLSLOG2 table data includes data for the additional column SERVERNAME, as shown in Figure 27.

Adding data to WLSLOG2 table
Figure 27: Adding data to WLSLOG2 table

Select the overwrite mode as “overwrite” to overwrite the WLSLOG2 data, as shown in Figure 28.

Setting overwrite mode
Figure 28: Setting overwrite mode

The Process gets configured for another run, as shown in Figure 29.

Process configured
Figure 29: Process configured

Run the Process with Process>Run Process Locally, as shown in Figure 30.

Process>Run Process Locally
Figure 30: Process>Run Process Locally

The process runs to completion, as shown in Figure 31.

Process runs to completion
Figure 31: Process runs to completion

List the WLSLOG data with a SQL query and the previously NULL column values are replaced with data values, as shown in Figure 32.

NULL values replaced with data
Figure 32: NULL values replaced with data

The Append operator should be used if the number of columns and column names in two database tables to be combined are the same. The Union operator should be used if the number of columns or/and the column names are different.

Updating Database Tables with the UPDATE Operator

In the previous section, the WLSLOG2 table had data switched in two of the columns; this resulted in a merged data that was also switched in two columns. To fix the data, we don’t need to drop all tables and run the process again with data fixed in the WLSLOG1 and WLSLOG2 tables. The Update Database operator is meant for just such a data fix issue in which only a subset of the data needs to be updated.

First, we need to drop the WLSLOG2 table and replace with a table in which the data is not switched in two columns for the log4 data.

INSERT INTO WLSLOG2(LOGID,SERVERNAME,CODE, MSG)
   values("log4","AdminServer","BEA-000331",
   "Server state changed to RESUMING");
INSERT INTO WLSLOG2(LOGID,SERVERNAME,CODE, MSG)
   values("log5","AdminServer","BEA-000365",
   "Started WebLogic AdminServer");
INSERT INTO WLSLOG2(LOGID,SERVERNAME,CODE, MSG)
   values("log6","AdminServer","BEA-000365",
   "Server state changed to RUNNING");
INSERT INTO WLSLOG2(LOGID,SERVERNAME,CODE, MSG)
   values("log7","AdminServer","BEA-000360",
   "Server started in RUNNING mode");

The WLSLOG2 table gets created with data not switched in two columns, as shown in Figure 33.

WLSLOG2 table
Figure 33: WLSLOG2 table

A SQL query on WLSLOG2 table lists the table data as fixed with no column data switched, as shown in Figure 34.

SQL Query on WLSLOG2 table
Figure 34: SQL Query on WLSLOG2 table

Next, modify the same Process to replace some of the operators. Replace the Write Database operator with the Update Database operator and also replace the Read Database (2) operator because the underlying database table WLSLOG2 has been replaced. An alternative to replacing an operator is to delete an operator and add another.

Right-click the Process and select Insert Operator>Data Access>Database>Update Database, as shown in Figure 35.

Process>Insert Operator>Data Access>Database>Update Database
Figure 35: Process>Insert Operator>Data Access>Database>Update Database

The Update Database operator gets added, as shown in Figure 36.

Update Database Operator
Figure 36: Update Database Operator

Delete the Write Database operator and add a connection from Union to Update Database, as shown in Figure 37.

Adding a Connection from Union to Update Database
Figure 37: Adding a Connection from Union to Update Database

The Read Database (2) operator needs to be configured slightly differently. We don’t need all the data from the WLSLOG2 table. We need only log4 row data for which we shall define a query instead of selecting a table name, which would select all the data in the WLSLOG2 table. Click Read Database (2) and select a query in the define query parameter, as shown in Figure 38.

Defining a query
Figure 38: Defining a query

Click Build SQL Query, as shown in Figure 39.

Build SQL Query
Figure 39: Build SQL Query

A Build SQL Query GUI gets displayed, as shown in Figure 40.

Build SQL Query GUI
Figure 40: Build SQL Query GUI

Specify the following SQL query.

SELECT LOGID,SERVERNAME,CODE,MSG from WLSLOG2
   WHERE LOGID="log4"

Click OK, as shown in Figure 41.

Specifying a SQL query
Figure 41: Specifying a SQL query

A query over a subset gets defined for Read Database (2); parameters are shown in Figure 42.

Read Database (2) parameters
Figure 42: Read Database (2) parameters

Before running the process, get an ExampleSet for Read Database (2) with right-click>Show ExampleSet Result, and it should include only a single row of data, as shown in Figure 43.

ExampleSet for Read Database (2)
Figure 43: ExampleSet for Read Database (2)

An ExampleSet for the Read Database operator lists three rows of data, as shown in Figure 44.

ExampleSet for Read Database
Figure 44: ExampleSet for Read Database

Next, configure the Update Database operator. Select the define connection as predefined and select the connection as MySQLDB. Select the scheme name as “mysql” and the table name as “wlslog”. Select the attribute filter type as “single” and select the attribute as “LOGID,” as shown in Figure 45.

ExampleSet for Read Database
Figure 45: ExampleSet for Read Database

Before running a Process with Update Database, display an ExampleSet Result, as shown in Figure 46.

Show ExampleSet Result
Figure 46: Show ExampleSet Result

The ExampleSet consists of four rows of data which do not include any error in the data, as shown in Figure 47.

ExampleSet result
Figure 47: ExampleSet result

Select Process>Run Process Locally, as shown in Figure 48.

Process>Run Process Locally
Figure 48: Process>Run Process Locally

The process completes successfully, as shown in Figure 49.

Process completes successfully
Figure 49: Process completes successfully

Run a SQL query on the wlslog table created and all the data should be error-free. For comparison, the SQL query result for the wlslog table before the update is also listed, as shown in Figure 50.

The wlslog table before and after Update
Figure 50: The wlslog table before and after Update

The “attr filter type” must not be “all” for the Update Database operator, as shown in Figure 51.

attr filter type>all
Figure 51: attr filter type>all

The attribute id values for all the attributes in the attribute filter must be the same in a row of data in the database table to update for the row to get updated. If all the id attribute values do not match, a new row gets added instead. To demonstrate, select the attribute filter type as “subset,” as shown in Figure 52, and click Select Attributes.

Selecting attribute filter type as "subset"
Figure 52: Selecting attribute filter type as “subset”

Select the CODE, LOGID, and SERVERNAME attributes. Click Apply, as shown in Figure 53.

Selecting attributes
Figure 53: Selecting attributes

Select Process>Run Process Locally, as shown in Figure 54.

Process>Run Process Locally
Figure 54: Process>Run Process Locally

The process runs to completion successfully, as shown in Figure 55.

Process runs to completion
Figure 55: Process runs to completion

But, because the log4 id row in the ExampleSet generated from Read Database (2) has the accurate CODE and SERVERNAME attributes, whereas the CODE and SERVERNAME values in the wlslog table, which is to be updated, are not accurate, not all the selected attributes LOGID, CODE, and SERVERNAME match and a row gets added instead of updating an existing row with LOGID log4, as shown in Figure 56.

Table Row added instead of updating
Figure 56: Table Row added instead of updating

Conclusion

In this article, we discussed appending and updating database tables with RapidMiner. We also discussed the difference between the UNION and APPEND operators.

Get the Free Newsletter!
Subscribe to Developer Insider for top news, trends & analysis
This email address is invalid.
Get the Free Newsletter!
Subscribe to Developer Insider for top news, trends & analysis
This email address is invalid.

Latest Posts

Related Stories