Azure Machine Learning (Azure ML) How-to
Introduction to Azure ML
Azure Machine Learning (Azure ML) is a SaaS cloud offering from Microsoft. Its advantage is that it provides a UI-based interface and predefined algorithms that can be used to create a training model. It also supports R and Python script integration. This article explains how to create a training model and then deploy it as a Web service, using a "book recommender" model as the example.
Before we start with the actual experiment, let's look at the prerequisites.
Figure 1: Creating the workspace
Figure 2: Search results
Navigate through the steps and finish the workspace creation. Once the workspace is created, navigate to the workspace and click "Launch Machine Learning Studio."
Figure 3: Launching Machine Learning Studio
ML Studio is a UI-based editor that provides a set of predefined algorithms for creating a training model. The most popular algorithms are already implemented and ready to be used in an experiment. ML Studio offers a quick and easy way to create ML experiments and validate them.
Let's understand it with the help of an example. In this example, we will create a book recommender experiment. We will upload two new datasets, "book-ratings" and "book". The "book" dataset is a master list of books with the following columns: ISBN, Book-Title, Book-Author, Year-of-Publication, and Publisher. The "book-ratings" dataset has the following columns: User-ID, ISBN, and Rating.
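Before uploading, it can help to confirm that each CSV file has the header row described above. The following is a minimal sketch using only the Python standard library; the helper name `missing_columns` and the sample text are hypothetical, and the column names are simply those listed in this article.

```python
import csv
import io

# Column names as listed in this article; adjust them if your files differ.
EXPECTED = {
    "book": ["ISBN", "Book-Title", "Book-Author", "Year-of-Publication", "Publisher"],
    "book-ratings": ["User-ID", "ISBN", "Rating"],
}

def missing_columns(csv_text, expected):
    """Return the expected column names that are absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]

# Invented sample with one column missing, for illustration only.
sample = "User-ID,ISBN\n1,111\n"
print(missing_columns(sample, EXPECTED["book-ratings"]))
```

For a real file, read its contents first (for example with `open(path, newline="").read()`) and pass the text to `missing_columns`. An empty list means every expected column is present.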
Navigate to ML Studio https://studio.azureml.net.
On the left panel, select "Dataset" and then click the "New" button, as shown in Figure 4:
Figure 4: Starting a new dataset
Clicking "New" opens a screen for uploading a new CSV file. Click "From Local File" and upload the file. Because only one file can be uploaded at a time, we need to repeat this step to upload both the "Book Ratings" and "Book" data.
Figure 5: Uploading a local file
Clicking "From Local File" opens up a dialog window, as shown in Figure 6:
Figure 6: The Upload a new dataset window
Choose a file to upload. Leave "Select a Type for the new dataset" set to "Generic CSV File with a header (.csv)" and save the changes. Repeat the same process for Books.csv.
Once the datasets are created, we are ready to create an experiment. Select Experiment from the left panel (as seen in Figure 5) and click "New." Select "Blank Experiment," as shown in Figure 7.
Figure 7: The Blank Experiment window
This opens a canvas with a panel on the left that lists a number of modules. These modules can be dragged and dropped onto the canvas, as in Figure 8:
Figure 8: The canvas with a panel and modules
Give the experiment a name—for example, "Books Experiment." Expand "My Datasets" on the left panel (refer to Figure 8), select "Book rating," and drag-drop it on the canvas. Right-click the circle and click Visualize to see the data and the column heading names (see Figure 9).
Figure 9: Clicking Visualize
Once the dataset is added, the next step is to cleanse the data so that the experiment gives the desired results. In this example, cleansing means filtering on the rating: records that have no rating, or a rating of 0, are excluded. On the left panel, search for "Split," select "Split Data," and then drag and drop it onto the canvas.
Figure 10: Selecting Split Data
Connect the two modules as shown in Figure 11:
Figure 11: Connecting the two modules
Select the "Split Data" module and, in the properties panel, set the "splitting mode" to "Relative Expression" and the "Relational Expression" to '\"Book-Rating" != 0', where "Book-Rating" is a column name in the "Book-Ratings" dataset.
Note: Column names are case sensitive.
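To make the intent of this filter concrete, here is a minimal Python sketch of the same idea outside ML Studio. It reflects the cleansing rule described in this article (drop missing and zero ratings), not necessarily how the "Split Data" module behaves internally. The helper `keep_rated` and the sample rows are hypothetical; the column name "Book-Rating" follows the expression above.

```python
import csv
import io

# Invented sample rows for illustration only.
RATINGS_CSV = """User-ID,ISBN,Book-Rating
1,111,8
2,111,0
2,222,5
3,222,
"""

def keep_rated(rows, column="Book-Rating"):
    """Return only the rows whose rating is present and not zero."""
    kept = []
    for row in rows:
        value = (row.get(column) or "").strip()
        if value and float(value) != 0:
            kept.append(row)
    return kept

rows = list(csv.DictReader(io.StringIO(RATINGS_CSV)))
rated = keep_rated(rows)
print(len(rows), "rows read,", len(rated), "rows kept")
```

Because the lookup uses the exact string "Book-Rating", a header spelled "book-rating" would not match, which mirrors the case-sensitivity note above.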
We further split the data so that a few records can be used to train the model and the rest to score the model. The original dataset is divided 50-50—50% of the data will be used to train the model and the other 50% will be used to score it. This ratio can be adjusted to 80-20 or 70-30.
Figure 12: Adjusting the ratio
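The train/score split can be illustrated with a short Python sketch. This is a plain random row split under stated assumptions; the function name `split_rows`, the fixed seed, and the generated sample rows are hypothetical, and the Studio module may partition the rows differently.

```python
import random

def split_rows(rows, train_fraction=0.5, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Invented rows for illustration only.
ratings = [{"User-ID": str(u), "ISBN": str(100 + u), "Book-Rating": "5"} for u in range(10)]

train, test = split_rows(ratings, train_fraction=0.5)
print(len(train), len(test))
```

Passing `train_fraction=0.8` gives an 80-20 split instead. Every row ends up in exactly one of the two lists.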
The next step is to train the model. Because we are building a "book recommender," search for "recommender" in the left panel. This brings up "Train Matchbox Recommender," "Score Matchbox Recommender," and "Evaluate Recommender." We will use all three in this experiment.
Figure 13: Finding Train Matchbox Recommender
First, add the "Train Matchbox Recommender" to the canvas and connect, as shown in Figure 14:
Figure 14: Adding Train Matchbox Recommender to the canvas
Note: Hovering over a node displays the kind of data that the node supports. An example can be seen in Figure 15.
Now, we add the "Score Matchbox Recommender" to the canvas. It has several input nodes, and each expects a different type of data. The first node expects the output of the "Train Matchbox Recommender," and the second node expects the dataset to score against (the second half of the split data). The connected model is shown in Figure 16:
Figure 16: The connected model
Click "Score Matchbox Recommender" and set its properties as demonstrated in Figure 17:
Figure 17: Setting the Score Matchbox Recommender properties
The last step is to add the "Evaluate Recommender." Its first input node takes a "Test dataset" (the second part of the split), and its second node takes the output of the "Score Matchbox Recommender." The updated model is shown in Figure 18:
Figure 18: Adding Evaluate Recommender
Run the experiment to visualize the data. Once the run finishes, a green icon appears on every module. However, when we visualize the output of the "Score Matchbox Recommender," the result contains only IDs, with no book titles (see Figure 19).
Figure 19: Showing IDs, but no titles
The scored dataset has "Item" and "Related Item 1" columns. Both contain ISBN values, which are difficult to interpret on their own. To make the output more readable, we need two joins with the "book" dataset to get the book titles. Add the "Book" dataset and a "Select Columns in Dataset" module to the canvas. Connect the modules (see Figure 20):
Figure 20: Connecting the modules
Select the "Select Columns in Dataset" and, in the properties panel, click "Launch the column selector." Select the columns, as shown in Figure 21:
Figure 21: Selecting the columns
Now, add two "Join Data" modules to the canvas and connect the modules (see Figure 22):
Figure 22: Connecting the modules
Select the first "Join Data" and, in the properties panel, select the keys to perform the join. In the first "Join Data," inner join the "Item" column from the "Score Matchbox Recommender" output dataset to the "ISBN" column of the "Books" dataset.
Figure 23: Performing a Join
In the second "Join Data," inner join "Related Item 1" column from the "Join Data" output to the "ISBN" column of the "Books" dataset.
Figure 24: Inner joining two columns
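The effect of the two joins can be sketched in plain Python. This is an illustration of inner-join semantics under stated assumptions, not the "Join Data" module's implementation; the helper `inner_join`, the column suffixes, and the sample rows are hypothetical.

```python
def inner_join(left_rows, right_rows, left_key, right_key, suffix):
    """Inner join two lists of dicts; right-hand columns get a suffix to avoid name clashes."""
    index = {}
    for right in right_rows:
        index.setdefault(right[right_key], []).append(right)
    joined = []
    for left in left_rows:
        # Rows without a match on the right side are dropped (inner join).
        for right in index.get(left[left_key], []):
            merged = dict(left)
            for name, value in right.items():
                merged[f"{name}{suffix}"] = value
            joined.append(merged)
    return joined

# Invented rows for illustration only.
books = [
    {"ISBN": "111", "Book-Title": "Sample Title A"},
    {"ISBN": "222", "Book-Title": "Sample Title B"},
]
scored = [
    {"Item": "111", "Related Item 1": "222"},
    {"Item": "222", "Related Item 1": "999"},  # 999 is not in books, so this row is dropped
]

# First join resolves the "Item" ISBN; second join resolves "Related Item 1".
step1 = inner_join(scored, books, "Item", "ISBN", " (item)")
step2 = inner_join(step1, books, "Related Item 1", "ISBN", " (related)")
for row in step2:
    print(row["Book-Title (item)"], "->", row["Book-Title (related)"])
```

Note that an inner join silently discards any scored row whose ISBN is missing from the book list, so the joined output can have fewer rows than the scored dataset.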
Run the experiment again. Now, visualize the data at the second "Join Data" module; it displays the book titles. The final experiment is shown in Figure 25:
Figure 25: The final experiment
After executing the experiment successfully, click "Predictive Web service" (see Figure 26).
Figure 26: Clicking Predictive Web service
The output of the "Predictive Web service" looks as shown in Figure 27:
Figure 27: The output of Predictive Web service
Run the predictive model; the "Score Matchbox Recommender" again displays the data with ISBN IDs only. The joins need to be added again, because they were added to the training model and not to the predictive model. The only difference is that the "Join Data" output is connected as the input to the "Web Service Output." After adding the joins to the predictive model, the model looks as shown in Figure 28:
Figure 28: After adding the joins
Once the predictive model runs successfully, select "Deploy Web service [Classic]," as shown in Figure 29:
Figure 29: Selecting Deploy Web service (Classic)
The published Web service provides an API key to access the Web service. Verify the Web service by clicking the "Test" links/buttons under the "Default Endpoint" --> "Test" column.
Figure 30: Verifying the Web service
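Besides the "Test" button, the Web service can be called from code with the API key. The sketch below only builds the HTTP request; it assumes a request-response endpoint that accepts a JSON body with "Inputs" and "GlobalParameters" and a bearer token in the Authorization header. The URL, the input name "input1", the column names, and the helper `build_request` are placeholders, so copy the real values from the service's API help page.

```python
import json
import urllib.request

def build_request(url, api_key, column_names, rows):
    """Build a POST request for a request-response scoring call (assumed body layout)."""
    payload = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Placeholder values: replace with the endpoint URL and API key of your own service.
request = build_request(
    url="https://example.invalid/execute",
    api_key="YOUR_API_KEY",
    column_names=["User-ID", "ISBN", "Book-Rating"],
    rows=[["1", "111", "8"]],
)

# Sending is left commented out because the placeholder URL does not resolve.
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Keeping the request construction separate from the network call makes it easy to inspect the JSON body before sending anything.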
In the preceding example, we created a training model and a predictive model without writing any code. ML Studio makes this easier because popular algorithms are already defined and ready to use. It also provides modules to run custom R and Python scripts. Azure ML offers an easier and faster way to create training models. For beginners, it's a great place to start.