The Use of Java in Machine Learning
Artificial Neural Networks (ANN)
Neural network learning methods provide a robust approach to approximating real-valued, discrete-valued, and vector-valued functions. The well-known Back-Propagation algorithm uses gradient descent to tune network parameters to best fit a training set of input-output pairs. The method is inspired by neurobiology: it imitates the function of the brain, in which many interconnected neurons cooperate, and training instances are represented as input-output pairs. ANN learning is robust to errors in training data and has been applied successfully to problems such as speech recognition, digital signal processing, face recognition, object recognition (extracting a specific object from an image in the presence of other objects), and character recognition such as OCR (Optical Character Recognition).
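To make the gradient-descent idea concrete, here is a minimal sketch in Java of training a single sigmoid neuron (the building block that back-propagation generalises to multi-layer networks). The training set (logical AND), learning rate, and epoch count are illustrative assumptions, not part of any particular library:

```java
public class GradientDescentNeuron {
    static double[] weights = {0.0, 0.0};
    static double bias = 0.0;

    // Sigmoid activation, as used in classic back-propagation networks.
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    static double predict(double[] in) {
        double sum = bias;
        for (int i = 0; i < in.length; i++) sum += weights[i] * in[i];
        return sigmoid(sum);
    }

    static void train() {
        // Training set of input-output pairs: learn logical AND.
        double[][] inputs  = {{0,0},{0,1},{1,0},{1,1}};
        double[]   targets = {0, 0, 0, 1};
        double learningRate = 0.5;

        for (int epoch = 0; epoch < 10000; epoch++) {
            for (int k = 0; k < inputs.length; k++) {
                double out = predict(inputs[k]);
                // Gradient of the squared error for a sigmoid unit (delta rule).
                double delta = (targets[k] - out) * out * (1 - out);
                for (int i = 0; i < weights.length; i++)
                    weights[i] += learningRate * delta * inputs[k][i];
                bias += learningRate * delta;
            }
        }
    }

    public static void main(String[] args) {
        train();
        System.out.println("AND(1,1) ~ " + predict(new double[]{1, 1}));
    }
}
```

A full back-propagation network repeats this same weight-update step layer by layer, propagating the error gradient backward from the output units.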
Bayesian reasoning provides a probabilistic approach to inference. It forms the basis for learning algorithms that manipulate probabilities directly, as well as a framework for analysing the operation of other algorithms. Bayesian learning algorithms that calculate explicit probabilities for hypotheses, such as the Naive-Bayes algorithm, are among the most practical approaches to certain types of learning problems. The Bayes classifier is competitive with other ML algorithms in many cases; for learning to classify text documents, for example, the Naive-Bayes classifier is one of the most effective.
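The following Java sketch shows the Naive-Bayes idea applied to text classification: class and word frequencies are counted during training, and a query document is assigned to the class with the highest posterior probability. The tiny "spam/ham" training data is an illustrative assumption:

```java
import java.util.*;

public class NaiveBayesText {
    Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    Map<String, Integer> classCounts = new HashMap<>();
    Set<String> vocabulary = new HashSet<>();
    int totalDocs = 0;

    void train(String label, String document) {
        totalDocs++;
        classCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
            wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : document.toLowerCase().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    String classify(String document) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : classCounts.keySet()) {
            // log P(class) + sum of log P(word | class), in log space
            // to avoid floating-point underflow.
            double score = Math.log(classCounts.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int totalWords = counts.values().stream().mapToInt(Integer::intValue).sum();
            for (String word : document.toLowerCase().split("\\s+")) {
                int c = counts.getOrDefault(word, 0);
                // Laplace smoothing avoids zero probabilities for unseen words.
                score += Math.log((c + 1.0) / (totalWords + vocabulary.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesText nb = new NaiveBayesText();
        nb.train("spam", "win money now free prize");
        nb.train("ham", "meeting agenda for the project review");
        System.out.println(nb.classify("free money prize"));
    }
}
```

The "naive" independence assumption — that each word's probability is independent of the others given the class — is what makes the per-class score a simple sum of log probabilities.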
In the area of Speech Processing, Bayesian methods are used extensively with ANN, Fuzzy Logic, and Digital Signal Processing techniques to process language. The smallest speech patterns that have linguistic representation in a language are called phonemes. Phonemes fall into three major conventional groups: vowels, semi-vowels, and consonants, and prior probabilities are assigned to phonemes depending on whether they are vowels, semi-vowels, or consonants. Speech is a sequence of waves that are transmitted over time through a medium and are characterised by physical features such as intensity and frequency. In humans, the perceived speech activates small oscillations in the inner ear that are transmitted to the brain for further processing. Java has a speech API (JSAPI), which is merely a reference implementation to be used with speech engines. The Speech Engines are where the difficulty lies, in terms of both technology and software development; developers working on them must be well versed in the techniques of Machine Learning and Digital Signal Processing (DSP). The JSAPI itself contains no Bayesian, DSP, ANN, or other Machine Learning algorithms; these algorithms are all developed at the Speech Engine level.
One area of difficulty in speech processing is filtering. Humans are quite good at filtering conversation in a noisy room: if you talk to a friend in a noisy environment such as a bar, your brain blocks out the noise and speech of the other people talking at the same time, receiving only the speech of the friend with whom you are having the conversation. Machines are still a long way from this differential speech (signal) filtering capability of humans. Electrical engineers, computer scientists, and physicists are researching the hardware and software requirements of machine speech filtering.
Speech technology is important now and for the future. Soon we will speak to machines and they will be able to understand us (Natural Language Representation). A manager will simply speak a command such as "What is the sales forecast for next month?" and the machine will respond either by displaying a report on screen or by replying with spoken text (text-to-speech). Computing with words is made possible with Fuzzy Logic.
Reinforcement learning addresses how an agent that senses and acts in an environment can learn to choose optimal actions to reach its goal. Each time the agent performs an action in its environment, a trainer may provide a reward or penalty to indicate the desirability of the resulting state. For example, when an agent is trained to play a game, the trainer might provide a positive reward when the game is won, a negative reward when it is lost, and zero reward in all other states. The task of the agent is to learn from this delayed reward to choose sequences of actions that produce the greatest cumulative reward. An algorithm that can acquire optimal control strategies from delayed reward is called Q-learning.
This method can solve problems such as learning to control a mobile robot, learning to optimise operations in factories, and learning to plan therapeutic procedures. Consider building a learning robot, or agent, that has a set of sensors to observe the state of its environment and a set of actions it can perform to alter this state. The sensors might be a digital camera and sonar, and the actions might include moving forward and turning. The robot's task is to learn a control strategy for choosing actions that achieve its goals. One such goal might be docking onto its battery charger whenever its battery level is low. This goal can be captured by assigning a positive reward, say +10, to state-action transitions that immediately result in a connection to the charger, and a reward of zero to every other state-action transition. The reward function may be built into the robot or known only to an external teacher who provides the reward value for each action performed by the robot. The task of the robot is to perform sequences of actions, observe their consequences, and learn a control policy. The desired control policy is one that, from any initial state, chooses actions that maximize the reward accumulated over time by the agent.
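The docking example above can be sketched with tabular Q-learning in Java. The world here is an illustrative assumption: a one-dimensional corridor of six cells in which reaching the charger in the last cell yields +10 and every other transition yields zero, exactly as described in the text:

```java
import java.util.Random;

public class QLearningRobot {
    static final int STATES = 6;          // corridor cells 0..5
    static final int ACTIONS = 2;         // 0 = move left, 1 = move right
    static final int CHARGER = STATES - 1;

    static double[][] train(int episodes) {
        double[][] q = new double[STATES][ACTIONS];
        double alpha = 0.5, gamma = 0.9, epsilon = 0.2;
        Random rnd = new Random(42);

        for (int e = 0; e < episodes; e++) {
            int s = 0;                    // start far from the charger
            while (s != CHARGER) {
                // Epsilon-greedy action selection: mostly exploit, sometimes explore.
                int a = rnd.nextDouble() < epsilon
                        ? rnd.nextInt(ACTIONS)
                        : (q[s][1] >= q[s][0] ? 1 : 0);
                int next = Math.max(0, Math.min(CHARGER, s + (a == 1 ? 1 : -1)));
                double reward = (next == CHARGER) ? 10.0 : 0.0;
                // Q-learning update:
                // Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
                double maxNext = Math.max(q[next][0], q[next][1]);
                q[s][a] += alpha * (reward + gamma * maxNext - q[s][a]);
                s = next;
            }
        }
        return q;
    }

    public static void main(String[] args) {
        double[][] q = train(500);
        for (int s = 0; s < STATES - 1; s++)
            System.out.println("state " + s + ": best action = "
                + (q[s][1] > q[s][0] ? "right" : "left"));
    }
}
```

Note how the delayed +10 reward propagates backward through the Q-table via the discounted `max` term, so that states far from the charger still learn to prefer moving toward it.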
The robot example is a generalized type of reinforcement learning; maximizing cumulative reward covers many problems beyond robot learning tasks. In general, the problem is one of learning to control sequential processes. This includes, for example, manufacturing optimization problems in which a sequence of manufacturing actions must be chosen, and the reward to be maximized is the value of the goods produced minus the costs involved. It also includes sequential scheduling problems, such as call-centre software choosing which taxis to send for passengers, where the reward to be maximized is a function of the passengers' waiting time and the total fuel costs of the taxi fleet.
Inductive Logic Programming (ILP)
Inductive logic programming has its roots in concept learning from examples, a relatively straightforward form of induction: the aim is to discover, from a given set of pre-classified examples, a set of classification rules with high predictive power. The theory of ILP is based on proof theory and model theory for the first-order predicate calculus. Inductive hypothesis formation is characterized by techniques including inverse resolution, relative least general generalisations, inverse implication, and inverse entailment. The method can be used to create logic programs from a training data set; the final program should be able to generate that data back. The creation of logic programs depends heavily on task complexity, and in many cases the method is not usable without many restrictions posed on the final program. ILP has been used successfully in Data Mining for finding rules in huge databases.
Case-Based Reasoning (CBR)
Case-Based Reasoning is a lazy learning algorithm that classifies a new query instance by analysing similar instances while ignoring instances that are very different from the query. The method holds all previous instances in case memory; the instances, or cases, can be represented by values, symbols, trees, various hierarchical structures, or other structures. It is a non-generalization approach. CBR works in a cycle: case retrieval—reuse—solution testing—learning. The method is inspired by human reasoning, which draws on knowledge from old, similar situations, and is also known as Learning by Analogy. There is more on this subject in my previous article here at Gamelan: http://www.developer.com/java/article.php/10922_1491641_1.
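The "retrieval" step of the CBR cycle can be sketched in Java as a nearest-neighbour search over case memory. The loan-decision cases and their numeric features are illustrative assumptions; real case memories often use richer structures, as noted above:

```java
import java.util.*;

public class CaseRetrieval {
    // A stored case: a past solution plus the numeric features
    // describing the situation it solved.
    record Case(String solution, double[] features) {}

    // Retrieve the case most similar to the query
    // (smallest squared Euclidean distance).
    static Case retrieve(List<Case> memory, double[] query) {
        Case best = null;
        double bestDist = Double.MAX_VALUE;
        for (Case c : memory) {
            double dist = 0;
            for (int i = 0; i < query.length; i++) {
                double d = c.features()[i] - query[i];
                dist += d * d;
            }
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        return best;   // its solution is then reused for the new query
    }

    public static void main(String[] args) {
        List<Case> memory = List.of(
            new Case("approve loan", new double[]{0.9, 0.1}),
            new Case("reject loan",  new double[]{0.2, 0.8}));
        System.out.println(retrieve(memory, new double[]{0.8, 0.2}).solution());
    }
}
```

The reuse, testing, and learning steps of the cycle would then adapt the retrieved solution, verify it, and store the new query as a fresh case in memory.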
Genetic Algorithms (GA)
Genetic algorithms provide a learning method motivated by an analogy to biological evolution. The search for an appropriate hypothesis begins with a population of initial hypotheses. Members of the current population give rise to the next-generation population through operations such as selection, crossover, and mutation. At each step, the collection of hypotheses, called the current population, is updated by replacing some fraction of it with offspring of the most fit current hypotheses. Genetic algorithms have been applied successfully to a variety of learning tasks and optimisation problems.
The following are some examples of the use of Genetic algorithms (GA):
- Used for tuning optimal parameter settings when combined with other ML methods, such as Neural Networks or Instance-Based Learning.
- Used to learn collections of rules for robot control.
- Used to recognize objects (object recognition) from a visual scene or image—this field is known as Machine Vision or Computer Vision. Humans can identify the different objects in a scene or image with ease. Say you have an image of your friend's study room; in this image there is a stack of books in one corner, a table in the middle of the room, a computer desk, a collection of CDs on the desk, and a heater. It is quite easy for a human to look at that image and identify all the objects in the room, but can a machine do the same thing? The answer is YES, though current capabilities are limited; this is a growing field, and perhaps machines in the future will completely achieve human-like vision (scary, huh?).
The simplest form of Computer Vision is OCR (Optical Character Recognition), where handwritten characters on a sheet of paper are scanned into a computer and the output is ASCII characters. More sophisticated applications of Computer Vision include industrial visual inspection systems for quality control and medical imaging diagnosis systems. More complex still are the applications used by the military for real-time satellite image intelligence gathering, or missile guidance systems that identify ground targets. In fact, it was the military who first developed such technologies, even before commercial applications started appearing on the market.
- Used for the identification of credit card fraud, where a stolen card or a fabricated card number is being used. If you have ever used your credit card for shopping and, just when you think the transaction is accepted, you are suddenly asked for more identification (such as your mother's or father's name, or what school you went to), that is GA and ANN (Artificial Neural Networks) at work. In your credit company's database, the GA program learns the characteristics of fraud from millions of transactions and is able to evaluate whether a transaction is fraudulent or not.
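The evolutionary loop described above (selection, crossover, and mutation over a population of hypotheses) can be sketched in Java. The fitness function here is an illustrative assumption, the classic "OneMax" problem of maximizing the number of 1-bits in a bit string, standing in for a real learning task:

```java
import java.util.Random;

public class SimpleGA {
    static final int GENES = 20, POP = 30, GENERATIONS = 100;
    static final Random rnd = new Random(7);

    // Fitness: count of true genes (the "OneMax" toy objective).
    static int fitness(boolean[] h) {
        int f = 0;
        for (boolean gene : h) if (gene) f++;
        return f;
    }

    // Tournament selection: pick the fitter of two random hypotheses.
    static boolean[] select(boolean[][] pop) {
        boolean[] a = pop[rnd.nextInt(POP)], b = pop[rnd.nextInt(POP)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    static boolean[] evolve() {
        boolean[][] pop = new boolean[POP][GENES];
        for (boolean[] h : pop)
            for (int i = 0; i < GENES; i++) h[i] = rnd.nextBoolean();

        for (int g = 0; g < GENERATIONS; g++) {
            boolean[][] next = new boolean[POP][];
            for (int k = 0; k < POP; k++) {
                boolean[] p1 = select(pop), p2 = select(pop);
                // Single-point crossover combines two parent hypotheses.
                int cut = rnd.nextInt(GENES);
                boolean[] child = new boolean[GENES];
                for (int i = 0; i < GENES; i++)
                    child[i] = i < cut ? p1[i] : p2[i];
                // Mutation: flip each gene with small probability.
                for (int i = 0; i < GENES; i++)
                    if (rnd.nextDouble() < 0.01) child[i] = !child[i];
                next[k] = child;
            }
            pop = next;
        }
        boolean[] best = pop[0];
        for (boolean[] h : pop) if (fitness(h) > fitness(best)) best = h;
        return best;
    }

    public static void main(String[] args) {
        System.out.println("best fitness = " + fitness(evolve()));
    }
}
```

In a real application, each bit string would encode something meaningful, such as a rule set for robot control or a candidate parameter setting for another learner, and the fitness function would measure performance on the task.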