The Use of Java in Machine Learning
Machine learning (ML) is the study of computational methods, and the construction of computer programs, that automatically improve their performance through experience. ML is a highly multi-disciplinary field. It draws on ideas from Artificial Intelligence (AI), Soft Computing, Probability, Statistics, Information Theory and Signal Processing, Computational Complexity, Computational Mathematics, Philosophy, Control Systems Theory, Cognitive Psychology, Biology, Economics, Linguistics, Operations Research (OR), Physics, and other areas that have been explored in the last decade.
The idea of creating intelligent machines has fascinated humans since ancient times. Today, with the enormous advances in computing and 50 years of research into AI programming techniques, the dream of intelligent machines is closer to becoming a reality. Recent successful applications of machine learning include data-mining programs that learn to detect international money laundering or fraudulent credit card transactions, information filtering systems that learn a user's reading preferences, and autonomous vehicles that learn to drive on their own on public highways in the presence of other vehicles (note: these are not remote-controlled vehicles).
Machine learning has been in widespread use across different industries only for the last seven or eight years, roughly as long as the Internet has been widely adopted. Its use is expected to grow exponentially over the coming years, and any company that keeps tabs on this revolutionary technology can produce software products that are very competitive in the market.
Humans learn either by being taught by a teacher or by self-discovery, as a researcher does. A high school physics teacher teaches Newton's Laws of Motion to his or her students, and the students learn these concepts from the teacher. A group of Ph.D. computer scientists or Ph.D. physicists at Caltech, for example, learn by self-discovery. They read their peers' publications in ACM journals or Physical Review Letters and work out how to discover things that have not been done or published before.
Computer scientists and software developers apply the notion of learning to machines as well. Machines can learn by being taught (by a computer programmer, or by an expert in a specific domain such as a lawyer or a doctor), or they can learn by self-discovery: observing data (such as a marketing database or a patient management database) and extracting hidden predictive information from it. They are also able to acquire new knowledge by predicting the outcome of new cases encountered during the learning process.
One example of learning in machines is the Artificial Neural Network (ANN), a type of mathematical algorithm implemented in many kinds of software applications. An ANN is fed data for classification, such as the lung cancer patient records of a specific hospital over the last five years. There is no prior knowledge at all about the data in the patient records. The task of the ANN is to discover patterns in the training data during the learning stage, and then to classify new instances (such as newly admitted patients whose medical conditions have not been accurately determined from an initial diagnosis) according to those patterns. The ANN is therefore a paradigm of machine learning that learns by self-discovery. In contrast, another method of machine learning, K-Nearest Neighbour (KNN), of which Case-Based Reasoning (CBR) is a more specialised form, learns by memorising. This is a form of learning by being taught, so prior knowledge is needed. For a machine to use CBR, a database of knowledge about known instances is required; that is, the machine's memory has to be taught facts about the external world before it can classify newly arrived instances. (There is more on ANN, KNN, and CBR later.)
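The learn-by-memorising idea behind KNN can be sketched in a few lines of Java. This is only a minimal illustration, assuming numeric feature vectors and Euclidean distance; the class name KnnSketch, the methods teach and classify, and the sample cases are hypothetical and are not taken from any particular library or data set.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal k-nearest-neighbour sketch: "learning" is just storing labelled cases.
class KnnSketch {
    // One stored case: a numeric feature vector plus its known label.
    static final class Example {
        final double[] features;
        final String label;

        Example(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    private final List<Example> memory = new ArrayList<>();

    // Teaching step: memorise a labelled case.
    void teach(double[] features, String label) {
        memory.add(new Example(features, label));
    }

    // Squared Euclidean distance (sufficient for ranking neighbours).
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Classify a new case by majority vote among the k closest stored cases.
    // Returns null if nothing has been taught yet.
    String classify(double[] query, int k) {
        List<Example> sorted = new ArrayList<>(memory);
        sorted.sort(Comparator.comparingDouble(e -> squaredDistance(e.features, query)));
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (Example e : sorted.subList(0, Math.min(k, sorted.size()))) {
            int count = votes.merge(e.label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = e.label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        KnnSketch knn = new KnnSketch();
        // Hypothetical two-feature cases, invented for illustration only.
        knn.teach(new double[] {1.0, 1.0}, "classA");
        knn.teach(new double[] {1.2, 0.8}, "classA");
        knn.teach(new double[] {5.0, 5.0}, "classB");
        knn.teach(new double[] {5.5, 4.5}, "classB");
        System.out.println(knn.classify(new double[] {1.1, 0.9}, 3)); // classA
        System.out.println(knn.classify(new double[] {5.2, 4.9}, 3)); // classB
    }
}
```

Note that nothing is generalised at teaching time: all the work happens when a new case arrives and is compared against the memorised ones, which is why the stored knowledge must exist before any classification is possible.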
For machines, the term "learning" usually refers to fitting a designed model, such as the ANN mentioned above. Through learning, we try to improve the model's prediction accuracy as much and as quickly as possible, and we expect to end up with the model that best fits the input data. Some methods that are not very robust give very good results on the training data but perform poorly on test or real data. This is called over-fitting. The degree of over-fitting depends strongly on the input data: it mostly happens when there is little data compared with the number of attributes, or when the data is noisy. Noise enters the data through subjective reporting, uncertainty, acquisition errors, and so on. With too many attributes and too little data, the search space for the optimal model is so large that the search can easily go astray and end in a local optimum. Over-fitting can be partially eliminated by suitable pre-processing, by using adequate learning methods, and by providing good input data.
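The gap between training accuracy and test accuracy described above can be made concrete with a small Java sketch. Everything in it is an assumption made for illustration: the class OverfitSketch, the one-attribute data set (the true concept is x >= 5, with one deliberately noisy training label), and the two toy models. It is not a learning algorithm, only a way to see what an over-fitted model looks like when measured.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;

class OverfitSketch {
    // Fraction of (input, label) pairs that the model predicts correctly.
    static double accuracy(IntPredicate model, int[] inputs, boolean[] labels) {
        int correct = 0;
        for (int i = 0; i < inputs.length; i++) {
            if (model.test(inputs[i]) == labels[i]) {
                correct++;
            }
        }
        return (double) correct / inputs.length;
    }

    // A model that only memorises the training pairs, noise included,
    // and answers false for any input it has never seen.
    static IntPredicate memoriser(int[] inputs, boolean[] labels) {
        Map<Integer, Boolean> table = new HashMap<>();
        for (int i = 0; i < inputs.length; i++) {
            table.put(inputs[i], labels[i]);
        }
        return x -> table.getOrDefault(x, false);
    }

    public static void main(String[] args) {
        // Hypothetical data: the label for x = 2 is noise.
        int[] trainX = {1, 2, 3, 6, 7, 8};
        boolean[] trainY = {false, true, false, true, true, true};
        int[] testX = {0, 4, 5, 9};
        boolean[] testY = {false, false, true, true};

        IntPredicate memorised = memoriser(trainX, trainY);
        IntPredicate threshold = x -> x >= 5; // a simpler, hand-picked model

        System.out.println("memoriser train: " + accuracy(memorised, trainX, trainY)); // 1.0
        System.out.println("memoriser test:  " + accuracy(memorised, testX, testY));   // 0.5
        System.out.println("threshold train: " + accuracy(threshold, trainX, trainY)); // 5 of 6 correct
        System.out.println("threshold test:  " + accuracy(threshold, testX, testY));   // 1.0
    }
}
```

The memorising model is perfect on the training data because it has also fitted the noisy label, yet it fails on half of the unseen cases; the simpler model misses the noisy training case but holds up on the test data.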
Main Methods in ML
Understanding intelligence and creating intelligent artefacts, the twin goals of AI, represent two of the final frontiers of modern science. Several of the early pioneers of computer science, such as Turing, Von Neumann, and Shannon, were captivated by the idea of creating a form of machine intelligence. The questions and issues considered back then are still relevant today, perhaps even more so.
The following are the main methods employed in ML:
- Decision Trees
- Artificial Neural Networks (ANN)
- Bayesian Methods
- Reinforcement Learning
- Inductive Logic Programming (ILP)
- Case-Based Reasoning (CBR)
- Genetic Algorithms (GA)
- Support Vector Machines (SVM)
In recent years, attention has turned to generating and combining several different but still homogeneous classifiers with techniques called bagging, boosting, or bootstrapping. These techniques repeatedly build the same type of model over changing training data and then combine the resulting models, which reduces model variance.
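Of the techniques named above, bagging is the simplest to sketch: draw several bootstrap samples (same size as the data, sampled with replacement), train the same kind of model on each, and let the models vote. The Java below is a minimal sketch under those assumptions; the class BaggingSketch, its method names, and the toy base learner are hypothetical and chosen only to keep the example short. Boosting, which reweights the training data between rounds, is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

class BaggingSketch {
    // Bootstrap sample: same size as the data, drawn with replacement.
    static <T> List<T> bootstrapSample(List<T> data, Random rng) {
        List<T> sample = new ArrayList<>(data.size());
        for (int i = 0; i < data.size(); i++) {
            sample.add(data.get(rng.nextInt(data.size())));
        }
        return sample;
    }

    // Most frequent label; on a tie, the label that reached the top count first wins.
    static String majorityVote(List<String> votes) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String v : votes) {
            int c = counts.merge(v, 1, Integer::sum);
            if (c > bestCount) {
                bestCount = c;
                best = v;
            }
        }
        return best;
    }

    // Bagging: fit the same kind of model on several bootstrap samples.
    static <E, X> List<Function<X, String>> trainEnsemble(
            List<E> data,
            Function<List<E>, Function<X, String>> learner,
            int rounds,
            Random rng) {
        List<Function<X, String>> models = new ArrayList<>();
        for (int r = 0; r < rounds; r++) {
            models.add(learner.apply(bootstrapSample(data, rng)));
        }
        return models;
    }

    // Combine the individual predictions by majority vote.
    static <X> String predict(List<Function<X, String>> models, X input) {
        List<String> votes = new ArrayList<>();
        for (Function<X, String> model : models) {
            votes.add(model.apply(input));
        }
        return majorityVote(votes);
    }

    public static void main(String[] args) {
        // Hypothetical training labels, invented for illustration.
        List<String> labels = Arrays.asList("yes", "yes", "yes", "no");
        // Toy base learner: ignores the input and predicts the most
        // common label in the sample it was trained on.
        Function<List<String>, Function<Object, String>> learner = sample -> {
            String majority = majorityVote(sample);
            return input -> majority;
        };
        List<Function<Object, String>> ensemble =
                trainEnsemble(labels, learner, 25, new Random(42));
        System.out.println(predict(ensemble, "any input"));
    }
}
```

Individual models may disagree because each sees a different resampling of the data; averaging their votes is what dampens that variation.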
Decision tree learning is a method for approximating discrete-valued functions by a decision tree. The nodes of the tree hold attributes, and the leaves hold values of the discrete function. A decision tree can be rewritten as a set of if-then rules. Tree learning methods are popular inductive inference algorithms, mostly used for a variety of classification tasks (for example, diagnosing medical cases). Table 1 below shows a set of fictitious weather conditions and whether they are suitable for playing tennis.
Table 1: Fictitious Weather Data
A set of rules learned from the data in Table 1 might look like the following:
- If outlook = sunny and humidity = high, then play_tennis = no
- If outlook = rain and wind = strong, then play_tennis = no
- If outlook = overcast, then play_tennis = yes
- If humidity = normal, then play_tennis = yes
- If none of the above, then play_tennis = no
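Read from top to bottom, with the first matching rule deciding the outcome, these rules map directly onto a chain of if statements. The Java sketch below is one way to write them; the class name PlayTennisRules, the method classify, and the wind value "weak" are assumptions made for illustration and do not come from Table 1.

```java
// The five rules above, applied in order: the first rule that matches decides.
class PlayTennisRules {
    static String classify(String outlook, String humidity, String wind) {
        if (outlook.equals("sunny") && humidity.equals("high")) {
            return "no";
        }
        if (outlook.equals("rain") && wind.equals("strong")) {
            return "no";
        }
        if (outlook.equals("overcast")) {
            return "yes";
        }
        if (humidity.equals("normal")) {
            return "yes";
        }
        return "no"; // none of the above
    }

    public static void main(String[] args) {
        System.out.println(classify("sunny", "high", "weak"));      // no  (rule 1)
        System.out.println(classify("overcast", "high", "strong")); // yes (rule 3)
        System.out.println(classify("sunny", "normal", "weak"));    // yes (rule 4)
        // Humidity is normal here, but rule 2 fires first, so the answer is no.
        System.out.println(classify("rain", "normal", "strong"));   // no  (rule 2)
        System.out.println(classify("rain", "high", "weak"));       // no  (default)
    }
}
```

The fourth call shows why the order matters: the humidity rule alone would answer yes, but an earlier rule has already settled the case.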
The rules above are shown as a decision tree in Figure 1. They are meant to be interpreted in order: the first rule is tried first; if it does not apply, the second is tried, and so on. A set of rules interpreted in sequence like this is called a decision list. As a decision list, the rules correctly classify all of the examples in the table, whereas taken individually, out of context, some of the rules are incorrect. For example,
- If humidity = normal, then play_tennis = yes
This rule is inconsistent with row 7 of Table 1. The rules mentioned so far are classification rules: they predict the classification of an example, here whether or not to play tennis. It is equally possible to look for rules that strongly associate different attribute values. These are called association rules, and many can be derived from the data in Table 1:
Figure 1 shows a decision tree for the concept of PlayTennis.