Image recognition is one of the quintessential tasks of artificial intelligence. The ability to process an image and decide if it is a day scene or a night scene or determine if you are looking at a picture of a cat or a dog is one that comes naturally to most organic intelligence, but for Artificial Intelligence (AI), the task must be performed one pixel at a time.
Today, we are going to build a simple image recognition system using the Python programming language. You may be wondering: why Python, when there are many languages that can be used to create AI systems? Python offers a number of versatile and useful libraries that make the process easier than in many competing languages. So today, we are going to walk through the creation of a simple image recognition system so that you can get familiar with the various AI libraries and tools Python has to offer.
We are going to begin with the imports for the libraries. These will do the majority of the actual work of image recognition and analysis for the task at hand. The core libraries we are going to import are Matplotlib, Keras, TensorFlow, OpenCV-Python, and NumPy, along with a few supporting tools (seaborn, scikit-learn, and the standard os module). Matplotlib is a library for creating visualizations of data in Python. Keras is a high-level API for building deep learning models. TensorFlow is a machine learning framework for numerical computation, which we will use to manage the data analysis of our image recognition application.
OpenCV-Python, which you will see as the cv2 import statement, is a library designed for computer vision problems; we will use it to load images from the specified files and resize them. NumPy is meant for working with arrays and mathematical operations such as linear algebra and Fourier transforms.
Here is how to import the various AI libraries in Python:
import matplotlib.pyplot as plt
import seaborn as sns
import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam
from sklearn.metrics import classification_report, confusion_matrix
import tensorflow as tf
import cv2
import os
import numpy as np
Once we have all of those libraries imported, we can begin to work with them and bring in our data. First up, we will define our data for the system. This is done with the get_data() method. get_data() will help us define the two possible categories for our data, which will let the system build our training and validation data sets down the line. We will be using two labels, “Cat” and “Dog”, as the basis of the classification system. This means that the images we give the system should be of either a cat or a dog.
labels = ['Cat', 'Dog']
img_size = 224

def get_data(data_dir):
    data = []
    for label in labels:
        path = os.path.join(data_dir, label)
        class_num = labels.index(label)
        for img in os.listdir(path):
            try:
                # Read the image and flip BGR (OpenCV's default) to RGB
                img_arr = cv2.imread(os.path.join(path, img))[..., ::-1]
                # Resize every image to the same dimensions
                resized_arr = cv2.resize(img_arr, (img_size, img_size))
                data.append([resized_arr, class_num])
            except Exception as e:
                print(e)
    # dtype=object because each entry pairs an image array with an integer label
    return np.array(data, dtype=object)
Once the path and categories have been set up, we can import our training and test data sets. Of course, make sure that the file paths and names are correct for your own system when you do this.
train = get_data('../input/catdog/Main/train')
val = get_data('../input/catdog/Main/test')
Once that is done, we can pre-process the data. This includes putting the data into a workable format and augmenting it enough to give the system the ability to handle images that are less than perfectly similar to the training images: for example, images with motion blur, a greater zoom, altered colors, or unusual angles.
It is also important to note that in the context of a professional project, we might, prior to pre-processing, want to pull a random selection of images to verify that the imports were done correctly, or pull information (such as how many images of each type were imported) to make sure that things loaded cleanly, as in the sketch below.
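As a quick example of that kind of sanity check, here is a minimal sketch (my own illustration, using the train array returned by get_data() above and the seaborn and matplotlib imports from earlier) that counts how many images of each class were loaded and displays the first training image:

# Collect the label name for every training example
l = []
for feature, label in train:
    l.append(labels[label])
sns.countplot(x=l)  # bar chart of images per class

# Display the first training image with its label as the title
plt.figure(figsize=(5, 5))
plt.imshow(train[0][0])
plt.title(labels[train[0][1]])
plt.show()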
x_train = []
y_train = []
x_val = []
y_val = []

for feature, label in train:
    x_train.append(feature)
    y_train.append(label)

for feature, label in val:
    x_val.append(feature)
    y_val.append(label)

# Scale pixel values from 0-255 down to 0-1
x_train = np.array(x_train) / 255
x_val = np.array(x_val) / 255

# The arrays already have shape (num_images, img_size, img_size, 3),
# so no further reshaping is needed for the RGB model below.
y_train = np.array(y_train)
y_val = np.array(y_val)

# Set up random augmentations: rotations, zooms, shifts, and horizontal flips
datagen = ImageDataGenerator(
    featurewise_center=False,
    samplewise_center=False,
    featurewise_std_normalization=False,
    samplewise_std_normalization=False,
    zca_whitening=False,
    rotation_range=30,
    zoom_range=0.2,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=False)

datagen.fit(x_train)
Once we have the data prepared for analysis, we are going to create a simple Convolutional Neural Network (CNN) with three convolutional layers and a dropout layer. For those of you not familiar with the idea, a CNN is a type of neural network that excels at image analysis. It excels because it treats the image not as one thing, but as a grid of rows and columns of numbers, where each cell holds values describing how bright its pixel is and what color fills it.
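If that grid-of-numbers idea feels abstract, a quick way to see it for yourself (a small illustrative check of my own, using the x_train array built in the pre-processing step above) is to print the shape of the data and the values of a single pixel:

print(x_train.shape)         # (num_images, 224, 224, 3): rows, columns, RGB channels
print(x_train[0, 100, 100])  # the red, green, and blue values of one pixel, scaled to 0-1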
model = Sequential()
model.add(Conv2D(32, 3, padding="same", activation="relu", input_shape=(224, 224, 3)))
model.add(MaxPool2D())
model.add(Conv2D(32, 3, padding="same", activation="relu"))
model.add(MaxPool2D())
model.add(Conv2D(64, 3, padding="same", activation="relu"))
model.add(MaxPool2D())
model.add(Dropout(0.4))
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dense(2, activation="softmax"))

model.summary()
Now we will compile the model with Adam as our optimizer. Because the sample data set is not very large, we pair a very small learning rate with a fairly large number of epochs. We will be using 500 epochs here, but you should adjust this number based on the size of your data and how much processing it needs.
opt = Adam(learning_rate=0.000001)  # older Keras versions use lr= instead of learning_rate=

# The final Dense layer already applies softmax, so the loss should treat
# the model's output as probabilities rather than raw logits.
model.compile(optimizer=opt,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])

history = model.fit(x_train, y_train, epochs=500, validation_data=(x_val, y_val))
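One thing worth pointing out: the fit call above trains on the raw arrays, so the ImageDataGenerator we prepared earlier is never actually consulted. If you want the augmented images to flow into training, one variation (my suggestion, not part of the original walkthrough) is to pass the generator to fit instead:

# Train on augmented batches produced by the generator
history = model.fit(datagen.flow(x_train, y_train, batch_size=32),
                    epochs=500,
                    validation_data=(x_val, y_val))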
Once we have done that, you can take a look at the results. You can use Matplotlib to analyze your results, and it is a best practice to do so, just to ensure that the system is working the way that you would like. If it is not, then there is debugging to be done, or the number of epochs needs adjusting. Remember that it is good to play around with the analysis and see how adjusting it changes the results, as this will help you estimate your needs for future projects.
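As one way to do that analysis, here is a minimal sketch (my own example, not part of the original code) that uses the history object returned by model.fit() above to plot the training and validation curves with Matplotlib:

# Pull the per-epoch metrics recorded during training
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(len(acc))

plt.figure(figsize=(12, 5))

# Accuracy on the left, loss on the right
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

Since we already imported classification_report from scikit-learn, you could also run predictions = np.argmax(model.predict(x_val), axis=1) and then print(classification_report(y_val, predictions, target_names=labels)) for a per-class breakdown of precision and recall.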