Training an AI model to classify images
Training an AI model to classify images into categories like cat, dog, or lion involves several mathematical steps, from preprocessing the images to training the neural network. Here’s a detailed breakdown of the process:
1. Data Preprocessing
Input: 100 images
Output: Preprocessed images ready for training
Steps:
- Resizing: Resize all images to a fixed size, say 64×64 pixels.
- Normalization: Scale pixel values to be between 0 and 1.
Let’s assume each image is resized to 64×64 pixels and has 3 color channels (RGB).
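Assuming the images are already loaded and resized (in practice a library such as Pillow would handle decoding and resizing), the normalization step can be sketched in NumPy; here we simulate 100 already-resized 64×64 RGB images:

```python
import numpy as np

# Simulate 100 preprocessed 64x64 RGB images with uint8 pixel values.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 64, 64, 3), dtype=np.uint8)

# Normalization: scale pixel values from [0, 255] to [0, 1].
X = images.astype(np.float32) / 255.0

print(X.shape)  # (100, 64, 64, 3)
```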
2. Converting Images to Arrays
Each image is a 64×64×3 array. For 100 images, the data tensor X will have a shape of (100, 64, 64, 3).
3. Label Encoding
Assume labels are encoded as follows:
- Cat: 0
- Dog: 1
- Lion: 2
So, the label vector Y will be a 1D array of length 100 with values 0, 1, or 2.
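A minimal sketch of this encoding in NumPy (the class names and mapping are the ones given above; the one-hot form is what the cross-entropy loss consumes later):

```python
import numpy as np

# The label mapping from the text: cat=0, dog=1, lion=2.
label_to_index = {"cat": 0, "dog": 1, "lion": 2}

labels = ["cat", "dog", "lion", "dog"]  # a few example raw labels
Y = np.array([label_to_index[name] for name in labels])

# One-hot encoding: row i has a 1 in column Y[i].
Y_onehot = np.eye(3)[Y]

print(Y)               # [0 1 2 1]
print(Y_onehot.shape)  # (4, 3)
```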
4. Neural Network Architecture
We’ll use a simple Convolutional Neural Network (CNN) with one convolutional layer, one pooling layer, one fully connected layer, and an output layer.
Example Architecture:
- Conv Layer: 32 filters, kernel size 3×3
- Pooling Layer: Max Pooling 2×2
- Fully Connected Layer: 128 neurons
- Output Layer: 3 neurons (one for each class)
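The shapes flowing through this architecture can be checked with quick arithmetic (assuming stride 1 and no padding for the convolution, as used in the calculations that follow):

```python
# Conv layer: 3x3 filters, stride 1, no padding -> output shrinks by 2.
conv_out = (64 - 3 + 1, 64 - 3 + 1, 32)              # (62, 62, 32)

# 2x2 max pooling halves height and width.
pool_out = (conv_out[0] // 2, conv_out[1] // 2, 32)  # (31, 31, 32)

# Flattening for the fully connected layer.
flat = pool_out[0] * pool_out[1] * pool_out[2]       # 30752

print(conv_out, pool_out, flat)
```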
5. Forward Propagation
Convolutional Layer:
A convolutional layer applies a set of learnable filters to the input; each filter is convolved (slid) across the image to produce one feature map.
Input: 64×64×3 (image with height = 64, width = 64, and 3 color channels)
Filters: 32 filters of size 3×3
Convolution Calculation
Each filter is a 3×3×3 tensor (height, width, and depth matching the input channels). The output feature map's size follows from the input size, filter size, stride, and padding: output = (input − filter + 2·padding) / stride + 1. With stride 1 and no padding, each 64×64 input gives a (64 − 3 + 1) = 62×62 map, so the layer outputs 62×62×32.
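This output-size rule can be written as a small helper (the standard formula, assuming an integer stride and symmetric zero padding):

```python
def conv_output_size(in_size, filter_size, stride=1, padding=0):
    """Output spatial size of a convolution along one dimension."""
    return (in_size - filter_size + 2 * padding) // stride + 1

print(conv_output_size(64, 3))  # 62
```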
Pooling Layer:
Input: 62×62×32
Pooling Size: 2×2
Output Calculation:
- Max pooling takes the maximum of each 2×2 block, halving height and width (62 / 2 = 31), so the output will be 31×31×32.
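A minimal NumPy sketch of 2×2 max pooling on a single feature map (assuming the input height and width are divisible by 2, as 62 is):

```python
import numpy as np

def max_pool_2x2(fmap):
    """Max pooling with a 2x2 window and stride 2 on one feature map."""
    h, w = fmap.shape
    # Group pixels into 2x2 blocks, then take the max of each block.
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16, dtype=np.float32).reshape(4, 4)
print(max_pool_2x2(fmap))
# [[ 5.  7.]
#  [13. 15.]]
```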
Fully Connected Layer:
Input: Flattened 31×31×32 = 30,752 features
Neurons: 128
Output Calculation:
z = Wx + b
Where:
- x is the flattened input vector (length 30,752).
- W is the weight matrix of shape 128×30752.
- b is the bias vector of length 128.
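A quick NumPy sketch of this layer with the shapes above (the weights are random here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(30752).astype(np.float32)            # flattened 31*31*32
W = rng.standard_normal((128, 30752)).astype(np.float32) * 0.01
b = np.zeros(128, dtype=np.float32)

# Fully connected layer: z = Wx + b.
z = W @ x + b
print(z.shape)  # (128,)
```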
Output Layer:
Input: 128 features
Neurons: 3 (one for each class)
Output Calculation:
z = Wx + b
Where:
- W is the weight matrix of shape 3×128
- b is the bias vector of shape 3.
6. Activation Functions
- Convolutional and Fully Connected Layers: ReLU (Rectified Linear Unit), ReLU(x) = max(0, x)
- Output Layer: Softmax, Softmax(z_i) = e^{z_i} / Σ_j e^{z_j}
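Both activations are one-liners in NumPy; the softmax below also subtracts the maximum logit before exponentiating, a standard numerical-stability trick not mentioned above:

```python
import numpy as np

def relu(x):
    """Elementwise max(0, x)."""
    return np.maximum(0, x)

def softmax(z):
    """Softmax over a 1D logit vector, stabilized by subtracting max(z)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
probs = softmax(np.array([1.0, 2.0, 3.0]))
print(round(float(probs.sum()), 6))      # 1.0
```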
7. Loss Function
We’ll use Cross-Entropy Loss for multi-class classification: L = −Σ_i y_i · log(ŷ_i), where y is the one-hot true label and ŷ is the predicted softmax probability vector.
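For a single example this reduces to the negative log-probability of the true class, which is easy to verify numerically:

```python
import numpy as np

def cross_entropy(probs, true_class):
    """Cross-entropy loss for one example: -log(p of the correct class)."""
    return -np.log(probs[true_class])

probs = np.array([0.7, 0.2, 0.1])  # predicted distribution over cat/dog/lion
print(round(float(cross_entropy(probs, 0)), 4))  # 0.3567
```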
8. Backpropagation
Compute gradients of the loss with respect to all weights and biases using the chain rule and update them using gradient descent or an optimizer like Adam.
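The chain rule can be sanity-checked numerically on a one-parameter toy loss; this is the same idea as gradient checking a full network:

```python
# For L(w) = (w*x - y)^2, the analytic gradient is dL/dw = 2*(w*x - y)*x.
# We compare it against a central finite-difference estimate.
def loss(w, x=2.0, y=1.0):
    return (w * x - y) ** 2

w = 1.5
analytic = 2 * (w * 2.0 - 1.0) * 2.0  # = 8.0
eps = 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(abs(analytic - numeric) < 1e-4)  # True
```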
9. Optimization
We use the Adam optimizer, which adapts the per-parameter learning rate using running estimates of the first and second moments of the gradients.
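A single Adam update step, written out with the usual default hyperparameters (beta1 = 0.9, beta2 = 0.999, eps = 1e-8):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count for bias correction."""
    m = b1 * m + (1 - b1) * grad       # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2  # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)          # bias-corrected moments
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
w, m, v = adam_step(w, np.array([0.5]), m, v, t=1)
print(w)  # ~ [0.999]: the first step moves by about lr regardless of gradient scale
```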
10. Training Loop
- Initialize weights and biases.
- Forward pass: Compute the output and loss.
- Backward pass: Compute gradients.
- Update weights and biases using the optimizer.
- Repeat for several epochs.
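The whole loop can be exercised end to end on a toy problem. To keep the sketch short, this version trains a single softmax layer with plain gradient descent on synthetic, linearly separable data, rather than the full CNN with Adam:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 20, 3                  # samples, features, classes
X = rng.standard_normal((n, d))
W_true = rng.standard_normal((d, k))
Y = (X @ W_true).argmax(axis=1)       # separable labels from a hidden linear rule
Y_onehot = np.eye(k)[Y]

# Initialize weights and biases.
W = np.zeros((d, k))
b = np.zeros(k)

lr = 0.5
for epoch in range(200):
    # Forward pass: logits -> softmax probabilities -> cross-entropy loss.
    logits = X @ W + b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    loss = -np.mean(np.log(probs[np.arange(n), Y]))

    # Backward pass: for softmax + cross-entropy, dL/dlogits = (p - y) / n.
    grad_logits = (probs - Y_onehot) / n
    grad_W = X.T @ grad_logits
    grad_b = grad_logits.sum(axis=0)

    # Update weights and biases.
    W -= lr * grad_W
    b -= lr * grad_b

accuracy = (probs.argmax(axis=1) == Y).mean()
print(f"final loss {loss:.3f}, train accuracy {accuracy:.2f}")
```

The loss starts at ln(3) ≈ 1.099 (uniform predictions over 3 classes) and falls as training progresses, mirroring the five steps listed above.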