In this project, I used three different models (deep neural networks and convolutional neural networks) to classify traffic signs from the German Traffic Sign Dataset.
The goals / steps of this project are the following:
- Load the data set (see below for links to the project data set)
- Explore, summarize and visualize the data set
- Design, train and test a model architecture
- Analyze the training data set and augment the data
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report
Install the Self-Driving Car Nanodegree starter kit if you have not already done so: CarND Starter Kit
If you have access to a GPU, you should follow the TensorFlow instructions for installing TensorFlow with GPU support.
Download the data set and load it. Below are a few randomly selected pictures from my training set; the number above each image is the label of the picture, and the corresponding name of each label will be shown later.
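A minimal sketch of the loading step, assuming the standard pickled files train.p, valid.p and test.p, each holding a dict with 'features' and 'labels':

import pickle

# Load the pickled German Traffic Sign data set (file names are assumptions).
with open('train.p', mode='rb') as f:
    train = pickle.load(f)
with open('valid.p', mode='rb') as f:
    valid = pickle.load(f)
with open('test.p', mode='rb') as f:
    test = pickle.load(f)

X_train, y_train = train['features'], train['labels']
X_valid, y_valid = valid['features'], valid['labels']
X_test, y_test = test['features'], test['labels']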
I used the pandas and numpy libraries to calculate summary statistics of the traffic signs data set (a sketch of the calculation follows the list below):
- The size of training set is 34799
- The size of the validation set is 4410
- The size of test set is 12630
- The shape of a traffic sign image is (32, 32, 3)
- The number of unique classes/labels in the data set is 43
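These numbers can be computed with a short snippet like the following (signnames.csv with ClassId and SignName columns is the file shipped with the project; the exact code here is a sketch, not the notebook's code):

import numpy as np
import pandas as pd

n_train = len(X_train)                 # 34799
n_validation = len(X_valid)            # 4410
n_test = len(X_test)                   # 12630
image_shape = X_train[0].shape         # (32, 32, 3)
n_classes = len(np.unique(y_train))    # 43

# Per-class counts, joined with the sign names; this produces the table below.
sign_names = pd.read_csv('signnames.csv')
class_ids, counts = np.unique(y_train, return_counts=True)
print(sign_names.assign(counts=counts))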
Here is the training set table:
index | ClassId | SignName | counts |
---|---|---|---|
0 | 0 | Speed limit (20km/h) | 180 |
1 | 1 | Speed limit (30km/h) | 1980 |
2 | 2 | Speed limit (50km/h) | 2010 |
3 | 3 | Speed limit (60km/h) | 1260 |
4 | 4 | Speed limit (70km/h) | 1770 |
5 | 5 | Speed limit (80km/h) | 1650 |
6 | 6 | End of speed limit (80km/h) | 360 |
7 | 7 | Speed limit (100km/h) | 1290 |
8 | 8 | Speed limit (120km/h) | 1260 |
9 | 9 | No passing | 1320 |
10 | 10 | No passing for vehicles over 3.5 metric tons | 1800 |
11 | 11 | Right-of-way at the next intersection | 1170 |
12 | 12 | Priority road | 1890 |
13 | 13 | Yield | 1920 |
14 | 14 | Stop | 690 |
15 | 15 | No vehicles | 540 |
16 | 16 | Vehicles over 3.5 metric tons prohibited | 360 |
17 | 17 | No entry | 990 |
18 | 18 | General caution | 1080 |
19 | 19 | Dangerous curve to the left | 180 |
20 | 20 | Dangerous curve to the right | 300 |
21 | 21 | Double curve | 270 |
22 | 22 | Bumpy road | 330 |
23 | 23 | Slippery road | 450 |
24 | 24 | Road narrows on the right | 240 |
25 | 25 | Road work | 1350 |
26 | 26 | Traffic signals | 540 |
27 | 27 | Pedestrians | 210 |
28 | 28 | Children crossing | 480 |
29 | 29 | Bicycles crossing | 240 |
30 | 30 | Beware of ice/snow | 390 |
31 | 31 | Wild animals crossing | 690 |
32 | 32 | End of all speed and passing limits | 210 |
33 | 33 | Turn right ahead | 599 |
34 | 34 | Turn left ahead | 360 |
35 | 35 | Ahead only | 1080 |
36 | 36 | Go straight or right | 330 |
37 | 37 | Go straight or left | 180 |
38 | 38 | Keep right | 1860 |
39 | 39 | Keep left | 270 |
40 | 40 | Roundabout mandatory | 300 |
41 | 41 | End of no passing | 210 |
42 | 42 | End of no passing by vehicles over 3.5 metric tons | 210 |
Here is an exploratory visualization of the data set.
import cv2

# Resize an image read with OpenCV to 32x32 and convert BGR order to RGB.
def resize_img(img):
    re_img = cv2.resize(img, (32, 32), interpolation=cv2.INTER_CUBIC)
    re_img = cv2.cvtColor(re_img, cv2.COLOR_BGR2RGB)
    return re_img
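A sketch of the random-image display mentioned earlier (matplotlib assumed; the numeric label is printed above each picture):

import numpy as np
import matplotlib.pyplot as plt

# Show ten random training images with their numeric class labels.
fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for ax in axes.ravel():
    idx = np.random.randint(0, len(X_train))
    ax.imshow(X_train[idx])
    ax.set_title(str(y_train[idx]))
    ax.axis('off')
plt.show()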
Before modeling, I did some preprocessing on the data set: grayscale conversion and normalization. First, I converted the images to grayscale; according to the paper Traffic Sign Recognition with Multi-Scale Convolutional Networks, a grayscale data set gives better accuracy. Second, I normalized the grayscale data so that the features have roughly zero mean, which helps the optimization. The functions are the following.
import numpy as np

# Convert images to grayscale (keeping a single-channel axis).
def grayscale(imgs):
    imgs_temp = np.zeros((imgs.shape[0], imgs.shape[1], imgs.shape[2], 1))
    for i in range(len(imgs)):
        imgs_temp[i] = cv2.cvtColor(imgs[i], cv2.COLOR_RGB2GRAY).reshape(imgs_temp.shape[1:])
    return imgs_temp

# Normalize pixel values to roughly [-1, 1].
def normalize(x):
    return (x - 128.0) / 128
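Applied to the three splits, the preprocessing could look like this (variable names follow the loading sketch above):

# Grayscale then normalize; the resulting shapes are (N, 32, 32, 1).
X_train_norm = normalize(grayscale(X_train))
X_valid_norm = normalize(grayscale(X_valid))
X_test_norm = normalize(grayscale(X_test))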
I used three models, all based on LeNet by Yann LeCun. LeNet is a convolutional neural network designed to recognize visual patterns directly from pixel images with minimal preprocessing, and it handles handwritten characters very well.
- The inputs are 32x32 grayscale images (1 channel after the preprocessing above)
- The activation function is ReLU except for the output layer which uses Softmax
- The output has 43 classes
The first structure is the original LeNet:
Layer | Shape |
---|---|
Input | 32x32x1 |
Convolution (valid, 5x5x6) | 28x28x6 |
Activation (ReLU) | 28x28x6 |
Max Pooling (valid, 2x2) | 14x14x6 |
Convolution (valid, 5x5x16) | 10x10x16 |
Activation (ReLU) | 10x10x16 |
Max Pooling (valid, 2x2) | 5x5x16 |
Flatten | 400 |
Fully Connected | 120 |
Activation (ReLU) | 120 |
Output | 43 |
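The layer code itself is not reproduced in this report; a minimal TensorFlow 1.x sketch of the table above could look like the following (the weight initialisation values and the placement of dropout via keep_prob are my assumptions):

import tensorflow as tf

def LeNet(x, keep_prob, n_classes):
    # Conv 1: 32x32x1 -> 28x28x6, ReLU, then 2x2 max pooling -> 14x14x6.
    W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 6], stddev=0.1))
    b1 = tf.Variable(tf.zeros(6))
    conv1 = tf.nn.relu(tf.nn.conv2d(x, W1, [1, 1, 1, 1], 'VALID') + b1)
    pool1 = tf.nn.max_pool(conv1, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')

    # Conv 2: 14x14x6 -> 10x10x16, ReLU, then 2x2 max pooling -> 5x5x16.
    W2 = tf.Variable(tf.truncated_normal([5, 5, 6, 16], stddev=0.1))
    b2 = tf.Variable(tf.zeros(16))
    conv2 = tf.nn.relu(tf.nn.conv2d(pool1, W2, [1, 1, 1, 1], 'VALID') + b2)
    pool2 = tf.nn.max_pool(conv2, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')

    # Flatten 5x5x16 -> 400, fully connected layer -> 120 with ReLU and dropout.
    flat = tf.reshape(pool2, [-1, 400])
    W3 = tf.Variable(tf.truncated_normal([400, 120], stddev=0.1))
    b3 = tf.Variable(tf.zeros(120))
    fc1 = tf.nn.dropout(tf.nn.relu(tf.matmul(flat, W3) + b3), keep_prob)

    # Output layer -> 43 logits; softmax is applied inside the loss function.
    W4 = tf.Variable(tf.truncated_normal([120, n_classes], stddev=0.1))
    b4 = tf.Variable(tf.zeros(n_classes))
    return tf.matmul(fc1, W4) + b4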
The second structure, LeNet_E, stacks two convolutions before each pooling layer:
Layer | Shape |
---|---|
Input | 32x32x1 |
Convolution (valid, 5x5x16) | 28x28x16 |
Convolution (same, 5x5x16) | 28x28x16 |
Activation (ReLU) | 28x28x16 |
Max Pooling (valid, 2x2) | 14x14x16 |
Convolution (valid, 5x5x32) | 10x10x32 |
Convolution (same, 5x5x32) | 10x10x32 |
Activation (ReLU) | 10x10x32 |
Max Pooling (valid, 2x2) | 5x5x32 |
Flatten | 800 |
Fully Connected | 256 |
Activation (ReLU) | 256 |
Fully Connected | 128 |
Activation (ReLU) | 128 |
Output | 43 |
The third structure, LeNet2, is a multi-scale network: a second branch flattens the output of stage L1 and concatenates it with the deeper L2-1 branch before the fully connected layers:
Layer | Shape | Layer | Shape |
---|---|---|---|
Input | 32x32x1 | ||
L1 : Convolution (valid, 5x5x6) | 28x28x6 | ||
L1 : ReLU | 28x28x6 | ||
L1 : Max Pooling (valid, 2x2) | 14x14x6 | ||
L2-1: Convolution (valid, 5x5x16) | 10x10x16 | ||
L2-1: ReLU | 10x10x16 | ||
L2-1: Max Pooling (valid, 2x2) | 5x5x16 | ||
L2-1: Convolution (valid, 5x5x400) | 1x1x400 | ||
L2-1: ReLU | 1x1x400 | ||
L2-1: Flatten | 400 | L2-2: Flatten L1 | 1176 |
L2 : Concatenate L2-1 & L2-2 | 1576 | ||
L3 : Fully Connected | 256 | ||
L4 : Fully Connected | 128 | ||
Output | 43 | ||
LeNet parameters and structure:
learning_rate = 0.005
BATCH_SIZE=128
logits = LeNet(X, keep_prob, n_classes)
logits = tf.identity(logits, name='lenet-logits')
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=logits)
cost = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
accuracy_operation = tf.identity(accuracy_operation, name='lenet-accuracy')
LeNet_E parameters and structure:
LNE_learning_rate = 0.0005
BATCH_SIZE=128
LNE_logits = LeNet_E(X, keep_prob, n_classes)
LNE_logits = tf.identity(LNE_logits, name='LNE_logits')
LNE_cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=LNE_logits)
LNE_cost = tf.reduce_mean(LNE_cross_entropy)
LNE_optimizer = tf.train.AdamOptimizer(learning_rate=LNE_learning_rate).minimize(LNE_cost)
LNE_correct_prediction = tf.equal(tf.argmax(LNE_logits, 1), tf.argmax(one_hot_y, 1))
LNE_accuracy_operation = tf.reduce_mean(tf.cast(LNE_correct_prediction, tf.float32))
LNE_accuracy_operation = tf.identity(LNE_accuracy_operation, name='LNE_accuracy')
LeNet2 parameters and structure:
LN2_learning_rate = 0.0005
BATCH_SIZE=128
LN2_logits = LeNet2(X, keep_prob, n_classes)
LN2_logits = tf.identity(LN2_logits, name='LN2_logits')
LN2_cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=one_hot_y, logits=LN2_logits)
LN2_cost = tf.reduce_mean(LN2_cross_entropy)
LN2_optimizer = tf.train.AdamOptimizer(learning_rate=LN2_learning_rate).minimize(LN2_cost)
LN2_correct_prediction = tf.equal(tf.argmax(LN2_logits, 1), tf.argmax(one_hot_y, 1))
LN2_accuracy_operation = tf.reduce_mean(tf.cast(LN2_correct_prediction, tf.float32))
LN2_accuracy_operation = tf.identity(LN2_accuracy_operation, name='LN2-accuracy')
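The training loop itself is not shown above; a minimal sketch of the loop that produced the logs below (EPOCHS = 50, the y label placeholder, a keep probability of 0.5 during training, and the evaluate helper are assumptions) might look like this:

import tensorflow as tf
from sklearn.utils import shuffle

EPOCHS = 50

def evaluate(X_data, y_data, sess):
    # Average the per-batch accuracies over the whole split.
    total = 0.0
    for offset in range(0, len(X_data), BATCH_SIZE):
        bx, by = X_data[offset:offset + BATCH_SIZE], y_data[offset:offset + BATCH_SIZE]
        acc = sess.run(accuracy_operation, feed_dict={X: bx, y: by, keep_prob: 1.0})
        total += acc * len(bx)
    return total / len(X_data)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("Training...")
    for epoch in range(EPOCHS):
        X_train_norm, y_train = shuffle(X_train_norm, y_train)
        for offset in range(0, len(X_train_norm), BATCH_SIZE):
            bx = X_train_norm[offset:offset + BATCH_SIZE]
            by = y_train[offset:offset + BATCH_SIZE]
            sess.run(optimizer, feed_dict={X: bx, y: by, keep_prob: 0.5})
        if (epoch + 1) % 5 == 0:
            print("EPOCH {}...".format(epoch + 1))
            print("Validation Accuracy = {:.3f}".format(evaluate(X_valid_norm, y_valid, sess)))
    tf.train.Saver().save(sess, './lenet')
    print("Model saved!")

The LeNet_E and LeNet2 runs use the same loop with their own cost, optimizer and accuracy tensors.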
LeNet training log:
Training...
EPOCH 5...
Validation Accuracy = 0.925
EPOCH 10...
Validation Accuracy = 0.933
EPOCH 15...
Validation Accuracy = 0.932
EPOCH 20...
Validation Accuracy = 0.934
EPOCH 25...
Validation Accuracy = 0.952
EPOCH 30...
Validation Accuracy = 0.931
EPOCH 35...
Validation Accuracy = 0.945
EPOCH 40...
Validation Accuracy = 0.934
EPOCH 45...
Validation Accuracy = 0.943
EPOCH 50...
Validation Accuracy = 0.939
Model saved!
Test Accuracy = 0.916
LeNet_E training log:
Training...
EPOCH 5...
Validation Accuracy = 0.909
EPOCH 10...
Validation Accuracy = 0.960
EPOCH 15...
Validation Accuracy = 0.967
EPOCH 20...
Validation Accuracy = 0.972
EPOCH 25...
Validation Accuracy = 0.974
EPOCH 30...
Validation Accuracy = 0.983
EPOCH 35...
Validation Accuracy = 0.978
EPOCH 40...
Validation Accuracy = 0.977
EPOCH 45...
Validation Accuracy = 0.987
EPOCH 50...
Validation Accuracy = 0.988
Model saved!
Test Accuracy = 0.976
LeNet2 training log:
Training...
EPOCH 5...
Validation Accuracy = 0.905
EPOCH 10...
Validation Accuracy = 0.929
EPOCH 15...
Validation Accuracy = 0.938
EPOCH 20...
Validation Accuracy = 0.939
EPOCH 25...
Validation Accuracy = 0.941
EPOCH 30...
Validation Accuracy = 0.949
EPOCH 35...
Validation Accuracy = 0.944
EPOCH 40...
Validation Accuracy = 0.951
EPOCH 45...
Validation Accuracy = 0.955
EPOCH 50...
Validation Accuracy = 0.951
Model saved!
Test Accuracy = 0.946
Comparing the three training runs above, I found that the LeNet_E structure is the most accurate. I would have expected the LeNet2 model to do better, but that is not the case.
Another way to increase the accuracy of the model is data augmentation. I calculated the number of samples for each label and found that some labels lack sufficient data; this can already be seen in the training set table above, and the histogram below makes it more intuitive.
The training data set has 34,799 samples spread over 43 labels, so the average label has about 809 samples; 26 labels have counts below the average and need to be augmented (i.e., to have their number of samples increased).
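A sketch of how the histogram of per-class counts can be produced (matplotlib assumed):

import numpy as np
import matplotlib.pyplot as plt

# Bar chart of training samples per class; the dashed line marks the mean (~809).
class_ids, counts = np.unique(y_train, return_counts=True)
plt.figure(figsize=(12, 4))
plt.bar(class_ids, counts)
plt.axhline(counts.mean(), color='r', linestyle='--')
plt.xlabel('ClassId')
plt.ylabel('counts')
plt.show()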
I used keras.preprocessing.image.ImageDataGenerator, which gives a good way to generate new images; the basic settings are rotation_range=5, width_shift_range=0.1, height_shift_range=0.1 and zoom_range=0.2. Here is the generator function.
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=5,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.2,
                             fill_mode='reflect')
def generator(classid):
    # Slice out all samples belonging to this class; classid_indices maps a
    # class id to its [first, last] index range in the (class-sorted) data.
    start, end = classid_indices[classid][0], classid_indices[classid][1] + 1
    data_x, data_y = X_gray_data[start:end], y_origin_train[start:end]
    batch_size = len(data_x)
    if batch_size < 809:
        # Generate enough extra images to bring this class up to about 809 samples.
        epo = int(809 / batch_size)        # how many full batches fit into 809
        tiny_batch = 809 % batch_size      # size of the remainder batch
        gen_img_tiny_batch = datagen.flow(data_x, data_y, batch_size=tiny_batch, shuffle=False)
        gen_img_full_batch = datagen.flow(data_x, data_y, batch_size=batch_size, shuffle=False)
        if epo == 1:
            # Only the remainder batch is needed.
            gen_data_x, gen_data_y = next(gen_img_tiny_batch)
            New_data_x = np.concatenate((data_x, gen_data_x), 0)
            New_data_y = np.concatenate((data_y, gen_data_y), 0)
        else:
            # Append (epo - 1) full batches of generated images, then the remainder batch.
            New_data_x, New_data_y = data_x, data_y
            for i in range(epo - 1):
                gen_data_x, gen_data_y = next(gen_img_full_batch)
                New_data_x = np.concatenate((New_data_x, gen_data_x), 0)
                New_data_y = np.concatenate((New_data_y, gen_data_y), 0)
            gen_data_x, gen_data_y = next(gen_img_tiny_batch)
            New_data_x = np.concatenate((New_data_x, gen_data_x), 0)
            New_data_y = np.concatenate((New_data_y, gen_data_y), 0)
        return New_data_x, New_data_y
    else:
        return data_x, data_y
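A hypothetical driver loop showing how the generator could be applied to every class and stacked into the augmented training set (the report only states the resulting shape, (46714, 32, 32, 1); this loop itself is my assumption):

# Augment each class in turn and stack the results into one data set.
aug_x, aug_y = [], []
for classid in range(n_classes):
    cx, cy = generator(classid)
    aug_x.append(cx)
    aug_y.append(cy)
X_aug = np.concatenate(aug_x, 0)
y_aug = np.concatenate(aug_y, 0)
print(X_aug.shape)   # roughly (46714, 32, 32, 1)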
Here is the original image:
Here are the augmented images:
Retraining with the augmented data set:
Training...
EPOCH 5...
Validation Accuracy = 0.921
EPOCH 10...
Validation Accuracy = 0.972
EPOCH 15...
Validation Accuracy = 0.987
EPOCH 20...
Validation Accuracy = 0.987
EPOCH 25...
Validation Accuracy = 0.984
EPOCH 30...
Validation Accuracy = 0.985
EPOCH 35...
Validation Accuracy = 0.989
EPOCH 40...
Validation Accuracy = 0.990
EPOCH 45...
Validation Accuracy = 0.988
EPOCH 50...
Validation Accuracy = 0.988
Model saved!
Test Accuracy = 0.969
After augmenting the data, the accuracy did not improve. Disappointing; maybe something is wrong with my method 😂.
I should mention that I used grayscale images for the augmentation because it is faster than working with RGB images. The new data set's shape is (46714, 32, 32, 1).
Here are 10 pictures I downloaded from the Internet:
The predicted data obtained through model LeNet_E:
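The softmax analysis for these images could be computed roughly as follows (a sketch: the checkpoint path, the new_imgs array of preprocessed downloaded pictures, and the use of tf.nn.top_k are assumptions; LNE_logits, X and keep_prob are the tensors defined earlier):

import numpy as np
import tensorflow as tf

# Top-5 softmax probabilities for the 10 downloaded images
# (new_imgs has shape (10, 32, 32, 1) after resize_img, grayscale and normalize).
softmax = tf.nn.softmax(LNE_logits)
top5 = tf.nn.top_k(softmax, k=5)

with tf.Session() as sess:
    tf.train.Saver().restore(sess, './lenet_e')   # assumed checkpoint path
    values, indices = sess.run(top5, feed_dict={X: new_imgs, keep_prob: 1.0})
    for probs, classes in zip(values, indices):
        print(list(zip(classes, np.round(probs, 3))))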
Below is the visualization output of the network's convolutional layers (feature maps) in TensorFlow.
Test image:
The first conv layer's visualization for LeNet_E:
The third conv layer's visualization for LeNet_E:
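A sketch of how such a feature-map visualization can be produced for a single test image (conv_layer stands for whichever convolution tensor of LeNet_E is inspected; that name and the open session sess are assumptions):

import numpy as np
import matplotlib.pyplot as plt

# Evaluate the chosen convolution tensor for one preprocessed test image
# and plot every feature map as a grayscale image.
activation = sess.run(conv_layer, feed_dict={X: test_img[np.newaxis], keep_prob: 1.0})
n_maps = activation.shape[3]
fig, axes = plt.subplots(1, n_maps, figsize=(2 * n_maps, 2))
for i, ax in enumerate(axes.ravel()):
    ax.imshow(activation[0, :, :, i], cmap='gray')
    ax.axis('off')
plt.show()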