In this project, you'll label the pixels of a road in images using a Fully Convolutional Network (FCN).
The `load_vgg` function loads the VGG model.
The `layers` function is implemented.
This image visualizes the Original Skip Layer Architecture of the network:
The `optimize` function is implemented.
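As a rough illustration of the cross-entropy loss that this README tracks during training, the sketch below computes a mean per-pixel softmax cross-entropy with NumPy. It is not the project's `optimize` code: the function name `softmax_cross_entropy` and the toy inputs are assumptions made for this example, and the real implementation would use TensorFlow ops on the network's logits.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean per-pixel cross-entropy.

    logits: float array of shape (num_pixels, num_classes)
    labels: one-hot array of the same shape
    """
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(labels * log_probs).sum(axis=1).mean())

# One 2x2 image with NUM_CLASSES = 2, flattened so each row is one pixel.
logits = np.zeros((1, 2, 2, 2)).reshape(-1, 2)
labels = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
loss = softmax_cross_entropy(logits, labels)
```

With all-zero logits the network is undecided between the two classes, so the loss equals ln 2, which is a useful sanity value for the first training steps of a two-class model.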
The `train_nn` function is implemented and prints the time and loss for each epoch of training.
The model trains correctly, taking about 48 s per epoch, or about 16 minutes for 20 epochs (48 s × 20).
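The per-epoch reporting described above could look roughly like the following sketch. It is not the project's `train_nn`: `train_loop` and `run_batch` are hypothetical names, and `run_batch` stands in for whatever executes one training step and returns its loss.

```python
import time

def train_loop(run_batch, batches, epochs):
    """Run `epochs` passes over `batches`, printing time and mean loss per epoch.

    run_batch: hypothetical callable (images, labels) -> loss for one training step
    batches:   a list of (images, labels) pairs, re-read on every epoch
    """
    history = []
    for epoch in range(1, epochs + 1):
        start = time.perf_counter()
        losses = [run_batch(images, labels) for images, labels in batches]
        elapsed = time.perf_counter() - start
        mean_loss = sum(losses) / len(losses)
        history.append((epoch, elapsed, mean_loss))
        print(f"Epoch {epoch}/{epochs}: {elapsed:.1f}s, loss {mean_loss:.4f}")
    return history

# Dummy stand-in for a real training step, so the sketch runs on its own.
history = train_loop(lambda images, labels: 0.5, [(None, None)] * 3, epochs=2)
```

Returning the history alongside the printed output makes it easy to plot the loss curve afterwards.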
I trained the model many times to find a reasonable set of parameters. For the KITTI data set I used:
L2_REG = 1e-5
STDEV = 1e-2
KEEP_PROB = 0.8
LEARNING_RATE = 1e-4
EPOCHS = 20
BATCH_SIZE = 8
IMAGE_SHAPE = (160, 576)
NUM_CLASSES = 2
For CityScapes dataset I used:
L2_REG = 1e-5
STDEV = 1e-2
KEEP_PROB = 0.8
LEARNING_RATE = 5e-4
EPOCHS = 20
BATCH_SIZE = 8
IMAGE_SHAPE = (256, 512)
For the CityScapes data set, the number of classes is defined by the list of classes supplied by the data set provider.
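One way to derive the class count from such a list, and to turn a color-coded ground truth image into one-hot labels, is sketched below. The `LABELS` entries are an illustrative subset chosen for this example and should be checked against the data set's own label definitions; `one_hot_from_colors` is a hypothetical helper, not a function from this project.

```python
import numpy as np

# Hypothetical subset of (name, RGB color) pairs; the real list comes from
# the data set's own label definitions.
LABELS = [
    ("road", (128, 64, 128)),
    ("sidewalk", (244, 35, 232)),
    ("building", (70, 70, 70)),
]
NUM_CLASSES = len(LABELS)

def one_hot_from_colors(gt_image, labels=LABELS):
    """Convert an (H, W, 3) color-coded image to an (H, W, num_classes) boolean array."""
    masks = [np.all(gt_image == np.array(color), axis=2) for _, color in labels]
    return np.stack(masks, axis=2)

# A 1x2 ground truth image: one "road" pixel and one "building" pixel.
gt = np.array([[[128, 64, 128], [70, 70, 70]]], dtype=np.uint8)
one_hot = one_hot_from_colors(gt)
```

Pixels whose color matches none of the listed classes end up with an all-false row, so a real pipeline would usually add a background or "void" class for them.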
Cityscapes test image vs. predicted result after training for 20 epochs with a batch size of 16:
More results can be seen in the `runs/` directory.
Yes, I've tested the project on images from the dataset and here is the result:
KITTI data set
CityScapes data set
I was tracking the loss during the training and here is a graph that describes my results:
Cross-entropy loss for KITTI data set
Cross-entropy loss for CityScapes data set
To improve road recognition, I added image preprocessing. In `gen_batch_function` in `helper.py`, I added image cropping, image flipping, and changes to the brightness and contrast of the image.
To enlarge the image data set, I produced three additional images for each given image: one cropped, one flipped, and one with changed brightness and contrast. For each image I kept the ground truth consistent: if the image was cropped, I cropped the ground truth image as well, and if the image was flipped, I flipped the ground truth image as well.
This added variety to the data set and improved road recognition in difficult places such as shadows, bikes, sidewalks, and road separators.
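The augmentation described above can be sketched with NumPy as follows. This is not the code in `gen_batch_function`: `augment_pair` is a hypothetical helper, and the 80% crop ratio and the brightness and contrast ranges are assumptions chosen only for illustration.

```python
import numpy as np

def augment_pair(image, gt_image, rng):
    """Return three extra (image, ground truth) pairs for one input pair.

    image:    (H, W, 3) uint8 array
    gt_image: (H, W, ...) array aligned pixel-for-pixel with image
    """
    h, w = image.shape[:2]

    # 1. Random crop to 80% of each side (assumed ratio), same window for both.
    ch, cw = int(h * 0.8), int(w * 0.8)
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    cropped = (image[top:top + ch, left:left + cw],
               gt_image[top:top + ch, left:left + cw])

    # 2. Horizontal flip applied to both arrays.
    flipped = (image[:, ::-1], gt_image[:, ::-1])

    # 3. Brightness/contrast change on the image only; the labels are unaffected.
    contrast = rng.uniform(0.8, 1.2)    # assumed range
    brightness = rng.uniform(-30, 30)   # assumed range
    adjusted = np.clip(image.astype(np.float32) * contrast + brightness, 0, 255)
    recolored = (adjusted.astype(np.uint8), gt_image)

    return [cropped, flipped, recolored]

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(160, 576, 3), dtype=np.uint8)
gt_image = np.zeros((160, 576, 2), dtype=bool)
gt_image[80:, :, 1] = True  # hypothetical labels: bottom half is road
pairs = augment_pair(image, gt_image, rng)
```

The cropped pair is smaller than the input, so before batching it would likely need to be resized back to `IMAGE_SHAPE`, using nearest-neighbor interpolation for the ground truth so that labels are not blended.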
Skip connections are found to improve the segmentation accuracy, as discussed by the authors in the original paper.
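To see why the skip connections line up, it helps to trace the spatial sizes. The sketch below assumes the standard FCN-8s layout, in which the encoder outputs used for fusion are downsampled by factors of 8 (pool3), 16 (pool4), and 32 (final layer), and uses the KITTI `IMAGE_SHAPE` from the parameter list. The helper names are hypothetical.

```python
IMAGE_SHAPE = (160, 576)

def downsampled(shape, factor):
    """Spatial size after downsampling by `factor` (assumes exact divisibility)."""
    return (shape[0] // factor, shape[1] // factor)

def upsampled(shape, factor):
    """Spatial size after a transposed convolution with stride `factor`."""
    return (shape[0] * factor, shape[1] * factor)

pool3 = downsampled(IMAGE_SHAPE, 8)    # (20, 72)
pool4 = downsampled(IMAGE_SHAPE, 16)   # (10, 36)
layer7 = downsampled(IMAGE_SHAPE, 32)  # (5, 18)

# Upsample the deepest output x2 so it can be added to pool4 ...
fuse4 = upsampled(layer7, 2)
assert fuse4 == pool4
# ... upsample x2 again so it can be added to pool3 ...
fuse3 = upsampled(fuse4, 2)
assert fuse3 == pool3
# ... and finally x8 to recover the input resolution.
output = upsampled(fuse3, 8)
assert output == IMAGE_SHAPE
```

Each element-wise addition is only valid because the upsampled tensor and the skip tensor share the same height and width, which is why the input dimensions are chosen to be divisible by 32.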
I used these resources to learn more about weight initialization and regularization. To study more about Semantic Segmentation, I am going to use this resource - A 2017 Guide to Semantic Segmentation with Deep Learning.
I have processed ~20 labels from the Cityscapes data. The code is located in `main-city.py` and `helper-cityscapes.py`.
The code can be run with `python3 main_city.py --epoch 20 --batch-size 16`. The code is built using Python 3.
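The two flags in that command could be parsed with `argparse` along these lines. This is a sketch rather than the script's actual argument handling: `build_parser` is a hypothetical name, and the defaults are assumed from the `EPOCHS` and `BATCH_SIZE` values listed for CityScapes.

```python
import argparse

def build_parser():
    """Build a parser for the two flags shown in the run command."""
    parser = argparse.ArgumentParser(description="Train the FCN on Cityscapes")
    # Defaults are assumptions taken from the parameter list in this README.
    parser.add_argument("--epoch", type=int, default=20, help="number of training epochs")
    parser.add_argument("--batch-size", type=int, default=8, help="images per batch")
    return parser

# Parse the same arguments as the command above; argparse exposes
# "--batch-size" as the attribute `batch_size`.
args = build_parser().parse_args(["--epoch", "20", "--batch-size", "16"])
print(args.epoch, args.batch_size)
```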
Make sure you have the following installed:
Download the Kitti Road dataset from here. Extract the data set in the `data` folder. This will create the folder `data_road` with all the training and test images.
Download the CityScapes dataset from here.
Implement the code in the `main.py` module indicated by the "TODO" comments.
The comments marked with the "OPTIONAL" tag are not required to complete.
Run the following command to run the project:
python main.py
Note: If running this in a Jupyter Notebook, system messages, such as those regarding test status, may appear in the terminal rather than the notebook.
- Ensure you've passed all the unit tests.
- Ensure you pass all points on the rubric.
- Submit the following in a zip file:
  - `helper.py`
  - `main.py`
  - `project_tests.py`
  - Newest inference images from the `runs` folder (all images from the most recent run)
- The link for the frozen `VGG16` model is hardcoded into `helper.py`. The model can be found here.
- The model is not vanilla `VGG16`, but a fully convolutional version, which already contains the 1x1 convolutions to replace the fully connected layers. Please see this forum post for more information. A summary of additional points follows.
- The original FCN-8s was trained in stages. The authors later uploaded a version that was trained all at once to their GitHub repo. The version in the GitHub repo has one important difference: the outputs of pooling layers 3 and 4 are scaled before they are fed into the 1x1 convolutions. As a result, some students have found that the model learns much better with the scaling layers included. The model may not converge substantially faster, but may reach a higher IoU and accuracy.
- When adding L2 regularization, setting a regularizer in the arguments of the `tf.layers` functions is not enough. Regularization loss terms must be manually added to your loss function; otherwise, regularization is not applied.
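The arithmetic behind the last point can be illustrated with NumPy. In TensorFlow 1.x, a regularizer passed to a layer only records its penalty in the `tf.GraphKeys.REGULARIZATION_LOSSES` collection; the training loss changes only once those terms are summed (for example with `tf.losses.get_regularization_loss()`) and added to the cross-entropy. The sketch below is a stand-in for that sum, not TensorFlow code: `l2_penalty`, `total_loss`, and the toy kernels are assumptions made for this example.

```python
import numpy as np

L2_REG = 1e-5  # value listed in the parameters earlier in this README

def l2_penalty(weights, scale=L2_REG):
    """scale * sum(w ** 2) / 2, the form used by TensorFlow 1.x's L2 regularizer."""
    return scale * 0.5 * float(np.sum(np.square(weights)))

def total_loss(cross_entropy, kernels):
    """Cross-entropy plus one L2 term per regularized kernel."""
    return cross_entropy + sum(l2_penalty(k) for k in kernels)

# Toy kernels standing in for a 1x1 convolution and a transposed convolution.
kernels = [np.ones((1, 1, 4, 2)), np.full((4, 4, 2, 2), 2.0)]
loss = total_loss(0.25, kernels)
```

Minimizing only the cross-entropy term would leave the kernels unpenalized, which is the failure mode the bullet above warns about.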
If you are unfamiliar with GitHub, Udacity has a brief GitHub tutorial to get you started. Udacity also provides a more detailed free course on git and GitHub.
To learn about README files and Markdown, Udacity provides a free course on READMEs as well.
GitHub also provides a tutorial about creating Markdown files.