Training AlexeyAB YOLOv3 on your own dataset in Google Colab

Privalov Vladimir
5 min read · Sep 14, 2020


In a recent post I presented a guide on training the YOLOv3 darknet model on my own dataset. In this post I will explain how to train the YOLOv3 darknet model from AlexeyAB on your own dataset in Google Colab.

I will skip preparing the training data, as it is covered in my previous post.

In my previous post I described the LabelMe tool for labeling training samples. You can also use labelImg. Install labelImg following the official guide. If you face a problem similar to mine, just run this command:

sudo python3 labelImg.py
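As an alternative to running labelImg from a source checkout, it can also be installed from PyPI. This is just a sketch: the package name and entry point below assume the PyPI distribution, so check the official guide for platform-specific steps:

# Install labelImg from PyPI and launch the GUI
pip3 install labelImg
labelImg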

To mount Google Drive in Google Colab, use the drive API:

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

Then add your working directory to sys.path in Python:

import sys
sys.path.append('/content/gdrive/My Drive/Documents/Job documents/Xenlp/AR-nav/Detection/YOLOv3')

You can access the file system of your Google Drive from Colab. Click the folder icon on the left panel and select the gdrive folder.

To open a file, double-click on it. If this opens a popup for saving the file, navigate to the file in Google Drive in your browser and edit it with the Text Editor app.

Now we need to clone the repository with the darknet YOLOv3 model into some folder in your Google Drive:

!cd /content/gdrive/My\ Drive/<some path> && git clone https://github.com/AlexeyAB/darknet.git

Edit the Makefile to enable OPENCV and GPU:

GPU=1
...
OPENCV=1
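If you prefer to toggle these flags from a Colab cell instead of the Drive text editor, a sed one-liner works too. This is a sketch assuming the stock Makefile from the repo, where both flags default to 0:

# Switch GPU and OPENCV from 0 to 1 in place
!cd /content/gdrive/My\ Drive/<some path>/darknet && sed -i 's/^GPU=0/GPU=1/; s/^OPENCV=0/OPENCV=1/' Makefile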

Then compile the model:

!cd /content/gdrive/My\ Drive/<some path>/darknet && make

And finally start training

!cd /content/gdrive/My\ Drive/<some path>/darknet && sudo chmod +x darknet && ./darknet detector train ../data/obj.data cfg/yolo-obj.cfg ../data/darknet53.conv.7 > log.txt 2>&1

Here I redirect the output to the file log.txt. We can use the output in log.txt later to build a loss plot.
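The command above references ../data/obj.data, the data config that ties the dataset together. A minimal single-class example looks like this (the paths are illustrative; point them at your own train/test lists and names file):

classes = 1
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/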

Training on Google Colab with GPU can take hours.

If you set the jitter parameter larger than 0.5, you can get the error "Calloc error - possibly out of CPU RAM". The optimal value for me was jitter=.4.
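For reference, jitter is set in each [yolo] section of yolo-obj.cfg; the relevant fragment looks roughly like this (only the jitter line changes, the remaining lines stay as generated from the template cfg):

[yolo]
...
jitter=.4
...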

When running training for the first time, you may get error messages like “Can’t open label file … images/…/…jpg”. The training script is looking for labels in the directory with images. The problem is in the C source code of the model. To fix it, uncomment line 270 in the file src/utils.c (source):

find_replace(output_path, "/images/", "/labels/", output_path);

To show a plot of the training loss, clone the darknet_scripts repo and run plot_yolo_log.py:

git clone https://github.com/vovaekb/darknet_scripts.git
python plot_yolo_log.py <path to log.txt>

This script will create the file loss_plot.jpg. Show loss_plot.jpg in Colab:

%matplotlib inline
from IPython.display import Image
Image('<path to loss_plot.jpg>')

Result

We can also calculate Mean Average Precision (mAP), precision, recall and other metrics during training by passing the -map flag when starting training:

./darknet detector train ../data/obj.data cfg/yolo-obj.cfg ../data/darknet53.conv.7 -map > log.txt 2>&1

This will add the following lines to log.txt:

calculation mAP (mean average precision)...
...

detections_count = 304, unique_truth_count = 136
class_id = 0, name = bus, ap = 95.28% (TP = 36, FP = 2)
...

for conf_thresh = 0.25, precision = 0.97, recall = 0.85, F1-score = 0.91
for conf_thresh = 0.25, TP = 116, FP = 3, FN = 20, average IoU = 73.74 %

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.929896, or 92.99 %

We can also create plots of key metrics during training using the -dont_show flag:

./darknet detector train ../data/obj.data cfg/yolo-obj.cfg ../data/darknet53.conv.7 -map -dont_show > log.txt 2>&1

The plot will be saved to the file chart.png. Here is my example, cropped (original size).

mAP is calculated after every 100 iterations.

Optimizing parameters

From previous experiments, the optimal input image size was found to be 416x416.

First we run training with the best config from the last experiment on the darknet YOLOv3 model (batch=64, subdivisions=16, learning_rate=0.001, momentum=0.9). This time we use jitter=.3, since jitter larger than 0.5 leads to memory shortage (the out-of-CPU-memory error mentioned above).
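For reference, these hyperparameters live in the [net] section at the top of yolo-obj.cfg; an excerpt with the values used in this experiment looks like this (a sketch, other keys such as decay and burn_in are left at their template defaults):

[net]
batch=64
subdivisions=16
width=416
height=416
momentum=0.9
learning_rate=0.001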

The resulting loss plots for jitter=.3 and .4:

Loss plot for jitter=.3
Loss plot for jitter=.4

Then we tune the momentum and decay parameters. Experiments showed that the optimal values for these parameters are the defaults.

Loss plot for decay=0.0002 (default is 0.0005)

Then we tune burn_in. The results obtained:

Loss plot for burn_in=500
Loss plot for burn_in=100

I got the best loss curve for burn_in=100. I used image size 416x416, burn_in=100, decay=0.0005 (default) and momentum=0.9 (default). The loss decreases rapidly from the very start, the curve is very smooth, and it starts approaching 0 after 60 batches.

Evaluating model on validation data

When training a model, the best indicator of performance is its results on validation data. The darknet binary provides a map command that calculates metrics such as mAP and recall on the validation set.

To evaluate the model on validation data, choose a weights file to evaluate and use this command:

./darknet detector map ../data/obj.data cfg/yolo-obj.cfg backup/yolo-obj_300.weights

If you get the error “couldn’t open file: data/obj.names”, copy the files obj.names, train.txt and test.txt to the darknet folder, as shown below.
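From a Colab cell this could look like the following (assuming the directory layout used above, with the data folder sitting next to the darknet folder; adjust the paths to wherever your obj.names, train.txt and test.txt actually live):

# Copy the names and list files so darknet finds them relative to its working directory
!cd /content/gdrive/My\ Drive/<some path>/darknet && cp ../data/obj.names ../data/train.txt ../data/test.txt .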

The output will look similar to this:

calculation mAP (mean average precision)...
Detection layer: 82 - type = 28
...
120
detections_count = 509, unique_truth_count = 136
class_id = 0, name = bus, ap = 92.71% (TP = 35, FP = 3)
...

for conf_thresh = 0.25, precision = 0.97, recall = 0.82, F1-score = 0.88
for conf_thresh = 0.25, TP = 111, FP = 4, FN = 25, average IoU = 71.01 %

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.922583, or 92.26 %
Total Detection Time: 5 Seconds

Here we can see that the model has an average precision (AP) of 92.71% for the bus class and an mAP of 92.26%.

That’s all. Good luck training your YOLOv3 detector.
