Owl, PicPac and XNN are three tools I wrote to make image-related model training easy.
- Owl: a web UI for efficient image annotation.
- PicPac: an image database and streaming library that preprocesses images and feeds them into a deep learning framework. PicPac supports Caffe (via a fork), MXNet, Nervana, Theano and TensorFlow.
- XNN: a C++ wrapper that provides a unified prediction interface to common deep learning frameworks, including Caffe, MXNet, TensorFlow, Theano and other Python-based frameworks.
- A Caffe fork with PicPac support.
The goal is to create a model that detects and localizes a target object category within images. We will use a toy car plate dataset for illustration.
================
$ git clone https://github.com/aaalgo/owl
$ cd owl
$ # Download the dataset
$ wget http://www.robots.ox.ac.uk/~vgg/data/cars_markus/cars_markus.tar
$ mkdir images
$ cd images
$ tar xf ../cars_markus.tar
$ cd ..
$ # create database
$ ./manage.py migrate
$ # import images into the database
$ find images/ -name '*.jpg' | ./manage.py import --run
Before starting the annotation server, we need to adjust a couple of
parameters in the file owl/annotate/params.py:

ROWS = 2                # image rows per page
COLS = 3                # images per row
BATCH = ROWS * COLS
POLYGON = False         # set to True to annotate polygons instead of bounding boxes
VIEWED_AS_DONE = False  # see below

$ # start the annotation server
$ ./run.sh
The URL of the annotation UI is http://HOSTNAME:18000/annotate/.
The UI is designed to minimize hand movements and therefore maximize efficiency. The following design decisions were made:
- A bounding box is automatically saved by AJAX when created.
- Refreshing the page loads the next batch of examples.
The annotation process finishes when all images are annotated/viewed.
The VIEWED_AS_DONE parameter controls whether an image that has been
viewed should be considered annotated even when no annotation is added.
Set the value to True if it is known that images without positive
regions exist. If the value is set to False and no annotation is made
to an image, it will be shown again when all other images are done.
After annotation is done, or a sufficient number of annotations have been collected, the images and annotations can be exported to a PicPac database by
$ ./manage.py export db
The file db then contains all the information needed for training.
PicPac Database
A PicPac database contains images and labels/annotations. The annotations produced by Owl are in the format used by Annotorious (Owl actually uses an extended version of Annotorious). Below is a sample annotation:
{"shapes": [{"geometry": {"y": 0.5912162162162162, "x": 0.6049107142857143, "width": 0.10491071428571429, "height": 0.08277027027027027}, "style": {}, "type": "rect"}]}
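The geometry values are relative fractions of the image width and height, so an annotation stays valid when the image is resized. Below is a minimal sketch (OpenCV is assumed; the file name is hypothetical) that converts such a record into pixel coordinates and draws the box:

import cv2

# Sample record from above (values truncated for brevity)
anno = {'shapes': [{'geometry': {'y': 0.5912, 'x': 0.6049,
                                 'width': 0.1049, 'height': 0.0828},
                    'style': {}, 'type': 'rect'}]}

image = cv2.imread('images/car.jpg')   # hypothetical file name
H, W = image.shape[:2]
for shape in anno['shapes']:
    g = shape['geometry']
    # fractional coordinates -> pixel coordinates
    x, y = int(g['x'] * W), int(g['y'] * H)
    w, h = int(g['width'] * W), int(g['height'] * H)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imwrite('preview.jpg', image)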
PicPac provides a web server for viewing the content of a database.
$ picpac-server db
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0901 22:52:20.280788 29210 picpac-server.cpp:146] listening at 0.0.0.0:18888
I0901 22:52:20.281389 29210 picpac-server.cpp:148] running server with 1 threads.
Samples with annotations can then be viewed at http://HOSTNAME:18888/l?annotate=json. The red bounding box is rendered on the fly by the server; images and annotations are stored separately in the database.
The server accepts almost all of the perturbation/augmentation parameters, so their effects on the training set can be visualized. For example, &perturb=1&pert_angle=20 can be appended to the URL.
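For instance, augmented samples with up to 20 degrees of random rotation can be previewed at:

http://HOSTNAME:18888/l?annotate=json&perturb=1&pert_angle=20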
Sometimes, when the positive regions are too small compared to the background, it is desirable to use only the local areas surrounding the positive regions as training examples, so that positive and negative pixels are roughly balanced. The command below does the cropping.
$ picpac-split-region --width 100 --height 50 --bg 200 --no-scale 1 db db.crop
min: 0.668153
mean: 0.743567
max: 0.819342
Using picpac-server to serve db.crop shows the cropped samples.
The program picpac-split-region accepts the following parameters:
- --size (always 50): scale, or sqrt(width*height), of the positive region.
- --width: output image width.
- --height: output image height.
- --no-scale 1: if not set, the cropped region is scaled so that the positive region and the output image are of the specified sizes. If set, the cropped region is not scaled; instead, the size parameters determine the ratio between positive and negative regions, and the output image size is determined accordingly (see the sketch after this list).
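To make the ratio logic concrete, here is a rough sketch of the arithmetic described above. This illustrates our reading of the parameters, not the actual picpac-split-region implementation; the function is hypothetical:

import math

def output_size(pos_w, pos_h, width=100, height=50, size=50, no_scale=False):
    # Illustration only -- not the actual picpac-split-region code.
    if not no_scale:
        # The crop is scaled so the positive region reaches scale `size`;
        # the output image is then exactly width x height.
        return width, height
    # With --no-scale 1, pixels keep their original resolution; the
    # width/size and height/size ratios decide how much background to keep.
    scale = math.sqrt(pos_w * pos_h)   # actual scale of the positive region
    return (int(round(scale * width / size)),
            int(round(scale * height / size)))

# A 40x30 positive region with the parameters used above:
print(output_size(40, 30))                 # (100, 50): scaled crop
print(output_size(40, 30, no_scale=True))  # larger window, unscaled pixels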
Training
XNN provides a couple of templates based on public models. For example, we can train with the above database using the following command.
$ xnn/train-caffe-fcn.py fcn db ws
where
- fcn is the template name.
- db is the input database.
- ws is the working directory.
Training will start automatically after the command, and can be canceled with CTRL+C. The ws directory will contain the following:
$ ls ws
log params.pickle solver.prototxt train.log train.prototxt.tmpl
model snapshots solver.prototxt.tmpl train.prototxt train.sh
Training can be restarted with train.sh, or continued from a snapshot by supplying the name of a snapshot under the snapshots directory as the argument of train.sh.
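For example (the snapshot name below is a hypothetical placeholder):

$ cd ws
$ ./train.sh                       # restart training from scratch
$ ./train.sh snapshots/SNAPSHOT    # continue from a saved snapshot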
While some parameters can be adjusted via arguments to train-caffe-fcn.py, it is easier to cancel the training process, edit the file train.prototxt directly, and then restart. The most important parameters of train.prototxt are annotated below.
layer {
    name: "data1"
    type: "PicPac"
    top: "data"
    top: "label"
    picpac_param {
        path: "path/to/db"
        batch: 1            # batch size; has to be 1 if image sizes differ
        channels: 3         # color channels; use 1 for grayscale images
        split: 5            # randomly split db into 5 parts
        split_fold: 0       # use part 0 for validation and the rest for training
        annotate: "json"    # render JSON annotations into the label image
        anno_color1: 1      # label value used for annotated regions
        threads: 4          # number of data-loading threads
        perturb: true       # enable image augmentation
        pert_color1: 10     # random perturbation range of
        pert_color2: 10     # the three color channels
        pert_color3: 10
        pert_angle: 20      # maximal angle of random rotation, in degrees
        pert_min_scale: 0.8 # min &
        pert_max_scale: 1.2 # max random scaling factor
    }
}
PicPac supports a full range of flexible configurations. See the [documentation](http://picpac.readthedocs.io/en/latest/) for details.
PicPac with TensorFlow
PicPac has a simple Python interface that takes the same parameters.
import picpac
import tensorflow as tf

config = dict(loop=True,            # loop over the database indefinitely
              shuffle=True,
              reshuffle=True,
              batch=1,
              split=1,
              split_fold=0,
              annotate='json',
              channels=FLAGS.channels,
              stratify=False,
              mixin="db0",          # mix in samples from a second database
              mixin_group_delta=0,
              #pert_color1=10,
              #pert_angle=5,
              #pert_min_scale=0.8,
              #pert_max_scale=1.2,
              #pad=False,
              #pert_hflip=True,
              channel_first=False   # produce NHWC batches for TensorFlow
              )
stream = picpac.ImageStream('db', negate=False, perturb=True, **config)
...
with tf.Session() as sess:
    sess.run(init)
    for step in xrange(FLAGS.max_steps):
        images, labels, pad = stream.next()
        feed_dict = {X: images, Y_: labels}
        _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
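The split and split_fold options implement the same k-fold division as the Caffe layer above. The sketch below assumes (our interpretation, not confirmed by the source) that the negate flag selects the complement of the fold chosen by split_fold, which would allow a held-out validation stream to be built from the same database:

# Sketch: 5-fold split with fold 0 held out; assumes `negate` selects
# the complement of the fold picked by split_fold.
fold = dict(config, split=5, split_fold=0)
train_stream = picpac.ImageStream('db', negate=True, perturb=True, **fold)
val_stream = picpac.ImageStream('db', negate=False, perturb=False,
                                **dict(fold, loop=False))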