Instruction vs. Deduction: Deep Learning and Advances in VCA
Deep Learning is changing the industry
- By Brian Carle
- Apr 01, 2018
As camera counts and the data they provide grow
ever larger, it becomes increasingly difficult for
organizations to monitor, perform investigations,
and draw useful conclusions from the valuable
information gathered by their video surveillance systems.
Video analytics have long been seen as a technology solution to
help identify activity and extract information from all that video data. So far,
video analytics have largely fallen short of that market expectation.
However, Deep Learning may change that. But what is Deep
Learning, and how can it improve on conventional techniques?
Machine Learning Techniques and VCA
Most Video Content Analytics (VCA) developed to date have been
based on traditional, algorithmic Machine Learning techniques.
Deep Learning is a more advanced evolution of Machine Learning
that uses sophisticated artificial neural networks.
In the context of VCA, both Machine Learning and Deep Learning
instruct software to develop a model of objects based on a variety
of attributes the software “learns” about those objects. The model
helps the software later identify and categorize an object in the
video feed that matches the attributes the software has learned.
For instance, an object moving through the camera’s field of view
may be taller than it is wide, as opposed to another object, which is
wider than it is tall. The VCA software may classify the first object as
a person and the second as a vehicle, based on those attributes.
In reality, multitudes of data points are used to classify objects,
but some attributes are more important than others. The VCA software
will weigh the various criteria it uses to classify objects in order
to determine the probability that an object is a person, a vehicle,
or something else. Once an object enters the scene,
the object is analyzed, and its properties are measured. To determine
what an object is, the VCA may begin by looking at the object’s dimensions,
color variation, and movement patterns.
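As a rough illustration of this kind of weighted scoring, the Python sketch below combines three measured attributes into a score for each class and converts the scores to probabilities with a softmax. The attribute names, the weights, and the use of softmax are all assumptions chosen for the example; they are not taken from any particular VCA product.

```python
import math

# Hypothetical per-class weights for three measured attributes.
# Larger magnitude means the attribute matters more for that class.
# These numbers are illustrative assumptions only.
WEIGHTS = {
    "vehicle": {"wideness": 2.0, "speed": 1.5, "straightness": 1.0},
    "person": {"wideness": -2.0, "speed": -1.0, "straightness": -0.5},
    "other": {"wideness": 0.0, "speed": 0.0, "straightness": 0.0},
}


def classify(features):
    """Return (best_label, probabilities) for a dict of measured attributes."""
    scores = {
        label: sum(weight * features[name] for name, weight in weights.items())
        for label, weights in WEIGHTS.items()
    }
    # Softmax turns raw scores into probabilities that sum to 1.
    # Subtracting the top score first keeps the exponentials well behaved.
    top = max(scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exp_scores.values())
    probs = {label: e / total for label, e in exp_scores.items()}
    return max(probs, key=probs.get), probs


# A wide, fast object moving in a fairly straight line.
# "wideness" is positive when the object is wider than it is tall.
observed = {"wideness": 0.8, "speed": 0.9, "straightness": 1.0}
label, probs = classify(observed)
print(label, {k: round(v, 3) for k, v in probs.items()})
```

Because the weights differ per attribute, a strong reading on an important attribute can outweigh weaker evidence from the others, which is the behavior the paragraph above describes.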
For example, suppose the software determines the object is wider than it
is tall, is primarily red, and is moving at a relatively rapid pace in
a single direction. These observations are then compared with the
existing models of what properties represent a car, a
person, or other objects. Based on that comparison,
the VCA software finds the object is 88 percent likely to be
a vehicle, seven percent a person, and five percent “other.” The object
is identified as a car, and data is stored along with the video so that
the user can later search for all red cars travelling
from left to right in the scene.
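The search described above can be pictured as a simple filter over stored detection records. The following Python sketch assumes a hypothetical Detection record with label, color, direction, and timestamp fields; real VCA products store this metadata in their own formats.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical metadata record saved alongside the video."""
    label: str        # e.g. "car" or "person"
    color: str        # dominant color of the object
    direction: str    # "left_to_right" or "right_to_left"
    timestamp: float  # seconds from the start of the recording


def search(detections, **criteria):
    """Return the detections whose fields match every given criterion."""
    return [
        d for d in detections
        if all(getattr(d, field) == value for field, value in criteria.items())
    ]


# Made-up records for illustration.
records = [
    Detection("car", "red", "left_to_right", 12.5),
    Detection("car", "blue", "left_to_right", 30.0),
    Detection("person", "red", "right_to_left", 41.2),
    Detection("car", "red", "right_to_left", 58.9),
    Detection("car", "red", "left_to_right", 73.4),
]

hits = search(records, label="car", color="red", direction="left_to_right")
for hit in hits:
    print(hit.timestamp)
```

The point of the sketch is that the search runs over the classification metadata rather than the video itself, so its usefulness depends entirely on how accurate the classification was.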
Machine Learning Analytics
Machine Learning creates a model of an object based on data fed to
the program by its developers. This data is compiled by people, and
will therefore be inherently limited to the set of attributes a developer
chooses to collect and feed to the program.
To continue with the “person versus vehicle” example, an object
may be classified as a person by Machine Learning VCA if the dimensions
of the object show a greater height than width, as opposed
to a vehicle, which may be wider than it is tall. Given those criteria,
VCA classification may fail in the case of a person crawling through
a scene, or a person carrying a long box. In both examples, the algorithm
assumes the person will be standing upright and the dimensions
will not be skewed by any other objects the person is holding,
such as the box.
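The weakness of a hand-picked rule is easy to reproduce. The Python sketch below hard-codes the height-versus-width test from the example and applies it to a few made-up bounding boxes; the function name and the pixel values are hypothetical.

```python
def classify_by_shape(width, height):
    """Hand-written rule: taller than wide means person, otherwise vehicle."""
    return "person" if height > width else "vehicle"


# Bounding boxes as (width, height) in pixels; values invented for illustration.
standing_person = (40, 110)
crawling_person = (110, 45)
person_with_long_box = (150, 110)  # the box widens the detected outline
car = (180, 70)

for name, box in [
    ("standing person", standing_person),
    ("crawling person", crawling_person),
    ("person with long box", person_with_long_box),
    ("car", car),
]:
    print(name, "->", classify_by_shape(*box))
```

Under this rule, the crawling person and the person carrying the box are both labeled as vehicles, because the only attribute the developer chose to measure no longer separates the two classes.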
Such accuracy challenges have been among the many issues
that have plagued the reputation of analytics in the video security
industry for years.
Advances using Deep Learning
Using Deep Learning, the program is fed many example images and
told those images represent a person, a car, an elderly woman, or
any of a variety of very specific categories into which the program may
be asked to classify unknown objects. The major advancement
of Deep Learning is that it is the software that determines what attributes
are used for classification, and not the human developers.
The example images could number in the tens of thousands or
greater, and the images may demonstrate the object from different
angles, different light conditions, different regions of the world, and
so forth. Because Deep Learning allows the software to determine
object attributes based on real image examples, there are no preconceived
notions as to what defines an object. Provided the image library
fed to the program is sufficiently diverse, there should be no
inherent biases as to what attributes may define an object and no
significant limit to the number of attributes which can be used for
classification.
What the Future Holds
Deep Learning is still a relatively new technology; however, some say
this technique may lead to computers being able to recognize objects
better than people can, and with less data, in the future.
Presently, object classification is limited by how much training the
VCA program receives, the diversity of the examples used to train
the program, and the processing power available to perform accurate
object detection and classification on video in real time.
Near-term advances in algorithm training will come from developers
using video instead of static images in the training process.
Software trained using video clips could lead to VCA making classifications
based on multi-faceted attributes. VCA could observe and
note that cars travel on roads, whereas people walk on sidewalks. Attributes
such as speed, movement patterns, where an object is located
in the scene, walking gait, and other factors could be considered by
analytics for better detection.
Better training, improved detection, and greater processing power,
combined with Deep Learning techniques, could make near-perfect
accuracy a future reality for VCA.
This article originally appeared in the April 2018 issue of Security Today.