Everything in Between
Edge computing: the evolution of video content analytics
- By Tom Edlund
- Jun 01, 2019
Today, artificial intelligence and machine learning are
backing a wide range of technologies and applications,
powering diverse solutions to a broad set of challenges.
For video analytics, deep learning has accelerated the
technology’s evolution, particularly when it comes to
accurate detection. AI-backed video analytics enable object extraction,
recognition, classification, and indexing—activities that can
advance various business and security applications by making video
searchable, quantifiable and actionable.
This article will review the factors driving the adoption of deep
learning-based video content analytics, the creation of more sophisticated
cameras and higher resolution video and, ultimately, the resulting
need to identify efficient data processing and computing solutions
to support these changes.
Video-Based Alerting: From Computer Vision to the AI Age
When video analytics first emerged, products were primarily designed
as alerting solutions. Through triggering calls to action, these early
solutions were attempting to eliminate the need for active human video
monitoring. However, these computer vision-based solutions did
not fully achieve the aim of removing human involvement in video
surveillance and oversight: For one, these video alerting technologies
tended to produce false positives and inaccurate matches for video
search criteria.
An alternative approach adopted by other solutions was to maintain
and maximize human involvement in the video surveillance
process: These interactive video solutions didn’t focus on entirely removing
human operators from surveillance monitoring but strived
instead to accelerate video review for users and make it easier to
understand whole scenes captured by video. While alert-based video
monitoring yielded imprecise results, solutions that streamlined users’
comprehension of entire video scenes enabled operators to overcome
video-based alerting limitations and quickly identify critical
information in captured video.
The Renaissance of Alerting: New Innovations and New Challenges
The introduction of deep learning-backed video analytics revolutionized
the video content analytics industry, driving more accurate detection
capabilities and precise alerting. The demand for higher quality
video analytics, among other considerations, catalyzed the development
of more sophisticated cameras, as well as end user adoption
of more cameras to optimize real-time alerting. These developments
have enabled capabilities such as people counting and face recognition-
based alerting: Higher resolution video makes it possible to
more accurately distinguish between people in crowds and capture
individual faces, which could then be analyzed by state-of-the-art
analytics to trigger real-time alerts when certain conditions are met.
Furthermore, beyond alerting, deep learning-driven solutions make
it possible to leverage the valuable and powerful video metadata to
drive deeper insight in other ways, such as business intelligence and
trend visualizations.
While driving the deployment of real-time, deep learning-based
alerting solutions, the proliferation of cameras with higher resolution
video also drove up the total cost of ownership for video surveillance
integrations. Specifically, these conditions entail higher processing
demands and hardware requirements. For real-time video analytic
solutions such as face recognition alerting, better accuracy drives up
operating costs—a new challenge that must be overcome.
Lowering the Increasing Cost of Computing
Current video analytics research and development is focused on lowering
the cost of processing. Today’s deep learning-driven video content
analytics are mostly based on GPU computing. Looking forward, solution
innovators must account for continual improvements to camera technology,
the growing availability and volume of high-resolution video, and
increasingly powerful deep learning-driven video analytics, and then
determine which processing model can best keep costs down: edge or
centralized computing?
Because there are advantages to both options, it’s important to
understand what makes each of edge processing and centralized computing
effective. Today’s leading solutions rely on centralized
computing for several compelling reasons.
Flexible resource allocation. Today, organizations rely on large
video surveillance installations with multiple cameras. At different
times, each camera will have varying levels of activity, and by
distributing processing with centralized computing, lagging can be
prevented. Centralized computing is flexible, enabling the sharing of
processing resources between cameras so that unusually high activity
can be processed without slowing down computation across cameras.
Statistically, relying on more video streams increases the likelihood
of maintaining a steady state.
By contrast, edge computing is rigid: computation resources and scenario
specifications, such as activity and resolution, must be defined in
advance even though these conditions are dynamic. When working with edge
devices, users or developers must decide up front whether to provision for
normal situations, in which case there is a risk of lagging and missed
alerts during high-activity scenarios, or for extreme situations, which
drives up costs because the extra processing capacity sits idle most of
the time.
When activity spikes on one camera and time is of the essence,
an overloaded processor could cause alerts to be delayed
or missed when they matter most. By distributing the computing, lags
in alerting can be prevented, timely processing is ensured, and lower
processing costs are maintained. The more cameras a deployment has, the
more it benefits from the ability to flexibly distribute processing
with centralized computing.
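The statistical argument for pooling can be illustrated with a small simulation. This is a minimal sketch rather than a measurement: the load model (each camera is quiet most of the time and occasionally busy), the numbers, and the function names are all assumptions chosen for illustration.

```python
import random

def simulate_loads(num_cameras, num_steps, seed=0):
    """Return loads[t][c]: hypothetical processing load of camera c at step t."""
    rng = random.Random(seed)
    loads = []
    for _ in range(num_steps):
        # Assumed model: a camera is busy (load 10) 10% of the time, else quiet (load 1).
        step = [10.0 if rng.random() < 0.10 else 1.0 for _ in range(num_cameras)]
        loads.append(step)
    return loads

def edge_capacity(loads):
    """Total capacity if every camera is provisioned for its own peak load."""
    num_cameras = len(loads[0])
    return sum(max(step[c] for step in loads) for c in range(num_cameras))

def pooled_capacity(loads):
    """Capacity if one shared pool is provisioned for the peak combined load."""
    return max(sum(step) for step in loads)

loads = simulate_loads(num_cameras=50, num_steps=1000)
edge = edge_capacity(loads)
pooled = pooled_capacity(loads)
print(f"per-camera peak provisioning: {edge:.0f} units")
print(f"shared pool provisioning:     {pooled:.0f} units")
```

Because the cameras rarely peak at the same moment, the shared pool can never need more capacity than the sum of the individual peaks, and under this assumed load model it needs considerably less.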
Broader coverage of analytic capabilities. On-camera analytics
require pre-configuring the specific analysis activities for edge processing.
Edge devices are typically designated for dedicated purposes,
and the range of analytic activities that can be completed per device
is limited. Because of device memory constraints, at the outset of deployment,
the user will need to manually configure the relevant analytics
based on the camera location. If the camera points to an area
where faces can be viewed in high resolution, the device will likely be
dedicated to face recognition, but not to license plate recognition
(LPR), for fear of overloading the device.
With centralized processing, there is no need for manual calibration.
There is sufficient memory to share different deep neural networks
between video streams and cameras. When a person of interest on a
pre-defined watchlist passes a dedicated LPR camera that captures a
high-resolution image, a call to action can still be triggered based on
face recognition or other analytics, even if that wasn’t the dedicated
purpose of that specific camera.
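The memory constraint behind this difference can be sketched in a few lines. The model names, model sizes, and memory budgets below are hypothetical values picked only to make the point; real networks and devices vary widely.

```python
# Hypothetical model footprints in MB (assumed for illustration only).
MODEL_SIZE_MB = {
    "lpr": 250,
    "face_recognition": 400,
    "object_classification": 300,
}

def models_that_fit(priority_order, memory_budget_mb):
    """Load models in priority order, skipping any that no longer fit in memory."""
    chosen, used = [], 0
    for name in priority_order:
        size = MODEL_SIZE_MB[name]
        if used + size <= memory_budget_mb:
            chosen.append(name)
            used += size
    return chosen

# Edge camera at a parking entrance, configured by hand with LPR as first priority.
edge_models = models_that_fit(
    ["lpr", "face_recognition", "object_classification"], memory_budget_mb=512
)

# Central server: one copy of each model, shared by every incoming stream.
server_models = models_that_fit(list(MODEL_SIZE_MB), memory_budget_mb=16_000)

print("edge camera can run:", edge_models)
print("central server can run:", server_models)
```

Under these assumed sizes the edge device has room for its dedicated model only, while the server can apply any of the models to any stream.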
Shorter development cycle. By developing
software that can be deployed on general
purpose hardware—instead of developing
the edge hardware itself—the end product is
more broadly applicable, essentially shortening
the development cycle.
The Main Drivers of On-Camera Analytics
Today, due to memory and computation limitations,
on-camera analytics tend to be used
for point solutions. Initially, this was limited
to motion detection, but on-camera analytics
have evolved and can now identify and classify
objects such as people and vehicles, enabling
advanced activities such as intruder detection,
license plate recognition and people
counting. The big question in the video content
analytics (VCA) industry
today is whether edge devices will become
sophisticated enough to enable general
purpose identification, extraction, tracking
and classification of all objects in the video.
There are three main considerations driving
development towards the edge.
Higher demands for real-time processing.
Today, a larger number of cameras generates
a higher volume of data that must be processed
in real time. Because of these increasing demands,
technology providers can justify the
large initial investment in creating, marketing
and distributing smarter cameras to meet
the demand.
Deep learning driving AI chip development.
Now that deep learning is considered
the standard for enabling video analytics,
leading hardware providers are developing
dedicated AI chips. Since these chips only
support specific instructions required for
deep learning inference, they feature high efficiency,
low energy consumption and small
form factor.
Due to their flexibility, deep learning
hardware solutions are enabling broad applications.
Autonomous cars, for instance, rely
on this type of hardware, placing the
deep learning hardware in the car itself
instead of in a centralized data center.
Lowering costs for decoding high-resolution
video. To run video analytics, captured
video must be transmitted to recording archives,
live monitors or centralized processing
servers, requiring significant bandwidth.
By encoding video, solutions reduce transmission
costs, but then face another obstacle:
the compute-intensive demands of decoding
higher resolution video, such as 4K.
Because an edge device can analyze video
before it is encoded for transmission,
processing on the edge circumvents video
decoding as a byproduct, which ultimately
reduces the computational requirements
for processing the overwhelming
amounts of high-quality video data.
By the time the video captured by the
edge device is transferred to the centralized
location, it is already processed and can be
decoded as needed. For post-event investigation,
for instance, only the video for the
relevant time and camera ranges needs to be
decoded. Thus, the extraction of evidence
isn’t inhibited even though the demands of
decoding have been reduced.
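Selective decoding for post-event investigation can be sketched as a lookup over an index of recorded segments. The `Segment` record, its fields, and the file names are hypothetical; an actual video management system would expose its own archive index.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    camera_id: str
    start: int  # seconds since the start of recording (inclusive)
    end: int    # seconds since the start of recording (exclusive)
    path: str   # hypothetical location of the encoded file

def segments_to_decode(segments, camera_ids, t_start, t_end):
    """Return only the segments from the given cameras that overlap [t_start, t_end)."""
    wanted = set(camera_ids)
    return [
        s for s in segments
        if s.camera_id in wanted and s.start < t_end and s.end > t_start
    ]

archive = [
    Segment("lobby", 0, 600, "lobby_000.mp4"),
    Segment("lobby", 600, 1200, "lobby_001.mp4"),
    Segment("garage", 0, 600, "garage_000.mp4"),
    Segment("garage", 600, 1200, "garage_001.mp4"),
]

# An incident in the lobby between second 550 and second 650.
hits = segments_to_decode(archive, ["lobby"], t_start=550, t_end=650)
print([s.path for s in hits])
```

Only the two lobby segments that overlap the incident window are selected; the garage footage stays encoded and untouched.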
Balancing the Benefits of Edge and Centralized Computing
At the outset of 2019, VCA industry predictions
focused on pain points driving the shift
towards edge processing and cloud computing—
changes that will play a critical role in
accelerating the adoption of advanced video
content analytics. On-camera analytics technologies
are focused on transforming from
point solutions to offering a complete analytics
suite, including object tracking, classification
and recognition. However, for the
foreseeable future, centralized computing
will remain critical for deriving comprehensive
intelligence from edge devices. To enable
cross-camera analytics, there must be a
centralized computing service, aggregating
insights from across cameras and feeds.
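A central aggregation step of this kind might look like the sketch below. The event format is hypothetical, and the sketch assumes that each sighting already carries a consistent identity label across cameras, which in practice is the hard part of cross-camera analytics.

```python
from collections import defaultdict

# Hypothetical metadata records as edge cameras might emit them:
# (camera_id, timestamp_seconds, identity_label)
events = [
    ("entrance", 10, "person_17"),
    ("lobby", 42, "person_17"),
    ("garage", 55, "vehicle_3"),
    ("elevator", 90, "person_17"),
    ("garage", 120, "vehicle_3"),
]

def paths_across_cameras(events):
    """Group sightings by identity and order them in time across all cameras."""
    sightings = defaultdict(list)
    for camera_id, timestamp, identity in events:
        sightings[identity].append((timestamp, camera_id))
    return {
        identity: [camera for _, camera in sorted(seen)]
        for identity, seen in sightings.items()
    }

paths = paths_across_cameras(events)
print(paths["person_17"])
```

No single camera holds enough information to reconstruct a path through the site; only a service that sees the metadata from every feed can do so.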
By overcoming decoding challenges, edge
providers can improve operational
reliability and processing speeds, reduce
privacy risks by transmitting only encoded
metadata and, ultimately, accelerate migration
to cloud-based solutions. When computing
activities are limited to data and applications
and not video processing, centralized
cloud platforms become a more affordable
option for running intensive video analytics,
such as alerting, business intelligence,
and video indexing and search.
This article originally appeared in the May/June 2019 issue of Security Today.