A Needle in a Haystack
Analytics helps sort through video surveillance's information overload
- By Stephen Russell
- Oct 06, 2009
London's city-wide transit surveillance system,
the Ring of Steel, includes more than 10,000
cameras. Some 3,000 cameras also have been
deployed in Chicago, with 3,000 more soon coming to
New York City.
Meanwhile, New Orleans, Baltimore, Philadelphia
and other metropolitan areas have all moved forward
with new city-wide and transit authority surveillance
projects. Internationally, Taipei recently announced
a 13,000-camera city-wide initiative, and largest of
them all, Beijing deployed a staggering 300,000 cameras
prior to the 2008 Olympics.
Collectively, more cities have deployed more cameras
for more purposes in the past few years than in
previous decades combined. And with fast networks
and high-resolution cameras, surveillance video spigots
are on and running at full blast.
Too Much of a Good Thing
Cities and transportation authorities are fast discovering
that in addressing security and safety problems
through video surveillance, they have created another
challenge: information overload. It's the same problem
folks at the National Security Agency, at the CIA
and in signal intelligence realized years ago after proliferating
their countless satellites and listening posts.
What were they going to do with all that information?
How could they sift through countless hours of nothing
while looking for that rare something? Today this
problem has led many to question the ultimate efficacy
of video as a security tool in human and cargo
transit systems.
In London, opponents of the Ring of Steel argue
that despite the 10,000-plus cameras, more than 80
percent of crimes remain unsolved. After the London
bombings in 2005, it took thousands of investigators
more than six weeks to comb through the city's vast
surveillance archives looking for clues. Clearly this
isn't the kind of effort that can be deployed in more
routine circumstances. It's no wonder that despite
massive camera proliferation in London so many
crimes go unsolved.
In some enterprising small cities, such as Lancaster,
Pa., citizens are given the ability to access and manipulate
cameras in an attempt to address these kinds
of problems, giving new meaning to "neighborhood
watch." Still, many studies question whether enough
has been done to really make video surveillance effective.
For instance, recent studies in San Francisco and
Los Angeles claim zero impact on crime.
This will remain a problem so long as these deployments
are hampered by an unmanaged and unmanageable
deluge of video. You simply can't find what
you can't see.
The critical question that no one is asking is how
to make the captured video relevant to safety and security.
While many articles and studies focus on how
to network cameras effectively, how to cover hard-to-reach
spots with specialized cameras, whether to
locate intelligence at the edge or on a server and how
video analytics will magically find a terrorist in the act
of contemplating a catastrophic attack, no one has
put forward a workable model for making video useful
in a transportation safety scenario.
How will video technology stop the assault of a
citizen? How can we foil the plots of those bent on
mass destruction, and how can we do it in such a way
that salvages the millions of dollars spent to implement
these extensive camera systems?
The Role of Search
The problem with big video really comes down to its
inherent lack of structure. Analog or IP, standard resolution
or megapixel, surveillance video is all essentially
unstructured stuff. It contains no notes, no tags
or descriptions and no keywords to help you separate
what's important from what is not. That is a problem.
It means that for the most part, video is useless without
a person to make sense of it, and we just don't
have enough people to keep up with all the video we
are generating. We've deployed about 30 million cameras
in the world; that works out to be more than 250
billion new hours of recorded video every year.
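The arithmetic behind that figure can be checked in a few lines. This is a rough sketch, assuming every camera records around the clock; the function name and the duty-cycle parameter are illustrative and not taken from any particular system.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year


def annual_video_hours(cameras: int, duty_cycle: float = 1.0) -> float:
    """Hours of video recorded per year by `cameras` cameras.

    duty_cycle is the assumed fraction of time each camera is recording
    (1.0 means continuous recording).
    """
    return cameras * HOURS_PER_YEAR * duty_cycle


total = annual_video_hours(30_000_000)
print(f"{total / 1e9:.1f} billion hours per year")
```

At continuous recording this comes to roughly 263 billion hours, consistent with the "more than 250 billion" estimate; a lower duty cycle scales the total down proportionally.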
But if we could teach computers to make sense out
of video, even a little, we could make it searchable.
And a good search engine, we have learned, can change
the world. Imagine the Internet without a search engine.
There are more than 10 billion Web pages. Users
would sift aimlessly through content, rarely striking
on something useful and relevant to their interests.
If they did, they would not be able to indicate their
interest in topics like this and find similar content. In
short, search makes the Internet usable.
Somehow, the information revolution that improved
productivity in other knowledge management
industries has yet to transform our approaches to surveillance.
Law enforcement and security staff are often
left to sift through countless hours of video footage
to pinpoint those vital few moments of a crime.
Once found, the process of finding more video related
to the same person or event is often just as arduous as
the first inquiry.
Video search technology exists to make video relevant.
Searching for an event by time and place, license
plate, serial number, face, color, toll transaction
or other relevant data point can rapidly narrow the
video data down to a volume that can quickly be sifted
through and analyzed by the human eye.
Take, for example, a purse that is stolen on the subway. The victim reports the time and location of the assault, but the actual
assault did not happen in a camera's field of view. In most instances, the victim
would have little recourse other than to complete a crime report.
With search-based video surveillance, however, the investigating party could
rapidly pull up video from motion events in the surrounding area and times, broadening
the search to achieve more results presented in an easy-to-scan form. In a
few seconds, they spot the perpetrator exiting the subway with the purse. Without
search, the operator would only have been able to find this video if he had the time
and willingness to manually review video from each possible camera feed.
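The purse-theft search above can be sketched as a filter over indexed events. This is a minimal illustration, assuming the system has already extracted events from video and stored them with a camera, location, timestamp and type; the `Event` record, the `search_events` function and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    camera_id: str
    location: str
    timestamp: datetime
    kind: str  # e.g. "motion", "face", "plate"


def search_events(events, locations, reported_time, window, kind="motion"):
    """Return events of one kind at the given locations, within
    +/- window of the reported time, oldest first."""
    start, end = reported_time - window, reported_time + window
    hits = [
        e for e in events
        if e.kind == kind and e.location in locations
        and start <= e.timestamp <= end
    ]
    return sorted(hits, key=lambda e: e.timestamp)


# Hypothetical index contents for one evening.
events = [
    Event("cam-12", "platform-2", datetime(2009, 10, 6, 17, 58), "motion"),
    Event("cam-14", "exit-north", datetime(2009, 10, 6, 18, 4), "motion"),
    Event("cam-14", "exit-north", datetime(2009, 10, 6, 21, 30), "motion"),
    Event("cam-20", "exit-north", datetime(2009, 10, 6, 18, 5), "plate"),
]
reported = datetime(2009, 10, 6, 18, 0)

# Start with the reported place and a tight window, then broaden both.
narrow = search_events(events, {"platform-2"}, reported, timedelta(minutes=5))
broad = search_events(events, {"platform-2", "exit-north"}, reported,
                      timedelta(minutes=15))
```

Broadening the search is just a matter of widening the set of locations and the time window; the operator still reviews only a handful of clips rather than every feed.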
Faces, License Plates, Colors and More
Search engines like Google rely on things like keywords and page ranks and the
fact that Web page text can be analyzed and cross indexed to make meaningful
searches possible. Video is completely different, but it too can be analyzed for
things like faces, license plates, colors and object tracks. When this data is processed
and cross indexed, an incredible understanding of activity and identity can
emerge from video.
Once relevant video analytics is implemented and tuned, the game changes entirely
in favor of law enforcement. License plate recognition, commonly used to
stop toll evaders, is often dismissed as a crime-stopping tool because the technology
requires highly tuned and expensive cameras to be implemented. Technology
now exists that allows common cameras to track license plates anywhere from a
car rental agency to a city intersection.
While facial recognition is a dirty word in some video surveillance circles, the
technology's promise to deliver more criminals to justice is being realized in transit
scenarios. Much of this can be attributed to the dramatic increase in accuracy of
face-finding technologies in recent years. Where tests of facial recognition in German
subways in 2006 to 2007 yielded accuracy rates of around 60 percent, recent
studies conducted in South Korea show that new technologies can achieve accuracy
of closer to 85 percent with very low instances of false positives.
From Reactive to Proactive
We will all know that efforts at developing transportation-based video surveillance
systems have been successful when we read the headline "Terrorist Plot Foiled,
Suspects Captured using Video Surveillance, No Citizen Harmed."
For this to happen, search is simply not sufficient because it is by its nature a
forensic tool. But, if you turn search on its head, you have alerts. Just as Google
can send you information pertaining to topics of interest, advanced video surveillance
systems can provide alerts related to events, people, cars and other
items of interest.
For transit security professionals, this answers the question "What should I be
looking at?" Rather than asking security staff to stare into monitors hoping to
see something that may or may not be happening, why not provide that staff with
a steady stream of events (e.g., faces, motion and license plates) that indicate an
activity of interest?
Revisiting the example of our ill-intentioned thief, let's assume he actually
got away with stealing the woman's purse. We now have isolated three instances of
him on video that provide evidence of three different thefts. Flagging this video, we
ask the system to alert us the next time he enters the transportation system.
True to his pattern, the thief returns to the subway and attempts to position
himself in the same off-camera location where he committed his other crimes. This
time, monitoring staff are alerted to his presence before he has a chance to act, and
they are able to apprehend him.
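Turning search into an alert amounts to keeping a standing query and checking each new detection against it. The sketch below assumes an upstream face-matching step has already assigned each detection a subject identifier; the `Watchlist` class, the identifiers and the locations are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Watchlist:
    # subject_id -> reason the subject was flagged
    flagged: dict = field(default_factory=dict)

    def flag(self, subject_id: str, reason: str) -> None:
        """Ask to be alerted the next time this subject is detected."""
        self.flagged[subject_id] = reason

    def alerts(self, detections):
        """Yield an alert for each (subject_id, location) detection
        whose subject is on the watchlist."""
        for subject_id, location in detections:
            if subject_id in self.flagged:
                reason = self.flagged[subject_id]
                yield f"ALERT: {subject_id} seen at {location} ({reason})"


watchlist = Watchlist()
watchlist.flag("subject-041", "three purse thefts")

# Hypothetical stream of new detections from the cameras.
incoming = [
    ("subject-007", "turnstile-3"),
    ("subject-041", "turnstile-1"),
]
raised = list(watchlist.alerts(incoming))
```

Only the flagged subject produces an alert, so monitoring staff see one item to act on rather than a wall of monitors.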
While video analytics can surface suspicious activities, objects and people, the
trick to making alerts work is managing false positives. If every alert turns out to
be a false positive and actual criminal behavior is missed, then the system is no
more useful than a sleeping security guard. In the South Korean subway study mentioned
above, each time a watchlist suspect entered the subway, the system accurately
alerted staff to the presence of the individual. While alerts did come through
that turned out to be false, the volume was at a tolerable level that did not reduce
the efficiency of the operation.
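One common way to reason about this trade-off is to measure, for a given match threshold, what share of alerts were genuine (precision) and what share of genuine appearances produced an alert (recall). The sketch below is illustrative only: the similarity scores and labels are invented sample data, not results from the studies mentioned above, and `alert_quality` is a hypothetical helper.

```python
def alert_quality(scored, threshold):
    """scored: list of (similarity, is_true_match) pairs.

    An alert is raised when similarity >= threshold.
    Returns (precision, recall); each is reported as 0.0 when its
    denominator is zero, since it is otherwise undefined.
    """
    raised = [(s, t) for s, t in scored if s >= threshold]
    true_alerts = sum(1 for _, t in raised if t)
    total_true = sum(1 for _, t in scored if t)
    precision = true_alerts / len(raised) if raised else 0.0
    recall = true_alerts / total_true if total_true else 0.0
    return precision, recall


# Invented sample: match scores with ground-truth labels.
scored = [
    (0.95, True), (0.90, True), (0.85, False),
    (0.80, True), (0.60, False), (0.40, False),
]

loose = alert_quality(scored, 0.75)   # more alerts, some of them false
strict = alert_quality(scored, 0.88)  # fewer alerts, one suspect missed
```

Raising the threshold removes the false alert in this sample but also misses one genuine match, which is the tuning decision each deployment has to make for itself.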
Key Considerations
"The Minority Report" is just a movie,
and what you see on "CSI," "NCIS"
and any other crime television drama is
fake. Though these images often form
the basis of what users expect from a
system, they are not realistic use cases
for video technology. However, never
before have so many valuable use cases
been within our grasp, allowing safety
and security professionals to gain the
upper hand and narrow the chances of
a criminal's escape.
To successfully leverage intelligent
video in a transportation setting, it is
important to simplify and integrate.
Find a system that provides a platform
for the integration of a wide variety of
cameras, data systems and analytics. A
platform that is easy to integrate with
will let the system flex to meet several
use cases.
Use cases are the crux of an effective
implementation. Transportation surveillance
focuses on the rapid motion
of a large number of people and objects
over a wide area. Too often, video
systems are installed in such a way that
they see too much of the big picture and
not nearly enough of the detail. While
tracking a wide area is a catch-all use
case, it does nothing to solve the problem
of too much video. To create an effective
use case, institutions should not
implement a wide-reaching dragnet, but
rather should address individual cases,
developing, implementing and tuning them
until they work.
As New York found in the 1990s,
the key to overall crime reduction is to
crack down on specific small crimes—
for example, graffiti, littering and public
drunkenness. Define the use case, how
you would solve it and then determine
the proper equipment, search tools and
analytics that will allow your organization
to address the behavior effectively
from a reactive and proactive stance.
Toll jumpers, freight from a suspicious
location or unattended bags are good
examples of use cases that can be addressed
and managed with video analytics.
Once you conquer one, move on
to another. Soon enough, you will see
that you have solved several individual
problems, but what you have really
done is leave criminals fewer options
to destabilize transit systems and
endanger the people and cargo that
move through them.
Surfing the Tidal Wave
We stand at a turning point in the
development of video as a tool to stop
crime in our nation's transportation
systems. A proliferation of cameras
has created a tidal wave of nearly indecipherable
visual information. To surf
it, we are going to have to combine the
knowledge from our intelligence communities,
the most innovative system
design, and the best and brightest from
the IT industry.
It will require us to embrace and
hone new tools and technologies,
teaching the right video to surface itself.
But in the end, making sense out
of video and making it searchable is
the only way to begin to fulfill the
promise of video as a tool to keep
travelers, trade and
our nation safe.
About the Author
Stephen Russell is
chairman and founder
of 3VR Security Inc.