Three Critical Questions
Knowledge you need when evaluating an AI-driven video analytics solution
- By Brent Boekestein
- Nov 14, 2022
It was never really a question of “if” video analytics technology would live up to its promise as “the next big thing” in physical security; it was simply a matter of when the industry would start adopting the solution en masse.
That time appears to be very near. According to Deloitte’s State of AI in the Enterprise, 5th Edition report, published in October 2022, 94% of business leaders surveyed feel that AI is critical to the future success of their organizations. IBM’s Global AI Adoption Index 2022 goes further, reporting that 35% of companies surveyed are already using AI today and that an additional 42% are exploring AI for future deployment.
With AI changing the way large-scale corporations implement and streamline their processes, measures, and protocols, it is evident that AI technology will continue to play a large role across the enterprise.
None of this comes as a surprise to security professionals who are actively exploring the far-reaching capabilities, accuracy and performance of new Wave 2 video analytics, which deliver higher levels of analysis and understanding in event detection, classification, tracking and forensics. As one would expect, such intense interest in video analytics brings heightened competition and a wave of performance claims, many of which are misleading at best and have the potential to reintroduce the same skepticism that dogged video analytics for years.
This makes the process of vetting the best possible video analytics solution a critical task, one that starts with asking the right questions. To get the process started, here are three fundamental questions you should ask every AI video analytics provider to help gain a better understanding of their specific solution.
Where does your analytics training data come from? AI-based analytics rely on models that learn patterns from training data and then use those patterns to perform tasks such as object detection, recognition and classification. To ensure that systems are accurate and effective, the learned patterns must correlate strongly with the data the system will analyze in the real world. An analytics solution trained on data that is unevenly distributed, in quantity or quality, across the situations it must handle ultimately delivers suboptimal performance.
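Checking the quantity side of that distribution can start with something as simple as counting labels per class. The sketch below is a hypothetical illustration, not any provider's method: it assumes training labels arrive as a plain list of class names, and the max_ratio cutoff of 10 is an arbitrary example value. It says nothing about image quality, and a class missing from the labels entirely will not appear in the report at all.

```python
from collections import Counter


def class_balance_report(labels, max_ratio=10.0):
    """Summarize how evenly training samples are spread across classes.

    labels: one class name per training sample (assumed input format).
    max_ratio: illustrative cutoff on the largest-to-smallest class ratio.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels supplied")
    largest = max(counts.values())
    smallest = min(counts.values())
    ratio = largest / smallest
    total = sum(counts.values())
    # Share of the whole training set held by each class.
    shares = {cls: n / total for cls, n in counts.items()}
    return {
        "counts": dict(counts),
        "shares": shares,
        "imbalance_ratio": ratio,
        "balanced": ratio <= max_ratio,
    }


if __name__ == "__main__":
    sample = ["person"] * 900 + ["vehicle"] * 80 + ["cart"] * 20
    report = class_balance_report(sample)
    print(report["imbalance_ratio"], report["balanced"])  # 45.0 False
```

A 45-to-1 gap like the one in the example suggests the model will see too few carts to learn them reliably, which is exactly the kind of skew worth asking a provider about.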
One common issue in training video analytics to detect specific events stems from the use of biased data sources. Reducing these biases helps avoid unnecessary harm to the people affected by the AI technology itself. For example, face recognition models trained on publicly available images end up with thousands of shots of people in the public eye, such as sports stars, politicians and actors, who may not represent the "average" person like you and me.
Minimizing these biases requires training AI algorithms in ways that limit human and systemic influences, which means accounting for factors beyond the technology itself, such as who and what appears in the training data. One way to do this is with synthetic training data: because the images are generated rather than collected, developers can create any desired detection scenario without inheriting the skew of publicly available image sets.
In addition, many open-source computer vision algorithms are designed for generic applications and cannot automatically identify when a very specific event takes place. A good example is detecting when an expensive piece of equipment, such as a neonatal ultrasound unit, is removed from a designated area. If the video analytics solution has been adequately “trained,” it will autonomously detect such instances and alert operations and/or security staff that the unit has left its sanctioned area. The same analytics can then help locate the equipment so it can be retrieved and returned where it belongs.
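The alerting logic in that example can be sketched as a zone-exit check on tracker output. This is a minimal sketch under stated assumptions: it presumes the analytics already produce a per-object track of (x, y) positions, one per frame, and that the designated area is drawn as a polygon in the same image coordinates. The track ID "ultrasound-01" and the square zone are hypothetical. A standard ray-casting test decides whether each position lies inside the zone, and an alert fires on the frame where an object moves from inside to outside.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a rightward ray from pt crosses an edge."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zone_exit_alerts(
    tracks: Dict[str, List[Point]], zone: List[Point]
) -> List[Tuple[str, int]]:
    """Return (track_id, frame_index) for every inside-to-outside transition."""
    alerts = []
    for track_id, positions in tracks.items():
        was_inside = None  # unknown until the first frame is seen
        for frame, pos in enumerate(positions):
            now_inside = point_in_polygon(pos, zone)
            if was_inside and not now_inside:
                alerts.append((track_id, frame))
            was_inside = now_inside
    return alerts
```

For instance, zone_exit_alerts({"ultrasound-01": [(5, 5), (8, 5), (12, 5)]}, [(0, 0), (10, 0), (10, 10), (0, 10)]) returns [("ultrasound-01", 2)], flagging the third frame. A real deployment would also need to ignore brief excursions caused by tracking jitter.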
Generic analytics such as off-the-shelf object detection cannot be applied efficiently to such a highly specific application; reaching that level of specialization requires some form of synthetic training data. Gartner projects that by 2024, 60% of the data used for the development and training of AI and analytics projects will be synthetically generated. Knowing the source of the training data behind a video analytics solution is therefore a critical criterion in determining its ability to detect specific anomalies.
Where did your model architecture come from: was it open-source code, or was it written by the provider? A video analytics solution purpose-built for computational efficiency and accuracy provides the metrics and data needed to scale security applications quickly and easily. Efficiency and accuracy are measured against standardized validation benchmarks, such as the Common Objects in Context (COCO) and Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) datasets. Think of these testing standards as the equivalent of checking a vehicle’s fuel-economy rating before buying it, or of using Bayes’ theorem to gauge how accurately a medical test diagnoses a condition.
COCO is a large-scale object detection, segmentation and captioning dataset, created to advance object recognition research with high-quality, richly annotated images. It serves both as training data for deep learning models and as a benchmark for comparing the performance of object detectors, including real-time ones. Algorithms are trained and measured on it for tasks such as object detection, object instance segmentation and stuff segmentation (labeling amorphous background regions such as sky, grass or pavement).
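COCO's headline detection score, average precision (AP), is built on intersection over union (IoU): the area where a predicted box and the ground-truth box overlap, divided by the area they cover together. COCO averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below computes only the IoU step and shows which of those thresholds a single prediction clears; it is not a full AP implementation (the official pycocotools package provides COCOeval for that), and the (x_min, y_min, x_max, y_max) box format is an assumption for illustration, since COCO's own annotations store boxes as (x, y, width, height).

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so boxes that do not touch have no intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds (0.50, 0.55, ..., 0.95) that COCO's headline AP averages over.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, true_box):
    """List the COCO IoU thresholds at which this prediction would count as a hit."""
    score = iou(pred_box, true_box)
    return [t for t in COCO_IOU_THRESHOLDS if score >= t]
```

A prediction shifted one unit sideways from a 10 by 10 ground-truth box has an IoU of about 0.82, so it counts as correct at seven of the ten thresholds but not at 0.85 and above. Averaging over strict thresholds is what makes COCO AP reward precise localization rather than rough hits.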
KITTI is another benchmark suite, popular in mobile robotics and autonomous driving research. Built from nearly 130,000 images, it covers tasks such as stereo vision, optical flow, visual odometry and object detection, providing a common yardstick for validating algorithms.
As widely recognized reference benchmarks, COCO and KITTI can be used to confirm that models are efficient and accurate before they are scaled across a deployment. Validating purpose-built solutions against COCO and KITTI helps ensure that a video analytics solution can be scaled for various applications. These testing standards are also being applied to validate new Wave 2 video analytics that employ synthetic, high-quality training data.
These Wave 2 video analytics can be used in new ways to deploy accurate, efficient and scalable AI algorithms for specific analytics applications. Their providers report that they consistently outperform open-source models such as YOLO and SSD, delivering faster, more accurate and more scalable video analytics for specific security and business intelligence applications.
How do you measure video analytics performance? Performance comes down to accuracy: how many individuals and/or events were correctly detected and identified over a specified period. This applies to both new and recurring events; a blue golf cart carrying two golfers should be recognized as that same blue golf cart with two golfers every time it appears.
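In practice, "correctly detected" is usually split into two numbers: precision (of the alerts raised, how many were real events) and recall (of the real events, how many were caught), often combined as their harmonic mean, the F1 score. The sketch below computes all three from tallies collected over an evaluation period; the class name and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectionTally:
    true_positives: int   # events correctly detected and identified
    false_positives: int  # alerts raised for events that did not happen
    false_negatives: int  # real events the system missed

    def precision(self) -> float:
        flagged = self.true_positives + self.false_positives
        return self.true_positives / flagged if flagged else 0.0

    def recall(self) -> float:
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 0.0

    def f1(self) -> float:
        # Harmonic mean: low if either precision or recall is low.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    tally = DetectionTally(true_positives=90, false_positives=10, false_negatives=30)
    print(tally.precision(), tally.recall(), tally.f1())
```

A tally of 90 correct detections, 10 false alarms and 30 misses gives a precision of 0.9 and a recall of 0.75: the system rarely raises a false alarm but still misses one real event in four. Asking a provider for both numbers, rather than a single "accuracy" figure, exposes that trade-off.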
Once a specific object has been detected, the video analytics machine learning architecture should be able to provide more detail about what is going on in the scene. This includes extracting fine-grained attributes such as an individual’s gender or a vehicle’s type and color, as well as tracking specific individuals and/or objects within a scene and across multiple scenes over time. This allows the creation of advanced knowledge graphs that correlate people with objects across space and time, providing a new level of insight and event analysis.
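A very small version of such a graph can be built by linking people and objects that appear on the same camera at nearly the same time. This sketch is an illustrative assumption, not how any particular product builds its knowledge graph: the Sighting record, the track IDs and the five-second window are hypothetical, and the pairwise comparison is quadratic, which is fine for a demonstration but not for a large site.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Sighting(NamedTuple):
    track_id: str     # hypothetical IDs such as "person-7" or "cart-2"
    kind: str         # "person" or "object"
    camera: str       # camera that produced the sighting
    timestamp: float  # seconds since a common reference


def build_cooccurrence_graph(
    sightings: List[Sighting], window_s: float = 5.0
) -> Dict[str, Set[str]]:
    """Link each person to every object seen on the same camera within window_s seconds."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    people = [s for s in sightings if s.kind == "person"]
    objects = [s for s in sightings if s.kind == "object"]
    for p in people:
        for o in objects:
            same_place = p.camera == o.camera
            same_time = abs(p.timestamp - o.timestamp) <= window_s
            if same_place and same_time:
                # Undirected edge: record it in both directions.
                graph[p.track_id].add(o.track_id)
                graph[o.track_id].add(p.track_id)
    return dict(graph)


if __name__ == "__main__":
    demo = [
        Sighting("person-7", "person", "lobby", 10.0),
        Sighting("cart-2", "object", "lobby", 12.0),
        Sighting("cart-9", "object", "lobby", 40.0),
        Sighting("cart-3", "object", "dock", 11.0),
    ]
    print(build_cooccurrence_graph(demo))
```

In the demo, person-7 is linked to cart-2 (same lobby camera, two seconds apart) but not to cart-9 (30 seconds later) or cart-3 (a different camera). Cross-camera links would require first matching signatures between cameras.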
New Wave 2 video analytics associate a unique digital signature with each detected object, using a deep learning model trained to remain robust to changes in illumination, camera angle, field of view, resolution, body position and pose, weather conditions and so on. As a result, when the same object, person or face is captured by two different cameras, the system can correlate the two signatures even though the raw images differ. This lets the video analytics solution analyze new samples without adding extra computational time, giving Wave 2 analytics a smarter approach to training data that is faster and more accurate for professional security and business intelligence applications.
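How two signatures are compared is not specified here. A common approach in re-identification is to treat each signature as an embedding vector and score pairs by cosine similarity, accepting a match above some threshold. The sketch below assumes exactly that; the three-number vectors are made-up stand-ins for real embeddings (which usually have hundreds of dimensions), and the IDs and the 0.8 threshold are hypothetical values for illustration.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_across_cameras(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the gallery ID most similar to query, or None if nothing clears threshold."""
    best_id, best_score = None, threshold
    for obj_id, signature in gallery.items():
        score = cosine_similarity(query, signature)
        if score >= best_score:
            best_id, best_score = obj_id, score
    return best_id
```

For example, a camera-2 signature of [0.85, 0.15, 0.45] matched against a gallery holding "cam1-person-7": [0.9, 0.1, 0.4] and "cam1-cart-2": [0.1, 0.95, 0.05] returns "cam1-person-7" (similarity about 0.996), while a signature unlike both returns None. Raising the threshold trades missed matches for fewer false ones.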
Although the science of AI-driven video analytics has been around for many years, it continues to develop and mature rapidly for real-world applications, creating high demand and interest, and plenty of confusion. The three relatively simple questions raised here demand somewhat complex answers, but they set the stage for evaluating and comparing different solutions. The video analytics provider that takes the time to address these issues with documentation and proof-of-performance examples is the one you should trust to deliver the best return on your security investment.