Security and Surveillance Cameras are Uniquely Positioned to be Enhanced with More Intelligence

Capturing depth in addition to the usual two dimensions is already a must-have feature for systems in which depth information, combined with computing power, enables functionality such as face-based unlocking, and in which additional AI can distinguish among people, animals, and vehicles as well as recognize familiar faces.

Depth Carries Critical Information
3D imaging is inspired by the most sophisticated imaging system we know: human vision. Our two eyes give us natural depth perception that helps us navigate our world.

Many of today's devices translate the 3D world into a 2D image using 2D image recognition-based computer vision. The limitation of 2D technology is that a flat image carries no information about how far apart objects in a scene actually are.

Many cameras use a passive infrared (PIR) motion sensor to wake the camera or alert on activity. A PIR sensor detects only thermal motion in the scene; changing the sensitivity setting changes the amount of thermal change needed to trigger an alarm, not how near the object must be to the camera. As a result, a truck on the street or a pet near the camera can cause false alarms. Too often, to remove these false alarms, we end up turning the sensitivity down to the point that the system no longer detects human presence.

Computer vision motion detection allows for greater analysis of the scene and for identification of the subject through advanced features such as person detection or facial recognition. This requires the full camera pipeline to run and process the scene to determine whether objects of interest are present. Each camera therefore consumes a non-trivial amount of processing, and can still be fooled by pictures of human subjects. A further issue with these features is their reliability, which depends heavily on the specifics of the algorithms used.
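
To make that compute trade-off concrete, here is a minimal frame-differencing motion detector in Python with OpenCV. It is purely illustrative, not a production pipeline: the camera index, pixel threshold, and trigger count are assumptions, and a real system would add background modeling plus a person-detection model on top.

```python
import cv2

# Minimal frame-differencing motion detector (illustrative only).
cap = cv2.VideoCapture(0)                 # camera index 0 is an assumption
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)   # per-pixel change vs. last frame
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(mask) > 5_000:    # arbitrary trigger threshold
        print("motion detected - run the person detector on this frame")
    prev_gray = gray

cap.release()
```

Even this trivial detector must decode and process every frame; adding person detection or face recognition multiplies that cost per camera.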

Using 3D imaging systems, one can add distance, or depth, information to each pixel of the RGB image and accurately measure the distances between objects across a complete scene in real time. When a device reads images in 3D, it knows not only the color and shape information from the flat image, but also the position and size of objects.

A 3D camera can detect how far the subject is from the camera, as well as what the subject is (a person, a pet, or a truck), with fewer false positives. Only once a verified human subject is close enough does the system proceed with biometric identification.
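
As a sketch of how such distance gating might work, the following Python snippet uses a per-pixel depth map aligned to the RGB frame to decide whether a detected subject is close enough to attempt biometric identification. The thresholds and the upstream person detector are assumptions for illustration, not values from any particular product.

```python
import numpy as np

# Hypothetical thresholds for illustration; real values depend on the camera.
MAX_ID_DISTANCE_M = 1.5    # only attempt face ID inside this range
MIN_PERSON_AREA_PX = 400   # ignore tiny detections

def should_run_biometric_id(depth_map_m: np.ndarray, person_bbox) -> bool:
    """Gate biometric identification on subject distance.

    depth_map_m : per-pixel depth in meters, aligned with the RGB frame
    person_bbox : (x, y, w, h) from an upstream person detector (assumed)
    """
    x, y, w, h = person_bbox
    if w * h < MIN_PERSON_AREA_PX:
        return False
    roi = depth_map_m[y:y + h, x:x + w]
    valid = roi[np.isfinite(roi) & (roi > 0)]   # drop holes in the depth map
    if valid.size == 0:
        return False
    subject_distance = np.median(valid)         # robust to background pixels
    return subject_distance <= MAX_ID_DISTANCE_M
```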

A 3D imaging solution can also boost anti-spoofing capabilities, particularly against attempts to thwart facial recognition systems using a photo, whether a hard copy or an image on a smartphone display.

Depth perception enables 3D camera applications in a variety of scenarios such as human presence, motion tracking, activity monitoring, smart alerts, physical access, device access, and gesture recognition.

How is 3D Imaging Implemented?
There are various active and passive 3D imaging solutions available, ranging from structured light (IR dot projectors and IR cameras) and stereoscopic camera systems to dedicated time-of-flight sensors.

Active systems use various methods for spatially or temporally modulating light, such as time-of-flight and structured light. Passive methods include stereo, depth from focus, and light field, where ambient or broad fixed illumination is trained on the object.

In both active and passive 3D imaging systems, reflected light from the object being illuminated is captured by a CMOS-based camera to generate a depth map of the structure, and then a 3D model of the object.

Standard Passive Stereo Vision
This approach relies on ambient light and uses two cameras located a fixed distance apart to capture a scene from two different positions. Using triangulation, depth information is extracted from the left-right image disparity by matching features in both images. The range over which a stereo system can accurately detect depth depends on the sensor, the lens, and the distance between the two cameras: the closer the cameras are to each other, the nearer the usable depth range.
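
The underlying relationship is the classic pinhole triangulation formula Z = fB/d, where f is the focal length in pixels, B the baseline between the cameras, and d the matched disparity. A minimal Python sketch, with illustrative values:

```python
import numpy as np

def stereo_depth_m(disparity_px, focal_length_px: float,
                   baseline_m: float) -> np.ndarray:
    """Classic pinhole triangulation: Z = f * B / d.

    disparity_px    : left-right pixel disparity per matched feature
    focal_length_px : lens focal length expressed in pixels
    baseline_m      : distance between the two camera centers in meters
    """
    d = np.asarray(disparity_px, dtype=float)
    with np.errstate(divide="ignore"):
        return np.where(d > 0, focal_length_px * baseline_m / d, np.inf)

# Example (illustrative numbers): f = 700 px, 6 cm baseline,
# 10 px disparity -> Z = 700 * 0.06 / 10 = 4.2 m
print(stereo_depth_m([10.0], 700.0, 0.06))  # [4.2]
```

The formula also makes the calibration sensitivity below obvious: any unnoticed change in B directly scales every computed depth.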

Any change in the distance between the two cameras after deployment, whether from thermal expansion or the unit being dropped or struck, will degrade depth accuracy. Passive stereo also finds depth only at points both cameras can ‘see’ in the scene. Where regions of the scene are homogeneous in appearance (e.g., flat single-color walls), there are no distinctive points for the matcher to converge on, so there are regions of the scene where no depth information can be determined.

Active stereo. To provide depth in the regions where passive stereo yields none, a patterned light source can be added. In addition to the stereo cameras, a projector casts a pattern of light onto the scene. The pattern reflects off the parts of the scene that are homogeneous in appearance, essentially adding artificial texture that the stereo system can then match.

Structured light. In addition to the camera itself, a structured light system adds a patterned light source/projector that illuminates the scene with a specific pattern. The pattern is distorted by the surface of the object, and from this distortion a depth map of the object can be calculated through triangulation. Like stereo, the solution relies on knowing the exact distance between the camera and, in this case, the light source rather than a second camera.

Typically, this is not the same camera used for the 2D color image, because the pattern of light would be visible in it. Instead, a separate monochrome sensor is coupled with a patterned light source operating in the near IR, beyond the wavelengths a person can see.

Because these systems rely on capturing light reflected off objects, they tend to struggle in sunlight, when parts of the scene are so brightly lit that the system cannot distinguish where the pattern falls from where there is no patterned light.

Since structured light depends on the distance between the camera and the light source, its accuracy will be affected if this distance changes. Its ranging limit also depends on how bright the source illumination is and on the reflectivity of the objects it impinges upon.

Time of Flight (ToF) is a technique that calculates the distance between the camera and the object by measuring the time it takes projected infrared light to travel from the camera, bounce off the object's surface, and return to the sensor. Because the speed of light is constant, the processor can calculate the object's distance, and reconstruct the scene, by analyzing the phase shift between the emitted and returned light.
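
For indirect (phase-based) ToF, the relationship is d = c·Δφ / (4π·f_mod), with an unambiguous range of c / (2·f_mod). A minimal Python sketch with an assumed 20 MHz modulation frequency:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def itof_distance_m(phase_shift_rad: float, mod_freq_hz: float) -> float:
    """Indirect ToF: distance from the phase shift of modulated light.

    d = c * delta_phi / (4 * pi * f_mod),
    unambiguous out to c / (2 * f_mod).
    """
    return C * phase_shift_rad / (4.0 * math.pi * mod_freq_hz)

# Example: 20 MHz modulation (assumed), pi/2 phase shift -> ~1.87 m
print(itof_distance_m(math.pi / 2, 20e6))       # ~1.874 m
print(C / (2 * 20e6), "m unambiguous range")    # ~7.495 m
```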

Typically, the light is a single pulse that illuminates the entire scene at the same time. As this system is also generating light, it can be subject to the same scene lighting issues described for structured light.

Additionally, the ToF sensor pixel typically has a very different architecture from a conventional color sensor pixel, so a special sensor is required just to detect depth. ToF systems also share the ranging limitations, based on the intensity of the generated light, described for structured light.

Alternatively, a direct ToF system can use a time-to-digital converter to analyze a histogram of the returned light signal from objects in the scene, as is the case for many of the proximity sensors used in smartphones and gesture recognition devices.
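
As a sketch of the direct ToF principle, the snippet below picks the peak bin of a synthetic photon-arrival histogram and converts the round-trip time to distance via d = c·t/2; the bin width and histogram values are illustrative assumptions.

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def dtof_distance_m(histogram: np.ndarray, bin_width_s: float) -> float:
    """Direct ToF: locate the peak time bin of the arrival histogram
    produced by the time-to-digital converter, then d = c * t / 2
    (the pulse travels out and back)."""
    peak_bin = int(np.argmax(histogram))
    round_trip_s = (peak_bin + 0.5) * bin_width_s  # use the bin center
    return C * round_trip_s / 2.0

# Example: a single return peaking in bin 13 with 1 ns bins -> ~2.0 m
hist = np.zeros(64)
hist[13] = 120                       # synthetic single-return histogram
print(dtof_distance_m(hist, 1e-9))   # ~2.02 m
```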

LIDAR. A problem with the ToF approach is that the sensor receives more than just the intended reflections from the light pulse; it also receives off-angle and multiple reflections. LIDAR addresses this by not illuminating the whole scene at once, instead sending a pulsed beam to one specific point in the scene at a time, so only a single return pulse is captured at any moment. Because LIDAR concentrates its energy in a single beam, its depth range can be much greater than that of ToF or structured light.

It's important to note that active solutions using structured light and time of flight measure depth but do not necessarily produce conventional 2D images. Conversely, because passive solutions such as stereo and depth-from-defocus depend on light-intensity data, they natively record only 2D images and cannot directly measure depth.

Additionally, the solutions that generate light produce much lower depth resolutions, due either to the finite number of dots a structured light system can generate or to the size of the specialty pixels ToF and LIDAR systems use. For most solutions, the depth resolution of these active-illumination approaches is 1 megapixel or less.

All these implementations use complex hardware that can constrain the industrial design choices of a security camera and significantly increase its bill of materials cost. They involve multiple RGB or IR cameras or specialty sensors, and they require a relatively high level of computation.

A new and simpler approach was recently introduced by AIRY3D: DepthIQ™, a passive single-sensor solution that uses diffraction to measure depth directly through an optical encoding mask applied to a conventional 2D CMOS sensor. Together with image depth processing software, DepthIQ™ converts the 2D color or monochrome sensor into a 3D sensor that generates both 2D images and depth maps which are inherently correlated, resulting in depth per pixel. DepthIQ™ promises to enable high-quality depth sensing at much lower cost, with very lightweight processing, and without limitation on the resolution of the sensor.

When choosing between imaging systems for security camera applications, several factors must be considered.

Depth performance and anti-spoofing capabilities. Even though biometric technology increases the security of the systems that use it, those systems are prone to spoofing attacks in which fraudulent biometric samples are presented. Using 3D cameras is one major anti-spoofing approach: the system calculates the depth of the scanned face to determine its authenticity.
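
One simple way such a depth-based liveness check could be implemented is a planarity test: a printed photo or phone screen is nearly flat, while a real face has centimeter-scale relief. The sketch below fits a plane to the face-region depth samples and flags near-planar surfaces as spoofs. The 5 mm flatness threshold is an assumption for illustration, and this is not any vendor's actual algorithm.

```python
import numpy as np

FLATNESS_THRESHOLD_M = 0.005  # assumed: 5 mm RMS residual from best-fit plane

def looks_like_real_face(face_depth_m: np.ndarray) -> bool:
    """Fit a plane z = ax + by + c to the face-region depth samples and
    test whether the residual relief exceeds the flatness threshold."""
    h, w = face_depth_m.shape
    ys, xs = np.mgrid[0:h, 0:w]
    valid = np.isfinite(face_depth_m) & (face_depth_m > 0)
    A = np.column_stack([xs[valid], ys[valid], np.ones(valid.sum())])
    z = face_depth_m[valid]
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)  # least-squares plane fit
    rms_residual = np.sqrt(np.mean((z - A @ coeffs) ** 2))
    return rms_residual > FLATNESS_THRESHOLD_M      # flat => likely a spoof
```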

Tests assessing the use of depth information in face-authentication techniques, and evaluating their ability to distinguish fake biometrics, have previously been performed in the industry. The best results were obtained by 3D solutions capable of distinguishing a subject's real face from a flat picture of the face, the face displayed on a screen, or a video of the subject, regardless of the subject's head orientation, facial movement, and distance from the camera.

The depth range of the 3D camera is also an important factor, particularly where occlusion zones (“blind spots”) limit the ability to collect 3D range information. With stereo solutions one can set the range of depth detection (near and far), but all stereoscopic imaging systems and active-illumination systems have inherent dead zones with little or no 3D information. In particular, some objects in the foreground may be visible to only one of the two cameras.

Structured light and ToF implementations are also affected by geometrical occlusion zones in the scene, where one object blocks the light from reaching a second object. Note, however, that the baseline between stereo cameras is effectively replaced in a ToF system by the separation between the ToF light source and the ToF receiver, so an inherent geometrical limitation remains for objects in the foreground. The difference is that in a ToF system, foreground depth accuracy is limited by the response time of the return-signal phase detection (or of the time-to-digital conversion in direct ToF systems).

In contrast to the difficulty both of the above two-device geometries (stereo, and structured light or ToF) have in accurately detecting nearby objects, DepthIQ uses a single CMOS image sensor and has no built-in occlusion zones. The geometric design of the monolithically integrated encoding/diffraction mask is optimized to provide the greatest possible depth sensitivity, accounting for both the numerical aperture of the objective lens and the chief ray angle of the microlens array patterned on the image sensor's pixel array.

Sunlight interference. As mentioned, light sources such as sunlight or reflections off shiny objects can saturate cameras. This limitation is especially problematic for systems that rely on structured light or time of flight for 3D image capture. The main mitigation strategy for these systems relies on narrow bandpass optical filters that limit the spectral range of the image to a wavelength region near 940 nm or 850 nm, chosen to be compatible with the artificial light source, typically a vertical-cavity surface-emitting laser (VCSEL). This adds cost and, while it improves performance in sunlight, the system still struggles when the light is very bright. Conversely, low light levels can produce noisy images that confound the depth-computation algorithms.

The same limitations do not apply to DepthIQ and stereoscopic techniques, since these techniques fundamentally rely on some form of A-to-B image disparity (that is, left-right or up-down angular separation). Generally speaking, stereoscopic and DepthIQ solutions perform well in sunlit scenes, unless large areas of the image are badly overexposed.

Computation, power consumption, footprint and cost. Very low power consumption in a compact, low profile form factor is essential in embedded computing applications.

Passive sensing mechanisms use less power than active depth-sensing camera systems, and active solutions require a high level of computation, which taxes the image signal processor. In contrast, DepthIQ provides perfectly aligned 2D and 3D data, and most of the computational burden is eliminated because the physics of light diffraction contributes the disparity information on a pixel-by-pixel basis. This makes extracting depth from the raw 2D image somewhat analogous to demosaicing a Bayer-pattern color filter array.

The number of components used weighs heavily on the footprint of the solution and the associated cost. Traditional 3D solutions require multiple components: either two image sensors in a stereo camera, or a single sensor with a structured light projector, or a time-of-flight detector accompanied by one or two light sources. In comparison, DepthIQ uses a single existing 2D CMOS image sensor to generate both the 2D image and the 3D depth map.

With the increasing use of smart cameras for security and surveillance in residential and commercial facilities and public spaces, no single solution will perfectly fit all applications. Some applications may require combining two or more 3D solutions for best results.

For example, a stereo solution could be used in a long-range application such as airport security in conjunction with DepthIQ. In this hybrid scenario, DepthIQ-enabled sensors enhance the RGB sensors: improving near depth, extending far depth, filling in occlusion zones, simplifying computation, and monitoring drift. This opens up cost-effective, low-compute extended 3D capabilities in physical security applications wherever conventional CMOS image sensors are in use today.
