Though video analytics initially earned a poor reputation in the physical security market because they failed to deliver on the promises touted by some vendors, advances in machine learning stand poised to revolutionize their use in the industry moving forward.

AI begins to infiltrate advanced video analytics

April 7, 2021
Artificial intelligence opens up new applications, but its evolution is not complete.

It has been more than 20 years since ObjectVideo, one of the first widely known video analytic companies, was founded. The company was formed in 1998 by a group of scientists who came out of DARPA with the goal of bringing video analytic technology into the commercial physical security market.

During this time, another market transformation was under way: the move from analog to digital video for Closed Circuit Television (CCTV) systems. The adoption of digital video was met with mixed reactions in the beginning. Many saw digital video as progress moving the industry forward, while others were not comfortable with the new technology and were slow to adopt it. A main factor was the transition from an analog camera signal transmitted over coaxial cable and recorded on a VCR to a DVR (Digital Video Recorder), which received the analog signal over the same coaxial cable, converted it into digital video, and saved it as files on a hard drive.

The Evolution of Digital CCTV Technology

In the beginning of this transformation, many in the industry did not have a deep understanding of digital video or of how to design and manage solutions properly while also ensuring the integrity of the video data. This introduced a new skill set that the physical security industry had to adopt: understanding computers, networking and digital video. The demand opened the door to several new companies with a background in this arena. For example, Axis Communications, which started out as an IT company, transformed itself to deliver the first IP camera to this market in 1996. It would, however, be several more years before IP cameras could stream video well enough to start replacing analog cameras.

The transition from analog to digital video was a key component in the rise of video analytics and how it would mature over the next several years. Once video was converted to digital, the images were stored as pixels on a hard drive and could be analyzed frame by frame by a computer, unlike analog video, which was stored as a waveform on tape and required a VCR for recording and playback.

Before video analytics was introduced into the physical security market, most digital video companies were using various forms of motion detection based on pixel changes. Intellex DVMS (Digital Video Management System), introduced in 1997 by American Dynamics, used motion detection and light changes to trigger recording, touting the feature as a way to save hard drive space, which was a significant expense at the time. Soon after video motion detection was introduced, several companies emerged with more advanced techniques for analyzing digital video.
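
As a rough illustration, pixel-based motion detection can be as simple as differencing consecutive frames and counting how many pixels changed. The sketch below assumes OpenCV and uses made-up threshold values; it is not any vendor's actual algorithm.

```python
# Minimal sketch of pixel-based motion detection by frame differencing
# (illustrative only; the threshold and pixel-count values are assumptions,
# not any product's actual parameters). Requires OpenCV (cv2).
import cv2

cap = cv2.VideoCapture("camera.mp4")   # hypothetical recorded camera feed
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

MOTION_PIXELS = 5000  # minimum number of changed pixels before we call it "motion"

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Absolute per-pixel difference between consecutive frames.
    diff = cv2.absdiff(prev_gray, gray)
    # Pixels that changed by more than a small amount count as movement.
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(mask) > MOTION_PIXELS:
        print("motion detected -- start recording this clip")
    prev_gray = gray

cap.release()
```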

The revolution of video analytics was born with features such as perimeter detection, loitering, direction of flow, and object left behind or removed. These features were targeted at the security market and were seen as necessary tools to help monitor the vast number of video cameras being installed in response to the conflicts going on around the world. The hope was that the technology would alert security personnel in real time to events, helping to mitigate or minimize threats to the public and critical infrastructure.

Video Analytics Arrives

Post 9/11, video analytic companies were being asked to do more with the technology than it was capable of delivering. The demand for video analytics was there, but the underlying technology was not mature enough. Several factors hindered video analytics during this period: the resolution of the digital video, the compute power required to classify objects, and the effort needed to tune the analytics were some of the main drawbacks. Unlike analog video, which is measured by the number of horizontal lines in the image, digital video is measured by resolution, the number of pixels in each frame.

Less than 20 years ago, the resolution of an IP camera or a video encoder (a device used to convert an analog camera's signal into a digital video stream) was CIF (352 x 240 pixels) or 4CIF (704 x 480 pixels). That is approximately 0.1 and 0.3 megapixels (MP) respectively, which is how the industry typically categorizes cameras today. Many in the industry considered digital video at these resolutions inferior to analog. Today, IP camera resolution is measured in megapixels, with the low end starting at 1MP and ranging to in excess of 20MP.

As video analytics (VA) evolved, VA companies used several methods to classify objects, and classification was mainly limited to people, vehicles and "other" at the time. The more sophisticated solutions relied on CNNs (Convolutional Neural Networks), in essence a method of translating video into a form a computer can interpret so it can recognize objects, a field known as computer vision.

Simply stated, a CNN translates the pixels of a video frame into an array of numbers and applies learned filters to those values to determine whether it can recognize an object and place it into a class the network was previously trained on. This approach requires a continuous, significant volume of computation, so the server hardware for video analytics had to be designed to handle that load.
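
A minimal sketch of what that classification step looks like, assuming a PyTorch-style pipeline with a generic pretrained network; the model, preprocessing values and file name below are illustrative, not the approach any particular vendor used.

```python
# Minimal sketch of CNN-based frame classification (illustrative only).
# Assumes PyTorch, torchvision and Pillow are installed.
import torch
from torchvision import models, transforms
from PIL import Image

# A generic pretrained CNN stands in for a network trained on
# security-relevant classes such as "person" or "vehicle".
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Convert the frame's pixels into a normalized tensor of numbers the
# network can process -- the "array of numbers" described above.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

frame = Image.open("frame.jpg")          # a single decoded video frame
batch = preprocess(frame).unsqueeze(0)   # shape: (1, 3, 224, 224)

with torch.no_grad():
    logits = model(batch)                # learned convolutional filters applied
    class_id = logits.argmax(dim=1).item()

print(f"Predicted class index: {class_id}")
```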

The final drawback was the tuning of the video analytics, which could take several hours for a single video feed, even by an experienced engineer, when deployed in a complicated environment. The more sophisticated the video analytic solution, the more capable it was of delivering success, but this came at a cost. The time it took to tune, the knowledge of video analytics it required, and the server cost made the solutions too expensive for most security budgets. Much of the focus for VA companies at this point was on municipalities, critical infrastructure, and transportation agencies, largely because of their ability to tap into federal grant funding that was put in place to help secure the public.

Video Analytics Redefines Itself

I entered the video analytic space in 2006 after working in the video management and IT industry for several years. I saw the necessity and the potential of the technology and wanted to be on the cutting edge of new possibilities in computer vision. I did my research before moving, watching different companies as they evolved, and when I was introduced to Vidient, they demonstrated their technology in a way that made me confident enough to take the leap.

It was a tough time for video analytics, and, as I quickly found out, the technology in general was failing to satisfy customers. VA companies could demo the technology successfully, but delivering in a live environment was a challenge. System integrators and end users would become frustrated with the amount of time and tuning it took to deliver the solution. Even once solutions were commissioned, the number of false alerts and the constant re-tuning led to many projects being abandoned.

Being a sales engineer was no easy task, constantly having to prove the differentiators of the technology you represented and convince the potential customer and the integrator that your solution could deliver. Vidient was able to win and deliver several medium to large projects, including a tunnel intrusion solution for the Montreal Metro. However, the company lost its funding in 2010, finding no easy path forward given the cost of implementing video analytics and what the market was willing to bear.

During this time the video analytic industry was trying to redefine itself and began to fracture, splitting in several directions. One philosophy was to put video analytics on the edge by embedding them in IP cameras or encoders to bring down the cost and simplify deployment. This was a simple way to add video analytics because the camera was being deployed anyway, and the cost was significantly lower than a server-based solution. Edge analytics had a place in the market, but the devices only had the processing power for simple analytics, and if an analytic did not work well it could simply be disabled. Other companies looked toward new markets such as retail, tailoring solutions to fit.

For example, heat maps were used for market research on product displays and to understand traffic flow. Other offerings included people counting, slip-and-fall detection, and theft detection in retail and warehouse environments. This shift also demonstrated that the technology could provide a return on investment (ROI), where previously it was thought of only as a cost to the organization that lived in the security budget.
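
As a rough illustration of the heat-map idea, person detections can simply be accumulated onto a coarse grid laid over the camera's field of view; the grid size, detection format and values below are assumptions for the sketch, not any retail product's actual design.

```python
# Minimal sketch of a retail heat map built from person detections
# (illustrative only; the detection source and grid size are assumptions).
import numpy as np

GRID_H, GRID_W = 48, 64          # coarse grid over the camera's field of view
heatmap = np.zeros((GRID_H, GRID_W), dtype=np.float64)

def accumulate(detections, frame_h, frame_w):
    """Add one frame's person detections, given as (x, y, w, h) pixel boxes."""
    for (x, y, w, h) in detections:
        # Use the bottom-center of the box as the person's floor position.
        cx, cy = x + w / 2.0, y + h
        gx = min(int(cx / frame_w * GRID_W), GRID_W - 1)
        gy = min(int(cy / frame_h * GRID_H), GRID_H - 1)
        heatmap[gy, gx] += 1.0

# Example: two detections in a single 1920x1080 frame.
accumulate([(100, 200, 60, 180), (900, 500, 70, 200)], 1080, 1920)
print(heatmap.sum())  # total person-observations accumulated so far
```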

New Solutions Emerge

A new company emerged in 2008 claiming to be the first video analytic company to use Artificial Intelligence (AI) as the basis for its video analytic platform. The claim was that the technology was autonomous, using AI to self-learn each camera's field of view and alert on anomalies. Many in the industry were skeptical of these claims and challenged the validity of the analytics. Others embraced the possibilities, hoping this solution was the future not only of video analytics but of AI in general.

The technology was marketed as behavioral recognition, hence the company's name, BRS Labs. The main selling point was that it could be deployed in environments with a large number of cameras in a matter of days and that, after a self-learning period with no human intervention, the analytics would alert on anomalies. Although the technology did what its description said, there was no method in the AI to distinguish which events would be relevant to a human monitoring the system.

Behavioral analytics still struggle to gain significant market share after several large deployments of the technology failed to perform to end users' expectations. A few more companies making similar claims remain on the market today, and some claim a hybrid approach utilizing AI. Ultimately, I still see a need to pursue the advancement of artificial intelligence in video analytics, but I do not foresee it being ready for prime time in the near future.

GPU vs. CPU

Something that plagued the video analytic industry until recently was the computing power needed to process video streams efficiently and to apply the advanced computer vision techniques being developed. When I started my career in video analytics in 2006, a typical server with dual Xeon CPUs was capable of processing about 5 to 10 video streams at CIF resolution (352 x 240) and 10 frames per second. All the processing of the video streams was done on the CPUs and at very low resolution.

This was a major bottleneck in developing a robust, cost-effective platform. In late 2014, I was introduced to a team at NVIDIA that was looking for future markets for its technology. Until this point, NVIDIA was known for its video cards and high-end graphics/gaming cards, which use Graphics Processing Units (GPUs) to process video. The vision was to have GPUs compete against CPUs for certain types of processing they could perform much more efficiently, sometimes by a factor of 20X or more.

Today, GPUs are being used for many purposes, including machine learning and artificial intelligence applications, cryptocurrency mining, and supercomputers. GPUs played a key role in the advancement of video analytics, giving developers the ability to process higher resolution video streams, today approaching 4K for standard use. A 4K video stream has a resolution of 4096 x 2160 pixels, roughly 100 times the pixel count of CIF, the typical resolution processed by video analytic software in the early years.

Deploying multiple GPUs in a server and utilizing NVIDIA's CUDA technology (a parallel computing platform and application programming interface) gave developers the ability to harness enough compute power to move the technology into a new generation of video analytics and to accelerate a deeper level of machine learning. Developers now have the power to track hundreds of objects in a scene and build training models that can classify hundreds of different types of objects, as opposed to just a handful a few years back.
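
As a rough sketch of why this matters, the same kind of CNN inference shown earlier can be batched and moved onto a GPU with a few lines of PyTorch; the model, batch size and any resulting speedup are assumptions here and depend entirely on the hardware.

```python
# Minimal sketch of moving batched CNN inference from CPU to GPU
# (illustrative only; model and batch size are assumptions).
import torch
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).to(device).eval()

# A batch of 32 preprocessed frames, e.g. one frame from each of 32 cameras.
frames = torch.randn(32, 3, 224, 224, device=device)

with torch.no_grad():
    logits = model(frames)          # the convolutions run in parallel on the GPU
    classes = logits.argmax(dim=1)  # one predicted class per frame

print(classes.shape)  # torch.Size([32])
```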

New Applications Develop From Advanced Technology

This is now opening the door to new uses and applications of video analytics. Video analytics are being deployed as part of smart city solutions to help cities understand traffic flow and aid in parking, along with safety and security. Amazon is using computer vision and deep learning in an experimental test store in Seattle: as you shop and pick items from the shelf, they are added to your basket automatically, and when you leave the store you are automatically checked out and your account is charged.

Irvine Sensors Corporation (ISC) has been working with BART (Bay Area Rapid Transit) to help it quantify its fare evasion issue by utilizing ISC's people counting analytic on the fare gate arrays and service gates. The technology helped BART realize the scope of the issue, which is estimated at $26 million annually. ISC, in partnership with BART, will use the information collected by the video analytic as one of its business intelligence tools as it makes modifications to decrease fare evasion.
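
To illustrate the general people counting idea (this is not ISC's actual analytic), a common approach is to count tracked people as they cross a virtual line; the line position and the upstream detector/tracker are assumptions in the sketch below.

```python
# Minimal sketch of line-crossing people counting (illustrative only).
# Assumes an upstream detector/tracker supplies a (track_id, y) position
# per person per frame.
LINE_Y = 400            # virtual line across the gate area, in pixels
last_y = {}             # track_id -> previous vertical position
entries = exits = 0

def update(track_id, y):
    """Count a crossing when a tracked person moves across LINE_Y."""
    global entries, exits
    prev = last_y.get(track_id)
    if prev is not None:
        if prev < LINE_Y <= y:      # moved downward across the line
            entries += 1
        elif prev >= LINE_Y > y:    # moved upward across the line
            exits += 1
    last_y[track_id] = y

# Example: person 7 approaches and crosses the line over three frames.
for y in (380, 395, 410):
    update(7, y)
print(entries, exits)   # -> 1 0
```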

Having been in the industry for the past 15-plus years, I am excited to see the recent progress that has been made with video analytics and machine learning, and hopeful that AI will make similar strides in the coming years to enhance video analytic capabilities.

About the Author:

Corey Young serves as the Product Manager – Cognitive Systems at Irvine Sensors, managing product development, third-party integrations, field deployments and sales engineering. His leadership roles within elite organizations have made him an essential part of Irvine Sensors, where he and his team drive innovation through engineering and technology, from concept to delivery.
