Why Computer Vision Needs Data Annotation

By Meher Dinesh Naroju, Director, AI Services

Computer vision is becoming an increasingly strategic technology in the corporate world. The global computer vision market is expected to hit $207.09 billion by 2030, registering a compound annual growth rate (CAGR) of 39.6 percent from 2021 to 2030.

Computer vision is a form of artificial intelligence (AI) that allows computers equipped with cameras to analyze visual data such as pictures and video and make intelligent decisions in real time.

It’s a powerful technology that can help businesses in many ways: retailers can monitor stores to keep them safe, airports can track crowds to manage congestion, and manufacturers can watch production lines to spot defects.

Amazon, for instance, uses computer vision to keep track of constantly changing inventory levels in its Amazon Go stores. This makes it possible for Amazon to manage all the moving parts required to support a friction-free shopping experience with no cashiers required.

But computer vision requires proper training data to be effective. For computer vision to be valuable, it has to know what to watch for and analyze. Without proper data preparation, computer vision is potentially harmful, because a model trained on bad data will report inaccurate information.

We recently blogged about the importance of training computer vision by collecting data properly. Now, let’s take a closer look at the importance of data annotation in training computer vision models.

The Role of Data Annotation

Data collection is about recording all the information a computer vision solution needs in order to solve a problem. For instance, a retailer that wants to use computer vision to fight theft needs to feed its computer vision application a lot of footage in order to train it to know what dishonest behavior looks like (and what it does not look like).

But data collection solves only part of the challenge. Computer vision also needs data annotation: the process of labeling data in its various formats, such as text, video, and images.

For instance, let’s say an airport wants to train a computer vision model to spot a knife hidden in carry-on baggage. The model needs to know what a knife looks like. And this means all the types of knives in the world, ranging from a carpenter’s knife to a Balisong. It also needs to know how to distinguish between a knife and other objects – for example, the difference between a carabiner and a fold-out knife.

Or, in a retail scenario, the model needs to understand the difference between someone taking an item off a shelf and slipping it into their purse, and someone retrieving their mobile phone from their jacket and putting it into their purse while they shop.

In both examples above, someone needs to feed a computer vision app images and footage – the raw data. But it is also necessary to label, or annotate, each piece of data (e.g., a picture of a Balisong needs to be labeled “Balisong”; footage of a human being removing something from the shelf and putting it in their purse needs to be labeled as such) so that computer vision can know what to look for and teach itself to get better.
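
To make this concrete, here is a minimal sketch of what one annotated image might look like in a COCO-style bounding-box format, a common convention for object detection labels. The file name, category names, and coordinates are purely illustrative:

```python
import json

# A minimal, hypothetical annotation record in a COCO-style format.
# The image path, category names, and box coordinates are illustrative only.
annotation = {
    "image": {"id": 1, "file_name": "bag_scan_0001.jpg", "width": 1280, "height": 720},
    "annotations": [
        {
            "id": 101,
            "image_id": 1,
            "category_id": 2,            # maps to "balisong" below
            "bbox": [412, 233, 96, 41],  # [x, y, width, height] in pixels
        }
    ],
    "categories": [
        {"id": 1, "name": "carabiner"},
        {"id": 2, "name": "balisong"},
    ],
}

# Annotations are typically stored as JSON so both tools and humans can read them.
print(json.dumps(annotation, indent=2))
```

Each labeled image like this becomes one training example; the model learns by seeing thousands of such examples, correctly and consistently labeled.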

To do all this, businesses need a team of humans to take raw data, label it, and feed it into a platform for processing. This can be a labor-intensive process, and only a few companies can afford to hire such large teams of annotators. This is why businesses are increasingly looking for help, ranging from software that makes the process easier to outsourcing the entire process to partners. The global data annotation tools market alone was valued at $629.5 million in 2021 and is anticipated to expand at a 26.6 percent CAGR from 2022 to 2030, driven mostly by increasing adoption of data annotation tools in the automotive, retail, and healthcare sectors. These tools enable users to enhance the value of data by labeling it or adding attribute tags to it.

How to Do Data Annotation Right

Data annotation is fraught with challenges such as bias hampering the labeling process. To do it right, businesses need to:

Be Mindful

One of the biggest issues with data annotation is that businesses allow bias to creep into the labeling of content. For instance, a retailer might want to train a computer vision model to identify an angry or frustrated customer in a store. This information could help store associates spot a customer service problem that needs intervention or, worse, someone experiencing a behavioral issue that could escalate if left unaddressed.

This model needs to know what an angry or frustrated customer looks like. But facial features can be deceiving. They need to be read in the context of local cultures, too, because how people express emotions is influenced by social norms.

Identifying a person experiencing anger, or any other emotion, may also play out very differently in a different context. Let’s look outside of retail. What if the manager of a public venue like a stadium wanted to spot potentially violent behavior in crowds at a sporting event or a music concert, or spot people having health issues in such crowded conditions, which can escape the notice of even well-trained security teams? In that kind of setting, people might be naturally venting strong emotions (as anyone who has attended a heavy metal concert or a European football match can attest) that have nothing to do with real anger.

How does a business guard against creating false signals (through inaccurate annotation) or labels that unfairly single out people based on the personal biases of the annotator?

The answer is to work with a large, diverse pool of annotators who can act as a check and balance on each other. Hiring a team of annotators that lacks diversity and knowledge of local cultures will result in a solution that lacks inclusivity, and therefore effectiveness.
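
One practical way to implement this check and balance is to have multiple annotators label each item independently and route disagreements to expert review rather than accepting any single annotator's judgment. Here is a minimal sketch of that idea; the clip names, labels, and three-annotator setup are hypothetical:

```python
from collections import Counter

# Hypothetical labels from three independent annotators for the same clips.
# Items where annotators disagree are routed to a reviewer instead of being
# accepted automatically, which limits the influence of any one person's bias.
labels_by_item = {
    "clip_001": ["theft", "theft", "theft"],
    "clip_002": ["theft", "normal", "normal"],
    "clip_003": ["normal", "theft", "theft"],
}

def consensus(labels, min_agreement=1.0):
    """Return the majority label if agreement meets the threshold, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None

for item, labels in labels_by_item.items():
    result = consensus(labels)
    if result is None:
        print(f"{item}: disagreement -> send to expert review ({labels})")
    else:
        print(f"{item}: accepted as '{result}'")
```

Requiring unanimous agreement, as above, is strict; in practice teams tune the agreement threshold and measure inter-annotator agreement over time to catch systematic bias.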

At Pactera EDGE, we rely on globally crowdsourced resources who possess in-market subject matter expertise, mastery of 200+ languages, and insight into local forms of expression. Our team expands as needed to complete specific assignments (such as this one).

To learn more about how we practice mindfulness, read this blog post.

Use Good Data

As we all know: garbage in, garbage out. It is critical that models are fed clean, correct data to be effective, and data annotation plays the most significant role in this process. Every label and annotation contributes to, and affects, what a model eventually learns. But where does a business source its data samples, the images and video content it needs to train a computer vision model? We blogged about this important topic in a recent post. For instance, we operate our own in-house lab in Seattle, a simulated retail environment where we can re-create specific behaviors and film them depending on the problem our clients are trying to solve. Few businesses have the resources to manage this kind of lab. We can help.

Use Technology to Scale

Another major challenge with data annotation is that it needs to be done on vast data volumes. Computer vision models require large volumes of training data to predict results within acceptable accuracy ranges. Imagine the myriad categories of store merchandise and behaviors that a retailer might need to label and record to manage against theft – the library of potential data points is mind-boggling! In addition, it is necessary to store, organize, and share these vast volumes of data across processes to get the data right for model consumption. This is where technology plays an important role. Our proprietary platform, OneForma, is an advanced annotation tool that has the capacity to handle such large volumes and also has built-in models to accelerate the annotation effort. By pairing OneForma with our crowdsourced team, we can confidently handle the requirements of any large-scale effort.
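
OneForma's internals are proprietary, but the general technique of using built-in models to accelerate annotation is often implemented as model-assisted pre-labeling: a pretrained detector proposes candidate labels, and human annotators verify or correct them rather than drawing every box from scratch. The sketch below illustrates the idea using torchvision's off-the-shelf Faster R-CNN; the score threshold and file name are assumptions:

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Model-assisted pre-labeling: a pretrained detector proposes candidate boxes,
# which human annotators then verify or correct. This is a generic sketch, not
# OneForma's actual implementation.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def pre_label(image_path, score_threshold=0.7):
    """Return candidate (label_id, box, score) tuples for human review."""
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        (prediction,) = model([image])  # list in, one prediction dict out
    candidates = []
    for label, box, score in zip(
        prediction["labels"], prediction["boxes"], prediction["scores"]
    ):
        if score >= score_threshold:  # keep only confident proposals
            candidates.append((int(label), box.tolist(), float(score)))
    return candidates

# Hypothetical usage: each proposal becomes a pre-filled annotation in the tool.
# candidates = pre_label("shelf_cam_frame_0042.jpg")
```

Even when the pretrained model's proposals are imperfect, correcting a pre-filled box is much faster than annotating from a blank frame, which is how tooling makes vast volumes tractable.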

An Example of Data Annotation

As discussed in this blog post, we helped a global brand improve its computer vision model to 97 percent accuracy. Our client sought to enhance its computer vision model so that it could better recognize live images of objects and the text on the objects. The client’s goal was to improve the user experience of its cloud-based image and video collection solution to help people easily navigate through thousands of stored pictures from the convenience of their mobile devices. The company approached Pactera EDGE asking to collect and curate a high volume of high-quality images. Pactera EDGE tapped into its global pool of hundreds of thousands of resources to collect live images and text in specified categories. Since quality was of high importance, Pactera EDGE:

  • Developed a customized collection tool to upload, store and classify these live images and a second customized tool to label the text and objects in the images. 
  • Trained a team of quality assurance and labeling experts to provide the highest quality deliverables. 

Pactera EDGE collected and curated images in 19 different categories, each with its own target volume and target specifications. We delivered thousands of high-quality live images at a 97 percent accuracy rate in 12 weeks, covering five continents.

This example is illustrative of how complicated just one aspect of data readiness – data curation – can be. 

Contact Pactera EDGE 

To learn more about how we can help you meet your business goals with computer vision, contact Pactera EDGE.