Lorenzo Gravina

We found working with the Ango team professional, progressive, and scalable. We appreciate how the Ango team proactively seeks feedback and holds themselves accountable. Their QA team is careful, quick, and responsive, and their high-quality work has led to impactful and timely results.

Saman Nouranian, Director of AI Research at WEIR Motion Metrics

WEIR Motion Metrics is a leading Canadian AI company dedicated to helping mining companies get the most out of their operations through computer vision, machine learning models, and AI.

WEIR Motion Metrics receives terabytes of data from its customers each week. Before meeting Ango AI, they annotated all of this data in-house. They were able to get high-quality labels, but slowly, since managing the labeling team became a massive overhead on their operations. The team could not be scaled quickly, and it became a large expense for them.

WEIR Motion Metrics contacted Ango AI in the summer of 2021, expressing the need to outsource their labeling tasks. We quickly learned how the data needed to be labeled and scaled our team to accommodate their needs. Since then, we have been annotating data for WEIR Motion Metrics, delivering higher speed, lower cost, and higher quality than the labeling contractors they had previously tried to work with. They also said our review and delivery processes are considerably faster than those of others. WEIR Motion Metrics has expressed the wish to continue working with Ango AI for the foreseeable future.

The Customer

WEIR Motion Metrics is a leading AI company based in Vancouver, Canada. They are a B2B business focused on helping mining companies get the most out of their mining operations with the use of machine learning, computer vision, and AI.

They install cameras on their customers’ mining equipment, then process the images and videos they get to train their machine learning models. They then use these models to provide their customers with advanced AI-powered insights on their mining operations, such as statistics, real-time alerts, and more.

Before Ango AI

Machine learning models need to be trained before being used. In other words, if we want a model to recognize cars, we first need to feed it thousands of pictures in which the car is highlighted (i.e., labeled).

In WEIR Motion Metrics’ case, since ML models are their main product, data labeling is at the core of what they do. Each week they receive terabytes of data from the cameras installed on their customers’ quarries, and a good chunk of this data needs to be labeled and fed to the model for training.

Before Ango AI, they annotated all of this data internally, using a data labeling team they had built themselves. This had a number of downsides, described below.

Lack of flexibility

Occasionally, WEIR Motion Metrics would need to label more data than usual. This variation in quantity meant that they would have to grow their team whenever there was more data, then scale it back when there was less. This is not easily feasible for a company that does not focus solely on data labeling.

Hiring new data labeling talent is expensive, both in time and resources. And even once the new hires are on the team, there will be times when there is less data to label. During those times, the new team members are still paid, yet have no tasks to complete.

In short, WEIR Motion Metrics was stuck between two negative situations: either staffing to label all data as it arrived, at the cost of wasted labor during quiet periods, or keeping the team fully utilized, without being able to label all data in time.

High overhead

WEIR Motion Metrics is, at its core, an AI company. AI is what they do best, and it is the core of their business. Having to create and then manage an internal data annotation team is an extra overhead, bringing unnecessary complexity and diverting the team from their main occupation.

Having an in-house labeling team means having to manage each individual, involving HR and other departments in the process. Often, for higher-security scenarios, this means finding office space and providing various benefits to the annotators. Extra personnel need to be hired simply to manage the team, on top of the team itself.

High complexity

Having a team in-house means having to reinvent the wheel each time. While labeling data in-house may appear to be the easiest solution up front, the complexity of managing the team grows as the team does.

For example, they had to develop their own data labeling software, since open-source, off-the-shelf solutions did not work for them. This was an entirely new product they now had to maintain, update, and manage.

Other complexities of doing data labeling in-house for WEIR Motion Metrics included making sure that every labeler was on the same page and received clear instructions, that their performance was accurately measured, and more.

High cost

It comes as no surprise that hiring a dedicated data labeling team, creating a new data labeling platform, hiring people to manage such a team, renting office space, and purchasing equipment all come at a high cost.

After Ango AI

WEIR Motion Metrics contacted Ango AI in the summer of 2021. Realizing the costs, complexity, and inflexibility of doing labeling in-house, they expressed the wish to outsource their labeling tasks to a third party.

We quickly understood their needs and stepped in to help. We set up channels of communication and met often at the beginning of the project to establish exactly how the project was going to move forward, what they expected from the annotations, and what we could provide.

As soon as we received the data from WEIR Motion Metrics we started annotating it, and this brought several advantages to our customer.

Flexibility and Speed

Now, whenever our customer needs to label more data than usual, they simply send it to us and we take care of everything. Thanks to our flexible labeling team and hiring pipeline, we are able to scale our workforce up and down extremely quickly to deal with spikes in customer data.

This also means that when WEIR Motion Metrics has less data to label, they only need to pay for the data that is actually being labeled instead of maintaining a team of annotators on standby. If they send us less data than usual, we simply reroute our workforce to another one of our projects, resulting in a win-win.

And when the customer needs more speed, they can simply tell us and we will dedicate more annotators to their task. WEIR Motion Metrics has explicitly stated that our review and delivery processes are faster than those of any other company they tried before.


The head of WEIR Motion Metrics’ labeling team has said that Ango AI managed to match and exceed the quality of the labels produced by labeling firms they had tried to work with before. There are several likely reasons for this, including the following:

Quality-Centric Labeling Platform

Our software team is highly experienced in crafting solutions for high-quality data labeling, and our software is packed with features that increase label quality: features we have added over time as we learned from our previous projects. WEIR Motion Metrics is, at its core, not a labeling software company. Thus, we were able to provide them with a high-quality platform which helped keep label quality high.

Highly Trained Annotators

Our annotators are carefully selected and highly trained. They can quickly adapt to new labeling tasks, and they immediately understood how WEIR Motion Metrics needed their data labeled. As a result, labeling quality quickly reached the required level.


Since choosing Ango AI, the customer no longer needs to maintain an internal data labeling team, saving significantly in both time and cost. While outsourcing does have a cost, we and other AI research companies estimate that it is about one fifth of the cost of doing data labeling in-house.


The overhead of performing data labeling in-house was completely eliminated. Instead of having to create an internal pipeline for labeling, all they do now is send us their data and get their labels back completely done. This way, they can focus on their main business proposition of creating world-leading machine learning models instead of dedicating time and resources to data labeling.


Image segmentation is one of the most widespread data labeling tasks, finding uses in hundreds of different ML applications. Panoptic segmentation is one type of image segmentation, and while it is one of the most time-intensive, it is arguably also one of the most powerful. In this article, we’ll dive deep into what panoptic segmentation is and how you can use it.

If you need a broader overview of image annotation, check out our complete guide to image annotation. For the medical field, you may check our guide to medical image annotation.

If you need to segment your data and are looking for a data annotation platform to do it, Ango Hub has all you need to start. If instead, you are looking to outsource segmenting your images, Ango Service is what you are looking for. But let’s get back to panoptic segmentation.

What is Image Segmentation?

Image segmentation is the process of labeling an image such that various parts of the image are classified up to the pixel level, making segmentation one of the most information-intensive ways of labeling an image.

Segmented images, along with their segmentation data, can be used to train extremely powerful ML/deep learning algorithms that can provide detailed information regarding what is contained in the image and where.

Image segmentation effectively classifies and localizes objects of interest within an image, making it the labeling task of choice when we need to train highly detailed detectors, and data and resources are available.

Before we delve into the details of the various forms of image segmentation, we need to understand two key concepts. Any image, when segmented, can contain two kinds of elements:

Things (Instance): Any countable object is referred to as a thing. As long as one can identify and separate the class into objects comprising it, it is a thing.

To exemplify – person, cat, car, key, and ball are called things.

Stuff (Semantic): An uncountable, amorphous region of similar texture is known as stuff. Stuff generally forms an indivisible, uncountable region within an image.

For instance, roads, water, sky, etc. would belong to the “stuff” category.

Types of Image Segmentation

Image Labeling Tasks from Detection to Panoptic Segmentation – from the COCO dataset

Knowing the two concepts mentioned above we can delve into image segmentation. There are three main categories:

Semantic segmentation refers to the task of exhaustively identifying different classes of objects in an image. All pixels of an image belong to a specific class (we automatically consider some unlabeled pixels as belonging to the background class).

Fundamentally, this means identifying stuff within an image.

Instance segmentation refers to the task of identifying and localizing different instances of each semantic category. In instance segmentation, each object gets its own identifier, even when several objects belong to the same category. It can thus be seen as an extension of semantic segmentation.

Instance segmentation thus identifies things in an image.

Panoptic segmentation combines the merits of both approaches: it semantically distinguishes different objects and also identifies separate instances of each kind of object in the input image. It provides a global view of image segmentation.

Essentially, the panoptic segmentation of an image contains data related to both the overarching classes and the instances of these classes for each pixel, thus identifying both stuff and things within an image.

Image Classification, Instance Segmentation, Semantic Segmentation, and Panoptic Segmentation on Ango Hub
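To make the relationship between the three tasks concrete, here is a minimal sketch on a toy 2×4 label map. The class names, the `THINGS` and `STUFF` sets, and the helper functions are all made up for illustration; a real pipeline would typically use image arrays rather than nested lists.

```python
# Toy panoptic label map: each pixel carries (class_name, instance_id).
# instance_id is None for stuff, which has no countable instances.
THINGS = {"person", "car"}  # countable classes (assumed for this example)
STUFF = {"sky", "road"}     # amorphous classes (assumed for this example)

panoptic = [
    [("sky", None), ("sky", None), ("person", 1), ("person", 2)],
    [("road", None), ("car", 3), ("person", 1), ("person", 2)],
]


def semantic_view(label_map):
    """Semantic segmentation: keep only the class of each pixel."""
    return [[cls for cls, _ in row] for row in label_map]


def instance_view(label_map):
    """Instance segmentation: keep instance IDs of things, ignore stuff."""
    return [
        [inst if cls in THINGS else None for cls, inst in row]
        for row in label_map
    ]


semantic = semantic_view(panoptic)
instances = instance_view(panoptic)

# Distinct thing instances present in the image.
thing_instances = {
    inst for row in panoptic for cls, inst in row if cls in THINGS
}

print(semantic)
print(instances)
print(sorted(thing_instances))
```

The panoptic map is the only one of the three from which both other views can be derived, which is why it is the most information-rich.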

The Panoptic Segmentation Format

So how exactly do we maintain both the semantic and instance categories of the same image? Kirillov et al. at Facebook AI Research and Heidelberg University solved this problem in a very intuitive manner. Panoptic segmentation has the following properties.

Two Labels per Pixel: Panoptic segmentation assigns two labels to each pixel of an image: a semantic label and an instance ID. Pixels with the same semantic label belong to the same semantic class, and instance IDs differentiate the instances of that class.

Annotation File Per Image: As every pixel is labeled and assigned its pixel values, it is often saved as a separate (by convention, png) file with the pixel values, rather than a set of polygons or RLE encoding.

Non-Overlapping: Unlike instance segmentation, each pixel in panoptic segmentation has a unique label corresponding to the instance which means there are no overlapping instances.
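The non-overlapping property can be illustrated with a small sketch that merges possibly overlapping instance masks into a single map with exactly one label per pixel. The rule used here (the higher-scoring instance wins a contested pixel) is just one simple heuristic chosen for illustration, and `merge_instances` and the toy masks are hypothetical.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)
Instance = Tuple[int, float, Set[Pixel]]  # (instance_id, score, pixels)


def merge_instances(height: int, width: int,
                    instances: List[Instance]) -> List[List[int]]:
    """Paint instances into one map so that no two instances overlap.

    Instances are painted in order of decreasing score, and a pixel that
    is already taken is never overwritten. 0 means "unassigned".
    """
    panoptic = [[0] * width for _ in range(height)]
    for inst_id, _score, pixels in sorted(
        instances, key=lambda item: item[1], reverse=True
    ):
        for row, col in pixels:
            if panoptic[row][col] == 0:
                panoptic[row][col] = inst_id
    return panoptic


# Two masks that overlap at pixel (1, 1).
masks = [
    (1, 0.9, {(0, 0), (0, 1), (1, 0), (1, 1)}),
    (2, 0.6, {(0, 2), (1, 1), (1, 2)}),
]
merged = merge_instances(2, 3, masks)
print(merged)
```

After merging, the contested pixel belongs to instance 1 only, so every pixel has a single, unambiguous instance label.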

Consider the image above and its resultant panoptic segmentation PNG file. The panoptic segmentation is stored as a PNG, with the same dimensions as the input image. This means that masks are not stored as polygons or in RLE format but rather as pixel values in a file. 

The image above is a 600×400 image, and the panoptic segmentation is likewise a 600×400 image. However, while the input image has pixel values in the range 0-255 (the grayscale range), the panoptic segmentation image has a very different range of values. Each pixel value in the resulting panoptic segmentation file encodes the class of that pixel and, for things, its instance.

How Annotations are Stored in the Panoptic Segmentation Format

Let’s dive into some Python to understand how exactly the labels are represented. The key question we want to address is: 

For any pixel value in the panoptic segmentation output, what is its corresponding class?

First, let’s check what classes we have:

We find out we have 133 classes in total, representing various categories of objects.

Now let’s go to the panoptic segmentation output. If we get the unique values of the pixels in the panoptic segmentation, we get the following result:

To get the instance and class ids for each of these pixel values here’s how we interpret them:

The instance IDs separate different instances of the same class by a unique identifier. Note that instance IDs are global: they do not restart for each semantic class. Instead, the instance ID is a running counter over all instances in the image. In the case above, since the highest instance ID is 5, we have 5 thing instances in total; the rest is stuff.

To get the indices of the classes, we need to decode these pixel values. Usually, the panoptic segmentation encoding is such that pixel value % offset (where % is the modulus operator) gives us the ID of the class.

With an offset of 1000, 2000 % 1000 = 5000 % 1000 = 0. Thus, pixel value 2000 belongs to the same class as pixel value 5000: both belong to class 0. Similarly, values 1038 and 3038 both belong to class 38.

Correlating our class IDs with the model classes, we see that 38 is the tennis_racket class and 0 is the person class, and similarly for the other classes. This answers our initial question of which pixel values correspond to which class in the panoptic segmentation label.
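The decoding described above can be sketched in a few lines of Python. The offset of 1000 is inferred from the arithmetic in the text, and reading the quotient as the instance ID is an assumption consistent with the highest instance ID being 5; the `decode` helper and the two-entry `CLASS_NAMES` table are illustrative only, not the full class list.

```python
OFFSET = 1000  # assumed from the worked examples in the text

# Only the two classes named in the text; the real list is much longer.
CLASS_NAMES = {0: "person", 38: "tennis_racket"}


def decode(pixel_value: int, offset: int = OFFSET):
    """Split an encoded pixel value into (class_id, instance_id)."""
    instance_id, class_id = divmod(pixel_value, offset)
    return class_id, instance_id


decoded = {value: decode(value) for value in (2000, 5000, 1038, 3038)}
for value, (class_id, instance_id) in decoded.items():
    print(value, CLASS_NAMES[class_id], instance_id)
```

Other tools may pack the two labels differently (for example, class first and instance second), so the offset and the order should always be checked against the format documentation of the dataset or model in use.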

Image from the first paper on panoptic segmentation

Frameworks for Panoptic Segmentation

Panoptic FPN

Architecture of Panoptic FPN Combining Instance and Semantic Segmentation.

Introduced by the pioneers of Panoptic segmentation, this deep learning framework aims to unify the tasks of instance and semantic segmentation at the architectural level, designing a single network for both tasks.

They start from Mask R-CNN, originally designed for instance segmentation, and add a semantic segmentation branch to it. Both branches share a Feature Pyramid Network (FPN) backbone for feature extraction. The FPN extracts features at multiple scales, so that objects appearing at different sizes can still be detected correctly.

Surprisingly, this simple baseline not only remains effective for instance segmentation but also yields a lightweight, well-performing method for semantic segmentation. Combining these two tasks the framework sets the foundation for Panoptic Segmentation architectures.


Mask2Former

Mask2Former Architecture

Presented in 2022, this work aims to tackle instance and semantic segmentation with a single framework, thereby also addressing panoptic segmentation, and it advances the state of the art for panoptic segmentation on various datasets.

The framework is called “Masked-attention Mask Transformer (Mask2Former),” and can address any image segmentation task (panoptic, instance, or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions.  

This framework also has two main components: a pixel decoder and a transformer decoder. The pixel decoder performs a task fairly similar to the FPN discussed above, i.e. it upsamples the extracted features into feature maps at several scales. The transformer decoder takes these multi-scale features from the pixel decoder and predicts the mask and class of each object.

Panoptic Segmentation Datasets

COCO Panoptic

Annotations from the COCO panoptic dataset

The panoptic task uses all the annotated COCO images and includes the 80 thing categories from the detection task and a subset of the 91 stuff categories from the stuff task. This dataset is great for general object detection, and you’ll often see it used in the panoptic literature to fine-tune networks.


ADE20K

Some Annotations from the ADE20K Dataset

The ADE20K semantic segmentation dataset contains more than 20K scene-centric images exhaustively annotated with pixel-level objects and object parts labels. There are a total of 150 semantic categories, including “stuff” like sky, road, grass, and discrete objects like person, car, and bed.


Mapillary

Some Annotations from the Mapillary Dataset

The Mapillary Dataset is a set of 25,000 high-resolution images. The images are annotated with 124 semantic object categories and 100 instance categories. The dataset contains images from all over the globe, covering 6 continents. The data is ideal for panoptic segmentation tasks in the autonomous vehicle industry.


Cityscapes

Annotations from the Cityscapes dataset

A dataset that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high-quality pixel-level annotations of 5,000 frames, in addition to a larger set of 20,000 weakly annotated frames.

It contains polygonal annotations that combine semantic and instance segmentation across 30 unique classes.

Panoptic Segmentation in Short

Panoptic segmentation is a highly effective method of segmenting images, covering both semantic and instance segmentation within a single task. Although panoptic segmentation is a recent development, research on it is fast-paced, and it is pushing the boundaries of object detection further.

Panoptic segmentation is extremely detail-rich due to the pixel-level class labels and can train powerful deep learning frameworks which we have discussed. However, the process of labeling data up to the very pixel level is a grueling one. 

At Ango AI we deliver high-quality densely annotated images. Whether you’re looking to deploy a panoptic detector for an autonomous vehicle, a medical imagery task, or other problem, we ensure that our experts label each image carefully up to pixel perfection, using Ango Hub, our state of the art labeling platform natively supporting panoptic labeling.

Book a demo with us to learn how we can help you solve your data labeling needs.

Author: Balaj Saleem
Technical Editor: Onur Aydın

“Since starting to work with Ango AI, being able to take advantage of their labeling service within a secure network allowed us to get better results, faster.”

Eren Bekin, Head of Innovation Unit at Anadolu Sigorta

Before meeting Ango AI, the leading Turkish insurance company Anadolu Sigorta was performing in-house labeling of its data using three different tools.

Anadolu Sigorta uses different data types to train their machine learning models: text, image, and audio, and for each, they needed to use separate tools, which did not integrate with one another or with the company’s existing infrastructure at large.

Since meeting Ango AI, Anadolu Sigorta has been using Ango Hub On-Prem to annotate all of their data, using a single, unified, cohesive platform which integrates seamlessly with the rest of their infrastructure, saving them resources and dramatically increasing the quality of their labels.

The Customer

Anadolu Sigorta is one of the leading insurance companies in Turkey. It was the country’s first ever insurance institution. It numbers thousands of employees and millions of customers, both locally and abroad.

Anadolu Sigorta is one of the technology leaders in the Turkish insurance market. They have an eye for innovation, and they adopted AI and machine learning models as soon as it was feasible.

Thanks to their adoption of AI, Anadolu Sigorta can process a large number of documents and images coming from their agents in the field, with only minimal human supervision. This optimizes resource utilization and dramatically speeds up the processes of appraisal and data entry.

Before Ango AI

Machine learning models need to be trained before being used. In other words, if we want a model to recognize cars, we first need to feed it thousands of pictures in which the car is highlighted (i.e., labeled).

In Anadolu Sigorta’s case, they mostly use models to appraise damage and similar entities from pictures, to automatically file documents, and to process audio. For this reason, their in-house AI team spends a considerable amount of time labeling images, documents, and audio to train their models.

Lack of versatility

Before Ango AI, to fulfill their data annotation needs, Anadolu Sigorta had to use three different labeling platforms: one for text, one for audio, and one for images. This stems from the fact that most platforms in this category specialize in a single data type, making it difficult for multimedia teams like Anadolu Sigorta’s to perform their labeling tasks cohesively.

The platforms did not communicate or integrate with one another, so it was impossible to get comprehensive information on the labeling tasks as a whole.

The platforms looked and worked differently from one another, so an annotator had to retrain each time they switched. This led to reduced label quality, since annotators cannot rely on muscle memory if they have to use a radically different interface for each media type. It also led to an inefficient use of time, since the time an annotator spends retraining could instead have been used for labeling.

High Pricing

Since they were using three different platforms, from different vendors, they had to pay fees to each separately, incurring a high cost. The current “single-media” model of labeling platforms makes it so that multimedia teams are at a disadvantage, as they need to pay three times as much as a team using only one type of media.

After Ango AI

Anadolu Sigorta contacted Ango AI regarding their labeling needs in the first half of 2021. Once we understood their needs, we proposed they try our Ango Hub On-Premise solution, as it fit their needs perfectly.

Anadolu Sigorta has since installed Ango Hub On-Premise on their own servers and has been annotating data with it.

Since we do not collect any type of analytic data from our On-Premise installations for security reasons, we do not know for certain how much and for what they used our platform. They have, however, communicated to us that they labeled millions of data points, and that they intend to continue working with Ango Hub On-Premise for the foreseeable future.


Now, with Ango Hub On-Premise, the AI team at Anadolu Sigorta only has to use one tool to annotate all of their data. Some of the advantages of this are:


Since Anadolu Sigorta now uses only Ango Hub instead of three different platforms, they save considerable resources, both time and money.


Before Ango AI, setting up and maintaining their three different products took considerable time and effort.

Installing the entire Ango Hub On-Premise product on Anadolu Sigorta’s servers took, in total, less than 30 minutes thanks to Ango Hub’s adoption of state-of-the-art deployment infrastructure, explained in detail in our Ango Hub On-Premise attachment.

Plus, any updates released after the initial deployment took less than 15 minutes each to deploy and activate, resulting in virtually no downtime for Anadolu Sigorta.


Based on the feedback from our customer, we have been able to push updates which helped not only this customer but every other customer as well. We have created a direct line of communication from the customer to us, which allowed us to instantly troubleshoot any issue the customer might have had.

Before, this was not easily possible, as a change in their data labeling pipeline would have meant changing three different products.


Our data science and ML teams are working hard on delivering our customers an all-new AI assistance feature: Magic Prediction. We aim to release this feature in the upcoming weeks. In the meantime, we’d like to share the progress made so far and explain exactly what Magic Prediction does.

The rise of data-centric ML has shown that the development of ML models still has some issues that can only be solved effectively with supervised learning. As such, data annotation is extremely relevant, especially for most industrial AI projects. Despite this, data annotation is also a labor-intensive activity. This makes it so that in data annotation processes, even small improvements are extremely valuable. At Ango, constantly improving end-to-end data labeling processes is our north star.

To speed up the process of data labeling even further, we are pleased to introduce our new feature, which we call “Magic Prediction”. It is a simple yet effective technique, which we believe will speed up the data annotation pipeline drastically.

In a typical bounding-box annotation scenario, an annotator performs two tasks:

  1. Drawing a bounding box containing the object, and
  2. Selecting the class name among possible candidates.

With Magic Prediction, we are eliminating this second step by automatically classifying the object inside the drawn bounding box. To make this happen, we train image classifiers as labeled data accumulates. As the number of annotated objects increases, we are able to train better-performing image classifiers.

Here’s a sample illustration of the class predictor:
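The idea can be sketched as "crop the drawn box, then classify the crop". The code below is only an illustration under stated assumptions, not the production implementation: `crop`, `predict_class`, and `toy_classifier` are hypothetical names, and the brightness rule stands in for a trained image classifier.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # rows of grayscale pixel values
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive


def crop(image: Image, box: Box) -> Image:
    """Return the part of the image that lies inside the bounding box."""
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def predict_class(image: Image, box: Box,
                  classifier: Callable[[Image], str]) -> str:
    """Suggest a class for a drawn box by classifying its contents."""
    return classifier(crop(image, box))


def toy_classifier(patch: Image) -> str:
    """Stand-in for a trained classifier: bright patches are 'cat'."""
    pixels = [value for row in patch for value in row]
    return "cat" if sum(pixels) / len(pixels) > 127 else "dog"


image = [
    [0, 0, 0, 0],
    [0, 200, 220, 0],
    [0, 210, 230, 0],
    [0, 0, 0, 0],
]
suggested = predict_class(image, (1, 1, 3, 3), toy_classifier)
print(suggested)
```

Because the classifier only sees the cropped patch, the annotator's box acts as the localization step and the model only has to solve the simpler classification problem.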

Let us introduce our lovely Smoky. She is cute and crazy at the same time, and of course, she is a cat!


To test the effectiveness of Magic Prediction, we conducted a number of experiments with a large variety of different images, classes, and labeling conditions. We are publishing the results here.


For the experiments, we used the COCO object detection and segmentation dataset. The COCO dataset has a total of 80 classes, but for the sake of simplicity, only the animal classes (sheep, bird, cow, horse, elephant, dog, giraffe, zebra, cat, and bear) were selected. The distribution of the classes is shown in the figure below:

Class Distribution of COCO Dataset (Only Animal Classes)


To assess the performance of our classifier, we examined correctly and wrongly classified samples from the test set. As the figures below show, if the animal is clearly visible and there is no occlusion, it is correctly classified. However, if there is occlusion, if the animal is far away from the camera, or if the lighting conditions are bad, the probability of the classifier making a mistake increases.

Correctly Classified Samples

Wrongly Classified Samples

The Effect of Training Size

It is useful to know the minimum number of data instances needed for training and how frequently we should retrain our model. For this reason, the model was trained with various sample sizes and its performance was measured on the same test data.

To show the effect of training size, the figure below plots accuracy against the number of training samples. With the maximum sample size, an accuracy of 88.17% is obtained. As expected, accuracy decreases as the sample size decreases. However, with only 250 annotated objects, the classifier reached 74.26% accuracy, which is low but still satisfactory. An accuracy of 83.02% is obtained when the sample size reaches 2172. It is also worth noting that our classifiers are still unrefined and open to improvement. These figures can therefore be seen as a lower bound on the performance of the classifiers in this setup.
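An experiment of this kind can be sketched as a loop that trains on random subsets of increasing size and evaluates each model on the same test set. This is a generic outline rather than the code behind the numbers above: `learning_curve`, `train_fn`, and `accuracy_fn` are hypothetical placeholders for whatever training and evaluation routines are actually used.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple


def learning_curve(
    train_set: Sequence[Any],
    test_set: Sequence[Any],
    sizes: Sequence[int],
    train_fn: Callable[[List[Any]], Any],
    accuracy_fn: Callable[[Any, Sequence[Any]], float],
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Train on random subsets of each size; test on a fixed test set.

    Returns a list of (training_size, accuracy) pairs, one per size.
    """
    rng = random.Random(seed)  # fixed seed so subsets are reproducible
    results = []
    for size in sizes:
        subset = rng.sample(list(train_set), min(size, len(train_set)))
        model = train_fn(subset)
        results.append((len(subset), accuracy_fn(model, test_set)))
    return results
```

A call such as `learning_curve(train, test, [250, 2172, len(train)], train_fn, accuracy_fn)` would then produce the points of an accuracy-versus-training-size plot.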

Sensitivity to Bounding Box Size

Until now, we used COCO bounding box annotations directly as the annotator input. In this section, we discuss the effect of bounding box tightness on accuracy. In the figure below, the classifier was tested with various bounding box tightness levels. With extremely tight or extremely broad bounding boxes, the classifier begins to make mistakes. On the other hand, the classifier is tolerant of bounding box broadness up to a certain level.

Sensitivity to Bounding Box Size
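One way to simulate looser or tighter annotator boxes is to scale each ground-truth box about its center before cropping. The helper below is a minimal sketch under that assumption; `scale_box` and its `factor` parameter are illustrative names, and the exact perturbation used in the experiment may differ.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def scale_box(box: Box, factor: float,
              image_width: float, image_height: float) -> Box:
    """Scale a box about its center and clamp it to the image bounds.

    factor > 1 makes the box broader, factor < 1 makes it tighter.
    """
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2
    center_y = (y_min + y_max) / 2
    half_w = (x_max - x_min) * factor / 2
    half_h = (y_max - y_min) * factor / 2
    return (
        max(0.0, center_x - half_w),
        max(0.0, center_y - half_h),
        min(float(image_width), center_x + half_w),
        min(float(image_height), center_y + half_h),
    )


original = (10.0, 10.0, 30.0, 20.0)
broader = scale_box(original, 2.0, 100, 100)
tighter = scale_box(original, 0.5, 100, 100)
print(broader, tighter)
```

Running the classifier on crops produced with a range of factors gives one accuracy value per tightness level, which is the kind of curve the figure describes.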

Object Detection vs. Magic Prediction

After seeing the magic prediction tool, you may think: rather than a magic prediction tool, can an object detection model be used directly?

In general, object detection models are more complex than classifiers, which makes them more data-hungry: you need more annotated data to reach a given level of performance. In addition, the runtime complexity of object detection models is higher.

What’s Next?

We are still working on making our image classifiers as accurate as possible. As a next feature, we are planning to add the ability to detect out-of-distribution cases. In addition, we are planning to combine the class predictor with our other AI assistance tools.

Author: Onur Aydın
Editor: Lorenzo Gravina
Technical Editor: Balaj Saleem

The Ango team prides itself on its incredible diversity. Our team has people coming from different countries, walks of life, and disciplines, each with unique interests and many stories to tell. We couldn’t just keep all these stories to ourselves. So welcome to People of Ango AI.

In this series of posts we will talk to the people who make up the Ango AI team, in conversations where they can share their stories, work, experiences, thoughts, and more.

Today we will be having a chat with our Machine Learning Engineer Balaj Saleem. Balaj was born and lives in Islamabad. He spent his childhood in many different cities in Pakistan, and then graduated from Bilkent University’s CS department in mid-2021. He has been working remotely with us ever since.

First of all, thank you for taking the time to have a chat with us. Could you please briefly introduce yourself to the readers, telling them a bit about your background and your work at Ango AI?

For quite some time I have been interested in solving problems that impact society in general and the tech industry specifically. After finishing high school, I felt the best way to do that was to polish myself in the domains where I excelled and then use those skills in the best way possible. For that I moved from Pakistan to Turkey, and in my last two years at Bilkent I developed a keen interest in AI and, more specifically, machine learning. The way it changed the whole paradigm to be data-centric, and the idea of systems tuning themselves to make sense of data, was truly fascinating. Around the same time, interestingly, Ango AI was taking some of its first steps in the domain of AI, and I thought it would be amazing to contribute here with the skills and enthusiasm that I had for the domain. Since then I have done a lot of work on systems that leverage machine learning to help make the tedious problem of data labeling faster, easier, and more approachable. I have gotten to develop, learn about, and contribute to some of the most cutting-edge technology in the field of machine learning, and have had the pleasure of helping teams globally meet their data needs in the best way possible.

At Ango, you’ve been spending time solving problems related to machine learning in general and data labeling specifically. What do you think is the state of data labeling right now in early 2022?

I think we’re at somewhat of a tipping point. The world of machine learning is starting to recognise the impact and importance of data and, more importantly, of quality data. With numerous advocates and movements towards “Data-Centric AI”, we have a spotlight on data labeling. ML teams all over the world are realizing that quality data is one of the most cumbersome yet important parts of any ML solution.

Yet research has traditionally been oriented towards “model-centric AI”, so the field of data labeling is still in its infancy, both academically and industrially. But work in the field is evolving at an exponential rate, and being at the forefront of such an evolution is truly remarkable for me personally.

What are some of the biggest challenges you have worked on at Ango AI? And what do you think are the biggest challenges facing data annotation in general right now?

There are two fundamentals I have tried to focus on in my work at Ango AI: Quality and Efficiency. Many of the problems I have worked on have been closely related to solving these challenges of data annotation. 

I have worked on solutions using various technical frameworks to ensure that data is labeled fast and with pixel-perfect accuracy. Apart from this, I have tackled quite a few research-oriented challenges, ranging from predicting data labeling difficulty and duration to running deep learning algorithms directly in the browser rather than on the server.

I think one of the key problems faced by the data annotation industry is that usually the problem of data annotation is treated as a sub-problem of any ML / AI product. The focus on the process of annotation is often minimal, even though according to some estimates it takes 80% of project time / effort. 

Machine Learning is a field that’s been growing considerably in the last few years, both in breadth and depth. What are your thoughts on the increasing role ML models are playing in our lives? What are their positive and potentially negative effects? What do we need to be careful about?

The overwhelming inflow of data from a myriad of sources in the past few years, and the support of tech giants such as Google and Facebook for Machine Learning, are among the key factors in this growth.

This makes Machine Learning a lot more accessible and supported, which means it doesn’t have to remain for the benefit of a select few in the industry. As more and more organizations adopt ML and AI, we have solutions that affect major parts of our lives, from the way our search engines work to the way agriculture is being approached. The possibilities are practically endless as long as the data is available.

However, one caveat certainly is the concentration of data in a handful of organizations. Fundamentally, this means that open access to data is very limited compared to the amount being collected. This hinders the transparency of how data is being used and also slows development in the field of AI and ML. I think that, as individuals, we need to keep a keen eye on how our data may be processed and, more importantly, on what insights may be generated from it.

Moving now to a more personal side of the conversation: you’ve been working remotely at Ango for a while now, from a city that’s two time zones away from Ango’s main hub. How has the experience of working remotely been for you? What are its pros and cons for you?

I think working remotely has been truly delightful. I spent about 4 years in Turkey and, as amazing a host country as it was, I missed Pakistan. Being here, surrounded by family and friends whom I had been away from for quite some time, has certainly helped me get in touch with a crucial part of what makes me me.

A great thing is that the kind of work we do at Ango AI is pretty flexible. I get to work on my tasks at my own discretion, which certainly boosts productivity and promotes work-family balance. It also helps me be more goal- and task-oriented, which lets me make sure that I bring the best ideas and value to the company.

There are rare times when an important scheduled meeting falls a bit late in Pakistan, but that’s really once in a blue moon. The only true con, I would say, is that I miss socializing and interacting with the amazing and dynamic team. While I was in Ankara, due to Covid I only got to meet the team once, and since then I have worked mostly remotely. Everyone is extremely supportive and encouraging in aspects of life both related to and beyond work, so I would truly be glad to meet them again and spend some time in the office.

Did you always envision a career in machine learning, or is it something you picked up along the way?

Machine learning was actually something I discovered quite late in my university years. It all came out of a simple project we worked on together that inferred the emotion of a face from a picture of it. We managed to match the right emotion to the picture with nearly 70% accuracy, and that truly opened my eyes to how impactful this work can be. After that I took more courses along the same lines, finally incorporating it into my final year project as well.

What do you enjoy tinkering with in your free time?

Not many people know this, but I am actually really into making electronic music; it’s a great creative outlet for me. Although much of what I make either goes to friends and family or stays on my computer, there are some tracks that I put on SoundCloud, ranging from chill lo-fi beats to more hip-hop / trap kind of stuff. Go ahead and find me on SoundCloud if you want!

Thank you again for your time today. Do you have any final words of advice for those who are looking to start a career in ML?

The pleasure is all mine. I would just like to say: keep being inquisitive and share your passion for the discipline with like-minded people. The domain of ML is still evolving rapidly, so make sure you keep yourself surrounded by people who can give you the best kind of guidance.

Artificial Intelligence and Machine Learning have seen a massive growth in the past few years, and have reached a market size of over 22 billion US dollars. While there are numerous domains AI interacts very closely with, Computer Vision is and has been one of the most significant ones, acting as a catalyst for this immense growth. And image annotation is at the core of computer vision tasks.

At its core, computer vision is a field that deals with teaching and enabling systems to extract meaningful information, patterns, and insights from digital visual inputs such as images and videos. Using these visual inputs, complex algorithms and techniques are employed to allow machines to “see” by deriving information from a mathematical representation of the world. Often, these systems then take important actions based on these inputs and insights derived from them.

As humans, we spend many years of our lives understanding the visual stimuli that the world around us provides, and picking out the information that is important to us. Along the same lines, machines need these visual stimuli, and an extensive amount of them, in order to determine patterns and pick out the most appropriate information. Without the biological apparatus that humans have, their best source of visual stimuli is data, more specifically: annotated data.

Often, computer vision systems need a large amount of image-like data that has been carefully labeled and processed. These annotated images can have a variety of elements, such as:

A detailed discussion of image annotation can be found in one of our prior blog posts.

These elements allow the system to separate the pixels, classes and regions of importance within an image, effectively picking out the “signal from the noise”. Learning from these patterns, the system adapts over time, becoming better at recognizing them in unseen instances and providing deeper and more accurate insights.

Important Use Cases

While image annotation can be employed for countless applications, there are certain areas which can derive a lot of value from such annotated data and computer vision systems. As a result, most of the work in present times is being devoted to these fields.

While each of the areas mentioned below could fill an extensive article on its sub-applications alone, we will give a high-level overview of the healthcare, transportation and agriculture industries, and their use cases for image annotation.



The healthcare industry relies heavily on visual detection of various conditions, diseases and ailments. With the help of computer vision / machine learning systems and quality annotated data they can derive a lot of value. Specifically, the following areas within the healthcare industry have seen rapid and effective input:


Transportation is a massive sector that has often been one of the first ones to accommodate technological progress and innovation. Many sub-sectors have already seen incredible research, ranging from autonomous driving to parking occupancy detection. Here are a few such areas that have seen concrete results by using AI systems relying on annotated images and videos:



Agriculture is another key area that has benefited immensely from the progress in computer vision systems and the availability of large-scale agricultural datasets. Camera surveillance of massive fields produces a large amount of data, which can be annotated and processed by AI systems to provide critical insights such as disease and pest detection, crop and yield monitoring, and livestock health monitoring:


With the countless opportunities that arise from the interaction of Artificial Intelligence and Computer Vision, it is evident that quality annotated data is key in any such application. As machines learn to “see” the world, they need the right direction and mentorship, and that comes through annotated data. Once one has access to this data, the possibilities are endless.

The domains discussed above are simply a subset of a massive industry that relies heavily on annotated images. Other use cases for machine learning systems and image annotation include manufacturing, mining, irrigation, construction, defect detection, workforce monitoring, product assembly, predictive maintenance, document classification and recognition, and much more.
Image annotation is one of Ango AI’s core offerings. Whether you need a platform to annotate images with your team, on the cloud or on-premises, or if you’re looking for a high-quality yet simple, fully-managed end-to-end data labeling solution, Ango AI provides it. Try our platform at hub.ango.ai, or contact us to learn more about how we can help you solve your data labeling needs.

When it comes to the world of AI, the word “learning” has a very specific meaning: it is the ability of a system to understand data. Active Learning is one such way for an AI to understand.

In the constantly evolving domain of Machine Learning, there are many learning approaches to cater to different use cases. There are two approaches, however, which are most commonly employed:

However, there are many other types of learning that are less explored, such as reinforcement learning or semi-supervised learning. One of them is Active Learning, an approach that is rarely at the forefront of learning strategies but can be of immense use to many machine learning projects and tasks.

Fundamentally, Active Learning is an approach that aims to use the least amount of data to achieve the highest possible model performance. When following an Active Learning approach, the model chooses the data that it will learn the most from, and then trains on it. 

Traditional (passive) supervised machine learning trains the model in a single pass over all the training data. Active Learning, by contrast, proceeds in several iterations, as follows:

  1. Choose initial training data (a small subset of all data)
  2. Train your model on the provided data.
  3. Check where in all the unlabeled data the model is most uncertain.
  4. Label this data using an Oracle (A human or machine that can provide accurate labels)
  5. Repeat steps 2-4 until all data is exhausted, acceptable model performance is achieved, or time / budget constraints are reached.
The Active Learning loop.
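
The loop above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production recipe: the data is one-dimensional and lies in [0, 1], the "model" is a single threshold, and `oracle`, `train`, and `active_learning` are hypothetical names invented for this example.

```python
import random

def oracle(x):
    """Stand-in for a human labeler (assumption: the true boundary is at 0.6)."""
    return int(x >= 0.6)

def train(labeled):
    """Fit a 1-D threshold: midpoint between the highest 0 and the lowest 1."""
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    lo = max(zeros) if zeros else 0.0
    hi = min(ones) if ones else 1.0
    return (lo + hi) / 2

def active_learning(pool, n_initial=2, budget=8, seed=0):
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    # Step 1: choose a small initial training set and label it.
    labeled = [(x, oracle(x)) for x in pool[:n_initial]]
    unlabeled = pool[n_initial:]
    # Step 2: train the model on the labeled data.
    threshold = train(labeled)
    for _ in range(budget):  # Step 5: stop when the labeling budget is spent.
        if not unlabeled:
            break
        # Step 3: the most uncertain sample is the one closest to the threshold.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        # Step 4: ask the oracle for its label, then retrain (step 2 again).
        labeled.append((query, oracle(query)))
        threshold = train(labeled)
    return threshold, len(labeled)

pool = [i / 100 for i in range(101)]
threshold, n_labeled = active_learning(pool)
print(f"threshold={threshold:.3f} after labeling {n_labeled} of {len(pool)} samples")
```

In this toy setting the learned threshold lands close to the true boundary after only ten labels, because each query roughly halves the region where the boundary could be.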

Types of Active Learning

Pool Based Active Learning

This is the most popular approach, commonly used when working on Active Learning projects.

The idea is that, given a large pool of unlabeled data, the model is initially trained on a labeled subset of it. These training samples are then removed from the pool, and the remaining pool is repeatedly queried for the most informative data. Each time data is fetched and labeled, it is removed from the pool and the model trains on it. Slowly, the pool is exhausted as the model queries data, understanding the data distribution and structure better. This approach, however, is highly memory-consuming.

Stream Based Active Learning

This approach relies on moving through the dataset sample by sample. Each time a new sample is presented to the model, it is determined whether this sample needs to be queried for its label. However, since not all of the data is available at once, performance over time is often not on par with the pool-based approach: the samples that get queried may not be the ones that would provide the most information to our active learner.
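
As a minimal sketch of the stream-based decision, the generator below looks at one sample at a time and queries it only when the model's top predicted probability is low. The confidence threshold of 0.8 and the `toy_predict_proba` model are assumptions made up for illustration.

```python
def stream_queries(stream, predict_proba, min_confidence=0.8):
    """Yield samples whose top predicted probability falls below min_confidence."""
    for sample in stream:
        if max(predict_proba(sample)) < min_confidence:
            yield sample

def toy_predict_proba(x):
    # Hypothetical binary model: confident far from 0.5, unsure near it.
    p = min(max(x, 0.0), 1.0)
    return [1 - p, p]

queried = list(stream_queries([0.05, 0.45, 0.9, 0.6], toy_predict_proba))
print(queried)  # [0.45, 0.6]
```

Because each decision is made without seeing the rest of the data, a sample that looks uncertain now may turn out to be less informative than one that arrives later.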


The key to having a successful Active Learning model lies in selecting the most informative / useful samples of data for the model to train on. This process of “choosing” the data which would help a system learn the most is known as querying. The performance of an Active Learning model depends on the querying strategy.

There are many approaches to finding the most informative samples in the data. In practice these vary from case to case, but a few can be adapted to many use cases:

Uncertainty Sampling

Used for many classification tasks, and also known as the 1 vs 2 uncertainty comparison, this approach compares the probabilities of the two most likely outcomes / classes for a given data point. The data points where the difference between these two probabilities is small are usually the most confusing ones for the model, and hence would be useful to query.

This Active Learning strategy is effective for selecting unlabeled items near the decision boundary. These items are the most likely to be wrongly predicted, and therefore, the most likely to get a label that moves the decision boundary.

Another measure that can be used for uncertainty sampling is entropy, which is a measure of “surprise” in a data instance. Points with high entropy are likely to be the most surprising / confusing to the model, therefore knowing the labels for these points would be beneficial for the model.
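
Both measures described above are easy to compute from a model's predicted class probabilities. The sketch below uses made-up predictions for three hypothetical images; `margin` is the 1 vs 2 comparison and `entropy` is the Shannon entropy in bits.

```python
import math

def margin(probs):
    """Difference between the two highest class probabilities (low = uncertain)."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2

def entropy(probs):
    """Shannon entropy in bits (high = uncertain)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical predicted class probabilities for three unlabeled images.
predictions = {
    "img_a": [0.90, 0.05, 0.05],
    "img_b": [0.40, 0.35, 0.25],
    "img_c": [0.50, 0.50, 0.00],
}

by_margin = min(predictions, key=lambda k: margin(predictions[k]))
by_entropy = max(predictions, key=lambda k: entropy(predictions[k]))
print(by_margin, by_entropy)  # img_c img_b
```

The two measures can disagree: the margin only looks at the top two classes, so it picks the exact tie in `img_c`, while entropy looks at the whole distribution and picks `img_b`, where the probability is spread over all three classes.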

A theoretical comparison of Active Learning vs supervised learning model performance. (source)

Query by Committee

Query by Committee is a selective sampling approach in which disagreement amongst an ensemble of models is used to select data for labeling.

In other words, an array (committee) of models, which may differ in implementation, is set up for the same task. As they train, they start to comprehend the structure of the data. There are, however, points where the models in this committee are in high disagreement (i.e. the classes / values assigned to the data point by different models are starkly different). These data points are chosen to be labeled by an oracle (usually a human), as they would provide the most information for the models.
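
One common way to quantify committee disagreement is vote entropy, sketched below. It is only one of several possible disagreement measures, and the committee votes shown are invented for illustration.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy (in bits) of the labels predicted by the committee members."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical predictions from a committee of three models.
committee_votes = {
    "doc_1": ["cat", "cat", "cat"],    # full agreement
    "doc_2": ["cat", "dog", "bird"],   # full disagreement
    "doc_3": ["cat", "cat", "dog"],    # partial disagreement
}

ranked = sorted(committee_votes,
                key=lambda k: vote_entropy(committee_votes[k]),
                reverse=True)
print(ranked)  # ['doc_2', 'doc_3', 'doc_1']
```

Items at the front of `ranked` are the ones the committee disagrees on most, so they would be sent to the oracle first.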

Diversity Sampling

As the name suggests, this querying strategy is effective for selecting unlabeled items in different parts of the problem space. If the diversity is away from the decision boundary, however, these items are unlikely to be wrongly predicted, so they will not have a large effect on the model when a human gives them the same label the model predicted. This is often used in combination with Uncertainty Sampling to allow for a fair mix of queries that the model is uncertain about and that belong to different regions of the problem space.

Top right: One possible result from uncertainty sampling. If all the uncertainty is in one part of the problem space, however, giving these items labels will not have a broad effect on the model.
Bottom left: One possible result of diversity sampling.
Bottom right: One possible result from combining uncertainty sampling and diversity sampling. Adapted from Human-in-the-loop Machine Learning by Robert Monarch.
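
Diversity sampling can be implemented in many ways. As one illustrative option (an assumption of this sketch, not a method prescribed by the text), a greedy farthest-first selection picks each new item so that it is as far as possible from everything already selected. The 2-D points are made up; in practice they would be feature vectors or embeddings.

```python
import math

def farthest_first(points, k):
    """Greedily pick k points, each maximizing its distance to the closest pick so far."""
    selected = [points[0]]
    while len(selected) < min(k, len(points)):
        remaining = [p for p in points if p not in selected]
        if not remaining:  # only duplicates of already selected points are left
            break
        selected.append(
            max(remaining, key=lambda p: min(math.dist(p, s) for s in selected))
        )
    return selected

# Two tight clusters plus one isolated point.
points = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (0, 5)]
print(farthest_first(points, 3))  # [(0, 0), (5.1, 5), (0, 5)]
```

The selection takes one item from each region instead of two near-duplicates from the same cluster. To combine it with uncertainty sampling, one could first keep only the most uncertain items and then run this selection on that subset.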

Active Learning and Data Annotation

As can be observed from the fundamentals of the Active Learning approach, this method reduces the total amount of data needed for a model to perform well. This means that the time and cost that the data labeling process incurs is highly reduced as only a fraction of the dataset is labeled.

However, the tasks of data annotation and model training are often handled separately, and by different organizations. Hence the interaction of both the processes is a challenge that often becomes hard to tackle, owing to the confidentiality and privacy of the data and processes. 

Often, Active Learning is used in association with online or iterative learning during the process of data annotation, using Human in the Loop approaches. Active Learning is then responsible for fetching the most useful data, while iterative learning enhances model performance as annotation continues, allowing a machine agent to assist humans.

A practical example of this would be using Active Learning for video annotation. In this task, consecutive frames are highly correlated and each second contains a high number (24-30 on average) of frames. Because of this, labeling each frame would be very time- and cost-intensive. It is thus more appropriate to select frames where the model is the most uncertain and label these frames, allowing for better performance with a much lower number of annotated frames. 

An intersection of Active Learning and Iterative Learning (Source)


Whether you are a data scientist working on projects that involve labeling vast amounts of data, or an organization that deals with a constant inflow of data that needs to be integrated into their AI system, labeling the right subset of this data for it to be fed to the model would inevitably cater to many of your needs, drastically reducing the time and cost needed to attain a well performing model.

More than 9 researchers out of 10 who have attempted some work involving Active Learning claim that their expectations were met either fully or partially (source). 

At Ango AI we work with Active Learning and many more such techniques to ensure that the speed and the quality of our labels is kept as high as possible, employing the latest research in AI assistance. Our focus on improving labeling efficiency via AI assistance has led us to pursue the intersection of Iterative learning and Active Learning and their applications for quality data annotation.

Data-related tasks consume nearly 80% of the time of AI projects. This makes them a key factor in the machine learning pipeline. Within these data-related tasks, data labeling in particular takes, on average, up to one fourth of the project’s time. Just like the later stages of model development and hyperparameter tuning, the process of data labeling comes with challenges of its own, making it one of the most difficult, time-consuming and expensive tasks if not handled in the right manner.

It is often observed in the industry that the task of data labeling is tackled haphazardly by many organizations working to build an AI/ML pipeline, and that its complexity is underestimated. This is a pitfall that causes inadequate results and may be a contributing factor to the reality that only 8% of firms engage in core practices that support widespread adoption. Most firms have run only ad hoc pilots or apply AI/ML in just a single business process, as reported by Harvard Business Review.

So what exactly is it about data labeling that makes it such a challenging task? There are many facets to why exactly this is so, and this article will break them down in detail. 


Subject Matter Expertise

Subject Matter Expertise means the amount of domain knowledge or information a labeler has on the data that is being labeled. Fundamentally, data labeling is a task that employs human knowledge at its core, in order to prepare data for a model to train upon in the future.

Often this data is of a nature that can not be accurately labeled without expertise regarding the characteristics and the complexity of the data. This is the primary reason subject matter experts are required for many labeling tasks.

For instance, a task asking annotators to label images of tumors found in MRI scans would be very difficult to comprehend and label by someone who has no medical or radiological knowledge. This type of data would be best understood by an expert radiologist or a doctor. 

Consider another case where an organization may want to distinguish faulty architectural blueprints from robust ones. For this task a qualified architect would do the best job of identifying such blueprints, whereas an unqualified labeler would certainly make many mistakes in this complex decision-making process.

The availability and inclusion of subject matter experts becomes the primary challenge of a data labeling task: not only can these experts be expensive, but they are often very hard for many organizations to access, because the experts and the organization operate in mutually exclusive domains.


Subjectivity and Human Bias

Many machine learning tasks require data that is often subjective. There are sometimes no right or wrong answers; this makes the task inherently fuzzy and up to the labeler’s judgement. This induces human bias into the labels, as the labelers have to follow what seems like the best (or the most logical) answer to them.

More technically, this concept is known as the induction of cognitive bias, which can manifest itself in various ways, some of these being:


An example of subjectivity may be given by one of the recent projects handled by Ango AI, which aimed at discerning which frames in a video were most interesting. The use case of labeling the videos was to then summarize them only including the most interesting frames. As one may observe, the importance or significance of a frame depends completely upon the labeler’s discretion. One closely related problem that is caused by this is low consistency, which is another challenge of data labeling.

Another example is scene analysis from a still image. Two labelers might give starkly different labels to the same scene. For instance, observing the image below, one may interpret it as the man holding the briefcase giving the cogwheel to the robot while the scientist interprets the results, while others may see it as the scientist programming the robot to give the cogwheel to the man holding the briefcase.

In the simplest terms this can be the manifestation of the phenomenon captured by the widely used proverbial phrase, “whether the glass is half full or half empty?” and that completely depends on who is observing.


Consistency, with regards to data labeling, is the level of agreement that exists for a label among different individuals (or machines) that labeled that specific item (or row) of data. This is specific to the case when multiple labelers are labeling a single piece of data. In general, high consistency is required for quality labeled data. However, maintaining consistency can be fairly challenging, partly due to the reasons of subjectivity and bias discussed above.

Beyond the aforementioned reasons, it is inherently human to make mistakes in tasks requiring judgement/discretion or logic and thus different labels for the same data item arise. This lowers consistency and demands consideration before the data can be delivered.

There are multiple ways to enhance consistency, but some of the most effective ones are the following:


With the growing adoption of outsourcing or crowdsourcing for data labeling, it is of utmost importance to ensure the safety, privacy and confidentiality of the data that is being labeled. Unauthorized access, deletion, and storage of data at an unauthorized location are concerns that need to be addressed by the labeling entity.

Often, organizations choose to have the labeling services on-premises to tackle this problem and ensure that no third party can access the data. This is the most effective way to ensure privacy. However, it comes with its own managerial and administrative overhead, as managing labels on premises and putting quality assurance measures in place is an extensive process.

The ideal way to tackle this challenge is to ensure that the firm that labels the data complies closely with privacy regulations and processes the data lawfully, fairly and in a transparent manner, keeping all stakeholders informed. This removes the complex layer of workforce and project management, and allows the experts to label the finalized data. Some of the things to look out for within the process of ensuring data privacy are:



Data labeling, especially at large scale as is required today for many use cases, can be extremely challenging, with a variety of facets that need to be addressed. Without addressing these challenges the data may either be low quality (the pitfalls of which were discussed in our “Quality Assurance Techniques in Data Annotation” article) or will incur extra layers of complexity and financial overhead.

Often it is best to outsource this task to firms you can trust and those that deliver quality and speed and tackle all these challenges professionally. At Ango AI we provide such a service, ensuring that you get the highest quality, consistent and unbiased data labeled by a handpicked and highly talented team of experts subject to multiple cycles of review. Throughout the process we ensure transparent and effective communication providing initial samples of well labeled data, instructions and the ability for any labeler to report issues within data or the labeling process.

One of the most common idioms in the domains of Data Science, AI and Machine Learning is “Garbage in, garbage out”.

While a simple sentence on the surface, it elegantly captures one of the most pressing issues of the domains mentioned above. It implies that low quality data fed to any system will most certainly (unless by chance) give low quality predictions and results.

In all application areas of Artificial Intelligence, data is crucial. With it, models and frameworks can be trained to assist humans in a myriad of ways. These models, however, need high quality data, especially annotations which are reliable and are representative of the ground truth. With high quality data, the system learns by optimally tuning its own parameters, using the data to provide valuable insights.

In absence of such quality data, however, the tuning of these parameters is far from optimal, and in no way are the insights provided by such a system reliable. No matter how good the model, if the data provided is low quality, the time and resources poured into making a quality AI system will surely be wasted.

Since the quality of the data is so important, the first steps in the building of any machine learning system are crucial. The quality of the data is determined not only by its source but, equally importantly, by how the data is labeled and by the quality of this labeling process. The quality of annotations is a key aspect of the data pipeline in machine learning. All the steps that follow depend highly upon this one.

Machine Learning Workflow

Data Quality

Data being prepared for any task is prone to error. The main reason for this is the human factor. As with any task involving humans, there are inherent biases and errors that need to be taken into account. The task of labeling any form of data, be it text, video or image, may elicit varied responses from different respondents / labelers. Due to the nature of many data annotation tasks, there is often no absolute answer; hence, an annotation process is required. The annotation process, however, is itself occasionally prone to error. There are two kinds of errors commonly encountered:

Data drift: Data drift occurs when the distribution of annotation labels, or features of the data, change slowly over time. Data drift can lead to increasing error rates for machine learning models or rule-based systems. There is no static data: an ongoing annotation review is necessary to adapt downstream models/solutions as data drift occurs. It is a slow and steady process that occurs over the course of annotation that may skew the data.
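
As a rough sketch of how such a drift in label distribution could be monitored, the snippet below compares the label shares of an early batch and a recent batch using total variation distance. The choice of metric and the example labels are assumptions for illustration; the threshold at which a review is triggered would need to be tuned per project.

```python
from collections import Counter

def label_distribution(labels):
    """Share of each label in a batch of annotations."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}

def total_variation(p, q):
    """Total variation distance between two label distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

# Hypothetical label batches from the start of a project and from last week.
early = ["car"] * 70 + ["truck"] * 30
recent = ["car"] * 50 + ["truck"] * 40 + ["bus"] * 10

drift = total_variation(label_distribution(early), label_distribution(recent))
print(f"drift = {drift:.2f}")  # drift = 0.20
```

Tracking this value batch by batch makes a slow shift visible long before it shows up as a higher error rate in the downstream model.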

Anomalies: While data drift refers to slow changes in data, anomalies are step functions: sudden (and typically temporary) changes in data due to exogenous events. For example, in 2019-20, the COVID-19 pandemic led to anomalies in many naturally occurring data sets. It is important to have procedures in place to detect anomalies. When anomalies occur, it may be necessary to shift from automated solutions to human-based workflows. Compared to data drift, anomalies are considerably easier to detect and fix.

An anomaly. Credit

Quality Assurance Techniques

Various techniques can be employed to detect, fix and reduce the errors that occur in data annotation. These techniques ensure that the final deliverable data is of the highest possible quality, consistency and integrity. The following are some of those techniques:

Subsampling: A common statistical technique used to determine the distribution of data, this refers to randomly selecting and keenly observing a subset of the annotated data to check for possible errors. If the sample is random and representative of the data, this can give a good indication of where errors are prone to occur. 

Setting a “Gold Standard/Set”: A selection of well-labeled images that accurately represent what the perfect ground truth looks like is called a gold set. These image sets are used as mini testing sets for human annotators, either as part of an initial tutorial, or to be scattered across labeling tasks to make sure that an annotator’s performance is not deteriorating, either due to poor performance on their part, or to changing instructions. This sets a general benchmark for annotator effectiveness.
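
A gold set check can be as simple as comparing an annotator's labels with the gold labels on the items they share. The sketch below assumes labels are stored as dictionaries keyed by a hypothetical item id; real annotation types such as boxes or polygons would need an overlap measure instead of plain equality.

```python
def gold_accuracy(annotations, gold):
    """Share of gold items that the annotator labeled identically to the gold label."""
    scored = [item for item in gold if item in annotations]
    if not scored:
        return None  # the annotator has not seen any gold item yet
    return sum(annotations[item] == gold[item] for item in scored) / len(scored)

gold = {"img_1": "cat", "img_2": "dog", "img_3": "cat", "img_4": "bird"}
annotator = {"img_1": "cat", "img_2": "dog", "img_3": "dog", "img_4": "bird", "img_9": "cat"}

print(gold_accuracy(annotator, gold))  # 0.75
```

Computing this score over a sliding window of recent tasks, rather than once, is what reveals whether an annotator's performance is deteriorating.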

Annotator Consensus: This is a means of assigning a ground truth value to the data by taking inputs from all the annotators and using the most likely annotation. This relies on the observation that collective decision-making often outperforms individual decision-making.
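
For classification labels, the simplest form of consensus is a majority vote. The sketch below also returns the share of annotators behind the winning label and treats ties as unresolved; how ties are handled is an assumption of this example, and in practice they might be escalated to a reviewer.

```python
from collections import Counter

def consensus(labels):
    """Return (winning label, share of votes for it); the label is None on a tie."""
    counts = Counter(labels).most_common()
    top_label, top_votes = counts[0]
    if len(counts) > 1 and counts[1][1] == top_votes:
        return None, top_votes / len(labels)
    return top_label, top_votes / len(labels)

print(consensus(["cat", "cat", "dog"]))  # ('cat', 0.666...)
print(consensus(["cat", "dog"]))         # (None, 0.5)
```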

Using scientific methods to determine label consistency: Again inspired by statistical approaches, these methods use established formulas to determine how different annotators perform. They measure human label consistency using metrics such as Cronbach’s Alpha, Pairwise F1, Fleiss’ Kappa, and Krippendorff’s Alpha. Each of these allows for a holistic and generalizable measure of the quality, consistency and reliability of the labeled data.

Fleiss’ Kappa:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

(where $p_o$ is the relative observed agreement among raters and $p_e$ is the hypothetical probability of chance agreement.)
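
A minimal sketch of computing Fleiss’ Kappa is shown below. It assumes the annotations have already been tallied into a table where each row is an item, each column a category, and each cell the number of annotators who chose that category, with the same number of annotators per item. The function and variable names are chosen for this example.

```python
def fleiss_kappa(ratings):
    """ratings[i][j] = number of annotators who assigned item i to category j."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])

    # Share of all assignments that went to each category.
    p_j = [sum(row[j] for row in ratings) / (n_items * n_raters)
           for j in range(n_categories)]
    # Agreement on each item: fraction of annotator pairs that agree.
    P_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in ratings]

    p_o = sum(P_i) / n_items         # observed agreement
    p_e = sum(p * p for p in p_j)    # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)

# Three items, three annotators, two categories.
ratings = [
    [3, 0],  # all three chose category 0
    [0, 3],  # all three chose category 1
    [2, 1],  # split decision
]
print(round(fleiss_kappa(ratings), 2))  # 0.55
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance. Note that the formula is undefined when every annotation falls into a single category, since the chance agreement is then 1.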

Annotator Levels: This approach relies on ranking annotators and assigning them to levels based on their labeling accuracy (which can be tested via the gold standard discussed above), and giving higher weight to the annotations of higher-quality annotators. This is especially useful for tasks that have high variance in their annotations, or tasks that require a certain level of expertise. This is because annotators who lack this expertise will have a lower weight given to their annotations, whereas annotators with expertise will have more influence on the final label given to the data.
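
One way to give quality annotators more influence is a weighted vote, sketched below. The weights here are hypothetical numbers (for instance, each annotator's accuracy on the gold set); how weights are derived and scaled is a design decision this example does not settle.

```python
from collections import defaultdict

def weighted_vote(labels, weights):
    """labels: {annotator: label}; weights: {annotator: weight}. Returns the heaviest label."""
    totals = defaultdict(float)
    for annotator, label in labels.items():
        totals[label] += weights[annotator]
    return max(totals, key=totals.get)

# Hypothetical weights, e.g. accuracy measured on a gold set.
weights = {"expert": 0.95, "novice_a": 0.40, "novice_b": 0.40}
labels = {"expert": "malignant", "novice_a": "benign", "novice_b": "benign"}

print(weighted_vote(labels, weights))  # malignant (0.95 vs 0.80)
```

A plain majority vote would have picked "benign" here; the weighting lets one trusted expert outvote two less reliable annotators.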

Edge case management and review: Mark edge cases for review by experts. An “edge case” can be identified either by thresholding the inter-rater metrics listed above, or by having individual annotators or reviewers flag it. This allows the most problematic data to be reviewed and corrected, since most anomalies occur in edge cases.
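Both routes to an expert review queue can be combined in a short sketch. The threshold of 0.7 and the function name are assumptions for illustration; a suitable threshold depends on the task and on the agreement metric used.

```python
def items_for_expert_review(agreement_by_item, manually_flagged=(), threshold=0.7):
    """Return the sorted ids of items that should go to an expert.

    agreement_by_item maps item id -> an agreement score in [0, 1].
    An item is selected if its score is below the threshold or if an
    annotator or reviewer flagged it by hand.
    """
    low_agreement = {
        item for item, score in agreement_by_item.items() if score < threshold
    }
    return sorted(low_agreement | set(manually_flagged))
```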

Automated (Deep learning based) Quality Assurance

The task of data annotation is very human in nature, as researchers and organizations are often looking specifically for human input. As a result, the quality of the labels depends on human judgement. There are certain approaches, however, that exploit the principles of deep learning to make this process easier, primarily by identifying data that may be prone to errors and picking it out for human review, which ultimately ensures higher quality.

Without delving too deep, this approach fundamentally relies on training a deep learning framework on the data as it is being annotated, and then using this neural network to predict the labels or annotations for the upcoming unlabeled data.

If an adequate framework is selected and then trained on data with high-quality labels (e.g., the gold set mentioned above), it will have little to no difficulty classifying or labeling the common cases. Where the labeling is difficult, i.e., in an edge case, the framework will have high uncertainty (or low confidence) in its prediction.

Interestingly, when a robust model has low confidence in a label, a human annotator will often find that same item difficult as well.
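Routing low-confidence predictions to human review can be sketched as follows. The example assumes the model outputs a list of class probabilities per item and uses the highest probability as its confidence; the function name and the 0.8 cutoff are hypothetical, and other uncertainty measures (such as entropy) could be used instead.

```python
def low_confidence_items(probabilities_by_item, min_confidence=0.8):
    """Return (item id, confidence) pairs for predictions below min_confidence,
    least confident first.

    probabilities_by_item maps item id -> list of predicted class probabilities.
    Confidence is taken to be the highest predicted probability.
    """
    flagged = []
    for item, probabilities in probabilities_by_item.items():
        confidence = max(probabilities)
        if confidence < min_confidence:
            flagged.append((item, confidence))
    return sorted(flagged, key=lambda pair: pair[1])
```

Sorting by confidence puts the items the model is least sure about at the front of the review queue.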


Whether you are in the tech industry or working on cutting-edge research, having high-quality data is of utmost importance. Regardless of whether your task is statistical or related to AI, an early focus on the quality of data will pay off in the long run.

At Ango AI, using a combination of the techniques mentioned above, we ensure that we only ship the highest quality labels to our customers. Whether it’s by employing complex statistical methods to keep quality high, or cutting-edge deep learning frameworks to keep speed high and assist human annotators in review, we hold quality to the highest standards, subjecting it to numerous checks before it’s finally delivered to you.