Balaj Saleem
"What is Medical Data Labeling?" Header

Medicine and healthcare have long been at the forefront of scientific and technical innovation. According to a recent survey, global healthcare spending stands at nearly $10 trillion. A considerable fraction of this investment goes into integrating technology into the healthcare industry. One of the most promising avenues at the confluence of healthcare and technology is AI in healthcare, and medical data labeling is essential to training AI models.

Data flows into the healthcare industry from a myriad of sources, including medical imaging devices, diagnosis documents, visual observations, and health data collection applications. This data can exist in visual (image-like) or textual form and can serve clinical, research, or administrative purposes. One characteristic of this raw data, however, is that it lacks structure and labels.

Why Label Medical Data

Raw medical data means little in the world of AI, at least in its present state. 9 out of 10 commercial initiatives follow supervised learning approaches, which need well-labeled and structured data. With the advent of deep learning, this reliance on data, and more importantly on large quantities of quality data, has become critical.

To deal with this lack of structure and labels, qualified medical professionals need to label the data on a data labeling platform. Models, then, will be able to use this data for training. 

Having these ground truth labels for the medical data is thus absolutely essential for their AI applications. Hospitals, universities, and private research institutes are investing time and effort to ingest this labeled medical data and have state of the art models assist the healthcare industry at scale.

How Labeled Medical Data is Used


At the forefront of this AI adoption are researchers who use data in order to further the boundaries of AI innovation and also to set the foundation for commercial and practical usage of the technology.

Since AI is still in the early stages of maturity for practical and clinical applications, medical researchers are critical in ensuring steady and well-directed progress of the discipline. The following examples from the Stanford AIMI lab show what AI is able to achieve with labeled medical data:

Detection of child age from an X-ray of the wrist, overlaid with a heatmap of the regions the algorithm deems most important. This has the potential to save valuable time for pediatricians.
Detection of 10 different diseases based on chest X-rays, with performance comparable to, if not better than, that of a panel of radiologists detecting the same diseases.
Digitally adding contrast agents to radiology images (without an actual injection) for better delineation of areas of interest, making the contrast process completely non-invasive.

Clinical Medical Data Labeling

Beyond research, medical professionals use AI in a limited but effective way for the prevention, diagnosis, and treatment of conditions. Deep learning models trained on hundreds of thousands of medical images tend to perform comparably to medical professionals in certain specific scenarios. Although they may not completely replace these professionals at the current stage, they can certainly assist the process.

Most commercial and clinical applications are in the field of radiology. After a scan is completed, the image is sent to a machine learning or deep learning model, which in turn presents its predictions to the radiologist. The radiologist then uses this extra layer of information to make a more comprehensive diagnosis.

Anomaly detection, another key area in machine learning, is being actively integrated into various applications and monitoring systems that collect patient data. The goal is to detect any abnormal behavior or identify individuals who may be at risk.

Source: Harvard Medical School

Types of Medical Data and Labels

Due to the vastness of the medical domain, there are numerous ways we can collect and store data. The following three types, however, occur most frequently:


Fundamentally, this data type stores multiple slices of medical image information in a single comprehensive volume. Modalities such as CT, MRI, and PET scans often store their data in the form of volumes of multiple slices. These volumes can then be projected and labeled in a 3D view or from various directions (sagittal, axial, and coronal views). The slices in a volume are spatially correlated and thus carry more information, both for diagnosis and for model training.
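As a rough illustration of how one volume yields the three views, here is a minimal sketch using plain nested lists. The `volume[z][y][x]` axis order and the helper names are assumptions made for this example; real scanners and file formats define their own orientation.

```python
# Toy volume indexed as volume[z][y][x]. This axis order is an assumption;
# real medical formats carry their own orientation metadata.
DEPTH, HEIGHT, WIDTH = 3, 4, 5
volume = [
    [[z * 100 + y * 10 + x for x in range(WIDTH)] for y in range(HEIGHT)]
    for z in range(DEPTH)
]

def axial_slice(vol, z):
    """Plane at a fixed z: HEIGHT rows of WIDTH voxels."""
    return vol[z]

def coronal_slice(vol, y):
    """Plane at a fixed y: DEPTH rows of WIDTH voxels."""
    return [plane[y] for plane in vol]

def sagittal_slice(vol, x):
    """Plane at a fixed x: DEPTH rows of HEIGHT voxels."""
    return [[row[x] for row in plane] for plane in vol]
```

Each voxel appears in exactly one axial, one coronal, and one sagittal plane, which is why a label drawn in one view can be displayed in the other two.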

One can label volumes in the following ways based on the use case:

  1. Classification (Indicating presence or absence of a certain attribute in a volume or slice)
  2. Bounding Boxes (Localizing Class and Region of Interest)
  3. Segmentation (Pixel level localization of a class)
Image of four different views, showing medical data labeling on an abdomen CT scan.
Abdomen CT Segmentation (Source)
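The three label types are related: a segmentation mask is the most detailed, and the other two can be derived from it. The sketch below shows this on a hypothetical 2D mask for a single slice; the mask values and function names are illustrative only.

```python
# Hypothetical segmentation mask for one slice: 1 marks the labeled class.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

def classification_label(mask):
    """Classification: is the class present anywhere in the slice?"""
    return any(any(row) for row in mask)

def bounding_box(mask):
    """Bounding box as (x_min, y_min, x_max, y_max), or None for an empty mask."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def pixel_count(mask):
    """Segmentation: number of pixels assigned to the class."""
    return sum(sum(row) for row in mask)
```

The reverse does not hold: a bounding box or a class label cannot be turned back into a mask, which is one reason segmentation is the most expensive label type to produce.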


Chest X-Ray
Retina Fundus Image

As the name implies, these are 2D images that come in RGB or black and white formats. They include X-rays, retina fundus imaging, and microscopy. These modalities often come without relative spatial information.

Medical data labeling done on our platform Ango Hub, over an image of an abdomen CT scan slice.
Abdomen Slice Annotation using Ango Hub 

One can label images in the following ways based on the use case:

  1. Classification
  2. Bounding Boxes 
  3. Segmentation

Documents / Text

While the previous two types of medical data mainly help in solving computer vision problems in the medical domain, textual data is required for the domain of Natural Language Processing (NLP). Medical institutes produce numerous documents and textual artifacts, ranging from structured signals to unstructured form entries. In order to make sense of this textual data, we can label it using the following methods:

  1. Classification (of a data sample)
  2. Named Entity Recognition (Identification and localization of elements of interest within a text)
  3. Bounding box (Localization and classification of certain passages or sections within a document)
Medical data labeling done on our platform Ango Hub, with bounding boxes over a birth report PDF.

Document Labeling using Ango Hub
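To make the idea of named entity labels concrete, the sketch below stores each entity as a character span plus a class. The `EntitySpan` type, the sample note, and the label names are all hypothetical and are not an export format of any particular platform.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EntitySpan:
    start: int   # index of the first character of the entity
    end: int     # index one past the last character
    label: str

def find_entity(text, phrase, label):
    """Build a span for the first occurrence of `phrase` in `text`."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return EntitySpan(start, start + len(phrase), label)

# Made-up clinical note and label names, for illustration only.
note = "Patient reports chest pain; prescribed aspirin 100 mg daily."
spans = [
    find_entity(note, "chest pain", "SYMPTOM"),
    find_entity(note, "aspirin", "DRUG"),
    find_entity(note, "100 mg", "DOSAGE"),
]
```

Storing offsets rather than copies of the text keeps each label tied to its exact location, so the same phrase can be labeled differently in different places.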

The challenges of Medical Data Labeling


In the medical domain, the data collected is extremely personal and thus subject to strong privacy regulations. One of the key factors when using a cloud platform for data labeling, or when outsourcing the whole labeling process, is therefore ensuring that data is handled in compliance with strict privacy and security requirements.

The way we address this at Ango AI is by baking the medical anonymizer service directly into the platform. This way, whenever data is uploaded, it goes through a layer of anonymization that ensures all patient-specific and institute-specific details are removed before a labeler sees the data.
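The general idea of such an anonymization layer can be sketched as a filter over file metadata. This is not the Ango implementation: the field names below are hypothetical, and a real pipeline would follow the applicable regulation and the metadata schema of the actual file format.

```python
# Hypothetical field names; not taken from any real format or platform.
IDENTIFYING_FIELDS = {"patient_name", "patient_id", "birth_date", "institution"}

def anonymize(metadata):
    """Return a copy of `metadata` without the identifying fields."""
    return {k: v for k, v in metadata.items() if k not in IDENTIFYING_FIELDS}

record = {
    "patient_name": "Jane Doe",
    "patient_id": "12345",
    "institution": "Example Hospital",
    "modality": "CT",
    "slice_thickness_mm": 1.5,
}
clean = anonymize(record)
```

Returning a new dictionary instead of editing in place leaves the original record untouched, so the step can sit between upload and labeling without altering the stored source data.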


One of the key challenges of medical data labeling is the domain expertise required to label data. Since medical data is fairly complex, an untrained labeler often struggles to annotate it correctly. This is where the experience and qualifications of radiologists and radiographers come into play. However, such annotators are not only much harder to find but, due to their level of expertise, also cost considerably more per hour than general annotators.

At Ango we ensure a rigorous recruitment process to select capable and experienced medical professionals from the fields of radiology and pathology to deliver the most accurate possible set of labels.


Unlike traditional image formats, medical imaging comes in formats that are much more robust and suited to the needs of medical systems and professionals. The most popular among them are:

This, however, makes these formats comparatively more complex, and compatibility of data across different platforms is often an issue.

At Ango we ensure that the most popular formats are well supported and can often be imported directly into Ango Hub. For the formats we do not yet support, converting is easy.


Medical data labeling is a key factor in producing quality models for AI initiatives in the medical industry. The process of employing AI in healthcare is highly impactful both for research and clinical use. 

At Ango AI we provide the necessary tools and a fully managed service to meet all your medical data labeling needs. Through the process we abstract out all the unnecessary details and deal with the challenges that medical labeling entails, ensuring a streamlined experience on our platform and the highest quality labels for your project. 

There are billions of images shared every day on the internet, with each capturing a potentially useful part of the world around us. Images are a rich source of information, and we can use them to train machines to understand our world. However, unlabeled or unstructured data is of very little use in this learning process. Thus, these images need to be labeled or annotated. But what’s the best image annotation tool to do so?

Although image labeling can be done by very simple tools that allow the user to draw over or manipulate the image, as workflows get more and more complex and scale to gigantic levels, using an adequate tool that caters to this complexity becomes increasingly important.

The Comparison Criteria

There are many premium tools on the market that cater to the need of image labeling, providing platforms with features such as ontology / schema management, labeling interface, project management, image import and annotation export support etc. 

However, there are various important features that set one platform apart from the other. In this article we will be focusing on the following general features (each having sub-features) that distinguish the different image labeling platforms available on the market. We use these to determine which is the best image annotation tool for you:

Data Support

How well does the platform cater to non-conventional data types? Nearly all platforms support simple JPGs and PNGs; however, support for multi-page images, documents (PDF), and medical data that comes in more complex formats sets one competitor apart from the other.

AI Assistance

How well does the platform incorporate various AI assistance features, to make the process of data annotation faster and more efficient? AI Assistance features can include things such as segmenting all objects in an image automatically (Autodetect), segmenting any possible object based on bounding box input (Frame Cut), extracting all readable text from an image (OCR) and following the contour of any possible complex object closely (Smart Scissors). You can see these tools in action here.

Labeling Tools

How well does the tool cover the possible methods of labeling an object within an image? Nearly all platforms contain tools such as bounding boxes and polygons; however, certain use cases require more specialized tools, such as a segmentation tool (for instance and panoptic segmentation) or a rotated bounding box. This metric compares the competitors on the availability of such tools.

Developer Tools

How well does the platform support programmatic functions such as import, export, task assignment through robust APIs and Python SDKs? Since many machine learning teams incorporate the tool in their current ML/AI pipelines, this feature is crucial.


How well does the platform tackle the various aspects of the data labeling process itself? This includes measuring annotation quality through benchmarks and consensus, complying with secure data protocols, and providing a fully managed service in which the platform provider (the data labeling company) takes ownership of the data to be labeled and delivers the final dataset with quality guarantees.
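Benchmark and consensus checks can each be reduced to a small calculation. The sketch below assumes simple classification labels; the function names and the majority-vote rule are choices made for this example, not a description of how any listed platform computes its metrics.

```python
from collections import Counter

def consensus(labels):
    """Return (majority_label, agreement) for one asset labeled by several people."""
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)

def benchmark_accuracy(submitted, gold):
    """Fraction of benchmark assets where the annotator matched the gold label."""
    matches = sum(1 for asset, label in gold.items() if submitted.get(asset) == label)
    return matches / len(gold)

majority, agreement = consensus(["tumor", "tumor", "normal"])
score = benchmark_accuracy(
    {"scan_a": "tumor", "scan_b": "normal"},
    {"scan_a": "tumor", "scan_b": "tumor"},
)
```

Consensus needs no ground truth but costs several annotations per asset, while a benchmark needs a trusted gold set but only one annotation per asset.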

Feature Comparison Table

We’ve prepared a table to allow for an easy comparison between these tools so that you can factually compare various tools on their comprehensiveness in tackling image labeling. Hopefully by the end you’ll have an idea on what is the best image annotation tool for your own needs.

Features | Ango AI | Labelbox | V7 Labs | Supervisely | SuperAnnotate | Scale AI | Hasty AI | Redbrick AI
Data Support
Medical (DICOM) Support
Document (PDF) Support
Multi-page Image Support
AI Assistance Tools
Auto Detect
Frame Cut
Smart Scissors
Labeling Tools
Segmentation Tool
Polygon Tool
Rotated Bounding Box
Nested Classifications
Developer Tools
API Support
Python SDK Support
Fully Managed (In-house Annotators Available)
Quality Metric Guarantees
HIPAA Compliance
GDPR Compliance

Ango Hub

While Ango Hub does much more than image labeling (it also supports text, video, and audio labeling), it is easy to observe that even in image labeling it outperforms the others by a wide margin, owing to strengths in all the general features mentioned above. We might be biased, but we think Ango Hub has the potential to be the best image annotation tool for you. Here’s why the platform stands out:

Pricing: Free for projects up to 10k annotations, along with Cloud and On-Prem pricing. Feel free to contact sales for more information on pricing.


Labelbox

The platform was launched in 2018 and is a strong player in the image annotation industry. It extends its support to data types beyond images; within images, however, it lacks support for certain types. It offers a good suite of image annotation tools, with polygons, bounding boxes, brush, and nested classifications for annotations.

There is, however, a complete lack of built-in AI assistance tools, which allows various competitors to perform better than Labelbox in this domain.

There is ample support for developer tools via an adequate API and Python SDK, along with operational support that allows the user to check quality via benchmark and consensus and have the data fully labeled by the Labelbox team. Some other interesting features include:

Price: 5000 images can be labeled for free, along with availability of Pro and Enterprise plans.

V7 Labs

A popular platform for image and video labeling. Just like Labelbox, the company was founded in 2018, and it has since focused heavily on image labeling. While the platform is only meant to tackle image labeling, it also has certain ML training and deployment capabilities built in. It has a strong AI assistance tool and data management and exploration capabilities. The platform does not provide a fully managed labeling service or quality guarantees on the dataset; however, its support for various data formats and its AI assistance capabilities make up for this. Key features include:

Price: 14-day trial along with credit-based plans. Credits can be used for model training and use of AI assistance features.


Supervisely

Supervisely is a platform for computer vision tasks on images and videos. Its focus is data annotation; however, the platform has various other features that a data science team may be interested in. A key feature of the platform is its high level of extensibility through support for plugins and applications. The fundamental features that make Supervisely stand out are:

Price: 100 images can be labeled for free, along with availability of Business and Enterprise plans.


SuperAnnotate

A very simplified annotation platform offering a great variety of functionality specifically for images. Handling complete model generation and training, along with an adequate model zoo, gives this platform an edge. However, the annotation experience itself is very basic, since no extensive assistance is provided.

The platform divides the annotation process into various tasks: vector annotations (including boxes, polygons, lines, ellipses, key points, and cuboids) and pixel annotations, which allow for segmentation using a brush tool. Important features of this platform include:

Pricing: Limited number of free images, along with custom Pro and Enterprise plans

Scale AI

One of the leading platforms for data annotation and dataset preparation, used by a list of impactful clients. It adopts a generalized approach, tackling nearly all data types (sensor, image, video, text, and document data). It has very high operational support, as the platform is fully managed. It incorporates a powerful data explorer and supports various formats. However, apart from auto detect, it provides nearly no AI assistance features. The strengths of this platform are:

Pricing: 2 cents per image, and 6 cents per annotation

Hasty AI

A Germany-based annotation platform similar in nature to V7 Labs, tackling only image annotation, with a strong focus on AI assistance. The platform uses the idea of “using AI to train AI” and incorporates various tools to accomplish this. Key features of this platform include:

Pricing: Credit-based, unlimited images. Credits can be used for ML model training and error detection. 30 free credits are provided, and more can be purchased based on user requirements. 1 credit = 1 euro.

Redbrick AI

A well-rounded image and video labeling platform with a focus on medical data annotation. It supports common image and video formats, along with DICOM and NIfTI for medical data. The platform does not have built-in AI assistance tools; however, it does incorporate an element of AI assistance through active learning, which is similar in nature to Hasty AI. Key features of this platform are:

Pricing: $219 / month for the first 10k images. After that, it costs 1 cent per image.

So what’s the best image annotation tool in 2022?

Well, with the plethora of great tools on the market for image labeling, choosing the best image annotation tool for your needs is definitely tough.

If you’re looking to label your image dataset in the best way possible, with a tool that outperforms the competitors in catering to all your labeling needs in terms of efficiency, quality, and the sheer level of cross domain support, don’t hesitate to reach out to us at Ango AI and we’d be delighted to show you how we can assist and partner with you in the journey to produce an incredible dataset.

Data scientists agree: the quality of the training data is a key determinant of the quality of the final model. However, high quality annotated data is usually slow to come by, as humans need to be involved in the process of creating it. This process can be sped up, however, with the help of AI. This is what we call “AI-in-the-Loop” in data annotation.

There are countless domains where AI systems have made a significant and beneficial impact. However, there is one common denominator for all these systems: data. From autonomous vehicles to disease detection systems, models need high-quality labeled data to train on.

While there is a massive amount of unstructured, unlabeled data available and procurable, annotated (labeled) data is much harder to obtain. The key reason behind this is the cumbersome process of data annotation, which takes an investment of time, effort, and resources.

Whether one is dealing with images, videos, text, audio, or any other form of data, the task of filtering out useful information, whether by drawing bounding boxes or selecting parts of text, is very human in nature. While this human element is absolutely essential, it is unfortunately also slow relative to machines.

How slow, one may ask? That varies. Analyzing some of our recent projects at Ango AI, we see per-label times ranging from 3 seconds (simple classification) to 80 seconds (complex polygons for images). But one question that can certainly be pursued is: how can we make this faster? Or, more comprehensively:

How can we make the process of data annotation faster while retaining human level accuracy?

This is where AI-in-the-loop comes in.

AI-Assisted annotation

At first glance it may seem counterintuitive, and perhaps counterproductive, to have AI-in-the-loop when annotating data that is then going to be used to train AI systems. However, the core idea is that such systems can perform phenomenally well at certain tasks, such as object detection and named entity recognition, and most often the same systems that will use this data for training can be plugged in earlier in the annotation loop to make use of their capabilities.

During the annotation stage, the AI-in-the-loop system’s primary job is to assist the human labeler, aiming to make the process more efficient and accurate. Thus at this stage, unlike the production or deployment stage, the model does not need to perform at the best possible level. Instead of metrics such as accuracy or precision, efficacy (how useful the model is to the labeler) is a much more important measure of the model’s performance. A lot of modern research and experiments have shown the benefits of using such systems during the data annotation stage.

The benefits reflect in the quality and the efficiency of annotation:

Quality is impacted positively because AI-in-the-loop predictions are often close to if not exactly the ground truth (the actual label). This may be analogous to two labelers working on the same data, a human and an AI system. Although AI may not match human performance or accuracy it certainly does provide a layer of assistance. The cooperation between the machine and human allows for a certain level of delegation to the machine, which allows the human counterpart to focus more on reviewing the annotations, and correcting if necessary.

Efficiency improves because the AI prediction generally reduces the number of interactions the labeler has to make with the datum (image, text, etc.). For instance, if the task is image segmentation, the number of clicks needed to draw a polygon is greatly reduced once the AI prediction is taken into account. Due to this reduction in human interaction, the overall time needed per sample, and consequently for the dataset as a whole, is reduced.

There is, however, one important point to mention: as with any solution, the design and implementation of AI-in-the-loop and the resulting user experience are very important for the usefulness of AI assistance. There certainly is something along the lines of an “ideal level of AI assistance”, as pointed out by this paper.

Such results have been observed while designing, testing, and refining our AI-in-the-loop tools at Ango: AI assistance tools may not be beneficial if the work of using them, interacting with them, and correcting their predictions acts as an overhead rather than an aid in the process of data annotation. For instance, there certainly have been cases in which more time was spent correcting an image mask generated by the AI than it would have taken to label the image from scratch. This phenomenon can be observed in this experiment.

AI Assistance techniques

The domain of AI-in-the-loop in data annotation is certainly a novel one, yet it is evolving rapidly. Thus, apart from a few sources, there is no set list of methods that can be used to achieve AI assistance. However, a few that we often engage with and actively research at Ango are discussed here.

Pre-trained Models

Pre-trained models are machine learning or deep learning models that have already learned their parameters through training on a specific dataset. These pre-trained models can then be used to provide predictions on the data that is being labeled.

To understand this concretely, take the case of object detection and segmentation for images using the COCO dataset and the MaskRCNN model. When trained on the COCO dataset, the model performs very well at detecting and segmenting 80 classes (person, car, traffic light, etc.). Thus, once given an image that contains any of the classes the model has been trained on, it will ideally provide a bounding box and mask capturing the object with fairly high accuracy.

For many use cases, teams are looking to label these common categories (car, pedestrian) with only certain differences, such as different class names or additional class attributes. For such use cases, predictions provided by a pre-trained model (MaskRCNN trained on the COCO dataset) can prove especially useful, as they take the burden of capturing various objects in bounding boxes and polygons away from the human labeler, leaving only the task of reviewing and adding additional attributes.
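Turning raw model output into pre-labels for a project with its own class names can be sketched as a filter and rename step. The prediction dictionaries, `CLASS_MAP`, and the `min_score` cutoff below are assumptions for illustration; an actual detector returns its own data structure.

```python
# Hypothetical mapping from COCO class names to a project's own ontology.
CLASS_MAP = {"car": "Vehicle", "truck": "Vehicle", "person": "Pedestrian"}

def to_prelabels(predictions, class_map, min_score=0.5):
    """Keep confident predictions of mapped classes and rename them."""
    prelabels = []
    for pred in predictions:
        project_class = class_map.get(pred["label"])
        if project_class is None or pred["score"] < min_score:
            continue  # unmapped class or low confidence: leave it to the human
        prelabels.append({
            "class": project_class,
            "box": pred["box"],
            "attributes": {},  # left empty for the human labeler to fill in
        })
    return prelabels

# Stand-in for model output; a real detector would produce these values.
predictions = [
    {"label": "car", "score": 0.92, "box": (10, 20, 110, 80)},
    {"label": "person", "score": 0.35, "box": (50, 40, 70, 90)},
    {"label": "dog", "score": 0.88, "box": (5, 5, 30, 30)},
]
prelabels = to_prelabels(predictions, CLASS_MAP)
```

Dropping low-confidence predictions is a trade-off: a stricter cutoff means fewer wrong boxes to delete but more objects to draw by hand.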

Transfer Learning

This technique takes the idea of using pre-trained models a step further by using additional data (custom and domain-specific, yet similar) to train a pre-trained model further. Through transfer learning, the model adjusts to accommodate the new information and fits itself to it, thus providing useful predictions on domain-specific data.

For instance, a team may want to train a model to detect different types of tropical fruits and thus needs 2K images labeled. Carrying on the previous example of COCO and MaskRCNN: while the data containing these fruits may not be nearly as plentiful as the 80K COCO images, MaskRCNN can still be tuned to provide valuable predictions on the fruit images and help in the process of labeling them.

The process of transfer Learning (Source)

This is done through the concept of transfer learning. Fundamentally, the model has learned to recognize and extract a lot of features from images during its initial training on the COCO dataset. Using that information and some more specific training on a few labeled fruit images, the model adapts to recognize the new classes with fair accuracy. Transfer learning thus provides an added layer of generalizability to pre-trained models, expanding the domain in which they can provide valuable predictions.
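The simplest form of this idea keeps the feature extractor frozen and trains only a small new head on the domain data. The sketch below is a toy under heavy assumptions: `pretrained_features` is a hand-written stand-in rather than a real pretrained network, and the head is a perceptron trained on four made-up samples.

```python
def pretrained_features(pixels):
    """Stand-in for a frozen backbone: returns [mean, spread, bias input]."""
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread, 1.0]

def predict(weights, pixels):
    feats = pretrained_features(pixels)
    return 1 if sum(w * f for w, f in zip(weights, feats)) > 0 else 0

def train_head(samples, labels, epochs=10, lr=0.1):
    """Fit only the new linear head with the perceptron rule; features stay fixed."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for pixels, target in zip(samples, labels):
            error = target - predict(weights, pixels)
            feats = pretrained_features(pixels)
            weights = [w + lr * error * f for w, f in zip(weights, feats)]
    return weights

# A handful of labeled "domain" samples: 1 = bright image, 0 = dark image.
samples = [[0.7, 0.9], [0.1, 0.3], [0.8, 1.0], [0.0, 0.2]]
labels = [1, 0, 1, 0]
head = train_head(samples, labels)
```

Because only three head weights are learned, four labeled samples are enough here; this mirrors why a model pretrained on a large dataset can be adapted with a comparatively small labeled set.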

AI-in-the-Loop Case Study

One of the transfer learning experiments we successfully conducted at Ango was related to a vehicle detection project, which entailed labeling various parts of a vehicle. We trained a model using a small subset of labeled domain-specific (vehicle parts) data. After the performance was deemed satisfactory, the model was given 500 images containing 4528 bounding boxes to predict on. These predictions were then passed to a labeler, and the following results were observed:

Action by Human on AI Labels | Total Number | Ratio
Not changed | 1403 | 30.98%
Only location changes | 2288 | 50.53%
Only class changes | 245 | 5.41%
Class and location changes | 263 | 5.80%
New bounding box is created | 329 | 7.26%
Total bounding boxes | 4528 |
Deleted bounding boxes from AI labels | 995 |

In conclusion, based on labeler reviews, the AI-predicted labels did add a layer of assistance, and the task shifted from pure creation to a hybrid one in which the labeler reviewed and simply corrected or deleted AI labels. More than 90% of the final bounding boxes made use of an AI suggestion in some way, and about 30% of the labels were left unchanged, suggesting that at least 30% of the workload was directly eliminated by these predictions.

This improvement is reflected in a comparison of label duration (the time it took to label) between assets where AI labels were present and assets where they were not. The distribution of labeling times shifted considerably after AI assistance was applied: the mean labeling duration per asset was about 16 minutes before assistance and about 10 minutes after.
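The headline figures of the case study can be recomputed from the reported counts. The sketch below uses only numbers from the table and the two mean durations; the one assumption, flagged in the code, is that the 995 deleted boxes were AI predictions that are not part of the 4528 total.

```python
counts = {
    "Not changed": 1403,
    "Only location changes": 2288,
    "Only class changes": 245,
    "Class and location changes": 263,
    "New bounding box is created": 329,
}
deleted = 995

total = sum(counts.values())                               # final bounding boxes
from_ai = total - counts["New bounding box is created"]    # boxes that began as AI predictions

share_of_final_from_ai = from_ai / total                   # final boxes built on an AI label
share_unchanged = counts["Not changed"] / total            # final boxes accepted as-is

# Assumption: deleted boxes were AI predictions and are not counted in `total`.
ai_predictions = from_ai + deleted
share_of_ai_kept = from_ai / ai_predictions                # AI predictions that survived review

time_saving = (16 - 10) / 16                               # relative drop in mean minutes per asset
```

Under that assumption, roughly 93% of the final boxes started from an AI label, about 81% of the AI's predictions survived review in some form, and the mean labeling time per asset dropped by 37.5%.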

Iterative and Active Learning

Adding a further layer of generalization to the previous techniques, we have iterative learning. The idea is very simple: repeatedly train the AI-in-the-loop model on incoming data as more and more data becomes available. Initially, the model is trained on a small subset of labeled data and starts providing inference on the remaining data as it is labeled. As more data is labeled, the model is retrained at regular intervals in order to fit the data better, so the quality of its predictions improves over time.

Moving on with our prior example of fruit dataset and MaskRCNN, the process of iteration after every 200 images here would look something like this:

  1. Train the initial model (transfer learning via COCO pretraining) on 200 images.
  2. Use the model’s predictions on the rest of the data as it is labeled.
  3. Retrain the model once an additional 200 images are labeled.
  4. Repeat steps 2-3 until the dataset is fully labeled.
An intersection of Active Learning and Iterative Learning (Source)

Using this approach, the model adapts better and better to the underlying data distribution over time, making its predictions more accurate and thus more helpful to the human labeler during annotation.
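The label, retrain, pre-label cycle can be sketched as a loop. Everything here is a toy stand-in: the "model" is a single threshold on a number rather than MaskRCNN, the samples are synthetic, and `human_label` plays the role of the annotator.

```python
def train(labeled):
    """Toy 'model': a threshold halfway between the two class means."""
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def predict(threshold, x):
    return 1 if x >= threshold else 0

def human_label(x):
    """Stand-in for the annotator's ground truth decision."""
    return 1 if x >= 50 else 0

def iterative_labeling(samples, batch_size):
    """Label in batches; retrain after each batch and pre-label the next one."""
    labeled, corrections_per_batch = [], []
    model = None
    for start in range(0, len(samples), batch_size):
        corrections = 0
        for x in samples[start:start + batch_size]:
            truth = human_label(x)            # the human always has the final say
            if model is None or predict(model, x) != truth:
                corrections += 1              # label created or fixed by hand
            labeled.append((x, truth))
        corrections_per_batch.append(corrections)
        model = train(labeled)                # retrain on everything labeled so far
    return corrections_per_batch

samples = [(i * 37) % 100 for i in range(600)]    # synthetic, evenly spread values
history = iterative_labeling(samples, batch_size=200)
```

In this idealized setting the first batch is labeled entirely by hand and later batches need no corrections; on real data the number of corrections per batch would be expected to fall more gradually.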

Active learning in the context of data annotation simply answers the following question. 

Which data samples should be labeled first to increase the model performance the most?

This problem is addressed by choosing the most uncertain samples, i.e. the samples the model is least sure of in its predictions. The key point is that having the human labeler label these uncertain samples and training the model on them first causes the fastest increase in its performance, thus making the model more helpful in the process of annotation. If you’re interested in learning more about active learning, please check out this article.
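One common way to measure uncertainty is the entropy of the predicted class probabilities. The sketch below ranks samples by entropy and returns the ones to label first; the sample ids and probabilities are made up, and entropy is just one of several possible uncertainty scores.

```python
import math

def entropy(probabilities):
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

def most_uncertain(predictions, k):
    """Return the ids of the k samples the model is least sure about."""
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:k]

# Made-up model outputs: per-class probabilities for four unlabeled images.
predictions = {
    "img_001": [0.98, 0.01, 0.01],
    "img_002": [0.40, 0.35, 0.25],
    "img_003": [0.70, 0.20, 0.10],
    "img_004": [0.34, 0.33, 0.33],
}
queue = most_uncertain(predictions, k=2)
```

Here the near-uniform predictions for `img_004` and `img_002` put them at the front of the labeling queue, while the confident `img_001` can wait.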

Reinforcement learning

Reinforcement learning is one of the most captivating domains for the adoption of AI-in-the-loop in labeling. The domain is still evolving, and academic interest in it is at a fledgling stage. For data annotation, this paradigm is the closest to the student-teacher relationship that we want for the AI assistant (student) as it labels data alongside a human (teacher) annotator. This is not to be confused with the teacher-student training methodology for CNNs.

Simply put, reinforcement learning allows an agent (our AI assistant) to perform actions within an environment (data samples). Based on how the outcome of these actions (annotation by the AI assistant) compares to the expected action (annotation by the human), the agent is rewarded or penalized. Over time the agent aims to maximize the reward it earns for its actions, and thus improves its performance.

This paradigm fits the problem of data annotation very well: unlike other techniques, the agent does not directly need data or pre-training, but rather an environment, which is presented in the form of unlabeled data.

Applied to the example of the Fruit Dataset we have an interaction that would ideally look like this:

  1. A human annotator draws polygons for the first set of images (200, for instance). The agent takes a set of actions; however, its outputs are not shown on the platform for this set.
  2. Based on the initial set and the rewards it received, the agent is expected to perform better on the remaining images, so its outputs are now shown on the platform.
  3. The actions of the agent are refined as more and more data is labeled, and the labeler moves to reviewing the agent’s actions rather than ignoring them.
  4. The process continues until the dataset is exhausted.
  4. The process continues until the dataset is exhausted.

Such an approach needs abundant testing; however, if it can be realized, the avenue of deep learning can certainly be of immense benefit to the process of data annotation.
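The reward loop can be illustrated with the simplest reinforcement learning setting, a multi-armed bandit. In this sketch the agent's only decision is which confidence threshold to use when proposing a label, and its reward is how often that decision agrees with the human. The thresholds, scores, and epsilon-greedy strategy are all assumptions for the example, far simpler than an agent that draws polygons.

```python
import random

def agreement_reward(threshold, scored_samples):
    """Reward: fraction of samples where the agent's decision matches the human's."""
    hits = sum(1 for score, human in scored_samples if (score >= threshold) == human)
    return hits / len(scored_samples)

def run_bandit(actions, scored_samples, steps=50, epsilon=0.2, seed=0):
    """Epsilon-greedy agent choosing which confidence threshold to use."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}   # running estimate of each action's reward
    tries = {a: 0 for a in actions}
    for step in range(steps):
        if step < len(actions):
            action = actions[step]                          # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(actions)                    # explore
        else:
            action = max(actions, key=lambda a: value[a])   # exploit the best so far
        reward = agreement_reward(action, scored_samples)
        tries[action] += 1
        value[action] += (reward - value[action]) / tries[action]
    return max(actions, key=lambda a: value[a])

# Made-up (model score, human decision) pairs from an initial hand-labeled set.
scored_samples = [(0.95, True), (0.80, True), (0.65, True),
                  (0.55, False), (0.40, False), (0.20, False)]
best_threshold = run_bandit([0.3, 0.6, 0.9], scored_samples)
```

The agent settles on the threshold that best reproduces the human's decisions, which is the student-teacher dynamic in miniature.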


Written by Balaj Saleem, reviewed by Onur Aydın

When people talk about annotating data, there’s one topic that seems to always be the elephant in the room: image annotation.

The domain of image annotation is as vast and old as data science itself. Indeed, one of the first works ever done in the field of AI was to interpret and annotate line drawings. In recent times, however, the focus has evolved substantially. This evolution has been part and parcel of the advent of Big Data and various real-world application areas for computer vision, such as self-driving cars, facial recognition, augmented reality, and surveillance.

Teaching a computer how to see is no easy task. The machine learning model needs to train on images that are already annotated correctly, so that it can then recognize them on its own and provide meaningful and accurate predictions. Image annotation, then, imposes an extra burden: the AI team needs to find or produce thousands, if not tens of thousands, of correctly annotated images in order to train the model. All of this happens before the model can even be useful.

Models can provide various kinds of outputs: for instance, predicting whether or not an object is present in an image, drawing a rectangular box around an object (commonly called a bounding box), or even creating a mask that covers the object itself with pixel-perfect accuracy. Each of these outputs requires training data annotated in the same form, so that the model can learn to produce it accurately on its own.

We will explore different ways one can prepare this training data. We will do so by going over the most common types of image annotation.

Types of Labeling Tasks

We will delve deep into image annotation. Before that though, let’s go through the various tasks that trained image processing systems can perform.


Image Classification

This type of task usually checks whether a given property exists in an image, or whether a condition is satisfied. In other words, this means classifying an image within a set of predetermined categories based on the contents of the image. Usually, classification is posed as an answer to a question. Such a question may be, for example: “Does the image contain a bird?”

Object Detection

This takes classification one step further by including not only the presence but also the position of the object. The model finds the instance(s) of the object within an image and returns an indication of their coordinates. Building on the previous classification question, detection asks, for example: “Where is the bird in the image?”

Fig. 1. Left, Semantic Segmentation. Right, Instance Segmentation. Source.

Image Segmentation

Put simply, when doing segmentation, the machine learning model breaks the image down into smaller components.

There are two main ways a model can segment an image. In the first, the model assigns a label to a specific “entity” such as a person, a car, or a boat, which has delineated boundaries and is countable. In the second, it labels “areas,” which are not countable and may not have rigid boundaries, such as sky, water, land, or groups of people.

What is commonly called Instance Segmentation is the task of identifying the “entities,” together with every pixel that belongs to them, such that each segment captures the entity’s shape. Here, each occurrence of an entity is kept as a separate segment.

On the other hand, Semantic Segmentation requires each pixel of the image to be labeled, such that it not only includes the “entities” but also the “areas”. Most importantly, it does not differentiate between different occurrences of the same object.
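
The difference is easiest to see on a toy example. The sketch below is illustrative only: the tiny 3x6 "image", the class ids, and the helper names `distinct_labels` and `instances_to_semantic` are all assumptions made for this example.

```python
# Toy 3x6 image containing two birds on a sky background.
# Semantic mask: one class id per pixel (0 = sky, 1 = bird).
semantic_mask = [
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

# Instance mask: one id per individual bird (0 = no instance).
instance_mask = [
    [0, 1, 1, 0, 2, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
]


def distinct_labels(mask):
    """Return the set of distinct non-zero labels found in a mask."""
    return {value for row in mask for value in row if value != 0}


def instances_to_semantic(mask, class_id):
    """Collapse the instance ids of a single class into one class label."""
    return [[class_id if value != 0 else 0 for value in row] for row in mask]
```

The semantic mask contains a single "bird" label shared by both birds, while the instance mask keeps them apart. Collapsing the instance ids recovers the semantic mask for that class, but the reverse is not possible without extra information.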

Types of Image Annotation

Bounding Boxes

As of right now, this is by far the most common approach to image labeling, as it is the one that most often fulfills the requirements of models processing images. A bounding box is a rectangular area containing an object of interest. The box defines the location of this object and bounds its size as well.

Each bounding box is a set of coordinates that delineates where the object starts and ends in each direction. Under the hood, there are two main ways to format such annotations. The first uses two (x, y) points, conventionally the top-left and the bottom-right corners of the rectangle; the other two corners can be derived from these. The second uses a single (x, y) point, conventionally the top-left corner of the box, together with a tuple (w, h) giving the width and the height of the bounding box.
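
Converting between the two formats is simple arithmetic. The following is a minimal sketch, assuming image coordinates with the origin at the top-left of the image, a corner pair given as (x_min, y_min) and (x_max, y_max), and a single-point format anchored at the top-left corner. The function names are hypothetical.

```python
def corners_to_xywh(x_min, y_min, x_max, y_max):
    """Two-corner format -> (x, y, w, h) anchored at the top-left corner."""
    return (x_min, y_min, x_max - x_min, y_max - y_min)


def xywh_to_corners(x, y, w, h):
    """(x, y, w, h) anchored at the top-left corner -> two-corner format."""
    return (x, y, x + w, y + h)


def all_four_corners(x_min, y_min, x_max, y_max):
    """Derive all four corners from the two stored ones, in clockwise order
    starting at the top-left (y grows downward in image coordinates)."""
    return [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
```

For example, `corners_to_xywh(10, 20, 50, 80)` gives a box at (10, 20) that is 40 wide and 60 high, and `xywh_to_corners` maps it back to the original corners.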

When do you want to use bounding boxes?
Use them when the primary purpose of your model or system is to detect or localize an object of interest. Uses of object detection range from activity recognition and face detection to face recognition, video object co-segmentation, and similar tasks.

Polygonal Segmentation

The drawback of bounding boxes is that they cannot fully delineate the shape of the object, only its general position. Polygonal segmentation addresses this problem. The approach relies on placing a series of points around the object and connecting them to form a polygon that encloses it. Although annotations drawn by humans are not pixel-perfect, the polygon provides adequate data regarding the shape, size, and location of the object.

The polygons are stored in various formats, for example as a list of points corresponding to the vertices of the polygon. Commonly, this is either a list of (x, y) pairs (a list of lists) or a single flat list in which the x and y values of consecutive vertices alternate.
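
The two layouts carry the same information and are easy to convert. Below is a minimal sketch with hypothetical helper names. The area computation uses the shoelace formula and is included only to show that the stored vertices are enough to recover the size of the object.

```python
def flatten(points):
    """[(x1, y1), (x2, y2), ...] -> [x1, y1, x2, y2, ...]"""
    return [coord for point in points for coord in point]


def unflatten(flat):
    """[x1, y1, x2, y2, ...] -> [(x1, y1), (x2, y2), ...]"""
    if len(flat) % 2 != 0:
        raise ValueError("a flat polygon needs an even number of values")
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def polygon_area(points):
    """Area of a simple (non self-intersecting) polygon, shoelace formula."""
    total = 0
    count = len(points)
    for i in range(count):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % count]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2
```

For a 4 by 3 rectangle given as `[(0, 0), (4, 0), (4, 3), (0, 3)]`, `flatten` yields `[0, 0, 4, 0, 4, 3, 0, 3]` and `polygon_area` returns `12.0`.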

When do you want to use polygonal segmentation?
Use it when the system being built must capture not only the position of an object of interest but also its shape and size. This makes polygonal segmentation the way to go for most segmentation tasks.

How can Ango AI help?

Ango AI provides an end-to-end, fully managed data labeling service for AI teams, including image annotation. With our ever-growing team of labelers and our in-house labeling platform, we provide efficient and convenient labeling for your raw data.

Our labeling software allows our annotators to label images with both bounding boxes and polygons in a fast and efficient way. After labeling, our platform also allows reviewers to verify that our labelers’ work is satisfactory and meets or exceeds our high quality requirements.

Once done, we export the annotations in various formats such as COCO or YOLO, among others, depending on the project.
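
To give a sense of how these two formats differ for bounding boxes, here is a small illustrative conversion. It is a sketch, not the platform's export code. It assumes the usual conventions: a COCO box is `[x, y, width, height]` in pixels with (x, y) at the top-left corner, and a YOLO label line is `class x_center y_center width height` with all four values normalized by the image size. The function names are hypothetical.

```python
def coco_to_yolo(bbox, image_width, image_height):
    """Convert a COCO-style [x, y, w, h] pixel box to normalized YOLO values."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / image_width,   # normalized x of the box center
        (y + h / 2) / image_height,  # normalized y of the box center
        w / image_width,             # normalized width
        h / image_height,            # normalized height
    )


def yolo_line(class_id, bbox, image_width, image_height):
    """Format one object as a line of a YOLO label file."""
    values = coco_to_yolo(bbox, image_width, image_height)
    return f"{class_id} " + " ".join(f"{value:.6f}" for value in values)
```

For a 400 by 200 image, the COCO box `[100, 50, 200, 100]` is centered in the image and covers half of each dimension, so every normalized value is 0.5.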

To bring labeling speed to the next level, these tools will soon be supplemented by smart annotation techniques using AI assistance, drastically reducing the time of such tasks, from minutes to a matter of seconds.