A short while ago, our team of data annotators labeled more than 23 thousand images of faces, classifying them by age, gender, hair color, beard and mustache color (if present), eye color, and glasses. We then released the annotated dataset free to the public. You can download the annotated face classification dataset here.
The dataset consists of 23032 face images. Each image was labeled by two independent annotators. The assets for our Face Classification Dataset were taken from the open Flickr-Faces-HQ (FFHQ) dataset. We analyzed the annotations created by our team, and we show our findings in this article.
The first graph illustrates the gender distribution over the dataset. Our annotators found 10472 “Male” and 12591 “Female” assets in the dataset. In addition, 254 people were labeled as “Not sure”. Relative to the size of the dataset, the number of conflicting labels was minimal, so it can be ignored.
The conflicting results are added in the figures as a value of 0.5 for each key. To illustrate, if one annotator labels “Male”, but a second annotator labels “Female” for the same image, both the “Male” and “Female” columns’ “conflict” sections are increased by 0.5.
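The counting rule above can be sketched in a few lines of Python. This is only an illustration, not the code used for the figures: the function name `tally_labels`, the input format (one pair of labels per image), and the toy data are assumptions, as is counting an image as 1 for a label when both annotators agree.

```python
from collections import defaultdict

def tally_labels(pairs):
    """Split label counts into 'agreed' and 'conflict' parts for one attribute.

    `pairs` holds one (first_annotator_label, second_annotator_label)
    tuple per image. This input format is an assumption for illustration.
    """
    agreed = defaultdict(float)
    conflict = defaultdict(float)
    for label_a, label_b in pairs:
        if label_a == label_b:
            # Both annotators agree: the image counts fully for that label.
            agreed[label_a] += 1.0
        else:
            # Disagreement: each of the two labels receives half an image.
            conflict[label_a] += 0.5
            conflict[label_b] += 0.5
    return dict(agreed), dict(conflict)

# Hypothetical toy data, not taken from the dataset.
pairs = [
    ("Male", "Male"),
    ("Male", "Female"),
    ("Female", "Female"),
    ("Female", "Not sure"),
]
agreed, conflict = tally_labels(pairs)
print(agreed)
print(conflict)
```

With this scheme, the agreed and conflict values across all labels add up to the number of images, so each image contributes exactly one unit to the chart.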
876 images were labeled as “Baby (0-2)” by both annotators. Similarly, 320 images were labeled as “Baby (0-2)” by one annotator and “Child (3-9)” by the other. The conflict between the annotators peaked between the young and adult age groups. Since faces in these two groups differ less from each other than faces in other groups do, this conflict is expected. The conflicts in each age group can be seen in Figures 3 and 4.
The following graph analyzes the hair color distribution of the dataset. According to the graph in Figure 5, the most common hair color is “Brown”, accounting for 31% of the total set. It is followed by “Black” and “Blonde”. Since “Black” and “Brown” are similar to each other, the conflict reaches its maximum level between these two labels.
The next graph analyzes the beard color distribution of the dataset. According to the dataset, 82.9% of the assets have no beard and they are labeled as “No hair”. The conflict reaches the maximum between “Black” and “Brown” colors, as seen in the figure below.
The next graph analyzes the mustache color distribution of the dataset. According to the dataset, 82.2% of the images have no mustache and they are labeled as “No hair”. The conflict reaches the maximum between “Black” and “Brown”, like in the hair color graph. It can be noted that the graphs of the beard and the mustache are close to one another.
The next graph analyzes the eye color distribution of the dataset. According to the labels, the most common eye color is “Brown”. Since it is hard to distinguish between eye colors, the conflict reaches its maximum level at the “Not visible” label, as can be seen in Figure 11.
The following graph shows the distribution of glasses in the dataset. It is clear in Figure 13 that most people wear no glasses.
Let’s analyze the dataset in more depth. For the following graphs, only the labels from the first annotator were used.
Most men and women in the set are classified as adults. Although the number of “Young” women in the set is very close to the number of “Adult” women, adults clearly dominate among men. Either the women in the dataset were, on average, younger, or they appeared younger to our annotators.
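A breakdown of one attribute by another, such as age group by gender, is a cross-tabulation of the first annotator's labels. The sketch below shows one way to compute it. The record layout, the key names `gender` and `age`, and the toy values are hypothetical and do not reflect the actual export format of the dataset.

```python
from collections import Counter

def cross_tab(records, row_key, col_key):
    """Count images for every (row value, column value) combination."""
    return Counter((rec[row_key], rec[col_key]) for rec in records)

# Hypothetical toy records standing in for the first annotator's labels.
first_annotator = [
    {"gender": "Female", "age": "Young"},
    {"gender": "Female", "age": "Adult"},
    {"gender": "Male", "age": "Adult"},
    {"gender": "Male", "age": "Adult"},
]

counts = cross_tab(first_annotator, "gender", "age")
for (gender, age), n in sorted(counts.items()):
    print(f"{gender:<8}{age:<8}{n}")
```

The same helper can be reused for the other breakdowns by changing the two keys, for example hair color by age group or glasses by gender.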
According to the hair color distribution graph, the most popular color in each age category is brown, except in the baby category. Most baby images in the dataset have blonde hair.
The following figures are very close to each other. Figure 17 represents the beard color distribution by age group, and Figure 18 represents the mustache color distribution by age group. Since babies have neither a beard nor a mustache, the color columns for babies are empty, as expected. Further, it appears that the vast majority of people sport neither a beard nor a mustache.
The following figure shows the eye color by age group in the dataset. Brown was by far the most popular color in all age categories.
According to the following figure, prescription glasses are most common in the adult category. In addition, people with no glasses are the vast majority in each category.
According to the hair color distribution graph, black hair color is the most popular among males. Most females appear to have brown hair.
The following figures are really close to each other. Figure 22 represents the beard color distribution by gender, and figure 23 represents mustache color distribution by gender. Most people do not have any beard or mustache. The most popular color both in beard and mustache is black.
The following figure shows the eye color distribution by gender. It appears that most people have brown eyes, with blue in second place for all genders.
The following graph illustrates the images in the dataset by type of glasses. It can be seen in figure 25 that most prescription glasses users are male. However, most faces annotated in the set have no glasses.
The following graph illustrates the conflict rates between annotators for each feature. Since eye color and age group are difficult to judge from an image alone, these are the features where most conflicts occurred.
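A per-feature conflict rate can be computed as the share of images on which the two annotators chose different labels. The snippet below is a minimal sketch under that assumption; the exact definition behind the graph is not stated in this article, and the feature names and label pairs are made-up toy data.

```python
def conflict_rate(pairs):
    """Return the fraction of images whose two labels differ."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    disagreements = sum(1 for label_a, label_b in pairs if label_a != label_b)
    return disagreements / len(pairs)

# Hypothetical toy annotations: one (annotator 1, annotator 2) pair per image.
annotations = {
    "gender": [
        ("Male", "Male"), ("Female", "Female"),
        ("Male", "Female"), ("Female", "Female"),
    ],
    "eye_color": [
        ("Brown", "Not visible"), ("Blue", "Blue"),
        ("Brown", "Black"), ("Brown", "Brown"),
    ],
}

rates = {feature: conflict_rate(pairs) for feature, pairs in annotations.items()}
for feature, rate in rates.items():
    print(f"{feature}: {rate:.0%}")
```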
Ango AI provides data labeling solutions for AI teams of all sizes and industries. Our data labeling platform, Ango Hub, is used by dozens of industry-leading companies to label millions of data points monthly. Hub is the most versatile platform in the market, supporting 15+ file types and 20+ annotation tools. It’s also free to try here.
Ango AI also offers an end-to-end, fully managed data labeling service, Ango Service, used by customers all over the world to label data ranging from banking, to insurance, government, medical, and more. We know all of our annotators personally and do not outsource. Book a call with us to learn more.
Authors: Onur Aydın, Kıvanç Değirmenci