How to Build Great Computer Vision Datasets in 2023

In recent years, computer vision has become a rapidly growing subfield of artificial intelligence. At Theos AI we see this first-hand, so in today’s article we will explain how to create great computer vision datasets for your specific task. We will share the best practices you should follow so that you can train a great AI model.

*How many images do I need to train a computer vision model? - Best Practices 2023*

We will mainly talk about object detection, but these principles also apply to image classification, segmentation and most other computer vision tasks.

Introduction

In object detection, the objective is not only to classify the objects within the image, but also to locate them. In practice, we have to predict a bounding box (a set of (x, y) coordinates) and a class for each object, as in the following picture.

*Example of an object detection algorithm locating Messi and a football.*
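To make this concrete, a single detection can be thought of as a box plus a class label. The sketch below is only an illustration: the `Detection` class, its field names and the pixel values are assumptions, not the format of any particular labeling tool.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Detection:
    """One detected object: a box in pixel coordinates plus a class label."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str

    def width(self) -> float:
        return self.x_max - self.x_min

    def height(self) -> float:
        return self.y_max - self.y_min

    def center(self) -> Tuple[float, float]:
        # Midpoint of the box along each axis.
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


# Hypothetical values for an image like the one above.
detections = [
    Detection(120, 40, 300, 460, "person"),
    Detection(310, 400, 370, 460, "ball"),
]

for d in detections:
    print(d.label, d.center(), d.width(), d.height())
```

Storing the two corners is just one possible convention; some formats store the center, width and height instead.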

How Many Images Do I Need to Train a Computer Vision Model?

This is obviously the most important question. To answer it, we need to consider several things, so that the model trained on our data can detect patterns effectively and accurately.

The number of images needed depends on how we answer the following questions:

How much bounding box precision and class accuracy do I really need?
How much variability will the model see in production?
How many objects are present in each image?
How many classes are present in each image?

It's important to have a good understanding of these factors to estimate the amount of data required to build a robust computer vision model.

Bounding Box Precision

Imagine you're developing an app where you take a picture and it accurately measures an object, such as a table or a vehicle. To achieve this, the bounding box precision needs to be incredibly high. Any inaccuracy in position or size will result in incorrect measurements, so it's essential that the model has plenty of examples to learn from in order to predict precise bounding boxes.
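One common way to put a number on bounding box precision is intersection over union (IoU): the area where the predicted and labeled boxes overlap, divided by the area they cover together. The article does not prescribe a metric, so treat IoU here as an assumption. The `iou` function and the example boxes are a minimal sketch.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so boxes that do not overlap give an intersection of 0.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: the prediction is shifted right by 10% of the box width.
labeled = (0, 0, 100, 100)
predicted = (10, 0, 110, 100)
print(round(iou(labeled, predicted), 3))
```

A shift of this size still gives an IoU of roughly 0.82, which many detection benchmarks would count as a match, yet it would throw off a measurement by 10% of the object's width. That gap is why a measuring app needs far tighter boxes than a typical detector.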

Also, you must label every instance of every object in all of the images in your dataset. If you look at the first example, you will see that we forgot to label one person. When this happens, the AI model gets confused: it will try to predict a bounding box there and, because we did not label it, training punishes the model for a prediction that was in fact correct.

Class Accuracy

Let's say you're developing a security system that detects guns in people's hands, but most of the time they will be holding something else, like a mobile phone. In this scenario, high class accuracy is crucial. If the app wrongly classifies a mobile phone as a gun, it could incorrectly trigger an immediate police response. So, in this case, you want to ensure that the system accurately identifies the object in question, not just the bounding box around it. Here the exact precision of the bounding box matters much less than the class accuracy, which is the most important factor.
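For a case like this, one number worth tracking is how often a "gun" prediction is actually a gun. The sketch below computes that from pairs of true and predicted labels. The function name, the label strings and the sample data are all hypothetical.

```python
def precision_for_class(pairs, target):
    """Share of predictions of `target` that were correct.

    `pairs` is a list of (true_label, predicted_label) tuples.
    Returns None when the class was never predicted.
    """
    true_labels = [true for true, pred in pairs if pred == target]
    if not true_labels:
        return None
    correct = sum(1 for true in true_labels if true == target)
    return correct / len(true_labels)


# Hypothetical evaluation results: (what the object really was, what the model said).
results = [
    ("gun", "gun"),
    ("phone", "gun"),    # false alarm: a phone classified as a gun
    ("phone", "phone"),
    ("phone", "phone"),
    ("gun", "gun"),
]

print(precision_for_class(results, "gun"))
```

In this made-up sample, one in three "gun" predictions is a false alarm, which would be far too many for a system that calls the police.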

Variability

Moving on, another factor to consider is the variability that the AI model will see in real-world usage. In a controlled environment, such as a manufacturing line where a camera inspects products for defects, the conditions are largely consistent. The lighting and backgrounds stay the same, and the products are similar to each other. In this scenario, you won't need a large number of examples to train a highly effective model.

However, when it comes to more variable conditions, like a self-driving car that needs to work in different countries and weather conditions, you will require many examples to train the model to work robustly.

Self-driving Cars

*Self-driving car object detection labeling on a summer day.*

*Self-driving car object detection labeling on a winter day.*

Number of Objects

Another important aspect to consider is the number of objects present in each image. For instance, if you're creating a mobile app that counts the fruits on a tree, each image may contain 500 fruits. In this case, you may not need many images to get an accurate model, as each image contains a lot of information. On the other hand, if you're making an AI model that detects the ball in a football match, you may need many more images, as there is only one ball per image and therefore much less information for the model to learn from in each one.
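The arithmetic behind this is simple: what the model learns from is labeled objects, and the number of labeled objects is roughly the number of images times the objects per image. The sketch below shows how many images each scenario would need to reach the same number of labeled objects. The target of 50,000 is a made-up figure for illustration, not a recommendation.

```python
import math


def images_needed(target_instances, objects_per_image):
    """Images required to collect `target_instances` labeled objects."""
    if objects_per_image <= 0:
        raise ValueError("objects_per_image must be positive")
    return math.ceil(target_instances / objects_per_image)


# Hypothetical target: 50,000 labeled objects in total.
target = 50_000

print(images_needed(target, 500))  # fruit on a tree, about 500 per image
print(images_needed(target, 1))    # one ball per image
```

This is only a rough guide. Objects that share an image also share its lighting and background, so 500 fruits in one photo carry less variety than 500 fruits spread over many photos.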

In summary, the amount of variability in the environment and the number of objects present in each image are key factors to consider when building your dataset.

Number of Classes

Suppose you want to train a model to detect different flavors of coffee capsules. This would require a large number of images, especially if you have 100 flavors of coffee. You would need more than 100 images for each flavor, such as 800 or a thousand, to provide enough information for the model to detect patterns among the different flavors.

On the other hand, if you want to detect a single type of gun, such as a pistol, you would need fewer images, as pistols look very similar to one another. In general, the more classes you have, the more images you need.
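As a back-of-the-envelope check on the coffee capsule example, the sketch below multiplies the number of classes by the images per class. It assumes each image shows a single flavor, which is a simplification made here, so the result is an upper bound: images that contain several flavors would reduce the total.

```python
def total_images(num_classes, images_per_class):
    """Upper bound on dataset size when every image shows exactly one class."""
    return num_classes * images_per_class


# Figures taken from the coffee capsule example above.
for per_class in (800, 1000):
    print(per_class, "images per flavor ->", total_images(100, per_class), "images")
```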

More Real World Examples

Now, let’s take a look at some more real-world examples to develop insight into how many images we expect to need for each one.

Blood Cell Counting

Here's an example of a dataset for blood cell detection and counting. The dataset has three classes: platelets, red cells and white cells. The images were taken under a microscope, and the background is expected to be the same. This reduces the number of images needed, as the images are very similar and there are only three classes of objects. The white cells are bigger and purple, the red cells are red and circular, and the platelets are small and purple. These patterns are easy to tell apart, so we probably won’t need many images to train a robust model.

Orange Fruit Counting

*Counting Oranges with Object Detection.*

Suppose you want to make a model to count the number of oranges on a tree. This has one class, oranges, and there are many examples in each image. Around 100 images like this could be enough for a good model. However, if you want the same model to detect bananas, apples and other fruits, as well as vegetables, you would need more images and examples, as the model would have to learn the differences between the classes.

Supermarket Product Counting

*Detecting Products in Supermarkets with Object Detection.*

Suppose you want to train an AI model to detect all products on the shelves of a supermarket. This has just one class, but the products can look very different from each other. In this example, you would need a lot of images for the model to detect all the different kinds of products that can be present in a supermarket.

Summary

In summary, when creating a computer vision dataset for your specific task, there are several things to consider to ensure that the model will be able to detect patterns effectively and accurately. The number of images needed depends on the number of classes, the similarity between objects within the same class, and the amount of variability in the images. It's important to have a good understanding of these factors to estimate the amount of data required to build a robust model.

Another important factor to consider is the quality of the images. It's crucial to ensure that the images in the dataset are clear and high-resolution, with well-defined objects and minimal noise. This will ensure that the model is able to learn accurate representations of the objects and perform well in real-world scenarios.

It's also important to have a balanced dataset, with a similar number of labels for each class, so that the model is not biased towards certain objects. This can be achieved by carefully selecting the images so that the labels are evenly distributed between the classes.
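A quick way to check balance is to count the labels per class and compare the largest count with the smallest. The sketch below does this with Python's standard `collections.Counter`. The label counts are invented for illustration and reuse the class names from the blood cell example.

```python
from collections import Counter


def class_counts(labels):
    """Count how many labels each class has."""
    return Counter(labels)


def imbalance_ratio(counts):
    """Largest class count divided by the smallest; 1.0 means perfectly balanced."""
    return max(counts.values()) / min(counts.values())


# Hypothetical label list, one entry per bounding box in the dataset.
labels = ["red cell"] * 900 + ["white cell"] * 60 + ["platelet"] * 40

counts = class_counts(labels)
for name, count in counts.most_common():
    print(name, count)
print("imbalance ratio:", imbalance_ratio(counts))
```

A high ratio like this one suggests collecting more images that contain the rare classes before training.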

Finally, it's important to annotate all of the images in the dataset by adding bounding boxes around the objects of interest, without leaving even a single object unlabelled, so that the model can learn to correctly detect and differentiate between objects.
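Missing labels cannot be found automatically without looking at the images, but a simple first pass is to list the images that have no annotations at all and review them by hand. The sketch below assumes annotations are held in a dictionary from image name to a list of boxes. That structure and the file names are hypothetical.

```python
def images_without_labels(annotations):
    """Return the names of images that have no bounding boxes at all."""
    return sorted(name for name, boxes in annotations.items() if not boxes)


# Hypothetical annotations: image name -> list of (x_min, y_min, x_max, y_max, label).
annotations = {
    "match_001.jpg": [(310, 400, 370, 460, "ball")],
    "match_002.jpg": [],
    "match_003.jpg": [(120, 40, 300, 460, "person"), (500, 380, 560, 440, "ball")],
}

print(images_without_labels(annotations))
```

An image with no boxes is not necessarily a mistake, since it may simply contain none of the objects of interest, so this list is a starting point for review rather than a list of errors.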

In conclusion, creating a great computer vision dataset is a crucial step in building a successful and robust AI model. The number of images you need depends on the number of classes and the similarity between the examples. It is important to consider these factors before collecting your dataset to ensure that your model works well in production.

By following these best practices, you can ensure that your model will have the information it needs to accurately detect patterns and perform well in real-world scenarios.

The End

I hope you enjoyed reading this as much as we enjoyed making it!

If you found this cool or helpful, please consider subscribing to our YouTube Channel and sharing this blog post with your friends and colleagues, as it will help us know whether you want us to make more of these!

Also, consider joining our Discord Server where we can personally help you make your computer vision project successful!

If you have further comments or requests, don’t hesitate to reach out at contact@theos.ai

See you on the next one!
