Computer Vision history and trends


Once tools for inputting images into a computer appeared and computers became powerful enough to process them, it became possible to use computers for image analysis. From an applied point of view, of course, this made sense in cases where a computer could perform the processing faster or better than a person. By the early 1990s, fairly well-functioning programs for automatic recognition of printed text (OCR) were available.

The emergence of the ability to capture real-time images on a computer, using digital sensors or by digitizing the signal from analog sensors, made it possible to use software to solve a wider range of tasks.

The term “computer vision” came into widespread use much later than the tasks now assigned to this field appeared (incidentally, a similar thing happened with the term “machine learning”). Here are examples of some typical computer vision tasks:

  • detection of objects (determining which part of the image the object is in)
  • tracking objects
  • counting objects
  • pattern recognition
  • determining the location or size of an object (a discipline such as photogrammetry also deals with such tasks)
  • building 3D models

Algorithms in which both the input and the output are images are not computer vision in themselves, although they can be used as components of computer vision software; such image-to-image algorithms belong to the field of “image processing”.
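The distinction can be illustrated with a toy sketch (the functions below are hypothetical examples, not from any particular library): an image-processing step maps an image to another image, while a computer-vision step maps an image to information about the scene.

```python
def box_blur(img):
    """Image processing: image in, image out (3x3 mean filter, edges clamped)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sum(vals) / 9.0
    return out

def count_bright_blobs(img, thresh=0.5):
    """Computer vision: image in, a number out (count of 4-connected bright regions)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    blobs = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] > thresh and not seen[y][x]:
                blobs += 1
                stack = [(y, x)]  # flood-fill one connected region
                while stack:
                    cy, cx = stack.pop()
                    if 0 <= cy < h and 0 <= cx < w and img[cy][cx] > thresh and not seen[cy][cx]:
                        seen[cy][cx] = True
                        stack += [(cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)]
    return blobs
```

The blur could well appear inside a vision pipeline as a preprocessing step, which is exactly the point: image processing is often a component of computer vision, not a substitute for it.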

The growing popularity of computer vision was driven by improving image sensor characteristics, falling sensor prices, and increasing computing power.

In parallel with this, new image-analysis algorithms were developed; particularly notable are algorithms for fast object detection (for example, the Viola-Jones algorithm) and deep neural networks. The latter approach achieves a sufficiently high recognition rate in many cases, but sometimes the program fails to recognize an image that a person recognizes easily, and a change that is insignificant from a person's point of view can change the result.
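The speed of Viola-Jones comes largely from the integral image, which lets the sum over any rectangle be computed in four lookups, so simple Haar-like contrast features can be evaluated very quickly. A minimal sketch of that core idea (the full detector additionally uses AdaBoost feature selection and a cascade of classifiers, omitted here):

```python
def integral_image(img):
    """ii[y][x] = sum of img over all rows < y and cols < x (padded with a zero row/col)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w*h rectangle with top-left corner (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half of the window."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Because every feature costs a constant number of lookups regardless of its size, thousands of candidate windows per frame can be scanned in real time, which is what made the algorithm practical for face detection on early-2000s hardware.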

So in critical applications, where predictability and the absence of false positives are required, other methods of comparing objects can be used. For example, frontal face recognition in security systems is most often done using biometrics, that is, by measuring the distances between key points of the face. At the same time, neural networks can still be used for the auxiliary task of locating those key points.
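The biometric comparison described above can be sketched as follows. This is a hedged illustration of the general idea, not any particular system: the key-point names and the choice of interocular distance as the normalizing unit are assumptions made here for the example.

```python
import math

def pairwise_distance_signature(points, reference=("left_eye", "right_eye")):
    """Normalized pairwise distances between named key points, in a canonical order.

    Dividing by the distance between the reference points (here the eyes,
    an assumption for illustration) makes the signature scale-invariant.
    """
    names = sorted(points)
    a, b = points[reference[0]], points[reference[1]]
    scale = math.dist(a, b)  # interocular distance as the normalizing unit
    sig = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            sig.append(math.dist(points[names[i]], points[names[j]]) / scale)
    return sig

def signature_difference(sig_a, sig_b):
    """Mean absolute difference between two signatures (lower means more similar)."""
    return sum(abs(x - y) for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

Because the signature depends only on ratios of distances, two images of the same face taken at different scales produce (ideally) the same signature, which is what makes a simple threshold on the difference a predictable decision rule.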

The emergence of OpenCV and a number of other libraries greatly simplified the development of computer vision software; before specialized libraries appeared, developing even relatively simple computer vision programs could be quite laborious.

The result is significantly affected by the equipment used to obtain the images. Capturing a frame is not instantaneous. With a global (frame) shutter, the exposure interval is the same for all pixels; with a rolling (line) shutter, it differs for each line. Clearly, the global shutter has the advantage when processing images of fast-moving objects, but high-resolution image sensors with a global shutter are quite expensive; in addition, a project is sometimes required to use mass-produced mobile devices, and these all use rolling-shutter sensors.
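A back-of-the-envelope calculation shows why a rolling shutter distorts fast motion: each line is read out slightly later than the previous one, so an object moving horizontally is shifted further in later rows, skewing its image. The sensor parameters below are illustrative assumptions, not values from the text.

```python
def rolling_shutter_skew_px(rows, row_readout_s, speed_px_per_s):
    """Horizontal shift (pixels) between the first and last row of one frame.

    rows            -- number of sensor rows read sequentially
    row_readout_s   -- time to read out one row, in seconds
    speed_px_per_s  -- apparent horizontal speed of the object, pixels/second
    """
    total_readout_s = rows * row_readout_s
    return speed_px_per_s * total_readout_s

# e.g. a 1080-row sensor read at ~30 microseconds per row, object moving 1000 px/s:
# skew = 1000 * (1080 * 30e-6) = 32.4 px between the top and bottom rows,
# whereas a global shutter exposes all rows simultaneously, giving zero skew.
```

A skew of tens of pixels is easily enough to break measurements of an object's shape or position, which is why fast-motion applications either pay for global-shutter sensors or compensate for the row-dependent delay in software.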

When using computer vision to solve technical problems, additional illumination tools can be used alongside the camera and computer to aid processing, for example, as in TI's solutions based on micromirror devices.

To solve some of the tasks that a person copes with, it is advisable for the computer, in addition to cameras, to use additional tools to determine distances to objects. For example, in robotic cars, lidars or radars can be used for this purpose.