What exactly is camera calibration?

Machine Vision Salon · Updated on 2024-01-19

Camera calibration can be said to be the foundation of computer vision and machine vision, but it is not easy for beginners to get started. This article sorts out the logic of camera calibration for readers and answers the questions raised in the comment area at the end. It is divided into the following parts:

1. The purpose and significance of camera calibration
2. Simplification and modeling: the pinhole camera model
3. Mathematical description of the pinhole camera model
4. Calibrating the parameters of the pinhole camera model

The world we live in is three-dimensional, while a photograph is two-dimensional, so we can think of the camera as a function: the input is a scene, and the output is a grayscale image. This process from 3D to 2D is irreversible.

The goal of camera calibration is to find a suitable mathematical model and estimate the parameters of this model, so that we can approximate the 3D-to-2D process and, with that approximation in hand, invert it.

This approximation process is called camera calibration: we use a simple mathematical model to express the complex imaging process, and then seek the inverse of imaging. After calibration, the camera can be used to reconstruct the three-dimensional scene, that is, to perceive depth, which is a major branch of computer vision.

When we talk about camera imaging, we are essentially talking about the camera's lens. The fixed structure of the lens determines a fixed object-image conjugation relationship. "Conjugation" means that for an object at a given position in front of the lens, its image must lie at a corresponding position behind the lens; this relationship is fixed. The simplest example: an object at infinity is imaged exactly at the focal point of the lens. "Fixed structure" here means the focal length and aperture of the lens are fixed.
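For reference, this conjugation relationship can be written with the Gaussian thin-lens formula (a standard result not spelled out in the original; $s$ denotes the object distance, $s'$ the image distance, $f$ the focal length):

$$\frac{1}{s} + \frac{1}{s'} = \frac{1}{f}$$

As $s \to \infty$, the term $1/s \to 0$ and therefore $s' \to f$: an object at infinity is imaged exactly at the focal point, as stated above.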

Pictured above is the Canon EF 85mm f/1.2L II USM. We can find a single convex lens with the same object-image conjugation relationship as this lens and use it in the lens's place; we call this convex lens the equivalent lens, drawn as a double-headed arrow, as shown in the figure below.

The equivalence here is only the equivalence of the object-image conjugation relationship, that is, of the optical path. The reason a real lens uses elements of different shapes is mainly to eliminate various aberrations and improve sharpness. In other words, the purpose of the equivalent lens is never to replace the real lens in practical applications (after all, a lens is expensive), but only to help us understand. With it, we can draw a clear sketch of a camera photographing a burning candle, as shown below.

In the sketch, the object point at the flame tip and its image point, and the object point at the candle root and its image point, are marked, together with the center of the equivalent lens (also known as the optical center). The red dotted lines trace two of the imaging rays from the flame tip to its image point, the green dotted lines trace two of the imaging rays from the candle root to its image point, and the red line is the CCD surface.

Note that we said we were drawing a clear sketch of the burning candle: this means both image points fall on the CCD surface. If an image point does not fall on the CCD surface, that is, if the picture captured by the CCD is blurred, how do we determine the position of the image point?

According to the graphical method of geometric optics, by drawing the ray that passes through the focal point of the equivalent lens and the ray that passes through the optical center, we can locate the image points. Now take the clear sketch of the candle scene above as our diagram and consider only the object-image relationship of the flame-tip point and its image.

In this way, we get four imaging rays: the one through the upper edge of the lens, the one through the lower edge of the lens, the one through the focal point of the equivalent lens, and the one through the optical center. They all connect the same object point with the same image point. Obviously, the ray through the optical center yields the simplest mathematical model of the object-image conjugation relationship, so we use it to represent the imaging path and thereby simplify the camera's imaging process.

At this point we notice that the imaging principle of the simplified camera model is very similar to that of a pinhole camera, so we call the simplified model the pinhole camera model. In the figure above, $f$ is the focal length of the pinhole camera model. Note that this is not the focal length of the equivalent lens; the term "focal length" is only borrowed from the lens that converges the light, and here it expresses the distance from the CCD surface to the optical center.

But we say the simplified camera model is only similar to a pinhole camera; the two must not be equated. A pinhole camera works because light travels in straight lines, so a real pinhole camera has no concept of focal length, no aberrations, and no one-to-one object-image correspondence (every object point along a ray maps to the same image point), as shown in the following figure.

To be precise, then, we simplify the camera's imaging process into the pinhole camera model and borrow the simple mathematics of the pinhole camera to express relationships that would otherwise be hard to write down. This greatly reduces the mathematical complexity, but the cost of the simplification is also large: the model ignores aberrations (although it is supplemented with a dedistortion model), ignores depth of field (its object-image relationship is not one-to-one, so it assumes everything is always imaged sharply), and assumes the equivalent lens is a thin lens. The pinhole camera model is therefore only an approximation of a real camera's imaging, and we can even say a very rough one; it fits best those cameras whose structure already resembles a pinhole camera, such as webcams, mobile-phone lenses, and surveillance cameras.

We simplified and modeled the camera imaging process to obtain a pinhole camera model, as shown in the figure below.

First, we set up the camera coordinate system. We take the optical center $O$ as the origin, take the $x$ and $y$ directions along the horizontal and vertical directions of the CCD pixel array, and take the $z$ direction perpendicular to the CCD plane, forming a right-handed three-dimensional coordinate system. Second, we also need the CCD label coordinate system: take the pixel label at the upper-left corner of the CCD as the origin, with the $u$ and $v$ directions along the horizontal and vertical directions of the CCD pixel array; this is a two-dimensional coordinate system. For convenience of description, we then flip the pinhole camera model symmetrically, as in the image below; the two are mathematically equivalent.

Starting from the optical center $O$ on the optical axis, the image plane lies at $z = f$, where $f$ is the physical focal length of the camera (unit: mm). A point $P$ in space has position $(x, y, z)$ in the camera coordinate system. Its image point $p$ on the image plane has two equivalent position descriptions:

1. its position $(x', y', f)$ in the camera coordinate system;
2. its position $(u, v)$ in the CCD label coordinate system.

In the absence of lens distortion, the optical center $O$, the image point $p$, and the object point $P$ lie on one straight line. Let $d_x$ and $d_y$ be the physical sizes of a single CCD pixel in the horizontal and vertical directions (unit: mm/pixel), so that the pixel focal lengths are defined as $f_x = f/d_x$ and $f_y = f/d_y$ (unit: pixels). The offset of the optical axis from the origin of the CCD label coordinate system is $(u_0, v_0)$ (unit: pixels). From similar triangles we obtain:

(1) Association between the two-dimensional CCD label coordinates and the two-dimensional physical coordinates on the image plane:

$$u = \frac{x'}{d_x} + u_0, \qquad v = \frac{y'}{d_y} + v_0$$

(the third coordinate can be omitted because the image plane lies at $z = f$).

(2) Association between the two-dimensional physical coordinates of the image point and the three-dimensional coordinates of the object point:

$$x' = f\,\frac{x}{z}, \qquad y' = f\,\frac{y}{z}$$

(3) Association between the two-dimensional CCD label coordinates of the image point and the three-dimensional coordinates of the object point, obtained by chaining (1) and (2):

$$u = f_x\,\frac{x}{z} + u_0, \qquad v = f_y\,\frac{y}{z} + v_0$$

This last association is the one actually obtained by calibration. OpenCV and the MATLAB calibration toolbox use association (3) directly, so the size of a single CCD pixel need not be known; as a consequence, the physical focal length cannot be obtained from calibration, only the pixel focal lengths $f_x, f_y$.

It is easy to see that association (3) is underconstrained, that is, the pinhole camera model itself is underdetermined: from an illuminated CCD pixel we can only know that the object point lies somewhere on a ray, not which point on the ray it is. That is why we say the pinhole camera model is a ray-equation model. The ray through pixel $(u, v)$ is:

$$x = \frac{u - u_0}{f_x}\,z, \qquad y = \frac{v - v_0}{f_y}\,z, \qquad z > 0$$

The relationships above hold only without lens distortion. A real lens does distort: the light between the image point and the object point is, so to speak, bent, and the distortion must be eliminated before the ray model is recovered.

(4) The supplementary dedistortion model. Let $(x_n, y_n) = (x/z,\ y/z)$ be the normalized coordinates, centered on the principal point $(u_0, v_0)$ of the image plane, and let $r^2 = x_n^2 + y_n^2$ be the squared distance from the point to the center. The combined distortion is:

$$x_d = x_n\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x_n y_n + p_2 (r^2 + 2 x_n^2)$$
$$y_d = y_n\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y_n^2) + 2 p_2 x_n y_n$$

where $k_1, k_2, k_3$ describe the radial distortion and $p_1, p_2$ the tangential distortion. With the dedistortion model supplementing the pinhole camera model, the association between image-point labels and object-point coordinates is modified to:

$$u = f_x\,x_d + u_0, \qquad v = f_y\,y_d + v_0$$

and the ray equation after dedistortion is obtained by inverting this mapping back to $(x_n, y_n)$.

In the pinhole camera model, once these 9 parameters $(f_x, f_y, u_0, v_0, k_1, k_2, k_3, p_1, p_2)$ are determined, the model is uniquely determined, and determining them is exactly what we call camera calibration. The first 4 are called the intrinsic parameters and the last 5 the distortion parameters; the distortion parameters supplement the intrinsics.
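To make associations (1) through (4) concrete, here is a minimal Python sketch that projects an object point from camera coordinates to a pixel label. The distortion terms follow the same convention OpenCV uses; all numeric values are made up for the example.

```python
import numpy as np

def project_point(P, fx, fy, u0, v0, k1, k2, k3, p1, p2):
    """Project a 3D point P = (x, y, z) in camera coordinates to the
    pixel label (u, v) using the pinhole model plus distortion."""
    x, y, z = P
    # (2) perspective division: normalized image coordinates
    xn, yn = x / z, y / z
    # (4) radial + tangential distortion
    r2 = xn**2 + yn**2
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = xn * radial + 2 * p1 * xn * yn + p2 * (r2 + 2 * xn**2)
    yd = yn * radial + p1 * (r2 + 2 * yn**2) + 2 * p2 * xn * yn
    # (1) + (3) pixel focal lengths and principal point -> pixel labels
    u = fx * xd + u0
    v = fy * yd + v0
    return u, v

# Example: a point 2 m in front of the camera, slightly off-axis
u, v = project_point((0.1, 0.05, 2.0), fx=800, fy=800, u0=320, v0=240,
                     k1=-0.2, k2=0.05, k3=0.0, p1=0.001, p2=0.001)
print(u, v)
```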
So once the camera structure is fixed, including the lens structure and the focusing distance, we can approximate the camera with these 9 parameters. The "fixed lens structure" here, in my personal understanding, should include a fixed aperture in addition to a fixed focal length: changing the aperture not only changes the depth of field but may also shift the position of the optical center in the pinhole camera model, although the effect is not very large. This means that if a calibrated camera changes its aperture size, the calibration error will grow, but it should not become unacceptably large.

For the pinhole camera model itself, the equation that needs to be fitted is:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_d \\ y_d \\ 1 \end{bmatrix}, \qquad (x_d, y_d) = \mathrm{distort}\!\left(\frac{x}{z}, \frac{y}{z}\right)$$

where $\mathrm{distort}(\cdot)$ represents the transformation between the dedistorted and the distorted image.
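In OpenCV, this dedistortion transform is exposed directly. A minimal sketch, assuming the intrinsics and distortion coefficients came from some earlier calibration; the numeric values and the file name are placeholders:

```python
import cv2
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.2, 0.05, 0.001, 0.001, 0.0])  # k1 k2 p1 p2 k3 (OpenCV order)

# Full-image dedistortion
img = cv2.imread("image.png")
undistorted = cv2.undistort(img, K, dist)

# Point-wise: map distorted pixel labels back to ideal (dedistorted) ones
pts = np.array([[[100.0, 200.0]]])              # shape (N, 1, 2), as OpenCV expects
ideal = cv2.undistortPoints(pts, K, dist, P=K)  # P=K returns pixel coordinates
```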

So our task now is to find a large number of corresponding image points $(u, v)$ and object points $(x, y, z)$, and use them as samples to train the 9 parameters of the model.

Two questions then arise: (1) among so many image points and object points, which ones form corresponding pairs? (2) even if I know where an object point is in space, how do I express its position in the camera coordinate system?

To solve these two problems, the calibration board came into being. The first function of the calibration board is to determine the correspondence between object points and image points. The principle used here is perspective invariance: for example, whether you look at a person up close or from afar, although the apparent size of the nose changes and the perspective changes with it, the topology does not change, and you will never mistake the nose for the mouth.

Therefore, a known topology is printed on the calibration board. Checkerboards and dot grids are the most widely used; they became mainstream not only because their topology is clear and uniform, but more importantly because the algorithms that detect them are simple and effective. Checkerboard detection finds corner points, which can be obtained by computing gradients along the horizontal and vertical directions of the captured image; dot-grid detection only needs to compute the centroid of each dot. If you have developed a perfect algorithm for detecting all the features of a face, you could even use your own photo as a calibration board. In my experience, the dot grid works better than the checkerboard, because the perspective invariance of a dot centroid is much more stable than that of a checkerboard corner. The figure below compares a checkerboard and a dot grid of the same size and proportion at the maximum reprojection error; the red crosses are the extracted corners or centroids, and the green circles are the positions predicted by the pinhole camera model.
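As an illustration of how simple checkerboard detection is in practice, here is a minimal OpenCV sketch; the file name and the 9x6 inner-corner count are assumptions about your setup:

```python
import cv2

img = cv2.imread("board.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Detect the 9x6 inner corners of the checkerboard
found, corners = cv2.findChessboardCorners(gray, (9, 6))
if found:
    # Refine each corner to sub-pixel accuracy
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)
    corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
    cv2.drawChessboardCorners(img, (9, 6), corners, found)
```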

The following figure shows the reprojection errors of the checkerboard and the dot grid; it is obvious that the spread of the dot grid's reprojection error is smaller.

However, dot-grid detection seems to be covered by a HALCON patent (I am not certain of this), so the OpenCV and MATLAB calibration toolboxes use checkerboards, and you would have to write your own detection algorithm to use a dot grid. The calibration boards mentioned below are all checkerboards.

The second function of the calibration board is to transform its corner points into coordinates in the camera coordinate system.

Beginners in calibration easily overlook the fact that the calibration board carries its own coordinate system, the calibration-board coordinate system. In other words, the position of every corner point on the board is determined and known in the board's own coordinate system.
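For example, here is a minimal sketch of those known positions for a hypothetical 9x6 checkerboard with 25 mm squares; every corner lies in the board plane, so z = 0 for all of them:

```python
import numpy as np

cols, rows, square = 9, 6, 25.0  # assumptions about your board
objp = np.zeros((rows * cols, 3), np.float32)
objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square
# objp[k] = (i * 25, j * 25, 0): corner k in the board coordinate system
```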

The elements of the transformation from the calibration-board coordinate system to the camera coordinate system are what we call the extrinsic parameters. In my eyes, the extrinsic parameters of camera calibration are simply a by-product of calibrating the intrinsics: they change with every placement of the board. The transformation from the board coordinate system to the camera coordinate system can be expressed by the following formula:

$$P_c = R_{b2c}\,P_b + T_{b2c}$$

where $R_{b2c}$ is called the rotation matrix and $T_{b2c}$ the translation vector, and the subscript $b2c$ stands for "board to camera". Note that this coordinate transformation is linear; physically, this means the calibration board must be as flat as possible, because if the board is not flat the transformation is no longer linear. Bringing this transformation into the equation we set out to fit:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_d \\ y_d \\ 1 \end{bmatrix}, \qquad (x_d, y_d) = \mathrm{distort}\!\left(\frac{x_c}{z_c}, \frac{y_c}{z_c}\right), \qquad \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = R_{b2c}\,P_b + T_{b2c}$$
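This full chain (extrinsics, perspective division, distortion, intrinsics) is exactly what cv2.projectPoints evaluates. A minimal self-contained sketch; the board setup repeats the previous sketch, and the intrinsics, distortion coefficients, and extrinsics are made-up placeholders:

```python
import cv2
import numpy as np

# Board-frame corner coordinates (same hypothetical 9x6, 25 mm board)
cols, rows, square = 9, 6, 25.0
objp = np.zeros((rows * cols, 3), np.float32)
objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.2, 0.05, 0.001, 0.001, 0.0])  # k1 k2 p1 p2 k3

rvec = np.array([0.1, -0.2, 0.05])  # board -> camera rotation (Rodrigues vector)
tvec = np.array([0.0, 0.0, 500.0])  # board -> camera translation, in mm

# Applies R_b2c/T_b2c, perspective division, distortion, and K in one call
projected, _ = cv2.projectPoints(objp, rvec, tvec, K, dist)
# projected[k] is the pixel label the model predicts for corner k
```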

In this way, the pixel labels of the corner points captured on the CCD are put into correspondence with the known coordinates of those same corners in the calibration-board coordinate system, and by shooting the board in various poses we collect the samples with which all the parameters are trained.
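Putting the pieces together, here is a minimal end-to-end sketch using cv2.calibrateCamera; the 9x6 board, the 25 mm square size, and the calib/*.png image path are assumptions about your setup:

```python
import glob

import cv2
import numpy as np

# Known board-frame corner coordinates for the assumed board
cols, rows, square = 9, 6, 25.0
objp = np.zeros((rows * cols, 3), np.float32)
objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, (cols, rows))
    if found:
        obj_points.append(objp)     # known board-frame coordinates
        img_points.append(corners)  # detected pixel labels

# rms is the reprojection error; rvecs/tvecs are the per-pose extrinsics,
# the "by-products" mentioned above
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error (pixels):", rms)
```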

As for the methods of fitting the parameters, least squares, maximum likelihood estimation and so on, relevant material is easy to find, so I will not repeat it here. If you calibrate with OpenCV or the MATLAB calibration toolbox, you must supply the physical size of the checkerboard squares, which is precisely what establishes the calibration-board coordinate system. From a metrology point of view, the accuracy of the calibration board is the benchmark of camera-calibration accuracy and the first link in the error-propagation chain. Therefore, to make the pinhole camera model resemble the real camera as closely as possible, the quality of the calibration board should meet the following requirements (in order of importance):

1. High flatness of the board, with the checkerboard squares at right angles;
2. High consistency of the size of each square on the board;
3. Small difference between the real size and the nominal size.

Finally, pay tribute to the ancestor of calibration, Zhang Zhengyou.

