2D image sensors:

  • Monochrome: black/white (grayscale), a single 2D matrix, e.g. 512x512
  • Color: 3 sets of 2D matrix data for RGB, e.g. 512x512x3
  • Bit depth: data size of each pixel (e.g. 8 bit: 0 ~ 255)
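
As a rough illustration of the sizes involved, the sketch below computes the value range and raw storage size implied by a resolution, channel count, and bit depth. The function names `pixel_range` and `raw_size_bytes` are made up for this example, and the size assumes uncompressed, tightly packed storage.

```python
def pixel_range(bit_depth):
    """Return (min, max) unsigned integer values for a given bit depth."""
    return 0, 2 ** bit_depth - 1


def raw_size_bytes(rows, cols, channels=1, bit_depth=8):
    """Uncompressed size in bytes, assuming tightly packed pixels (no padding)."""
    total_bits = rows * cols * channels * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


print(pixel_range(8))               # (0, 255)
print(raw_size_bytes(512, 512))     # 262144
print(raw_size_bytes(512, 512, 3))  # 786432
```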

Image Coordinates (MATLAB)

Pixel indices:

  • Row and column indices are ordered from top to bottom, and from left to right.
  • In general, there are three indices, with I(r, c, k) denoting row r, column c, and color channel k.

Spatial coordinates:

  • Intrinsic coordinates: represent locations in the image on a continuous plane, so positions between pixel centers are well defined
  • World coordinates: map the intrinsic coordinates to a spatial frame of reference
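
To make the distinction concrete, here is a small Python sketch of the MATLAB-style convention in which the center of the pixel at row r, column c (1-based) has intrinsic coordinates (x, y) = (c, r), so the image spans 0.5 to n + 0.5 along each axis. The world mapping assumes a simple axis-aligned scale and offset. The function names and the `x_limits` / `y_limits` parameters are illustrative assumptions, not MATLAB API.

```python
def index_to_intrinsic(row, col):
    """1-based (row, col) pixel index -> intrinsic (x, y) of the pixel center."""
    return float(col), float(row)


def intrinsic_to_world(x, y, n_rows, n_cols, x_limits, y_limits):
    """Map intrinsic (x, y) to world units, assuming the image edges
    (intrinsic 0.5 and n + 0.5) coincide with the given world limits."""
    sx = (x_limits[1] - x_limits[0]) / n_cols  # world units per pixel along x
    sy = (y_limits[1] - y_limits[0]) / n_rows  # world units per pixel along y
    return x_limits[0] + (x - 0.5) * sx, y_limits[0] + (y - 0.5) * sy


x, y = index_to_intrinsic(row=1, col=1)
print(x, y)  # 1.0 1.0
print(intrinsic_to_world(x, y, 512, 512, (0.0, 51.2), (0.0, 51.2)))
```

Note that x follows the column index and y follows the row index, which is the reverse of the (row, column) ordering used for pixel indices.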

Camera Optics

A thin lens with focal length $f$ forms a sharp image of an object at distance $d_o$ from the lens on an image plane located at distance $d_i$ behind the lens:

$$\frac{1}{d_o} + \frac{1}{d_i} = \frac{1}{f}$$

When the object is far away ($d_o \to \infty$), the image forms at the focal plane ($d_i = f$).
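
A quick numerical check of this behavior, writing $f$ for the focal length, $d_o$ for the object distance, and $d_i$ for the image distance in the thin-lens relation $1/d_o + 1/d_i = 1/f$. This is only a sketch: the function name `image_distance` and the 50 mm focal length are assumptions chosen for illustration.

```python
import math


def image_distance(f, d_o):
    """Solve 1/d_o + 1/d_i = 1/f for d_i (same length unit as f and d_o)."""
    if math.isinf(d_o):
        return f  # object at infinity: image forms at the focal plane
    if d_o == f:
        raise ValueError("object at the focal point: image forms at infinity")
    return 1.0 / (1.0 / f - 1.0 / d_o)


# Assumed 50 mm lens; the image distance approaches f as the object recedes.
for d_o in (0.1, 1.0, 10.0, 100.0, math.inf):
    print(d_o, image_distance(0.05, d_o))
```

A negative result (for $d_o < f$) corresponds to a virtual image, which cannot be captured on a sensor behind the lens.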

If we let the aperture shrink to a point (or equivalently consider very distant scenes with object distance $d_o \gg f$), we obtain the pinhole camera: all rays from a scene point pass through a single point (the optical center) and intersect a plane at distance $f$ (the focal length). This “ray-through-a-point” geometry is the basis for the projection equations below.

Pinhole Camera Model

Consider a point $\mathbf{X}_c = (X_c, Y_c, Z_c)^\top$ in camera coordinates with the origin at the pinhole and the $Z_c$ axis pointing forward. Its image coordinates (in metric units, not pixels) on an image plane at distance $f$ are $(x, y)$. Similar triangles give us:

$$x = f\,\frac{X_c}{Z_c}, \qquad y = f\,\frac{Y_c}{Z_c}$$

The division by $Z_c$ is a hallmark of perspective projection; points farther away (large $Z_c$) appear closer to the principal point (the center of the image from the camera’s geometric perspective).

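We can write the above in homogeneous coordinates:

$$\lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}$$

where $\lambda$ is the unknown projective scale (here equal to the depth $Z_c$ of the point).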
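
The sketch below projects a camera-frame point $(X, Y, Z)$ onto an image plane at distance $f$ in two ways: directly by dividing by the depth, and through the $3 \times 4$ homogeneous matrix form followed by division by the scale. Both function names and the sample values are assumptions made for illustration.

```python
def project_pinhole(point_c, f):
    """Perspective projection of a camera-frame point (X, Y, Z) onto the
    image plane at distance f; returns metric image coordinates (x, y)."""
    X, Y, Z = point_c
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return f * X / Z, f * Y / Z


def project_homogeneous(point_c, f):
    """Same projection via the 3x4 matrix form; returns (x, y, lam)."""
    P = [[f, 0.0, 0.0, 0.0],
         [0.0, f, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
    Xh = list(point_c) + [1.0]  # homogeneous 3D point
    hx, hy, lam = (sum(p * v for p, v in zip(row, Xh)) for row in P)
    return hx / lam, hy / lam, lam  # lam equals the depth Z


# Doubling the depth halves the image coordinates.
print(project_pinhole((0.2, 0.1, 2.0), f=0.05))
print(project_pinhole((0.2, 0.1, 4.0), f=0.05))
print(project_homogeneous((0.2, 0.1, 2.0), f=0.05))
```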

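In practice, 3D points are given in a world frame, $\mathbf{X}_w = (X_w, Y_w, Z_w)^\top$. The rigid motion from world to camera coordinates is:

$$\mathbf{X}_c = R\,\mathbf{X}_w + \mathbf{t}$$

with rotation matrix $R \in SO(3)$ and translation $\mathbf{t} \in \mathbb{R}^3$.

In homogeneous form, we can then transform between the two as

$$\mathbf{X}_c = [R \mid \mathbf{t}]\,\tilde{\mathbf{X}}_w$$

where $\tilde{\mathbf{X}}_w = (X_w, Y_w, Z_w, 1)^\top$. We call $[R \mid \mathbf{t}]$ the extrinsic matrix; it positions the camera within the world.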
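
A minimal sketch of the world-to-camera transform $\mathbf{X}_c = R\,\mathbf{X}_w + \mathbf{t}$, using nested lists for the rotation $R$ and a 3-vector for the translation $\mathbf{t}$. The helper names and the example pose are assumptions. The last function uses the standard identity that the camera center in world coordinates is $-R^\top \mathbf{t}$, the point that maps to the camera origin.

```python
import math


def rotation_about_y(theta):
    """3x3 rotation matrix for a rotation of theta radians about the Y axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def world_to_camera(R, t, point_w):
    """Apply X_c = R X_w + t."""
    return tuple(sum(R[i][j] * point_w[j] for j in range(3)) + t[i]
                 for i in range(3))


def camera_center(R, t):
    """Camera origin expressed in world coordinates: C = -R^T t."""
    return tuple(-sum(R[j][i] * t[j] for j in range(3)) for i in range(3))


R = rotation_about_y(math.radians(90))
t = (0.0, 0.0, 5.0)
print(world_to_camera(R, t, (1.0, 0.0, 0.0)))
print(camera_center(R, t))
```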

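Real sensors measure pixels, not metric lengths. We let:

  • $m_x$ and $m_y$ be the pixel densities (pixels/meter) along the sensor’s $x$ and $y$ axes.
  • $(u_0, v_0)$ be the principal point (the pixel where the optical axis hits the sensor, typically near the image center)

Converting the metric projection $(x, y)$ of a camera-frame point $(X_c, Y_c, Z_c)$ to pixels $(u, v)$ gives:

$$u = m_x x + u_0 = f_x\,\frac{X_c}{Z_c} + u_0, \qquad v = m_y y + v_0 = f_y\,\frac{Y_c}{Z_c} + v_0$$

  • $f_x = m_x f$ and $f_y = m_y f$ are called the focal lengths in pixels.

Then, the intrinsic calibration matrix, which combines the perspective scaling by $f$ with the metric-to-pixel conversion so that $\lambda\,(u, v, 1)^\top = K\,(X_c, Y_c, Z_c)^\top$, is:

$$K = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

  • The image plane is the surface where the 3D world is projected to form a 2D image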
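
The metric-to-pixel conversion can be sketched as follows, with pixel densities `m_x`, `m_y` (pixels/meter), principal point `(u0, v0)`, and focal length `f` in meters. The function names and the sensor values (5 micrometer pixels, 4 mm lens, principal point at the center of a 640x480 image) are assumptions chosen only to give plausible numbers.

```python
def metric_to_pixel(x, y, m_x, m_y, u0, v0):
    """Convert metric image-plane coordinates (meters) to pixel coordinates."""
    return m_x * x + u0, m_y * y + v0


def intrinsic_matrix(f, m_x, m_y, u0, v0):
    """Build K with focal lengths in pixels f_x = m_x * f and f_y = m_y * f."""
    return [[m_x * f, 0.0, u0],
            [0.0, m_y * f, v0],
            [0.0, 0.0, 1.0]]


# Assumed sensor: 5 micrometer pixels -> 200000 pixels/meter.
K = intrinsic_matrix(f=0.004, m_x=200_000, m_y=200_000, u0=320.0, v0=240.0)
print(K[0][0])  # focal length in pixels, about 800

# A point 1 mm right of the optical axis on the image plane lands about
# 200 pixels right of the principal point.
print(metric_to_pixel(0.001, 0.0, 200_000, 200_000, 320.0, 240.0))
```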

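Putting the pieces together, a world point $\tilde{\mathbf{X}}_w = (X_w, Y_w, Z_w, 1)^\top$ projects to image pixels $(u, v)$ via

$$\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K\,[R \mid \mathbf{t}]\,\tilde{\mathbf{X}}_w$$

with intrinsic matrix $K$, rotation $R$, translation $\mathbf{t}$, and projective scale $\lambda$.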
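
An end-to-end sketch of this projection in plain Python: the world point is first moved into the camera frame with a rotation `R` and translation `t`, then multiplied by a $3 \times 3$ intrinsic matrix `K`, and finally divided by the projective scale. The function name `project_point` and all numeric values (focal length of 800 pixels, principal point at (320, 240), camera aligned with the world frame) are assumptions for illustration.

```python
def project_point(K, R, t, point_w):
    """Project a world point to pixels via lam * [u, v, 1] = K (R X_w + t)."""
    # World -> camera frame: X_c = R X_w + t
    Xc = [sum(R[i][j] * point_w[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Camera frame -> homogeneous pixel coordinates: K X_c
    h = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    lam = h[2]  # projective scale, equal to the depth Z_c here
    return h[0] / lam, h[1] / lam


K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]  # camera axes aligned with the world axes
t = [0.0, 0.0, 0.0]    # camera at the world origin

print(project_point(K, R, t, (0.0, 0.0, 2.0)))    # (320.0, 240.0)
print(project_point(K, R, t, (0.5, -0.25, 2.0)))  # (520.0, 140.0)
```

A point on the optical axis lands on the principal point, and the off-axis point is shifted by $f_x X_c / Z_c$ and $f_y Y_c / Z_c$ pixels, as the projection equations predict.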