227-0391-00: Medical Image Analysis
Section 3
Image Transformation & Registration
Swiss Federal Institute of Technology Zurich
Eidgenössische Technische Hochschule Zürich
Last Edit Date: 06/30/2025
Disclaimer and Terms of Use:
We do not guarantee the accuracy or completeness of this summary. Some of the course material may not be included, and some of the content may not be correct. You should use this file properly and legally. We are not responsible for any results from using this file.
This personal note is adapted from Professor Ender Konukoglu, Professor Mauricio Reyes, and Doctor Ertunc Erdil. Please contact us to delete this file if you think your rights have been violated.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Sampling model¶
Moving image is the one that will be transformed
The target image is the one we want to attain with the transform
Both images have their own pixel coordinates and corresponding Cartesian grids
Assume there exists a spatial transformation, i.e., a coordinate transformation, that matches the moving to the target image
In reality, images are discretized - we only have values at pixel locations
Transformation¶
Let us assume $x' = T(x)$ is the transformation model with $x, x' \in \mathbb{R}^2$ in this case, where $x$ and $x'$ represent points in the moving and target image coordinate systems, respectively. The discussion for 3D transformations is the same.
Let us also define $x = T^{-1}(x')$ as the inverse of the transformation model.
There are a number of possible models, e.g., $T(x) = Ax$ (linear) or $T(x) = x + u(x)$ (non-linear)
Main sampling model question
Given the images, respective pixel-grids and transformation, how do we generate the "output" transformed moving image?
There are two possibilities
Forward transformation
Backward transformation
Approach 1: Forward transformation¶
$$T(x) = \{x_1 / [0.15 \sin(2.5x_2) + 0.85], x_2\} = \{x_1', x_2'\} = x'$$
We transform the input grid. The blue contour is the object's contour in the target image and the gray area is the transformation of the moving image. So, everything works nicely.
However, here everything is continuous and analytically computed! This is not how it would work in the discrete case.
In the discrete case each pixel from the moving image will be transformed $x' = T(x)$ and the pixel intensity will be copied to the output image.
Problem:
Transformed pixel coordinates $x'$ do not necessarily fall exactly on a pixel
Two input pixels can be mapped to the same output pixel
A single input pixel can be mapped onto multiple output pixels
Some output pixels may not even get a value assigned
Solution (quite complicated):
Intensities of the output pixels can be computed using partial areas
Overlap between each transformed pixel and the output pixels is computed
This overlap is used in a complex intensity assignment step
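As a toy illustration of these problems (image size and the normalization of the sine argument are our own assumptions, not from the course), a discrete forward mapping of the section's $T$ with nearest-pixel rounding leaves some output pixels without a value:

```python
import numpy as np

# Toy discrete forward mapping of the section's T. Each moving pixel is
# transformed and its intensity copied to the nearest output pixel.
H, W = 64, 64
moving = np.ones((H, W))
output = np.full((H, W), np.nan)            # NaN marks "no value assigned"

for x2 in range(H):
    for x1 in range(W):
        x1p = x1 / (0.15 * np.sin(2.5 * x2 / H) + 0.85)   # x1' = x1 / (...)
        i, j = x2, int(round(x1p))          # snap x' to the nearest pixel
        if 0 <= j < W:
            output[i, j] = moving[x2, x1]   # collisions silently overwrite

holes = int(np.isnan(output).sum())
print(holes)                                # some output pixels got no value
```

Because the transformation locally stretches the grid, rounded target locations skip over some output pixels, which is exactly the hole problem listed above.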
Approach 2: Backward transformation¶
$$T^{-1}(x') = \{ x_1' [0.15 \sin (2.5 x_2') + 0.85], x_2' \} = \{ x_1, x_2 \} = x$$
We transform the output grid. The blue contour is the object's contour in the target image and the gray area is the transformation of the moving image. Everything is continuous and analytical again.
In the backward mapping, for each output pixel the corresponding location in the input / moving image is determined. This is much easier to deal with.
Problem:
- Transformed locations of the output pixels do not generally fall exactly on input pixels.
Solution:
Interpolation
Pixel intensities (all channels if necessary) are interpolated using the nearest input image pixels
Remark
Backward transformation is always easier to use for transforming images through interpolation
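A minimal backward-warping sketch of this idea, using the section's $T^{-1}$; the normalized sine argument and nearest-neighbor sampling are simplifying assumptions of ours:

```python
import numpy as np

# Minimal backward-warping sketch: for every output pixel x' we evaluate the
# section's T^{-1} and sample the moving image there.
def backward_warp(moving):
    H, W = moving.shape
    out = np.zeros_like(moving)
    for i in range(H):                       # i ~ x2' (row)
        for j in range(W):                   # j ~ x1' (column)
            x1 = j * (0.15 * np.sin(2.5 * i / H) + 0.85)   # T^{-1}(x')
            jj = int(round(x1))              # nearest input pixel
            if 0 <= jj < W:
                out[i, j] = moving[i, jj]    # every output pixel gets a value
    return out

warped = backward_warp(np.ones((32, 32)))
```

In contrast to the forward approach, the loop visits every output pixel exactly once, so there are no holes and no write collisions.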
Interpolation models¶
Problem source¶
Whenever an image is resampled, interpolation is used to compute the intensities at the new pixel locations - for example, when applying a spatial transformation or resizing an image for pre-processing.
Intensities are defined at each pixel, and we assume they are defined at the centers of the pixels (black nodes). How can we compute the intensity at the red node?
Approach 1: Nearest neighbor interpolation¶
Assign the intensity value of the closest pixel with known intensity. Assign the value of the "nearest neighbor."
Nearest neighbor interpolation is very fast
It does not introduce new intensity values in the image
May introduce heavy artifacts in gray level images
Nearest neighbor interpolation yields a piece-wise constant function, which is neither continuous nor differentiable
Remark
While it is not used for gray level or multispectral images, it is used for transforming label maps since it has the nice property of not introducing any new values
Approach 2: Bilinear interpolation¶
Uses closest 4 pixels' intensity values
Extension of linear interpolation in 2D
Trilinear interpolation in 3D
Let us denote the coordinates of the four points as $p_{00} = \{ x_1^0, x_2^0 \}$, $p_{01} = \{ x_1^0, x_2^1 \}$, $p_{10} = \{ x_1^1, x_2^0 \}$, $p_{11} = \{ x_1^1, x_2^1 \}$, and $q = \{ x_1, x_2 \}$. Interpolated values at the blue intermediate points are:
$$ \begin{align} I(x_1, x_2^1) &= \frac{x_1^1 - x_1}{x_1^1 - x_1^0} I(x_1^0, x_2^1) + \frac{x_1 - x_1^0}{x_1^1 - x_1^0} I(x_1^1, x_2^1) \\ I(x_1, x_2^0) &= \frac{x_1^1 - x_1}{x_1^1 - x_1^0} I(x_1^0, x_2^0) + \frac{x_1 - x_1^0}{x_1^1 - x_1^0} I(x_1^1, x_2^0) \\ I(x_1, x_2) &= \frac{x_2^1 - x_2}{x_2^1 - x_2^0} I(x_1, x_2^0) + \frac{x_2 - x_2^0}{x_2^1 - x_2^0} I(x_1, x_2^1) \\ \end{align} $$
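The three equations above can be transcribed directly into code; the corner coordinates and intensities used below are illustrative example values, not from the lecture:

```python
# Direct transcription of the three bilinear interpolation equations.
def bilinear(x1, x2, x1_0, x1_1, x2_0, x2_1, I00, I01, I10, I11):
    """I_ab denotes I(x1^a, x2^b); interpolate at (x1, x2) inside the cell."""
    w = (x1_1 - x1) / (x1_1 - x1_0)
    I_top = w * I01 + (1.0 - w) * I11     # I(x1, x2^1)
    I_bot = w * I00 + (1.0 - w) * I10     # I(x1, x2^0)
    v = (x2_1 - x2) / (x2_1 - x2_0)
    return v * I_bot + (1.0 - v) * I_top  # I(x1, x2)

print(bilinear(0.5, 0.5, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 3.0, 4.0))  # 2.5
```

At the cell center the result is the mean of the four corner intensities, and at each corner it reproduces that corner's value exactly.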
Bilinear interpolation is fast
It will introduce new intensity values in the image due to interpolation
Does not introduce artifacts as severe as nearest neighbor interpolation
Bilinear interpolation leads to piece-wise linear functions, which are continuous but not differentiable
It may smooth and blur the image, leading to a decrease in spatial resolution
Trilinear interpolation in 3D repeats the same procedure for an additional dimension and uses 8 points - corners of a cube
On label maps bilinear interpolation produces extra values (look at the boundaries)
These additional labels arise due to interpolation
They do not exist in the original map and should be avoided whenever resampling label maps
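A toy 1D illustration of this effect (the example labels are our own): linearly interpolating halfway between labels 1 and 3 produces the nonexistent label 2, while nearest neighbor returns one of the original labels:

```python
import numpy as np

# Linear interpolation between labels creates values that are not labels;
# nearest neighbor only ever returns existing label values.
labels = np.array([1, 1, 3, 3])
x = 1.5                                           # midway between indices 1 and 2
lin = (2 - x) * labels[1] + (x - 1) * labels[2]   # linear interpolation
nn = labels[int(round(x))]                        # nearest neighbor
print(lin, nn)   # lin = 2.0 is a new, nonexistent label; nn stays in {1, 3}
```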
Alternative formulation of bilinear interpolation¶
The bilinear interpolation has the following form with four unknown parameters
$$I(x_1, x_2) = a + bx_1 + cx_2 + dx_1x_2$$
Corresponding to the four unknowns, we have the intensity values at the four corners, leading to a linear system of equations
$$ \begin{bmatrix} 1 & x_1^0 & x_2^0 & x_1^0 x_2^0 \\ 1 & x_1^0 & x_2^1 & x_1^0 x_2^1 \\ 1 & x_1^1 & x_2^0 & x_1^1 x_2^0 \\ 1 & x_1^1 & x_2^1 & x_1^1 x_2^1 \\ \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ \end{bmatrix} = \begin{bmatrix} I(x_1^0, x_2^0) \\ I(x_1^0, x_2^1) \\ I(x_1^1, x_2^0) \\ I(x_1^1, x_2^1) \\ \end{bmatrix} $$
One can solve the system of equations and use the $a$, $b$, $c$, $d$ coefficients to interpolate at any $x$ between $p_{00}$, $p_{01}$, $p_{10}$, $p_{11}$.
Trilinear interpolation can be defined similarly with the following parametric form with 8 unknowns
$$I(x_1, x_2, x_3) = a + bx_1 + cx_2 + dx_3 + ex_1x_2 + fx_1x_3 + gx_2x_3 + hx_1x_2x_3$$
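The $4 \times 4$ bilinear system above can be solved numerically; the unit cell and corner intensities below are illustrative assumptions:

```python
import numpy as np

# Solving the bilinear 4x4 system for the unit cell [0,1] x [0,1];
# the four corner intensities are example values.
x1_0, x1_1, x2_0, x2_1 = 0.0, 1.0, 0.0, 1.0
I00, I01, I10, I11 = 1.0, 2.0, 3.0, 4.0   # I(x1^0,x2^0), I(x1^0,x2^1), ...

A = np.array([[1.0, x1_0, x2_0, x1_0 * x2_0],
              [1.0, x1_0, x2_1, x1_0 * x2_1],
              [1.0, x1_1, x2_0, x1_1 * x2_0],
              [1.0, x1_1, x2_1, x1_1 * x2_1]])
a, b, c, d = np.linalg.solve(A, np.array([I00, I01, I10, I11]))

def interp(x1, x2):
    return a + b * x1 + c * x2 + d * x1 * x2

print(interp(0.5, 0.5))   # 2.5, the mean of the four corners
```

The coefficients reproduce the corner intensities exactly, and the resulting `interp` agrees with the step-by-step bilinear formulas from the previous subsection.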
Approach 3: Bicubic interpolation¶
Uses closest 4 pixels' information
However, it requires not only the intensity values but also the derivatives at those points
Therefore, it is common to use 16 closest points in the interpolation
Extension of cubic interpolation in 2D
Tricubic interpolation in 3D
Similar idea as bilinear interpolation: cubic interpolation in one dimension followed by the other
Functional form¶
Bicubic interpolation has the following form with 16 unknowns
$$I(x_1, x_2) = \sum_{i = 0}^{3} \sum_{j = 0}^3 a_{ij} x_1^i x_2^j$$
Similar to bilinear interpolation, a linear system of equations can be constructed and solved. The information to determine these unknowns is taken from the 4 closest neighbors as
Intensities at $p_{00}$, $p_{01}$, $p_{10}$, $p_{11}$ = 4 values
$\partial x_1$ and $\partial x_2$ derivatives at $p_{00}$, $p_{01}$, $p_{10}$, $p_{11}$ = 8 values
$\partial x_1 x_2$ mixed derivatives at $p_{00}$, $p_{01}$, $p_{10}$, $p_{11}$ = 4 values
Remark
In order to compute the derivatives, pixels beyond the closest 4 must be used. Therefore, bicubic interpolation effectively uses more pixels than the closest 4.
Approximating the derivatives can be done in different ways.
Convolutional form¶
One easy way to do bicubic interpolation is through convolution with the following kernel - a result by Keys
$$ \begin{align} K(x) &= \begin{cases} \frac{3}{2} |x|^3 - \frac{5}{2} |x|^2 + 1 & 0 \le |x| < 1 \\ -\frac{1}{2} |x|^3 + \frac{5}{2} |x|^2 - 4 |x| + 2 & 1 \le |x| < 2 \\ 0 & 2 \le |x| \end{cases} \\ f(x) &= \sum_{i = -1}^{2} f(x^i) K(x - x^i) \end{align} $$
The kernel is used to convolve each dimension one after the other
In 1D its support is $[-2, 2]$, taking into account 4 sampled points
In 2D, bicubic interpolation with Keys' convolution takes into account the $4 \times 4$ closest pixels to the interpolation point $x$
The kernel is called "Catmull-Rom" kernel
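The kernel and the 1D interpolation sum can be sketched as follows (unit-spaced samples assumed); since the Keys kernel reproduces linear data exactly, interpolating linearly increasing samples returns the exact value:

```python
import numpy as np

def K(x):
    """Keys' cubic kernel as defined above (the Catmull-Rom case)."""
    x = abs(x)
    if x < 1:
        return 1.5 * x**3 - 2.5 * x**2 + 1.0
    if x < 2:
        return -0.5 * x**3 + 2.5 * x**2 - 4.0 * x + 2.0
    return 0.0

def cubic_interp(samples, x):
    """Interpolate unit-spaced samples f(0), f(1), ... at a real-valued x."""
    i0 = int(np.floor(x))
    total = 0.0
    for i in range(i0 - 1, i0 + 3):     # the 4 samples where K is non-zero
        if 0 <= i < len(samples):
            total += samples[i] * K(x - i)
    return total

print(cubic_interp([0.0, 1.0, 2.0, 3.0, 4.0], 2.5))   # 2.5: linear data reproduced
```

Note that $K(0) = 1$ and $K(\pm 1) = K(\pm 2) = 0$, so the interpolant passes exactly through the sample values, as an interpolating kernel must.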
Results on images and label maps¶
Bicubic interpolation is better at retaining gray values than bilinear and nearest neighbor. It gives the highest-quality interpolation among them
Computational requirement is the highest
Bicubic interpolation leads to smooth functions that are both continuous and differentiable
Tricubic interpolation in 3D repeats the same procedure for an additional dimension and a larger number of points
On label maps bicubic interpolation produces extra values
These additional labels arise due to interpolation
They do not exist and should be avoided whenever resampling label maps