227-0391-00: Medical Image Analysis
Section 2
Enhancement and Pre-processing
Swiss Federal Institute of Technology Zurich
Eidgenössische Technische Hochschule Zürich
Last Edit Date: 06/27/2025
Disclaimer and Terms of Use:
We do not guarantee the accuracy and completeness of the summary content. Some of the course material may not be included, and some of the content in the summary may not be correct. You should use this file properly and legally. We are not responsible for any results from using this file.
This personal note is adapted from Professor Ender Konukoglu, Professor Mauricio Reyes, and Doctor Ertunc Erdil. Please contact us to delete this file if you think your rights have been violated.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Intensity normalization¶
Problem source¶
Intensity variations between images come from differences in
- Scanners
- Protocols
- Acquisition software
- Reconstruction method
Particularly important for MRI due to lack of absolute intensities
Less of a problem for modalities with absolute intensities, e.g., Hounsfield Units in CT; however, variations can still occur
Intensity variations can cause adverse effects for the rest of the processing steps
Particularly problematic for machine learning methods. An intensity difference between training and test samples is a form of domain shift. Think of images taken at night versus during the day
Remains an open problem
In the ideal case
Whenever
Two images are compared, e.g., registration
Two groups are compared, e.g., statistical analysis
A model's parameters are estimated from a set of images, i.e., training and estimated model is applied on a new set of images, i.e., testing
We would like images to have the same intensity "characteristics." The notion of "characteristics" refers to the intensity distribution, including global and local intensity and contrast.
Approach 1: Min-max¶
Matching the intensity ranges, min-max normalization is
$$f(j) = \frac{j - j_{\text{min}}}{j_{\text{max}} - j_{\text{min}}} \left( i_{\text{max}} - i_{\text{min}} \right) + i_{\text{min}}$$
where $i$ and $j$ are the intensities of the two images.
Max and min values are sensitive to outliers
Often the intensities corresponding to the $\epsilon\%$ and $(100 - \epsilon)\%$ points of the CDF are used instead, e.g., $\epsilon = 2$
Commonly used in machine-learning based methods as a preprocessing step
Can solve global intensity shift and multiplicative factors
Cannot solve contrast differences
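As a rough illustration, here is a minimal NumPy sketch of robust min-max normalization; the function name, the percentile-based range, and the final clipping step are our own choices rather than part of the lecture.

```python
import numpy as np

def minmax_normalize(img, ref_min=0.0, ref_max=1.0, eps=2.0):
    """Map the intensity range of `img` onto [ref_min, ref_max].

    The eps% and (100 - eps)% percentiles stand in for the raw
    min / max to reduce sensitivity to outliers.
    """
    j_min, j_max = np.percentile(img, [eps, 100.0 - eps])
    out = (img - j_min) / (j_max - j_min) * (ref_max - ref_min) + ref_min
    return np.clip(out, ref_min, ref_max)  # clip the outlier tails
```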
Approach 2: Matching mean and standard deviation¶
Matching the first two moments, the mean and standard deviation normalization is
$$ \begin{align} f(j) &= (j - \mu_j) \frac{\sigma_i}{\sigma_j} + \mu_i \\ \mu_i &= \frac{1}{N} \sum_{x=1}^{N} i(x) \\ \sigma_i^2 &= \frac{1}{N-1} \sum_{x=1}^{N} \left( i(x) - \mu_i \right)^2 \end{align} $$
Less sensitive to outliers; still, robust statistics, e.g., median and robust variance, may be used for better behavior
Often used in machine-learning based methods
Can solve global intensity shift and multiplicative factors
Cannot solve contrast differences
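A corresponding sketch for mean / std matching (again, names and defaults are ours); `reference` plays the role of image $i$ and `moving` the role of image $j$:

```python
import numpy as np

def match_mean_std(moving, reference):
    """Shift and scale `moving` so its mean / std match `reference`."""
    mu_j, sigma_j = moving.mean(), moving.std(ddof=1)
    mu_i, sigma_i = reference.mean(), reference.std(ddof=1)
    return (moving - mu_j) * (sigma_i / sigma_j) + mu_i
```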
Short summary on approaches 1 and 2
Both min-max and mean-std matching are linear mappings (linear except for the bias, i.e., linear in homogeneous coordinates). Neither of these transformations can change the contrast of an image. To be able to change the contrast, we need non-linear transformations in the intensity space.
Approach 3: Histogram normalization with landmarks¶
This kind of piecewise linear mapping (quartiles and min-max as landmarks) can be represented as
$$ f(j) = \begin{cases} (j - j_{\text{beg}}) \frac{i_1 - i_{\text{beg}}}{j_1 - j_{\text{beg}}} + i_{\text{beg}} & j \le j_1 \\ (j - j_1) \frac{i_2 - i_1}{j_2 - j_1} + i_1 & j_1 < j \le j_2 \\ (j - j_2) \frac{i_3 - i_2}{j_3 - j_2} + i_2 & j_2 < j \le j_3 \\ (j - j_3) \frac{i_{\text{end}} - i_3}{j_{\text{end}} - j_3} + i_3 & j_3 < j \\ \end{cases} $$
Matching landmarks of image histograms
Piecewise linear map: overall a non-linear mapping
Different landmarks are possible
- min and max
- mean or median
- quartiles, deciles
- modes
Less sensitive to outliers
Widely used
Population-wide use:
- Estimate landmark positions from population data as average
- Map the new image's landmarks to the averages
Can change the contrast because of the non-linearity of the mapping
Matching only the locations of the landmarks does not make the histograms perfectly equal
We still might have problems when matching histograms of images with large lesions against those of healthy brains
- Presence of pathologies will change histograms
- Matching pathology-bearing images is hard
- Robust statistics and outlier detection can help
Several things to pay attention to
- Image content
- Histogram matching does not know about image content
- All pixels are considered independent
- No spatial information / correlation between neighboring pixels
- Matching gradients has been proposed
- Object size
- Differences in object size can be a problem
- Perfect histogram match would not be preferred in that case
- Use mode matching if object sizes are different
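The piecewise linear landmark map from the formula above can be sketched with `np.interp`, which performs exactly this kind of interpolation between matched landmarks; note that, unlike the formula, `np.interp` clamps values outside the outermost landmarks instead of extrapolating. The percentile choices here are illustrative.

```python
import numpy as np

def landmark_normalize(img, ref_landmarks, pct=(2, 25, 50, 75, 98)):
    """Piecewise-linear map matching histogram landmarks.

    `ref_landmarks` are target intensities at the percentiles in `pct`,
    e.g. averages estimated from population data.
    """
    src_landmarks = np.percentile(img, pct)  # landmarks of the input image
    return np.interp(img, src_landmarks, ref_landmarks)
```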
Denoising¶
Problem definition¶
Additive noise
$$J(x) = I(x) + \epsilon(x)$$
$\epsilon(x)$ is often considered to be i.i.d., for example, Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma)$
Multiplicative noise
$$J(x) = I(x) \epsilon(x)$$
For example, speckle in ultrasound and optical coherence tomography
More complicated, for example, noise in the MR magnitude image
$$J(x) = \sqrt{\left( I_r(x) + \epsilon \right)^2 + \left( I_i(x) + \eta \right)^2}$$
$\epsilon, \eta \sim \mathcal{N}(0, \sigma)$, Rician noise
Given the noisy image $J$ remove noise to get $I$
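To make the Rician model concrete, a small simulation sketch (assuming, for simplicity, that the noise-free imaginary channel is zero):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_rician_noise(img, sigma):
    """Add Gaussian noise to real and imaginary channels, take magnitude."""
    eps = rng.normal(0.0, sigma, img.shape)   # real-channel noise
    eta = rng.normal(0.0, sigma, img.shape)   # imaginary-channel noise
    return np.sqrt((img + eps) ** 2 + eta ** 2)
```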
Approach 1: Linear filter / convolutional¶
Convolution is defined as
$$ \begin{align} \tilde{I} &= J * S \\ \tilde{I}(x, y, z) &= \int_r \int_p \int_q J(x-r, y-p, z-q) S(r, p, q)\, dr\, dp\, dq \\ \tilde{I}(i, j, k) &= \sum_{r} \sum_{p} \sum_{q} J(i - r, j - p, k - q) S(r, p, q) \end{align} $$
where $S$ is the structural element (kernel), $J$ is the noisy image, and $\tilde{I}$ is the denoised image.
Extremely efficient to apply
Easy to implement
Blurs edges
Not context aware
Sensitive to outliers
Remark
It is not possible to remove noise and not blur edges at the same time with a linear filter. We need non-linear filtering for this.
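A short SciPy sketch of linear filtering on a toy image (the test image and the kernel sizes are our own choices):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
J = rng.normal(0.0, 0.1, (64, 64))        # noisy background
J[16:48, 16:48] += 1.0                    # bright square with sharp edges

S = np.ones((3, 3)) / 9.0                 # 3x3 box (mean) kernel
I_box = ndimage.convolve(J, S, mode="reflect")
I_gauss = ndimage.gaussian_filter(J, sigma=1.0)  # Gaussian kernel
# Both suppress noise, but also blur the edges of the square.
```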
Approach 2: Median filter¶
For each location, rank order all neighboring intensities
Assign the median value as the result
1, 2, 2, 2, 2, 3, 3, 9
Median is 2
Sliding window similar to convolution
Other rank-order filters, such as min and max, are defined similarly
Non-linear filters
Two rank filters generally do not commute: $\text{min}(\text{median}(I)) \neq \text{median}(\text{min}(I))$
Robustness and edge preservation
Easy and efficient implementation
Does not introduce new intensities: (i) intensities stay integer; (ii) applying median filtering to label maps as a post-processing step will not introduce new labels and removes small islands, e.g., in segmentation
Patchy results
Image content can change - adding and removing structures
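A corresponding sketch with SciPy's rank filters (toy image and window size are illustrative):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
J = np.zeros((64, 64))
J[16:48, 16:48] = 1.0
J[rng.random(J.shape) < 0.05] = 5.0        # impulsive "salt" outliers

I_med = ndimage.median_filter(J, size=3)   # removes outliers, keeps edges
I_min = ndimage.minimum_filter(J, size=3)  # other rank filters,
I_max = ndimage.maximum_filter(J, size=3)  # defined the same way
```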
Approach 3: Non-local means¶
Very widely used model, very competitive results
Extension of bilateral filtering
Most noise suppression methods can be put in the form
$$I(x) = \sum_{y \in \Omega} w(x, y) J(y)$$
where $0 \le w(x, y) \le 1$ and $\sum_{y \in \Omega} w(x, y) = 1$, $\forall x$. For example, the mean and Gaussian filters fit this form, with the weights $w(x, y)$ chosen accordingly.
Non-local means uses $w(x, y)$ to assess similarity between image information at $x$ and $y$
$$w(x, y) = \frac{1}{Z(x)} \exp \left\{ - \frac{\| J(\mathcal{N}(x)) - J(\mathcal{N}(y)) \|_2^2}{h^2} \right\}$$
$Z(x)$ is the normalization constant, $h$ is the degree of filtering and $\mathcal{N}(x)$ neighborhood around $x$.
In the example image above, the target pixel is $p$, corresponding to $x$ with patch $\mathcal{N}(x)$; the circled pixels $q_1, q_2, q_3$ correspond to candidate locations $y$ with patches $\mathcal{N}(y)$. The more similar a patch is to the target's patch, the higher the corresponding weight.
While smoothing point $x$, use areas with similar image content
If noise is random then averaging similar areas should get rid of it
Uses redundancy in the images, i.e., local structures repeat
Increasing $h$ leads to more smoothing
Model parameters
- $h$ - depends on the noise level
- size of $\mathcal{N}(x)$
- size of $W(x)$
The averaging often happens in a small search window $$I(x) = \sum_{y \in W(x)} w(x, y) J(y)$$
Size of $\mathcal{N}(x)$ matters
- $\mathcal{N}(x) = \Omega$ leads to the mean filter
- $\mathcal{N}(x) = \{ x \}$ leads to Yaroslavsky's filter, a special case of bilateral filtering
Higher noise suppression
Better preservation of edges
May create artifacts, but still better than the others
Important to estimate the noise level; different algorithms exist for this
Extensions to MRI and diffusion MRI exist
Currently the state-of-the-art filtering technique among those not based on machine learning
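scikit-image ships a non-local means implementation; below is a minimal sketch, with parameter values chosen only for illustration and `h` tied to an estimated noise level as suggested above:

```python
import numpy as np
from skimage.restoration import denoise_nl_means, estimate_sigma

rng = np.random.default_rng(0)
J = np.zeros((64, 64))
J[16:48, 16:48] = 1.0
J += rng.normal(0.0, 0.1, J.shape)

sigma = estimate_sigma(J)                  # noise level estimate
I_nlm = denoise_nl_means(
    J,
    patch_size=5,                          # size of N(x)
    patch_distance=6,                      # radius of search window W(x)
    h=0.8 * sigma,                         # degree of filtering
)
```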
Summary
Convolutional Filters | Median Filters | Non-local means
---|---|---
Efficient and easy to implement; blurs edges; not context aware; sensitive to outliers | Robust to outliers; preserves edges; patchy results; can change image content | Best noise suppression and edge preservation; may create artifacts; needs a noise-level estimate
Approach 4: Energy-based¶
Remember the additive model for the noisy image
$$\underbrace{J(x)}_{\text{observed}} = \underbrace{I(x)}_{\text{desired}} + \underbrace{\epsilon(x)}_{\text{noise}}$$
Denoising as an energy minimization problem
$$\hat{I} = \arg \min_{I} \underbrace{\lambda D(I, J)}_{\text{punish all deviation}} + \underbrace{R(I)}_{\text{embed prior image knowledge}}$$
Data consistency
$$D(I, J) = \int (I(x) - J(x))^2 dx = \| I - J \|_2^2$$
Regularization
$$R(I) = \int \| \nabla I \| dx = \| I \|_{TV}$$
$\lambda$ controls the allowed noise level in the observed image; for Gaussian noise with variance $\sigma^2$, $\lambda = \frac{1}{2\sigma^2}$
Total variation aims to get an image with minimal variation.
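For a quick experiment, scikit-image provides Chambolle's TV denoising; its `weight` parameter acts roughly like the inverse of $\lambda$ above (larger weight means more smoothing). A minimal sketch on a toy image:

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

rng = np.random.default_rng(0)
J = np.zeros((64, 64))
J[16:48, 16:48] = 1.0
J += rng.normal(0.0, 0.1, J.shape)

# Larger `weight` = stronger regularization, i.e. smaller lambda above.
I_tv = denoise_tv_chambolle(J, weight=0.15)
```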
Probabilistic interpretation¶
The energy-based formulation can be interpreted within a probabilistic model. Remember that
$$\arg \min_{I} \lambda D(I, J) + R(I) = \arg \max_{I} \log P(J|I) + \log P(I)$$
where
$P(J|I)$ is the likelihood
$P(I)$ is the prior
Total variation denoising model:
$$ \begin{align} P(J|I) &= \mathcal{N}\left( J; I, \frac{1}{2\lambda} \right) \\ P(I) &\propto \exp\left( - \| \nabla I \| \right) \end{align} $$
Remark
Note that the variance of the likelihood model is linked to the weight in the energy-based formulation
Generality¶
Note that the energy-based and the corresponding probabilistic formulations are very generic, and many different models can be cast in the same form
$$\arg \min_{I} \lambda D(I, J) + R(I) = \arg \max_{I} \log P(J|I) + \log P(I)$$
$D(I, J)$ or $P(J|I)$ encodes how the noise process works. It does not have to be additive Gaussian noise; it can be other models, as we have seen before.
$R(I)$ or $P(I)$ encodes our prior information on what type of images are plausible
- TV norm: minimal variation
- Wavelet: sparse in a transform domain
- Dictionary learning: composed of sparse atomic elements
- Unsupervised learning: probabilistic models learned from data
- Adversarial learning: comparing distribution wise
Bias correction¶
Problem source¶
Acquisition artifact
- Field inhomogeneity
- Electrical characteristics of the tissue
- Poor uniformity of coils
- Gradients and eddy currents
Most MR images have it
Minimal effect on visual interpretation; difficult to see by eye
Adverse effects on algorithms, especially segmentation
Non-linear effect on intensities that vary spatially
Remove it as a pre-processing step
Bias field¶
Smoothly varying
Does not depend on the underlying tissue, to a first-order approximation
In reality, it also depends on the tissue, but this effect is mostly ignored
Approximate mathematical model
$$j = b\cdot i + n$$
- $i$: real image
- $b$: bias field
- $n$: noise
- $j$: observed image
Multiplicative factor
Basic approaches to correct it¶
At hardware side with extra measurements
- Ideally a perfect solution
- Additional measurements make retrospective data analysis hard and are usually not integrated
- Post-acquisition methods are very useful
Manually placed landmarks
- Voxels expected to have similar intensity
- Human in the loop - a very good solution
- Need for human interaction
Joint segmentation
- Same structure should have the same intensity in all voxels
- If you know what you expect in the image
- Geared to get best segmentation accuracy
- Needs prior information and is too specific
N3 algorithm
- Generic
- Most popular solution
N3 algorithm¶
Principle¶
Starting from the main model
$$\tilde{j}(x) = \tilde{b}(x) \tilde{i}(x) + \tilde{n}(x)$$
Let us focus on the noise-free case. Taking the logarithm of both sides makes the effect of the bias field additive
$$j(x) = b(x) + i(x)$$
Goal
N3 estimates $b(x)$ at every point in the image. The bias-removed image can then be computed immediately as $i(x) = j(x) - b(x)$
Key point
N3 analyzes the distribution of $j$, $b$, and $i$ to understand the effect of the bias term
Assumptions
Independence between pixels when considering distributions of $j$, $b$, and $i$
Bias field and the image intensities are independent
$j(x) = b(x) + i(x)$ is a sum of random variables with pdfs denoted as $J(j)$, $B(b)$, and $I(i)$. The probability distribution $J$ is the following convolution:
$$J(j) = \int I(j - b) B(b)\, db = I * B$$
What can we say about $B$?
$\tilde{j}(x) = \tilde{i}(x) \tilde{b}(x)$
$\tilde{b}$ is around 1, so $b$ is around $0$
$B$ can be approximated with a Gaussian with small standard deviation
$J$ is a smoothed version of $I$
The bias field is a smooth field - $b$ is smooth
Sums of random variables¶
Let $a$ and $b$ be two independent random variables with pdfs denoted by $f_a(a)$ and $f_b(b)$. Let us define another random variable
$$c = a + b$$
We compute this via the cdf of $c$, i.e., $F_c(c')$
$$ \begin{align} P(c \le c') = F_c(c') &= \int_{-\infty}^{c'} f_c(c)\, dc \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{c'-b} f_a(a) f_b(b)\, da\, db \end{align} $$
Then the pdf can be computed by taking the derivative $dF_c(c') / dc'$, using Leibniz's rule
$$ \begin{align} f_c(c') &= \frac{d}{dc'} F_c(c') = \frac{d}{dc'} \int_{-\infty}^{\infty} \int_{-\infty}^{c'-b} f_a(a) f_b(b) dadb \\ &= \int_{-\infty}^{\infty} \frac{d}{dc'} \int_{-\infty}^{c'-b} f_a(a) f_b(b) dadb\\ &= \int_{-\infty}^{\infty} f_a(c' - b) f_b(b) db \end{align} $$
Convolution
$c = a + b \Rightarrow f_c = f_a * f_b$ when $a$ and $b$ are independent random variables
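A quick numerical sanity check of this fact (the distributions and bin grid are chosen arbitrarily; the match holds only up to sampling and discretization error):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a = rng.normal(0.0, 1.0, n)                # f_a: Gaussian
b = rng.uniform(-1.0, 1.0, n)              # f_b: uniform
c = a + b

bins = np.linspace(-6.0, 6.0, 241)
dx = bins[1] - bins[0]
f_a, _ = np.histogram(a, bins=bins, density=True)
f_b, _ = np.histogram(b, bins=bins, density=True)
f_c, _ = np.histogram(c, bins=bins, density=True)

f_conv = np.convolve(f_a, f_b, mode="same") * dx  # discrete f_a * f_b
print(np.abs(f_conv - f_c).max())          # small, up to sampling error
```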
Two components in the problem¶
Given
$$j(x) = b(x) + i(x)$$
denote the probability distributions of $j$, $b$, and $i$ with $J(j)$, $B(b)$, and $I(i)$. The probability distribution $J$ is the following convolution
$$J(j) = \int I(j - b) B(b)\, db = I * B$$
The estimation must determine $I(i)$ and $b$ given only $j$. We divide this into two parts
Field estimation: Estimation of $b$ given only $j$ (note that $I(i)$ is only the distribution and not the image, and we are looking for $b$)
$I(i)$ estimation: Estimation of $I(i)$
Always assuming $B$ is a Gaussian with small standard deviation.
Component 1: Field estimation¶
Strategy: Given an intensity value $j$ and the probability distributions $I(i)$ and $B(b)$, predict the $i$ that may correspond to $j$ on average. Then get the bias field as the difference. We can apply this at every pixel / voxel.
If we are given the distribution $I(i)$ we can estimate the bias field at any point by
$$ \begin{align} \mathbb{E}[i | j] &= \int_{-\infty}^{\infty} i \cdot p(i|j)\, di \\ \text{Bias field estimate: } b_e(j) &= \mathbb{E}[b | j] = j - \mathbb{E}[i | j] \\ \mathbb{E}[i | j] &= \int_{-\infty}^{\infty} i \cdot \frac{p(i, j)}{p(j)}\, di \\ &= \frac{1}{J(j)} \int_{-\infty}^{\infty} i \cdot p(i, j)\, di \qquad (b = j - i) \\ &= \frac{1}{J(j)} \int_{-\infty}^{\infty} i \cdot p(i) \cdot p(b)\, di \\ &= \frac{\int_{-\infty}^{\infty} i \cdot B(j - i) \cdot I(i)\, di}{\int_{-\infty}^{\infty} B(j - i) \cdot I(i)\, di} \end{align} $$
We know all the terms in this equation, and $i$ is one-dimensional, so we can solve it numerically with ease.
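A direct numerical transcription of the bias-estimate formula (grid-based; the function and argument names are our own); $I(i)$ is assumed to be available on a grid, and $B$ is the small Gaussian:

```python
import numpy as np

def bias_estimate(j_values, i_grid, I_pdf, sigma_b):
    """Evaluate b_e(j) = j - E[i | j] for an array of intensities j.

    i_grid, I_pdf: discretized distribution I(i);
    B is taken to be a zero-mean Gaussian with std sigma_b.
    """
    j = np.asarray(j_values, dtype=float)[:, None]      # shape (M, 1)
    B = np.exp(-0.5 * ((j - i_grid) / sigma_b) ** 2)    # B(j - i), (M, N)
    num = (i_grid * B * I_pdf).sum(axis=1)              # numerator integral
    den = (B * I_pdf).sum(axis=1)                       # denominator integral
    return j[:, 0] - num / den                          # b_e(j)
```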
Component 2: $I(i)$ estimation¶
Key observation
$J(j)$ is a smoothed version of $I(i)$, and $B(b)$ is roughly Gaussian with a small standard deviation
Approximation
Approximate $I(i)$ by sharpening $J(j)$ assuming $B(b)$ is a Gaussian with small standard deviation
$$J = I * B \Leftrightarrow \mathcal{F}\{J\} = \mathcal{F}\{I\}\mathcal{F}\{B\}$$
Deblurring filter $\hat{F} = \frac{\mathcal{F}\{B\}^*}{ |\mathcal{F}\{B\}|^2 + Z^2}$
$\mathcal{F}\{I\} \approx \hat{F} \mathcal{F}\{J\}$
with $Z$ a small constant; $\mathcal{F}\{B\}^*$ denotes the complex conjugate. This is inverse (Wiener-style) filtering.
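The deblurring step can be sketched with FFTs over the intensity histogram; the grid spacing `di`, the constant `Z`, and the clipping / renormalization at the end are our own choices:

```python
import numpy as np

def sharpen_histogram(J_pdf, di, sigma_b, Z=0.01):
    """Approximate I(i) by inverse-filtering J(j) with a Gaussian B."""
    freqs = np.fft.fftfreq(len(J_pdf), d=di)
    F_B = np.exp(-2.0 * (np.pi * sigma_b * freqs) ** 2)  # FT of Gaussian B
    F_hat = np.conj(F_B) / (np.abs(F_B) ** 2 + Z ** 2)   # deblurring filter
    I_pdf = np.real(np.fft.ifft(F_hat * np.fft.fft(J_pdf)))
    I_pdf = np.clip(I_pdf, 0.0, None)                    # keep it >= 0
    return I_pdf / (I_pdf.sum() * di)                    # renormalize
```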
Intermediate summary
Estimate of the unbiased image's distribution: $I(i) \approx \mathcal{F}^{-1} \left( \hat{F} \mathcal{F} \{J\} \right)$
Field estimate: $b_e(j) = j - \frac{\int_{-\infty}^{\infty} i \cdot B(j - i) \cdot I(i) di}{\int_{-\infty}^{\infty} B(j - i) \cdot I(i) di}$
Using both we can estimate field at any point $x$ in the image
But there are two things we need to pay attention to - need for smoothing and need for iterations
Need for smoothing¶
There are several sources of error in the estimate
- $\tilde{j}(x) = \tilde{b}(x) \tilde{i}(x) + \tilde{n}(x)$ - we ignored the noise
- $\hat{F}$ is approximate, and so is the resulting $I$
- We have not yet used the fact that $b(x)$ should be a smooth field - there is spatial correlation between the bias field at neighboring pixels.
Solution: Smooth approximation based on anchor points, using splines or radial basis function approximation:
$$b(x) = \sum_{n=1}^{N} b_e(j(x_n)) K(\| x - x_n \|)$$
with $K$ a smoothing kernel such as $\exp\left( -\| x - x_n \|_2^2 / \sigma^2 \right)$.
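A sketch of the anchor-point smoothing; unlike the raw sum above, we normalize the kernel weights (Nadaraya-Watson style) so the field interpolates the anchor estimates, which is a choice on our part:

```python
import numpy as np

def smooth_field(x_eval, x_anchor, b_anchor, sigma=20.0):
    """Smooth bias field from anchor-point estimates b_e(j(x_n)).

    x_eval: (M, d) points to evaluate; x_anchor: (N, d) anchor points;
    b_anchor: (N,) field estimates at the anchors.
    """
    d2 = ((x_eval[:, None, :] - x_anchor[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / sigma ** 2)               # Gaussian kernel weights
    return (K * b_anchor).sum(axis=1) / K.sum(axis=1)
```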
Need for iterations¶
We showed that $J = I * B$ and used $I \approx \mathcal{F}^{-1} (\hat{F} \mathcal{F}\{J\})$
In practice, we do not know $B$; we only assume it is a Gaussian
If we use too wide a $B$, the inverse filtering may fail
Remember that they have to be distributions ($\geq 0$). Deconvolution is hard.
Solution: Iterate many times with small Gaussians (small standard deviations)
Not a bad approximation: large Gaussians are convolutions of smaller ones
Summary
Bias correction with N3 is routinely used for MRI
There are variations available, e.g., N4ITK
In the method there are multiple approximations
Multiresolution solutions exist and perform better
May need to run the tool a few times to get better results
Variations in image and pixel-size¶
Problem source¶
What is it?¶
Images may come with different pixel and / or image size
They may have different Field-of-View (FOV)
We may need to compare such "heterogeneous" images
Why does it happen?¶
Differences in acquisitions
In FOV
In pixel size
In which applications does it matter?¶
Comparing measurements
- measurements taken in an image have a correspondence in real life.
- real life measurements rely on pixel size.
- comparisons need to take this into account.
- e.g. longitudinal analysis, population statistics, normative distributions
Registration
- aligning images requires building point correspondence
- building correspondence makes sense only if two points represent similar physical areas
Machine learning applications
- parameters of a model are learned from a set of images.
- applying the same model to an image with different image/pixel size or FOV requires attention
Effectively, this matters for all applications
Building algorithms that work for heterogeneous images is one of the most challenging tasks
How do we tackle this?¶
Q: What criteria should guide us?
A: Measurements taken in the two images should be comparable.
Q: What is the important factor to match?
A: Pixel size.
Strategy:¶
Resample one image to match the pixel-size of the other.
Resize an image with a factor determined by the pixel-size.
For volumetric images in 3D, match the voxel size.
Crop or pad if necessary to make the image sizes the same.
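A minimal resampling sketch with `scipy.ndimage.zoom`; pixel sizes are given in mm, and linear interpolation (`order=1`) is an illustrative default:

```python
import numpy as np
from scipy import ndimage

def resample_to_pixel_size(img, pix_in, pix_out, order=1):
    """Resize `img` so its pixel size goes from pix_in to pix_out (mm)."""
    zoom = np.asarray(pix_in, dtype=float) / np.asarray(pix_out, dtype=float)
    return ndimage.zoom(img, zoom, order=order)

# e.g. resampling 1.25 mm pixels onto a 1.0 mm grid enlarges the image
# by a factor of 1.25 in each direction, as in the examples below.
```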
Examples¶
Q: To match image 2 to image 1, what are the resize ratios in x- and y-directions? A: ratio x = 1.0 / 1.25 = 0.8, ratio y = 1.0 / 1.25 = 0.8, getting image 3.
Q: To match image 1 to image 2, what are the resize ratios in x- and y-directions? A: ratio x = 1.25 / 1.0 = 1.25, ratio y = 1.25 / 1.0 = 1.25, getting image 4.
Remark
As indicated before, we always want to match the pixel size. Matching the image size instead may lead to wrong normalization. The last image in the following plot shows the outcome of matching by image size, which makes the images no longer comparable; even the aspect ratio changed.