227-0391-00: Medical Image Analysis
Section 2
Enhancement and Pre-processing
Swiss Federal Institute of Technology Zurich
Eidgenössische Technische Hochschule Zürich
Last Edit Date: 06/27/2025
Disclaimer and Terms of Use:
We do not guarantee the accuracy and completeness of the summary content. Some of the course material may not be included, and some of the content in the summary may not be correct. You should use this file properly and legally. We are not responsible for any results from using this file.
This personal note is adapted from Professor Ender Konukoglu, Professor Mauricio Reyes, and Doctor Ertunc Erdil. Please contact us to delete this file if you think your rights have been violated.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Intensity normalization¶
Problem source¶
Intensity variations between images come from differences in
- Scanners
- Protocols
- Acquisition software
- Reconstruction method
Particularly important for MRI due to lack of absolute intensities
Less of a problem for modalities with absolute intensities, e.g., Hounsfield Units in CT; however, variations can still occur
Intensity variations can cause adverse effects for the rest of the processing steps
Particularly problematic for machine learning methods. An intensity difference between training and test samples is a form of domain shift. Think of images taken at night versus during the day
Remains an open problem
In the ideal case
Whenever
Two images are compared, e.g., registration
Two groups are compared, e.g., statistical analysis
A model's parameters are estimated from a set of images, i.e., training and estimated model is applied on a new set of images, i.e., testing
We would like images to have the same intensity "characteristics." The notion of "characteristics" refers to the intensity distribution, including global and local intensity and contrast.
Approach 1: Min-max¶
Matching the intensity ranges, min-max normalization is
$$f(j) = \frac{j - j_{\text{min}}}{j_{\text{max}} - j_{\text{min}}} \left( i_{\text{max}} - i_{\text{min}} \right) + i_{\text{min}}$$
where $i$ and $j$ are the intensities of the two images.
Max and min values are sensitive to outliers
Often the intensities corresponding to the $\epsilon\%$ and $(100 - \epsilon)\%$ points of the CDF are used instead, e.g., $\epsilon = 2$
Commonly used in machine-learning based methods as a preprocessing step
Can solve global intensity shift and multiplicative factors
Cannot solve contrast differences
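As a rough illustration, here is a minimal NumPy sketch of robust min-max normalization; the function name, the percentile-based range, and the final clipping step are our own choices rather than part of the lecture.

```python
import numpy as np

def minmax_normalize(img, ref_min=0.0, ref_max=1.0, eps=2.0):
    """Map the intensity range of `img` onto [ref_min, ref_max].

    The eps% and (100 - eps)% percentiles stand in for the raw
    min / max to reduce sensitivity to outliers.
    """
    j_min, j_max = np.percentile(img, [eps, 100.0 - eps])
    out = (img - j_min) / (j_max - j_min) * (ref_max - ref_min) + ref_min
    return np.clip(out, ref_min, ref_max)  # clip the outlier tails
```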
Approach 2: Matching mean and standard deviation¶
Matching the first two moments, the mean and standard deviation normalization is
$$ \begin{align} f(j) &= (j - \mu_j) \frac{\sigma_i}{\sigma_j} + \mu_i \\ \mu_i &= \frac{1}{N} \sum_{x=1}^{N} i(x) \\ \sigma_i^2 &= \frac{1}{N-1} \sum_{x=1}^{N} \left( i(x) - \mu_i \right)^2 \end{align} $$
Less sensitive to outliers; still, robust statistics, e.g., median and robust variance, may be used for better behavior
Often used in machine-learning based methods
Can solve global intensity shift and multiplicative factors
Cannot solve contrast differences
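A corresponding sketch for mean / std matching (again, names and defaults are ours); `reference` plays the role of image $i$ and `moving` the role of image $j$:

```python
import numpy as np

def match_mean_std(moving, reference):
    """Shift and scale `moving` so its mean / std match `reference`."""
    mu_j, sigma_j = moving.mean(), moving.std(ddof=1)
    mu_i, sigma_i = reference.mean(), reference.std(ddof=1)
    return (moving - mu_j) * (sigma_i / sigma_j) + mu_i
```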
Short summary on approaches 1 and 2
Both min-max and mean-std matching are linear mappings (linear except for the bias, i.e., linear in homogeneous coordinates). Neither of these transformations can change the contrast of an image. To be able to change the contrast, we need non-linear transformations in the intensity space.
Approach 3: Histogram normalization with landmarks¶
This kind of piecewise linear mapping (quartiles and min-max as landmarks) can be represented as
$$ f(j) = \begin{cases} (j - j_{\text{beg}}) \frac{i_1 - i_{\text{beg}}}{j_1 - j_{\text{beg}}} + i_{\text{beg}} & j \le j_1 \\ (j - j_1) \frac{i_2 - i_1}{j_2 - j_1} + i_1 & j_1 < j \le j_2 \\ (j - j_2) \frac{i_3 - i_2}{j_3 - j_2} + i_2 & j_2 < j \le j_3 \\ (j - j_3) \frac{i_{\text{end}} - i_3}{j_{\text{end}} - j_3} + i_3 & j_3 < j \\ \end{cases} $$
Matching landmarks of image histograms
Piecewise linear map: overall a non-linear mapping
Different landmarks are possible
- min and max
- mean or median
- quartiles, deciles
- modes
Less sensitive to outliers
Widely used
Population-wide use:
- Estimate landmark positions from population data as average
- Map the new image's landmarks to the averages
Can change the contrast because of the non-linearity of the mapping
Matching only the locations of the landmarks does not make the histograms perfectly equal
We still might have problems when matching histograms of images with large lesions against those of healthy brains
- Presence of pathologies will change histograms
- Matching pathology-bearing images is hard
- Robust statistics and outlier detection can help
Several things to pay attention to
- Image content
- Histogram matching does not know about image content
- All pixels are considered independent
- No spatial information / correlation between neighboring pixels
- Matching gradients has been proposed
- Object size
- Differences in object size can be a problem
- Perfect histogram match would not be preferred in that case
- Use mode matching if object sizes are different
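The piecewise linear landmark map from the formula above can be sketched with `np.interp`, which performs exactly this kind of interpolation between matched landmarks; note that, unlike the formula, `np.interp` clamps values outside the outermost landmarks instead of extrapolating. The percentile choices here are illustrative.

```python
import numpy as np

def landmark_normalize(img, ref_landmarks, pct=(2, 25, 50, 75, 98)):
    """Piecewise-linear map matching histogram landmarks.

    `ref_landmarks` are target intensities at the percentiles in `pct`,
    e.g. averages estimated from population data.
    """
    src_landmarks = np.percentile(img, pct)  # landmarks of the input image
    return np.interp(img, src_landmarks, ref_landmarks)
```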
Denoising¶
Problem definition¶
Additive noise
$$J(x) = I(x) + \epsilon(x)$$
$\epsilon(x)$ is often considered to be i.i.d., for example, Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma)$
Multiplicative noise
$$J(x) = I(x) \epsilon(x)$$
For example, speckle in ultrasound and optical coherence tomography
More complicated, for example, noise in the MR magnitude image
$$J(x) = \sqrt{\left( I_r(x) + \epsilon \right)^2 + \left( I_i(x) + \eta \right)^2}$$
$\epsilon, \eta \sim \mathcal{N}(0, \sigma)$, Rician noise
Given the noisy image $J$ remove noise to get $I$
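To make the Rician model concrete, a small simulation sketch (assuming, for simplicity, that the noise-free imaginary channel is zero):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_rician_noise(img, sigma):
    """Add Gaussian noise to real and imaginary channels, take magnitude."""
    eps = rng.normal(0.0, sigma, img.shape)   # real-channel noise
    eta = rng.normal(0.0, sigma, img.shape)   # imaginary-channel noise
    return np.sqrt((img + eps) ** 2 + eta ** 2)
```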
Approach 1: Linear filter / convolutional¶
Convolution is defined as
$$ \begin{align} \tilde{I} &= J * S \\ \tilde{I}(x, y, z) &= \int_r \int_p \int_q J(x-r, y-p, z-q) S(r, p, q)\, dr\, dp\, dq \\ \tilde{I}(i, j, k) &= \sum_{r} \sum_{p} \sum_{q} J(i - r, j - p, k - q) S(r, p, q) \end{align} $$
where $S$ is the structural element (kernel), $J$ is the noisy image, and $\tilde{I}$ is the denoised image.
Extremely efficient to apply
Easy to implement
Blurs edges
Not context aware
Sensitive to outliers
Remark
It is not possible to remove noise and not blur edges at the same time with a linear filter. We need non-linear filtering for this.
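A short SciPy sketch of linear filtering on a toy image (the test image and the kernel sizes are our own choices):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
J = rng.normal(0.0, 0.1, (64, 64))        # noisy background
J[16:48, 16:48] += 1.0                    # bright square with sharp edges

S = np.ones((3, 3)) / 9.0                 # 3x3 box (mean) kernel
I_box = ndimage.convolve(J, S, mode="reflect")
I_gauss = ndimage.gaussian_filter(J, sigma=1.0)  # Gaussian kernel
# Both suppress noise, but also blur the edges of the square.
```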
Approach 2: Median filter¶
For each location, rank order all neighboring intensities
Assign the median value as the result
1, 2, 2, 2, 2, 3, 3, 9
Median is 2
Sliding window similar to convolution
Other rank-order filters, such as min and max, are defined similarly
Non-linear filters
Two rank filters generally do not commute: $\text{min}(\text{median}(I)) \neq \text{median}(\text{min}(I))$
Robustness and edge preservation
Easy and efficient implementation
Does not introduce new intensities: (i) intensities stay integer; (ii) applying median filtering to label maps as a post-processing step will not introduce new labels and removes small islands, e.g., in segmentation
Patchy results
Image content can change - adding and removing structures
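A corresponding sketch with SciPy's rank filters (toy image and window size are illustrative):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
J = np.zeros((64, 64))
J[16:48, 16:48] = 1.0
J[rng.random(J.shape) < 0.05] = 5.0        # impulsive "salt" outliers

I_med = ndimage.median_filter(J, size=3)   # removes outliers, keeps edges
I_min = ndimage.minimum_filter(J, size=3)  # other rank filters,
I_max = ndimage.maximum_filter(J, size=3)  # defined the same way
```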
Approach 3: Non-local means¶
Very widely used model, very competitive results
Extension of bilateral filtering
Most noise suppression methods can be put in the form
$$I(x) = \sum_{y \in \Omega} w(x, y) J(y)$$
where $0 \le w(x, y) \le 1$ and $\sum_{y \in \Omega} w(x, y) = 1$, $\forall x$. For example, the mean and Gaussian filters fit this form, with the weights $w(x, y)$ chosen accordingly.
Non-local means uses $w(x, y)$ to assess similarity between image information at $x$ and $y$
$$w(x, y) = \frac{1}{Z(x)} \exp \left\{ - \frac{\| J(\mathcal{N}(x)) - J(\mathcal{N}(y)) \|_2^2}{h^2} \right\}$$
$Z(x)$ is the normalization constant, $h$ is the degree of filtering and $\mathcal{N}(x)$ neighborhood around $x$.
In the example image above, the target pixel is $p$, corresponding to $x$ with patch $\mathcal{N}(x)$; the circled pixels $q_1, q_2, q_3$ correspond to candidate locations $y$ with patches $\mathcal{N}(y)$. The more similar a patch is to the target's patch, the higher the corresponding weight.
While smoothing point $x$, use areas with similar image content
If noise is random then averaging similar areas should get rid of it
Uses redundancy in the images, i.e., local structures repeat
Increasing $h$ leads to more smoothing
Model parameters
- $h$ - depends on the noise level
- size of $\mathcal{N}(x)$
- size of $W(x)$
The averaging often happens in a small search window $$I(x) = \sum_{y \in W(x)} w(x, y) J(y)$$
Size of $\mathcal{N}(x)$ matters
- $\mathcal{N}(x) = \Omega$ leads to the mean filter
- $\mathcal{N}(x) = \{ x \}$ leads to Yaroslavsky's filter, a special case of bilateral filtering
Higher noise suppression
Better preservation of edges
May create artifacts, but still better than the others
Important to estimate the noise level; different algorithms exist for this
Extensions to MRI and diffusion MRI exist
Currently the state-of-the-art filtering technique among those not based on machine learning
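scikit-image ships a non-local means implementation; below is a minimal sketch, with parameter values chosen only for illustration and `h` tied to an estimated noise level as suggested above:

```python
import numpy as np
from skimage.restoration import denoise_nl_means, estimate_sigma

rng = np.random.default_rng(0)
J = np.zeros((64, 64))
J[16:48, 16:48] = 1.0
J += rng.normal(0.0, 0.1, J.shape)

sigma = estimate_sigma(J)                  # noise level estimate
I_nlm = denoise_nl_means(
    J,
    patch_size=5,                          # size of N(x)
    patch_distance=6,                      # radius of search window W(x)
    h=0.8 * sigma,                         # degree of filtering
)
```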
Summary
Convolutional Filters | Median Filters | Non-local means
---|---|---
Efficient and easy to implement; blurs edges; not context aware; sensitive to outliers | Robust to outliers; preserves edges; patchy results; can change image content | Best noise suppression and edge preservation; may create artifacts; needs a noise-level estimate
Approach 4: Energy-based¶
Remember the additive model for the noisy image
$$\underbrace{J(x)}_{\text{observed}} = \underbrace{I(x)}_{\text{desired}} + \underbrace{\epsilon(x)}_{\text{noise}}$$
Denoising as an energy minimization problem
$$\hat{I} = \arg \min_{I} \underbrace{\lambda D(I, J)}_{\text{punish all deviation}} + \underbrace{R(I)}_{\text{embed prior image knowledge}}$$
Data consistency
$$D(I, J) = \int (I(x) - J(x))^2 dx = \| I - J \|_2^2$$
Regularization
$$R(I) = \int \| \nabla I \| dx = \| I \|_{TV}$$
$\lambda$ controls the allowed noise level in the observed image; for Gaussian noise with variance $\sigma^2$, $\lambda = \frac{1}{2\sigma^2}$
Total variation aims to get an image with minimal variation.
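For a quick experiment, scikit-image provides Chambolle's TV denoising; its `weight` parameter acts roughly like the inverse of $\lambda$ above (larger weight means more smoothing). A minimal sketch on a toy image:

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

rng = np.random.default_rng(0)
J = np.zeros((64, 64))
J[16:48, 16:48] = 1.0
J += rng.normal(0.0, 0.1, J.shape)

# Larger `weight` = stronger regularization, i.e. smaller lambda above.
I_tv = denoise_tv_chambolle(J, weight=0.15)
```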
Probabilistic interpretation¶
The energy-based formulation can be interpreted within a probabilistic model. Remember that
$$\arg \min_{I} \lambda D(I, J) + R(I) = \arg \max_{I} \log P(J|I) + \log P(I)$$
where
$P(J|I)$ is the likelihood
$P(I)$ is the prior
Total variation denoising model:
$$ \begin{align} P(J|I) &= \mathcal{N}\left( J; I, \frac{1}{2\lambda} \right) \\ P(I) &\propto \exp\left( - \| \nabla I \| \right) \end{align} $$
Remark
Note that the variance of the likelihood model is linked to the weight in the energy-based formulation
Generality¶
Note that the energy-based and the corresponding probabilistic formulations are very generic, and many different models can be cast in the same form
$$\arg \min_{I} \lambda D(I, J) + R(I) = \arg \max_{I} \log P(J|I) + \log P(I)$$
$D(I, J)$ or $P(J|I)$ encodes how the noise process works. It does not have to be additive Gaussian noise; it can be other models, as we have seen before.
$R(I)$ or $P(I)$ encodes our prior information on what type of images are plausible
- TV norm: minimal variation
- Wavelet: sparse in a transform domain
- Dictionary learning: composed of sparse atomic elements
- Unsupervised learning: probabilistic models learned from data
- Adversarial learning: comparing distribution wise
Bias correction¶
Problem source¶
Acquisition artifact
- Field inhomogeneity
- Electrical characteristics of the tissue
- Poor uniformity of coils
- Gradients and eddy currents
Most MR images have it
Minimal effect on visual interpretation; difficult to see by eye
Adverse effects on algorithms, especially segmentation
Non-linear effect on intensities that vary spatially
Remove it as a pre-processing step
Bias field¶
Smoothly varying
Does not depend on the underlying tissue, to a first-order approximation
In reality, it also depends on the tissue, but this effect is mostly ignored
Approximate mathematical model
$$j = b\cdot i + n$$
- $i$: real image
- $b$: bias field
- $n$: noise
- $j$: observed image
Multiplicative factor
Basic approaches to correct it¶
At hardware side with extra measurements
- Ideally a perfect solution
- Additional measurements make retrospective data analysis hard and are usually not integrated
- Post-acquisition methods are very useful
Manually placed landmarks
- Voxels expected to have similar intensity
- Human in the loop - a very good solution
- Need for human interaction
Joint segmentation
- Same structure should have the same intensity in all voxels
- If you know what you expect in the image
- Geared to get best segmentation accuracy
- Needs prior information and is too specific
N3 algorithm
- Generic
- Most popular solution
N3 algorithm¶
Principle¶
Starting from the main model
$$\tilde{j}(x) = \tilde{b}(x) \tilde{i}(x) + \tilde{n}(x)$$
Let us focus on the noise-free case. Taking the logarithm of both sides makes the effect of the bias field additive
$$j(x) = b(x) + i(x)$$
Goal
N3 estimates $b(x)$ at every point in the image. The bias-removed image can then be computed immediately as $i(x) = j(x) - b(x)$
Key point
N3 analyzes the distribution of $j$, $b$, and $i$ to understand the effect of the bias term
Assumptions
Independence between pixels when considering distributions of $j$, $b$, and $i$
Bias field and the image intensities are independent
$j(x) = b(x) + i(x)$ is a sum of random variables with pdfs denoted as $J(j)$, $B(b)$, and $I(i)$. The probability distribution $J$ is the following convolution:
$$J(j) = \int I(j - b) B(b)\, db = I * B$$
What can we say about $B$?
$\tilde{j}(x) = \tilde{i}(x) \tilde{b}(x)$
$\tilde{b}$ is around 1, so $b$ is around $0$
$B$ can be approximated with a Gaussian with small standard deviation
$J$ is a smoothed version of $I$
The bias field is a smooth field - $b$ is smooth
Sums of random variables¶
Let $a$ and $b$ be two independent random variables with pdfs denoted by $f_a(a)$ and $f_b(b)$. Let us define another random variable
$$c = a + b$$
We compute this via the cdf of $c$, i.e., $F_c(c')$
$$ \begin{align} P(c \le c') = F_c(c') &= \int_{-\infty}^{c'} f_c(c)\, dc \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{c'-b} f_a(a) f_b(b)\, da\, db \end{align} $$
Then the pdf can be computed by taking the derivative $dF_c(c') / dc'$, using Leibniz's rule
$$ \begin{align} f_c(c') &= \frac{d}{dc'} F_c(c') = \frac{d}{dc'} \int_{-\infty}^{\infty} \int_{-\infty}^{c'-b} f_a(a) f_b(b) dadb \\ &= \int_{-\infty}^{\infty} \frac{d}{dc'} \int_{-\infty}^{c'-b} f_a(a) f_b(b) dadb\\ &= \int_{-\infty}^{\infty} f_a(c' - b) f_b(b) db \end{align} $$
Convolution
$c = a + b \Rightarrow f_c = f_a * f_b$ when $a$ and $b$ are independent random variables
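A quick numerical sanity check of this fact (the distributions and bin grid are chosen arbitrarily; the match holds only up to sampling and discretization error):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a = rng.normal(0.0, 1.0, n)                # f_a: Gaussian
b = rng.uniform(-1.0, 1.0, n)              # f_b: uniform
c = a + b

bins = np.linspace(-6.0, 6.0, 241)
dx = bins[1] - bins[0]
f_a, _ = np.histogram(a, bins=bins, density=True)
f_b, _ = np.histogram(b, bins=bins, density=True)
f_c, _ = np.histogram(c, bins=bins, density=True)

f_conv = np.convolve(f_a, f_b, mode="same") * dx  # discrete f_a * f_b
print(np.abs(f_conv - f_c).max())          # small, up to sampling error
```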
Two components in the problem¶
Given
$$j(x) = b(x) + i(x)$$
denote the probability distributions of $j$, $b$, and $i$ with $J(j)$, $B(b)$, and $I(i)$. The probability distribution $J$ is the following convolution
$$J(j) = \int I(j - b) B(b)\, db = I * B$$
The estimation must determine $I(i)$ and $b$ given only $j$. We divide this into two parts
Field estimation: Estimation of $b$ given only $j$ (note that $I(i)$ is only the distribution and not the image, and we are looking for $b$)
$I(i)$ estimation: Estimation of $I(i)$
Always assuming $B$ is a Gaussian with small standard deviation.
Component 1: Field estimation¶
Strategy: Given an intensity value $j$ and the probability distributions $I(i)$ and $B(b)$, predict the $i$ that may correspond to $j$ on average. Then get the bias field as the difference. We can apply this at every pixel / voxel.
If we are given the distribution $I(i)$ we can estimate the bias field at any point by
$$ \begin{align} \mathbb{E}[i | j] &= \int_{-\infty}^{\infty} i \cdot p(i|j)\, di \\ \text{Bias field estimate: } b_e(j) &= \mathbb{E}[b | j] = j - \mathbb{E}[i | j] \\ \mathbb{E}[i | j] &= \int_{-\infty}^{\infty} i \cdot \frac{p(i, j)}{p(j)}\, di \\ &= \frac{1}{J(j)} \int_{-\infty}^{\infty} i \cdot p(i, j)\, di \qquad (b = j - i) \\ &= \frac{1}{J(j)} \int_{-\infty}^{\infty} i \cdot p(i) \cdot p(b)\, di \\ &= \frac{\int_{-\infty}^{\infty} i \cdot B(j - i) \cdot I(i)\, di}{\int_{-\infty}^{\infty} B(j - i) \cdot I(i)\, di} \end{align} $$
We know all the terms in this equation, and $i$ is one-dimensional, so we can solve it numerically with ease.
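A direct numerical transcription of the bias-estimate formula (grid-based; the function and argument names are our own); $I(i)$ is assumed to be available on a grid, and $B$ is the small Gaussian:

```python
import numpy as np

def bias_estimate(j_values, i_grid, I_pdf, sigma_b):
    """Evaluate b_e(j) = j - E[i | j] for an array of intensities j.

    i_grid, I_pdf: discretized distribution I(i);
    B is taken to be a zero-mean Gaussian with std sigma_b.
    """
    j = np.asarray(j_values, dtype=float)[:, None]      # shape (M, 1)
    B = np.exp(-0.5 * ((j - i_grid) / sigma_b) ** 2)    # B(j - i), (M, N)
    num = (i_grid * B * I_pdf).sum(axis=1)              # numerator integral
    den = (B * I_pdf).sum(axis=1)                       # denominator integral
    return j[:, 0] - num / den                          # b_e(j)
```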
Component 2: $I(i)$ estimation¶
Key observation
$J(j)$ is a smoothed version of $I(i)$, and $B(b)$ is roughly Gaussian with a small standard deviation
Approximation
Approximate $I(i)$ by sharpening $J(j)$ assuming $B(b)$ is a Gaussian with small standard deviation
$$J = I * B \Leftrightarrow \mathcal{F}\{J\} = \mathcal{F}\{I\}\mathcal{F}\{B\}$$
Deblurring filter $\hat{F} = \frac{\mathcal{F}\{B\}^*}{ |\mathcal{F}\{B\}|^2 + Z^2}$
$\mathcal{F}\{I\} \approx \hat{F} \mathcal{F}\{J\}$
with $Z$ a small constant; $\mathcal{F}\{B\}^*$ denotes the complex conjugate. This is inverse (Wiener-style) filtering.
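The deblurring step can be sketched with FFTs over the intensity histogram; the grid spacing `di`, the constant `Z`, and the clipping / renormalization at the end are our own choices:

```python
import numpy as np

def sharpen_histogram(J_pdf, di, sigma_b, Z=0.01):
    """Approximate I(i) by inverse-filtering J(j) with a Gaussian B."""
    freqs = np.fft.fftfreq(len(J_pdf), d=di)
    F_B = np.exp(-2.0 * (np.pi * sigma_b * freqs) ** 2)  # FT of Gaussian B
    F_hat = np.conj(F_B) / (np.abs(F_B) ** 2 + Z ** 2)   # deblurring filter
    I_pdf = np.real(np.fft.ifft(F_hat * np.fft.fft(J_pdf)))
    I_pdf = np.clip(I_pdf, 0.0, None)                    # keep it >= 0
    return I_pdf / (I_pdf.sum() * di)                    # renormalize
```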
Intermediate summary
Estimate of the unbiased image's distribution: $I(i) \approx \mathcal{F}^{-1} \left( \hat{F} \mathcal{F} \{J\} \right)$
Field estimate: $b_e(j) = j - \frac{\int_{-\infty}^{\infty} i \cdot B(j - i) \cdot I(i) di}{\int_{-\infty}^{\infty} B(j - i) \cdot I(i) di}$
Using both we can estimate field at any point $x$ in the image
But there are two things we need to pay attention to - need for smoothing and need for iterations
Need for smoothing¶
There are several sources of error in the estimate
- $\tilde{j}(x) = \tilde{b}(x) \tilde{i}(x) + \tilde{n}(x)$ - we ignored the noise
- $\hat{F}$ is approximate, and so is the resulting $I$
- We have not yet used the fact that $b(x)$ should be a smooth field - there is spatial correlation between the bias field at neighboring pixels.
Solution: Smooth approximation based on anchor points, using splines or radial basis function approximation:
$$b(x) = \sum_{n=1}^{N} b_e(j(x_n)) K(\| x - x_n \|)$$
with $K$ a smoothing kernel such as $\exp\left( -\| x - x_n \|_2^2 / \sigma^2 \right)$.
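A sketch of the anchor-point smoothing; unlike the raw sum above, we normalize the kernel weights (Nadaraya-Watson style) so the field interpolates the anchor estimates, which is a choice on our part:

```python
import numpy as np

def smooth_field(x_eval, x_anchor, b_anchor, sigma=20.0):
    """Smooth bias field from anchor-point estimates b_e(j(x_n)).

    x_eval: (M, d) points to evaluate; x_anchor: (N, d) anchor points;
    b_anchor: (N,) field estimates at the anchors.
    """
    d2 = ((x_eval[:, None, :] - x_anchor[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / sigma ** 2)               # Gaussian kernel weights
    return (K * b_anchor).sum(axis=1) / K.sum(axis=1)
```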
Need for iterations¶
We showed that $J = I * B$ and used $I \approx \mathcal{F}^{-1} (\hat{F} \mathcal{F}\{J\})$
In practice, we do not know $B$; we only assume it is a Gaussian
If we use too wide a $B$, the inverse filtering may fail
Remember that they have to be distributions ($\geq 0$). Deconvolution is hard.
Solution: Iterate many times with small Gaussians (small standard deviations)
Not a bad approximation: large Gaussians are convolutions of smaller ones
Summary
Bias correction with N3 is routinely used for MRI
There are variations available, e.g., N4ITK
In the method there are multiple approximations
Multiresolution solutions exist and perform better
May need to run the tool a few times to get better results
Variations in image and pixel-size¶
Problem source¶
What is it?¶
Images may come with different pixel and / or image size
They may have different Field-of-View (FOV)
We may need to compare such "heterogeneous" images
Why does it happen?¶
Differences in acquisitions
In FOV
In pixel size
In which applications does it matter?¶
Comparing measurements
- measurements taken in an image have a correspondence in real life.
- real life measurements rely on pixel size.
- comparisons need to take this into account.
- e.g. longitudinal analysis, population statistics, normative distributions
Registration
- aligning images requires building point correspondence
- building correspondence makes sense only if two points represent similar physical areas
Machine learning applications
- parameters of a model are learned from a set of images.
- applying the same model to an image with different image/pixel size or FOV requires attention
Effectively, this matters for all applications
Building algorithms that work for heterogeneous images is one of the most challenging tasks
How do we tackle this?¶
Q: What criteria should guide us?
A: Measurements taken in the two images should be comparable.
Q: What is the important factor to match?
A: Pixel size.
Strategy:¶
Resample one image to match the pixel-size of the other.
Resize an image with a factor determined by the pixel-size.
For volumetric images in 3D, match the voxel size.
Crop or pad if necessary to make the image sizes the same.
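A minimal resampling sketch with `scipy.ndimage.zoom`; pixel sizes are given in mm, and linear interpolation (`order=1`) is an illustrative default:

```python
import numpy as np
from scipy import ndimage

def resample_to_pixel_size(img, pix_in, pix_out, order=1):
    """Resize `img` so its pixel size goes from pix_in to pix_out (mm)."""
    zoom = np.asarray(pix_in, dtype=float) / np.asarray(pix_out, dtype=float)
    return ndimage.zoom(img, zoom, order=order)

# e.g. resampling 1.25 mm pixels onto a 1.0 mm grid enlarges the image
# by a factor of 1.25 in each direction, as in the examples below.
```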
Examples¶
Q: To match image 2 to image 1, what are the resize ratios in x- and y-directions? A: ratio x = 1.0 / 1.25 = 0.8, ratio y = 1.0 / 1.25 = 0.8, getting image 3.
Q: To match image 1 to image 2, what are the resize ratios in x- and y-directions? A: ratio x = 1.25 / 1.0 = 1.25, ratio y = 1.25 / 1.0 = 1.25, getting image 4.
Remark
As indicated before, we always want to match the pixel size. Matching the image size instead may lead to wrong normalization. The last image in the following plot shows the outcome of matching by image size, which makes the images no longer comparable; even the aspect ratio changed.