227-0391-00: Medical Image Analysis
Section 2
Enhancement and Pre-processing
Swiss Federal Institute of Technology Zurich
Eidgenössische Technische Hochschule Zürich
Last Edit Date: 06/27/2025
Disclaimer and Terms of Use:
We do not guarantee the accuracy or completeness of this summary. Some of the course material may not be included, and some of the content in the summary may not be correct. You should use this file properly and legally. We are not responsible for any consequences of using this file.
This personal note is adapted from the lectures of Professor Ender Konukoglu, Professor Mauricio Reyes, and Doctor Ertunc Erdil. Please contact us to have this file deleted if you believe your rights have been violated.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Intensity normalization¶
Problem source¶
- Intensity variations between images come from differences in
  - Scanners
  - Protocols
  - Acquisition software
  - Reconstruction methods
 
- Particularly important for MRI due to lack of absolute intensities 
- Less of a problem for modalities with absolute intensities, e.g., the Hounsfield Unit in CT; however, variations can still occur 
- Intensity variations can cause adverse effects for the rest of the processing steps 
- Particularly problematic for machine learning methods. The intensity difference between training and test samples is called domain shift. Think of images taken at night and during the day 
- Remains an open problem 
In the ideal case
Whenever
- Two images are compared, e.g., registration 
- Two groups are compared, e.g., statistical analysis 
- A model's parameters are estimated from a set of images, i.e., training, and the estimated model is applied to a new set of images, i.e., testing 
we would like the images to have the same intensity "characteristics." The notion of "characteristics" refers to the intensity distribution, including global and local intensity and contrast.
Approach 1: Min-max¶
Matching the range, the Min-max normalization is
$$f(j) = \frac{j - j_{\text{min}}}{j_{\text{max}} - j_{\text{min}}} \left( i_{\text{max}} - i_{\text{min}} \right) + i_{\text{min}}$$
where $i$ and $j$ are the intensities of the two images.
- Max and min values are sensitive to outliers 
- Often the intensities corresponding to the $\epsilon$-th and $(100 - \epsilon)$-th percentiles of the CDF are used instead, e.g., $\epsilon = 2$ 
- Commonly used in machine-learning based methods as a preprocessing step 
- Can solve global intensity shift and multiplicative factors 
- Cannot solve contrast differences 
Approach 2: Matching mean and standard deviation¶
Matching second-order statistics, the mean and standard deviation normalization is
$$ \begin{align} f(j) &= (j - \mu_j) \frac{\sigma_i}{\sigma_j} + \mu_i \\ \mu_i &= \frac{1}{N} \sum_{x=1}^{N} i(x) \\ \sigma_i^2 &= \frac{1}{N-1} \sum_{x=1}^{N} \left( i(x) - \mu_i \right)^2 \end{align} $$
- Less sensitive to outliers, but robust statistics may still be used for better behavior, e.g., median, robust variance 
- Often used in machine-learning based methods 
- Can solve global intensity shift and multiplicative factors 
- Cannot solve contrast differences 
Short summary on approaches 1 and 2
Both min-max and mean-std matching are linear mappings (linear except for the bias, or linear in homogeneous coordinates). Neither transformation can change the contrast of an image. To be able to change the contrast, we need non-linear transformations in the intensity space. A short code sketch of both normalizations follows.
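The following is a minimal sketch of approaches 1 and 2, assuming images are given as NumPy arrays; the percentile clipping in the min-max variant implements the robust version mentioned above.

```python
import numpy as np

def minmax_normalize(j, i_min=0.0, i_max=1.0, eps=2.0):
    """Min-max normalization, robust to outliers by mapping the eps-th and
    (100 - eps)-th percentiles of the CDF to the target range."""
    j_min, j_max = np.percentile(j, [eps, 100.0 - eps])
    j_clipped = np.clip(j, j_min, j_max)
    return (j_clipped - j_min) / (j_max - j_min) * (i_max - i_min) + i_min

def meanstd_normalize(j, mu_i=0.0, sigma_i=1.0):
    """Match the mean and standard deviation to target statistics mu_i, sigma_i."""
    return (j - np.mean(j)) * (sigma_i / np.std(j, ddof=1)) + mu_i
```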
Approach 3: Histogram normalization with landmarks¶
This kind of piecewise linear mapping (quartiles and min-max as landmarks) can be represented as
$$ f(j) = \begin{cases} (j - j_{\text{beg}}) \frac{i_1 - i_{\text{beg}}}{j_1 - j_{\text{beg}}} + i_{\text{beg}} & j \le j_1 \\ (j - j_1) \frac{i_2 - i_1}{j_2 - j_1} + i_1 & j_1 < j \le j_2 \\ (j - j_2) \frac{i_3 - i_2}{j_3 - j_2} + i_2 & j_2 < j \le j_3 \\ (j - j_3) \frac{i_{\text{end}} - i_3}{j_{\text{end}} - j_3} + i_3 & j_3 < j \\ \end{cases} $$
- Matching landmarks of image histograms 
- Piece-wise linear map: non-linear mapping 
- Different landmarks are possible
  - min and max
  - mean or median
  - quartiles, deciles
  - modes
 
- Less sensitive to outliers 
- Widely used 
- Population-wide use:
  - Estimate landmark positions from population data as averages
  - Map the new image's landmarks to the averages
 
- Can change the contrast because of the non-linearity of the mapping 
- Matching only the locations of the landmarks does not make the histograms perfectly equal 
- We still might have some problems when matching histograms of images with large lesions and healthy brains
  - Presence of pathologies will change the histograms
  - Matching pathology-bearing images is hard
  - Robust statistics and outlier detection can help
 
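A minimal sketch of the landmark-based mapping above, assuming NumPy images and quartiles plus robust min/max as landmarks; `np.interp` implements exactly this kind of piecewise linear map.

```python
import numpy as np

def landmark_normalize(j, i_ref, percentiles=(2, 25, 50, 75, 98)):
    """Piecewise linear intensity mapping that matches histogram landmarks
    (robust min/max and quartiles) of image j to those of reference i_ref."""
    j_landmarks = np.percentile(j, percentiles)
    i_landmarks = np.percentile(i_ref, percentiles)
    # np.interp applies the piecewise linear map defined by the landmark
    # pairs; intensities outside the outer landmarks are clipped to them.
    return np.interp(j, j_landmarks, i_landmarks)
```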
Several things to pay attention to
- Image content
  - Histogram matching does not know about image content
  - All pixels are considered independent
  - No spatial information / correlation between neighboring pixels
  - Matching gradients has been proposed
- Object size
  - Differences in object size can be a problem
  - Perfect histogram match would not be preferred in that case
  - Use mode matching if object sizes are different
Noise suppression¶
Problem definition¶
- Additive noise
  - $$J(x) = I(x) + \epsilon(x)$$
  - $\epsilon(x)$ is often considered to be i.i.d., for example, Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma)$
- Multiplicative noise
  - $$J(x) = I(x) \epsilon(x)$$
  - For example, speckle in ultrasound and optical coherence tomography
- More complicated models, for example, the noise in an MR magnitude image
  - $$J(x) = \sqrt{\left( I_r(x) + \epsilon \right)^2 + \left( I_i(x) + \eta \right)^2}$$
  - $\epsilon, \eta \sim \mathcal{N}(0, \sigma)$, Rician noise
- Given the noisy image $J$, remove the noise to recover $I$ (a simulation sketch of these noise models follows) 
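A small sketch, assuming a NumPy image `I` with intensities in [0, 1], simulating the three noise models:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(I, sigma=0.05):
    """Additive i.i.d. Gaussian noise: J = I + eps."""
    return I + rng.normal(0.0, sigma, I.shape)

def add_speckle_noise(I, sigma=0.2):
    """Multiplicative noise: J = I * eps, with eps fluctuating around 1."""
    return I * rng.normal(1.0, sigma, I.shape)

def add_rician_noise(I, sigma=0.05):
    """Rician noise of MR magnitude images; I is taken as the noise-free
    real channel and the imaginary channel is assumed to be zero."""
    eps = rng.normal(0.0, sigma, I.shape)
    eta = rng.normal(0.0, sigma, I.shape)
    return np.sqrt((I + eps) ** 2 + eta ** 2)
```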
Approach 1: Linear filter / convolutional¶
Convolution is defined as
$$ \begin{align} \tilde{I} &= J * S \\ \tilde{I}(x, y, z) &= \int_r \int_p \int_q J(x-r, y-p, z-q) S(r, p, q) \, dr\, dp\, dq \\ \tilde{I}(i, j, k) &= \sum_{r} \sum_{p} \sum_{q} J(i - r, j - p, k - q) S(r, p, q) \end{align} $$
where $S$ is the structural element (kernel), $J$ is the noisy image we want to denoise, and $\tilde{I}$ is the denoised image.
- Extremely efficient to apply 
- Easy to implement 
- Blurs edges 
- Not context aware 
- Sensitive to outliers 
Remark
It is not possible to remove noise and not blur edges at the same time with a linear filter. We need non-linear filtering for this.
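A minimal sketch of linear filtering with SciPy, using a Gaussian kernel and, equivalently in form, an explicit convolution with a mean kernel:

```python
import numpy as np
from scipy import ndimage

J = np.random.rand(128, 128)  # placeholder noisy image

# Gaussian smoothing: convolution with a Gaussian kernel; a larger sigma
# suppresses more noise but also blurs edges more.
I_gauss = ndimage.gaussian_filter(J, sigma=1.5)

# Explicit convolution with a normalized 3x3 box (mean) kernel S
S = np.ones((3, 3)) / 9.0
I_mean = ndimage.convolve(J, S, mode="reflect")
```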
Approach 2: Median filter¶
- For each location, rank order all neighboring intensities 
- Assign the median value as the result; e.g., for the neighborhood intensities 1, 2, 2, 2, 2, 3, 3, 9 the median is 2 
- Sliding window similar to convolution 
- Other rank-order filters, such as min and max, are defined similarly 
- Non-linear filters 
- Two rank filters are not commutative: $\text{min}(\text{median}(I)) \neq \text{median}(\text{min}(I))$ 
- Robustness and edge preservation 
- Easy and efficient implementation 
- Does not introduce new intensities: (i) intensities stay integer; (ii) applying median filtering to labels - as a post-processing step, e.g., in segmentation - will not introduce new labels and removes small islands 
- Patchy results 
- Image content can change - adding and removing structures 
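A short sketch using SciPy's rank-order filters; `size` sets the sliding-window neighborhood:

```python
import numpy as np
from scipy import ndimage

J = np.random.rand(128, 128)  # placeholder noisy image

# Rank-order filtering over a 3x3 sliding window
I_median = ndimage.median_filter(J, size=3)
I_min = ndimage.minimum_filter(J, size=3)
I_max = ndimage.maximum_filter(J, size=3)

# Rank filters do not commute in general:
# median_filter(minimum_filter(J)) != minimum_filter(median_filter(J))
```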
Approach 3: Non-local means¶
- Very widely used model, very competitive results 
- Extension of bilateral filtering 
- Most noise suppression methods can be put in the form $$I(x) = \sum_{y \in \Omega} w(x, y) J(y)$$ where $0 \le w(x, y) \le 1$ and $\sum_{y \in \Omega} w(x, y) = 1$, $\forall x$. For example, the mean and Gaussian filters fit this form; only the weights $w(x, y)$ change accordingly. 
- Non-local means uses $w(x, y)$ to assess the similarity between the image information at $x$ and $y$: $$w(x, y) = \frac{1}{Z(x)} \exp \left\{ - \frac{\| J(\mathcal{N}(x)) - J(\mathcal{N}(y)) \|_2^2}{h^2} \right\}$$ where $Z(x)$ is the normalization constant, $h$ is the degree of filtering, and $\mathcal{N}(x)$ is a patch (neighborhood) around $x$. 
In the example image from the lecture, $p$ is the target pixel being denoised and $q_1, q_2, q_3$ are candidate pixels; the squares around them mark the patches $\mathcal{N}(\cdot)$ being compared. If a candidate's patch is similar to the target's patch, the corresponding weight is higher.
- While smoothing point $x$, use areas with similar image content 
- If noise is random then averaging similar areas should get rid of it 
- Uses redundancy in the images, i.e., local structures repeat 
- Increasing $h$ leads to more smoothing 
- Model parameters
  - $h$: depends on the noise level
  - size of $\mathcal{N}(x)$
  - size of $W(x)$
- The averaging often happens in a small search window $W(x)$: $$I(x) = \sum_{y \in W(x)} w(x, y) J(y)$$
- Size of $\mathcal{N}(x)$ matters
  - $\mathcal{N}(x) = \Omega$ leads to the mean filter
  - $\mathcal{N}(x) = \{ x \}$ leads to Yaroslavsky's filter, a special case of bilateral filtering
 
- Higher noise suppression 
- Better preservation of edges 
- May create artifacts, but still better than the alternatives 
- Important to estimate the noise level - different algorithms exist to do it 
- Extensions to MRI and diffusion MRI exist 
- Currently the state-of-the-art filtering technique among methods not based on machine learning 
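A sketch using scikit-image's implementation; `patch_size` corresponds to $\mathcal{N}(x)$, `patch_distance` to the search window $W(x)$, and `h` to the degree of filtering:

```python
import numpy as np
from skimage.restoration import denoise_nl_means, estimate_sigma

J = np.random.rand(128, 128)  # placeholder noisy image

# Estimate the noise standard deviation from the image itself
sigma_est = estimate_sigma(J)

I_nlm = denoise_nl_means(
    J,
    patch_size=5,       # size of the patch N(x)
    patch_distance=6,   # radius of the search window W(x)
    h=0.8 * sigma_est,  # degree of filtering, tied to the noise level
)
```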
Summary
The lecture slides close with a side-by-side visual comparison of the three approaches on the same noisy image: convolutional filters, median filters, and non-local means (figure not reproduced here).
Approach 4: Energy-based¶
Remember the additive model for the noisy image
$$\underbrace{J(x)}_{\text{observed}} = \underbrace{I(x)}_{\text{desired}} + \underbrace{\epsilon(x)}_{\text{noise}}$$
Denoising as an energy minimization problem
$$\hat{I} = \arg \min_{I} \underbrace{\lambda D(I, J)}_{\text{punish all deviation}} + \underbrace{R(I)}_{\text{embed prior image knowledge}}$$
- Data consistency: $$D(I, J) = \int (I(x) - J(x))^2 dx = \| I - J \|_2^2$$ 
- Regularization: $$R(I) = \int \| \nabla I \| dx = \| I \|_{TV}$$ 
- $\lambda$ reflects the allowed noise level in the observed image, with $\lambda = \frac{1}{2\sigma^2}$ 
Total variation aims to get an image with minimal variation.
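A sketch of TV denoising with scikit-image's Chambolle solver; note that its `weight` parameter behaves like $1/\lambda$ in the energy above (a larger weight smooths more):

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

J = np.random.rand(128, 128)  # placeholder noisy image

# Minimizes a data-consistency term plus the TV norm; a larger weight
# smooths more, acting inversely to the lambda in the formulation above.
I_tv = denoise_tv_chambolle(J, weight=0.1)
```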
Probabilistic interpretation¶
The energy-based formulation can be interpreted in a probabilistic model. Remember that
$$\arg \min_{I} \lambda D(I, J) + R(I) = \arg \max_{I} \log P(J|I) + \log P(I)$$
where
- $P(J|I)$ is the likelihood 
- $P(I)$ is the prior 
Total variation denoising model:
$$ \begin{align} P(J|I) &= \mathcal{N}\left( J; I, \frac{1}{2\lambda} \right) \\ P(I) &\propto \exp\left( - \| \nabla I \| \right) \end{align} $$
Remark
Note that the standard deviation of the likelihood model is linked to the weight in the energy-based formulation: $\sigma^2 = \frac{1}{2\lambda}$.
Generality¶
Note that the energy-based and the corresponding probabilistic formulations are very generic, and many different models can be cast in the same formulation
$$\arg \min_{I} \lambda D(I, J) + R(I) = \arg \max_{I} \log P(J|I) + \log P(I)$$
- $D(I, J)$ or $P(J|I)$ encodes information about how the noise process works. It does not have to be additive Gaussian noise; it can take different forms, as we have seen before. 
- $R(I)$ or $P(I)$ encodes our prior information on what type of images are plausible
  - TV norm: minimal variation
  - Wavelet: sparse in a transform domain
  - Dictionary learning: composed of sparse atomic elements
  - Unsupervised learning: probabilistic models learned from data
  - Adversarial learning: comparing distribution-wise
 
Bias correction¶
Problem source¶
- Acquisition artifact
  - Field inhomogeneity
  - Electrical characteristics of the tissue
  - Poor uniformity of coils
  - Gradients and eddy currents
 
- Most MRI has it 
- Minimal effect on visual interpretation; difficult to see by eye 
- Adverse effects on algorithms, especially segmentation 
- A non-linear effect on intensities that varies spatially 
- Remove it as a pre-processing step 
Bias field¶
- Smoothly varying 
- Does not depend on the underlying tissue to the first order approximation 
- In reality, it also depends on the tissue but this effect is mostly ignored 
- Approximate mathematical model $$j = b\cdot i + n$$
  - $i$: real image
  - $b$: bias field
  - $n$: noise
  - $j$: observed image
 
- Multiplicative factor 
Basic approaches to correct it¶
- At the hardware side, with extra measurements
  - Ideally a perfect solution
  - Additional measurements make retrospective data analysis hard, and they are usually not integrated
  - Post-acquisition methods are very useful
- Manually placed landmarks
  - Voxels expected to have similar intensity
  - Human in the loop - a very good solution
  - Needs human interaction
- Joint segmentation
  - The same structure should have the same intensity in all voxels
  - Works if you know what to expect in the image
  - Geared toward the best segmentation accuracy
  - Needs prior information and is too specific
- N3 algorithm
  - Generic
  - Most popular solution
N3 algorithm¶
Principle¶
Starting from the main model
$$\tilde{j}(x) = \tilde{b}(x) \tilde{i}(x) + \tilde{n}(x)$$
Let us focus on the noise-free case. Taking the logarithm of both sides makes the effect of the bias field additive:
$$j(x) = b(x) + i(x)$$
Goal
N3 estimates $b(x)$ at every point in the image. The bias-removed image can then be computed immediately as $i(x) = j(x) - b(x)$
Key point
N3 analyzes the distribution of $j$, $b$, and $i$ to understand the effect of the bias term
Assumptions
Independence between pixels when considering distributions of $j$, $b$, and $i$
Bias field and the image intensities are independent
$j(x) = b(x) + i(x)$ is a sum of random variables with pdfs denoted $J(j)$, $B(b)$, and $I(i)$. The probability distribution $J$ is the following convolution:
$$J(j) = \int I(j - b) B(b) \, db = (I * B)(j)$$
What can we say about $B$?
- $\tilde{j}(x) = \tilde{i}(x) \tilde{b}(x)$ 
- $\tilde{b}$ is around 1, so $b$ is around $0$ 
- $B$ can be approximated with a Gaussian with small standard deviation 
- $J$ is a smoothed version of $I$ 
- The bias field is a smooth field - $b$ is smooth 
Sums of random variables¶
Let $a$ and $b$ be two independent random variables with pdfs denoted by $f_a(a)$ and $f_b(b)$. Let us define another random variable
$$c = a + b$$
We compute this using cdf of $c$, i.e., $F_c(c')$
$$ \begin{align} P(c \le c') &= F_c(c') = \int_{-\infty}^{c'} f_c(c) \, dc \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{c'-b} f_a(a) f_b(b) \, da \, db \end{align} $$
Then the pdf can be computed by taking the derivative $dF_c(c') / dc'$, using Leibniz's rule twice
$$ \begin{align} f_c(c') &= \frac{d}{dc'} F_c(c') = \frac{d}{dc'} \int_{-\infty}^{\infty} \int_{-\infty}^{c'-b} f_a(a) f_b(b) dadb \\ &= \int_{-\infty}^{\infty} \frac{d}{dc'} \int_{-\infty}^{c'-b} f_a(a) f_b(b) dadb\\ &= \int_{-\infty}^{\infty} f_a(c' - b) f_b(b) db \end{align} $$
Convolution
$c = a + b \Rightarrow f_c = f_a * f_b$ when $a$ and $b$ are independent random variables
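A quick numerical check of this fact, with hypothetical distributions (a standard normal plus a uniform), comparing sampled sums against the convolution of the two pdfs:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Sample c = a + b for independent a ~ N(0, 1) and b ~ U(0, 2)
a = rng.normal(0.0, 1.0, 100_000)
b = rng.uniform(0.0, 2.0, 100_000)
c = a + b

# Convolve the two pdfs on a symmetric grid (so 'same' mode stays aligned)
x = np.linspace(-8.0, 8.0, 1601)
dx = x[1] - x[0]
f_a = stats.norm.pdf(x, loc=0.0, scale=1.0)
f_b = stats.uniform.pdf(x, loc=0.0, scale=2.0)
f_c = np.convolve(f_a, f_b, mode="same") * dx  # f_c = f_a * f_b

# Both means should be close to E[a] + E[b] = 1.0
print(c.mean(), np.sum(x * f_c) * dx)
```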
Two components in the problem¶
Given
$$j(x) = b(x) + i(x)$$
denote the probability distributions of $j$, $b$, and $i$ with $J(j)$, $B(b)$, and $I(i)$. The probability distribution $J$ is the following convolution
$$J(j) = \int I(j - b) B(b) \, db = (I * B)(j)$$
The estimation must determine $I(i)$ and $b$ given only $j$. We divide this into two parts
- Field estimation: Estimation of $b$ given only $j$ (note that $I(i)$ is only the distribution and not the image, and we are looking for $b$) 
- $I(i)$ estimation: Estimation of $I(i)$ 
Always assuming $B$ is a Gaussian with small standard deviation.
Component 1: Field estimation¶
Strategy: Given an intensity value $j$ and the probability distributions $I(i)$ and $B(b)$, predict the $i$ that may correspond to $j$ on average. Then get the bias field as the difference. We can apply this at every pixel / voxel.
If we are given the distribution $I(i)$ we can estimate the bias field at any point by
$$ \begin{align} \mathbb{E}[i | j] &= \int_{-\infty}^{\infty} i \cdot p(i|j) \, di \\ \text{Bias field estimate: } b_e(j) &= \mathbb{E}[b | j] = j - \mathbb{E}[i | j] \\ \mathbb{E}[i | j] &= \int_{-\infty}^{\infty} i \cdot \frac{p(i, j)}{p(j)} \, di \\ &= \frac{1}{J(j)} \int_{-\infty}^{\infty} i \cdot p(i, j) \, di \qquad (b = j - i) \\ &= \frac{1}{J(j)} \int_{-\infty}^{\infty} i \cdot I(i) \cdot B(j - i) \, di \\ &= \frac{\int_{-\infty}^{\infty} i \cdot B(j - i) \cdot I(i) \, di}{\int_{-\infty}^{\infty} B(j - i) \cdot I(i) \, di} \end{align} $$
We know all the terms in this equation, and $i$ is one-dimensional, so we can evaluate it numerically with ease.
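A minimal numerical sketch of this step, assuming the pdf $I(i)$ is given as an array on a uniform intensity grid and $B$ is a zero-mean Gaussian with small standard deviation (the Gaussian's normalization constant cancels in the ratio):

```python
import numpy as np

def bias_estimate(j_values, i_grid, I_pdf, sigma_b=0.05):
    """Estimate b_e(j) = j - E[i | j] for each intensity in j_values.
    I_pdf approximates the pdf of the unbiased intensities on i_grid."""
    b_e = np.empty(len(j_values))
    for k, j in enumerate(j_values):
        B = np.exp(-0.5 * ((j - i_grid) / sigma_b) ** 2)  # B(j - i), unnormalized
        b_e[k] = j - np.sum(i_grid * B * I_pdf) / np.sum(B * I_pdf)
    return b_e
```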
Component 2: $I(i)$ estimation¶
Key observation
$J(j)$ is a smoothed version of $I(i)$, and $B(b)$ is roughly Gaussian with a small standard deviation
Approximation
Approximate $I(i)$ by sharpening $J(j)$ assuming $B(b)$ is a Gaussian with small standard deviation
$$J = I * B \Leftrightarrow \mathcal{F}\{J\} = \mathcal{F}\{I\}\,\mathcal{F}\{B\}$$
- Deblurring filter $\hat{F} = \frac{\mathcal{F}\{B\}^*}{ |\mathcal{F}\{B\}|^2 + Z^2}$ 
- $\mathcal{F}\{I\} \approx \hat{F} \mathcal{F}\{J\}$ 
with $Z$ being a small number and $\mathcal{F}\{B\}^*$ denoting the complex conjugate. This is inverse filtering.
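A sketch of this sharpening step, assuming `J_pdf` holds the observed intensity distribution sampled on a uniform grid and $B$ is a zero-mean Gaussian with standard deviation `sigma_b`:

```python
import numpy as np

def sharpen_distribution(J_pdf, i_grid, sigma_b=0.05, Z=0.01):
    """Approximate I(i) by inverse (Wiener-style) filtering of J(j)."""
    di = i_grid[1] - i_grid[0]
    freqs = np.fft.fftfreq(len(i_grid), d=di)
    # Fourier transform of a zero-mean Gaussian pdf with std sigma_b
    F_B = np.exp(-2.0 * (np.pi * sigma_b * freqs) ** 2)
    F_hat = np.conj(F_B) / (np.abs(F_B) ** 2 + Z ** 2)  # deblurring filter
    I_pdf = np.real(np.fft.ifft(F_hat * np.fft.fft(J_pdf)))
    return np.clip(I_pdf, 0.0, None)  # a pdf must be non-negative
```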
Intermediate summary
Estimate of the unbiased image's distribution: $I(i) \approx \mathcal{F}^{-1} \left( \hat{F} \mathcal{F} \{J\} \right)$
Field estimate: $b_e(j) = j - \frac{\int_{-\infty}^{\infty} i \cdot B(j - i) \cdot I(i) di}{\int_{-\infty}^{\infty} B(j - i) \cdot I(i) di}$
Using both, we can estimate the field at any point $x$ in the image
But there are two things we need to pay attention to: the need for smoothing and the need for iterations
Need for smoothing¶
- There are several sources of error
  - $\tilde{j}(x) = \tilde{b}(x) \tilde{i}(x) + \tilde{n}(x)$ - we ignore the noise
  - $\hat{F}$ is approximate, and so is the resulting $I$
  - We have not yet used the fact that $b(x)$ should be a smooth field - there is spatial correlation between the bias field at neighboring pixels
- Solution: smooth approximation based on anchor points using splines or radial basis function approximation (see the sketch below): $$b(x) = \sum_{n=1}^{N} b_e(j(x_n)) K(\| x - x_n \|)$$ with $K$ a smoothing kernel such as $\exp\left( -\| x - x_n \|_2^2 / \sigma^2 \right)$. 
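A sketch of this smoothing, with hypothetical anchor coordinates and bias estimates; the Gaussian RBF weights are normalized here (a Nadaraya-Watson style choice, an assumption beyond the formula above) so the field stays well scaled between anchors:

```python
import numpy as np

def smooth_bias_field(coords, anchors, b_anchor, sigma=30.0):
    """Interpolate pointwise bias estimates b_anchor (at anchor positions)
    into a smooth field evaluated at coords, using a Gaussian RBF."""
    # Pairwise squared distances between evaluation points and anchors
    d2 = ((coords[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / sigma**2)
    # Normalized weights keep the field on the scale of the estimates
    return (K @ b_anchor) / K.sum(axis=1)
```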
Need for iterations¶
- We showed that $J = I * B$ and used $I \approx \mathcal{F}^{-1} (\hat{F} \mathcal{F}\{J\})$ 
- In theory, we do not know $B$; we only assume it is a Gaussian 
- If we use too wide a $B$, the inverse filtering may fail - remember that the results have to be distributions (> 0). Deconvolution is hard. 
- Solution: Iterate many times with small Gaussians (small standard deviations) 
- Not a bad approximation: large Gaussians are convolutions of smaller ones 
Summary
Bias correction with N3 is routinely used for MRI
There are variations available, e.g., N4ITK
The method involves multiple approximations
Multiresolution versions exist and perform better
You may need to run the tool a few times to get better results
Variations in image and pixel-size¶
Problem source¶
What is it?¶
- Images may come with different pixel and / or image size 
- They may have different Field-of-View (FOV) 
- We may need to compare such "heterogeneous" images 
Why does it happen?¶
- Differences in acquisitions
  - in FOV
  - in pixel size
In which applications does it matter?¶
- Comparing measurements
  - measurements taken in an image have a correspondence in real life
  - real-life measurements rely on the pixel size
  - comparisons need to take this into account
  - e.g., longitudinal analysis, population statistics, normative distributions
- Registration
  - aligning images requires building point correspondences
  - building correspondences makes sense if the two points cover a similar area
- Machine learning applications
  - parameters of a model are learned from a set of images
  - applying the same model to an image with a different image/pixel size or FOV requires attention
- Effectively important for all applications 
- Building algorithms that work for heterogeneous images is one of the most challenging tasks 
How do we tackle this?¶
- Q: What criteria should guide us? - A: Measurements taken in the two images should be comparable. 
- Q: What is the important factor to match? - A: Pixel size. 
Strategy:¶
- Resample one image to match the pixel-size of the other. 
- Resize an image with a factor determined by the pixel-size. 
- If the images are volumetric (3D), then match the voxel size. 
- Crop or pad if necessary to make the image sizes the same. 
Examples¶
- Q: To match image 2 to image 1, what are the resize ratios in x- and y-directions? A: ratio x = 1.0 / 1.25 = 0.8, ratio y = 1.0 / 1.25 = 0.8, getting image 3. 
- Q: To match image 1 to image 2, what are the resize ratios in x- and y-directions? A: ratio x = 1.25 / 1.0 = 1.25, ratio y = 1.25 / 1.0 = 1.25, getting image 4. 
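A sketch of pixel-size matching with SciPy, using hypothetical pixel sizes consistent with the first example above (image 2 at 1.0 mm resampled to image 1's 1.25 mm gives a ratio of 0.8 per axis):

```python
import numpy as np
from scipy import ndimage

img2 = np.random.rand(200, 200)   # placeholder image to be resampled
pixel_size_src = (1.0, 1.0)       # image 2 pixel size in mm (assumed)
pixel_size_tgt = (1.25, 1.25)     # image 1 pixel size in mm (assumed)

# Resize ratio per axis: source pixel size / target pixel size -> (0.8, 0.8)
zoom = tuple(s / t for s, t in zip(pixel_size_src, pixel_size_tgt))
img2_resampled = ndimage.zoom(img2, zoom, order=1)  # linear interpolation

# Crop or zero-pad afterwards if the image sizes must also match
```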
Remark
As indicated before, we always want to match the pixel size. Matching the image size may lead to wrong normalization. The last image in the plot from the lecture shows the outcome of matching the image size, which makes the images no longer comparable; even the aspect ratio changed.