CSE 559A: Computer Vision


Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).
Course Staff: Zhihao Xia, Charlie Wu, Han Liu

Sep 6, 2018

# Office Hours

• This Friday (and this Friday only):
• Zhihao's Office Hours in Jolley 431 instead of 309.
• Monday Office Hours:
• 5:30-6:30pm, Collaboration Space @ Jolley 217.

• PSET 0 Due Today by 11:59pm
• Any issues with submissions, post on Piazza.

# Last Time

• Convolutions
• Simplest spatial linear operation
• Output at each pixel is a function of a limited number of pixels in the input
• Linear Function
• Same function for different neighborhoods

• Edge & Line Detection: A Stereotypical Vision Algorithm Pipeline
• Use convolutions to detect local image properties (Gradients)
• Apply local non-linear processing to get local features (Edges)
• Aggregate information to find long-range structures (Lines)

# Other Neighborhood Operations

Median Filter / Order Statistics

$Y[n] = \text{Median} \{ X[n-n'] \}_{N[n'] = 1}$

• Neighborhood function $$N[n'] \in \{0,1\}$$
• Often better at removing outliers than convolution.

Source: Wikipedia

• Other ops: $$Y[n] = \text{max / min} \{ X[n-n'] \}_{N[n'] > 0}$$
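A minimal sketch of these order-statistic filters, assuming a 3x3 neighborhood and a border rule that simply skips out-of-range neighbors (the slides do not specify one). The helper name `neighborhood_filter` and the test image are illustrative only.

```python
from statistics import median

def neighborhood_filter(X, offsets, reduce_fn):
    """Apply reduce_fn to the set {X[n - n']} over offsets n' with N[n'] = 1.

    X is a list of rows. Neighbors that fall outside the image are skipped
    (an assumed border rule, not taken from the slides).
    """
    H, W = len(X), len(X[0])
    Y = [[0] * W for _ in range(H)]
    for r in range(H):
        for c in range(W):
            vals = [X[r - dr][c - dc]
                    for dr, dc in offsets
                    if 0 <= r - dr < H and 0 <= c - dc < W]
            Y[r][c] = reduce_fn(vals)
    return Y

# 3x3 neighborhood: N[n'] = 1 for every offset in {-1, 0, 1}^2.
OFFSETS_3X3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

# Flat image of 10s with a single outlier in the middle.
X = [[10] * 5 for _ in range(5)]
X[2][2] = 255

Y_median = neighborhood_filter(X, OFFSETS_3X3, median)
Y_mean = neighborhood_filter(X, OFFSETS_3X3, lambda v: sum(v) / len(v))
Y_max = neighborhood_filter(X, OFFSETS_3X3, max)
```

In this example the median output is flat again, while the box-filter mean (a convolution) only spreads the outlier over its neighborhood.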

# Other Neighborhood Operations

Morphological Operations

• Conducted on binary images ($$X[n] \in \{0,1\}$$)
• Erosion: $$~~~Y[n] = \text{AND}~~ \{ X[n-n'] \}_{N[n'] = 1}~~~$$ (1 if all neighbors 1)
• Dilation: $$~~~Y[n] = \text{OR}~~\{ X[n-n'] \}_{N[n'] = 1}~~~$$ (1 if any neighbor 1)
• Opening: Erosion followed by Dilation
• Closing: Dilation followed by Erosion

See Szeliski Sec 3.3.2
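The four operations can be sketched as follows, assuming a 3x3 neighborhood and treating pixels outside the image as 0 (both are assumptions made for this illustration). Function names are hypothetical.

```python
def morph(X, offsets, op):
    """Binary neighborhood operation: op=all gives erosion, op=any gives dilation."""
    H, W = len(X), len(X[0])
    Y = [[0] * W for _ in range(H)]
    for r in range(H):
        for c in range(W):
            vals = []
            for dr, dc in offsets:
                rr, cc = r - dr, c - dc
                inside = 0 <= rr < H and 0 <= cc < W
                vals.append(X[rr][cc] if inside else 0)  # assumed: outside is 0
            Y[r][c] = int(op(vals))
    return Y

def erode(X, offsets):
    return morph(X, offsets, all)      # AND over the neighborhood

def dilate(X, offsets):
    return morph(X, offsets, any)      # OR over the neighborhood

def opening(X, offsets):
    return dilate(erode(X, offsets), offsets)

def closing(X, offsets):
    return erode(dilate(X, offsets), offsets)

OFFSETS_3X3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

# A 3x3 block of ones inside a 7x7 image.
block = [[1 if 2 <= r <= 4 and 2 <= c <= 4 else 0 for c in range(7)]
         for r in range(7)]

speckled = [row[:] for row in block]
speckled[0][0] = 1                     # isolated speck
holed = [row[:] for row in block]
holed[3][3] = 0                        # one-pixel hole

opened = opening(speckled, OFFSETS_3X3)   # removes the speck
closed = closing(holed, OFFSETS_3X3)      # fills the hole
```

Here opening removes the isolated pixel and closing fills the hole, and both return the original block.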

# Fourier Transform

Quick Recap: Complex Numbers

• A complex number $$f = x + j~y$$ where $$x$$ and $$y$$ are scalar numbers.
• $$j = \sqrt{-1}$$ (EE convention: we use $$j$$ instead of $$i$$)
• $$x$$ and $$y$$ are called the real and imaginary components of $$f$$

Think of $$f$$ as a 2-D vector with special definitions of addition, multiplication, etc.

• $$(x_1 + j~y_1) + (x_2 + j~y_2) = (x_1 + x_2) + j~(y_1+y_2)$$
• $$(x_1 + j~y_1) \times (x_2 + j~y_2) = (x_1x_2 - y_1y_2) + j~(x_2y_1 + x_1y_2)$$
• $$(x_1 + j~y_1) \times x_2 = x_1x_2 + j~y_1x_2$$
• Conjugate: $$\overline{x + j~y} = x - j~y = x + j (-y)$$
• Squared magnitude: $$(x + j~y) \times \overline{(x + j~y)} = x^2+y^2$$

# Fourier Transform

Quick Recap: Complex Numbers

Euler's Formula

• $$\exp(j \theta) = \cos \theta + j~\sin \theta$$
• $$x+j~y = M \exp(j\ \theta)$$
• $$M = \sqrt{x^2+y^2}, \theta = \text{atan2}(y,x)$$ (two-argument arctangent)
• $$\theta$$ is called the "phase"
• $$\overline{M\exp(j\theta)} = M \exp(-j\theta)$$
• $$(x + j~y)\times \exp(j\theta_0) = M \exp(j (\theta+\theta_0))$$
• Preserves magnitude, adds to phase
• $$\exp(j 0) = 1$$
• $$\exp(jN\pi) = 1$$ where $$N$$ is an even integer, and $$=-1$$ where $$N$$ is an odd integer.
• Real in both cases
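These identities are easy to check numerically. The sketch below uses Python's built-in complex type and the standard `cmath` module; the specific values of `f` and `theta0` are arbitrary choices for illustration.

```python
import cmath
import math

f = complex(3.0, 4.0)             # x + j y with x = 3, y = 4
M, theta = cmath.polar(f)         # M = sqrt(x^2 + y^2), theta = atan2(y, x)

# f times its conjugate gives the squared magnitude x^2 + y^2.
sq_mag = (f * f.conjugate()).real

# Multiplying by exp(j theta0) preserves magnitude and adds theta0 to the phase.
theta0 = math.pi / 6
g = f * cmath.exp(1j * theta0)
M_g, theta_g = cmath.polar(g)

# exp(j N pi) is +1 for even N and -1 for odd N (up to rounding error).
even_case = cmath.exp(1j * 2 * math.pi)
odd_case = cmath.exp(1j * 3 * math.pi)
```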

# Fourier Transform

The Discrete 2D Fourier Transform

$\mathcal{F}[X] = F[u,v] = \frac{1}{WH} \sum_{n_x=0}^{W-1}\sum_{n_y=0}^{H-1}~~X[n_x,n_y]~\exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)$

$\exp(j~\theta) = \cos \theta + j \sin \theta$

• Defined for a single-channel / grayscale image $$X$$.
• $$F$$ is a "complex valued" array indexed by integers $$u,v$$.
• Each $$F[u,v]$$ depends on the intensities at all pixels.

# Fourier Transform

The Discrete 2D Fourier Transform

$\mathcal{F}[X] = F[u,v] = \frac{1}{WH} \sum_{n_x=0}^{W-1}\sum_{n_y=0}^{H-1}~~X[n_x,n_y]~\exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)$

$\exp(j~\theta) = \cos \theta + j \sin \theta$

• Note that $$F[u,v] = F[u+W,v] = F[u,v+H]$$ because of periodicity.

$\exp\left( -j~2\pi\left(\frac{(u+W)~n_x}{W} + \frac{v~n_y}{H}\right)\right) = \exp\left( -j~2\pi\left(\frac{u~n_x}{W} + n_x + \frac{v~n_y}{H}\right)\right)$

$= \exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) - j~2n_x\pi\right) = \exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right)\right) \times \exp(-j~2n_x\pi)$

$= \exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right)\right)$

# Fourier Transform

The Discrete 2D Fourier Transform

$\mathcal{F}[X] = F[u,v] = \frac{1}{WH} \sum_{n_x=0}^{W-1}\sum_{n_y=0}^{H-1}~~X[n_x,n_y]~\exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)$

$\exp(j~\theta) = \cos \theta + j \sin \theta$

• Note that $$F[u,v] = F[u+W,v] = F[u,v+H]$$ because of periodicity.
• Therefore, we typically store $$F[u,v]$$ for $$u \in \{0,\ldots, W-1\}$$, $$v \in \{0,\ldots, H-1\}$$.
• Can think of $$F[u,v]$$ as a complex-valued "image" with the same number of pixels as $$X$$.

Can be implemented fairly efficiently using the FFT algorithm: $$O(n\log n)$$ for $$n = WH$$ pixels
(often, FFT is used to refer to the operation itself).
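For reference, the definition can be evaluated directly. This is a sketch of the slow $$O((WH)^2)$$ double sum, not an FFT; the image is stored as `X[nx][ny]` to mirror the slide's indexing, and the function names are made up for this example. In practice one would call a library FFT (for instance NumPy's `numpy.fft.fft2`, which by default omits the $$\frac{1}{WH}$$ factor used here).

```python
import cmath

def dft2_coeff(X, u, v):
    """One coefficient F[u, v], using the 1/(WH) scaling from the slides.

    X is indexed as X[nx][ny] with nx in 0..W-1 and ny in 0..H-1.
    """
    W, H = len(X), len(X[0])
    total = 0j
    for nx in range(W):
        for ny in range(H):
            theta = -2 * cmath.pi * (u * nx / W + v * ny / H)
            total += X[nx][ny] * cmath.exp(1j * theta)
    return total / (W * H)

def dft2(X):
    """All W*H stored coefficients, by direct summation."""
    W, H = len(X), len(X[0])
    return [[dft2_coeff(X, u, v) for v in range(H)] for u in range(W)]

X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0],
     [0.0, 1.0, 0.0]]             # W = 4, H = 3
W, H = len(X), len(X[0])
F = dft2(X)

# Periodicity: shifting u by W or v by H leaves the coefficient unchanged.
gap_u = abs(dft2_coeff(X, 1 + W, 2) - F[1][2])
gap_v = abs(dft2_coeff(X, 1, 2 + H) - F[1][2])
```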

# Fourier Transform

The Discrete 2D Fourier Transform Pair

$\mathcal{F}[X] = F[u,v] = \frac{1}{WH} \sum_{n_x=0}^{W-1}\sum_{n_y=0}^{H-1}~~X[n_x,n_y]~\exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)$

$\mathcal{F}^{-1}[F] = X[n_x,n_y] = \sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F[u,v]~\exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)$

• If $$X$$ is real-valued, $$F[-u,-v] = F[W-u,H-v] = \bar{F}[u,v]$$, where $$\bar{F}$$ denotes the complex conjugate.
• $$F[0,0]$$ is often called the DC component. It is the average intensity of $$X$$. It is real if $$X$$ is real.
• Only $$WH$$ independent "numbers" in $$F[u,v]$$ (counting real and imaginary separately) if $$X$$ is real.
• Parseval's Theorem (energy is preserved up to a constant factor): $\sum_{u,v} \|F[u,v]\|^2 = \sum_{u,v} F[u,v]\bar{F}[u,v] = \frac{1}{WH}\sum_{n_x,n_y} \|X[n_x,n_y]\|^2$
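The properties listed above can be checked on a small real-valued image. The sketch below assumes the same scaling as the slides ($$\frac{1}{WH}$$ on the forward transform, none on the inverse) and uses direct summation; `_transform`, `dft2`, and `idft2` are illustrative names.

```python
import cmath

def _transform(A, sign, scale):
    """Shared double sum: sign=-1 for the forward DFT, +1 for the inverse."""
    W, H = len(A), len(A[0])
    return [[scale * sum(A[p][q] *
                         cmath.exp(sign * 2j * cmath.pi * (a * p / W + b * q / H))
                         for p in range(W) for q in range(H))
             for b in range(H)] for a in range(W)]

def dft2(X):
    return _transform(X, -1, 1.0 / (len(X) * len(X[0])))

def idft2(F):
    return _transform(F, +1, 1.0)

X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0],
     [0.0, 1.0, 0.0]]             # real image, W = 4, H = 3
W, H = len(X), len(X[0])
F = dft2(X)
pairs = [(u, v) for u in range(W) for v in range(H)]

# Inverse transform recovers X.
X_rec = idft2(F)
rec_err = max(abs(X_rec[u][v] - X[u][v]) for u, v in pairs)

# Conjugate symmetry for real X: F[W-u, H-v] = conj(F[u, v]).
sym_err = max(abs(F[(W - u) % W][(H - v) % H] - F[u][v].conjugate())
              for u, v in pairs)

# DC component is the (real) mean intensity.
mean_X = sum(map(sum, X)) / (W * H)

# Parseval with this scaling: sum |F|^2 = (1/WH) sum |X|^2.
energy_F = sum(abs(F[u][v]) ** 2 for u, v in pairs)
energy_X = sum(X[u][v] ** 2 for u, v in pairs) / (W * H)
```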

# Fourier Transform

The Discrete 2D Fourier Transform Pair

$\mathcal{F}[X] = F[u,v] = \frac{1}{WH} \sum_{n_x=0}^{W-1}\sum_{n_y=0}^{H-1}~~X[n_x,n_y]~\exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)$

$F'[u,v] = F[u,v] \times \exp\left(-j~2\pi\left(\frac{u~t_x}{W} + \frac{v~t_y}{H}\right)\right)$

$\mathcal{F}^{-1}[F'] = ~~~\color{red}{?}~~~ = \sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F'[u,v]~\exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)$

for fixed integers $$t_x$$, $$t_y$$

# Fourier Transform

The Discrete 2D Fourier Transform Pair

$\mathcal{F}[X] = F[u,v] = \frac{1}{WH} \sum_{n_x=0}^{W-1}\sum_{n_y=0}^{H-1}~~X[n_x,n_y]~\exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)$

$F'[u,v] = F[u,v] \times \exp\left(-j~2\pi\left(\frac{u~t_x}{W} + \frac{v~t_y}{H}\right)\right)$

$\mathcal{F}^{-1}[F'] = X[n_x-t_x,n_y-t_y] = \sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F'[u,v]~\exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)$

for fixed integers $$t_x$$, $$t_y$$ (indices are taken modulo $$W$$ and $$H$$, i.e. the shift wraps around)

A change in the phase of the Fourier coefficients, that is linear in $$u,v$$, leads to a translation in the image.
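A numerical check of this shift property, as a sketch under the slide's scaling convention. Substituting $$F'$$ into the inverse sum turns the exponent into $$j~2\pi\left(\frac{u(n_x - t_x)}{W} + \frac{v(n_y - t_y)}{H}\right)$$, so the result is $$X$$ read at $$[n_x - t_x, n_y - t_y]$$ with circular wrap-around. The image values and shift amounts are arbitrary.

```python
import cmath

def _transform(A, sign, scale):
    """Shared double sum: sign=-1 for the forward DFT, +1 for the inverse."""
    W, H = len(A), len(A[0])
    return [[scale * sum(A[p][q] *
                         cmath.exp(sign * 2j * cmath.pi * (a * p / W + b * q / H))
                         for p in range(W) for q in range(H))
             for b in range(H)] for a in range(W)]

def dft2(X):
    return _transform(X, -1, 1.0 / (len(X) * len(X[0])))

def idft2(F):
    return _transform(F, +1, 1.0)

X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0],
     [0.0, 1.0, 0.0]]             # W = 4, H = 3
W, H = len(X), len(X[0])
tx, ty = 1, 2                     # integer shift

F = dft2(X)
# Linear phase ramp in (u, v).
F_shift = [[F[u][v] * cmath.exp(-2j * cmath.pi * (u * tx / W + v * ty / H))
            for v in range(H)] for u in range(W)]
X_shift = idft2(F_shift)

# Compare against the circularly translated image X[nx - tx, ny - ty].
shift_err = max(abs(X_shift[nx][ny] - X[(nx - tx) % W][(ny - ty) % H])
                for nx in range(W) for ny in range(H))
```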

# Fourier Transform

DFT as a Co-ordinate Transform

$F[u,v] = \frac{1}{\sqrt{WH}} \sum_{n_x=0}^{W-1}\sum_{n_y=0}^{H-1}~~\bar{S}_{uv}[n_x,n_y]~X[n_x,n_y]$

where each $$S_{uv}$$ can be thought of as a different (complex-valued) image:

$S_{uv}[n_x,n_y] = \frac{1}{\sqrt{WH}} \exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right)\right)$

# Fourier Transform

DFT as a Co-ordinate Transform

$F[u,v] = \frac{1}{\sqrt{WH}} \Big\langle~~S_{uv},~X~~\Big\rangle$

where each $$S_{uv}$$ can be thought of as a different (complex-valued) image:

$S_{uv}[n_x,n_y] = \frac{1}{\sqrt{WH}} \exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right)\right)$

$$F[u,v]$$ is the inner-product between $$S_{uv}$$ and $$X$$, scaled by $$1/\sqrt{WH}$$.

For $$x,y \in \mathbb{C}^n$$, $$\langle x,y\rangle = x^*y$$

• $$x^*$$ is the Hermitian of $$x$$
• Transpose + Conjugate (transpose the vector, and take conjugate of each entry)
• $$x^*y = \sum_i \bar{x}_iy_i$$

# Fourier Transform

DFT as a Co-ordinate Transform

$F[u,v] = \frac{1}{\sqrt{WH}} \Big\langle~~S_{uv},~X~~\Big\rangle$

where each $$S_{uv}$$ can be thought of as a different (complex-valued) image:

$S_{uv}[n_x,n_y] = \frac{1}{\sqrt{WH}} \exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right)\right)$

$$F[u,v]$$ is the inner-product between $$S_{uv}$$ and $$X$$, scaled by $$1/\sqrt{WH}$$.

Property: $$\langle S_{uv}, S_{u'v'} \rangle = 1 ~\text{if}~u'=u~\&~v'=v,~\text{and}~0~\text{otherwise}.$$

Inverse-DFT: $X[n_x,n_y] = \sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F[u,v]~\exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)$

# Fourier Transform

DFT as a Co-ordinate Transform

$F[u,v] = \frac{1}{\sqrt{WH}} \Big\langle~~S_{uv},~X~~\Big\rangle$

where each $$S_{uv}$$ can be thought of as a different (complex-valued) image:

$S_{uv}[n_x,n_y] = \frac{1}{\sqrt{WH}} \exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right)\right)$

$$F[u,v]$$ is the inner-product between $$S_{uv}$$ and $$X$$, scaled by $$1/\sqrt{WH}$$.

Property: $$\langle S_{uv}, S_{u'v'} \rangle = 1 ~\text{if}~u'=u~\&~v'=v,~\text{and}~0~\text{otherwise}.$$

Inverse-DFT: $X = \sqrt{WH}~~\sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F[u,v]~~S_{uv}$

$$X$$ is a weighted sum of the $$S_{uv}$$ images, weights are given by $$\sqrt{WH}F[u,v]$$.

# Fourier Transform

DFT as a Co-ordinate Transform

$F[u,v] = \frac{1}{\sqrt{WH}} \Big\langle~~S_{uv},~X~~\Big\rangle, \qquad X = \sqrt{WH}~~\sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F[u,v]~~S_{uv}$

$\langle S_{uv}, S_{u'v'} \rangle = 1 ~\text{if}~u'=u~\&~v'=v,~\text{and}~0~\text{otherwise}.$

# Fourier Transform

DFT as a Co-ordinate Transform

$F = \frac{1}{\sqrt{WH}} S^* X,\qquad X = \sqrt{WH}~S~F$

$$S$$ is a $$WH \times WH$$ matrix with each column a different $$S_{uv}$$.

So, $$SS^* = S^*S = I \Rightarrow S^{-1} = S^*$$.

• This means $$S$$ is a unitary matrix.
• Multiplication by $$S$$ is a co-ordinate transform:
• $$X$$ are the co-ordinates of a point in a $$WH$$ dimensional space.
• Multiplication by $$S^*$$ changes the 'co-ordinate system'.
• In the new co-ordinate system, each 'dimension' now corresponds to frequency rather than location.
• $$S$$ is a length-preserving matrix ($$\|S^*X\|^2 = \|X\|^2$$).
• It does rotations or reflections (in $$WH$$ dimensional space).
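To make the co-ordinate-transform view concrete, the sketch below builds the columns of $$S$$ for a tiny $$3 \times 2$$ image and checks orthonormality, length preservation, and reconstruction. Flattening images with index `nx * H + ny` is a choice made for this example; any fixed ordering would do.

```python
import cmath

W, H = 3, 2

def basis_image(u, v):
    """S_uv flattened to a length-WH vector (index nx * H + ny)."""
    return [cmath.exp(2j * cmath.pi * (u * nx / W + v * ny / H)) / (W * H) ** 0.5
            for nx in range(W) for ny in range(H)]

def inner(x, y):
    """<x, y> = x^* y = sum_i conj(x_i) y_i."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

# Columns of S, one per frequency (u, v).
columns = [basis_image(u, v) for u in range(W) for v in range(H)]
n = W * H

# S^* S should be the identity.
gram = [[inner(ci, cj) for cj in columns] for ci in columns]
gram_err = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

# Change of co-ordinates: coords = S^* image.
image = [1.0, -2.0, 0.5, 3.0, 4.0, -1.0]
coords = [inner(c, image) for c in columns]

# Length is preserved.
norm_image = sum(abs(a) ** 2 for a in image)
norm_coords = sum(abs(a) ** 2 for a in coords)

# Going back: image = S coords, a weighted sum of the basis images.
image_rec = [sum(coords[k] * columns[k][i] for k in range(n)) for i in range(n)]
rec_err = max(abs(image_rec[i] - image[i]) for i in range(n))
```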

# Fourier Transform

The FT gave us a different representation for images.
Decomposing image into different frequency 'components'.

What else ?

# Convolution Theorem

$Y = X * k \Rightarrow Y = A_k X$

$$A_k$$ is not square for valid / full convolution.

Question:

Let $$Y=A_k~X$$ correspond to $$Y = X *_{\tiny \text{valid}} k$$. Now, let $$X' = A_k^T Y$$. How is $$X'$$ related to $$Y$$ by convolution ?
What operation does $$A_k^T$$ represent ?

A: Full convolution with $$k[-n_x,-n_y]$$ (flipped version of $$k$$)
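A 1-D version of this answer can be checked directly (1-D only to keep the matrix small; the 2-D case works the same way with flattened images). The helper names and the signal values are illustrative.

```python
def valid_conv_matrix(k, N):
    """Matrix A_k with (A_k X)[m] = sum_i k[i] X[m + K - 1 - i], m = 0..N-K."""
    K = len(k)
    A = [[0.0] * N for _ in range(N - K + 1)]
    for m in range(N - K + 1):
        for i in range(K):
            A[m][m + K - 1 - i] = k[i]
    return A

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def full_conv(y, k):
    """Full convolution: output length len(y) + len(k) - 1."""
    out = [0.0] * (len(y) + len(k) - 1)
    for m, ym in enumerate(y):
        for i, ki in enumerate(k):
            out[m + i] += ym * ki
    return out

k = [1.0, 2.0, -1.0]
X = [3.0, 0.0, 1.0, 4.0, 2.0]

A = valid_conv_matrix(k, len(X))      # shape 3 x 5, not square
Y = matvec(A, X)                      # valid convolution of X with k
X_prime = matvec(transpose(A), Y)     # A_k^T applied to Y

# Same result as full convolution of Y with the flipped kernel.
X_prime_conv = full_conv(Y, k[::-1])
```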

# Convolution Theorem

$Y = X * k \Rightarrow Y = A_k X$

Now if we consider the square $$A_k$$ matrix corresponding to 'same' convolution with circular padding, i.e. padding as $$X[W+n_x,n_y] = X[n_x,n_y]$$, $$X[n_x,-n_y] = X[n_x,H-n_y]$$, etc.

Then, $$A_k$$ is diagonalized by the Fourier Transform !

$A_k = S~D_k~S^*$

• Here, $$D_k$$ is a diagonal matrix.
• The above equation holds for every $$A_k$$
• You get different diagonal matrices $$D_k$$.
• But $$S$$ is the diagonalizing basis for all kernels.
• In the Fourier co-ordinate system, convolution is a 'point-wise' operation !

$Y = A_k X = S~~D_k~~S^*~X \Rightarrow (S^*Y) = D_k (S^*X)$

# Convolution Theorem

Why does this happen ?

• $$X = \sqrt{WH~}~ \sum_{u,v} F[u,v] S_{uv}$$
• $$Y = X * k = \sqrt{WH~}~ \sum_{u,v} F[u,v] S_{uv} * k$$ (by linearity / distributivity)
• $$(S_{uv} * k)[n] = \sum_{n'} k[n'] S_{uv}[n-n']$$
• $$S_{uv}[n-n']$$, assuming circular padding, is also a sinusoid with the same frequency $$(u,v)$$ and magnitude, but different phase.
• Multiplying by $$k[n']$$ changes the magnitude, but the frequency stays the same.
• Adding different sinusoids of the same frequency gives you another sinusoid of the same frequency.
• $$(S_{uv} * k)[n_x,n_y] = d_{uv:k}~S_{uv}[n_x,n_y]$$, where $$d_{uv:k}$$ is some complex scalar.

Sinusoids are eigen-functions of convolution

$Y = X * k = \sqrt{WH~}~ \sum_{u,v} F[u,v] S_{uv} * k = \sqrt{WH~}~ \sum_{u,v} \big(F[u,v]~d_{uv:k}\big)~S_{uv}$
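The eigen-function claim can be verified numerically. In this sketch the kernel is stored as a sparse dictionary from offsets to weights (a representation chosen for this example), and the expected scalar follows from factoring $$S_{uv}[n]$$ out of the sum: $$d_{uv:k} = \sum_{n'} k[n'] \exp\left(-j~2\pi\left(\frac{u~n'_x}{W} + \frac{v~n'_y}{H}\right)\right)$$.

```python
import cmath

W, H = 4, 3

def sinusoid(u, v):
    """The basis image S_uv, indexed as S[nx][ny]."""
    return [[cmath.exp(2j * cmath.pi * (u * nx / W + v * ny / H)) / (W * H) ** 0.5
             for ny in range(H)] for nx in range(W)]

def circ_conv(X, k):
    """Y[n] = sum_{n'} k[n'] X[n - n'] with circular padding."""
    return [[sum(w * X[(nx - dx) % W][(ny - dy) % H]
                 for (dx, dy), w in k.items())
             for ny in range(H)] for nx in range(W)]

# Arbitrary small kernel: offset (dx, dy) -> weight.
k = {(0, 0): 0.5, (1, 0): 0.25, (0, 1): 0.25, (-1, 0): -1.0}

u, v = 1, 2
S = sinusoid(u, v)
Y = circ_conv(S, k)

# Predicted eigenvalue for this frequency.
d = sum(w * cmath.exp(-2j * cmath.pi * (u * dx / W + v * dy / H))
        for (dx, dy), w in k.items())

# Convolution only rescales the sinusoid by the complex scalar d.
eig_err = max(abs(Y[nx][ny] - d * S[nx][ny])
              for nx in range(W) for ny in range(H))
```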

# Convolution Theorem

$A_k = S~D_k~S^*$

• What's more, the diagonal elements of $$D_k$$ are, up to a constant factor, the $$(W \times H)$$ Fourier transform of $$k$$ (with $$k$$ zero-padded to the image size).

$D_k = \text{diag}\Big(\sqrt{WH}~ S^*k \Big) = \text{diag}\Big(WH~\mathcal{F}[k]\Big)$

• This is the convolution theorem.
• Computational advantage for performing (and inverting!) convolution, albeit under circular padding.
• Good way of analyzing what a kernel is doing by looking at its Fourier transform.
• Why did we use complex numbers ? Like quaternions in Graphics, for convenience!
• If we used real number co-ordinate transform, convolution would convert to several $$2\times 2$$ transforms on pairs of co-ordinates.
• Complex numbers are just a way of grouping these pairs into a single 'number'.
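A small end-to-end check of the theorem, and of inverting a convolution, under circular padding. With the $$\frac{1}{WH}$$ scaling used for $$\mathcal{F}$$ in these slides, the point-wise relation is $$\mathcal{F}[X * k] = WH~\mathcal{F}[X]~\mathcal{F}[k]$$. The kernel below is chosen (as an assumption of this example) so that its transform has no zeros, which is what makes the division safe.

```python
import cmath

def _transform(A, sign, scale):
    """Shared double sum: sign=-1 for the forward DFT, +1 for the inverse."""
    W, H = len(A), len(A[0])
    return [[scale * sum(A[p][q] *
                         cmath.exp(sign * 2j * cmath.pi * (a * p / W + b * q / H))
                         for p in range(W) for q in range(H))
             for b in range(H)] for a in range(W)]

def dft2(X):
    return _transform(X, -1, 1.0 / (len(X) * len(X[0])))

def idft2(F):
    return _transform(F, +1, 1.0)

def circ_conv(X, K):
    """Circular convolution; K is zero-padded to the size of X."""
    W, H = len(X), len(X[0])
    return [[sum(K[px][py] * X[(nx - px) % W][(ny - py) % H]
                 for px in range(W) for py in range(H))
             for ny in range(H)] for nx in range(W)]

X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0],
     [0.0, 1.0, 0.0]]             # W = 4, H = 3
W, H = len(X), len(X[0])
pairs = [(u, v) for u in range(W) for v in range(H)]

K = [[0.0] * H for _ in range(W)]
K[0][0], K[1][0], K[0][1] = 0.6, 0.2, 0.2

Y = circ_conv(X, K)
FX, FK, FY = dft2(X), dft2(K), dft2(Y)

# Convolution becomes a point-wise product in the Fourier domain.
theorem_err = max(abs(FY[u][v] - W * H * FX[u][v] * FK[u][v]) for u, v in pairs)

# Inverting the convolution: divide point-wise, then transform back.
F_rec = [[FY[u][v] / (W * H * FK[u][v]) for v in range(H)] for u in range(W)]
X_rec = idft2(F_rec)
deconv_err = max(abs(X_rec[u][v] - X[u][v]) for u, v in pairs)
```

If the kernel's transform were zero (or tiny) at some frequency, that component of $$X$$ would be lost and the division would be ill-conditioned.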