CSE 559A: Computer Vision



Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).
Course Staff: Zhihao Xia, Charlie Wu, Han Liu

http://www.cse.wustl.edu/~ayan/courses/cse559a/

Sep 6, 2018

Office Hours

  • This Friday (and this Friday only):
    • Zhihao's Office Hours in Jolley 431 instead of 309.
  • Monday Office Hours:
    • 5:30-6:30pm, Collaboration Space @ Jolley 217.


  • PSET 0 Due Today by 11:59pm
    • Any issues with submissions, post on Piazza.

Last Time

  • Convolutions
    • Simplest spatial linear operation
    • Output at each pixel is a function of a limited number of pixels in the input
    • Linear Function
    • Same function for different neighborhoods


  • Edge & Line Detection: A Stereotypical Vision Algorithm Pipeline
    • Use convolutions to detect local image properties (Gradients)
    • Apply local non-linear processing to get local features (Edges)
    • Aggregate information to find long-range structures (Lines)

Other Neighborhood Operations

Median Filter / Order Statistics

\[Y[n] = \text{Median} \{ X[n-n'] \}_{N[n'] = 1}\]

  • Neighborhood function \(N[n'] \in \{0,1\}\)
  • Often better at removing outliers than convolution.




Source: Wikipedia

  • Other ops: \(Y[n] = \text{max / min} \{ X[n-n'] \}_{N[n'] > 0}\)
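To make the order-statistic filters above concrete, here is a minimal one-dimensional sketch in plain Python. The helper name `neighborhood_filter`, the window half-width `radius`, and the choice of replicate padding at the borders are assumptions made for this illustration; they are not part of the slides.

```python
from statistics import median

def neighborhood_filter(x, radius, reduce_fn):
    """Apply reduce_fn to the window of half-width `radius` around each sample.

    Indices that fall outside the signal are clamped to the nearest valid
    index (replicate padding), which is an assumption of this sketch.
    """
    n = len(x)
    out = []
    for i in range(n):
        window = [x[min(max(i + d, 0), n - 1)] for d in range(-radius, radius + 1)]
        out.append(reduce_fn(window))
    return out

signal = [10, 10, 10, 200, 10, 10, 10]   # a single outlier at index 3

median_out = neighborhood_filter(signal, 1, median)                    # order statistic
mean_out = neighborhood_filter(signal, 1, lambda w: sum(w) / len(w))   # box-filter convolution
max_out = neighborhood_filter(signal, 1, max)                          # "max" neighborhood op
```

The median output is flat at 10, while the averaging filter smears the outlier over three samples and the max filter spreads it at full strength.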

Other Neighborhood Operations

Morphological Operations

  • Conducted on binary images (\(X[n] \in \{0,1\}\))
  • Erosion: \(~~~Y[n] = \text{AND}~~ \{ X[n-n'] \}_{N[n'] = 1}~~~\) (1 if all neighbors 1)
  • Dilation: \(~~~Y[n] = \text{OR}~~\{ X[n-n'] \}_{N[n'] = 1}~~~\) (1 if any neighbor 1)
  • Opening: Erosion followed by Dilation
  • Closing: Dilation followed by Erosion


See Szeliski Sec 3.3.2
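The four morphological operations can be sketched on a one-dimensional binary signal as follows. The function names, the 3-sample neighborhood (`radius=1`), and treating out-of-range pixels as 0 are assumptions of this example rather than anything prescribed above.

```python
def morph(x, radius, combine):
    """Combine each pixel's neighborhood with `combine`.

    combine=all gives erosion (AND), combine=any gives dilation (OR).
    Pixels outside the signal are treated as 0 (an assumption of this sketch).
    """
    n = len(x)
    out = []
    for i in range(n):
        window = [x[i + d] if 0 <= i + d < n else 0 for d in range(-radius, radius + 1)]
        out.append(1 if combine(window) else 0)
    return out

def erode(x, radius=1):
    return morph(x, radius, all)

def dilate(x, radius=1):
    return morph(x, radius, any)

def opening(x, radius=1):
    return dilate(erode(x, radius), radius)   # erosion followed by dilation

def closing(x, radius=1):
    return erode(dilate(x, radius), radius)   # dilation followed by erosion

x = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0]
opened = opening(x)   # removes runs of 1s narrower than the neighborhood
closed = closing(x)   # fills small gaps between runs of 1s
```

On this input, opening keeps only the run of three 1s, and closing merges all the runs into one.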

Bilateral Filtering


Fourier Transform

Quick Recap: Complex Numbers

  • A complex number \(f = x + j~y\) where \(x\) and \(y\) are scalar numbers.
    • \(j = \sqrt{-1}\) (EE convention: we use \(j\) instead of \(i\))
    • \(x\) and \(y\) are called the real and imaginary components of \(f\)


Think of \(f\) as a 2-D vector with special definitions of addition, multiplication, etc.

  • \((x_1 + j~y_1) + (x_2 + j~y_2) = (x_1 + x_2) + j~(y_1+y_2)\)
  • \((x_1 + j~y_1) \times (x_2 + j~y_2) = (x_1x_2 - y_1y_2) + j~(x_2y_1 + x_1y_2)\)
  • \((x_1 + j~y_1) \times x_2 = x_1x_2 + j~y_1x_2\)
  • Conjugate: \(\overline{x + j~y} = x - j~y = x + j (-y)\)
  • Squared magnitude: \((x + j~y) \times \overline{(x + j~y)} = x^2+y^2\)

Fourier Transform

Quick Recap: Complex Numbers

Euler's Formula

  • \(\exp(j \theta) = \cos \theta + j~\sin \theta\)
  • \(x+j~y = M \exp(j\ \theta)\)
    • \(M = \sqrt{x^2+y^2}, \theta = \operatorname{atan2}(y,x)\) (the two-argument arctangent)
    • \(\theta\) is called the "phase"
  • \(\overline{M\exp(j\theta)} = M \exp(-j\theta)\)
  • \((x + j~y)\times \exp(j\theta_0) = M \exp(j (\theta+\theta_0))\)
    • Preserves magnitude, adds to phase
  • \(\exp(j 0) = 1\)
  • \(\exp(jN\pi) = 1\) where \(N\) is an even integer, and \(=-1\) where \(N\) is an odd integer.
    • Real in both cases
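These identities are easy to check numerically with Python's built-in complex type, which also writes the imaginary unit as `j`. The particular values of `f` and `theta0` below are arbitrary choices for illustration.

```python
import cmath
import math

f = complex(3.0, 4.0)                  # x + j y with x = 3, y = 4

M, theta = cmath.polar(f)              # M = sqrt(x^2 + y^2), theta = atan2(y, x)
squared_magnitude = (f * f.conjugate()).real   # x^2 + y^2

theta0 = math.pi / 6
g = f * cmath.exp(1j * theta0)         # multiply by exp(j theta0)
M_g, theta_g = cmath.polar(g)          # magnitude is unchanged, phase grows by theta0

even_case = cmath.exp(1j * 2 * math.pi)   # exp(j N pi) with N even: close to +1
odd_case = cmath.exp(1j * 3 * math.pi)    # exp(j N pi) with N odd: close to -1
```

Because of floating-point rounding, `even_case` and `odd_case` carry a tiny imaginary part rather than being exactly real.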

Fourier Transform

The Discrete 2D Fourier Transform

\[\mathcal{F}[X] = F[u,v] = \frac{1}{WH} \sum_{n_x=0}^{W-1}\sum_{n_y=0}^{H-1}~~X[n_x,n_y]~\exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)\]

\[\exp(j~\theta) = \cos \theta + j \sin \theta\]

  • Defined for a single-channel / grayscale image \(X\).
  • \(F\) is a "complex valued" array indexed by integers \(u,v\).
  • Each \(F[u,v]\) depends on the intensities at all pixels.
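A direct, deliberately slow transcription of the definition above may help when reading the sums. It assumes the image is stored as a list of `H` rows with `W` entries each (so it is indexed `X[n_y][n_x]`), and the function name `dft2` is chosen for this sketch.

```python
import cmath

def dft2(X):
    """Direct 2D DFT with the 1/(WH) normalization used in these slides.

    X is a list of H rows with W entries each, indexed as X[n_y][n_x].
    Returns F indexed as F[v][u]. Cost is O((WH)^2), for illustration only.
    """
    H, W = len(X), len(X[0])
    F = [[0j] * W for _ in range(H)]
    for v in range(H):
        for u in range(W):
            total = 0j
            for ny in range(H):
                for nx in range(W):
                    angle = -2 * cmath.pi * (u * nx / W + v * ny / H)
                    total += X[ny][nx] * cmath.exp(1j * angle)
            F[v][u] = total / (W * H)
    return F

X = [[1, 2, 3, 4],
     [5, 6, 7, 8]]          # H = 2 rows, W = 4 columns
F = dft2(X)                 # F[0][0] is the mean intensity, 4.5
```

In practice one would call an FFT routine instead, for example NumPy's `numpy.fft.fft2`, which by default omits the \(\frac{1}{WH}\) factor.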

  • Note that \(F[u,v] = F[u+W,v] = F[u,v+H]\) because of periodicity.

\[\exp\left( -j~2\pi\left(\frac{(u+W)~n_x}{W} + \frac{v~n_y}{H}\right)\right) = \exp\left( -j~2\pi\left(\frac{u~n_x}{W} + n_x + \frac{v~n_y}{H}\right)\right)\]

\[= \exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) - j~2n_x\pi\right) = \exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right)\right) \times \exp(-j~2n_x\pi)\]

\[= \exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right)\right)\]

  • Therefore, we typically store \(F[u,v]\) for \(u \in \{0,\ldots, W-1\}\), \(v \in \{0,\ldots, H-1\}\).
  • Can think of \(F[u,v]\) as a complex-valued "image" with the same number of pixels as \(X\).

Can be implemented fairly efficiently using the FFT algorithm: \(O(n\log n)\) for \(n = WH\) pixels
(often, FFT is used to refer to the operation itself).

Fourier Transform

The Discrete 2D Fourier Transform Pair

\[\mathcal{F}[X] = F[u,v] = \frac{1}{WH} \sum_{n_x=0}^{W-1}\sum_{n_y=0}^{H-1}~~X[n_x,n_y]~\exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)\]

\[\mathcal{F}^{-1}[F] = X[n_x,n_y] = \sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F[u,v]~\exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)\]


  • If \(X\) is real-valued, \(F[-u,-v] = F[W-u,H-v] = \bar{F}[u,v]\), where \(\bar{F}\) denotes the complex conjugate.
  • \(F[0,0]\) is often called the DC component. It is the average intensity of \(X\). It is real if \(X\) is real.
  • Only \(WH\) independent "numbers" in \(F[u,v]\) (counting real and imaginary separately) if \(X\) is real.
  • Parseval's Theorem: (energy preserving up to a constant factor) \[\sum_{u,v} \|F[u,v]\|^2 = \sum_{u,v} F[u,v]\bar{F}[u,v] = \frac{1}{WH}\sum_{n_x,n_y} \|X[n_x,n_y]\|^2\]
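The properties listed above can be checked numerically on a tiny real-valued image. The helper `_transform` and the storage convention (`X[n_y][n_x]`, `F[v][u]`) are assumptions of this sketch, and the sums are evaluated directly rather than with an FFT.

```python
import cmath

def _transform(A, sign, scale):
    """scale * sum over (n_x, n_y) of A[n_y][n_x] * exp(sign * j 2 pi (u n_x / W + v n_y / H))."""
    H, W = len(A), len(A[0])
    return [[scale * sum(A[ny][nx] * cmath.exp(sign * 2j * cmath.pi * (u * nx / W + v * ny / H))
                         for ny in range(H) for nx in range(W))
             for u in range(W)]
            for v in range(H)]

def dft2(X):
    H, W = len(X), len(X[0])
    return _transform(X, -1, 1.0 / (W * H))   # forward: 1/(WH) factor, negative exponent

def idft2(F):
    return _transform(F, +1, 1.0)             # inverse: no factor, positive exponent

X = [[1.0, 2.0, 0.0],
     [4.0, -1.0, 3.0]]                        # real-valued, H = 2, W = 3
H, W = len(X), len(X[0])
F = dft2(X)
X_rec = idft2(F)

# Inverse transform recovers X.
roundtrip_err = max(abs(X_rec[ny][nx] - X[ny][nx]) for ny in range(H) for nx in range(W))
# Conjugate symmetry for real X: F[W-u, H-v] equals the conjugate of F[u, v].
symmetry_err = max(abs(F[(H - v) % H][(W - u) % W] - F[v][u].conjugate())
                   for v in range(H) for u in range(W))
# Parseval: sum |F|^2 equals (1/WH) sum |X|^2.
energy_F = sum(abs(F[v][u]) ** 2 for v in range(H) for u in range(W))
energy_X = sum(X[ny][nx] ** 2 for ny in range(H) for nx in range(W))
parseval_err = abs(energy_F - energy_X / (W * H))
dc = F[0][0]                                  # DC component: the mean of X
```

All three error values come out at the level of floating-point rounding, and `dc` equals the mean intensity 1.5.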

Fourier Transform

The Discrete 2D Fourier Transform Pair

\[\mathcal{F}[X] = F[u,v] = \frac{1}{WH} \sum_{n_x=0}^{W-1}\sum_{n_y=0}^{H-1}~~X[n_x,n_y]~\exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)\]

\[F'[u,v] = F[u,v] \times \exp\left(-j~2\pi\left(\frac{u~t_x}{W} + \frac{v~t_y}{H}\right)\right)\]

\[\mathcal{F}^{-1}[F'] = ~~~\color{red}{?}~~~ = \sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F'[u,v]~\exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)\]

for fixed integers \(t_x\), \(t_y\)

Fourier Transform

The Discrete 2D Fourier Transform Pair

\[\mathcal{F}[X] = F[u,v] = \frac{1}{WH} \sum_{n_x=0}^{W-1}\sum_{n_y=0}^{H-1}~~X[n_x,n_y]~\exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)\]

\[F'[u,v] = F[u,v] \times \exp\left(-j~2\pi\left(\frac{u~t_x}{W} + \frac{v~t_y}{H}\right)\right)\]

\[\mathcal{F}^{-1}[F'] = X[n_x-t_x,n_y-t_y] = \sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F'[u,v]~\exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)\]

for fixed integers \(t_x\), \(t_y\) (indices are taken modulo \(W\) and \(H\), so the shift wraps around)

A change in the phase of the Fourier coefficients, that is linear in \(u,v\), leads to a translation in the image.
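The direction and wrap-around behaviour of this translation can be checked numerically. The sketch below uses the one-dimensional analogue of the transform pair (a simplifying assumption to keep it short) and a shift `t` chosen arbitrarily.

```python
import cmath

def dft(x):
    """1D analogue of the forward transform, with a 1/W normalization."""
    W = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * u * n / W) for n in range(W)) / W
            for u in range(W)]

def idft(F):
    """1D analogue of the inverse transform (no normalization factor)."""
    W = len(F)
    return [sum(F[u] * cmath.exp(2j * cmath.pi * u * n / W) for u in range(W))
            for n in range(W)]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
W, t = len(x), 2

F = dft(x)
# Multiply each coefficient by a phase term that is linear in u.
F_shifted = [F[u] * cmath.exp(-2j * cmath.pi * u * t / W) for u in range(W)]
y = [value.real for value in idft(F_shifted)]

# Circular shift of x by t samples: expected[n] = x[(n - t) mod W]
expected = [x[(n - t) % W] for n in range(W)]
shift_err = max(abs(a - b) for a, b in zip(y, expected))
```

Here `y` matches `[5, 6, 1, 2, 3, 4]` up to rounding: the signal moves by `t` samples and the samples pushed off one end reappear at the other.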

Fourier Transform

DFT as a Co-ordinate Transform

\[F[u,v] = \frac{1}{\sqrt{WH}} \sum_{n_x=0}^{W-1}\sum_{n_y=0}^{H-1}~~\bar{S}_{uv}[n_x,n_y]~X[n_x,n_y]\]

 

where each \(S_{uv}\) can be thought of as a different (complex-valued) image:

\[S_{uv}[n_x,n_y] = \frac{1}{\sqrt{WH}} \exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right)\right)\]

Fourier Transform

DFT as a Co-ordinate Transform

\[F[u,v] = \frac{1}{\sqrt{WH}} \Big\langle~~S_{uv},~X~~\Big\rangle\]

where each \(S_{uv}\) can be thought of as a different (complex-valued) image:

\[S_{uv}[n_x,n_y] = \frac{1}{\sqrt{WH}} \exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right)\right)\]

\(F[u,v]\) is the inner-product between \(X\) and \(S_{uv}\) (scaled by \(1/\sqrt{WH}\)).


For \(x,y \in \mathbb{C}^n\), \(\langle x,y\rangle = x^*y\)

  • \(x^*\) is the Hermitian of \(x\)
    • Transpose + Conjugate (transpose the vector, and take conjugate of each entry)
  • \(x^*y = \sum_i \bar{x}_iy_i\)

Property: \(\langle S_{uv}, S_{u'v'} \rangle = 1 ~\text{if}~u'=u~\&~v'=v,~\text{and}~0~\text{otherwise}.\)


Inverse-DFT: \[X[n_x,n_y] = \sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F[u,v]~\exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)\]

Inverse-DFT: \[X = \sqrt{WH}~~\sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F[u,v]~~S_{uv}\]

\(X\) is a weighted sum of the \(S_{uv}\) images, with weights given by \(\sqrt{WH}F[u,v]\).

Fourier Transform

DFT as a Co-ordinate Transform

\[F[u,v] = \frac{1}{\sqrt{WH}} \Big\langle~~S_{uv},~X~~\Big\rangle, \qquad X = \sqrt{WH}~~\sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F[u,v]~~S_{uv}\]

\[\langle S_{uv}, S_{u'v'} \rangle = 1 ~\text{if}~u'=u~\&~v'=v,~\text{and}~0~\text{otherwise}.\]

Fourier Transform

DFT as a Co-ordinate Transform

\[F = \frac{1}{\sqrt{WH}} S^* X,\qquad X = \sqrt{WH}~S~F\]

\(S\) is a \(WH \times WH\) matrix with each column a different \(S_{uv}\).

So, \(SS^* = S^*S = I \Rightarrow S^{-1} = S^*\).

  • This means \(S\) is a unitary matrix.
  • Multiplication by \(S\) is a co-ordinate transform:
    • \(X\) are the co-ordinates of a point in a \(WH\) dimensional space.
    • Multiplication by \(S^*\) changes the 'co-ordinate system'.
    • In the new co-ordinate system, each 'dimension' now corresponds to frequency rather than location.
    • \(S\) is a length-preserving matrix (\(\|S^*X\|^2 = \|X\|^2\)).
    • It does rotations or reflections (in \(WH\) dimensional space).
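For a very small image, the matrix \(S\) can be built explicitly and the claims above checked by brute force. The image size, the row-major flattening order, and the helper names `hermitian` and `matmul` are assumptions of this sketch.

```python
import cmath

W, H = 3, 2
N = W * H
pixels = [(nx, ny) for ny in range(H) for nx in range(W)]   # row index of S: pixel location
freqs = [(u, v) for v in range(H) for u in range(W)]        # column index of S: frequency

# S[row][col] = S_uv[n_x, n_y] = exp(j 2 pi (u n_x / W + v n_y / H)) / sqrt(WH)
S = [[cmath.exp(2j * cmath.pi * (u * nx / W + v * ny / H)) / N ** 0.5
      for (u, v) in freqs]
     for (nx, ny) in pixels]

def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

S_star = hermitian(S)
gram = matmul(S_star, S)      # should be the identity if S is unitary
unitary_err = max(abs(gram[i][j] - (1 if i == j else 0)) for i in range(N) for j in range(N))

X = [[1.0], [2.0], [0.0], [4.0], [-1.0], [3.0]]   # image flattened to a column vector
coeffs = matmul(S_star, X)                          # S^* X, the new co-ordinates
norm_X = sum(abs(a[0]) ** 2 for a in X)
norm_coeffs = sum(abs(a[0]) ** 2 for a in coeffs)   # equals norm_X (length preserved)
F = [c[0] / N ** 0.5 for c in coeffs]               # F = S^* X / sqrt(WH)
```

The first entry of `F` is the DC component, which for this image is the mean 1.5.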

Fourier Transform

The FT gave us a different representation for images.
Decomposing image into different frequency 'components'.

What else ?

Convolution Theorem

\[Y = X * k \Rightarrow Y = A_k X\]

\(A_k\) is not square for valid / full convolution.

Question:

Let \(Y=A_k~X\) correspond to \(Y = X *_{\tiny \text{valid}} k\). Now, let \(X' = A_k^T Y\). How is \(X'\) related to \(Y\) by convolution ?
What operation does \(A_k^T\) represent ?


A: Full convolution with \(k[-n_x,-n_y]\) (flipped version of \(k\))
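The answer can be verified on a small one-dimensional example by building \(A_k\) explicitly. The helper names and the specific `x` and `k` values are made up for this sketch; the 2D case works the same way with the kernel flipped along both axes.

```python
def conv_valid(x, k):
    """y[m] = sum_i k[i] * x[m + K - 1 - i], only where k fits entirely inside x."""
    N, K = len(x), len(k)
    return [sum(k[i] * x[m + K - 1 - i] for i in range(K)) for m in range(N - K + 1)]

def conv_full(y, k):
    """Full convolution: output length len(y) + len(k) - 1, zeros assumed outside y."""
    M, K = len(y), len(k)
    return [sum(k[i] * y[p - i] for i in range(K) if 0 <= p - i < M)
            for p in range(M + K - 1)]

def valid_matrix(N, k):
    """Matrix A_k such that conv_valid(x, k) equals A_k x for any x of length N."""
    K = len(k)
    A = [[0] * N for _ in range(N - K + 1)]
    for m in range(N - K + 1):
        for i in range(K):
            A[m][m + K - 1 - i] = k[i]
    return A

k = [1, 2, 3]
x = [4, 0, -1, 2, 5]
A = valid_matrix(len(x), k)                 # 3 x 5, not square

y = conv_valid(x, k)
y_from_matrix = [sum(A[m][p] * x[p] for p in range(len(x))) for m in range(len(A))]

# Apply the transpose: x'[p] = sum_m A[m][p] * y[m]
x_prime = [sum(A[m][p] * y[m] for m in range(len(A))) for p in range(len(x))]
x_prime_conv = conv_full(y, k[::-1])        # full convolution with the flipped kernel
```

Both routes give the same `x_prime`, which has the length of the original `x`.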

Convolution Theorem

\[Y = X * k \Rightarrow Y = A_k X\]

Now if we consider the square \(A_k\) matrix corresponding to 'same' convolution with circular padding, i.e. padding as \(X[W+n_x,n_y] = X[n_x,n_y]\), \(X[n_x,-n_y] = X[n_x,H-n_y]\), etc.

Then, \(A_k\) is diagonalized by the Fourier Transform !

\[A_k = S~D_k~S^*\]

  • Here, \(D_k\) is a diagonal matrix.
  • The above equation holds for every \(A_k\)
    • You get different diagonal matrices \(D_k\).
    • But \(S\) is the diagonalizing basis for all kernels.
  • In the Fourier co-ordinate system, convolution is a 'point-wise' operation !

\[Y = A_k X = S~~D_k~~S^*~X \Rightarrow (S^*Y) = D_k (S^*X)\]

Convolution Theorem

Why does this happen ?

  • \(X = \sqrt{WH~}~ \sum_{u,v} F[u,v] S_{uv}\)
  • \(Y = X * k = \sqrt{WH~}~ \sum_{u,v} F[u,v] S_{uv} * k\) (by linearity / distributivity)
  • \((S_{uv} * k)[n] = \sum_{n'} k[n'] S_{uv}[n-n']\)
  • \(S_{uv}[n-n']\), assuming circular padding, is also a sinusoid with the same frequency \((u,v)\) and magnitude, but different phase.
  • Multiplying by \(k[n']\) changes the magnitude, but the frequency stays the same.
  • Adding different sinusoids of the same frequency gives you another sinusoid of the same frequency.
  • \((S_{uv} * k)[n_x,n_y] = d_{uv:k}~S_{uv}[n_x,n_y]\), where \(d_{uv:k}\) is some complex scalar.

Sinusoids are eigen-functions of convolution

\[Y = X * k = \sqrt{WH~}~ \sum_{u,v} F[u,v] S_{uv} * k = \sqrt{WH~}~ \sum_{u,v} \big(F[u,v]~d_{uv:k}\big)~S_{uv}\]
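The eigen-function property can be checked directly in one dimension (a simplification for brevity). The kernel below is an arbitrary small blur, zero-padded to the signal length, and `d_u` is computed as \(\sum_{n'} k[n'] \exp(-j~2\pi~u~n'/W)\), which is what the derivation above yields for the scalar \(d_{uv:k}\).

```python
import cmath

W = 8
k = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]   # symmetric blur, zero-padded to length W

def circular_conv(x, k):
    """(x * k)[n] = sum_m k[m] * x[(n - m) mod W], i.e. circular padding."""
    W = len(x)
    return [sum(k[m] * x[(n - m) % W] for m in range(W)) for n in range(W)]

def basis(u):
    """1D analogue of S_uv: a unit-norm complex sinusoid of frequency u."""
    return [cmath.exp(2j * cmath.pi * u * n / W) / W ** 0.5 for n in range(W)]

max_err = 0.0
eigenvalues = []
for u in range(W):
    S_u = basis(u)
    out = circular_conv(S_u, k)
    # Predicted scalar: unnormalized DFT of the kernel at frequency u.
    d_u = sum(k[m] * cmath.exp(-2j * cmath.pi * u * m / W) for m in range(W))
    eigenvalues.append(d_u)
    # Convolving the sinusoid should only rescale it by d_u.
    max_err = max(max_err, max(abs(out[n] - d_u * S_u[n]) for n in range(W)))
```

For this kernel the scalars work out to \(0.5 + 0.5\cos(2\pi u/W)\): the constant component passes unchanged and the highest frequency is removed entirely.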

Convolution Theorem

\[A_k = S~D_k~S^*\]

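  • What's more, the diagonal elements of \(D_k\) are the \((W \times H)\) Fourier transform of \(k\) (with \(k\) zero-padded to the image size), up to a constant: with the \(\frac{1}{WH}\) normalization used above, they equal \(WH~\mathcal{F}[k]\).

\[D_k = \text{diag}\Big(\sqrt{WH}~S^*k \Big) = \text{diag}\big(WH~\mathcal{F}[k]\big)\]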

  • This is the convolution theorem.
    • Computational advantage for performing (and inverting!) convolution, albeit under circular padding.
    • Good way of analyzing what a kernel is doing by looking at its Fourier transform.
  • Why did we use complex numbers ? Like quaternions in Graphics, for convenience!
    • If we used real number co-ordinate transform, convolution would convert to several \(2\times 2\) transforms on pairs of co-ordinates.
    • Complex numbers are just a way of grouping these pairs into a single 'number'.
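A one-dimensional sketch of the convolution theorem, including the inversion mentioned above, is given below. It assumes circular padding and the \(1/W\) forward normalization (the 1D analogue of the slides' \(\frac{1}{WH}\)), which is why a factor of `W` appears in the point-wise product. The signal and kernel values are arbitrary.

```python
import cmath

def dft(x):
    """Forward transform with 1/W normalization."""
    W = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * u * n / W) for n in range(W)) / W
            for u in range(W)]

def idft(F):
    """Inverse transform (no normalization factor)."""
    W = len(F)
    return [sum(F[u] * cmath.exp(2j * cmath.pi * u * n / W) for u in range(W))
            for n in range(W)]

def circular_conv(x, k):
    W = len(x)
    return [sum(k[m] * x[(n - m) % W] for m in range(W)) for n in range(W)]

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
k = [0.6, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1]   # kernel zero-padded to the signal length
W = len(x)

y_direct = circular_conv(x, k)                  # convolution in the pixel domain

F_x, F_k = dft(x), dft(k)
F_y = [W * F_x[u] * F_k[u] for u in range(W)]   # point-wise product in the Fourier domain
y_fourier = [value.real for value in idft(F_y)]

# Inverting the convolution: divide point-wise, valid here because no F_k[u] is zero.
x_recovered = [value.real for value in idft([F_y[u] / (W * F_k[u]) for u in range(W)])]

forward_err = max(abs(a - b) for a, b in zip(y_direct, y_fourier))
inverse_err = max(abs(a - b) for a, b in zip(x, x_recovered))
```

The division step fails, or amplifies noise badly, when some Fourier coefficient of the kernel is zero or very small, so this kernel was picked to keep all of them away from zero.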
