CSE 559A: Computer Vision


Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).

Course Staff: Zhihao Xia, Charlie Wu, Han Liu

Sep 6, 2018

- This Friday (and this Friday only):
- Zhihao's Office Hours in Jolley 431 instead of 309.

- Monday Office Hours:
- 5:30-6:30pm, Collaboration Space @ Jolley 217.

- PSET 0 Due Today by 11:59pm
- Any issues with submissions, post on Piazza.

- Convolutions
- Simplest spatial linear operation
- Output at each pixel is a function of a limited number of pixels in the input
- Linear Function
- Same function for different neighborhoods
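The defining properties above can be made concrete with a minimal (and deliberately slow) pure-NumPy "same" convolution; the explicit loop shows the same linear function being applied to every neighborhood. This is a sketch only: it assumes an odd-sized kernel and zero padding, both illustrative choices.

```python
import numpy as np

def conv2d_same(X, k):
    """'Same' 2-D convolution: each output pixel is the SAME linear
    function of a small neighborhood of input pixels.
    Assumes an odd-sized kernel and zero padding (illustrative choices)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    Xp = np.pad(X, ((ph, ph), (pw, pw)))      # zero padding
    kf = k[::-1, ::-1]                        # flip kernel: Y[n] = sum k[n'] X[n-n']
    Y = np.zeros_like(X, dtype=float)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Y[i, j] = np.sum(Xp[i:i+kh, j:j+kw] * kf)
    return Y

X = np.arange(16, dtype=float).reshape(4, 4)
box = np.ones((3, 3)) / 9.0                   # 3x3 box blur
Y = conv2d_same(X, box)
```

Convolving with the identity kernel (a single 1 at the center) returns the image unchanged, which is a quick sanity check on the indexing.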

- Edge & Line Detection: A Stereotypical Vision Algorithm Pipeline
- Use convolutions to detect local image properties (Gradients)
- Apply local non-linear processing to get local features (Edges)
- Aggregate information to find long-range structures (Lines)
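The first two stages of this pipeline can be sketched in a few lines of NumPy on a toy image; the central-difference kernels and the 50%-of-max threshold are illustrative assumptions of this sketch, not choices from the lecture.

```python
import numpy as np

# Stage 1: convolutions -> gradients (central-difference kernels here;
# border pixels are left at zero in this sketch)
X = np.zeros((8, 8)); X[:, 4:] = 1.0              # toy image: vertical step edge
gx = np.zeros_like(X); gy = np.zeros_like(X)
gx[:, 1:-1] = (X[:, 2:] - X[:, :-2]) / 2.0        # horizontal gradient
gy[1:-1, :] = (X[2:, :] - X[:-2, :]) / 2.0        # vertical gradient

# Stage 2: local non-linear processing -> edges (magnitude + threshold)
mag = np.hypot(gx, gy)
edges = mag > 0.5 * mag.max()                     # fires along the step

# Stage 3 would aggregate these edges into long-range structures,
# e.g. by voting for line parameters with a Hough transform.
```

On this toy image the detected edge is exactly the two columns straddling the step.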

**Median Filter / Order Statistics**

\[Y[n] = \text{Median} \{ X[n-n'] \}_{N[n'] = 1}\]

- Neighborhood function \(N[n'] \in \{0,1\}\)
- Often better at removing outliers than convolution.

Source: Wikipedia

- Other ops: \(Y[n] = \text{max / min} \{ X[n-n'] \}_{N[n'] > 0}\)
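A minimal sketch of an order-statistics filter (assuming an odd square window and edge-replication padding, both choices of this sketch) makes the contrast with convolution concrete: the median rejects an isolated outlier completely, while a box blur only spreads it around.

```python
import numpy as np

def order_filter(X, size=3, stat=np.median):
    """Order-statistic filter: the output at each pixel is an order
    statistic (median / max / min) of its neighborhood -- non-linear.
    Assumes an odd window; uses edge-replication padding (sketch choices)."""
    p = size // 2
    Xp = np.pad(X, p, mode='edge')
    Y = np.empty(X.shape, dtype=float)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Y[i, j] = stat(Xp[i:i + size, j:j + size])
    return Y

X = np.ones((5, 5)); X[2, 2] = 100.0      # single salt-noise outlier
med = order_filter(X, 3, np.median)       # median rejects the outlier entirely
avg = order_filter(X, 3, np.mean)         # a box blur (linear) only spreads it
```

Passing `np.max` or `np.min` as `stat` gives the other order-statistic operations from the slide.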


**Morphological Operations**

- Conducted on binary images (\(X[n] \in \{0,1\}\))
- Erosion: \(~~~Y[n] = \text{AND}~~ \{ X[n-n'] \}_{N[n'] = 1}~~~\) (1 if all neighbors 1)
- Dilation: \(~~~Y[n] = \text{OR}~~\{ X[n-n'] \}_{N[n'] = 1}~~~\) (1 if any neighbor 1)

- Opening: Erosion followed by Dilation
- Closing: Dilation followed by Erosion
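These four operations transcribe almost directly into code. The sketch below is a slow reference implementation; the 3×3 window and the padding values (so that the image boundary never changes the result) are assumptions of this sketch.

```python
import numpy as np

def _neigh_reduce(X, reduce_fn, pad_value, size=3):
    """Apply reduce_fn (AND / OR) over each pixel's size x size neighborhood.
    pad_value is chosen so the image boundary doesn't affect the result."""
    p = size // 2
    Xp = np.pad(X.astype(bool), p, constant_values=pad_value)
    Y = np.empty(X.shape, dtype=bool)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Y[i, j] = reduce_fn(Xp[i:i + size, j:j + size])
    return Y

def erode(X):   return _neigh_reduce(X, np.all, True)    # 1 if ALL neighbors 1
def dilate(X):  return _neigh_reduce(X, np.any, False)   # 1 if ANY neighbor 1
def opening(X): return dilate(erode(X))                  # erosion then dilation
def closing(X): return erode(dilate(X))                  # dilation then erosion

X = np.zeros((7, 7), dtype=bool)
X[2:5, 2:5] = True          # a solid 3x3 square
X[0, 0] = True              # an isolated speck of noise
Xo = opening(X)             # removes the speck, restores the square
```

Opening removes structures smaller than the window (the speck) while restoring larger ones (the square); closing fills small holes the same way.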

See Szeliski Sec 3.3.2

- *Guided Bilateral Filter*: \(B[n_1,n_2]\) is computed from a separate guide image \(Z[n]\): depth, infra-red, etc.
- Far less efficient than convolution:
- The filter weights also have to be computed, and normalized, at each output location.
- Efficient data structures are possible.

- Further Reading:
- Paris et al., SIGGRAPH/CVPR Course on Bilateral Filtering
- Recent work on using this for inference, best-paper runner-up at ECCV 2016:

Barron & Poole, The Fast Bilateral Solver, ECCV 2016.

**Quick Recap: Complex Numbers**

- A complex number \(f = x + j~y\), where \(x\) and \(y\) are real numbers.
- \(j = \sqrt{-1}\) (EE convention: we use \(j\) instead of \(i\)).
- \(x\) and \(y\) are called the real and imaginary components of \(f\).

Think of \(f\) as a 2-D vector with special definitions of addition, multiplication, etc.

- \((x_1 + j~y_1) + (x_2 + j~y_2) = (x_1 + x_2) + j~(y_1+y_2)\)

- \((x_1 + j~y_1) \times (x_2 + j~y_2) = (x_1x_2 - y_1y_2) + j~(x_2y_1 + x_1y_2)\)

- \((x_1 + j~y_1) \times x_2 = x_1x_2 + j~y_1x_2\)

- Conjugate: \(\overline{x + j~y} = x - j~y = x + j (-y)\)

- Squared magnitude: \(|x + j~y|^2 = (x + j~y) \times \overline{(x + j~y)} = x^2+y^2\)

**Quick Recap: Complex Numbers**

Euler's Formula

- \(\exp(j \theta) = \cos \theta + j~\sin \theta\)

- \(x+j~y = M \exp(j\ \theta)\)
- \(M = \sqrt{x^2+y^2}, \theta = \tan^{-1}(y,x)\)
- \(\theta\) is called the "phase"

- \(\overline{M\exp(j\theta)} = M \exp(-j\theta)\)

- \((x + j~y)\times \exp(j\theta_0) = M \exp(j (\theta+\theta_0))\)
- Preserves magnitude, adds to phase

- \(\exp(j 0) = 1\)
- \(\exp(jN\pi) = 1\) where \(N\) is an even integer, and \(=-1\) where \(N\) is an odd integer.
- Real in both cases
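All of these identities are easy to check with Python's built-in complex type and the standard-library `cmath` module:

```python
import cmath

f = 3 + 4j                                    # x = 3, y = 4
assert f.conjugate() == 3 - 4j                # conjugate negates y
assert (f * f.conjugate()).real == 25.0       # x^2 + y^2

M, theta = abs(f), cmath.phase(f)             # polar form: M exp(j theta)
assert abs(M * cmath.exp(1j * theta) - f) < 1e-12

# multiplying by exp(j theta0) preserves magnitude and adds to phase
g = f * cmath.exp(0.5j)
assert abs(abs(g) - M) < 1e-12
assert abs(cmath.phase(g) - (theta + 0.5)) < 1e-12

# exp(j N pi) is real: +1 for even N, -1 for odd N
assert abs(cmath.exp(2j * cmath.pi) - 1) < 1e-12
assert abs(cmath.exp(1j * cmath.pi) + 1) < 1e-12
```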

**The Discrete 2D Fourier Transform**

\[\mathcal{F}[X] = F[u,v] = \frac{1}{WH} \sum_{n_x=0}^{W-1}\sum_{n_y=0}^{H-1}~~X[n_x,n_y]~\exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)\]

\[\exp(j~\theta) = \cos \theta + j \sin \theta\]

- Defined for a single-channel / grayscale image \(X\).
- \(F\) is a complex-valued array indexed by integers \(u,v\).

- Each \(F[u,v]\) depends on the intensities at all pixels.
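The definition can be transcribed directly into (slow) code and checked against the FFT. Note two assumptions of this sketch: the array is laid out as `X[n_y, n_x]` (H rows, W columns), and `np.fft.fft2` is divided by \(WH\) because NumPy omits the slide's forward normalization.

```python
import numpy as np

def dft2(X):
    """Direct transcription of the definition, with the 1/(WH) factor.
    O((WH)^2) -- np.fft.fft2 computes the same sums in O(WH log WH).
    Assumes the array is laid out as X[n_y, n_x] (H rows, W columns)."""
    H, W = X.shape
    ny, nx = np.mgrid[0:H, 0:W]
    F = np.empty((H, W), dtype=complex)
    for v in range(H):
        for u in range(W):
            F[v, u] = np.sum(
                X * np.exp(-2j * np.pi * (u * nx / W + v * ny / H))) / (W * H)
    return F

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 6))
# agrees with the FFT, once the FFT output is given the same normalization
assert np.allclose(dft2(X), np.fft.fft2(X) / X.size)
```

Each `F[v, u]` is a sum over every pixel of `X`, which is exactly the "depends on all pixels" point above.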


- Note that \(F[u,v] = F[u+W,v] = F[u,v+H]\) because of periodicity.

\[\exp\left( -j~2\pi\left(\frac{(u+W)~n_x}{W} + \frac{v~n_y}{H}\right)\right) = \exp\left( -j~2\pi\left(\frac{u~n_x}{W} + n_x + \frac{v~n_y}{H}\right)\right)\]

\[= \exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) - j~2n_x\pi\right) = \exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right)\right) \times \exp(-j~2n_x\pi)\]

\[= \exp\left( -j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right)\right)\]
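This periodicity argument is easy to verify numerically by evaluating the defining sum at shifted frequencies (a sketch, with the same assumed `X[n_y, n_x]` layout):

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 5, 7
X = rng.standard_normal((H, W))
ny, nx = np.mgrid[0:H, 0:W]

def F(u, v):
    """The DFT sum, evaluated at an arbitrary integer frequency (u, v)."""
    return np.sum(X * np.exp(-2j * np.pi * (u * nx / W + v * ny / H))) / (W * H)

# shifting u by W multiplies each term by exp(-j 2 pi n_x) = 1, so
# the sum -- and hence F -- is unchanged; same for v and H
assert np.isclose(F(3, 2), F(3 + W, 2))
assert np.isclose(F(3, 2), F(3, 2 + H))
```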

- Therefore, we typically store \(F[u,v]\) for \(u \in \{0,\ldots, W-1\}\), \(v \in \{0,\ldots, H-1\}\).
- Can think of \(F[u,v]\) as a complex-valued "image" with the same number of pixels as \(X\).

Can be implemented fairly efficiently using the FFT algorithm: \(O(N\log N)\) for \(N = WH\) pixels

(often, "FFT" is used to refer to the transform itself).

**The Discrete 2D Fourier Transform Pair**

\[\mathcal{F}^{-1}[F] = X[n_x,n_y] = \sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F[u,v]~\exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)\]

- If \(X\) is real-valued, \(F[-u,-v] = F[W-u,H-v] = \bar{F}[u,v]\), where \(\bar{F}\) implies complex conjugate.
- \(F[0,0]\) is often called the DC component. It is the average intensity of \(X\). It is real if \(X\) is real.
- Only \(WH\) independent "numbers" in \(F[u,v]\) (counting real and imaginary separately) if \(X\) is real.
- Parseval's Theorem (energy preserving, up to a constant factor): \[\sum_{u,v} \|F[u,v]\|^2 = \sum_{u,v} F[u,v]\bar{F}[u,v] = \frac{1}{WH}\sum_{n_x,n_y} \|X[n_x,n_y]\|^2\]
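All three properties can be checked numerically; the sketch below divides `np.fft.fft2` by \(WH\) to match the slide's normalization, and assumes the `X[n_y, n_x]` layout.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 6, 8
X = rng.standard_normal((H, W))
F = np.fft.fft2(X) / (W * H)      # divide by WH to match the slide's definition

# DC component = average intensity (and real, since X is real)
assert np.isclose(F[0, 0], X.mean())

# conjugate symmetry for real X: F[-u,-v] = conj(F[u,v])
u, v = 3, 2
assert np.isclose(F[-v, -u], np.conj(F[v, u]))

# Parseval: sum |F|^2 = (1/WH) sum |X|^2, with this normalization
assert np.isclose(np.sum(np.abs(F) ** 2), np.sum(X ** 2) / (W * H))
```

Negative indexing in NumPy wraps around, which matches \(F[-u,-v] = F[W-u,H-v]\) exactly.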


**The Discrete 2D Fourier Transform Pair**

\[F'[u,v] = F[u,v] \times \exp\left(-j~2\pi\left(\frac{u~t_x}{W} + \frac{v~t_y}{H}\right)\right)\]

\[\mathcal{F}^{-1}[F'] = X[n_x-t_x,n_y-t_y] = \sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F'[u,v]~\exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)\]

for fixed integers \(t_x\), \(t_y\)

A change in the phase of the Fourier coefficients, that is linear in \(u,v\), leads to a translation in the image.
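A numerical check of this phase-shift/translation property (assuming the `X[n_y, n_x]` layout; the translation is circular because the DFT treats the image as periodic):

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 5, 8
X = rng.standard_normal((H, W))
tx, ty = 3, 2                                 # integer translation

F = np.fft.fft2(X)
v, u = np.mgrid[0:H, 0:W]
Fp = F * np.exp(-2j * np.pi * (u * tx / W + v * ty / H))   # linear phase ramp
Xp = np.fft.ifft2(Fp).real

# the ramp circularly translates the image by (tx, ty)
assert np.allclose(Xp, np.roll(X, shift=(ty, tx), axis=(0, 1)))
```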

**DFT as a Co-ordinate Transform**

\[F[u,v] = \frac{1}{\sqrt{WH}} \sum_{n_x=0}^{W-1}\sum_{n_y=0}^{H-1}~~\bar{S}_{uv}[n_x,n_y]~X[n_x,n_y]\]

where each \(S_{uv}\) can be thought of as a different (complex-valued) image:

\[S_{uv}[n_x,n_y] = \frac{1}{\sqrt{WH}} \exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right)\right)\]

**DFT as a Co-ordinate Transform**

\[F[u,v] = \frac{1}{\sqrt{WH}} \Big\langle~~S_{uv},~X~~\Big\rangle\]

where each \(S_{uv}\) can be thought of as a different (complex-valued) image:

\[S_{uv}[n_x,n_y] = \frac{1}{\sqrt{WH}} \exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right)\right)\]

\(F[u,v]\) is the inner-product between \(X\) and \(S_{uv}\) (scaled by \(1/\sqrt{WH}\))

For \(x,y \in \mathbb{C}^n\), \(\langle x,y\rangle = x^*y\)

- \(x^*\) is the Hermitian (conjugate) transpose of \(x\)
- Transpose the vector, and take the conjugate of each entry

- \(x^*y = \sum_i \bar{x}_iy_i\)


**DFT as a Co-ordinate Transform**

\[F[u,v] = \frac{1}{\sqrt{WH}} \Big\langle~~S_{uv},~X~~\Big\rangle\]

where each \(S_{uv}\) can be thought of as a different (complex-valued) image:

\(F[u,v]\) is the inner-product between \(X\) and \(S_{uv}\) (scaled by \(1/\sqrt{WH}\))

**Property**: \(\langle S_{uv}, S_{u'v'} \rangle = 1 ~\text{if}~u'=u~\&~v'=v,~\text{and}~0~\text{otherwise}.\)

**Inverse-DFT:** \[X[n_x,n_y] = \sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F[u,v]~\exp\left( j~2\pi\left(\frac{u~n_x}{W} + \frac{v~n_y}{H}\right) \right)\]

**DFT as a Co-ordinate Transform**

\[F[u,v] = \frac{1}{\sqrt{WH}} \Big\langle~~S_{uv},~X~~\Big\rangle\]

where each \(S_{uv}\) can be thought of as a different (complex-valued) image:

\(F[u,v]\) is the inner-product between \(X\) and \(S_{uv}\) (scaled by \(1/\sqrt{WH}\))

**Property**: \(\langle S_{uv}, S_{u'v'} \rangle = 1 ~\text{if}~u'=u~\&~v'=v,~\text{and}~0~\text{otherwise}.\)

**Inverse-DFT:** \[X = \sqrt{WH}~~\sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F[u,v]~~S_{uv}\]

\(X\) is a weighted sum of the \(S_{uv}\) images, weights are given by \(\sqrt{WH}F[u,v]\).

**DFT as a Co-ordinate Transform**

\[F[u,v] = \frac{1}{\sqrt{WH}} \Big\langle~~S_{uv},~X~~\Big\rangle, \qquad X = \sqrt{WH}~~\sum_{u=0}^{W-1}\sum_{v=0}^{H-1}~~F[u,v]~~S_{uv}\]

\[\langle S_{uv}, S_{u'v'} \rangle = 1 ~\text{if}~u'=u~\&~v'=v,~\text{and}~0~\text{otherwise}.\]
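The orthonormality and the analysis/synthesis pair can be verified by brute force on a tiny image (slow double loops, purely illustrative, with the `X[n_y, n_x]` layout assumed throughout):

```python
import numpy as np

H, W = 4, 4
ny, nx = np.mgrid[0:H, 0:W]

def S(u, v):
    """The basis image S_uv (a complex sinusoid, normalized by sqrt(WH))."""
    return np.exp(2j * np.pi * (u * nx / W + v * ny / H)) / np.sqrt(W * H)

def inner(a, b):
    """Complex inner product <a, b> = sum_i conj(a_i) b_i."""
    return np.sum(np.conj(a) * b)

# orthonormality: <S_uv, S_u'v'> = 1 iff (u,v) = (u',v'), else 0
assert np.isclose(inner(S(1, 2), S(1, 2)), 1.0)
assert np.isclose(inner(S(1, 2), S(3, 0)), 0.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((H, W))
# analysis:  F[u,v] = (1/sqrt(WH)) <S_uv, X>
F = np.array([[inner(S(u, v), X) for u in range(W)]
              for v in range(H)]) / np.sqrt(W * H)
# synthesis: X = sqrt(WH) sum_uv F[u,v] S_uv
Xr = np.sqrt(W * H) * sum(F[v, u] * S(u, v)
                          for v in range(H) for u in range(W))
assert np.allclose(Xr.real, X)
```

Perfect reconstruction follows directly from the orthonormality property above.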


**DFT as a Co-ordinate Transform**

\[F = \frac{1}{\sqrt{WH}} S^* X,\qquad X = \sqrt{WH}~S~F\]

\(S\) is a \(WH \times WH\) matrix with each column a different \(S_{uv}\).

So, \(SS^* = S^*S = I \Rightarrow S^{-1} = S^*\).

- This means \(S\) is a unitary matrix.
- Multiplication by \(S\) is a co-ordinate transform:
- \(X\) are the co-ordinates of a point in a \(WH\) dimensional space.
- Multiplication by \(S^*\) changes the 'co-ordinate system'.
- In the new co-ordinate system, each 'dimension' now corresponds to frequency rather than location.
- \(S\) is a length-preserving matrix (\(\|S^*X\|^2 = \|X\|^2\)).
- It does rotations or reflections (in \(WH\) dimensional space).
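Building the \(WH \times WH\) matrix \(S\) explicitly (feasible only for tiny images) confirms the unitary structure; the column ordering below is an arbitrary choice of this sketch.

```python
import numpy as np

H, W = 3, 4
N = W * H
ny, nx = np.mgrid[0:H, 0:W]

# columns of S: one flattened S_uv image per frequency (u, v)
cols = [(np.exp(2j * np.pi * (u * nx / W + v * ny / H)) / np.sqrt(N)).ravel()
        for v in range(H) for u in range(W)]
S = np.stack(cols, axis=1)                    # N x N matrix

# unitary: S S* = S* S = I, so S^{-1} = S*
assert np.allclose(S @ S.conj().T, np.eye(N))

rng = np.random.default_rng(0)
X = rng.standard_normal(N)                    # image as a point in R^{WH}
F = S.conj().T @ X / np.sqrt(N)               # forward:  (1/sqrt(WH)) S* X
assert np.allclose(np.sqrt(N) * (S @ F), X)   # inverse:  sqrt(WH) S F

# length-preserving co-ordinate change
assert np.isclose(np.linalg.norm(S.conj().T @ X), np.linalg.norm(X))
```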

The FT gave us a different representation for images: a decomposition into different frequency 'components'.

What else?

\[Y = X * k \Rightarrow Y = A_k X\]

\(A_k\) is not square for valid / full convolution.

**Question**:

Let \(Y=A_k~X\) correspond to \(Y = X *_{\tiny \text{valid}} k\). Now, let \(X' = A_k^T Y\). How is \(X'\) related to \(Y\) by convolution?

What operation does \(A_k^T\) represent?

A: Full convolution with \(k[-n_x,-n_y]\) (the flipped version of \(k\))
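A 1-D version of this fact is easy to check by building the 'valid' convolution matrix explicitly; the 2-D statement factors the same way. The construction below is a sketch with an arbitrary signal and kernel.

```python
import numpy as np

# 1-D illustration: 'valid' convolution y[n] = sum_{n'} k[n'] x[n-n']
x = np.arange(6, dtype=float)
k = np.array([1.0, -2.0, 3.0])
y_valid = np.convolve(x, k, mode='valid')     # length 6 - 3 + 1 = 4

# build the (non-square) matrix A_k: each row holds the flipped kernel
A = np.zeros((4, 6))
for n in range(4):
    A[n, n:n + 3] = k[::-1]
assert np.allclose(A @ x, y_valid)

# A_k^T applied to y is FULL convolution with the flipped kernel k[-n]
xt = A.T @ y_valid
assert np.allclose(xt, np.convolve(y_valid, k[::-1], mode='full'))
```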

\[Y = X * k \Rightarrow Y = A_k X\]

Now if we consider the square \(A_k\) matrix corresponding to 'same' convolution with circular padding, i.e. padding as \(X[W+n_x,n_y] = X[n_x,n_y]\), \(X[n_x,-n_y] = X[n_x,H-n_y]\), etc.

**Then, \(A_k\) is diagonalized by the Fourier Transform!**

\[A_k = S~D_k~S^*\]

- Here, \(D_k\) is a diagonal matrix.
- The above equation holds for every \(A_k\)
- You get different diagonal matrices \(D_k\).
- But \(S\) is the diagonalizing basis for all kernels.

- In the Fourier co-ordinate system, convolution is a 'point-wise' operation!

\[Y = A_k X = S~~D_k~~S^*~X \Rightarrow (S^*Y) = D_k (S^*X)\]

Why does this happen?

- \(X = \sqrt{WH~}~ \sum_{u,v} F[u,v] S_{uv}\)
- \(Y = X * k = \sqrt{WH~}~ \sum_{u,v} F[u,v] S_{uv} * k\) (by linearity / distributivity)

- \((S_{uv} * k)[n] = \sum_{n'} k[n'] S_{uv}[n-n']\)

- \(S_{uv}[n-n']\), assuming circular padding, is also a sinusoid with the same frequency \((u,v)\) and magnitude, but a different phase.
- Multiplying by \(k[n']\) changes the magnitude, but the frequency stays the same.
- Adding different sinusoids of the same frequency gives another sinusoid of that frequency.
- \((S_{uv} * k)[n_x,n_y] = d_{uv:k}~S_{uv}[n_x,n_y]\), where \(d_{uv:k}\) is some complex scalar.

*Sinusoids are eigen-functions of convolution*

\[Y = X * k = \sqrt{WH~}~ \sum_{u,v} F[u,v] S_{uv} * k = \sqrt{WH~}~ \sum_{u,v} \big(F[u,v]~d_{uv:k}\big)~S_{uv}\]

\[A_k = S~D_k~S^*\]

- What's more, the diagonal elements of \(D_k\) are (up to scale) the \(W \times H\) Fourier transform of \(k\):

\[D_k = \text{diag}\Bigg(\sqrt{WH}~~S^*k \Bigg)\]

- This is the convolution theorem.
- Computational advantage for performing (and inverting!) convolution, albeit under circular padding.
- Good way of analyzing what a kernel is doing by looking at its Fourier transform.
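The convolution theorem is a short check with NumPy's FFT. Two layout assumptions of this sketch: the kernel is zero-padded to image size, and circularly shifted so its center tap sits at the origin (index \((0,0)\)), which is where the DFT expects it.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 8, 8
X = rng.standard_normal((H, W))
k = rng.standard_normal((3, 3))              # 3x3 kernel, center tap k[1,1]

# embed the kernel in an HxW image and circularly shift so that the
# center tap lands at (0,0) -- the layout the DFT expects
kpad = np.zeros((H, W))
kpad[:3, :3] = k
kpad = np.roll(kpad, shift=(-1, -1), axis=(0, 1))

# convolution theorem: circular convolution <=> point-wise product of DFTs
Y_fft = np.fft.ifft2(np.fft.fft2(X) * np.fft.fft2(kpad)).real

# direct 'same' convolution with circular padding, for comparison:
# Y[n] = sum_{n'} k[n'] X[n - n'], with n' ranging over the 3x3 window
Y_dir = np.zeros((H, W))
for dy in range(-1, 2):
    for dx in range(-1, 2):
        Y_dir += k[dy + 1, dx + 1] * np.roll(X, shift=(dy, dx), axis=(0, 1))

assert np.allclose(Y_fft, Y_dir)
```

Inverting the convolution (when no DFT coefficient of the kernel is zero) amounts to dividing by `np.fft.fft2(kpad)` instead of multiplying.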

- Why did we use complex numbers? Like quaternions in graphics, for convenience!
- If we used a real-valued co-ordinate transform, convolution would turn into several \(2\times 2\) transforms on pairs of co-ordinates.
- Complex numbers are just a way of grouping these pairs into a single 'number'.