CSE 559A: Computer Vision


Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).

Course Staff: Zhihao Xia, Charlie Wu, Han Liu

Sep 11, 2018

- Convolutions
- Linear operations on images
- Each pixel of output is linear weighted sum of neighborhood in input
- Same weights for all neighborhoods

- Can be "diagonalized" in the Fourier Domain

But so far, output image size is (approximately) equal to input image size

- Convolution, in the most general case, takes \(O(n_x n_k)\) time.
- \(n_x = W_xH_x\), \(n_k = W_kH_k\).

- Convolution in the frequency domain:
- FFT, point-wise multiply, Inverse FFT
- FFT/IFFT complexity is \(O(n_x \log n_x)\) (Most efficient for power of 2 image size)
- May be worth it for large kernels
- Or same image convolved with many different kernels
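The frequency-domain recipe above (FFT, point-wise multiply, inverse FFT) can be sketched as follows; the image and kernel here are hypothetical random arrays, used only to check agreement with direct convolution:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 64))   # hypothetical image
k = rng.standard_normal((9, 9))     # hypothetical kernel

# Size of the "full" linear convolution output; fft2 zero-pads to this
# size so the circular convolution matches the linear one.
H = X.shape[0] + k.shape[0] - 1
W = X.shape[1] + k.shape[1] - 1

# FFT both, multiply point-wise, inverse FFT, keep the real part.
Y_fft = np.real(np.fft.ifft2(np.fft.fft2(X, (H, W)) * np.fft.fft2(k, (H, W))))

# Direct spatial-domain convolution for comparison.
Y_ref = convolve2d(X, k, mode='full')

print(np.allclose(Y_fft, Y_ref))  # True
```

Note the zero-padding to the full output size: without it, the point-wise product implements *circular* convolution, which wraps around image boundaries.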

**Separable Kernels**

\[G[n_x,n_y] \propto \exp\left(-\frac{n_x^2+n_y^2}{2\sigma^2}\right) = G_x[n_x]G_y[n_y]\]

\(x-\) and \(y-\) derivatives of Gaussian also separable.

- Realize that \(k[n_x,n_y] = k_x[n_x]k_y[n_y] = k_x *_{\tiny \text{full}} k_y\).

This follows from interpreting \(k_x\) and \(k_y\) as kernels of size \(W_k \times 1\) and \(1 \times H_k\).

So \(X*k = X*(k_x*k_y) = (X*k_x)*k_y\). This takes \(W_k+H_k\) operations per output pixel instead of \(W_kH_k\).
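A minimal sketch of this, using a Gaussian built as an outer product of two 1D Gaussians (the \(\sigma\) and radius here are arbitrary choices for illustration):

```python
import numpy as np
from scipy.signal import convolve2d

# Build a 1D Gaussian and its separable 2D version G = g g^T.
sigma, r = 1.5, 3                      # assumed parameters
n = np.arange(-r, r + 1)
g = np.exp(-n**2 / (2 * sigma**2))
g /= g.sum()
G = np.outer(g, g)

rng = np.random.default_rng(1)
X = rng.standard_normal((32, 32))      # hypothetical image

# One 2D convolution vs. two 1D passes (column kernel, then row kernel).
Y2d = convolve2d(X, G, mode='same')
Y1d = convolve2d(convolve2d(X, g[:, None], mode='same'),
                 g[None, :], mode='same')

print(np.allclose(Y2d, Y1d))  # True
```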

- Even when a kernel isn't itself separable, it can often be expressed as a sum of separable kernels.
- E.g., Unsharp Mask: \((1+\alpha) \delta - \alpha G_\sigma\) (convolve with each separable term; don't combine them into one kernel!)

- Could also try to do this automatically using SVD.
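One way to sketch the SVD idea: a separable kernel is a rank-1 matrix, so its singular values reveal separability, and the leading singular vectors recover the 1D factors. The Gaussian below is just a convenient test case:

```python
import numpy as np

# A separable test kernel: 2D Gaussian = outer product of 1D Gaussians.
sigma, r = 1.5, 3                        # assumed parameters
n = np.arange(-r, r + 1)
g = np.exp(-n**2 / (2 * sigma**2))
g /= g.sum()
G = np.outer(g, g)

# SVD: a separable kernel has only one non-zero singular value.
U, S, Vt = np.linalg.svd(G)
print(S[1] / S[0])                       # ~0, i.e. rank 1, so separable

# Recover the 1D factors (up to sign) from the rank-1 term.
kx = U[:, 0] * np.sqrt(S[0])
ky = Vt[0, :] * np.sqrt(S[0])
print(np.allclose(np.outer(kx, ky), G))  # True
```

For a non-separable kernel, keeping the top few SVD terms gives a sum-of-separable approximation.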

**Recursive Computation**

- Sometimes a kernel can be decomposed into convolutions with sparse kernels.
- Many implementations of `convolve2d` won't take advantage of sparsity. But you can write your own.
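As one illustration of recursive computation (a classic example, not from the slides): a box filter computed with running sums (an "integral image") costs a constant number of operations per output pixel, independent of the kernel size. The shapes and radius below are arbitrary:

```python
import numpy as np
from scipy.signal import convolve2d

def box_filter_valid(X, r):
    """Sum over every (2r+1)x(2r+1) window of X, 'valid' positions only,
    via cumulative sums: 4 lookups per output pixel regardless of r."""
    k = 2 * r + 1
    P = np.zeros((X.shape[0] + 1, X.shape[1] + 1))
    P[1:, 1:] = X.cumsum(axis=0).cumsum(axis=1)   # integral image
    # Window sum = P[i+k,j+k] - P[i,j+k] - P[i+k,j] + P[i,j].
    return P[k:, k:] - P[:-k, k:] - P[k:, :-k] + P[:-k, :-k]

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 20))                 # hypothetical image
ref = convolve2d(X, np.ones((5, 5)), mode='valid')

print(np.allclose(box_filter_valid(X, 2), ref))   # True
```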

**Smooth and Sub-sample**

- Don't smooth and subsample!
- For sub-sampling by two, you're computing 4x as many smooth filter outputs as you need to.
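One way to avoid some of that waste with separable smoothing (a sketch; a true strided convolution would save even more by skipping columns inside the first pass too): after the horizontal pass, drop the columns you won't keep before doing the vertical pass. The Gaussian and image sizes here are arbitrary:

```python
import numpy as np
from scipy.signal import convolve2d

sigma, r = 1.0, 2                      # assumed parameters
n = np.arange(-r, r + 1)
g = np.exp(-n**2 / (2 * sigma**2))
g /= g.sum()

rng = np.random.default_rng(3)
X = rng.standard_normal((32, 32))      # hypothetical image

# Wasteful: smooth everywhere, then throw away 3 out of 4 outputs.
Y_naive = convolve2d(convolve2d(X, g[None, :], mode='same'),
                     g[:, None], mode='same')[::2, ::2]

# Better: subsample columns after the horizontal pass, so the vertical
# pass runs on a half-width image; then subsample rows.
Yh = convolve2d(X, g[None, :], mode='same')[:, ::2]
Y_better = convolve2d(Yh, g[:, None], mode='same')[::2, :]

print(np.allclose(Y_naive, Y_better))  # True
```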

- Similarly, using zero-filling + convolution for upsampling is inefficient.

**numpy Specifics**

- In general, prefer algorithms that have lower total number of multiplies / adds.
- Try to use `scipy.signal.convolve2d` (subject to rule 1). It is optimized for cache reads, parallel execution, etc.

  (I often import it as `from scipy.signal import convolve2d as conv2`.)

- Similarly, avoid `for` loops and use element-wise operations on large arrays, matrix multiplies (`np.matmul`/`np.dot`), etc.

- Some of these things are faster in Python because a single large operation runs natively instead of returning control to the interpreter at every step. But they're also faster because these operations are often 'atomics' in lower-level libraries (BLAS) too, and have been highly optimized for modern hardware.
- Thinking about designing your algorithm in terms of these atomic operations is useful beyond python.
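A small sketch of the loop-vs-vectorized contrast (the normalization operation here is just an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((512, 512))    # hypothetical image

def normalize_loop(X):
    # Per-pixel Python loop: interpreter overhead on every element.
    Y = np.empty_like(X)
    m, s = X.mean(), X.std()
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            Y[i, j] = (X[i, j] - m) / s
    return Y

def normalize_vec(X):
    # Same computation as a couple of "atomic" whole-array operations.
    return (X - X.mean()) / X.std()

print(np.allclose(normalize_loop(X), normalize_vec(X)))  # True
```

The vectorized version is typically orders of magnitude faster for arrays of this size.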

- Some points in problem sets allocated for efficient code.

- \(F[u,v]\) is intuitively the average variation in the image at that frequency.

- But averaged across the entire image.
- This isn't very useful because images aren't "stationary".
- Different parts of the image "have different frequencies".

- FT decomposition of different levels (coarse/fine) of variation: but without sense of spatial location.
- Multi-scale representations aim to address this.