CSE 559A: Computer Vision


Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).
Course Staff: Zhihao Xia, Charlie Wu, Han Liu

Sep 11, 2018

# Story So Far

• Convolutions
• Linear operations on images
• Each pixel of output is linear weighted sum of neighborhood in input
• Same weights for all neighborhoods
• Can be "diagonalized" in the Fourier Domain

But so far, output image size is (approximately) equal to input image size

# Efficient Computation

• Convolution, in the most general case, takes $$O(n_x n_k)$$ time.
• $$n_x = W_xH_x$$, $$n_k = W_kH_k$$.
• Convolution in the frequency domain:
• FFT, point-wise multiply, Inverse FFT
• FFT/IFFT complexity is $$O(n_x \log n_x)$$ (Most efficient for power of 2 image size)
• May be worth it for large kernels
• Or same image convolved with many different kernels
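A quick sketch (with hypothetical sizes) checking that frequency-domain convolution agrees with the direct computation — scipy's `fftconvolve` implements the FFT, point-wise multiply, inverse-FFT pipeline internally:

```python
import numpy as np
from scipy.signal import convolve2d, fftconvolve

rng = np.random.default_rng(0)
X = rng.random((256, 256))   # image
k = rng.random((31, 31))     # large kernel, where FFT-based conv pays off

# Direct O(n_x n_k) computation vs. FFT-based O(n_x log n_x) computation
out_direct = convolve2d(X, k, mode='same')
out_fft = fftconvolve(X, k, mode='same')

print(np.allclose(out_direct, out_fft))  # agree up to floating-point error
```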

# Efficient Computation

Separable Kernels

$$G[n_x,n_y] \propto \exp\left(-\frac{n_x^2+n_y^2}{2\sigma^2}\right) = G_x[n_x]\,G_y[n_y]$$

• $$x-$$ and $$y-$$ derivatives of Gaussian also separable.

• Realize that $$k[n_x,n_y] = k_x[n_x]\,k_y[n_y] = k_x *_{\tiny \text{full}} k_y$$.

This works by interpreting $$k_x$$ and $$k_y$$ as having sizes $$W_k \times 1$$ and $$1 \times H_k$$.
• So $$X*k = X*(k_x*k_y) = (X*k_x)*k_y$$. This takes $$W_k+H_k$$ multiply-adds per output pixel instead of $$W_kH_k$$.

• Even if a kernel itself isn't separable, it can often be expressed as a sum of separable kernels.
• E.g., Unsharp Mask: $$(1+\alpha) \delta - \alpha G_\sigma$$ (don't combine!)
• Could also try to do this automatically using SVD.
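A minimal sketch of the separable Gaussian case: build the 1-D taps, and verify that a column pass followed by a row pass matches a full 2-D convolution (sizes and $$\sigma$$ here are illustrative):

```python
import numpy as np
from scipy.signal import convolve2d

sigma = 2.0
r = int(3 * sigma)
n = np.arange(-r, r + 1)

# Normalized 1-D Gaussian taps
g = np.exp(-n**2 / (2 * sigma**2))
g /= g.sum()

# The 2-D Gaussian kernel is the outer product of the 1-D kernels
G = np.outer(g, g)

X = np.random.default_rng(0).random((128, 128))

out_2d = convolve2d(X, G, mode='same')                      # W_k * H_k ops/pixel
out_sep = convolve2d(                                       # W_k + H_k ops/pixel
    convolve2d(X, g[:, None], mode='same'),                 # column pass
    g[None, :], mode='same')                                # row pass

print(np.allclose(out_2d, out_sep))
```

With zero-fill boundaries, the two results agree exactly (up to float error), including at the borders.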

# Efficient Computation

Recursive Computation

• Sometimes can decompose into convolution with sparse kernels.
• Many implementations of convolve2d won't make use of sparsity.
• But you can write your own.
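One concrete instance of this kind of trick (an illustrative example, not necessarily the one the slide refers to): a width-$$w$$ box filter can be computed with a running (cumulative) sum, costing O(1) per output regardless of $$w$$:

```python
import numpy as np

def box_filter_1d(x, w):
    """O(n) box filter of width w (valid mode), via a running sum.

    Each output is a difference of two cumulative sums, so the cost
    per output pixel is constant -- independent of the filter width.
    """
    c = np.cumsum(np.concatenate([[0.0], x]))
    return c[w:] - c[:-w]

x = np.random.default_rng(1).random(1000)
w = 51
out_fast = box_filter_1d(x, w)
out_direct = np.convolve(x, np.ones(w), mode='valid')  # O(n * w) reference
print(np.allclose(out_fast, out_direct))
```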

# Efficient Computation

Smooth and Sub-sample

• Don't smooth the full image and then sub-sample!
• For sub-sampling by two, you'd be computing 4x as many smoothed filter outputs as you need. Compute the filter outputs only at the sampled locations.
• Similarly, using zero-filling + convolution for upsampling is inefficient.
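A sketch of computing smoothed outputs only at the kept locations (one multiply-add per kernel tap per *kept* pixel), checked against the wasteful smooth-then-slice approach. The kernel here is symmetric, so correlation and convolution coincide:

```python
import numpy as np
from scipy.signal import convolve2d

def strided_smooth(X, k, s=2):
    """Smoothed image evaluated only at every s-th pixel ('valid' region).

    Accumulates shifted, strided slices of X -- so work is proportional
    to the number of outputs actually kept, not the full image size.
    Assumes k is symmetric (correlation == convolution).
    """
    H, W = X.shape
    kh, kw = k.shape
    oh = (H - kh) // s + 1
    ow = (W - kw) // s + 1
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * X[i:i + s * oh:s, j:j + s * ow:s]
    return out

X = np.random.default_rng(2).random((64, 64))
k = np.ones((5, 5)) / 25.0   # symmetric smoothing kernel

ref = convolve2d(X, k, mode='valid')[::2, ::2]   # 4x wasted work
print(np.allclose(strided_smooth(X, k, 2), ref))
```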

# Efficient Computation

numpy Specifics

1. In general, prefer algorithms that have lower total number of multiplies / adds.
2. Try to use scipy.signal.convolve2d (subject to rule 1). It is optimized for cache reads, parallel execution, etc.
(I often import it as: from scipy.signal import convolve2d as conv2)
3. Similarly, avoid for loops and use element-wise operations on large arrays, matrix multiplies (np.matmul / np.dot), etc.

• Some of these things are faster in python because a single large operation runs natively instead of returning to the interpreter on every iteration. But they're also faster because these operations are often 'atomics' in lower-level libraries too (BLAS), and have been highly optimized for modern hardware.
• Thinking about designing your algorithm in terms of these atomic operations is useful beyond python.
• Some points in problem sets allocated for efficient code.
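A small illustration of rule 3: the two computations below produce the same matrix product, but the nested Python loop pays interpreter overhead on every entry, while `A @ B` is a single call into an optimized BLAS routine:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((200, 300))
B = rng.random((300, 100))

# Loop version: one interpreted iteration per output entry (slow)
C_loop = np.zeros((200, 100))
for i in range(200):
    for j in range(100):
        C_loop[i, j] = np.dot(A[i, :], B[:, j])

# One native call into optimized BLAS (fast)
C_fast = A @ B

print(np.allclose(C_loop, C_fast))
```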

# Multi-scale Representations

• $$F[u,v]$$ is intuitively the average variation in the image at that frequency.
• But it is averaged across the entire image.
• This isn't very useful because images aren't "stationary":
• Different parts of the image "have different frequencies".

• The FT decomposes variation into different levels (coarse/fine), but without any sense of spatial location.
• Multi-scale representations aim to address this.