CSE 559A: Computer Vision


Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).
Course Staff: Zhihao Xia, Charlie Wu, Han Liu

http://www.cse.wustl.edu/~ayan/courses/cse559a/

Sep 11, 2018

Story So Far

  • Convolutions
    • Linear operations on images
    • Each output pixel is a linear weighted sum of a neighborhood in the input
    • Same weights for all neighborhoods
  • Can be "diagonalized" in the Fourier Domain

But so far, output image size is (approximately) equal to input image size

Scale & Aliasing

Efficient Computation

  • Convolution, in the most general case, takes \(O(n_x n_k)\) time.
    • \(n_x = W_xH_x\), \(n_k = W_kH_k\).
  • Convolution in the frequency domain:
    • FFT, point-wise multiply, Inverse FFT
    • FFT/IFFT complexity is \(O(n_x \log n_x)\) (Most efficient for power of 2 image size)
    • May be worth it for large kernels
    • Or same image convolved with many different kernels
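
The FFT route above can be sketched as follows. This is a minimal illustration, not the course's reference code: the helper names conv2_full_direct and conv2_full_fft are hypothetical, and the direct version is a simple numpy-only stand-in for a library convolution. Both inputs are zero-padded to the full output size so that the circular convolution computed by the FFT matches the linear one.

```python
import numpy as np

def conv2_full_direct(x, k):
    """Full 2-D convolution by shift-and-add: one array update per kernel tap."""
    Hx, Wx = x.shape
    Hk, Wk = k.shape
    out = np.zeros((Hx + Hk - 1, Wx + Wk - 1))
    for i in range(Hk):
        for j in range(Wk):
            out[i:i + Hx, j:j + Wx] += k[i, j] * x
    return out

def conv2_full_fft(x, k):
    """Full 2-D convolution: FFT, point-wise multiply, inverse FFT."""
    Hx, Wx = x.shape
    Hk, Wk = k.shape
    # Zero-pad to the full output size so circular == linear convolution.
    shape = (Hx + Hk - 1, Wx + Wk - 1)
    X = np.fft.rfft2(x, s=shape)
    K = np.fft.rfft2(k, s=shape)
    return np.fft.irfft2(X * K, s=shape)

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 48))   # hypothetical test image
k = rng.standard_normal((9, 7))     # hypothetical kernel
y_direct = conv2_full_direct(x, k)
y_fft = conv2_full_fft(x, k)
max_err = np.max(np.abs(y_direct - y_fft))  # floating-point round-off only
```

When the same image is convolved with many kernels, X can be computed once and reused, so only the kernel FFT, the multiply, and the inverse FFT are repeated.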

Separable Kernels

\[G[n_x,n_y] \propto \exp\left(-\frac{n_x^2+n_y^2}{2\sigma^2}\right) = G_x[n_x]G_y[n_y]\]

  • \(x-\) and \(y-\) derivatives of Gaussian also separable.

  • Realize that \(k[n_x,n_y] = k_x[n_x]k_y[n_y] = (k_x *_{\tiny \text{full}} k_y)[n_x,n_y]\).

    This is by interpreting \(k_x\) and \(k_y\) as having size \(W_k \times 1\) and \(1 \times H_k\).
  • So \(X*k = X*(k_x*k_y) = (X*k_x)*k_y\). This takes \(W_k+H_k\) operations per output pixel instead of \(W_kH_k\).
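
As a rough check of the identity above, the sketch below builds a 9x9 Gaussian from a 1-D profile and compares one 2-D convolution against two 1-D passes. The helper conv2_full, the value of sigma, and the kernel radius are illustrative assumptions. Note that numpy arrays are indexed (row, column), so the \(W_k \times 1\) kernel \(k_x\) is stored with shape (1, W_k).

```python
import numpy as np

def conv2_full(x, k):
    """Full 2-D convolution by shift-and-add (one array update per kernel tap)."""
    Hx, Wx = x.shape
    Hk, Wk = k.shape
    out = np.zeros((Hx + Hk - 1, Wx + Wk - 1))
    for i in range(Hk):
        for j in range(Wk):
            out[i:i + Hx, j:j + Wx] += k[i, j] * x
    return out

sigma = 1.5                      # assumed value, for illustration only
n = np.arange(-4, 5)             # 9 taps
g = np.exp(-n**2 / (2 * sigma**2))
g /= g.sum()

k_x = g[np.newaxis, :]           # varies along x: shape (1, W_k)
k_y = g[:, np.newaxis]           # varies along y: shape (H_k, 1)
k = k_y * k_x                    # k[n_y, n_x] = k_y[n_y] * k_x[n_x]

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 50))

y_2d = conv2_full(X, k)                         # W_k * H_k = 81 taps per pixel
y_sep = conv2_full(conv2_full(X, k_x), k_y)     # W_k + H_k = 18 taps per pixel
k_from_1d = conv2_full(k_x, k_y)                # full convolution of the 1-D kernels
```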

  • Even if a kernel itself isn't separable, it can sometimes be expressed as a sum of separable kernels.
  • E.g., Unsharp Mask: \((1+\alpha) \delta - \alpha G_\sigma\) (don't combine!)
  • Could also try to do this automatically using SVD.
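
One way the SVD idea could look in practice is sketched below, under the assumption that the 2-D kernel is treated as a matrix: each significant singular triplet gives one separable (column times row) term. The unsharp-mask parameters alpha and sigma and the helper conv2_full are hypothetical choices for illustration.

```python
import numpy as np

def conv2_full(x, k):
    """Full 2-D convolution by shift-and-add (one array update per kernel tap)."""
    Hx, Wx = x.shape
    Hk, Wk = k.shape
    out = np.zeros((Hx + Hk - 1, Wx + Wk - 1))
    for i in range(Hk):
        for j in range(Wk):
            out[i:i + Hx, j:j + Wx] += k[i, j] * x
    return out

# Unsharp mask kernel (1 + alpha) * delta - alpha * G_sigma on a 9x9 grid.
alpha, sigma = 0.7, 1.5          # assumed values
n = np.arange(-4, 5)
g = np.exp(-n**2 / (2 * sigma**2))
g /= g.sum()
delta = np.zeros((9, 9))
delta[4, 4] = 1.0
k = (1 + alpha) * delta - alpha * np.outer(g, g)

# SVD: k = sum_r s[r] * outer(U[:, r], Vt[r]); keep the non-negligible terms.
U, s, Vt = np.linalg.svd(k)
rank = int(np.sum(s > 1e-10 * s[0]))

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 32))

y_direct = conv2_full(X, k)
y_sum = np.zeros_like(y_direct)
for r in range(rank):
    col = (s[r] * U[:, r])[:, np.newaxis]   # H_k x 1 column kernel
    row = Vt[r][np.newaxis, :]              # 1 x W_k row kernel
    y_sum += conv2_full(conv2_full(X, col), row)
```

Here the kernel is a sum of two rank-1 terms (the delta and the Gaussian), so the SVD finds two separable components.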

Recursive Computation

  • Sometimes a kernel can be decomposed into a sequence of convolutions with sparse kernels.
  • Many implementations of convolve2d won't make use of sparsity.
    • But you can write your own.
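
A minimal sketch of writing your own sparse convolution follows, using an assumed example: an 8-tap horizontal box filter, which factors as [1,1] * [1,0,1] * [1,0,0,0,1]. The helper sparse_conv_x is hypothetical and only touches the nonzero taps, so the recursive version needs 3 additions per pixel instead of 7.

```python
import numpy as np

def sparse_conv_x(x, taps):
    """Full convolution along axis 1 with a kernel given by its nonzero taps.

    taps is a list of (offset, weight) pairs; zero entries cost nothing.
    """
    max_off = max(off for off, _ in taps)
    W = x.shape[1]
    out = np.zeros((x.shape[0], W + max_off))
    for off, w in taps:
        out[:, off:off + W] += w * x
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 100))

# Dense 8-tap box filter: 8 taps per output pixel.
y_dense = sparse_conv_x(X, [(i, 1.0) for i in range(8)])

# Recursive version: three 2-tap sparse kernels with spacing 1, 2, 4.
y_rec = X
for step in (1, 2, 4):
    y_rec = sparse_conv_x(y_rec, [(0, 1.0), (step, 1.0)])

# The same factorization, checked on the 1-D kernels themselves.
k_rec = np.convolve(np.convolve([1, 1], [1, 0, 1]), [1, 0, 0, 0, 1])
```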

Smooth and Sub-sample

  • Don't smooth the whole image and then subsample!
  • For sub-sampling by two, you're computing 4x as many smooth filter outputs as you need to.
  • Similarly, using zero-filling + convolution for upsampling is inefficient.
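
The saving for sub-sampling by two can be sketched as below: instead of filtering everywhere and discarding three quarters of the outputs, each kernel tap is applied to a strided view of the input so that only the retained outputs are ever computed. The function names and the random 5x5 kernel are hypothetical; "valid" boundary handling is assumed for simplicity.

```python
import numpy as np

def conv2_valid(x, k):
    """'Valid' 2-D convolution by shift-and-add over kernel taps."""
    Hk, Wk = k.shape
    Ho, Wo = x.shape[0] - Hk + 1, x.shape[1] - Wk + 1
    out = np.zeros((Ho, Wo))
    for i in range(Hk):
        for j in range(Wk):
            r0, c0 = Hk - 1 - i, Wk - 1 - j   # kernel flip for convolution
            out += k[i, j] * x[r0:r0 + Ho, c0:c0 + Wo]
    return out

def conv2_valid_stride2(x, k):
    """Same outputs as conv2_valid(x, k)[::2, ::2], computed directly."""
    Hk, Wk = k.shape
    Ho, Wo = x.shape[0] - Hk + 1, x.shape[1] - Wk + 1
    n_r, n_c = (Ho + 1) // 2, (Wo + 1) // 2   # number of retained rows / columns
    out = np.zeros((n_r, n_c))
    for i in range(Hk):
        for j in range(Wk):
            r0, c0 = Hk - 1 - i, Wk - 1 - j
            # Strided slice: only every second input position is touched.
            out += k[i, j] * x[r0:r0 + 2 * n_r:2, c0:c0 + 2 * n_c:2]
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 64))
k = rng.standard_normal((5, 5))

y_wasteful = conv2_valid(X, k)[::2, ::2]   # computes 4x too many outputs
y_direct = conv2_valid_stride2(X, k)       # computes only what is kept
```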

numpy Specifics

  1. In general, prefer algorithms that have a lower total number of multiplies / adds.
  2. Try to use scipy.signal.convolve2d (subject to rule 1). It is optimized for cache reads, parallel execution, etc.
    (I often import it as: from scipy.signal import convolve2d as conv2)
  3. Similarly, avoid for loops and use element-wise operations on large arrays, matrix multiplies (np.matmul / np.dot), etc.


  • Some of these things are faster in python because a single large operation runs natively instead of returning to the interpreter. But they're also faster because these operations are often 'atomics' in lower-level libraries too (BLAS), and have been highly optimized for modern hardware.
  • Thinking about designing your algorithm in terms of these atomic operations is useful beyond python.
  • Some points in problem sets are allocated for efficient code.
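
To illustrate rule 3, here is a small sketch with an assumed task that is not from the slides: applying a 3x3 matrix M to the color vector at every pixel. The loop version returns to the interpreter once per pixel, while the vectorized version is a single matrix multiply. The timings are measured but not quoted, since they depend on the machine.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64, 3))   # hypothetical H x W x 3 image
M = rng.standard_normal((3, 3))          # hypothetical per-pixel linear transform

def transform_loop(img, M):
    """One small matrix-vector product per pixel, driven by Python loops."""
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = M @ img[y, x]
    return out

def transform_matmul(img, M):
    """Same result as one large matrix multiply: out[y, x] = M @ img[y, x]."""
    return img @ M.T

t0 = time.perf_counter()
out_loop = transform_loop(img, M)
t1 = time.perf_counter()
out_vec = transform_matmul(img, M)
t2 = time.perf_counter()

loop_seconds, vec_seconds = t1 - t0, t2 - t1   # compare on your own machine
```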

Multi-scale Representations

  • \(F[u,v]\) is intuitively the average variation in the image at that frequency.
  • But it is averaged across the entire image.
  • This isn't useful because images aren't "stationary".
  • Different parts of the image "have different frequencies".


  • The FT is a decomposition into different levels (coarse/fine) of variation, but without any sense of spatial location.
  • Multi-scale representations aim to address this.
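
The loss of spatial location can be seen in a small experiment, sketched here with assumed sizes: a textured patch is placed at two different positions in an otherwise empty image. The two images differ, but their Fourier magnitudes \(|F[u,v]|\) are identical, because a (circular) shift only changes the phase.

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.standard_normal((16, 16))    # hypothetical "texture"

img_a = np.zeros((64, 64))
img_a[8:24, 8:24] = patch                # texture near the top-left

# Same texture moved down by 20 rows and right by 13 columns.
img_b = np.roll(img_a, shift=(20, 13), axis=(0, 1))

mag_a = np.abs(np.fft.fft2(img_a))
mag_b = np.abs(np.fft.fft2(img_b))

same_magnitude = np.allclose(mag_a, mag_b)   # location is invisible in |F|
same_image = np.allclose(img_a, img_b)       # yet the images are different
```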
