CSE 559A: Computer Vision



Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).
Course Staff: Zhihao Xia, Charlie Wu, Han Liu

http://www.cse.wustl.edu/~ayan/courses/cse559a/

Sep 25, 2018

General

  • PSET 1 Due 11:59pm today.
  • git commit AND git push (and git pull after that to check)
    • Need to push only once you're done
    • Try to avoid force-adding all files in your directory
      • We've received lots of random files: .DS_store, synctex.gz, ...
  • Late days applied automatically at the end of the semester.


  • PSET 2 out and ready to clone.

Radiance, Irradiance, BRDFs

  • Additional Reference: Forsyth & Ponce: Chapters 4 & 5
  • Less Detailed / Quick: Szeliski Sec 2.2

Lights


Photometric Stereo


For each point, let the set of intensities \(\{I_i\}\) be observed under lights \(\{\ell_i\}\).

Ignore color and assume \(I_i\) is scalar (e.g., convert the images to grayscale, or sum R+G+B).

\[I_i = \rho~~\langle \hat{n}, \ell_i\rangle = \rho~~\ell_i^T~\hat{n} = \ell_i^T~(\rho~\hat{n}) = \ell_i^T~~n\]

Three observations of \(I_i\) under different, linearly independent \(\ell_i\) give us \(n\):
three linear equations in three unknowns.

Given \(n\), we can factor it into \(\rho\) and \(\hat{n}\): \(\rho = \|n\|\), and \(\hat{n} = n / \|n\|\).
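With three lights this is a plain \(3\times3\) solve. A minimal numpy sketch (the light directions and intensities below are made-up values for illustration):

```python
import numpy as np

# Hypothetical example: three light directions (rows) and observed intensities
L = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.866],
              [0.0, 0.5, 0.866]])   # each row is one ell_i^T
I = np.array([0.9, 0.95, 0.6])      # intensities at one pixel under each light

n = np.linalg.solve(L, I)           # exact solve: 3 equations, 3 unknowns
rho = np.linalg.norm(n)             # albedo is the length of n
n_hat = n / rho                     # unit surface normal
```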

Photometric Stereo

But using only three images is unstable in the presence of noise, etc. Instead, we solve in the "least squares" sense.

Given \(K\) images under \(K\) different lights, for each pixel:

\[\left[\begin{array}{c}\ell_1^T\\\ell_2^T\\\ell_3^T\\\vdots\\\ell_K^T\end{array}\right] n = \left[ \begin{array}{c}I_1\\I_2\\I_3\\ \vdots \\ I_K \end{array}\right] \Rightarrow L~n = I\]

where \(L\) is a \(K\times 3\) matrix, \(I\) is a \(K\times 1\) vector, and \(n\) is a three-vector.

Photometric Stereo

\[n = \arg \min_n \|L~n - I\|^2 = \arg \min_n n^T~(L^TL)~n - 2(L^T~I)^T~n + I^TI\]

Take gradient, set to 0: \(~~~~~(L^TL)~n = (L^T~I)\)

This is just a \(3\times 3\) linear system. Solve it with np.linalg.solve (and since \(L^TL\) is symmetric positive definite, a Cholesky-based solve also works).
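A vectorized sketch of this per-pixel solve, on synthetic noise-free data (the image shapes and the random light set are assumptions for illustration; since \(L^TL\) is shared across pixels, one \(3\times3\) solve handles every pixel at once):

```python
import numpy as np

K, H, W = 8, 4, 5                               # assumed: 8 lights, tiny image
rng = np.random.default_rng(0)
Lmat = rng.normal(size=(K, 3))                  # K light directions (rows)
n_true = rng.normal(size=(H, W, 3))             # ground-truth n = rho * n_hat
I = np.einsum('kc,hwc->khw', Lmat, n_true)      # synthesize K images: I_k = ell_k^T n

A = Lmat.T @ Lmat                               # 3x3 normal-equations matrix L^T L
b = np.einsum('kc,khw->hwc', Lmat, I)           # L^T I at every pixel
n = np.linalg.solve(A, b.reshape(-1, 3).T).T.reshape(H, W, 3)

rho = np.linalg.norm(n, axis=-1)                # per-pixel albedo
n_hat = n / rho[..., None]                      # per-pixel unit normal
```

With noise-free data the recovered \(n\) matches the ground truth exactly (up to floating point).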

Photometric Stereo


Some Practical Issues:

  • Even though we assume lights at infinity, this works well in practice as long as the lights are far away.
  • Calibrate each light vector \(\ell\) from an image of an object of known shape and albedo (typically a matte sphere).
  • Technically this only works for Lambertian objects, but objects can often be made to appear Lambertian with polarizers.
  • Also, the estimated normals are typically well-defined only for a 'valid' set of pixels in the image.
  • You'll create / be given a "mask" of these valid pixels.

Normals to Depth


\[Z = \arg \min_Z \|g_x - f_x*Z\|^2 + \|g_y - f_y*Z\|^2\]

Normals to Depth

\[Z = \arg \min_Z \|g_x - f_x*Z\|^2 + \|g_y - f_y*Z\|^2 + \lambda R(Z)\]

We'll use \(R(Z)\) as:

\[R(Z) = \sum_n (Z*f_r)[n]^2 ~~~\text{for}~~~ f_r = \frac{1}{9}\begin{bmatrix}-1&-1&-1\\-1&\phantom{-}8&-1\\-1&-1&-1\end{bmatrix}\]
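As a sanity check on \(f_r\): its entries sum to zero, so it annihilates constant depth maps and only penalizes curvature. A numpy-only sketch (the direct-loop convolution is just to keep the example self-contained):

```python
import numpy as np

f_r = -np.ones((3, 3)) / 9.0
f_r[1, 1] = 8.0 / 9.0              # high-pass kernel; entries sum to 0

def conv2_valid(Z, f):
    """Direct 2-D 'valid' convolution with a small kernel."""
    f = f[::-1, ::-1]              # flip for true convolution (vs. correlation)
    kh, kw = f.shape
    H, W = Z.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(kh):
        for j in range(kw):
            out += f[i, j] * Z[i:i + out.shape[0], j:j + out.shape[1]]
    return out

Z = np.full((6, 6), 2.5)           # constant depth map
R_interior = np.sum(conv2_valid(Z, f_r) ** 2)   # ~0: f_r kills constants
```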

Normals to Depth

\[Z = \arg \min_Z \|g_x - f_x*Z\|^2 + \|g_y - f_y*Z\|^2 + \lambda R(Z)\]

Version 1: Do it in the Fourier Domain (called Frankot-Chellappa)

  • Assume that in the masked out regions, \(g_x = g_y = 0\).

\[\mathcal{F}(Z)[u,v] = \frac{\bar{F}_x[u,v]{G}_x[u,v] + \bar{F}_y[u,v]{G}_y[u,v]}{|F_x[u,v]|^2 + |F_y[u,v]|^2 + \lambda |F_r[u,v]|^2}\]

\(\mathcal{F}(Z)\) is the FT of the depth map, \(G_x\) is the FT of \(g_x\), \(F_x\) is the FT of the (circularly padded) filter \(f_x\), and so on.

  • In general, should add some very small number (e.g., \(10^{-12}\)) to denominator for stability.

  • In particular, what is the denominator for \([u,v] = [0,0]\) ?

  • Both numerator and denominator are 0, because normals tell us nothing about average depth / offset.
  • Explicitly set \(\mathcal{F}(Z)[0,0]\) to 0.
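The steps above can be sketched in numpy (the filter choices and kernel-origin conventions here are assumptions; a real implementation must match the padding used when computing \(g_x, g_y\)):

```python
import numpy as np

def frankot_chellappa(gx, gy, fx, fy, fr, lam=1.0, eps=1e-12):
    """Fourier-domain depth from gradients, per the formula above."""
    H, W = gx.shape
    # FTs of the filters zero-padded to image size (kernel anchored at origin),
    # which corresponds to circular convolution of that size.
    Fx = np.fft.fft2(fx, s=(H, W))
    Fy = np.fft.fft2(fy, s=(H, W))
    Fr = np.fft.fft2(fr, s=(H, W))
    Gx, Gy = np.fft.fft2(gx), np.fft.fft2(gy)
    num = np.conj(Fx) * Gx + np.conj(Fy) * Gy
    den = np.abs(Fx)**2 + np.abs(Fy)**2 + lam * np.abs(Fr)**2 + eps
    FZ = num / den
    FZ[0, 0] = 0.0          # normals say nothing about the global depth offset
    return np.real(np.fft.ifft2(FZ))

# Assumed example filters (central difference + the regularizer kernel above)
fx = np.array([[0.5, 0.0, -0.5]])
fy = fx.T
fr = -np.ones((3, 3)) / 9.0
fr[1, 1] = 8.0 / 9.0
Z = frankot_chellappa(np.zeros((8, 8)), np.zeros((8, 8)), fx, fy, fr)
```

With zero gradients everywhere (e.g., a fully masked image), the recovered depth is identically zero.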

Normals to Depth

\[Z = \arg \min_Z \|g_x - f_x*Z\|^2 + \|g_y - f_y*Z\|^2 + \lambda R(Z)\]

Version 2: Use conjugate gradient.

  • Allows us to use different weights for different pixels.

Normals to Depth

\[Z = \arg \min_Z \sum_n w[n] (g_x[n] - (f_x*Z)[n])^2 + \sum_n w[n] (g_y[n] - (f_y*Z)[n])^2 + \lambda R(Z)\]

Version 2: Use conjugate gradient.

  • Allows us to use different weights for different pixels.
    • Set \(w[n]\) to 0 for masked out pixels.
    • Set \(w[n]\) to \((\hat{n}_z[n])^2\) everywhere else.
      • Accounts for the fact that we got gradients by dividing by \(\hat{n}_z\).
      • Smaller values will be noisier.

Normals to Depth

\[Z = \arg \min_Z \sum_n w[n] (g_x[n] - (f_x*Z)[n])^2 + \sum_n w[n] (g_y[n] - (f_y*Z)[n])^2 + \lambda R(Z)\]

\[Z = \arg \min_Z Z^T Q Z - 2 Z^T~b + c\]

  • Begin with an all-zeros guess \(Z_0\) for \(Z\)
  • \(k = 0, r_0 \leftarrow b-Q~Z_0,~~p_0\leftarrow r_0\)
  • Repeat (for say a fixed number of iterations)
    • \(\alpha_k \leftarrow \frac{r_k^Tr_k}{p_k^TQp_k}\)
    • \(Z_{k+1} \leftarrow Z_k + \alpha_k p_k\)
    • \(r_{k+1} \leftarrow r_k - \alpha_k Qp_k\)
    • \(\beta_k \leftarrow \frac{r_{k+1}^Tr_{k+1}}{r_k^Tr_k}\)
    • \(p_{k+1} \leftarrow r_{k+1} + \beta_kp_k\)
    • \(k=k+1\)
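The iteration above, with \(Q\) passed as a black-box function (demonstrated here on a tiny hand-picked symmetric positive definite system rather than the depth problem; the early-exit tolerance is an added safeguard against dividing by a vanishing residual):

```python
import numpy as np

def conjugate_gradient(Qfun, b, n_iters=100, tol=1e-12):
    """Solve Q Z = b by CG; Qfun applies Q to an array shaped like b."""
    Z = np.zeros_like(b)            # all-zeros initial guess
    r = b - Qfun(Z)                 # residual r_0
    p = r.copy()                    # search direction p_0
    rr = np.sum(r * r)
    for _ in range(n_iters):
        if rr < tol:                # converged: avoid 0/0 in alpha
            break
        Qp = Qfun(p)
        alpha = rr / np.sum(p * Qp)
        Z += alpha * p
        r -= alpha * Qp
        rr_new = np.sum(r * r)
        p = r + (rr_new / rr) * p   # beta_k = rr_new / rr
        rr = rr_new
    return Z

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # small SPD example
b = np.array([1.0, 2.0])
Z = conjugate_gradient(lambda x: A @ x, b, n_iters=10)
```

For an \(n\)-dimensional SPD system, CG converges in at most \(n\) iterations in exact arithmetic, so this \(2\times2\) example is solved in two.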

Normals to Depth

\[Z = \arg \min_Z \sum_n w[n] (g_x[n] - (f_x*Z)[n])^2 + \sum_n w[n] (g_y[n] - (f_y*Z)[n])^2 + \lambda \sum_n ((f_r*Z)[n])^2\]

\[Z = \arg \min_Z Z^T Q Z - 2 Z^T~b + c\]

We need to figure out:

  • What is \(b\)? (It should have the same shape as the image.)
  • How do we compute \(Q~p\) for a given \(p\) (where \(p\) is also the same shape as the image)?

Let \(W\) be a diagonal matrix with values of \(w[n]\). \(F_x\), \(F_y\) and \(F_r\) denote convolutions by \(f_x\), \(f_y\), and \(f_r\).

\[Z = \arg \min (G_x - F_xZ)^TW(G_x - F_xZ) + (G_y - F_yZ)^TW(G_y - F_yZ) + \lambda (F_rZ)^T(F_rZ)\]

What are \(Q\) and \(b\) ?

\[Q = F_x^TWF_x + F_y^TWF_y + \lambda F_r^TF_r\] \[b = F_x^TWG_x + F_y^TWG_y\]

Normals to Depth

\[Z = \arg \min_Z \sum_n w[n] (g_x[n] - (f_x*Z)[n])^2 + \sum_n w[n] (g_y[n] - (f_y*Z)[n])^2 + \lambda \sum_n ((f_r*Z)[n])^2\]

\[Z = \arg \min_Z Z^T Q Z - 2 Z^T~b + c\]

  • What is \(b\)? (It should have the same shape as the image.)
  • How do we compute \(Q~p\) for a given \(p\) (where \(p\) is also the same shape as the image)?

\[Q = F_x^TWF_x + F_y^TWF_y + \lambda F_r^TF_r\] \[b = F_x^TWG_x + F_y^TWG_y\]

\[Q~p = ((p * f_x) \times w) * \bar{f}_x + ((p * f_y) \times w) * \bar{f}_y + \lambda ((p*f_r)*\bar{f}_r)\]

\(\bar{f}\) denotes the flipped version of \(f\), and \(\times\) denotes the element-wise product.
(\(*\) can be 'same' convolutions with zero padding.)

\[b = (g_x \times w) * \bar{f}_x + (g_y \times w) * \bar{f}_y\]

  • Remember, \(p^T Qp\) is just \(\langle p, Qp\rangle\). So compute \(Qp\), take element-wise product with \(p\),
    and sum across all pixels.
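The convolution form of \(Q~p\) and \(b\) can be sketched directly (assuming odd-sized kernels, so that zero-padded 'same' convolution with the flipped filter is the exact adjoint of the forward convolution):

```python
import numpy as np

def conv_same(x, f):
    """Zero-padded 'same' 2-D convolution with a small odd-sized kernel."""
    kh, kw = f.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    f = f[::-1, ::-1]              # flip kernel: convolution, not correlation
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            out += f[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def apply_Q(p, w, fx, fy, fr, lam):
    """Q p = ((p*fx) x w)*fx_bar + ((p*fy) x w)*fy_bar + lam (p*fr)*fr_bar."""
    fxb, fyb, frb = fx[::-1, ::-1], fy[::-1, ::-1], fr[::-1, ::-1]
    return (conv_same(w * conv_same(p, fx), fxb)
            + conv_same(w * conv_same(p, fy), fyb)
            + lam * conv_same(conv_same(p, fr), frb))

def make_b(gx, gy, w, fx, fy):
    """b = (gx x w)*fx_bar + (gy x w)*fy_bar."""
    fxb, fyb = fx[::-1, ::-1], fy[::-1, ::-1]
    return conv_same(w * gx, fxb) + conv_same(w * gy, fyb)

# Assumed example filters and random inputs, to exercise the operators
fx = np.array([[0.5, 0.0, -0.5]])
fy = fx.T
fr = -np.ones((3, 3)) / 9.0
fr[1, 1] = 8.0 / 9.0
rng = np.random.default_rng(1)
w = rng.random((6, 6))             # non-negative per-pixel weights
p = rng.normal(size=(6, 6))
q = rng.normal(size=(6, 6))
```

Since \(Q = F_x^TWF_x + F_y^TWF_y + \lambda F_r^TF_r\) with \(w[n]\ge 0\), the operator built this way is symmetric and positive semi-definite, which is exactly what conjugate gradient requires.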
