CSE 559A: Computer Vision


Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).

Course Staff: Zhihao Xia, Charlie Wu, Han Liu

Sep 25, 2018

- PSET 1 Due 11:59pm today.

- git commit **AND** git push (and git pull after that to check)
- Need to push only once you're done
- Try to avoid force-adding all files in your directory
- We've received lots of random files: .DS_Store, synctex.gz, ...

- Late days applied automatically at the end of the semester.

- PSET 2 out and ready to clone.

- Additional Reference: Forsyth & Ponce: Chapters 4 & 5
- Less Detailed / Quick: Szeliski Sec 2.2

For each point, let the set of intensities \(\{I_i\}\) be observed under lights \(\{\ell_i\}\).

Ignore color, assume \(I_i\) is scalar (convert the images to grayscale / R+G+B).

\[I_i = \rho~~\langle \hat{n}, \ell_i\rangle = \rho~~\ell_i^T~\hat{n} = \ell_i^T~(\rho~\hat{n}) = \ell_i^T~~n\]

Three observations of \(I_i\) with different, linearly independent, \(\ell_i\) will give us \(n\).

Three linear equations in three variables.

Given \(n\), we can factor it into \(\rho\) and \(\hat{n}\): \(\rho\) is the length \(\|n\|\), and \(\hat{n} = n / \|n\|\).
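A minimal sketch of the three-light case for a single pixel, assuming NumPy. The light directions and the ground-truth albedo / normal below are made-up values for illustration only; they do not come from the course data.

```python
import numpy as np

# Hypothetical light directions (rows are l_i^T). Any three linearly
# independent vectors would do.
L = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.8],
              [0.0, 0.6, 0.8]])

# Synthetic ground truth for one pixel (assumed values for the demo).
rho_true = 0.8
n_hat_true = np.array([0.2, -0.1, 1.0])
n_hat_true /= np.linalg.norm(n_hat_true)

# Simulate the three observations: I_i = l_i^T (rho * n_hat).
I = L @ (rho_true * n_hat_true)

# Three linear equations in three unknowns give n = rho * n_hat.
n = np.linalg.solve(L, I)

# Factor n into albedo (its length) and unit normal (its direction).
rho = np.linalg.norm(n)
n_hat = n / rho
```

With noise-free intensities the recovered `rho` and `n_hat` match the synthetic ground truth up to floating-point error.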

But using only three images is unstable: there will be noise, etc. We solve in the "least squares" sense.

Given \(K\) images under \(K\) different lights, for each pixel:

\[\left[\begin{array}{c}\ell_1^T\\\ell_2^T\\\ell_3^T\\\vdots\\\ell_K^T\end{array}\right] n = \left[ \begin{array}{c}I_1\\I_2\\I_3\\ \vdots \\ I_K \end{array}\right] \Rightarrow L~n = I\]

where \(L\) is a \(K\times 3\) matrix, \(I\) is a \(K\times 1\) vector, and \(n\) is a three-vector.

\[n = \arg \min_n \|L~n - I\|^2 = \arg \min_n n^T~(L^TL)~n - 2(L^T~I)^T~n + I^TI\]

Take gradient, set to 0: \(~~~~~(L^TL)~n = (L^T~I)\)

This is actually a \(3\times 3\) system. Solve using `np.linalg.solve`.

Since \(L^TL\) is symmetric positive definite (when the lights span 3D), a Cholesky-based solver can also be used.
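A sketch of the least-squares solve for every pixel at once, assuming NumPy. The function name `solve_normals` and the array layouts (`images` as `K x H x W`, `lights` as `K x 3`) are assumptions for this example, not the problem-set interface. Shadows and clipping are ignored here.

```python
import numpy as np

def solve_normals(images, lights):
    """Solve (L^T L) n = L^T I per pixel.

    images: (K, H, W) grayscale intensities (assumed layout).
    lights: (K, 3) light vectors, one per image (assumed layout).
    Returns n = rho * n_hat as an (H, W, 3) array.
    """
    K, H, W = images.shape
    I = images.reshape(K, -1)        # K x (H*W): one column per pixel
    A = lights.T @ lights            # 3 x 3, shared by all pixels
    B = lights.T @ I                 # 3 x (H*W) right-hand sides
    n = np.linalg.solve(A, B)        # solves all columns in one call
    return n.T.reshape(H, W, 3)
```

Because \(L^TL\) is the same for all pixels, only the right-hand side changes, so one call handles the whole image. The result can then be factored into \(\rho\) and \(\hat{n}\) by taking the per-pixel norm.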

Some Practical Issues:

- Even though we assume light at infinity, works well in practice for just far away lights.
- Calibrate light vector \(\ell\) by looking at an image of some known shape and albedo (typically a matte sphere)

- Technically only works for Lambertian objects. But often, can make objects Lambertian with polarizers.

- Also, estimated normals are typically well-defined for a 'valid' set of pixels in the image.
- You'll create / be given a "mask" of these valid pixels.

\[Z = \arg \min_Z \|g_x - f_x*Z\|^2 + \|g_y - f_y*Z\|^2\]

\[Z = \arg \min_Z \|g_x - f_x*Z\|^2 + \|g_y - f_y*Z\|^2 + \lambda R(Z)\]

We'll use \(R(Z)\) as:

\[R(Z) = \sum_n (Z*f_r)[n]^2 ~~~\text{for}~~~ f_r = \begin{array}{|c|c|c|}\hline-1/9&-1/9&-1/9\\\hline-1/9&8/9&-1/9\\\hline-1/9&-1/9&-1/9\\\hline\end{array}\]
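One way to read this regularizer, written out as a short worked equation: \(f_r\) is a unit impulse \(\delta\) minus a \(3\times 3\) box average, so convolving with it subtracts the local mean from each pixel. The neighborhood symbol \(\mathcal{N}_{3\times 3}(n)\) is notation introduced here for the \(3\times 3\) window centered at \(n\) (including \(n\) itself).

\[f_r = \delta - \tfrac{1}{9}\,\mathbf{1}_{3\times 3} ~~~\Rightarrow~~~ (Z*f_r)[n] = Z[n] - \frac{1}{9}\sum_{m \in \mathcal{N}_{3\times 3}(n)} Z[m]\]

So \(R(Z)\) penalizes how far each depth value is from the average of its neighborhood. Since the entries of \(f_r\) sum to zero, a constant \(Z\) gives \(R(Z) = 0\) (away from image boundaries).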

Version 1: Do it in the Fourier Domain (called Frankot-Chellappa)

- Assume that in the masked out regions, \(g_x = g_y = 0\).

\[\mathcal{F}(Z)[u,v] = \frac{\bar{F}_x[u,v]{G}_x[u,v] + \bar{F}_y[u,v]{G}_y[u,v]}{|F_x[u,v]|^2 + |F_y[u,v]|^2 + \lambda |F_r[u,v]|^2}\]

\(\mathcal{F}(Z)\) is FT of depth map, \(G_x\) is FT of \(g_x\), \(F_x\) is (circular padded) FT of \(f_x\), and so on.

In general, should add some very small number (e.g., \(10^{-12}\)) to denominator for stability.

In particular, what is the denominator for \([u,v] = [0,0]\) ?

- Both numerator and denominator are 0, because normals tell us nothing about average depth / offset.
- Explicitly set \(\mathcal{F}(Z)[0,0]\) to 0.
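A minimal NumPy sketch of the Fourier-domain solve above. The derivative kernels `F_X` and `F_Y` are hypothetical centered differences chosen for this example (the slides do not fix \(f_x\), \(f_y\) here), and the helper names `kernel_ft` and `frankot_chellappa` are also assumptions.

```python
import numpy as np

# Hypothetical derivative kernels (centered differences); substitute
# whichever f_x, f_y were used to define g_x, g_y.
F_X = np.array([[0.5, 0.0, -0.5]])
F_Y = F_X.T
# Regularizer kernel f_r from the slides.
F_R = -np.ones((3, 3)) / 9.0
F_R[1, 1] = 8.0 / 9.0

def kernel_ft(f, shape):
    """FT of kernel f, circularly padded to `shape` with its center at [0, 0]."""
    pad = np.zeros(shape)
    kh, kw = f.shape
    pad[:kh, :kw] = f
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(pad)

def frankot_chellappa(gx, gy, lam=1e-3, eps=1e-12):
    """Recover depth Z (up to an offset) from gradients gx, gy.

    gx, gy are assumed to already be 0 in masked-out regions.
    """
    Fx, Fy, Fr = (kernel_ft(f, gx.shape) for f in (F_X, F_Y, F_R))
    Gx, Gy = np.fft.fft2(gx), np.fft.fft2(gy)
    num = np.conj(Fx) * Gx + np.conj(Fy) * Gy
    den = np.abs(Fx) ** 2 + np.abs(Fy) ** 2 + lam * np.abs(Fr) ** 2 + eps
    Zf = num / den
    Zf[0, 0] = 0.0   # average depth is unobservable, so set it to 0
    return np.real(np.fft.ifft2(Zf))
```

The result is a zero-mean depth map. Because everything is done with FFTs, the convolutions are implicitly circular, which is one reason this version cannot weight pixels individually.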

Version 2: Use conjugate gradient.

- Allows us to use different weights for different pixels.

\[Z = \arg \min_Z \sum_n w[n] (g_x[n] - (f_x*Z)[n])^2 + \sum_n w[n] (g_y[n] - (f_y*Z)[n])^2 + \lambda R(Z)\]

- Set \(w[n]\) to 0 for masked out pixels.
- Set \(w[n]\) to \((\hat{n}_z[n])^2\) everywhere else.
  - Accounts for the fact that we got gradients by dividing by \(\hat{n}_z\).
  - Gradients at pixels with smaller \(\hat{n}_z\) will be noisier, so they get less weight.
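The weighting rule above can be sketched in a couple of lines of NumPy. The function name and the `(H, W, 3)` layout of the normal map (with \(\hat{n}_z\) in the last channel) are assumptions made for this example.

```python
import numpy as np

def integration_weights(n_hat, mask):
    """Per-pixel weights: n_z^2 inside the mask, 0 outside.

    n_hat: (H, W, 3) unit normals (assumed layout, z in channel 2).
    mask:  (H, W) boolean array, True for valid pixels.
    """
    nz = n_hat[..., 2]
    return np.where(mask, nz ** 2, 0.0)
```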

\[Z = \arg \min_Z Z^T Q Z - 2 Z^T~b + c\]

- Begin with all zeroes guess \(Z_0\) for \(Z\)
- \(k = 0, r_0 \leftarrow b-Q~Z_0,~~p_0\leftarrow r_0\)
- Repeat (for say a fixed number of iterations)
  - \(\alpha_k \leftarrow \frac{r_k^Tr_k}{p_k^TQp_k}\)
  - \(Z_{k+1} \leftarrow Z_k + \alpha_k p_k\)
  - \(r_{k+1} \leftarrow r_k - \alpha_k Qp_k\)
  - \(\beta_k \leftarrow \frac{r_{k+1}^Tr_{k+1}}{r_k^Tr_k}\)
  - \(p_{k+1} \leftarrow r_{k+1} + \beta_kp_k\)
  - \(k \leftarrow k+1\)
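The iteration above can be sketched in NumPy as follows. `apply_Q` is a hypothetical callback that returns \(Q~p\) for an image-shaped \(p\), so \(Q\) never has to be formed as a matrix. The early exit on a tiny residual is an addition for numerical safety, not part of the steps listed.

```python
import numpy as np

def conjugate_gradient(apply_Q, b, num_iters=100, tol=1e-12):
    """Minimize Z^T Q Z - 2 Z^T b for symmetric positive definite Q.

    apply_Q: function mapping an array shaped like b to Q applied to it.
    b:       right-hand side, same shape as the image.
    """
    Z = np.zeros_like(b, dtype=float)   # all-zeros initial guess Z_0
    r = b - apply_Q(Z)                  # r_0 = b - Q Z_0
    p = r.copy()                        # p_0 = r_0
    rr = np.sum(r * r)                  # r_k^T r_k
    for _ in range(num_iters):
        if rr < tol:                    # added: stop once residual is ~0
            break
        Qp = apply_Q(p)
        alpha = rr / np.sum(p * Qp)     # alpha_k
        Z = Z + alpha * p               # Z_{k+1}
        r = r - alpha * Qp              # r_{k+1}
        rr_new = np.sum(r * r)
        beta = rr_new / rr              # beta_k
        p = r + beta * p                # p_{k+1}
        rr = rr_new
    return Z
```

All inner products are computed as element-wise products summed over pixels, so `Z`, `r`, and `p` can stay image-shaped throughout.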

\[Z = \arg \min_Z \sum_n w[n] (g_x[n] - (f_x*Z)[n])^2 + \sum_n w[n] (g_y[n] - (f_y*Z)[n])^2 + \lambda \sum_n ((f_r*Z)[n])^2\]

We need to figure out:

- What is \(b\) (should be same shape as image)
- How do we compute \(Q~p\) for a given \(p\) (where \(p\) is same shape as image)

Let \(W\) be a diagonal matrix with values of \(w[n]\). \(F_x\), \(F_y\) and \(F_r\) denote convolutions by \(f_x\), \(f_y\), and \(f_r\).

\[Z = \arg \min_Z (G_x - F_xZ)^TW(G_x - F_xZ) + (G_y - F_yZ)^TW(G_y - F_yZ) + \lambda (F_rZ)^T(F_rZ)\]

What are \(Q\) and \(b\) ?

\[Q = F_x^TWF_x + F_y^TWF_y + \lambda F_r^TF_r\] \[b = F_x^TWG_x + F_y^TWG_y\]

\[Q~p = ((p * f_x) \times w) * \bar{f}_x + ((p * f_y) \times w) * \bar{f}_y + \lambda ((p*f_r)*\bar{f}_r)\]

\(\bar{f}\) means the flipped version of \(f\). \(\times\) means element-wise product.

(\(*\) can be "same" convolutions with zero padding)

\[b = (g_x \times w) * \bar{f}_x + (g_y \times w) * \bar{f}_y\]

- Remember, \(p^T Qp\) is just \(\langle p, Qp\rangle\). So compute \(Qp\), take element-wise product with \(p\), and sum across all pixels.
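A NumPy-only sketch of \(Q~p\) and \(b\) using "same" convolutions with zero padding. The derivative kernels `F_X`, `F_Y` are hypothetical centered differences (use whichever \(f_x\), \(f_y\) define your gradients), and the helper names `conv_same`, `apply_Q`, and `compute_b` are assumptions for this example.

```python
import numpy as np

# Hypothetical derivative kernels; F_R is the f_r kernel from the slides.
F_X = np.array([[0.5, 0.0, -0.5]])
F_Y = F_X.T
F_R = -np.ones((3, 3)) / 9.0
F_R[1, 1] = 8.0 / 9.0

def conv_same(img, f):
    """'Same'-size 2D convolution with zero padding (odd-sized kernel f)."""
    kh, kw = f.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    H, W = img.shape
    out = np.zeros((H, W))
    for i in range(kh):
        for j in range(kw):
            # True convolution: the kernel index runs opposite to the shift.
            out += f[kh - 1 - i, kw - 1 - j] * padded[i:i + H, j:j + W]
    return out

def flip(f):
    """Flipped kernel f-bar (reverse both axes)."""
    return f[::-1, ::-1]

def apply_Q(p, w, lam):
    """Q p = ((p*f_x) x w)*fbar_x + ((p*f_y) x w)*fbar_y + lam (p*f_r)*fbar_r."""
    return (conv_same(conv_same(p, F_X) * w, flip(F_X))
            + conv_same(conv_same(p, F_Y) * w, flip(F_Y))
            + lam * conv_same(conv_same(p, F_R), flip(F_R)))

def compute_b(gx, gy, w):
    """b = (g_x x w)*fbar_x + (g_y x w)*fbar_y."""
    return conv_same(gx * w, flip(F_X)) + conv_same(gy * w, flip(F_Y))

def quad_form(p, w, lam):
    """p^T Q p as an element-wise product with Q p, summed over pixels."""
    return np.sum(p * apply_Q(p, w, lam))
```

A quick sanity check on any implementation of this kind: \(\langle p, Qq\rangle\) should equal \(\langle q, Qp\rangle\) for random \(p\), \(q\), since \(Q\) is symmetric, and \(p^TQp\) should never be negative.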