CSE 559A: Computer Vision

Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).
Course Staff: Zhihao Xia, Charlie Wu, Han Liu

Sep 25, 2018

# General

• PSET 1 Due 11:59pm today.
• git commit AND git push (and git pull after that to check)
• Need to push only once you're done
• We've received lots of random files: .DS_Store, synctex.gz, ...
• Late days applied automatically at the end of the semester.

• PSET 2 out and ready to clone.

• Additional Reference: Forsyth & Ponce: Chapters 4 & 5
• Less Detailed / Quick: Szeliski Sec 2.2

# Photometric Stereo

For each point, let the set of intensities $$\{I_i\}$$ be observed under lights $$\{\ell_i\}$$.

Ignore color, assume $$I_i$$ is scalar (convert the images to grayscale / R+G+B).

$I_i = \rho~\langle \hat{n}, \ell_i\rangle = \rho~\ell_i^T~\hat{n} = \ell_i^T~(\rho~\hat{n}) = \ell_i^T~n$

Three observations of $$I_i$$ with different, linearly independent, $$\ell_i$$ will give us $$n$$.
Three linear equations in three variables.

Given $$n$$, we can factor it into $$\rho$$ and $$\hat{n}$$: $$\rho = \|n\|$$ is the length of $$n$$, and $$\hat{n} = n / \|n\|$$.

# Photometric Stereo

But using only three images is unstable in the presence of noise. Instead, we solve in the "least squares" sense.

Given $$K$$ images under $$K$$ different lights, for each pixel:

$\left[\begin{array}{c}\ell_1^T\\\ell_2^T\\\ell_3^T\\\vdots\\\ell_K^T\end{array}\right] n = \left[ \begin{array}{c}I_1\\I_2\\I_3\\ \vdots \\ I_K \end{array}\right] \Rightarrow L~n = I$

where $$L$$ is a $$K\times 3$$ matrix, $$I$$ is a $$K\times 1$$ vector, and $$n$$ is a three-vector.

# Photometric Stereo

$n = \arg \min_n \|L~n - I\|^2 = \arg \min_n n^T~(L^TL)~n - 2(L^T~I)^T~n + I^TI$

Take gradient, set to 0: $$~~~~~(L^TL)~n = (L^T~I)$$

This is actually a $$3\times 3$$ system. Solve using np.linalg.solve (and since $$L^TL$$ is symmetric positive definite, a Cholesky-based solver also works).
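A minimal vectorized sketch of this per-pixel solve in NumPy (the function name and array layout are my own assumptions, not the assignment's API):

```python
import numpy as np

def photometric_stereo(L, I):
    """Per-pixel least-squares normals.

    L: (K, 3) light directions; I: (K, H, W) grayscale intensities.
    Returns albedo rho (H, W) and unit normals nhat (H, W, 3).
    """
    K, H, W = I.shape
    A = L.T @ L                           # (3, 3): the L^T L matrix
    b = np.tensordot(L, I, axes=(0, 0))   # (3, H, W): L^T I at every pixel
    # Solve the 3x3 normal equations for all pixels at once
    n = np.linalg.solve(A, b.reshape(3, -1)).reshape(3, H, W)
    rho = np.linalg.norm(n, axis=0)       # albedo = ||n||
    nhat = n / np.maximum(rho, 1e-12)     # guard against rho = 0 (masked pixels)
    return rho, np.moveaxis(nhat, 0, -1)
```

Because $$L^TL$$ is shared by all pixels, one `np.linalg.solve` call handles the whole image.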

# Photometric Stereo

Some Practical Issues:

• Even though we assume lights at infinity, this works well in practice as long as the lights are far away.
• Calibrate light vector $$\ell$$ by looking at an image of some known shape and albedo (typically a matte sphere)
• Technically only works for Lambertian objects. But often, can make objects Lambertian with polarizers.
• Also, the estimated normals are typically well-defined only for a 'valid' subset of pixels in the image.
• You'll create / be given a "mask" of these valid pixels.

# Normals to Depth

Recover a depth map $$Z$$ whose derivatives (convolutions with derivative filters $$f_x, f_y$$) match the gradients $$g_x, g_y$$ implied by the normals:

$Z = \arg \min_Z \|g_x - f_x*Z\|^2 + \|g_y - f_y*Z\|^2$

# Normals to Depth

$Z = \arg \min_Z \|g_x - f_x*Z\|^2 + \|g_y - f_y*Z\|^2 + \lambda R(Z)$

We'll use $$R(Z)$$ as:

$R(Z) = \sum_n (Z*f_r)[n]^2 ~~~\text{for}~~~ f_r = \begin{array}{|c|c|c|}\hline-1/9&-1/9&-1/9\\\hline-1/9&8/9&-1/9\\\hline-1/9&-1/9&-1/9\\\hline\end{array}$

# Normals to Depth

$Z = \arg \min_Z \|g_x - f_x*Z\|^2 + \|g_y - f_y*Z\|^2 + \lambda R(Z)$

Version 1: Do it in the Fourier Domain (called Frankot-Chellappa)

• Assume that in the masked out regions, $$g_x = g_y = 0$$.

$\mathcal{F}(Z)[u,v] = \frac{\bar{F}_x[u,v]{G}_x[u,v] + \bar{F}_y[u,v]{G}_y[u,v]}{|F_x[u,v]|^2 + |F_y[u,v]|^2 + \lambda |F_r[u,v]|^2}$

$$\mathcal{F}(Z)$$ is FT of depth map, $$G_x$$ is FT of $$g_x$$, $$F_x$$ is (circular padded) FT of $$f_x$$, and so on.

• In general, should add some very small number (e.g., $$10^{-12}$$) to denominator for stability.

• In particular, what is the denominator for $$[u,v] = [0,0]$$ ?

• Both numerator and denominator are 0, because normals tell us nothing about average depth / offset.
• Explicitly set $$\mathcal{F}(Z)[0,0]$$ to 0.
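A sketch of this Fourier-domain solve in NumPy (helper names and the default $$\lambda$$ are my own assumptions; $$f_x, f_y, f_r$$ are small spatial filters):

```python
import numpy as np

def psf2otf(f, shape):
    # Embed the small filter in a full-size array with its center at (0, 0)
    # via circular padding, then take the FFT.
    pad = np.zeros(shape)
    fh, fw = f.shape
    pad[:fh, :fw] = f
    pad = np.roll(pad, (-(fh // 2), -(fw // 2)), axis=(0, 1))
    return np.fft.fft2(pad)

def frankot_chellappa(gx, gy, fx, fy, fr, lam=1e-6):
    """Depth from gradients via the slide's Fourier-domain formula.
    gx, gy: target gradients (zero in masked regions)."""
    H, W = gx.shape
    Fx, Fy, Fr = (psf2otf(f, (H, W)) for f in (fx, fy, fr))
    Gx, Gy = np.fft.fft2(gx), np.fft.fft2(gy)
    num = np.conj(Fx) * Gx + np.conj(Fy) * Gy
    den = np.abs(Fx) ** 2 + np.abs(Fy) ** 2 + lam * np.abs(Fr) ** 2 + 1e-12
    FZ = num / den
    FZ[0, 0] = 0.0   # normals say nothing about the mean depth; pin it to 0
    return np.real(np.fft.ifft2(FZ))
```

Note the recovered $$Z$$ only matches the true depth up to its (unknown) mean.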

# Normals to Depth

$Z = \arg \min_Z \|g_x - f_x*Z\|^2 + \|g_y - f_y*Z\|^2 + \lambda R(Z)$

• Version 2: Solve in the spatial domain, which allows us to use different weights for different pixels.

# Normals to Depth

$Z = \arg \min_Z \sum_n w[n] (g_x[n] - (f_x*Z)[n])^2 + \sum_n w[n] (g_y[n] - (f_y*Z)[n])^2 + \lambda R(Z)$

• Allows us to use different weights for different pixels.
• Set $$w[n]$$ to 0 for masked out pixels.
• Set $$w[n]$$ to $$(\hat{n}_z[n])^2$$ everywhere else.
• Accounts for the fact that we got gradients by dividing by $$\hat{n}_z$$.
• Smaller values will be noisier.

# Normals to Depth

$Z = \arg \min_Z \sum_n w[n] (g_x[n] - (f_x*Z)[n])^2 + \sum_n w[n] (g_y[n] - (f_y*Z)[n])^2 + \lambda R(Z)$

$Z = \arg \min_Z Z^T Q Z - 2 Z^T~b + c$

• Begin with an all-zeros guess $$Z_0$$ for $$Z$$
• $$k = 0, r_0 \leftarrow b-Q~Z_0,~~p_0\leftarrow r_0$$
• Repeat (for, say, a fixed number of iterations)
  • $$\alpha_k \leftarrow \frac{r_k^Tr_k}{p_k^TQp_k}$$
  • $$Z_{k+1} \leftarrow Z_k + \alpha_k p_k$$
  • $$r_{k+1} \leftarrow r_k - \alpha_k Qp_k$$
  • $$\beta_k \leftarrow \frac{r_{k+1}^Tr_{k+1}}{r_k^Tr_k}$$
  • $$p_{k+1} \leftarrow r_{k+1} + \beta_kp_k$$
  • $$k \leftarrow k+1$$
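The pseudocode above, as a NumPy sketch (names are my own; `apply_Q` is whatever routine computes $$Qp$$, and an early-exit tolerance is added for numerical safety):

```python
import numpy as np

def conjugate_gradient(apply_Q, b, n_iters=100, tol=1e-12):
    """Conjugate gradients for min_Z  Z^T Q Z - 2 Z^T b.
    apply_Q computes Q @ p; Z, b, p all share the image's shape."""
    Z = np.zeros_like(b)       # Z_0: all-zeros initial guess
    r = b - apply_Q(Z)         # r_0 = b - Q Z_0
    p = r.copy()               # p_0 = r_0
    rs = np.sum(r * r)         # r_k^T r_k
    for _ in range(n_iters):
        if rs < tol:           # residual is tiny; stop early
            break
        Qp = apply_Q(p)
        alpha = rs / np.sum(p * Qp)
        Z = Z + alpha * p
        r = r - alpha * Qp
        rs_new = np.sum(r * r)
        p = r + (rs_new / rs) * p   # beta_k = rs_new / rs
        rs = rs_new
    return Z
```

Note that `Z`, `r`, and `p` stay as image-shaped arrays throughout; all inner products are element-wise products summed over pixels.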

# Normals to Depth

$Z = \arg \min_Z \sum_n w[n] (g_x[n] - (f_x*Z)[n])^2 + \sum_n w[n] (g_y[n] - (f_y*Z)[n])^2 + \lambda \sum_n ((f_r*Z)[n])^2$

$Z = \arg \min_Z Z^T Q Z - 2 Z^T~b + c$

We need to figure out:

• What is $$b$$? (It should have the same shape as the image.)
• How do we compute $$Q~p$$ for a given $$p$$ (where $$p$$ has the same shape as the image)?

Let $$W$$ be a diagonal matrix with the values $$w[n]$$ on its diagonal, and let $$F_x$$, $$F_y$$, and $$F_r$$ denote the matrices that perform convolution by $$f_x$$, $$f_y$$, and $$f_r$$.

$Z = \arg \min (G_x - F_xZ)^TW(G_x - F_xZ) + (G_y - F_yZ)^TW(G_y - F_yZ) + \lambda (F_rZ)^T(F_rZ)$

What are $$Q$$ and $$b$$ ?

$Q = F_x^TWF_x + F_y^TWF_y + \lambda F_r^TF_r$ $b = F_x^TWG_x + F_y^TWG_y$

# Normals to Depth

$Z = \arg \min_Z \sum_n w[n] (g_x[n] - (f_x*Z)[n])^2 + \sum_n w[n] (g_y[n] - (f_y*Z)[n])^2 + \lambda \sum_n ((f_r*Z)[n])^2$

$Z = \arg \min_Z Z^T Q Z - 2 Z^T~b + c$

• What is $$b$$? (It should have the same shape as the image.)
• How do we compute $$Q~p$$ for a given $$p$$ (where $$p$$ has the same shape as the image)?

$Q = F_x^TWF_x + F_y^TWF_y + \lambda F_r^TF_r$ $b = F_x^TWG_x + F_y^TWG_y$

$Q~p = ((p * f_x) \times w) * \bar{f}_x + ((p * f_y) \times w) * \bar{f}_y + \lambda ((p*f_r)*\bar{f}_r)$

$$\bar{f}$$ means the flipped version of $$f$$, and $$\times$$ means element-wise product.
($$*$$ can be "same" convolution with zero padding)

$b = (g_x \times w) * \bar{f}_x + (g_y \times w) * \bar{f}_y$

• Remember, $$p^T Qp$$ is just $$\langle p, Qp\rangle$$. So compute $$Qp$$, take element-wise product with $$p$$,
and sum across all pixels.
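Putting the last few slides together, a sketch of $$Qp$$ and $$b$$ via "same" zero-padded convolutions (using scipy.signal.convolve2d; function and variable names are my own assumptions):

```python
import numpy as np
from scipy.signal import convolve2d

def conv(x, f):
    # "same" convolution with zero padding, as on the slide
    return convolve2d(x, f, mode='same')

def make_Qp_and_b(gx, gy, w, fx, fy, fr, lam):
    """Build apply_Q and b from the slide's formulas.
    w: per-pixel weights (0 on masked pixels, nhat_z^2 elsewhere)."""
    # fbar = flipped filter: convolving by it applies the adjoint F^T
    fxb, fyb, frb = (f[::-1, ::-1] for f in (fx, fy, fr))
    def apply_Q(p):
        return (conv(conv(p, fx) * w, fxb)
                + conv(conv(p, fy) * w, fyb)
                + lam * conv(conv(p, fr), frb))
    b = conv(gx * w, fxb) + conv(gy * w, fyb)
    return apply_Q, b
```

The returned `apply_Q` and `b` plug directly into the conjugate-gradient iteration, and since every operation is a filtering or element-wise product, $$Q$$ is never formed explicitly.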