CSE 559A: Computer Vision


Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).

Course Staff: Zhihao Xia, Charlie Wu, Han Liu

Oct 11, 2018

- Still missing a few problem set 2 submissions.
- Make sure you have `git push`-ed: do a `git pull; git log` and check that the latest log message confirms your submission.
- Problem set 3 is due two weeks from today.

No Class Tuesday (Fall Break)

No office hours tomorrow or Monday. Recitation next Friday.

**Last Time**

- We define a cost volume \(C\) of size \(W\times H\times D\)
- \(C[x,y,d]\) measures dissimilarity between \((x,y)\) in the left image and \((x-d,y)\) in the right image
- Simplest approach (winner-take-all): \(d[x,y] = \arg\min_d C[x,y,d]\) (see the sketch below)
- Too noisy
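As a reference point, a minimal NumPy sketch of this winner-take-all baseline, assuming a precomputed cost volume `C` of shape `(H, W, D)`:

```python
import numpy as np

# Winner-take-all: at every pixel, pick the disparity with the
# lowest matching cost. C is an (H, W, D) cost volume.
def winner_take_all(C):
    return np.argmin(C, axis=2)   # (H, W) disparity map, values in {0, ..., D-1}
```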

We want to express that the disparity (and therefore the depth) of nearby pixels is similar.

Ad-hoc Method: Cost Volume Filtering

- But filtering only encodes a preference for nearby pixel disparities to be exactly equal.

A more explicit approach: minimize a global cost with a smoothness term.

\[d = \arg \min_d \sum_{n} C[n,d[n]] + \lambda \sum_{(n,n') \in \mathbf{E}} S(d[n],d[n'])\]

- \(n=[x,y]^T\) is the pixel location.

- \(C\) is the cost volume as before. It gives us "local evidence".

- \(\mathbf{E}\) is the set of all pairs of pixels that are "neighbors" / adjacent in some way.
- Can include all unordered pairs of the form \([(x,y),(x-1,y)]\) and \([(x,y),(x,y-1)]\) (four-connected)
- Or diagonal neighbors as well.

- \(S\) is a function that indicates a preference for \(d[n]\) and \(d[n']\) to be the same (see the energy sketch after this list).
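To make the objective concrete, a minimal sketch that evaluates this total cost for a given labeling, assuming a 4-connected grid and a generic penalty `S` supplied as a `(D, D)` matrix (names here are illustrative):

```python
import numpy as np

# Total cost of a labeling d: data term plus weighted smoothness term
# over horizontal and vertical neighbor pairs (4-connected grid).
# C: (H, W, D) cost volume; d: (H, W) integer labels; S: (D, D) penalty matrix.
def energy(C, d, S, lam):
    H, W = d.shape
    E = C[np.arange(H)[:, None], np.arange(W)[None, :], d].sum()  # data term
    E += lam * S[d[:, :-1], d[:, 1:]].sum()   # horizontal neighbor pairs
    E += lam * S[d[:-1, :], d[1:, :]].sum()   # vertical neighbor pairs
    return E
```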

Some choices for \(S\), the function indicating a preference for \(d[n]\) and \(d[n']\) to be the same:

- Choice 1:
- \(0\) if \(d[n']=d[n]\), \(1\) otherwise.

- Choice 2: \(|d[n']-d[n]|\)

- Choice 3:
- \(0\) if \(d[n']=d[n]\)
- \(T_1\) if \(|d[n']-d[n]| < \epsilon\)
- \(T_2\) otherwise.
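For reference, the three choices written out as simple Python functions (`T1`, `T2`, and `eps` are the parameters from Choice 3):

```python
def S_choice1(dn, dm):
    # Choice 1: 0 if equal, 1 otherwise (the Potts penalty)
    return 0.0 if dn == dm else 1.0

def S_choice2(dn, dm):
    # Choice 2: absolute difference of disparities
    return abs(dn - dm)

def S_choice3(dn, dm, T1, T2, eps):
    # Choice 3: truncated penalty; 0 if equal, T1 for small differences, T2 otherwise
    if dn == dm:
        return 0.0
    return T1 if abs(dn - dm) < eps else T2
```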

How do we solve this?

Note that this is a discrete minimization problem. Each \(d[n] \in \{0,1,\ldots,D-1\}\).


**One approach: Iterated Conditional Modes**

- Begin with \(d_0[n] = \arg\min_d C[n,d]\)

- At each iteration \(t\), compute \(d_{t+1}\) from \(d_t\) by solving for every pixel of \(d_{t+1}\) simultaneously, assuming its neighbors keep their values from \(d_t\):

\[d_{t+1}[n] = \arg \min_{d_n} C[n,d_n] + \lambda \sum_{(n,n') \in \mathbf{E_n}} S(d_n,d_{t}[n'])\]

- So for each pixel (\(\mathbf{E}_n\) being the set of edges incident to \(n\)):
- Take its matching cost.
- Add the smoothness cost from its neighbors, using their values from the previous iteration.
- Minimize over \(d_n\).

Does it converge?

No guarantee: we are changing all pixel assignments simultaneously, so the cost can oscillate. (A minimal sketch of one such parallel sweep follows below.)
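A minimal sketch of one parallel ICM sweep, assuming the 0/1 (Potts) penalty from Choice 1; `np.roll` wraps at the image border, a simplification for brevity:

```python
import numpy as np

# One parallel ICM sweep with the Potts penalty S(d, d') = [d != d'].
# C: (H, W, D) cost volume; d: (H, W) current labels; lam: smoothness weight.
def icm_sweep(C, d, lam):
    H, W, D = C.shape
    total = C.astype(float).copy()
    # For each candidate label, add lam for every 4-neighbor that disagrees
    # (neighbor labels taken from the previous iterate d).
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        nb = np.roll(d, shift=(dy, dx), axis=(0, 1))   # wraps at border
        total += lam * (nb[:, :, None] != np.arange(D)[None, None, :])
    return np.argmin(total, axis=2)
```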


**Per-pixel Iterated Conditional Modes (slow!)**

- Begin with \(d_0[n] = \arg\min_d C[n,d]\)
- At each iteration \(t\), compute \(d_{t+1}\) from \(d_t\) by solving for **one** pixel \(n_{t+1}\) of \(d_{t+1}\), assuming its neighbors keep their values from \(d_t\):

\[d_{t+1}[n_{t+1}] = \arg \min_{d_n} C[n_{t+1},d_n] + \lambda \sum_{(n_{t+1},n') \in \mathbf{E_{n_{t+1}}}} S(d_n,d_{t}[n'])\]

**Does it converge?**

- Each update never increases the cost, so it converges (but to a local optimum).

\[d = \arg \min_d \sum_{n} C[n,d[n]] + \lambda \sum_{(n,n') \in \mathbf{E}} S(d[n],d[n'])\]

- These kinds of cost functions / optimization problems are quite common in vision.
- The cost can be interpreted as the negative log of a probability distribution:

\[p(d) \propto \prod_{n} \exp\left(-C[n,d[n]]\right) \prod_{(n,n') \in \mathbf{E}} \exp\left(-\lambda S(d[n],d[n'])\right)\]

- Joint distribution over all the \(d[n]\) values.
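Concretely, taking the negative log of this distribution recovers the cost above up to an additive constant (the log of the normalizer \(Z\)):

\[-\log p(d) = \sum_{n} C[n,d[n]] + \lambda \sum_{(n,n') \in \mathbf{E}} S(d[n],d[n']) + \log Z\]

So maximizing \(p(d)\) is exactly minimizing the cost.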


**Graphical Model**: Probability Distribution Represented as a "Graph" \((V,E)\)

\[p(\{v \in V\}) \propto \prod_{v\in V} \Psi_v(v) \prod_{(v_1,v_2)\in E} \Phi_{v_1,v_2}(v_1,v_2)\]

- Unary term for each node, pair-wise term for each edge.

(Directed Graphs represent Bayesian Networks)


Question: Are \(d[n]\) and \(d[n']\) independent?

Reminder: Two variables are independent if we can express their joint distribution as a product of distributions on each variable.

- If \((n,n') \in \mathbf{E}\) -- the pixels are neighbors? No.
- If \((n,n') \notin \mathbf{E}\) -- the pixels are not neighbors? Still NO, unless \(n,n'\) are parts of disconnected components of the graph.
- If \((n,n') \notin \mathbf{E}\), "conditioned" on all the neighbors of \(n\) being observed, i.e. \(p(d[n],d[n'] \mid \{d[n'']: (n,n'') \in \mathbf{E}\})\)? YES.

This is the Markov property, and these kinds of graphical models are called Markov random fields.

Graph structure encodes "conditional independence".

Compute the assignment with the highest probability (MAP inference):

\[d = \arg \max_d p(d) = \arg \min_d \sum_{n} C[n,d[n]] + \lambda \sum_{(n,n') \in \mathbf{E}} S(d[n],d[n'])\]

- Iterated Conditional Modes is really slow.
- There is no guaranteed solution for arbitrary graphs.

- But we could solve the problem exactly if our graph were a chain (or more generally a tree): sweep along the chain accumulating \(\bar{C}[x,d]\), the best cost of any assignment to pixels \(0,\ldots,x\) that ends with \(d[x]=d\), then backtrack. For one scanline:

\[d = \arg \min_d \sum_{x} C[x,d[x]] + \lambda \sum_x S(d[x],d[x+1])\]
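A minimal sketch of this exact chain solver (a Viterbi-style forward pass plus backtracking), assuming per-scanline costs `C` of shape `(W, D)` and a penalty matrix `S` of shape `(D, D)`:

```python
import numpy as np

# Exact minimization on a chain by dynamic programming.
# C: (W, D) costs along one scanline; S: (D, D) pairwise penalty; lam: weight.
def solve_chain(C, S, lam):
    W, D = C.shape
    Cbar = np.empty((W, D))              # Cbar[x, d]: best cost ending with d[x] = d
    back = np.empty((W, D), dtype=int)   # backpointers to the best previous label
    Cbar[0] = C[0]
    for x in range(1, W):
        trans = Cbar[x - 1][:, None] + lam * S   # (d_prev, d_cur) transition costs
        back[x] = np.argmin(trans, axis=0)
        Cbar[x] = C[x] + np.min(trans, axis=0)
    # Backtrack the optimal labeling from the last pixel.
    d = np.empty(W, dtype=int)
    d[-1] = np.argmin(Cbar[-1])
    for x in range(W - 2, -1, -1):
        d[x] = back[x + 1, d[x + 1]]
    return d
```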

- Consider optimizing each epipolar line (scanline) separately.

We get "streaking" artifacts, because we're smoothing each line independently.

- That's why we want to use the full 2D grid.
- But this forward-backward sweep only works exactly on chains (or graphs without cycles).

One flavor of approximate algorithm applies the same idea of forming a cumulative cost \(\bar{C}[x,d]\):

- TRW-S
- Loopy Belief Propagation
- SGM

**Semi-Global Matching**

\[\bar{C}[x,d] = C[x,d] + \min_{d'} \left(\bar{C}[x-1,d'] + \lambda S(d,d')\right)\]

This is going left to right in the horizontal direction.

Idea: Compute different \(\bar{C}\) along different directions ... and average.

**Semi-Global Matching**

\[\bar{C}_{lr}[n,d] = C[n,d] + \min_{d'} \left(\bar{C}_{lr}[n-[1,0]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{rl}[n,d] = C[n,d] + \min_{d'} \left(\bar{C}_{rl}[n+[1,0]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{du}[n,d] = C[n,d] + \min_{d'} \left(\bar{C}_{du}[n-[0,1]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{ud}[n,d] = C[n,d] + \min_{d'} \left(\bar{C}_{ud}[n+[0,1]^T,d'] + \lambda S(d,d')\right)\]

\[d[n] = \arg \min_d \bar{C}_{lr}[n,d] + \bar{C}_{rl}[n,d]+ \bar{C}_{ud}[n,d]+\bar{C}_{du}[n,d]\]

**Semi-Global Matching**

\[\bar{C}[x,d] = C[x,d] + \min_{d'} \left(\bar{C}[x-1,d'] + \lambda S(d,d')\right)\]

- Consider the case where \(S(d,d')\) is:
- 0 if \(d=d'\)
- \(P_1\) if \(|d-d'| = 1\)
- \(P_2\) otherwise.

- Can we do this efficiently?
- We need to go through each line sequentially.
- But we can go through all lines in parallel.
- What about \(d\)? Do we need to do the minimization for every \(d\) independently?

\[\bar{C}[x,d] = C[x,d] + \min_{d'} \left(\bar{C}[x-1,d'] + \lambda S(d,d')\right)\]

- Note: It doesn't matter if we add / subtract a per-location constant to the costs for all \(d\)'s, i.e. replace:
- \(C[x,d]\) with \(C[x,d] + C_0[x]\)
- \(\bar{C}[x,d]\) with \(\bar{C}[x,d] + C_0[x]\)

Why not?

- Because the minimization will always be over \(d\) at a fixed location. You are never comparing \(C[x_1,d_1]\) with \(C[x_2,d_2]\).

\[\bar{C}[x,d] = C[x,d] + \min_{d'} \left(\bar{C}[x-1,d'] + S(d,d')\right)\]

\[S(d,d') = \left\{\begin{array}{l}0~\text{if}~d=d'\\P_1~\text{if}~|d-d'|=1\\P_2~\text{otherwise}\end{array} \right.\]
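For use with matrix-based code (like the chain solver sketched earlier), this penalty can be tabulated as a \(D\times D\) matrix; a small illustrative helper:

```python
import numpy as np

# Tabulate S(d, d') with the P1/P2 penalty above as a (D, D) matrix.
def make_S(D, P1, P2):
    dd = np.abs(np.arange(D)[:, None] - np.arange(D)[None, :])
    return np.where(dd == 0, 0.0, np.where(dd == 1, P1, P2))
```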

\[\bar{C}[x,d] = C[x,d] + \color{red}{\min_{d'} \left(\tilde{C}[x-1,d'] + S(d,d')\right)}\]

\[S(d,d') = \left\{\begin{array}{l}0~\text{if}~d=d'\\P_1~\text{if}~|d-d'|=1\\P_2~\text{otherwise}\end{array} \right.\]

- Step 1 (Simplify): Replace \(\bar{C}[x-1,d']\) with \(\tilde{C}[x-1,d'] = \bar{C}[x-1,d'] - \min_{d''} \bar{C}[x-1,d'']\), so that \(\min_{d'} \tilde{C}[x-1,d'] = 0\).

What happens then? What is the MAXIMUM value of \(\min_{d'} \tilde{C}[x-1,d'] + S(d,d')\) for any \(d\)?

The MAXIMUM value is \(P_2\): choosing \(d' = \arg\min_{d''} \tilde{C}[x-1,d'']\) contributes \(0 + S(d,d') \le P_2\).

- Step 2: This means that for every value of \(d\), we only need to consider four candidate values.

- \(\min_{d'} \tilde{C}[x-1,d'] + S(d,d')\) is the min of
- \(P_2\) (for \(d' = \arg \min \tilde{C}[x-1,d']\))
- \(\tilde{C}[x-1,d-1]+P_1\) (for \(d' = d-1\))
- \(\tilde{C}[x-1,d+1]+P_1\) (for \(d' = d+1\))
- \(\tilde{C}[x-1,d]\) (for \(d' = d\))

We can do this in parallel with matrix operations for all \(d\) and all lines (see the sketch below).
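A minimal vectorized sketch of one left-to-right pass using exactly this four-candidate recursion; the function name and the flipping/transposing trick for the other directions are illustrative, not from the original paper:

```python
import numpy as np

# One left-to-right SGM pass. C: (H, W, D) cost volume; P1, P2: penalties.
def sgm_pass_lr(C, P1, P2):
    H, W, D = C.shape
    Cbar = np.empty_like(C, dtype=float)
    Cbar[:, 0] = C[:, 0]
    for x in range(1, W):
        prev = Cbar[:, x - 1]                            # (H, D)
        Ctil = prev - prev.min(axis=1, keepdims=True)    # Step 1: subtract the min
        # Step 2: only four candidates per disparity.
        cand = np.stack([
            Ctil,                                                                 # d' = d
            np.pad(Ctil[:, 1:],  ((0, 0), (0, 1)), constant_values=np.inf) + P1,  # d' = d+1
            np.pad(Ctil[:, :-1], ((0, 0), (1, 0)), constant_values=np.inf) + P1,  # d' = d-1
            np.full((H, D), float(P2)),                                           # d' = argmin
        ])
        Cbar[:, x] = C[:, x] + cand.min(axis=0)
    return Cbar
```

The other three directions can then be obtained by flipping / transposing the volume, and summed as in the aggregation equation above:

```python
Cbar  = sgm_pass_lr(C, P1, P2)                                        # left-to-right
Cbar += sgm_pass_lr(C[:, ::-1], P1, P2)[:, ::-1]                      # right-to-left
Cbar += sgm_pass_lr(C.transpose(1, 0, 2), P1, P2).transpose(1, 0, 2)  # top-down
Cbar += sgm_pass_lr(C[::-1].transpose(1, 0, 2), P1, P2).transpose(1, 0, 2)[::-1]  # bottom-up
d = np.argmin(Cbar, axis=2)
```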

The full algorithm is in the paper:

**Hirschmueller, Stereo Processing by Semi-Global Matching and Mutual Information, PAMI 2008.**

SGM Algorithm Averages along four directions:

\[\bar{C}_{lr}[n,d] = C[n,d] + \min_{d'} \left(\bar{C}_{lr}[n-[1,0]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{rl}[n,d] = C[n,d] + \min_{d'} \left(\bar{C}_{rl}[n+[1,0]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{du}[n,d] = C[n,d] + \min_{d'} \left(\bar{C}_{du}[n-[0,1]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{ud}[n,d] = C[n,d] + \min_{d'} \left(\bar{C}_{ud}[n+[0,1]^T,d'] + \lambda S(d,d')\right)\]

\[d[n] = \arg \min_d \bar{C}_{lr}[n,d] + \bar{C}_{rl}[n,d]+ \bar{C}_{ud}[n,d]+\bar{C}_{du}[n,d]\]

But \(\bar{C}_{lr}\) is still only smoothing the original cost \(C\); it ignores what the other directions have computed.

SGM Algorithm Averages along four directions:

\[\bar{C}_{lr}[n,d] = (C[n,d]+\bar{C}_{rl}[n,d]+\bar{C}_{ud}[n,d] + \bar{C}_{du}[n,d]) + \min_{d'} \left(\bar{C}_{lr}[n-[1,0]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{rl}[n,d] = C[n,d] + \min_{d'} \left(\bar{C}_{rl}[n+[1,0]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{du}[n,d] = C[n,d] + \min_{d'} \left(\bar{C}_{du}[n-[0,1]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{ud}[n,d] = C[n,d] + \min_{d'} \left(\bar{C}_{ud}[n+[0,1]^T,d'] + \lambda S(d,d')\right)\]

Wouldn't this be better?

But then ...

SGM Algorithm Averages along four directions:

\[\bar{C}_{lr}[n,d] = (C[n,d]+\bar{C}_{rl}[n,d]+\bar{C}_{ud}[n,d] + \bar{C}_{du}[n,d]) + \min_{d'} \left(\bar{C}_{lr}[n-[1,0]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{rl}[n,d] = (C[n,d]+\bar{C}_{lr}[n,d]+\bar{C}_{ud}[n,d] + \bar{C}_{du}[n,d]) + \min_{d'} \left(\bar{C}_{rl}[n+[1,0]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{du}[n,d] = C[n,d] + \min_{d'} \left(\bar{C}_{du}[n-[0,1]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{ud}[n,d] = C[n,d] + \min_{d'} \left(\bar{C}_{ud}[n+[0,1]^T,d'] + \lambda S(d,d')\right)\]

Wouldn't this be better?

Why not this ...

SGM Algorithm Averages along four directions:

\[\bar{C}_{lr}[n,d] = (C[n,d]+\bar{C}_{rl}[n,d]+\bar{C}_{ud}[n,d] + \bar{C}_{du}[n,d]) + \min_{d'} \left(\bar{C}_{lr}[n-[1,0]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{rl}[n,d] = (C[n,d]+\bar{C}_{lr}[n,d]+\bar{C}_{ud}[n,d] + \bar{C}_{du}[n,d]) + \min_{d'} \left(\bar{C}_{rl}[n+[1,0]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{du}[n,d] = (C[n,d]+\bar{C}_{lr}[n,d]+\bar{C}_{rl}[n,d] + \bar{C}_{ud}[n,d]) + \min_{d'} \left(\bar{C}_{du}[n-[0,1]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}_{ud}[n,d] = (C[n,d]+\bar{C}_{lr}[n,d]+\bar{C}_{rl}[n,d] + \bar{C}_{du}[n,d]) + \min_{d'} \left(\bar{C}_{ud}[n+[0,1]^T,d'] + \lambda S(d,d')\right)\]

Wouldn't this be better? Why not this?

Because this is a circular definition: each directional cost now depends on the others at the same pixel.

**Loopy Belief Propagation (one version)**

\[\bar{C}^{t+1}_{lr}[n,d] = (C[n,d]+\bar{C}^t_{rl}[n,d]+\bar{C}^t_{ud}[n,d] + \bar{C}^t_{du}[n,d]) + \min_{d'} \left(\bar{C}^{t+1}_{lr}[n-[1,0]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}^{t+1}_{rl}[n,d] = (C[n,d]+\bar{C}^t_{lr}[n,d]+\bar{C}^t_{ud}[n,d] + \bar{C}^t_{du}[n,d]) + \min_{d'} \left(\bar{C}^{t+1}_{rl}[n+[1,0]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}^{t+1}_{du}[n,d] = (C[n,d]+\bar{C}^t_{lr}[n,d]+\bar{C}^t_{rl}[n,d] + \bar{C}^t_{ud}[n,d]) + \min_{d'} \left(\bar{C}^{t+1}_{du}[n-[0,1]^T,d'] + \lambda S(d,d')\right)\] \[\bar{C}^{t+1}_{ud}[n,d] = (C[n,d]+\bar{C}^t_{lr}[n,d]+\bar{C}^t_{rl}[n,d] + \bar{C}^t_{du}[n,d]) + \min_{d'} \left(\bar{C}^{t+1}_{ud}[n+[0,1]^T,d'] + \lambda S(d,d')\right)\]

**Do this iteratively**

More generally, at time step \(t\), pass a message from node \(n\) to \(n'\) based on all the messages \(n\) has received at that time, except for the message from \(n'\). (A sketch of one such min-sum iteration on the grid follows.)
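A minimal sketch of one min-sum message update on the 4-connected grid, under illustrative conventions (messages stored per incoming direction; `np.roll` wraps at the border for brevity; `S` is a `(D, D)` penalty matrix):

```python
import numpy as np

# One min-sum loopy-BP iteration. msgs[k][y, x] is the (D,) message pixel
# (y, x) has RECEIVED from its neighbor in direction k ('l' = from the left).
# C: (H, W, D) cost volume; S: (D, D) penalty; lam: smoothness weight.
def bp_iteration(C, msgs, S, lam):
    dirs = {'l': (0, -1), 'r': (0, 1), 'u': (-1, 0), 'd': (1, 0)}
    opp = {'l': 'r', 'r': 'l', 'u': 'd', 'd': 'u'}
    new = {}
    for k, (dy, dx) in dirs.items():
        # Sender's belief, excluding the message it received from the receiver.
        b = C + sum(msgs[j] for j in dirs if j != opp[k])
        # Minimize over the sender's label d' for every receiver label d.
        m = np.min(b[:, :, :, None] + lam * S[None, None], axis=2)
        # Shift so that entry (y, x) holds the message from its k-neighbor.
        new[k] = np.roll(m, shift=(-dy, -dx), axis=(0, 1))
        new[k] -= new[k].min(axis=2, keepdims=True)   # normalize for stability
    return new
```

Initialize all four message volumes to zeros, iterate `bp_iteration`, and read out \(d[n]\) as the argmin over \(d\) of \(C\) plus all incoming messages.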

Read more:

- Yedidia, Freeman, Weiss, "Understanding belief propagation and its generalizations," IJCAI 2001 (Distinguished Paper)

- Tappen & Freeman, "Comparison of graph cuts with belief propagation for stereo, using identical MRF parameters", ICCV 2003.

- Other methods for discrete minimization exist, based on "Graph Cuts".

- SGM / Loopy BP: generalize the fact that there is an exact solution for a chain.
- Graph Cuts (with expansion / swap moves): generalize the fact that there is an exact solution when \(d\) takes only two values.

- D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," IJCV 2002.