CSE 559A: Computer Vision


Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).
Course Staff: Zhihao Xia, Charlie Wu, Han Liu

Oct 23, 2018

# General

• Project Proposals Deadline was Sunday
• We'll be providing you with feedback over the next week.
• Problem Set 3 Due Thursday
• Friday Office Hours will be (always) in Lopata 103

# Grouping & Segmentation

But what is the basis of this grouping?

• Physical
• Lie on the same surface / plane
• Made of the same material
• Moving together rigidly

# Grouping & Segmentation

But what is the basis of this grouping?

• Semantic
• Same object
• Foreground / background
• Interesting / non-interesting

Semantic segmentation: often humans will disagree on what goes where.

# Grouping & Segmentation

Simplest Version: Superpixel Segmentation

• Partition Image into a large number of segments called superpixels.
• Many segments, each segment relatively small.
• Oversegmentation of the image
• Each object / plane / surface might be broken into multiple segments
• But (hope) each segment does not cross a boundary.
• Can be based on appearance alone

• Simplifies further processing (dealing with $$K$$ segments instead of $$N$$ pixels)

# Grouping & Segmentation

SLIC Superpixels

Achanta et al., 2010. Simple Linear Iterative Clustering.

Formally, given an image $$I[n]$$ with $$N$$ pixels, you want to group the pixels into $$K \ll N$$ superpixels.

You want to determine a label

$L[n] \in \{1,2,\ldots, K\}$

for every pixel $$n$$, based on some metric.

Note that the particular value of $$L[n]$$ doesn't matter. What matters is that similar pixels get the same label. This is clustering!

The final output we care about is $$K$$ sets

$S_k = \{n: L[n] = k\}$
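
As a minimal sketch of going from a label map to the sets $$S_k$$, assuming NumPy and labels numbered from 0 rather than 1 (the helper name `label_sets` is made up for illustration):

```python
import numpy as np

def label_sets(L, K):
    """Return a list of K index arrays, S[k] = {n : L[n] == k}."""
    L = np.asarray(L).ravel()
    return [np.flatnonzero(L == k) for k in range(K)]

L = np.array([0, 0, 1, 2, 1, 0])   # toy labeling of N = 6 pixels
S = label_sets(L, K=3)
```

Renaming the labels (say, swapping 0 and 2) permutes the list `S` but leaves the partition itself unchanged, which is why only the grouping matters.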

# Grouping & Segmentation

SLIC Superpixels

We will want to group pixels that appear similar and are close by into the same super-pixel.

Define an "augmented" image $$I'[n]$$ where each $$I'[n]\in \mathbb{R}^5$$

• First 3 dimensions are R,G,B
• Last two dimensions are the $$x$$ and $$y$$ co-ordinates.

For grayscale images, $$I'[n] \in \mathbb{R}^3$$.

# Grouping & Segmentation

SLIC Superpixels

Determine labeling $$L[n]$$ to minimize the following cost:

$L = \arg \min_L \min_{\{\mu_k\}}~~~\sum_{k=1}^K~~~\sum_{n:L[n] = k} \|I'[n] - \mu_k\|^2$

Here, each $$\mu_k \in \mathbb{R}^5$$.

• This is K-means clustering.
• Easy to see that $$\mu_k$$ will be the mean of the $$I'$$ vectors of pixels assigned to label $$k$$.
• We're saying that all pixels assigned the label $$k$$ should be close to each other, in the sense of squared distance between their augmented vectors.
• This augmented vector encodes both appearance and location.
• So we want pixels that look the same and are close-by to have the same label.
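
To make the cost concrete, here is a small sketch that evaluates it for a given labeling, plugging in the cluster mean as the inner minimizer over $$\mu_k$$. It assumes NumPy, 0-based labels, and a hypothetical helper name `kmeans_cost`; the toy vectors are 2-D rather than 5-D to keep the arithmetic easy to check by hand.

```python
import numpy as np

def kmeans_cost(Ip, L, K):
    """Sum over clusters of squared distances to the cluster mean."""
    cost = 0.0
    for k in range(K):
        members = Ip[L == k]
        if len(members) == 0:
            continue                      # empty clusters contribute nothing
        mu_k = members.mean(axis=0)       # inner minimizer over mu_k
        cost += ((members - mu_k) ** 2).sum()
    return cost

# Four toy "augmented" vectors: two tight pairs, far apart from each other.
Ip = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good = kmeans_cost(Ip, np.array([0, 0, 1, 1]), K=2)   # groups the tight pairs
bad = kmeans_cost(Ip, np.array([0, 1, 0, 1]), K=2)    # mixes distant points
```

Here `good` evaluates to 4.0 and `bad` to 100.0, so the labeling that groups nearby, similar vectors has the lower cost.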

# Grouping & Segmentation

SLIC Superpixels

$L = \arg \min_L \min_{\{\mu_k\}}~~~\sum_{k=1}^K~~~\sum_{n:L[n] = k} \|I'[n] - \mu_k\|^2$

• Typically, use Lab color space instead of RGB.
• You can weight the contribution of location vs. appearance by scaling $$(x,y)$$ in $$I'$$ by a factor $$\alpha$$:

$I'[n] = [I[n]_R,I[n]_G,I[n]_B,\alpha n_x,\alpha n_y]^T$
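
A possible way to build the augmented image, sketched with NumPy. The function name `augment` and the `(H, W, C)` array layout are assumptions made for illustration; any color conversion (e.g. to Lab) is assumed to have happened before this step.

```python
import numpy as np

def augment(img, alpha=1.0):
    """Stack color channels with alpha-scaled (x, y) coordinates.

    img: float array of shape (H, W, C). Returns an (H*W, C+2) array.
    """
    H, W, C = img.shape
    ys, xs = np.mgrid[0:H, 0:W]          # row (y) and column (x) indices
    coords = alpha * np.stack([xs, ys], axis=-1).astype(img.dtype)
    return np.concatenate([img, coords], axis=-1).reshape(H * W, C + 2)

img = np.zeros((2, 3, 3))                # toy 2x3 RGB image, all black
Ip = augment(img, alpha=0.5)             # Ip[5] is the pixel at x=2, y=1
```

A larger `alpha` makes spatial distance count for more, which tends to give more compact, grid-like superpixels; a smaller one lets segments follow appearance more freely.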

# Grouping & Segmentation

$L = \arg \min_L \min_{\{\mu_k\}}~~~\sum_{k=1}^K~~~\sum_{n:L[n] = k} \|I'[n] - \mu_k\|^2$

K-Means: Lloyd's algorithm

• Begin with some initial assignment $$L[n]$$ (more later).
• At each iteration ...

Step 1: For each $$k$$, assign

$\mu_k = \text{Mean} \{I'[n]\}_{L[n] = k}$

Step 2: For each $$n$$, assign

$L[n] = \arg \min_k \|I'[n]-\mu_k\|^2$

• Does this converge?
• How do we initialize?
• Do we really need to do $$K\times N$$ computations of $$\|I'[n]-\mu_k\|^2$$?
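
The two steps can be sketched as follows. This is a plain, unoptimized version that assumes NumPy, starts from initial centers rather than an initial labeling (so it runs step 2 first), and runs a fixed number of iterations; `lloyd` is a hypothetical name. On convergence: neither step can increase the cost and there are finitely many labelings, so the iteration settles, though possibly at a local minimum.

```python
import numpy as np

def lloyd(Ip, mu, n_iters=10):
    """Plain K-means: alternate assignment (step 2) and mean update (step 1).

    Ip: (N, D) augmented vectors; mu: (K, D) initial centers.
    """
    mu = mu.astype(float).copy()
    for _ in range(n_iters):
        # Step 2: all K x N squared distances, then nearest center per pixel.
        d2 = ((Ip[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        L = d2.argmin(axis=1)
        # Step 1: recompute each center as the mean of its members.
        for k in range(len(mu)):
            if np.any(L == k):            # leave empty clusters where they are
                mu[k] = Ip[L == k].mean(axis=0)
    return L, mu

Ip = np.array([[0.0], [1.0], [9.0], [10.0]])      # toy 1-D "image"
L, mu = lloyd(Ip, mu=np.array([[0.0], [1.0]]))    # deliberately poor start
```

Even from the poor start, the toy example ends with labels `[0, 0, 1, 1]` and centers 0.5 and 9.5. Note the `d2` array is exactly the $$K \times N$$ computation the last bullet asks about.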

# Grouping & Segmentation

SLIC: Initialization

• Actually, begin with an assignment of $$\{\mu_k\}$$ (and do a step 2).
• Given desired number of super-pixels $$K$$, choose $$K$$ points on a grid.
• Spaced horizontally and vertically apart by $$S = \sqrt{\frac{HW}{K}}$$
• Set each $$\mu_k = I'[n_k]$$ as the augmented vector of one of these points.
• In step 2, each seed is going to attract pixels in its neighborhood that are most like it.
• Sometimes this initialization gives you a 'seed' that lies right on an edge.
• Bad because pixels on either side of the edge will often look nothing like it.
• Solution: Look in a 3x3 neighborhood, and choose pixel with lowest gradient magnitude.

# Grouping & Segmentation

SLIC: Minimization

At any given iteration, for step 2:

• Don't consider all $$K$$ clusters for every pixel $$n$$.
• Instead, say that a pixel $$n$$ can only be assigned to a cluster $$k$$ if $$n$$ is within a $$2S \times 2S$$ window around the spatial co-ordinates in $$\mu_k$$.
• Note that the $$\mu_k$$'s will no longer be on a regular grid.

# Grouping & Segmentation

SLIC: Minimization

At any given iteration, for step 2:

• Initialize min_dist[n] to Infinity for all n
• Loop through each $$\mu_k$$, and consider the pixels in a $$2S\times 2S$$ window around its spatial co-ordinates.
• The pixels in this window form a regular grid.
• For each pixel $$n$$ in this window, compute the distance of $$I'[n]$$ to $$\mu_k$$ and compare it to min_dist[n]. If it is lower, update min_dist[n] and set L[n] = k.

Do we need to loop over $$K$$ ? Can get some parallelism if you're clever about it.
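
One way the windowed assignment pass might look, assuming NumPy, an augmented image stored as an `(H, W, D)` array whose last two channels are the unscaled $$x$$ and $$y$$ co-ordinates (i.e. $$\alpha = 1$$), and a window of $$\pm S$$ pixels around each center. The name `assign_windowed` is hypothetical.

```python
import numpy as np

def assign_windowed(Ip_img, mu, S):
    """One SLIC assignment pass.

    Ip_img: (H, W, D) augmented image whose last two channels are the
            (unscaled) x and y coordinates; mu: (K, D) current centers.
    """
    H, W, _ = Ip_img.shape
    min_dist = np.full((H, W), np.inf)
    L = -np.ones((H, W), dtype=int)               # -1 marks "not yet assigned"
    for k, m in enumerate(mu):
        cx, cy = int(round(m[-2])), int(round(m[-1]))
        y0, y1 = max(cy - S, 0), min(cy + S + 1, H)
        x0, x1 = max(cx - S, 0), min(cx + S + 1, W)
        window = Ip_img[y0:y1, x0:x1]
        d2 = ((window - m) ** 2).sum(axis=-1)     # distances only inside the window
        closer = d2 < min_dist[y0:y1, x0:x1]
        min_dist[y0:y1, x0:x1][closer] = d2[closer]   # slices are views, so this
        L[y0:y1, x0:x1][closer] = k                   # writes into the full arrays
    return L, min_dist

xs = np.arange(6.0)
gray = np.array([0, 0, 0, 100, 100, 100], dtype=float)
Ip_img = np.stack([gray, xs, np.zeros(6)], axis=-1)[None]   # shape (1, 6, 3)
mu = np.array([[0.0, 1.0, 0.0], [100.0, 4.0, 0.0]])
L, min_dist = assign_windowed(Ip_img, mu, S=3)
```

The work per center is proportional to the window area, so the whole pass costs on the order of $$N$$ distance computations instead of $$K \times N$$. Because each window is a rectangular slice, the inner loop over pixels vectorizes directly.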

# Grouping & Segmentation

SLIC: Uses

Given a set of super-pixels $$S_k = \{n: L[n] = k\}$$:

• You can "denoise" your image by smoothing independently within each $$S_k$$.
• Replace all intensities by their mean.
• Fit intensity to be a linear function of $$n$$.
• You can "denoise" other scene properties
• Filter your stereo cost volume within each super-pixel.
• Take your disparities within each super-pixel, and fit them to a plane.
• Do the aggregation for Lucas-Kanade flow estimation within each super-pixel.
• Build super-pixels with intensity + other information
• Get an initial estimate of disparity, add it to your augmented vector $$I'[n]$$.
• Get a super-pixel segmentation. Smooth cost-volume, re-estimate disparities.
• Repeat segmentation ...
• Group pixels (instead of super-pixels) into objects or by semantic labels
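
The simplest use in the list above, replacing all intensities by their superpixel mean, can be sketched in a few lines. This assumes NumPy, a single-channel image, and 0-based integer labels; `superpixel_mean` is a made-up name.

```python
import numpy as np

def superpixel_mean(img, L, K):
    """Replace each pixel's value by the mean over its superpixel."""
    flat, lab = img.ravel().astype(float), L.ravel()
    sums = np.bincount(lab, weights=flat, minlength=K)
    counts = np.bincount(lab, minlength=K)
    means = sums / np.maximum(counts, 1)          # guard against empty labels
    return means[lab].reshape(img.shape)

img = np.array([[1.0, 3.0, 10.0], [2.0, 2.0, 14.0]])
L = np.array([[0, 0, 1], [0, 0, 1]])
out = superpixel_mean(img, L, K=2)
```

In the toy example the left superpixel is smoothed to 2.0 and the right one to 12.0. The same pattern applies to other per-pixel quantities such as disparities.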

# Grouping & Segmentation

Formally, let's say our smoothness cost $$S_{n,n'}(l,l') = w_{n,n'} \delta[l \neq l']$$, for $$w_{n,n'} \geq 0$$.

$L = \arg \min_{L[n] \in \{0,1\}} \sum_n C[n,L[n]] + \sum_{(n,n')\in\mathbb{E}} w_{n,n'} \delta[L[n] \neq L[n']]$

• Build a graph with vertices $$V = \{n\} \cup \{0,1\}$$.
• Place an edge between every $$(n,n')\in\mathbb{E}$$ with weight $$w_{n,n'}$$.
• Place an edge between $$(n,0)\forall n$$ with weight $$C[n,1]$$ (assuming costs are positive).
• Place an edge between $$(n,1)\forall n$$ with weight $$C[n,0]$$ (assuming costs are positive).
• Partition the vertices into sets $$A,B$$ such that $$0 \in A, 1 \in B$$, to minimize Cut$$(A,B)$$.
• The cut is defined as the sum of the weights of the edges going between vertices in A to vertices in B.
• Can be solved in polynomial time as an s-t min-cut via max-flow (e.g., Edmonds-Karp)
• Assign all pixels in $$A$$ label 0, and all pixels in $$B$$ label 1.
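
A small, unoptimized sketch of this construction in pure Python. It uses Edmonds-Karp max-flow on a dense capacity matrix as one possible s-t min-cut solver; the function name `min_cut_labels` and the input layout (a list of cost pairs plus a dict of edge weights) are assumptions made for illustration.

```python
from collections import deque

def min_cut_labels(C, edges):
    """Binary labeling by s-t min-cut (Edmonds-Karp max-flow).

    C: list of (C[n,0], C[n,1]) unary costs, all non-negative.
    edges: dict {(n, n2): w} of undirected smoothness weights.
    Returns a list of labels in {0, 1}.
    """
    N = len(C)
    s, t = N, N + 1                       # terminal vertices "0" and "1"
    cap = [[0.0] * (N + 2) for _ in range(N + 2)]
    for n, (c0, c1) in enumerate(C):
        cap[s][n] += c1                   # cut when n takes label 1
        cap[n][t] += c0                   # cut when n takes label 0
    for (a, b), w in edges.items():
        cap[a][b] += w
        cap[b][a] += w

    def bfs():
        """Parents of vertices reachable from s in the residual graph."""
        parent = [-1] * (N + 2)
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(N + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    q.append(v)
        return parent

    while True:
        parent = bfs()
        if parent[t] == -1:
            break                         # no augmenting path left
        # Find the bottleneck along the augmenting path, then push flow.
        v, f = t, float("inf")
        while v != s:
            f = min(f, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= f
            cap[v][parent[v]] += f
            v = parent[v]
    # Vertices still reachable from s are on the "0" side of the cut.
    return [0 if parent[n] != -1 else 1 for n in range(N)]

# Five pixels in a row; pixel 1 weakly prefers label 1, its neighbors prefer 0.
C = [(0, 5), (3, 2), (0, 5), (5, 0), (5, 0)]
edges = {(0, 1): 2, (1, 2): 2, (2, 3): 2, (3, 4): 2}
labels = min_cut_labels(C, edges)
```

In the toy example the smoothness term overrides pixel 1's weak unary preference, giving `[0, 0, 0, 1, 1]` with total cost 5. For real images, a dedicated max-flow implementation would be a better fit than this dense-matrix version.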

# Grouping & Segmentation

• Polynomial Time for Binary Segmentation
• NP-hard for multi-label cases, $$L[n] \in \{A,B,C,\ldots\}$$
• Remember, this is the same as our stereo case.
• But approximate algorithms available
• Typically different algorithms work well here than for stereo

# Grouping & Segmentation

Multi-label Case: $$L[n] \in \{A,B,C,\ldots\}$$

• Begin with some initial assignment of $$L[n]$$ (perhaps the pixel-wise minimizer of $$C$$)
• Then update $$L$$ by making one of two kinds of moves in each iteration
• $$\alpha$$-Expansion
• Choose one of the labels (say $$A$$)
• Build a binary segmentation problem where $$1 = A$$, $$0=$$ keep the current label
• Set $$C[n,0]=\infty$$ for all pixels $$n$$ where the current label is already $$A$$
• Set $$C[n,0]=$$ cost of its current assigned label for every other pixel
• Set $$C[n,1]=$$ cost of $$A$$ for every other pixel
• Do a min-cut. Replace all pixels labeled $$1$$ with $$A$$.
• $$\alpha-\beta$$ Swap
• Choose a pair of labels (say $$A$$ and $$B$$)
• Now define a new graph, containing only pixels that currently have label $$A$$ or $$B$$.
• Solve the binary segmentation problem
• Iterate through these different kinds of moves for different choices of labels.
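
To illustrate the structure of $$\alpha$$-expansion on a tiny problem, the sketch below replaces the min-cut step with a brute-force search over all binary moves. That swap is only feasible for a handful of pixels and is made purely to keep the example short. All function names are hypothetical, and labels are integers rather than letters.

```python
from itertools import product

def energy(L, C, edges):
    """Unary costs plus Potts smoothness for a full labeling L."""
    unary = sum(C[n][L[n]] for n in range(len(L)))
    pair = sum(w for (a, b), w in edges.items() if L[a] != L[b])
    return unary + pair

def expand(L, alpha, C, edges):
    """Best labeling reachable by switching any subset of pixels to alpha.

    Brute force over the binary choices; a min-cut solver would replace
    this loop for images of realistic size.
    """
    best, best_e = list(L), energy(L, C, edges)
    for move in product([0, 1], repeat=len(L)):       # 1 = take label alpha
        cand = [alpha if m else l for m, l in zip(move, L)]
        e = energy(cand, C, edges)
        if e < best_e:
            best, best_e = cand, e
    return best

def alpha_expansion(C, edges, n_labels, sweeps=3):
    # Start from the pixel-wise minimizer of the unary cost.
    L = [min(range(n_labels), key=lambda l: c[l]) for c in C]
    for _ in range(sweeps):
        for alpha in range(n_labels):
            L = expand(L, alpha, C, edges)
    return L

# Five pixels in a row, three labels; pixel 1 weakly prefers label 1.
C = [[0, 5, 5], [4, 3, 5], [0, 5, 5], [5, 0, 5], [5, 5, 0]]
edges = {(0, 1): 2, (1, 2): 2, (2, 3): 2, (3, 4): 2}
L = alpha_expansion(C, edges, n_labels=3)
```

The pixel-wise start is `[0, 1, 0, 1, 2]` with energy 11; the first expansion on label 0 flips pixel 1 and reaches `[0, 0, 0, 1, 2]` with energy 8, after which no further expansion lowers the energy.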

# Grouping & Segmentation

References

• Boykov and Kolmogorov, An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision, PAMI 2004.
• Delong et al., Fast Approximate Energy Minimization with Label Costs, IJCV 2012.
• Rother et al., GrabCut: Interactive Foreground Extraction using Iterated Graph Cuts, SIGGRAPH 2004.

# Grouping & Segmentation

Next Time

• Min-cut can often lead to isolated points

• Avoid with a method called "Normalized Cuts"