CSE 559A: Computer Vision

Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).
Course Staff: Zhihao Xia, Charlie Wu, Han Liu

November 29, 2018

# Surface Normal Estimation

• Don't care about absolute depth, only surface orientation
• Derive training set from depth data
• Same architecture, but the loss is on surface normals: continuous (maximize the dot product with the ground-truth normal), or classification (discretize the space of possible orientations). A sketch of the continuous loss follows below.

• Figure from Eigen & Fergus, ICCV 2015.
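As a concrete illustration, here is a minimal sketch of the continuous loss in PyTorch: minimize the negative cosine similarity, i.e. maximize the per-pixel dot product between unit-normalized predicted and ground-truth normals. Tensor shapes and the function name are assumptions for illustration, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def normal_loss(pred, gt):
    """Negative mean dot product between predicted and ground-truth
    surface normals. pred, gt: (B, 3, H, W) tensors (shapes assumed)."""
    pred = F.normalize(pred, dim=1)  # project predictions onto the unit sphere
    gt = F.normalize(gt, dim=1)      # ground truth should already be unit length
    return -(pred * gt).sum(dim=1).mean()
```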

# Parse Geometry

• Instead of providing a depth or normal map, sometimes we care about decomposing a scene into largely planar regions.
• You could derive this information from predicted depth maps, but:
• Suppose the depth or normal in some region is ambiguous.
• By training the neural network, you're asking it to make its best guess for each point-wise value of depth/normal.
• But this doesn't tell you about the uncertainty, or multiple plausible values.
• So you have the network predict what you care about: a segmentation of the image into planes, and the parameters for each plane.

# Parse Geometry

Liu et al., PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image. CVPR 2018.

# Parse Geometry

• Can enable augmented-reality applications.

# Intrinsic Image Decomposition

• Recall from Photometric Reasoning Lectures
• Observed image is a product of albedo and shading (in the Lambertian case, $$I = \rho \, \langle \hat{n}, \ell \rangle$$).
• Intrinsic Image Decomposition: given an image, estimate the albedo $$\rho$$ and the shading at each pixel.
• Train a neural network to output both maps: $$RGB[n] \rightarrow (\rho[n], s[n])$$, with $$RGB[n] \approx \rho[n] \times s[n]$$. A sketch of one possible loss follows the citation below.

Narihira et al., Direct Intrinsics: Learning Albedo-Shading Decomposition by Convolutional Regression, ICCV 2015.
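One way to set up the training objective, sketched below under loose assumptions (a plausible formulation, not necessarily the exact loss used by Narihira et al.): supervise the two predicted maps directly, and add a term encouraging their product to reproduce the input image.

```python
import torch

def intrinsic_loss(albedo_pred, shading_pred, albedo_gt, shading_gt, image):
    """Per-pixel supervision on albedo and shading, plus a reconstruction
    term enforcing image ~= albedo * shading. All tensors (B, C, H, W);
    names and the equal weighting of terms are illustrative assumptions."""
    l_albedo = torch.mean((albedo_pred - albedo_gt) ** 2)
    l_shading = torch.mean((shading_pred - shading_gt) ** 2)
    l_recon = torch.mean((albedo_pred * shading_pred - image) ** 2)
    return l_albedo + l_shading + l_recon
```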

# Color Constancy

• Remove color cast by predicting illuminant color chromaticity.
• We talked about simple methods like gray world, white-patch retinex, etc.
• Turn this into a classification problem:
• Normalize the illuminant color $$[I_r,I_g,I_b]$$ so that the components are each $$\in [0,1]$$ and sum to one.
• Discretize this chromaticity space and treat each cell as a class label.
• Predict the label from an input image (see the sketch after this list).
• Barron, "Convolutional Color Constancy", ICCV 2015.

# Image Colorization

• Instead of predicting the illuminant color, predict the color at each pixel from a grayscale image!

Larsson et al., "Learning Representations for Automatic Colorization," ECCV 2016.

# Illumination Estimation

• Recall that in the most general case, object shading depends on an "environment map".
• For AR and image-editing applications, we sometimes want to estimate this map to figure out how new objects will appear.
• So let's train a neural network that, given an image, predicts this map.
• But how do we get ground-truth training data?
• To the rescue: 360-degree panorama cameras, and datasets collected using such cameras.
• SUN360: Xiao et al., CVPR 2012.

# Illumination Estimation

• Outdoor Illumination: Predict position of sun, and relative strength of sun and sky.
• Hold-Geoffroy et al., Deep Outdoor Illumination Estimation, CVPR 2017.

# Illumination Estimation

• Indoor Illumination: More complicated, possibly multiple light sources.
• But similar idea: Gardner et al., Learning to Predict Indoor Illumination from a Single Image, SIGGRAPH Asia 2017.

# Style Transfer

• Make one image have the "style" or texture quality of another.
• Gatys et al., CVPR 2016:
• Don't train a network for this
• Instead take an existing network and look at the properties of its activations
• Activations of higher layers represent "content": try to preserve those of the content image.
• Gram matrices (channel correlations) of layer activations represent "style": try to match those of the style image.

# Style Transfer

• Set this up as an optimization problem, and minimize with SGD+backprop from a random init (sketch below).
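A minimal sketch of the two loss terms, assuming activations have already been extracted from a fixed pretrained network (the dictionary layout, layer names, and style weight are illustrative assumptions, not the paper's exact setup):

```python
import torch

def gram_matrix(feat):
    """Channel-by-channel correlation matrix of CNN activations, with
    spatial position marginalized out. feat: (C, H, W)."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return (f @ f.t()) / (h * w)

def total_loss(feats_x, feats_content, feats_style, style_layers, w=1e3):
    """Content term: match higher-layer activations of the content image.
    Style term: match Gram matrices of the style image across layers.
    feats_*: dicts mapping layer name -> (C, H, W) activation tensor."""
    l_content = torch.mean((feats_x["content"] - feats_content["content"]) ** 2)
    l_style = sum(torch.mean((gram_matrix(feats_x[k]) -
                              gram_matrix(feats_style[k])) ** 2)
                  for k in style_layers)
    return l_content + w * l_style
```

Note that the optimization variable is the output image itself, not the network's weights: the network stays fixed and only supplies the features being matched.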

# Point-cloud Processing

• From many depth sensors, you don't get an "image" but a point cloud.
• This is a list of $$(x,y,z)$$ locations of various points the sensor detected in the scene.
• We want to solve standard vision tasks on these clouds.
• Classification: Assign a label to the whole cloud.
• Segmentation: Assign a label to each point.

# Point-cloud Processing

Approach 1

• Convert the point cloud into a 3D grid, where each "voxel" records the number of points falling in that cell (see the sketch below).
• Maturana and Scherer, VoxNet.
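A minimal sketch of the voxelization step (the grid size and the assumption that points are pre-scaled to [0, 1) are illustrative choices):

```python
import numpy as np

def voxelize(points, grid=32):
    """Count the points falling in each cell of a grid^3 volume.
    points: (N, 3) array of (x, y, z), assumed already scaled to [0, 1)."""
    idx = np.clip((points * grid).astype(int), 0, grid - 1)
    vol = np.zeros((grid, grid, grid), dtype=np.float32)
    np.add.at(vol, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)  # accumulate counts
    return vol
```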

# Point-cloud Processing

Approach 1: Problem

• Either you choose a fine grid, but then your 'volume' is huge and most values are zero.
• Or you choose a coarse grid, but then you lose spatial detail.

# Point-cloud Processing

Approach 2

• Su et al., "MV-CNN": "render" the shape from different pre-selected views, and feed these renderings into a CNN.
• Still a 3-D input, but with high spatial resolution and limited "angular" resolution.
• Lets you architect your network better (most processing is per-view, followed by view-pooling; see the sketch below).
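A schematic of the per-view-then-pool structure (the function names, tensor shapes, and the use of max-pooling for view fusion are assumptions for illustration):

```python
import torch

def multiview_forward(views, backbone, classifier):
    """Run a shared CNN on each rendered view, max-pool features across
    views, then classify. views: (B, V, 3, H, W); backbone maps a batch
    of images to (batch, D) feature vectors."""
    b, v = views.shape[:2]
    feats = backbone(views.flatten(0, 1))               # (B*V, D) per-view features
    pooled = feats.reshape(b, v, -1).max(dim=1).values  # view pooling -> (B, D)
    return classifier(pooled)                           # class scores
```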

# Point-cloud Processing

• Both methods still make resolution trade-offs.
• Both can handle classification, but what about segmentation?

Approach 3

• Qi et al., PointNet. Operate directly on the list of coordinates (sketch below).
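The core idea, sketched below (layer sizes are illustrative, and the paper's input/feature transform networks are omitted): a shared per-point MLP followed by a symmetric max-pool, which makes the output invariant to the ordering of the point list.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Schematic PointNet-style classifier: shared per-point MLP,
    then an order-invariant max-pool over points."""
    def __init__(self, n_classes):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, 256), nn.ReLU())
        self.head = nn.Linear(256, n_classes)

    def forward(self, pts):             # pts: (B, N, 3) point coordinates
        f = self.mlp(pts)               # per-point features: (B, N, 256)
        g = f.max(dim=1).values         # global, permutation-invariant: (B, 256)
        return self.head(g)             # class scores
```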

# Point-cloud Processing

Approach 3b

• PointNet++: PointNet applied hierarchically, on local groups of points.