CSE 559A: Computer Vision


Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).
Course Staff: Zhihao Xia, Charlie Wu, Han Liu

http://www.cse.wustl.edu/~ayan/courses/cse559a/

November 29, 2018

Surface Normal Estimation

  • Don't care about absolute depth, only surface orientation
  • Derive training set from depth data
  • Same architecture, but the loss is on surface normals: continuous (maximize the dot product) or classification (discretize the possible orientations); a sketch of the continuous loss follows below

  • Figure from Eigen & Fergus, ICCV 2015.
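
As a concrete illustration of the continuous loss, here is a minimal sketch in PyTorch (an assumed framework; the function name, shapes, and validity mask are illustrative, not from the paper): normalize the predicted normals and maximize their per-pixel dot product with the ground truth.

```python
import torch
import torch.nn.functional as F

def normal_loss(pred, gt, mask):
    """Continuous surface-normal loss: maximize the per-pixel dot product.

    pred, gt: (B, 3, H, W) tensors; gt is assumed unit-norm.
    mask:     (B, 1, H, W) binary tensor marking pixels with valid ground truth.
    """
    pred = F.normalize(pred, dim=1)              # project predictions onto the unit sphere
    dot = (pred * gt).sum(dim=1, keepdim=True)   # cosine similarity at each pixel
    return -(dot * mask).sum() / mask.sum().clamp(min=1)  # negate: maximize agreement
```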

Parse Geometry

  • Instead of providing a depth or normal map, sometimes we care about decomposing a scene into largely planar regions.
  • You could derive this information from predicted depth maps, but:
    • Let's say depth or normal in some region is ambiguous
    • By training the neural network, you're asking it to make its best guess for each point-wise value of depth/normal.
    • But this doesn't tell you about the uncertainty, or multiple plausible values.
  • So you have the network predict what you care about: a segmentation of the image into planes, and the parameters for each plane.
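
As a rough illustration (not PlaneNet's actual architecture), such a network can end in two heads: a per-pixel segmentation over K plane slots plus one non-planar class, and K sets of plane parameters. A minimal PyTorch sketch, with all layer sizes and names illustrative:

```python
import torch.nn as nn

class PlaneHeads(nn.Module):
    """Two prediction heads on a shared feature map: a per-pixel segmentation
    over K plane slots plus one non-planar class, and K sets of 3 plane
    parameters. Layer sizes and names are illustrative, not PlaneNet's."""
    def __init__(self, feat_ch=256, K=10):
        super().__init__()
        self.K = K
        self.seg = nn.Conv2d(feat_ch, K + 1, kernel_size=1)   # per-pixel plane id
        self.pool = nn.AdaptiveAvgPool2d(1)                   # global scene feature
        self.params = nn.Linear(feat_ch, K * 3)               # one 3-vector per plane

    def forward(self, feats):                        # feats: (B, feat_ch, H, W)
        seg_logits = self.seg(feats)                 # (B, K+1, H, W)
        g = self.pool(feats).flatten(1)              # (B, feat_ch)
        planes = self.params(g).view(-1, self.K, 3)  # (B, K, 3)
        return seg_logits, planes
```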

Parse Geometry

Liu et al., PlaneNet: Piece-wise Planar Reconstruction from a Single RGB Image. CVPR 2018.

Parse Geometry

  • Can enable augmented-reality applications.

Intrinsic Image Decomposition

  • Recall from Photometric Reasoning Lectures
    • Observed image is a product of albedo and shading (in the Lambertian case, \(I[n] = \rho[n] \, \langle \hat{n}, \ell \rangle\))
    • Intrinsic Image Decomposition: given an image, estimate \(\rho[n]\) and shading \(s[n]\) at each pixel.
  • Train a neural network to output both: \(\mathrm{RGB}[n] \rightarrow (\rho[n], s[n])\), with \(\mathrm{RGB}[n] = \rho[n] \times s[n]\) (loss sketch below)

Narihira et al., Direct Intrinsics: Learning Albedo-Shading Decomposition by Convolutional Regression, ICCV 2015.
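
A minimal sketch of a training loss for such a two-head network, in PyTorch (names and the added reconstruction term are illustrative, not necessarily the paper's exact loss): supervise both outputs and encourage their product to explain the input image.

```python
import torch.nn.functional as F

def intrinsic_losses(albedo, shading, image, gt_albedo, gt_shading):
    """Training loss for a two-head network predicting per-pixel albedo rho[n]
    and shading s[n] from an RGB image.

    albedo, image, gt_albedo: (B, 3, H, W); shading, gt_shading: (B, 1, H, W).
    """
    sup = F.mse_loss(albedo, gt_albedo) + F.mse_loss(shading, gt_shading)
    recon = F.mse_loss(albedo * shading, image)  # the product should explain the image
    return sup + recon
```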

Color Constancy

  • Remove color cast by predicting illuminant color chromaticity.
  • We talked about simple methods like gray world, white-patch retinex, etc.
  • Turn this into a classification problem:
    • Normalize \([I_r,I_g,I_b]\) so that they are all \(\in [0,1]\) and sum to one.
    • Discretize this chromaticity space and treat each bin as a class label.
    • Predict the label from an input image (sketch below).
  • Barron, "Convolutional Color Constancy", ICCV 2015.
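
A sketch of building such labels (a generic chromaticity binning; Barron's method actually works with histograms in log-chroma space, so this is only illustrative, with made-up names and bin counts):

```python
import numpy as np

def illuminant_label(illum_rgb, bins=32):
    """Map a ground-truth illuminant color to a discrete class label.

    Normalize [I_r, I_g, I_b] to sum to one, then bin the (r, g)
    chromaticity plane into a bins x bins grid.
    """
    c = np.asarray(illum_rgb, dtype=np.float64)
    c = c / c.sum()                          # r + g + b = 1, each in [0, 1]
    r_bin = min(int(c[0] * bins), bins - 1)
    g_bin = min(int(c[1] * bins), bins - 1)
    return r_bin * bins + g_bin              # single class index in [0, bins^2)
```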

Image Colorization

  • Instead of predicting illuminant color, predict the color at each pixel from a grayscale image! (Loss sketch below.)

Larsson et al., "Learning Representations for Automatic Colorization," ECCV 2016.
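
One common way to set this up (Larsson et al. predict per-pixel color histograms; the sketch below uses a generic quantized-chroma classification loss with illustrative names and bin counts, not the paper's exact formulation):

```python
import torch.nn.functional as F

def colorization_loss(logits, ab_gt, grid=16):
    """Per-pixel classification loss over quantized chroma values.

    logits: (B, grid*grid, H, W) scores over a grid x grid quantization of the
            two chroma channels (e.g. Lab a/b rescaled to [-1, 1]).
    ab_gt:  (B, 2, H, W) ground-truth chroma in [-1, 1].
    """
    q = ((ab_gt + 1) / 2 * grid).long().clamp(0, grid - 1)  # bin index per channel
    target = q[:, 0] * grid + q[:, 1]                       # (B, H, W) class ids
    return F.cross_entropy(logits, target)
```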

Image Colorization

Image Colorization

Illumination Estimation

  • Remember: in the most general case, object shading depends on an "environment map".
  • For AR and image-editing applications, we sometimes want to estimate this map to figure out how new objects will appear.
  • So let's train a neural network that, given an image, predicts this map.
  • But how do we get ground-truth training data?
  • To the rescue: 360° panorama cameras, and datasets collected using such cameras (data-pair sketch below)
    • SUN360: Xiao et al., CVPR 2012.
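
One way to manufacture training pairs, sketched crudely below (function and parameters are illustrative, and a real pipeline would do proper perspective reprojection rather than a rectangular crop): take a limited-FOV crop of the panorama as the network input, and the full panorama as the ground-truth environment map.

```python
import numpy as np

def pano_training_pair(pano, fov_frac=0.25):
    """Derive one (limited-FOV view, environment map) pair from an
    equirectangular panorama.

    pano: (H, W, 3) equirectangular image. The crop below is a crude
    stand-in for proper perspective reprojection of the chosen view.
    """
    H, W, _ = pano.shape
    w = int(W * fov_frac)
    cx = np.random.randint(0, W)                    # random viewing azimuth
    cols = np.arange(cx - w // 2, cx + w // 2) % W  # wrap around the pano seam
    view = pano[H // 4: 3 * H // 4, cols]           # middle band ~ camera FOV
    return view, pano                               # (network input, ground truth)
```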

Illumination Estimation

  • Outdoor Illumination: Predict the position of the sun, and the relative strength of sun and sky (label sketch below).
    • Hold-Geoffroy et al., Deep Outdoor Illumination Estimation, CVPR 2017.
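
For instance, the sun's position can be turned into a classification target by binning the sky, as in this illustrative sketch (bin counts and parameterization are mine; the paper uses its own sky-model parameterization):

```python
import numpy as np

def sun_position_label(azimuth, elevation, n_az=32, n_el=8):
    """Discretize the sun's sky position into a single class label.

    azimuth in [0, 2*pi), elevation in [0, pi/2]; bin counts are illustrative.
    """
    a = int(azimuth / (2 * np.pi) * n_az) % n_az
    e = min(int(elevation / (np.pi / 2) * n_el), n_el - 1)
    return e * n_az + a                 # class index in [0, n_az * n_el)
```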

Illumination Estimation

  • Indoor Illumination: More complicated, possibly multiple light sources.
  • But similar idea: Gardner et al., Learning to Predict Indoor Illumination from a Single Image, SIGGRAPH Asia 2017.

Style Transfer

  • Make one image have the "style" or texture quality of another.
  • Gatys et al., CVPR 2016:
    • Don't train a network for this
    • Instead take an existing network and look at the properties of its activations
    • Values of activations at higher layers represent "content": try to preserve them
    • Covariances (Gram matrices) of activations at other layers represent "style": try to match them to the style image (sketch below)
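
To make the "style" statistic concrete, here is a minimal PyTorch sketch (the function names and single-layer simplification are mine, not the paper's): the Gram matrix of a layer's activations summarizes channel co-activations while discarding spatial layout.

```python
import torch

def gram(feat):
    """Gram matrix of a feature map: uncentered channel covariances, which
    discard spatial layout and summarize texture. feat: (B, C, H, W)."""
    B, C, H, W = feat.shape
    f = feat.view(B, C, H * W)
    return f @ f.transpose(1, 2) / (C * H * W)   # (B, C, C)

def style_content_losses(f_out, f_content, f_style):
    """f_*: activations of the image being optimized, the content image, and
    the style image at one chosen layer each (single-layer simplification)."""
    content_loss = ((f_out - f_content) ** 2).mean()          # preserve content
    style_loss = ((gram(f_out) - gram(f_style)) ** 2).mean()  # match style statistics
    return content_loss, style_loss
```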

Style Transfer

  • Set this up as an optimization problem, and minimize with SGD+Backprop from a random init.
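
A minimal self-contained sketch of that optimization loop in PyTorch; the tiny frozen conv layer and its target are stand-ins for a fixed pretrained network and the weighted content/style losses above. The key point is that the image pixels themselves are the parameters.

```python
import torch

# The image pixels are the parameters being optimized. The tiny frozen conv
# layer and target below are stand-ins for a fixed pretrained network and the
# weighted content/style losses sketched earlier.
feature_net = torch.nn.Conv2d(3, 8, 3, padding=1)     # placeholder "fixed network"
for p in feature_net.parameters():
    p.requires_grad_(False)

target = feature_net(torch.rand(1, 3, 64, 64))        # stand-in target activations
img = torch.rand(1, 3, 64, 64, requires_grad=True)    # random init
opt = torch.optim.SGD([img], lr=0.1)

for _ in range(100):
    opt.zero_grad()
    loss = ((feature_net(img) - target) ** 2).mean()  # match the target activations
    loss.backward()                                   # backprop all the way to pixels
    opt.step()
```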

Style Transfer

Point-cloud Processing

  • From many depth sensors, you don't get an "image" but a point cloud.
  • This is a list of \((x,y,z)\) locations of various points the sensor detected in the scene.
  • We want to solve standard vision tasks on these clouds.
    • Classification: Assign a label to the whole cloud.
    • Segmentation: Assign a label to each point

Point-cloud Processing

Approach 1

  • Convert the point cloud into a 3D grid, where each "voxel" records the number of points falling in that cell (sketch below).
    • Maturana and Scherer, VoxNet, IROS 2015.
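
A minimal NumPy sketch of this voxelization (function name, grid size, and the assumed normalization are illustrative):

```python
import numpy as np

def voxelize(points, grid=32, bound=1.0):
    """Count points per cell of a grid^3 occupancy volume.

    points: (N, 3) array of (x, y, z), assumed centered and scaled to
    lie within [-bound, bound]^3.
    """
    vol = np.zeros((grid, grid, grid), dtype=np.float32)
    idx = ((points + bound) / (2 * bound) * grid).astype(int)
    idx = np.clip(idx, 0, grid - 1)
    np.add.at(vol, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)  # accumulate point counts
    return vol
```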

Point-cloud Processing

Approach 1: Problem

  • Choose a fine grid, and your "volume" is huge with mostly zero entries.
  • Choose a coarse grid, and you lose spatial detail.

Point-cloud Processing

Approach 2

  • Su et al., "MV-CNN", ICCV 2015. You "render" the shape from different pre-selected views and feed these images into a CNN (sketch below).
    • Still a 3-D input, but high spatial resolution, limited "angular" resolution.
    • Lets you architect your network better (most processing is per-view, followed by view-pooling)
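
A toy PyTorch sketch of the per-view-then-pool structure (the backbone and all sizes are placeholders, not the paper's architecture):

```python
import torch.nn as nn

class MultiViewNet(nn.Module):
    """Shared per-view CNN followed by view pooling; backbone and sizes are
    toy placeholders, not the paper's architecture."""
    def __init__(self, n_classes=40):
        super().__init__()
        self.backbone = nn.Sequential(                # shared across all views
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(32, n_classes)

    def forward(self, views):                         # views: (B, V, 3, H, W)
        B, V = views.shape[:2]
        f = self.backbone(views.flatten(0, 1))        # (B*V, 32) per-view features
        f = f.view(B, V, -1).max(dim=1).values        # view pooling: elementwise max
        return self.head(f)                           # (B, n_classes)
```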

Point-cloud Processing

  • Both methods still make resolution trade-offs.
  • They can also handle classification, but what about segmentation?

Approach 3

  • Qi et al., PointNet, CVPR 2017. Operate directly on the list of coordinates (sketch below).
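
A toy PyTorch sketch of the core idea (widths illustrative; the full model also includes learned alignment "T-Net" modules): apply the same MLP to every point, then max-pool across points.

```python
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Shared per-point MLP followed by a symmetric max-pool, which makes the
    output invariant to point order. Widths are illustrative."""
    def __init__(self, n_classes=40):
        super().__init__()
        self.mlp = nn.Sequential(                  # applied to every point alike
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU())
        self.head = nn.Linear(256, n_classes)

    def forward(self, pts):                        # pts: (B, N, 3)
        f = self.mlp(pts)                          # (B, N, 256) per-point features
        g = f.max(dim=1).values                    # order-invariant global feature
        return self.head(g)                        # (B, n_classes)
```

The max-pool is the key design choice: any symmetric function over per-point features gives permutation invariance.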

Point-cloud Processing

Approach 3b

  • PointNet++: PointNet applied hierarchically to local groups of points.

Physics-Based Vision Tasks

  • This is only a small set of examples of neural networks applied to physics-based tasks.
  • Key Ideas
    • Neural networks / learning can still be useful even if you have a physical model.
    • They help deal with the ill-posed nature of these problems better than hand-picked priors or regularizers.
    • But, you need to architect your networks to express the required computation.
    • You have to be creative about getting training data.

Next Time: GANs and Adversarial losses.