CSE 559A: Computer Vision


Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).
Course Staff: Zhihao Xia, Charlie Wu, Han Liu

http://www.cse.wustl.edu/~ayan/courses/cse559a/

November 12, 2018

General

  • Problem Set 4 Due Tonight!


  • Problem Set 5 will be posted shortly.

Object Detection

  • Newer methods also use a neural network to generate "region proposals".
  • Efficient Implementations: the bulk of the computation happens once on the entire image, and you crop a feature map for each region (see the sketch after this list).
  • Even Faster Methods: discretize image locations into a grid, and directly output up to a fixed number of bounding boxes for each grid block.
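
For concreteness, here is a minimal NumPy sketch of the "compute the backbone once, crop a feature map per region" idea (roughly RoI pooling, as in Fast R-CNN). The function name, feature-map shape, stride, and boxes below are illustrative assumptions, not the exact recipe of any particular detector.

```python
import numpy as np

def roi_pool(feat, box, stride=16, out=7):
    """Crude RoI pooling sketch.
    feat:   (C, H, W) feature map computed ONCE for the whole image.
    box:    (x0, y0, x1, y1) region proposal in image pixel coordinates.
    stride: total downsampling factor of the backbone (e.g., 16).
    out:    fixed output size (out x out), regardless of box size.
    """
    C, H, W = feat.shape
    x0, y0, x1, y1 = [int(round(v / stride)) for v in box]   # map box to feature coords
    x0, y0 = min(max(x0, 0), W - 1), min(max(y0, 0), H - 1)
    x1, y1 = min(max(x1, x0 + 1), W), min(max(y1, y0 + 1), H)
    crop = feat[:, y0:y1, x0:x1]                              # per-region crop of shared features
    pooled = np.zeros((C, out, out), dtype=feat.dtype)
    ys = np.linspace(0, crop.shape[1], out + 1).astype(int)
    xs = np.linspace(0, crop.shape[2], out + 1).astype(int)
    for i in range(out):
        for j in range(out):
            cell = crop[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                           xs[j]:max(xs[j + 1], xs[j] + 1)]
            pooled[:, i, j] = cell.max(axis=(1, 2))           # max-pool each spatial bin
    return pooled

feat = np.random.randn(512, 40, 60)                  # stand-in for a conv feature map
boxes = [(100, 120, 300, 400), (50, 60, 200, 180)]   # region proposals (pixels)
crops = [roi_pool(feat, b) for b in boxes]           # one fixed-size feature per region
```

Each pooled crop then feeds a small per-region head (classification and box refinement); the expensive convolutional features are never recomputed per region.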

Transfer Learning

  • Say you want to train a network to solve a problem.
    • The task is complex, so you need a large network.
    • But you don't have enough training data to train such a network.
  • Pick a related task for which you do have a lot of training data
    • ImageNet is a great database for this for a variety of semantic tasks
  • Train a network (like VGG-16) to solve that task.
  • Then, choose the output of some intermediate layer of that network.
  • Use it as a feature vector, and learn a smaller network for your problem that goes from those features to the desired output (a minimal sketch follows).
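
A minimal PyTorch/torchvision sketch of this recipe, assuming an ImageNet-pretrained VGG-16 as the "related task" network and a hypothetical 10-class target task; the head sizes and the choice of fc2 as the intermediate layer are illustrative.

```python
import torch
import torchvision

vgg = torchvision.models.vgg16(pretrained=True)   # network trained on the related task (ImageNet)
vgg.eval()
for p in vgg.parameters():
    p.requires_grad = False                        # frozen: we only use its features

num_classes = 10                                   # hypothetical size of the new task

head = torch.nn.Sequential(                        # smaller network trained from scratch
    torch.nn.Linear(4096, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, num_classes),
)

def features(x):
    # Output of an intermediate layer (here, the penultimate fc layer) as the feature vector.
    x = vgg.features(x)
    x = vgg.avgpool(x)
    x = torch.flatten(x, 1)
    return vgg.classifier[:-1](x)                  # drop the final 1000-way ImageNet layer

x = torch.randn(2, 3, 224, 224)                    # stand-in for a batch of images
with torch.no_grad():
    f = features(x)                                # (2, 4096) feature vectors
logits = head(f)                                   # train only `head` with your labels and loss
```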

Transfer Learning

  • VGG-16 does well on ImageNet classification, and gives you a feature representation that is surprisingly useful for a broad range of tasks.

Remember computing an encoding \(\tilde{x}\) from \(x\): VGG-16's pool5, fc1, or fc2 features can serve as the \(\tilde{x}\) for many tasks.

One can also "initialize" a network with the VGG-16 architecture to the weights of one trained on ImageNet, and then "fine-tune" it by replacing the final layer with a classifier for another task (a minimal sketch follows).
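
A minimal fine-tuning sketch under the same assumptions (torchvision VGG-16, a hypothetical 20-class target task): start from the ImageNet weights, swap the final classification layer, and train with a small learning rate.

```python
import torch
import torchvision

model = torchvision.models.vgg16(pretrained=True)    # "initialize" from ImageNet weights
model.classifier[-1] = torch.nn.Linear(4096, 20)     # replace the 1000-way output layer

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)                 # stand-in for a real training batch
labels = torch.randint(0, 20, (4,))

logits = model(images)                               # usual supervised training step
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```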

In general, it is an empirical question whether training on Task A will provide good features for Task B.

Classification for Other Tasks

Fully-Convolutional Networks

But what about downsampling?

  • Option 0: Just don't use downsampling

Bad, because down-sampling is a way to quickly increase the "receptive field" of your network (e.g., a single \(3\times3\) convolution applied after \(4\times\) downsampling already spans roughly a \(12\times12\) region of the original input).

  • Option 1: Just produce a label map at lower-resolution.
  • Option 2: If you downsample by \(N\) (typically \(N=2^K\)),
    feed every "shifted" version of your input (shifts of \(0\) to \(N-1\) pixels in each direction, i.e., \(N\times N\) shifted copies) through this FCN and interleave the outputs.

Bad, because if you down-sample multiple times, you are still re-computing activations prior to the last downsampling.

  • Option 3: Dilated Convolutions (a minimal sketch follows).
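
A minimal PyTorch sketch of the idea: a \(3\times3\) kernel with dilation \(d\) is applied to inputs spaced \(d\) pixels apart, so the receptive field grows the way downsampling would make it grow, but the output map stays at full resolution. The channel counts and input size here are illustrative.

```python
import torch

x = torch.randn(1, 64, 128, 128)      # feature map at full resolution

conv_d1 = torch.nn.Conv2d(64, 64, kernel_size=3, padding=1, dilation=1)
conv_d2 = torch.nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)
conv_d4 = torch.nn.Conv2d(64, 64, kernel_size=3, padding=4, dilation=4)

# Effective kernel footprint grows from 3x3 to 5x5 to 9x9, yet every output
# keeps the 128x128 resolution: no downsampling, no shifted re-computation.
y = conv_d4(conv_d2(conv_d1(x)))
print(y.shape)                         # torch.Size([1, 64, 128, 128])
```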

Dilated Convolution

Semantic Segmentation

Deep Architectures
