CSE 559A: Computer Vision

Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).
Course Staff: Zhihao Xia, Charlie Wu, Han Liu

December 6, 2018

# General

• Last class. No office hours tomorrow.
• Project reports due Sunday night!
• Keys for PSET 5 can be picked up from Jolley 205 on Tuesday, between 12:30pm and 1:30pm.

# GAN Generator Loss

• Max $$-\log(1-D(G))$$ vs. min $$-\log(D(G))$$
• Binary classifier: the output of $$D$$ is $$\sigma(y)$$ for some logit $$y$$
• When the discriminator is doing really well, $$D(G)$$ is almost 0 ($$y \ll 0$$)
• $$\max -\log(1-D(G)) = \min \log(1-D(G))$$

$\nabla_y = -(D(G)-0) = 0-D(G) \approx 0$

• $$\min -\log(D(G))$$

$\nabla_y = D(G)-1 \approx -1$

• Both gradients are negative (i.e., the correct sign: both say to increase $$D(G)$$)
• But the second version has much larger magnitude, so its gradient doesn't vanish when $$D$$ is winning
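The two gradients above can be checked numerically. A minimal numpy sketch (the logit value $$y=-5$$ is an arbitrary illustrative choice for a discriminator that is "winning"):

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

# Discriminator logit y for a generated sample; D is doing well, so y << 0.
# (y = -5 is an arbitrary illustrative value.)
y = -5.0
D_G = sigmoid(y)  # D(G), close to 0

# Saturating objective: minimize log(1 - D(G)); gradient w.r.t. y is -D(G).
grad_saturating = -D_G

# Non-saturating objective: minimize -log(D(G)); gradient w.r.t. y is D(G) - 1.
grad_nonsaturating = D_G - 1.0

print(grad_saturating)     # ~ -0.0067: correct sign, but vanishingly small
print(grad_nonsaturating)  # ~ -0.9933: same sign, far larger magnitude
```

Both gradients push $$y$$ (and hence $$D(G)$$) upward, but the non-saturating one is roughly 150x larger here.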

• Generated images from a training set of bedrooms (LSUN dataset).

Neyshabur et al., Stabilizing GAN Training with Multiple Random Projections


Denton et al., Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

Karras et al., Progressive Growing of GANs for Improved Quality, Stability, and Variation


# Conditional GANs

• Train with

$L(G) = \|G(x)-y\|^2 - \lambda \log D(G(x))$

$L(D) = -\log(1-D(G(x))) - \log D(y)$

• The GAN loss is unconditional, but it is combined with a reconstruction loss.
• So the loss says: be close to the true answer, but also make your output resemble natural images.
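These two losses can be evaluated on toy numbers. A minimal numpy sketch (all values below are made up for illustration; the discriminator is represented only by its output logits `d_fake` and `d_real`):

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

# Toy values (hypothetical): G(x) is the generator output, y the ground truth.
G_x = np.array([0.2, 0.8, 0.5])
y = np.array([0.3, 0.7, 0.4])
d_fake, d_real = -1.0, 2.0  # discriminator logits for G(x) and y
lam = 0.1                   # weight on the adversarial term

# Generator: reconstruction loss plus (non-saturating) GAN loss.
L_G = np.sum((G_x - y) ** 2) - lam * np.log(sigmoid(d_fake))

# Discriminator: classify generated vs. real samples.
L_D = -np.log(1.0 - sigmoid(d_fake)) - np.log(sigmoid(d_real))

print(L_G)  # ~ 0.161
print(L_D)  # ~ 0.440
```

The $$\lambda$$ knob trades off staying close to the target $$y$$ against fooling $$D$$.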

# Unsupervised Learning

• For a lot of tasks, it is hard to collect enough training data.
• We saw in the stereo example how you can use indirect supervision.
• But in other cases, you have to use transfer learning:
• Train a network on a large dataset for a related task for which you do have ground truth.
• Remove the last layer, and use / finetune the feature extractor for the new task.
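The last two steps might be sketched as follows. This is a toy numpy stand-in, not a real network: a one-layer ReLU map plays the role of the pre-trained body, and all layer sizes and class counts are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-trained network: feature extractor ("body") + last layer.
W_feat = rng.normal(size=(16, 8))      # body, trained on the large source task
W_head_src = rng.normal(size=(8, 10))  # source-task head (10 classes); discarded

def features(x):
    # Frozen feature extractor (a ReLU layer standing in for a conv net).
    return np.maximum(x @ W_feat, 0.0)

# Transfer: drop W_head_src, attach a fresh head, train only (or mostly) the head.
W_head_new = np.zeros((8, 3))  # 3 classes in the new target task (hypothetical)

def predict(x):
    return features(x) @ W_head_new

x = rng.normal(size=(4, 16))  # a batch of 4 target-task inputs
print(predict(x).shape)       # (4, 3)
```

Finetuning would additionally update `W_feat` with a small learning rate; the sketch only shows the head swap.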

• Pre-train by learning to add color
• Pre-train by solving jigsaw puzzles
• Pre-train by predicting sound from video

• Generate synthetic training data using renderers.
• But networks trained on synthetic data need not generalize to real data.
• (In fact, they may not even transfer from high-quality Flickr images to cell-phone camera images.)

# Problem Setting

• Have input-output training pairs of $$(x',y)$$ from source domain: renderings/high-quality images/...
• Have only inputs $$x$$ from target domain: where we actually want to use this.
• Train a network so that features computed from $$x'$$ and $$x$$ have the same distribution ...

i.e., use GANs!
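The feature-matching idea can be sketched with the same GAN machinery: a domain discriminator tries to tell source features from target features, and the feature extractor is trained to fool it. A minimal numpy sketch (hypothetical shapes, random weights; only the loss values are computed, not the updates):

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

rng = np.random.default_rng(0)

# Hypothetical setup: f() extracts features, d() is a domain discriminator
# giving the probability that a feature vector came from the source domain.
W_f = rng.normal(size=(16, 8))
w_d = rng.normal(size=(8,))

def f(x):
    return np.maximum(x @ W_f, 0.0)

def d(feat):
    return sigmoid(feat @ w_d)

x_src = rng.normal(size=(4, 16))  # source inputs x' (these have labels y)
x_tgt = rng.normal(size=(4, 16))  # target inputs x (no labels)

eps = 1e-8  # for numerical stability inside the logs

# Discriminator loss: separate source features from target features.
L_D = -np.mean(np.log(d(f(x_src)) + eps)) - np.mean(np.log(1 - d(f(x_tgt)) + eps))

# Feature extractor's adversarial term: make target features look like source.
L_adv = -np.mean(np.log(d(f(x_tgt)) + eps))

print(L_D, L_adv)  # both positive scalars
```

The task loss on labeled source pairs $$(x', y)$$ would be added on top of `L_adv` when training `f`.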

That's all, folks!
• We've covered what forms the foundations of state-of-the-art vision algorithms.