CSE 559A: Computer Vision


Fall 2018: T-R: 11:30-1pm @ Lopata 101

Instructor: Ayan Chakrabarti (ayan@wustl.edu).
Course Staff: Zhihao Xia, Charlie Wu, Han Liu

Aug 30, 2018

• EVERYONE needs to fill out the survey.

• Set up git and Anaconda, send us your public key, and do problem set 0.
• Do immediately: submit your public key and make sure you can clone the repo.

• If you have trouble with git/Python/LaTeX setup:
• Attend Zhihao's office hours tomorrow: 10:30 AM-Noon @ Jolley 309

• This Monday is Labor Day: no office hours!
• Monday location still TBD

# Sensor

• $$E(x,y,t)$$: Light energy, per unit area per unit time, arriving at point $$(x,y)$$ at time $$t$$
• Here, $$x,y$$ are real numbers (in meters) denoting actual position on the sensor plane.
• $$I[n_x,n_y]$$: Intensity measured by the sensor element at grid location $$n_x,n_y$$
• Here, $$n_x$$, $$n_y$$ are integers, indexing pixel location.
• $$p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})$$: a spatial sensitivity function
• $$\bar{x}_{n_x},\bar{y}_{n_y}$$ is the location (in meters) of the center of the sensor element
• $$p(\cdot,\cdot)$$ is ideally 1 inside the pixel and 0 outside, but may attenuate near boundaries.
• Defining $$q$$ as the "quantum efficiency" of the sensor: the conversion factor from light energy to charge/voltage
• $$\int E(x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~q~dx~dy$$
is the rate at which charge/voltage accumulates in sensor element $$[n_x,n_y]$$ at time $$t$$.

# Sensor

• An image capture involves "exposing" the sensor for an interval of $$T$$ seconds.
• So the total recorded intensity involves integrating the charge/voltage rate over that interval.

# Sensor

$I[n_x,n_y] = \int_{t=0}^T \Bigg[\int E(x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~q~dx~dy \Bigg] dt$

• $$n_x, n_y$$ are integers indexing pixels in the image array.
• $$(x,y)$$ is the spatial location on the sensor plane (in meters).
• $$I[n_x,n_y]$$ is the recorded pixel intensity.
• $$E(x,y,t)$$ is light "power" per unit area incident at location $$(x,y)$$ on the sensor plane at time $$t$$
• $$(\bar{x}_{n_x},\bar{y}_{n_y})$$ is the "center" spatial location of the pixel / sensor element at $$[n_x,n_y]$$.
• $$p(x,y)$$ is spatial sensitivity of the sensor (might be lower near boundaries, etc.)
• $$q$$ is quantum efficiency of the sensor (photons/energy to charge/voltage)
• $$T$$ is the duration of the exposure interval.

CCD/CMOS sensors measure the total energy, i.e., "count the photons," that arrived during the exposure.
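
As a concrete illustration, here is a minimal numpy sketch of this formation model under simplifying assumptions: a box sensitivity $$p$$ (1 inside the pixel, 0 outside), $$E$$ constant in time, and made-up values for the pixel pitch, $$q$$, $$T$$, and the irradiance pattern.

```python
import numpy as np

def render(E_fn, H=4, W=4, pitch=1e-3, T=0.01, q=0.5, samples=8):
    """Approximate I0[ny, nx] by discretizing the double integral,
    with a box sensitivity p and E constant in time."""
    I0 = np.zeros((H, W))
    offs = (np.arange(samples) + 0.5) / samples * pitch  # sub-pixel offsets
    for ny in range(H):
        for nx in range(W):
            x0, y0 = nx * pitch, ny * pitch              # pixel corner (meters)
            xs, ys = np.meshgrid(x0 + offs, y0 + offs)
            rate = E_fn(xs, ys).mean() * pitch**2 * q    # spatial integral, times q
            I0[ny, nx] = rate * T                        # time integral (E constant in t)
    return I0

# Example with a hypothetical smooth irradiance pattern over the sensor plane:
I0 = render(lambda x, y: 1e6 * (1.0 + np.sin(2e3 * x) * np.cos(2e3 * y)))
```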

# Sensor

$I[n_x,n_y] = \int_{t=0}^T \Bigg[\int E(x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~q~dx~dy \Bigg] dt$

$$I[n_x,n_y]$$ as written above is really the "ideal," unquantized pixel intensity; the recorded intensity will differ. We rename this ideal value $$I^0$$, and model the recorded $$I$$ as a degraded version of it.

# Sensor

$I^0[n_x,n_y] = \int_{t=0}^T \Bigg[\int E(x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~q~dx~dy \Bigg] dt$

$I \leftarrow I^0$

#### Shot Noise

• Caused by uncertainty in photon arrival
• Actual number of photons $$K$$ is a discrete random variable with Poisson distribution
• $$P(K = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$
• $$\lambda$$ is the "expected" number of photons. In our case, $$\propto I^0$$
• Property of Poisson distribution: Mean and Variance both equal to $$\lambda$$
• Often, shot noise is modeled with additive Gaussian noise with signal dependent variance:

$I \leftarrow I^0 + \sqrt{I^0}~~\epsilon_1$

where $$\epsilon_1 \sim \mathcal{N}(0,1)$$ (Gaussian random noise with mean 0, variance 1).

Note that $$\sqrt{I^0}\epsilon_1~\sim~\mathcal{N}(0,\,I^0)$$: zero mean with variance $$I^0$$, matching the mean-variance property of the Poisson distribution above.
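
As a quick sanity check (not from the slides), a Monte Carlo comparison of the Poisson distribution and its Gaussian approximation, with an assumed expected photon count:

```python
import numpy as np

rng = np.random.default_rng(0)
I0 = 50.0  # expected photon count (assumed value)

# Poisson shot noise vs. the Gaussian model I0 + sqrt(I0)*eps1:
poisson = rng.poisson(I0, size=100_000)
gauss = I0 + np.sqrt(I0) * rng.standard_normal(100_000)

print(poisson.mean(), poisson.var())  # both ~ 50
print(gauss.mean(), gauss.var())      # both ~ 50
```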

# Sensor

$I^0[n_x,n_y] = \int_{t=0}^T \Bigg[\int E(x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~q~dx~dy \Bigg] dt$

$I \leftarrow I^0 + \sqrt{I^0}~~\epsilon_1$

• The signal is amplified by a gain $$g$$ before digitization, based on the ISO setting (higher $$g$$ for higher ISO).
• Some signal-independent Gaussian noise is added both before and after amplification.

$I \leftarrow g \times (I^0 + \sqrt{I^0}~~\epsilon_{1} + \sigma_{2a}\epsilon_{2a}) + \sigma_{2b}\epsilon_{2b}$

where $$\sigma_{2a}$$ and $$\sigma_{2b}$$ are parameters (lower for high quality sensors),
and $$\epsilon_1,\epsilon_{2a},\epsilon_{2b}$$ are $$\mathcal{N}(0,1)$$ noise variables, all independent.

Expanding, and combining the two independent additive Gaussian terms $$g\sigma_{2a}\epsilon_{2a}+\sigma_{2b}\epsilon_{2b} \sim \mathcal{N}\big(0,~g^2\sigma_{2a}^2+\sigma_{2b}^2\big)$$ into a single term with $$\epsilon_2 \sim \mathcal{N}(0,1)$$:

$I \leftarrow \underbrace{g I^0}_{\tiny \mbox{Amplified Signal}} + \underbrace{g \sqrt{I^0}~~\epsilon_1}_{\tiny \mbox{Amplified Shot Noise}} + \underbrace{\sqrt{\big(g^2\sigma_{2a}^2+\sigma_{2b}^2\big)}~~\epsilon_2}_{\tiny \mbox{Amplified and un-amplified additive noise}}$
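
A small numerical check (illustrative, with assumed parameter values) that the two independent additive Gaussian terms really combine into one with variance $$g^2\sigma_{2a}^2+\sigma_{2b}^2$$:

```python
import numpy as np

rng = np.random.default_rng(1)
g, s2a, s2b = 4.0, 2.0, 5.0                   # assumed parameter values

combined = (g * s2a * rng.standard_normal(200_000)
            + s2b * rng.standard_normal(200_000))
print(combined.var())                # empirically ~ g^2*s2a^2 + s2b^2
print(g**2 * s2a**2 + s2b**2)        # = 89.0
```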

# Sensor

$I^0[n_x,n_y] = \int_{t=0}^T \Bigg[\int E(x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~q~dx~dy \Bigg] dt$

$I \leftarrow g I^0 + g \sqrt{I^0}~~\epsilon_1 + \sqrt{\big(g^2\sigma_{2a}^2+\sigma_{2b}^2\big)}~~\epsilon_2$

#### Digitization

• The final step is rounding and clipping, performed by an analog-to-digital converter:

$I \leftarrow \text{Round}\Big(g I^0 + g \sqrt{I^0}~~\epsilon_1 + \sqrt{\big(g^2\sigma_{2a}^2+\sigma_{2b}^2\big)}~~\epsilon_2\Big)$

and then clipping at the maximum representable value $$I_\max$$:

$I = \min\Bigg(I_\max,~~\text{Round}\Big(g I^0 + g \sqrt{I^0}~~\epsilon_1 + \sqrt{\big(g^2\sigma_{2a}^2+\sigma_{2b}^2\big)}~~\epsilon_2\Big)\Bigg)$
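
Putting the pieces together, a minimal sketch of this capture model; the parameter values and the 8-bit $$I_\max$$ are assumptions for illustration:

```python
import numpy as np

def capture(I0, g=2.0, s2a=1.0, s2b=2.0, I_max=255, rng=None):
    """Simulate the capture model: gain, shot noise, additive noise,
    then digitization (round and clip at I_max)."""
    if rng is None:
        rng = np.random.default_rng()
    eps1 = rng.standard_normal(I0.shape)      # shot noise variable
    eps2 = rng.standard_normal(I0.shape)      # combined additive noise variable
    I = (g * I0 + g * np.sqrt(I0) * eps1
         + np.sqrt(g**2 * s2a**2 + s2b**2) * eps2)
    return np.minimum(I_max, np.round(I))     # digitization

I = capture(np.full((4, 4), 60.0))            # assumed ideal intensities
```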

# Sensor

$I^0[n_x,n_y] = \int_{t=0}^T \Bigg[\int E(x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~q~dx~dy \Bigg] dt$

$I = \min\Bigg(I_\max,~~\text{Round}\Big(g I^0 + g \sqrt{I^0}~~\epsilon_1 + \sqrt{\big(g^2\sigma_{2a}^2+\sigma_{2b}^2\big)}~~\epsilon_2\Big)\Bigg)$

ignoring sensor saturation, dark current, ...

#### Why study this?

• To understand the degradation process of noise (if we want to denoise / recover $$I^0$$ from $$I$$).
• To prevent degradation during capture, because we control exposure time $$T$$ and ISO / gain $$g$$.
• To understand the different trade-offs for loss of information from noise, rounding, and clipping.

# Sensor

$I^0[n_x,n_y] = \int_{t=0}^T \Bigg[\int E(x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~q~dx~dy \Bigg] dt$

$I = \min\Bigg(I_\max,~~\text{Round}\Big(g I^0 + g \sqrt{I^0}~~\epsilon_1 + \sqrt{\big(g^2\sigma_{2a}^2+\sigma_{2b}^2\big)}~~\epsilon_2\Big)\Bigg)$

### Rounding vs Clipping

Ignoring noise, what is the optimal $$g$$ for a given $$I^0[n_x,n_y]$$?

• Keep $$g$$ low enough that most values of $$g I^0[n_x,n_y]$$ stay below $$I_\max$$ (to avoid clipping).
• But if $$g$$ is too low, distinct intensities get rounded to the same value (quantization).
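
A small illustrative experiment (assumed intensity range and values of $$g$$) that makes this trade-off concrete: measure the error, after undoing the gain, as $$g$$ varies. Noise is ignored, as above.

```python
import numpy as np

rng = np.random.default_rng(2)
I0 = rng.uniform(0.0, 100.0, size=10_000)     # assumed scene intensities
I_max = 255

for g in [0.5, 1.0, 2.55, 5.0, 10.0]:
    I = np.minimum(I_max, np.round(g * I0))   # round, then clip
    err = np.abs(I / g - I0).mean()           # error back in the I0 domain
    print(f"g = {g:5.2f}   mean |error| = {err:.3f}")
# Low g: rounding error dominates. High g: clipping error dominates.
```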

# Sensor

$I^0[n_x,n_y] = \int_{t=0}^T \Bigg[\int E(x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~q~dx~dy \Bigg] dt$

$I = \min\Bigg(I_\max,~~\text{Round}\Big(g I^0 + g \sqrt{I^0}~~\epsilon_1 + \sqrt{\big(g^2\sigma_{2a}^2+\sigma_{2b}^2\big)}~~\epsilon_2\Big)\Bigg)$

Note that here, the 'ideal' intensity is $$gI^0$$; everything else is noise.

### Light vs Amplification

Say we have chosen the optimal target value for the product $$gI^0$$. Is it better:

• To have a higher $$g$$ and lower magnitude $$I^0$$
• To have a lower $$g$$ and higher magnitude $$I^0$$
• It depends on $$\sigma_{2a}$$ and $$\sigma_{2b}$$.

S. Hasinoff, F. Durand, W.T. Freeman, "Noise-Optimal Capture for High Dynamic Range Photography," CVPR 2010.
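
To make the comparison concrete, a small sketch (assumed parameter values) computing the SNR of the amplified signal under the noise model above, for two $$(g, I^0)$$ pairs with the same product:

```python
import numpy as np

def snr(I0, g, s2a=1.0, s2b=4.0):
    """SNR of the amplified signal g*I0 under the noise model above."""
    noise_var = g**2 * I0 + g**2 * s2a**2 + s2b**2
    return (g * I0) / np.sqrt(noise_var)

# Same target g*I0 = 400, reached with more light vs. more gain:
print(snr(I0=400.0, g=1.0))   # lower gain, more light
print(snr(I0=100.0, g=4.0))   # higher gain, less light
```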

# Sensor

$I^0[n_x,n_y] = \int_{t=0}^T \Bigg[\int E(x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~q~dx~dy \Bigg] dt$

So how do we increase $$I^0$$?

• Better sensors (higher $$q$$)
• Larger sensor elements: $$~~p(\cdot,\cdot) > 0$$ over a larger area.

But we've gone the other way: cameras stuff more 'megapixels' into smaller form factors.

# Sensor

$I^0[n_x,n_y] = \int_{t=0}^T \Bigg[\int E(x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~q~dx~dy \Bigg] dt$

Increase exposure time $$T$$?

• If the scene is static and the camera is stationary:
• $$E(x,y,t)$$ doesn't change with $$t \Rightarrow I^0 \propto T$$
• If the scene is moving, longer exposures cause motion blur.

# Sensor

$I^0[n_x,n_y] = \int_{t=0}^T \Bigg[\int E(x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~q~dx~dy \Bigg] dt$

Increase $$E(x,y,t)$$ itself. How?

• Take pictures outdoors, or under brighter lights.
• Don't use a pinhole camera!

# Lenses

• Dynamic range, and which part of the image should be well exposed (rounding and clipping)

• Choosing between:
• ISO i.e. Gain & noise
• Exposure Time & motion blur
• F-stop i.e. aperture size & defocus blur

# Color

$I^0[n_x,n_y] = \int_{t=0}^T \Bigg[\int E(x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~q~dx~dy \Bigg] dt$

We left out an important term in this equation: wavelength.

# Color

$I^0[n_x,n_y] = \int_{t=0}^T \Bigg[\int E(\lambda,x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~q(\lambda)~d\lambda~dx~dy \Bigg] dt$

• Light carries different amounts of power in different wavelengths
• $$E(\lambda,x,y,t)$$ now refers to power per unit area per unit wavelength
• At wavelength $$\lambda$$, incident at $$(x,y)$$ at time $$t$$
• It is both a spectral and a spatial density function
• $$q(\lambda)$$: Quantum efficiency also a function of wavelength
• CMOS/CCD sensors are sensitive (have high $$q$$) across most of the visible spectrum
• Their sensitivity actually extends beyond visible wavelengths (into the near infrared)
• This is why cameras have an NIR filter: to prevent NIR radiation from being 'superimposed' on the image

Q: But this measures 'total' power in all wavelengths. How do we measure color?

Ans: By putting a color filter in front of each sensor element.

# Color

$I^0[n_x,n_y,c] = \int_{t=0}^T \Bigg[\int E(\lambda,x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~\Pi_c(\lambda)~q(\lambda)~d\lambda~dx~dy \Bigg] dt$

$\text{for}~~~~c \in \{R,G,B\}$

• $$\Pi_c$$ is the transmittance of a color filter for color channel $$c$$
• E.g., $$~~\Pi_R$$ will transmit power in (be high for) wavelengths in the red part of the visible spectrum
and attenuate power in (be low for) other wavelengths.
• Sometimes also called "color matching functions"
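
A minimal sketch of the per-channel spectral integral; the Gaussian filter shapes, flat $$q(\lambda)$$, and spectral power density below are illustrative assumptions, not real camera curves:

```python
import numpy as np

lam = np.linspace(400.0, 700.0, 301)      # visible wavelengths (nm)
q = np.full_like(lam, 0.5)                # assumed flat quantum efficiency

def gaussian(center, width=30.0):
    return np.exp(-0.5 * ((lam - center) / width) ** 2)

# Assumed filter transmittances Pi_c, peaked in the R, G, B bands:
Pi = {"R": gaussian(600.0), "G": gaussian(540.0), "B": gaussian(460.0)}
E = 1.0 + 0.5 * np.sin(lam / 40.0)        # assumed spectral power density

# Channel response: integral of E * Pi_c * q over wavelength
response = {c: np.trapz(E * Pi[c] * q, lam) for c in "RGB"}
print(response)
```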

# Color

$I^0[n_x,n_y,c] = \int_{t=0}^T \Bigg[\int E(\lambda,x,y,t)~~p(x-\bar{x}_{n_x},y-\bar{y}_{n_y})~\Pi_c(\lambda)~q(\lambda)~d\lambda~dx~dy \Bigg] dt$

$\text{for}~~~~c \in \{R,G,B\}$

• But we can only put one filter in front of each sensor element / pixel location.
• So color cameras "multiplex" color measurements: they measure a different color channel at each location.
• Usually in an alternating pattern called the Bayer pattern.
• Note: a disadvantage is that color filters block light, so measured $$I^0$$ values are lower.
• That's why black and white / grayscale cameras are "faster" than color cameras.
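
A minimal sketch of this multiplexing: keep one color measurement per pixel in an RGGB layout (the specific layout is an assumption; Bayer variants differ):

```python
import numpy as np

def bayer_mosaic(rgb):
    """Keep one color channel per pixel, RGGB layout (rgb: H x W x 3)."""
    H, W, _ = rgb.shape
    mosaic = np.zeros((H, W))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]   # R at even rows, even cols
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]   # G at even rows, odd cols
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]   # G at odd rows, even cols
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]   # B at odd rows, odd cols
    return mosaic

mosaic = bayer_mosaic(np.random.rand(4, 6, 3))
```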

# Final Steps

Final steps in camera processing pipelines (except for some DSLR cameras shooting in RAW):

• Map Camera Colors to Standard RGB:
• Cameras often use their own color filters $$\Pi_c$$.
• Apply a linear transformation to map those measurements to standard RGB.
• White-balance: scale color channels to remove color cast from a non-neutral illuminant.
• Tone-mapping:
• The simplest form is "gamma correction": approximately raising each intensity to the power $$(1/2.2)$$ (see the sketch below)
• Based on a standard developed around what old display devices expected
• Fits the full set of measurable colors into the gamut that can be displayed / printed
• Modern cameras often do more advanced processing (to make colors look vibrant)
• Compression

And that's how you get your PNG / JPEG images!
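
As a concrete example of the simplest tone-mapping step named above, a minimal gamma-correction sketch on linear intensities normalized to $$[0,1]$$:

```python
import numpy as np

def gamma_correct(linear, gamma=2.2):
    """Map linear intensities in [0, 1] to display-encoded values."""
    return np.clip(linear, 0.0, 1.0) ** (1.0 / gamma)

print(gamma_correct(np.linspace(0.0, 1.0, 5)))   # dark values are boosted most
```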

# Other Effects

Other effects we did not talk about. E.g.,

• Real lenses are not thin lenses and have distortions.

# Other Effects

Other effects we did not talk about. E.g.,

Rolling Shutter: no explicit shutter; exposure is set by when each pixel is reset electronically (along scanlines)

# Images

• Exist as 2-D (grayscale) or 3-D (color) arrays

• Precision: uint8 (0-255), uint16 (0-65535), floating point (0-1); see the conversion sketch below
• We will often treat them as (positive) real numbers.
• Conventions:
• $$I[n_x,n_y] \in \mathbb{R}$$
• $$I[n_x,n_y,c] \in \mathbb{R}$$
• $$I[n_x,n_y] \in \mathbb{R}^3$$
• $$I[n] \in \mathbb{R}$$ or $$\in \mathbb{R}^3$$, where $$n \in \mathbb{Z}^2$$
• How do you process / manipulate these arrays?
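
A minimal sketch of moving between these precisions: treat stored uint8 data as floats in $$[0,1]$$ for processing, then convert back:

```python
import numpy as np

img_u8 = np.random.randint(0, 256, (4, 4), dtype=np.uint8)  # stored image

img_f = img_u8.astype(np.float64) / 255.0            # uint8 -> float in [0, 1]
img_back = np.round(img_f * 255.0).astype(np.uint8)  # float -> uint8
assert np.array_equal(img_u8, img_back)
```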

# Point-wise Operations

• $$Y[n] = h(X[n])$$
• $$Y[n] = h(X_1[n],X_2[n],\ldots)$$
• $$Y[n] = h_n(X[n])$$ - Might vary based on location.
• $$h(\cdot)$$ itself might be based on 'global statistics'
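
A minimal sketch of point-wise operations on an image array; the specific choices of $$h$$ (a gamma curve, and a contrast stretch that uses global statistics) are illustrative assumptions:

```python
import numpy as np

X = np.random.rand(4, 4)             # an example grayscale image array

# h fixed and applied identically at every pixel: a gamma curve
Y1 = X ** (1.0 / 2.2)

# h based on 'global statistics' of the image: a contrast stretch
Y2 = (X - X.min()) / (X.max() - X.min() + 1e-8)
```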