CNN

Posted on June 2, 2019

1 Week 1
- 1.1 Problem Types
2 Transformations
3 Stride
4 Example dimensions
5 Week 2
- 5.1 ConvNet
6 Transfer learning
7 Mathematica
8 Increasing Dataset size

1 Week 1

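The core problem with images and neural networks: a 1000 x 1000 black-and-white image has 1 million input values. Connect it to a hidden layer of just 1000 units and that single fully connected layer already has 1 billion parameters.
The solution is convolution, which is a type of information compression: a small kernel is reused at every position of the image, so the parameter count depends on the kernel size, not the image size.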
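
The arithmetic above can be checked with a quick sketch. The 5x5 kernel size is an assumption picked for illustration; the notes do not fix one at this point.

```python
# Compare parameter counts: one fully connected layer vs. one convolution kernel.
height, width = 1000, 1000   # black-and-white image from the text
hidden_units = 1000          # size of the first hidden layer

inputs = height * width                  # 1,000,000 input values
dense_weights = inputs * hidden_units    # one weight per (input, hidden unit) pair

kernel_size = 5                                # assumed 5x5 kernel, illustration only
conv_weights = kernel_size * kernel_size + 1   # shared weights plus one bias

print(dense_weights)  # 1000000000
print(conv_weights)   # 26
```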

1.1 Problem Types

Image classification
Object detection
- Bounding box and position detection
Style transfer - art generation

2 Transformations

2.1 Kernel(Filter)

Kernels, aka filters: a sliding-window operation that applies a small convolution matrix at every position of the image. The output is usually smaller than the input.

Convolution here means elementwise multiplication followed by a sum. It is NOT dot-product matrix multiplication.
X convolved with a kernel means:
at each window position, take the submatrix of X under the kernel, multiply it elementwise by the kernel matrix, and sum the products. Example with the kernel

\[kernel = \begin{bmatrix} 2 & 5 \\ -7 & -6 \end{bmatrix}\]

\[\begin{bmatrix} a & b & c\\ d & e & f\\ g & h & i\\ j & k & l \end{bmatrix} \star \begin{bmatrix} 2 & 5 \\ -7 & -6 \end{bmatrix} = \begin{bmatrix} Conv1 & Conv2\\ Conv3 & Conv4 \\ Conv5 & Conv6 \end{bmatrix}\]

\[Conv1 = \operatorname{Sum}\left(\begin{bmatrix} 2a & 5b \\ -7d & -6e \end{bmatrix}\right) = 2a + 5b -7d -6e\]
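
Here is a minimal sketch of the same operation in plain Python, assuming stride 1 and no padding. The function name convolve2d and the numbers standing in for a..l are made up for illustration.

```python
def convolve2d(image, kernel):
    """Slide kernel over image (stride 1, no padding); multiply elementwise, then sum."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


kernel = [[2, 5], [-7, -6]]
# Made-up values standing in for a..l in the 4-by-3 matrix above.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9],
         [10, 11, 12]]
result = convolve2d(image, kernel)
print(result)  # [[-46, -52], [-64, -70], [-82, -88]]
```

The top-left entry is Conv1 = 2*1 + 5*2 - 7*4 - 6*5 = -46, and the 4-by-3 input shrinks to a 3-by-2 output.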

2.2 Pooling

Pooling: a sliding-window aggregation that replaces each window with a single value, which shrinks the dimensions. Variants:
- Max pooling
- Min pooling
- L2 pooling
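
The three variants differ only in the function applied to each window. A small sketch, assuming a square window and taking L2 pooling to mean the square root of the sum of squares; pool2d and l2 are illustrative names, not from any library.

```python
import math


def pool2d(image, size, stride, reduce_fn):
    """Apply reduce_fn to each size-by-size window, moving `stride` cells per step."""
    out = []
    for r in range(0, len(image) - size + 1, stride):
        row = []
        for c in range(0, len(image[0]) - size + 1, stride):
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(reduce_fn(window))
        out.append(row)
    return out


def l2(window):
    """L2 norm of the window values."""
    return math.sqrt(sum(v * v for v in window))


image = [[1, 3, 2, 4],
         [5, 6, 1, 2],
         [7, 2, 9, 0],
         [3, 4, 1, 8]]
max_pooled = pool2d(image, 2, 2, max)  # [[6, 4], [7, 9]]
min_pooled = pool2d(image, 2, 2, min)  # [[1, 1], [2, 0]]
l2_pooled = pool2d(image, 2, 2, l2)    # top-left entry is sqrt(71)
```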

2.3 Kernel(filter) vs Pooling

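They are very similar: both slide a window over the input. The difference is that a kernel (filter) has a convolution matrix of weights which is applied to the layer or input, while pooling has no weights at all.

A kernel (filter) can be thought of as "transform with the convolution matrix, then pool": multiply each window elementwise by the kernel, then aggregate with a sum. In that sense pooling is an implicit operation of kernels (filters).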

2.4 Padding

Adding extra data all around the edges of the original input image.
- Ex. 2-by-3 image is padded to 4-by-5

o o o  =PADDING=>  P P P P P
o o o              P o o o P
                   P o o o P
                   P P P P P

Why?
- A kernel (filter) is a sliding window, so on an unpadded image it covers a corner pixel only once and edge pixels far less often than interior pixels.
  - No padding means you lose a lot of the information on the edges of the image.
- Every kernel (filter) or pooling step shrinks the dimensions, and in the extreme case the image collapses to a 1-by-1 pixel. Padding prevents this.
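
A minimal sketch of the 2-by-3 to 4-by-5 example. Filling the border with zeros is an assumption (the notes only say "extra data"), and pad is an illustrative helper name.

```python
def pad(image, p, value=0):
    """Surround image with a border p cells wide, filled with `value`."""
    width = len(image[0]) + 2 * p
    blank = [value] * width
    padded = [list(blank) for _ in range(p)]          # top border rows
    for row in image:
        padded.append([value] * p + list(row) + [value] * p)
    padded += [list(blank) for _ in range(p)]         # bottom border rows
    return padded


image = [[1, 2, 3],
         [4, 5, 6]]
padded = pad(image, 1)
for row in padded:
    print(row)
# [0, 0, 0, 0, 0]
# [0, 1, 2, 3, 0]
# [0, 4, 5, 6, 0]
# [0, 0, 0, 0, 0]
```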

2.5 Channels

A channel is also called the depth of the 3D volume.
Channels are just an extra dimension: RGB uses 3 channels, so we get 3 matrices stacked on top of each other.
- e.g. a 100-by-50 RGB image has dimensions 100-50-3
The number of channels of the input must match the number of channels of the kernel (filter).

Convolving an RGB image of dimensions 100-50-3 with a 6-6-3 filter results in a 95-45-1 matrix. How did we calculate the output dimensions 95-45-1?
(InputDim-ConvolveDim+1)=OutputDim, applied to each dimension (stride 1, no padding)
(100-6+1)=95
(50-6+1)=45
(3-3+1)=1

Observe how the RGB dimension collapses when we convolve with a 6-6-3 filter, whose channel count matches the input.

Note that we can apply multiple convolution filters and stack the results, giving an output of dimension 95-45-N
- N is the number of convolution filters we used on the original input.

THE # OF INPUT CHANNELS MUST EQUAL THE # OF CHANNELS IN EACH CONVOLUTION FILTER, BUT NEITHER HAS ANY RELATION TO THE # OF OUTPUT CHANNELS, WHICH IS SET BY THE # OF FILTERS
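
The channel rules can be written down as a small shape calculator. This is a sketch assuming stride 1 and no padding; conv_output_shape is a hypothetical helper, not a framework function.

```python
def conv_output_shape(input_shape, filter_shape, num_filters):
    """Output shape of convolving an (H, W, C) input with num_filters (h, w, C) filters."""
    in_h, in_w, in_c = input_shape
    f_h, f_w, f_c = filter_shape
    if in_c != f_c:
        raise ValueError("input channels must equal filter channels")
    # Height and width shrink by (filter size - 1); depth becomes the filter count.
    return (in_h - f_h + 1, in_w - f_w + 1, num_filters)


print(conv_output_shape((100, 50, 3), (6, 6, 3), 1))   # (95, 45, 1)
print(conv_output_shape((100, 50, 3), (6, 6, 3), 10))  # (95, 45, 10)
```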

3 Stride

Kernels (filters) and pooling are both sliding windows.
Stride is how many units the window slides at each step. A larger stride gives a smaller output.
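
With stride and padding included, the usual size formula along one dimension is floor((n + 2*pad - f) / stride) + 1. A short sketch; output_size is an illustrative name.

```python
def output_size(n, f, stride=1, pad=0):
    """Output length along one dimension for input n and window f."""
    return (n + 2 * pad - f) // stride + 1


print(output_size(100, 6))           # 95: stride 1, no padding
print(output_size(28, 2, stride=2))  # 14: a 2-wide window moving 2 units per step
print(output_size(7, 3, stride=2))   # 3
```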

4 Example dimensions

\[\displaylines{ \overbrace{ 32\times32\times{\color{green}\overset{channel}{3}}}^{Input} \overbrace{ \overset{Convolve}{\underset{\displaylines{{\color{green}channel=3}\\stride=1\\pad=0\\Convolve=5\times5\\filterCount=6}}{\longrightarrow}} \overset{32-5+1}{28}\times\overset{32-5+1}{28}\times6 \overset{Pool}{\underset{\displaylines{window=2\times2\\stride=2\\pad=0}}{\longrightarrow}} 14\times14\times6}^{Layer\ 1} \\ \overbrace{ \overset{Convolve}{\underset{\displaylines{{\color{green}channel=6}\\stride=1\\pad=0\\Convolve=5\times5\\filterCount=8}}{\longrightarrow}} \overset{14-5+1}{10}\times\overset{14-5+1}{10}\times8 \overset{Pool}{\underset{\displaylines{window=2\times2\\stride=2\\pad=0}}{\longrightarrow}} 5\times5\times8 =200}^{Layer\ 2} \\ \overset{Matrix^{200\times60}}{\longrightarrow} 60}\]

The pool is taken to be a 2x2 window with stride 2, which is what halves 28 to 14. Layer 2 repeats the same arithmetic on the 14x14x6 volume: the 5x5 convolution gives 14-5+1=10, the pool halves that to 5, and the flattened size of 200 corresponds to 8 filters (5x5x8=200). The 200x60 matrix then maps those 200 values to 60.

Notice that this diagram shows NO activation (non-linear function) after each layer (convolution + pooling); the only activation comes at the end. In practice a non-linearity such as ReLU is usually applied to the output of each convolution as well.

5 Week 2

5.1 ConvNet

In practice, start from an open-source ResNet with pretrained weights, then use transfer learning.

6 Transfer learning

Take a pre-existing model, freeze all of its layers (that is, freeze the trained weights or coefficients), and throw away the last layer (the output layer, typically a softmax over the original classes).
- Why throw away the last layer?
  - Ex: The existing model outputs 100 classes as a 100-vector, but you only want 2 classes. So you throw away the last layer, which outputs the 100-vector, and add your own layer that outputs a 2-vector.
- Training means you only train the last layer you replaced, NOT the frozen layers of the previous model.
Typically you do this with a per-layer flag along the lines of trainableparameters=0 or freeze=1, which tells the ML framework not to train these layers (for example, layer.trainable = False in Keras or requires_grad = False in PyTorch).
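
The freeze-and-replace idea can be shown with a toy in plain Python, with no ML framework involved. Everything here is made up for illustration: the "pretrained" weights, the tiny dataset, and the helper names. The new head is a two-class logistic regression trained by gradient descent, and the frozen layer is simply never updated.

```python
import math

# Hypothetical "pretrained" layer: fixed weights mapping 2 inputs to 2 features.
frozen_weights = [[1.0, -1.0], [0.5, 0.5]]


def features(x):
    """Frozen layer: linear map followed by ReLU. Never updated during training."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in frozen_weights]


def predict(head, x):
    """New trainable head: logistic regression on top of the frozen features."""
    z = head["b"] + sum(w * f for w, f in zip(head["w"], features(x)))
    return 1.0 / (1.0 + math.exp(-z))


def loss(head, data):
    """Mean cross-entropy over the dataset."""
    total = 0.0
    for x, y in data:
        p = predict(head, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def train_head(head, data, lr=0.1, steps=200):
    """Gradient descent that touches only the head's weights and bias."""
    for _ in range(steps):
        grad_w = [0.0] * len(head["w"])
        grad_b = 0.0
        for x, y in data:
            err = predict(head, x) - y
            f = features(x)
            for i in range(len(grad_w)):
                grad_w[i] += err * f[i]
            grad_b += err
        head["w"] = [w - lr * g / len(data) for w, g in zip(head["w"], grad_w)]
        head["b"] -= lr * grad_b / len(data)


# Made-up two-class dataset: (input, label).
data = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], 0), ([1.0, 3.0], 0)]
head = {"w": [0.0, 0.0], "b": 0.0}
frozen_before = [row[:] for row in frozen_weights]

loss_before = loss(head, data)
train_head(head, data)
loss_after = loss(head, data)

print(frozen_weights == frozen_before)  # True: the frozen layer did not change
print(loss_after < loss_before)         # True: only the head learned
```

In a real framework the frozen part would be the pretrained network and the head would be the replacement output layer; the division of labour is the same.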

7 Mathematica

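(* Fetch a random photo; RandomPhoto comes from the Wolfram Function Repository *)
randphoto = ResourceFunction["RandomPhoto"][200];
(* Convert it to a black-and-white image *)
rphoto2 = Binarize@randphoto;
(* 3x3 kernels that respond to horizontal and vertical lines *)
hlinekernel = {{-1,-1,-1},{2,2,2},{-1,-1,-1}};
vlinekernel = {{-1,2,-1},{-1,2,-1},{-1,2,-1}};

(* Vertical-line response *)
a = ImageConvolve[rphoto2,vlinekernel]

(* Horizontal-line response *)
b = ImageConvolve[rphoto2,hlinekernel]

(* Combine both responses into one image *)
ImageAdd[a,b]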

8 Increasing Dataset size

Tactics

Making a mirror duplicate
Random cropping
Rotation
Shearing
Local Warping
Color Shifting
- PCA color augmentation
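
A few of these tactics are easy to sketch on an image stored as a list of rows. This is only an illustration with made-up helper names; real pipelines normally use an image library, and shearing, warping and color shifting are left out here.

```python
import random


def mirror(image):
    """Flip the image left to right."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h-by-crop_w patch at a random position."""
    top = rng.randrange(len(image) - crop_h + 1)
    left = rng.randrange(len(image[0]) - crop_w + 1)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
rng = random.Random(0)  # seeded so the crop is repeatable

mirrored = mirror(image)    # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
rotated = rotate90(image)   # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
cropped = random_crop(image, 2, 2, rng)

# One original image becomes four training examples.
augmented = [image, mirrored, rotated, cropped]
```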