CNN

Posted on June 2, 2019
Tags: machinelearning

1 Week 1

The core problem with feeding images to a plain neural network is parameter count. A 1000 x 1000 black-and-white image has 1 million input values; connect it to a hidden layer of just 1000 units and we have 1 billion parameters.
The solution is convolution, which is a type of information compression.
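
The arithmetic behind that claim can be checked in a few lines. This is only a sketch: biases are ignored, and the convolutional layer used for comparison (10 kernels of size 5 x 5) is an assumed example, not something from the course.

```python
# Weight counts for a 1000 x 1000 black-and-white image (biases ignored).
height, width = 1000, 1000
hidden_units = 1000          # size of the fully connected hidden layer

inputs = height * width                  # 1,000,000 input values
dense_weights = inputs * hidden_units    # every input connects to every unit

# Assumed for comparison: 10 kernels of size 5 x 5, reused across the image.
kernel_size, kernel_count = 5, 10
conv_weights = kernel_size * kernel_size * kernel_count

print(f"fully connected: {dense_weights:,} weights")
print(f"convolutional:   {conv_weights:,} weights")
```

The kernel weights do not grow with the image size, which is where the saving comes from.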

1.1 Problem Types

  • Image classification
  • Object detection
    • Bounding box and position detection
  • Style transfer - art generation

2 Transformations

2.1 Kernel(Filter)

  • Kernels aka filters - a sliding-window operation that applies a convolution matrix to your image and may reduce its dimensions.

Convolution is elementwise multiplication followed by a sum. It is NOT the dot-product style matrix multiplication.
Convolving X with a kernel means each submatrix of X is multiplied elementwise by the kernel matrix and the products are summed. Example with the kernel

\[kernel = \begin{bmatrix} 2 & 5 \\ -7 & -6 \end{bmatrix}\]

\[\begin{bmatrix} a & b & c\\ d & e & f\\ g & h & i\\ j & k & l \end{bmatrix} \star \begin{bmatrix} 2 & 5 \\ -7 & -6 \end{bmatrix} = \begin{bmatrix} Conv1 & Conv2\\ Conv3 & Conv4 \\ Conv5 & Conv6 \end{bmatrix}\]

\[Conv1 = Sum(\begin{bmatrix} 2a & 5b \\ -7d & -6e \end{bmatrix}) = 2a + 5b -7d -6e\]
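
The same operation can be sketched in plain Python. The function name `convolve_valid` and the numbers standing in for a..l are made up for illustration. The sketch assumes stride 1 and no padding, and like the example above it does not flip the kernel.

```python
def convolve_valid(image, kernel):
    """Slide kernel over image (stride 1, no padding); each output entry is
    the sum of the elementwise products of the kernel and the window."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

kernel = [[2, 5], [-7, -6]]
# Hypothetical values standing in for a..l in the 4 x 3 matrix above.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9],
         [10, 11, 12]]

result = convolve_valid(image, kernel)
print(result)  # 3 x 2 output: Conv1..Conv6
```

With these values Conv1 = 2(1) + 5(2) - 7(4) - 6(5) = -46, which is the first entry of `result`.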

2.2 Pooling

  • Pooling - a sliding-window aggregation (compression) operation that reduces the dimensions.
    • Max pooling
    • Min pooling
    • L2 pooling
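
A minimal sketch of the three pooling variants follows. It assumes non-overlapping windows (stride equal to the window size), which is a common choice but not the only one. The helper names `pool` and `l2` are hypothetical.

```python
import math

def pool(image, size, reducer):
    """Apply reducer to each non-overlapping size x size window."""
    rows = range(0, len(image) - size + 1, size)
    cols = range(0, len(image[0]) - size + 1, size)
    return [
        [reducer([image[r + i][c + j]
                  for i in range(size) for j in range(size)])
         for c in cols]
        for r in rows
    ]

def l2(values):
    """L2 norm of the window: square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in values))

image = [[1, 3, 2, 0],
         [4, 2, 1, 5],
         [0, 0, 3, 4],
         [6, 8, 0, 0]]

print(pool(image, 2, max))  # max pooling
print(pool(image, 2, min))  # min pooling
print(pool(image, 2, l2))   # L2 pooling
```

Each call turns the 4 x 4 input into a 2 x 2 output; only the aggregation function changes.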

2.3 Kernel(filter) vs Pooling

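They are very similar: both slide a window over the layer or input. The difference is that a kernel (filter) has a convolution matrix which is applied to each window.

A kernel (filter) can be thought of as a transform with the convolution matrix followed by a pool: the window is multiplied elementwise by the matrix, then the products are aggregated by summing. In that sense pooling is an implicit operation of kernels (filters).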

2.4 Padding

  • Adding extra data all around the edges of the original input image.
    • Ex. 2-by-3 image is padded to 4-by-5

o o o  =PADDING=>  P P P P P
o o o              P o o o P
                   P o o o P
                   P P P P P
  • Why?
    • A kernel(filter) is a sliding window, so on an unpadded image it covers a corner pixel only once and edge pixels far less often than interior pixels.
      • No padding means you lose a lot of the information on the edges of the image.
    • Every kernel(filter) or pooling step decreases the dimensions, and in the extreme case the image collapses to a 1-by-1 pixel, which padding prevents.
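
The 2-by-3 to 4-by-5 example can be reproduced with a small helper. This sketch assumes zero padding (filling the border with 0), which is one common choice. The function name `pad` is hypothetical.

```python
def pad(image, amount=1, fill=0):
    """Surround image with `amount` rows/columns of `fill` on every side."""
    width = len(image[0]) + 2 * amount
    border = [[fill] * width for _ in range(amount)]
    middle = [[fill] * amount + list(row) + [fill] * amount for row in image]
    return border + middle + [row[:] for row in border]

image = [[1, 2, 3],
         [4, 5, 6]]          # 2-by-3
padded = pad(image)
for row in padded:
    print(row)
print(len(padded), "by", len(padded[0]))
```

With a 3-by-3 kernel, the padded 4-by-5 input gives a (4-3+1)-by-(5-3+1) = 2-by-3 output, so the original size is preserved.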

2.5 Channels

  • Alternative terminology: channel = depth of the 3D volume
  • Channels just mean an extra dimension. RGB uses 3 channels, so we get 3 matrices stacked on top of each other
    • e.g. a 100-by-50 RGB image has dimensions 100-50-3
  • The number of channels of the input must match the number of channels of the kernel(filter)

Convolving an RGB image with dimensions 100-50-3 with a 6-6-3 filter results in a 95-45-1 matrix. How did we calculate the output dimensions 95-45-1? With stride 1 and no padding:
(InputDim-ConvolveDim+1)=OutputDim
(100-6+1)=95
(50-6+1)=45
(3-3+1)=1

Observe how the RGB dimension collapses when we convolve with the matching 6-6-3 convolution filter.

  • KNOW that we can use multiple convolution filters and stack the results, meaning we can get an output of dimension 95-45-N
    • N is the number of convolution filters we used on the original input.

THE # OF INPUT CHANNELS MUST EQUAL THE # OF CONVOLUTION CHANNELS, BUT NEITHER HAS ANY RELATION TO THE # OF OUTPUT CHANNELS, WHICH EQUALS THE # OF FILTERS
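
The dimension rule and the channel rule can be combined into one small function. This is a sketch assuming stride 1 and no padding, as in the calculation above. The name `conv_output_shape` is made up for illustration.

```python
def conv_output_shape(input_shape, kernel_shape, filter_count=1):
    """Output shape for stride 1, no padding: (InputDim - ConvolveDim + 1)
    per spatial axis; the channel axis is replaced by the filter count."""
    in_h, in_w, in_c = input_shape
    k_h, k_w, k_c = kernel_shape
    if in_c != k_c:
        raise ValueError("input channels must equal kernel channels")
    return (in_h - k_h + 1, in_w - k_w + 1, filter_count)

print(conv_output_shape((100, 50, 3), (6, 6, 3)))      # one filter
print(conv_output_shape((100, 50, 3), (6, 6, 3), 10))  # ten filters stacked
```

A mismatched kernel such as 6-6-1 on the RGB input raises an error. The filter count is free to be any N.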

3 Stride

4 Example dimensions

\[\displaylines{ \overbrace{ 32\times32\times{\color{green}\overset{channel}{3}}}^{Input} \overbrace{ \overset{Convolve}{\underset{\displaylines{{\color{green}channel=3}\\stride=1\\pad=0\\Convolve=5\times5\\filterCount=6}}{\longrightarrow}} \overset{32-5+1}{28}\times\overset{32-5+1}{28}\times6 \overset{Pool}{\underset{\displaylines{stride=2\\pad=0\\Pool=2\times2}}{\longrightarrow}} 14\times14\times6}^{Layer\ 1} \overbrace{ \overset{Convolve}{\underset{\displaylines{{\color{green}channel=6}\\stride=1\\pad=0\\Convolve=5\times5\\filterCount=8}}{\longrightarrow}}\overset{14-5+1}{10}\times\overset{14-5+1}{10}\times8 \overset{Pool}{\underset{\displaylines{stride=2\\pad=0\\Pool=2\times2}}{\longrightarrow}} 5\times5\times8 =200 \rightarrow}^{Layer\ 2} \\ \overset{Matrix^{200\times60}}{\longrightarrow} 60}\]

Assumed here: pooling uses a 2x2 window with stride 2 (so 28 halves to 14), and Layer 2 uses 8 filters (so the flattened size is 5x5x8 = 200, matching the 200x60 matrix).

5 Week 2

5.1 ConvNet

In practice, start from an open-source ResNet, then use transfer learning.

6 Transfer learning

7 Mathematica

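(* Get a random photo and convert it to black and white *)
randphoto = ResourceFunction["RandomPhoto"][200];
rphoto2 = Binarize@randphoto;
(* 3x3 kernels that respond to horizontal and vertical lines *)
hlinekernel = {{-1, -1, -1}, {2, 2, 2}, {-1, -1, -1}};
vlinekernel = {{-1, 2, -1}, {-1, 2, -1}, {-1, 2, -1}};
(* Convolve with each kernel, then combine the two responses *)
a = ImageConvolve[rphoto2, vlinekernel]
b = ImageConvolve[rphoto2, hlinekernel]
ImageAdd[a, b]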
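
For readers without Mathematica, here is a rough Python analogue of the same idea on a tiny hand-made binary image. It is only a sketch: the image and the helper `convolve_valid` are hypothetical, and it does not try to match how ImageConvolve handles image borders or pixel value ranges.

```python
def convolve_valid(image, kernel):
    """Stride 1, no padding: sum of elementwise products per window."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(len(image[0]) - kw + 1)
        ]
        for r in range(len(image) - kh + 1)
    ]

hline_kernel = [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]]
vline_kernel = [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]]

# Hypothetical 5 x 5 binary image containing one horizontal line.
image = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]

h_response = convolve_valid(image, hline_kernel)
v_response = convolve_valid(image, vline_kernel)
combined = [[h + v for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(h_response, v_response)]

print(h_response)  # strongest where the window is centred on the line
print(v_response)  # all zeros: no vertical line in the image
print(combined)
```

The horizontal kernel gives its largest value (6) on the row containing the line, while the vertical kernel cancels out to 0 everywhere.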

8 Increasing Dataset size

Tactics