CNN
1 Week 1
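Core problem with images and fully connected neural networks: a 1000 x 1000 black-and-white image has 1 million input values, so connecting it to a hidden layer of just 1000 units already requires 1 billion parameters.
The solution is convolution, which reuses one small set of kernel weights across the whole image and so acts as a type of information compression.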
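The arithmetic behind that claim can be sketched in a few lines. The 5x5 kernel size and the count of 1000 filters are assumptions chosen only for comparison; they do not come from the notes.

```python
# Fully connected layer: every input pixel connects to every hidden unit.
height, width = 1000, 1000           # black-and-white image from the notes
hidden_units = 1000                  # size of the hidden layer from the notes
inputs = height * width              # 1,000,000 input values
dense_weights = inputs * hidden_units

# Convolutional layer: the same small kernel is reused at every position.
# ASSUMPTION: 5x5 kernels and 1000 filters, picked only for comparison.
kernel_size = 5
filters = 1000
conv_weights = kernel_size * kernel_size * filters

print(dense_weights)   # 1000000000
print(conv_weights)    # 25000
```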
1.1 Problem Types
- Image classification
- Object detection
- Object localization (bounding box and position detection)
- Style transfer - art generation
2 Transformations
2.1 Kernel(Filter)
- Kernels (aka filters): a sliding-window operation that applies a small convolution matrix across your image, usually producing an output with smaller dimensions.
Convolution here is elementwise multiplication of a window with the kernel, followed by a sum of the products. It is NOT matrix multiplication.
X convolved with a kernel means:
for each submatrix(X) under the window, multiply elementwise by the kernel matrix, then sum.
Example with the kernel
\[kernel = \begin{bmatrix} 2 & 5 \\ -7 & -6 \end{bmatrix}\]
\[\begin{bmatrix} a & b & c\\ d & e & f\\ g & h & i\\ j & k & l \end{bmatrix} \star \begin{bmatrix} 2 & 5 \\ -7 & -6 \end{bmatrix} = \begin{bmatrix} Conv1 & Conv2\\ Conv3 & Conv4 \\ Conv5 & Conv6 \end{bmatrix}\]
\[Conv1 = Sum(\begin{bmatrix} 2a & 5b \\ -7d & -6e \end{bmatrix}) = 2a + 5b -7d -6e\]
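A minimal sketch of the same operation in plain Python, assuming no padding and a stride of 1. The function name `convolve2d` and the substitution of 1..12 for the letters a..l are illustrative choices, not part of the notes.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1).

    At each position the window is multiplied elementwise by the kernel
    and the products are summed. The kernel is not flipped, as in the notes.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# The 4-by-3 matrix from the example, with a..l replaced by 1..12.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
kernel = [[2, 5], [-7, -6]]
result = convolve2d(image, kernel)
print(result)  # [[-46, -52], [-64, -70], [-82, -88]]
```

With a=1, b=2, d=4, e=5 the first entry is 2a + 5b - 7d - 6e = 2 + 10 - 28 - 30 = -46, which matches Conv1.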
2.2 Pooling
- Pooling: a sliding-window aggregation that summarizes each window with a single value, which compresses the dimensions.
- Max pooling
- Min pooling
- L2 pooling
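The three pooling variants differ only in the function used to summarize each window. A minimal sketch, assuming a 2-by-2 window with stride 2 (both values are assumptions, as are the helper names):

```python
import math


def pool2d(image, size, stride, reduce_fn):
    """Apply `reduce_fn` to each size-by-size window, moving `stride` units per step."""
    out = []
    for r in range(0, len(image) - size + 1, stride):
        row = []
        for c in range(0, len(image[0]) - size + 1, stride):
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(reduce_fn(window))
        out.append(row)
    return out


def l2_norm(values):
    """Square root of the sum of squares, used for L2 pooling."""
    return math.sqrt(sum(v * v for v in values))


image = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 8, 3],
    [0, 1, 4, 9],
]
max_pooled = pool2d(image, 2, 2, max)
min_pooled = pool2d(image, 2, 2, min)
l2_pooled = pool2d(image, 2, 2, l2_norm)
print(max_pooled)  # [[6, 5], [7, 9]]
print(min_pooled)  # [[1, 0], [0, 3]]
```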
2.3 Kernel(filter) vs Pooling
They are very similar, but kernels(filters) have a convolution matrix which is applied to the layer or input, while pooling has no weights.
A kernel(filter) can be thought of as a transform with the convolution matrix followed by a sum over the window; in that sense a pooling-style aggregation is an implicit step of every kernel(filter).
2.4 Padding
- Adding extra data all around the edges of the original input image.
- Ex. a 2-by-3 image is padded to 4-by-5:
o o o  =PADDING=>  P P P P P
o o o              P o o o P
                   P o o o P
                   P P P P P
- Why?
- A kernel(filter) is a sliding window, so on an unpadded image it touches a corner pixel only once and edge pixels far less often than interior pixels.
- Without padding you lose a lot of the information on the edges of the image.
- Every kernel(filter) or pooling step shrinks the dimensions, and in the extreme case the image collapses to a 1-by-1 pixel; padding prevents this.
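The 2-by-3 to 4-by-5 example can be reproduced with a small helper. This is a sketch that assumes the padding value is 0 (zero padding); the notes do not say which value P stands for, and `pad2d` is an illustrative name.

```python
def pad2d(image, pad, value=0):
    """Surround `image` with `pad` rows and columns of `value` on every side."""
    width = len(image[0]) + 2 * pad
    top = [[value] * width for _ in range(pad)]
    middle = [[value] * pad + list(row) + [value] * pad for row in image]
    bottom = [[value] * width for _ in range(pad)]
    return top + middle + bottom


image = [[1, 1, 1], [1, 1, 1]]   # the 2-by-3 image from the example
padded = pad2d(image, 1)         # ASSUMPTION: P is 0
for row in padded:
    print(row)
# [0, 0, 0, 0, 0]
# [0, 1, 1, 1, 0]
# [0, 1, 1, 1, 0]
# [0, 0, 0, 0, 0]
```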
2.5 Channels
- Channel is also called the depth of the 3D volume
- Channels are just an extra dimension: RGB uses 3 channels, so we get 3 matrices stacked on top of each other
- e.g. a 100-by-50 RGB image has dimensions 100-50-3
- The number of channels of the input must match the number of channels of the kernel(filter)
Convolving a 100-50-3 RGB image with a single 6-6-3 filter results in a 95-45-1 matrix.
How did we calculate the output dimensions 95-45-1?
(InputDim-ConvolveDim+1)=OutputDim
(100-6+1)=95
(50-6+1)=45
(3-3+1)=1
Observe how the RGB dimension collapses to 1 when we convolve with the matching 6-6-3 convolution filter.
- We can apply multiple convolution filters and then stack the results, giving an output of dimension 95-45-N
- N is the number of convolution filters applied to the original input.
THE # OF INPUT CHANNELS MUST EQUAL THE # OF FILTER CHANNELS, BUT NEITHER HAS ANY RELATION TO THE # OF OUTPUT CHANNELS, WHICH EQUALS THE # OF FILTERS USED
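The channel rule can be checked with a small multi-channel convolution. The sketch below uses a scaled-down 10-8-3 input and 4 filters so that it runs quickly in plain Python; those sizes and the name `conv_volume` are assumptions, and the same rule gives 95-45-N for the 100-50-3 input in the notes.

```python
def conv_volume(volume, filters):
    """Convolve volume[h][w][c] with each filter[fh][fw][c] (no padding, stride 1).

    Each filter collapses the channel dimension to a single number per position;
    the results of all filters are stacked along the last axis.
    """
    height, width, channels = len(volume), len(volume[0]), len(volume[0][0])
    fh, fw = len(filters[0]), len(filters[0][0])
    for f in filters:
        if len(f[0][0]) != channels:
            raise ValueError("filter channels must equal input channels")
    out = []
    for r in range(height - fh + 1):
        row = []
        for c in range(width - fw + 1):
            cell = []
            for f in filters:
                total = 0
                for i in range(fh):
                    for j in range(fw):
                        for k in range(channels):
                            total += volume[r + i][c + j][k] * f[i][j][k]
                cell.append(total)
            row.append(cell)
        out.append(row)
    return out


# ASSUMPTION: small all-ones data, used only to check the output dimensions.
volume = [[[1] * 3 for _ in range(8)] for _ in range(10)]                      # 10-8-3 input
filters = [[[[1] * 3 for _ in range(6)] for _ in range(6)] for _ in range(4)]  # four 6-6-3 filters
out = conv_volume(volume, filters)
out_shape = (len(out), len(out[0]), len(out[0][0]))
print(out_shape)  # (5, 3, 4): 10-6+1, 8-6+1, and N=4 filters
```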
3 Stride
- Kernel(filter) and Pooling are considered Sliding windows.
- Stride is how many units to slide each step.
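A minimal sketch of how stride changes the window positions along one dimension, assuming no padding. The helper name `window_starts` is illustrative.

```python
def window_starts(input_size, window_size, stride):
    """Start index of every window position along one dimension (no padding)."""
    return list(range(0, input_size - window_size + 1, stride))


starts_stride1 = window_starts(7, 3, 1)
starts_stride2 = window_starts(7, 3, 2)
print(starts_stride1)  # [0, 1, 2, 3, 4]
print(starts_stride2)  # [0, 2, 4]
```

The number of positions is floor((InputDim - WindowDim) / stride) + 1, which reduces to the InputDim - WindowDim + 1 rule when the stride is 1.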
4 Example dimensions
\[\displaylines{ \overbrace{32\times32\times{\color{green}\overset{channel}{3}}}^{Input} \overbrace{ \overset{Convolve}{\underset{\displaylines{{\color{green}channel=3}\\stride=1\\pad=0\\Convolve=5\times5\\filterCount=6}}{\longrightarrow}} \overset{32-5+1}{28}\times\overset{32-5+1}{28}\times6 \overset{Pool}{\underset{\displaylines{window=2\times2\\stride=2\\pad=0}}{\longrightarrow}} 14\times14\times6}^{Layer\ 1} \\ \overbrace{ \overset{Convolve}{\underset{\displaylines{{\color{green}channel=6}\\stride=1\\pad=0\\Convolve=5\times5\\filterCount=8}}{\longrightarrow}} \overset{14-5+1}{10}\times\overset{14-5+1}{10}\times8 \overset{Pool}{\underset{\displaylines{window=2\times2\\stride=2\\pad=0}}{\longrightarrow}} 5\times5\times8 = 200}^{Layer\ 2} \overset{Matrix^{200\times60}}{\longrightarrow} 60 }\]
- Notice that this diagram shows NO activation (non-linear function) after each layer (convolution + pooling); the only activation shown is at the end. In practice a non-linearity such as ReLU is usually applied after each convolution.
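The dimensions of such a network can be traced step by step. The sketch below assumes 2x2 pooling with stride 2 (which takes 28 to 14) and, hypothetically, 8 filters in the second convolution, chosen so that the flattened size is 5 x 5 x 8 = 200 and fits the 200 x 60 matrix. The helper names are illustrative.

```python
def out_size(n, window, stride=1, pad=0):
    """Output length along one dimension for a sliding window."""
    return (n + 2 * pad - window) // stride + 1


def conv_shape(shape, window, n_filters, stride=1, pad=0):
    """A convolution keeps no input channels; output depth is the filter count."""
    h, w, _channels = shape
    return (out_size(h, window, stride, pad), out_size(w, window, stride, pad), n_filters)


def pool_shape(shape, window, stride):
    """Pooling shrinks height and width but leaves the channel count alone."""
    h, w, channels = shape
    return (out_size(h, window, stride), out_size(w, window, stride), channels)


shapes = [(32, 32, 3)]                           # input
shapes.append(conv_shape(shapes[-1], 5, 6))      # layer 1 convolve: 5x5, 6 filters
shapes.append(pool_shape(shapes[-1], 2, 2))      # layer 1 pool (ASSUMED 2x2, stride 2)
shapes.append(conv_shape(shapes[-1], 5, 8))      # layer 2 convolve (ASSUMED 8 filters)
shapes.append(pool_shape(shapes[-1], 2, 2))      # layer 2 pool (ASSUMED 2x2, stride 2)

h, w, c = shapes[-1]
flattened = h * w * c
for s in shapes:
    print(s)
print(flattened)  # 200
```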
5 Week 2
5.1 ConvNet
In practice, start from an open-source pretrained network such as ResNet, then apply transfer learning.
6 Transfer learning
- Take a pre-existing model, freeze all of its layers (i.e. freeze the trained weights or coefficients), and throw away the last layer (the output layer, which typically ends in an activation function such as softmax).
- Why throw away the last layer?
- Ex: The existing model outputs 100 classes as a 100-vector, but you only want 2 classes, so you throw away the last layer that outputs the 100-vector and add your own layer that outputs a 2-vector.
- Training means you only train the new last layer, NOT the frozen layers of the previous model.
- Typically you do this by setting a flag such as trainableparameters=0 or freeze=1 (the exact name depends on the framework), which tells the ML framework not to train these layers.
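The freeze-and-replace idea can be sketched without any ML framework. Everything here is hypothetical: the `Layer` class, the `trainable` flag, the layer names, the weights, and the placeholder gradients are illustration only and do not correspond to a real library API.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    weights: list
    trainable: bool = True


def sgd_step(layers, grads, lr=0.1):
    """Update only the layers whose `trainable` flag is set."""
    for layer in layers:
        if not layer.trainable:
            continue  # frozen: keep the pretrained weights untouched
        layer.weights = [w - lr * g for w, g in zip(layer.weights, grads[layer.name])]


# HYPOTHETICAL pretrained model whose last layer outputs a 100-vector.
pretrained = [
    Layer("conv1", [0.5, -0.2]),
    Layer("conv2", [0.1, 0.7]),
    Layer("output_100", [0.3] * 100),
]

# Freeze every pretrained layer, drop the last one, attach a new 2-class layer.
for layer in pretrained:
    layer.trainable = False
model = pretrained[:-1] + [Layer("output_2", [0.0, 0.0])]

# Placeholder gradients; a real framework would compute these by backpropagation.
grads = {layer.name: [1.0] * len(layer.weights) for layer in model}
sgd_step(model, grads)

for layer in model:
    print(layer.name, layer.weights)
```

Only `output_2` changes after the step; the frozen layers keep their pretrained weights.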
7 Mathematica
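(* Load a random photo and convert it to black and white *)
randphoto = ResourceFunction["RandomPhoto"][200];
rphoto2 = Binarize@randphoto;
(* Kernels that respond to horizontal and vertical lines *)
hlinekernel = {{-1,-1,-1},{2,2,2},{-1,-1,-1}};
vlinekernel = {{-1,2,-1},{-1,2,-1},{-1,2,-1}};
(* Convolve with each kernel, then combine the two results *)
a = ImageConvolve[rphoto2,vlinekernel]
b = ImageConvolve[rphoto2,hlinekernel]
ImageAdd[a,b]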
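To see why these two kernels act as line detectors, here is a rough Python analogue on a tiny hand-made binary image. The 5-by-5 test image is an assumption, and the plain-list `convolve2d` helper only approximates what ImageConvolve does at the borders.

```python
def convolve2d(image, kernel):
    """Elementwise multiply and sum over each window (no padding, stride 1)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


hline_kernel = [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]]
vline_kernel = [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]]

# ASSUMPTION: a 5-by-5 binary image containing one horizontal line.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
h_response = convolve2d(image, hline_kernel)
v_response = convolve2d(image, vline_kernel)
print(h_response)  # [[-3, -3, -3], [6, 6, 6], [-3, -3, -3]]
print(v_response)  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

The horizontal kernel responds strongly where its middle row sits on the line, while the vertical kernel gives no response because each of its rows sums to zero.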
8 Increasing Dataset size
Tactics
- Making a mirror duplicate
- Random cropping
- Rotation
- Shearing
- Local Warping
- Color Shifting
- PCA color augmentation
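A few of these tactics are simple enough to sketch on an image stored as a list of rows. The function names are illustrative, and rotation is limited to 90 degrees here as a simplifying assumption.

```python
import random


def mirror(image):
    """Horizontal flip: reverse every row."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng=random):
    """Cut a crop_h-by-crop_w window at a random position."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mirrored = mirror(image)
rotated = rotate90(image)
crop = random_crop(image, 2, 2, random.Random(0))  # seeded generator for repeatability

print(mirrored)  # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(rotated)   # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
print(crop)
```

Each transformed copy keeps the original label, so one labeled image yields several training examples.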