Logic Neuron Mathematica
""] := {Inset[Style[label,Purple],Midpoint[{p,q}]], Arrow[{p,q}]};
AArrow[{p_, q_}, label_:= Graphics@AArrow[{ConstantArray[0,Length[p]],p}];
vec2[p_] := Which[ ArrayQ[q] == False, Graphics@AArrow[{ConstantArray[0,Length[p]],p},q],
vec2[p_,q_] :== True, Graphics@AArrow[{p,q},""] ]
ArrayQ[q] = Graphics@AArrow[{p,q},label];
vec2[p_,q_,label_] :
= Which[ Length[p] == 2, vec2[p],
vec[p_] :== 3, vec2[p] /. Graphics -> Graphics3D ];
Length[p] = Which[ Length[p] == 2, vec2[p,q],
vec[p_,q_] :== 3, vec2[p,q] /. Graphics -> Graphics3D ];
Length[p] = Which[ Length[p] == 2,vec2[p,q,label],
vec[p_,q_,label_] :== 3,vec2[p,q,label] /. Graphics -> Graphics3D];
Length[q] = (# /. Arrow[x__] -> {c, Arrow[x]})&
sty[c__:Red] :* sty[Red]@vec[{3,5}]*)
(
(* helper functions *)
matrix[expr_] := expr /. List[p__] -> MatrixForm[List[p]]
cout[stmt__] := TeXForm[Row[{stmt}]];
pnt[x_] := Graphics3D@Point[x];
(* polynomial *)
vars[n_, m_] := Flatten@Transpose[Outer[Symbol@StringJoin[##] &, CharacterRange["A", "Z"][[;; m]], ToString /@ Range[n]]]
polyvar[v_] := Flatten[{1,vars[v-1,1]}];
(* Give a list of coefficients and it will generate a polynomial with variables *)
poly[coef_] := Transpose[coef].polyvar[Length@coef];
poly@Thread[{{1,2,3}}];
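For reference, worked out by hand (treat the exact output shape as an assumption):

poly@Thread[{{1,2,3}}]  (* the coefficient column {{1},{2},{3}} becomes {1 + 2 A1 + 3 A2} *)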
T := Transpose;
Dim = Dimensions;
Ones[n_] := ConstantArray[1,n]
addCol[x_] := MapThread[Append, {#, x}] &
(* addCol[ConstantArray[1,2]]@{{1,3},{3,4}} // matrix *)
addRow[x_] := Append[#,x]&
unwrap := #[[1]][[1]]&
(* unwrap[{{x+1}}] = x+1 *)
(* 3d gradient color *)
opt3d[opacity_:1] := {PlotStyle->Opacity[opacity],ColorFunction -> Function[{x, y, z}, Hue[z]]};
(* Plot3D[f,{x1,0,1},{x2,0,1},Evaluate@opt3d[]] *)
ClearAll[x,x1,x2,w1,w2,b,lineq,nonlineq]
x := {{x1,x2}};
w1 := 10;
w2 := 10;
beta := {{w1,w2}};
b := -5;
lineq := x.T[beta] + b;
nonlineq := 1/(1+Exp[-lineq]);
StringForm["x is: `1`",matrix@x]
StringForm["weights are: `1`",matrix@T[beta]]
StringForm["linear eq is: `1`", unwrap@lineq]
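As a quick check of the definitions above, the neuron saturates at the input (1,1), since w1 = w2 = 10 and b = -5 give a linear response of 15:

unwrap@nonlineq /. {x1 -> 1, x2 -> 1} // N  (* 1/(1+E^-15), approximately 0.9999997 *)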
Row[{
Show[
Plot3D[nonlineq,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.6],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
],
Show[
ContourPlot[nonlineq,{x1,-1,1},{x2,-1,1}],
Graphics[{PointSize[Large],Point[{{0,0},{0,1},{1,0},{1,1}}]}],
ImageSize->Small,
Axes->True
]
}]
1 Understanding Error
ClearAll[x,x1,x2,w1,w2,b,lineq,nonlineq]
x = {{x1,x2}} ;
beta = {{w1,w2}};
b = -2;
lineq = x.T[beta] + b;
(* nonlineqval = nonlineq /. {w1 -> 1,w2->2}; *)
nonlineq = 1/(1+Exp[-lineq]);
forward := nonlineq /. {w1 ->1, w2 ->2}
StringForm["gradient is nonlineq: `1` with weights as `2`", unwrap@forward,beta]
pt1 = {0,0,0};
predpt1 = {0,0,unwrap@(forward /. {x1->0,x2->0})};
pt2 = {0,1,1} ;
predpt2 = {0,1,unwrap@(forward /. {x1->0,x2->1})};
pt3 = {1,0,1} ;
predpt3 = {1,0,unwrap@(forward /. {x1->1,x2->0})};
pt4 = {1,1,1};
predpt4 = {1,1,unwrap@(forward /. {x1->1,x2->1})};
Row[{
Show[
sty[]@vec[predpt1,pt1,"pt1 err"],
sty[]@vec[predpt2,pt2,"pt2 err"],
sty[]@vec[predpt3,pt3,"pt3 err"],
sty[]@vec[predpt4,pt4,"pt4 err"],
Plot3D[forward,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
],
Show[
ContourPlot[forward,{x1,-1,1},{x2,-1,1}],
Graphics[{PointSize[Large],Point[{{0,0},{0,1},{1,0},{1,1}}]}],
ImageSize->Small,
Axes->True
]
}]
predictionDistpt2 := pt2[[3]] - (forward /. {x1 -> pt2[[1]], x2 -> pt2[[2]]})
SEpt2 := 0.5*(predictionDistpt2)^2
cout["original point:", pt2]
cout["predicted point:",predpt2]
cout["vanilla distance:",predictionDistpt2]
cout["mutated distance:",SEpt2]
$$\text{original point:}\{0,1,1\}$$
$$\text{predicted point:}\left\{0,1,\frac{1}{2}\right\}$$
$$\text{vanilla distance:}\left( \begin{array}{c} \frac{1}{2} \\ \end{array} \right)$$
$$\text{mutated distance:}\left( \begin{array}{c} 0.125 \\ \end{array} \right)$$
Distance here DOES NOT mean the 3D vector distance between the two points.
It only means the distance in the output, aka the 3rd coordinate, which is 1 and 1/2 in this case.
Even if it did, the 3D vector distance would reduce to the same thing, since the input coordinates x1, x2 are identical in the original point and the prediction.
Vanilla distance is simple subtraction.
Mutated distance is the vanilla distance squared, then multiplied by 0.5.
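Worked out for pt2, using the prediction of 1/2 computed above:

1 - 1/2           (* vanilla distance: target minus prediction = 1/2 *)
0.5*(1 - 1/2)^2   (* mutated distance: square it, then halve it = 0.125 *)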
2 Manual Training
- The loss function is not simply distance; it is a mutated distance!
- Aggregate (SUM) the loss function applied to each point.
- Gradient: symbolically derive the derivative of the loss function wrt the weights.
- Minimizing loss <=> solving (derivative of loss function wrt weights = 0), but since this is difficult we instead iteratively subtract the derivative from the weight (see the sketch after this list).
- Intuition: the loss function represents the Hill of Errors.
  - Each datapoint is allowed to have its own Hill of Error.
  - x-axis = w1, y-axis = w2 in our imaginary Hill of Error.
  - z-axis = error of the prediction from the ground truth.
  - The goal is to reach the valley of no error.
- The gradient is calculated by:
  - Subtracting the target ground truth (y2) from the nonlinear sigmoid symbolic equation containing (w1,w2,x1,x2) (the sign washes out after squaring).
  - Then applying the loss-function mutation (square, then multiply by 0.5).
  - Then calculating the derivative wrt the weights w1, w2.
  - The output is a gradient:
    - INPUT: (w1,w2,x1,x2,y2)
    - OUTPUT: the slope, aka the direction wrt the weights w1, w2 that gets us closer to a valley.
- Calculate the derivative, aka backprop, for each datapoint by applying the gradient to its (w1,w2,x1,x2,y2).
  - Do this for all points, resulting in 4 slopes.
- Sum the backprops.
  - Remember each datapoint has its own Hill of Error.
  - Summing the backprops means summing the slopes on each Hill of Error. The summed backprop is the slope on the summed, bigger Hill of Error, and this is okay to do because the sum of each backprop (derivative) equals the backprop (derivative) of the sum, by linearity of differentiation.
  - Intuitively, just adding the Hills of Error for each datapoint is okay as well, because we want to minimize overall error.
  - Imagine a scenario where the Hill of Error on datapoint 1 gives us a backprop in one direction and the Hill of Error on datapoint 2 gives us a backprop in the other direction.
    The sum of the directions means the steepest backprop wins, but if they are equally steep, then we don't move.
    Of course, one issue that is a universal problem in neural nets is getting trapped in a local minimum.
- Sum the backprops (derivatives), multiply by an alpha, then subtract this from the current weight to get the new weight.
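To make the recipe concrete, below is a minimal, self-contained sketch of one update step. The names sigmoid, pointLoss, gradAt, step, u1, u2 are illustrative only (nothing in the rest of the notebook uses them), the bias is fixed at -2 to match the code above, and u1, u2 are assumed to be fresh symbols:

sigmoid[t_] := 1/(1 + Exp[-t]);
(* squared-error loss of one datapoint {a1, a2, target} under weights {u1, u2} *)
pointLoss[{a1_, a2_, target_}] := 0.5*(sigmoid[u1*a1 + u2*a2 - 2] - target)^2;
(* symbolic derivative wrt the weights, evaluated at the numeric weights w *)
gradAt[w_, p_] := D[pointLoss[p], {{u1, u2}}] /. Thread[{u1, u2} -> w];
(* one descent step: sum the per-point backprops, scale by alpha, subtract from w *)
step[w_, pts_, alpha_] := w - alpha*Total[gradAt[w, #] & /@ pts];
step[{1., 2.}, {{0,0,0},{0,1,1},{1,0,1},{1,1,1}}, 10]
(* {2.96612, 3.77877}, the same update the manual 2nd pass below arrives at *)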
Quick summary:
Forward prop just means calculating the predicted output for each datapoint.
Forward prop is equivalent to the non-linear activation function composed with the linear function.
Back prop just means calculating the derivative of the loss function.
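In symbols:

$$forward(x1,x2) = \sigma(w1\,x1 + w2\,x2 + b), \qquad backprop = \frac{\partial\,Loss}{\partial (w1,w2)}, \qquad \sigma(t) = \frac{1}{1+e^{-t}}$$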
Some Terms:
beta is the weight vector containing w1, w2.
wIter1 is the weight vector on init; we pick the random weights {1,2}.
wIter2 is the weight vector after the 1st backprop.
wIter3 is the weight vector after the 2nd backprop.
forward1 = nonlineq with wIter1's weights substituted in.
gradient is calculated by subtracting the target value from nonlineq, applying the loss transformation (squaring, then halving), and then taking the derivatives wrt the weights (beta).
forward1 is the initialized surface plot built from the weights.
forward1pt1, forward1pt2, ... are the points on the forward1 surface.
2.1 Gradient formula
\(GradientFunction(x1,x2,w1,w2,y2) = \frac{\partial}{\partial (w1,w2)} Loss(x1,x2,w1,w2,y2)\)
- INPUT: \(x1\) \(x2\) are data, \(y2\) is ground truth, \(w1\) \(w2\) are weights
- OUTPUT: Differential Loss weight vector \((w1',w2')\)
- USECASE: update weights with \((w1 \leftarrow w1-\alpha w1',w2 \leftarrow w2-\alpha w2')\)
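Concretely, writing \(\sigma\) for the sigmoid output of the forward pass, the chain rule gives

$$\frac{\partial Loss}{\partial w1} = (\sigma - y2)\,\sigma(1-\sigma)\,x1, \qquad \frac{\partial Loss}{\partial w2} = (\sigma - y2)\,\sigma(1-\sigma)\,x2$$

which is exactly the Mathematica output below, since \(\sigma(1-\sigma) = e^{-lineq}/(1+e^{-lineq})^2\).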
(* The derivative *)
gradient = D[0.5*(unwrap@nonlineq-y2)^2,beta];
cout[gradient]
$$\left\{\frac{1. \text{x1} e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2} \left(\frac{1}{e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2}+1}-\text{y2}\right)}{\left(e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2}+1\right)^2},\frac{1. \text{x2} e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2} \left(\frac{1}{e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2}+1}-\text{y2}\right)}{\left(e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2}+1\right)^2}\right\}$$
2.1.1 1st pass
wIter1 = {1,2};
cout["Initial weight vector=",wIter1]
$$\text{Initial weight vector=}\{1,2\}$$
forward1 = nonlineq /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]]};
forward1pt1 = forward1 /. {x1 -> pt1[[1]], x2 -> pt1[[2]] };
forward1pt2 = forward1 /. {x1 -> pt2[[1]], x2 -> pt2[[2]] };
forward1pt3 = forward1 /. {x1 -> pt3[[1]], x2 -> pt3[[2]] };
forward1pt4 = forward1 /. {x1 -> pt4[[1]], x2 -> pt4[[2]] };
Show[
sty[]@vec[{0,0,unwrap@forward1pt1},pt1,"pt1 err"],
sty[]@vec[{0,1,unwrap@forward1pt2},pt2,"pt2 err"],
sty[]@vec[{1,0,unwrap@forward1pt3},pt3,"pt3 err"],
sty[]@vec[{1,1,unwrap@forward1pt4},pt4,"pt4 err"],
Plot3D[forward1,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
]
"w=",wIter1]
cout["surfaceplot=",forward1]
cout["predY=(",forward1,forward1pt1,forward1pt2,forward1pt3,forward1pt4,")"] cout[
$$\text{surfaceplot=}\left( \begin{array}{c} \frac{1}{e^{-\text{x1}-2 \text{x2}+2}+1} \\ \end{array} \right)$$
$$\text{predY=(}\left( \begin{array}{c} \frac{1}{e^{-\text{x1}-2 \text{x2}+2}+1} \\ \end{array} \right)\left( \begin{array}{c} \frac{1}{1+e^2} \\ \end{array} \right)\left( \begin{array}{c} \frac{1}{2} \\ \end{array} \right)\left( \begin{array}{c} \frac{1}{1+e} \\ \end{array} \right)\left( \begin{array}{c} \frac{1}{1+\frac{1}{e}} \\ \end{array} \right))$$
backprop1pt1 = gradient /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]], x1 -> pt1[[1]], x2 -> pt1[[2]], y2 -> pt1[[3]]};
backprop1pt2 = gradient /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]], x1 -> pt2[[1]], x2 -> pt2[[2]], y2 -> pt2[[3]]};
backprop1pt3 = gradient /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]], x1 -> pt3[[1]], x2 -> pt3[[2]], y2 -> pt3[[3]]};
backprop1pt4 = gradient /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]], x1 -> pt4[[1]], x2 -> pt4[[2]], y2 -> pt4[[3]]};
"differential loss weight vector for point 1=",backprop1pt1]
cout["differential loss weight vector for point 2=",backprop1pt2]
cout["differential loss weight vector for point 3=",backprop1pt3]
cout["differential loss weight vector for point 4=",backprop1pt4] cout[
$$\text{differential loss weight vector for point 2=}\{0.,-0.125\}$$
$$\text{differential loss weight vector for point 3=}\{-0.143735,0.\}$$
$$\text{differential loss weight vector for point 4=}\{-0.0528771,-0.0528771\}$$
2.1.2 2nd Pass
wIter2 = wIter1 - 10*(backprop1pt1 + backprop1pt2 + backprop1pt3 + backprop1pt4);
cout["updated weight vector for 2nd forward pass=", wIter2]
$$\text{updated weight vector for 2nd forward pass=}\{2.96612,3.77877\}$$
forward2 = nonlineq /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]]};
forward2pt1 = forward2 /. {x1 -> pt1[[1]], x2 -> pt1[[2]] };
forward2pt2 = forward2 /. {x1 -> pt2[[1]], x2 -> pt2[[2]] };
forward2pt3 = forward2 /. {x1 -> pt3[[1]], x2 -> pt3[[2]] };
forward2pt4 = forward2 /. {x1 -> pt4[[1]], x2 -> pt4[[2]] };
Show[
sty[]@vec[{0,0,unwrap@forward2pt1},pt1,"pt1 err"],
sty[]@vec[{0,1,unwrap@forward2pt2},pt2,"pt2 err"],
sty[]@vec[{1,0,unwrap@forward2pt3},pt3,"pt3 err"],
sty[]@vec[{1,1,unwrap@forward2pt4},pt4,"pt4 err"],
Plot3D[forward2,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
]
"w=",wIter2]
cout["surfaceplot=",forward2]
cout["predY=(",forward2,forward2pt1,forward2pt2,forward2pt3,forward2pt4,")"] cout[
$$\text{surfaceplot=}\left( \begin{array}{c} \frac{1}{e^{-2.96612 \text{x1}-3.77877 \text{x2}+2}+1} \\ \end{array} \right)$$
$$\text{predY=(}\left( \begin{array}{c} \frac{1}{e^{-2.96612 \text{x1}-3.77877 \text{x2}+2}+1} \\ \end{array} \right)\left( \begin{array}{c} 0.119203 \\ \end{array} \right)\left( \begin{array}{c} 0.855545 \\ \end{array} \right)\left( \begin{array}{c} 0.724345 \\ \end{array} \right)\left( \begin{array}{c} 0.991379 \\ \end{array} \right))$$
backprop2pt1 = gradient /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]], x1 -> pt1[[1]], x2 -> pt1[[2]], y2 -> pt1[[3]]};
backprop2pt2 = gradient /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]], x1 -> pt2[[1]], x2 -> pt2[[2]], y2 -> pt2[[3]]};
backprop2pt3 = gradient /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]], x1 -> pt3[[1]], x2 -> pt3[[2]], y2 -> pt3[[3]]};
backprop2pt4 = gradient /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]], x1 -> pt4[[1]], x2 -> pt4[[2]], y2 -> pt4[[3]]};
"differential loss weight vector for point 1=",backprop2pt1]
cout["differential loss weight vector for point 2=",backprop2pt2]
cout["differential loss weight vector for point 3=",backprop2pt3]
cout["differential loss weight vector for point 4=",backprop2pt4] cout[
$$\text{differential loss weight vector for point 2=}\{0.,-0.0178529\}$$
$$\text{differential loss weight vector for point 3=}\{-0.0550397,0.\}$$
$$\text{differential loss weight vector for point 4=}\{-0.0000736817,-0.0000736817\}$$
2.1.3 3rd Pass
wIter3 = wIter2 - 10*(backprop2pt1 + backprop2pt2 + backprop2pt3 + backprop2pt4);
cout["updated weight vector for 3rd forward pass=", wIter3]
$$\text{updated weight vector for 3rd forward pass=}\{3.51725,3.95804\}$$
forward3 = nonlineq /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]]};
forward3pt1 = forward3 /. {x1 -> pt1[[1]], x2 -> pt1[[2]] };
forward3pt2 = forward3 /. {x1 -> pt2[[1]], x2 -> pt2[[2]] };
forward3pt3 = forward3 /. {x1 -> pt3[[1]], x2 -> pt3[[2]] };
forward3pt4 = forward3 /. {x1 -> pt4[[1]], x2 -> pt4[[2]] };
Show[
sty[]@vec[{0,0,unwrap@forward3pt1},pt1,"pt1 err"],
sty[]@vec[{0,1,unwrap@forward3pt2},pt2,"pt2 err"],
sty[]@vec[{1,0,unwrap@forward3pt3},pt3,"pt3 err"],
sty[]@vec[{1,1,unwrap@forward3pt4},pt4,"pt4 err"],
Plot3D[forward3,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
]
"w=",wIter3]
cout["surfaceplot=",forward3]
cout["predY=(",forward3,forward3pt1,forward3pt2,forward3pt3,forward3pt4,")"] cout[
$$\text{surfaceplot=}\left( \begin{array}{c} \frac{1}{e^{-3.51725 \text{x1}-3.95804 \text{x2}+2}+1} \\ \end{array} \right)$$
$$\text{predY=(}\left( \begin{array}{c} \frac{1}{e^{-3.51725 \text{x1}-3.95804 \text{x2}+2}+1} \\ \end{array} \right)\left( \begin{array}{c} 0.119203 \\ \end{array} \right)\left( \begin{array}{c} 0.87632 \\ \end{array} \right)\left( \begin{array}{c} 0.820134 \\ \end{array} \right)\left( \begin{array}{c} 0.995828 \\ \end{array} \right))$$
backprop3pt1 = gradient /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]], x1 -> pt1[[1]], x2 -> pt1[[2]], y2 -> pt1[[3]]};
backprop3pt2 = gradient /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]], x1 -> pt2[[1]], x2 -> pt2[[2]], y2 -> pt2[[3]]};
backprop3pt3 = gradient /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]], x1 -> pt3[[1]], x2 -> pt3[[2]], y2 -> pt3[[3]]};
backprop3pt4 = gradient /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]], x1 -> pt4[[1]], x2 -> pt4[[2]], y2 -> pt4[[3]]};
"differential loss weight vector for point 1=",backprop3pt1]
cout["differential loss weight vector for point 2=",backprop3pt2]
cout["differential loss weight vector for point 3=",backprop3pt3]
cout["differential loss weight vector for point 4=",backprop3pt4] cout[
$$\text{differential loss weight vector for point 2=}\{0.,-0.0134048\}$$
$$\text{differential loss weight vector for point 3=}\{-0.0265329,0.\}$$
$$\text{differential loss weight vector for point 4=}\{-0.0000173291,-0.0000173291\}$$
2.1.4 4th Pass
wIter4 = wIter3 - 10*(backprop3pt1 + backprop3pt2 + backprop3pt3 + backprop3pt4);
cout["updated weight vector for 4th forward pass=", wIter4]
$$\text{updated weight vector for 4th forward pass=}\{3.78276,4.09226\}$$
forward4 = nonlineq /. {w1 -> wIter4[[1]], w2 -> wIter4[[2]]};
forward4pt1 = forward4 /. {x1 -> pt1[[1]], x2 -> pt1[[2]] };
forward4pt2 = forward4 /. {x1 -> pt2[[1]], x2 -> pt2[[2]] };
forward4pt3 = forward4 /. {x1 -> pt3[[1]], x2 -> pt3[[2]] };
forward4pt4 = forward4 /. {x1 -> pt4[[1]], x2 -> pt4[[2]] };
Show[
sty[]@vec[{0,0,unwrap@forward4pt1},pt1,"pt1 err"],
sty[]@vec[{0,1,unwrap@forward4pt2},pt2,"pt2 err"],
sty[]@vec[{1,0,unwrap@forward4pt3},pt3,"pt3 err"],
sty[]@vec[{1,1,unwrap@forward4pt4},pt4,"pt4 err"],
Plot3D[forward4,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
]
"w=",wIter4]
cout["surfaceplot=",forward4]
cout["predY=(",forward4,forward4pt1,forward4pt2,forward4pt3,forward4pt4,")"] cout[
$$\text{surfaceplot=}\left( \begin{array}{c} \frac{1}{e^{-3.78276 \text{x1}-4.09226 \text{x2}+2}+1} \\ \end{array} \right)$$
$$\text{predY=(}\left( \begin{array}{c} \frac{1}{e^{-3.78276 \text{x1}-4.09226 \text{x2}+2}+1} \\ \end{array} \right)\left( \begin{array}{c} 0.119203 \\ \end{array} \right)\left( \begin{array}{c} 0.890148 \\ \end{array} \right)\left( \begin{array}{c} 0.856037 \\ \end{array} \right)\left( \begin{array}{c} 0.997199 \\ \end{array} \right))$$
In the next section we will try to automate this process of gradient descent.
3 Gradient descent looping
pt1 = {0,0,0};
pt2 = {0,1,1};
pt3 = {1,0,1};
pt4 = {1,1,1};
wIter1 = {1,2};
gradient = D[0.5*(unwrap@nonlineq-y2)^2,beta];
forwardSurface[wIter_] := nonlineq /. {w1 -> wIter[[1]], w2 -> wIter[[2]]};
forwardPropPoints[pnts_,wIter_] := forwardSurface[wIter] /. {x1 -> pnts[[1]], x2 -> pnts[[2]] }
backwardPropWeights[pnts_,wIter_] := gradient /. {w1 -> wIter[[1]], w2 -> wIter[[2]], x1 -> pnts[[1]], x2 -> pnts[[2]], y2 -> pnts[[3]]};
y2predPoints[wIter_] := unwrap@forwardPropPoints[#,wIter]& /@ {pt1,pt2,pt3,pt4}
weightDescentLossHill[wIter_] := backwardPropWeights[#,wIter]& /@ {pt1,pt2,pt3,pt4}
updateWeight[wIter_,alpha_] := wIter - (alpha*Total[weightDescentLossHill[wIter]])
(* required functions shown above *)
updateWeight[wIter1,10]
updateWeight[updateWeight[wIter1,10],10]
NestList[updateWeight[#,10]&,wIter1,4]
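As a consistency check (assuming the section 1 definitions of nonlineq and beta are still loaded), one automated step reproduces the manual 2nd pass:

updateWeight[wIter1, 10] == wIter2  (* True: both give {2.96612, 3.77877} *)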
weightList[wIter_,alpha_,n_] := NestList[updateWeight[#,alpha]&,wIter,n];
buildPoints[y2_] := {{0,0,y2[[1]]},{0,1,y2[[2]]},{1,0,y2[[3]]},{1,1,y2[[4]]}}
plotPoints[wIter_] := vec @@@ MapThread[List,{buildPoints[y2predPoints[wIter]],{pt1,pt2,pt3,pt4}}]
(* just modify the alpha 4, and iteration 11 in weightSequence *)
weightSequence = weightList[wIter1,4,11];
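The last element of this sequence is the final weight vector (its values reappear in the descent table of section 5.1):

Last[weightSequence]  (* {3.86623, 3.98364} *)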
showPlot[wIter_] := Show[
plotPoints[wIter],
Plot3D[forwardSurface[wIter],{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
];
Row[showPlot[#]& /@ weightSequence]
4 Manipulate
nthWeight[wIter_,alpha_,n_] := Nest[updateWeight[#,alpha]&, wIter,n]
nthWeight[wIter1,3,6];
(* nthWeight is a failed attempt *)
(* modify only these 2 below *)
alpha = 2;
numIterations = 30;
weightManipulateSequence = Evaluate@weightList[wIter1,alpha,numIterations];
surfaceSequence = forwardSurface[#]& /@ weightManipulateSequence;
plotPointsSequence = plotPoints[#]& /@ weightManipulateSequence;
completeSequence = MapThread[List,{surfaceSequence,plotPointsSequence}];
Manipulate[
Show[
#[[z]][[2]],
Plot3D[#[[z]][[1]],{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
],
{z,1,numIterations,1}]& @ completeSequence
5 Visual Loss
- REMEMBER: the LOSS FUNCTION is NOT simply the difference between actual and predicted.
- Loss is defined as 0.5 multiplied by the squared difference.
ClearAll[x,x1,x2,w1,w2]
x = {x1,x2};
pt1 = {0,0,0};
pt2 = {0,1,1};
pt3 = {1,0,1};
pt4 = {1,1,1};
wIter1 = {1,2};
gradient = D[0.5*(unwrap@nonlineq-y2)^2,beta];
forwardSurface[wIter_] := nonlineq /. {w1 -> wIter[[1]], w2 -> wIter[[2]]};
forwardPropPoints[pnts_,wIter_] := forwardSurface[wIter] /. {x1 -> pnts[[1]], x2 -> pnts[[2]] }
theLoss[pnts_,wIter_] := 0.5*(pnts[[3]] - forwardPropPoints[pnts,wIter])^2
pt1[[3]] - forwardPropPoints[pt1,{1,1}]
theLoss[pt1,{0.5,0.3}]
theLoss[pt2,{0.5,0.3}]
{0.0000223971}
{0.491027}
Row[{
Show[
Plot3D[theLoss[pt1,{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.5]],
sty[]@vec[{0.5,0.3,-0.1193203}],
ImageSize->Small,
Axes->True
],
Show[
Plot3D[theLoss[pt2,{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.5]],
ImageSize->Small,
Axes->True
],
Show[
Plot3D[theLoss[pt3,{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.5]],
ImageSize->Small,
Axes->True
],
Show[
Plot3D[theLoss[pt4,{wa1,wa2}],{wa1,-10,0},{wa2,-10,10},Evaluate@opt3d[0.5]],
ImageSize->Small,
Axes->True
]
}]
Above are the loss functions for each of the 4 points.
Next we will add up these loss functions.
totalLoss[wIter_] := theLoss[pt1,wIter] + theLoss[pt2,wIter] + theLoss[pt3,wIter] + theLoss[pt4,wIter];
Show[
Plot3D[ totalLoss[{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.3]],
sty[]@vec[{1,2,0.5},{1.6,2.4,0.2}]
]
5.1 Climbing down the total loss function
The sum of derivatives is the derivative of the sum.
The sum of the differential weights of each of our 4 points' loss functions is the differential weight of the total loss function.
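In symbols:

$$\frac{\partial}{\partial (w1,w2)} \sum_{i=1}^{4} Loss_i = \sum_{i=1}^{4} \frac{\partial\,Loss_i}{\partial (w1,w2)}$$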
- Below is a sequence of {{w1,w2,loss},{w1,w2,loss},...}.
- {1,2} are the weights; 0.435493 is the loss. (Remember: loss is not simply the difference.)
- Notice how the descent decreases the loss.
descent = Flatten /@ MapThread[List,{weightSequence,totalLoss /@ weightSequence}]
{{1, 2, 0.435493}, {1.78645, 2.71151, 0.217208}, {2.35472, 3.02373, 0.127546},
 {2.75847, 3.23351, 0.0835834}, {3.03693, 3.39241, 0.0612339},
 {3.23979, 3.51987, 0.0484416}, {3.39638, 3.62588, 0.0403185},
 {3.52275, 3.71638, 0.0347525}, {3.62815, 3.79518, 0.0307206},
 {3.71825, 3.86485, 0.027675}, {3.79676, 3.92723, 0.0252987},
 {3.86623, 3.98364, 0.0233962}}
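As a quick check over the descent list above, the loss column is strictly decreasing:

Greater @@ descent[[All, 3]]  (* True: every step lowers the total loss *)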
Show[
Plot3D[ totalLoss[{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.3]],
sty[]@vec[descent[[1]],descent[[2]]],
sty[]@vec[descent[[2]],descent[[3]]],
sty[]@vec[descent[[3]],descent[[4]]],
sty[]@vec[descent[[4]],descent[[5]]],
sty[]@vec[descent[[5]],descent[[6]]],
sty[]@vec[descent[[6]],descent[[7]]],
sty[]@vec[descent[[7]],descent[[8]]],
sty[]@vec[descent[[8]],descent[[9]]],
sty[]@vec[descent[[9]],descent[[10]]],
sty[]@vec[descent[[10]],descent[[11]]],
sty[]@vec[descent[[11]],descent[[12]]]
]