Logic Neuron Mathematica
""] := {Inset[Style[label,Purple],Midpoint[{p,q}]], Arrow[{p,q}]};
AArrow[{p_, q_}, label_:= Graphics@AArrow[{ConstantArray[0,Length[p]],p}];
vec2[p_] := Which[ ArrayQ[q] == False, Graphics@AArrow[{ConstantArray[0,Length[p]],p},q],
vec2[p_,q_] :== True, Graphics@AArrow[{p,q},""] ]
ArrayQ[q] = Graphics@AArrow[{p,q},label];
vec2[p_,q_,label_] :
= Which[ Length[p] == 2, vec2[p],
vec[p_] :== 3, vec2[p] /. Graphics -> Graphics3D ];
Length[p] = Which[ Length[p] == 2, vec2[p,q],
vec[p_,q_] :== 3, vec2[p,q] /. Graphics -> Graphics3D ];
Length[p] = Which[ Length[p] == 2,vec2[p,q,label],
vec[p_,q_,label_] :== 3,vec2[p,q,label] /. Graphics -> Graphics3D];
Length[q] = (# /. Arrow[x__] -> {c, Arrow[x]})&
sty[c__:Red] :* sty[Red]@vec[{3,5}]*)
(
(* helper functions *)
matrix[expr_] := expr /. List[p__] -> MatrixForm[List[p]]
cout[stmt__] := TeXForm[Row[{stmt}]];
pnt[x_] := Graphics3D@Point[x];
(* polynomial *)
vars[n_, m_] := Flatten@Transpose[Outer[Symbol@StringJoin[##] &, CharacterRange["A", "Z"][[;; m]], ToString /@ Range[n]]]
polyvar[v_] := Flatten[{1,vars[v-1,1]}];
(* Give a list of coefficients and it will generate a polynomial with variables *)
poly[coef_] := Transpose[coef].polyvar[Length@coef];
poly@Thread[{{1,2,3}}];
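For reference, worked out by hand (treat the exact output shape as an assumption):

poly@Thread[{{1,2,3}}]  (* the coefficient column {{1},{2},{3}} becomes {1 + 2 A1 + 3 A2} *)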
T := Transpose;
Dim = Dimensions;
Ones[n_] := ConstantArray[1,n]
addCol[x_] := MapThread[Append, {#, x}] &
(* addCol[ConstantArray[1,2]]@{{1,3},{3,4}} // matrix *)
addRow[x_] := Append[#,x]&
unwrap := #[[1]][[1]]&
(* unwrap[{{x+1}}] = x+1 *)
(* 3d gradient color *)
opt3d[opacity_:1] := {PlotStyle->Opacity[opacity],ColorFunction -> Function[{x, y, z}, Hue[z]]};
(* Plot3D[f,{x1,0,1},{x2,0,1},Evaluate@opt3d[]] *)
ClearAll[x,x1,x2,w1,w2,b,lineq,nonlineq]
x := {{x1,x2}};
w1 := 10;
w2 := 10;
beta := {{w1,w2}};
b := -5;
lineq := x.T[beta] + b;
nonlineq := 1/(1+Exp[-lineq]);
StringForm["x is: `1`",matrix@x]
StringForm["weights are: `1`",matrix@T[beta]]
StringForm["linear eq is: `1`", unwrap@lineq]
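As a quick check of the definitions above, the neuron saturates at the input (1,1), since w1 = w2 = 10 and b = -5 give a linear response of 15:

unwrap@nonlineq /. {x1 -> 1, x2 -> 1} // N  (* 1/(1+E^-15), approximately 0.9999997 *)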
Row[{
Show[
Plot3D[nonlineq,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.6],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
],
Show[
ContourPlot[nonlineq,{x1,-1,1},{x2,-1,1}],
Graphics[{PointSize[Large],Point[{{0,0},{0,1},{1,0},{1,1}}]}],
ImageSize->Small,
Axes->True
]
}]
1 Understanding Error
ClearAll[x,x1,x2,w1,w2,b,lineq,nonlineq]
x = {{x1,x2}} ;
beta = {{w1,w2}};
b = -2;
lineq = x.T[beta] + b;
(* nonlineqval = nonlineq /. {w1 -> 1,w2->2}; *)
nonlineq = 1/(1+Exp[-lineq]);
forward := nonlineq /. {w1 ->1, w2 ->2}
StringForm["gradient is nonlineq: `1` with weights as `2`", unwrap@forward,beta]
pt1 = {0,0,0};
predpt1 = {0,0,unwrap@(forward /. {x1->0,x2->0})};
pt2 = {0,1,1} ;
predpt2 = {0,1,unwrap@(forward /. {x1->0,x2->1})};
pt3 = {1,0,1} ;
predpt3 = {1,0,unwrap@(forward /. {x1->1,x2->0})};
pt4 = {1,1,1};
predpt4 = {1,1,unwrap@(forward /. {x1->1,x2->1})};
Row[{
Show[
sty[]@vec[predpt1,pt1,"pt1 err"],
sty[]@vec[predpt2,pt2,"pt2 err"],
sty[]@vec[predpt3,pt3,"pt3 err"],
sty[]@vec[predpt4,pt4,"pt4 err"],
Plot3D[forward,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
],
Show[
ContourPlot[forward,{x1,-1,1},{x2,-1,1}],
Graphics[{PointSize[Large],Point[{{0,0},{0,1},{1,0},{1,1}}]}],
ImageSize->Small,
Axes->True
]
}]
predictionDistpt2 := pt2[[3]] - (forward /. {x1 -> pt2[[1]], x2 -> pt2[[2]]})
SEpt2 := 0.5*(predictionDistpt2)^2
cout["original point:", pt2]
cout["predicted point:",predpt2]
cout["vanilla distance:",predictionDistpt2]
cout["mutated distance:",SEpt2]
$$\text{original point:}\{0,1,1\}$$
$$\text{predicted point:}\left\{0,1,\frac{1}{2}\right\}$$
$$\text{vanilla distance:}\left( \begin{array}{c} \frac{1}{2} \\ \end{array} \right)$$
$$\text{mutated distance:}\left( \begin{array}{c} 0.125 \\ \end{array} \right)$$
Distance here DOES NOT mean the 3D vector distance between the two points.
It only means the distance in the output, aka the 3rd coordinate, which is 1 and 1/2 in this case.
Even if it did, the 3D vector distance would reduce to the same thing, since the input coordinates x1, x2 are identical in the original point and the prediction.
Vanilla distance is simple subtraction.
Mutated distance is the vanilla distance squared, then multiplied by 0.5.
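Worked out for pt2, using the prediction of 1/2 computed above:

1 - 1/2           (* vanilla distance: target minus prediction = 1/2 *)
0.5*(1 - 1/2)^2   (* mutated distance: square it, then halve it = 0.125 *)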
2 Manual Training
- The loss function is not simply distance; it is a mutated distance!
- Aggregate (SUM) the loss function applied to each point.
- Gradient: symbolically derive the derivative of the loss function wrt the weights.
- Minimizing loss <=> solving (derivative of loss function wrt weights = 0), but since this is difficult we instead iteratively subtract the derivative from the weight (see the sketch after this list).
- Intuition: the loss function represents the Hill of Errors.
  - Each datapoint is allowed to have its own Hill of Error.
  - x-axis = w1, y-axis = w2 in our imaginary Hill of Error.
  - z-axis = error of the prediction from the ground truth.
  - The goal is to reach the valley of no error.
- The gradient is calculated by:
  - Subtracting the target ground truth (y2) from the nonlinear sigmoid symbolic equation containing (w1,w2,x1,x2) (the sign washes out after squaring).
  - Then applying the loss-function mutation (square, then multiply by 0.5).
  - Then calculating the derivative wrt the weights w1, w2.
  - The output is a gradient:
    - INPUT: (w1,w2,x1,x2,y2)
    - OUTPUT: the slope, aka the direction wrt the weights w1, w2 that gets us closer to a valley.
- Calculate the derivative, aka backprop, for each datapoint by applying the gradient to its (w1,w2,x1,x2,y2).
  - Do this for all points, resulting in 4 slopes.
- Sum the backprops.
  - Remember each datapoint has its own Hill of Error.
  - Summing the backprops means summing the slopes on each Hill of Error. The summed backprop is the slope on the summed, bigger Hill of Error, and this is okay to do because the sum of each backprop (derivative) equals the backprop (derivative) of the sum, by linearity of differentiation.
  - Intuitively, just adding the Hills of Error for each datapoint is okay as well, because we want to minimize overall error.
  - Imagine a scenario where the Hill of Error on datapoint 1 gives us a backprop in one direction and the Hill of Error on datapoint 2 gives us a backprop in the other direction.
    The sum of the directions means the steepest backprop wins, but if they are equally steep, then we don't move.
    Of course, one issue that is a universal problem in neural nets is getting trapped in a local minimum.
- Sum the backprops (derivatives), multiply by an alpha, then subtract this from the current weight to get the new weight.
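To make the recipe concrete, below is a minimal, self-contained sketch of one update step. The names sigmoid, pointLoss, gradAt, step, u1, u2 are illustrative only (nothing in the rest of the notebook uses them), the bias is fixed at -2 to match the code above, and u1, u2 are assumed to be fresh symbols:

sigmoid[t_] := 1/(1 + Exp[-t]);
(* squared-error loss of one datapoint {a1, a2, target} under weights {u1, u2} *)
pointLoss[{a1_, a2_, target_}] := 0.5*(sigmoid[u1*a1 + u2*a2 - 2] - target)^2;
(* symbolic derivative wrt the weights, evaluated at the numeric weights w *)
gradAt[w_, p_] := D[pointLoss[p], {{u1, u2}}] /. Thread[{u1, u2} -> w];
(* one descent step: sum the per-point backprops, scale by alpha, subtract from w *)
step[w_, pts_, alpha_] := w - alpha*Total[gradAt[w, #] & /@ pts];
step[{1., 2.}, {{0,0,0},{0,1,1},{1,0,1},{1,1,1}}, 10]
(* {2.96612, 3.77877}, the same update the manual 2nd pass below arrives at *)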
Quick summary:
Forward prop just means calculating the predicted output for each datapoint.
Forward prop is equivalent to the non-linear activation function composed with the linear function.
Back prop just means calculating the derivative of the loss function.
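In symbols:

$$forward(x1,x2) = \sigma(w1\,x1 + w2\,x2 + b), \qquad backprop = \frac{\partial\,Loss}{\partial (w1,w2)}, \qquad \sigma(t) = \frac{1}{1+e^{-t}}$$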
Some Terms:
beta is the weight vector containing w1, w2.
wIter1 is the weight vector on init; we pick the random weights {1,2}.
wIter2 is the weight vector after the 1st backprop.
wIter3 is the weight vector after the 2nd backprop.
forward1 = nonlineq with wIter1's weights substituted in.
gradient is calculated by subtracting the target value from nonlineq, applying the loss transformation (squaring, then halving), and then taking the derivatives wrt the weights (beta).
forward1 is the initialized surface plot built from the weights.
forward1pt1, forward1pt2, ... are the points on the forward1 surface.
2.1 Gradient formula
\(GradientFunction(x1,x2,w1,w2,y2) = \frac{\partial}{\partial (w1,w2)} Loss(x1,x2,w1,w2,y2)\)
- INPUT: \(x1\) \(x2\) are data, \(y2\) is ground truth, \(w1\) \(w2\) are weights
- OUTPUT: Differential Loss weight vector \((w1',w2')\)
- USECASE: update weights with \((w1 \leftarrow w1-\alpha w1',w2 \leftarrow w2-\alpha w2')\)
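Concretely, writing \(\sigma\) for the sigmoid output of the forward pass, the chain rule gives

$$\frac{\partial Loss}{\partial w1} = (\sigma - y2)\,\sigma(1-\sigma)\,x1, \qquad \frac{\partial Loss}{\partial w2} = (\sigma - y2)\,\sigma(1-\sigma)\,x2$$

which is exactly the Mathematica output below, since \(\sigma(1-\sigma) = e^{-lineq}/(1+e^{-lineq})^2\).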
(* The derivative *)
gradient = D[0.5*(unwrap@nonlineq-y2)^2,beta];
cout[gradient]
$$\left\{\frac{1. \text{x1} e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2} \left(\frac{1}{e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2}+1}-\text{y2}\right)}{\left(e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2}+1\right)^2},\frac{1. \text{x2} e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2} \left(\frac{1}{e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2}+1}-\text{y2}\right)}{\left(e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2}+1\right)^2}\right\}$$
2.1.1 1st pass
wIter1 = {1,2};
cout["Initial weight vector=",wIter1]
$$\text{Initial weight vector=}\{1,2\}$$
forward1 = nonlineq /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]]};
forward1pt1 = forward1 /. {x1 -> pt1[[1]], x2 -> pt1[[2]] };
forward1pt2 = forward1 /. {x1 -> pt2[[1]], x2 -> pt2[[2]] };
forward1pt3 = forward1 /. {x1 -> pt3[[1]], x2 -> pt3[[2]] };
forward1pt4 = forward1 /. {x1 -> pt4[[1]], x2 -> pt4[[2]] };
Show[
sty[]@vec[{0,0,unwrap@forward1pt1},pt1,"pt1 err"],
sty[]@vec[{0,1,unwrap@forward1pt2},pt2,"pt2 err"],
sty[]@vec[{1,0,unwrap@forward1pt3},pt3,"pt3 err"],
sty[]@vec[{1,1,unwrap@forward1pt4},pt4,"pt4 err"],
Plot3D[forward1,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
]
"w=",wIter1]
cout["surfaceplot=",forward1]
cout["predY=(",forward1,forward1pt1,forward1pt2,forward1pt3,forward1pt4,")"] cout[
$$\text{surfaceplot=}\left( \begin{array}{c} \frac{1}{e^{-\text{x1}-2 \text{x2}+2}+1} \\ \end{array} \right)$$
$$\text{predY=(}\left( \begin{array}{c} \frac{1}{e^{-\text{x1}-2 \text{x2}+2}+1} \\ \end{array} \right)\left( \begin{array}{c} \frac{1}{1+e^2} \\ \end{array} \right)\left( \begin{array}{c} \frac{1}{2} \\ \end{array} \right)\left( \begin{array}{c} \frac{1}{1+e} \\ \end{array} \right)\left( \begin{array}{c} \frac{1}{1+\frac{1}{e}} \\ \end{array} \right))$$
backprop1pt1 = gradient /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]], x1 -> pt1[[1]], x2 -> pt1[[2]], y2 -> pt1[[3]]};
backprop1pt2 = gradient /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]], x1 -> pt2[[1]], x2 -> pt2[[2]], y2 -> pt2[[3]]};
backprop1pt3 = gradient /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]], x1 -> pt3[[1]], x2 -> pt3[[2]], y2 -> pt3[[3]]};
backprop1pt4 = gradient /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]], x1 -> pt4[[1]], x2 -> pt4[[2]], y2 -> pt4[[3]]};
"differential loss weight vector for point 1=",backprop1pt1]
cout["differential loss weight vector for point 2=",backprop1pt2]
cout["differential loss weight vector for point 3=",backprop1pt3]
cout["differential loss weight vector for point 4=",backprop1pt4] cout[
$$\text{differential loss weight vector for point 2=}\{0.,-0.125\}$$
$$\text{differential loss weight vector for point 3=}\{-0.143735,0.\}$$
$$\text{differential loss weight vector for point 4=}\{-0.0528771,-0.0528771\}$$
2.1.2 2nd Pass
wIter2 = wIter1 - 10*(backprop1pt1 + backprop1pt2 + backprop1pt3 + backprop1pt4);
cout["updated weight vector for 2nd forward pass=", wIter2]
$$\text{updated weight vector for 2nd forward pass=}\{2.96612,3.77877\}$$
forward2 = nonlineq /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]]};
forward2pt1 = forward2 /. {x1 -> pt1[[1]], x2 -> pt1[[2]] };
forward2pt2 = forward2 /. {x1 -> pt2[[1]], x2 -> pt2[[2]] };
forward2pt3 = forward2 /. {x1 -> pt3[[1]], x2 -> pt3[[2]] };
forward2pt4 = forward2 /. {x1 -> pt4[[1]], x2 -> pt4[[2]] };
Show[
sty[]@vec[{0,0,unwrap@forward2pt1},pt1,"pt1 err"],
sty[]@vec[{0,1,unwrap@forward2pt2},pt2,"pt2 err"],
sty[]@vec[{1,0,unwrap@forward2pt3},pt3,"pt3 err"],
sty[]@vec[{1,1,unwrap@forward2pt4},pt4,"pt4 err"],
Plot3D[forward2,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
]
"w=",wIter2]
cout["surfaceplot=",forward2]
cout["predY=(",forward2,forward2pt1,forward2pt2,forward2pt3,forward2pt4,")"] cout[
$$\text{surfaceplot=}\left( \begin{array}{c} \frac{1}{e^{-2.96612 \text{x1}-3.77877 \text{x2}+2}+1} \\ \end{array} \right)$$
$$\text{predY=(}\left( \begin{array}{c} \frac{1}{e^{-2.96612 \text{x1}-3.77877 \text{x2}+2}+1} \\ \end{array} \right)\left( \begin{array}{c} 0.119203 \\ \end{array} \right)\left( \begin{array}{c} 0.855545 \\ \end{array} \right)\left( \begin{array}{c} 0.724345 \\ \end{array} \right)\left( \begin{array}{c} 0.991379 \\ \end{array} \right))$$
backprop2pt1 = gradient /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]], x1 -> pt1[[1]], x2 -> pt1[[2]], y2 -> pt1[[3]]};
backprop2pt2 = gradient /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]], x1 -> pt2[[1]], x2 -> pt2[[2]], y2 -> pt2[[3]]};
backprop2pt3 = gradient /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]], x1 -> pt3[[1]], x2 -> pt3[[2]], y2 -> pt3[[3]]};
backprop2pt4 = gradient /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]], x1 -> pt4[[1]], x2 -> pt4[[2]], y2 -> pt4[[3]]};
"differential loss weight vector for point 1=",backprop2pt1]
cout["differential loss weight vector for point 2=",backprop2pt2]
cout["differential loss weight vector for point 3=",backprop2pt3]
cout["differential loss weight vector for point 4=",backprop2pt4] cout[
$$\text{differential loss weight vector for point 2=}\{0.,-0.0178529\}$$
$$\text{differential loss weight vector for point 3=}\{-0.0550397,0.\}$$
$$\text{differential loss weight vector for point 4=}\{-0.0000736817,-0.0000736817\}$$
2.1.3 3rd Pass
wIter3 = wIter2 - 10*(backprop2pt1 + backprop2pt2 + backprop2pt3 + backprop2pt4);
cout["updated weight vector for 3rd forward pass=", wIter3]
$$\text{updated weight vector for 3rd forward pass=}\{3.51725,3.95804\}$$
forward3 = nonlineq /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]]};
forward3pt1 = forward3 /. {x1 -> pt1[[1]], x2 -> pt1[[2]] };
forward3pt2 = forward3 /. {x1 -> pt2[[1]], x2 -> pt2[[2]] };
forward3pt3 = forward3 /. {x1 -> pt3[[1]], x2 -> pt3[[2]] };
forward3pt4 = forward3 /. {x1 -> pt4[[1]], x2 -> pt4[[2]] };
Show[
sty[]@vec[{0,0,unwrap@forward3pt1},pt1,"pt1 err"],
sty[]@vec[{0,1,unwrap@forward3pt2},pt2,"pt2 err"],
sty[]@vec[{1,0,unwrap@forward3pt3},pt3,"pt3 err"],
sty[]@vec[{1,1,unwrap@forward3pt4},pt4,"pt4 err"],
Plot3D[forward3,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
]
"w=",wIter3]
cout["surfaceplot=",forward3]
cout["predY=(",forward3,forward3pt1,forward3pt2,forward3pt3,forward3pt4,")"] cout[
$$\text{surfaceplot=}\left( \begin{array}{c} \frac{1}{e^{-3.51725 \text{x1}-3.95804 \text{x2}+2}+1} \\ \end{array} \right)$$
$$\text{predY=(}\left( \begin{array}{c} \frac{1}{e^{-3.51725 \text{x1}-3.95804 \text{x2}+2}+1} \\ \end{array} \right)\left( \begin{array}{c} 0.119203 \\ \end{array} \right)\left( \begin{array}{c} 0.87632 \\ \end{array} \right)\left( \begin{array}{c} 0.820134 \\ \end{array} \right)\left( \begin{array}{c} 0.995828 \\ \end{array} \right))$$
backprop3pt1 = gradient /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]], x1 -> pt1[[1]], x2 -> pt1[[2]], y2 -> pt1[[3]]};
backprop3pt2 = gradient /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]], x1 -> pt2[[1]], x2 -> pt2[[2]], y2 -> pt2[[3]]};
backprop3pt3 = gradient /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]], x1 -> pt3[[1]], x2 -> pt3[[2]], y2 -> pt3[[3]]};
backprop3pt4 = gradient /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]], x1 -> pt4[[1]], x2 -> pt4[[2]], y2 -> pt4[[3]]};
"differential loss weight vector for point 1=",backprop3pt1]
cout["differential loss weight vector for point 2=",backprop3pt2]
cout["differential loss weight vector for point 3=",backprop3pt3]
cout["differential loss weight vector for point 4=",backprop3pt4] cout[
$$\text{differential loss weight vector for point 2=}\{0.,-0.0134048\}$$
$$\text{differential loss weight vector for point 3=}\{-0.0265329,0.\}$$
$$\text{differential loss weight vector for point 4=}\{-0.0000173291,-0.0000173291\}$$
2.1.4 4th Pass
wIter4 = wIter3 - 10*(backprop3pt1 + backprop3pt2 + backprop3pt3 + backprop3pt4);
cout["updated weight vector for 4th forward pass=", wIter4]
$$\text{updated weight vector for 4th forward pass=}\{3.78276,4.09226\}$$
forward4 = nonlineq /. {w1 -> wIter4[[1]], w2 -> wIter4[[2]]};
forward4pt1 = forward4 /. {x1 -> pt1[[1]], x2 -> pt1[[2]] };
forward4pt2 = forward4 /. {x1 -> pt2[[1]], x2 -> pt2[[2]] };
forward4pt3 = forward4 /. {x1 -> pt3[[1]], x2 -> pt3[[2]] };
forward4pt4 = forward4 /. {x1 -> pt4[[1]], x2 -> pt4[[2]] };
Show[
sty[]@vec[{0,0,unwrap@forward4pt1},pt1,"pt1 err"],
sty[]@vec[{0,1,unwrap@forward4pt2},pt2,"pt2 err"],
sty[]@vec[{1,0,unwrap@forward4pt3},pt3,"pt3 err"],
sty[]@vec[{1,1,unwrap@forward4pt4},pt4,"pt4 err"],
Plot3D[forward4,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
]
"w=",wIter4]
cout["surfaceplot=",forward4]
cout["predY=(",forward4,forward4pt1,forward4pt2,forward4pt3,forward4pt4,")"] cout[
$$\text{surfaceplot=}\left( \begin{array}{c} \frac{1}{e^{-3.78276 \text{x1}-4.09226 \text{x2}+2}+1} \\ \end{array} \right)$$
$$\text{predY=(}\left( \begin{array}{c} \frac{1}{e^{-3.78276 \text{x1}-4.09226 \text{x2}+2}+1} \\ \end{array} \right)\left( \begin{array}{c} 0.119203 \\ \end{array} \right)\left( \begin{array}{c} 0.890148 \\ \end{array} \right)\left( \begin{array}{c} 0.856037 \\ \end{array} \right)\left( \begin{array}{c} 0.997199 \\ \end{array} \right))$$
In the next section we will try to automate this process of gradient descent.
3 Gradient descent looping
pt1 = {0,0,0};
pt2 = {0,1,1};
pt3 = {1,0,1};
pt4 = {1,1,1};
wIter1 = {1,2};
gradient = D[0.5*(unwrap@nonlineq-y2)^2,beta];
forwardSurface[wIter_] := nonlineq /. {w1 -> wIter[[1]], w2 -> wIter[[2]]};
forwardPropPoints[pnts_,wIter_] := forwardSurface[wIter] /. {x1 -> pnts[[1]], x2 -> pnts[[2]] }
backwardPropWeights[pnts_,wIter_] := gradient /. {w1 -> wIter[[1]], w2 -> wIter[[2]], x1 -> pnts[[1]], x2 -> pnts[[2]], y2 -> pnts[[3]]};
y2predPoints[wIter_] := unwrap@forwardPropPoints[#,wIter]& /@ {pt1,pt2,pt3,pt4}
weightDescentLossHill[wIter_] := backwardPropWeights[#,wIter]& /@ {pt1,pt2,pt3,pt4}
updateWeight[wIter_,alpha_] := wIter - (alpha*Total[weightDescentLossHill[wIter]])
(* required functions shown above *)
updateWeight[wIter1,10]
updateWeight[updateWeight[wIter1,10],10]
NestList[updateWeight[#,10]&,wIter1,4]
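As a consistency check (assuming the section 1 definitions of nonlineq and beta are still loaded), one automated step reproduces the manual 2nd pass:

updateWeight[wIter1, 10] == wIter2  (* True: both give {2.96612, 3.77877} *)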
weightList[wIter_,alpha_,n_] := NestList[updateWeight[#,alpha]&,wIter,n];
buildPoints[y2_] := {{0,0,y2[[1]]},{0,1,y2[[2]]},{1,0,y2[[3]]},{1,1,y2[[4]]}}
plotPoints[wIter_] := vec @@@ MapThread[List,{buildPoints[y2predPoints[wIter]],{pt1,pt2,pt3,pt4}}]
(* just modify the alpha 4, and iteration 11 in weightSequence *)
weightSequence = weightList[wIter1,4,11];
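The last element of this sequence is the final weight vector (its values reappear in the descent table of section 5.1):

Last[weightSequence]  (* {3.86623, 3.98364} *)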
showPlot[wIter_] := Show[
plotPoints[wIter],
Plot3D[forwardSurface[wIter],{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
];
Row[showPlot[#]& /@ weightSequence]
4 Manipulate
nthWeight[wIter_,alpha_,n_] := Nest[updateWeight[#,alpha]&, wIter,n]
nthWeight[wIter1,3,6];
(* nthWeight is a failed attempt *)
(* modify only these 2 below *)
alpha = 2;
numIterations = 30;
weightManipulateSequence = Evaluate@weightList[wIter1,alpha,numIterations];
surfaceSequence = forwardSurface[#]& /@ weightManipulateSequence;
plotPointsSequence = plotPoints[#]& /@ weightManipulateSequence;
completeSequence = MapThread[List,{surfaceSequence,plotPointsSequence}];
Manipulate[
Show[
#[[z]][[2]],
Plot3D[#[[z]][[1]],{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
ImageSize->Small,
Axes->True
],
{z,1,numIterations,1}]& @ completeSequence
5 Visual Loss
- REMEMBER: the LOSS FUNCTION is NOT simply the difference between actual and predicted.
- Loss is defined as 0.5 multiplied by the squared difference.
ClearAll[x,x1,x2,w1,w2]
x = {x1,x2};
pt1 = {0,0,0};
pt2 = {0,1,1};
pt3 = {1,0,1};
pt4 = {1,1,1};
wIter1 = {1,2};
gradient = D[0.5*(unwrap@nonlineq-y2)^2,beta];
forwardSurface[wIter_] := nonlineq /. {w1 -> wIter[[1]], w2 -> wIter[[2]]};
forwardPropPoints[pnts_,wIter_] := forwardSurface[wIter] /. {x1 -> pnts[[1]], x2 -> pnts[[2]] }
theLoss[pnts_,wIter_] := 0.5*(pnts[[3]] - forwardPropPoints[pnts,wIter])^2
pt1[[3]] - forwardPropPoints[pt1,{1,1}]
theLoss[pt1,{0.5,0.3}]
theLoss[pt2,{0.5,0.3}]
{0.0000223971}
{0.491027}
Row[{
Show[
Plot3D[theLoss[pt1,{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.5]],
sty[]@vec[{0.5,0.3,-0.1193203}],
ImageSize->Small,
Axes->True
],
Show[
Plot3D[theLoss[pt2,{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.5]],
ImageSize->Small,
Axes->True
],
Show[
Plot3D[theLoss[pt3,{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.5]],
ImageSize->Small,
Axes->True
],
Show[
Plot3D[theLoss[pt4,{wa1,wa2}],{wa1,-10,0},{wa2,-10,10},Evaluate@opt3d[0.5]],
ImageSize->Small,
Axes->True
]
}]
Above are the loss functions for each of the 4 points.
Next we will add up these loss functions.
totalLoss[wIter_] := theLoss[pt1,wIter] + theLoss[pt2,wIter] + theLoss[pt3,wIter] + theLoss[pt4,wIter];
Show[
Plot3D[ totalLoss[{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.3]],
sty[]@vec[{1,2,0.5},{1.6,2.4,0.2}]
]
5.1 Climbing down the total loss function
The sum of derivatives is the derivative of the sum.
The sum of the differential weights of each of our 4 points' loss functions is the differential weight of the total loss function.
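In symbols:

$$\frac{\partial}{\partial (w1,w2)} \sum_{i=1}^{4} Loss_i = \sum_{i=1}^{4} \frac{\partial\,Loss_i}{\partial (w1,w2)}$$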
- Below is a sequence of {{w1,w2,loss},{w1,w2,loss},...}.
- {1,2} are the weights; 0.435493 is the loss. (Remember: loss is not simply the difference.)
- Notice how the descent decreases the loss.
descent = Flatten /@ MapThread[List,{weightSequence,totalLoss /@ weightSequence}]
{{1, 2, 0.435493}, {1.78645, 2.71151, 0.217208}, {2.35472, 3.02373, 0.127546},
 {2.75847, 3.23351, 0.0835834}, {3.03693, 3.39241, 0.0612339},
 {3.23979, 3.51987, 0.0484416}, {3.39638, 3.62588, 0.0403185},
 {3.52275, 3.71638, 0.0347525}, {3.62815, 3.79518, 0.0307206},
 {3.71825, 3.86485, 0.027675}, {3.79676, 3.92723, 0.0252987},
 {3.86623, 3.98364, 0.0233962}}
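As a quick check over the descent list above, the loss column is strictly decreasing:

Greater @@ descent[[All, 3]]  (* True: every step lowers the total loss *)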
Show[
Plot3D[ totalLoss[{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.3]],
sty[]@vec[descent[[1]],descent[[2]]],
sty[]@vec[descent[[2]],descent[[3]]],
sty[]@vec[descent[[3]],descent[[4]]],
sty[]@vec[descent[[4]],descent[[5]]],
sty[]@vec[descent[[5]],descent[[6]]],
sty[]@vec[descent[[6]],descent[[7]]],
sty[]@vec[descent[[7]],descent[[8]]],
sty[]@vec[descent[[8]],descent[[9]]],
sty[]@vec[descent[[9]],descent[[10]]],
sty[]@vec[descent[[10]],descent[[11]]],
sty[]@vec[descent[[11]],descent[[12]]]
]