Logic Neuron in Mathematica

Posted on June 2, 2019
Tags: machinelearning
AArrow[{p_, q_}, label_:""] := {Inset[Style[label,Purple],Midpoint[{p,q}]], Arrow[{p,q}]};
vec2[p_] := Graphics@AArrow[{ConstantArray[0,Length[p]],p}];
vec2[p_,q_] := Which[  ArrayQ[q] == False, Graphics@AArrow[{ConstantArray[0,Length[p]],p},q],
                       ArrayQ[q] == True, Graphics@AArrow[{p,q},""] ]
vec2[p_,q_,label_] := Graphics@AArrow[{p,q},label];

vec[p_] := Which[ Length[p] == 2, vec2[p],
                  Length[p] == 3, vec2[p] /. Graphics -> Graphics3D ];
vec[p_,q_] := Which[ Length[p] == 2, vec2[p,q],
                     Length[p] == 3, vec2[p,q] /. Graphics -> Graphics3D ];
vec[p_,q_,label_] := Which[ Length[p] == 2, vec2[p,q,label],
                            Length[p] == 3, vec2[p,q,label] /. Graphics -> Graphics3D];
sty[c__:Red] := (# /. Arrow[a__] :> {c, Arrow[a]})&   (* RuleDelayed, and pattern renamed from x, so the global x's value can't leak into the rule *)
(*  sty[Red]@vec[{3,5}]*)

matrix[expr_] := expr /. List[p__] :> MatrixForm[List[p]]
cout[stmt__] := TeXForm[Row[{stmt}]];
pnt[x_] := Graphics3D@Point[x];

(* polynomial *)
(* helper functions *)
 vars[n_, m_] := Flatten@Transpose[Outer[Symbol@StringJoin[##] &, CharacterRange["A", "Z"][[;; m]], ToString /@ Range[n]]]
 polyvar[v_] :=  Flatten[{1,vars[v-1,1]}]; 

(* Give a list of coefficients and it will generate a polynomial with variables *)
poly[coef_] := Transpose[coef].polyvar[Length@coef];
(* poly@Thread[{{1,2,3}}] -> {1 + 2 A1 + 3 A2} *)

T := Transpose;
Dim = Dimensions;
Ones[n_] := ConstantArray[1,n]
addCol[x_] := MapThread[Append, {#, x}] &
(*  addCol[ConstantArray[1,2]]@{{1,3},{3,4}} // matrix*)
addRow[x_] := Append[#,x]&

unwrap := #[[1]][[1]]&
(* unwrap[{{x+1}}] = x+1 *)

(* 3d gradient color *)
opt3d[opacity_:1] := {PlotStyle->Opacity[opacity],ColorFunction -> Function[{x, y, z}, Hue[z]]};
(* Plot3D[f,{x1,0,1},{x2,0,1},Evaluate@opt3d[]] *)
ClearAll[x,x1,x2,w1,w2,b,lineq,nonlineq]
x := {{x1,x2}};
w1 := 10;
w2 := 10;
beta := {{w1,w2}};
b := -5;
lineq := x.T[beta] + b;
nonlineq := 1/(1+Exp[-lineq]);
StringForm["x is: `1`",matrix@x]
StringForm["weights are: `1`",matrix@T[beta]]
StringForm["linear eq is: `1`", unwrap@lineq]
Row[{
    Show[

        Plot3D[nonlineq,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.6],ColorFunction -> Function[{x, y, z}, Hue[z]]],
        Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
        ImageSize->Small,
        Axes->True
    ],
    Show[
        ContourPlot[nonlineq,{x1,-1,1},{x2,-1,1}],
        Graphics[{PointSize[Large],Point[{{0,0},{0,1},{1,0},{1,1}}]}],
        ImageSize->Small,
        Axes->True
    ]

}]
Output: the x, weights, and linear-equation StringForm lines, then the sigmoid surface (Plot3D) and its contour (ContourPlot) with the four logic points marked

1 Understanding Error

ClearAll[x,x1,x2,w1,w2,b,lineq,nonlineq]
x = {{x1,x2}} ;
beta = {{w1,w2}};
b = -2;
lineq = x.T[beta] + b;
(* nonlineqval = nonlineq /. {w1 -> 1,w2->2}; *)
nonlineq := 1/(1+Exp[-lineq]);
forward := nonlineq /. {w1 ->1, w2 ->2}
StringForm["gradient is nonlineq: `1` with weights as `2`", unwrap@forward,beta]
pt1 = {0,0,0};
predpt1 = {0,0,unwrap@(forward /. {x1->0,x2->0})};
pt2 = {0,1,1} ;
predpt2 = {0,1,unwrap@(forward /. {x1->0,x2->1})};
pt3 = {1,0,1} ;
predpt3 = {1,0,unwrap@(forward /. {x1->1,x2->0})};
pt4 = {1,1,1};
predpt4 = {1,1,unwrap@(forward /. {x1->1,x2->1})};
Row[{
    Show[
        sty[]@vec[predpt1,pt1,"pt1 err"],
        sty[]@vec[predpt2,pt2,"pt2 err"],
        sty[]@vec[predpt3,pt3,"pt3 err"],
        sty[]@vec[predpt4,pt4,"pt4 err"],
        Plot3D[forward ,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
        Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
        ImageSize->Small,
        Axes->True
    ],
    Show[
        ContourPlot[forward,{x1,-1,1},{x2,-1,1}],
        Graphics[{PointSize[Large],Point[{{0,0},{0,1},{1,0},{1,1}}]}],
        ImageSize->Small,
        Axes->True
    ]

}]
Output: the forward StringForm line, then the surface with per-point error vectors and the contour plot

predictionDistpt2 := pt2[[3]] - (forward /. {x1 -> pt2[[1]], x2 -> pt2[[2]]}) (* ground truth is the 3rd coordinate; for pt2 the 2nd and 3rd happen to both be 1 *)
SEpt2 := 0.5*(predictionDistpt2)^2
cout["original point:", pt2]
cout["predicted point:",predpt2]
cout["vanilla distance:",predictionDistpt2]
cout["mutated distance:",SEpt2]
$$\text{original point:}\{0,1,1\}$$

$$\text{predicted point:}\left\{0,1,\frac{1}{2}\right\}$$

$$\text{vanilla distance:}\left( \begin{array}{c} \frac{1}{2} \\ \end{array} \right)$$

$$\text{mutated distance:}\left( \begin{array}{c} 0.125 \\ \end{array} \right)$$

Distance here DOES NOT mean the 3D vector distance between the 2 points.
It only means the distance between the outputs, aka the 3rd coordinate, which is 1 versus 1/2 in this case.
Even if it did, the 3D vector distance would equal the output distance anyway, since the inputs x1, x2 are the same for the original point and its prediction.

Vanilla distance is simple subtraction.
Mutated distance is the vanilla distance squared, then multiplied by 0.5.
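
As a quick numeric sanity check (a throwaway sketch; target and pred are my names for the values above):

(* vanilla vs mutated distance for pt2, using the numbers above *)
target = 1;    (* ground-truth output, pt2[[3]] *)
pred = 1/2;    (* predicted output at {x1,x2} = {0,1} *)
vanilla = target - pred     (* 1/2 *)
mutated = 0.5*vanilla^2     (* 0.125 *)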

2 Manual Training

  1. The loss function is not simply distance, it is a mutated distance!
  2. Aggregate (SUM) the loss function applied to each point.
  3. Gradient: symbolically derive the derivative of the loss function wrt the weights.
    • Minimizing loss <=> solving (derivative of loss function wrt weights = 0), but since this is difficult we instead iteratively subtract the derivative from the weight.
    • Intuition: the loss function represents the Hill of Errors
      • Each datapoint is allowed to have its own Hill of Error
      • x-axis = w1, y-axis = w2 in our imaginary Hill of Error.
      • z-axis = error of the prediction from the ground truth.
      • The goal is to reach the valley of no error.
    • The gradient is calculated by:
      1. Subtracting the target ground truth (y2) from the nonlinear sigmoid symbolic equation containing (w1,w2,x1,x2) (the order doesn't matter once we square).
      2. Then applying the loss-function mutation (square, then multiply by 0.5).
      3. Calculating the derivative wrt the weights w1, w2.
      4. The output is a gradient.
  4. Gradient:
    • INPUT: (w1,w2,x1,x2,y2)
    • OUTPUT: the slope, aka the direction wrt the weights w1, w2 that gets us closer to a valley.
  5. Calculate the derivatives aka backprops for each datapoint by
    1. applying the gradient to (w1,w2,x1,x2,y2)
    2. doing this for all points, resulting in 4 slopes
    3. summing the backprops
      • Remember, each datapoint has its own Hill of Error.
      • Summing the backprops means summing the slopes on each Hill of Error. This summed backprop is the slope on the summed, bigger Hill of Error, and this is okay to do because the sum of the backprops (derivatives) equals the backprop (derivative) of the sum, by linearity of differentiation.
      • Intuitively, just adding the Hills of Error of each datapoint is okay as well, because we want to minimize overall error.
        • Imagine a scenario where the Hill of Error of datapoint 1 gives us a backprop in one direction and the Hill of Error of datapoint 2 gives us a backprop in the opposite direction.
          The sum of the directions means the steeper backprop wins, but if they are equally steep, then we don't move.
          Of course, one issue that is a universal problem in neural nets is getting trapped in a local minimum.
  6. Sum the backprops (derivatives), multiply by an alpha, then subtract this from the current weight to get our new weight (see the sketch below).
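
As referenced in step 6, the whole recipe compresses to a few lines. A minimal sketch, assuming the nonlineq, unwrap, and pt1..pt4 definitions from above (loss, slopeAt, and step are names I made up, not the notebook's):

(* one manual training step: sum the per-point backprops, scale by alpha, subtract from the weights *)
loss = 0.5*(unwrap@nonlineq - y2)^2;    (* the mutated distance *)
grad = D[loss, {{w1, w2}}];             (* derivative wrt the weights w1, w2 *)
slopeAt[p_, w_] := grad /. {w1 -> w[[1]], w2 -> w[[2]], x1 -> p[[1]], x2 -> p[[2]], y2 -> p[[3]]};
step[w_, alpha_] := w - alpha*Total[slopeAt[#, w] & /@ {pt1, pt2, pt3, pt4}];
(* step[{1, 2}, 10] reproduces the 2nd-pass weights {2.96612, 3.77877} computed below *)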

Quick summary:

Forward prop just means: calculate the predicted output for each datapoint.
Forward prop is equivalent to the non-linear activation function composed with the linear function.
Back prop just means: calculate the derivative of the loss function.
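
In code, the summary is nearly literal. A sketch under the definitions above (sigmoid is my name for the activation):

(* forward prop = the non-linear activation composed with the linear function *)
sigmoid[z_] := 1/(1 + Exp[-z]);
forwardProp = sigmoid[unwrap[x.T[beta] + b]];    (* the same expression as unwrap@nonlineq *)
(* back prop = the derivative of the loss function wrt the weights *)
backProp = D[0.5*(forwardProp - y2)^2, {{w1, w2}}];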

Some Terms:

2.1 Gradient formula

\(GradientFunction(x1,x2,w1,w2,y2) = \frac{\partial}{\partial (w1,w2)}Loss(x1,x2,w1,w2,y2)\)

  • INPUT: \(x1\) \(x2\) are data, \(y2\) is ground truth, \(w1\) \(w2\) are weights
  • OUTPUT: Differential Loss weight vector \((w1',w2')\)
  • USECASE: update weights with \((w1 \leftarrow w1-\alpha w1',w2 \leftarrow w2-\alpha w2')\)
(* The derivative: since beta = {{w1,w2}}, D[expr, beta] is the D[f, {{vars}}] gradient form and returns the vector of partials *)
gradient =  D[0.5*(unwrap@nonlineq-y2)^2,beta];
cout[gradient]

$$\left\{\frac{1. \text{x1} e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2} \left(\frac{1}{e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2}+1}-\text{y2}\right)}{\left(e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2}+1\right)^2},\frac{1. \text{x2} e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2} \left(\frac{1}{e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2}+1}-\text{y2}\right)}{\left(e^{-\text{w1} \text{x1}-\text{w2} \text{x2}+2}+1\right)^2}\right\}$$

2.1.1 1st Pass

wIter1 = {1,2};
cout["Initial weight vector=",wIter1]

$$\text{Initial weight vector=}\{1,2\}$$

forward1 = nonlineq /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]]};
forward1pt1 = forward1 /. {x1 -> pt1[[1]], x2 -> pt1[[2]] };
forward1pt2 = forward1 /. {x1 -> pt2[[1]], x2 -> pt2[[2]] };
forward1pt3 = forward1 /. {x1 -> pt3[[1]], x2 -> pt3[[2]] };
forward1pt4 = forward1 /. {x1 -> pt4[[1]], x2 -> pt4[[2]] };
     Show[
        sty[]@vec[{0,0,unwrap@forward1pt1},pt1,"pt1 err"],
        sty[]@vec[{0,1,unwrap@forward1pt2},pt2,"pt2 err"],
        sty[]@vec[{1,0,unwrap@forward1pt3},pt3,"pt3 err"],
        sty[]@vec[{1,1,unwrap@forward1pt4},pt4,"pt4 err"],
        Plot3D[forward1,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
        Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
        ImageSize->Small,
        Axes->True
    ]
    

Output: 1st-pass surface with per-point error vectors

cout["w=",wIter1]
cout["surfaceplot=",forward1]
cout["predY=(",forward1,forward1pt1,forward1pt2,forward1pt3,forward1pt4,")"]
$$\text{w=}\{1,2\}$$

$$\text{surfaceplot=}\left( \begin{array}{c} \frac{1}{e^{-\text{x1}-2 \text{x2}+2}+1} \\ \end{array} \right)$$

$$\text{predY=(}\left( \begin{array}{c} \frac{1}{e^{-\text{x1}-2 \text{x2}+2}+1} \\ \end{array} \right)\left( \begin{array}{c} \frac{1}{1+e^2} \\ \end{array} \right)\left( \begin{array}{c} \frac{1}{2} \\ \end{array} \right)\left( \begin{array}{c} \frac{1}{1+e} \\ \end{array} \right)\left( \begin{array}{c} \frac{1}{1+\frac{1}{e}} \\ \end{array} \right))$$

backprop1pt1 = gradient /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]], x1 -> pt1[[1]], x2 -> pt1[[2]], y2 -> pt1[[3]]};
backprop1pt2 = gradient /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]], x1 -> pt2[[1]], x2 -> pt2[[2]], y2 -> pt2[[3]]};
backprop1pt3 = gradient /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]], x1 -> pt3[[1]], x2 -> pt3[[2]], y2 -> pt3[[3]]};
backprop1pt4 = gradient /. {w1 -> wIter1[[1]], w2 -> wIter1[[2]], x1 -> pt4[[1]], x2 -> pt4[[2]], y2 -> pt4[[3]]};

cout["differential loss weight vector for point 1=",backprop1pt1] 
cout["differential loss weight vector for point 2=",backprop1pt2] 
cout["differential loss weight vector for point 3=",backprop1pt3] 
cout["differential loss weight vector for point 4=",backprop1pt4] 
$$\text{differential loss weight vector for point 1=}\{0.,0.\}$$

$$\text{differential loss weight vector for point 2=}\{0.,-0.125\}$$

$$\text{differential loss weight vector for point 3=}\{-0.143735,0.\}$$

$$\text{differential loss weight vector for point 4=}\{-0.0528771,-0.0528771\}$$

2.1.2 2nd Pass

wIter2 = wIter1 - 10*(backprop1pt1 + backprop1pt2 + backprop1pt3 + backprop1pt4);
cout["updated weight vector for 2nd forward pass=", wIter2]

$$\text{updated weight vector for 2nd forward pass=}\{2.96612,3.77877\}$$

forward2 = nonlineq /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]]};
forward2pt1 = forward2 /. {x1 -> pt1[[1]], x2 -> pt1[[2]] };
forward2pt2 = forward2 /. {x1 -> pt2[[1]], x2 -> pt2[[2]] };
forward2pt3 = forward2 /. {x1 -> pt3[[1]], x2 -> pt3[[2]] };
forward2pt4 = forward2 /. {x1 -> pt4[[1]], x2 -> pt4[[2]] };

    Show[
        sty[]@vec[{0,0,unwrap@forward2pt1},pt1,"pt1 err"],
        sty[]@vec[{0,1,unwrap@forward2pt2},pt2,"pt2 err"],
        sty[]@vec[{1,0,unwrap@forward2pt3},pt3,"pt3 err"],
        sty[]@vec[{1,1,unwrap@forward2pt4},pt4,"pt4 err"],
        Plot3D[forward2,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
        Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
        ImageSize->Small,
        Axes->True
    ]

Output: 2nd-pass surface with per-point error vectors

cout["w=",wIter2]
cout["surfaceplot=",forward2]
cout["predY=(",forward2,forward2pt1,forward2pt2,forward2pt3,forward2pt4,")"]
$$\text{w=}\{2.96612,3.77877\}$$

$$\text{surfaceplot=}\left( \begin{array}{c} \frac{1}{e^{-2.96612 \text{x1}-3.77877 \text{x2}+2}+1} \\ \end{array} \right)$$

$$\text{predY=(}\left( \begin{array}{c} \frac{1}{e^{-2.96612 \text{x1}-3.77877 \text{x2}+2}+1} \\ \end{array} \right)\left( \begin{array}{c} 0.119203 \\ \end{array} \right)\left( \begin{array}{c} 0.855545 \\ \end{array} \right)\left( \begin{array}{c} 0.724345 \\ \end{array} \right)\left( \begin{array}{c} 0.991379 \\ \end{array} \right))$$

backprop2pt1 = gradient /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]], x1 -> pt1[[1]], x2 -> pt1[[2]], y2 -> pt1[[3]]};
backprop2pt2 = gradient /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]], x1 -> pt2[[1]], x2 -> pt2[[2]], y2 -> pt2[[3]]};
backprop2pt3 = gradient /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]], x1 -> pt3[[1]], x2 -> pt3[[2]], y2 -> pt3[[3]]};
backprop2pt4 = gradient /. {w1 -> wIter2[[1]], w2 -> wIter2[[2]], x1 -> pt4[[1]], x2 -> pt4[[2]], y2 -> pt4[[3]]};

cout["differential loss weight vector for point 1=",backprop2pt1] 
cout["differential loss weight vector for point 2=",backprop2pt2] 
cout["differential loss weight vector for point 3=",backprop2pt3] 
cout["differential loss weight vector for point 4=",backprop2pt4] 
$$\text{differential loss weight vector for point 1=}\{0.,0.\}$$

$$\text{differential loss weight vector for point 2=}\{0.,-0.0178529\}$$

$$\text{differential loss weight vector for point 3=}\{-0.0550397,0.\}$$

$$\text{differential loss weight vector for point 4=}\{-0.0000736817,-0.0000736817\}$$

2.1.3 3rd Pass

wIter3 = wIter2 - 10*(backprop2pt1 + backprop2pt2 + backprop2pt3 + backprop2pt4);
cout["updated weight vector for 3rd forward pass=", wIter3]

$$\text{updated weight vector for 3rd forward pass=}\{3.51725,3.95804\}$$

forward3 = nonlineq /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]]};
forward3pt1 = forward3 /. {x1 -> pt1[[1]], x2 -> pt1[[2]] };
forward3pt2 = forward3 /. {x1 -> pt2[[1]], x2 -> pt2[[2]] };
forward3pt3 = forward3 /. {x1 -> pt3[[1]], x2 -> pt3[[2]] };
forward3pt4 = forward3 /. {x1 -> pt4[[1]], x2 -> pt4[[2]] };

    Show[
        sty[]@vec[{0,0,unwrap@forward3pt1},pt1,"pt1 err"],
        sty[]@vec[{0,1,unwrap@forward3pt2},pt2,"pt2 err"],
        sty[]@vec[{1,0,unwrap@forward3pt3},pt3,"pt3 err"],
        sty[]@vec[{1,1,unwrap@forward3pt4},pt4,"pt4 err"],
        Plot3D[forward3,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
        Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
        ImageSize->Small,
        Axes->True
    ]

Output: 3rd-pass surface with per-point error vectors

cout["w=",wIter3]
cout["surfaceplot=",forward3]
cout["predY=(",forward3,forward3pt1,forward3pt2,forward3pt3,forward3pt4,")"]
$$\text{w=}\{3.51725,3.95804\}$$

$$\text{surfaceplot=}\left( \begin{array}{c} \frac{1}{e^{-3.51725 \text{x1}-3.95804 \text{x2}+2}+1} \\ \end{array} \right)$$

$$\text{predY=(}\left( \begin{array}{c} \frac{1}{e^{-3.51725 \text{x1}-3.95804 \text{x2}+2}+1} \\ \end{array} \right)\left( \begin{array}{c} 0.119203 \\ \end{array} \right)\left( \begin{array}{c} 0.87632 \\ \end{array} \right)\left( \begin{array}{c} 0.820134 \\ \end{array} \right)\left( \begin{array}{c} 0.995828 \\ \end{array} \right))$$

backprop3pt1 = gradient /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]], x1 -> pt1[[1]], x2 -> pt1[[2]], y2 -> pt1[[3]]};
backprop3pt2 = gradient /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]], x1 -> pt2[[1]], x2 -> pt2[[2]], y2 -> pt2[[3]]};
backprop3pt3 = gradient /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]], x1 -> pt3[[1]], x2 -> pt3[[2]], y2 -> pt3[[3]]};
backprop3pt4 = gradient /. {w1 -> wIter3[[1]], w2 -> wIter3[[2]], x1 -> pt4[[1]], x2 -> pt4[[2]], y2 -> pt4[[3]]};

cout["differential loss weight vector for point 1=",backprop3pt1] 
cout["differential loss weight vector for point 2=",backprop3pt2] 
cout["differential loss weight vector for point 3=",backprop3pt3] 
cout["differential loss weight vector for point 4=",backprop3pt4] 
$$\text{differential loss weight vector for point 1=}\{0.,0.\}$$

$$\text{differential loss weight vector for point 2=}\{0.,-0.0134048\}$$

$$\text{differential loss weight vector for point 3=}\{-0.0265329,0.\}$$

$$\text{differential loss weight vector for point 4=}\{-0.0000173291,-0.0000173291\}$$

2.1.4 4th Pass

wIter4 = wIter3 - 10*(backprop3pt1 + backprop3pt2 + backprop3pt3 + backprop3pt4);
cout["updated weight vector for 4th forward pass=", wIter4]

$$\text{updated weight vector for 4th forward pass=}\{3.78276,4.09226\}$$

forward4 = nonlineq /. {w1 -> wIter4[[1]], w2 -> wIter4[[2]]};
forward4pt1 = forward4 /. {x1 -> pt1[[1]], x2 -> pt1[[2]] };
forward4pt2 = forward4 /. {x1 -> pt2[[1]], x2 -> pt2[[2]] };
forward4pt3 = forward4 /. {x1 -> pt3[[1]], x2 -> pt3[[2]] };
forward4pt4 = forward4 /. {x1 -> pt4[[1]], x2 -> pt4[[2]] };

    Show[
        sty[]@vec[{0,0,unwrap@forward4pt1},pt1,"pt1 err"],
        sty[]@vec[{0,1,unwrap@forward4pt2},pt2,"pt2 err"],
        sty[]@vec[{1,0,unwrap@forward4pt3},pt3,"pt3 err"],
        sty[]@vec[{1,1,unwrap@forward4pt4},pt4,"pt4 err"],
        Plot3D[forward4,{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
        Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
        ImageSize->Small,
        Axes->True
    ]

Output: 4th-pass surface with per-point error vectors

cout["w=",wIter4]
cout["surfaceplot=",forward4]
cout["predY=(",forward4,forward4pt1,forward4pt2,forward4pt3,forward4pt4,")"]
$$\text{w=}\{3.78276,4.09226\}$$

$$\text{surfaceplot=}\left( \begin{array}{c} \frac{1}{e^{-3.78276 \text{x1}-4.09226 \text{x2}+2}+1} \\ \end{array} \right)$$

$$\text{predY=(}\left( \begin{array}{c} \frac{1}{e^{-3.78276 \text{x1}-4.09226 \text{x2}+2}+1} \\ \end{array} \right)\left( \begin{array}{c} 0.119203 \\ \end{array} \right)\left( \begin{array}{c} 0.890148 \\ \end{array} \right)\left( \begin{array}{c} 0.856037 \\ \end{array} \right)\left( \begin{array}{c} 0.997199 \\ \end{array} \right))$$

In the next section we will try to automate this process of gradient descent.

3 Gradient descent looping

pt1 = {0,0,0};
pt2 = {0,1,1};
pt3 = {1,0,1};
pt4 = {1,1,1};
wIter1 = {1,2};
gradient =  D[0.5*(unwrap@nonlineq-y2)^2,beta];

forwardSurface[wIter_] := nonlineq /. {w1 -> wIter[[1]], w2 -> wIter[[2]]};
forwardPropPoints[pnts_,wIter_] := forwardSurface[wIter] /. {x1 -> pnts[[1]], x2 -> pnts[[2]] }
backwardPropWeights[pnts_,wIter_] := gradient /. {w1 -> wIter[[1]], w2 -> wIter[[2]], x1 -> pnts[[1]], x2 -> pnts[[2]], y2 -> pnts[[3]]};


y2predPoints[wIter_] := unwrap@forwardPropPoints[#,wIter]& /@ {pt1,pt2,pt3,pt4}

weightDescentLossHill[wIter_] := backwardPropWeights[#,wIter]& /@ {pt1,pt2,pt3,pt4}
updateWeight[wIter_,alpha_] := wIter - (alpha*Total[weightDescentLossHill[wIter]])

(* required functions shown above *)



updateWeight[wIter1,10]
updateWeight[updateWeight[wIter1,10],10]
NestList[updateWeight[#,10]&,wIter1,4]

weightList[wIter_,alpha_,n_] := NestList[updateWeight[#,alpha]&,wIter,n];

buildPoints[y2_] := {{0,0,y2[[1]]},{0,1,y2[[2]]},{1,0,y2[[3]]},{1,1,y2[[4]]}} 
plotPoints[wIter_] := vec @@@ MapThread[List,{buildPoints[y2predPoints[wIter]],{pt1,pt2,pt3,pt4}}]

(* just modify the alpha (4) and the iteration count (11) in weightSequence *)
weightSequence = weightList[wIter1,4,11];

showPlot[wIter_] := Show[
    plotPoints[wIter],
    Plot3D[forwardSurface[wIter],{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
    Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
    ImageSize->Small,
    Axes->True
];

Row[showPlot[#]& /@ weightSequence]
 
Output: the weights after one and two steps, the first five iterates from NestList, and a row of surface plots, one per weight in weightSequence

4 Manipulate

nthWeight[wIter_,alpha_,n_] := Nest[updateWeight[#,alpha]&, wIter,n]
nthWeight[wIter1,3,6];
(* nthWeight is a failed attempt *)

(* modify only these 2 below *)
alpha = 2;
numIterations = 30;

weightManipulateSequence = Evaluate@weightList[wIter1,alpha,numIterations];
surfaceSequence = forwardSurface[#]& /@ weightManipulateSequence;
plotPointsSequence = plotPoints[#]& /@ weightManipulateSequence;
completeSequence = MapThread[List,{surfaceSequence,plotPointsSequence}];
Manipulate[

    Show[
        #[[n]][[2]],
        Plot3D[#[[n]][[1]],{x1,-1,1},{x2,-1,1},PlotStyle->Opacity[0.3],ColorFunction -> Function[{x, y, z}, Hue[z]]],
        Graphics3D[{PointSize[Large],Point[{{0,0,0},{0,1,1},{1,0,1},{1,1,1}}]}],
        ImageSize->Small,
        Axes->True
    ],

    (* slider variable renamed from z to n so it cannot be confused with the z in the ColorFunction *)
    {n,1,numIterations,1}]& @ completeSequence

Output: Manipulate slider stepping through the fitted surface at each iteration

5 Visual Loss

ClearAll[x,x1,x2,w1,w2]
x={x1,x2};
pt1 = {0,0,0};
pt2 = {0,1,1};
pt3 = {1,0,1};
pt4 = {1,1,1};
wIter1 = {1,2};
gradient =  D[0.5*(unwrap@nonlineq-y2)^2,beta];

forwardSurface[wIter_] := nonlineq /. {w1 -> wIter[[1]], w2 -> wIter[[2]]};
forwardPropPoints[pnts_,wIter_] := forwardSurface[wIter] /. {x1 -> pnts[[1]], x2 -> pnts[[2]] }
theLoss[pnts_,wIter_] := 0.5*(pnts[[3]] - forwardPropPoints[pnts,wIter])^2
pt1[[3]] - forwardPropPoints[pt1,{1,1}]

Output: the residual at pt1 for weights {1,1}

theLoss[pt1,{0.5,0.3}]
theLoss[pt2,{0.5,0.3}]
{0.0000223971}
{0.491027}
Row[{
    Show[
        Plot3D[theLoss[pt1,{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.5]],
        sty[]@vec[{0.5,0.3,-0.1193203}],
        ImageSize->Small,
        Axes->True
        ],
    Show[
        Plot3D[theLoss[pt2,{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.5]],
        ImageSize->Small,
        Axes->True
        ],
    Show[
        Plot3D[theLoss[pt3,{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.5]],
        ImageSize->Small,
        Axes->True
        ],
    Show[
        Plot3D[theLoss[pt4,{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.5]],
        ImageSize->Small,
        Axes->True
        ]
    }]

Output: loss surfaces over (wa1, wa2) for each of the four points

Above are the loss functions for each of the 4 points.
Next we will add up these loss functions.

totalLoss[wIter_] := theLoss[pt1,wIter] + theLoss[pt2,wIter] + theLoss[pt3,wIter] + theLoss[pt4,wIter];
Show[Plot3D[ totalLoss[{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.3]],
 sty[]@vec[{1,2,0.5},{1.6,2.4,0.2}]
 ]

Output: total-loss surface with a sample descent arrow

5.1 Climbing down the total loss function

The sum of derivatives is the derivative of the sum.
The sum of the differential weight vectors of each of our 4 points' loss functions is the differential weight vector of the total loss function.
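
This is cheap to confirm symbolically with the theLoss defined above (a throwaway check, not part of the original notebook):

(* the gradient of the summed loss equals the sum of the per-point gradients *)
D[Total[theLoss[#, {w1, w2}] & /@ {pt1, pt2, pt3, pt4}], {{w1, w2}}] ==
  Total[D[theLoss[#, {w1, w2}], {{w1, w2}}] & /@ {pt1, pt2, pt3, pt4}]
(* True *)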

  • Below is a sequence of {{w1,w2,loss},{w1,w2,loss},…}
  • {1,2} are the weights; 0.435493 is the loss (remember, loss is not simply the difference)
  • Notice how the descent decreases the loss
descent = Flatten /@ MapThread[List,{weightSequence,totalLoss /@ weightSequence}]
{{1, 2, 0.435493}, {1.78645, 2.71151, 0.217208}, {2.35472, 3.02373, 0.127546},
 {2.75847, 3.23351, 0.0835834}, {3.03693, 3.39241, 0.0612339},
 {3.23979, 3.51987, 0.0484416}, {3.39638, 3.62588, 0.0403185},
 {3.52275, 3.71638, 0.0347525}, {3.62815, 3.79518, 0.0307206},
 {3.71825, 3.86485, 0.027675}, {3.79676, 3.92723, 0.0252987},
 {3.86623, 3.98364, 0.0233962}}
(* draw an arrow between each consecutive pair of descent points *)
Show[Plot3D[ totalLoss[{wa1,wa2}],{wa1,-10,10},{wa2,-10,10},Evaluate@opt3d[0.3]],
 Sequence @@ (sty[]@vec[#1,#2]& @@@ Partition[descent,2,1])
 ]

Output: total-loss surface with the descent path drawn as arrows