Commit

Added concept of Linear Unit, Layers, Stacking Dense Layers, and `Dropout and Batch Normalization` to `Neural Networks`
joshiayush committed Nov 26, 2023
1 parent 429f1f1 commit cd1c8ea
Showing 1 changed file with 238 additions and 7 deletions.
245 changes: 238 additions & 7 deletions notebooks/ml/Machine_Learning.ipynb
@@ -3289,7 +3289,7 @@
"\n",
"<div align=\"center\">\n",
"\n",
"<img src=\"https://developers.google.com/static/machine-learning/crash-course/images/ROCCurve.svg\" width=\"400\" height=\"400\" />\n",
"<img src=\"https://developers.google.com/static/machine-learning/crash-course/images/ROCCurve.svg\" />\n",
"\n",
"<strong>Figure 4. TP vs. FP rate at different classification thresholds.</strong>\n",
"\n",
@@ -3317,7 +3317,7 @@
"\n",
"<div align=\"center\">\n",
"\n",
"<img src=\"https://developers.google.com/static/machine-learning/crash-course/images/AUC.svg\" width=\"400\" height=\"400\" />\n",
"<img src=\"https://developers.google.com/static/machine-learning/crash-course/images/AUC.svg\" />\n",
"\n",
"<strong>Figure 5. AUC (Area under the ROC Curve).</strong>\n",
"\n",
@@ -3602,11 +3602,104 @@
"id": "qoqUz62BTB1q"
},
"source": [
"Each blue circle represents an input feature, and the green circle represents the weighted sum of the inputs.\n",
"\n",
"How can we alter this model to improve its ability to deal with nonlinear problems?"
"Each blue circle represents an input feature, and the green circle represents the weighted sum of the inputs."
]
},
{
"cell_type": "markdown",
"source": [
"### The Linear Unit\n",
"\n",
"So let's begin with the fundamental component of a neural network: the individual neuron. As a diagram, a **neuron** (or **unit**) with one input looks like:\n",
"\n",
"<div align='center'>\n",
"\n",
"<img src=\"https://storage.googleapis.com/kaggle-media/learn/images/mfOlDR6.png\" />\n",
"\n",
"<strong><i>The Linear Unit:</i> y = wx + b</strong>\n",
"\n",
"</div>\n",
"\n",
"The input is $x$. Its connection to the neuron has a **weight** which is $w$. Whenever a value flows through a connection, you multiply the value by the connection's weight. For the input $x$, what reaches the neuron is $w * x$. A neural network \"learns\" by modifying its weights.\n",
"\n",
"The $b$ is a special kind of weight we call it **bias**. The bias doesn't have any input data associated with it; instead, we put a $1$ in the diagram so that the value that reaches the neuron is just $b$ (since $1 * b = b$). The bias enables the neuron to modify the output independently of its inputs.\n",
"\n",
"The $y$ is the value the neuron ultimately outputs. To get the output, the neuron sums up all the values it receives through its connections. This neuron's activation is $y = w * x + b$, or as a formula $y=wx+b$."
],
"metadata": {
"id": "Y4beOJzTP6MS"
}
},
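{
"cell_type": "markdown",
"source": [
"To make this concrete, here is a minimal sketch of a single linear unit computed directly in plain Python. It is added for illustration only; the weight and bias values are made up.\n",
"\n",
"```python\n",
"# A single linear unit: y = w * x + b\n",
"w = 2.5   # weight on the input connection (example value)\n",
"b = -1.0  # bias (example value)\n",
"\n",
"x = 3.0          # the input feature\n",
"y = w * x + b    # the value the neuron outputs\n",
"print(y)         # 6.5\n",
"```"
],
"metadata": {}
},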
{
"cell_type": "markdown",
"source": [
"### Multiple Inputs\n",
"\n",
"In the previous section we saw how can we handle a single input using *The Linear Unit*, but what if we wanted to expand our model to include more inputs? That's easy enough. We can just add more input connections to the neuron, one for each additional feature. To find the output, we would multiply each input to its connection weight and then add them all together."
],
"metadata": {
"id": "WYYOQ9BSRN3F"
}
},
{
"cell_type": "markdown",
"source": [
"<div align='center'>\n",
"\n",
"<img src=\"https://storage.googleapis.com/kaggle-media/learn/images/vyXSnlZ.png\" />\n",
"\n",
"<strong>A linear unit with three inputs.</strong>\n",
"\n",
"</div>"
],
"metadata": {
"id": "AWTrr3PiR6et"
}
},
{
"cell_type": "markdown",
"source": [
"The formula for this neuron would be $y=w0x0+w1x1+w2x2+b$. A linear unit with two inputs will fit a plane, and a unit with more inputs than that will fit a hyperplane."
],
"metadata": {
"id": "DXYdbGAFR9EM"
}
},
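{
"cell_type": "markdown",
"source": [
"As a quick illustrative sketch (the weights, bias, and inputs below are made-up values), the same weighted sum can be computed as a dot product in NumPy:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# A linear unit with three inputs: y = w0*x0 + w1*x1 + w2*x2 + b\n",
"w = np.array([0.2, -0.5, 1.0])   # one weight per input connection\n",
"b = 0.1                          # bias\n",
"x = np.array([3.0, 2.0, 4.0])    # three input features\n",
"\n",
"y = np.dot(w, x) + b             # weighted sum of the inputs plus the bias\n",
"print(y)                         # 3.7\n",
"```"
],
"metadata": {}
},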
{
"cell_type": "markdown",
"source": [
"### Layers\n",
"\n",
"Neural networks typically organize their neurons into **layers**. When we collect together linear units having a common set of inputs we get a **dense** layer."
],
"metadata": {
"id": "fppT1ZWPSUqu"
}
},
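{
"cell_type": "markdown",
"source": [
"As a rough sketch of how such a layer might be declared in Keras (assuming TensorFlow is available; the unit count and example inputs are arbitrary):\n",
"\n",
"```python\n",
"import tensorflow as tf\n",
"\n",
"# A dense layer of two linear units. No activation is given, so each unit\n",
"# is a plain weighted sum of its inputs plus a bias.\n",
"dense = tf.keras.layers.Dense(units=2)\n",
"\n",
"# Passing a batch of one example with two features builds the layer's\n",
"# weights (a 2x2 kernel plus a length-2 bias vector) and applies them.\n",
"x = tf.constant([[3.0, 2.0]])\n",
"y = dense(x)\n",
"print(y.shape)   # (1, 2) -- one output per unit\n",
"```"
],
"metadata": {}
},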
{
"cell_type": "markdown",
"source": [
"<div align='center'>\n",
"\n",
"<img src=\"https://storage.googleapis.com/kaggle-media/learn/images/2MA4iMV.png\" />\n",
"\n",
"<strong>A dense layer of two linear units receiving two inputs and a bias.</strong>\n",
"\n",
"</div>"
],
"metadata": {
"id": "yWc04mriSfTQ"
}
},
{
"cell_type": "markdown",
"source": [
"You could think of each layer in a neural network as performing some kind of relatively simple transformation. Through a deep stack of layers, a neural network can transform its inputs in more and more complex ways. In a well-trained neural network, each layer is a transformation getting us a little bit closer to a solution."
],
"metadata": {
"id": "b1NS6SG5SreL"
}
},
{
"cell_type": "markdown",
"metadata": {
@@ -3667,14 +3760,36 @@
"Is this model still linear? Yes, it is. When you express the output as a function of the input and simplify, you get just another weighted sum of the inputs. This sum won't effectively model the nonlinear problem in Figure 2."
]
},
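{
"cell_type": "markdown",
"source": [
"A small NumPy sketch of why (the weights below are made-up values): chaining two weighted sums with nothing nonlinear in between collapses into a single weighted sum of the original inputs.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Two layers of weights with no activation function in between.\n",
"W1, b1 = np.array([[1.0, -2.0], [0.5, 3.0]]), np.array([0.1, 0.2])\n",
"W2, b2 = np.array([[2.0, 1.0]]), np.array([-0.3])\n",
"\n",
"x = np.array([1.5, -0.5])\n",
"\n",
"# Passing the input through both layers...\n",
"hidden = W1 @ x + b1\n",
"out = W2 @ hidden + b2\n",
"\n",
"# ...is exactly one combined linear layer: just another weighted sum of x.\n",
"W, b = W2 @ W1, W2 @ b1 + b2\n",
"print(np.allclose(out, W @ x + b))   # True\n",
"```"
],
"metadata": {}
},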
{
"cell_type": "markdown",
"source": [
"### Activation Functions"
],
"metadata": {
"id": "dQixfYTwTdur"
}
},
{
"cell_type": "markdown",
"source": [
"<div align='center'>\n",
"\n",
"<img src=\"https://storage.googleapis.com/kaggle-media/learn/images/OLSUEYT.png\" />\n",
"\n",
"<i>Without activation functions, neural networks can only learn linear relationships. In order to fit curves, we'll need to use activation functions.<i>\n",
"\n",
"</div>"
],
"metadata": {
"id": "lrCQyVaQTjHc"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "1uIQpA6xT4c8"
},
"source": [
"### Activation Functions\n",
"\n",
"To model a nonlinear problem, we can directly introduce a nonlinearity. We can pipe each hidden layer node through a nonlinear function.\n",
"\n",
"In the model represented by the following graph, the value of each node in Hidden Layer 1 is transformed by a nonlinear function before being passed on to the weighted sums of the next layer. This nonlinear function is called the activation function."
@@ -3695,6 +3810,54 @@
"</div>"
]
},
{
"cell_type": "markdown",
"source": [
"An **activation function** is simply some function we apply to each of a layer's outputs (its activations). The most common is the rectifier function $max(0,x)$."
],
"metadata": {
"id": "TPZQiL1iUPNO"
}
},
{
"cell_type": "markdown",
"source": [
"<div align='center'>\n",
"\n",
"<img src=\"https://storage.googleapis.com/kaggle-media/learn/images/aeIyAlF.png\" />\n",
"\n",
"</div>"
],
"metadata": {
"id": "sObsNBnKUXvx"
}
},
{
"cell_type": "markdown",
"source": [
"The rectifier function has a graph that's a line with the negative part \"rectified\" to zero. Applying the function to the outputs of a neuron will put a bend in the data, moving us away from simple lines.\n",
"\n",
"When we attach the rectifier to a linear unit, we get a **rectified linear unit** or **ReLU**. (For this reason, it's common to call the rectifier function the \"ReLU function\".) Applying a ReLU activation to a linear unit means the output becomes $max(0, w * x + b)$, which we might draw in a diagram like:"
],
"metadata": {
"id": "3Gr42j0xUdx5"
}
},
{
"cell_type": "markdown",
"source": [
"<div align='center'>\n",
"\n",
"<img src=\"https://storage.googleapis.com/kaggle-media/learn/images/eFry7Yu.png\" />\n",
"\n",
"<i>A rectified linear unit.</i>\n",
"\n",
"</div>"
],
"metadata": {
"id": "LT_El95dUthu"
}
},
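{
"cell_type": "markdown",
"source": [
"As a tiny sketch (example weight and bias only), the rectifier is easy to write yourself and apply to a linear unit's output:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def relu(x):\n",
"    # The rectifier: negative values are clipped to zero.\n",
"    return np.maximum(0.0, x)\n",
"\n",
"# A rectified linear unit: max(0, w * x + b), with example weight and bias.\n",
"w, b = 1.5, -2.0\n",
"for x in [-1.0, 0.0, 1.0, 2.0, 3.0]:\n",
"    print(x, relu(w * x + b))   # 0.0 until w*x + b turns positive\n",
"```"
],
"metadata": {}
},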
{
"cell_type": "markdown",
"metadata": {
@@ -3761,6 +3924,43 @@
"TensorFlow provides out-of-the-box support for many activation functions. You can find these activation functions within TensorFlow's [list of wrappers for primitive neural network operations](https://www.tensorflow.org/api_docs/python/tf/nn). That said, we still recommend starting with ReLU."
]
},
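{
"cell_type": "markdown",
"source": [
"For a quick feel of a few of these (a minimal sketch, assuming TensorFlow is installed):\n",
"\n",
"```python\n",
"import tensorflow as tf\n",
"\n",
"x = tf.constant([-2.0, -0.5, 0.0, 1.0, 3.0])\n",
"\n",
"# A few of the activation functions exposed under tf.nn.\n",
"print(tf.nn.relu(x).numpy())      # negatives clipped to zero\n",
"print(tf.nn.sigmoid(x).numpy())   # squashed into (0, 1)\n",
"print(tf.nn.tanh(x).numpy())      # squashed into (-1, 1)\n",
"\n",
"# In Keras layers, an activation is usually attached by name.\n",
"layer = tf.keras.layers.Dense(units=4, activation='relu')\n",
"```"
],
"metadata": {}
},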
{
"cell_type": "markdown",
"source": [
"### Stacking Dense Layers\n",
"\n",
"Now that we have some nonlinearity, let's see how we can stack layers to get complex data transformations."
],
"metadata": {
"id": "M6WFk9TTVCb8"
}
},
{
"cell_type": "markdown",
"source": [
"<div align='center'>\n",
"\n",
"<img src=\"https://storage.googleapis.com/kaggle-media/learn/images/Y5iwFQZ.png\" />\n",
"\n",
"<i>A stack of dense layers makes a \"fully-connected\" network.</i>\n",
"\n",
"</div>"
],
"metadata": {
"id": "7ZPkfHYbVGcn"
}
},
{
"cell_type": "markdown",
"source": [
"The layers before the output layer are sometimes called **hidden** since we never see their outputs directly.\n",
"\n",
"Now, notice that the final (output) layer is a linear unit (meaning, no activation function). That makes this network appropriate to a regression task, where we are trying to predict some arbitrary numeric value. Other tasks (like classification) might require an activation function on the output."
],
"metadata": {
"id": "wJvqRos5VOlu"
}
},
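{
"cell_type": "markdown",
"source": [
"A minimal Keras sketch of such a stack (assuming TensorFlow is available; the two-feature input and layer widths are arbitrary example choices): two ReLU hidden layers followed by a single linear output unit, suited to regression.\n",
"\n",
"```python\n",
"from tensorflow import keras\n",
"from tensorflow.keras import layers\n",
"\n",
"model = keras.Sequential([\n",
"    keras.Input(shape=(2,)),                   # two input features\n",
"    layers.Dense(units=4, activation='relu'),  # hidden layer\n",
"    layers.Dense(units=3, activation='relu'),  # hidden layer\n",
"    layers.Dense(units=1),                     # linear output unit (no activation)\n",
"])\n",
"\n",
"model.summary()\n",
"```"
],
"metadata": {}
},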
{
"cell_type": "markdown",
"metadata": {
@@ -3844,11 +4044,42 @@
"metadata": {
"id": "ZNVls-ewkXoE"
}
},
{
"cell_type": "markdown",
"source": [
"<div align=\"center\">\n",
"\n",
"<img src=\"https://storage.googleapis.com/kaggle-media/learn/images/a86utxY.gif\" />\n",
"\n",
"<i>Here, 50% dropout has been added between the two hidden layers.</i>\n",
"\n",
"</div>"
],
"metadata": {
"id": "9AHUCy0GWXlj"
}
},
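{
"cell_type": "markdown",
"source": [
"A rough sketch of what that might look like in Keras (assuming TensorFlow is available; the layer widths are arbitrary):\n",
"\n",
"```python\n",
"from tensorflow import keras\n",
"from tensorflow.keras import layers\n",
"\n",
"model = keras.Sequential([\n",
"    keras.Input(shape=(2,)),\n",
"    layers.Dense(16, activation='relu'),   # first hidden layer\n",
"    layers.Dropout(rate=0.5),              # drop 50% of its outputs during training\n",
"    layers.Dense(16, activation='relu'),   # second hidden layer\n",
"    layers.Dense(1),\n",
"])\n",
"```"
],
"metadata": {}
},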
{
"cell_type": "markdown",
"source": [
"### Batch Normalization (batchnorm)\n",
"\n",
"With neural networks, it's generally a good idea to put all of your data on a common scale. The reason is that SGD will shift the network weights in proportion to how large an activation the data produces. Features that tend to produce activations of very different sizes can make for unstable training behavior.\n",
"\n",
"Now, if it's good to normalize the data before it goes into the network, maybe also normalizing inside the network would be better! In fact, we have a special kind of layer that can do this, the **batch normalization layer**. A batch normalization layer looks at each batch as it comes in, first normalizing the batch with its own mean and standard deviation, and then also putting the data on a new scale with two trainable rescaling parameters. Batchnorm, in effect, performs a kind of coordinated rescaling of its inputs.\n",
"\n",
"Most often, batchnorm is added as an aid to the optimization process (though it can sometimes also help prediction performance). Models with batchnorm tend to need fewer epochs to complete training. Moreover, batchnorm can also fix various problems that can cause the training to get \"stuck\". Consider adding batch normalization to your models, especially if you're having trouble during training."
],
"metadata": {
"id": "BjBWVeQaW3ow"
}
},
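{
"cell_type": "markdown",
"source": [
"A rough sketch of adding batch normalization in Keras (assuming TensorFlow is available; the layer widths are arbitrary):\n",
"\n",
"```python\n",
"from tensorflow import keras\n",
"from tensorflow.keras import layers\n",
"\n",
"model = keras.Sequential([\n",
"    keras.Input(shape=(2,)),\n",
"    layers.Dense(16, activation='relu'),\n",
"    layers.BatchNormalization(),   # normalize the previous layer's outputs batch by batch\n",
"    layers.Dense(16, activation='relu'),\n",
"    layers.BatchNormalization(),\n",
"    layers.Dense(1),\n",
"])\n",
"```"
],
"metadata": {}
}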
],
"metadata": {
"colab": {
"provenance": [],
"toc_visible": true,
"include_colab_link": true
},
"kernelspec": {
