From 37c654fc5db83b7bbc124e4ad5aa27d6ab0bcf40 Mon Sep 17 00:00:00 2001
From: Ayush Joshi
Date: Thu, 16 Nov 2023 11:38:04 +0530
Subject: [PATCH] Added a brief explanation of `Training Neural Networks` into the main `ai` documentation

Signed-off-by: Ayush Joshi
---
 .github/workflows/docs.yml          |  4 +--
 docs/ml/Neural-Networks.md          |  3 ++-
 docs/ml/README.md                   |  2 +-
 docs/ml/Training-Neural-Networks.md | 39 +++++++++++++++++++++++++++++
 4 files changed, 44 insertions(+), 4 deletions(-)
 create mode 100644 docs/ml/Training-Neural-Networks.md

diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
index 2753d83..4ef43e3 100644
--- a/.github/workflows/docs.yml
+++ b/.github/workflows/docs.yml
@@ -2,8 +2,8 @@ name: github-pages
 
 on:
   push:
-    branches:
-      - docs
+    paths:
+      - 'CHANGELOG.md'
 
 permissions:
   actions: write
diff --git a/docs/ml/Neural-Networks.md b/docs/ml/Neural-Networks.md
index 802552d..ec3f59a 100644
--- a/docs/ml/Neural-Networks.md
+++ b/docs/ml/Neural-Networks.md
@@ -126,4 +126,5 @@ Now our model has all the standard components of what people usually mean when t
 * A set of nodes, analogous to neurons, organized in layers.
 * A set of weights representing the connections between each neural network layer and the layer beneath it. The layer beneath may be another neural network layer, or some other kind of layer.
 * A set of biases, one for each node.
-* An activation function that transforms the output of each node in a layer. Different layers may have different activation functions.
\ No newline at end of file
+* An activation function that transforms the output of each node in a layer. Different layers may have different activation functions.
+
diff --git a/docs/ml/README.md b/docs/ml/README.md
index 91b715f..d84144b 100644
--- a/docs/ml/README.md
+++ b/docs/ml/README.md
@@ -14,6 +14,6 @@
 12. [Classification](https://github.com/joshiayush/ai/blob/master/docs/ml/Classification.md)
 13. [Regularization for Sparsity](https://github.com/joshiayush/ai/blob/master/docs/ml/Regularization-for-Sparsity.md)
 14. [Neural Networks](https://github.com/joshiayush/ai/blob/master/docs/ml/Neural-Networks.md)
-15. Training Neural Nets
+15. [Training Neural Nets](https://github.com/joshiayush/ai/blob/master/docs/ml/Training-Neural-Networks.md)
 16. Multi-Class Neural Nets
 17. Embeddings
\ No newline at end of file
diff --git a/docs/ml/Training-Neural-Networks.md b/docs/ml/Training-Neural-Networks.md
new file mode 100644
index 0000000..b2dacdc
--- /dev/null
+++ b/docs/ml/Training-Neural-Networks.md
@@ -0,0 +1,39 @@
+# Training Neural Networks
+
+**Backpropagation** is the most common training algorithm for neural networks. It makes gradient descent feasible for multi-layer neural networks.
+
+## Best Practices
+
+This section explains backpropagation's failure cases and the most common way to regularize a neural network.
+
+### Failure Cases
+
+There are a number of common ways for backpropagation to go wrong.
+
+#### Vanishing Gradients
+
+The gradients for the lower layers (closer to the input) can become very small. In deep networks, computing these gradients can involve taking the product of many small terms.
+
+When the gradients vanish toward 0 for the lower layers, these layers train very slowly, or not at all.
+
+The ReLU activation function can help prevent vanishing gradients.
+
+#### Exploding Gradients
+
+If the weights in a network are very large, then the gradients for the lower layers involve products of many large terms. In this case you can have exploding gradients: gradients that grow too large for training to converge.
+
+Batch normalization can help prevent exploding gradients, as can lowering the learning rate.
+
+#### Dead ReLU Units
+
+Once the weighted sum for a ReLU unit falls below 0, the ReLU unit can get stuck. It outputs 0 activation, contributing nothing to the network's output, and gradients can no longer flow through it during backpropagation. With a source of gradients cut off, the input to the ReLU may not ever change enough to bring the weighted sum back above 0.
+
+Lowering the learning rate can help keep ReLU units from dying.
+
+### Dropout Regularization
+
+Yet another form of regularization, called **Dropout**, is useful for neural networks. It works by randomly "dropping out" unit activations in a network for a single gradient step. The more you drop out, the stronger the regularization:
+
+* 0.0 = No dropout regularization.
+* 1.0 = Drop out everything. The model learns nothing.
+* Values between 0.0 and 1.0 = More useful.
\ No newline at end of file
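
The dropout rates listed at the end of the new `Training-Neural-Networks.md` can be illustrated with a minimal sketch in plain Python. The function name `dropout` and its parameters are hypothetical and not part of this patch. Rescaling the surviving activations by `1 / keep_prob` (so-called inverted dropout) is also an assumption, since the patch text does not say how kept activations are treated.

```python
import random


def dropout(activations, rate, rng):
    """Randomly zero out unit activations for a single gradient step.

    `rate` is the fraction of units to drop: 0.0 drops nothing,
    1.0 drops everything. Hypothetical helper, for illustration only.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    if rate == 1.0:
        # Every unit is dropped, so no signal (and no gradient) gets through.
        return [0.0 for _ in activations]
    keep_prob = 1.0 - rate
    # Assumption: inverted dropout. Kept activations are divided by keep_prob
    # so the expected value of each output matches the original activation.
    return [
        a / keep_prob if rng.random() < keep_prob else 0.0
        for a in activations
    ]


rng = random.Random(0)  # fixed seed so repeated runs draw the same mask
layer_output = [1.0, 2.0, 3.0, 4.0]

unchanged = dropout(layer_output, 0.0, rng)  # no regularization
partial = dropout(layer_output, 0.5, rng)    # each unit kept with probability 0.5
silenced = dropout(layer_output, 1.0, rng)   # everything dropped
```

A new random mask would be drawn at every gradient step, and dropout is typically switched off at inference time; both points go beyond what the patch itself states.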