Added a brief explanation of Training Neural Networks into the main `ai` documentation

Signed-off-by: Ayush Joshi <[email protected]>
joshiayush committed Nov 16, 2023
1 parent 16b5a6c commit 26ba4bf
Showing 5 changed files with 45 additions and 4 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/docs.yml
@@ -2,8 +2,8 @@ name: github-pages

on:
  push:
    branches:
      - docs
    paths:
      - 'CHANGELOG.md'

permissions:
  actions: write
1 change: 1 addition & 0 deletions ai/__init__.py
@@ -90,6 +90,7 @@ def _PreprocessReadme(fpath: Union[str, pathlib.Path]) -> str:
'Classification.md',
'Regularization-for-Sparsity.md',
'Neural-Networks.md',
'Training-Neural-Networks.md',
)


3 changes: 2 additions & 1 deletion docs/ml/Neural-Networks.md
@@ -126,4 +126,5 @@ Now our model has all the standard components of what people usually mean when t
* A set of nodes, analogous to neurons, organized in layers.
* A set of weights representing the connections between each neural network layer and the layer beneath it. The layer beneath may be another neural network layer, or some other kind of layer.
* A set of biases, one for each node.
-* An activation function that transforms the output of each node in a layer. Different layers may have different activation functions.
+* An activation function that transforms the output of each node in a layer. Different layers may have different activation functions.

2 changes: 1 addition & 1 deletion docs/ml/README.md
@@ -14,6 +14,6 @@
12. [Classification](https://github.com/joshiayush/ai/blob/master/docs/ml/Classification.md)
13. [Regularization for Sparsity](https://github.com/joshiayush/ai/blob/master/docs/ml/Regularization-for-Sparsity.md)
14. [Neural Networks](https://github.com/joshiayush/ai/blob/master/docs/ml/Neural-Networks.md)
-15. Training Neural Nets
+15. [Training Neural Networks](https://github.com/joshiayush/ai/blob/master/docs/ml/Training-Neural-Networks.md)
16. Multi-Class Neural Nets
17. Embeddings
39 changes: 39 additions & 0 deletions docs/ml/Training-Neural-Networks.md
@@ -0,0 +1,39 @@
# Training Neural Networks

**Backpropagation** is the most common training algorithm for neural networks. It makes gradient descent feasible for multi-layer neural networks.
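As a rough illustration of what backpropagation computes, the sketch below applies the chain rule to a deliberately tiny network and checks the result against finite differences. The network shape (one input, one `tanh` hidden unit, one linear output), the squared-error loss, and every function name are assumptions made for this example, not something prescribed by this document.

```python
import math


def forward(w1, w2, x):
    """Tiny network: x -> h = tanh(w1 * x) -> y_hat = w2 * h."""
    h = math.tanh(w1 * x)
    return h, w2 * h


def loss(w1, w2, x, y):
    """Squared-error loss for a single example."""
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2


def backprop(w1, w2, x, y):
    """Return (dL/dw1, dL/dw2) by applying the chain rule from output to input."""
    h, y_hat = forward(w1, w2, x)
    d_y_hat = y_hat - y               # dL/dy_hat
    d_w2 = d_y_hat * h                # dL/dw2
    d_h = d_y_hat * w2                # dL/dh, passed back to the hidden layer
    d_w1 = d_h * (1.0 - h ** 2) * x   # tanh'(z) = 1 - tanh(z)^2
    return d_w1, d_w2


def numeric_grad(w1, w2, x, y, eps=1e-6):
    """Central finite differences, used only to sanity-check backprop()."""
    d_w1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
    d_w2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
    return d_w1, d_w2
```

Gradient descent would then subtract a small multiple of these gradients from `w1` and `w2` at each step. The point of backpropagation is that the intermediate value `d_h` is reused rather than recomputed, which is what keeps the cost manageable as layers are added.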

## Best Practices

This section explains backpropagation's failure cases and the most common way to regularize a neural network.

### Failure Cases

There are a number of common ways for backpropagation to go wrong.

#### Vanishing Gradients

The gradients for the lower layers (closer to the input) can become very small. In deep networks, computing these gradients can involve taking the product of many small terms.

When the gradients vanish toward 0 for the lower layers, these layers train very slowly, or not at all.

The ReLU activation function can help prevent vanishing gradients, because its derivative is exactly 1 for positive inputs instead of a fraction smaller than 1.
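To make the "product of many small terms" concrete, here is a minimal sketch that multiplies the activation derivatives a gradient would pick up while travelling back through 20 layers. The depth, the pre-activation value of 0.5, the unit weights, and the helper names are all hypothetical choices for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # never exceeds 0.25


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


def chained_gradient(local_grad, pre_activations, weight=1.0):
    """Multiply the per-layer factors (weight * activation derivative)
    that backpropagation chains together on the way to the input."""
    g = 1.0
    for z in pre_activations:
        g *= weight * local_grad(z)
    return g


depth = 20
zs = [0.5] * depth  # hypothetical pre-activation at every layer

sigmoid_chain = chained_gradient(sigmoid_grad, zs)  # shrinks below 1e-12
relu_chain = chained_gradient(relu_grad, zs)        # stays at 1.0
```

With sigmoid, each layer multiplies the gradient by at most 0.25, so the signal reaching the first layer is effectively zero. With ReLU and positive pre-activations, each factor is 1 and the gradient passes through unchanged.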

#### Exploding Gradients

If the weights in a network are very large, then the gradients for the lower layers involve products of many large terms. In this case you can have exploding gradients: gradients that grow too large for training to converge.

Batch normalization can help prevent exploding gradients, as can lowering the learning rate.
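The same chain-of-products picture can be sketched for exploding gradients. The example below assumes a deep linear chain `y = w_n * ... * w_1 * x` with hypothetical weights; it only illustrates the arithmetic and does not implement batch normalization.

```python
def chain_gradient(weights):
    """Gradient of y = w_n * ... * w_1 * x with respect to x:
    simply the product of all the weights."""
    g = 1.0
    for w in weights:
        g *= w
    return g


def update_size(gradient, learning_rate):
    """Magnitude of one plain gradient-descent step."""
    return abs(learning_rate * gradient)


depth = 20
exploding = chain_gradient([3.0] * depth)  # 3**20, roughly 3.5e9
stable = chain_gradient([1.0] * depth)     # 1.0

big_step = update_size(exploding, learning_rate=0.1)
small_step = update_size(exploding, learning_rate=0.001)
```

A lower learning rate shrinks the step proportionally, which is why it helps, but it does not change the gradient itself. Keeping the per-layer factors close to 1 addresses the cause rather than the symptom.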

#### Dead ReLU Units

Once the weighted sum for a ReLU unit falls below 0, the ReLU unit can get stuck. It outputs 0 activation, contributing nothing to the network's output, and gradients can no longer flow through it during backpropagation. With the source of gradients cut off, the input to the ReLU may never change enough to bring the weighted sum back above 0.

Lowering the learning rate can help keep ReLU units from dying.
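A small sketch of why a dead unit stays dead, assuming a single ReLU unit `relu(w * x + b)` whose bias has already been pushed far below zero (for instance by one oversized update). The inputs, starting parameters, and learning rate are made up for the example.

```python
def relu(z):
    return max(0.0, z)


def relu_unit_grads(w, b, xs, upstream=1.0):
    """Gradients of upstream * sum(relu(w * x + b)) with respect to w and b."""
    d_w = 0.0
    d_b = 0.0
    for x in xs:
        z = w * x + b
        gate = 1.0 if z > 0 else 0.0  # ReLU derivative: 0 when z is negative
        d_w += upstream * gate * x
        d_b += upstream * gate
    return d_w, d_b


xs = [0.1, 0.5, 0.9]  # hypothetical inputs
w, b = 1.0, -5.0      # weighted sum is negative for every input

for _ in range(100):
    d_w, d_b = relu_unit_grads(w, b, xs)
    w -= 0.1 * d_w    # both gradients are 0, so nothing moves
    b -= 0.1 * d_b

outputs = [relu(w * x + b) for x in xs]
```

After 100 steps, `w` and `b` are unchanged and every output is still 0: the gate that zeroes the activation also zeroes the gradient that could have repaired it.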

### Dropout Regularization

Yet another form of regularization, called **Dropout**, is useful for neural networks. It works by randomly "dropping out" unit activations in a network for a single gradient step. The more you drop out, the stronger the regularization:

* 0.0 = No dropout regularization.
* 1.0 = Drop out everything. The model learns nothing.
* Values between 0.0 and 1.0 = More useful. Only that fraction of the unit activations is dropped.
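The dropout rates listed above can be sketched in a few lines of plain Python. This example assumes the common "inverted dropout" convention, where surviving activations are scaled by `1 / (1 - rate)` so their expected value is unchanged. That scaling, the `dropout` function name, and its parameters are assumptions of this sketch, not something specified in this document.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` for one gradient step.

    Survivors are scaled by 1 / (1 - rate) (inverted dropout, an assumption
    of this sketch). Outside training, activations pass through unchanged.
    """
    if not training or rate <= 0.0:
        return list(activations)
    if rate >= 1.0:
        return [0.0] * len(activations)  # everything dropped
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # seeded only to make the example repeatable
hidden = [0.2, 1.5, 0.7, 3.0]

no_dropout = dropout(hidden, 0.0, rng)
half_dropout = dropout(hidden, 0.5, rng)
all_dropped = dropout(hidden, 1.0, rng)
```

A fresh random mask is drawn on every call, which matches the idea of dropping units "for a single gradient step" rather than removing them permanently.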
