diff --git a/misc/expgraph3.webp b/misc/expgraph3.webp new file mode 100644 index 00000000..56566b0c Binary files /dev/null and b/misc/expgraph3.webp differ diff --git a/module2/module2/index.html b/module2/module2/index.html index 7551387e..3ec0f487 100644 --- a/module2/module2/index.html +++ b/module2/module2/index.html @@ -1406,13 +1406,37 @@

Converts a multidimensional tensor index into a single-dimensional position in storage based on strides.

-
-
index : index tuple of ints
-strides : tensor strides
-
-
-
Position in storage
-
+ + +

Parameters:

+ + + +

Returns:
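A minimal sketch of the idea, assuming plain Python sequences (the graded version lives in `minitorch/tensor_data.py`): the position is the dot product of the index with the strides.

```python
# Hedged sketch: storage position is the dot product of index and strides.
def index_to_position(index, strides):
    position = 0
    for ind, stride in zip(index, strides):
        position += ind * stride
    return position

# e.g. with row-major strides (5, 1) for shape (2, 5):
assert index_to_position((1, 2), (5, 1)) == 7
```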

+ @@ -1433,11 +1457,33 @@

Should ensure that enumerating position 0 ... size of a tensor produces every index exactly once. It may not be the inverse of index_to_position.

-
-
ordinal: ordinal position to convert.
-shape : tensor shape.
-out_index : return index corresponding to position.
-
+ + +

Parameters:
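One way to satisfy the "every index exactly once" contract is repeated divide-and-mod over the shape, filling from the last dimension backwards; a hedged sketch:

```python
# Hedged sketch: peel off the last dimension with mod, then shrink the
# ordinal with integer division.
def to_index(ordinal, shape, out_index):
    for dim in range(len(shape) - 1, -1, -1):
        out_index[dim] = ordinal % shape[dim]
        ordinal = ordinal // shape[dim]

out = [0, 0, 0]
to_index(5, (2, 2, 2), out)
assert out == [1, 0, 1]
```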

+ @@ -1455,12 +1501,33 @@

Permute the dimensions of the tensor.

-
-
*order: a permutation of the dimensions
-
-
-
New `TensorData` with the same storage and a new dimension order.
-
+ + +

Parameters:

+ + + +

Returns:

+
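Since storage is shared, permute only has to reorder the shape and stride tuples; a hedged sketch of that bookkeeping (the real method returns a new `TensorData`):

```python
# Hedged sketch: permuting dimensions reorders shape and strides together;
# the underlying storage is untouched.
def permute(shape, strides, *order):
    return (
        tuple(shape[o] for o in order),
        tuple(strides[o] for o in order),
    )

# Transposing a (2, 5) row-major tensor:
assert permute((2, 5), (5, 1), 1, 0) == ((5, 2), (1, 5))
```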
@@ -1484,16 +1551,49 @@

Broadcast two shapes to create a new union shape.

-
-
shape1 : first shape
-shape2 : second shape
-
-
-
broadcasted shape
-
-
-
IndexingError : if cannot broadcast
-
+ + +

Parameters:

+ + + +

Returns:

+ + + +

Raises:

+
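A hedged sketch of the rule (assuming `IndexingError` is the exception defined in `minitorch/tensor_data.py`): align shapes from the right, and each dimension pair must either match or contain a 1.

```python
from minitorch.tensor_data import IndexingError  # assumed location

# Hedged sketch of the broadcast rule: pad the shorter shape with 1s on
# the left, then take the max of each compatible pair.
def shape_broadcast(shape1, shape2):
    n = max(len(shape1), len(shape2))
    a = (1,) * (n - len(shape1)) + tuple(shape1)
    b = (1,) * (n - len(shape2)) + tuple(shape2)
    out = []
    for d1, d2 in zip(a, b):
        if d1 != d2 and 1 not in (d1, d2):
            raise IndexingError(f"cannot broadcast {shape1} and {shape2}")
        out.append(max(d1, d2))
    return tuple(out)

assert shape_broadcast((2, 1), (1, 5)) == (2, 5)
```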
@@ -1515,15 +1615,51 @@

it may be larger or with more dimensions than the shape given. Additional dimensions may need to be mapped to 0 or removed.

-
-
big_index : multidimensional index of bigger tensor
-big_shape : tensor shape of bigger tensor
-shape : tensor shape of smaller tensor
-out_index : multidimensional index of smaller tensor
-
-
-
None
-
+ + +

Parameters:

+ + + +

Returns:
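A hedged sketch of the mapping described above: dimensions of size 1 in the smaller shape always map to 0, and extra leading dimensions of the bigger shape are simply dropped.

```python
# Hedged sketch: align shapes on the right; size-1 dims always index 0.
def broadcast_index(big_index, big_shape, shape, out_index):
    offset = len(big_shape) - len(shape)
    for i in range(len(shape)):
        out_index[i] = big_index[offset + i] if shape[i] > 1 else 0

out = [0, 0]
broadcast_index([3, 1, 4], (4, 2, 5), (1, 5), out)
assert out == [0, 4]
```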

+ @@ -1542,52 +1678,12 @@

the expression in Streamlit to view the graph

y = x * z + 10.0
 
-
>>> python project/show_expression.py
+
>>> streamlit run project/app.py -- 2
 
-

+

Todo

-

Add functions in minitorch/tensor.py, minitorch/tensor_ops.py and -minitorch/tensor_functions.py for each of the following, and pass tests -marked as task2_3.

-

In the base tensor.py file you need to implement many of the same functions -from minitorch/scalar.py.

-

Information:

-
    -
  • shape
  • -
  • size
  • -
  • dims
  • -
-

Operators (you will likely find _ensure_tensor useful here):

-
    -
  • add
  • -
  • sub
  • -
  • mul
  • -
  • lt
  • -
  • eq
  • -
  • gt
  • -
  • neg
  • -
  • radd
  • -
  • rmul
  • -
  • all
  • -
  • is_close
  • -
  • sigmoid
  • -
  • relu
  • -
  • log
  • -
  • exp
  • -
-

Should take an optional dim argument: -- sum -- mean

-
    -
  • permute
  • -
  • view
  • -
-

Should set grad to None

-
    -
  • zero_grad_
  • -
-

Next in the tensor_ops.py file you need to implement these three +

First in the tensor_ops.py file you need to implement these three core functions. See the docs for detailed explanations.

@@ -1617,12 +1713,31 @@

value of in_storage assuming out_shape and in_shape broadcast. (in_shape must be smaller than out_shape). -
-
fn: function from float-to-float to apply
-
-
-
Tensor map function.
-
+ + +

Parameters:

+
    +
  • + fn + (Callable[[float], float]) + – +
    +

    function from float-to-float to apply

    +
    +
  • +
+ + +

Returns:

+
    +
  • + Callable[[Storage, Shape, Strides, Storage, Shape, Strides], None] + – +
    +

    Tensor map function.

    +
    +
  • +
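Putting the indexing helpers together, the broadcast-aware map loop can be sketched as below (hedged: a slow pure-Python version, assuming the `to_index` / `broadcast_index` / `index_to_position` helpers from `tensor_data.py`):

```python
from minitorch.tensor_data import to_index, broadcast_index, index_to_position

# Hedged sketch of the low-level map loop.
def tensor_map(fn):
    def _map(out, out_shape, out_strides, in_storage, in_shape, in_strides):
        out_index = [0] * len(out_shape)
        in_index = [0] * len(in_shape)
        for ordinal in range(len(out)):
            to_index(ordinal, out_shape, out_index)  # out position -> out index
            broadcast_index(out_index, out_shape, in_shape, in_index)
            out[index_to_position(out_index, out_strides)] = fn(
                in_storage[index_to_position(in_index, in_strides)]
            )
    return _map
```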

@@ -1653,12 +1768,31 @@

value of a_storage and b_storage assuming a_shape and b_shape broadcast to out_shape. -
-
fn: function mapping two floats to float to apply
-
-
-
Tensor zip function.
-
+ + +

Parameters:

+
    +
  • + fn + (Callable[[float, float], float]) + – +
    +

    function mapping two floats to float to apply

    +
    +
  • +
+ + +

Returns:

+
    +
  • + Callable[[Storage, Shape, Strides, Storage, Shape, Strides, Storage, Shape, Strides], None] + – +
    +

    Tensor zip function.

    +
    +
  • +
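A hedged sketch of the zip loop, following the same pattern as map but broadcasting the output index back into both inputs (helper names assumed from `tensor_data.py`):

```python
from minitorch.tensor_data import to_index, broadcast_index, index_to_position

# Hedged sketch of the low-level zip loop.
def tensor_zip(fn):
    def _zip(out, out_shape, out_strides,
             a_storage, a_shape, a_strides,
             b_storage, b_shape, b_strides):
        out_index = [0] * len(out_shape)
        a_index = [0] * len(a_shape)
        b_index = [0] * len(b_shape)
        for ordinal in range(len(out)):
            to_index(ordinal, out_shape, out_index)
            broadcast_index(out_index, out_shape, a_shape, a_index)
            broadcast_index(out_index, out_shape, b_shape, b_index)
            out[index_to_position(out_index, out_strides)] = fn(
                a_storage[index_to_position(a_index, a_strides)],
                b_storage[index_to_position(b_index, b_strides)],
            )
    return _zip
```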
@@ -1680,18 +1814,37 @@

  • out_shape will be the same as a_shape except with reduce_dim turned to size 1
  • -
    -
    fn: reduction function mapping two floats to float
    -
    -
    -
    Tensor reduce function.
    -
    + + +

    Parameters:

    +
      +
    • + fn + (Callable[[float, float], float]) + – +
      +

      reduction function mapping two floats to float

      +
      +
    • +
    + + +

    Returns:

    +
      +
    • + Callable[[Storage, Shape, Strides, Storage, Shape, Strides, int], None] + – +
      +

      Tensor reduce function.

      +
      +
    • +
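A hedged sketch of the reduce loop: for each output position, fold `fn` over the reduced dimension of the input (this assumes `out` is pre-initialized with the reduction's start value, and the helpers from `tensor_data.py`):

```python
from minitorch.tensor_data import to_index, index_to_position

# Hedged sketch of the low-level reduce loop.
def tensor_reduce(fn):
    def _reduce(out, out_shape, out_strides,
                a_storage, a_shape, a_strides, reduce_dim):
        out_index = [0] * len(out_shape)
        for ordinal in range(len(out)):
            to_index(ordinal, out_shape, out_index)
            pos = index_to_position(out_index, out_strides)
            a_index = list(out_index)
            for j in range(a_shape[reduce_dim]):
                a_index[reduce_dim] = j  # sweep the reduced dimension
                out[pos] = fn(
                    out[pos],
                    a_storage[index_to_position(a_index, a_strides)],
                )
    return _reduce
```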

    Todo

    -

    Finally implement the forward version of the +

    Next implement the forward version of the tensor_functions.py file.

    • Mul
    • @@ -1707,13 +1860,55 @@

    • Permute
    +
    +

    Todo

    +

    Finally add functions in minitorch/tensor.py for each of the following, and pass tests +marked as task2_3. You need to implement many of the same functions +from minitorch/scalar.py.

    +

Properties:
- shape
- size
- dims

    +

    Operators (you will likely find _ensure_tensor useful here since the user +may pass in standard python values):

    +
      +
    • add
    • +
    • sub
    • +
    • mul
    • +
    • lt
    • +
    • eq
    • +
    • gt
    • +
    • neg
    • +
    • radd
    • +
    • rmul
    • +
    • all
    • +
    • is_close
    • +
    • sigmoid
    • +
    • relu
    • +
    • log
    • +
    • exp
    • +
    +

    Should take an optional dim argument:

    +
      +
    • sum
    • +
    • +

      mean

      +
    • +
    • +

      permute

      +
    • +
    • view
    • +
    +

Should set .grad to None
- zero_grad_

    +
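For the operators, a hedged sketch of the `_ensure_tensor` pattern inside class `Tensor` (method names assumed from the text; `Add` stands for the corresponding Function from `tensor_functions.py`):

```python
# Hedged sketch (class-body excerpt, not standalone):
# _ensure_tensor wraps raw Python numbers so `x + 1.0` and `1.0 + x` work.
def __add__(self, b):
    return Add.apply(self, self._ensure_tensor(b))

def __radd__(self, b):
    # a raw number on the left delegates to the same Function
    return Add.apply(self._ensure_tensor(b), self)
```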

Task 2.4: Gradients and Autograd

    Similar to minitorch.Scalar, minitorch.Tensor is a Variable that supports autodifferentiation. In this task, you will implement backward functions for tensor operations.

    Todo

    -

    Complete the background functions in minitorch/tensor_functions.py, and pass +

Implement the backward functions in minitorch/tensor_functions.py, and pass tests marked as task2_4.

    Task 2.5: Training

    diff --git a/sitemap.xml b/sitemap.xml index 65b0051d..28b71c00 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,177 +2,187 @@ https://minitorch.github.io/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/Untitled/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/install/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/mlprimer/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module0/contributing/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module0/functional/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module0/module0/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module0/modules/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module0/property_testing/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module0/visualization/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module1/backpropagate/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module1/chainrule/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module1/derivative/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module1/module1/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module1/scalar/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module2/broadcasting/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module2/module2/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module2/tensor/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module2/tensordata/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module2/tensorops/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module3/cuda/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module3/matrixmult/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module3/module3/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module3/parallel/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module4/convolution/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module4/module4/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module4/pooling/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/module4/softmax/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/slides/module0/module0.0/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/slides/module0/module0.1/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/slides/module0/module0.2/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/slides/module1/module1.0/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/slides/module1/module1.1/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/slides/module1/module1.2/ - 2024-09-20 + 2024-09-30 daily https://minitorch.github.io/slides/module1/module1.3/ - 2024-09-20 + 2024-09-30 + daily + + + https://minitorch.github.io/slides/module2/module2.0/ + 2024-09-30 + daily + + + https://minitorch.github.io/slides/module2/module2.1/ + 2024-09-30 daily \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index bb56ed27..abaeeafd 100644 Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ diff --git a/slides/module2/module2.0.slides.html b/slides/module2/module2.0.slides.html new file mode 100644 index 00000000..07b9e25f --- /dev/null +++ b/slides/module2/module2.0.slides.html @@ -0,0 +1,10520 @@ + + + + + + + +module2.0 slides + + + + + + + + + + + + + + + + + + +
diff --git a/slides/module2/module2.0.slides.pdf b/slides/module2/module2.0.slides.pdf new file mode 100644 index 00000000..8c454f03 Binary files /dev/null and b/slides/module2/module2.0.slides.pdf differ diff --git a/slides/module2/module2.0/index.html b/slides/module2/module2.0/index.html new file mode 100644 index 00000000..3d946147 --- /dev/null +++ b/slides/module2/module2.0/index.html @@ -0,0 +1,1937 @@ Module2.0 - MiniTorch

```python slideshow={"slide_type": "skip"}
from dataclasses import dataclass

import chalk
from colour import Color
from IPython.display import SVG
from mt_diagrams.autodiff_draw import backprop, draw_boxes
from mt_diagrams.mlprimer_draw import (
    compare,
    draw_graph,
    draw_nn_graph,
    draw_with_hard_points,
    graph,
    s,
    s1_hard,
    s2_hard,
    show,
    show_loss,
    split_graph,
)
from mt_diagrams.show_expression import make_graph

import minitorch
from minitorch import Module, Parameter, Scalar

chalk.set_svg_draw_height(150)
chalk.set_svg_height(100)
```

    <!-- #region slideshow={"slide_type": "slide"} -->
    +
    +Module 2.0 - Neural Networks
    +==============================
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Our Goal
    +-----------
    +
    +Compute derivative of Python function with respect to inputs.
    +<!-- #endregion -->
    +
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Example: Function
    +---------------------
    +<!-- #endregion -->
    +
    +
    +```python slideshow={"slide_type": "x"}
    +def expression():
    +    x = Scalar(1.0)
    +    y = Scalar(1.0)
    +    z = -y * sum([x, x, x]) * y + 10.0 * x
    +    h_x_y = z + z
    +    return h_x_y
    +

    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +SVG(make_graph(expression(), lr=True)) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Chain Rule: Simple Case
    +-----------
    +
    + $$
    +\begin{eqnarray*}
    +z &=& g(x) \\
    +d &=& f'(z) \\
    +f'_x(g(x)) &=& g'(x) \times d \\
    +\end{eqnarray*}
    + $$
    +<!-- #endregion -->
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +draw_boxes(["$x$", "$z = g(x)$", "$f(g(x))$"], [1, 1])
    +

    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +draw_boxes([r"\(d\cdot g'(x)\)", "\(f'(z)\)", "\(1\)"], [1, 1], lr=False) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Chain Rule: Two Arguments
    +-------------------------
    +
    + $$
    +   \begin{eqnarray*}
    +   z &=& g(x, y) \\
    +   d &=& f'(z) \\
    +   f'_x(g(x, y)) &=& g_x'(x, y) \times d \\
    +   f'_y(g(x, y)) &=& g_y'(x, y) \times d
    +   \end{eqnarray*}
    + $$
    +<!-- #endregion -->
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +draw_boxes([("$x$", "$y$"), "$z = g(x, y)$", "$h(x,y)$"], [1, 1])
    +

    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +draw_boxes( + [(r"\(d \times g'_x(x, y)\)", r"\(d \times g'_y(x, y)\)"), "\(f'(z)\)", "\(1\)"], + [1, 1], + lr=False, +) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Chain Rule: Repeated Use
    +-------------------------
    + $$z = g(x)$$
    + $$f(z, z)$$
    +<!-- #endregion -->
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +draw_boxes(["$x$", ("$z_1, z_2$"), "$h(x)$"], [1, 1])
    +

    +

    Chain Rule: Repeated Use

    +

$$
\begin{eqnarray*}
d &=& f'_{z_1}(z_1, z_2) + f'_{z_2}(z_1, z_2) \\
h'_x(x) &=& d \times g'_x(x)
\end{eqnarray*}
$$

    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +draw_boxes(["\(x\)", ("\(z_1 = g(x)\)", "\(z_2 = g(x)\)"), "\(h(x)\)"], [1, 1]) +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +draw_boxes(
    +    [r"$d \cdot g'_x(x)$", ("$f'_{z_1}(z_1, z_2)$", "$f'_{z_2}(z_1, z_2)$"), "$1$"],
    +    [1, 1],
    +    lr=False,
    +)
    +

    +

    Algorithm: Outer Loop

    +
      +
1. Call topological sort
2. Create dict of edges and empty \(d\) values.
3. For each edge and \(d\) in topological order:
    +

    Algorithm: Inner Loop

    +
      +
1. If edge goes to Leaf, done
2. Call backward with \(d\) on previous box
3. Loop through all its input edges and add derivative
    +
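A hedged pseudocode rendering of the two loops above (helper names such as `topological_sort`, `chain_rule`, and `accumulate_derivative` are assumptions, not the graded API):

```python
# Hedged sketch: walk variables in topological order, accumulating a
# d-value per edge and stopping at leaves.
def backpropagate(variable, deriv):
    derivs = {variable.unique_id: deriv}
    for var in topological_sort(variable):
        d = derivs.pop(var.unique_id, 0.0)
        if var.is_leaf():
            var.accumulate_derivative(d)  # done: store at the leaf
        else:
            for parent, parent_d in var.chain_rule(d):
                derivs[parent.unique_id] = (
                    derivs.get(parent.unique_id, 0.0) + parent_d
                )
```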

    Example

    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"]

    +

    chalk.set_svg_height(200) +backprop(1) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Example
    +-----------
    +<!-- #endregion -->
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +backprop(2)
    +

    +

    Example

    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +backprop(3) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Example
    +-----------
    +<!-- #endregion -->
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +backprop(4)
    +

    +

    Example

    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +backprop(5) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Example
    +-----------
    +<!-- #endregion -->
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +backprop(6)
    +

    +

    Example

    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +backprop(7)

    +

    chalk.set_svg_height(200) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +
    +Quiz
    +------------
    +
    +
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Outline
    +---------
    +* Model Training
    +* Neural Networks
    +* Modern Models
    +
    +<!-- #endregion -->
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Model Training
    +=================
    +
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Reminder: MiniML
    +-----------------
    +
    +* Dataset - Data to fit
    +* Model - Shape of fit
    +* Loss - Goodness of fit
    +<!-- #endregion -->
    +
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Model 1
    +---------------------
    +
    +* Linear Model
    +<!-- #endregion -->
    +
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +from minitorch import Parameter, Module
    +class Linear(Module):
    +    def __init__(self, w1, w2, b):
    +        super().__init__()
    +        self.w1 = Parameter(w1)
    +        self.w2 = Parameter(w2)
    +        self.b = Parameter(b)
    +
    +    def forward(self, x1: float, x2: float) -> float:
    +        return self.w1.value * x1 + self.w2.value * x2 + self.b.value
    +
    +
    +model = Linear(1, 1, -0.9)
    +draw_graph(model)
    +

    +

    Point Loss

    +

```python slideshow={"slide_type": "x"}
def point_loss(x):
    return minitorch.operators.relu(x)


def full_loss(m):
    l = 0
    for x, y in zip(s.X, s.y):
        l += point_loss(-y * m.forward(*x))
    return -l


graph(point_loss, [], [-2, -0.2, 1])
```

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Class Goal
    +-----------
    +
    + * Find parameters that minimize loss
    +<!-- #endregion -->
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +
    +chalk.hcat(
    +    [show(Linear(1, 1, -0.6)), show(Linear(1, 1, -0.7)), show(Linear(1, 1, -0.8))], 0.3
    +)
    +

    +

    Parameter Fitting

    +
      +
1. (Forward) Compute the loss function, \(L(w_1, w_2, b)\)
2. (Backward) See how small changes would change the loss
3. Update the parameters to locally reduce the loss
    +

    Update Procedure

    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +chalk.set_svg_height(400)

    +

    show_loss(full_loss, Linear(1, 1, 0)) +chalk.set_svg_height(200) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Module for Linear
    +--------------------------------------
    +<!-- #endregion -->
    +```python slideshow={"slide_type": "x"}
    +
    +
    +class LinearModule(minitorch.Module):
    +    def __init__(self):
    +        super().__init__()
    +        # 0.0 is start value for param
    +        self.w1 = Parameter(Scalar(0.0))
    +        self.w2 = Parameter(Scalar(0.0))
    +        self.bias = Parameter(Scalar(0.0))
    +
    +    def forward(self, x1: Scalar, x2: Scalar) -> Scalar:
    +        return x1 * self.w1.value + x2 * self.w2.value + self.bias.value
    +

    +

    Training Loop

    +

```python slideshow={"slide_type": "x"}
def train_step(optim, model, data):
    # Step 1 - Forward (Loss function)
    x_1, x_2 = Scalar(data[0]), Scalar(data[1])
    loss = model.forward(x_1, x_2).relu()
    # Step 2 - Backward (Compute derivative)
    loss.backward()
    # Step 3 - Update Params
    optim.step()
```

    <!-- #region slideshow={"slide_type": "slide"} -->
    +More Features: Linear Model
    +------------------------------
    +
    +  $\text{lin}(x; w, b) = x_1 \times w_1 + \ldots + x_n \times w_n + b$
    +
    +
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +More Features: Linear (Code)
    +--------------------------------------
    +<!-- #endregion -->
    +```python slideshow={"slide_type": "x"}
    +
    +
    +class LinearModule(minitorch.Module):
    +    def __init__(self, in_size):
    +        super().__init__()
    +        self.weights = []
    +        self.bias = []
    +        # Need add parameter
    +        for i in range(in_size):
    +            self.weights.append(self.add_parameter(f"weight_{i}", 0.0))
    +

    +

    Neural Networks

    +

    Linear Model Example

    +
      +
    • Parameters
    • +
    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +chalk.set_svg_height(300) +model1 = Linear(1, 1, -1.0) +model2 = Linear(0.5, 1.5, -1.0) +compare(model1, model2) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Harder Datasets
    +----------------
    +<!-- #endregion -->
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +split_graph(s1_hard, s2_hard, show_origin=True)
    +

    +

    Harder Datasets

    +
      +
    • Model may not be good with any parameters.
    • +
    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +model = Linear(1, 1, -0.7) +draw_with_hard_points(model) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Neural Networks
    +------------------
    +* New *model*
    +* Uses repeated splits of data
    +* Loss will not change
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Intuition: Neural Networks
    +--------------------------
    +
+1. Apply many linear separators
    +2. Reshape the data space based on results
    +3. Apply a linear model on new space
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Notation: Multiple Parameters
    +--------------------------
    +
    +* Use superscript $w^0$ and $w^1$ to indicate different parameters.
    +* Our final model will have many linears.
    +* These will become Torch sub-modules.
    +<!-- #endregion -->
    +
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Intuition: Split 1
    +--------------------------
    +<!-- #endregion -->
    +
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +yellow = Linear(-1, 0, 0.25)
    +ycolor = Color("#fde699")
    +draw_with_hard_points(yellow, ycolor, Color("white"))
    +

    +

    Reshape: ReLU

    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"]

    +

    graph( + minitorch.operators.relu, + [yellow.forward(*pt) for pt in s2_hard], + [yellow.forward(*pt) for pt in s1_hard], + 3, + 0.25, + c=ycolor, +) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Math View
    +---------------
    +
    + $$
    +\begin{eqnarray*}
    +h_ 1 &=& \text{ReLU}(\text{lin}(x; w^0, b^0)) \\
    +\end{eqnarray*}
    + $$
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Intuition: Split 2
    +-------------------------
    +<!-- #endregion -->
    +
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +green = Linear(1, 0, -0.8)
    +gcolor = Color("#d1e9c3")
    +draw_with_hard_points(green, gcolor, Color("white"))
    +

    +

    Math View

    +

$$
\begin{eqnarray*}
h_2 &=& \text{ReLU}(\text{lin}(x; w^1, b^1))
\end{eqnarray*}
$$

    +

    Reshape: ReLU

    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"]

    +

    graph( + minitorch.operators.relu, + [green.forward(*pt) for pt in s2_hard], + [green.forward(*pt) for pt in s1_hard], + 3, + 0.25, + c=gcolor, +) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Reshape: ReLU
    +--------------
    +<!-- #endregion -->
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +draw_nn_graph(green, yellow)
    +

    +

    Final Layer

    +

```python slideshow={"slide_type": "x"} tags=["hide_inp"]
@dataclass
class MLP:
    lin1: Linear
    lin2: Linear
    final: Linear

    def forward(self, x1, x2):
        x1_1 = minitorch.operators.relu(self.lin1.forward(x1, x2))
        x2_1 = minitorch.operators.relu(self.lin2.forward(x1, x2))
        return self.final.forward(x1_1, x2_1)


mlp = MLP(green, yellow, Linear(3, 3, -0.3))
draw_with_hard_points(mlp)
```

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Math View
    +----------
    + $$
    +\begin{eqnarray*}
    +h_1 &=& \text{ReLU}(x_1 \times w^0_1 + x_2 \times w^0_2 + b^0) \\
    +h_2 &=& \text{ReLU}(x_1 \times w^1_1 + x_2 \times w^1_2 + b^1)\\
    +m(x_1, x_2) &=& h_1 \times w_1 + h_2 \times w_2 + b
    +\end{eqnarray*}
    + $$
    +Parameters:
    + $w_1, w_2, w^0_1, w^0_2, w^1_1, w^1_2, b, b^0, b^1$
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Math View (Alt)
    +---------------
    +
    + $$
    +   \begin{eqnarray*}
    +   h_ 1 &=& \text{ReLU}(\text{lin}(x; w^0, b^0)) \\
    +   h_ 2 &=& \text{ReLU}(\text{lin}(x; w^1, b^1))\\
    +   m(x_1, x_2) &=& \text{lin}(h; w, b)
    +   \end{eqnarray*}
    + $$
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Code View
    +----------
    +
    +Linear
    +<!-- #endregion -->
    +
    +
    +```python slideshow={"slide_type": "x"}
    +class LinearModule(Module):
    +    def __init__(self):
    +        super().__init__()
    +        self.w_1 = Parameter(Scalar(0.0))
    +        self.w_2 = Parameter(Scalar(0.0))
    +        self.b = Parameter(Scalar(0.0))
    +
    +    def forward(self, inputs):
    +        return inputs[0] * self.w_1.value + inputs[1] * self.w_2.value + self.b.value
    +

    +

    Code View

    +

    Model

    +

```python slideshow={"slide_type": "x"}
class Network(minitorch.Module):
    def __init__(self):
        super().__init__()
        self.unit1 = LinearModule()
        self.unit2 = LinearModule()
        self.classify = LinearModule()

    def forward(self, x):
        h1 = self.unit1.forward(x).relu()
        h2 = self.unit2.forward(x).relu()
        return self.classify.forward((h1, h2))
```

Training

    +
    +
      +
    • All the parameters in model are leaves
    • +
• Computing backward on loss fills their derivative

```python slideshow={"slide_type": "x"}
model = Network()
parameters = dict(model.named_parameters())
parameters
```
    +

    Derivatives

    +
      +
    • All the parameters in model are leaf Variables
    • +
    +

```python slideshow={"slide_type": "x"}
model = Network()
x1, x2 = Scalar(0.5), Scalar(0.5)

# Step 1
out = model.forward((0.5, 0.5))
loss = out.relu()

# Step 2
SVG(make_graph(loss, lr=True))
```

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Derivatives
    +----------
    +* All the parameters in model are leaf scalars
    +<!-- #endregion -->
    +
    +
    +```python slideshow={"slide_type": "x"}
    +parameters["unit1.w_1"].value.derivative
    +

    +

    Playground

    +

    NN Playground

    +

    QA

\ No newline at end of file diff --git a/slides/module2/module2.1/index.html b/slides/module2/module2.1/index.html new file mode 100644 index 00000000..bd66c1d7 --- /dev/null +++ b/slides/module2/module2.1/index.html @@ -0,0 +1,1840 @@ Module2.1 - MiniTorch

```python slideshow={"slide_type": "skip"}
import chalk
from chalk import *
from chalk import vstrut
from colour import Color
import mt_diagrams.drawing as drawing
from mt_diagrams.mlprimer_draw import (
    Linear,
    draw_nn_graph,
    draw_with_hard_points,
    graph,
    s1_hard,
    s2_hard,
)
from mt_diagrams.tensor_draw import color, matrix, tensor
from mt_diagrams.autodiff_draw import draw_boxes

import minitorch

x = minitorch.tensor([[1, 2, 3, 4, 5] for _ in range(2)])
set_svg_draw_height(400)
set_svg_height(300)
```

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Module 2.1 - Tensors
    +=============================================
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Intuition: Split 1
    +--------------------------
    +<!-- #endregion -->
    +
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +yellow = Linear(-1, 0, 0.25)
    +ycolor = Color("#fde699")
    +draw_with_hard_points(yellow, ycolor, Color("white"))
    +

    +

    Reshape: ReLU

    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"]

    +

    graph( + minitorch.operators.relu, + [yellow.forward(*pt) for pt in s2_hard], + [yellow.forward(*pt) for pt in s1_hard], + 3, + 0.25, + c=ycolor, +) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Math View
    +---------------
    +
    + $$
    +\begin{eqnarray*}
    +h_ 1 &=& \text{ReLU}(\text{lin}(x; w^0, b^0)) \\
    +\end{eqnarray*}
    + $$
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Intuition: Split 2
    +--------------------------
    +<!-- #endregion -->
    +
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +green = Linear(1, 0, -0.8)
    +gcolor = Color("#d1e9c3")
    +draw_with_hard_points(green, gcolor, Color("white"))
    +

    +

    Math View

    +

$$
\begin{eqnarray*}
h_2 &=& \text{ReLU}(\text{lin}(x; w^1, b^1))
\end{eqnarray*}
$$

    +

    Reshape: ReLU

    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"]

    +

    graph( + minitorch.operators.relu, + [green.forward(*pt) for pt in s2_hard], + [green.forward(*pt) for pt in s1_hard], + 3, + 0.25, + c=gcolor, +) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Reshape: ReLU
    +--------------
    +<!-- #endregion -->
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +draw_nn_graph(green, yellow)
    +

    +

    Math View (Alt)

    +

$$
\begin{eqnarray*}
\text{lin}(x; w, b) &=& x_1 \times w_1 + x_2 \times w_2 + b \\
h_1 &=& \text{ReLU}(\text{lin}(x; w^0, b^0)) \\
h_2 &=& \text{ReLU}(\text{lin}(x; w^1, b^1)) \\
m(x_1, x_2) &=& \text{lin}(h; w, b)
\end{eqnarray*}
$$

    +

    Code View

    +

    Model

    +

```python slideshow={"slide_type": "x"}
class Network(minitorch.Module):
    def __init__(self):
        super().__init__()
        self.unit1 = LinearModule()
        self.unit2 = LinearModule()
        self.classify = LinearModule()

    def forward(self, x):
        # yellow
        h1 = self.unit1.forward(x).relu()
        # green
        h2 = self.unit2.forward(x).relu()
        return self.classify.forward((h1, h2))
```

Quiz

    +
    +

    +Outline

    +
    +
      +
    • Tensors
    • +
    • Operations
    • +
• Strides

Tensors
================

Motivation

$$
\begin{eqnarray*}
\text{lin}(x; w, b) &=& x_1 \times w_1 + x_2 \times w_2 + b \\
h_1 &=& \text{ReLU}(\text{lin}(x; w^0, b^0)) \\
h_2 &=& \text{ReLU}(\text{lin}(x; w^1, b^1)) \\
m(x_1, x_2) &=& \text{lin}(h; w, b)
\end{eqnarray*}
$$

Parameters:
 \(w_1, w_2, w^0_1, w^0_2, w^1_1, w^1_2, b, b^0, b^1\)

* This is really messy!

Matrix Form

    +
    +

$$
\begin{eqnarray*}
\mathbf{h} &=& \text{ReLU}(\mathbf{W}^{(0)} \mathbf{x} + \mathbf{b}^{(0)}) \\
m(\mathbf{x}) &=& \mathbf{W} \mathbf{h} + \mathbf{b}
\end{eqnarray*}
$$

Parameters:
 \(\mathbf{W}, \mathbf{b}, \mathbf{W}^{(0)}, \mathbf{b}^{(0)}\)

* Matrix - compute a bunch of linears at once (may be more than 2!)

Matrix / Tensors

    +
    +
      +
    • Multi-dimensional arrays
    • +
• Basis for mathematical programming
    • +
• Similar foundation for many libraries (MATLAB, NumPy, etc.)

Terminology
    • +
    +
    +
      +
• 0-Dimensional - Scalar
• Scalar from module-0

Terminology
    +
    +
      +
• 1-Dimensional - Vector

```python slideshow={"slide_type": "x"} tags=["hide_inp"]
matrix(5, 1)
```
    • +
    +

    Terminology

    +
      +
    • 2-Dimensional - Matrix
    • +
    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +matrix(3, 5) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Terminology
    +------------
    +
    +* n-dimensions - Tensor
    +<!-- #endregion -->
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +tensor(0.75, 2, 3, 5)
    +

    +

    Terminology

    +
      +
    • Dims - # dimensions (x.dims)
    • +
    • Shape - # cells per dimension (x.shape)
    • +
    • Size - # cells (x.size)
    • +
    +

    Visual Convention

    +
      +
    • depth
    • +
    • row
    • +
    • columns
    • +
    +

    Example

    +
      +
    • dims: 2
    • +
    • shape: (3, 5)
    • +
    • size : 15
    • +
    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +matrix(3, 5) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Example
    +------------
    +
    +* dims: ?
    +* shape: ?
    +* size : ?
    +<!-- #endregion -->
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +matrix(4, 3)
    +

    +

    Indexing

    +
      +
    • Indexing syntax: x[0, 1, 2]
    • +
    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +tensor( + 0.75, + 2, + 3, + 5, + colormap=lambda i, j, k: drawing.aqua if (i, j, k) == (0, 1, 2) else drawing.white, +) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Implementing Tensors
    +====================
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Why not just use lists?
    +------------------------
    +* Functions to manipulate shape
    +* Mathematical notation
    +* Enables autodiff
    +* Efficient control of memory (Module-3)
    +<!-- #endregion -->
    +
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Tensor Usage
    +-------------
    +
    +Unary
    +<!-- #endregion -->
    +```python slideshow={"slide_type": "x"}
    +
    +new_tensor = x.log()
    +

    +

    Binary (for now, only same shape)

    +

    ```python slideshow={"slide_type": "x"} +new_tensor = x + x +

    <!-- #region slideshow={"slide_type": "x"} -->
    +Reductions
    +<!-- #endregion -->
    +
    +```python slideshow={"slide_type": "x"}
    +new_tensor = x.sum()
    +

    +

    Immutable Operations

    +
      +
• We never change the tensor itself (mostly)
    • +
• All operations return a new tensor (just like `Scalar`)
    • +
    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"]

    +

    set_svg_height(200) +draw_boxes(["\(x\)", "\(f(x)\)"], [1]) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +What's bad about tensors?
    +--------------------------
    +* Hard to grow or shrink
    +* Only numerical values
    +* Lose comprehensions / python built-ins
    +* Shapes are easy to mess up
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Next Couple Lectures
    +----------------------
    +* No autodifferentiation for now
    +* Only consider forward tensor operations
    +* Add autodiff afterwards
    +<!-- #endregion -->
    +
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Tensor Internals
    +=================
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +How does this work
    +--------------------
    +
    +* **Storage** :  1-D array of numbers of length `size`
    +
    +* **Strides** : tuple that provides the mapping from user `indexing`
    +  to the `position` in the 1-D `storage`.
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Strides
    +--------
    +
    +* Stride: $(1, 5)$
    +* Shape:  $(5,2)$
    +<!-- #endregion -->
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +set_svg_height(200)
    +d = (
    +    matrix(5, 2, "n", colormap=color(5, 2))
    +    / vstrut(1)
    +    / matrix(1, 10, "s", colormap=lambda i, j: color(5, 2)(j % 5, j // 5))
    +)
    +d.connect(("n", 3, 0), ("s", 0, 3)).connect(("n", 3, 1), ("s", 0, 8))
    +

    +

    Strides

    +
      +
    • Stride: \((1, 2)\)
    • +
    • Shape: \((2, 5)\)
    • +
    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +d = ( + matrix(2, 5, "n", colormap=lambda i, j: color(5, 2)(j, i)) + / vstrut(1) + / matrix(1, 10, "s", colormap=color(1, 10)) +) +d.connect(("n", 0, 3), ("s", 0, 6)).connect(("n", 1, 3), ("s", 0, 7)) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Strides
    +--------
    +* Shape: $(2, 2, 3)$
    +* Stride:  $(6, 3, 1)$
    +<!-- #endregion -->
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +d = (
    +    tensor(0.5, 2, 2, 3, "n", colormap=lambda i, j, k: color(4, 3)(i * 2 + j, k))
    +    / vstrut(1)
    +    / matrix(1, 12, "s", colormap=color(1, 12))
    +)
    +d.connect(("n", 0, 1, 1), ("s", 0, 4)).connect_perim(
    +    ("n", 1, 0, 2), ("s", 0, 2 + 6), unit_x - unit_y, -unit_y
    +)
    +

    +

    Which do we use?

    +
      +
    • Contiguous: Bigger strides left
    • +
    • \((s_1, s_2, s_3)\)
    • +
    • However, need to handle all cases.
    • +
    +
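For example, a hedged helper for the contiguous (row-major) case, where each stride is the product of the dimensions to its right:

```python
# Hedged sketch: contiguous strides put the biggest stride on the left.
def strides_from_shape(shape):
    strides = [1]
    for dim in reversed(shape[1:]):
        strides.append(strides[-1] * dim)
    return tuple(reversed(strides))

assert strides_from_shape((2, 2, 3)) == (6, 3, 1)
assert strides_from_shape((5, 2)) == (2, 1)
```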

    Strides are useful: Transpose

    +

    Can transpose without copying.

    +

    ```python slideshow={"slide_type": "x"} tags=["hide_inp"] +( + matrix(2, 5, colormap=color(2, 5)) + | chalk.hstrut(1) + | matrix(5, 2, colormap=lambda i, j: color(2, 5)(j, i)) +) +

    <!-- #region slideshow={"slide_type": "slide"} -->
    +Operation 1: Indexing
    +---------------------
    +
    +* $x[i, j, k]$
    +
    +How to find data point?
    +
    +<!-- #endregion -->
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Operation 2: Movement
    +---------------------
    +
    +How do I move to the next in the row? Column?
    +<!-- #endregion -->
    +
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Operation 3: Reverse Indexing
    +------------------------------
    +
    +How do I find the index for data?
    +<!-- #endregion -->
    +
    +
    +<!-- #region slideshow={"slide_type": "slide"} -->
    +Stride Intuition
    +-----------------------
    +
    +* Numerical bases,
    +* Index for position 0? Position 1? Position 2?
    +<!-- #endregion -->
    +
    +```python slideshow={"slide_type": "x"} tags=["hide_inp"]
    +tensor(0.75, 2, 2, 2)
    +

    +

    Stride Intuition

    +
      +
    • +

      Index for position 0? Position 1? Position 2?

      +
    • +
    • +

      \([0, 0, 0], [0, 0, 1], [0, 1, 0]\)

      +
    • +
    +

```python slideshow={"slide_type": "x"} tags=["hide_inp"]
(
    tensor(0.5, 2, 2, 2, "n", colormap=lambda i, j, k: color(4, 2)(i * 2 + j, k))
    / vstrut(1)
    / matrix(1, 8, "s", colormap=color(1, 8))
)
```

    +

    Conversion Formula

    +
      +
    • Divide and mod
    • +
• $k = p \bmod s_2$
    • +
• $j = \lfloor p / s_2 \rfloor \bmod s_1$
    • +
    • ...
    • +
    +
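Concretely, a hedged worked example for a contiguous shape-(2, 2, 2) tensor:

```python
# Position 5 in a contiguous (2, 2, 2) tensor maps back to index (1, 0, 1).
shape = (2, 2, 2)
p = 5
k = p % shape[2]                              # 5 % 2 = 1
j = (p // shape[2]) % shape[1]                # 2 % 2 = 0
i = (p // (shape[1] * shape[2])) % shape[0]   # 1 % 2 = 1
assert (i, j, k) == (1, 0, 1)
```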

    Implementation

    +
      +
    • TensorData : Manager of strides and storage
    • +
    +

    Module-2

    +

    Overview

    +
      +
    • tensor.py - Tensor Variable
    • +
    • tensor_functions.py - Tensor Functions
    • +
    • tensor_data.py - Storage and Indexing
    • +
    • tensor_ops.py - Low-level tensor operations
    • +
    +

    Q&A

\ No newline at end of file