Commit

Merge pull request #105 from kk-Syuer/main
iacopomasi authored May 2, 2024
2 parents c4b0958 + cc3f47b commit 4e91bc1
Showing 1 changed file with 14 additions and 14 deletions.
28 changes: 14 additions & 14 deletions AA2324/course/09_decision_trees/09_decision_trees.ipynb
@@ -204,7 +204,7 @@
"source": [
"# This lecture material is taken from\n",
"- Information Theory part - (Entropy etc) is taken from __Chapter 1 - Bishop__.\n",
"- Decision Trees are very briefly covered in __Bishop at page 663__.\n",
"- Decision Trees are very briefly covered in __Bishop on page 663__.\n",
"- [Cimi Book - Chapter 01](http://ciml.info/dl/v0_99/ciml-v0_99-ch01.pdf)\n",
"- [CSC411: Introduction to Machine Learning](https://www.cs.toronto.edu/~urtasun/courses/CSC411_Fall16/06_trees_handout.pdf)\n",
"- [CSC411: Introduction to Machine Learning - Tutorial](https://www.cs.toronto.edu/~urtasun/courses/CSC411_Fall16/tutorial3.pdf)\n",
@@ -299,7 +299,7 @@
}
},
"source": [
"# What is the the training error of $k$-NN? 🤔"
"# What is the training error of $k$-NN? 🤔"
]
},
{
@@ -310,7 +310,7 @@
}
},
"source": [
"- In $k$-NN there there is no explicit cost/loss, how can we measure the training error? \n"
"- In $k$-NN there is no explicit cost/loss, how can we measure the training error? \n"
]
},
{
@@ -630,7 +630,7 @@
"source": [
"# When $k=1$ we perfectly classify the training set! 100% accuracy!\n",
"\n",
"It is easy to show that this follow by definition **(each point is neighbour to itself).**\n",
"It is easy to show that this follows by definition **(each point is neighbour to itself).**\n",
"\n",
"but will this hold for $K \\gt 1$?"
]
@@ -643,7 +643,7 @@
}
},
"source": [
"# We record the training accuracy in function of increasing $k$"
"# We record the training accuracy in the function of increasing $k$"
]
},
{
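This hunk records the training accuracy of $k$-NN as $k$ grows. A minimal sketch of that experiment with scikit-learn, using the Iris data as a stand-in (the notebook's dataset and plotting code may differ):

```python
# Minimal sketch: k-NN training accuracy as a function of k.
# Iris is used as a stand-in dataset; the notebook may use different data.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    acc = knn.score(X, y)  # accuracy measured on the same data used for fitting
    print(f"k={k:2d}  training accuracy={acc:.3f}")
# k=1 gives 100% by construction (each point is its own nearest neighbour);
# training accuracy typically drops as k increases and the boundary smooths out.
```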
@@ -912,7 +912,7 @@
"source": [
"# Remember to estimate scaling on the training set only!\n",
"\n",
"- In theory this is part below is an error.\n",
"- In theory this part below is an error.\n",
"- I took the code from sklearn documentation but in practice you have to estimate the scale parameters **ONLY** in the training set.\n",
"- Then applying it directly to the test set. \n",
"- If you work in inductive settings, you cannot do it jointly like the code above.\n",
@@ -1268,7 +1268,7 @@
}
},
"source": [
"# Plot Miclassification function for binary case\n",
"# Plot Misclassification function for binary case\n",
"\n",
"```python\n",
"pk = np.arange(0, 1.1, 0.1)\n",
@@ -1772,7 +1772,7 @@
"source": [
"# This lecture material is taken from\n",
"- Information Theory part - (Entropy etc) is taken from __Chapter 1 - Bishop__.\n",
"- Decision Trees are very briefly covered in __Bishop at page 663__.\n",
"- Decision Trees are very briefly covered in __Bishop on page 663__.\n",
"- [Cimi Book - Chapter 01](http://ciml.info/dl/v0_99/ciml-v0_99-ch01.pdf)\n",
"- [CSC411: Introduction to Machine Learning](https://www.cs.toronto.edu/~urtasun/courses/CSC411_Fall16/06_trees_handout.pdf)\n",
"- [CSC411: Introduction to Machine Learning - Tutorial](https://www.cs.toronto.edu/~urtasun/courses/CSC411_Fall16/tutorial3.pdf)\n",
@@ -2010,8 +2010,8 @@
"\n",
" \n",
"- **Termination**:\n",
" 1. if no examples – return **majority** from parent (Voting such as in k-NN).\n",
" 2. else if all examples in same class – return the class **(pure node)**.\n",
" 1. if no examples – return **majority** from the parent (Voting such as in k-NN).\n",
" 2. else if all examples are in the same class – return the class **(pure node)**.\n",
" 3. else we are not in a termination node (keep recursing)\n",
" 4. **[Optional]** we could also terminate for some **regularization** parameters"
]
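The termination rules listed in this hunk are the base cases of the recursive tree-building procedure. A minimal self-contained sketch (binary splits on numeric features, Gini impurity; illustrative code, not the notebook's implementation):

```python
# Minimal sketch of recursive tree building with the termination rules above.
import numpy as np
from collections import Counter

def majority(y):
    return Counter(y).most_common(1)[0][0]

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def build_tree(X, y, parent_y=None, depth=0, max_depth=3):
    if len(y) == 0:                        # 1. no examples: majority vote from the parent
        return {"leaf": majority(parent_y)}
    if len(np.unique(y)) == 1:             # 2. pure node: return the class
        return {"leaf": y[0]}
    if depth >= max_depth:                 # 4. optional regularization stop
        return {"leaf": majority(y)}
    best = None                            # 3. otherwise pick the best split and recurse
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # thresholds that leave both sides non-empty
            mask = X[:, j] <= t
            G = mask.mean() * gini(y[mask]) + (~mask).mean() * gini(y[~mask])
            if best is None or G < best[0]:
                best = (G, j, t, mask)
    if best is None:                       # no valid split left: stop with a majority leaf
        return {"leaf": majority(y)}
    _, j, t, mask = best
    return {"feature": j, "threshold": t,
            "left":  build_tree(X[mask],  y[mask],  y, depth + 1, max_depth),
            "right": build_tree(X[~mask], y[~mask], y, depth + 1, max_depth)}
```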
@@ -3276,7 +3276,7 @@
"\n",
"$$G(Q, \\theta) = \\frac{N^{L}}{N} H(Q^{L}(\\theta)) + \\frac{N^{R}}{N} H(Q^{R}(\\theta))\n",
"$$\n",
"Select the parameters that minimises the impurity\n",
"Select the parameters that minimize the impurity\n",
"\n",
"$$\n",
"\\boldsymbol{\\theta}^* = \\operatorname{argmin}_\\boldsymbol{\\theta} G(Q_m, \\theta)\n",
@@ -3483,7 +3483,7 @@
"source": [
"# Quick Remedies\n",
"\n",
"However, even if we have these **on-hand weapon to avoid overfitting**, it is **still hard to train a single decision tree to perform well generally**. Thus, we will use another useful training technique called **ensemble methods or bagging**, which leads to random-forest."
"However, even if we have these **on-hand weapons to avoid overfitting**, it is **still hard to train a single decision tree to perform well generally**. Thus, we will use another useful training technique called **ensemble methods or bagging**, which leads to random-forest."
]
},
{
@@ -3725,7 +3725,7 @@
"- $K=\\sqrt{D}$ so it is a fixed hyper-param.\n",
"- You have to tune $M$ but in general it needs to be large.\n",
"- DT are **very interpretable**; DT/RF could be used for **feature selection**\n",
" - To answer the question: __which feature contribute more to the label?__\n",
" - To answer the question: __which feature contributes more to the label?__\n",
"- You can evaluate them **without a validation split** (Out of Bag Generalization - OOB)"
]
},
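A minimal sketch of those points with scikit-learn's RandomForestClassifier: $K=\sqrt{D}$ via `max_features`, out-of-bag evaluation instead of a validation split, and impurity-based feature importances (Iris as a stand-in dataset):

```python
# Minimal sketch: random forest with K = sqrt(D) features per split,
# out-of-bag (OOB) evaluation, and feature importances for feature selection.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

rf = RandomForestClassifier(
    n_estimators=500,      # M: in general needs to be large
    max_features="sqrt",   # K = sqrt(D) features considered at each split
    oob_score=True,        # evaluate on out-of-bag samples, no validation split needed
    random_state=0,
).fit(X, y)

print("OOB accuracy:", rf.oob_score_)
print("Feature importances:", rf.feature_importances_)
```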
@@ -4052,7 +4052,7 @@
"\n",
"[Link to the Microsoft paper](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/BodyPartRecognition.pdf)\n",
"\n",
"_To keep the training times down we employ a distributed implementation. Training 3 trees to depth 20 from 1 million images takes about a day on a 1000 core cluster._"
"_To keep the training times down we employ a distributed implementation. Training 3 trees to depth 20 from 1 million images takes about a day on a 1000-core cluster._"
]
},
{