Merge pull request #49 from PeerHerholz/updates_2024
fix n < p description
PeerHerholz authored Oct 25, 2024
2 parents 195bd71 + 801905b commit d4f6d47
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion content/haxby_data.ipynb
@@ -51,7 +51,7 @@
"```{admonition} Bonus question: ever heard of the \"small-n-high-p\" (p >> n) problem?\n",
":class: tip, dropdown\n",
"\n",
"\"Classical\" `machine learning`/`decoding` models and the underlying algorithms operate on the assumption that are more `predictors` or `features` than there are `sample`. In fact many more. Why is that?\n",
"\"Classical\" `machine learning`/`decoding` models and the underlying algorithms operate on the assumption that are more `samples` than there are `predictors` or `features` . In fact many more. Why is that?\n",
"Consider a high-dimensional `space` whose `dimensions` are defined by the number of `features` (e.g. `10 features` would result in a space with `10 dimensions`. The resulting `volume` of this `space` is the amount of `samples` that could be drawn from the `domain` and the number of `samples` entail the `samples` you need to address your `learning problem`, ie `decoding` outcome. That is why folks say: \"get more data\", `machine learning` is `data`-hungry: our `sample` needs to be as representative of the high-dimensional domain as possible. Thus, as the number of `features` increases, so should the number of `samples` so to capture enough of the `space` for the `decoding model` at hand.\n",
"\n",
"This referred to as the [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) and poses as a major problem in many fields that aim to utilize `machine learning`/`decoding` on unsuitable data. Why is that?\n",
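The paragraph above argues that a fixed number of `samples` covers less and less of the `feature` `space` as the number of `features` grows. Below is a minimal sketch of that effect, assuming only NumPy; the sample sizes, feature counts, and seed are illustrative and not taken from the Haxby data.

```python
# Minimal sketch (assumption: NumPy only; illustrative sizes, not the Haxby data).
# Keep the number of samples fixed and grow the number of features:
# the mean nearest-neighbour distance increases, i.e. the samples become
# ever sparser in the feature space (curse of dimensionality).
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100  # fixed, small "n"

for n_features in (2, 10, 100, 1000):  # growing "p"
    X = rng.uniform(size=(n_samples, n_features))
    # pairwise Euclidean distances between all samples
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # ignore self-distances
    print(f"p = {n_features:4d}: mean nearest-neighbour distance "
          f"= {dists.min(axis=1).mean():.2f}")
```

With `n_samples` held fixed, the printed nearest-neighbour distances grow with the number of `features`, which is why `machine learning` is said to be `data`-hungry: more `features` demand more `samples` to cover the `space`.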
