Commit

remove todos
polina-tsvilodub committed Jul 5, 2024
1 parent 8ccb43f commit 8579a27
Showing 1 changed file with 4 additions and 7 deletions.
@@ -9,10 +9,8 @@
 "**Author**: Polina Tsvilodub\n",
 "\n",
 "One criticism often raised in context of LLMs is their blackbox nature, i.e., the inscrutability of the mechanics of the models and how or why they arrive at predictions, given the input.\n",
-"We have seen some methods that help to map out how models behave on certain tasks (sheets XXX), what aspects of the input critically affect the output and which information the models process (sheet XXX).\n",
-"In this sheet, we will look at methods for identifying the *computational mechanisms* that lead to the outputs, i.e., at mechanistic interpretability. It can be seen as trying to reverse-engineer the computational algorithms the mdoel has learned during training and that are active during certain tasks.\n",
-"\n",
-"TODO: link sheets."
+"We have seen some methods that help to map out how models behave on certain tasks ([sheet 7.1](https://cogsciprag.github.io/Understanding-LLMs-course/tutorials/07a-behavioral-assessment.html)), what aspects of the input critically affect the output and which information the models process ([sheet 6.1](https://cogsciprag.github.io/Understanding-LLMs-course/tutorials/06a-attribution.html)).\n",
+"In this sheet, we will look at methods for identifying the *computational mechanisms* that lead to the outputs, i.e., at mechanistic interpretability. It can be seen as trying to reverse-engineer the computational algorithms the mdoel has learned during training and that are active during certain tasks."
 ]
 },
 {
@@ -22,7 +20,7 @@
 "## Early decoding\n",
 "\n",
 "First, we will look at early decoding, i.e., at applying the \"unembedding\" layer (projecting hidden representations into vocabulary space by applying a linear and a softmax layer) to representations in layers throughout the model (not just the last layer). For this, we will need to output results of the calculations in the layers. There are various parameters that can be passed to \n",
-"Furthermore, it is helpful to be able to access different weights of the model, which we did in sheet XXX.\n",
+"Furthermore, it is helpful to be able to access different weights of the model, which we did in [sheet 3.1](https://cogsciprag.github.io/Understanding-LLMs-course/tutorials/03a-tokenization-transformers.html).\n",
 "\n",
 "The code is largely based on [this](https://github.com/jmerullo/lm_vector_arithmetic) repository which accompanies the paper by [Merullo et al. (2024)](https://arxiv.org/pdf/2305.16130)."
 ]
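
The early-decoding idea described in the cell above can be illustrated with a small, dependency-free sketch. This is not the notebook's code and not the code from the linked repository: the vocabulary, the unembedding matrix, and the per-layer hidden states are made-up toy values, and the helper names (`softmax`, `unembed`, `early_decode`) are hypothetical. Real models typically also apply the final layer normalization before the unembedding, which is omitted here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def unembed(hidden, unembedding):
    """Project one hidden vector into vocabulary space.

    `unembedding` has one row per vocabulary item; the linear layer is a
    dot product with each row, followed by a softmax.
    """
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in unembedding]
    return softmax(logits)


def early_decode(hidden_per_layer, unembedding, vocab):
    """Apply the unembedding to every layer, not just the last one."""
    results = []
    for layer, hidden in enumerate(hidden_per_layer):
        probs = unembed(hidden, unembedding)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((layer, vocab[best], probs[best]))
    return results


# Toy setup (assumption): 4 tokens, 4 hidden dimensions, and an
# unembedding matrix with one orthogonal direction per token.
vocab = ["Paris", "Warsaw", "London", "the"]
unembedding = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Fake hidden states: the representation starts near the direction of
# "the" and moves towards the direction of "Paris" with depth.
start = [0.0, 0.0, 0.0, 4.0]
target = [4.0, 0.0, 0.0, 0.0]
num_layers = 6
hidden_per_layer = [
    [(1 - layer / (num_layers - 1)) * s + (layer / (num_layers - 1)) * g
     for s, g in zip(start, target)]
    for layer in range(num_layers)
]

decoded = early_decode(hidden_per_layer, unembedding, vocab)
for layer, token, prob in decoded:
    print(f"layer {layer}: top token = {token!r}, p = {prob:.3f}")
```

In this toy setup the top token switches from "the" in the early layers to "Paris" in the later ones, which is the kind of layer-by-layer picture early decoding is meant to give for a real model's intermediate representations.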
@@ -395,8 +393,7 @@
 "\n",
 "> <strong><span style=&ldquo;color:#D83D2B;&rdquo;>Exercise 8.1.3: Activation patching</span></strong>\n",
 ">\n",
-"> 1. (For yourself) Read through the code above and make sure that you understand what it does and why.\n",
-"> 2. TODO: The functionality is wrapped under the endpoints of the library, but if you are curious, you can find details XXX. \n"
+"> 1. (For yourself) Read through the code above and make sure that you understand what it does and why."
 ]
 },
 {
