Feedback: Clarification on Token-to-Embedding Conversion
Hi Hugging Face Team,
I recently went through the AI agents course, and I wanted to suggest a clarification in the section explaining tokenization and embeddings.
The current passage:
"Once the input text is tokenized, the model computes a representation of the sequence that captures information about the meaning and the position of each token in the input sequence. This representation goes into the model, which outputs scores that rank the likelihood of each token in its vocabulary as being the next one in the sequence."
This wording may confuse learners because it sounds as though the representation is computed outside the model first and then fed back into the model. In reality, the model processes the tokens internally, including the conversion into embeddings, and then computes the representations of the sequence.
To make this clearer, I suggest modifying the explanation to emphasize that the embedding layer computes the token representations within the model itself, and then those representations are further processed by the model’s layers.
Here’s a revised version:
"Once the input text is tokenized, it is passed through the model's embedding layer, which computes a representation for each token, capturing both its meaning and position within the sequence. These embeddings are then passed through the model's deeper layers, which capture complex relationships between tokens, and the model outputs scores that rank the likelihood of each token in its vocabulary as being the next one in the sequence."
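To illustrate the flow described in the revised passage, here is a minimal toy sketch in NumPy (all sizes, weights, and the single "deeper layer" are hypothetical stand-ins; a real transformer uses learned weights and attention blocks):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, d_model, seq_len = 10, 8, 4
token_ids = np.array([3, 1, 7, 2])  # output of the tokenizer

# Embedding layer (inside the model): combine token meaning and position
token_emb = rng.normal(size=(vocab_size, d_model))
pos_emb = rng.normal(size=(seq_len, d_model))
x = token_emb[token_ids] + pos_emb  # shape: (seq_len, d_model)

# Deeper layers (stand-in for attention/feed-forward blocks)
W_hidden = rng.normal(size=(d_model, d_model))
h = np.tanh(x @ W_hidden)  # shape: (seq_len, d_model)

# Output head: scores over the vocabulary for the next token
# (here tied to the embedding matrix, a common but not universal choice)
logits = h[-1] @ token_emb.T  # shape: (vocab_size,)
```

The point of the sketch is the ordering: tokenization happens outside the model, but the embedding lookup, the deeper layers, and the output scores are all computed inside it.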
I hope this adjustment will make it clearer that the token representations are computed within the model, not outside of it, and avoid any confusion about the flow of data.
Thank you for the great course!
Best regards,
Venkat Manish