Default Tokenization for all-MiniLM-L6-v2 (L6) #207

Answered by drazvan
spehl-max asked this question in Q&A

Good questions @spehl-max!

  1. When computing the embedding, we send the text as-is, so it will be truncated automatically. This could indeed be a problem, thanks for pointing it out. The user and bot messages defined in a Colang config are not typically that long, and flows are indexed line by line (https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/nemoguardrails/actions/llm/generation.py#L179), so this is very unlikely to happen there. But for the input coming from the user, it could be the case (see the truncation sketch below).

  2. The embeddings are computed when the configuration is initialized (https://github.com/NVIDIA/NeMo-Guardrails/blob/develop/nemoguardrails/actions/llm/generation.py#L105); a sketch of this indexing step follows below. In the prompt, typically …
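
To make point 1 concrete, here is a minimal sketch (not NeMo-Guardrails code), assuming the `sentence-transformers` package that backs all-MiniLM-L6-v2: this model's window is 256 word-piece tokens, and anything past it is dropped silently, so a short and a very long input both come back as a single 384-dimensional vector with no error raised.

```python
# Minimal sketch (not NeMo-Guardrails code) of the silent truncation,
# assuming the sentence-transformers package.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
print(model.max_seq_length)  # 256: tokens beyond this are dropped silently

short_text = "user expressed a greeting"
long_text = "word " * 5000  # far past the window; the tail is simply ignored

emb_short = model.encode(short_text)
emb_long = model.encode(long_text)
print(emb_short.shape, emb_long.shape)  # both (384,), despite the length gap
```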

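And for point 2, an illustrative sketch of the overall pattern: embed the canonical message examples once at initialization, then match incoming user input against that index by cosine similarity. The `canonical_forms` examples and the `match_intent` helper here are hypothetical, for illustration only, not the library's actual API.

```python
# Illustrative sketch (not the NeMo-Guardrails implementation): pre-compute
# embeddings at startup, then do nearest-neighbor matching at query time.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical canonical forms, as they might be defined in a Colang config.
canonical_forms = {
    "express greeting": ["hello", "hi there", "good morning"],
    "ask about capabilities": ["what can you do?", "how can you help me?"],
}

# Index once, at initialization: embed every example line by line.
examples, labels = [], []
for intent, phrases in canonical_forms.items():
    examples.extend(phrases)
    labels.extend([intent] * len(phrases))
index = model.encode(examples, normalize_embeddings=True)  # shape (N, 384)

def match_intent(user_message: str) -> str:
    """Return the canonical form whose example is closest to the message."""
    query = model.encode(user_message, normalize_embeddings=True)
    scores = index @ query  # cosine similarity, since vectors are unit-length
    return labels[int(np.argmax(scores))]

print(match_intent("hey there, good evening!"))  # -> "express greeting"
```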