
LLM - Generate QnA Test Data

llm_rag_qa_data_generation

Overview

Generates a test dataset of questions and answers based on the input documents.

A chunk of text is read from each input document and sent to the specified LLM with a prompt asking it to create a question and answer based on that text. Each resulting question, answer, and context set is saved to either a CSV or JSONL file. The generated pairs cover short-answer, long-answer, summary, and boolean question types.
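The flow described above can be sketched roughly as follows. This is a minimal illustration, not the component's actual implementation: `generate_dataset` and `stub_llm` are hypothetical names, the LLM call is replaced by a local stand-in so the example runs offline, and the record keys (`question`, `answer`, `context`) are assumed from the description.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple


def generate_dataset(
    chunks: Sequence[str],
    ask_llm: Callable[[str], Tuple[str, str]],
    dataset_size: int,
    chunk_batch_size: int,
) -> List[Dict[str, str]]:
    """Send chunks to `ask_llm` in parallel batches and collect QA records."""
    records: List[Dict[str, str]] = []
    for start in range(0, len(chunks), chunk_batch_size):
        if len(records) >= dataset_size:
            break  # enough questions have been generated
        batch = chunks[start:start + chunk_batch_size]
        # Each batch is processed concurrently, one worker per chunk.
        with ThreadPoolExecutor(max_workers=chunk_batch_size) as pool:
            answers = list(pool.map(ask_llm, batch))
        for chunk, (question, answer) in zip(batch, answers):
            records.append(
                {"question": question, "answer": answer, "context": chunk}
            )
    return records[:dataset_size]


def stub_llm(chunk: str) -> Tuple[str, str]:
    """Hypothetical stand-in for the real LLM call; returns a fixed-shape pair."""
    return ("What does the passage state?", chunk)


sample = generate_dataset(
    ["Paris is the capital of France.", "Water boils at 100 C at sea level."],
    stub_llm,
    dataset_size=2,
    chunk_batch_size=5,
)
```

In this sketch `dataset_size` caps the number of records and `chunk_batch_size` controls how many chunks are in flight at once, mirroring the inputs of the same names listed below.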

Version: 0.0.77

Inputs

Name	Description	Type	Default	Optional
openai_api_version	Version of the OpenAI API to use when communicating with the LLM.	string	2023-03-15-preview
openai_api_type	Type of OpenAI endpoint hosting the model. Defaults to azure for Azure OpenAI (AOAI) endpoints.	string	azure
input_data	URI folder of documents containing chunks of data.	uri_folder
llm_config	JSON configuration specifying which model to use for question generation. Must contain the following keys: 'type' (value must be 'azure_open_ai' or 'azure'), 'model_name' (name of the model to use), 'deployment_name' (name of the deployment for the model), 'temperature' (randomness of the response, float from 0 to 1), 'max_tokens' (maximum number of tokens in the response).	string	{"type": "azure_open_ai", "model_name": "gpt-35-turbo", "deployment_name": "gpt-35-turbo", "temperature": 0, "max_tokens": 2000}
llm_connection	Workspace connection resource ID for the completion model.	string		False
dataset_size	Number of questions to generate.	integer	100
chunk_batch_size	Number of chunks to read and send to the LLM in parallel.	integer	5
output_format	File type to save the dataset as. Options are 'csv' and 'json'.	string	json
deployment_validation	URI file indicating whether the Azure OpenAI deployments, if used, have been validated.	uri_file		True
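Because llm_config is passed as a JSON string, it can help to check it locally before submitting a job. The sketch below checks the keys and value ranges listed in the table. `validate_llm_config` is a hypothetical helper, not part of the component, and the requirement that max_tokens be positive is an assumption.

```python
import json

REQUIRED_KEYS = {"type", "model_name", "deployment_name", "temperature", "max_tokens"}
ALLOWED_TYPES = {"azure_open_ai", "azure"}


def validate_llm_config(raw: str) -> dict:
    """Parse an llm_config JSON string and check the documented keys."""
    config = json.loads(raw)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"llm_config is missing keys: {sorted(missing)}")
    if config["type"] not in ALLOWED_TYPES:
        raise ValueError(f"'type' must be one of {sorted(ALLOWED_TYPES)}")
    if not 0 <= float(config["temperature"]) <= 1:
        raise ValueError("'temperature' must be between 0 and 1")
    # Assumption: a non-positive token budget is never meaningful.
    if int(config["max_tokens"]) <= 0:
        raise ValueError("'max_tokens' must be a positive integer")
    return config


# The default value from the table above.
config = validate_llm_config(
    '{"type": "azure_open_ai", "model_name": "gpt-35-turbo", '
    '"deployment_name": "gpt-35-turbo", "temperature": 0, "max_tokens": 2000}'
)
```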

Outputs

Name	Description	Type
output_data	CSV or JSONL file containing the question, answer, context, and metadata sets.	uri_folder
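Once the output folder is downloaded, the generated dataset can be loaded with the standard library alone. The following is a sketch under assumptions: the file extensions (`.jsonl`, `.csv`) and the field names used in the sample record are not confirmed by this page, and `load_qa_records` is a hypothetical helper.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def load_qa_records(folder: str) -> List[Dict[str, str]]:
    """Read every .jsonl and .csv file in `folder` into a list of dicts."""
    records: List[Dict[str, str]] = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix == ".jsonl":
            with path.open(encoding="utf-8") as handle:
                # One JSON object per non-empty line.
                records.extend(json.loads(line) for line in handle if line.strip())
        elif path.suffix == ".csv":
            with path.open(newline="", encoding="utf-8") as handle:
                records.extend(dict(row) for row in csv.DictReader(handle))
    return records


# Round-trip a made-up record to show the expected shape (field names assumed).
with tempfile.TemporaryDirectory() as tmp:
    sample_row = {"question": "Is water wet?", "answer": "True", "context": "Water is wet."}
    (Path(tmp) / "sample.jsonl").write_text(json.dumps(sample_row) + "\n", encoding="utf-8")
    loaded = load_qa_records(tmp)
```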

Environment

azureml:llm-rag-embeddings@latest
