

github-actions[bot] edited this page Sep 25, 2024 · 10 revisions

OSS Chat Completion Pipeline

oss_chat_completion_pipeline

Overview

FTaaS (fine-tuning as a service) pipeline component for chat completion.

Version: 0.0.22

View in Studio: https://ml.azure.com/registries/azureml/components/oss_chat_completion_pipeline/version/0.0.22

Inputs

Compute parameters

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| instance_type_data_import | Instance type to use for the data_import component on virtual cluster compute, e.g. Singularity.D8_v3. Takes effect only when compute_data_import is set to 'virtual cluster'. | string | Singularity.D8_v3 | True | |
| instance_type_finetune | Instance type to use for the finetune component on virtual cluster compute, e.g. Singularity.ND40_v2. Takes effect only when compute_finetune is set to 'virtual cluster'. | string | Singularity.ND40_v2 | True | |
| number_of_gpu_to_use_finetuning | Number of GPUs to use per node for fine-tuning. Should equal the number of GPUs per node in the compute SKU used for the finetune component. | integer | 1 | True | |

Continual-Finetuning model path

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| mlflow_model_path | MLflow model asset path. Special characters like `\` and `'` are invalid in the parameter value. | mlflow_model | | True | |
| pytorch_model_path | PyTorch model asset path. Special characters like `\` and `'` are invalid in the parameter value. | custom_model | | True | |
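Several parameters on this page (the model and dataset paths and the compute names) reject backslashes and single quotes. A minimal client-side pre-check could look like the sketch below. `find_invalid_chars` and `check_parameter` are hypothetical helpers, not part of the component, and the component's own validation may differ; the asset reference in the example is also hypothetical.

```python
from __future__ import annotations

# Characters the parameter descriptions list as invalid: backslash and single quote.
INVALID_CHARS = ("\\", "'")


def find_invalid_chars(value: str) -> list[str]:
    """Return the forbidden characters found in value, in INVALID_CHARS order."""
    return [ch for ch in INVALID_CHARS if ch in value]


def check_parameter(name: str, value: str) -> None:
    """Raise ValueError if value contains a character the component rejects."""
    bad = find_invalid_chars(value)
    if bad:
        raise ValueError(f"{name} contains invalid character(s): {bad!r}")


# Illustrative values only; the asset reference below is hypothetical.
check_parameter("mlflow_model_path", "azureml:my-mlflow-model:1")  # passes silently
print(find_invalid_chars("C:\\models\\o'neil"))  # ['\\', "'"]
```

Running such a check before submission surfaces a bad value immediately instead of after the job has been queued.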

Dataset path Parameters

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| train_file_path | Path to the registered training data asset. The supported data formats are jsonl, json, csv, tsv and parquet. Special characters like `\` and `'` are invalid in the parameter value. | uri_file | | False | |
| validation_file_path | Path to the registered validation data asset. The supported data formats are jsonl, json, csv, tsv and parquet. Special characters like `\` and `'` are invalid in the parameter value. | uri_file | | True | |
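The dataset rules above (training file required, validation file optional, five supported formats) can be pre-checked locally. This is a sketch that assumes the data format can be inferred from the file extension; the component may detect formats differently, and `data_format` and `check_dataset_inputs` are hypothetical helper names.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Optional

# Data formats the component documentation lists for train/validation files.
SUPPORTED_FORMATS = {"jsonl", "json", "csv", "tsv", "parquet"}


def data_format(path: str) -> str:
    """Infer the format from the file extension (an assumption, see lead-in)."""
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported data format {suffix!r} for {path!r}")
    return suffix


def check_dataset_inputs(
    train_file_path: str, validation_file_path: Optional[str] = None
) -> dict[str, str]:
    """train_file_path is required; validation_file_path is optional."""
    formats = {"train": data_format(train_file_path)}
    if validation_file_path is not None:
        formats["validation"] = data_format(validation_file_path)
    return formats


print(check_dataset_inputs("data/train.jsonl", "data/valid.parquet"))
# {'train': 'jsonl', 'validation': 'parquet'}
```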

Finetuning parameters

Training parameters

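| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| max_seq_length | Maximum sequence length, in tokens. Default is 8192. | integer | 8192 | True | |
| num_train_epochs | Number of training epochs. | integer | 1 | True | |
| per_device_train_batch_size | Training batch size per device. | integer | 1 | True | |
| learning_rate | Initial learning rate. | number | 0.0003 | True | |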
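The training parameters together determine how many optimizer steps a run takes. The sketch below estimates that number under stated assumptions: single-node data-parallel training where each GPU processes per_device_train_batch_size examples per step, no gradient accumulation, and a final partial batch still counting as a step. `num_examples` is a hypothetical input (the size of your training set), and the component's actual trainer configuration may differ.

```python
import math


def total_optimizer_steps(
    num_examples: int,
    num_train_epochs: int = 1,
    per_device_train_batch_size: int = 1,
    number_of_gpu_to_use_finetuning: int = 1,
) -> int:
    """Estimate optimizer steps for single-node data-parallel training.

    Assumes no gradient accumulation and that a final partial batch is kept.
    """
    # Examples consumed per optimizer step across all GPUs on the node.
    global_batch = per_device_train_batch_size * number_of_gpu_to_use_finetuning
    steps_per_epoch = math.ceil(num_examples / global_batch)
    return steps_per_epoch * num_train_epochs


# 10,000 examples with default batch size on an 8-GPU node: 10000 / 8 = 1250 steps.
print(total_optimizer_steps(10_000, number_of_gpu_to_use_finetuning=8))  # 1250
```

With the defaults (batch size 1, one GPU, one epoch) the estimate equals the number of training examples.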

Validation parameters

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| system_properties | Validation parameters propagated from pipeline. | string | | True | |

Compute parameters

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| compute_data_import | Compute to use for the data_import component, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like `\` and `'` are invalid in the parameter value. If a compute cluster name is provided, instance_type_data_import is ignored and that cluster is used. | string | virtual cluster | True | |
| compute_finetune | Compute to use for the finetune component, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster'. Special characters like `\` and `'` are invalid in the parameter value. If a compute cluster name is provided, instance_type_finetune is ignored and that cluster is used. | string | virtual cluster | True | |
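The rule shared by the compute parameters (an instance type applies only when the compute is 'virtual cluster'; a named cluster overrides it) can be expressed as a small resolver. This is a sketch of the documented behaviour, not the component's code: `ComputeTarget` and `resolve_compute` are hypothetical names, and the exact, case-sensitive match on 'virtual cluster' is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

VIRTUAL_CLUSTER = "virtual cluster"

# Defaults taken from the compute parameter tables on this page.
DEFAULT_INSTANCE_TYPES = {
    "data_import": "Singularity.D8_v3",
    "finetune": "Singularity.ND40_v2",
}


@dataclass(frozen=True)
class ComputeTarget:
    compute: str
    instance_type: Optional[str]  # None when a named compute cluster is used


def resolve_compute(
    stage: str,
    compute: str = VIRTUAL_CLUSTER,
    instance_type: Optional[str] = None,
) -> ComputeTarget:
    """Mirror the documented rule: instance_type only applies to 'virtual cluster'."""
    if compute == VIRTUAL_CLUSTER:
        return ComputeTarget(compute, instance_type or DEFAULT_INSTANCE_TYPES[stage])
    # A named cluster such as 'FT-Cluster' is used as-is; instance_type is ignored.
    return ComputeTarget(compute, None)


print(resolve_compute("finetune"))
# ComputeTarget(compute='virtual cluster', instance_type='Singularity.ND40_v2')
print(resolve_compute("data_import", "FT-Cluster", "Singularity.D8_v3"))
# ComputeTarget(compute='FT-Cluster', instance_type=None)
```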

Model parameters

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| model_asset_id | Asset ID of the model. | string | | False | |

Model registration

| Name | Description | Type | Default | Optional | Enum |
| --- | --- | --- | --- | --- | --- |
| registered_model_name | Name of the registered model | string | | True | |
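Taken together, the input tables define two required inputs (train_file_path and model_asset_id), a set of defaults, and several optional inputs with no default. The sketch below merges caller overrides onto those documented defaults and rejects missing or unknown names. `build_inputs` and the asset references in the example are hypothetical; how the resulting dict is submitted (Studio, CLI, or SDK) is outside the scope of this sketch.

```python
from typing import Any, Dict

# Defaults copied from the parameter tables on this page.
PIPELINE_DEFAULTS: Dict[str, Any] = {
    "instance_type_data_import": "Singularity.D8_v3",
    "instance_type_finetune": "Singularity.ND40_v2",
    "number_of_gpu_to_use_finetuning": 1,
    "max_seq_length": 8192,
    "num_train_epochs": 1,
    "per_device_train_batch_size": 1,
    "learning_rate": 0.0003,
    "compute_data_import": "virtual cluster",
    "compute_finetune": "virtual cluster",
}
# Inputs documented with Optional = False.
REQUIRED = ("train_file_path", "model_asset_id")
# Optional inputs that have no documented default.
OPTIONAL_NO_DEFAULT = (
    "mlflow_model_path",
    "pytorch_model_path",
    "validation_file_path",
    "system_properties",
    "registered_model_name",
)
KNOWN_INPUTS = set(PIPELINE_DEFAULTS) | set(REQUIRED) | set(OPTIONAL_NO_DEFAULT)


def build_inputs(**overrides: Any) -> Dict[str, Any]:
    """Merge caller overrides onto the documented defaults and check required inputs."""
    unknown = sorted(set(overrides) - KNOWN_INPUTS)
    if unknown:
        raise ValueError(f"unknown input(s): {unknown}")
    missing = [name for name in REQUIRED if overrides.get(name) is None]
    if missing:
        raise ValueError(f"missing required input(s): {missing}")
    inputs = dict(PIPELINE_DEFAULTS)
    # Drop None values so unset optional inputs are simply omitted.
    inputs.update({k: v for k, v in overrides.items() if v is not None})
    return inputs


# Hypothetical asset references, for illustration only.
inputs = build_inputs(
    train_file_path="azureml:chat-train-data:1",
    model_asset_id="azureml:my-base-model:1",
    num_train_epochs=3,
    registered_model_name="my-chat-model",
)
print(inputs["num_train_epochs"], inputs["max_seq_length"])  # 3 8192
```

Rejecting unknown names catches typos such as `num_train_epoch` that would otherwise fall back silently to the default.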

Outputs

| Name | Description | Type |
| --- | --- | --- |
| output_model | Output directory for the fine-tuned LoRA weights. | uri_folder |