
# PipelineComponent for AutoML NLP Multilabel Text Classification

## nlp_textclassification_multilabel

### Overview

Pipeline component for AutoML NLP multilabel text classification.

**Version:** 0.0.2

**View in Studio:** https://ml.azure.com/registries/azureml/components/nlp_textclassification_multilabel/version/0.0.2
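
The component can also be loaded programmatically from the `azureml` registry. A minimal sketch using the Azure ML Python SDK v2 (`azure-ai-ml`); the credential setup is an assumption and may differ in your environment:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Client scoped to the shared "azureml" registry (not a workspace).
registry_client = MLClient(credential=DefaultAzureCredential(), registry_name="azureml")

# Fetch the exact component version documented on this page.
text_classification_component = registry_client.components.get(
    name="nlp_textclassification_multilabel", version="0.0.2"
)
print(text_classification_component.display_name)
```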

### Inputs

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| compute_model_import | Compute to be used for model_selector, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | | False | |
| compute_preprocess | Compute to be used for preprocess, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | | False | |
| compute_finetune | Compute to be used for finetune, e.g. provide 'FT-Cluster' if your compute is named 'FT-Cluster' | string | | False | |
| num_nodes_finetune | Number of nodes to be used for finetuning (used for distributed training) | integer | 1 | True | |
| process_count_per_instance_finetune | Number of GPUs to be used per node for finetuning; should equal the number of GPUs per node in the compute SKU used for finetune | integer | 1 | True | |
| model_name | Model id used to load the model checkpoint | string | bert-base-uncased | | |
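
The three compute inputs take the name of an existing compute target, and the node/process counts only matter for distributed finetuning. A hedged sketch of values you might pass ('FT-Cluster' follows the table's own example and is a placeholder for your GPU cluster):

```python
# Placeholder compute names; each must match a compute target in your workspace.
compute_settings = {
    "compute_model_import": "FT-Cluster",
    "compute_preprocess": "FT-Cluster",
    "compute_finetune": "FT-Cluster",
    # Distributed finetuning: 2 nodes x 4 processes, assuming a 4-GPU SKU.
    "num_nodes_finetune": 2,
    "process_count_per_instance_finetune": 4,
}
```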

#### Data PreProcess parameters (see docs to learn more)

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| label_column_name | Label column name | string | | False | |

#### Dataset parameters

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| training_data | Path to the training data file | uri_file | | False | |
| validation_data | Path to the validation data file | uri_file | | False | |
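
Both dataset inputs are `uri_file` references. A sketch, assuming local files that the SDK uploads on submission (datastore URIs or registered data assets work as well; the paths and file names are placeholders):

```python
from azure.ai.ml import Input

# Placeholder paths; the column referenced by label_column_name
# must be present in both files.
training_data = Input(type="uri_file", path="./data/train.csv")
validation_data = Input(type="uri_file", path="./data/valid.csv")
```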

#### Training parameters

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| training_batch_size | Training batch size | integer | 32 | True | |
| validation_batch_size | Validation batch size | integer | 32 | True | |
| number_of_epochs | Number of epochs to train | integer | 3 | True | |
| gradient_accumulation_steps | Gradient accumulation steps | integer | 1 | True | |
| learning_rate | Starting learning rate; defaults to a linear scheduler | number | 5e-05 | True | |
| warmup_steps | Number of steps used for a linear warmup from 0 to learning_rate | integer | 0 | True | |
| weight_decay | The weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in the AdamW optimizer | number | 0.0 | True | |
| learning_rate_scheduler | The scheduler type to use | string | linear | True | ['linear', 'cosine', 'cosine_with_restarts', 'polynomial', 'constant', 'constant_with_warmup'] |
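
All training parameters are optional and fall back to the defaults in the table. A hedged example of overriding a few of them (the values are illustrative, not tuned recommendations):

```python
training_overrides = {
    "training_batch_size": 16,
    "validation_batch_size": 16,
    "number_of_epochs": 5,
    "gradient_accumulation_steps": 2,  # effective batch size: 16 * 2 per process
    "learning_rate": 2e-5,
    "warmup_steps": 100,
    "weight_decay": 0.01,
    "learning_rate_scheduler": "cosine",
}
```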

#### AutoML NLP parameters

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| enable_long_range_text | Enable long range text support | boolean | True | True | |
| precision | Apply mixed precision training, which can reduce the memory footprint by performing operations in half precision | string | 16 | True | ['32', '16'] |
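
Both parameters are optional; note that `precision` is string-typed even though its values look numeric. For example:

```python
automl_nlp_overrides = {
    "enable_long_range_text": True,  # boolean, unlike the string-typed flags below
    "precision": "16",               # mixed precision; pass "32" for full precision
}
```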

#### MLflow Parameters

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| enable_full_determinism | Ensure reproducible behavior during distributed training | string | false | True | ['true', 'false'] |
| evaluation_strategy | The evaluation strategy to adopt during training | string | epoch | True | ['epoch', 'steps'] |
| evaluation_steps_interval | The evaluation steps, as a fraction of an epoch, to adopt during training; overwrites evaluation_steps if not 0 | number | 0.0 | True | |
| evaluation_steps | Number of update steps between two evaluations if evaluation_strategy='steps' | integer | 500 | True | |
| logging_strategy | The logging strategy to adopt during training | string | steps | True | ['epoch', 'steps'] |
| logging_steps | Number of update steps between two logs if logging_strategy='steps' | integer | 500 | True | |
| primary_metric | The metric used to compare two different models | string | accuracy | True | ['loss', 'f1_macro', 'mcc', 'accuracy', 'precision_macro', 'recall_macro'] |
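
A sketch of switching from per-epoch to step-based evaluation and picking a primary metric (values are illustrative):

```python
evaluation_overrides = {
    "evaluation_strategy": "steps",
    "evaluation_steps": 250,        # evaluate every 250 update steps
    "logging_strategy": "steps",
    "logging_steps": 250,
    "primary_metric": "f1_macro",   # often more informative than accuracy for multilabel
}
```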

#### Deepspeed Parameters

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| apply_deepspeed | If set to true, enables DeepSpeed for training | string | true | True | ['true', 'false'] |
| deepspeed_config | DeepSpeed config to be used for finetuning | uri_file | | True | |

#### ORT Parameters

| Name | Description | Type | Default | Optional | Enum |
| ---- | ----------- | ---- | ------- | -------- | ---- |
| apply_ort | If set to true, uses ONNX Runtime training | string | true | True | ['true', 'false'] |
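
DeepSpeed and ONNX Runtime training are toggled with string flags, and a custom DeepSpeed config can be supplied as a `uri_file`. A hedged sketch (the config path is a placeholder):

```python
from azure.ai.ml import Input

acceleration_overrides = {
    "apply_deepspeed": "true",   # string-typed, not a boolean
    "apply_ort": "true",
    # Optional: DeepSpeed JSON config passed as a uri_file input.
    "deepspeed_config": Input(type="uri_file", path="./configs/ds_config.json"),
}
```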

### Outputs

| Name | Description | Type |
| ---- | ----------- | ---- |
| pytorch_model_folder_finetune | Output directory to save the finetuned model and other metadata | uri_folder |
| mlflow_model_folder_finetune | Output directory to save the finetuned model as an MLflow model | mlflow_model |
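
Putting it together: a hedged end-to-end sketch that runs the component as a single-step pipeline and downloads the finetuned MLflow model afterwards. Workspace details, compute name, label column, and data paths are all placeholders:

```python
from azure.ai.ml import MLClient, Input, dsl
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)
registry_client = MLClient(credential=credential, registry_name="azureml")
component = registry_client.components.get(
    name="nlp_textclassification_multilabel", version="0.0.2"
)

@dsl.pipeline(description="AutoML NLP multilabel text classification")
def multilabel_pipeline(train_file, valid_file):
    node = component(
        compute_model_import="FT-Cluster",   # placeholder compute name
        compute_preprocess="FT-Cluster",
        compute_finetune="FT-Cluster",
        model_name="bert-base-uncased",
        label_column_name="labels",          # placeholder label column
        training_data=train_file,
        validation_data=valid_file,
    )
    # Expose the component outputs as pipeline-level outputs.
    return {
        "pytorch_model": node.outputs.pytorch_model_folder_finetune,
        "mlflow_model": node.outputs.mlflow_model_folder_finetune,
    }

job = ml_client.jobs.create_or_update(
    multilabel_pipeline(
        train_file=Input(type="uri_file", path="./data/train.csv"),
        valid_file=Input(type="uri_file", path="./data/valid.csv"),
    ),
    experiment_name="nlp-multilabel",
)

# After the job completes, download the MLflow-format model output.
ml_client.jobs.download(name=job.name, output_name="mlflow_model", download_path="./artifacts")
```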