diff --git a/docs/_data/navigation.yml b/docs/_data/navigation.yml
index 953d038e..d45ca84f 100644
--- a/docs/_data/navigation.yml
+++ b/docs/_data/navigation.yml
@@ -39,6 +39,8 @@ docs:
         url: /docs/upgrading/
       - title: "General Usage"
         url: /docs/usage/
+      - title: "Tips and Tricks"
+        url: /docs/tips-and-tricks/
       - title: Text Classification
         children:
           - title: "Classification Specifics"
diff --git a/docs/_docs/03.5-tips-and-tricks.md b/docs/_docs/03.5-tips-and-tricks.md
new file mode 100644
index 00000000..232ca9f9
--- /dev/null
+++ b/docs/_docs/03.5-tips-and-tricks.md
@@ -0,0 +1,485 @@
---
title: "Tips and Tricks"
permalink: /docs/tips-and-tricks/
excerpt: "Various tips and tricks applicable to most tasks."
last_modified_at: 2020/08/10 00:42:16
toc: true
---

This section contains various tips and tricks applicable to most tasks in the library.

## Using early stopping

Early stopping is a technique used to prevent model overfitting. In a nutshell, the idea is to periodically evaluate the performance of a model against a test dataset and terminate the training once the model stops improving on the test data.

The exact conditions for early stopping can be adjusted as needed through the model's configuration options.

**Note:** Refer to the configuration options table for more details (`early_stopping_consider_epochs`, `early_stopping_delta`, `early_stopping_metric`, `early_stopping_metric_minimize`, `early_stopping_patience`).
{: .notice--info}

You must set `use_early_stopping` to `True` in order to use early stopping.

```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs


model_args = ClassificationArgs()
model_args.use_early_stopping = True
model_args.early_stopping_delta = 0.01
model_args.early_stopping_metric = "mcc"
model_args.early_stopping_metric_minimize = False
model_args.early_stopping_patience = 5
model_args.evaluate_during_training_steps = 1000

model = ClassificationModel("bert", "bert-base-cased", args=model_args)
```

With this configuration, training will terminate if the `mcc` score of the model on the test data does not improve upon the best `mcc` score by at least `0.01` for 5 consecutive evaluations. An evaluation will occur once every `1000` training steps.

**Pro tip:** You can use the evaluation-during-training functionality without invoking early stopping by setting `evaluate_during_training` to `True` while keeping `use_early_stopping` as `False`.
{: .notice--success}


## Additional Evaluation Metrics

Task-specific Simple Transformers models each have their own default metrics that are calculated when a model is evaluated on a dataset. The default metrics have been chosen according to the task, usually by looking at the metrics used in standard benchmarks for that task.

However, it is likely that you will wish to calculate your own metrics depending on your particular use case. To facilitate this, all `eval_model()` and `train_model()` methods in Simple Transformers accept keyword arguments consisting of the name of the metric (str) and the metric function itself. The metric function should accept two inputs, the true labels and the model predictions (sklearn format).
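
Any Python callable with that signature will work. As a minimal sketch, you could wrap scikit-learn's F1 score (the `f1_macro` name and the `average="macro"` choice below are purely illustrative, not part of the library):

```python
from sklearn.metrics import f1_score


def f1_macro(labels, preds):
    # True labels first, model predictions second (sklearn convention)
    return f1_score(labels, preds, average="macro")


# Passed by keyword, e.g. model.eval_model(eval_df, f1=f1_macro)
```

Existing scikit-learn metrics can also be passed in directly: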

```python
from simpletransformers.classification import ClassificationModel
import sklearn


model = ClassificationModel("bert", "bert-base-cased")

model.train_model(train_df, acc=sklearn.metrics.accuracy_score)

model.eval_model(eval_df, acc=sklearn.metrics.accuracy_score)
```

**Pro tip:** You can combine the additional evaluation metrics functionality with early stopping by setting the name of your metric function as the `early_stopping_metric`.
{: .notice--success}


## Simple-Viewer (Visualizing Model Predictions with Streamlit)

Simple Viewer is a web app built with the [Streamlit](https://www.streamlit.io/) framework which can be used to quickly try out trained models.

To start Simple Viewer, run the command `simple-viewer`.

When Simple Viewer is started, it will look for Simple Transformers models in the current directory and any subdirectories. All detected models can be found in the `Choose Model` dropdown. Alternatively, you can load a model by specifying the Simple Transformers task, model type, and model name (model type and model name follow the usual Simple Transformers conventions). The model name may be the path to a local model, or it may be the model name for a model from the Hugging Face model [hub](https://huggingface.co/models).

The following Simple Transformers tasks are currently supported:

- Classification
- Multi-Label Classification
- Named Entity Recognition
- Question Answering

## Hyperparameter Optimization

Machine learning models can be very sensitive to the hyperparameters used to train them. While large models like Transformers can perform well across a relatively wide hyperparameter range, they can also break completely under certain conditions (like training with large learning rates for many iterations).

**Hint:** We can define two kinds of parameters used to train Transformer models. The first is the learned parameters (like the model weights) and the second is the hyperparameters. To give a high-level description of the two, the hyperparameters (learning rate, batch sizes, etc.) control the process of *learning* the learned parameters.
{: .notice--success}

Choosing a good set of hyperparameter values plays a huge role in developing a state-of-the-art model. Because of this, Simple Transformers has native support for the excellent [W&B Sweeps](https://docs.wandb.com/sweeps) feature for automated hyperparameter optimization.

How to perform hyperparameter optimization with Simple Transformers and W&B Sweeps (adapted from the W&B [docs](https://docs.wandb.com/sweeps)):

### 1. Set up the sweep

The sweep can be configured through a Python dictionary (`sweep_config`). The dictionary contains at least 3 keys:

1. `method` -- Specifies the search strategy

   | `method` | Meaning                                                                                                                                                                                          |
   | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
   | grid     | Grid search iterates over all possible combinations of parameter values.                                                                                                                          |
   | random   | Random search chooses random sets of values.                                                                                                                                                      |
   | bayes    | Bayesian optimization uses a Gaussian process to model the function and then chooses parameters to optimize the probability of improvement. This strategy requires a metric key to be specified. |

2. `metric` -- Specifies the metric to be optimized

   *This should be a metric that is logged to W&B by the training script.*

   The `metric` key of the `sweep_config` points to another Python dictionary containing the `name`, `goal`, and (optionally) `target`.

   | sub-key | Meaning                                                                                                                                                                                                                                                                              |
   | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
   | name    | Name of the metric to optimize                                                                                                                                                                                                                                                        |
   | goal    | `"minimize"` or `"maximize"` (Default is `"minimize"`)                                                                                                                                                                                                                                |
   | target  | Value that you'd like to achieve for the metric you're optimizing. When any run in the sweep achieves that target value, the sweep's state will be set to "Finished". This means all agents with active runs will finish those jobs, but no new runs will be launched in the sweep.  |

3. `parameters` -- Specifies the hyperparameters and their values to explore

   The `parameters` key of the `sweep_config` points to another Python dictionary which contains all the hyperparameters to be optimized and their possible values. Generally, these will be any combination of the `model_args` for the particular Simple Transformers model.

   W&B offers a variety of ways to define the possible values for each parameter, all of which can be found in the [W&B docs](https://docs.wandb.com/sweeps/configuration#parameters). The possible values are also represented using a Python dictionary. Two common methods are given below.

   1. Discrete values

      A dictionary with the key `values` pointing to a Python list of discrete values.

   2. Range of values

      A dictionary with the two keys `min` and `max` which specify the minimum and maximum values of the range. *The range is continuous if `min` and `max` are floats and discrete if `min` and `max` are ints.*

Example `sweep_config`:

```python
sweep_config = {
    "method": "bayes",  # grid, random
    "metric": {"name": "train_loss", "goal": "minimize"},
    "parameters": {
        "num_train_epochs": {"values": [2, 3, 5]},
        "learning_rate": {"min": 5e-5, "max": 4e-4},
    },
}
```

### 2. Initialize the sweep

Initialize a W&B sweep with the config defined earlier.

```python
sweep_id = wandb.sweep(sweep_config, project="Simple Sweep")
```

### 3. Prepare the data and default model configuration

In order to run our sweep, we must get our data ready. This is identical to how you would normally set up datasets for training a Simple Transformers model.

For example:

```python
# Preparing train data
train_data = [
    ["Aragorn was the heir of Isildur", "true"],
    ["Frodo was the heir of Isildur", "false"],
]
train_df = pd.DataFrame(train_data)
train_df.columns = ["text", "labels"]

# Preparing eval data
eval_data = [
    ["Theoden was the king of Rohan", "true"],
    ["Merry was the king of Rohan", "false"],
]
eval_df = pd.DataFrame(eval_data)
eval_df.columns = ["text", "labels"]
```

Next, we can set up the default configuration for the Simple Transformers model. This would include any `args` that are not being optimized through the sweep.

**Hint:** As a rule of thumb, it might be a good idea to set all of `reprocess_input_data`, `overwrite_output_dir`, and `no_save` to `True` when running sweeps.
{: .notice--success}

```python
model_args = ClassificationArgs()
model_args.reprocess_input_data = True
model_args.overwrite_output_dir = True
model_args.evaluate_during_training = True
model_args.manual_seed = 4
model_args.use_multiprocessing = True
model_args.train_batch_size = 16
model_args.eval_batch_size = 8
model_args.labels_list = ["true", "false"]
model_args.wandb_project = "Simple Sweep"
```

### 4. Set up the training function

W&B will call this function to run the training for a particular sweep run. This function must perform 3 critical tasks.

1. Initialize the `wandb` run.
2. Initialize a Simple Transformers model and pass in `sweep_config=wandb.config` as a `kwarg`.
3. Run the training for the Simple Transformers model.

*`wandb.config` contains the hyperparameter values for the current sweep run. Simple Transformers will update the model `args` accordingly.*

An example training function is shown below.

```python
def train():
    # Initialize a new wandb run
    wandb.init()

    # Create a TransformerModel
    model = ClassificationModel(
        "roberta",
        "roberta-base",
        use_cuda=True,
        args=model_args,
        sweep_config=wandb.config,
    )

    # Train the model
    model.train_model(train_df, eval_df=eval_df)

    # Evaluate the model
    model.eval_model(eval_df)

    # Sync wandb
    wandb.join()
```

In addition to the 3 tasks outlined earlier, the function also performs an evaluation and manually syncs the W&B run.

**Hint:** This function can be reused across any Simple Transformers task by simply replacing `ClassificationModel` with the appropriate model class.
{: .notice--success}


### 5. Run the sweeps

The following line will execute the sweep.

```python
wandb.agent(sweep_id, train)
```
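
With the `random` and `bayes` search strategies, a sweep will keep launching runs until it is stopped. If you want a single agent to stop after a fixed number of runs, `wandb.agent` also accepts a `count` argument; a small sketch (the value `10` is just an example):

```python
# Limit this agent to at most 10 runs (useful for random/bayes sweeps)
wandb.agent(sweep_id, train, count=10)
```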

### 6. Putting it all together

```python
import logging

import pandas as pd
import sklearn

import wandb
from simpletransformers.classification import (
    ClassificationArgs,
    ClassificationModel,
)

sweep_config = {
    "method": "bayes",  # grid, random
    "metric": {"name": "train_loss", "goal": "minimize"},
    "parameters": {
        "num_train_epochs": {"values": [2, 3, 5]},
        "learning_rate": {"min": 5e-5, "max": 4e-4},
    },
}

sweep_id = wandb.sweep(sweep_config, project="Simple Sweep")

logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

# Preparing train data
train_data = [
    ["Aragorn was the heir of Isildur", "true"],
    ["Frodo was the heir of Isildur", "false"],
]
train_df = pd.DataFrame(train_data)
train_df.columns = ["text", "labels"]

# Preparing eval data
eval_data = [
    ["Theoden was the king of Rohan", "true"],
    ["Merry was the king of Rohan", "false"],
]
eval_df = pd.DataFrame(eval_data)
eval_df.columns = ["text", "labels"]

model_args = ClassificationArgs()
model_args.reprocess_input_data = True
model_args.overwrite_output_dir = True
model_args.evaluate_during_training = True
model_args.manual_seed = 4
model_args.use_multiprocessing = True
model_args.train_batch_size = 16
model_args.eval_batch_size = 8
model_args.labels_list = ["true", "false"]
model_args.wandb_project = "Simple Sweep"


def train():
    # Initialize a new wandb run
    wandb.init()

    # Create a TransformerModel
    model = ClassificationModel(
        "roberta",
        "roberta-base",
        use_cuda=True,
        args=model_args,
        sweep_config=wandb.config,
    )

    # Train the model
    model.train_model(train_df, eval_df=eval_df)

    # Evaluate the model
    model.eval_model(eval_df)

    # Sync wandb
    wandb.join()


wandb.agent(sweep_id, train)
```

**Hint:** This script can also be found in the `examples` directory of the GitHub repo.
{: .notice--success}

To visualize your sweep results, open the project on W&B. Please refer to the [W&B docs](https://docs.wandb.com/sweeps/visualize-sweep-results) for more details on understanding the results.

**Guide:** A guide to hyperparameter optimization for Transformer models can be found [here](https://towardsdatascience.com/hyperparameter-optimization-for-optimum-transformer-models-b95a32b70949?source=friends_link&sk=7d19ce15c9ac1230642d826b9deeb638).
{: .notice--success}

## Custom Parameter Groups (Freezing Layers)

Simple Transformers supports custom parameter groups which can be used to set different learning rates for different layers in a model, freeze layers, train only the final layer, etc.

All Simple Transformers models support the following three configuration options for setting up custom parameter groups.

### Custom parameter groups

`custom_parameter_groups` offers the most granular configuration option. This should be a list of Python dicts where each dict contains a `params` key and any other optional keys matching the keyword arguments accepted by the optimizer (e.g. `lr`, `weight_decay`). The value for the `params` key should be a list of named parameters (e.g. `["classifier.weight", "bert.encoder.layer.10.output.dense.weight"]`).

**Hint:** All Simple Transformers models have a `get_named_parameters()` method that returns a list of all parameter names in the model.
{: .notice--success}
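
If you are unsure of the exact parameter names to use, you can print them out first. A quick sketch (assuming a model has already been created):

```python
model = ClassificationModel("bert", "bert-base-cased")

# get_named_parameters() returns the parameter names of the underlying model
for name in model.get_named_parameters():
    print(name)
```

The names obtained this way can be used directly in `custom_parameter_groups`: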

```python
model_args = ClassificationArgs()
model_args.custom_parameter_groups = [
    {
        "params": ["classifier.weight", "bert.encoder.layer.10.output.dense.weight"],
        "lr": 1e-2,
    }
]
```

### Custom layer parameters

`custom_layer_parameters` makes it more convenient to set the optimizer options for a given layer or set of layers. This should be a list of Python dicts where each dict contains a `layer` key and any other optional keys matching the keyword arguments accepted by the optimizer (e.g. `lr`, `weight_decay`). The value for the `layer` key should be an `int` (must be numeric) which specifies the layer (e.g. `0`, `1`, `11`).

```python
model_args = ClassificationArgs()
model_args.custom_layer_parameters = [
    {
        "layer": 10,
        "lr": 1e-3,
    },
    {
        "layer": 0,
        "lr": 1e-5,
    },
]
```

**Note:** Any named parameters specified through `custom_layer_parameters` with `bias` or `LayerNorm.weight` in the name will have their `weight_decay` set to `0.0`. This also happens for any parameters **not specified** in either `custom_parameter_groups` or in `custom_layer_parameters`, but it **does not happen** for parameters specified through `custom_parameter_groups`.
{: .notice--info}

{% capture notice-text %}

Note that `custom_parameter_groups` has *higher priority* than `custom_layer_parameters`, as `custom_parameter_groups` is more specific. If a parameter specified in `custom_parameter_groups` also happens to be in a layer specified in `custom_layer_parameters`, that particular parameter will be assigned to the parameter group specified in `custom_parameter_groups`.

For example:

```python
model_args = ClassificationArgs()
model_args.custom_layer_parameters = [
    {
        "layer": 10,
        "lr": 1e-3,
    },
    {
        "layer": 0,
        "lr": 1e-5,
    },
]
model_args.custom_parameter_groups = [
    {
        "params": ["classifier.weight", "bert.encoder.layer.10.output.dense.weight"],
        "lr": 1e-2,
    }
]
```

Here, `"bert.encoder.layer.10.output.dense.weight"` is specified in both `custom_parameter_groups` and `custom_layer_parameters`. However, `"bert.encoder.layer.10.output.dense.weight"` will have an `lr` of `1e-2` due to the higher precedence of `custom_parameter_groups`.

{% endcapture %}

<div class="notice--info">
  <h4>Order of precedence:</h4>
  {{ notice-text | markdownify }}
</div>

**Hint:** Any parameters not specified in either `custom_parameter_groups` or in `custom_layer_parameters` will be assigned the general values from the model args.
{: .notice--success}

### Train custom parameters only

The `train_custom_parameters_only` option is used to facilitate the training of specific parameters only. If `train_custom_parameters_only` is set to `True`, only the parameters specified in either `custom_parameter_groups` or in `custom_layer_parameters` will be trained.

For example, to train only the classification layers of a `ClassificationModel`:

```python
from simpletransformers.classification import ClassificationModel, ClassificationArgs
import pandas as pd
import logging


logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

# Preparing train data
train_data = [
    ["Aragorn was the heir of Isildur", 1],
    ["Frodo was the heir of Isildur", 0],
]
train_df = pd.DataFrame(train_data)
train_df.columns = ["text", "labels"]

# Preparing eval data
eval_data = [
    ["Theoden was the king of Rohan", 1],
    ["Merry was the king of Rohan", 0],
]
eval_df = pd.DataFrame(eval_data)
eval_df.columns = ["text", "labels"]

# Train only the classifier layers
model_args = ClassificationArgs()
model_args.train_custom_parameters_only = True
model_args.custom_parameter_groups = [
    {
        "params": ["classifier.weight"],
        "lr": 1e-3,
    },
    {
        "params": ["classifier.bias"],
        "lr": 1e-3,
        "weight_decay": 0.0,
    },
]

# Create a ClassificationModel
model = ClassificationModel(
    "bert", "bert-base-cased", args=model_args
)

# Train the model
model.train_model(train_df)
```

## Options For Downloading Pre-Trained Models

Most Simple Transformers models will use the `from_pretrained()` [method](https://huggingface.co/transformers/main_classes/model.html#transformers.PreTrainedModel.from_pretrained) from the Hugging Face Transformers library to download pre-trained models. You can pass `kwargs` to this method to configure things like proxies and force downloading (refer to the method documentation linked above).

You can pass these `kwargs` when initializing a Simple Transformers task-specific model to access the same functionality. For example, if you are behind a firewall and need to set proxy settings:

```python
model = ClassificationModel(
    "bert",
    "bert-base-cased",
    proxies={"http": "foo.bar:3128", "http://hostname": "foo.bar:4012"}
)
```
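
Other `from_pretrained()` keyword arguments can be passed in the same way, assuming they are forwarded as described above. As a sketch, forcing a fresh download into a custom cache location might look like this (the path below is only a placeholder):

```python
model = ClassificationModel(
    "bert",
    "bert-base-cased",
    cache_dir="/path/to/custom/cache",  # placeholder path
    force_download=True,
)
```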