Merged with master
Thilina Rajapakse committed Aug 9, 2020
2 parents a426a43 + d405b4a commit db9ab1e
Showing 89 changed files with 1,102 additions and 607 deletions.
13 changes: 12 additions & 1 deletion .all-contributorsrc
@@ -411,7 +411,18 @@
"avatar_url": "https://avatars1.githubusercontent.com/u/14850762?v=4",
"profile": "http://pablomarino.me",
"contributions": [
"code"
"code",
"doc"
]
},
{
"login": "strawberrypie",
"name": "Anton Kiselev",
"avatar_url": "https://avatars2.githubusercontent.com/u/29224443?v=4",
"profile": "http://linkedin.com/in/strawberrypie/",
"contributions": [
"code",
"doc"
]
}
],
65 changes: 61 additions & 4 deletions CHANGELOG.md
@@ -4,11 +4,54 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## UNRELEASED
## [0.47.0] - 2020-08-09

- Removed blank string answer in Question Answering predictions
### Added

- Added support for testing models through a Streamlit app. Use the command `simple-viewer`. Currently supports:
- Classification (including multilabel)
- NER (design inspired by [displaCy Named Entity Visualizer](https://explosion.ai/demos/displacy-ent))
- QA


## [0.46.5] - 2020-08-05

### Changed

- Python version requirement changed back to 3.6 for Colab support.
- Miscellaneous bug fixes in 0.46.3 and 0.46.4

## [0.46.2] - 2020-08-01

### Fixed

- Fixed unreachable condition in Electra language modeling.

## [0.46.1] - 2020-08-01

### Fixed

- Bug in ConvAI models where cache_dir was not being created.

## [0.46.0] - 2020-08-01

### Changed

- Uses PyTorch native AMP instead of Apex. [@strawberrypie](https://github.com/strawberrypie)

## [0.45.5] - 2020-07-29

### Fixed

- Bug fixed in loading classification models with a `labels_map` where labels are ints.

## [0.45.4] - 2020-07-28

### Fixed

## [0.45.2] - 2020-07-19
- Bug fixed in lazy loading classification tasks where `lazy_text_column=0` caused an error.

## [0.45.2] - 2020-07-25

### Added

@@ -1001,7 +1044,21 @@ Model checkpoint is now saved for all epochs again.

- This CHANGELOG file to hopefully serve as an evolving example of a standardized open source project CHANGELOG.

[0.45.2]: https://github.com/ThilinaRajapakse/simpletransformers/compare/3e98361...HEAD
[0.46.5]: https://github.com/ThilinaRajapakse/simpletransformers/compare/2cc77f7...HEAD

[0.46.3]: https://github.com/ThilinaRajapakse/simpletransformers/compare/7f37cb7...2cc77f7

[0.46.2]: https://github.com/ThilinaRajapakse/simpletransformers/compare/b64637c...7f37cb7

[0.46.1]: https://github.com/ThilinaRajapakse/simpletransformers/compare/121cba4...b64637c

[0.46.0]: https://github.com/ThilinaRajapakse/simpletransformers/compare/120d1e6...121cba4

[0.45.5]: https://github.com/ThilinaRajapakse/simpletransformers/compare/0ac6b69...120d1e6

[0.45.4]: https://github.com/ThilinaRajapakse/simpletransformers/compare/ac0f1a0...0ac6b69

[0.45.2]: https://github.com/ThilinaRajapakse/simpletransformers/compare/3e98361...ac0f1a0

[0.45.0]: https://github.com/ThilinaRajapakse/simpletransformers/compare/fad190f...3e98361

88 changes: 45 additions & 43 deletions README.md
@@ -1,6 +1,6 @@
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![Downloads](https://pepy.tech/badge/simpletransformers)](https://pepy.tech/project/simpletransformers)
<!-- ALL-CONTRIBUTORS-BADGE:START - Do not remove or modify this section -->
[![All Contributors](https://img.shields.io/badge/all_contributors-44-orange.svg?style=flat-square)](#contributors-)
[![All Contributors](https://img.shields.io/badge/all_contributors-45-orange.svg?style=flat-square)](#contributors-)
<!-- ALL-CONTRIBUTORS-BADGE:END -->

# Simple Transformers
@@ -210,10 +210,10 @@ Any feedback will be immensely helpful in improving the documentation! If you ha
- [Evaluating a Model](#evaluating-a-model)
- [Predicting from a trained Model](#predicting-from-a-trained-model)
- [Text Representation Generation](#text-representation-generation)
- [Minimal example for generating word embeddings](#minimal-start-for-generating-word-embeddings)
- [Minimal example for generating sentence embeddings](#minimal-start-for-generating-sentence-embeddings)
- [Minimal example for generating word embeddings](#minimal-example-for-generating-word-embeddings)
- [Minimal example for generating sentence embeddings](#minimal-example-for-generating-sentence-embeddings)
- [Regression](#regression)
- [Minimal Start for Regression](#minimal-start-for-regression)
- [Minimal Start for Regression](#minimal-start-for-regression)
- [Visualization Support](#visualization-support)
- [Experimental Features](#experimental-features)
- [Sliding Window For Long Sequences](#sliding-window-for-long-sequences)
@@ -224,7 +224,6 @@ Any feedback will be immensely helpful in improving the documentation! If you ha
- [*cache_dir: str*](#cache_dir-str)
- [*best_model_dir: str*](#best_model_dir-str)
- [*fp16: bool*](#fp16-bool)
- [*fp16_opt_level: str*](#fp16_opt_level-str)
- [*max_seq_length: int*](#max_seq_length-int)
- [*train_batch_size: int*](#train_batch_size-int)
- [*gradient_accumulation_steps: int*](#gradient_accumulation_steps-int)
@@ -263,6 +262,8 @@ Any feedback will be immensely helpful in improving the documentation! If you ha
- [*config*](#config)
- [Current Pretrained Models](#current-pretrained-models)
- [Acknowledgements](#acknowledgements)
- [How to Contribute](#how-to-contribute)
- [How to Update Docs](#how-to-update-docs)
- [Contributors ✨](#contributors-)
<!--te-->
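
The table-of-contents fixes above (`#minimal-start-for-...` changed to `#minimal-example-for-...`) follow from the anchors being derived from the heading text. The sketch below is only an approximation of that derivation, written to reproduce the anchors visible in this table of contents. The function name `heading_anchor` is hypothetical, and the rule (lowercase, drop punctuation and emoji, turn spaces into hyphens) is an assumption rather than GitHub's documented algorithm.

```python
import re


def heading_anchor(heading: str) -> str:
    """Approximate the link anchor generated for a Markdown heading (assumed rule)."""
    text = heading.strip().lower()
    # Keep only word characters, hyphens, and spaces; this drops '*', ':', emoji, etc.
    text = re.sub(r"[^\w\- ]", "", text)
    # Each remaining space becomes a hyphen.
    return "#" + text.replace(" ", "-")


word_anchor = heading_anchor("Minimal example for generating word embeddings")
arg_anchor = heading_anchor("*cache_dir: str*")
contributors_anchor = heading_anchor("Contributors ✨")
```

A check like this can help catch stale links when a heading is renamed but its table-of-contents entry is not.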

@@ -272,16 +273,14 @@ Any feedback will be immensely helpful in improving the documentation! If you ha

1. Install Anaconda or Miniconda Package Manager from [here](https://www.anaconda.com/distribution/)
2. Create a new virtual environment and install packages.
`conda create -n transformers python pandas tqdm`
`conda activate transformers`
`conda create -n st python pandas tqdm`
`conda activate st`
If using cuda:
&nbsp;&nbsp;&nbsp;&nbsp;`conda install pytorch cudatoolkit=10.1 -c pytorch`
&nbsp;&nbsp;&nbsp;&nbsp;`conda install "pytorch>=1.6" cudatoolkit=10.2 -c pytorch`
else:
&nbsp;&nbsp;&nbsp;&nbsp;`conda install pytorch cpuonly -c pytorch`

3. Install Apex if you are using fp16 training. Please follow the instructions [here](https://github.com/NVIDIA/apex). (Installing Apex from pip has caused issues for several people.)

4. Install simpletransformers.
3. Install simpletransformers.
`pip install simpletransformers`

#### Optional
@@ -1479,7 +1478,7 @@ LanguageGenerationModel has a few additional attributes in its `args` dictionary
```python
"do_sample": True,
"prompt": "",
"length": 20,
"max_length": 20,
"stop_token": None,
"temperature": 1.0,
"repetition_penalty": 1.0,
@@ -1646,6 +1645,14 @@ Note, you must set `evaluate_generated_text` to `True` to evaluate generated seq
import logging

import pandas as pd
import sklearn

from simpletransformers.classification import ClassificationModel
from simpletransformers.classification.multi_modal_classification_model import \
MultiModalClassificationModel
from simpletransformers.experimental.classification import ClassificationModel
from simpletransformers.language_representation import RepresentationModel
from simpletransformers.seq2seq import Seq2SeqModel
from simpletransformers.t5 import T5Model

logging.basicConfig(level=logging.INFO)
@@ -1954,10 +1961,7 @@ The prediction data should be a list of strings.
The `Seq2SeqModel` must be initialized with `encoder_decoder_type="bart"` and `encoder_decoder_name` set to a pre-trained model name or the path to a saved model directory.

```python
import logging

import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel

logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
@@ -2027,10 +2031,7 @@ The `Seq2SeqModel` must be initialized with `encoder_decoder_type="marian"` and
Everything else is identical to the Bart model usage.

```python
import logging

import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel

logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
@@ -2076,10 +2077,7 @@ for en, de in zip(src, predictions):
#### Generic Encoder-Decoder minimal start

```python
import logging

import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel

logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
@@ -2796,7 +2794,6 @@ If `label_list` is not given, `num_labels` is required and the labels should be
Create a `MultiModalClassificationModel`.

```python
from simpletransformers.classification.multi_modal_classification_model import MultiModalClassificationModel


model = MultiModalClassificationModel("bert", "bert-base-uncased")
@@ -2924,15 +2921,14 @@ _[Back to Table of Contents](#table-of-contents)_
---

## [Text Representation Generation](#text-representation-generation)
Use transformers language models to generate contextual word or sentence representations from text that you can then feed to any down-stream tasks of your preference.
For more complete examples of how to use this component with downstream tasks refer to: https://github.com/ThilinaRajapakse/simpletransformers/tree/master/examples/language_representation

Use transformers language models to generate contextual word or sentence representations from text that you can then feed to any down-stream tasks of your preference.
For more complete examples of how to use this component with downstream tasks refer to: https://github.com/ThilinaRajapakse/simpletransformers/tree/master/examples/language_representation

### Minimal example for generating word embeddings
Generate a list of contextual word embeddings for every sentence in a list
```python
from simpletransformers.language_representation import RepresentationModel


sentences = ["Example sentence 1", "Example sentence 2"]
model = RepresentationModel(
model_type="bert",
@@ -2942,11 +2938,10 @@ model = RepresentationModel(
word_vectors = model.encode_sentences(sentences, combine_strategy=None)
assert word_vectors.shape == (2, 5, 768) # token vector for every token in each sentence, bert based models add 2 tokens per sentence by default ([CLS] & [SEP])
```
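
To make the shape in the assertion above concrete, here is a dependency-free sketch that uses plain Python lists and a toy hidden size of 4 instead of 768. It does not call simpletransformers: the helpers `toy_token_vectors` and `mean_pool` are hypothetical, a whitespace split stands in for the real tokenizer, and the vectors are fake. It only illustrates why each three-word sentence yields 5 token vectors (3 words plus `[CLS]` and `[SEP]`) and how averaging over the token axis, which is conceptually what a "mean" combine strategy does, leaves one vector per sentence.

```python
HIDDEN = 4  # toy hidden size; the assertion above implies 768 for the real model


def toy_token_vectors(sentence):
    """Return one fake vector per token, including the two special tokens."""
    # Assumption: a whitespace split stands in for the real tokenizer.
    tokens = ["[CLS]"] + sentence.split() + ["[SEP]"]
    # Deterministic placeholder values so the result is easy to check by hand.
    return [[float(i + j) for j in range(HIDDEN)] for i in range(len(tokens))]


def mean_pool(token_vectors):
    """Average the token vectors element-wise into a single sentence vector."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


sentences = ["Example sentence 1", "Example sentence 2"]
word_vectors = [toy_token_vectors(s) for s in sentences]
sentence_vectors = [mean_pool(v) for v in word_vectors]

word_shape = (len(word_vectors), len(word_vectors[0]), len(word_vectors[0][0]))
sentence_shape = (len(sentence_vectors), len(sentence_vectors[0]))
```

Here `word_shape` has three axes (sentences, tokens, hidden size), while `sentence_shape` has only two because the token axis has been averaged away.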
### Minimal example for generating sentence embeddings

### Minimal example for generating sentence embeddings
Same code as for generating word embeddings; the only difference is that we pass the `combine_strategy="mean"` parameter.
```python
from simpletransformers.language_representation import RepresentationModel
sentences = ["Example sentence 1", "Example sentence 2"]
model = RepresentationModel(
model_type="bert",
@@ -2975,8 +2970,6 @@ Regression can be used with either single sentence or sentence pair tasks.
#### Minimal Start for Regression

```python
from simpletransformers.classification import ClassificationModel
import pandas as pd


train_data = [
@@ -3041,7 +3034,6 @@ _[Back to Table of Contents](#table-of-contents)_
To use experimental features, import from `simpletransformers.experimental.X`

```python
from simpletransformers.experimental.classification import ClassificationModel
```

### Sliding Window For Long Sequences
@@ -3062,9 +3054,6 @@ Currently available on binary and multiclass classification models of the follow
Set `sliding_window` to `True` for the ClassificationModel to enable this feature.

```python
from simpletransformers.classification import ClassificationModel
import pandas as pd
import sklearn

# Train and Evaluation data need to be in a Pandas Dataframe of two columns. The first column is the text with type str, and the second column is the label with type int.
train_data = [['Example sentence belonging to class 1' * 50, 1], ['Example sentence belonging to class 0', 0], ['Example 2 sentence belonging to class 0', 0]] + [['Example sentence belonging to class 0', 0] for i in range(12)]
@@ -3132,7 +3121,6 @@ self.args = {
"best_model_dir": "outputs/best_model/",

"fp16": True,
"fp16_opt_level": "O1",
"max_seq_length": 128,
"train_batch_size": 8,
"eval_batch_size": 8,
@@ -3192,10 +3180,7 @@ The directory where cached files will be saved.
The directory where the best model checkpoint will be saved when `evaluate_during_training` is enabled. The checkpoint is updated whenever the training loop achieves a new lowest evaluation loss, which is calculated every `evaluate_during_training_steps` steps or at the end of an epoch.

#### *fp16: bool*
Whether or not fp16 mode should be used. Requires NVidia Apex library.

#### *fp16_opt_level: str*
Can be '01', '02', '03'. See the [Apex docs](https://nvidia.github.io/apex/amp.html) for an explanation of the different optimization levels (opt_levels).
Whether or not fp16 mode should be used.
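
As background on why fp16 training relies on an AMP implementation rather than a plain cast, the following standard-library sketch shows a small value underflowing to zero in half precision and surviving once it is scaled up first. This is an illustration only, not how Simple Transformers or PyTorch implement it: the helper `to_fp16`, the gradient value, and the scale factor of 2**16 are all assumptions chosen for the example. In PyTorch 1.6 and later, `torch.cuda.amp.GradScaler` is the component that applies this kind of loss scaling.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


tiny_grad = 1e-8   # hypothetical gradient, below the fp16 representable range
scale = 65536.0    # hypothetical loss scale (2**16)

unscaled = to_fp16(tiny_grad)        # too small for fp16, so it becomes 0.0
scaled = to_fp16(tiny_grad * scale)  # scaled value fits in fp16
recovered = scaled / scale           # unscale again in full precision
```

Without scaling the gradient is lost entirely, while the scaled round trip recovers it to within fp16 rounding error.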

#### *max_seq_length: int*
Maximum sequence length the model will support.
@@ -3326,6 +3311,22 @@ None of this would have been possible without the hard work by the HuggingFace t

_<div>Icon for the Social Media Preview made by <a href="https://www.flaticon.com/authors/freepik" title="Freepik">Freepik</a> from <a href="https://www.flaticon.com/" title="Flaticon">www.flaticon.com</a></div>_

## How to Contribute

### How to Update Docs
The latest version of the docs is hosted on [GitHub Pages](https://simpletransformers.ai/). If you want to help document Simple Transformers, follow the steps below to edit the docs.
The docs are built with [Jekyll](https://jekyllrb.com/); refer to its website for a detailed explanation of how it works.
1) **Install [Jekyll](https://jekyllrb.com/)**: Run the command `gem install bundler jekyll`
2) **Visualizing the docs on your local computer**:
In your terminal, `cd` into the docs directory of this repo, e.g. `cd simpletransformers/docs`
From the docs directory, run this command to serve the Jekyll docs locally: `bundle exec jekyll serve`
Browse to http://localhost:4000, or whatever URL you see in the console, to view the docs.
3) **Edit and visualize changes**:
All the section pages of our docs can be found under the `docs/_docs` directory. You can edit any file using Markdown and view the changes after refreshing the browser tab.

**Note**: The docs in the README.md file will soon be deprecated and removed, so we don't recommend spending time on them.

## Contributors ✨

Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
@@ -3390,7 +3391,8 @@ Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/d
</tr>
<tr>
<td align="center"><a href="https://github.com/taranais"><img src="https://avatars1.githubusercontent.com/u/859916?v=4" width="100px;" alt=""/><br /><sub><b>taranais</b></sub></a><br /><a href="https://github.com/ThilinaRajapakse/simpletransformers/commits?author=taranais" title="Code">💻</a></td>
<td align="center"><a href="http://pablomarino.me"><img src="https://avatars1.githubusercontent.com/u/14850762?v=4" width="100px;" alt=""/><br /><sub><b>Pablo N. Marino</b></sub></a><br /><a href="https://github.com/ThilinaRajapakse/simpletransformers/commits?author=pablonm3" title="Code">💻</a></td>
<td align="center"><a href="http://pablomarino.me"><img src="https://avatars1.githubusercontent.com/u/14850762?v=4" width="100px;" alt=""/><br /><sub><b>Pablo N. Marino</b></sub></a><br /><a href="https://github.com/ThilinaRajapakse/simpletransformers/commits?author=pablonm3" title="Code">💻</a> <a href="https://github.com/ThilinaRajapakse/simpletransformers/commits?author=pablonm3" title="Documentation">📖</a></td>
<td align="center"><a href="http://linkedin.com/in/strawberrypie/"><img src="https://avatars2.githubusercontent.com/u/29224443?v=4" width="100px;" alt=""/><br /><sub><b>Anton Kiselev</b></sub></a><br /><a href="https://github.com/ThilinaRajapakse/simpletransformers/commits?author=strawberrypie" title="Code">💻</a> <a href="https://github.com/ThilinaRajapakse/simpletransformers/commits?author=strawberrypie" title="Documentation">📖</a></td>
</tr>
</table>

6 changes: 6 additions & 0 deletions docs/_data/navigation.yml
@@ -107,6 +107,12 @@ docs:
url: /docs/seq2seq-data-formats/
- title: "Seq2Seq Minimal Start"
url: /docs/seq2seq-minimal-start/
- title: Text Representation Generation
children:
- title: "Text Representation Examples"
url: /docs/text-rep-examples/
- title: "Text Representation Model"
url: /docs/text-rep-model/
- title: Meta
children:
- title: "History"