Commit

Update README
Hironsan committed Nov 24, 2017
1 parent 9cb828f commit aea21e6
Showing 2 changed files with 53 additions and 26 deletions.
64 changes: 38 additions & 26 deletions README.md
@@ -1,21 +1,22 @@
# anaGo
-***anaGo*** is a state-of-the-art library for sequence labeling using Keras.
+***anaGo*** is a Keras implementation of sequence labeling.

-anaGo can performs named-entity recognition (NER), part-of-speech tagging (POS tagging), semantic role labeling (SRL) and so on for **many languages**.
-For example, **English Named-Entity Recognition** is shown in the following picture:
+anaGo can perform Named Entity Recognition (NER), Part-of-Speech tagging (POS tagging), semantic role labeling (SRL) and so on for **many languages**.
+For example, the following picture shows **Named Entity Recognition in English**:
<img src="https://github.com/Hironsan/anago/blob/docs/docs/images/example.en2.png?raw=true">

-**Japanese Named-Entity Recognition** is shown in the following picture:
+The following picture shows **Named Entity Recognition in Japanese**:
<img src="https://github.com/Hironsan/anago/blob/docs/docs/images/example.ja2.png?raw=true">

-Similarly, **you can solve your task for your language.**
+Similarly, **you can solve your task (NER, POS,...) for your language.**
You don't have to define features.
You have only to prepare input and output data. :)

-## Feature Support
-anaGo provide following features:
-* learning your own task without any knowledge.
-* defining your own model.
-* ~~(Not yet supported)downloading learned model for many tasks. (e.g. NER, POS Tagging, etc...)~~
+## anaGo Support Features
+anaGo supports following features:
+* training the model without any features.
+* defining the custom model.
+* downloading pre-trained models.


## Install
@@ -34,8 +35,8 @@ $ pip install -r requirements.txt
```

## Data and Word Vectors
-The data must be in the following format(tsv).
-We provide an example in train.txt:
+Training data takes a tsv format.
+The following text is an example of training data:

```
EU B-ORG
@@ -52,7 +53,7 @@ Peter B-PER
Blackburn I-PER
```

-You also need to download [GloVe vectors](https://nlp.stanford.edu/projects/glove/) and store it in *data/glove.6B* directory.
+anaGo supports pre-trained word embeddings like [GloVe vectors](https://nlp.stanford.edu/projects/glove/).

## Get Started
### Import
@@ -63,7 +64,7 @@ from anago.reader import load_data_and_labels
```

### Loading data
-After importing the modules, load training, validation and test dataset:
+After importing the modules, load [training, validation and test dataset](https://github.com/Hironsan/anago/blob/master/data/conll2003/en/ner/):
```python
x_train, y_train = load_data_and_labels('train.txt')
x_valid, y_valid = load_data_and_labels('valid.txt')
@@ -74,13 +75,13 @@ Now we are ready for training :)


### Training a model
-Let's train a model. For training a model, we can use train method:
+Let's train a model. To train a model, call `train` method:
```python
model = anago.Sequence()
model.train(x_train, y_train, x_valid, y_valid)
```

-If training is progressing normally, progress bar will be displayed as follows:
+If training is progressing normally, progress bar would be displayed:

```commandline
...
@@ -98,7 +99,7 @@ Epoch 5/15


### Evaluating a model
-To evaluate the trained model, we can use eval method:
+To evaluate the trained model, call `eval` method:

```python
model.eval(x_test, y_test)
@@ -111,20 +112,21 @@ After evaluation, F1 value is output:

### Tagging a sentence
Let's try tagging a sentence, "President Obama is speaking at the White House."
-We can do it as follows:
+To tag a sentence, call `analyze` method:

```python
>>> words = 'President Obama is speaking at the White House.'.split()
>>> model.analyze(words)
{
'words': [
-'President',
-'Obama',
-'is',
-'speaking',
-'at',
-'the',
-'White',
-'House.'
+'President',
+'Obama',
+'is',
+'speaking',
+'at',
+'the',
+'White',
+'House.'
],
'entities': [
{
@@ -145,6 +147,16 @@ We can do it as follows:
}
```

+### Downloading pre-trained models
+To download a pre-trained model, call `download` function:
+```python
+from anago.utils import download
+
+dir_path = 'models'
+url = 'https://storage.googleapis.com/chakki/datasets/public/models.zip'
+download(url, dir_path)
+model = anago.Sequence.load(dir_path)
+```

## Reference
This library uses bidirectional LSTM + CRF model based on
15 changes: 15 additions & 0 deletions tests/wrapper_test.py
@@ -6,6 +6,7 @@

import anago
from anago.reader import load_data_and_labels, load_glove
+from anago.utils import download

get_path = lambda path: os.path.join(os.path.dirname(__file__), path)
DATA_ROOT = get_path('../data/conll2003/en/ner')
@@ -91,3 +92,17 @@ def test_train_vocab_init(self):
        model = anago.Sequence(max_epoch=15, embeddings=self.embeddings, log_dir='logs')
        model.train(self.x_train, self.y_train, self.x_test, self.y_test, vocab_init=vocab)
        model.save(dir_path=self.dir_path)
+
+    def test_train_all(self):
+        x_train = np.r_[self.x_train, self.x_valid, self.x_test]
+        y_train = np.r_[self.y_train, self.y_valid, self.y_test]
+        model = anago.Sequence(max_epoch=15, embeddings=self.embeddings, log_dir='logs')
+        model.train(x_train, y_train, self.x_test, self.y_test)
+        model.save(dir_path=self.dir_path)
+
+    def test_download(self):
+        dir_path = 'test_dir'
+        url = 'https://storage.googleapis.com/chakki/datasets/public/models.zip'
+        download(url, dir_path)
+        model = anago.Sequence.load(dir_path)
+        model.eval(self.x_test, self.y_test)
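
`test_train_all` stacks the training, validation and test splits with `np.r_` before training on all of them. The following is a minimal sketch of that concatenation, not anaGo code: it assumes each split is a one-dimensional NumPy object array holding one token list per sentence, and the helper `as_object_array` and the toy sentences are hypothetical.

```python
import numpy as np


def as_object_array(sentences):
    """Hypothetical helper: pack token lists into a 1-D object array.

    np.array(sentences, dtype=object) would build a 2-D array whenever all
    sentences happen to have the same length, so fill an empty array instead.
    """
    arr = np.empty(len(sentences), dtype=object)
    for i, sentence in enumerate(sentences):
        arr[i] = sentence  # each element stores one Python list of tokens
    return arr


# Toy splits (hypothetical data); the first split has equal-length sentences.
x_train = as_object_array([['EU', 'rejects'], ['Peter', 'Blackburn']])
x_valid = as_object_array([['Japan', 'wins']])
x_test = as_object_array([['Obama', 'is', 'speaking']])

# np.r_ joins 1-D arrays end to end along the first axis, like np.concatenate.
x_all = np.r_[x_train, x_valid, x_test]

print(x_all.shape)  # (4,)
print(x_all[2])
```

With plain Python lists, the same result is simply `x_train + x_valid + x_test`. Recent NumPy versions raise an error when asked to build an array from ragged nested lists without `dtype=object`, which is why the splits are assumed to be object arrays in this sketch.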

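The README diff describes training data as one token and its tag per line (for example `EU B-ORG`, `Peter B-PER`, `Blackburn I-PER`), and `analyze` returns the tagged words grouped into entities. A minimal sketch of both steps for BIO tags follows. It is not anaGo's `load_data_and_labels` or `analyze`: the function names, the blank line between sentences, and the keys of the entity dictionaries are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative input (assumed layout): one "token<TAB>tag" pair per line,
# with a blank line between sentences.
SAMPLE = "EU\tB-ORG\nrejects\tO\n\nPeter\tB-PER\nBlackburn\tI-PER\n"


def read_tagged_sentences(text: str) -> Tuple[List[List[str]], List[List[str]]]:
    """Hypothetical reader: split token/tag lines into per-sentence lists."""
    sentences, labels = [], []
    words, tags = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line closes the current sentence
            if words:
                sentences.append(words)
                labels.append(tags)
                words, tags = [], []
            continue
        word, tag = line.split()  # expects exactly two whitespace-separated columns
        words.append(word)
        tags.append(tag)
    if words:  # the last sentence may not be followed by a blank line
        sentences.append(words)
        labels.append(tags)
    return sentences, labels


def bio_to_entities(words: List[str], tags: List[str]) -> List[Dict[str, object]]:
    """Hypothetical decoder: group BIO tags into spans ('end' is exclusive)."""
    entities: List[Dict[str, object]] = []
    start, etype = 0, None  # type: int, Optional[str]
    for i, tag in enumerate(tags + ['O']):  # trailing 'O' flushes the last entity
        # A span ends at 'O', at any 'B-', or at an 'I-' of a different type.
        if tag == 'O' or tag.startswith('B-') or (tag.startswith('I-') and tag[2:] != etype):
            if etype is not None:
                entities.append({'type': etype, 'text': ' '.join(words[start:i]),
                                 'begin': start, 'end': i})
                etype = None
            if tag != 'O':  # 'B-X', or a stray 'I-X', starts a new span
                start, etype = i, tag[2:]
    return entities


sentences, labels = read_tagged_sentences(SAMPLE)
for words, tags in zip(sentences, labels):
    print(bio_to_entities(words, tags))
```

Treating a stray `I-X` as the start of a new entity is one common lenient convention; a stricter decoder could reject such tag sequences instead.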