
spaCy Training Performance With different configuration and set up #7

Open
YanLiang1102 opened this issue May 27, 2018 · 13 comments


YanLiang1102 commented May 27, 2018

spaCy training output

'dep_loss', 'tag_loss', 'uas', 'tags_acc', 'token_acc', 'ents_p', 'ents_r', 'ents_f', 'cpu_wps', 'gpu_wps'
Evaluated on eval_data:
29, 0.000, 11.698, 0.000, 58.589, 49.871, 53.880, 91.894, 85.899, 15363.7, 0.0
Evaluated on training_data (which shows the model does learn):
29, 0.000, 13.261, 0.000, 81.946, 73.552, 77.523, 91.815, 85.866, 13092.9, 0.0
Based on these numbers I think the model does work; we just don't have enough data. We only have 401 tagged documents in the OntoNotes data, and all the entity tags come from those 401 documents.

Exception that spaCy throws during training (I made the code swallow the exception):

[E067] Invalid BILUO tag sequence: Got a tag starting with 'I' (inside an entity) without a preceding 'B' (beginning of an entity). Tag sequence:
['O', 'U-GPE', 'O', 'B-EVENT', 'I-EVENT', 'L-EVENT', 'B-EVENT', 'I-EVENT', 'I-EVENT', 'I-EVENT', 'L-EVENT', 'B-EVENT', 'I-EVENT', 'I-EVENT', 'I-EVENT', 'L-EVENT', 'B-ORG', 'L-ORG', 'U-GPE" S_OFF="1', 'U-CARDINAL', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'U-GPE', 'O', 'O', 'O', 'O', 'O', 'O', 'B-EVENT', 'I-EVENT', 'L-EVENT', 'U-ORDINAL', 'B-DATE', 'I-DATE', 'L-DATE', 'O', 'O', 'B-ORG', 'I-ORG', 'L-ORG', 'O', 'O', 'O', 'O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'B-FAC', 'L-FAC', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'I-ORG', 'L-ORG', 'O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'O', 'O', 'B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'B-ORG', 'I-ORG', 'I-ORG', 'L-ORG', 'B-PERSON', 'L-PERSON', 'O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'U-NORP', 'O', 'O', 'O', 'B-PERSON', 'L-PERSON', 'U-DATE', 'O', 'U-GPE', 'O', 'U-GPE', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'L-ORG', 'O', 'O', 'O', 'O', 'O', 'B-EVENT', 'I-EVENT', 'I-EVENT', 'I-EVENT', 'I-EVENT', 'L-EVENT', 'O', 'B-GPE', 'I-GPE', 'L-GPE', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'U-GPE', 'B-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'L-ORG', 'O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-FAC', 'L-FAC', 'O', 'U-GPE', 'O', 'O', 'B-TIME', 'I-TIME', 'L-TIME', 'U-DATE', 'O', 'B-FAC', 'I-FAC', 'I-FAC', 'L-FAC', 'O', 'O', 'O', 'B-EVENT', 'I-EVENT', 'I-EVENT', 'I-EVENT', 'I-EVENT', 'I-EVENT', 'I-EVENT', 'L-EVENT', 'O', 'O', 'B-ORG', 'L-ORG', 'O', 'U-ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'U-ORDINAL', 'O', 'O', 'O', 'O', 'U-EVENT', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'U-ORDINAL" S_OFF="1', 'O', 'B-ORG', 'L-ORG', 'O', 'B-ORG', 'L-ORG', 'O', 'O', 'U-TIME" S_OFF="1', 'U-DATE', 'U-CARDINAL', 'B-FAC', 'I-FAC', 'I-FAC', 'I-FAC', 'I-FAC', 'L-FAC', 'U-GPE', 'O', 'B-ORG', 'L-ORG', 'U-CARDINAL', 'O', 'U-CARDINAL', 'O', 'B-CARDINAL', 'I-CARDINAL', 'L-CARDINAL', 'O', 'O', 'O', 'B-CARDINAL', 'L-CARDINAL', 'O', 'O', 'B-CARDINAL', 'L-CARDINAL', 'O', 'O', 'B-CARDINAL', 'L-CARDINAL', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'O', 'O', 'O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'B-PERSON', 'I-PERSON', 'L-PERSON', 'O', 'O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'O', 'O', 'U-DATE', 'U-CARDINAL', 'O', 'O', 'B-TIME', 'L-TIME', 'U-ORG', 'O', 'U-ORG', 'B-FAC', 'I-FAC', 'I-FAC', 'L-FAC', 'O', 'U-TIME', 'U-ORG', 'O', 'U-ORG', 'B-FAC', 'I-FAC', 'I-FAC', 'L-FAC', 'O', 'U-TIME', 'U-ORG', 'O', 'U-FAC', 'O', 'O', 'O', 'O', 'U-TIME', 'U-ORG', 'O', 'B-ORG', 'L-ORG', 'O', 'U-FAC', 'O', 'O', 'U-TIME', 'U-ORG', 'O', 'U-ORG', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'L-ORG', 'O', 'B-ORG', 'L-ORG', 'O', 'U-GPE" S_OFF="1', 'O', 'O', 'O', 'O', 'O', 'B-TIME', 'I-TIME', 'I-TIME', 'L-TIME', 'B-DATE', 'I-DATE', 'L-DATE', 'O', 'O', 'B-FAC', 'L-FAC', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'L-ORG', 'O', 'O', 'O', 'U-CARDINAL', 'O', 'O', 'O', 'O', 'O', 'U-ORDINAL', 'O', 'U-DATE', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'L-ORG', 'O', 'O', 'B-DATE', 'L-DATE', 'O', 
'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'U-CARDINAL', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'U-CARDINAL', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'L-ORG', 'O', 'U-DATE']
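
Rather than swallowing the exception, one option is to scan for these invalid transitions up front and drop or repair the offending documents. A minimal hand-rolled sketch (not a spaCy API; `tags` is a hypothetical list like the sequence above):

```python
# Sketch: find positions where an I- or L- tag is not preceded by a B- or I-
# tag of the same entity type, which is what triggers spaCy's E067 error.
def find_invalid_biluo(tags):
    bad = []
    prev = "O"
    for i, tag in enumerate(tags):
        if tag and tag[0] in ("I", "L"):
            if prev[0] not in ("B", "I") or prev[2:] != tag[2:]:
                bad.append(i)
        prev = tag
    return bad

tags = ["O", "U-GPE", "O", "B-EVENT", "I-EVENT", "L-EVENT", "I-ORG"]
print(find_invalid_biluo(tags))  # -> [6]: 'I-ORG' has no preceding 'B-ORG'
```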
@YanLiang1102 YanLiang1102 changed the title Spacy Training Performance spaCy Training Performance May 27, 2018
@YanLiang1102

99 0.000 3.572 0.000 57.933 49.355 53.302 91.894 85.899 1438.3 0.0
Result on eval_data after 100 iterations; the previous result was after 30 iterations, so more iterations are not helping.


YanLiang1102 commented Jun 4, 2018


YanLiang1102 commented Jun 7, 2018

After adding in ANERCorp, here is the accuracy:
29 0.000 11.407 0.000 58.158 50.018 53.782 91.894 85.899 6835.0 0.0
[image]

This is because, out of 150k entity tokens, 88% of them are useless 'O' tokens, so our performance does not get enhanced. @ahalterman @cegme

Tag accuracy goes down a little bit.
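
For reference, the 88% figure can be checked with a quick count over the BILOU tags. A sketch, assuming `tag_lists` is a hypothetical list of per-document tag sequences like the ones in our converted data:

```python
from collections import Counter

# Sketch: how dominant is the 'O' tag? `tag_lists` is a hypothetical list of
# BILOU tag sequences, one per document.
counts = Counter(tag for tags in tag_lists for tag in tags)
total = sum(counts.values())
print(f"'O' fraction: {counts['O'] / total:.1%}")  # ~88% on our data
print(counts.most_common(5))
```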

LDC+ANERCorp with no merged classes, but with fastText pretrained embeddings:
29 0.000 10.027 0.000 56.598 51.344 53.843 91.894 85.899 13987.0 0.0

When trained for only 10 iterations it got better performance:
9 0.000 270.721 0.000 56.944 52.855 54.823 91.894 85.899 13959.3 0.0

@ahalterman

Well, not exactly "useless", since we need to be able to distinguish between entities and non-entities.

What are the different numbers in the accuracy output? Does each one represent a tag type? We need to figure out how to handle ANER not having the full range of labels that OntoNotes has. One way would be to go from spaCy format to Prodigy format, where each task is a single entity label rather than all highlighted entities. Then, when we use the more limited ANER data, we're not incorrectly telling it there's no entity where there actually is one.

It would also be a cool experiment to know whether "Prodigy-style" training underperforms spaCy training (and by how much).
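
A rough sketch of what that spaCy-to-Prodigy explosion could look like, one highlighted entity per task. The `text`/`spans` fields follow Prodigy's JSONL span format, but treat the exact schema as an assumption:

```python
# Sketch: explode one spaCy-style training example into Prodigy-style tasks,
# one entity span per task. Assumption: the spaCy example is
# (text, {"entities": [(start, end, label), ...]}).
def spacy_to_prodigy_tasks(text, annotations):
    tasks = []
    for start, end, label in annotations["entities"]:
        tasks.append({
            "text": text,
            "spans": [{"start": start, "end": end, "label": label}],
        })
    return tasks

text = "Yan visited Cairo."
ann = {"entities": [(0, 3, "PERSON"), (12, 17, "GPE")]}
for task in spacy_to_prodigy_tasks(text, ann):
    print(task)
```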


YanLiang1102 commented Jun 11, 2018

@ahalterman @cegme So the token accuracy went from 58.589 down to 58.158, but the entity accuracy improved from 49.871 to 50.018.
For the difference between these two metrics, take a look here:
http://web.stanford.edu/class/cs224n/assignment3/assignment3.pdf
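
For anyone skimming: token accuracy counts every token (including the easy 'O's), while the entity scores compare predicted spans against gold spans. A toy illustration with made-up counts:

```python
# Sketch: entity-level precision/recall/F1 from hypothetical span counts.
tp, fp, fn = 50, 20, 30  # true positives, false positives, false negatives
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
# Token accuracy, by contrast, is just correct_tokens / total_tokens and is
# inflated by the huge number of easy 'O' tokens.
```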


YanLiang1102 commented Jun 11, 2018

http://users.dsic.upv.es/~ybenajiba/downloads.html
I am thinking of using the ANERGazet from the link above to filter out the entities in some Arabic docs we have and train the model based on that.
The ANER data has been converted from ANER format to BILOU format, and yes, each word has a tag on it.
What does the Prodigy one look like?
All the labeled tags have been transferred into one large document; here is what the data looks like:
https://raw.githubusercontent.com/oudalab/Arabic-NER/master/data/ANERCorp_conll_new.ner.json
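
If it helps, spaCy 2.x ships a helper for exactly this conversion (`iob_to_biluo` in `spacy.gold`). A minimal sketch with made-up tags:

```python
# Sketch: convert CoNLL/ANER-style IOB tags to spaCy's BILOU scheme.
# iob_to_biluo ships with spaCy 2.x; the example tags are made up.
from spacy.gold import iob_to_biluo

iob_tags = ["O", "B-PERS", "I-PERS", "O", "B-LOC"]
print(iob_to_biluo(iob_tags))
# -> ['O', 'B-PERS', 'L-PERS', 'O', 'U-LOC']
```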

Some related stuff I found that is similar to what you are talking about, @ahalterman: https://support.prodi.gy/t/remarkable-difference-between-prodigy-and-custom-training-times/467/3


YanLiang1102 commented Jun 13, 2018

Pretrained embedding stuff: @ahalterman, I wonder if this is what you are talking about, or do you have better examples?
explosion/spaCy#2084

https://spacy.io/usage/vectors-similarity#custom
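
Following that vectors-similarity page, a minimal sketch of loading fastText `.vec` vectors into a spaCy vocab with `vocab.set_vector` (spaCy 2.x API; the file path and the "ar" language code are assumptions):

```python
import numpy
import spacy

# Sketch: load fastText .vec embeddings into a blank spaCy pipeline.
# Assumes your spaCy version has an "ar" language class and that
# cc.ar.300.vec is a standard fastText text-format vector file.
nlp = spacy.blank("ar")
with open("cc.ar.300.vec", encoding="utf8") as f:
    f.readline()  # the first line of a .vec file is "n_rows n_dims"
    for line in f:
        pieces = line.rstrip().split(" ")
        word, vector = pieces[0], numpy.asarray(pieces[1:], dtype="float32")
        nlp.vocab.set_vector(word, vector)
nlp.to_disk("/tmp/ar_fasttext_model")  # hypothetical output path
```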


YanLiang1102 commented Jun 13, 2018

1. Add embeddings.
2. Make a baseline (spaCy style on OntoNotes and ANER):
   • Combine GPE and LOC (see the label-merge sketch after this list)
   • Remove the labels that aren't in ANER...
3. Prodigy style with OntoNotes, ANER, Prodigy data.
4. Make "distantly supervised" data from wiki/Gigaword and train with Prodigy, using the ANERGazet provided here: http://users.dsic.upv.es/~ybenajiba/downloads.html
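
For the label merge in step 2, something like this sketch could run over the annotations before training (the KEEP set and tuple format are assumptions):

```python
# Sketch: merge tag classes before training, e.g. fold GPE into LOC and drop
# labels that ANER doesn't have. The label sets here are assumptions.
MERGE = {"GPE": "LOC"}
KEEP = {"LOC", "ORG", "PERSON", "MISC"}

def remap_entities(entities):
    """entities: list of (start, end, label) tuples in spaCy training format."""
    out = []
    for start, end, label in entities:
        label = MERGE.get(label, label)
        if label in KEEP:
            out.append((start, end, label))
    return out

print(remap_entities([(0, 5, "GPE"), (10, 14, "EVENT"), (20, 25, "ORG")]))
# -> [(0, 5, 'LOC'), (20, 25, 'ORG')]
```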

@YanLiang1102

To avoid the catastrophic forgetting problem, our plan is to use spaCy with LDC+ANERCorp to get the model; then we convert the LDC+ANER data into Prodigy style and update the model using LDC+ANERCorp plus the Prodigy user-labeled data.
We need to figure out how to update the model (probably in Prodigy), or find some CLI code to do that. @ahalterman
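
A rough sketch of that update loop in spaCy 2.x style, mixing the old LDC+ANERCorp examples back in with the Prodigy-labeled ones so the model doesn't forget them (all variable names and paths here are hypothetical):

```python
import random
import spacy

# Sketch: update an existing NER model while rehearsing old data to limit
# catastrophic forgetting. Training examples are assumed to be
# (text, {"entities": [(start, end, label), ...]}) pairs.
nlp = spacy.load("/path/to/ldc_anercorp_model")           # hypothetical path
optimizer = nlp.get_pipe("ner").create_optimizer()

mixed = old_ldc_aner_examples + prodigy_labeled_examples  # hypothetical lists
for itn in range(10):
    random.shuffle(mixed)
    losses = {}
    for text, annotations in mixed:
        nlp.update([text], [annotations], sgd=optimizer, drop=0.35, losses=losses)
    print(itn, losses)
nlp.to_disk("/path/to/updated_model")
```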

@YanLiang1102 YanLiang1102 changed the title spaCy Training Performance spaCy Training Performance With different configuration and set up Jun 20, 2018

YanLiang1102 commented Jun 24, 2018

Performance with pretrained embeddings and merged tag classes, the best so far:

'dep_loss'  'tag_loss'  'uas'  'tags_acc'  'token_acc'  'ents_p'  'ents_r'  'ents_f'  'cpu_wps'  'gpu_wps'
0.000       275.748      0.000  58.406      54.254       56.254    91.894    85.899    15447.2   0.0

Token accuracy is 58.406 and entity accuracy is 54.254.
Our baseline is 58.158 / 50.018 (ANERCorp+LDC, no pretrained embeddings, no merged NER tags).
With ANERCorp+LDC, merged tags, and pretrained embeddings, the token accuracy is similar (58.158 vs. 58.406), while the entity accuracy improved a lot, from 50.018 to 54.254. @ahalterman

@ahalterman

Can you add the header to indicate what the 11 numbers mean?


YanLiang1102 commented Jun 24, 2018

Yeah, it is at the top of this issue, and also here:
'itr', 'dep_loss', 'tag_loss', 'uas', 'tags_acc', 'token_acc', 'ents_p', 'ents_r', 'ents_f', 'cpu_wps', 'gpu_wps'
(evaluated on eval_data)


YanLiang1102 commented Jun 24, 2018

Training data tag distribution:
{'B-LOC': 1889,
'B-MISC': 3521,
'B-ORG': 3912,
'B-PERSON': 6014,
'I-LOC': 744,
'I-MISC': 3812,
'I-ORG': 3630,
'I-PERSON': 1665,
'L-LOC': 1875,
'L-MISC': 3518,
'L-ORG': 3891,
'L-PERSON': 5982,
'O': 358866,
'U-LOC': 7323,
'U-MISC': 7264,
'U-ORG': 2576,
'U-PERSON': 3149}
And the test data tag distribution:
{'B-LOC': 168,
'B-MISC': 307,
'B-ORG': 341,
'B-PERSON': 383,
'I-LOC': 46,
'I-MISC': 327,
'I-ORG': 465,
'I-PERSON': 93,
'L-LOC': 169,
'L-MISC': 307,
'L-ORG': 337,
'L-PERSON': 376,
'O': 27441,
'U-LOC': 384,
'U-MISC': 765,
'U-ORG': 173,
'U-PERSON': 204}

https://github.com/izarov/cs224n/blob/master/assignment3/handouts/assignment3-soln.pdf
A confusion matrix for NER analysis is a good way to check which tags really get confused!
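
A tiny sketch of that confusion matrix over aligned tag lists (`gold_tags` and `pred_tags` are hypothetical flat lists; a real run would use the model's predictions on the test set):

```python
from collections import Counter, defaultdict

# Sketch: per-tag confusion matrix, as in the CS224n handout linked above.
# Assumption: gold_tags and pred_tags are flat, aligned lists of BILOU tags.
def confusion(gold_tags, pred_tags):
    matrix = defaultdict(Counter)
    for g, p in zip(gold_tags, pred_tags):
        matrix[g][p] += 1
    return matrix

matrix = confusion(["U-LOC", "O", "U-ORG"], ["U-GPE", "O", "U-ORG"])
for gold, row in matrix.items():
    print(gold, dict(row))  # e.g. U-LOC {'U-GPE': 1} shows LOC/GPE confusion
```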
