Use unique and most common ngrams as absolute confidence metric (#235)
Showing 908 changed files with 3,663 additions and 1,423 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Afrikaans ##### | ||
|
||
>>> Accuracy on average: 72.97% | ||
|
||
>> Detection of 1000 single words (average length: 8 chars) | ||
Accuracy: 45.90% | ||
Erroneously classified as Unknown: 54.10% | ||
|
||
>> Detection of 1000 word pairs (average length: 16 chars) | ||
Accuracy: 74.10% | ||
Erroneously classified as Unknown: 25.90% | ||
|
||
>> Detection of 1000 sentences (average length: 102 chars) | ||
Accuracy: 98.90% | ||
Erroneously classified as Unknown: 1.10% | ||
|
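The headline figure in each report appears to be the unweighted mean of the three per-category accuracies. The sketch below checks that against the Afrikaans report. It is an assumption inferred from the numbers shown, not something the commit states, and `parse_report` and `mean_accuracy` are hypothetical helper names, not part of the library.

```python
import re

# Afrikaans report text as it appears in the diff, with table pipes stripped.
REPORT = """\
##### Afrikaans #####

>>> Accuracy on average: 72.97%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 45.90%
Erroneously classified as Unknown: 54.10%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 74.10%
Erroneously classified as Unknown: 25.90%

>> Detection of 1000 sentences (average length: 102 chars)
Accuracy: 98.90%
Erroneously classified as Unknown: 1.10%
"""


def parse_report(text):
    """Return (language, reported average, per-category accuracies)."""
    language = re.search(r"^#+ (.+?) #+$", text, re.MULTILINE).group(1)
    average = float(re.search(r"Accuracy on average: ([\d.]+)%", text).group(1))
    # Only lines that start with "Accuracy:" are per-category values;
    # the ">>> Accuracy on average" line does not match at line start.
    accuracies = [
        float(value)
        for value in re.findall(r"^Accuracy: ([\d.]+)%", text, re.MULTILINE)
    ]
    return language, average, accuracies


def mean_accuracy(accuracies):
    """Unweighted mean, rounded to two decimals (assumed aggregation)."""
    return round(sum(accuracies) / len(accuracies), 2)


language, average, accuracies = parse_report(REPORT)
print(language, average, mean_accuracy(accuracies))
```

The same check can be run on any other report in this commit by swapping in its text. A mismatch would suggest the average is weighted or computed differently.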
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Albanian ##### | ||
|
||
>>> Accuracy on average: 81.40% | ||
|
||
>> Detection of 1000 single words (average length: 8 chars) | ||
Accuracy: 57.30% | ||
Erroneously classified as Unknown: 42.70% | ||
|
||
>> Detection of 1000 word pairs (average length: 15 chars) | ||
Accuracy: 87.10% | ||
Erroneously classified as Unknown: 12.90% | ||
|
||
>> Detection of 1000 sentences (average length: 118 chars) | ||
Accuracy: 99.80% | ||
Erroneously classified as Unknown: 0.20% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Arabic ##### | ||
|
||
>>> Accuracy on average: 94.77% | ||
|
||
>> Detection of 1000 single words (average length: 6 chars) | ||
Accuracy: 86.40% | ||
Erroneously classified as Unknown: 13.60% | ||
|
||
>> Detection of 1000 word pairs (average length: 14 chars) | ||
Accuracy: 98.10% | ||
Erroneously classified as Unknown: 1.90% | ||
|
||
>> Detection of 1000 sentences (average length: 89 chars) | ||
Accuracy: 99.80% | ||
Erroneously classified as Unknown: 0.20% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Armenian ##### | ||
|
||
>>> Accuracy on average: 100.00% | ||
|
||
>> Detection of 1000 single words (average length: 9 chars) | ||
Accuracy: 100.00% | ||
Erroneously classified as | ||
|
||
>> Detection of 1000 word pairs (average length: 18 chars) | ||
Accuracy: 100.00% | ||
Erroneously classified as | ||
|
||
>> Detection of 1000 sentences (average length: 122 chars) | ||
Accuracy: 100.00% | ||
Erroneously classified as | ||
|
16 changes: 16 additions & 0 deletions
accuracy-reports/lingua-azerbaijani-detector/Azerbaijani.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Azerbaijani ##### | ||
|
||
>>> Accuracy on average: 89.87% | ||
|
||
>> Detection of 1000 single words (average length: 8 chars) | ||
Accuracy: 76.10% | ||
Erroneously classified as Unknown: 23.90% | ||
|
||
>> Detection of 1000 word pairs (average length: 16 chars) | ||
Accuracy: 93.70% | ||
Erroneously classified as Unknown: 6.30% | ||
|
||
>> Detection of 1000 sentences (average length: 107 chars) | ||
Accuracy: 99.80% | ||
Erroneously classified as Unknown: 0.20% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Basque ##### | ||
|
||
>>> Accuracy on average: 77.43% | ||
|
||
>> Detection of 1000 single words (average length: 9 chars) | ||
Accuracy: 53.30% | ||
Erroneously classified as Unknown: 46.70% | ||
|
||
>> Detection of 1000 word pairs (average length: 17 chars) | ||
Accuracy: 79.10% | ||
Erroneously classified as Unknown: 20.90% | ||
|
||
>> Detection of 1000 sentences (average length: 102 chars) | ||
Accuracy: 99.90% | ||
Erroneously classified as Unknown: 0.10% | ||
|
16 changes: 16 additions & 0 deletions
accuracy-reports/lingua-belarusian-detector/Belarusian.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Belarusian ##### | ||
|
||
>>> Accuracy on average: 92.60% | ||
|
||
>> Detection of 1000 single words (average length: 8 chars) | ||
Accuracy: 81.70% | ||
Erroneously classified as Unknown: 18.30% | ||
|
||
>> Detection of 1000 word pairs (average length: 17 chars) | ||
Accuracy: 96.50% | ||
Erroneously classified as Unknown: 3.50% | ||
|
||
>> Detection of 1000 sentences (average length: 105 chars) | ||
Accuracy: 99.60% | ||
Erroneously classified as Unknown: 0.40% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Bengali ##### | ||
|
||
>>> Accuracy on average: 100.00% | ||
|
||
>> Detection of 1000 single words (average length: 7 chars) | ||
Accuracy: 100.00% | ||
Erroneously classified as | ||
|
||
>> Detection of 1000 word pairs (average length: 15 chars) | ||
Accuracy: 100.00% | ||
Erroneously classified as | ||
|
||
>> Detection of 1000 sentences (average length: 87 chars) | ||
Accuracy: 100.00% | ||
Erroneously classified as | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Bokmal ##### | ||
|
||
>>> Accuracy on average: 77.27% | ||
|
||
>> Detection of 1000 single words (average length: 9 chars) | ||
Accuracy: 53.40% | ||
Erroneously classified as Unknown: 46.60% | ||
|
||
>> Detection of 1000 word pairs (average length: 17 chars) | ||
Accuracy: 79.90% | ||
Erroneously classified as Unknown: 20.10% | ||
|
||
>> Detection of 1000 sentences (average length: 98 chars) | ||
Accuracy: 98.50% | ||
Erroneously classified as Unknown: 1.50% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Bosnian ##### | ||
|
||
>>> Accuracy on average: 64.10% | ||
|
||
>> Detection of 1000 single words (average length: 8 chars) | ||
Accuracy: 36.20% | ||
Erroneously classified as Unknown: 63.80% | ||
|
||
>> Detection of 1000 word pairs (average length: 16 chars) | ||
Accuracy: 61.20% | ||
Erroneously classified as Unknown: 38.80% | ||
|
||
>> Detection of 1000 sentences (average length: 105 chars) | ||
Accuracy: 94.90% | ||
Erroneously classified as Unknown: 5.10% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Bulgarian ##### | ||
|
||
>>> Accuracy on average: 81.53% | ||
|
||
>> Detection of 1000 single words (average length: 8 chars) | ||
Accuracy: 60.90% | ||
Erroneously classified as Unknown: 39.10% | ||
|
||
>> Detection of 1000 word pairs (average length: 17 chars) | ||
Accuracy: 85.30% | ||
Erroneously classified as Unknown: 14.70% | ||
|
||
>> Detection of 1000 sentences (average length: 89 chars) | ||
Accuracy: 98.40% | ||
Erroneously classified as Unknown: 1.60% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Catalan ##### | ||
|
||
>>> Accuracy on average: 72.20% | ||
|
||
>> Detection of 1000 single words (average length: 8 chars) | ||
Accuracy: 49.20% | ||
Erroneously classified as Unknown: 50.80% | ||
|
||
>> Detection of 1000 word pairs (average length: 16 chars) | ||
Accuracy: 74.20% | ||
Erroneously classified as Unknown: 25.80% | ||
|
||
>> Detection of 1000 sentences (average length: 103 chars) | ||
Accuracy: 93.20% | ||
Erroneously classified as Unknown: 6.80% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Chinese ##### | ||
|
||
>>> Accuracy on average: 100.00% | ||
|
||
>> Detection of 1000 single words (average length: 1 chars) | ||
Accuracy: 100.00% | ||
Erroneously classified as | ||
|
||
>> Detection of 1000 word pairs (average length: 2 chars) | ||
Accuracy: 100.00% | ||
Erroneously classified as | ||
|
||
>> Detection of 729 sentences (average length: 48 chars) | ||
Accuracy: 100.00% | ||
Erroneously classified as | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Croatian ##### | ||
|
||
>>> Accuracy on average: 67.63% | ||
|
||
>> Detection of 1000 single words (average length: 8 chars) | ||
Accuracy: 40.60% | ||
Erroneously classified as Unknown: 59.40% | ||
|
||
>> Detection of 1000 word pairs (average length: 17 chars) | ||
Accuracy: 65.00% | ||
Erroneously classified as Unknown: 35.00% | ||
|
||
>> Detection of 1000 sentences (average length: 127 chars) | ||
Accuracy: 97.30% | ||
Erroneously classified as Unknown: 2.70% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Czech ##### | ||
|
||
>>> Accuracy on average: 73.70% | ||
|
||
>> Detection of 1000 single words (average length: 8 chars) | ||
Accuracy: 52.10% | ||
Erroneously classified as Unknown: 47.90% | ||
|
||
>> Detection of 1000 word pairs (average length: 16 chars) | ||
Accuracy: 75.30% | ||
Erroneously classified as Unknown: 24.70% | ||
|
||
>> Detection of 1000 sentences (average length: 93 chars) | ||
Accuracy: 93.70% | ||
Erroneously classified as Unknown: 6.30% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Danish ##### | ||
|
||
>>> Accuracy on average: 81.50% | ||
|
||
>> Detection of 1000 single words (average length: 8 chars) | ||
Accuracy: 59.20% | ||
Erroneously classified as Unknown: 40.80% | ||
|
||
>> Detection of 1000 word pairs (average length: 16 chars) | ||
Accuracy: 86.00% | ||
Erroneously classified as Unknown: 14.00% | ||
|
||
>> Detection of 1000 sentences (average length: 112 chars) | ||
Accuracy: 99.30% | ||
Erroneously classified as Unknown: 0.70% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Dutch ##### | ||
|
||
>>> Accuracy on average: 75.23% | ||
|
||
>> Detection of 1000 single words (average length: 9 chars) | ||
Accuracy: 50.10% | ||
Erroneously classified as Unknown: 49.90% | ||
|
||
>> Detection of 1000 word pairs (average length: 17 chars) | ||
Accuracy: 76.40% | ||
Erroneously classified as Unknown: 23.60% | ||
|
||
>> Detection of 1000 sentences (average length: 107 chars) | ||
Accuracy: 99.20% | ||
Erroneously classified as Unknown: 0.80% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### English ##### | ||
|
||
>>> Accuracy on average: 71.63% | ||
|
||
>> Detection of 1000 single words (average length: 8 chars) | ||
Accuracy: 46.20% | ||
Erroneously classified as Unknown: 53.80% | ||
|
||
>> Detection of 1000 word pairs (average length: 16 chars) | ||
Accuracy: 70.30% | ||
Erroneously classified as Unknown: 29.70% | ||
|
||
>> Detection of 1000 sentences (average length: 108 chars) | ||
Accuracy: 98.40% | ||
Erroneously classified as Unknown: 1.60% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Esperanto ##### | ||
|
||
>>> Accuracy on average: 72.77% | ||
|
||
>> Detection of 1000 single words (average length: 8 chars) | ||
Accuracy: 45.90% | ||
Erroneously classified as Unknown: 54.10% | ||
|
||
>> Detection of 1000 word pairs (average length: 16 chars) | ||
Accuracy: 74.50% | ||
Erroneously classified as Unknown: 25.50% | ||
|
||
>> Detection of 1000 sentences (average length: 101 chars) | ||
Accuracy: 97.90% | ||
Erroneously classified as Unknown: 2.10% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Estonian ##### | ||
|
||
>>> Accuracy on average: 78.27% | ||
|
||
>> Detection of 1000 single words (average length: 8 chars) | ||
Accuracy: 56.90% | ||
Erroneously classified as Unknown: 43.10% | ||
|
||
>> Detection of 1000 word pairs (average length: 16 chars) | ||
Accuracy: 79.80% | ||
Erroneously classified as Unknown: 20.20% | ||
|
||
>> Detection of 1000 sentences (average length: 101 chars) | ||
Accuracy: 98.10% | ||
Erroneously classified as Unknown: 1.90% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Finnish ##### | ||
|
||
>>> Accuracy on average: 85.77% | ||
|
||
>> Detection of 1000 single words (average length: 10 chars) | ||
Accuracy: 70.50% | ||
Erroneously classified as Unknown: 29.50% | ||
|
||
>> Detection of 1000 word pairs (average length: 19 chars) | ||
Accuracy: 88.00% | ||
Erroneously classified as Unknown: 12.00% | ||
|
||
>> Detection of 1000 sentences (average length: 103 chars) | ||
Accuracy: 98.80% | ||
Erroneously classified as Unknown: 1.20% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### French ##### | ||
|
||
>>> Accuracy on average: 74.87% | ||
|
||
>> Detection of 1000 single words (average length: 8 chars) | ||
Accuracy: 53.40% | ||
Erroneously classified as Unknown: 46.60% | ||
|
||
>> Detection of 1000 word pairs (average length: 17 chars) | ||
Accuracy: 74.10% | ||
Erroneously classified as Unknown: 25.90% | ||
|
||
>> Detection of 1000 sentences (average length: 112 chars) | ||
Accuracy: 97.10% | ||
Erroneously classified as Unknown: 2.90% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
##### Ganda ##### | ||
|
||
>>> Accuracy on average: 84.90% | ||
|
||
>> Detection of 1000 single words (average length: 9 chars) | ||
Accuracy: 67.30% | ||
Erroneously classified as Unknown: 32.70% | ||
|
||
>> Detection of 1000 word pairs (average length: 17 chars) | ||
Accuracy: 87.80% | ||
Erroneously classified as Unknown: 12.20% | ||
|
||
>> Detection of 1000 sentences (average length: 125 chars) | ||
Accuracy: 99.60% | ||
Erroneously classified as Unknown: 0.40% | ||
|
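Across the reports above, accuracy never drops as the input grows from single words to word pairs to sentences. The following is a minimal sketch of how that pattern could be checked, using a handful of accuracy triples copied from the reports. The `REPORTS` table and both helper functions are illustrative names introduced here, not part of the commit.

```python
# (single words, word pairs, sentences) accuracy in percent,
# copied from the reports in this diff.
REPORTS = {
    "Afrikaans": (45.90, 74.10, 98.90),
    "Armenian": (100.00, 100.00, 100.00),
    "Bosnian": (36.20, 61.20, 94.90),
    "Ganda": (67.30, 87.80, 99.60),
}


def is_non_decreasing(values):
    """True when each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def hardest_for_single_words(reports):
    """Name of the language with the lowest single-word accuracy."""
    return min(reports, key=lambda name: reports[name][0])


for name, triple in REPORTS.items():
    print(name, triple, is_non_decreasing(triple))
print("Lowest single-word accuracy:", hardest_for_single_words(REPORTS))
```

Only four of the languages are included to keep the sketch short. Adding the remaining triples to `REPORTS` extends the check to the full set.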