Commit

Use unique and most common ngrams as absolute confidence metric (#235)
pemistahl committed Oct 4, 2024
1 parent ef28e8b commit 4858cd2
Showing 908 changed files with 3,663 additions and 1,423 deletions.
76 changes: 76 additions & 0 deletions accuracy-reports/average-accuracy-values.csv

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-afrikaans-detector/Afrikaans.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
##### Afrikaans #####

>>> Accuracy on average: 72.97%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 45.90%
Erroneously classified as Unknown: 54.10%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 74.10%
Erroneously classified as Unknown: 25.90%

>> Detection of 1000 sentences (average length: 102 chars)
Accuracy: 98.90%
Erroneously classified as Unknown: 1.10%
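
Note on the figures: in every report shown here, the "Accuracy on average" value is consistent with the unweighted mean of the three per-category accuracies (single words, word pairs, sentences). The following is a minimal sketch of that arithmetic, not the project's own reporting code; the function name `average_accuracy` is hypothetical.

```python
def average_accuracy(single_words: float, word_pairs: float, sentences: float) -> float:
    """Return the unweighted mean of three accuracy percentages, rounded to two decimals.

    Hypothetical helper for illustration; the report generator itself may
    compute the value differently.
    """
    return round((single_words + word_pairs + sentences) / 3, 2)


# Afrikaans figures from the report above: 45.90%, 74.10%, 98.90%
print(f"{average_accuracy(45.90, 74.10, 98.90):.2f}%")  # prints 72.97%
```

Because the mean is unweighted, a weak single-word result lowers the average as much as a weak sentence result would, even though the sentence inputs are far longer.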

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-albanian-detector/Albanian.txt
@@ -0,0 +1,16 @@
##### Albanian #####

>>> Accuracy on average: 81.40%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 57.30%
Erroneously classified as Unknown: 42.70%

>> Detection of 1000 word pairs (average length: 15 chars)
Accuracy: 87.10%
Erroneously classified as Unknown: 12.90%

>> Detection of 1000 sentences (average length: 118 chars)
Accuracy: 99.80%
Erroneously classified as Unknown: 0.20%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-arabic-detector/Arabic.txt
@@ -0,0 +1,16 @@
##### Arabic #####

>>> Accuracy on average: 94.77%

>> Detection of 1000 single words (average length: 6 chars)
Accuracy: 86.40%
Erroneously classified as Unknown: 13.60%

>> Detection of 1000 word pairs (average length: 14 chars)
Accuracy: 98.10%
Erroneously classified as Unknown: 1.90%

>> Detection of 1000 sentences (average length: 89 chars)
Accuracy: 99.80%
Erroneously classified as Unknown: 0.20%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-armenian-detector/Armenian.txt
@@ -0,0 +1,16 @@
##### Armenian #####

>>> Accuracy on average: 100.00%

>> Detection of 1000 single words (average length: 9 chars)
Accuracy: 100.00%
Erroneously classified as

>> Detection of 1000 word pairs (average length: 18 chars)
Accuracy: 100.00%
Erroneously classified as

>> Detection of 1000 sentences (average length: 122 chars)
Accuracy: 100.00%
Erroneously classified as

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-azerbaijani-detector/Azerbaijani.txt
@@ -0,0 +1,16 @@
##### Azerbaijani #####

>>> Accuracy on average: 89.87%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 76.10%
Erroneously classified as Unknown: 23.90%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 93.70%
Erroneously classified as Unknown: 6.30%

>> Detection of 1000 sentences (average length: 107 chars)
Accuracy: 99.80%
Erroneously classified as Unknown: 0.20%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-basque-detector/Basque.txt
@@ -0,0 +1,16 @@
##### Basque #####

>>> Accuracy on average: 77.43%

>> Detection of 1000 single words (average length: 9 chars)
Accuracy: 53.30%
Erroneously classified as Unknown: 46.70%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 79.10%
Erroneously classified as Unknown: 20.90%

>> Detection of 1000 sentences (average length: 102 chars)
Accuracy: 99.90%
Erroneously classified as Unknown: 0.10%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-belarusian-detector/Belarusian.txt
@@ -0,0 +1,16 @@
##### Belarusian #####

>>> Accuracy on average: 92.60%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 81.70%
Erroneously classified as Unknown: 18.30%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 96.50%
Erroneously classified as Unknown: 3.50%

>> Detection of 1000 sentences (average length: 105 chars)
Accuracy: 99.60%
Erroneously classified as Unknown: 0.40%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-bengali-detector/Bengali.txt
@@ -0,0 +1,16 @@
##### Bengali #####

>>> Accuracy on average: 100.00%

>> Detection of 1000 single words (average length: 7 chars)
Accuracy: 100.00%
Erroneously classified as

>> Detection of 1000 word pairs (average length: 15 chars)
Accuracy: 100.00%
Erroneously classified as

>> Detection of 1000 sentences (average length: 87 chars)
Accuracy: 100.00%
Erroneously classified as

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-bokmal-detector/Bokmal.txt
@@ -0,0 +1,16 @@
##### Bokmal #####

>>> Accuracy on average: 77.27%

>> Detection of 1000 single words (average length: 9 chars)
Accuracy: 53.40%
Erroneously classified as Unknown: 46.60%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 79.90%
Erroneously classified as Unknown: 20.10%

>> Detection of 1000 sentences (average length: 98 chars)
Accuracy: 98.50%
Erroneously classified as Unknown: 1.50%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-bosnian-detector/Bosnian.txt
@@ -0,0 +1,16 @@
##### Bosnian #####

>>> Accuracy on average: 64.10%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 36.20%
Erroneously classified as Unknown: 63.80%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 61.20%
Erroneously classified as Unknown: 38.80%

>> Detection of 1000 sentences (average length: 105 chars)
Accuracy: 94.90%
Erroneously classified as Unknown: 5.10%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-bulgarian-detector/Bulgarian.txt
@@ -0,0 +1,16 @@
##### Bulgarian #####

>>> Accuracy on average: 81.53%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 60.90%
Erroneously classified as Unknown: 39.10%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 85.30%
Erroneously classified as Unknown: 14.70%

>> Detection of 1000 sentences (average length: 89 chars)
Accuracy: 98.40%
Erroneously classified as Unknown: 1.60%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-catalan-detector/Catalan.txt
@@ -0,0 +1,16 @@
##### Catalan #####

>>> Accuracy on average: 72.20%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 49.20%
Erroneously classified as Unknown: 50.80%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 74.20%
Erroneously classified as Unknown: 25.80%

>> Detection of 1000 sentences (average length: 103 chars)
Accuracy: 93.20%
Erroneously classified as Unknown: 6.80%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-chinese-detector/Chinese.txt
@@ -0,0 +1,16 @@
##### Chinese #####

>>> Accuracy on average: 100.00%

>> Detection of 1000 single words (average length: 1 chars)
Accuracy: 100.00%
Erroneously classified as

>> Detection of 1000 word pairs (average length: 2 chars)
Accuracy: 100.00%
Erroneously classified as

>> Detection of 729 sentences (average length: 48 chars)
Accuracy: 100.00%
Erroneously classified as

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-croatian-detector/Croatian.txt
@@ -0,0 +1,16 @@
##### Croatian #####

>>> Accuracy on average: 67.63%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 40.60%
Erroneously classified as Unknown: 59.40%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 65.00%
Erroneously classified as Unknown: 35.00%

>> Detection of 1000 sentences (average length: 127 chars)
Accuracy: 97.30%
Erroneously classified as Unknown: 2.70%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-czech-detector/Czech.txt
@@ -0,0 +1,16 @@
##### Czech #####

>>> Accuracy on average: 73.70%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 52.10%
Erroneously classified as Unknown: 47.90%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 75.30%
Erroneously classified as Unknown: 24.70%

>> Detection of 1000 sentences (average length: 93 chars)
Accuracy: 93.70%
Erroneously classified as Unknown: 6.30%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-danish-detector/Danish.txt
@@ -0,0 +1,16 @@
##### Danish #####

>>> Accuracy on average: 81.50%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 59.20%
Erroneously classified as Unknown: 40.80%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 86.00%
Erroneously classified as Unknown: 14.00%

>> Detection of 1000 sentences (average length: 112 chars)
Accuracy: 99.30%
Erroneously classified as Unknown: 0.70%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-dutch-detector/Dutch.txt
@@ -0,0 +1,16 @@
##### Dutch #####

>>> Accuracy on average: 75.23%

>> Detection of 1000 single words (average length: 9 chars)
Accuracy: 50.10%
Erroneously classified as Unknown: 49.90%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 76.40%
Erroneously classified as Unknown: 23.60%

>> Detection of 1000 sentences (average length: 107 chars)
Accuracy: 99.20%
Erroneously classified as Unknown: 0.80%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-english-detector/English.txt
@@ -0,0 +1,16 @@
##### English #####

>>> Accuracy on average: 71.63%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 46.20%
Erroneously classified as Unknown: 53.80%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 70.30%
Erroneously classified as Unknown: 29.70%

>> Detection of 1000 sentences (average length: 108 chars)
Accuracy: 98.40%
Erroneously classified as Unknown: 1.60%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-esperanto-detector/Esperanto.txt
@@ -0,0 +1,16 @@
##### Esperanto #####

>>> Accuracy on average: 72.77%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 45.90%
Erroneously classified as Unknown: 54.10%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 74.50%
Erroneously classified as Unknown: 25.50%

>> Detection of 1000 sentences (average length: 101 chars)
Accuracy: 97.90%
Erroneously classified as Unknown: 2.10%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-estonian-detector/Estonian.txt
@@ -0,0 +1,16 @@
##### Estonian #####

>>> Accuracy on average: 78.27%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 56.90%
Erroneously classified as Unknown: 43.10%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 79.80%
Erroneously classified as Unknown: 20.20%

>> Detection of 1000 sentences (average length: 101 chars)
Accuracy: 98.10%
Erroneously classified as Unknown: 1.90%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-finnish-detector/Finnish.txt
@@ -0,0 +1,16 @@
##### Finnish #####

>>> Accuracy on average: 85.77%

>> Detection of 1000 single words (average length: 10 chars)
Accuracy: 70.50%
Erroneously classified as Unknown: 29.50%

>> Detection of 1000 word pairs (average length: 19 chars)
Accuracy: 88.00%
Erroneously classified as Unknown: 12.00%

>> Detection of 1000 sentences (average length: 103 chars)
Accuracy: 98.80%
Erroneously classified as Unknown: 1.20%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-french-detector/French.txt
@@ -0,0 +1,16 @@
##### French #####

>>> Accuracy on average: 74.87%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 53.40%
Erroneously classified as Unknown: 46.60%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 74.10%
Erroneously classified as Unknown: 25.90%

>> Detection of 1000 sentences (average length: 112 chars)
Accuracy: 97.10%
Erroneously classified as Unknown: 2.90%

16 changes: 16 additions & 0 deletions accuracy-reports/lingua-ganda-detector/Ganda.txt
@@ -0,0 +1,16 @@
##### Ganda #####

>>> Accuracy on average: 84.90%

>> Detection of 1000 single words (average length: 9 chars)
Accuracy: 67.30%
Erroneously classified as Unknown: 32.70%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 87.80%
Erroneously classified as Unknown: 12.20%

>> Detection of 1000 sentences (average length: 125 chars)
Accuracy: 99.60%
Erroneously classified as Unknown: 0.40%
