Datasets

Fredrik Karlström edited this page Aug 31, 2014 · 5 revisions

TODO: Summarize the characteristics of the different sets.

An SVG-based dataset that lets us analyze online strokes, build our clusters, and train the routing network.

Stroke analysis

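| Type | Percentage | Count |
|------|-----------:|-------:|
| A | 33.20% | 26,160 |
| B | 1.47% | 1,159 |
| C | 0.00% | 0 |
| D | 0.00% | 0 |
| E | 0.31% | 245 |
| F | 12.72% | 10,018 |
| G | 29.98% | 23,616 |
| H | 22.32% | 17,587 |
| Total strokes | | 78,785 |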

TODO: The per-type counts sum to 78,785, but the total should be 78,983, which leaves 198 strokes unaccounted for. Check it.
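The table's arithmetic can be re-checked with a short script. This is only a sketch: the counts are copied from the table, and the names `counts`, `EXPECTED_TOTAL`, and `summarize` are illustrative, not part of the project's code. It recomputes the total and the percentages and reports how far the sum is from the expected 78,983.

```python
# Stroke-type counts copied from the table in this section.
counts = {
    "A": 26160, "B": 1159, "C": 0, "D": 0,
    "E": 245, "F": 10018, "G": 23616, "H": 17587,
}

# The total this page says the counts should add up to.
EXPECTED_TOTAL = 78983


def summarize(counts):
    """Return (total, {type: share of total in percent, rounded to 2 places})."""
    total = sum(counts.values())
    shares = {k: round(100 * v / total, 2) for k, v in counts.items()}
    return total, shares


total, shares = summarize(counts)
missing = EXPECTED_TOTAL - total

if __name__ == "__main__":
    for stroke_type, count in counts.items():
        print(f"{stroke_type}  {shares[stroke_type]:6.2f}%  {count:>6,}")
    print(f"Total strokes {total:,} "
          f"(expected {EXPECTED_TOTAL:,}, difference {missing:,})")
```

The recomputed percentages match the table, so the table is internally consistent. The gap is between the summed counts and the expected total, which suggests some strokes were not assigned any type.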

Nakagawa's TUAT Nakayosi/Kuchibue

http://www.tuat.ac.jp/~nakagawa/database/index.shtml

"A collection of images of about 1.2 million hand-written and machine-printed numerals, symbols, Latin alphabets and Japanese characters and compiled in 9 datasets (ETL-1 to ETL-9)."


http://www.gavo.t.u-tokyo.ac.jp/~qiao/database.html


Online Handwriting Database

Online Western Handwriting Unipen Database http://hwr.nici.kun.nl/unipen/
5 million characters, from more than 2200 writers. (Large variance.) (about 110 USD)

IRESTE IRONOFF Online/Offline Handwriting database.
It includes 4 086 isolated digits, 10 685 isolated lower case letters, 10 679 isolated upper case letters + 410 EURO signs and 31 346 isolated words from a 197 word lexicon (French: 28 657 and English: 2 689). See the Word file for how to obtain it: http://www.infres.enst.fr/~elc/GRCE/news/IRONOFF.doc

TUAT Nakagawa Lab Online Handwriting Database http://www.tuat.ac.jp/~nakagawa/ipdb/
Online Kanji (Chinese) Handwriting Database. Requires an application.

Online Handwriting Digits in UCI machine learning database http://www.ics.uci.edu/~mlearn/MLSummary.html
7494 training cases, 3498 test cases. Available at ftp://ftp.ics.uci.edu/pub/machine-learning-databases/pendigits

UJI Pen Characters Data Set (UJIpenchars) http://archive.ics.uci.edu/ml/datasets/UJI+Pen+Characters
and UJIpenchars2 http://archive.ics.uci.edu/ml/datasets/UJI+Pen+Characters+(Version+2)
