None of the traineddata works for me #14

Open
arrrrny opened this issue Oct 17, 2019 · 10 comments

Comments

@arrrrny

arrrrny commented Oct 17, 2019

I am using the traineddata files from this repository and all of them crash. I am using https://github.com/adaptech-cz/Tesseract4Android

I can use other custom traineddata files without any issue. Am I missing a setting or something?
Thanks!

E/Tesseract(native)( 6126): Could not initialize Tesseract API with language=engrestricted_best!
F/libc ( 6126): Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x8 in tid 6164 (Thread-2), pid 6126 (act_ocr_example)


Build fingerprint: 'google/sdk_gphone_x86/generic_x86:9/PSR1.180720.093/5456446:userdebug/dev-keys'
Revision: '0'
ABI: 'x86'
pid: 6126, tid: 6164, name: Thread-2 >>> io.paratoner.tesseract_ocr_example <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x8
Cause: null pointer dereference
eax 00000000 ebx c7707f14 ecx c8475870 edx 00000000
edi c85bb740 esi c8454700
ebp c72ffa38 esp c72ff910 eip c73da039
backtrace:
#00 pc 000d9039 /data/app/io.paratoner.tesseract_ocr_example-tcEmsNHnRrF98KxA7MlFKQ==/lib/x86/libtesseract.so (tesseract::Tesseract::recog_all_words(PAGE_RES*, ETEXT_DESC*, TBOX const*, char const*, int)+217)
#1 pc 000bca80 /data/app/io.paratoner.tesseract_ocr_example-tcEmsNHnRrF98KxA7MlFKQ==/lib/x86/libtesseract.so (tesseract::TessBaseAPI::Recognize(ETEXT_DESC*)+1152)
#2 pc 000bb0fc /data/app/io.paratoner.tesseract_ocr_example-tcEmsNHnRrF98KxA7MlFKQ==/lib/x86/libtesseract.so (tesseract::TessBaseAPI::GetUTF8Text()+76)
#3 pc 002d254a /data/app/io.paratoner.tesseract_ocr_example-tcEmsNHnRrF98KxA7MlFKQ==/lib/x86/libtesseract.so (Java_com_googlecode_tesseract_android_TessBaseAPI_nativeGetUTF8Text+74)
#4 pc 005f6b97 /system/lib/libart.so (art_quick_generic_jni_trampoline+71)
#5 pc 005f0b82 /system/lib/libart.so (art_quick_invoke_stub+338)
#6 pc 000a30ce /system/lib/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+222)
#7 pc 0029bca2 /system/lib/libart.so (art::interpreter::ArtInterpreterToCompiledCodeBridge(art::Thread*, art::ArtMethod*, art::ShadowFrame*, unsigned short, art::JValue*)+338)
#8 pc 00293e48 /system/lib/libart.so (bool art::interpreter::DoCall<false, false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, art::JValue*)+1048)
#9 pc 005bda66 /system/lib/libart.so (MterpInvokeDirect+342)
#10 pc 005e2e21 /system/lib/libart.so (ExecuteMterpImpl+14497)
#11 pc 00015814 /dev/ashmem/dalvik-classes.dex extracted in memory from /data/app/io.paratoner.tesseract_ocr_example-tcEmsNHnRrF98KxA7MlFKQ==/base.apk (deleted) (com.googlecode.tesseract.android.TessBaseAPI.getUTF8Text+12)
#12 pc 00266216 /system/lib/libart.so (_ZN3art11interpreterL7ExecuteEPNS_6ThreadERKNS_20CodeItemDataAccessorERNS_11ShadowFrameENS_6JValueEb.llvm.2093054539+598)
#13 pc 0026c79c /system/lib/libart.so (art::interpreter::ArtInterpreterToInterpreterBridge(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame*, art::JValue*)+220)
#14 pc 00293e2b /system/lib/libart.so (bool art::interpreter::DoCall<false, false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, art::JValue*)+1019)
#15 pc 005bc493 /system/lib/libart.so (MterpInvokeVirtual+691)
#16 pc 005e2d21 /system/lib/libart.so (ExecuteMterpImpl+14241)
#17 pc 000301a2 /dev/ashmem/dalvik-classes.dex extracted in memory from /data/app/io.paratoner.tesseract_ocr_example-tcEmsNHnRrF98KxA7MlFKQ==/base.apk (deleted) (io.paratoner.tesseract_ocr.TesseractOcrPlugin$1.run+22)
#18 pc 00266216 /system/lib/libart.so (_ZN3art11interpreterL7ExecuteEPNS_6ThreadERKNS_20CodeItemDataAccessorERNS_11ShadowFrameENS_6JValueEb.llvm.2093054539+598)
#19 pc 0026c79c /system/lib/libart.so (art::interpreter::ArtInterpreterToInterpreterBridge(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame*, art::JValue*)+220)
#20 pc 00293e2b /system/lib/libart.so (bool art::interpreter::DoCall<false, false>(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, art::JValue*)+1019)
#21 pc 005bd574 /system/lib/libart.so (MterpInvokeInterface+1444)
#22 pc 005e2f21 /system/lib/libart.so (ExecuteMterpImpl+14753)
#23 pc 000ca806 /system/framework/boot.vdex (java.lang.Thread.run+12)
#24 pc 00266216 /system/lib/libart.so (_ZN3art11interpreterL7ExecuteEPNS_6ThreadERKNS_20CodeItemDataAccessorERNS_11ShadowFrameENS_6JValueEb.llvm.2093054539+598)
#25 pc 0026c68e /system/lib/libart.so (art::interpreter::EnterInterpreterFromEntryPoint(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame*)+126)
#26 pc 005a953d /system/lib/libart.so (artQuickToInterpreterBridge+1277)
#27 pc 005f6c6d /system/lib/libart.so (art_quick_to_interpreter_bridge+77)
#28 pc 005f0b82 /system/lib/libart.so (art_quick_invoke_stub+338)
#29 pc 000a30ce /system/lib/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+222)
#30 pc 004d3349 /system/lib/libart.so (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+89)
#31 pc 004d45f7 /system/lib/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, jvalue*)+471)
#32 pc 0050958c /system/lib/libart.so (art::Thread::CreateCallback(void*)+1484)
#33 pc 0008f065 /system/lib/libc.so (__pthread_start(void*)+53)
#34 pc 0002485b /system/lib/libc.so (__start_thread+75)
Lost connection to device.
Exited (sigterm)
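
For readers triaging a similar dump: the log above reports an initialization failure for `engrestricted_best` right before the SIGSEGV inside `GetUTF8Text`. One plausible reading, not confirmed in this thread, is that recognition was called after initialization had already failed. A minimal Python sketch for pulling those two facts out of a captured log follows. The patterns are modelled only on the two lines quoted above, and the `triage` helper is a hypothetical name, not part of any Tesseract tooling.

```python
import re

# Patterns modelled on the two log lines quoted above; other Android
# versions may format these lines differently (assumption).
INIT_FAILURE = re.compile(r"Could not initialize Tesseract API with language=(\S+?)!")
FATAL_SIGNAL = re.compile(r"Fatal signal (\d+) \((\w+)\)")


def triage(log_text):
    """Report the language that failed to load and the fatal signal, if present."""
    result = {
        "failed_language": None,
        "signal": None,
        "init_failed_before_crash": False,
    }
    init_match = INIT_FAILURE.search(log_text)
    crash_match = FATAL_SIGNAL.search(log_text)
    if init_match:
        result["failed_language"] = init_match.group(1)
    if crash_match:
        result["signal"] = crash_match.group(2)
    if init_match and crash_match:
        # True when the init error appears earlier in the log than the crash.
        result["init_failed_before_crash"] = init_match.start() < crash_match.start()
    return result


sample = (
    "E/Tesseract(native)( 6126): Could not initialize Tesseract API "
    "with language=engrestricted_best!\n"
    "F/libc ( 6126): Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), "
    "fault addr 0x8 in tid 6164 (Thread-2), pid 6126 (act_ocr_example)\n"
)
print(triage(sample))
```

If `init_failed_before_crash` comes back true, checking the result of the initialization call in the app before requesting text would be the first thing to try.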

@Shreeshrii
Owner

Shreeshrii commented Oct 17, 2019 via email

@arrrrny
Author

arrrrny commented Oct 18, 2019

I am using Tesseract 4 with OEM 1 and these traineddata files:
https://github.com/tesseract-ocr/tessdata_fast/eng.traineddata
https://github.com/anuraghkp1/tessdata/blob/master/financial.traineddata

Both work fine, but none of the traineddata files in your repository do.

@Shreeshrii
Owner

Shreeshrii commented Oct 18, 2019 via email

@arrrrny
Author

arrrrny commented Oct 18, 2019

I am interested in digits_comma.traineddata

@Shreeshrii
Owner

Shreeshrii commented Oct 19, 2019 via email

@Shreeshrii
Owner

Shreeshrii commented Oct 19, 2019

 tesseract -v
tesseract 5.0.0-alpha-479-g247c
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.4.4 : libopenjp2 2.3.0
ubuntu@tesseract-ocr:~/TEST$ wget https://github.com/Shreeshrii/tessdata_shreetest/raw/master/digits_comma.traineddata
--2019-10-19 02:18:38--  https://github.com/Shreeshrii/tessdata_shreetest/raw/master/digits_comma.traineddata
Resolving github.com (github.com)... 192.30.255.112
Connecting to github.com (github.com)|192.30.255.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/Shreeshrii/tessdata_shreetest/master/digits_comma.traineddata [following]
--2019-10-19 02:18:38--  https://raw.githubusercontent.com/Shreeshrii/tessdata_shreetest/master/digits_comma.traineddata
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.52.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.52.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11297390 (11M) [application/octet-stream]
Saving to: ‘digits_comma.traineddata’

digits_comma.traineddata                          100%[==========================================================================================================>]  10.77M  67.9MB/s    in 0.2s

2019-10-19 02:18:39 (67.9 MB/s) - ‘digits_comma.traineddata’ saved [11297390/11297390]
ubuntu@tesseract-ocr:~/TEST$ tesseract num-comma.png - --oem 1 --psm 6 -l digits_comma --tessdata-dir ./
1092000 1,092,000 001,092,000 1,092,000.00
924000 924,000 000,924,000 924,000.00
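
To repeat the desktop check above from a script, the same command line can be assembled and run with Python's `subprocess` module. This is only a sketch: `build_command` and `run_ocr` are hypothetical helper names, and the flags are copied from the session above rather than tuned for any other image.

```python
import shutil
import subprocess


def build_command(image, lang, tessdata_dir, oem=1, psm=6):
    """Build the same tesseract invocation as the session above."""
    return [
        "tesseract", image, "-",  # "-" sends the recognized text to stdout
        "--oem", str(oem),
        "--psm", str(psm),
        "-l", lang,
        "--tessdata-dir", tessdata_dir,
    ]


def run_ocr(image, lang, tessdata_dir):
    """Run tesseract and return its stdout; raises if the CLI is missing or fails."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    completed = subprocess.run(
        build_command(image, lang, tessdata_dir),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


# Show the command without running it, so the sketch works without tesseract.
print(" ".join(build_command("num-comma.png", "digits_comma", "./")))
```

Calling `run_ocr("num-comma.png", "digits_comma", "./")` needs the `tesseract` executable, the image, and the downloaded `digits_comma.traineddata` in the working directory.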

@vadash

vadash commented Dec 9, 2019

Same problem (DotProductAVX can't be used on Android). I am using a fresh Emgu CV 4.1 x64 install with .NET 4.8 on a CPU that supports AVX.
The files from https://github.com/tesseract-ocr/tessdata_fast work fine.
I had to train my own data ;)

@Shreeshrii
Owner

Shreeshrii commented Dec 10, 2019 via email

@arrrrny
Author

arrrrny commented Dec 10, 2019

@vadash can you share how you did it? I mean training.

@vadash

vadash commented Dec 10, 2019

> @vadash can you share how you did it? I mean training.

I am using two specific traineddata files and one generic one. For a specific one: generate a font (or pick one that is close) -> upload it to http://ocr7.com/ -> use the result. It's pretty simple. For the universal (slower) one you will need a Linux machine (a virtual machine is fine). Follow this guide: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
