Setup Error: Error loading transliteration module #13

Closed
TheLegend29 opened this issue Mar 12, 2017 · 11 comments

Comments

@TheLegend29

Trying to install on the following system:
Linux HP-Pavilion-15-Notebook-PC 4.8.0-41-generic #44~16.04.1-Ubuntu SMP Fri Mar 3 17:11:16 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

When I try to check ./src/address_parser I get the following error:
Loading models...
ERR Error loading transliteration module, dir=(null)
at libpostal_setup_datadir (libpostal.c:1069) errno: No such file or directory

The following are the steps I've used:

cd ~
rm -rf libpostal
git clone https://github.com/openvenues/libpostal
cd libpostal
./bootstrap.sh
./configure LDFLAGS=-L/usr/lib64 --datadir=$(pwd)/data --prefix=$(realpath $(pwd)) --bindir=$(realpath $(pwd)/bin)
make install
sudo ldconfig
./src/address_parser

Any guidance would be great. I'm a newbie Ubuntu user but would love to check this module out for Python.

@albarrentine
Contributor

albarrentine commented Mar 12, 2017

That sounds like the data files didn't download. Does the dir you specified for --datadir when running configure have enough space? It needs approximately 2.2G free to build the current version of libpostal (that number may change in subsequent updates to the models).

Also, looking at the "dir=(null)" piece, which prints the value of LIBPOSTAL_DATA_DIR on error, it's possible that something went wrong in configure. Did all of those commands run successfully?

Another random thing I noticed: $(realpath $(pwd)) is used for the other paths, but only $(pwd) is used for --datadir. That shouldn't matter, since pwd already returns an absolute path AFAIK, but if it doesn't on your system, that might be the cause.
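
One way to check free space from the shell, rather than from the file manager, is df. The sketch below is only an illustration: free_kb is a hypothetical helper name, DATADIR is an assumed variable, and the 2.2G estimate above is rounded to a threshold in kilobytes.

```shell
# free_kb DIR: print the free space, in kilobytes, on the filesystem holding DIR.
# (free_kb is a hypothetical helper, not part of libpostal.)
free_kb() {
    # -P forces the portable one-line-per-filesystem format, -k reports 1K blocks;
    # the fourth column of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

datadir=${DATADIR:-.}          # assumed location; point this at your --datadir
needed_kb=$((2200 * 1024))     # roughly 2.2G, per the estimate above
available_kb=$(free_kb "$datadir")

if [ "$available_kb" -ge "$needed_kb" ]; then
    echo "enough space: ${available_kb} KB free"
else
    echo "not enough space: ${available_kb} KB free, need about ${needed_kb} KB"
fi
```

Running it with DATADIR set to the directory passed to --datadir reports the space on that filesystem specifically, which matters when the checkout lives on a different partition from the home directory.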

@TheLegend29
Author

According to the folder properties there is 682 GB of free space. Is there a better way you'd suggest to check this?
If I put realpath in the datadir, as in the following:
./configure LDFLAGS=-L/usr/lib64 --datadir=$(realpath $(pwd)) /data --prefix=$(realpath $(pwd)) --bindir=$(realpath $(pwd)/bin)
the configure step fails with this output:
configure: WARNING: you should use --build, --host, --target
configure: WARNING: invalid host type: /data
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking build system type... Invalid configuration `/data': machine `/data' not recognized
configure: error: /bin/bash ./config.sub /data failed

When I run configure using the original command above, I get the following output, which seems successful to me:
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking how to print strings... printf
checking for style of include used by make... GNU
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
checking for a sed that does not truncate output... /bin/sed
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking how to convert x86_64-pc-linux-gnu file names to x86_64-pc-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-pc-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @file support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for a working dd... /bin/dd
checking how to truncate binary pipes... /bin/dd bs=4096 count=1
checking for mt... mt
checking if mt is a manifest tool... no
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking for gcc option to accept ISO C99... none needed
checking for library containing snappy_compress... -lsnappy
checking for library containing log... -lm
checking for ANSI C header files... (cached) yes
checking whether time.h and sys/time.h may both be included... yes
checking for dirent.h that defines DIR... yes
checking for library containing opendir... none required
checking for stdbool.h that conforms to C99... yes
checking for _Bool... yes
checking fcntl.h usability... yes
checking fcntl.h presence... yes
checking for fcntl.h... yes
checking float.h usability... yes
checking float.h presence... yes
checking for float.h... yes
checking for inttypes.h... (cached) yes
checking limits.h usability... yes
checking limits.h presence... yes
checking for limits.h... yes
checking locale.h usability... yes
checking locale.h presence... yes
checking for locale.h... yes
checking malloc.h usability... yes
checking malloc.h presence... yes
checking for malloc.h... yes
checking for memory.h... (cached) yes
checking stddef.h usability... yes
checking stddef.h presence... yes
checking for stddef.h... yes
checking for stdint.h... (cached) yes
checking for stdlib.h... (cached) yes
checking for string.h... (cached) yes
checking for unistd.h... (cached) yes
checking for inline... inline
checking for int16_t... yes
checking for int32_t... yes
checking for int64_t... yes
checking for int8_t... yes
checking for off_t... yes
checking for size_t... yes
checking for ssize_t... yes
checking for uint16_t... yes
checking for uint32_t... yes
checking for uint64_t... yes
checking for uint8_t... yes
checking for ptrdiff_t... yes
checking for stdlib.h... (cached) yes
checking for unistd.h... (cached) yes
checking for sys/param.h... yes
checking for getpagesize... yes
checking for working mmap... yes
checking for malloc... yes
checking for realloc... yes
checking for getcwd... yes
checking for gettimeofday... yes
checking for memmove... yes
checking for memset... yes
checking for munmap... yes
checking for regcomp... yes
checking for setlocale... yes
checking for sqrt... yes
checking for strdup... yes
checking for strndup... yes
checking for shuf... yes
configure: extra cflags for scanner.c:
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating libpostal.pc
config.status: creating src/Makefile
config.status: creating src/sparkey/Makefile
config.status: creating test/Makefile
config.status: creating config.h
config.status: executing depfiles commands
config.status: executing libtool commands

Lastly, I noticed a few warnings in the make install output. Are these normal?
In file included from collections.h:8:0,
from averaged_perceptron.h:26,
from address_parser.h:49,
from address_parser.c:1:
address_parser.c: In function ‘address_parser_context_fill’:
log/log.h:26:55: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘int64_t {aka long int}’ [-Wformat=]
#define log_debug(M, ...) do { if (0) fprintf(stderr, "\33[34mDEBUG\33[39m " M " \33[90m at %s (%s:%d) \33[39m\n", ##VA_ARGS, func, _
^
address_parser.c:329:17: note: in expansion of macro ‘log_debug’
log_debug("token i=%lld, null phrase membership\n", i);
^
log/log.h:26:55: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘int64_t {aka long int}’ [-Wformat=]
#define log_debug(M, ...) do { if (0) fprintf(stderr, "\33[34mDEBUG\33[39m " M " \33[90m at %s (%s:%d) \33[39m\n", ##VA_ARGS, func, _
^
address_parser.c:333:17: note: in expansion of macro ‘log_debug’
log_debug("token i=%lld, phrase membership=%lld\n", i, j);
^
log/log.h:26:55: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 4 has type ‘int64_t {aka long int}’ [-Wformat=]
#define log_debug(M, ...) do { if (0) fprintf(stderr, "\33[34mDEBUG\33[39m " M " \33[90m at %s (%s:%d) \33[39m\n", ##VA_ARGS, func, _
^
address_parser.c:333:17: note: in expansion of macro ‘log_debug’
log_debug("token i=%lld, phrase membership=%lld\n", i, j);
^
log/log.h:26:55: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘int64_t {aka long int}’ [-Wformat=]
#define log_debug(M, ...) do { if (0) fprintf(stderr, "\33[34mDEBUG\33[39m " M " \33[90m at %s (%s:%d) \33[39m\n", ##VA_ARGS, func, _
^
address_parser.c:340:9: note: in expansion of macro ‘log_debug’
log_debug("token i=%lld, null phrase membership\n", i);
^
log/log.h:26:55: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘int64_t {aka long int}’ [-Wformat=]
#define log_debug(M, ...) do { if (0) fprintf(stderr, "\33[34mDEBUG\33[39m " M " \33[90m at %s (%s:%d) \33[39m\n", ##VA_ARGS, func, _
^
address_parser.c:356:17: note: in expansion of macro ‘log_debug’
log_debug("token i=%lld, null geo phrase membership\n", i);
^
log/log.h:26:55: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘int64_t {aka long int}’ [-Wformat=]
#define log_debug(M, ...) do { if (0) fprintf(stderr, "\33[34mDEBUG\33[39m " M " \33[90m at %s (%s:%d) \33[39m\n", ##VA_ARGS, func, _
^
address_parser.c:361:17: note: in expansion of macro ‘log_debug’
log_debug("token i=%lld, geo phrase membership=%lld\n", i, j);
^
log/log.h:26:55: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 4 has type ‘int64_t {aka long int}’ [-Wformat=]
#define log_debug(M, ...) do { if (0) fprintf(stderr, "\33[34mDEBUG\33[39m " M " \33[90m at %s (%s:%d) \33[39m\n", ##VA_ARGS, func, _
^
address_parser.c:361:17: note: in expansion of macro ‘log_debug’
log_debug("token i=%lld, geo phrase membership=%lld\n", i, j);
^
log/log.h:26:55: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘int64_t {aka long int}’ [-Wformat=]
#define log_debug(M, ...) do { if (0) fprintf(stderr, "\33[34mDEBUG\33[39m " M " \33[90m at %s (%s:%d) \33[39m\n", ##VA_ARGS, func, _
^
address_parser.c:367:9: note: in expansion of macro ‘log_debug’
log_debug("token i=%lld, null geo phrase membership\n", i);
^
log/log.h:26:55: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘int64_t {aka long int}’ [-Wformat=]
#define log_debug(M, ...) do { if (0) fprintf(stderr, "\33[34mDEBUG\33[39m " M " \33[90m at %s (%s:%d) \33[39m\n", ##VA_ARGS, func, _
^
address_parser.c:383:17: note: in expansion of macro ‘log_debug’
log_debug("token i=%lld, null component phrase membership\n", i);
^
log/log.h:26:55: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘int64_t {aka long int}’ [-Wformat=]
#define log_debug(M, ...) do { if (0) fprintf(stderr, "\33[34mDEBUG\33[39m " M " \33[90m at %s (%s:%d) \33[39m\n", ##VA_ARGS, func, _
^
address_parser.c:388:17: note: in expansion of macro ‘log_debug’
log_debug("token i=%lld, component phrase membership=%lld\n", i, j);
^
log/log.h:26:55: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 4 has type ‘int64_t {aka long int}’ [-Wformat=]
#define log_debug(M, ...) do { if (0) fprintf(stderr, "\33[34mDEBUG\33[39m " M " \33[90m at %s (%s:%d) \33[39m\n", ##VA_ARGS, func, _
^
address_parser.c:388:17: note: in expansion of macro ‘log_debug’
log_debug("token i=%lld, component phrase membership=%lld\n", i, j);
^
log/log.h:26:55: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘int64_t {aka long int}’ [-Wformat=]
#define log_debug(M, ...) do { if (0) fprintf(stderr, "\33[34mDEBUG\33[39m " M " \33[90m at %s (%s:%d) \33[39m\n", ##VA_ARGS, func, _
^
address_parser.c:394:9: note: in expansion of macro ‘log_debug’
log_debug("token i=%lld, null component phrase membership\n", i);
gcc -DHAVE_CONFIG_H -I.. -I/usr/local/include -Wfloat-equal -Wpointer-arith -DLIBPOSTAL_DATA_DIR='"/home/omer/libpostal/data/libpostal"' -g -g -O2 -O3 -MT build_numex_table-numex.o -MD -MP -MF .deps/build_numex_table-numex.Tpo -c -o build_numex_table-numex.o `test -f 'numex.c' || echo './'`numex.c
In file included from collections.h:8:0,
from numex.h:14,
from numex.c:3:
numex.c: In function ‘numex_table_read’:
log/log.h:26:55: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘uint64_t {aka long unsigned int}’ [-Wformat=]
#define log_debug(M, ...) do { if (0) fprintf(stderr, "\33[34mDEBUG\33[39m " M " \33[90m at %s (%s:%d) \33[39m\n", ##VA_ARGS, func, _
^
numex.c:424:5: note: in expansion of macro ‘log_debug’
log_debug("read num_languages = %llu\n", num_languages);
^
log/log.h:26:55: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘uint64_t {aka long unsigned int}’ [-Wformat=]
#define log_debug(M, ...) do { if (0) fprintf(stderr, "\33[34mDEBUG\33[39m " M " \33[90m at %s (%s:%d) \33[39m\n", ##VA_ARGS, func, _
^
numex.c:446:5: note: in expansion of macro ‘log_debug’
log_debug("read num_rules = %llu\n", num_rules);

@albarrentine
Contributor

There's an extra space in --datadir=$(realpath $(pwd)) /data. I think you want --datadir=$(realpath $(pwd))/data instead.
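
The failure comes from ordinary shell word splitting: with the space, configure receives /data as a separate argument, which it then reports as an invalid host type. A small sketch of the difference, using a hypothetical count_args helper and a stand-in path:

```shell
# count_args: print how many arguments were passed (hypothetical helper for illustration).
count_args() {
    echo "$#"
}

base=/tmp/libpostal   # stand-in for $(realpath $(pwd))

# With the stray space, "/data" becomes its own argument.
with_space=$(count_args --datadir=$base /data)
# Without it, the path stays part of the --datadir value.
without_space=$(count_args --datadir=$base/data)

echo "with space: $with_space argument(s)"
echo "without space: $without_space argument(s)"
```

The first call sees two arguments and the second sees one, which is why only the second form sets the data directory to the intended path.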

It looks like everything compiled properly. The main thing to check is: du -h $(realpath $(pwd))/data and make sure that it's 2.2G. If not, try rm-ing the data dir and run make again. Toward the end of the make command there should be a line like ./libpostal_data download all $YOUR_DATA_DIR. If something went wrong when downloading the data files, that's where to look. If all else fails, nuke the entire checkout and start from a fresh clone.
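
The size check can also be scripted. This is a rough sketch that assumes the approximate 2.2G figure above; data_dir_populated, DATADIR, and the threshold are illustrative names and values, not part of libpostal.

```shell
# data_dir_populated DIR MIN_KB: succeed if DIR exists and holds at least MIN_KB kilobytes.
# (Hypothetical helper for illustration.)
data_dir_populated() {
    [ -d "$1" ] || return 1
    # du -sk prints one summary line: size in kilobytes, then the path.
    used_kb=$(du -sk "$1" | awk '{ print $1 }')
    [ "$used_kb" -ge "$2" ]
}

datadir=${DATADIR:-./data}     # assumed location; use the path given to --datadir
min_kb=$((2000 * 1024))        # a little under 2.2G to allow for rounding

if data_dir_populated "$datadir" "$min_kb"; then
    echo "data files look complete"
else
    echo "data directory is missing or too small; the download probably failed"
fi
```

A directory that holds only a few kilobytes, as in a checkout where the download never ran, falls into the second branch.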

@TheLegend29
Author

I think you're onto something with the 2.2G. The data directory is stuck at the sizes shown below, both before and after retrying. I've also tried the whole process from scratch a few times with the same outcome (the added realpath doesn't appear to make a difference). I'm not sure why the 2.2G isn't downloading; there's no folder limit or read/write issue associated with the datadir. Sorry for the lengthy back and forth trying to get this to work.
8.0K /home/user/libpostal/data/address_parser
20K /home/user/libpostal/data/libpostal
8.0K /home/user/libpostal/data/geonames
8.0K /home/user/libpostal/data/transliteration
8.0K /home/user/libpostal/data/numex
8.0K /home/user/libpostal/data/geodb
8.0K /home/user/libpostal/data/address_expansions
72K /home/user/libpostal/data

@albarrentine
Contributor

Try running ./src/libpostal_data download all $(pwd)/data and see what happens.

@TheLegend29
Author

I get the following output -- aren't these the files that should be in data not in source?

Checking for new libpostal data file...
./src/libpostal_data: 108: ./src/libpostal_data: curl: not found
libpostal data file up to date
Checking for new libpostal geodb data file...
./src/libpostal_data: 108: ./src/libpostal_data: curl: not found
libpostal geodb data file up to date
Checking for new libpostal parser data file...
./src/libpostal_data: 108: ./src/libpostal_data: curl: not found
libpostal parser data file up to date
Checking for new libpostal language classifier data file...
./src/libpostal_data: 108: ./src/libpostal_data: curl: not found
libpostal language classifier data file up to date

It looks like it's saving in src. Looking at the sizes, is this the expected behavior?

12K	./src/murmur
44K	./src/utf8proc/.deps
516K	./src/utf8proc/.libs
7.2M	./src/utf8proc
96K	./src/sparkey/.deps
752K	./src/sparkey/.libs
1.5M	./src/sparkey
20K	./src/cmp/.deps
188K	./src/cmp/.libs
1.5M	./src/cmp
8.0K	./src/log
20K	./src/geohash/.deps
52K	./src/geohash/.libs
320K	./src/geohash
64K	./src/klib
12K	./src/linenoise/.deps
156K	./src/linenoise
1.6M	./src/.deps
42M	./src/.libs
230M	./src/

@albarrentine
Contributor

curl is not installed. Run apt-get install curl, delete the datadir, and run make again.
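
Because the download script printed "up to date" even though curl was missing, it may help to check for the tool before running make. A minimal sketch; have_tool is a hypothetical helper, not something libpostal provides:

```shell
# have_tool NAME: succeed if NAME is found on PATH (hypothetical helper).
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool curl; then
    echo "curl found; the data download step should be able to run"
else
    echo "curl not found; install it (for example: sudo apt-get install curl)" >&2
fi
```

The same helper can be reused for any other build tool the setup depends on.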

@TheLegend29
Author

Oh wow, that was a silly oversight. Now it works. That likely means this next problem is also silly, but why do I get a similar error when I try to bring it into Python? I've done the following:

cd libpostal/scripts
python3 setup.py build_ext --inplace
cd /home/user/.local/lib/python3.5/site-packages/
nosetests postal/tests

I get the same kind of error (dir=(null)), both in the nose tests and when I try to import in Python:

ERR   Error loading transliteration module, dir=(null)
   at libpostal_setup_datadir (libpostal.c:1069) errno: No such file or directory
ERR   Error loading transliteration module, dir=(null)
   at libpostal_setup_datadir (libpostal.c:1069) errno: No such file or directory
EE
======================================================================
ERROR: Failure: SystemError (initialization of _expand raised unreported exception)

@albarrentine
Contributor

Those are not the Python bindings, these are. Try: pip3 install postal

@TheLegend29
Author

Yup, that's what I did before the above. However, when I run it in Python I end up with the following error. I thought it was like the error we just solved, where libpostal didn't get installed properly, but now I'm confused since it works outside of Python.

from postal.expand import expand_address
ERR Error loading transliteration module, dir=(null)
at libpostal_setup_datadir (libpostal.c:1069) errno: No such file or directory
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/omer/.local/lib/python3.5/site-packages/postal/expand.py", line 5, in <module>
from postal import _expand
SystemError: initialization of _expand raised unreported exception

@albarrentine
Contributor

It seems like there were some non-standard steps taken during the install, so it may be in a weird half-installed state. Try a clean build now that curl is installed and the data files can be downloaded using the exact steps from the README for Ubuntu.
