Take steps towards thread safety #307

horgh · 2018-01-21T01:39:20Z

This makes libpostal_parse_address() and libpostal_expand_address() thread safe. The main change is to make them no longer mutate any shared memory (e.g., context).

The global loading and destruction calls are still not thread safe. I notice they were mentioned in #34. However, I think leaving them that way would be okay as multithreaded programs could do them at startup (or get an exclusive lock or something). Especially as they are quite expensive operations.

There is still more to do, and it's possible we will have to rework the approach, but I thought I would send this and get some feedback.

Some comments:

This will make the parsing calls more expensive since we have to allocate and free memory we did not before. I have not benchmarked this to see the impact. If we wanted to avoid this, we could possibly add alternate functions and keep the current ones as they are. For example, we could have a libpostal_parse_address() that takes an address_parser_context_t from the caller and uses that.
To be completely thread safe there are two more things that need to be done (possibly more, but this is what I'm aware of):
1. The use of strerror() in the logging calls should be replaced as that is not thread safe. We could possibly use strerror_r() instead.
2. I have not been able to test the averaged_perceptron_tagger_predict() path in the parser. If there are instructions on loading that model, I can do that too. I've commented one spot that makes it not thread safe.
I'm not sure if the Windows build will work with the new test program I added as it uses pthreads. If it fails, we can probably exclude it when building on Windows.
I tested that there were no threading issues using the new program I added. To do that, I compiled with Clang's thread sanitizer:
- export CC=clang-5.0
- export CFLAGS=-fsanitize=thread
- Rebuild everything
- Run the program with multiple threads and a bunch of lookups. I tested using -t 4 -i 1000. As input I used the sample inputs in the README under "examples of normalization". If there were thread safety issues, the sanitizer would print them out.

This tests both parsing and expansion.

This is primarily so we can do things like CFLAGS=-fsanitize=thread. We can't compile src with it and test without it.

I found this was leading to accessing invalid memory and I suspect a segfault on Travis. I cannot see how the changes so far caused this invalid access.

horgh · 2018-01-21T05:04:50Z

Regarding the commit de-optimizing the matrix exp() function. It is weird, but based on my debugging, it was causing invalid access. Valgrind output from the test_crf_context.c test before changing that:

==7662== Invalid read of size 16
==7662==    at 0x412A85: remez9_0_log2_sse (vector_math.h:372)
==7662==    by 0x412A85: double_array_exp (collections.h:48)
==7662==    by 0x412A85: double_matrix_exp (matrix.h:445)
==7662==    by 0x412A85: crf_context_exp_trans (crf_context.c:341)
==7662==    by 0x412A85: test_crf_context (test_crf_context.c:107)
==7662==    by 0x412A85: libpostal_crf_context_tests (test_crf_context.c:266)
==7662==    by 0x401C20: greatest_run_suite (test.c:11)
==7662==    by 0x401943: main (test.c:23)
==7662==  Address 0x952c0f0 is 8 bytes after a block of size 72 alloc'd
==7662==    at 0x4C31E76: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7662==    by 0x4C31F91: posix_memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7662==    by 0x40E3C5: _aligned_malloc (vector.h:15)
==7662==    by 0x40E3C5: double_matrix_new_aligned (matrix.h:445)
==7662==    by 0x40E3C5: crf_context_new (crf_context.c:67)
==7662==    by 0x411D49: test_crf_context (test_crf_context.c:31)
==7662==    by 0x411D49: libpostal_crf_context_tests (test_crf_context.c:266)
==7662==    by 0x401C20: greatest_run_suite (test.c:11)
==7662==    by 0x401943: main (test.c:23)
==7662==
==7662== Invalid write of size 8
==7662==    at 0x412C55: remez9_0_log2_sse (vector_math.h:476)
==7662==    by 0x412C55: double_array_exp (collections.h:48)
==7662==    by 0x412C55: double_matrix_exp (matrix.h:445)
==7662==    by 0x412C55: crf_context_exp_trans (crf_context.c:341)
==7662==    by 0x412C55: test_crf_context (test_crf_context.c:107)
==7662==    by 0x412C55: libpostal_crf_context_tests (test_crf_context.c:266)
==7662==    by 0x401C20: greatest_run_suite (test.c:11)
==7662==    by 0x401943: main (test.c:23)
==7662==  Address 0x952c0e8 is 0 bytes after a block of size 72 alloc'd
==7662==    at 0x4C31E76: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7662==    by 0x4C31F91: posix_memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7662==    by 0x40E3C5: _aligned_malloc (vector.h:15)
==7662==    by 0x40E3C5: double_matrix_new_aligned (matrix.h:445)
==7662==    by 0x40E3C5: crf_context_new (crf_context.c:67)
==7662==    by 0x411D49: test_crf_context (test_crf_context.c:31)
==7662==    by 0x411D49: libpostal_crf_context_tests (test_crf_context.c:266)
==7662==    by 0x401C20: greatest_run_suite (test.c:11)
==7662==    by 0x401943: main (test.c:23)

I can't reproduce it on master which is very suspicious. But I've been staring at the code and I can't see what change could have impacted it.

horgh · 2018-01-21T05:29:46Z

I see what caused it: The change to CFLAGS in test/Makefile.am. Without that change there is no invalid memory access. Looks like it is because we use the unoptimized version:

#else
#define VECTOR_INIT_NUMERIC_DOUBLE VECTOR_INIT_NUMERIC_FLOAT
#endif

Possibly the optimized version expects a certain multiple of values in the matrix. In that case, we could re-add the optimization if we update the test to respect that. We may want to check other uses to ensure they respect it too though.

horgh added 14 commits January 20, 2018 13:31

Comment a couple improvements

3f639aa

Add a program to test multithreaded use

c1028e1

This tests both parsing and expansion.

Respect CFLAGS in tests

4236c72

This is primarily so we can do things like CFLAGS=-fsanitize=thread. We can't compile src with it and test without it.

Make error message refer to correct model

d66bf10

Do not share address parser context between lookups

f5bed24

Make crf model use thread safe

287d45a

Comment a thread safety issue

2d46547

Add comment about use after free / double free

5442abf

Comment a couple more improvements in the parser

1c29ce4

Add comment about a thread safety issue

8a61600

Update trainer to be aware of fields changed for thread safety

0b5c619

Build thread_test only if json-c is available

865fce4

Add check for json-c to windows configure.ac

890a911

Deoptimize matrix double exp() function

3935a2a

I found this was leading to accessing invalid memory and I suspect a segfault on Travis. I cannot see how the changes so far caused this invalid access.

NickCrews mentioned this pull request Dec 13, 2023

Notes on global context Maxxen/duckdb-postal#1

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Take steps towards thread safety #307

Take steps towards thread safety #307

horgh commented Jan 21, 2018

horgh commented Jan 21, 2018

horgh commented Jan 21, 2018

Take steps towards thread safety #307

Are you sure you want to change the base?

Take steps towards thread safety #307

Conversation

horgh commented Jan 21, 2018

horgh commented Jan 21, 2018

horgh commented Jan 21, 2018