non-deterministic segfault #6
Comments
I'll also add that I've tried compiling with both clang v3.4.1 and gcc v4.9. Both exhibit the same problem.
So far I haven't been able to reproduce this on Windows 8.1 or CentOS 6.5 amd64 in more than 1,000 iterations. Can you follow Ian's instructions to see what is at PC=0x40e52e? We need to find out where in the sqlite code the crash happens. Feel free to send me the core dump.
Here is a full backtrace:
And here is another one I got that is different :/
What does It's good that the crash happens in the same location, so we at least know that it's something related to the lookaside memory allocator. The two pointers in question are db->lookaside and pBuf, and I'm guessing that it's the latter that is referring to an invalid (non-zero) address. What happens if you build the package with SQLITE_OMIT_LOOKASIDE?
Also FWIW - the post-process used your interface directly. I rewrote it to use database/sql from the Go standard library with your driver as the backend, and it seems to be stable that way.
the old segfaulting post-process is now on in my repo
drh suggests building the package with SQLITE_OMIT_LOOKASIDE and then running the program under valgrind: http://marc.info/?l=sqlite-users&m=140269886427294&w=2
Unfortunately, valgrind doesn't seem to work with Go: https://code.google.com/p/go/issues/detail?id=782
I don't have any good ideas at the moment and I'll be gone for the entire weekend. If you don't figure it out before then, I'll take another look next week. I don't see any reason why switching to the database/sql driver would solve the problem.
My machine is Linux Mint in VirtualBox, on a Windows 7 host. I downloaded rwcarlsen/cyan, commit 5a27279d4a36d094c424662c4296b4364be672b6, of June 5, 2014. I ran it for 3000 iterations, and there was no crash at all. Unfortunately, I cannot reproduce the problem.
I have run the program again, with the following flags in sqlite.go:
Again, with 1300 iterations, no crash happened. Like you, I think that the problem occurs in pBuf, that is referring to an invalid address. db->lookaside.pFree is only updated by sqlite3DbMallocRaw() and sqlite3DbFree(). The only way I see to corrupt db->lookaside.pFree is that a slot has been written to after being freed. This overwriting will corrupt the first bytes of the slot, which contain the pointer pNext that points to the next free slot. To check if a slot in the free list has been overwritten, I added these two checking functions in sqlite.c.
I inserted check_lookaside_freelist() in the sqlite3DbFree() function, and check_lookaside_slot_is_not_overwritten() in sqlite3DbMallocRaw(), to check that the returned buffer has not been overwritten. In setupLookaside(), which initializes the lookaside allocator, I also put this line in the loop that fills the free list with the 500 slots available in the allocator's memory block, to fill the slots with the 0xaa bit pattern:
I ran the program again and, unfortunately, there is no crash. I don't detect any write-after-free operation :-( I also put runtime.GC() in the cycpost program, as the GC may run some finalizers in the sqlite driver, but I am still unable to reproduce the issue.
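The poisoning idea described above can be illustrated with a toy allocator in Go. This is only a sketch of the general technique, not the C functions that were added to sqlite.c: the `pool` type, its methods, and the 16-byte slot size are all hypothetical, and unlike SQLite's lookaside allocator the free list here is kept outside the slots so that every byte of a free slot can carry the 0xaa pattern.

```go
package main

import "fmt"

const (
	slotSize = 16   // bytes per slot (hypothetical, not SQLite's real slot size)
	poison   = 0xaa // pattern written into every free slot
)

// pool is a toy fixed-size slot allocator. Free slots are tracked by index
// in a separate stack, so a free slot consists of poison bytes only.
type pool struct {
	mem  []byte
	free []int
}

// newPool creates n poisoned slots and puts all of them on the free list.
func newPool(n int) *pool {
	p := &pool{mem: make([]byte, n*slotSize)}
	for i := n - 1; i >= 0; i-- {
		p.poisonSlot(i)
		p.free = append(p.free, i)
	}
	return p
}

func (p *pool) slot(i int) []byte { return p.mem[i*slotSize : (i+1)*slotSize] }

func (p *pool) poisonSlot(i int) {
	s := p.slot(i)
	for j := range s {
		s[j] = poison
	}
}

// alloc pops a slot from the free list. It panics if the pool is empty,
// which is acceptable for a sketch.
func (p *pool) alloc() (int, []byte) {
	i := p.free[len(p.free)-1]
	p.free = p.free[:len(p.free)-1]
	return i, p.slot(i)
}

// release re-poisons the slot and returns it to the free list.
func (p *pool) release(i int) {
	p.poisonSlot(i)
	p.free = append(p.free, i)
}

// checkFree returns the index of the first free slot whose poison pattern
// was disturbed (a write after free), or -1 if every free slot is intact.
func (p *pool) checkFree() int {
	for _, i := range p.free {
		for _, b := range p.slot(i) {
			if b != poison {
				return i
			}
		}
	}
	return -1
}

func main() {
	p := newPool(4)
	i, buf := p.alloc()
	p.release(i)
	fmt.Println("corrupted slot before stale write:", p.checkFree())
	buf[0] = 1 // write through a stale reference after the slot was freed
	fmt.Println("corrupted slot after stale write:", p.checkFree())
}
```

A check like this only catches a stale write if it happens before the slot is handed out again, which may be one reason a rare race could slip past it.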
@rwcarlsen
You said that with SQLITE_OMIT_LOOKASIDE, it is stable.
Forget my previous messages, I think the cause is the Stmt finalizer.
I can reproduce the error. It is a race condition between Stmt finalizer dedicated goroutine and the main goroutine. You can modify sqlite3.c to boost this improbable event.
When you run
the error should occur within 2 seconds, because the finalizers run at the beginning of the program.
I obtain:
See also:
With this option, I no longer get a crash.
Thanks for looking into this! SQLITE_THREADSAFE=1 will reduce the performance in the common case, so I think removing finalizers would be the better option. I added them just as an extra safety net, but I didn't consider the fact that they could run in parallel with other calls on the same connection object. Are there any good reasons for leaving them in?
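To make the "run in parallel" point concrete, here is a small Go sketch showing that a finalizer fires without the calling goroutine ever invoking it. The Go runtime runs finalizers on a goroutine of its own, so any cleanup done there can overlap with calls the main goroutine is making on the same connection. The `stmt` type and `finalizerRan` function are hypothetical and do not come from the driver, and Go does not guarantee that a finalizer runs at all, hence the timeout.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// stmt stands in for a driver statement handle (hypothetical type).
type stmt struct{ text string }

// finalizerRan reports whether a finalizer fired while the calling goroutine
// did nothing except trigger garbage collections and wait.
func finalizerRan() bool {
	done := make(chan string, 1)
	func() {
		s := &stmt{text: "SELECT 1"}
		// The finalizer is invoked by the runtime, not by this goroutine.
		runtime.SetFinalizer(s, func(s *stmt) { done <- s.text })
	}() // s is unreachable once this function returns

	deadline := time.After(5 * time.Second)
	for {
		runtime.GC()
		select {
		case <-done:
			return true
		case <-deadline:
			return false // finalizers are not guaranteed to run
		case <-time.After(10 * time.Millisecond):
			// Not finalized yet: trigger another collection.
		}
	}
}

func main() {
	fmt.Println("finalizer ran on another goroutine:", finalizerRan())
}
```

If the finalizer body called into a connection that is not thread-safe instead of sending on a channel, it would be touching that connection at the same time as whatever the main goroutine happened to be doing.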
Hi, Maxim !
If you did not, it means that very few people do. It is proof that finalizers are dangerous beasts. Personally, I avoid using finalizers, and it seems I am not alone in this. I am using your driver in a long-running program, and just modified it a little bit to put a panic in the finalizer, e.g. panic("You have forgotten to close the Stmt object \"" + stmt.text + "\""). So, for the moment, I see only two possibilities:
In the end, finalizers will even hide huge memory consumption issues if a long-running program is, e.g., creating a lot of Stmts and forgets to close them, which is not good. If Stmts are created at a faster rate than finalizers close them, the situation is even worse and memory will explode.
Besides, if the user reads only a few records from a table and lets the Stmt go out of scope, the lock on the database will not be released until the finalizer runs.
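A sketch of the "explicit Close, finalizer only reports" approach discussed above, in Go. The `Stmt` type, `Prepare`, and the `onLeak` callback are hypothetical stand-ins and not the driver's real API. The finalizer never touches the underlying handle, it only reports the statement text, and a callback is used here in place of the panic mentioned above so that the sketch does not crash the program.

```go
package main

import (
	"fmt"
	"runtime"
)

// Stmt is a hypothetical stand-in for a prepared statement.
type Stmt struct {
	text   string
	closed bool
}

// Prepare returns a Stmt whose finalizer only reports a leak through onLeak.
// It performs no cleanup, so it cannot race with calls on the connection.
func Prepare(text string, onLeak func(text string)) *Stmt {
	s := &Stmt{text: text}
	runtime.SetFinalizer(s, func(s *Stmt) { onLeak(s.text) })
	return s
}

// Close releases the statement on the caller's goroutine. Calling it more
// than once is a no-op.
func (s *Stmt) Close() error {
	if s.closed {
		return nil
	}
	s.closed = true
	// A properly closed Stmt should not be reported, so drop the finalizer.
	runtime.SetFinalizer(s, nil)
	// A real driver would release the underlying handle here.
	return nil
}

func main() {
	s := Prepare("SELECT 1", func(text string) {
		fmt.Println("leaked Stmt, Close was never called:", text)
	})
	defer s.Close() // cleanup runs on this goroutine, at a known point
	fmt.Println("using statement:", s.text)
}
```

With `defer s.Close()` the statement and any database lock it holds are released as soon as the function returns, rather than whenever the garbage collector gets around to it.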
I've been using your driver to do some post-processing analysis on various sqlite databases I have. The analysis involves many queries, table creation, insertions, etc. To the best of my knowledge, there is no concurrency in the post-process.
To reproduce:
cp orig-segfault.sqlite segfault.sqlite; cycpost segfault.sqlite
over and over - eventually you get a segfault (sample output below). I usually get a segfault after a couple hundred runs. Since cycpost modifies the database in-place, you need to keep a fresh, un-post-processed copy of the db around.
Other info:
Sample failure output: