
Deadlock on parallel call to spatial.addNodes #355

Open
cormander opened this issue Feb 23, 2019 · 2 comments

@cormander

If two separate queries end up making an addNodes call at the same time, I get this:

Neo4jError: Failed to invoke procedure spatial.addNodes: Caused by: org.neo4j.kernel.DeadlockDetectedException: ForsetiClient[1] can't acquire ExclusiveLock{owner=ForsetiClient[2]} on NODE(200100), because holders of that lock are waiting for ForsetiClient[1]. Wait list:ExclusiveLock[Client[2] waits for [1]]

That's the node connected by RTREE_METADATA to the layer node:

neo4j> match (a) where ID(a) = 200100 return a;
+----------------------------------------------------------+
| a                                                        |
+----------------------------------------------------------+
| ({maxNodeReferences: 100, totalGeometryCount: 47045060}) |
+----------------------------------------------------------+

I altered the query to hold a lock on the spatial_root ReferenceNode:

match (a:ReferenceNode {name:"spatial_root"}) with collect(a) as lock call apoc.lock.nodes(lock) ... call spatial.addNodes ...

As a result I see the deadlock exception much less often, but I still see it sometimes. Parallel processing is important for doing very large imports into the graph.
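
For reference, the full shape of the import query is roughly this (the 'geom' layer name and the :Place label are placeholders for my real ones):

match (r:ReferenceNode {name: "spatial_root"})
with collect(r) as lock
call apoc.lock.nodes(lock)
match (p:Place)                          // placeholder for the nodes being imported
with collect(p) as nodes
call spatial.addNodes('geom', nodes) yield count
return count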

Any thoughts? Thanks,

@craigtaverner
Contributor

The original spatial library was written in the context of low-concurrency embedded applications. This means that several parts, including the RTree, are not thread-safe. It is not recommended to run parallel bulk imports into the RTree.

The particular issue you are seeing is likely related to the way the total counts are maintained, which is not a good design and something we would like to fix. Even once that is fixed, though, the overall lack of thread safety in the RTree will remain, so the risks of parallel imports would still need to be addressed.

If you are only importing point data, you could use a different index, such as a Hilbert curve or geohash over Lucene. However, Lucene is known to perform badly for concurrent reads and writes, so you could face a different set of performance problems, depending on your usage scenario. If you work only with points, the best option by far would be to use Neo4j's built-in spatial index and avoid this library's indexing altogether.
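
Roughly, the built-in approach looks like this (the :PointOfInterest label and location property are just examples):

// the index on the point property is handled natively by Neo4j (3.4+)
create index on :PointOfInterest(location);
create (:PointOfInterest {name: 'example', location: point({latitude: 59.33, longitude: 18.06})});
// distance queries use the native index, no spatial layer involved
match (p:PointOfInterest)
where distance(p.location, point({latitude: 59.33, longitude: 18.06})) < 1000
return p;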

If you have points in one layer and complex geometries like polygons in another, you could actually use the native Neo4j point index for the points and the spatial library for the polygons. The main consequence would be that you would have two quite different spatial models in place, but it could be an option for avoiding the concurrency problems if the high-volume data are the points.
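
As a sketch of that split (layer name and geometry are only examples): the polygons go into a WKT layer managed by this library, while the point nodes stay on the native index as above.

// polygons managed by neo4j-spatial in a WKT layer
call spatial.addWKTLayer('regions', 'wkt');
call spatial.addWKT('regions', 'POLYGON((15.3 60.2, 15.3 60.4, 15.7 60.4, 15.7 60.2, 15.3 60.2))');
// query the polygon layer, e.g. which regions contain a given point
call spatial.intersects('regions', 'POINT(15.5 60.3)') yield node return node;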

@cormander
Author

Hi Craig!

I do already use the native point index, and also the brand new NativePointEncoder to reference them with complex polygons.

So what I'm hearing is that parallel execution is unsupported and can't be supported?

Any thoughts on why things didn't get completely solved by holding the apoc lock on the spatial root? Perhaps that doesn't behave quite like I expect?

Perhaps related:

When I moved my Neo4j storage from a traditional HDD to an M.2 SSD (so the write speed increased by roughly 10x), I noticed that on a fresh database my app's startup sometimes failed on the second call to spatial.addLayerWithEncoder. The two calls happen one after the other, not at the same time, and my "solution" was to add a one-second sleep between them. After the error there was more than one ReferenceNode with the name "spatial_root".

Perhaps there’s something not holding a file system lock properly?
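
In the meantime I'm considering replacing the sleep with a guard that only creates the layer when it doesn't already exist, something like this (the layer name and property are placeholders, and it assumes APOC's apoc.do.when is available):

call spatial.layers() yield name
with collect(name) as existing
call apoc.do.when(
  not 'points' in existing,
  "call spatial.addLayerWithEncoder('points', 'NativePointEncoder', 'location') yield node return node",
  "return null as node",
  {}
) yield value
return value.node;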
