-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bulk Fetch : Low performance SQL are generated for Objects that are having lists of other objects #52
Comments
You mean BULK FETCH, that was introduced in DN 4.x, generates SQL to attempt to load objects now, whereas in earlier versions it simply ignored the users fetch plan and you had lazy loading always. If you think the SQL could/should be improved then you can kindly go to the code and contribute improvements. Here is the starting point |
Thank you for your fast response Andy. I will have a look on how this can be done better! |
No comment on how this can be improved, nor a test that demonstrates the problem, so closing. Feel free to comment here and it could be reopened, or raise a new issue with some specific improvement and pull request that provides it. |
The way to fix this is choosing JOIN over EXISTS (SELECT ..FROM). |
Actually not totally correct; the OP is talking about loading objects into memory, NOT the actual SQL invoked. No demonstration of this was presented. A comparison of different methods for bulk-fetching is available from the authors of EclipseLink, https://java-persistence-performance.blogspot.co.uk/2010/08/batch-fetching-optimizing-object-graph.html, and while JOIN actually comes out best with a simple query, it comes out poorest when the query it is utilised with becomes complex. Clearly if someone is particularly interested in this feature for their projects they can contribute time to provide the JOIN bulk-fetch implementation, as well as investigate what this issue is actually about. |
Unfortunately, I am a MySQL performance engineer and not a developer. Therefore, I am only able to analyse the SQL thats hitting the database and make recommendations on how to fix them. I can tell you that the vast majority of cases would benefit (sometimes greatly) from JOIN over EXISTS. In the event where you have many JOINs, this would be a more complicated issue that may be solved with better indexes, index hints... but it is more of a specialist issue at that point. |
What would be great to know if there is a queryHint or some annotation where we can help datanucleus decide whether to use a JOIN or an EXISTS and then we can see for ourselves if there is a performance improvement. |
As my comment above said, JOIN is not yet implemented; needs resource to do it and not a priority for me. So if your project/company needs it then it is there to do, and any comments relating to it should be put against the correct issue ... #171 and the query extension is documented at http://www.datanucleus.org/products/accessplatform_5_0/jdo/query.html#FetchPlan as datanucleus.rdbms.query.multivaluedFetch with values of "exists" and "none" currently. |
Description
For one to many relationship, after migrating to datanucleus 4 the JDO queries are having a very low performance. Datanucelus brings all sub-objects in memory and for tables with millions of entries this takes the CPU at 100%.
Model example
When fetching the activities without any filter all field changes are taken into the memory.
Generated SQL that is causing problems:
Note
The problem is present for defined fetch groups as well, not only for defaultFetchGroup = "true".
Used Workaround
Removed defaultFetchGroup = "true" and made lazy programatic fetching for this list. In this case filtered by parent key queries were generated when the field change items were fetched.
The text was updated successfully, but these errors were encountered: