Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add full support for fetch size in JPA #696

Open
jborgers opened this issue Jan 14, 2025 · 21 comments
Open

Add full support for fetch size in JPA #696

jborgers opened this issue Jan 14, 2025 · 21 comments

Comments

@jborgers
Copy link

jborgers commented Jan 14, 2025

As a Java performance tuner, I like to have full fetch size support in JPA, just like Hibernate API offers with:
Query setFetchSize(int fetchSize)

Obviously, for thousands of rows, fetching all in one round trip in stead of 10 at a time is much faster.

The current way with a hint is lousy and difficult to use. Gavin agrees that is sucks;-)

@jborgers
Copy link
Author

jborgers commented Jan 14, 2025

In my Java performance trainings, I explain the working of jdbc fetch size. And people then ask: why is it so difficult in JPA? And I don't have an answer..

@gavinking
Copy link
Contributor

Correct me if I'm wrong, but is it the case that essentially every JDBC driver except for Oracle defaults to retrieving all rows (which is typically the right thing for JPA), but that Oracle has an absurdly low default fetch size of 10?

@gavinking
Copy link
Contributor

Oh, from my experiments it looks like SQL Server defaults to 128.

@gavinking
Copy link
Contributor

And people then ask: why is it so difficult in JPA?

So I guess my question in return is: why doesn't it work to just set this globally using hibernate.jdbc.fetch_size or whatever the equivalent thing in EclipseLink is? I guess I would just set this once to a very large number and then I would not usually need to change it at the level of an individual Query.

@jborgers
Copy link
Author

jborgers commented Jan 15, 2025

Good question. The thing is, in case you don't need all the result rows, you are over-allocating memory and over-fetching rows. And with that setting, you do that for all the queries in your app/service. Say, you do pagination by the user of 25 rows on a page, this can be implemented in Hibernate nicely and efficiently with:

Query q = session.getNamedQuery("yourQuery");
 q.setFirstResult(getOffset());
 q.setMaxResults(getResultsPerPage());
 q.setFetchSize(getResultsPerPage());
 return q.list();

This will be quick. In case of thousands of rows, and a very large fetch size, the first page will take much longer to load and unnecessarily with wasteful processing, in case the user just chooses to see one page.

We have documented that here: too-many-returned-rows-or-roundtrips (feedback welcome)

@gavinking
Copy link
Contributor

gavinking commented Jan 15, 2025

The thing is, in case you don't need all the result rows, you are over-allocating memory and over-fetching rows.

Well, no, that's not right.

  1. JPA fully-instantiates the a List with the complete results of the query anyway. Setting a smaller JDBC fetch size isn't going to change the size of the result list. It's only going to increase the number of round trips. It can only make things worse, never better.
  2. In JPA we control the size of the result list using setMaxResults(). We don't use the JDBC fetch size for that.

Essentially, in any multi-user system, we almost always want to avoid leaving resources open on the database side after we return from a round trip.

This will be quick. In case of thousands of rows, and a very large fetch size, the first page will take much longer to load and unnecessarily with wasteful processing, in case the user just chooses to see one page.

Nononononononnonono that's not how pagination works at all!

We use offset and limit or fetch. We don't leave a cursor open!

@gavinking
Copy link
Contributor

feedback welcome

When you read rows from a ResultSet, under the hood JDBC will fetch 10 rows at a time

AFAICT, this is only true for Oracle, FTR.

Be aware! that setFetchSize and setMaxResults are still needed because internally Hibernate calls q.list() without these optimizations. Adding these optimizes for the only valid number of results: 1 and it protects against memory overload of wrong queries while still allowing the exception to be thrown with maxResults.

Nonononono that's just not right.

  1. If you call uniqueResult() or getSingleResult(), and there is more than one object returned, that would be an error, and Hibernate would throw an exception, and you don't need to optimize that case, you need to fix your broken query!
  2. Worse, if your root entity has collections that you're eagerly fetching using join fetch or whatever, setting the JDBC fetch size to 2 will make the query much slower, not faster!

You do not, ever need to set a fetch size when calling uniqueResult() or getSingleResult(), and if you do you can only possibly harm performance, never improve it.

However, setting the fetch size for each query individually is the better solution

No, that's bad advice except in very special cases.

The only case I can think of where this advice is correct is if you're using ScrollableResults or getResultStream(), but my strong advice is to steer well away from those unless you really know what you're doing.

@gavinking
Copy link
Contributor

Look, if were so important to futz about directly with the fetch size for every query, there's no way that JPA would have made it through 20 years of existence without us adding it and without anyone even opening a feature request for it until now.

I'm not saying "no" to the feature, because, well, Hibernate has it obviously, and even though it's not much used, I'm sure the need for it arises occasionally. (Apparently we use it in Hibernate Search for batch processing entities, for example.)

But it sounds like all you need is to set hibernate.jdbc.fetch_size to some sensible number (much bigger than 10).

@jborgers
Copy link
Author

jborgers commented Jan 15, 2025

Thanks for the feedback! Clarifying, I will update the documentation.

For uniqueResult() or getSingleResult() we had a case of a broken query which caused an out of memory error because of the q.list() call, and we didn't get to see the hibernate exception. This is something we don't like in production. But I agree that setting fetch size is also not a proper solution.

Wouldn't the implementation be improved by using setMaxResults()?

@gavinking
Copy link
Contributor

gavinking commented Jan 15, 2025

We had a case of a broken query which caused an out of memory error because of the q.list() call, and we didn't get to see the hibernate exception

It would have to be a pretty massive result set before you would get an OOME!

And how is setting the JDBC fetch size going to help with that? Hibernate is still going to eagerly read all the rows from the database, because Hibernate doesn't know until after it's materialized the whole object graph that you have more than one root entity in it.

Wouldn't the implementation be improved by using setMaxResults()?

That would break any query which fetches a collection.

In principle we could append an unnecessary limit clause if the query doesn't fetch a collection, even though you've just told us that the query returns one root entity. But, well, no. Yuck, Let's not do that. :-)

If you want to call setMaxResults(1), you can, I can't stop you, but it's much better to test your queries properly at development time instead of letting them do arbitrary things which might blow up in production.

@gavinking
Copy link
Contributor

Look, honestly, it really looks to me like you're overthinking this.

People get into trouble when they think they can do something better/different to what has already been working perfectly well in millions of programs for more than two decades.

@jborgers
Copy link
Author

jborgers commented Jan 15, 2025

Okay, thanks for your remarks.
So, basically you are saying:

  • Setting fetch size makes sense especially for Oracle because it's default is so ridiculously low: 10. A more sensible value would be 128 (Sql Server) or higher.
  • Since typically (in a multi-user system) you create and return the whole list (or limited/maxed) of results of a query to the user, and close the cursor, there is no need for a fetch size smaller than all (or limited/maxed) query result rows to save data-fetching or memory (case: pagination by the user)
  • Easy solution would be to just set hibernate.jdbc.fetch_size to a high value
  • Only in exceptional cases like batch processing with huge numbers of rows, it may be useful to use ScrollableResults or getResultStream() and set fetch size per query to control and optimize in detail.
  • Hibernate will not save you from broken queries, make sure to test properly. In case of the risk of too many result rows, even when it is supposed to be 1, use setMaxResults() to safeguard.

Do you agree?

@gavinking
Copy link
Contributor

Yep, exactly.

(None of that means we shouldn't provide a Query.setFetchSize() method in JPA; I'm only saying why it's rarely used.)

@jborgers
Copy link
Author

Ah, that's a quick response :-) I just updated it.

@jborgers
Copy link
Author

Note that DB2 jdbc driver seems to have a default fetch size of 32.

@gavinking
Copy link
Contributor

Note that DB2 jdbc driver seems to have a default fetch size of 32.

How did you determine this?

On Db2 LUW, Statement.getFetchSize(), PreparedStatement.getFetchSize(), and ResultSet.getFetchSize() all return 0 for me.

@jborgers
Copy link
Author

jborgers commented Jan 16, 2025 via email

@jborgers
Copy link
Author

jborgers commented Jan 16, 2025 via email

@gavinking
Copy link
Contributor

I found 32 here:

I mean, these are just some blogs from > 10 years ago. People post wrong information on blogs all the time.

db2 on mainframe

Yeah, well, again, I'm not sure, because that's a different product to Db2 LUW, so perhaps the defaults are different.

And this advice seems strange if the default fetch size is unlimited:

That's the documentation for InfoSphere Information Server you linked to. I have no idea what InfoSphere Information Server is, but I don't think it's Db2.

I think I'm going to just assume that the Db2 JDBC driver isn't lying to me and that the default is indeed 0 i.e. unlimited at least on LUW.

@jborgers
Copy link
Author

That sounds like a reasonable assumption :-)

DB2 on mainframe may be different indeed; with the app and db2 on the same LPAR, roundtrips are much cheaper than usual, on Linux.

@AleksNo
Copy link

AleksNo commented Jan 17, 2025

Maybe it is a good idea to collect very common options like fetchsize, batchsize, batchtype (IN, JOIN, Subquery, EXISTS etc.), cache size etc. and make them a part of the API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants