-
Notifications
You must be signed in to change notification settings - Fork 677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IteratorSpliterator returns estimateSize which violates Spliterator's contract breaking parallel stream processing #3027
Comments
In the case of MongoDB we're adopting Have you tried to spike on parallelization by adopting a |
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed. |
Thanks for the hint, returning Just wondering if that is documented anywhere that the returned For reference, below is the code I used to test this. This is Kotlin code using a Repository method (if you'd use the magic @Query("{}")
fun all(): Iterable<Item> Using it: fun testParallelItemProcessing() {
StreamSupport.stream(itemDao.all().spliterator(), true).use { stream ->
val sum = stream
.map {
val size = it.size
logger.info { "Item has size $size" }
size
}
.reduce(Int::plus)
.orElse(0)
logger.info { "Total size: $sum" }
}
} Logging shows that multiple threads are involved in 'processing' the items. |
Just realized that by using Does that mean that I can EITHER process results in chunks OR process results in parallel? |
|
So the answer is I can only get both when using the Reactive API? This would mean to setup a parallel reactive set of code (reactive MongoDB connection, reactive repository, reactive access & transformations). That's something we may need to consider 🤔 |
For the time being, this is correct. Would you mind reaching out to the MongoDB team asking to make their |
Not sure if I'm missing something, but it looks like elements from a stream returned from Spring Data (in my case it's a
MongoRepository
) will never be processed in parallel even if I instruct the stream to do so by converting it to a parallel stream.After some debugging and digging into the details, it looks like the reason is that
IteratorSpliterator.estimateSize()
does return-1
and not a proper value as indicated inSpliterator
's JavaDoc. This causes the fork-join implementation to never fork.JavaDoc of
Spliterator.estimateSize()
:Is this simply something that was missed or does this have a particular reason?
The text was updated successfully, but these errors were encountered: