You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
What is the bug?
A clear and concise description of the bug.
In Opensearch 2.12.0:
When using GPU and initiating concurrent requests using neural retrieval, an ArrayIndexOutOfBoundsException exception was encountered.
I'm not sure if it's a concurrency issue, but what I can know is that a single request is successful, and exceptions only occur when there are concurrent requests.
[2024-03-28T19:30:27,355][WARN ][r.suppressed ] [opensearch-cluster_manager] path: /irp_index_vec/_search, params: {typed_keys=true, index=irp_index_vec}
org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:722) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:379) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.action.search.FetchSearchPhase.moveToNextPhase(FetchSearchPhase.java:298) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.action.search.FetchSearchPhase.lambda$innerRun$1(FetchSearchPhase.java:138) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.action.search.CountedCollector.countDown(CountedCollector.java:66) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.action.search.CountedCollector.onFailure(CountedCollector.java:85) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.action.search.FetchSearchPhase$2.onFailure(FetchSearchPhase.java:257) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:766) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.transport.TransportService$9.handleException(TransportService.java:1725) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:404) [opensearch-security-2.12.0.0.jar:2.12.0.0]
at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1511) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.transport.InboundHandler.lambda$handleException$5(InboundHandler.java:447) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:343) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.transport.InboundHandler.handleException(InboundHandler.java:445) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:437) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:170) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:127) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:770) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:175) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:150) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:115) [opensearch-2.12.0.jar:2.12.0]
at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95) [transport-netty4-client-2.12.0.jar:2.12.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) [netty-handler-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475) [netty-handler-4.1.106.Final.jar:4.1.106.Final]
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338) [netty-handler-4.1.106.Final.jar:4.1.106.Final]
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387) [netty-handler-4.1.106.Final.jar:4.1.106.Final]
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529) [netty-codec-4.1.106.Final.jar:4.1.106.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468) [netty-codec-4.1.106.Final.jar:4.1.106.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) [netty-codec-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.106.Final.jar:4.1.106.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.106.Final.jar:4.1.106.Final]
at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: org.opensearch.OpenSearchException$3: Index 495884 out of bounds for length 1
at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:708) ~[opensearch-core-2.12.0.jar:2.12.0]
at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:377) [opensearch-2.12.0.jar:2.12.0]
... 49 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 495884 out of bounds for length 1
at org.apache.lucene.util.SparseFixedBitSet.get(SparseFixedBitSet.java:129) ~[lucene-core-9.9.2.jar:9.9.2 a2939784c4ca60bc28bf488b5479c02fc2e5e22c - 2024-01-25 09:51:09]
at org.opensearch.search.fetch.FetchPhase.findRootDocumentIfNested(FetchPhase.java:283) ~[opensearch-2.12.0.jar:2.12.0]
at org.opensearch.search.fetch.FetchPhase.prepareHitContext(FetchPhase.java:299) ~[opensearch-2.12.0.jar:2.12.0]
at org.opensearch.search.fetch.FetchPhase.execute(FetchPhase.java:172) ~[opensearch-2.12.0.jar:2.12.0]
at org.opensearch.search.SearchService.lambda$executeFetchPhase$3(SearchService.java:782) ~[opensearch-2.12.0.jar:2.12.0]
at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74) ~[opensearch-2.12.0.jar:2.12.0]
at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89) ~[opensearch-2.12.0.jar:2.12.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.12.0.jar:2.12.0]
at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) ~[opensearch-2.12.0.jar:2.12.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.12.0.jar:2.12.0]
at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) ~[opensearch-2.12.0.jar:2.12.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913) ~[opensearch-2.12.0.jar:2.12.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.12.0.jar:2.12.0]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
... 1 more
Related component
Search
To Reproduce
Create neural retrieval concurrent requests
View cluster startup logs
See the error logs
Expected behavior
Neural Search is ok when used GPU.
Additional Details
Plugins
Please list all plugins currently enabled.
Screenshots
If applicable, add screenshots to help explain your problem.
Host/Environment (please complete the following information):
OS: Linux CentOS 7.9
Version: 2.12.0
Plugins: ml_commons
Additional context
Add any other context about the problem here.
The text was updated successfully, but these errors were encountered:
@lihuimingxs can you try removing the neural search query clause from the query and run the query with 5 concurrent request? Because what I can see is stack trace is from fetch phase this could be an issue in opensearch core. Because neural query doesn't touch fetch phase of Search.
Describe the bug
What is the bug?
A clear and concise description of the bug.
In Opensearch 2.12.0:
When using GPU and initiating concurrent requests using neural retrieval, an ArrayIndexOutOfBoundsException exception was encountered.
I'm not sure if it's a concurrency issue, but what I can know is that a single request is successful, and exceptions only occur when there are concurrent requests.
Number of concurrent requests: More than 5 times
Request:
Exception:
Related component
Search
To Reproduce
Expected behavior
Neural Search is ok when used GPU.
Additional Details
Plugins
Please list all plugins currently enabled.
Screenshots
If applicable, add screenshots to help explain your problem.
Host/Environment (please complete the following information):
Additional context
Add any other context about the problem here.
The text was updated successfully, but these errors were encountered: