Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No response from the database when same SQL statement is sent to the database multiple times #658

Open
spc16670 opened this issue Jul 28, 2024 · 8 comments
Labels
status: waiting-for-triage An issue we've not yet triaged

Comments

@spc16670
Copy link

spc16670 commented Jul 28, 2024

  • Driver: r2dbc-postgresql 1.0.5
  • Database: 15.6
  • Java: 17
  • OS: Windows

Current Behavior

We decided to use r2dbc in our new new project what required porting some of our existing blocking code onto a reactive stack. This included paging functions - these were rewritten to preserve the semantics from the traditional model. Instead of creating Fluxes and reading items to page limit, the SQL query is written to use LIMIT with OFFSET.

We noticed that that the query would always time out while fetching the 7th page.

Solution

It turned out that 'salting' the SQL statement to make it unique solves the issue. As a workaround we wrap all queries we send to the database into a sub query:

alias = now()
SELECT (a + alias).* FROM (SQL)as (a + alias)

The above fixes the problem - all our queries now return without timeout as expected, even if 10s of pages are involved.

This however seriously undermined our confidence in our new stack. We cannot tell for sure if the problem is with the driver, the DatabaseClient Spring class, or the database. The solution to the problem we found was more of a luck rather than anything else, neither the driver nor the database would offer any clues as to which end was causing the behaviour. We have taken a guess that the database is responding with data but the driver fails to notice the response therefore triggering the statement timeout exception.

I will try to rewrite our code to provide a reproduction here when I have more time.

@spc16670 spc16670 added the status: waiting-for-triage An issue we've not yet triaged label Jul 28, 2024
@mp911de
Copy link
Collaborator

mp911de commented Jul 29, 2024

You can enable debug logging to see at which point the statement gets stuck. This should help to identify the cause.

@spc16670
Copy link
Author

spc16670 commented Jul 29, 2024

Hope this is going to help:

2024-07-29T19:14:55.660+01:00  INFO 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] c.i.a.r.a.s.r.q.FindByAssigneesAndScopes : Getting page 6/152 for a batch of 152
2024-07-29T19:14:55.660+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.pool.ConnectionPool             : Obtaining new connection from the pool
2024-07-29T19:14:55.660+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.PARAM                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Bind parameter [0] to: In{Inferred: java.lang.String}
2024-07-29T19:14:55.660+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.PARAM                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Bind parameter [1] to: In{Inferred: java.lang.String}
2024-07-29T19:14:55.660+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.PARAM                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Bind parameter [2] to: In{Inferred: java.lang.String}
2024-07-29T19:14:55.660+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.PARAM                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Bind parameter [3] to: In{Inferred: [Ljava.lang.Integer;}
2024-07-29T19:14:55.661+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.PARAM                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Bind parameter [4] to: In{Inferred: [Ljava.lang.Integer;}
2024-07-29T19:14:55.661+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.PARAM                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Bind parameter [5] to: In{Inferred: java.lang.Integer}
2024-07-29T19:14:55.661+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.PARAM                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Bind parameter [6] to: In{Inferred: java.lang.Integer}
2024-07-29T19:14:55.661+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.postgresql.QUERY                : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Executing query: SELECT DISTINCT
<fields>
FROM <table>
WHERE <preds> ORDER BY a.id LIMIT $6 OFFSET $7
2024-07-29T19:14:55.661+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Request:  [Bind{name='B_7', parameterFormats=[FORMAT_TEXT, FORMAT_TEXT, FORMAT_TEXT, FORMAT_TEXT, FORMAT_TEXT, FORMAT_BINARY, FORMAT_BINARY], parameters=[CompositeByteBuf(ridx: 0, widx: 37, cap: 37, components=1), CompositeByteBuf(ridx: 0, widx: 37, cap: 37, components=1), CompositeByteBuf(ridx: 0, widx: 4, cap: 4, components=1), CompositeByteBuf(ridx: 0, widx: 13, cap: 13, components=1), CompositeByteBuf(ridx: 0, widx: 13, cap: 13, components=1), CompositeByteBuf(ridx: 0, widx: 4, cap: 4, components=1), CompositeByteBuf(ridx: 0, widx: 4, cap: 4, components=1)], resultFormats=[], source='S_2'}, Describe{name='B_7', type=PORTAL}, Execute{name='B_7', rows=0}, Close{name='B_7', type=PORTAL}, Sync{}]
2024-07-29T19:14:55.661+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] WRITE: 211B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 42 00 00 00 ac 42 5f 37 00 53 5f 32 00 00 07 00 |B....B_7.S_2....|
<HEX FRAME>
|000000d0| 00 00 04                                        |...             |
+--------+-------------------------------------------------+----------------+
2024-07-29T19:14:55.661+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] FLUSH
2024-07-29T19:14:55.662+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] READ COMPLETE
2024-07-29T19:15:55.709+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] READ: 350B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 32 00 00 00 04 54 00 00 00 e4 00 07 69 64 00 00 |2....T......id..|
<HEX FRAME>
+--------+-------------------------------------------------+----------------+
2024-07-29T19:15:55.709+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Response: BindComplete{}
2024-07-29T19:15:55.709+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Response: RowDescription{<FIELDS>}]}
2024-07-29T19:15:55.710+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Response: ErrorResponse{fields=[Field{type=SEVERITY_LOCALIZED, value='ERROR'}, Field{type=SEVERITY_NON_LOCALIZED, value='ERROR'}, Field{type=CODE, value='57014'}, Field{type=MESSAGE, value='canceling statement due to statement timeout'}, Field{type=FILE, value='postgres.c'}, Field{type=LINE, value='3514'}, Field{type=ROUTINE, value='ProcessInterrupts'}]}
2024-07-29T19:15:55.710+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : Error: SEVERITY_LOCALIZED=ERROR, SEVERITY_NON_LOCALIZED=ERROR, CODE=57014, MESSAGE=canceling statement due to statement timeout, FILE=postgres.c, LINE=3514, ROUTINE=ProcessInterrupts
2024-07-29T19:15:55.711+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.postgresql.util.FluxDiscardOnCancel  : received cancel signal
2024-07-29T19:15:55.712+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] io.r2dbc.pool.PooledConnection           : Releasing connection
2024-07-29T19:15:55.712+01:00 ERROR 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.a.r.a.s.r.e.CubeRdsExceptionTranslator : Database query timed out canceling statement due to statement timeout
2024-07-29T19:15:55.712+01:00 DEBUG 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Response: ReadyForQuery{transactionStatus=IDLE}
2024-07-29T19:15:55.713+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] READ COMPLETE



**<HANGS HERE>**


com.xyz.RdsRepositoryException: io.r2dbc.postgresql.ExceptionFactory$PostgresqlNonTransientResourceException: [57014] canceling statement due to statement timeout

	<LONG STACKTRACE>
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: io.r2dbc.postgresql.ExceptionFactory$PostgresqlNonTransientResourceException: [57014] canceling statement due to statement timeout
	at io.r2dbc.postgresql.ExceptionFactory.createException(ExceptionFactory.java:109)
	at io.r2dbc.postgresql.ExceptionFactory.createException(ExceptionFactory.java:65)
	at io.r2dbc.postgresql.ExceptionFactory.handleErrorResponse(ExceptionFactory.java:132)
	at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onNext(FluxHandleFuseable.java:179)
	... 47 more

2024-07-29T19:15:55.745+01:00 DEBUG 86417 --- [cube-agreement-api] [ionShutdownHook] i.r.p.client.ReactorNettyClient          : [cid: 0x1][pid: 19726][id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] Request:  Terminate{}
2024-07-29T19:15:55.747+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] WRITE: 5B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| 58 00 00 00 04                                  |X....           |
+--------+-------------------------------------------------+----------------+
2024-07-29T19:15:55.747+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] FLUSH
2024-07-29T19:15:55.748+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 - R:localhost/127.0.0.1:15432] CLOSE
2024-07-29T19:15:55.748+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 ! R:localhost/127.0.0.1:15432] INACTIVE
2024-07-29T19:15:55.749+01:00 TRACE 86417 --- [cube-agreement-api] [tor-tcp-epoll-1] i.r.p.client.ReactorNettyClient          : [id: 0xf680ec6f, L:/127.0.0.1:43528 ! R:localhost/127.0.0.1:15432] UNREGISTERED

Process finished with exit code 255

@mp911de
Copy link
Collaborator

mp911de commented Jul 30, 2024

Alright, the database reports a query timeout after 60 seconds. With you receiving an exception that propagates the timeout, everything is fine here and I'm not sure what else the driver could do.

@k163377
Copy link

k163377 commented Aug 1, 2024

It appears that we had the same problem as you.
jOOQ/jOOQ#16669 (comment)

It seems to be the same solution you were offering, but I have confirmed that adding a condition, e.g. alias = alias (where alias is the timestamp), can work around the problem.
Also, as far as I have tried, it seems to work around the problem by issuing the query as a pure string instead of using a prepared statement.

@spc16670
Very big tip, thanks for your report.

@spc16670
Copy link
Author

spc16670 commented Aug 1, 2024

When you serialize to a pure string the values you are passing make the SQL query unique enough from the perspective of the database - so it is another way to apply the same workaround. I am still not sure where the fix to this is supposed to be applied - the database?

@k163377
Copy link

k163377 commented Aug 1, 2024

Sorry if I have misunderstood something.

What I have tried is to execute a query with embedded parameters (without using a prepared statement) directly against Connection with createStatement.
With this method, the query no longer hangs even if I issue the exact same query many times, down to the parameters.
So I am wondering if there is some problem with the handling of the prepared statement.

However, I have only been able to reproduce it using production code, and I am sorry that I cannot publish a sample.

@k163377
Copy link

k163377 commented Aug 5, 2024

I have tried the check by adding print in the following place.

When hanging, CommandComplete{command=SET, rowId=null, rows=null} and ReadyForQuery{transactionStatus=IDLE} occurred successfully, then the subsequent BindComplete{} did not occur for 2 or 3 minutes.
Therefore, I believe that the problem I am encountering is at least related to the processing of prepared statements.

@szycza
Copy link

szycza commented Nov 18, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: waiting-for-triage An issue we've not yet triaged
Projects
None yet
Development

No branches or pull requests

4 participants