-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NetworkInputStream #23
Conversation
fixed the test in PGConnectionTLSTest
Conflicts: src/main/java/org/postgresql/sql2/communication/network/DescribeResponse.java src/test/java/org/postgresql/sql2/communication/BEFrameReaderTest.java
Conflicts: src/main/java/org/postgresql/sql2/communication/BEFrameParser.java src/main/java/org/postgresql/sql2/communication/NetworkConnection.java src/main/java/org/postgresql/sql2/communication/NetworkReadContext.java src/main/java/org/postgresql/sql2/communication/TableCell.java src/main/java/org/postgresql/sql2/communication/network/AuthenticationResponse.java src/main/java/org/postgresql/sql2/communication/network/BindResponse.java src/main/java/org/postgresql/sql2/communication/network/DescribeResponse.java src/main/java/org/postgresql/sql2/communication/network/ExecuteResponse.java src/main/java/org/postgresql/sql2/communication/network/NetworkConnectRequest.java src/main/java/org/postgresql/sql2/communication/network/ParseResponse.java src/main/java/org/postgresql/sql2/communication/network/ReadyForQueryResponse.java src/main/java/org/postgresql/sql2/communication/packets/AuthenticationRequest.java src/main/java/org/postgresql/sql2/communication/packets/CommandComplete.java src/main/java/org/postgresql/sql2/communication/packets/DataRow.java src/main/java/org/postgresql/sql2/communication/packets/ErrorPacket.java src/main/java/org/postgresql/sql2/communication/packets/ParameterStatus.java src/main/java/org/postgresql/sql2/communication/packets/RowDescription.java src/main/java/org/postgresql/sql2/communication/packets/parsers/BinaryParser.java src/main/java/org/postgresql/sql2/communication/packets/parts/ColumnTypes.java src/test/java/org/postgresql/sql2/execution/NioLoopTest.java
</properties> | ||
<distributionManagement> | ||
<repository> | ||
<id>repo.officefloor.sf.net</id> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of commits keep coming in with some of this stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this is my local deployment to make the jar available for download. This is necessary for TechEmpower to access the jar for building the OfficeFloor server to use pgsql2. We should possibly consider starting to apply to release to maven central to make early adopter interest available.
Nice, I will read through it, but one thing regarding portals. I now realize that it's not mandatory to give the portals names. But regarding named portals, those are state in the connection, and I think the server can deallocate them if it runs out of resources or by some other rule, so we need to determine what should happen if we try to use a named portal that no longer exist on the server end. I think we have two options
Both have drawbacks, with 1 it means that statements might be executed in the wrong order and with 2 it exposes the user to something that should be the responsibility of the connection library. This needs to be investigated further, maybe it's enough to set a upper limit on the number of named portals per connection. |
@alexanderkjall I'm not sure of this, however, I'm wondering if the database sends a notification if it is deallocating a portal. But I'm guessing if we following patterns in pgjdbc (i.e. the create portal only after 5 queries) this would be a good start. I don't believe we need to "reinvent the wheel" for this driver. For me, we are just creating a new driver that takes all the pgjdbc patterns except opens up:
Hence, for me, I'm going to fix the Portal/Query to follow pgjdbc as a starting point (to re-use all their experience). Then once we have working (and deployed in a few applications for "real-world" use) we can then look for enhancements. Note: please excuse me if this sounds a little simplistic of what we are trying to achieve. I'm just thinking that as starting point to re-use as much as possible of pgjdbc, within the context of asynchronous calls. pgjdbc is a "proven" driver and I'm thinking default designs should base itself a little off learning from this driver... and then diverge to allow the asynchronous functionality. |
I agree that reusing as much as possible from pgjdbc is a good idea, but I'm not sure that this logic can be reused, as we do pipe lining of requests and pgjdbc does not. Imagine the following scenario: 1 We have two queries, q1 and q2, q1 have a named portal. At point 4 we have the dilemma from my last post. |
The server does not de-allocate Portals, nor does it send any message saying it did de-allocate them |
Aha, my misstake, I based it on that I got the message "prepared statement "q" does not exist" from this test https://github.com/pgjdbc/pgsql2/blob/bd92a7d0272fdaaa23b1311a92dc53a65698c688/src/test/java/org/postgresql/sql2/SelectDataTypesTest.java#L321 . But after closer inspection of the network packets it turns out that something produces broken Bind packages. Lets track that issue in another thread, here; #24 |
Conflicts: src/main/java/org/postgresql/sql2/communication/network/AuthenticationResponse.java src/main/java/org/postgresql/sql2/communication/network/NetworkConnectRequest.java
Ok, I tried to merge in the latest of master into this branch and found some significant changes to the NetworkOutputStream... seems to have a lot of synchronized around the write buffer (so much so I'm thinking it would significantly degrade performance... synchronize on each get of buffer is a lot of overhead at this level of performance). The idea I was trying to achieve:
Basically, the NioLoop does very little in the way of synchronizing its state, as it runs on a single thread (so no memory synchronizing is necessary). It is only when this thread interacts with other threads (application thread submitting a submission or registered response handler is invoked on another thread) does there need to be any synchronizing |
@sagenschneider I would also like to reduce the amount of synchronization, and I also discovered that the code still have a race condition even after I added the synchronization, sometimes the READ / WRITE registration happens in the wrong order, so one the selector gets a read only registration while there still is bytes to be written, debugging that right now. The problem that led me to add the synchronization was in usage of the getCurrentBuffer function, sometimes a buffer could end up being used by both the query producing thread and the network thread, and that caused memory corruption and incorrect packaged being written to the network. One way to reduce the synchronization would be to have a ConcurrentLinkedQueue as a handover point between the threads, and the query producing thread puts buffers that it's finished with on it, and the network thread pulls from it when it needs more data to write to the network. |
Thought about this a bit more, and one way to do this would be to move the written buffers to the queue that the network thread reads when flush() is called, that way we shouldn't need any synchronization on writes. And if flush also wakes up the selector i think we can get around that other race condition. |
Possibly, though I would like to keep the network thread without too much synchronizing internally. Question: can we make the submissions effectively immutable? Ideally, I would be thinking:
The result is there is no "course grained" lock slowing down performance. The only lock is within each submission on results. As this is less likely to change frequently, there should be little lock contention causing slow down. Note: if we start putting locks in the network layer, they typically end up being course grained locking. As many threads will contend over the locks (network thread will contend with application threads submitting), this will show up as noticeable slow down. |
Do you feel that using a ConcurrentLinkedQueue is too much synchronization? |
I would be thinking:
|
That sounds like working plan, only thing I can think about that would be tricky is that a submission can have futures as parameters. How the results of the queries are supplied to the user is really out of scope for this issue but I think the best thing might be to have a ExecutorPool since the callback is a function that is supplied by the user. |
I would likely expect the parameter futures would be resolved before the submission. This way the network layer does not have to deal with the "promises" of the asynchronous layer. Basically then:
|
Isn't that over complicating things a bit? Wouldn't it be enough to check if the future is completed with isDone() on the future, and if that is false just not pull anything from the queue of submissions? |
It's a complicated way of saying, don't submit anything to the network queue unless it is ready for execution. For me, there could be multiple "parallel" execution trees by the same client. I would not assumed that there is any relationship between each branch of this execution tree (think of it like promise tree). As each node is resolved, it can then submit it's "immutable network submission". I would not queue one submission behind another submission due to future resolution. This could:
Basically, what I'm saying is that the network layer only receives "immutable submissions" that ready for execution. If it contains a future that needs resolution, this should be handled at the higher layers. |
Note: also we want to avoid "polling"... once the final future is resolved for the node then the submission is sent to the network. Otherwise, we end up having to poll the head of the queue waiting for isDone() to return true. |
I don't think we can start reordering queries that the user send, that could severely break the users application. Imagine two update queries that operate on the same rows, if we reorder them then a completely different result would be produced. |
It is not "re-ordering" queries. It is executing queries when they are ready to execute. If a query has an input parameter with a future not yet resolved, then unless there is a "then" relationship, the query should be held and other ready queries (that have all their parameters resolved) be submitted. For me this is all part of the "promise tree" of the asynchronous execution. Order of asynchronous operations is not guaranteed unless some "then" relationship is between the queries. If I submit two queries without some relationship between them, from asynchronous programming point of view, there is no guarantee to the order they are executed. For me, this is all part of the asynchronous query layer problem. It should not be put into the Network Layer. |
(but then again... I'm not totally up with the ADBA specs... however, my take on asynchronous programming is that order is not something that is guaranteed... unless the client explicitly sets ups up some form of relationship between the operations... e.g. Promise.then(..).then(...) etc) |
You can always ask on the http://mail.openjdk.java.net/pipermail/jdbc-spec-discuss/ list, I might have understood the api wrong :) |
The latest version of the API have a comment section that specifically mentions this use case, let me quote:
|
@alexanderkjall thanks for merging the other PR. So that I don't get too much divergence, I've raised the PR with the NetworkInputStream for you to get an idea of where I'm going with this.
There are test failures on this branch, however, as per other PR... raising early for you to see the changes and where they are to avoid too much merge conflicts down the line... once I can get it working.
One thing I believe I have wrong is the Portal / Query relationship. I think the objects need to be the other way round. Query is submitted and associated to a potential portal. Once the query is undertaken more than 5 times (following pgjdbc), then can start naming the portal to associate.
Interesting is that (as per @davecramer on #19 ), is portal created:
For first above an LRU cache would be ok. For the second, LRU cache could create some "interesting" hard to reproduce errors. For each, I'm thinking would need to manage separately. However, for now prepared statements are my interest for performance.