
Add EndpointAPI.wait_until_connected_to #86

Conversation

pipermerriam (Member)

What was wrong?

Needed a non-polling mechanism to wait for a connection.

How was it fixed?

Additive changes to our connections are now guarded with an asyncio.Condition, which allows wait_until_connected_to to be implemented without polling. After the initial connection check, the API waits for an update to the connections before checking again.
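A minimal sketch of that pattern, with illustrative names (the actual attribute names in the diff may differ):

```python
import asyncio


class Endpoint:
    def __init__(self) -> None:
        self._connections_changed = asyncio.Condition()
        self._connected_names: set = set()

    async def _add_connection(self, name: str) -> None:
        # Additive changes to the connection set happen under the Condition.
        async with self._connections_changed:
            self._connected_names.add(name)
            self._connections_changed.notify_all()

    async def wait_until_connected_to(self, name: str) -> None:
        async with self._connections_changed:
            # Check once, then sleep until the connection set changes;
            # no timed polling loop is needed.
            while name not in self._connected_names:
                await self._connections_changed.wait()
```

With this shape, wait_until_connected_to wakes only when the connection set actually changes instead of polling on a timer.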

Cute Animal Picture


pipermerriam requested a review from lithp on May 24, 2019 11:52
pipermerriam (Member, Author)

I've got some PR stacking going on here. This contains the diff from three PRs; only the latest commit is relevant.

lithp (Contributor) commented May 24, 2019

I don't know whether you intentionally didn't use this, but if you push the first two commits to this repo under a branch, you can then set the base branch of this PR so that the diff only shows the last commit. You just have to remember to set the base back to master once the preceding PRs have been merged!

pipermerriam (Member, Author)

@lithp since this stretches across forks, that approach results in the PR being open only in my fork, which lacks visibility. Maybe I should just open these as drafts so that there's less confusion and they can be reviewed sequentially as they get merged and rebased.

pipermerriam force-pushed the piper/asyncio-endpoint-establish-reverse-connection branch 2 times, most recently from 45159e5 to 840ff41 on May 24, 2019 21:03
```python
@pytest.mark.asyncio
async def test_connect_to_endpoint(ipc_base_path):
    config = ConnectionConfig.from_name("server", base_path=ipc_base_path)
    async with AsyncioEndpoint.serve(config) as server:
```
lithp (Contributor):
Love this API 😍

```diff
         finally:
-            self._half_connections.remove(remote)
+            if remote in self._half_connections:
+                self._half_connections.remove(remote)
```
lithp (Contributor):

It might be a little cleaner to replace this finally block with another task.add_done_callback(self._half_connections.remove(remote)) inside _accept_conn.
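A minimal sketch of that callback-based cleanup; note that add_done_callback passes the finished task to the callback, so a wrapper is needed, and the names here are illustrative (discard is used so a missing remote is not an error):

```python
import asyncio


class Server:
    def __init__(self) -> None:
        self._half_connections: set = set()

    def _accept_conn(self, remote) -> None:
        self._half_connections.add(remote)
        task = asyncio.ensure_future(self._handle_client(remote))
        # Cleanup runs whenever the task finishes, replacing the finally
        # block; the callback receives the finished task, hence the lambda.
        task.add_done_callback(
            lambda _task: self._half_connections.discard(remote)
        )

    async def _handle_client(self, remote) -> None:
        ...
```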

```python
try:
    await remote.wait_stopped()
except asyncio.CancelledError:
    return
```
lithp (Contributor):

Why does this suppress CancelledError? I think that in this situation it's okay, but letting it propagate up is safer.

From the docs:

cancelled() can be used to check if the Task was cancelled. The method returns True if the wrapped coroutine did not suppress the CancelledError exception and was actually cancelled.

Suppressing the error prevents callers of _handle_client which didn't use ensure_future from knowing they've been cancelled, and it also prevents other callers of _handle_client from determining whether the coro stopped because of cancellation. Neither kind of caller exists right now, but this still seems like unexpected behavior.
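For illustration, the safer shape would do any cleanup and then re-raise, so the cancellation stays observable to callers (a sketch, not the committed code):

```python
import asyncio


async def _handle_client(remote) -> None:
    try:
        await remote.wait_stopped()
    except asyncio.CancelledError:
        # ... perform any cleanup here ...
        raise  # re-raise so Task.cancelled() reports True to callers
```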

lithp (Contributor) left a comment:

Left some comments, especially one about suppressing CancelledError, but no major complaints here (only looking at the last commit). One potential source of errors is that the code which adds to _half_connections is fairly removed from the code which removes from it (and the same with _full_connections). It'd be nice if there were some way of bringing them closer together.

pipermerriam force-pushed the piper/asyncio-endpoint-establish-reverse-connection branch 3 times, most recently from 4133f0f to 4469d20 on May 26, 2019 13:23
pipermerriam (Member, Author)

@lithp I had to make some structural changes to fix a race condition where subscription updates could end up not being transmitted if they occurred at a specific point in time during the connection. I'd like one more pass, if you will.

pipermerriam requested a review from lithp on May 26, 2019 13:25
```python
        return f"RemoteEndpoint[{self.name if self.name is not None else id(self)}]"

    def __repr__(self) -> str:
        return f"<{self}>"
```
pipermerriam (Member, Author):

Can/should be extracted to a standalone function.

```
self._received_subscription.notify_all()
if message.response_expected:
    await self.send_message(SubscriptionsAck())
if message.response_expected:
```
pipermerriam (Member, Author):

@lithp can you confirm this is how it should be structured, or do you think it's OK to send the Ack after we've released the lock?

lithp (Contributor):

I think it's fine to send the Ack from outside the lock's block, and maybe even preferable! The Condition is there to coordinate the local process; since the remote process doesn't even know about this Condition, I don't think that releasing the lock first could create any race conditions.
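A sketch of that shape, assuming a hypothetical _handle_subscriptions method and a stubbed-in SubscriptionsAck standing in for the library's real ack message type:

```python
import asyncio


class SubscriptionsAck:
    ...  # stub standing in for the library's real ack message type


class RemoteEndpoint:
    def __init__(self, send_message) -> None:
        self.subscribed_messages: set = set()
        self.send_message = send_message
        self._received_subscription = asyncio.Condition()

    async def _handle_subscriptions(self, message) -> None:
        # Update local state and wake any waiters while holding the lock ...
        async with self._received_subscription:
            self.subscribed_messages = message.subscriptions
            self._received_subscription.notify_all()
        # ... but send the Ack after releasing it: the Condition only
        # coordinates coroutines in this process, so the remote side cannot
        # observe the difference.
        if message.response_expected:
            await self.send_message(SubscriptionsAck())
```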

```diff
-    while event not in self.subscribed_messages:
-        await self.wait_until_subscription_received()
+    async with self._received_subscription:
+        while True:
```
pipermerriam (Member, Author):

@lithp can you confirm this is OK according to your understanding of conditions: rather than releasing and then re-acquiring at the entrance to each loop iteration, we eliminate that and rely only on the release/acquire that happens as part of the wait() call.

lithp (Contributor):

This looks good to me!
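For reference, Condition.wait() releases the underlying lock while sleeping and re-acquires it before returning, which is what makes the single acquire safe. A minimal sketch of the resulting pattern; this is essentially what the stdlib's Condition.wait_for() already does:

```python
import asyncio
from typing import Callable


async def wait_until(condition: asyncio.Condition, predicate: Callable[[], bool]) -> None:
    # Acquire once; wait() atomically releases the lock while sleeping and
    # re-acquires it before returning, so the predicate is always checked
    # while holding the lock.
    async with condition:
        while not predicate():
            await condition.wait()
```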

```python
        )
    )

    async def _connect_receiving_queue(self) -> None:
```
pipermerriam (Member, Author):

Didn't intend to relocate this, but I think I like it here better. Need to add some comments to delineate the body of this class more clearly, but I'll do that as a cleanup PR since re-organizing makes code hard to audit.

TODO: revert this move.

```python
    def _notify_subscriptions_changed_nowait(self) -> None:
        self._subscription_updates_queue.put_nowait(None)

    async def _do_notify_subscriptions_changed(self) -> None:
```
pipermerriam (Member, Author):

TODO: remove this method and inline it into _process_subscription_updates, since we never want anyone to call this directly.


```python
    async def _process_subscription_updates(self) -> None:
        self._subscription_updates_running.set()
        async with self._subscription_updates_condition:
```
pipermerriam (Member, Author):

@lithp Needed this to ensure we don't end up with concurrent subscription updates being sent out; it's possible that they arrive out of order in that case, and then the other side would have potentially inaccurate subscription records.

TODO: add a code comment explaining this.

lithp (Contributor):

How would they arrive out of order? Domain sockets always deliver messages in order, and _notify_lock ensures there's only one update per socket happening at any time.

pipermerriam (Member, Author):

I started undoing this but then stopped as I'm inclined to leave it this way. Here's my reasoning.

  1. We end up needing to do subscription updates from synchronous methods with the addition of subscribe_nowait, which seems to be a necessary API (see the sketch after this list).
  2. I'm much more comfortable with that method doing a Queue.put_nowait than I am adding an additional call to asyncio.ensure_future(self._notify_subscription_updates()).
  3. I think that the asyncio.Condition-based approach is more appropriate, and under that approach we still need a background process to listen for changes. The various branches I have in progress have made good use of that Condition API.

I'm fine unrolling this if you're still not convinced by the time the trio-based endpoint is being added to the library.
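A minimal sketch of the shape described in points 1 and 2 above; subscribe_nowait, _subscription_updates_queue, and _do_notify_subscriptions_changed appear in this PR, while the other names are illustrative:

```python
import asyncio


class Endpoint:
    def __init__(self) -> None:
        self._subscriptions: set = set()
        self._subscription_updates_queue: asyncio.Queue = asyncio.Queue()

    def subscribe_nowait(self, event_type) -> None:
        # Synchronous context: enqueue a wake-up for the background task
        # rather than calling asyncio.ensure_future(...) from here.
        self._subscriptions.add(event_type)
        self._subscription_updates_queue.put_nowait(None)

    async def _process_subscription_updates(self) -> None:
        # A single background task drains the queue, so outbound
        # subscription updates are serialized and cannot interleave.
        while True:
            await self._subscription_updates_queue.get()
            await self._do_notify_subscriptions_changed()

    async def _do_notify_subscriptions_changed(self) -> None:
        ...  # send the current subscription set to each connected remote
```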

```python
        if self._process.pid is not None:
            os.kill(self._process.pid, signal.SIGINT)
        else:
            self._process.terminate()
```
pipermerriam (Member, Author):

TODO: extract to a standalone function.

pipermerriam force-pushed the piper/asyncio-endpoint-establish-reverse-connection branch from 4469d20 to 3f30e62 on May 27, 2019 09:49
```diff
@@ -216,9 +222,13 @@ def __init__(
         endpoint has acknowledged the new subscription set. If ``block`` is ``False`` then this
         function will return immediately after the send finishes.
         """
```
lithp (Contributor):

The comment just above this line isn't quite right anymore: "Alert the Endpoint which has connected to us that our subscription set has changed." Since RemoteEndpoint now handles both inbound and outbound, this should be something like "Alert the remote that this endpoint's subscription set has changed."

```python
        self._receiving_loop_running = asyncio.Event()
        self._receiving_queue = asyncio.Queue()

        self._subscription_updates_running = asyncio.Event()
        self._subscription_updates_condition = asyncio.Condition()
        self._subscription_updates_queue = asyncio.Queue()
```
lithp (Contributor):

These could be initialized in __init__, which would save you from needing to declare them at the beginning of this class and would also make it easier to reason about the potential for None dereferences.

The await *.running() pattern sounds like it could be pulled out into a helper, which would remove the potentially-None instance attribute entirely. Something like await self.start_coro(self._process_subscription_updates), which doesn't return until the coro has started.
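A sketch of such a helper, under the assumption that readiness can be signalled by a wrapper rather than a dedicated Event attribute per coro (start_coro is the hypothetical name from the comment above):

```python
import asyncio
from typing import Awaitable, Callable


async def start_coro(coro_fn: Callable[[], Awaitable[None]]) -> "asyncio.Task":
    # Does not return until the wrapped coroutine has actually started,
    # replacing the per-coro asyncio.Event instance attributes.
    started = asyncio.Event()

    async def runner() -> None:
        started.set()
        await coro_fn()

    task = asyncio.ensure_future(runner())
    await started.wait()
    return task
```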

pipermerriam (Member, Author):

I'm going to leave this for now and open an issue to track it. I've seen this issue arising as well, but I don't want to conflate this PR with an extra fix that I'm not sure I know the right solution for.


```python
            while True:
                if event in self.subscribed_messages:
                    return
                await self._received_subscription.wait()


@asynccontextmanager  # type: ignore
```
lithp (Contributor):

As of this change, _handle_client and _handle_server are the same method, and together they constitute the only callers of run_remote_endpoint. Maybe they could all be merged into one method (untested):

```python
async def run_remote_endpoint(remote: RemoteEndpoint) -> None:
    await remote.start()
    try:
        await remote.wait_stopped()
    finally:
        await remote.stop()
```

pipermerriam (Member, Author):

Good catch.

lithp (Contributor) left a comment:

I can't figure out from these comments what the problem with missed updates was, so it's possible that this is the best solution and I'm willing to be talked into approving, but this solution is very heavyweight!

This approach, where _subscription_updates_condition and a single coro do all the notifications, has the effect of serializing all updates across all remote endpoints. It's not only a lot of code, it's also much slower than it needs to be! I think that in most cases the endpoint doesn't need to block at all; it's okay if we take a while to inform the remotes of the messages we're expecting, because endpoints already drop all messages which occurred before the endpoint started listening for them.

Blocking before returning from connect_to_endpoint and stream was added to make it easier to write tests. Once those methods return, the caller is able to call broadcast() on other endpoints and be sure the message will be sent.

I'm not sure what was causing the dropped updates, but I'm hoping there's a different way to fix it. Adding sleep() calls to the tests sounds preferable to the complication of putting this condition everywhere the endpoint might update a subscription, plus adding a queue and a coro to allow asynchronously updating subscriptions. I think this change tips this class over into the territory where it's too scary to touch, which will slow down the API improvements we plan on adding in the future.

```python
        # potentially being redundant.
        async with self._subscription_updates_condition:
            await remote.notify_subscriptions_updated(self.subscribed_events)
            await self._add_half_connection(remote)
```
lithp (Contributor):

Ahh, okay, I see the problem now: _add_half_connection blocks when it tries to acquire connections_changed, so there's a period in time where the remote isn't receiving subscription updates. Double-sending the same set of subscriptions has a very small cost, though! And I think that trying not to send it introduced greater inefficiencies elsewhere; serializing all updates isn't cheap!

pipermerriam (Member, Author):

(I thought I responded to this somewhere but I don't see it)

I'm inclined to leave it this way because it correctly manages the appropriate locks on the resource, whereas the other mechanism solves the problem through indirection. First, I think the overall cost here is low because our usage patterns for the library don't involve adding new connections at a high frequency. Second, doing it the alternate way of inverting these statements seems like a solution that could easily be undone incidentally in a refactor, since it's an implicit solution.

pipermerriam (Member, Author) commented May 29, 2019 via email

pipermerriam (Member, Author)

@lithp let's do a call later today when we're both online and see if we can find something that works for both of us.

  • We need something that can work from synchronous methods, as both Subscription.remove and EndpointAPI.subscribe_nowait need to be able to trigger subscription updates.
  • I don't like using asyncio.ensure_future(...) as the mechanism for handling the synchronous-context problem.
  • I am pretty sure that a Condition-based approach is justified; I've spent some serious time over the last week debugging race conditions around subscriptions not being properly updated, and while I know we can achieve working code without the locks, I'm not excited about such solutions.

pipermerriam (Member, Author)

Replaced by #110 and #108.

pipermerriam deleted the piper/asyncio-endpoint-establish-reverse-connection branch on May 30, 2019 14:24