This repository has been archived by the owner on Jan 20, 2025. It is now read-only.

Prepare move to https://github.com/ESP32Async/AsyncTCP #192

Merged
merged 165 commits into from
Jan 20, 2025


mathieucarbou
Contributor

No description provided.

OttoWinter and others added 30 commits October 14, 2019 12:46
The AsyncClient::send() method sets a boolean to true after pushing
data over the TCP socket successfully using tcp_output(). It also sets
a timestamp to remember at what time the data was sent.

The AsyncClient::_sent() callback method reacts to ACKs coming from
the connected client. This method sets the boolean to false.

In the AsyncClient::_poll() method, a check is done to see if the
boolean is true ("I'm waiting for an ACK") and if the time at which
the data was sent is too long ago (5000 ms). If this is the case,
a connection issue with the connected client is assumed and the
connection is forcibly closed by the server.

The race condition occurs when these operations interleave across
threads. The _sent() method can be called during
the execution of the send() method:

1. send() sends out data using tcp_output()
2. _sent() is called because an ACK is processed, sets boolean to false
3. send() continues and sets boolean to true + timestamp to "now"

After this, the data exchange with the client was successful. Data were
sent and the ACK was seen.
However, the boolean ended up as true, making the _poll() method think
that an ACK is still to be expected. As a result, 5000 ms later, the
connection is dropped.

This commit fixes the code by first registering that an ACK is
expected, before calling tcp_output(). This way, there is no race
condition when the ACK is processed right after that call.

Additionally, I changed the boolean to an integer counter value.
The server might send multiple messages to the client, resulting in
multiple expected ACKs. A boolean does not cover this situation.
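
As a rough illustration of the ordering fix, the sketch below registers the
expected ACK before the output call and uses a counter instead of a boolean.
`AckTracker` and its members are hypothetical names chosen for this example,
not the library's actual fields. It replays the interleaving on a single
thread and does not address atomicity of the counter itself.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-connection ACK bookkeeping described above.
// This only illustrates ordering; it does not make the counter thread-safe.
struct AckTracker {
    int pending_acks = 0;          // counter replacing the former boolean
    std::uint32_t sent_at_ms = 0;  // time of the most recent send

    // Record the expectation BEFORE handing data to the TCP stack, so an ACK
    // processed during the output call cannot be overwritten afterwards.
    void expectAck(std::uint32_t now_ms) {
        ++pending_acks;
        sent_at_ms = now_ms;
    }

    // ACK callback: one outstanding ACK has been satisfied.
    void onAck() {
        if (pending_acks > 0) {
            --pending_acks;
        }
    }

    // Poll check: still waiting for an ACK, and the last send is too long ago.
    bool timedOut(std::uint32_t now_ms, std::uint32_t timeout_ms) const {
        return pending_acks > 0 && (now_ms - sent_at_ms) >= timeout_ms;
    }
};
```

Replaying the problematic order (register, ACK arrives, send() resumes) leaves
the counter at zero, so the timeout check stays quiet. With two sends and one
ACK, the counter stays at one and the timeout can still fire.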

Co-authored-by: Maurice Makaay <[email protected]>
After my first attempt at fixing the client disconnects
(OttoWinter/AsyncTCP#4) got merged
into AsyncTCP, it turned out that there was a regression
for some devices: the connection stability actually went
down instead of up.

After a lot of debugging and discussion with @glmnet (some of
the results can be found in the above pull request discussion),
we came up with an improved fix for the disconnect issues.

**Changed:**

The code that checks for ACK timeouts has been simplified in
such a way that only two timestamps are now used to determine if
an ACK timeout has happened: the time of the last sent packet
(this was already recorded), and the time of the last received
ACK from the client (this has been added).

Using these timestamps, there is no more need for a separate field
to keep track of whether we are waiting for an ACK (`_pcb_busy`).
Therefore, this field was completely removed from the code.
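
A minimal sketch of how a check based on only these two timestamps could look.
The function name and parameters are hypothetical and this is not the code
from the commit. It assumes millisecond timestamps in 32-bit unsigned
counters that may wrap, and treats a timeout of 0 as "check disabled".

```cpp
#include <cstdint>

// Hypothetical helper: decide whether an ACK timeout has happened using only
// the time of the last sent packet and the time of the last received ACK.
inline bool ackTimedOut(std::uint32_t last_tx_ms,
                        std::uint32_t last_ack_ms,
                        std::uint32_t now_ms,
                        std::uint32_t ack_timeout_ms) {
    if (ack_timeout_ms == 0) {
        return false;  // assumption: 0 disables the check
    }
    // An ACK is outstanding when the last send is newer than the last ACK.
    // Unsigned subtraction keeps this correct across counter wraparound.
    const std::uint32_t tx_minus_ack = last_tx_ms - last_ack_ms;
    const bool ack_outstanding = tx_minus_ack != 0 && tx_minus_ack < 0x80000000u;
    // The timeout fires only if that unacknowledged send is old enough.
    const bool send_is_old = (now_ms - last_tx_ms) >= ack_timeout_ms;
    return ack_outstanding && send_is_old;
}
```

No separate "busy" flag is needed in this form: whether an ACK is pending
follows from comparing the two timestamps.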

While I was at it, I renamed a few variables to make the code
easier to read and more consistent.

**Results:**

I connected Home Assistant plus 8 OTA loggers at the same time,
using very verbose logging output. This normally was an easy way
to trigger the disconnect errors.

It turned out that this solution runs as reliably for me as
disabling the ACK timeout checks completely
(using `AsyncClient::setAckTimeout(0)`).
Better fix for "ack timeout 4" client disconnects.
Co-authored-by: Paweł pidpawel Kozubal <[email protected]>
Arduino 3 / ESP IDF 5 compatibility
mathieucarbou and others added 27 commits December 11, 2024 15:02
throttle polling events when message queue gets filled up
Refer: mathieucarbou/ESPAsyncWebServer#165

Let's try to coalesce two (or more) consecutive poll events into one.
This usually happens with poorly implemented user callbacks that run too long and make poll events stack up in the queue.
If consecutive user callbacks for the same connection run longer than the poll interval, the queue fills with events until it deadlocks.
This is a workaround to mitigate such poor designs and keeps other events/connections from being starved of task time.
It won't be effective if the user runs multiple simultaneous long-running callbacks, because their messages interleave.
todo: implement some kind of fair dequeuing or (better) simply punish the user for badly designed callbacks by resetting hog connections
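
The coalescing idea can be sketched roughly as follows. `Event`, `EventType`
and `popCoalesced` are hypothetical names for illustration, and a `std::deque`
stands in for the real message queue.

```cpp
#include <deque>

// Hypothetical event model; the real queue carries richer packets.
enum class EventType { Poll, Recv, Sent };

struct Event {
    EventType type;
    int conn_id;  // identifies the connection the event belongs to
};

// Pop the next event. If it is a poll event, also drop every poll event that
// directly follows it for the same connection, so a run of stacked-up polls
// is serviced once. Returns false when the queue is empty.
inline bool popCoalesced(std::deque<Event>& queue, Event& out) {
    if (queue.empty()) {
        return false;
    }
    out = queue.front();
    queue.pop_front();
    if (out.type == EventType::Poll) {
        while (!queue.empty() &&
               queue.front().type == EventType::Poll &&
               queue.front().conn_id == out.conn_id) {
            queue.pop_front();
        }
    }
    return true;
}
```

Only consecutive polls for one connection are merged here, which is why
interleaved polls from several busy connections slip through.
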
 - do not let the `_remove_events_with_arg()` call block on queue perturbation, otherwise it could deadlock

 - in any case, do not block when adding an LWIP_TCP_POLL event to the queue; it does not make much sense anyway
   because poll events are repetitive

 - `_get_async_event()` will discard LWIP_TCP_POLL events when the queue gets filled up;
   this works in combination with throttling poll events when adding to the queue.
   Poorly designed apps and multiple parallel connections could flood the queue with interleaved poll events
   that can't be properly throttled or coalesced, so we can discard them in the eviction task after coalescing.
   It works this way:
    - queue is up to 1/4 full: all events enter the queue and are serviced
    - queue is from 1/4 up to 3/4 full: new poll events are throttled on enqueue with linear probability
    - queue is from 3/4 full up to completely full: all new poll events are ignored on enqueue, and existing poll events
      already in the queue are discarded on eviction with linear probability, giving priority to all other
      events to be serviced. It is expected that when the poll timer next fires, connection polls can re-enter the queue
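
One possible reading of the three fill bands, sketched as drop probabilities.
The function names are hypothetical, and the sketch assumes that "linear
probability" means the chance of dropping a poll event rises linearly with the
fill level inside each band, which the text above does not spell out.

```cpp
#include <cassert>
#include <cstddef>

// Probability of rejecting a NEW poll event on enqueue:
// 0 up to 1/4 full, rising linearly to 1 at 3/4 full, then always 1.
inline double pollEnqueueDropProbability(std::size_t queued, std::size_t capacity) {
    assert(capacity > 0 && queued <= capacity);
    const double fill = static_cast<double>(queued) / static_cast<double>(capacity);
    if (fill <= 0.25) return 0.0;
    if (fill >= 0.75) return 1.0;
    return (fill - 0.25) / 0.5;
}

// Probability of discarding an EXISTING poll event on eviction:
// 0 below 3/4 full, rising linearly to 1 when the queue is completely full.
inline double pollEvictDropProbability(std::size_t queued, std::size_t capacity) {
    assert(capacity > 0 && queued <= capacity);
    const double fill = static_cast<double>(queued) / static_cast<double>(capacity);
    if (fill < 0.75) return 0.0;
    return (fill - 0.75) / 0.25;
}

// Decision helper: the caller supplies a uniform random sample in [0, 1).
inline bool shouldDrop(double probability, double uniform_sample) {
    return uniform_sample < probability;
}
```

For a queue of 64 slots under these assumptions, 32 queued events give a 50%
chance of rejecting a new poll, and 56 queued events give a 50% chance of
discarding a poll on eviction.
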
Coalesce poll events on queue eviction
it must free the message's memory if it fails to requeue the pointer
fix: _remove_events_with_arg() might probably leak mem on queue overflow
Specifying libCompatMode in library.json prevents projects from overriding the library compatibility mode as needed. This prevents the library from being used in builds where Arduino is a component of ESP-IDF.
Remove libCompatMode from library.json
@me-no-dev me-no-dev merged commit ef448a8 into me-no-dev:master Jan 20, 2025
@mathieucarbou mathieucarbou deleted the move branch January 20, 2025 21:53