This repository has been archived by the owner on Jan 20, 2025. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 434
Prepare move to https://github.com/ESP32Async/AsyncTCP #192
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The AsyncClient::send() methods sets a boolean to true after pushing data over the TCP socket successfully using tcp_output(). It also sets a timestamp to remember at what time the data was sent. The AsyncClient::_sent() callback method reacts to ACKs coming from the connected client. This method sets the boolean to false. In the AsyncClient::_poll() method, a check is done to see if the boolean is true ("I'm waiting for an ACK") and if the time at which the data was sent is too long ago (5000 ms). If this is the case, a connection issue with the connected client is assumed and the connection is forcibly closed by the server. The race condition is when these operations get mixed up, because of multithreading behavior. The _sent() method can be called during the execution of the send() method: 1. send() sends out data using tcp_output() 2. _sent() is called because an ACK is processed, sets boolean to false 3. send() continues and sets boolean to true + timestamp to "now" After this, the data exchange with the client was successful. Data were sent and the ACK was seen. However, the boolean ended up as true, making the _poll() method think that an ACK is still to be expected. As a result, 5000 ms later, the connection is dropped. This commit fixes the code by first registering that an ACK is expected, before calling tcp_output(). This way, there is no race condition when the ACK is processed right after that call. Additionally, I changed the boolean to an integer counter value. The server might send multiple messages to the client, resulting in multiple expected ACKs. A boolean does not cover this situation. Co-authored-by: Maurice Makaay <[email protected]>
After my first attempt at fixing the client disconnects (OttoWinter/AsyncTCP#4) got merged into AsyncTCP, it turned out that there was regression for some devices: the connection stability actually went down instead of up. After a lot of debugging and discussion with @glmnet (some of the results can be found in the above pull request discussion), we came up with an improved fix for the disconnect issues. **Changed:** The code that checks for ACK timeouts has been simplified in such way, that only two timestamps are now used to determine if an ACK timeout has happened: the time of the last sent packet (this was already recorded), and the time of the last received ACK from the client (this has been added). Using these timestamps, there is no more need for a separate field to keep track if we are waiting for an ACK or not (`_pcb_busy`). Therefore, this field was completely removed from the code. While I was at it, I renamed a few variables to make the code easier to read and more consistent. **Results:** I connected Home Assistant plus 8 OTA loggers at the same time, using very verbose logging output. This normally was an easy way to trigger the disconnect errors. It turned out, this solution runs as solid for me, as when disabling the ACK timeout checks completely (using `AsyncClient::setAckTimeout(0)`).
Better fix for "ack timeout 4" client disconnects.
Co-authored-by: Paweł pidpawel Kozubal <[email protected]>
…re version (currently v3 rc1)
Arduino 3 / ESP IDF 5 compatibility
throttle polling events when message queue gets filled up
Refer: mathieucarbou/ESPAsyncWebServer#165 Let's try to coalesce two (or more) consecutive poll events into one this usually happens with poor implemented user-callbacks that are runs too long and makes poll events to stack in the queue if consecutive user callback for a same connection runs longer that poll time then it will fill the queue with events until it deadlocks. This is a workaround to mitigate such poor designs and won't let other events/connections to starve the task time. It won't be effective if user would run multiple simultaneous long running callbacks due to message interleaving. todo: implement some kind of fair dequeing or (better) simply punish user for a bad designed callbacks by resetting hog connections
- do not let `_remove_events_with_arg()` call to block on queue pertubation, otherwise it could deadlock - in any case do not block on adding LWIP_TCP_POLL event to the queue, it does not make much sense anyway due to poll events are repetitive - `_get_async_event()` will discard LWIP_TCP_POLL events when q gets filled up this will work in combination with throttling poll events when adding to the queue Poor designed apps and multiple parallel connections could flood the queue with interleaved poll events that can't be properly throttled or coalesced. So we can discard it in the eviction task after coalescing It will work in this way: - queue is up to 1/4 full - all events are entering the queue and serviced - queue is from 1/4 and up to 3/4 full - new poll events are throttled on enqueue with linear probability - queue is from 3/4 up to full top - all new poll events are ignored on enqueue and existing poll events already in the queue are discarded on eviction with linear probability giving away priority for all other events to be serviced. It is expected that on a new poll timer connection polls could reenter the queue
Coalesce poll events on queue eviction
it must free message's mem if failed to requeue the pointer
fix: _remove_events_with_arg() might probably leak mem on queue overflow
Specifying libCompatMode in library.json prevents projects from overriding the library compatibility mode as needed. This prevents the library from being used in builds where Arduino is a component to ESP-IDF.
Remove libCompatMode from library.json
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.