Enhance Discovery Protocol #5

Open
PropGit opened this issue Mar 17, 2016 · 7 comments


PropGit commented Mar 17, 2016

Please enhance the loader's module discovery protocol according to what is decided in the Parallax Wi-Fi Module's firmware repo, issue #1.


PropGit commented Mar 28, 2016

UPDATE: In a test with 4 Parallax Wi-Fi Modules (all within a 10 foot radius of the central access point), the current Discovery process successfully identified all modules 65% of the time (13 out of 20 tries).

In a second test (with the same modules, moments later), the Discovery process successfully identified all modules 73% of the time (22 out of 30 tries); two of the tries failed to identify two of the modules, while all other attempts identified 3 or 4 (it is not always the same modules that appear in, or disappear from, the results).

Note this was just a simple test; results are expected to vary based on network traffic, separation distance, signal strength, wireless noise, and pseudo-random chance.


PropGit commented Mar 28, 2016

UPDATE: This is much improved! Using esp-link-2016-03-28.zip and proploader.exe from 03/28, I performed the same tests as above.

The Discovery process identified all 4 modules 86% of the time (26 out of 30 tries); one of the tries reported only 2 of the 4 modules, and three tries reported 3 out of 4. Different modules disappeared from the results at different times.

In a second series of 30 discovery requests, all four modules were found 100% of the time.

It appears that the changes made to the discovery process greatly improved the reliability of the results.


dbetz commented Mar 28, 2016

Thanks for testing. Maybe I should change the timeout to something longer than 250 ms to see if that helps. How about three attempts separated by 500 ms? That means the entire discovery process would complete in 1.5 seconds. Or maybe four at 500 ms. Will 2 seconds be acceptable?


PropGit commented Mar 28, 2016

To answer that, here's the related response I just sent by email (copied here for history):

Let's try keeping the timeout at 250 ms; the duplicate responses are already filtered out by proploader, and they don't seem to happen too often in my tests (as far as I can tell).

Also, please make it attempt not a fixed 4 tries, but rather have it keep trying until 3 consecutive tries produce no responses from previously unknown modules. There are some important implications in this:

  • A network devoid of any active modules will cause it to give up after the first 3 tries. Pretty quick result.
  • A network full of many active modules may cause it to hear from a few previously unknown modules on each successive attempt, requiring more than a fixed number of attempts, finding new identities along the way, and giving up only when due diligence turns up no more. This means discovery of many modules may take longer than discovery of few (or no) modules, but it will be worth it.
  • Responses to the latest requests from already-known modules, and responses that don't contain the expected content in their payloads (perhaps from some other network device responding to the broadcast), are treated as non-responses, ensuring an exit from the discovery process loop.

The above means the discovery process will take a minimum of 750 ms (3 tries at 250 ms each, for a network with no active modules), and somewhere around 3.25 seconds when there are a lot of them (assuming there are 200 modules and the host hears from about 20 new modules per transmitted discovery request: 10 productive tries plus the 3 final empty ones, at 250 ms each).
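
For reference, here is a minimal sketch of the loop described above: broadcast a request, collect replies for roughly 250 ms, and stop only after 3 consecutive attempts hear from no previously unknown modules. The port number, payload handling, and reply parsing are illustrative assumptions only, not the actual proploader implementation.

```cpp
// Minimal sketch of the proposed discovery loop (assumptions noted inline).
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <set>
#include <string>

static const int DISCOVER_PORT = 2000;   // assumption: the real port may differ
static const int REPLY_TIMEOUT_MS = 250; // per-receive reply window
static const int IDLE_LIMIT = 3;         // stop after 3 tries with no new modules

int main()
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0)
        return 1;

    int on = 1;
    setsockopt(sock, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on));

    // the 250 ms timeout applies to each receive call -- close enough for a sketch
    timeval tv = { 0, REPLY_TIMEOUT_MS * 1000 };
    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    sockaddr_in bcast = {};
    bcast.sin_family = AF_INET;
    bcast.sin_port = htons(DISCOVER_PORT);
    bcast.sin_addr.s_addr = htonl(INADDR_BROADCAST);

    std::set<std::string> known;  // module addresses discovered so far
    int idleTries = 0;            // consecutive tries with no new modules

    while (idleTries < IDLE_LIMIT) {
        // the request payload begins with four zero bytes so it can never be
        // mistaken for a JSON response
        uint8_t request[4] = { 0, 0, 0, 0 };
        sendto(sock, request, sizeof(request), 0,
               (sockaddr *)&bcast, sizeof(bcast));

        bool foundNew = false;
        for (;;) {
            char buf[1024];
            sockaddr_in from = {};
            socklen_t fromLen = sizeof(from);
            ssize_t n = recvfrom(sock, buf, sizeof(buf) - 1, 0,
                                 (sockaddr *)&from, &fromLen);
            if (n <= 0)
                break;            // reply window expired for this attempt
            buf[n] = '\0';
            if (buf[0] != '{')
                continue;         // not a JSON response: treat as a non-response
            std::string addr = inet_ntoa(from.sin_addr);
            if (known.insert(addr).second) {
                foundNew = true;  // heard from a previously unknown module
                printf("found module at %s\n", addr.c_str());
            }
        }
        idleTries = foundNew ? 0 : idleTries + 1;
    }
    close(sock);
    return 0;
}
```

Resetting the idle counter whenever a new module is heard is what makes the attempt count scale with the number of modules instead of being fixed.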

In my tests, on occasion I see three module addresses appear right away... then a brief (maybe 1/4 second) delay, followed by the fourth module address. That's the kind of thing I'd expect to see if the fourth module's response collided the first time, then was received the next time. However, there are still cases where only two or, more often, three modules are "found" in the entire process... it's curious why the fourth was never found despite a sizable delay following the discovery of the first three (where I'm assuming the loader is sending out more discovery requests).


PropGit commented Mar 28, 2016

Questions:

  • Now that this is working so well, will you be adding the suggested leading text in the payloads of the discovery requests and responses? Or did you have another thought in mind to distinguish them from other potential traffic on the same port(s)?
  • What is the 0x00000000 that is at the start of each request?


dbetz commented Mar 29, 2016

The 0x00000000 at the start of each request marks it as a request and not a response. The response is a JSON string that starts with "{" and hence will never be zero.
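
For illustration, here is a minimal sketch of how a receiver could apply that rule to tell the two packet types apart; the function and type names are made up for this example, not taken from the proploader source.

```cpp
// Minimal sketch: classify a discovery packet by its leading bytes.
#include <cstddef>
#include <cstdint>

enum class PacketKind { Request, Response, Unknown };

PacketKind classifyDiscoveryPacket(const uint8_t *data, size_t len)
{
    // a discovery request starts with 0x00000000
    if (len >= 4 && data[0] == 0 && data[1] == 0 && data[2] == 0 && data[3] == 0)
        return PacketKind::Request;
    // a discovery response is JSON text, so its first byte is '{' (never zero)
    if (len >= 1 && data[0] == '{')
        return PacketKind::Response;
    return PacketKind::Unknown;   // anything else can be ignored
}
```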


PropGit commented Mar 29, 2016

[Using 2016-03-28 firmware and proploader.exe from 2016-03-28 03:09 pm PST]

I've been studying the packet traffic during discovery, looking for any oddities during cases where only three of my four active modules are discovered. Here's what I found:

  • Unlike my earlier tests with older firmware and software, in my current tests with the latest firmware and software it appears the module that isn't discovered is always the same one. This is different from what I saw before (where the missing module(s) were different and random).
    • I have four modules (192.168.1.117, ...127, ...128, and ...129) on the network. .117 is the one that goes missing from the discovery occasionally.
  • The packet traffic clearly shows that at the moment proploader misses .117, the .117 module never replied at all. Proploader is not missing a response; there truly is no response for Proploader to see. And Proploader is clearly following the protocol and timing we defined.
  • Then I started a constant ping process to watch for any loss of connection with the module (a sketch of such a monitor follows this list), and guess what I found? About every 22 seconds, the .117 module seems to disappear (not responding to a ping request); however, it continues to flash the Associate LED at the normal rate, so it does not appear to lose contact with my access point.
  • More odd things:
    • The moment of loss does not coincide with the module's moment of association; instead, the moment of loss happens like clockwork, independent of the module's time awake, as long as its time away from the network (which I'm achieving via power-cycling) is relatively short (around 5 seconds or so).
      • Exception: If I leave the module powered down for > 20 seconds, then after powering up there is no loss in connection (ping response) for 2 minutes, and then the 22-second loss cycle begins again.
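
For anyone wanting to reproduce the observation above, here is a minimal sketch of a timestamped ping monitor. It assumes a Unix-style ping command (on Windows the equivalent flags would be "-n 1 -w 1000"); 192.168.1.117 is the module that was dropping out in these tests.

```cpp
// Minimal sketch: log a timestamped line once per second showing whether
// the module answered a single ping.
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <thread>

int main()
{
    // one echo request, 1 second wait; output discarded, only the exit status matters
    const char *cmd = "ping -c 1 -W 1 192.168.1.117 > /dev/null 2>&1";
    for (;;) {
        bool alive = (std::system(cmd) == 0);   // exit status 0 means a reply came back
        std::time_t now = std::time(nullptr);
        char stamp[32];
        std::strftime(stamp, sizeof(stamp), "%H:%M:%S", std::localtime(&now));
        std::printf("%s  %s\n", stamp, alive ? "reply" : "NO REPLY");
        std::fflush(stdout);
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
}
```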
