This repository has been archived by the owner on May 26, 2022. It is now read-only.

handle relay addr dialing intelligently #118

Open
vyzo wants to merge 2 commits into master from feat/relay-dialing

Conversation

vyzo
Contributor

@vyzo vyzo commented Apr 17, 2019

We have a problem with dialing relay addrs: we dial all relays in the address set simultaneously and end up being connected to all of them.
This unnecessarily increases the load on relays, and we end up initiating a bunch of parallel dials to all relay addrs. Couple that with addrsplosion, and we make a mess trying to dial relays.

This patch implements a dial scheduler that treats relay addrs specially (a rough sketch in Go follows the list):

  • non-relay dials are scheduled immediately, without any delay
  • relay addrs are scheduled with an initial delay of 250ms or 1s, depending on whether we had public addrs in the immediate set; if the immediate set was empty, there is no delay
  • the relay address set is split into connected and not-connected relays
    • connected relay addrs are aggregated to /p2p/QmRelay/p2p-circuit and issued serially, with a 1.5s delay between them
    • non-connected relay addrs are issued in groups by relay, with a 2s delay between them
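
For illustration only, here is a minimal, self-contained sketch of the scheduling split described above. It is not the patch's code: it uses plain strings in place of multiaddrs, hardcodes a 250ms head start instead of the 250ms/1s distinction, and omits the aggregation of connected relay addrs into /p2p/QmRelay/p2p-circuit. The names (dialJob, scheduleDials, relayOf) are made up for the sketch.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// dialJob pairs an address with the delay after which it should be dialed.
type dialJob struct {
	addr  string
	delay time.Duration
}

// relayOf returns the relay portion of a circuit addr (everything before /p2p-circuit).
func relayOf(a string) string {
	return strings.SplitN(a, "/p2p-circuit", 2)[0]
}

// scheduleDials splits addrs into immediate (non-relay) and relay dials, and
// staggers the relay dials so we don't connect to every relay at once.
func scheduleDials(addrs []string, connectedRelays map[string]bool) []dialJob {
	var jobs []dialJob
	var relayAddrs []string

	for _, a := range addrs {
		if strings.Contains(a, "/p2p-circuit") {
			relayAddrs = append(relayAddrs, a)
			continue
		}
		jobs = append(jobs, dialJob{addr: a, delay: 0}) // non-relay: dial immediately
	}

	// relay dials get a head start delay only if there are direct addrs to try first
	var delay time.Duration
	if len(jobs) > 0 {
		delay = 250 * time.Millisecond
	}

	// connected relays first, serially, 1.5s apart
	for _, a := range relayAddrs {
		if connectedRelays[relayOf(a)] {
			jobs = append(jobs, dialJob{addr: a, delay: delay})
			delay += 1500 * time.Millisecond
		}
	}
	// then the remaining relays, 2s apart
	for _, a := range relayAddrs {
		if !connectedRelays[relayOf(a)] {
			jobs = append(jobs, dialJob{addr: a, delay: delay})
			delay += 2 * time.Second
		}
	}
	return jobs
}

func main() {
	addrs := []string{
		"/ip4/1.2.3.4/tcp/4001",
		"/ip4/5.6.7.8/tcp/4001/p2p/QmRelayA/p2p-circuit",
		"/ip4/9.9.9.9/tcp/4001/p2p/QmRelayB/p2p-circuit",
	}
	connected := map[string]bool{"/ip4/5.6.7.8/tcp/4001/p2p/QmRelayA": true}
	for _, j := range scheduleDials(addrs, connected) {
		fmt.Printf("dial %s after %v\n", j.addr, j.delay)
	}
}
```

The point is only the shape: direct dials go out immediately, while relay dials are paced, connected relays first.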

TBD: tests.

@vyzo vyzo requested review from Stebalien and raulk April 17, 2019 15:41
@ghost ghost assigned vyzo Apr 17, 2019
@ghost ghost added the status/in-progress In progress label Apr 17, 2019
@vyzo vyzo force-pushed the feat/relay-dialing branch 4 times, most recently from 1422eb8 to 7f178e1 Compare April 17, 2019 17:03
Member

@raulk raulk left a comment

Just feedback on the design, not on the code. I actually dislike pacing dials based on hardcoded time delays. It feels like we're artificially penalising nodes with good networks.

I'd prefer a feedback loop (like in dialer v2), where the scheduling component receives the result of the dial, and reacts by emitting a new job if necessary.

That way we pace dials by reaction, not by assumption.
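
Purely to illustrate that shape (this is not dialer v2's code, and the types and names here are invented for the sketch), a feedback-driven loop might look something like this: each dial result drives the decision to emit the next job, so nodes on good networks are never held back by a timer.

```go
package main

import "fmt"

// dialResult reports the outcome of a single dial attempt back to the scheduler.
type dialResult struct {
	addr string
	err  error
}

// scheduleByFeedback issues one dial at a time and reacts to each result,
// emitting the next job only when the previous attempt has failed.
func scheduleByFeedback(addrs []string, dial func(string) error) {
	results := make(chan dialResult)
	pending := append([]string(nil), addrs...)

	next := func() bool {
		if len(pending) == 0 {
			return false
		}
		a := pending[0]
		pending = pending[1:]
		go func() { results <- dialResult{addr: a, err: dial(a)} }()
		return true
	}

	if !next() {
		return
	}
	for res := range results {
		if res.err == nil {
			fmt.Println("connected via", res.addr)
			return // success: no further dials needed
		}
		fmt.Println("dial failed:", res.addr, res.err)
		if !next() { // failure: react by scheduling the next address
			return
		}
	}
}

func main() {
	attempts := 0
	fakeDial := func(addr string) error {
		attempts++
		if attempts < 2 {
			return fmt.Errorf("%s unreachable", addr)
		}
		return nil
	}
	scheduleByFeedback([]string{"/relay/A", "/relay/B", "/relay/C"}, fakeDial)
}
```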

@vyzo vyzo force-pushed the feat/relay-dialing branch from 7f178e1 to e55ea43 Compare April 17, 2019 17:16
@raulk
Member

raulk commented Apr 17, 2019

Implementing the above feels like we'd be building a poor man's version of dialer v2. I think I have the bandwidth this week to take it over the finish line (now that I got the huge refactor out of the way).

@vyzo
Contributor Author

vyzo commented Apr 17, 2019

I totally agree, but that's a little too hard with the current dialer; it would need a whole bunch of refactoring... which is what we are doing in dialer v2.
This is an interim solution, after all, with v2 in the pipeline.

@raulk
Member

raulk commented Apr 17, 2019

How urgently do we need this? Can it wait 1 week for a cleaner solution (dialer v2)?

@vyzo
Contributor Author

vyzo commented Apr 17, 2019

We can wait a week; there is no imminent ipfs release. But I fear v2 won't be ready by then either :p

@vyzo
Contributor Author

vyzo commented Apr 17, 2019

And also, I don't want to rush you on this. We can live with the patch for a while.

@raulk
Member

raulk commented Apr 17, 2019

Yeah, it's had a rocky history, due to my focus being all over the place :-( But it's clear we have increased pressure and impetus on it now, so I can commit to delivering it within a week.

@raulk
Member

raulk commented Apr 17, 2019

^^ it may be reckless to give a concrete 1-week horizon. Let's say 2.

@vyzo
Contributor Author

vyzo commented Apr 17, 2019

ok, this patch can stay in the queue for a week or two. If it turns out v2 is not going to be ready for a while longer, we can merge this and go ahead.

Also, please take into account the new logic for relay dialing: we want to prioritize connected relays and aggregate their addrs.

@upperwal

Hey @raulk

If it eases some of your burden, you could commit the intermediate code to a branch along with the TODOs you think need to be done before it lands on master. Contributors can help with those tasks. I am interested in working on the dialer v2 logic and would be happy to contribute.

@raulk
Member

raulk commented Apr 17, 2019

@upperwal thanks for the offer, very much appreciated. Given the nature of the change, it’s best if executed by a single individual. I suggest kicking off a discussion in the forums for the community to propose projects where help is needed: https://discuss.libp2p.io

@upperwal

@raulk 👍

@Stebalien
Member

@vyzo should I review this?

@vyzo
Contributor Author

vyzo commented Apr 19, 2019

@Stebalien I would say take a look at the scheduling logic, as we want it to carry over to dialer v2.

Member

@Stebalien Stebalien left a comment

I agree with the logic here but there has to be a more general way to do this.

  • Transport preference. When constructing libp2p, users could specify some kind of policy for dialing with various transports: ordering, delays, etc.
  • Batching. I wonder if the transport interface needs to be changed to take a set of multiaddrs.

A less intrusive alternative would be to allow transports to inject their own "planner" modules that take in an existing plan and update it (somehow).
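
To make that concrete, here is a rough, hypothetical sketch of what a per-transport "planner" hook could look like; none of these types (dialStep, planner, relayPlanner) exist in libp2p today. The swarm would produce an initial plan, and each transport would get a chance to reorder, batch, or delay the steps that concern its own addresses.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// dialStep is one entry in a dial plan: an address and when to dial it.
type dialStep struct {
	addr  string
	delay time.Duration
}

// planner is the hypothetical hook a transport could implement to rewrite the
// portion of the plan that concerns its own addresses.
type planner interface {
	updatePlan(plan []dialStep) []dialStep
}

// relayPlanner pushes circuit addrs behind direct ones and spaces them out.
type relayPlanner struct{ spacing time.Duration }

func (p relayPlanner) updatePlan(plan []dialStep) []dialStep {
	var direct, relayed []dialStep
	for _, s := range plan {
		if strings.Contains(s.addr, "/p2p-circuit") {
			relayed = append(relayed, s)
		} else {
			direct = append(direct, s)
		}
	}
	var delay time.Duration
	for i := range relayed {
		relayed[i].delay = delay
		delay += p.spacing
	}
	return append(direct, relayed...)
}

func main() {
	plan := []dialStep{
		{addr: "/p2p/QmRelayA/p2p-circuit"},
		{addr: "/ip4/1.2.3.4/tcp/4001"},
		{addr: "/p2p/QmRelayB/p2p-circuit"},
	}
	var p planner = relayPlanner{spacing: 2 * time.Second}
	for _, s := range p.updatePlan(plan) {
		fmt.Printf("dial %s after %v\n", s.addr, s.delay)
	}
}
```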

if err != nil {
	// unspecific relay addr; push it to immediate -- note that these addrs should not be
	// advertised any more, but we don't want to break older nodes or tests
	immediateAddrs = append(immediateAddrs, a)
Member

Shouldn't we just drop this?

Contributor Author

Maybe, I really don't know what to do with these!

Contributor Author

Maybe we should schedule them for last, after all specific relay addrs have been scheduled.

Member

Do we really gain anything? This literally means: dial me through random relays.

Contributor Author

Probably not.
My only concern is that we do "support" users requesting dials with ipfs swarm connect /p2p-circuit/ipfs/QmTarget, and we may not want to completely break that.

Member

Gah. What if we just hack that into a higher layer? Or just drop it? Nobody should be relying on that in production and I don't feel all that bad about breaking demos.

Contributor Author

@vyzo vyzo Apr 30, 2019

Maybe we should drop it -- but we might want to disable circuit.OptDiscovery in ipfs as well.
We can substitute it with /ipfs/QmRelay/p2p-circuit/ipfs/QmTarget anyway.
