This repository has been archived by the owner on Jun 3, 2020. It is now read-only.

Attempt to reuse existing streams when adding an archive #4

Open
RangerMauve opened this issue Aug 15, 2018 · 10 comments
Labels
enhancement New feature or request

Comments

@RangerMauve

Beaker is already capable of replicating multiple dat archives over a single replication stream.

I propose that it should take advantage of this by attempting to load feeds from existing connections when opening a new archive. Essentially, Beaker should track new replication streams here, and attempt to send the "feed" message to them once the archive is ready, around here.

This would be useful when you're participating in a social network where the peers you've connected to are also likely to have the archives of mutual "friends". In that scenario you can start replicating data for the archive without having to go through the discovery swarm (and likely finding that same peer again). This should reduce bandwidth usage by cutting the number of connections that need to be made for a successful replication.
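A rough sketch of the proposal above, to make the moving parts concrete. Everything here is hypothetical: `StreamRegistry`, `add`, and `offer` are illustrative names rather than anything in Beaker's codebase, and the sketch assumes each replication stream exposes a `feed(key)` method that announces a feed to the remote peer and emits `close` when it ends. `FakeStream` is a stand-in so the example runs without a network.

```javascript
const { EventEmitter } = require('events')

// Hypothetical registry of live replication streams (not Beaker's actual code).
class StreamRegistry {
  constructor () {
    this.streams = new Set()
  }

  // Remember a stream until it closes.
  add (stream) {
    this.streams.add(stream)
    stream.once('close', () => this.streams.delete(stream))
  }

  // Offer a newly ready archive on every stream that is still open.
  // Assumption: stream.feed(key) sends the "feed" message to the remote peer.
  offer (key) {
    let offered = 0
    for (const stream of this.streams) {
      if (stream.destroyed) continue
      stream.feed(key)
      offered++
    }
    return offered
  }
}

// Stand-in stream that only records which keys were offered on it.
class FakeStream extends EventEmitter {
  constructor () {
    super()
    this.destroyed = false
    this.offered = []
  }

  feed (key) {
    this.offered.push(key)
  }
}

const registry = new StreamRegistry()
const a = new FakeStream()
const b = new FakeStream()
registry.add(a)
registry.add(b)
b.emit('close') // b drops out of the registry

const count = registry.offer(Buffer.from('example-archive-key'))
console.log(count) // 1: only stream a is still tracked
```

Peers that don't have the archive would presumably just ignore the offer, which is what the considerations below are asking about.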


Some considerations:

  • Will the added "feed" messages be a significant overhead?
  • Will having invalid "feed" messages cause problems for clients?
  • What are the privacy implications of notifying your peers about discovery keys they might not know about?
@pfrazee pfrazee added the enhancement New feature or request label Aug 20, 2018
@pfrazee
Member

pfrazee commented Aug 20, 2018

No comment on this yet, just want to say that there's some potential to this and I'm thinking about it

@RangerMauve
Author

Yeah, right now my biggest concern is malicious actors being able to map out social graphs based on these events.

Basically:

  • Find a key you want to stalk
  • Pretend to be a peer for this key
  • Listen for when people load new archives and try to get them from you via the discovery key
  • Keep track of when it happened
  • Find peers that are seeding that discovery key
  • Build a social graph for the targets based on IP addresses

Not sure how useful something like this would be. You can't see what data they're sharing, just who they're talking to. Plus, it's only if you can convince somebody to connect to you for a key that you already know.

@pfrazee
Member

pfrazee commented Aug 20, 2018

I was discussing this with @mafintosh. One idea we had was to use this technique on archives that are used by a datsite for assets (JS, CSS, fonts, etc). That way it would optimize first-load, and it would only reveal information about the site (instead of info about the user).

@RangerMauve
Author

pfrazee: So you'd include references to it in the dat.json, I'd imagine?

What do you think about the idea of sharing dats over connections in a social graph setting? I think this would be useful for Fritter where you're going to be connecting to people with mutual followers a lot of the time.

@pfrazee
Member

pfrazee commented Aug 20, 2018

We wanted to see if it could be done by other heuristics (like "oh, this was referenced via a <script> tag"), but explicit APIs are also an option, and that might also enable the Fritter case

@RangerMauve
Author

RangerMauve commented Aug 20, 2018

pfrazee: So if it's implicit, would a valid heuristic be that the DatArchive was opened within the page?

I.e., when opening a DatArchive on a page, attempt to replicate it through the replication streams already opened for that page and for other dats on that page?

Then you're less likely to leak archives that aren't related to the content you're loading from. I'm not sure how this would work for seeding archives in the background, though. I guess that doesn't matter as much.
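One way to picture the page-scoped heuristic is a registry keyed by page, so an archive opened on one page is only offered on streams that belong to that same page. This is a sketch under assumptions: `PageScopedStreams`, the `pageId` strings, and the `feed(key)` method on streams are all made up for illustration, and background seeding is not covered.

```javascript
// Hypothetical page-scoped registry (illustrative names only).
class PageScopedStreams {
  constructor () {
    this.byPage = new Map() // pageId -> Set of streams opened for that page
  }

  // Record that a stream was opened while loading the given page.
  track (pageId, stream) {
    if (!this.byPage.has(pageId)) this.byPage.set(pageId, new Set())
    this.byPage.get(pageId).add(stream)
  }

  // Forget a stream, dropping the page entry once it has no streams left.
  untrack (pageId, stream) {
    const streams = this.byPage.get(pageId)
    if (!streams) return
    streams.delete(stream)
    if (streams.size === 0) this.byPage.delete(pageId)
  }

  // Offer an archive only on streams that belong to the page that opened it.
  // Assumption: stream.feed(key) announces the feed to the remote peer.
  offer (pageId, key) {
    const streams = this.byPage.get(pageId) || new Set()
    for (const stream of streams) stream.feed(key)
    return streams.size
  }
}

// Stand-in streams that record what was offered on them.
const makeStream = () => ({ offered: [], feed (key) { this.offered.push(key) } })

const pages = new PageScopedStreams()
const fritterPeer = makeStream()
const otherPeer = makeStream()
pages.track('page-1', fritterPeer)
pages.track('page-2', otherPeer)

const offeredCount = pages.offer('page-1', 'key-opened-on-page-1')
console.log(offeredCount, otherPeer.offered.length) // 1 0
```

Peers connected for unrelated pages never see the offer, which is the leak-limiting property the heuristic is after.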

@pfrazee
Member

pfrazee commented Aug 20, 2018

@RangerMauve it's certainly a valid heuristic, but that's a very good case of leaking information about the user. For instance, on Fritter, you'd basically be announcing your profile on load, because that's the first DatArchive opened

@pfrazee
Member

pfrazee commented Aug 20, 2018

That said, it might still be a good temporary solution to consider.

@RangerMauve
Author

you'd basically be announcing your profile on load

Since the hypercore protocol uses discovery keys for requesting feeds, you wouldn't be sending your profile information. Though you'd potentially be exposing all the IPs that are seeding your profile, which might identify your devices and your followers' IPs.

@RangerMauve
Author

RangerMauve commented Aug 20, 2018

If somebody was already crawling fritter, they would be able to get your profile based on the discovery key, actually. 🤔
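To illustrate the crawling concern: a discovery key is derived one-way from the public key, so it hides the key from strangers but not from anyone who already holds it. A crawler with a list of known profile keys can precompute their discovery keys and look up whatever gets announced. The sketch below uses HMAC-SHA256 as a stand-in derivation so it runs on Node's built-in `crypto`; hypercore's real derivation is, to my knowledge, a keyed BLAKE2b hash, and `discoveryKey` and `buildIndex` here are hypothetical helpers.

```javascript
const crypto = require('crypto')

// Stand-in derivation (assumption): HMAC-SHA256 keyed by the public key.
// hypercore's own derivation differs; only the one-way property matters here.
function discoveryKey (publicKey) {
  return crypto.createHmac('sha256', publicKey).update('hypercore').digest()
}

// Reverse index from discovery key (hex) to an already-known public key (hex).
function buildIndex (knownPublicKeys) {
  const index = new Map()
  for (const key of knownPublicKeys) {
    index.set(discoveryKey(key).toString('hex'), key.toString('hex'))
  }
  return index
}

// Random bytes standing in for profile keys a crawler has already collected.
const crawled = [crypto.randomBytes(32), crypto.randomBytes(32)]
const index = buildIndex(crawled)

// A discovery key announced for a crawled profile resolves to its public key.
const announced = discoveryKey(crawled[1]).toString('hex')
console.log(index.get(announced) === crawled[1].toString('hex'))

// A discovery key for a profile the crawler has never seen does not resolve.
const unknown = discoveryKey(crypto.randomBytes(32)).toString('hex')
console.log(index.has(unknown))
```

So the announcement reveals nothing to a peer that has never seen the key, but a peer that has crawled the network can tie it back to a specific profile.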


2 participants