This repository has been archived by the owner on Jun 14, 2024. It is now read-only.

feat(51/WAKU2-RELAY-SHARDING): Autosharding update #607

Merged Aug 17, 2023 (20 commits)
115 changes: 65 additions & 50 deletions content/docs/rfcs/51/README.md
category: Standards Track
tags:
editor: Daniel Kaiser <[email protected]>
contributors:
- Simon-Pierre Vivier <[email protected]>
---

# Abstract
This document also covers discovery of topic shards.
It is RECOMMENDED for App protocols to follow the naming structure detailed in [23/WAKU2-TOPICS](/spec/23/).
With named sharding, managing discovery falls into the responsibility of apps.

From an app protocol point of view, a subscription to a content topic `/waku2/xxx` on a shard named `/mesh/v1.1.1/xxx` would look like:

`subscribe("/waku2/xxx", "/mesh/v1.1.1/xxx")`
Static shards are managed in shard clusters of 1024 shards per cluster.
Waku static sharding can manage $2^{16}$ shard clusters.
Each shard cluster is identified by its index (between $0$ and $2^{16}-1$).

A specific shard cluster is either globally available to all apps,
specific for an app protocol,
or reserved for automatic sharding (see next section).

> *Note:* This leads to $2^{16} \cdot 1024 = 2^{26}$ shards for which Waku manages discovery.
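For illustration, a static shard's pubsub topic name can be derived from its cluster and shard indices. The sketch below assumes the `/waku/2/rs/<cluster>/<shard>` topic format used in this RFC series; the function name is illustrative:

```python
def shard_pubsub_topic(cluster: int, shard: int) -> str:
    """Pubsub topic name of a static shard (assumes the /waku/2/rs/... convention)."""
    if not 0 <= cluster < 2**16:
        raise ValueError("cluster index out of range")
    if not 0 <= shard < 1024:
        raise ValueError("shard index out of range")
    return f"/waku/2/rs/{cluster}/{shard}"

print(shard_pubsub_topic(16, 2))  # -> /waku/2/rs/16/2
```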

App protocols can choose to use either global shards or app-specific shards.

Like the [IANA ports](https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml),
shard clusters are divided into ranges:
This example node is part of shards `13`, `14`, and `45` in the Status main-net.

# Automatic Sharding

Autosharding selects shards automatically and is the default behaviour for shard choice, but opting out is always possible.
Shards (pubsub topics) no longer have to be chosen; they are instead computed from content topics using the procedure below.

## Rendezvous Hashing

Rendezvous hashing, also known as the Highest Random Weight (HRW) method, has many properties useful for shard selection:
- Distributed agreement. Local-only computation, without communication.
- Load balancing. Content topics are spread as equally as possible across shards.
- Minimal disruption. Content topics switch shards infrequently.
- Low overhead. Easy to compute.

### Algorithm

For each shard:
1. Hash, using SHA-256, the concatenation of
   the content topic `application` field (N UTF-8 bytes),
   the `version` field (N UTF-8 bytes),
   the cluster index (2 bytes) and
   the shard index (2 bytes).
2. Take the first 64 bits of the hash and divide them by $2^{64}$.
3. Compute the natural logarithm of this number,
   then divide the negated weight (default $1.0$) by it.

Finally, pick the shard with the highest resulting value.
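The procedure above can be sketched in Python as follows (big-endian encoding of the two-byte indices is an assumption of this sketch, and the function names are illustrative, not part of the specification):

```python
import hashlib
import math
import struct

def shard_score(application: str, version: str,
                cluster: int, shard: int, weight: float = 1.0) -> float:
    # Concatenate application and version (UTF-8) with the cluster and shard
    # indices (2 bytes each; big-endian is an assumption of this sketch).
    data = (application.encode("utf-8") + version.encode("utf-8")
            + struct.pack(">HH", cluster, shard))
    digest = hashlib.sha256(data).digest()
    # First 64 bits of the hash, mapped into [0, 1).
    x = int.from_bytes(digest[:8], "big") / 2**64
    if x == 0.0:
        return 0.0  # limit of -w/ln(x) as x -> 0+
    # Weighted rendezvous score: -weight / ln(x); higher is better.
    return -weight / math.log(x)

def select_shard(application: str, version: str,
                 cluster: int, num_shards: int) -> int:
    # Pick the shard index with the highest score.
    return max(range(num_shards),
               key=lambda s: shard_score(application, version, cluster, s))
```

Because the computation is purely local and deterministic, every node subscribing to the same content topic selects the same shard without any coordination.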

## Content Topic Prefixes
So that apps can manipulate the shard selection, two prefixes MAY be added to content topics: generation and bias.
When omitted, default values are used.

- Short format `/application/version/subject/encoding`
- Long format `/generation/bias/application/version/subject/encoding`
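A sketch of how both formats could be parsed (the helper name and returned structure are hypothetical, not part of this specification; the defaults `0` and `unbiased` follow the Generation and Bias sections):

```python
def parse_content_topic(topic: str) -> dict:
    parts = topic.strip("/").split("/")
    if len(parts) == 4:
        # Short format: use default values for the omitted prefixes.
        generation, bias = 0, "unbiased"
        application, version, subject, encoding = parts
    elif len(parts) == 6:
        gen, bias, application, version, subject, encoding = parts
        generation = int(gen)
    else:
        raise ValueError(f"unrecognized content topic format: {topic!r}")
    return {"generation": generation, "bias": bias,
            "application": application, "version": version,
            "subject": subject, "encoding": encoding}
```

For example, `/myapp/1/chat/proto` and `/0/unbiased/myapp/1/chat/proto` parse to the same result.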

### Topic design
Content topics have two purposes: filtering and routing.
Filtering is done by changing the `subject` field; since this part is not hashed, it does not affect routing (shard selection).
The `application` and `version` fields do affect routing.
Application designers should understand that using different `application` fields has a cost:
it increases the traffic a node has to relay when subscribed to all topics, but can benefit nodes that only use a subset of topics.
> **Reviewer:** Is this strictly true? In other words, if an application uses multiple `application` fields, the impact is that the traffic for that application will be routed in different shards, which adds routing complexity and may increase relayed traffic. However, the application could still e.g. use filter to subscribe to content from all these shards without relaying, but this adds complexity.
>
> I think from this RFC's perspective there is no definitive knowledge about implied relay (or not) of different shards - this happens at another layer - but we could add that applications SHOULD consider the added complexity and possible bandwidth implications of splitting their traffic across several shards by using multiple `application` fields. In fact, the recommendation is that application designers SHOULD group traffic that needs to be processed together under the same `application` field and SHOULD use different `application` fields for traffic that can be processed separately, to increase scalability.
>
> **@SionoiS (Jul 31, 2023):** True, and the same applies to the recommendation below: we can't make that statement either. It's a case-by-case basis.
>
> **Reviewer:** True, but in the latter case I think the SHOULD implies that there's a general best-practice scenario here and we might want to be somewhat helpful to those completely new to the concept. But no major objection to leaving this out.
### Generation
The generation number monotonically increases and indirectly refers to the total number of shards of the Waku network.

Default: `0`

The first generation (zero) uses 8 shards in total.
This document will be updated with future generation shard numbers.

Community consensus should be reached before increasing the generation and the number of shards in the network, as it would affect everyone.

### Bias
Bias is used to skew the priority of shards via weights.
It is unspecified for now but may be used in the future.

Default: `unbiased`

## Problems

### Hotspots

Hot spots occur (similar to DHTs) when a specific mesh network (shard) becomes responsible for (several) large multicast groups (content topics).
The opposite problem occurs when a mesh only carries multicast groups with very few participants: this might cause bad connectivity within the mesh.
Our research goal here is finding efficient ways of distribution; the DHT literature is one source of inspiration.
We also have to consider that if a node is part of many content topics which are all spread over different shards,
the node will potentially be exposed to a lot of network traffic.

The current autosharding methods cannot solve this problem.

A new version of autosharding based on network traffic measurements could be designed to migrate content topics from shard to shard, but this would require further research and development.

### Discovery

For the discovery of automatic shards, this document specifies two methods (the second will be detailed in a future version of this document).

The first method uses the discovery introduced above in the context of static shards.
The index range `49152 - 65535` is reserved for automatic sharding.
Each cluster index can be seen as a hash bucket.
Rendezvous hashing maps each content topic into one of these buckets.

The second discovery method will be a successor to the first method,
but is planned to preserve the index range allocation.
We will add more on security considerations in future versions of this document.
## Receiver Anonymity

The strength of receiver anonymity, i.e. topic receiver unlinkability,
depends on the number of content topics (`k`), as a proxy for the number of peers and messages, that get mapped onto a single pubsub topic (shard).
For *named* and *static* sharding this responsibility is at the app protocol layer.

# Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
* [51/WAKU2-RELAY-SHARDING](/spec/51/)
* [Ethereum ENR sharding bit vector](https://github.com/ethereum/consensus-specs/blob/dev/specs/altair/p2p-interface.md#metadata)
* [Ethereum discv5 specification](https://github.com/ethereum/devp2p/blob/master/discv5/discv5-theory.md)
* [Rendezvous Hashing](https://www.eecs.umich.edu/techreports/cse/96/CSE-TR-316-96.pdf)
* [Research log: Waku Discovery](https://vac.dev/wakuv2-apd)
* [45/WAKU2-ADVERSARIAL-MODELS](/spec/45)