Skip to content

Latest commit

 

History

History
323 lines (245 loc) · 14.8 KB

ips.md

File metadata and controls

323 lines (245 loc) · 14.8 KB

Intelligent Peer Sharing (IPS)

Objective

Extend the crawler so that it is capable of recommending a peerlist optimized for another node’s maximum benefit.

Concept

To achieve the objective, Intelligent Peer Sharing has been introduced as a feature of crunchy. Crunchy works over „samples” generated by crawler. A sample consists of a list of node data, each node containing a socket address and list of connections (or peers). Crunchy uses the data to calculate each node’s metrics and then runs the IPS algorithm in order to to select peers to be added to (or deleted from) each node.

The Algorithm

The Algorithm is divided into two main parts: one for security and one for optimization.

The main algorithm itself contains several steps and conditions used to choose a new peerlist for a specific node. The Algorithm works on each node independently, so previous changes to the network -- i.e., previous peerlists generated -- are not taken into account. This is because we can't be certain that previous nodes apply those peerlists.

Each peerlist is calculated over the original network. Due to this, it is proposed to limit the number of peers changed for each node at once, in order to limit the influence of potential bad decisions on the network. Changing a small amount of nodes at once creates a more evolutionary approach to network shaping.

The algorithm code is in ips/algorithm.rs and is extensively commented.

Factors

The following factors are taken into account when talking about the network:

  • degree – a count, representing how many direct, ‘one hop’ connections each node has to other nodes in the network. IPS is trying to keep this measurement close to the network average for degree, and attempt to construct a peerlist with neighbor counts somewhere close to the middle between the current and average degrees.
  • betweenness – broadly, this tells us us how often a node lies on a path between other network nodes. It is computed by identifying all the shortest paths and then counting how many times each node falls on one. IPS tries to keep it close to the network average for the node; node should search peers with a high betweenness value as it means that the peer is often on the shortest path.
  • eigenvector – this tells how much influence node's neighbours have.
  • closeness – this measure calculates the shortest paths between all nodes, then assigns each node a score based on its sum of shortest paths. This is not very relevant here, as neither the density or sparseness of a network is intrinsically bad. IPS tries to keep its own centrality high and connect to peers with high closeness (if the MCDA weights allow it).

All of the above factors are written to the IPS result log, allowing one to check how a particular run influenced the network: the state before may be compared to the the state afterwards.

Security checks

One of the most important properties of a network topology is the presence or absence of network islands. Presence of islands may influence every other network or node parameter that we are observing. Merging two massive islands can be risky and should not be done automatically. The islands could have been disconnected for a long time and produced a different history of their blockchain. IPS can detect such situations, and notify the user about the existence of islands.

Next, IPS checks if the network can be easily fragmented by attacking a given percent of the nodes and if so, preventing such cases by creating new connections between their neighbours. Nodes selected by that fragmentation simulation are chosen from the "hot" nodes, which means the nodes with highest betweenness factor.

The final checks are related to the network bridges - such graph edges are identified and algorithm prevents their removal to ensure there will be no new islands.

Optimization

Selection is based on the "beauty" contest of the nodes: each node is evaluated based on its degree, betweenness, closeness and eigenvector centrality. Then, if requested, the ranking is updated with the geolocation factor. Each factor has its own weight that is used to determine the factor's importance to the calculation of the final ranking. That enables testing different approaches to the selection of the peers, without recompiling the code.

At its core, the Algorithm is a multiple criteria decision analysis (MCDA) problem represented in criterion space, where criteria are evaluated instead of possible decisions. The first pass is done using a weighted sums method which allows us to treat the problem as a single criterion problem. One modification is the allowing the use of negative sums (the original method assumes all weights are positive).

Treating the problem as an MCDA allows us to adapt the Algorithm to different types of networks: e.g., more sparse, more dense, more decentralized or more centralized strategies.

Each node is rated based on the sum of factors multiplied by their weights. Each factor is normalized (X value to X’ normalized) to be able to create single ranking. The rating is computed as follows:

rating = D * Dw + B * Bw + C * Cw + E * Ew + L * Lw

where:
D - degree
Dw - degree weight
B - betweenness
Bw - betweenness weight
C - closeness
Cw - closeness weight
E - eigenvector
Ew - eigenvector weight
L - location rating
Lw - location weight

Note that weights can be positive or negative giving ability to promote higher (large positive weight) or lower values (small positive weight) but also yield a larger or smaller penalty (negative weights) to the node’s overall ranking.

Location

The node geographical location can be one of the factors taken into account when creating a new peerlist. Here are two approaches, or rather, opposing strategies:

  • prefer distant nodes

    pros:

    • distant nodes are not likely to suffer damage at the same time caused by blackouts, natural disasters etc.;
    • ISP separation – distant nodes are not likely to run under the same ISP (who can suffer technical problems – DNS etc);

    cons:

    • connection can be worse in terms of speed, latency and jitter;
  • refer closer nodes

    pros:

    • often the connection between closer nodes is better, if data is partitioned by location it’s more probable to have it closer to the data recipients;

    cons:

    • likely to suffer the same problems – power failures, ISP problems etc;

It is also possible to set location rating to off, in order to not take location into account.

Configuration

All inputd to IPS are configurable via an external file. Currently, IPS is configured using the crunchy configuration file. The IPS section in the crunchy configuration file looks like this:

[ips_config]
peer_file_path = "testdata/peers.json"      #place to put output file
log_path = "ips.log"                        #place for log file
geolocation = "PreferCloser"                #location ranking should prefer closer or farther peers (Off, PreferDistant, PreferCloser)
geolocation_minmax_distance_km = 1000       #minimum or maximum distance in km for geolocation ranking
change_at_least = 1                         #minimum number of peers to change
change_no_more = 2                          #maximum number of peers to change
bridge_threshold_adjustment = 1.25          #adjustment to bridge threshold

[ips_config.mcda_weights]
location = 0.3
degree = -0.3
eigenvector = 0.2
betweenness = -0.3
closeness = 0.1

The first section contains basic IPS configuration and the second one, weights to be used by the MCDA algorithm. A sample config is placed in the testadata directory.

The user may easily adjust weights for each MCDA factor to experiment with different strategies.

Final remarks

  • Performance is currently not taken into account. The only metric that gives any insight into performance is handshake_time, which only gives information about time elapsed between starting a connection and successful handshake between the node and the crawler. That is a one time metric and may be affected by many factors like network delays or host load peak at the moment. A single measure may lead to false conclusions about the real performance of another node or network connection. Moreover, network performance between the crawler and the node tells nothing about possible performance of node_a to node_b.
  • A node’s own performance (hardware resources, system load etc.) can be a factor of possible node utilization in the network – either the Algorithm should try to put more traffic through that node or attempt to place it farther away from the main traffic flow. Currently there is no way to obtain any of that information from the node in the external network. Still, there may be potential here to find an enhancement or improvement.

Sample

Sample results from IPS run:

IPS algorithm started...
Checking for nodes connected to themselves...
172.218.177.148:16125 is connected to itself.
154.53.63.9:16125 is connected to itself.
146.59.124.83:16125 is connected to itself.
37.187.88.103:16125 is connected to itself.
54.37.141.22:16125 is connected to itself.

...
[nodes connected to themself list]
...

37.59.134.165:16125 is connected to itself.
142.167.129.14:16125 is connected to itself.
65.108.99.214:16125 is connected to itself.
Generating initial network state and its statistics... 
Statistics for the initial network:
----------------------------------------
Nodes count: 6729

Degree measures:
Average: 1673.378510922871
Median: 1775
Min: 1, max: 2948, delta: 2947

Betweenness measures:
Average: 0.0006914887442478738
Median: 0.0007390585712156514
Min: 0, max: 0.002065442852915205, delta: 0.002065442852915205

Closeness measures:
Average: 2.0009819311242922
Median: 2.000098387783965
Min: 1.9989458610770605, max: 2.9992496962479973, delta: 1.0003038351709368

Eigenvector measures:
Average: 1.000000000000002
Median: 1.056610881384425
Min: 0.0005267475705698542, max: 1.7232354729355281, delta: 1.7227087253649582
----------------------------------------

Generated initial state and statistics in 383 s
IPS detected no islands
IPS detected no fragmentation possibility even when top nodes would be disconnected
The MCDA procedure is starting...
All IPS computations done in 412 s from IPS start
Statistics for the final network:
----------------------------------------
Nodes count: 6729

Degree measures:
Average: 1668.1619854361718
Median: 1776
Min: 2, max: 2910, delta: 2908

Betweenness measures:
Average: 0.0007233121796668225
Median: 0.0007800175457913357
Min: 0, max: 0.002065442852915205, delta: 0.002065442852915205

Closeness measures:
Average: 2.000953117674391
Median: 1.9996456008839623
Min: 1.9989458610770605, max: 2.9992496962479973, delta: 1.0003038351709368

Eigenvector measures:
Average: 1.0000000000000022
Median: 1.061306347054873
Min: 0.001138872735927244, max: 1.6923985711437681, delta: 1.691259698407841
----------------------------------------

Comparing if network parameters got changed on plus or minus:
Deltas for given statistics pair:
----------------------------------------
Nodes count: 0 (0.000%)

Degree measures:
Average: -5.216525486699311 (-0.312%)
Median: 1 (0.056%)
Min: 1 (100.000%), max: -38 (-1.289%), delta: -39 (-1.323%)

Betweenness measures:
Average: 0.00003182343541894867 (4.602%)
Median: 0.00004095897457568427 (5.542%)
Min: 0 (0.000%), max: 0 (0.000%), delta: 0 (0.000%)

Closeness measures:
Average: -0.00002881344990113277 (-0.001%)
Median: -0.0004527869000026108 (-0.023%)
Min: 0 (0.000%), max: 0 (0.000%), delta: 0 (0.000%)

Eigenvector measures:
Average: 0.0000000000000002220446049250313 (0.000%)
Median: 0.004695465670447874 (0.444%)
Min: 0.0006121251653573898 (116.208%), max: -0.03083690179176002 (-1.789%), delta: -0.031449026957117265 (-1.826%)
----------------------------------------

IPS has been working for 8848 seconds

IPS algorithm started...
Checking for nodes connected to themselves...
37.59.129.121:16125 is connected to itself.
51.178.98.198:16125 is connected to itself.
5.9.74.158:8233 is connected to itself.
138.201.252.11:8233 is connected to itself.
89.58.53.161:16125 is connected to itself.
5.161.81.6:16125 is connected to itself.

...
[nodes connected to themself list]
...

142.132.144.56:16125 is connected to itself.
57.128.152.20:16125 is connected to itself.
Generating initial network state and its statistics... 
Statistics for the initial network:
----------------------------------------
Nodes count: 2472

Degree measures:
Average: 347.84021035598704
Median: 362
Min: 4, max: 468, delta: 464

Betweenness measures:
Average: 0.00020269540490710155
Median: 0.00021315968556486295
Min: 0.000000011402470586985047, max: 0.0005637115400558378, delta: 0.0005637001375852508

Closeness measures:
Average: 2.00092942092042
Median: 1.9982984574223353
Min: 1.996533523649682, max: 2.995343149165006, delta: 0.9988096255153243

Eigenvector measures:
Average: 1.000000000000001
Median: 1.0408706699768735
Min: 0.009224464066643153, max: 1.338099762915545, delta: 1.3288752988489019
----------------------------------------

Generated initial state and statistics in 16 s
IPS detected no islands
IPS detected no fragmentation possibility even when top nodes would be disconnected
The MCDA procedure is starting...
All IPS computations done in 18 s from IPS start
Statistics for the final network:
----------------------------------------
Nodes count: 2472

Degree measures:
Average: 349.1383495145631
Median: 364
Min: 6, max: 465, delta: 459

Betweenness measures:
Average: 0.00020262725860806008
Median: 0.0002135033940386251
Min: 0.000000011402470586985047, max: 0.0005637115400558378, delta: 0.0005637001375852508

Closeness measures:
Average: 2.0007477447791993
Median: 1.9994259982940346
Min: 1.996533523649682, max: 2.995343149165006, delta: 0.9988096255153243

Eigenvector measures:
Average: 0.9999999999999994
Median: 1.0410820150716993
Min: 0.017614444104272146, max: 1.3234689707907403, delta: 1.3058545266864683
----------------------------------------

Comparing if network parameters got changed on plus or minus:
Deltas for given statistics pair:
----------------------------------------
Nodes count: 0 (0.000%)

Degree measures:
Average: 1.2981391585760775 (0.373%)
Median: 2 (0.552%)
Min: 2 (50.000%), max: -3 (-0.641%), delta: -5 (-1.078%)

Betweenness measures:
Average: -0.00000006814629904147132 (-0.034%)
Median: 0.0000003437084737621665 (0.161%)
Min: 0 (0.000%), max: 0 (0.000%), delta: 0 (0.000%)

Closeness measures:
Average: -0.0001816761412207768 (-0.009%)
Median: 0.0011275408716993063 (0.056%)
Min: 0 (0.000%), max: 0 (0.000%), delta: 0 (0.000%)

Eigenvector measures:
Average: -0.0000000000000016653345369377348 (-0.000%)
Median: 0.00021134509482578778 (0.020%)
Min: 0.008389980037628994 (90.954%), max: -0.014630792124804781 (-1.093%), delta: -0.02302077216243359 (-1.732%)
----------------------------------------

IPS has been working for 80 seconds