codes fattree
The CODES fat-tree network model can be configured using the fat-tree config file (currently located in codes/src/network-workloads/conf). Below is an example config file:
```
MODELNET_GRP
{
   repetitions="12";
   server="4";
   modelnet_fattree="4";
   fattree_switch="3";
}
PARAMS
{
   ....
   ft_type="0";
   num_levels="3";
   switch_count="12";
   switch_radix="8";
   ....
}
```
The first section, MODELNET_GRP, specifies the LP types, the number of LPs per type, and their configuration. In the above case, there are 12 repetitions of 4 server LPs, 4 fat-tree network node/terminal LPs, and 3 fat-tree switch LPs. Each repetition represents a leaf-level switch, the nodes connected to it, and the higher-level switches that may be needed to construct the fat-tree. The 'fattree_switch' parameter indicates there are 3 levels to this fat-tree, and each repetition will have one switch from each level. This configuration will create a total of (fattree_switch) * (repetitions) = 3 * 12 = 36 switch LPs, with 'repetitions'-many switch LPs per level.
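As a quick sanity check, the LP totals for the example configuration above can be computed directly (a sketch, not CODES code):

```python
# LP counts implied by the example MODELNET_GRP section above.
repetitions = 12       # one repetition per leaf-level switch
servers_per_rep = 4    # server="4"
terminals_per_rep = 4  # modelnet_fattree="4"
switches_per_rep = 3   # fattree_switch="3": one switch from each of the 3 levels

total_servers = repetitions * servers_per_rep      # 48 server LPs
total_terminals = repetitions * terminals_per_rep  # 48 terminal LPs
total_switches = repetitions * switches_per_rep    # 36 switch LPs

print(total_servers, total_terminals, total_switches)  # 48 48 36
```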
Variable | Description | Calculation |
---|---|---|
k | Switch Radix | User selected |
N | Number of terminals | User selected |
Nl | Number of levels of switches | User selected |
Ns | Number of switches per pod | k/2 |
Np | Number of pods | ceil( N/[ Ns*( k/2 ) ] ) |
Ne | Number of edge switches | Np * Ns |
Na | Number of aggregate switches | Ne |
Nc | Number of core switches | ceil( [ Ns*( k/2 ) ] / floor( k/Np ) ) |
Lr | Number of link repetitions between any core and aggregate level switch | floor( k/Np ) |
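The table's formulas can be collected into a small helper (a sketch with a hypothetical function name; only the formulas themselves come from the table above):

```python
import math

def pruned_fattree_sizes(k, N):
    """Compute pruned fat-tree sizing from switch radix k and terminal count N,
    following the formulas in the table above."""
    Ns = k // 2                           # switches per pod
    Np = math.ceil(N / (Ns * (k // 2)))   # number of pods
    Ne = Np * Ns                          # edge switches
    Na = Ne                               # aggregate switches
    Lr = k // Np                          # core<->aggregate link repetitions, floor(k/Np)
    Nc = math.ceil((Ns * (k // 2)) / Lr)  # core switches
    return {"Ns": Ns, "Np": Np, "Ne": Ne, "Na": Na, "Nc": Nc, "Lr": Lr}

# The example config above has k=8 and N = 12 repetitions x 4 terminals = 48.
print(pruned_fattree_sizes(8, 48))
# {'Ns': 4, 'Np': 3, 'Ne': 12, 'Na': 12, 'Nc': 8, 'Lr': 2}
```

Note that Ne = 12 matches both "repetitions" and "switch_count" in the example config.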
MODELNET_GRP | Description | Value | Note |
---|---|---|---|
repetitions | Number of sets of the given LPGROUP | Ne | |
server | Number of server workload LPs per repetition | m*( k/2 ) | m is number of servers per terminal |
modelnet_fattree | Number of terminal LPs per repetition | k/2 | |
fattree_switch | Number of switch LPs per repetition | Nl | one switch from each level |
PARAMS | Description | Value | Note |
---|---|---|---|
ft_type | Fat-tree construction type | 0 or 1 | type 0 currently only supports Nl=3; 0: pruned, 1: standard |
num_levels | Number of switch levels | 2 or 3 | num_levels = Nl |
switch_count | Number of switches in each edge and aggregate level | Ne | |
switch_radix | Number of ports per switch | k | |
routing | Intra-rail routing algorithm | {adaptive, static} | |
terminal_radix | Number of rails connected to each terminal | User selected | only supported on fattree-multirail branch |
rail_routing | Inter-rail injection policy | {adaptive, random} | only supported on fattree-multirail branch |
The fat-tree layout is composed of typically two or three switch
levels, where all switches
have as many uplinks as downlinks.
Hence, for a given radix of k, each switch will have k/2 links to
switches in upper levels and k/2 links to switches or compute
nodes in lower levels. For cost-saving purposes, the core level uses half the number of switches with k
downlinks each. Having an equal number of links in both directions allows
each node to communicate via a unique
path in the network. Theoretically, this results in full bisection bandwidth,
making the fat-tree a popular choice for modern HPC and data center
networks.
The pruned ft_type starts with the standard full fat tree and then removes pods and adjusts core-aggregate switch connections
as needed to drop the total node/terminal count in the system. This approach still maintains
full bisection bandwidth. A full standard fat-tree uses k pods with k switches per
pod (k/2 aggregate switches and k/2 edge switches), and each edge switch connects to k/2 terminals,
so each pod connects to (k/2)*(k/2) terminals. Therefore, the number of pods needed to get
N-many terminals using the pruned ft_type is Np = ceil(N/[(k/2)(k/2)]). So the config file
should have "repetitions" = "switch_count" = Ne = Np*(k/2). In the figure below, darker colored lines
between Core and Aggregate levels indicate a bundle of two links.
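For instance, plugging the example config at the top into the pruned sizing formula (a quick arithmetic check, not CODES code):

```python
import math

k, N = 8, 48  # switch radix and desired terminal count (12 repetitions x 4 terminals)

Np = math.ceil(N / ((k // 2) * (k // 2)))  # pods needed: ceil(48/16) = 3
Ne = Np * (k // 2)                         # edge switches: 3 * 4 = 12

print(Np, Ne)  # 3 12 -> "repetitions" = "switch_count" = "12" in the config file
```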
The fattree-multirail branch supports multi-rail fat-tree networks. Multi-rail networks can be deployed to utilize multiple network interface cards (NICs) that provide access to additional network planes. We assume that each network plane has the same fat-tree topology construction/layout. All rails, and their corresponding planes, are independent of one another. Furthermore, an additional rail-injection layer is needed to distribute traffic from the terminals across the available rails. Currently, "adaptive" and "random" methods are available. Adaptive senses rail congestion at the terminals and routes along the least congested rail; in the case of ties, the first rail is selected. Random selects a rail from a uniform distribution.
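The two injection policies can be illustrated with a small sketch (hypothetical function name and congestion metric; the actual CODES implementation differs):

```python
import random

def select_rail(policy, queue_depths):
    """Pick a rail index given per-rail congestion (e.g. queued bytes at the NIC).

    'adaptive' picks the least congested rail, with ties going to the first rail;
    'random' picks a rail uniformly at random."""
    num_rails = len(queue_depths)
    if policy == "adaptive":
        # min() returns the first index with the smallest depth, so ties
        # resolve to the first rail, as described above.
        return min(range(num_rails), key=lambda r: queue_depths[r])
    elif policy == "random":
        return random.randrange(num_rails)
    raise ValueError(f"unknown policy: {policy}")

print(select_rail("adaptive", [120, 40, 40]))  # 1 (first of the tied rails 1 and 2)
```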
Other parameters that can be set in the PARAMS section:
- packet_size, chunk_size (ideally kept the same)
- modelnet_scheduler : NIC message scheduler
- modelnet_order=( "fattree" );
- router_delay : delay caused by switches, in ns
- num_levels : number of levels in the fattree (same as fattree_switch)
- switch_count : number of leaf level switches (same as repetitions)
- switch_radix : radix of the switches
- vc_size : size of switch VCs in bytes
- cn_vc_size : size of VC between NIC and switch in bytes
- link_bandwidth, cn_bandwidth : in GB/s
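Putting the parameters above together, a PARAMS section might look like the following. The structural values mirror the example at the top of this page; the scheduler name, delay, buffer sizes, and bandwidths are illustrative placeholders, so consult the sample configs shipped with CODES for supported values:

```
PARAMS
{
   packet_size="4096";
   chunk_size="4096";
   modelnet_scheduler="fcfs";
   modelnet_order=( "fattree" );
   router_delay="90";
   ft_type="0";
   num_levels="3";
   switch_count="12";
   switch_radix="8";
   vc_size="65536";
   cn_vc_size="65536";
   link_bandwidth="12.5";
   cn_bandwidth="12.5";
   routing="adaptive";
}
```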
If static routing is chosen, two more PARAMS must be provided:
- routing_folder : folder that contains the LFT files generated using the method described below.
- dot_file : prefix used for dot file generation in the method described below.
(dump_topo should be set to 0 or left unset during actual simulations)
To generate static routing tables, first do an "empty" run to dump the topology of the fat-tree by setting the following PARAMS:
- routing : static
- routing_folder : folder to which topology files should be written
- dot_file : prefix used for creating topology files inside the folder
- dump_topo : 1
When dump_topo is set, the simulator dumps the topology inside the folder specified by routing_folder and exits. Next, follow these steps created by Jens Domke to generate the routing tables stored as LFT files:
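For example, the topology-dump run's PARAMS would contain entries like the following (the folder path and prefix are placeholders):

```
routing="static";
routing_folder="/path/to/lft_folder";
dot_file="ftree";
dump_topo="1";
```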
Prerequisite: some dependent packages may need to be installed, depending on your system (e.g. sudo apt install pkg-config tcl-dev graphviz-dev tcl8.6-dev tk8.5-dev gawk). If step 1 below fails, check $HOME/simulation/ibutils/configure.log to see whether other packages need to be installed.
(replace $P_PATH below with your path to the CODES patch files)
-
Install the fall-in-place toolchain (patch files can be found in the src/util/patches folder of CODES):
```
a. wget http://htor.inf.ethz.ch/sec/fts.tgz
b. tar xzf fts.tgz
c. cd fault_tolerance_simulation/
d. rm 0001-*.patch 0002-*.patch 0003-*.patch 0004-*.patch 0005-*.patch
e. tar xzf $P_PATH/sar.patches.tgz
f. wget http://downloads.openfabrics.org/management/opensm-3.3.20.tar.gz
g. mv opensm-3.3.20.tar.gz opensm.tar.gz
h. wget http://downloads.openfabrics.org/ibutils/ibutils-1.5.7-0.2.gbd7e502.tar.gz
i. mv ibutils-1.5.7-0.2.gbd7e502.tar.gz ibutils.tar.gz
(if using ubuntu, replace the previous two commands with:
wget https://launchpad.net/ubuntu/+archive/primary/+files/ibutils_1.5.7.orig.tar.gz
mv ibutils_1.5.7.orig.tar.gz ibutils.tar.gz)
j. wget http://downloads.openfabrics.org/management/infiniband-diags-1.6.7.tar.gz
k. mv infiniband-diags-1.6.7.tar.gz infiniband-diags.tar.gz
l. wget https://www.openfabrics.org/downloads/management/libibmad-1.3.12.tar.gz
m. mv libibmad-1.3.12.tar.gz libibmad.tar.gz
n. wget https://www.openfabrics.org/downloads/management/libibumad-1.3.10.2.tar.gz
o. mv libibumad-1.3.10.2.tar.gz libibumad.tar.gz
p. patch -p1 < $P_PATH/fts.patch
q. vim simulate.py (search for "print" and find step 6; there are 2 ./configure lines,
   add --with-tk-lib --with-graphviz-lib just before the '--with-osm' string for both)
r. ./simulate.py -s
```
-
Add the LFT creation scripts to the fall-in-place toolchain:
```
a. cd $HOME/simulation/scripts
b. patch -p1 < $P_PATH/lft.patch
c. chmod +x post_process_*
d. chmod +x create_static_lft.sh
```
-
Choose a routing algorithm to be used by OpenSM (possible options: updn, dnup, ftree, lash, dor, torus-2QoS, dfsssp, sssp):
```
a. export OSM_ROUTING="ftree"
b. ~/simulation/scripts/create_static_lft.sh routing_folder dot_file
```
(In the above, routing_folder and dot_file should be the same as those used during the topology-dump run.) The routing tables, stored as LFT files, should now be in routing_folder.