codes fattree

CODES Fat tree network model

Configuring the CODES fat-tree network model

The CODES fat-tree network model is configured through the fat-tree config file (currently located in codes/src/network-workloads/conf). Below is an example config file:

```
MODELNET_GRP
{
    repetitions="12";
    server="4";
    modelnet_fattree="4";
    fattree_switch="3";
}
PARAMS
{
    ....
    ft_type="0";
    num_levels="3";
    switch_count="12";
    switch_radix="8";
    ....
}
```

The first section, MODELNET_GRP, specifies the LP types, the number of LPs per type, and their configuration. In the above case, there are 12 repetitions, each with 4 server LPs, 4 fat-tree network node/terminal LPs, and 3 fat-tree switch LPs. Each repetition represents a leaf-level switch, the nodes connected to it, and the higher-level switches that may be needed to construct the fat tree. The 'fattree_switch' parameter indicates that this fat tree has 3 levels, and each repetition contains one switch from each level. This configuration therefore creates a total of fattree_switch * repetitions = 3 * 12 = 36 switch LPs, with 'repetitions' (here 12) switch LPs per level.
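The LP arithmetic above can be checked with a short Python sketch. It only multiplies each per-repetition count in MODELNET_GRP by repetitions; the helper name lp_counts is hypothetical and not part of CODES.

```python
from typing import Dict


def lp_counts(repetitions: int, server: int, modelnet_fattree: int,
              fattree_switch: int) -> Dict[str, int]:
    """Total LPs of each type created by a MODELNET_GRP block (hypothetical helper)."""
    counts = {
        "server": repetitions * server,
        "modelnet_fattree": repetitions * modelnet_fattree,
        "fattree_switch": repetitions * fattree_switch,
    }
    counts["total"] = sum(counts.values())
    # Each repetition holds one switch per level, so every level has
    # `repetitions` switch LPs.
    counts["switches_per_level"] = repetitions
    return counts


if __name__ == "__main__":
    # Values taken from the example MODELNET_GRP block.
    print(lp_counts(repetitions=12, server=4, modelnet_fattree=4, fattree_switch=3))
```

For the example config this gives 36 switch LPs (12 per level), 48 terminal LPs, and 48 server LPs.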

Fat-Tree Construction/Layout Variables

| Variable | Description | Calculation |
|---|---|---|
| k | Switch radix | User selected |
| N | Number of terminals | User selected |
| Nl | Number of levels of switches | User selected |
| Ns | Number of switches per pod | k/2 |
| Np | Number of pods | ceil( N / [ Ns * (k/2) ] ) |
| Ne | Number of edge switches | Np * Ns |
| Na | Number of aggregate switches | Ne |
| Nc | Number of core switches | ceil( [ Ns * (k/2) ] / floor( k/Np ) ) |
| Lr | Number of link repetitions between any core and aggregate level switch | floor( k/Np ) |
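The layout formulas can be evaluated directly. The sketch below is a plain transcription of the table into Python, assuming an even radix k; the function names and the error raised when Np would exceed k (which would make Lr zero) are illustrative choices, not CODES behavior.

```python
from typing import Dict


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def fattree_layout(k: int, n_terminals: int) -> Dict[str, int]:
    """Evaluate the construction/layout formulas (hypothetical helper)."""
    if k < 2 or k % 2:
        raise ValueError("switch radix k must be an even number >= 2")
    if n_terminals < 1:
        raise ValueError("need at least one terminal")
    half = k // 2
    ns = half                                   # Ns = k/2
    n_pods = ceil_div(n_terminals, ns * half)   # Np = ceil(N / [Ns*(k/2)])
    if n_pods > k:
        # Assumption: beyond k pods, Lr = floor(k/Np) would be 0.
        raise ValueError("N exceeds k^3/4 terminals for radix k")
    ne = n_pods * ns                            # Ne = Np * Ns
    lr = k // n_pods                            # Lr = floor(k/Np)
    nc = ceil_div(ns * half, lr)                # Nc = ceil([Ns*(k/2)] / Lr)
    return {"Ns": ns, "Np": n_pods, "Ne": ne, "Na": ne, "Nc": nc, "Lr": lr}


if __name__ == "__main__":
    print(fattree_layout(8, 48))
```

For k=8 and N=48 this yields Np=3, Ne=12, Nc=8, and Lr=2, which matches repetitions="12" and switch_count="12" in the example config.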

Config File Variables

| MODELNET_GRP | Description | Value | Note |
|---|---|---|---|
| repetitions | Number of sets of the given LPGROUP | Ne | |
| server | Number of server workload LPs per repetition | m * (k/2) | m is the number of servers per terminal |
| modelnet_fattree | Number of terminal LPs per repetition | k/2 | |
| fattree_switch | Number of switch LPs per repetition | Nl | one switch per level; same as num_levels |

| PARAMS | Description | Value | Note |
|---|---|---|---|
| ft_type | Fat-tree type | 0 or 1 | 0: pruned, 1: standard; type 0 currently only supports Nl=3 |
| num_levels | Number of switch levels | 2 or 3 | num_levels = Nl |
| switch_count | Number of switches in each of the edge and aggregate levels | Ne | |
| switch_radix | Number of ports per switch | k | |
| routing | Intra-rail routing algorithm | {adaptive, static} | |
| terminal_radix | Number of rails connected to each terminal | User selected | only supported on the fattree-multirail branch |
| rail_routing | Inter-rail injection policy | {adaptive, random} | only supported on the fattree-multirail branch |
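As a sketch of how the two tables fit together, the following Python renders the fat-tree entries of MODELNET_GRP and PARAMS from k, N, and m. It is a hypothetical helper, not a CODES tool: it emits only the entries listed in the tables, so the remaining PARAMS (packet_size, routing, bandwidths, and so on) still have to be added by hand.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def fattree_config(k: int, n_terminals: int, servers_per_terminal: int = 1,
                   num_levels: int = 3, ft_type: int = 0) -> str:
    """Render fat-tree MODELNET_GRP/PARAMS entries (hypothetical helper)."""
    if ft_type not in (0, 1):
        raise ValueError("ft_type must be 0 (pruned) or 1 (standard)")
    if num_levels not in (2, 3):
        raise ValueError("num_levels must be 2 or 3")
    if ft_type == 0 and num_levels != 3:
        raise ValueError("ft_type 0 currently only supports num_levels=3")
    half = k // 2
    ne = ceil_div(n_terminals, half * half) * half   # Ne = Np * (k/2)
    grp = {
        "repetitions": ne,
        "server": servers_per_terminal * half,       # m * (k/2)
        "modelnet_fattree": half,                    # k/2
        "fattree_switch": num_levels,                # one switch per level
    }
    params = {
        "ft_type": ft_type,
        "num_levels": num_levels,
        "switch_count": ne,
        "switch_radix": k,
    }

    def block(name: str, entries: dict) -> str:
        body = "".join(f'    {key}="{val}";\n' for key, val in entries.items())
        return f"{name}\n{{\n{body}}}\n"

    return block("MODELNET_GRP", grp) + block("PARAMS", params)


if __name__ == "__main__":
    print(fattree_config(k=8, n_terminals=48))
```

With k=8, N=48, and one server per terminal, the generated values reproduce the example config at the top of this page.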

Standard Full Fat-Tree (ft_type=1)

A fat tree is typically composed of two or three switch levels in which every switch has as many uplinks as downlinks. Hence, for a radix of k, each switch has k/2 links to switches in the level above and k/2 links to switches or compute nodes in the level below. For cost savings, the core level uses half as many switches as the aggregate level, each with k downlinks. Having an equal number of links in both directions allows each node to communicate via a unique path in the network. In theory, this yields full bisection bandwidth, making the fat tree a popular choice for modern HPC and data center networks.

[Figure: Standard Full Fat-Tree (k=8)]
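Counting ports for the standard 3-level case (Nl=3) illustrates the uplink/downlink balance described above: the same number of links crosses every level boundary. This is a counting sketch under that 3-level assumption only; it does not model how CODES actually wires the switches.

```python
from typing import Dict


def full_fattree_links(k: int) -> Dict[str, int]:
    """Switch and link counts for a 3-level standard fat tree of even radix k."""
    if k < 2 or k % 2:
        raise ValueError("k must be an even number >= 2")
    half = k // 2
    edge = k * half          # k pods x k/2 edge switches
    agg = k * half           # k pods x k/2 aggregate switches
    core = half * half       # half as many switches as the aggregate level
    counts = {
        "terminals": edge * half,                 # k/2 downlinks per edge switch
        "edge_switches": edge,
        "aggregate_switches": agg,
        "core_switches": core,
        "edge_to_aggregate_links": edge * half,   # k/2 uplinks per edge switch
        "aggregate_to_core_links": agg * half,    # k/2 uplinks per aggregate switch
    }
    # Each core switch spends all k ports on downlinks, so the core level
    # absorbs exactly the aggregate uplinks.
    assert core * k == counts["aggregate_to_core_links"]
    return counts


if __name__ == "__main__":
    print(full_fattree_links(8))
```

For k=8 this gives 32 edge, 32 aggregate, and 16 core switches, with 128 links at each level boundary and 128 terminals.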

Pruned Fat-Tree (ft_type=0)

The pruned ft_type starts from the standard full fat tree, then removes pods and adjusts the core-aggregate switch connections as needed to reduce the total number of nodes/terminals in the system. This approach still maintains full bisection bandwidth. A full standard fat tree uses k pods, each with k/2 aggregate switches and k/2 edge switches, and each edge switch connects to k/2 terminals, so each pod connects to (k/2)(k/2) terminals. Therefore, the number of pods needed to get N terminals with the pruned ft_type is Np = ceil(N / [(k/2)(k/2)]), and the config file should set "repetitions" = "switch_count" = Ne = Np*(k/2). In the figure below, darker lines between the Core and Aggregate levels indicate a bundle of two links.

[Figure: Pruned Fat-Tree (k=8)]
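One way to see that pruning keeps the core-aggregate links within each switch's radix is a port-budget check. The sketch below assumes, for illustration only, that every core switch connects to every remaining pod with Lr = floor(k/Np) parallel links, which is consistent with the two-link bundles in the k=8 figure; it is not taken from the CODES source.

```python
from typing import Dict


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def pruned_port_budget(k: int, n_terminals: int) -> Dict[str, int]:
    """Core-aggregate port budget of a pruned 3-level fat tree.

    Assumption: each core switch has Lr parallel links to each of the Np pods.
    """
    half = k // 2
    n_pods = ceil_div(n_terminals, half * half)      # Np
    if not 1 <= n_pods <= k:
        raise ValueError("N must be between 1 and k^3/4")
    lr = k // n_pods                                 # Lr = floor(k/Np)
    n_core = ceil_div(half * half, lr)               # Nc
    return {
        "Np": n_pods,
        "Lr": lr,
        "Nc": n_core,
        # Ne = Np*(k/2) aggregate switches, each with k/2 uplinks
        "aggregate_uplinks": n_pods * half * half,
        # ports each core switch uses: Lr links to each of the Np pods
        "core_ports_used": n_pods * lr,
        "core_downlinks": n_core * n_pods * lr,
    }


if __name__ == "__main__":
    print(pruned_port_budget(8, 48))
```

For k=8 and N=48 this gives Np=3, Lr=2, and Nc=8: each core switch uses 6 of its 8 ports, and the 48 core downlinks exactly match the 48 aggregate uplinks. Because Np*floor(k/Np) never exceeds k and Nc*Lr is at least (k/2)(k/2), the same check passes for any N up to k^3/4.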

Multiple Rails

The fattree-multirail branch supports multi-rail fat-tree networks. Multi-rail networks use multiple network interface cards (NICs) to gain access to additional network planes. We assume that each network plane has the same fat-tree construction/layout. All rails, and their corresponding planes, are independent of one another. An additional rail-injection layer is therefore needed to distribute traffic from the terminals across the available rails. Currently, "adaptive" and "random" methods are available. Adaptive senses rail congestion at the terminal and routes along the least congested rail; in the case of a tie, the first rail is selected. Random selects the rail from a uniform distribution.
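The two injection policies can be sketched in a few lines of Python. The congestion values passed to the adaptive picker are a hypothetical per-rail metric (for example, bytes queued on each rail at the terminal); what CODES measures internally may differ.

```python
import random
from typing import Sequence


def pick_rail_adaptive(congestion: Sequence[float]) -> int:
    """Index of the least congested rail; ties go to the first rail."""
    if not congestion:
        raise ValueError("need at least one rail")
    # min() returns the first of several equal minima, which gives the
    # "first rail wins ties" behavior.
    return min(range(len(congestion)), key=lambda rail: congestion[rail])


def pick_rail_random(num_rails: int, rng: random.Random) -> int:
    """Rail chosen from a uniform distribution."""
    if num_rails < 1:
        raise ValueError("need at least one rail")
    return rng.randrange(num_rails)


if __name__ == "__main__":
    print(pick_rail_adaptive([5.0, 2.0, 2.0]))  # rails 1 and 2 tie, so rail 1
    print(pick_rail_random(4, random.Random(0)))
```

Passing an explicit random.Random instance keeps the random policy reproducible across runs.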

Supported configuration parameters:

packet_size, chunk_size (ideally kept the same)
modelnet_scheduler : NIC message scheduler
modelnet_order=( "fattree" );
router_delay : delay caused by switches, in ns
num_levels : number of levels in the fattree (same as fattree_switch)
switch_count : number of leaf level switches (same as repetitions)
switch_radix : radix of the switches
vc_size : size of switch VCs in bytes
cn_vc_size : size of VC between NIC and switch in bytes
link_bandwidth, cn_bandwidth : in GB/s
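The "same as" relationships in the list above can be checked mechanically before a run. This is a hypothetical Python checker, assuming the MODELNET_GRP and PARAMS entries have already been parsed into string dictionaries; the sample sizes in the usage note are placeholders, not recommended values.

```python
from typing import Dict, List


def check_fattree_params(grp: Dict[str, str], params: Dict[str, str]) -> List[str]:
    """Return problems found in a fat-tree config (hypothetical helper)."""
    problems = []
    if int(params["num_levels"]) != int(grp["fattree_switch"]):
        problems.append("num_levels should equal fattree_switch")
    if int(params["switch_count"]) != int(grp["repetitions"]):
        problems.append("switch_count should equal repetitions")
    # Not a hard rule: the two sizes are ideally kept the same.
    if params.get("packet_size") != params.get("chunk_size"):
        problems.append("packet_size and chunk_size differ (ideally the same)")
    return problems


if __name__ == "__main__":
    grp = {"repetitions": "12", "fattree_switch": "3"}
    params = {"num_levels": "3", "switch_count": "12",
              "packet_size": "4096", "chunk_size": "4096"}  # placeholder sizes
    print(check_fattree_params(grp, params))
```

An empty list means the config satisfies these three constraints; it says nothing about the other parameters.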

Enabling static routing

If static routing is chosen, two more PARAMS must be provided:

routing_folder : folder that contains the LFT files generated using the method described below.
dot_file : name used for dot file generation in the method described below.

(dump_topo should be set to 0, or left unset, during simulations.)

To generate static routing tables, first do an "empty" run to dump the topology of the fat-tree by setting the following PARAMS:

routing : static
routing_folder : folder to which topology files should be written
dot_file : prefix used for creating topology files inside the folder
dump_topo : 1
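The two kinds of runs differ only in dump_topo, while routing_folder and dot_file stay the same. A hypothetical Python check of both rule sets might look like this, assuming PARAMS values are parsed as strings and that an unset dump_topo behaves like 0, as stated above; the folder and file names in the usage note are placeholders.

```python
from typing import Dict, List


def check_static_routing(params: Dict[str, str], dump_run: bool) -> List[str]:
    """Check static-routing PARAMS for a dump run or a simulation run."""
    problems = []
    if params.get("routing") != "static":
        return problems  # the extra PARAMS only matter for static routing
    for key in ("routing_folder", "dot_file"):
        if not params.get(key):
            problems.append(f"{key} must be set when routing is static")
    dump_topo = params.get("dump_topo", "0")  # unset behaves like 0
    if dump_run and dump_topo != "1":
        problems.append("dump_topo must be 1 for the topology-dump run")
    if not dump_run and dump_topo != "0":
        problems.append("dump_topo must be 0 or unset during simulations")
    return problems


if __name__ == "__main__":
    sim = {"routing": "static", "routing_folder": "lft_out", "dot_file": "ft"}
    print(check_static_routing(sim, dump_run=False))
    print(check_static_routing(dict(sim, dump_topo="1"), dump_run=True))
```

Reusing one dictionary and toggling only dump_topo is a simple way to keep routing_folder and dot_file identical between the two runs.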

When dump_topo is set, the simulator dumps the topology into the folder specified by routing_folder and exits. Next, follow these steps, created by Jens Domke, to generate the routing tables stored as LFT files:

Prerequisites: sudo apt install pkg-config tcl-dev graphviz-dev tcl8.6-dev tk8.5-dev gawk. Other dependent packages may be needed depending on your system. If step 1 below fails, check $HOME/simulation/ibutils/configure.log to see whether other packages need to be installed.

(Replace $P_PATH with the path to the patch files.)

1. Install the fall-in-place toolchain (patch files can be found in the src/util/patches folder of CODES):

```
wget http://htor.inf.ethz.ch/sec/fts.tgz
tar xzf fts.tgz
cd fault_tolerance_simulation/
rm 0001-*.patch 0002-*.patch 0003-*.patch 0004-*.patch 0005-*.patch
tar xzf $P_PATH/sar.patches.tgz
wget http://downloads.openfabrics.org/management/opensm-3.3.20.tar.gz
mv opensm-3.3.20.tar.gz opensm.tar.gz
# On Ubuntu, replace the next two commands with:
#   wget https://launchpad.net/ubuntu/+archive/primary/+files/ibutils_1.5.7.orig.tar.gz
#   mv ibutils_1.5.7.orig.tar.gz ibutils.tar.gz
wget http://downloads.openfabrics.org/ibutils/ibutils-1.5.7-0.2.gbd7e502.tar.gz
mv ibutils-1.5.7-0.2.gbd7e502.tar.gz ibutils.tar.gz
wget http://downloads.openfabrics.org/management/infiniband-diags-1.6.7.tar.gz
mv infiniband-diags-1.6.7.tar.gz infiniband-diags.tar.gz
wget https://www.openfabrics.org/downloads/management/libibmad-1.3.12.tar.gz
mv libibmad-1.3.12.tar.gz libibmad.tar.gz
wget https://www.openfabrics.org/downloads/management/libibumad-1.3.10.2.tar.gz
mv libibumad-1.3.10.2.tar.gz libibumad.tar.gz
patch -p1 < $P_PATH/fts.patch
# Edit simulate.py: search for "print" to find step 6, which contains two
# ./configure calls. In both, add --with-tk-lib --with-graphviz-lib just
# before the '--with-osm' string.
vim simulate.py
./simulate.py -s
```

2. Add the LFT creation scripts to the fall-in-place toolchain:

```
cd $HOME/simulation/scripts
patch -p1 < $P_PATH/lft.patch
chmod +x post_process_*
chmod +x create_static_lft.sh
```

3. Choose the routing algorithm OpenSM should use (possible options: updn, dnup, ftree, lash, dor, torus-2QoS, dfsssp, sssp), then create the LFTs:

```
export OSM_ROUTING="ftree"
~/simulation/scripts/create_static_lft.sh routing_folder dot_file
```

In the command above, routing_folder and dot_file must be the same as those used in the run that dumped the topology. The routing tables, stored as LFT files, will then be in routing_folder.
