
codes fattree

Nikhil edited this page Apr 29, 2019 · 26 revisions

CODES Fat tree network model

Configuring CODES fat-tree network model

The CODES fat-tree network model can be configured using the fat-tree config file (currently located in codes/src/network-workloads/conf). Below is an example config file:

MODELNET_GRP
{
    repetitions="12";
    server="4";
    modelnet_fattree="4";
    fattree_switch="3";
} 
PARAMS
{
    ....
    ft_type="0";
    num_levels="3";
    switch_count="12";
    switch_radix="8";
    ....
}

The first section, MODELNET_GRP, specifies the LP types, the number of LPs per type, and their configuration. In the above case, there are 12 repetitions of 4 server LPs, 4 fat-tree network node/terminal LPs and 3 fat-tree switch LPs. Each repetition represents a leaf-level switch, the nodes connected to it, and the higher-level switches that may be needed to construct the fat-tree. The 'fattree_switch' parameter indicates that there are 3 levels in this fat-tree and that each repetition has one switch from each level. This configuration creates a total of fattree_switch * repetitions = 3 * 12 = 36 switch LPs, with 'repetitions' many (12) switch LPs per level.
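The LP totals implied by the example can be checked with a short sketch. The helper name `count_lps` and the dictionary layout are assumptions made for illustration; they are not part of CODES.

```python
# Illustrative sketch: total LPs per type implied by a MODELNET_GRP section.
# `count_lps` is a hypothetical helper, not a CODES function.

def count_lps(group):
    """Multiply each per-repetition LP count by the number of repetitions."""
    reps = group["repetitions"]
    return {name: n * reps for name, n in group.items() if name != "repetitions"}

# Values copied from the example config file above.
example = {"repetitions": 12, "server": 4, "modelnet_fattree": 4, "fattree_switch": 3}
totals = count_lps(example)
print(totals)
```

For the example config this gives 48 server LPs, 48 terminal LPs and 36 switch LPs.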

Fat-Tree Construction/Layout Variables

| Variable | Description | Calculation |
|---|---|---|
| k | Switch radix | User selected |
| N | Number of terminals | User selected |
| Nl | Number of levels of switches | User selected |
| Ns | Number of switches per pod | `k/2` |
| Np | Number of pods | `ceil( N/[ Ns*( k/2 ) ] )` |
| Ne | Number of edge switches | `Np * Ns` |
| Na | Number of aggregate switches | `Ne` |
| Nc | Number of core switches | `ceil{ [ Ns*( k/2 ) ] / floor( k/Np ) }` |
| Lr | Number of link repetitions between any core and aggregate level switch | `floor( k/Np )` |
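The formulas in the table can be evaluated with a minimal sketch such as the one below. The function name `fattree_layout` is hypothetical, and the sketch assumes an even radix k and at most k pods (the pod count of a full standard fat-tree).

```python
import math

def fattree_layout(k, n_terminals):
    """Evaluate the layout variables from the table for radix k and N terminals.

    Hypothetical helper; assumes k is even and that no more than k pods are needed.
    """
    ns = k // 2                                        # Ns: switches per pod
    np_ = math.ceil(n_terminals / (ns * (k // 2)))     # Np: pods
    if np_ > k:
        raise ValueError("N needs more pods than a radix-k fat-tree provides")
    ne = np_ * ns                                      # Ne: edge switches
    na = ne                                            # Na: aggregate switches
    lr = k // np_                                      # Lr: core-aggregate link repetitions
    nc = math.ceil((ns * (k // 2)) / lr)               # Nc: core switches
    return {"Ns": ns, "Np": np_, "Ne": ne, "Na": na, "Nc": nc, "Lr": lr}

layout = fattree_layout(8, 48)
print(layout)
```

With k=8 and N=48 this yields Ns=4, Np=3, Ne=Na=12, Nc=8 and Lr=2, which matches the `repetitions="12"` and `switch_count="12"` entries of the example config.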

Config File Variables

| MODELNET_GRP | Description | Value | Note |
|---|---|---|---|
| repetitions | Number of sets of the given LPGROUP | `Ne` | |
| server | Number of server workload LPs per repetition | `m*( k/2 )` | m is the number of servers per terminal |
| modelnet_fattree | Number of terminal LPs per repetition | `k/2` | |
| fattree_switch | Number of switch LPs per repetition | `k/2` | |

| PARAMS | Description | Value | Note |
|---|---|---|---|
| ft_type | Fat-tree type | 0 or 1 | 0: pruned, 1: standard; type 0 currently only supports Nl=3 |
| num_levels | Number of levels of switches | 2 or 3 | num_levels = Nl |
| switch_count | Number of switches in each of the edge and aggregate levels | `Ne` | |
| switch_radix | Number of ports per switch | `k` | |
| routing | Intra-rail routing algorithm | {adaptive, static} | |
| terminal_radix | Number of rails connected to each terminal | User selected | only supported on fattree-multirail branch |
| rail_routing | Inter-rail injection policy | {adaptive, random} | only supported on fattree-multirail branch |
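Several of the relationships in these tables can be checked mechanically before starting a run. The sketch below is only an illustration: `check_config` and its dictionary arguments are hypothetical and not part of CODES, and it covers only the constraints on repetitions, server, modelnet_fattree, ft_type and num_levels.

```python
def check_config(grp, params, servers_per_terminal=1):
    """Return a list of violated constraints (empty if none were found).

    Hypothetical helper; `servers_per_terminal` is the m from the table.
    """
    k = params["switch_radix"]
    problems = []
    if grp["repetitions"] != params["switch_count"]:
        problems.append("repetitions and switch_count must both equal Ne")
    if grp["modelnet_fattree"] != k // 2:
        problems.append("modelnet_fattree must be k/2")
    if grp["server"] != servers_per_terminal * (k // 2):
        problems.append("server must be m*(k/2)")
    if params["ft_type"] not in (0, 1):
        problems.append("ft_type must be 0 (pruned) or 1 (standard)")
    if params["num_levels"] not in (2, 3):
        problems.append("num_levels must be 2 or 3")
    if params["ft_type"] == 0 and params["num_levels"] != 3:
        problems.append("ft_type 0 currently only supports num_levels=3")
    return problems

# Values taken from the example config file.
grp = {"repetitions": 12, "server": 4, "modelnet_fattree": 4, "fattree_switch": 3}
params = {"ft_type": 0, "num_levels": 3, "switch_count": 12, "switch_radix": 8}
print(check_config(grp, params))
```

The example config passes all of these checks, so the returned list is empty.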

Standard Full Fat-Tree (ft_type=1)

The fat-tree layout is typically composed of two or three switch levels, where all switches have as many uplinks as downlinks. Hence, for a given radix of k, each switch will have k/2 links to switches in upper levels and k/2 links to switches or compute nodes in lower levels. For cost-saving purposes, the core level uses half the number of switches with k downlinks each. Having an equal number of links in both directions allows each node to communicate via a unique path in the network. Theoretically, this results in full bisection bandwidth, making the fat-tree a popular choice for modern HPC and data center networks.

Figure: Standard Full Fat-Tree k8.pdf

Pruned Fat-Tree (ft_type=0)

The pruned ft_type starts with the standard full fat-tree and then removes pods and adjusts core-aggregate switch connections as needed to reduce the total node/terminal count in the system. This approach still maintains full bisection bandwidth. A full standard fat-tree uses k pods with k/2 switches per level in each pod (k/2 aggregate switches and k/2 edge switches), and each edge switch connects to k/2 terminals, so each pod connects to (k/2)*(k/2) terminals. Therefore, the number of pods needed to get N-many terminals using the pruned ft_type is Np = ceil(N/[(k/2)*(k/2)]). So the config file should have "repetitions" = "switch_count" = Ne = Np*(k/2). In the figure below, darker colored lines between the Core and Aggregate levels indicate a bundle of two links.

Figure: Pruned Fat-Tree k8.png
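As a rough illustration of the pruning arithmetic, the sketch below compares a pruned layout with the full fat-tree of the same radix. The function name `pruned_summary` is hypothetical, and k is assumed to be even.

```python
import math

def pruned_summary(k, n_terminals):
    """Pods kept, pods removed and the repetitions/switch_count value for a pruned fat-tree.

    Hypothetical helper; assumes k is even.
    """
    per_pod = (k // 2) * (k // 2)                 # terminals served by one pod
    pods = math.ceil(n_terminals / per_pod)       # Np
    return {
        "full_capacity": k * per_pod,             # terminals in the full k-pod tree
        "pods_kept": pods,
        "pods_removed": k - pods,
        "repetitions": pods * (k // 2),           # Ne, also used for switch_count
    }

summary = pruned_summary(8, 48)
print(summary)
```

For k=8 the full fat-tree serves 128 terminals in 8 pods; asking for N=48 keeps 3 pods, removes 5, and gives repetitions = switch_count = 12.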

Multiple Rails

The fattree-multirail branch supports multi-rail fat-tree networks. Multi-rail networks can be deployed to utilize multiple network interface cards (NICs) to gain access to additional network planes. We assume that each network plane has the same fat-tree topology construction/layout. All rails, and their corresponding planes, are independent of one another. Furthermore, an additional rail injection layer is needed to distribute the traffic from the terminals across the available rails. Currently, "adaptive" and "random" methods are available. Adaptive senses rail congestion on the terminals and routes along the rail with less congestion. In the case of ties, the first rail is selected. Random selects the rail from a uniform distribution.
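The two injection policies can be sketched as follows. This is not the CODES implementation: the function names are hypothetical, and congestion is represented here as one number per rail (for example, bytes queued at the terminal), which is an assumption.

```python
import random

def select_rail_adaptive(congestion):
    """Return the index of the least congested rail; ties go to the first rail."""
    best = 0
    for rail, load in enumerate(congestion):
        if load < congestion[best]:   # strict comparison keeps the earliest rail on ties
            best = rail
    return best

def select_rail_random(num_rails, rng=random):
    """Return a rail index drawn uniformly from 0..num_rails-1."""
    return rng.randrange(num_rails)

print(select_rail_adaptive([5, 2, 2]))   # rails 1 and 2 tie, so rail 1 is chosen
print(select_rail_random(2))
```

The strict less-than comparison is what implements the tie rule: a later rail replaces the current choice only if it is strictly less congested.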

Supported configuration parameters:

  • packet_size, chunk_size (ideally kept same)
  • modelnet_scheduler - NIC message scheduler
  • modelnet_order=( "fattree" );
  • router_delay : delay caused by switches in ns
  • num_levels : number of levels in the fattree (same as fattree_switch)
  • switch_count : number of leaf level switches (same as repetitions)
  • switch_radix : radix of the switches
  • vc_size : size of switch VCs in bytes
  • cn_vc_size : size of VC between NIC and switch in bytes
  • link_bandwidth, cn_bandwidth : in GB/s

Enabling static routing

If static routing is chosen, two more PARAMS must be provided:

  • routing_folder : folder that contains the LFT files generated using the method described below.
  • dot_file : name used for dotfile generation in the method described below.

(dump_topo should be set to 0 or left unset during simulations)

To generate static routing tables, first do an "empty" run to dump the topology of the fat-tree by setting the following PARAMS:

  • routing : static
  • routing_folder : folder to which topology files should be written
  • dot_file : prefix used for creating topology files inside the folder
  • dump_topo : 1
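The four entries for the topology-dump run can be written in the same `key="value";` form as the example config file. The sketch below only renders those lines for pasting into an existing PARAMS section; the helper `render_entries`, the folder `/tmp/fattree_topo` and the prefix `fattree` are placeholders chosen for illustration.

```python
def render_entries(params, indent="    "):
    """Format a dict as key="value"; lines, matching the example config style."""
    return "\n".join(f'{indent}{key}="{value}";' for key, value in params.items())

# Placeholder folder and prefix; substitute your own.
dump_run = {
    "routing": "static",
    "routing_folder": "/tmp/fattree_topo",
    "dot_file": "fattree",
    "dump_topo": "1",
}
text = render_entries(dump_run)
print(text)
```

For the later simulation runs, the same entries apply with dump_topo set to 0 or removed.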

When dump_topo is set, the simulator dumps the topology inside the folder specified by routing_folder and exits. Next, follow these steps created by Jens Domke to generate the routing tables stored as LFT files:

Prerequisites: some dependent packages must be installed first, and the exact set depends on your system. With apt, for example: sudo apt install pkg-config tcl-dev graphviz-dev tcl8.6-dev tk8.5-dev gawk. If step 1 below fails, check $HOME/simulation/ibutils/configure.log to see whether you need to install other packages.

(Replace $P_PATH in the commands below with the path to your patch files.)

  1. Install the fall-in-place toolchain (the patch files can be found in the src/util/patches folder of CODES):

```
wget http://htor.inf.ethz.ch/sec/fts.tgz
tar xzf fts.tgz
cd fault_tolerance_simulation/
rm 0001-*.patch 0002-*.patch 0003-*.patch 0004-*.patch 0005-*.patch
tar xzf $P_PATH/sar.patches.tgz
wget http://downloads.openfabrics.org/management/opensm-3.3.20.tar.gz
mv opensm-3.3.20.tar.gz opensm.tar.gz
# On Ubuntu, replace the next two commands with:
#   wget https://launchpad.net/ubuntu/+archive/primary/+files/ibutils_1.5.7.orig.tar.gz
#   mv ibutils_1.5.7.orig.tar.gz ibutils.tar.gz
wget http://downloads.openfabrics.org/ibutils/ibutils-1.5.7-0.2.gbd7e502.tar.gz
mv ibutils-1.5.7-0.2.gbd7e502.tar.gz ibutils.tar.gz
wget http://downloads.openfabrics.org/management/infiniband-diags-1.6.7.tar.gz
mv infiniband-diags-1.6.7.tar.gz infiniband-diags.tar.gz
wget https://www.openfabrics.org/downloads/management/libibmad-1.3.12.tar.gz
mv libibmad-1.3.12.tar.gz libibmad.tar.gz
wget https://www.openfabrics.org/downloads/management/libibumad-1.3.10.2.tar.gz
mv libibumad-1.3.10.2.tar.gz libibumad.tar.gz
patch -p1 < $P_PATH/fts.patch
# Edit simulate.py: search for "print" to find step 6, which contains two
# ./configure calls. In both, add --with-tk-lib --with-graphviz-lib just
# before the '--with-osm' string.
vim simulate.py
./simulate.py -s
```

  2. Add the LFT-creating scripts to the fall-in-place toolchain:

```
cd $HOME/simulation/scripts
patch -p1 < $P_PATH/lft.patch
chmod +x post_process_*
chmod +x create_static_lft.sh
```

  3. Choose the routing algorithm to be used by OpenSM (possible options: updn, dnup, ftree, lash, dor, torus-2QoS, dfsssp, sssp) and generate the LFT files:

```
export OSM_ROUTING="ftree"
~/simulation/scripts/create_static_lft.sh routing_folder dot_file
```

(Above, routing_folder and dot_file should be the same as those used during the run that dumped the topology.) The routing tables, stored as LFT files, should now be in routing_folder.
