This repo demonstrates MetalLB with FRR announcing service IPs over BGP to a single top-of-rack switch. Being able to simply run FRR + MetalLB on either OpenShift or vanilla Kubernetes and advertise Kubernetes services (VIPs, in network terms) into the fabric is a common ask these days.
For anyone unfamiliar with the concept, MetalLB is a load balancer and IP manager for VIP space. It is used in private data centers so that Kubernetes can request a public IP automatically: when a service of type LoadBalancer is created, MetalLB pulls an IPv4 or IPv6 address from a configured pool and assigns it to the service. This is done with a set of Kubernetes operators and webhooks. MetalLB can also operate in a Layer 2 mode, where a dedicated L2 segment is used for VIPs and the load balancer ARPs for a new address on that segment for each new service. The drawback of the Layer 2 style of load balancing is that a single node typically advertises each VIP, so there is no ECMP.
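For reference, the Layer 2 mode described above is configured with an L2Advertisement resource instead of the BGP resources used in the rest of this repo. A minimal sketch, not part of this lab, where the pool name is simply assumed to match the one created later:
# Hypothetical Layer 2 mode example -- this repo uses BGP mode instead
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-example
  namespace: metallb-system
spec:
  ipAddressPools:
  - first-pool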
Most network people prefer to eliminate Layer 2 wherever possible, and that is where MetalLB talking to FRR via the operator comes in. A Kubernetes sidecar runs alongside each MetalLB speaker. Every time a Kubernetes service is added or removed, the operator reloads the FRR configuration and FRR advertises a host route for the VIP into BGP: a new service with the VIP 1.2.3.4 results in 1.2.3.4/32 being advertised. It does not have to be a host route; if possible, this can be a /24 covering the entire services network. The MetalLB documentation has a lot of good examples of advanced configuration.
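As a sketch of that aggregated approach, a BGPAdvertisement can carry an aggregationLength so the fabric only sees the /24 instead of per-VIP /32s. This is illustrative only and reuses the first-pool created later in this repo:
# Hypothetical example: advertise the whole pool as a /24 instead of per-VIP /32s
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: aggregate-example
  namespace: metallb-system
spec:
  ipAddressPools:
  - first-pool
  aggregationLength: 24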
All of this is possible because containerlab can use KIND clusters as node types, so Kubernetes nodes can be wired directly to KIND docker-in-docker containers. Networking is not entirely fun in this environment because of the layers of Kubernetes NAT and docker-in-docker, but it is enough to demo FRR + MetalLB with Kubernetes as it would run in a private data center.
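The real topology lives in topology.yaml in this repo. The general shape, with a KIND cluster node wired to an Arista cEOS border node, looks roughly like the sketch below; the node names, image tag, and interface names here are illustrative assumptions and not copied from the repo:
# Illustrative containerlab sketch only -- see topology.yaml for the real definition
name: cl-kind
topology:
  nodes:
    k01:
      kind: k8s-kind            # containerlab's KIND (docker-in-docker Kubernetes) node kind
    boarder1:
      kind: ceos                # Arista cEOS acting as the top-of-rack / border switch
      image: ceos:latest        # assumed image tag
  links:
    # wire the KIND worker container to the switch
    - endpoints: ["k01-worker:eth1", "boarder1:eth1"]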
sudo containerlab -t topology.yaml deploy
kind export kubeconfig --name=k01
kubectl label nodes k01-worker ingress=boarder1
The reason for this label is that we want k01-worker, and not a random Kubernetes node, to be where our FRR router lives. With a nodeSelector we can control both where the router runs and where the BGP peering is established.
daemonset.yaml
nodeSelector:
  kubernetes.io/os: linux
  ingress: boarder1
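A quick sanity check that the label landed where we expect (an optional command, not one of the repo's steps):
kubectl get nodes -l ingress=boarder1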
kubectl apply -f manifest/calico.yaml
- name: CALICO_IPV4POOL_VXLAN
  value: "Always"
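After applying the CNI it is worth waiting for the nodes to go Ready before moving on (optional checks, not part of the repo's steps):
kubectl wait --for=condition=Ready nodes --all --timeout=300s
kubectl get pods -n kube-system -o wide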
ansible-playbook playbooks/fabric-deploy-config.yaml -e avd_ignore_requirements=True
kubectl apply -f manifest/metallb-frr.yaml
kubectl apply -f manifest/pools.yaml
manifest/pools.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - 10.100.1.0/24
This means every Kubernetes service of type LoadBalancer, i.e. every load balancer VIP, will get its IP from the 10.100.1.0/24 pool.
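One thing to keep in mind is that a plain /24 pool includes the .0 and .255 addresses, and as the output further down shows, the first service actually gets 10.100.1.0. If that is a problem on your network, MetalLB pools also accept explicit ranges or an avoidBuggyIPs flag; a hedged variant of the same pool:
# Hypothetical alternative pool that skips .0 and .255
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - 10.100.1.0/24
  avoidBuggyIPs: true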
kubectl apply -f manifest/bgp.yaml
manifest/bgp.yaml
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: example
  namespace: metallb-system
spec:
  ipAddressPools:
  - first-pool
  peers:
  - example
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: example
  namespace: metallb-system
spec:
  myASN: 64513
  peerASN: 65103
  peerAddress: 10.1.11.1
  peerPort: 179
  nodeSelectors:
  - matchLabels:
      ingress: boarder1
The first YAML block says to advertise everything allocated from first-pool, i.e. every service VIP out of 10.100.1.0/24. The big takeaway is the nodeSelectors matchLabels of ingress: boarder1 on the BGPPeer: the peering configuration is only placed on nodes carrying that label. There is a lot more flexibility available here, but this is a deliberately static example; a second-peer sketch follows below.
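For example, adding redundancy toward a second border switch would just mean a second BGPPeer. A minimal sketch, where the name and the 10.1.12.1 peer address are assumptions and not taken from this repo:
# Hypothetical second peer for a second top-of-rack switch
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: example-boarder2
  namespace: metallb-system
spec:
  myASN: 64513
  peerASN: 65103
  peerAddress: 10.1.12.1
  peerPort: 179
  nodeSelectors:
  - matchLabels:
      ingress: boarder1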
kubectl apply -f manifest/daemonset.yaml
As mentioned previously, the daemonset constrains the single FRR router to nodes matching the nodeSelector ingress: boarder1, i.e. only k01-worker.
daemonset.yaml
nodeSelector:
  kubernetes.io/os: linux
  ingress: boarder1
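To confirm the speaker/FRR pod actually landed on k01-worker (optional check, not one of the repo's steps):
kubectl get pods -n metallb-system -o wide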
kubectl apply -f manifest/servicetest.yaml
➜ cl-kind kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 19m
nginx-service LoadBalancer 10.96.192.245 10.100.1.0 80:32460/TCP 15m
We can see our nginx-service got the IP 10.100.1.0, which will be advertised via BGP to dc1-boarder1.
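manifest/servicetest.yaml is not reproduced here, but a manifest like it typically boils down to an nginx Deployment plus a Service of type LoadBalancer; the sketch below is an assumption of its shape, not the repo's actual file:
# Hypothetical sketch of a test service -- see manifest/servicetest.yaml for the real one
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: LoadBalancer    # MetalLB assigns the external IP from first-pool
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
Next we can check the border switch: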
docker exec -it clab-cl-kind-boarder1 Cli
DC1_BOARDER1#show ip bgp summary vrf Tenant_A_OP_Zon
VRF Tenant_A_OP_Zon does not exist
DC1_BOARDER1#show ip bgp summary vrf Tenant_A_OP_Zone
BGP summary information for VRF Tenant_A_OP_Zone
Router identifier 192.168.255.7, local AS number 65103
Neighbor Status Codes: m - Under maintenance
Description Neighbor V AS MsgRcvd MsgSent InQ OutQ Up/Down State PfxRcd PfxAcc
frr 10.1.11.5 4 64513 7 14 0 0 00:00:04 Estab 1 1
DC1_BOARDER2_Vlan3009 10.255.251.9 4 65103 20 19 0 0 00:04:18 Estab 7 7
DC1_BOARDER1#show ip route vrf Tenant_A_OP_Zone bgp
VRF: Tenant_A_OP_Zone
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B - Other BGP Routes,
B I - iBGP, B E - eBGP, R - RIP, I L1 - IS-IS level 1,
I L2 - IS-IS level 2, O3 - OSPFv3, A B - BGP Aggregate,
A O - OSPF Summary, NG - Nexthop Group Static Route,
V - VXLAN Control Service, M - Martian,
DH - DHCP client installed default route,
DP - Dynamic Policy Route, L - VRF Leaked,
G - gRIBI, RC - Route Cache Route,
CL - CBF Leaked Route
B E 10.100.1.0/32 [20/0] via 10.1.11.5, Vlan111
kubectl exec -it speaker-zvx7h -n metallb-system -- sh
Defaulted container "frr" out of: frr, reloader, frr-metrics, speaker, cp-frr-files (init), cp-reloader (init), cp-metrics (init)
/ #
vtysh
show ip bgp summary
k01-worker# show ip bgp summary
IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.32.4, local AS number 64513 vrf-id 0
BGP table version 1
RIB entries 1, using 96 bytes of memory
Peers 1, using 13 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
10.1.11.1 4 65103 41 23 1 0 0 00:16:05 0 1 N/A
Total number of neighbors 1
k01-worker# show running-config
Building configuration...
Current configuration:
!
frr version 9.1_git
frr defaults traditional
hostname k01-worker
log file /etc/frr/frr.log informational
log timestamp precision 3
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 64513
no bgp ebgp-requires-policy
no bgp hard-administrative-reset
no bgp default ipv4-unicast
no bgp graceful-restart notification
bgp graceful-restart preserve-fw-state
no bgp network import-check
neighbor 10.1.11.1 remote-as 65103
!
address-family ipv4 unicast
network 10.100.1.0/32
neighbor 10.1.11.1 activate
neighbor 10.1.11.1 route-map 10.1.11.1-in in
neighbor 10.1.11.1 route-map 10.1.11.1-out out
exit-address-family
!
address-family ipv6 unicast
neighbor 10.1.11.1 activate
neighbor 10.1.11.1 route-map 10.1.11.1-in in
neighbor 10.1.11.1 route-map 10.1.11.1-out out
exit-address-family
exit
!
ip prefix-list 10.1.11.1-pl-ipv4 seq 1 permit 10.100.1.0/32
!
ipv6 prefix-list 10.1.11.1-pl-ipv4 seq 2 deny any
!
route-map 10.1.11.1-in deny 20
exit
!
route-map 10.1.11.1-out permit 1
match ip address prefix-list 10.1.11.1-pl-ipv4
exit
!
route-map 10.1.11.1-out permit 2
match ipv6 address prefix-list 10.1.11.1-pl-ipv4
exit
!
end
k01-worker#
We can see that the network 10.100.1.0/32 is advertised to the top-of-rack switch. If another service were created, it would get 10.100.1.1/32 and be advertised the same way.
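To see that in action, one could expose a second LoadBalancer service and watch another /32 show up (hypothetical commands that assume the test deployment is named nginx):
kubectl expose deployment nginx --name=nginx-service2 --port=80 --target-port=80 --type=LoadBalancer
kubectl get service nginx-service2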