Can not start forwarder #1170

Open
lapnd opened this issue Sep 18, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@lapnd

lapnd commented Sep 18, 2024

Hi,
I'm using k3s on Ubuntu 22.04 and trying to install NSM.
Hugepages were set up as follows:

cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-6.8.0-40-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro console=tty0 console=ttyS0,115200n8 default_hugepagesz=2M hugepagesz=2M hugepages=1024


cat /proc/meminfo | grep HugePages
AnonHugePages:   4329472 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:    1024
HugePages_Free:     1024
HugePages_Rsvd:        0
HugePages_Surp:        0
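For anyone reproducing this, the hugepage check above can be scripted. The following is a minimal Go sketch (the function name `parseHugePages` is hypothetical, not part of NSM or VPP) that reads `/proc/meminfo` and extracts the `HugePages_*` counters. Note that these are host-wide totals. On a multi-socket host (the VPP log below mentions both numa[0] and numa[1]), the same counters are also exposed per node under `/sys/devices/system/node/nodeN/hugepages/hugepages-2048kB/`.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseHugePages extracts the HugePages_* counters from /proc/meminfo-style text.
// Lines look like "HugePages_Free:     1024"; these counters carry no unit suffix.
// Other lines (e.g. "AnonHugePages:", "Hugepagesize:") are skipped.
func parseHugePages(text string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || !strings.HasPrefix(fields[0], "HugePages_") {
			continue
		}
		key := strings.TrimSuffix(fields[0], ":")
		n, err := strconv.Atoi(fields[1])
		if err != nil {
			continue
		}
		counts[key] = n
	}
	return counts
}

func main() {
	data, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read /proc/meminfo:", err)
		return
	}
	hp := parseHugePages(string(data))
	fmt.Printf("total=%d free=%d reserved=%d surplus=%d\n",
		hp["HugePages_Total"], hp["HugePages_Free"], hp["HugePages_Rsvd"], hp["HugePages_Surp"])
}
```

Running it inside the forwarder container as well as on the host shows whether the pod sees the same hugepage pool as the host.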

However, the forwarder cannot start:

Sep 18 09:45:46.616 [INFO] [cmd:/bin/forwarder] Config: &config.Config{Name:"forwarder-vpp-5xlwh", Labels:map[string]string{"p2p":"true"}, NSName:"forwarder", ConnectTo:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/var/lib/networkservicemesh/nsm.io.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}, ListenOn:url.URL{Scheme:"unix", Opaque:"", User:(*url.Userinfo)(nil), Host:"", Path:"/listen.on.sock", RawPath:"", OmitHost:false, ForceQuery:false, RawQuery:"", Fragment:"", RawFragment:""}, MaxTokenLifetime:600000000000, RegistryClientPolicies:[]string{"etc/nsm/opa/common/.*.rego", "etc/nsm/opa/registry/.*.rego", "etc/nsm/opa/client/.*.rego"}, LogLevel:"INFO", DialTimeout:750000000, OpenTelemetryEndpoint:"otel-collector.observability.svc.cluster.local:4317", MetricsExportInterval:10000000000, PprofEnabled:false, PprofListenOn:"localhost:6060", PrometheusListenOn:":8081", PrometheusServerHeaderTimeout:5000000000, TunnelIP:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0x4, 0x0, 0xd}, VxlanPort:0x0, VppAPISocket:"/var/run/vpp/external/vpp-api.sock", VppInit:vppinit.Func{f:(func(context.Context, api.Connection, net.IP) (net.IP, error))(0xf47e60)}, VppInitParams:"", ResourcePollTimeout:30000000000, DevicePluginPath:"/var/lib/kubelet/device-plugins/", PodResourcesPath:"/var/lib/kubelet/pod-resources/", DeviceSelectorFile:"", SRIOVConfigFile:"", PCIDevicesPath:"/sys/bus/pci/devices", PCIDriversPath:"/sys/bus/pci/drivers", CgroupPath:"/host/sys/fs/cgroup/devices", VFIOPath:"/host/dev/vfio", MechanismPriority:[]string(nil)}
Sep 18 09:45:46.616 [INFO] [cmd:/bin/forwarder] [duration:4.971254ms] completed phase 1: get config from environment
Sep 18 09:45:46.616 [INFO] [cmd:/bin/forwarder] executing phase 2: run vpp and get a connection to it (time since start: 5.179467ms)
Sep 18 09:45:46.650 [INFO] Configuration file: "/etc/vpp/helper/vpp.conf" not found, using defaults
Sep 18 09:45:46.652 [INFO] [cmd:/bin/forwarder] local vpp is being used
Sep 18 09:45:46.652 [INFO] [cmd:/bin/forwarder] [duration:35.846189ms] completed phase 2: run vpp and get a connection to it
Sep 18 09:45:46.652 [WARN] [cmd:/bin/forwarder] skipping phases 3-5: no PCI resources config
Sep 18 09:45:46.652 [WARN] [cmd:/bin/forwarder] SR-IOV is not enabled
Sep 18 09:45:46.652 [INFO] [cmd:/bin/forwarder] executing phase 6: retrieving svid, check spire agent logs if this is the last line you see (time since start: 41.104723ms)
Sep 18 09:45:46.991 [INFO] [cmd:vpp] vpp[55423]: buffer: numa[0] falling back to non-hugepage backed buffer pool (vlib_physmem_shared_map_create: pmalloc_map_pages: failed to mmap 64 pages at 0x1000000000 fd 5 numa 0 flags 0x11: Cannot allocate memory)
Sep 18 09:45:47.675 [INFO] [cmd:vpp] vpp[55423]: buffer: numa[1] falling back to non-hugepage backed buffer pool (vlib_physmem_shared_map_create: pmalloc_map_pages: failed to mmap 64 pages at 0x1008000000 fd 6 numa 1 flags 0x11: Cannot allocate memory)
Sep 18 09:45:47.682 [INFO] SVID: "spiffe://k8s.nsm/ns/nsm-system/pod/forwarder-vpp-5xlwh"
Sep 18 09:45:47.682 [INFO] [cmd:/bin/forwarder] [duration:1.029478195s] completed phase 6: retrieving svid
Sep 18 09:45:47.682 [INFO] [cmd:/bin/forwarder] executing phase 7: create xconnect network service endpoint (time since start: 1.070753722s)
Sep 18 09:45:47.753 [INFO] [ReadConfig:] [cmd:/bin/forwarder] Using default VPP init parameters &{AF_PACKET:{&{Mode:AF_PACKET_API_MODE_ETHERNET RxFrameSize:10240 TxFrameSize:10240 RxFramesPerBlock:1024 TxFramesPerBlock:1024 NumRxQueues:1 NumTxQueues:0 Flags:AF_PACKET_API_FLAG_VERSION_2}},AF_XDP:{&{Mode:AF_XDP_API_MODE_AUTO RxqSize:8192 TxqSize:8192 Flags:AfXdpFlag(0)}},}
Sep 18 09:45:48.395 [INFO] [cmd:vpp] vpp[55423]: vlib_sort_init_exit_functions:201: init function 'pci_bus_init' not found (before 'idpf_init')
Sep 18 09:45:48.681 [INFO] [cmd:vpp] vpp[55423]: vnet_feature_arc_init:272: feature node 'ip4-sv-reassembly-output-feature' not found (before 'npt66-output', arc 'ip6-output')
Sep 18 09:45:48.681 [INFO] [cmd:vpp] vpp[55423]: vnet_feature_arc_init:272: feature node 'ip4-sv-reassembly-feature' not found (before 'npt66-input', arc 'ip6-unicast')
Sep 18 09:45:48.850 [INFO] [cmd:vpp] vpp[55423]: af_packet: Failed to bind rx packet socket: No such device (errno 19)
Sep 18 09:45:48.859 [INFO] [cmd:vpp] vpp[55423]: af_packet: Failed to set queue 0 error
Sep 18 09:45:48.859 [INFO] [cmd:vpp] vpp[55423]: af_packet: Failed to init device error
panic: error: VPPApiError: System call error #1 (-11)

goroutine 1 [running]:
github.com/networkservicemesh/cmd-forwarder-vpp/internal/vppinit.Must(...)
	/build/internal/vppinit/vppinit.go:112
main.main()
	/build/main.go:262 +0x364c

Could you provide any insight into what might cause this failure? Any suggestions for troubleshooting or resolving the problem would be greatly appreciated.

denis-tingaikin added the bug (Something isn't working) label on Sep 18, 2024
@lapnd
Author

lapnd commented Sep 19, 2024

I have additional information. When I try NSM on an Ubuntu 22.04 VM (using the Ubuntu cloud image), it works fine. The issue occurs on Kubernetes running on a bare-metal host. I haven't figured out the possible cause yet.
