Single node allocation fails to use CXI directly #2

Open
amirshehataornl opened this issue Jul 5, 2023 · 1 comment
Labels
Frontier Frontier@ORNL

Comments

@amirshehataornl
Collaborator

System Information
Frontier

OMPI-X Release Version
ompix-a1

Short description of error
CXI not supported for single node allocation

Exact steps for others to reproduce the error
Run command

mpirun --wdir /tmp -x FI_CXI_RX_MATCH_MODE -x FI_XPMEM_MEMCPY_CHUNKSIZE -x FI_LOG_LEVEL -x FI_USE_XPMEM -x FI_OFI_LINKX_SRQ_SUPPORT -x LD_LIBRARY_PATH --report-bindings --mca btl ^tcp,openib --mca pml ^ucx --mca mtl ofi --mca opal_common_ofi_provider_include "cxi" --prtemca plm slurm --prtemca ras slurm --bind-to core --map-by ppr:1:l3 --report-bindings  --mca mtl_base_verbose 10  --mca btl_base_verbose 10 -np 2 /sw/crusher/ums/ompix/DEVELOP/cce/13.0.0/install/osu-micro-benchmarks-7.0//build-ompi/_install/libexec/osu-micro-benchmarks/mpi/collective/osu_alltoall

Other Details
On a single-node allocation, the VNI environment variables are not set, which causes CXI provider initialization to fail.
If the CXI provider is explicitly set in the "OMPI_MCA_opal_common_ofi_provider_include" MCA parameter, OFI/MTL is not used for PT2PT and/or collective tests; BTL is used instead.

--------------------------------------------------------------------------
Open MPI failed an OFI Libfabric library call (fi_domain).  This is highly
unusual; your job may behave unpredictably (and/or abort) after this.

  Local host: crusher037
  Location: ../../../../../../../../source/openmpi-main-June_23/ompi/mca/mtl/ofi/mtl_ofi_component.c:982
  Error: Function not implemented (38)
--------------------------------------------------------------------------
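
One way to check the missing-VNI diagnosis before launching mpirun is to look for the VNI variable inside the allocation. This is a minimal sketch, not part of the original report: it assumes the variable is named SLINGSHOT_VNIS (the name I believe Slurm's HPE Slingshot switch plugin exports; the report itself does not name the variables), and vni_status is a hypothetical helper.

```shell
# Hypothetical helper: report whether a VNI has been exported into this shell.
# ASSUMPTION: the variable name SLINGSHOT_VNIS is not taken from this report.
vni_status() {
    if [ -n "${SLINGSHOT_VNIS:-}" ]; then
        echo "VNI set: ${SLINGSHOT_VNIS}"
    else
        echo "VNI not set"
    fi
}

# Run inside the salloc/srun allocation, before mpirun.
vni_status
```

If this prints "VNI not set" on a one-node allocation, that would be consistent with the fi_domain failure shown above.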
@amirshehataornl amirshehataornl added the Frontier Frontier@ORNL label Jul 5, 2023
@efposadac

Add the following parameter to the salloc/srun command: --network=single_node_vni. For instance:

salloc -A xxx -N 1 -n 8 -c 7 --gpus-per-node=8 --gpu-bind=closest -t 00:30:00 --network=single_node_vni

And then:

mpirun --wdir /tmp -x FI_CXI_RX_MATCH_MODE -x FI_XPMEM_MEMCPY_CHUNKSIZE -x FI_LOG_LEVEL -x FI_USE_XPMEM -x FI_OFI_LINKX_SRQ_SUPPORT -x LD_LIBRARY_PATH --report-bindings --mca btl ^tcp,openib --mca pml ^ucx --mca mtl ofi --mca opal_common_ofi_provider_include "cxi" --prtemca plm slurm --prtemca ras slurm --bind-to core --map-by ppr:1:l3 --report-bindings  --mca mtl_base_verbose 10  --mca btl_base_verbose 10 -np 2 /sw/crusher/ums/ompix/DEVELOP/cce/13.0.0/install/osu-micro-benchmarks-7.0//build-ompi/_install/libexec/osu-micro-benchmarks/mpi/collective/osu_alltoall
[frontier06736:03092] Rank 0 bound to package[0][core:1]
[frontier06736:03092] Rank 1 bound to package[0][core:9]
[frontier06736:03112] mca: base: components_register: registering framework btl components
[frontier06736:03112] mca: base: components_register: found loaded component self
[frontier06736:03112] mca: base: components_register: component self register function successful
[frontier06736:03112] mca: base: components_register: found loaded component ofi
[frontier06736:03112] mca: base: components_register: component ofi register function successful
[frontier06736:03112] mca: base: components_register: found loaded component sm
[frontier06736:03112] mca: base: components_register: component sm register function successful
[frontier06736:03112] mca: base: components_open: opening btl components
[frontier06736:03112] mca: base: components_open: found loaded component self
[frontier06736:03112] mca: base: components_open: component self open function successful
[frontier06736:03112] mca: base: components_open: found loaded component ofi
[frontier06736:03112] mca: base: components_open: component ofi open function successful
[frontier06736:03112] mca: base: components_open: found loaded component sm
[frontier06736:03112] mca: base: components_open: component sm open function successful
[frontier06736:03112] mca: base: components_register: registering framework mtl components
[frontier06736:03112] mca: base: components_register: found loaded component ofi
[frontier06736:03112] mca: base: components_register: component ofi register function successful
[frontier06736:03112] mca: base: components_open: opening mtl components
[frontier06736:03112] mca: base: components_open: found loaded component ofi
[frontier06736:03112] mca: base: components_open: component ofi open function successful
[frontier06736:03111] mca: base: components_register: registering framework btl components
[frontier06736:03111] mca: base: components_register: found loaded component self
[frontier06736:03111] mca: base: components_register: component self register function successful
[frontier06736:03111] mca: base: components_register: found loaded component ofi
[frontier06736:03111] mca: base: components_register: component ofi register function successful
[frontier06736:03111] mca: base: components_register: found loaded component sm
[frontier06736:03111] mca: base: components_register: component sm register function successful
[frontier06736:03111] mca: base: components_open: opening btl components
[frontier06736:03111] mca: base: components_open: found loaded component self
[frontier06736:03111] mca: base: components_open: component self open function successful
[frontier06736:03111] mca: base: components_open: found loaded component ofi
[frontier06736:03111] mca: base: components_open: component ofi open function successful
[frontier06736:03111] mca: base: components_open: found loaded component sm
[frontier06736:03111] mca: base: components_open: component sm open function successful
[frontier06736:03111] mca: base: components_register: registering framework mtl components
[frontier06736:03111] mca: base: components_register: found loaded component ofi
[frontier06736:03111] mca: base: components_register: component ofi register function successful
[frontier06736:03111] mca: base: components_open: opening mtl components
[frontier06736:03111] mca: base: components_open: found loaded component ofi
[frontier06736:03111] mca: base: components_open: component ofi open function successful
[frontier06736:03112] mca:base:select: Auto-selecting mtl components
[frontier06736:03112] mca:base:select:(  mtl) Querying component [ofi]
[frontier06736:03112] mca:base:select:(  mtl) Query of component [ofi] set priority to 25
[frontier06736:03112] mca:base:select:(  mtl) Selected component [ofi]
[frontier06736:03112] select: initializing mtl component ofi
[frontier06736:03111] mca:base:select: Auto-selecting mtl components
[frontier06736:03111] mca:base:select:(  mtl) Querying component [ofi]
[frontier06736:03111] mca:base:select:(  mtl) Query of component [ofi] set priority to 25
[frontier06736:03111] mca:base:select:(  mtl) Selected component [ofi]
[frontier06736:03111] select: initializing mtl component ofi
[frontier06736:03111] select: init returned success
[frontier06736:03111] select: component ofi selected
[frontier06736:03111] select: initializing btl component self
[frontier06736:03111] select: init of component self returned success
[frontier06736:03111] select: initializing btl component ofi
[frontier06736:03112] select: init returned success
[frontier06736:03112] select: component ofi selected
[frontier06736:03112] select: initializing btl component self
[frontier06736:03112] select: init of component self returned success
[frontier06736:03112] select: initializing btl component ofi
[frontier06736:03111] select: init of component ofi returned success
[frontier06736:03111] select: initializing btl component sm
[frontier06736:03111] select: init of component sm returned success
[frontier06736:03112] select: init of component ofi returned success
[frontier06736:03112] select: initializing btl component sm
[frontier06736:03112] select: init of component sm returned success

# OSU MPI All-to-All Personalized Exchange Latency Test v7.0
# Size       Avg Latency(us)
1                       3.19
2                       3.17
4                       3.18
8                       3.18
16                      3.19
32                      3.19
64                      3.29
128                     3.41
256                     4.44
512                     4.49
1024                    4.58
2048                    4.76
4096                    4.95
8192                    7.38
16384                   8.18
32768                  10.04
65536                  14.35
131072                 22.86
262144                 41.77
524288                 82.06
1048576               159.40
[frontier06736:03112] mca: base: close: component ofi closed
[frontier06736:03112] mca: base: close: unloading component ofi
[frontier06736:03112] mca: base: close: component self closed
[frontier06736:03112] mca: base: close: unloading component self
[frontier06736:03111] mca: base: close: component ofi closed
[frontier06736:03111] mca: base: close: unloading component ofi
[frontier06736:03111] mca: base: close: component self closed
[frontier06736:03111] mca: base: close: unloading component self
[frontier06736:03112] mca: base: close: component ofi closed
[frontier06736:03112] mca: base: close: unloading component ofi
[frontier06736:03112] mca: base: close: component sm closed
[frontier06736:03112] mca: base: close: unloading component sm
[frontier06736:03111] mca: base: close: component ofi closed
[frontier06736:03111] mca: base: close: unloading component ofi
[frontier06736:03111] mca: base: close: component sm closed
[frontier06736:03111] mca: base: close: unloading component sm
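
For job scripts that are reused with different node counts, the workaround above can be applied conditionally. This is a rough sketch under the assumption that --network=single_node_vni is only wanted when exactly one node is requested; network_flag is a hypothetical helper, and "xxx" is the same placeholder account used in the salloc example.

```shell
# Hypothetical helper: emit the extra salloc/srun option only for one-node jobs.
# ASSUMPTION: the option is not needed for multi-node allocations.
network_flag() {
    nodes="$1"
    if [ "$nodes" -eq 1 ]; then
        echo "--network=single_node_vni"
    fi
}

# Print the command line that would be used; drop "echo" to run it for real.
nodes=1
echo salloc -A xxx -N "$nodes" $(network_flag "$nodes")
```

With nodes=1 this prints the salloc line including the single-node VNI option; with any other count the option is left out.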
