Defect: UCX warnings in CentOS #780

Open
1 task done
SineBell opened this issue Aug 3, 2023 · 2 comments

Comments

SineBell commented Aug 3, 2023

  • I am reporting a bug others will be able to reproduce and not asking a question or requesting a new feature.

System information including:

  • OpenCoarrays Version: 2.8.0

  • Fortran Compiler: gfortran 8.3.1

  • C compiler used for building lib: gcc 8.3.1

  • Installation method: cmake from source using a git clone. Passed all tests from make test

  • All flags & options passed to the installer
    gfortran and gcc were specified with FC=/path/to/gfortran8 CC=/path/to/gcc8

  • Output of uname -a: Linux debye4 4.18.0-193.14.2.el8_2.x86_64 #1 SMP Sun Jul 26 03:54:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • MPI library being used: OpenMPI 4.1.1

  • Machine architecture and number of physical cores: x86_64, 64 cores

  • Version of CMake: cmake 3.20.2

To help us debug your issue please explain:

What you were trying to do (and why)

Running any Fortran code with more than one image.

What happened (include command output, screenshots, logs, etc.)

At the end of execution, numerous UCX warnings are printed to the screen, for example:

[1691070750.032589] [debye4:964906:0]      tag_match.c:61   UCX  WARN  unexpected tag-receive descriptor 0x7f5c5710dfc0 was not matched
[1691070750.032604] [debye4:964906:0]      tag_match.c:61   UCX  WARN  unexpected tag-receive descriptor 0x7f5c570fdf40 was not matched
[1691070750.032616] [debye4:964906:0]      tag_match.c:61   UCX  WARN  unexpected tag-receive descriptor 0x7f5c570edec0 was not matched
[1691070750.032650] [debye4:964905:0]      tag_match.c:61   UCX  WARN  unexpected tag-receive descriptor 0x7f5a84b00f40 was not matched

What you expected to happen

The execution appears to end successfully. The large number of warnings, however, clutters the screen and makes the program's actual output difficult to read.
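As a stopgap for the readability problem described above, captured output can be post-processed to drop the warning lines. The following is a minimal sketch, not part of OpenCoarrays or UCX: the function name `strip_ucx_warnings` is hypothetical, and the regular expression is an assumption based only on the log format shown in this report.

```python
import re

# Assumed line format, taken from the log excerpts in this issue:
# [1691070750.032589] [debye4:964906:0]      tag_match.c:61   UCX  WARN  ...
UCX_WARN = re.compile(r"^\[\d+\.\d+\]\s+\[[^\]]+\]\s+\S+:\d+\s+UCX\s+WARN\b")


def strip_ucx_warnings(text):
    """Return text with UCX WARN lines removed; all other lines are kept."""
    kept = [line for line in text.splitlines() if not UCX_WARN.match(line)]
    return "\n".join(kept)
```

This only hides the messages after the fact; it does not address whatever causes UCX to emit them.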

Step-by-step reproduction instructions to reproduce the error/bug

Any code I tested with cafrun and -n greater than 1 reproduces it.

For example, this simple program:

program bugcheck
    write(*,*) "hello by ", this_image()
end program

Compiled with caf -o bugcheck bugcheck.f90
and run with cafrun -n 2 bugcheck, it outputs:

 hello by            1
 hello by            2
[1691071152.339731] [debye4:965700:0]      tag_match.c:61   UCX  WARN  unexpected tag-receive descriptor 0x7fb352acbfc0 was not matched
[1691071152.339731] [debye4:965701:0]      tag_match.c:61   UCX  WARN  unexpected tag-receive descriptor 0x7fb0ef50efc0 was not matched

jthies commented Dec 8, 2023

We see the same issue with OpenCoarrays checked out today and compiled with GCC 11.3.0 or GCC 8.5.0 and OpenMPI 4.1.4.
When run with p images, p*(p-1) such warning messages are printed. AFAIK they are triggered by MPI_Finalize when MPI_Send calls were not matched by an MPI_Recv.
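The p*(p-1) observation can be checked against a captured log. The sketch below is an illustration only: the helper names are hypothetical, and the assumption that the second field of `[debye4:965700:0]` is the process ID is inferred from the log excerpts in this thread rather than from UCX documentation.

```python
import re
from collections import Counter

# Captures the second field of "[host:NNN:N]" on a UCX WARN line.
# Treating that field as the process ID is an assumption.
UCX_WARN_PID = re.compile(r"\[[^\]:]+:(\d+):\d+\]\s+\S+\s+UCX\s+WARN\b")


def expected_warnings(p):
    """Total warnings reported in this thread for a run with p images."""
    return p * (p - 1)


def warnings_per_process(log_text):
    """Count UCX WARN lines per process field found in log_text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = UCX_WARN_PID.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For the two-image run in the original report, `sum(warnings_per_process(log).values())` should equal `expected_warnings(2)`, which is consistent with the two warning lines shown there.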

cprich01 commented Aug 2, 2024

I get the same result, and I agree that, aside from the repeated osc_ucx_component.c:369 Error: messages, the output is as expected. I think there may be a setting that can be passed to the underlying mpifort to do a loopback to itself or another processor. This is difficult to solve when there are so many wrappers inside of wrappers.
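One setting that may be worth trying is UCX's own log level, which is read from the `UCX_LOG_LEVEL` environment variable. The sketch below raises it to `error` for a child process; the helper names are hypothetical, and it is an untested assumption that this silences the warnings in this particular setup or that the variable reaches every image through the cafrun and mpiexec wrappers. Messages logged at error level, such as the osc_ucx_component.c ones, would not be expected to disappear.

```python
import os
import subprocess


def quiet_ucx_env(base_env=None):
    """Return a copy of the environment with UCX logging limited to errors."""
    env = dict(os.environ if base_env is None else base_env)
    env["UCX_LOG_LEVEL"] = "error"  # UCX levels include error, warn, info, debug
    return env


def run_quietly(cmd):
    """Run cmd (a list of arguments) with the adjusted environment."""
    return subprocess.run(cmd, env=quiet_ucx_env(), check=False)


# Example call, not executed here:
# run_quietly(["cafrun", "-n", "2", "./bugcheck"])
```

Like filtering the output, this would only hide the symptom; the unmatched messages at finalization would still be there.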

3 participants