uct_md.c:203 UCX ERROR #338

Answered by mesonepigreco
biglinn asked this question in Q&A
Jun 6, 2024 · 4 comments · 29 replies
Are you submitting the job in parallel?

The error here comes from matplotlib, as the last lines of your traceback show:

TimeoutError: Lock error: Matplotlib failed to acquire the following lock file:
/public/home/biglinn/.cache/matplotlib/fontlist-v390.json.matplotlib-lock
This maybe due to another process holding this lock file. If you are sure no
other Matplotlib process is running, remove this file and try again

As the message says, you can remove the file manually. If you run the job in parallel, several processes may have tried to create that lock file at the same time, which leads to this error. This should not normally happen, and it most likely points to a problem in how MPI is configured.

Try removing that file…
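If you prefer to script the cleanup, here is a minimal sketch. It assumes the cache lives in `~/.cache/matplotlib`, as in the path from the error above, and that leftover locks end in `.matplotlib-lock`. The function name `remove_matplotlib_locks` is only illustrative and is not part of matplotlib. Run it only when you are sure no other Matplotlib process is still running.

```python
from pathlib import Path


def remove_matplotlib_locks(cache_dir=None):
    """Delete leftover '*.matplotlib-lock' files and return the removed paths."""
    # Assumption: default cache location, matching the path in the error message.
    if cache_dir is None:
        cache_dir = Path.home() / ".cache" / "matplotlib"
    cache_dir = Path(cache_dir)

    removed = []
    # A missing directory simply yields no matches.
    for lock in sorted(cache_dir.glob("*.matplotlib-lock")):
        # missing_ok avoids a crash if another process deleted it first.
        lock.unlink(missing_ok=True)
        removed.append(lock)
    return removed
```

Calling `remove_matplotlib_locks()` once on the login node before resubmitting the job should be enough. Pass another directory as the argument if your cache is stored elsewhere.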
