
气象家园

WRF multi-node parallel run fails with an error; looking for a solution


Posted 2025-1-2 22:18:32

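Single-node multi-core parallel runs complete normally, but multi-node multi-core runs fail with the errors below. Any pointers would be greatly appreciated!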
[1735826843.548901] [cn553:31756:0]         glex_md.c:793  UCX  ERROR Cannot create the specified type of ep, try another type
[1735826843.548873] [cn553:31780:0]         glex_md.c:793  UCX  ERROR Cannot create the specified type of ep, try another type
[1735826843.548882] [cn553:31782:0]         glex_md.c:793  UCX  ERROR Cannot create the specified type of ep, try another type
[1735826843.548878] [cn553:31785:0]         glex_md.c:793  UCX  ERROR Cannot create the specified type of ep, try another type
[1735826843.548867] [cn553:31786:0]         glex_md.c:793  UCX  ERROR Cannot create the specified type of ep, try another type
[1735826843.548859] [cn553:31800:0]         glex_md.c:793  UCX  ERROR Cannot create the specified type of ep, try another type
[1735826843.548894] [cn553:31803:0]         glex_md.c:793  UCX  ERROR Cannot create the specified type of ep, try another type
[1735826843.548904] [cn553:31805:0]         glex_md.c:793  UCX  ERROR Cannot create the specified type of ep, try another type
[1735826843.720603] [cn553:31756:0]         glex_md.c:797  UCX  ERROR Could not create endpoint on glex device #0
[1735826843.720709] [cn553:31756:0]     ucp_context.c:773  UCX  WARN  transport 'glex' is not available, please use one or more of: cma, knem, mm, posix, self, shm, sm, sysv, tcp
[1735826843.720724] [cn553:31756:0]     ucp_context.c:1041 UCX  ERROR no usable transports/devices (asked glex on all devices)
Abort(1090959) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(159).......:
MPID_Init(597)..............:
MPIDI_UCX_mpi_init_hook(242):  ucx function returned with failed status(ucx_init.c 242 MPIDI_UCX_mpi_init_hook No such device)
Abort(1090959) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(159).......:
MPID_Init(597)..............:
MPIDI_UCX_mpi_init_hook(242):  ucx function returned with failed status(ucx_init.c 242 MPIDI_UCX_mpi_init_hook No such device)
slurmstepd: error: *** STEP 4631142.5 ON cn553 CANCELLED AT 2025-01-02T22:07:23 ***
yhrun: Job step aborted: Waiting up to 32 seconds for job step to finish.
yhrun: error: cn553: task 9: Killed
wrf.exe failed!

