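Single-node multi-core parallel runs compute fine, but multi-node multi-core runs fail with the errors below. Any pointers would be much appreciated!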
[1735826843.548901] [cn553:31756:0] glex_md.c:793 UCX ERROR Cannot create the specified type of ep, try another type
[1735826843.548873] [cn553:31780:0] glex_md.c:793 UCX ERROR Cannot create the specified type of ep, try another type
[1735826843.548882] [cn553:31782:0] glex_md.c:793 UCX ERROR Cannot create the specified type of ep, try another type
[1735826843.548878] [cn553:31785:0] glex_md.c:793 UCX ERROR Cannot create the specified type of ep, try another type
[1735826843.548867] [cn553:31786:0] glex_md.c:793 UCX ERROR Cannot create the specified type of ep, try another type
[1735826843.548859] [cn553:31800:0] glex_md.c:793 UCX ERROR Cannot create the specified type of ep, try another type
[1735826843.548894] [cn553:31803:0] glex_md.c:793 UCX ERROR Cannot create the specified type of ep, try another type
[1735826843.548904] [cn553:31805:0] glex_md.c:793 UCX ERROR Cannot create the specified type of ep, try another type
[1735826843.720603] [cn553:31756:0] glex_md.c:797 UCX ERROR Could not create endpoint on glex device #0
[1735826843.720709] [cn553:31756:0] ucp_context.c:773 UCX WARN transport 'glex' is not available, please use one or more of: cma, knem, mm, posix, self, shm, sm, sysv, tcp
[1735826843.720724] [cn553:31756:0] ucp_context.c:1041 UCX ERROR no usable transports/devices (asked glex on all devices)
Abort(1090959) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(159).......:
MPID_Init(597)..............:
MPIDI_UCX_mpi_init_hook(242): ucx function returned with failed status(ucx_init.c 242 MPIDI_UCX_mpi_init_hook No such device)
Abort(1090959) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(159).......:
MPID_Init(597)..............:
MPIDI_UCX_mpi_init_hook(242): ucx function returned with failed status(ucx_init.c 242 MPIDI_UCX_mpi_init_hook No such device)
slurmstepd: error: *** STEP 4631142.5 ON cn553 CANCELLED AT 2025-01-02T22:07:23 ***
yhrun: Job step aborted: Waiting up to 32 seconds for job step to finish.
yhrun: error: cn553: task 9: Killed
wrf.exe failed!
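
For anyone triaging a log like this, the `ucp_context.c:773` warning is the most informative line: it names the transport that was requested (`glex`) and the ones UCX reports as usable on that node. Below is a minimal Python sketch that pulls both out of the log text. The function name `parse_transport_warning` and the `SAMPLE_LOG` variable are made up for illustration, and the regular expression assumes the warning wording stays exactly as it appears in the log above.

```python
import re

# Three lines copied verbatim from the log in this post.
SAMPLE_LOG = """\
[1735826843.720603] [cn553:31756:0] glex_md.c:797 UCX ERROR Could not create endpoint on glex device #0
[1735826843.720709] [cn553:31756:0] ucp_context.c:773 UCX WARN transport 'glex' is not available, please use one or more of: cma, knem, mm, posix, self, shm, sm, sysv, tcp
[1735826843.720724] [cn553:31756:0] ucp_context.c:1041 UCX ERROR no usable transports/devices (asked glex on all devices)
"""

# Assumption: the warning text matches the wording seen in this log.
WARN_PATTERN = re.compile(
    r"transport '(?P<asked>[^']+)' is not available, "
    r"please use one or more of: (?P<usable>.+)$"
)


def parse_transport_warning(log_text):
    """Return (requested transport, list of usable transports).

    Returns (None, []) when the log contains no such warning.
    """
    for line in log_text.splitlines():
        match = WARN_PATTERN.search(line)
        if match:
            usable = [name.strip() for name in match.group("usable").split(",")]
            return match.group("asked"), usable
    return None, []


asked, usable = parse_transport_warning(SAMPLE_LOG)
print("requested:", asked)
print("usable:", ", ".join(usable))
```

Apart from `tcp`, every transport in the usable list is a shared-memory or loopback transport, which would be consistent with the single-node runs working while the multi-node runs abort in `PMPI_Init`. That is only a reading of the log, not a confirmed diagnosis.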