Last edited by AllenLu on 2023-12-5 20:25
Asking for help: when I run CESM across multiple nodes, it fails with this error:
cesm.exe 00000000004265A9 Unknown Unknown Unknown
Abort(538560399) on node 28 (rank 28 in comm 0): Fatal error in PMPI_Recv: Other MPI error, error stack:
PMPI_Recv(173).................: MPI_Recv(buf=0x2b5bfe6b1010, count=8838096, MPI_DOUBLE, src=0, tag=9, comm=0xc40000a8, status=0x7ffda30db890) failed
MPID_Recv(590).................:
MPIDI_recv_unsafe(205).........:
MPIDI_OFI_handle_cq_error(1042): OFI poll failed (ofi_events.c:1042:MPIDI_OFI_handle_cq_error:Transport endpoint is not connected)
On a single node it does run for a while, but in the end it still fails with:
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 151022 RUNNING AT node15
= KILLED BY SIGNAL: 9 (Killed)