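After I submit a job on the NUIST (南信大) Slurm system, it shows as running but no computation actually happens: CPU usage is 0, the log files are empty, and no error is reported. I really don't know where to start. Does anyone know how to fix this?
I found earlier posts where someone solved this by changing <MAX_TASKS_PER_NODE> and <MAX_MPITASKS_PER_NODE>, but I have tried changing them several times and it did not help.
My config_machines.xml currently has: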
<MAX_TASKS_PER_NODE>12</MAX_TASKS_PER_NODE>
<MAX_MPITASKS_PER_NODE>12</MAX_MPITASKS_PER_NODE>
Also, lscpu reports 28 CPU cores, but nproc returns 3. I'm not sure whether that is related.
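One possible way to narrow down the lscpu versus nproc difference: lscpu lists every CPU on the machine, while nproc without options prints only the processing units available to the current process, so an affinity mask or cpuset on the node where it was run could explain the smaller number (GNU nproc also honours OMP_NUM_THREADS if that happens to be set). The sketch below is only an assumption about what is worth comparing, not a confirmed diagnosis. It uses standard GNU coreutils and /proc on Linux.

```shell
#!/bin/bash
# CPUs the current process is allowed to use (respects the affinity mask).
avail=$(nproc)
# CPUs installed on the machine, ignoring any affinity restriction.
total=$(nproc --all)
echo "usable by this shell: $avail of $total"

# The kernel's list of CPUs this shell may run on (Linux only).
if [ -r /proc/self/status ]; then
    grep Cpus_allowed_list /proc/self/status
fi
```

Running the same commands through srun inside the job (for example `srun -n 1 nproc`) would show what the compute node grants, which may differ from what a login node reports.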
Here is my job.sh:
#!/bin/bash
#SBATCH --job-name=dyx
#SBATCH --output=cesm_output.log
#SBATCH --error=cesm_error.log
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=28
#SBATCH --time=12:00:00
#SBATCH --partition=Regular
#SBATCH --exclusive
module purge
module load intel/18.0.0
module load ncview/2.1.7
module load mvapich2/2.3
source /nuist/p/public/app/intel/compilers_and_libraries_2018.0.128/linux/mkl/bin/mklvars.sh intel64
export LD_LIBRARY_PATH=/nuist/p/public/app/intel/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/nuist/p/public/app/intel/compilers_and_libraries_2018.0.128/linux/mkl/lib/intel64_lin:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/nuist/p/public/app/ncl/6.5.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/nuist/p/public/app/netcdf/4.3.0/intel/18.0.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/nuist/p/public/app/lapack/3.8.0/intel/18.0.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/nuist/p/public/app/hdf-eos/2.19v1.00/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/nuist/p/public/app/blas/3.8.0/intel/18.0.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/nuist/p/public/app/hdf5/1.8.20/intel/18.0.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/nuist/p/public/app/mvapich2/2.3/intel/18.0.0/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/nuist/p/public/app/intel/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64:$LD_LIBRARY_PATH
srun /nuist/scratch/wangc/wc_linyiton/my_cesm_sandbox/case/dyx/dyx/bld/cesm.exe
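The script above asks Slurm for 28 tasks per node, while config_machines.xml is set to 12. Whether that difference is related to the hang is not confirmed, but it is easy to check for. The helper below is a hypothetical sketch (the function name `compare_task_counts` and its arguments are made up for illustration); it only extracts the two numbers with sed and reports whether they differ.

```shell
#!/bin/bash
# Hypothetical helper: compare --ntasks-per-node in a job script with
# MAX_MPITASKS_PER_NODE in a config_machines.xml file.
compare_task_counts() {
    local job_script="$1" machines_xml="$2"
    local slurm_tasks xml_tasks

    # First "#SBATCH --ntasks-per-node=N" line in the job script.
    slurm_tasks=$(sed -n 's/^#SBATCH[[:space:]]*--ntasks-per-node=\([0-9][0-9]*\).*/\1/p' "$job_script" | head -n 1)
    # First <MAX_MPITASKS_PER_NODE>N</MAX_MPITASKS_PER_NODE> entry in the XML.
    xml_tasks=$(sed -n 's/.*<MAX_MPITASKS_PER_NODE>\([0-9][0-9]*\)<\/MAX_MPITASKS_PER_NODE>.*/\1/p' "$machines_xml" | head -n 1)

    echo "job script --ntasks-per-node: ${slurm_tasks:-not found}"
    echo "MAX_MPITASKS_PER_NODE:        ${xml_tasks:-not found}"

    if [ -n "$slurm_tasks" ] && [ -n "$xml_tasks" ] && [ "$slurm_tasks" -ne "$xml_tasks" ]; then
        echo "MISMATCH"
        return 1
    fi
    return 0
}
```

It would be called as `compare_task_counts job.sh config_machines.xml`, with the paths adjusted to wherever those files actually live.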
Here is the output from my checks:
[wc_linyiton@log05 dyx]$ sstat -j 353125.batch --format=AveCPU,AveRSS,MaxRSS
AveCPU AveRSS MaxRSS
---------- ---------- ----------
00:00:00 3992K 3992K
[wc_linyiton@log05 dyx]$ squeue -u $USER
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
353125 Regular dyx wc_linyi R 34:37 1 c01n03
[wc_linyiton@log05 dyx]$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 28
On-line CPU(s) list: 0-27
Thread(s) per core: 1
Core(s) per socket: 14
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
Stepping: 1
CPU MHz: 1200.093
CPU max MHz: 3300.0000
CPU min MHz: 1200.0000
BogoMIPS: 4800.13
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 35840K
NUMA node0 CPU(s): 0-13
NUMA node1 CPU(s): 14-27
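A note on the sstat check above: it queried the `.batch` step, which covers the batch script itself. Programs started with srun run in their own numbered steps, so 00:00:00 on `.batch` may simply reflect the shell waiting on srun and does not by itself show what cesm.exe is doing. A minimal sketch for looking at all steps follows; the function name `show_step_usage` is hypothetical, while `-a` (`--allsteps`) and the format fields are standard sstat options.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): print usage for every running
# step of a job instead of only one step.
show_step_usage() {
    local jobid="$1"
    # -a / --allsteps reports all steps when no step ID is given.
    sstat -a -j "$jobid" --format=JobID,AveCPU,AveRSS,MaxRSS
}

# Example, using the job ID from the squeue output above:
# show_step_usage 353125
```

If the srun step also shows no CPU time, the tasks are probably stuck before doing any work, which would be consistent with the empty log files.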