Procedure

Use PuTTY to log in to the server as the root user.
Run the following command to create the working directory.
```
mkdir -p /path/to/CASE
```

Run the following commands to enter the working directory and copy the test case and the binary into it.

cd /path/to/CASE
cp /path/to/LAMMPS/lammps-5Jun19/bench/in.lj  ./
cp /path/to/LAMMPS/lammps-5Jun19/src/lmp_mpi  ./
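
Before launching, it can help to confirm that both files actually landed in the working directory. The following is a minimal sketch; the function name `check_case_files` is hypothetical and not part of LAMMPS, and the file names `in.lj` and `lmp_mpi` are the ones copied above.

```shell
# Hypothetical helper: verify that the input file and the binary are present
# in the given directory and that the binary is executable.
check_case_files() {
    dir=$1
    for f in in.lj lmp_mpi; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done
    if [ ! -x "$dir/lmp_mpi" ]; then
        echo "not executable: $dir/lmp_mpi" >&2
        return 1
    fi
    echo "ok: $dir"
}

# Example call (the path is the placeholder used in this section):
#   check_case_files /path/to/CASE
```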

Run the test case with the following commands.

Single-node command on CentOS:

mpirun --allow-run-as-root -np 96 --mca btl ^openib  ./lmp_mpi -in in.lj >>test_OneNode.log

Single-node command on openEuler:

mpirun --allow-run-as-root -np 96  -mca pml ucx -mca btl ^vader,tcp,openib,uct -x UCX_TLS=self,sm --bind-to core --map-by socket --rank-by core -x UCX_BUILTIN_BCAST_ALGORITHM=3 -x UCX_BUILTIN_BARRIER_ALGORITHM=5 -x UCX_BUILTIN_ALLREDUCE_ALGORITHM=10  ./lmp_mpi -in in.lj >> ./test_OneNode.log

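The results are appended to the "test_OneNode.log" log file in the current directory. Check the "Performance" value, which is reported in "timesteps/s"; a higher value indicates better performance.

A sample test result is shown below.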

Performance: 1134386.210 tau/day, 2625.894 timesteps/s
99.7% CPU use with 96 MPI tasks x no OpenMP threads
MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.021651   | 0.023624   | 0.025878   |   0.6 | 62.03
Neigh   | 0.002734   | 0.0029155  | 0.0031491  |   0.2 |  7.66
Comm    | 0.007533   | 0.010058   | 0.012192   |   1.1 | 26.41
Output  | 5.6281e-05 | 0.00050681 | 0.00062442 |   0.0 |  1.33
Modify  | 0.00056003 | 0.00061947 | 0.00070919 |   0.0 |  1.63
Other   |            | 0.0003584  |            |       |  0.94
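
Because the commands above use `>>`, each run appends to the log, so the file may hold several "Performance" lines. A minimal sketch for pulling out the timesteps/s figure follows; the function name `get_timesteps_per_s` is hypothetical, and the parsing assumes the line format shown in the sample output.

```shell
# Hypothetical helper: print the timesteps/s value from every line that
# starts with "Performance:" in a LAMMPS log. Assumes the format
#   Performance: <value> tau/day, <value> timesteps/s
# so the value of interest is the second-to-last field.
get_timesteps_per_s() {
    awk '/^Performance:/ { print $(NF-1) }' "$1"
}

# Report only the most recent single-node run, if the log exists.
if [ -f test_OneNode.log ]; then
    get_timesteps_per_s test_OneNode.log | tail -n 1
fi
```

The same function can be pointed at any other log written in this format.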

Two-node command on CentOS:

mpirun --allow-run-as-root -np 192 -N 96 -x PATH=$PATH -x LD_LIBRARY_PATH=$LD_LIBRARY_PATH  -machinefile machinefile --mca btl ^openib ./lmp_mpi -in in.lj >> ./test_TwoNodes.log

Put the hostnames of the two compute nodes you have chosen (for example, n1 and n2) in the machinefile file, as shown in the figure.
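
As a minimal sketch, the file can be created from the shell with one hostname per line. The names n1 and n2 are the example hostnames from the text and need to be replaced with the real hostnames of your nodes.

```shell
# Write one hostname per line into ./machinefile.
# n1 and n2 are placeholders for the two compute nodes.
printf '%s\n' n1 n2 > machinefile

# Show the result for a quick visual check.
cat machinefile
```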


Two-node command on openEuler:

mpirun --allow-run-as-root -np 192 -N 96 -machinefile machinefile  -x PATH=$PATH -x LD_LIBRARY_PATH=$LD_LIBRARY_PATH -mca pml ucx -mca btl ^vader,tcp,openib,uct  --bind-to core  --rank-by core ./lmp_mpi -in in.lj >> ./test_TwoNodes.log

Put the hostnames of the two compute nodes you have chosen (for example, n1 and n2) in the machinefile file, as shown in the figure.


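The results are appended to the "test_TwoNodes.log" log file in the current directory. Check the "Performance" value, which is reported in "timesteps/s"; a higher value indicates better performance.

A sample test result is shown below.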

Performance: 1605508.300 tau/day, 3716.454 timesteps/s
91.0% CPU use with 192 MPI tasks x no OpenMP threads
MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 0.01035    | 0.01174    | 0.013347   |   0.6 | 43.63
Neigh   | 0.0013662  | 0.0014931  | 0.0016205  |   0.1 |  5.55
Comm    | 0.01112    | 0.012935   | 0.014469   |   0.6 | 48.07
Output  | 7.3471e-05 | 0.00010074 | 0.00017455 |   0.0 |  0.37
Modify  | 0.00023517 | 0.00032949 | 0.00040742 |   0.0 |  1.22
Other   |            | 0.0003095  |            |       |  1.15
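
To put the two sample results side by side, the ratio of the two timesteps/s values gives the speedup, and dividing that by the node count gives a rough parallel efficiency. The sketch below uses the two sample values from this section; the function name `scaling` is hypothetical, and the figures describe only these sample logs, not what any particular cluster should achieve.

```shell
# Hypothetical helper: compare a multi-node result against a single-node one.
#   $1 = single-node timesteps/s
#   $2 = multi-node timesteps/s
#   $3 = number of nodes in the multi-node run
scaling() {
    awk -v one="$1" -v multi="$2" -v nodes="$3" 'BEGIN {
        speedup = multi / one
        printf "speedup %.2f, efficiency %.1f%%\n", speedup, 100 * speedup / nodes
    }'
}

# Sample values from the one-node and two-node logs in this section.
scaling 2625.894 3716.454 2   # prints: speedup 1.42, efficiency 70.8%
```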

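Table 1 Parameter description
Parameter	Description
-np	Total number of MPI processes to run.
-N	Number of processes to run on each node.
-machinefile	File that lists the names of the nodes to use.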

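The two-node case runs in a shared directory. If the PATH and LD_LIBRARY_PATH environment variables were already set while configuring the compilation environment and building, they do not need to be set again.
If hyper-threading is not enabled, the -np value should be less than or equal to the number of nodes multiplied by the number of CPU cores per node.
n1 and n2 are hostnames.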
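
The limit on -np can be checked with simple arithmetic. The sketch below is illustrative only: the function names `max_np` and `check_np` are hypothetical, and the values 2 and 96 are taken from the two-node commands in this section (-np 192 -N 96).

```shell
# Upper bound for -np when hyper-threading is off:
# number of nodes multiplied by CPU cores per node.
max_np() {
    echo $(( $1 * $2 ))
}

# Succeeds (exit status 0) if the requested process count fits the bound.
#   $1 = requested -np, $2 = nodes, $3 = cores per node
check_np() {
    [ "$1" -le "$(max_np "$2" "$3")" ]
}

NODES=2            # two compute nodes, as in this section
CORES_PER_NODE=96  # matches -N 96 in the commands above
max_np "$NODES" "$CORES_PER_NODE"   # prints 192
```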
