error by using cuda-aware-mpi-example, bandwidth was wrong (Issue #41)

Posted on 2023-1-20 19:55:35
Thanks. I can't spot anything wrong with your software setup. Given that the performance difference between CUDA-aware MPI and regular MPI on a single node is about 2x, while CUDA-aware MPI is faster for 2 processes on two nodes, I suspect an issue with the GPU affinity handling, i.e. ENV_LOCAL_RANK being defined the wrong way (though you seem to have that right), or CUDA_VISIBLE_DEVICES being set in an unusual way on the system you are using. As this code has not been updated for quite some time, can you try https://github.com/NVIDIA/multi-gpu-programming-models instead? It is also a Jacobi solver, but with simpler code that I regularly use in tutorials.
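
For illustration, here is a minimal sketch of the per-rank GPU selection referred to above. It assumes ENV_LOCAL_RANK is a compile-time define naming the MPI launcher's local-rank environment variable (OMPI_COMM_WORLD_LOCAL_RANK is an Open MPI assumption here); the example's actual code may differ:

```c
#include <stdlib.h>
#include <cuda_runtime.h>

/* Assumption: the build defines ENV_LOCAL_RANK to the launcher's
 * local-rank variable; Open MPI's name is used as a fallback. */
#ifndef ENV_LOCAL_RANK
#define ENV_LOCAL_RANK "OMPI_COMM_WORLD_LOCAL_RANK"
#endif

/* Pick one GPU per local rank. This must run before MPI_Init so that a
 * CUDA-aware MPI creates its CUDA context on the intended device. */
void set_device_from_local_rank(void)
{
    int devCount = 0;
    cudaGetDeviceCount(&devCount);

    const char *localRankStr = getenv(ENV_LOCAL_RANK);
    int localRank = localRankStr ? atoi(localRankStr) : 0;

    /* If CUDA_VISIBLE_DEVICES already restricts each rank to one GPU,
     * devCount is 1 and the modulo keeps the index valid. If it is set
     * inconsistently across ranks, two ranks can land on the same GPU. */
    if (devCount > 0)
        cudaSetDevice(localRank % devCount);
}
```

If ranks end up sharing one device, their transfers serialize, which can make CUDA-aware MPI slower than regular MPI on a single node.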
I also checked the math for the bandwidth: the formula used does not account for caches (see https://github.com/NVIDIA-develo ... ple/src/Host.c#L291), which explains why you are seeing implausibly large memory bandwidths.
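
To make the cache point concrete, here is a hedged sketch of that style of bandwidth estimate. The traffic model (4 neighbor reads plus 1 write of a double per grid point per iteration) is an assumption for a 5-point Jacobi stencil, not the exact constants in the example's Host.c:

```c
#include <stdio.h>

/* Naive model: charge every Jacobi update a fixed byte count, whether
 * or not the neighbor reads actually reach DRAM. */
double estimated_bandwidth_gbs(long nx, long ny, long iterations,
                               double elapsed_seconds)
{
    /* Assumption: 5 accesses of 8 bytes per grid point per iteration. */
    double bytes = (double)nx * ny * iterations * 5.0 * sizeof(double);
    return bytes / elapsed_seconds / 1e9;
}

int main(void)
{
    /* Hypothetical run: 8192x8192 grid, 1000 iterations in 2.7 s. */
    printf("estimated: %.0f GB/s\n",
           estimated_bandwidth_gbs(8192, 8192, 1000, 2.7));
    return 0;
}
```

Because adjacent grid points reuse each other's neighbor values, most of those reads hit in cache; the real DRAM traffic is closer to one read and one write per point, so a formula like this overstates bandwidth by roughly 2-3x and can report numbers above the GPU's hardware peak.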

