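Overall analysis and comparison using region mode
Figure 1 case 9
Region mode supports instrumenting multiple functions or loops, so several methods can be run one after another within a single new case. This example adds method 9, which calls the following methods in order: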
- 1(parallel_matmult)
- 2(transpose_B_matmult)
- 4(block_transpose_B_matmult)
- 5(intrinsics_transpose_B_matmult)
- 6(kml_matmult_8192)
- Run the multi_method_matmult example with a matrix of 8192 rows and columns.
./matmul 8192 9
The following information is returned:
Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
Initialization time = 2.663596s
Matrix multiplication time(parallel_matmult) = 524.732915s
Matrix multiplication time(transpose_B_matmult) = 12.199910s
Matrix multiplication time(block_transpose_B_matmult) = 3.940094s
Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.543300s
Matrix multiplication time(kml_matmult) = 0.320360s
Matrix multiplication time = 543.736720s
The output lists the time taken by each method in turn, followed by the total time.
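The per-method times are easier to compare when expressed as speedups over the slowest method. The following is a minimal Python sketch, not part of the DevKit tooling: the helper names `parse_method_times` and `speedups` are hypothetical, the sample text is copied from the output of this run, and the choice of parallel_matmult as the baseline is an assumption made for illustration.

```python
import re

# Sample output copied from the run above (method 9, size 8192).
SAMPLE_OUTPUT = """\
Matrix multiplication time(parallel_matmult) = 524.732915s
Matrix multiplication time(transpose_B_matmult) = 12.199910s
Matrix multiplication time(block_transpose_B_matmult) = 3.940094s
Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.543300s
Matrix multiplication time(kml_matmult) = 0.320360s
Matrix multiplication time = 543.736720s
"""

# Matches only the per-method lines, e.g. "... time(kml_matmult) = 0.320360s".
# The final total line has no "(name)" part and is therefore skipped.
TIME_LINE = re.compile(r"Matrix multiplication time\((\w+)\)\s*=\s*([\d.]+)s")


def parse_method_times(text):
    """Return {method_name: seconds} for every per-method timing line."""
    return {name: float(sec) for name, sec in TIME_LINE.findall(text)}


def speedups(times, baseline="parallel_matmult"):
    """Return each method's speedup relative to the baseline method."""
    base = times[baseline]
    return {name: base / sec for name, sec in times.items()}


if __name__ == "__main__":
    times = parse_method_times(SAMPLE_OUTPUT)
    for name, factor in speedups(times).items():
        print(f"{name:35s} {times[name]:12.6f}s {factor:10.1f}x")
```

To use it on a real run, the program output could be redirected to a file and its contents passed to `parse_method_times` in place of `SAMPLE_OUTPUT`.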
- Create a Roofline task for the multi_method_matmult 8192 case.
devkit tuner roofline -o multi_method_matmult_8192 -m region ./matmul 8192 9
The following information is returned:
Note:
1. Roofline task is currently only supported on the 920 platform.
2. The application must be a binary file in ELF format.
3. Roofline task collection needs to ensure the application has finished running.
4. The estimated time of roofline collection is about 3 * application estimated time.
5. You can learn about the roofline profiling method by looking at document /usr/local/devkit/tuner/docs/ROOFLINE_KNOW_HOW.MD
RFCOLLECT: Start collection for ./matmul
RFCOLLECT: Launch application to collect performance metrics of ./matmul
Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
Initialization time = 2.712200s
ROOFLINE_EVENTS are initialized.
Matrix multiplication time(parallel_matmult) = 522.059154s
Matrix multiplication time(transpose_B_matmult) = 10.515641s
Matrix multiplication time(block_transpose_B_matmult) = 3.325110s
Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.258315s
Matrix multiplication time(kml_matmult) = 0.287929s
Matrix multiplication time = 538.468327s
RFCOLLECT: Launch application to do binary instrumentation of ./matmul
Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
Initialization time = 8.095281s
Matrix multiplication time(parallel_matmult) = 348.475675s
Matrix multiplication time(transpose_B_matmult) = 17.144564s
Matrix multiplication time(block_transpose_B_matmult) = 3.646071s
Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.427023s
Matrix multiplication time(kml_matmult) = 0.297098s
Matrix multiplication time = 371.991296s
RFCOLLECT: Launch benchmarks for measuring roofs
RFCOLLECT: Processing all collected data
RFCOLLECT: Result is captured at /matrix_multiplication/rfcollect-20240507-115538.json
RFCOLLECT: Run "rfreport /matrix_multiplication/rfcollect-20240507-115538.json" to get report.
Get roofline report ...
The roofline json report: /matrix_multiplication/multi_method_matmult_8192.json
The roofline html report: /matrix_multiplication/multi_method_matmult_8192.html
- View the multi_method_matmult_8192 report.
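The HTML report is meant to be opened in a browser. For a quick look at the JSON report from a script, a minimal Python sketch such as the following may help. It deliberately assumes nothing about the report schema, which is not described here, and only lists the top-level keys and their value types. The function name `summarize_report` is hypothetical and not part of DevKit.

```python
import json
import sys


def summarize_report(path):
    """Load a JSON file and describe its top-level structure only."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        # Map each top-level key to the Python type name of its value.
        return {key: type(value).__name__ for key, value in data.items()}
    # Non-object roots (for example a list) are reported as a single entry.
    return {"<root>": type(data).__name__}


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path is given on the command line, for example the json report
    # path printed at the end of the roofline task output.
    for key, kind in summarize_report(sys.argv[1]).items():
        print(f"{key}: {kind}")
```

Saved under a name of your choice, the script takes the JSON report path printed by the tool as its only argument.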