
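Overall comparative analysis using region mode

Figure 1 case 9

Region mode supports instrumenting multiple functions or loops, so several methods can be run one after another in a single new case. This example adds method 9, which calls the following methods in order: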

  • 1(parallel_matmult)
  • 2(transpose_B_matmult)
  • 4(block_transpose_B_matmult)
  • 5(intrinsics_transpose_B_matmult)
  • 6(kml_matmult_8192)
  1. Run the multi_method_matmult example with a matrix dimension of 8192.
    ./matmul 8192 9

    The following output is returned:

    Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
    Initialization time = 2.663596s
    Matrix multiplication time(parallel_matmult) = 524.732915s
    Matrix multiplication time(transpose_B_matmult) = 12.199910s
    Matrix multiplication time(block_transpose_B_matmult) = 3.940094s
    Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.543300s
    Matrix multiplication time(kml_matmult) = 0.320360s
    Matrix multiplication time = 543.736720s

    The output lists the time taken by each method in turn.

  2. Create a Roofline task for the multi_method_matmult 8192 case.
    devkit tuner roofline -o multi_method_matmult_8192 -m region ./matmul 8192 9

    The following output is returned:

    Note:
      1. Roofline task is currently only supported on the 920 platform.
      2. The application must be a binary file in ELF format.
      3. Roofline task collection needs to ensure the application has finished running.
      4. The estimated time of roofline collection is about 3 * application estimated time.
      5. You can learn about the roofline profiling method by looking at document /usr/local/devkit/tuner/docs/ROOFLINE_KNOW_HOW.MD
    RFCOLLECT: Start collection for ./matmul
    RFCOLLECT: Launch application to collect performance metrics of ./matmul
    Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
    Initialization time = 2.712200s
    ROOFLINE_EVENTS are initialized.
    Matrix multiplication time(parallel_matmult) = 522.059154s
    Matrix multiplication time(transpose_B_matmult) = 10.515641s
    Matrix multiplication time(block_transpose_B_matmult) = 3.325110s
    Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.258315s
    Matrix multiplication time(kml_matmult) = 0.287929s
    Matrix multiplication time = 538.468327s
    RFCOLLECT: Launch application to do binary instrumentation of ./matmul
    Size is 8192, Matrix multiplication method is: 9, Check correctness is: 0
    Initialization time = 8.095281s
    Matrix multiplication time(parallel_matmult) = 348.475675s
    Matrix multiplication time(transpose_B_matmult) = 17.144564s
    Matrix multiplication time(block_transpose_B_matmult) = 3.646071s
    Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.427023s
    Matrix multiplication time(kml_matmult) = 0.297098s
    Matrix multiplication time = 371.991296s
    RFCOLLECT: Launch benchmarks for measuring roofs
    RFCOLLECT: Processing all collected data
    RFCOLLECT: Result is captured at /matrix_multiplication/rfcollect-20240507-115538.json
    RFCOLLECT: Run "rfreport /matrix_multiplication/rfcollect-20240507-115538.json" to get report.
    
    Get roofline report ...
    The roofline json report: /matrix_multiplication/multi_method_matmult_8192.json
    The roofline html report: /matrix_multiplication/multi_method_matmult_8192.html
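  3. View the multi_method_matmult_8192 report.
    The report gathers the results of every method, from the parallel example through the KML example, onto a single chart for convenient analysis and comparison.
    Figure 2 multi_method_matmult_8192 report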
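Before opening the chart, it can help to put rough numbers on the per-method timings from the plain `./matmul 8192 9` run in step 1. The following is a minimal sketch, not part of the DevKit tooling: it parses the timing lines, then reports each method's speedup over parallel_matmult and an estimated throughput. The helper names (`parse_times`, `summarize`) are hypothetical, and the throughput figure assumes each method performs a dense 8192 x 8192 product costing 2·N³ floating-point operations, which the source does not state explicitly.

```python
import re

# Timing lines copied from the plain `./matmul 8192 9` run in step 1.
LOG = """\
Matrix multiplication time(parallel_matmult) = 524.732915s
Matrix multiplication time(transpose_B_matmult) = 12.199910s
Matrix multiplication time(block_transpose_B_matmult) = 3.940094s
Matrix multiplication time(intrinsics_transpose_B_matmult) = 2.543300s
Matrix multiplication time(kml_matmult) = 0.320360s
"""

N = 8192
# Assumption: every method computes a dense N x N by N x N product,
# i.e. N^3 multiply-add pairs = 2 * N^3 floating-point operations.
FLOPS = 2 * N ** 3

# Matches only the per-method lines, which carry the name in parentheses.
PATTERN = re.compile(r"time\((\w+)\) = ([0-9.]+)s")


def parse_times(log):
    """Return {method_name: seconds} for every per-method timing line."""
    return {name: float(sec) for name, sec in PATTERN.findall(log)}


def summarize(times, baseline="parallel_matmult"):
    """Return {method: (speedup_vs_baseline, estimated_gflops)}."""
    base = times[baseline]
    return {
        name: (base / sec, FLOPS / sec / 1e9)
        for name, sec in times.items()
    }


if __name__ == "__main__":
    for name, (speedup, gflops) in summarize(parse_times(LOG)).items():
        print(f"{name:32s} speedup {speedup:8.1f}x  ~{gflops:8.1f} GFLOP/s")
```

With the timings above, kml_matmult comes out roughly 1,600 times faster than parallel_matmult. The throughput estimates are wall-clock approximations only. The Roofline report measures performance and arithmetic intensity directly and should be treated as the authoritative source.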