Roofline Instrumentation Guide

Instrumentation Overview

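When the Roofline analysis collection mode is set to region, the tool collects data separately for each instrumented region of the application, which enables quantitative analysis at the function or loop level. To use this mode, you must manually instrument the source code you want to analyze and then rebuild the application.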

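Region instrumentation and analysis:
  1. Insert Roofline Events API calls into the source code.
    • The Roofline Events API is defined in roofline_events.h and roofline_events.mod, both located in /usr/bin/devkit/tuner/include.
    • roofline_events.h is for C/C++ programs; roofline_events.mod is for Fortran programs.
  2. Rebuild the application with the following additional compiler and linker flags:
    • C/C++: -DROOFLINE_EVENTS -I /usr/bin/devkit/tuner/include -L/usr/bin/devkit/tuner/lib -lrfevents
    • Fortran: -I /usr/bin/devkit/tuner/include -L/usr/bin/devkit/tuner/lib -lrfevents
  3. Make sure the runtime dynamic library search path includes /usr/bin/devkit/tuner/lib, for example by adding /usr/bin/devkit/tuner/lib to LD_LIBRARY_PATH.
  4. In the Roofline analysis task, select region as the collection mode and collect data from the instrumented application built in step 2.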
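The steps above can be sketched as a minimal C file. This is an illustration, not a DevKit sample: the daxpy kernel, the run_daxpy helper, and the file names app.c and main.c are hypothetical. The #else branch repeats the no-op macros from roofline_events.h only so the file also compiles where the DevKit header is not installed. The kernel runs on the calling thread only, so only that thread's data would be collected.

```c
/* app.c: minimal region instrumentation sketch (hypothetical example).
 * Step 2, build with instrumentation, for example:
 *   gcc -O2 -DROOFLINE_EVENTS -I /usr/bin/devkit/tuner/include -c app.c
 *   gcc app.o main.o -L/usr/bin/devkit/tuner/lib -lrfevents -o app
 *   (main.o is the application's own entry point, assumed here)
 * Step 3, before running:
 *   export LD_LIBRARY_PATH=/usr/bin/devkit/tuner/lib:$LD_LIBRARY_PATH
 */
#ifdef ROOFLINE_EVENTS
#include "roofline_events.h"
#else
/* Fallback so this file builds without the DevKit header; mirrors the
 * no-op #else branch of roofline_events.h. */
#define ROOFLINE_EVENTS_INIT
#define ROOFLINE_EVENTS_START_REGION(region_label)
#define ROOFLINE_EVENTS_STOP_REGION(region_label)
#define ROOFLINE_EVENTS_FINALIZE
#endif

/* Hypothetical kernel: y[i] = a * x[i] + y[i], measured as region "daxpy". */
static void daxpy(int n, double a, const double *x, double *y)
{
    ROOFLINE_EVENTS_START_REGION("daxpy");
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
    ROOFLINE_EVENTS_STOP_REGION("daxpy");
}

/* Serial driver: init before the first region, finalize after the last. */
double run_daxpy(void)
{
    double x[4] = {1.0, 2.0, 3.0, 4.0};
    double y[4] = {0.0, 0.0, 0.0, 0.0};

    ROOFLINE_EVENTS_INIT;
    daxpy(4, 2.0, x, y);
    ROOFLINE_EVENTS_FINALIZE;
    return y[0] + y[1] + y[2] + y[3]; /* 2 * (1 + 2 + 3 + 4) = 20 */
}
```

Building the same source without -DROOFLINE_EVENTS turns every macro into an empty statement, so the instrumentation can stay in the code and be switched off at build time.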

Roofline Events API Reference

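Data is collected per thread, so observe the following rules:

  • Call init/finalize in serial code (for example, on the main thread).
  • To analyze data for all threads, place the start/stop API calls inside parallel code.
  • Multiple regions are supported, but regions cannot be nested: the start/stop calls for a region must be paired, and different regions must not overlap.
  • The region name is used to match region data across threads.
  • Interfaces whose names begin with ROOFLINE_EVENTS are macros that can be switched on or off with the ROOFLINE_EVENTS compile option (-DROOFLINE_EVENTS); when the option is not set, they expand to nothing. These macros are available in C/C++ only.
  • Interfaces whose names end in perf_roofline_events can be used from C, C++, and Fortran, and cannot be switched off with the compile option.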
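Because data is collected per thread, the pairing and no-nesting rules apply to the sequence of calls each thread makes. The checker below is a hypothetical illustration (region_call and region_calls_valid are not part of the Roofline Events API) that encodes one reading of those rules: at most one region is open at a time, a stop must name the region that is currently open, and no region may be left open. It assumes that re-entering the same region later, for example once per loop iteration, is allowed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One start/stop call made by a single thread, in program order. */
typedef struct {
    bool is_start;      /* true: start_perf_roofline_events; false: stop_... */
    const char *region; /* region label passed to the call */
} region_call;

/* Hypothetical validator for one thread's call sequence. */
bool region_calls_valid(const region_call *calls, size_t n)
{
    const char *open = NULL; /* label of the currently open region, if any */

    for (size_t i = 0; i < n; i++) {
        if (calls[i].is_start) {
            if (open != NULL)
                return false; /* nested or overlapping start */
            open = calls[i].region;
        } else {
            if (open == NULL || strcmp(open, calls[i].region) != 0)
                return false; /* stop without a matching start */
            open = NULL;
        }
    }
    return open == NULL; /* every start was stopped */
}
```

For example, the sequence start("a"), stop("a"), start("b"), stop("b") passes, while start("a"), start("b"), stop("b"), stop("a") is rejected as nesting.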
#ifdef ROOFLINE_EVENTS
#define ROOFLINE_EVENTS_INIT init_perf_roofline_events()
#define ROOFLINE_EVENTS_START_REGION(region_label) start_perf_roofline_events(region_label)
#define ROOFLINE_EVENTS_STOP_REGION(region_label) stop_perf_roofline_events(region_label)
#define ROOFLINE_EVENTS_FINALIZE finalize_perf_roofline_events()
#else
#define ROOFLINE_EVENTS_INIT
#define ROOFLINE_EVENTS_START_REGION(region_label)
#define ROOFLINE_EVENTS_STOP_REGION(region_label)
#define ROOFLINE_EVENTS_FINALIZE
#endif

#ifdef __cplusplus
extern "C" {
#endif
// read system counters -> init
// should be called in serial code before start_perf_roofline_events
extern void init_perf_roofline_events(void) __attribute__((visibility("default")));
// start roofline events for current thread and provided region
// should be called in parallel code
extern void start_perf_roofline_events(const char* region) __attribute__((visibility("default")));
// stop roofline events for current thread and provided region
// should be called in parallel code
extern void stop_perf_roofline_events(const char* region) __attribute__((visibility("default")));
// summarize data for all regions
// should be called in serial code after stop_perf_roofline_events for all regions/threads
extern void finalize_perf_roofline_events(void) __attribute__((visibility("default")));
#ifdef __cplusplus
}
#endif
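The following sketch calls the *_perf_roofline_events functions directly from an OpenMP program, following the comments in the header: init and finalize in serial code, and start and stop inside the parallel region so that every thread records the "sum_sq" region under the same label. The function bodies marked as stubs are hypothetical stand-ins that only count calls, so the example runs without librfevents; in a real build, remove them, include roofline_events.h, and link with -lrfevents. Compile with -fopenmp to get multiple threads; without it, the pragmas are ignored and the code runs on a single thread.

```c
#include <stdio.h>

/* Prototypes as declared in roofline_events.h. */
void init_perf_roofline_events(void);
void start_perf_roofline_events(const char *region);
void stop_perf_roofline_events(const char *region);
void finalize_perf_roofline_events(void);

/* HYPOTHETICAL stubs: they only count calls so this sketch runs without
 * librfevents. Delete them and link with -lrfevents for real collection. */
static int g_starts, g_stops;

void init_perf_roofline_events(void) { g_starts = 0; g_stops = 0; }

void start_perf_roofline_events(const char *region)
{
    (void)region;
#pragma omp atomic
    g_starts++;
}

void stop_perf_roofline_events(const char *region)
{
    (void)region;
#pragma omp atomic
    g_stops++;
}

void finalize_perf_roofline_events(void)
{
    printf("regions opened: %d, closed: %d\n", g_starts, g_stops);
}

/* Sum of squares with one "sum_sq" region recorded on every thread. */
double instrumented_sum_sq(const double *a, int n)
{
    double sum = 0.0;

    init_perf_roofline_events();              /* serial: before any start */
#pragma omp parallel
    {
        start_perf_roofline_events("sum_sq"); /* each thread opens the region */
#pragma omp for reduction(+ : sum)
        for (int i = 0; i < n; i++)
            sum += a[i] * a[i];
        /* the implicit barrier at the end of "omp for" has completed here */
        stop_perf_roofline_events("sum_sq");  /* same label closes it */
    }
    finalize_perf_roofline_events();          /* serial: after all stops */
    return sum;
}
```

Using one label on all threads matters because the region name is what the tool uses to match region data across threads.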

Instrumentation Example