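Use the OmniAdvisor parameter tuning feature to compute the optimal runtime parameters for a Hive task and thereby improve the task's performance.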

Modify the list of parameters to be tuned, their default values, and their value ranges.

Open the “$OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/config/hive/hive_config.yml” configuration file.
```
vi $OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/config/hive/hive_config.yml
```

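Press “i” to enter edit mode. Add, delete, or keep tuning parameters as needed. For each parameter, specify its name, range of choices, default value, data type, and unit.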

Table 1 uses the hive.exec.reducers.max parameter as an example to describe the configuration items.

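Table 1 Configuration items in hive_config.yml

| Configuration item | Description |
| --- | --- |
| hive.exec.reducers.max | Name of the Hive parameter being tuned. This parameter limits the maximum number of Reduce tasks that can be started when a Hive query runs. |
| choices | Range of candidate values. When recommending parameters, the tuning algorithm selects values from the range given in choices. The range is usually centered on default_value and extended upward and downward according to the resources actually available. |
| default_value | Default value of the parameter. Set it with reference to the value used by the actual workload. The default value must be included in choices. Unless there is a special requirement, use the middle value of the choices sequence. |
| type | Data type. Supported types: int, boolean, and float. |
| unit | Unit. Supported units are K, M, and G, which stand for KB, MB, and GB. GB is generally used by default. |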
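The constraints in Table 1 (a supported type, and a default value that appears in choices) can be checked before a tuning run. The following is a minimal sketch, not part of OmniAdvisor: `validate_entry` is a hypothetical helper, and the entries are written as Python dicts that mirror the YAML structure rather than being loaded from hive_config.yml.

```python
# Hypothetical helper for illustration; OmniAdvisor does not ship this function.
SUPPORTED_TYPES = {"int", "boolean", "float"}


def validate_entry(name, entry):
    """Return a list of problems found in one parameter entry."""
    problems = []
    choices = entry.get("choices", [])
    if not choices:
        problems.append(f"{name}: choices is empty")
    if entry.get("type") not in SUPPORTED_TYPES:
        problems.append(f"{name}: unsupported type {entry.get('type')!r}")
    if entry.get("default_value") not in choices:
        problems.append(f"{name}: default_value is not in choices")
    return problems


# Entries follow the layout described in Table 1, expressed as Python dicts.
config = {
    "hive.exec.reducers.max": {
        "choices": list(range(100, 1600, 100)),  # 100, 200, ..., 1500
        "default_value": 600,
        "type": "int",
    },
    "hive.auto.convert.join": {
        "choices": ["true", "false"],
        "default_value": "true",
        "type": "boolean",
    },
}

all_problems = [p for name, entry in config.items() for p in validate_entry(name, entry)]
print(all_problems or "all entries look consistent")
```

Running a check of this kind after editing the file catches a default value that was changed without extending choices to include it.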

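Common settings are shown below. The values are for reference only. In practice, adjust choices and default_value according to the workload and the available resources, or add or remove tuning parameters.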

hive.exec.reducers.max: # Maximum number of Reduce tasks that can be started when a Hive query runs
  choices: [ 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500 ] # Range of candidate values
  default_value: 600 # Default value
  type: int # Supported data types: int, boolean, float

hive.tez.container.size: # Size of the Tez container
  choices: [ 5120, 6144, 7168, 8192, 9216, 10240, 11264, 12288, 13312, 14336, 15360, 16384, 17408, 18432, 19456, 20480 ]
  default_value: 8192
  type: int

tez.runtime.io.sort.mb: # Memory size used for I/O sorting in the Tez runtime
  choices: [ 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352 ]
  default_value: 64
  type: int

tez.am.resource.memory.mb: # Memory size of the Tez ApplicationMaster (AM)
  choices: [ 3072, 4096, 5120, 6144, 7168, 8192, 9216, 10240 ]
  default_value: 3072
  type: int

tez.grouping.split-count: # Controls the number of tasks generated in the Tez DAG
  choices: [ 500, 1000, 1500, 2000, 2500, 3000 ]
  default_value: 1500
  type: int

tez.grouping.min-size: # Minimum size, in bytes, of the input split processed by each task
  choices: [ 8000000, 16000000, 32000000, 64000000 ]
  default_value: 8000000
  type: int

tez.grouping.max-size: # Maximum size of the input split processed by each task
  choices: [ 64000000, 128000000, 256000000, 512000000, 768000000, 1024000000 ]
  default_value: 1024000000
  type: int

hive.stats.fetch.column.stats: # Whether the Hive query optimizer fetches column-level statistics to help optimize the query plan
  choices: [ "true", "false" ]
  default_value: "true"
  type: boolean

hive.auto.convert.join: # Whether to enable Hive automatic join conversion
  choices: [ "true", "false" ]
  default_value: "true"
  type: boolean

hive.optimize.skewjoin: # Whether to enable Hive skew join optimization
  choices: [ "true", "false" ]
  default_value: "true"
  type: boolean

hive.exec.compress.output: # Whether to compress the final output
  choices: [ "true", "false" ]
  default_value: "true"
  type: boolean

hive.exec.compress.intermediate: # Whether to compress the intermediate temporary files of MapReduce
  choices: [ "true", "false" ]
  default_value: "true"
  type: boolean

hive.exec.parallel.thread.number: # Number of tasks that can run in parallel
  choices: [ 4, 8, 16, 32, 64, 96 ]
  default_value: 8
  type: int

hive.auto.convert.join.noconditionaltask: # Whether a Common Join may be converted to a Map Join
  choices: [ "true", "false" ]
  default_value: "true"
  type: boolean

hive.auto.convert.join.noconditionaltask.size: # Threshold that decides whether a join can be converted to a Map Join; requires hive.auto.convert.join.noconditionaltask to be enabled
  choices: [ 10000000, 100000000, 1000000000, 2000000000, 5000000000, 6000000000 ]
  default_value: 10000000
  type: int

hive.limit.optimize.enable: # Whether to enable optimization for LIMIT
  choices: [ "true", "false" ]
  default_value: "true"
  type: boolean

Press “Esc”, type :wq!, and press “Enter” to save the file and exit.

Confirm the default Hive conf in the configuration.

Suppose the following Hive SQL job (TPC-DS sql12) is to be tuned:

         
              hive --hiveconf hive.cbo.enable=true --hiveconf hive.exec.reducers.max=600 --hiveconf hive.exec.compress.intermediate=true --hiveconf hive.tez.container.size=8192 --hiveconf tez.am.resource.memory.mb=8192 --hiveconf tez.task.resource.memory.mb=8192 --hiveconf tez.runtime.io.sort.mb=128 --hiveconf hive.merge.tezfiles=true --hiveconf tez.am.container.reuse.enabled=true --hiveconf hive.session.id=sql12 --database tpcds_bin_partitioned_decimal_orc_100 -f /home/hive-tpcds/sql12.sql

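The related settings fall into two groups: tuning parameters and non-tuning parameters. Tuning parameters are those the user has chosen to tune. Non-tuning parameters are those that cannot be tuned or that the user does not need to tune.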

For the Hive SQL job above, the tuning parameters are:

           
                --hiveconf hive.exec.reducers.max=600 --hiveconf hive.exec.compress.intermediate=true --hiveconf hive.tez.container.size=8192 --hiveconf tez.am.resource.memory.mb=8192 --hiveconf tez.task.resource.memory.mb=8192 --hiveconf tez.runtime.io.sort.mb=128

The non-tuning parameters are:

           
                --hiveconf hive.cbo.enable=true --hiveconf tez.am.container.reuse.enabled=true --hiveconf hive.merge.tezfiles=true
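The split above can be reproduced mechanically. The sketch below is an illustration only: `split_hiveconf` is a hypothetical helper, the set of tuned names is supplied by hand rather than read from hive_config.yml, and hive.session.id is skipped on the assumption that it belongs to neither group, as in the lists above. The command is shortened for readability.

```python
import shlex


def split_hiveconf(command, tuned_names, skip=("hive.session.id",)):
    """Split the --hiveconf options of a hive command into tuned and non-tuned groups."""
    tokens = shlex.split(command)
    tuned, untuned = [], []
    i = 0
    while i < len(tokens):
        if tokens[i] == "--hiveconf" and i + 1 < len(tokens):
            key = tokens[i + 1].split("=", 1)[0]
            if key not in skip:
                (tuned if key in tuned_names else untuned).append(tokens[i + 1])
            i += 2  # consume the flag and its key=value argument
        else:
            i += 1
    return tuned, untuned


command = (
    "hive --hiveconf hive.cbo.enable=true --hiveconf hive.exec.reducers.max=600 "
    "--hiveconf hive.merge.tezfiles=true --hiveconf hive.session.id=sql12 "
    "-f /home/hive-tpcds/sql12.sql"
)
tuned, untuned = split_hiveconf(command, tuned_names={"hive.exec.reducers.max"})

# The non-tuned options, joined back into --hiveconf form.
non_tuning_line = " ".join(f"--hiveconf {kv}" for kv in untuned)
print(tuned)
print(non_tuning_line)
```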

Add the non-tuning parameters to the “$OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/config/common_config.cfg” configuration file.

Open the “$OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/config/common_config.cfg” configuration file.
```
vi $OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/config/common_config.cfg
```

Press “i” to enter edit mode and append the non-tuning parameters to the hive_default_config field.

           
    # Hive default parameters. Default parameters generally do not take part in parameter sampling.
    hive_default_config = --hiveconf hive.cbo.enable=true --hiveconf tez.am.container.reuse.enabled=true --hiveconf hive.merge.tezfiles=true

Press “Esc”, type :wq!, and press “Enter” to save the file and exit.

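On the management node, initialize the database and synchronize the parameter configuration to the log parsing module.
1. Use the OmniAdvisor parameter tuning CLI to select the Hive engine.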
```
python $OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/main.pyc
```
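2. Enter init_environment, or press “Tab” to view the prompts and select init_environment, then press “Enter” to run it.
  After it succeeds, the history_config and best_config tables are created in the test_advisor database.
  
  This step also synchronizes the tuning parameters in hive_config.yml to the configuration of the log parsing module. Whenever the Hive tuning parameters change, run init_environment again to synchronize the configuration to the log parsing module.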

Call the log parsing module and write the parsed data to the database.

Open the “$OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/config/common_config.cfg” configuration file.
```
vi $OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/config/common_config.cfg
```

Press “i” to enter edit mode and modify the log start and end times. For a description of the parameters in common_config.cfg, see common_config.cfg.

         
    [hive]
    # Start time of the Tez run logs. The date can be viewed on the Hadoop UI.
    log_start_time = 2023-09-14 19:12:45
    # End time of the Tez run logs
    log_end_time = 2023-09-14 19:19:45
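A quick sanity check on the time window can avoid collecting an empty set of logs. The following sketch assumes that the [hive] section can be read as INI-style text by Python's `configparser` and that the timestamps use the format shown above. `read_log_window` is a hypothetical helper, and the text is an inline sample rather than the real common_config.cfg.

```python
import configparser
from datetime import datetime

# Inline sample that mirrors the [hive] section shown above.
SAMPLE = """
[hive]
# Start time of the Tez run logs
log_start_time = 2023-09-14 19:12:45
# End time of the Tez run logs
log_end_time = 2023-09-14 19:19:45
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def read_log_window(text):
    """Parse the log window and verify that the start precedes the end."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    start = datetime.strptime(parser["hive"]["log_start_time"], TIME_FORMAT)
    end = datetime.strptime(parser["hive"]["log_end_time"], TIME_FORMAT)
    if start >= end:
        raise ValueError("log_start_time must be earlier than log_end_time")
    return start, end


start, end = read_log_window(SAMPLE)
print("log window length:", end - start)
```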

Press “Esc”, type :wq!, and press “Enter” to save the file and exit.
Run the collection command.
Use the OmniAdvisor parameter tuning CLI to select the Hive engine.
```
python $OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/main.pyc
```
Enter fetch_history_data, or press “Tab” to view the prompts and select fetch_history_data, then press “Enter” to run it.

After the historical task information is parsed successfully, the results are written to the history_config and best_config tables.

Sample parameters for historical tasks.
1. Use the OmniAdvisor parameter tuning CLI to select the Hive engine.
```
python $OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/main.pyc
```
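2. Enter parameter_sampling, or press “Tab” to view the prompts and select parameter_sampling, then press “Enter” to run it.
3. Enter a number to specify the number of parameter sampling rounds (n).
Perform parameter tuning.
- Enter “yes” to run n rounds of parameter sampling for all tunable tasks in the database, then wait for sampling to finish.
- Enter “no” to filter the listed tasks: enter the identification of each task to be sampled, separating multiple tasks with commas (,), and press “Enter” to run n rounds of parameter sampling for the specified tasks. Then wait for sampling to finish.
- After each sampling run completes, the log parsing module is called to parse task information such as the task status and run time for that set of parameters and save it to the history_config table. The best configuration and related information in the best_config table are refreshed at the same time.
- Parameters can be recommended for a task only after its parameter sampling has completed.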
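The sample, run, record, and refresh cycle described in these bullets can be pictured with a small loop. This is a conceptual sketch only: the source does not describe OmniAdvisor's sampling algorithm, so uniform random selection from choices is used here as a stand-in, `fake_run` is a made-up cost model replacing a real Hive run, and the in-memory `history` and `best` stand in for the history_config and best_config tables.

```python
import random

# A reduced search space in the same shape as the choices lists in hive_config.yml.
SEARCH_SPACE = {
    "hive.exec.reducers.max": [100, 200, 400, 600, 800, 1000],
    "hive.auto.convert.join": ["true", "false"],
}


def to_hiveconf(params):
    """Render a parameter set as --hiveconf options."""
    return " ".join(f"--hiveconf {k}={v}" for k, v in params.items())


def sample_rounds(space, rounds, run_task, seed=0):
    """Run `rounds` sampling rounds; return the full history and the best record."""
    rng = random.Random(seed)
    history, best = [], None
    for _ in range(rounds):
        # Stand-in strategy: pick one value per parameter uniformly at random.
        params = {name: rng.choice(choices) for name, choices in space.items()}
        record = {"params": params, "runtime": run_task(params)}
        history.append(record)  # analogous to a row in history_config
        if best is None or record["runtime"] < best["runtime"]:
            best = record  # analogous to refreshing best_config
    return history, best


def fake_run(params):
    """Hypothetical cost model; a real round would submit the Hive task and time it."""
    penalty = 0 if params["hive.auto.convert.join"] == "true" else 5
    return abs(params["hive.exec.reducers.max"] - 800) / 100 + penalty


history, best = sample_rounds(SEARCH_SPACE, rounds=5, run_task=fake_run)
print(len(history), "rounds; best so far:", to_hiveconf(best["params"]))
```

The loop also shows why recommendation depends on sampling: until at least one round has been recorded, there is no best record to recommend.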
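Recommend the best-performing parameters found during sampling and use them to run the task.
1. Use the OmniAdvisor parameter tuning CLI to select the Hive engine.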
```
python $OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/main.pyc
```
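2. Enter parameter_recommend, or press “Tab” to view the prompts and select parameter_recommend, then press “Enter” to run it.
3. Enter the Hive task command to be tuned and submit the Hive task through OmniAdvisor parameter tuning.
  The SQL scenario uses sql12.sql as an example.
4. Use OmniAdvisor parameter tuning to quickly recommend parameters and submit the task in a single command.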
```
python $OMNIADVISOR_HOME/BoostKit-omniadvisor_1.1.0/main.pyc -e hive -i parameter_recommend -c "hive --hiveconf hive.cbo.enable=true --hiveconf hive.exec.reducers.max=600 --hiveconf hive.exec.compress.intermediate=true --hiveconf hive.tez.container.size=8192 --hiveconf tez.am.resource.memory.mb=8192 --hiveconf tez.task.resource.memory.mb=8192 --hiveconf tez.runtime.io.sort.mb=128 --hiveconf hive.merge.tezfiles=true --hiveconf tez.am.container.reuse.enabled=true --hiveconf hive.session.id=sql12 --database tpcds_bin_partitioned_decimal_orc_100 -f /home/hive_sql/sample-queries-tpcds/query12.sql"
```
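When the one-line form above is called from a scheduler or wrapper script, building the argument list programmatically avoids nested shell quoting. The sketch below uses only the options shown in the command above (-e, -i, -c). `build_recommend_argv` is a hypothetical helper, the fallback path /opt/omniadvisor is a placeholder, and the Hive command is shortened.

```python
import os
import shlex


def build_recommend_argv(omniadvisor_home, hive_command):
    """Build the argument list for a one-shot parameter_recommend call."""
    main = os.path.join(omniadvisor_home, "BoostKit-omniadvisor_1.1.0", "main.pyc")
    return ["python", main, "-e", "hive", "-i", "parameter_recommend", "-c", hive_command]


hive_command = (
    "hive --hiveconf hive.exec.reducers.max=600 --hiveconf hive.session.id=sql12 "
    "--database tpcds_bin_partitioned_decimal_orc_100 "
    "-f /home/hive_sql/sample-queries-tpcds/query12.sql"
)

# /opt/omniadvisor is a placeholder used only when OMNIADVISOR_HOME is not set.
home = os.environ.get("OMNIADVISOR_HOME", "/opt/omniadvisor")
argv = build_recommend_argv(home, hive_command)

# Print a shell-safe rendering; the whole Hive command stays a single -c argument.
print(shlex.join(argv))
# To submit for real, pass argv to subprocess.run(argv, check=True).
```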
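- During parameter recommendation, the identification value of the task is computed according to the type set in identification_type in the configuration. The value is matched against the best parameters in best_config, which replace the original parameters when the task is submitted to Hive.
- If no match is found in the best_config table, or the matched parameters fail to run, the task falls back to the originally submitted parameters.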
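The match, replace, and fall back behavior in these two bullets can be sketched as follows. This is an illustration under stated assumptions, not OmniAdvisor's implementation: the identification is passed in directly instead of being derived from identification_type, best_config is a plain dict, only --hiveconf options already present in the command are replaced, and `run` is a caller-supplied function that returns True on success.

```python
import shlex


def apply_best_params(command, best_params):
    """Replace the values of --hiveconf options that appear in best_params."""
    tokens = shlex.split(command)
    for i in range(len(tokens) - 1):
        if tokens[i] == "--hiveconf":
            key = tokens[i + 1].split("=", 1)[0]
            if key in best_params:
                tokens[i + 1] = f"{key}={best_params[key]}"
    return tokens


def submit_with_fallback(command, identification, best_config, run):
    """Try the recommended parameters first; fall back to the original command."""
    best_params = best_config.get(identification)
    if best_params is not None and run(apply_best_params(command, best_params)):
        return "recommended"
    # No match in best_config, or the recommended run failed.
    run(shlex.split(command))
    return "original"


cmd = "hive --hiveconf hive.exec.reducers.max=600 -f query12.sql"
best_config = {"sql12": {"hive.exec.reducers.max": 900}}  # hypothetical contents

result = submit_with_fallback(cmd, "sql12", best_config, run=lambda argv: True)
print(result)
```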
