(Optional) Installing the Spark/Hive UDF Plugin

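If you need specific data processing operations, you can push UDFs down to the OmniData operator pushdown service. Doing so requires installing the UDF dependency package and configuring the Hive UDF plugin. Upload the UDF JAR package that matches your environment.

The following steps use huawei-udf as an example of installing the UDF dependency package.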

Upload huawei_udf.jar to HDFS.

       
    hdfs dfs -mkdir -p /user/BICoreData/hive/fiudflib2/
    hdfs dfs -put huawei_udf.jar /user/BICoreData/hive/fiudflib2/
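
To confirm the upload before moving on, a check along the following lines may help. This is a minimal sketch rather than part of the documented procedure: it reuses the directory and JAR name from the commands above, the variable names are illustrative, and it assumes the `hdfs` client is on the PATH (the check is skipped otherwise).

```shell
# Illustrative variable names; the values come from the upload commands above.
UDF_DIR=/user/BICoreData/hive/fiudflib2
UDF_JAR=huawei_udf.jar
UDF_HDFS_PATH="$UDF_DIR/$UDF_JAR"

if command -v hdfs >/dev/null 2>&1; then
  # `hdfs dfs -test -e` exits with status 0 when the path exists.
  if hdfs dfs -test -e "$UDF_HDFS_PATH"; then
    echo "found: $UDF_HDFS_PATH"
  else
    echo "missing: $UDF_HDFS_PATH"
  fi
else
  echo "hdfs client not found on PATH; skipping check"
fi
```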

Before running a UDF, register it with the metastore. There are several ways to register a UDF; this section uses isEmpty as an example.

       
    set spark.sql.ndp.udf.whitelist=isEmpty;
    CREATE TEMPORARY FUNCTION isEmpty AS 'com.huawei.platform.bi.udf.common.IsEmptyUDF' USING JAR 'hdfs:/user/BICoreData/hive/fiudflib2/huawei_udf.jar';
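
A function created with CREATE TEMPORARY FUNCTION is visible only in the session that created it, so a quick check has to run in the same spark-sql invocation. The sketch below is one possible way to do that, not a documented step: it assumes the spark-sql path that appears in the run step of this section, relies on Spark SQL's DESCRIBE FUNCTION statement, and uses illustrative variable names.

```shell
# Assumed install path, taken from the run step of this section.
SPARK_SQL=/usr/local/spark/bin/spark-sql

# Create the temporary function and describe it within a single session.
UDF_CHECK_SQL="CREATE TEMPORARY FUNCTION isEmpty AS 'com.huawei.platform.bi.udf.common.IsEmptyUDF' USING JAR 'hdfs:/user/BICoreData/hive/fiudflib2/huawei_udf.jar'; DESCRIBE FUNCTION isEmpty;"

if [ -x "$SPARK_SQL" ]; then
  "$SPARK_SQL" -e "$UDF_CHECK_SQL"
else
  # Without Spark installed, only show what would be run.
  echo "spark-sql not found at $SPARK_SQL; statements that would be run:"
  echo "$UDF_CHECK_SQL"
fi
```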

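Configure the Hive UDF plugin.
In hive.properties, the Hive UDF configuration file of the OmniData operator pushdown service, register the IsEmpty function with the class com.huawei.platform.bi.udf.common.IsEmptyUDF. For details, see "Adding the Hive UDF Plugin".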

Run the Spark Hive UDF operator pushdown.

       
    /usr/local/spark/bin/spark-sql --driver-class-path '/opt/boostkit/*' --jars '/opt/boostkit/*' --conf 'spark.executor.extraClassPath=./*' --name IsEmptyUDF.sql --driver-memory 50G --driver-java-options -Dlog4j.configuration=file:../conf/log4j.properties --executor-memory 32G --num-executors 30 --executor-cores 18 --properties-file tpch_query.conf --database tpch_flat_orc_date_5 -f IsEmptyUDF.sql
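
Because the command is long, it can be easier to review one option per line. The following is a sketch of a wrapper with the same arguments as the command above. The function name `run_isempty_udf` and the `DRY_RUN` variable are hypothetical helpers introduced here for illustration; by default the wrapper only prints the command, and Spark is launched only when `DRY_RUN=0` is set.

```shell
# Hypothetical switch: 1 prints the command, 0 actually launches spark-sql.
DRY_RUN="${DRY_RUN:-1}"

run_isempty_udf() {
  # Same options as the one-line command, one per line for readability.
  set -- /usr/local/spark/bin/spark-sql \
    --driver-class-path '/opt/boostkit/*' \
    --jars '/opt/boostkit/*' \
    --conf 'spark.executor.extraClassPath=./*' \
    --name IsEmptyUDF.sql \
    --driver-memory 50G \
    --driver-java-options -Dlog4j.configuration=file:../conf/log4j.properties \
    --executor-memory 32G \
    --num-executors 30 \
    --executor-cores 18 \
    --properties-file tpch_query.conf \
    --database tpch_flat_orc_date_5 \
    -f IsEmptyUDF.sql

  if [ "$DRY_RUN" = "1" ]; then
    # Print only (the original quoting is not shown in the printed form).
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run_isempty_udf
```

Running the script with `DRY_RUN=0` in the environment executes the command instead of printing it.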

The execution result is as follows.

The content of the IsEmptyUDF.sql file is as follows.

         
    set spark.sql.ndp.udf.whitelist=isEmpty;
    CREATE TEMPORARY FUNCTION isEmpty AS 'com.huawei.platform.bi.udf.common.IsEmptyUDF' USING JAR 'hdfs:/user/BICoreData/hive/fiudflib2/huawei_udf.jar';
    select sum(l_extendedprice) as sum_base_price from lineitem where !isEmpty(l_shipmode);
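
If the script file does not exist yet, it could be created from the shell as sketched below. This simply writes the three statements shown above into IsEmptyUDF.sql in the current directory; the sanity check at the end is an illustrative addition.

```shell
# The quoted 'EOF' delimiter stops the shell from expanding anything inside the block.
cat > IsEmptyUDF.sql <<'EOF'
set spark.sql.ndp.udf.whitelist=isEmpty;
CREATE TEMPORARY FUNCTION isEmpty AS 'com.huawei.platform.bi.udf.common.IsEmptyUDF' USING JAR 'hdfs:/user/BICoreData/hive/fiudflib2/huawei_udf.jar';
select sum(l_extendedprice) as sum_base_price from lineitem where !isEmpty(l_shipmode);
EOF

# Sanity check: count the lines that contain a statement terminator.
grep -c ';' IsEmptyUDF.sql
```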
