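Running the Task
Machine Learning
On the client, download the dataset used by the sample code in the development procedure, decompress it into the "/tmp/data/epsilon" directory, and run the task. The steps are as follows:
- Go to the "/tmp/data/epsilon" directory.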
cd /tmp/data/epsilon
- Download the training set.
wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/epsilon_normalized.bz2
- Download the test set.
wget https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/epsilon_normalized.t.bz2
- Decompress the training set and test set into the current directory.
bzip2 -d epsilon_normalized.bz2
bzip2 -d epsilon_normalized.t.bz2
- Upload the training set and test set to HDFS.
hadoop fs -put /tmp/data/epsilon/epsilon_normalized /tmp/data/epsilon/
hadoop fs -put /tmp/data/epsilon/epsilon_normalized.t /tmp/data/epsilon/
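The preparation steps above can be collected into a single script. The following is a minimal sketch and not part of the sample project: the helper names `run` and `prepare_epsilon` and the `DRY_RUN` switch are hypothetical, and the two `mkdir -p` calls are added on the assumption that the local and HDFS target directories may not exist yet. With `DRY_RUN=1` the commands are only printed, which allows a review before anything is downloaded.

```shell
#!/bin/sh
# Sketch only: bundles the dataset preparation steps into one function.
# DATA_DIR and BASE_URL come from the steps above; all other names are illustrative.
DATA_DIR="/tmp/data/epsilon"
BASE_URL="https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

prepare_epsilon() {
    # Assumption: the local directory may not exist yet.
    run mkdir -p "$DATA_DIR" || return 1
    run cd "$DATA_DIR" || return 1
    # Download and decompress the training set, then the test set.
    for name in epsilon_normalized epsilon_normalized.t; do
        run wget "$BASE_URL/$name.bz2" || return 1
        run bzip2 -d "$name.bz2" || return 1
    done
    # Assumption: the HDFS directory may not exist yet.
    run hadoop fs -mkdir -p "$DATA_DIR" || return 1
    for name in epsilon_normalized epsilon_normalized.t; do
        run hadoop fs -put "$DATA_DIR/$name" "$DATA_DIR/" || return 1
    done
}
```

To preview the commands, set `DRY_RUN=1` and call `prepare_epsilon`. To execute them, call `prepare_epsilon` with `DRY_RUN` unset.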
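- Place the kal_examples_2.11-0.1.jar and run_gbdt.sh files generated in the task execution procedure in the "/home/test/sophon/" directory on the client described in section 3.2. The content of run_gbdt.sh is as follows: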
spark-submit \
--class com.bigdata.examples.GBDTRunner \
--driver-class-path "./lib/*" \
--jars "lib/fastutil-8.3.1.jar,lib/sophon-ml-acc_2.11-1.2.0.jar,lib/sophon-ml-core_2.11-1.2.0.jar,lib/sophon-ml-kernel-2.11-1.2.0-aarch_64.jar" \
--conf "spark.executor.extraClassPath=fastutil-8.3.1.jar:sophon-ml-acc_2.11-1.2.0.jar:sophon-ml-core_2.11-1.2.0.jar:sophon-ml-kernel-2.11-1.2.0-aarch_64.jar" \
--master yarn \
--deploy-mode client \
--driver-cores 40 \
--driver-memory 50g \
--executor-cores 19 --num-executors 12 --executor-memory 77g \
--numPartitions 228 \
./kal_examples_2.11-0.1.jar
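In run_gbdt.sh the same four JAR files are listed twice: comma-separated with a `lib/` prefix for `--jars`, and colon-separated without a prefix for `spark.executor.extraClassPath`. One way to keep the two lists consistent is to derive both from a single list. The sketch below is illustrative and not part of the sample project. The variable `JARS` and the helper `join_jars` are hypothetical names, and the kernel JAR is assumed to be named as it appears in the `extraClassPath` value. The last lines only note that 12 executors with 19 cores each give 228 cores, the same number passed to `--numPartitions`; whether that is how the value was chosen is an assumption.

```shell
#!/bin/sh
# Sketch only: derive both JAR lists used in run_gbdt.sh from one list.
# Assumption: the kernel JAR name follows the extraClassPath spelling.
JARS="fastutil-8.3.1.jar sophon-ml-acc_2.11-1.2.0.jar sophon-ml-core_2.11-1.2.0.jar sophon-ml-kernel-2.11-1.2.0-aarch_64.jar"

# join_jars SEPARATOR PREFIX
# Print every name in JARS with PREFIX prepended, joined by SEPARATOR.
join_jars() {
    sep=$1
    prefix=$2
    result=""
    for jar in $JARS; do
        if [ -z "$result" ]; then
            result="$prefix$jar"
        else
            result="$result$sep$prefix$jar"
        fi
    done
    printf '%s\n' "$result"
}

JARS_ARG=$(join_jars "," "lib/")   # value for --jars
CLASSPATH_ARG=$(join_jars ":" "")  # value for spark.executor.extraClassPath

# 12 executors x 19 cores per executor.
TOTAL_CORES=$((12 * 19))

echo "--jars \"$JARS_ARG\""
echo "--conf \"spark.executor.extraClassPath=$CLASSPATH_ARG\""
echo "total executor cores: $TOTAL_CORES"
```

The two generated strings can then be substituted into the `spark-submit` command in place of the hand-written lists.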
- Run the task.
sh run_gbdt.sh
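View the results printed on the screen.
- Result description: the task runs 100 iterations and generates 100 subtrees in total. Three of them (Tree 0, Tree 1, and Tree 99) are shown below: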
Test Error = 0.1687594970527827 // classification error of the predictions
Learned classification GBT model:
GBTClassificationModel (uid=gbtc_dbb4de23ca65) with 100 trees
  Tree 0 (weight 1.0): // weight of each subtree
    If (feature 818 <= 0.0028371200000000003) // the split point of feature 818 is 0.0028371200000000003
     If (feature 1866 <= 0.0064008599999999995)
      If (feature 315 <= -0.0067819)
       If (feature 789 <= -0.0100215)
        If (feature 936 <= 0.0018099549999999998)
         Predict: -0.21098494850805388 // prediction of subtree 0
        Else (feature 936 > 0.0018099549999999998)
         Predict: 0.15191210648637227
       Else (feature 789 > -0.0100215)
        If (feature 936 <= -4.79549E-4)
         Predict: 0.17726731948384736
        Else (feature 936 > -4.79549E-4)
         Predict: 0.49173760640961445
      Else (feature 315 > -0.0067819)
       If (feature 789 <= -0.0100215)
        If (feature 936 <= 0.00412001)
         Predict: -0.5011764705882353
        Else (feature 936 > 0.00412001)
         Predict: -0.21027097384924862
       Else (feature 789 > -0.0100215)
        If (feature 649 <= -0.008577419999999999)
         Predict: -0.20268122451800152
        Else (feature 649 > -0.008577419999999999)
         Predict: 0.12942312334057446
     Else (feature 1866 > 0.0064008599999999995)
      If (feature 1697 <= 0.02054865)
       If (feature 649 <= -0.00122175)
        If (feature 315 <= 1.620675E-4)
         Predict: 0.31944546321425954
        Else (feature 315 > 1.620675E-4)
         Predict: -0.014997100008285691
       Else (feature 649 > -0.00122175)
        If (feature 315 <= 0.01095895)
         Predict: 0.5328728914862532
        Else (feature 315 > 0.01095895)
         Predict: 0.2712697181277476
      Else (feature 1697 > 0.02054865)
       If (feature 649 <= -0.008577419999999999)
        If (feature 315 <= 1.620675E-4)
         Predict: 0.5659284497444633
        Else (feature 315 > 1.620675E-4)
         Predict: 0.29297616536595655
       Else (feature 649 > -0.008577419999999999)
        If (feature 1519 <= -0.0024157199999999997)
         Predict: 0.5493390716261912
        Else (feature 1519 > -0.0024157199999999997)
         Predict: 0.7277585664885257
    Else (feature 818 > 0.0028371200000000003)
     If (feature 1866 <= 0.008616374999999999)
      If (feature 789 <= -0.0100215)
       If (feature 1794 <= -0.015113399999999999)
        If (feature 315 <= -0.009021685)
         Predict: -0.14581734458940906
        Else (feature 315 > -0.009021685)
         Predict: -0.43055000665867627
       Else (feature 1794 > -0.015113399999999999)
        If (feature 649 <= -0.008577419999999999)
         Predict: -0.6742799137165334
        Else (feature 649 > -0.008577419999999999)
         Predict: -0.466970082323807
      Else (feature 789 > -0.0100215)
       If (feature 755 <= 0.00919656)
        If (feature 315 <= -0.002158415)
         Predict: 0.04372444164831708
        Else (feature 315 > -0.002158415)
         Predict: -0.2799099183635169
       Else (feature 755 > 0.00919656)
        If (feature 1697 <= 0.0110916)
         Predict: -0.5381727158948686
        Else (feature 1697 > 0.0110916)
         Predict: -0.2597821083320546
     Else (feature 1866 > 0.008616374999999999)
      If (feature 789 <= -0.0144677)
       If (feature 315 <= -0.00447581)
        If (feature 755 <= -0.0100993)
         Predict: 0.22025316455696203
        Else (feature 755 > -0.0100993)
         Predict: -0.1234739607479524
       Else (feature 315 > -0.00447581)
        If (feature 936 <= 0.0018099549999999998)
         Predict: -0.517205957883924
        Else (feature 936 > 0.0018099549999999998)
         Predict: -0.18999735379730087
      Else (feature 789 > -0.0144677)
       If (feature 315 <= 0.002566035)
        If (feature 649 <= -0.01302385)
         Predict: 0.11884615384615385
        Else (feature 649 > -0.01302385)
         Predict: 0.43213844252163164
       Else (feature 315 > 0.002566035)
        If (feature 936 <= 0.0018099549999999998)
         Predict: -0.1847658260422028
        Else (feature 936 > 0.0018099549999999998)
         Predict: 0.14976641934597418
  Tree 1 (weight 0.1):
    If (feature 1389 <= -0.0025451)
     If (feature 994 <= -0.00340289)
      If (feature 1814 <= -4.766875E-4)
.....
  Tree 99 (weight 0.1):
    If (feature 539 <= 0.0201035)
     If (feature 994 <= -0.0271962)
      If (feature 738 <= 0.020128649999999998)
       If (feature 1678 <= 0.001967885)
        If (feature 830 <= -0.01119755)
         Predict: -0.10394440691166018
        Else (feature 830 > -0.01119755)
         Predict: 0.06684711260628788
       Else (feature 1678 > 0.001967885)
        If (feature 958 <= -0.01238975)
         Predict: -0.05886208031154687
        Else (feature 958 > -0.01238975)
         Predict: 0.19195532805094076
......
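Because the model dump is long, it can help to save the output to a file and extract only the summary figures. The following sketch assumes the output was captured with, for example, `sh run_gbdt.sh | tee gbdt.log`. The file name `gbdt.log` and the helper names `test_error`, `accuracy`, and `tree_count` are hypothetical. It also assumes that the log contains a line of the form `Test Error = <value>` and one `Tree <n> (weight <w>):` header line per tree, and that accuracy is simply 1 minus the reported test error.

```shell
#!/bin/sh
# Sketch only: pull summary figures out of a saved run_gbdt.sh log.

# Print the value from the first "Test Error = <value>" line of the log.
test_error() {
    awk '$1 == "Test" && $2 == "Error" && $3 == "=" { print $4; exit }' "$1"
}

# Print 1 - test error, rounded to four decimal places.
accuracy() {
    test_error "$1" | awk '{ printf "%.4f\n", 1 - $1 }'
}

# Count the "Tree <n> (weight <w>):" header lines in the log.
tree_count() {
    grep -c '^ *Tree [0-9][0-9]* (weight ' "$1"
}
```

For example, `test_error gbdt.log` prints the error value, and `tree_count gbdt.log` can be compared with the expected number of iterations.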