Building the Adaptation Code for the Machine Learning Spark Algorithm Library

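The procedure for building Spark-ml-algo-lib, the adaptation code of the machine learning algorithm acceleration library, is as follows. It uses the build of the code adapted to Spark 2.3.2 as the example. The build for Spark 2.4.6 is similar and can follow the same procedure.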

  1. Download the Spark 2.3.2 source zip package to the "/opt/" directory and decompress it to obtain the Spark source directory "/opt/spark-2.3.2".

    Download URL: https://github.com/apache/spark/archive/v2.3.2.zip

    cd /opt/
    wget https://github.com/apache/spark/archive/v2.3.2.zip
    unzip v2.3.2.zip
    
  2. Download the Breeze 0.13.1 source zip package to the "/opt/" directory and decompress it to obtain the Breeze source directory "/opt/breeze-releases-v0.13.1".

    Download URL: https://github.com/scalanlp/breeze/archive/releases/v0.13.1.zip

    cd /opt/
    wget https://github.com/scalanlp/breeze/archive/releases/v0.13.1.zip
    unzip v0.13.1.zip
    
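  3. Download the xgboost 1.1.0 source package to the "/opt/" directory and decompress it to obtain the xgboost source directory "/opt/xgboost-1.1.0".
    cd /opt/
    wget -O xgboost-1.1.0.zip https://github.com/dmlc/xgboost/archive/refs/tags/v1.1.0.zip
    unzip xgboost-1.1.0.zip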
    
  4. Download the cub source package to the "/opt/xgboost-1.1.0" directory and decompress it to obtain the cub source directory "/opt/xgboost-1.1.0/cub-b20808b1b04ec3d6a625e51fbc1eb76f337754ad". Then delete the "/opt/xgboost-1.1.0/cub" directory and rename "/opt/xgboost-1.1.0/cub-b20808b1b04ec3d6a625e51fbc1eb76f337754ad" to "/opt/xgboost-1.1.0/cub".
    cd /opt/xgboost-1.1.0
    wget -O cub-b20808b1b04ec3d6a625e51fbc1eb76f337754ad.zip https://github.com/NVlabs/cub/archive/b20808b1b04ec3d6a625e51fbc1eb76f337754ad.zip
    unzip cub-b20808b1b04ec3d6a625e51fbc1eb76f337754ad.zip
    rm -rf cub
    mv cub-b20808b1b04ec3d6a625e51fbc1eb76f337754ad cub
    
  5. Download the dmlc-core source package to the "/opt/xgboost-1.1.0" directory and decompress it to obtain the dmlc-core source directory "/opt/xgboost-1.1.0/dmlc-core-5df8305fe699d3b503d10c60a231ab0223142407". Then delete the "/opt/xgboost-1.1.0/dmlc-core" directory and rename "/opt/xgboost-1.1.0/dmlc-core-5df8305fe699d3b503d10c60a231ab0223142407" to "/opt/xgboost-1.1.0/dmlc-core".
    cd /opt/xgboost-1.1.0
    wget -O dmlc-core-5df8305fe699d3b503d10c60a231ab0223142407.zip https://github.com/dmlc/dmlc-core/archive/5df8305fe699d3b503d10c60a231ab0223142407.zip
    unzip dmlc-core-5df8305fe699d3b503d10c60a231ab0223142407.zip
    rm -rf dmlc-core
    mv dmlc-core-5df8305fe699d3b503d10c60a231ab0223142407 dmlc-core
    
  6. Download the rabit source package to the "/opt/xgboost-1.1.0" directory and decompress it to obtain the rabit source directory "/opt/xgboost-1.1.0/rabit-4fb34a008db6437c84d1877635064e09a55c8553". Then delete the "/opt/xgboost-1.1.0/rabit" directory and rename "/opt/xgboost-1.1.0/rabit-4fb34a008db6437c84d1877635064e09a55c8553" to "/opt/xgboost-1.1.0/rabit".
    cd /opt/xgboost-1.1.0
    wget -O rabit-4fb34a008db6437c84d1877635064e09a55c8553.zip https://github.com/dmlc/rabit/archive/4fb34a008db6437c84d1877635064e09a55c8553.zip
    unzip rabit-4fb34a008db6437c84d1877635064e09a55c8553.zip
    rm -rf rabit
    mv rabit-4fb34a008db6437c84d1877635064e09a55c8553 rabit
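Steps 4 to 6 repeat one pattern for cub, dmlc-core, and rabit: download a commit archive, unzip it, delete the existing directory, and rename the unpacked one. The following is a minimal sketch of that pattern as shell functions. The names `archive_url` and `swap_in_dir` are hypothetical helpers introduced only for this illustration, and the download and unzip are still done with `wget` and `unzip` as in the steps.

```shell
# archive_url REPO COMMIT
# Print the GitHub archive URL for a repository at a given commit,
# in the same form as the URLs used in steps 4 to 6.
archive_url() {
  printf 'https://github.com/%s/archive/%s.zip\n' "$1" "$2"
}

# swap_in_dir ROOT NAME COMMIT   (hypothetical helper)
# Replace ROOT/NAME with the unpacked directory ROOT/NAME-COMMIT.
# Nothing is deleted if the unpacked directory is missing.
swap_in_dir() (
  root=$1 name=$2 commit=$3
  if [ ! -d "$root/$name-$commit" ]; then
    echo "not found: $root/$name-$commit" >&2
    exit 1
  fi
  rm -rf "$root/$name"
  mv "$root/$name-$commit" "$root/$name"
)
```

For example, once the rabit archive has been unzipped in /opt/xgboost-1.1.0, `swap_in_dir /opt/xgboost-1.1.0 rabit 4fb34a008db6437c84d1877635064e09a55c8553` would perform the delete and rename of step 6. Checking for the unpacked directory first avoids deleting the old directory when the download or unzip has failed.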
    
  7. In the "/opt/" directory, create the Spark-ml-algo-lib project with the directory hierarchy shown below.

    cd /opt/
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/breeze/optimize
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/breeze/numerics
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/optim/aggregator
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/optim/loss
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/recommendation
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/stat
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/fpm
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/distributed
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/stat/correlation
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/tree
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/clustering
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/tree/impurity
    cp -r xgboost-1.1.0 Spark-ml-algo-lib/ml-xgboost
    
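  8. Copy the required source files from Spark 2.3.2 and Breeze 0.13.1 to the Spark-ml-algo-lib directory according to the mappings in Table 1 and Table 2. In each table, the two left columns give the target directory and file name, and the two right columns give the directory and file name of the original file to copy. Delete the unneeded parts of the native xgboost code according to Table 3, and then copy the remaining code to the "Spark-ml-algo-lib/ml-xgboost" directory. Rename some directories according to Table 4, where the first column is the current directory name and the second column is the name after renaming. Because many files need to be copied, only two example commands are given.

    Some files must be renamed after being copied to the target directory.

    Example commands:
    cp /opt/spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala /opt/Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala

    cp /opt/breeze-releases-v0.13.1/math/src/main/scala/breeze/optimize/FirstOrderMinimizer.scala /opt/Spark-ml-algo-lib/ml-accelerator/src/main/scala/breeze/optimize/FirstOrderMinimizerX.scala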
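Because the tables in this step list dozens of files, typing one `cp` per row is easy to get wrong. The following is a minimal sketch of a table-driven copy, assuming the rows are first transcribed into plain-text `source destination` pairs, one pair per line, with paths that contain no spaces. The function name `copy_mapped` and the mapping format are assumptions made for this illustration and are not part of the original procedure.

```shell
# copy_mapped SRC_ROOT DST_ROOT   (hypothetical helper)
# Read "source destination" pairs from standard input, one per line, and
# copy SRC_ROOT/source to DST_ROOT/destination. Blank lines and lines
# starting with '#' are skipped. Returns non-zero if any copy fails.
copy_mapped() (
  src_root=$1 dst_root=$2 status=0
  while read -r src dst; do
    case $src in ''|'#'*) continue ;; esac
    # Create the target directory in case it does not exist yet.
    mkdir -p "$(dirname "$dst_root/$dst")"
    if ! cp "$src_root/$src" "$dst_root/$dst"; then
      echo "copy failed: $src" >&2
      status=1
    fi
  done
  exit "$status"
)

# Example: the two example commands of this step written as mapping rows.
#   printf '%s\n' \
#     'spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ml-accelerator/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala' \
#     'breeze-releases-v0.13.1/math/src/main/scala/breeze/optimize/FirstOrderMinimizer.scala ml-accelerator/src/main/scala/breeze/optimize/FirstOrderMinimizerX.scala' \
#     | copy_mapped /opt /opt/Spark-ml-algo-lib
```

Keeping the mapping in a text file makes renames such as FirstOrderMinimizer.scala to FirstOrderMinimizerX.scala explicit and lets the same list be reviewed against the tables before anything is copied.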
    
    Table 1 Spark files to be placed in the Spark-ml-algo-lib project

    An empty directory cell means the same directory as the row above.

    | Spark-ml-algo-lib project directory | Spark-ml-algo-lib project file name | Spark source directory | Spark source file name |
    | --- | --- | --- | --- |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | GBTClassifier.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/classification/ | GBTClassifier.scala |
    | | LinearSVC.scala | | LinearSVC.scala |
    | | RandomForestClassifier.scala | | RandomForestClassifier.scala |
    | | DecisionTreeClassifier.scala | | DecisionTreeClassifier.scala |
    | | LogisticRegression.scala | | LogisticRegression.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/optim/aggregator/ | DifferentiableLossAggregatorX.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/ | DifferentiableLossAggregator.scala |
    | | HingeAggregatorX.scala | | HingeAggregator.scala |
    | | HuberAggregatorX.scala | | HuberAggregator.scala |
    | | LeastSquaresAggregatorX.scala | | LeastSquaresAggregator.scala |
    | | LogisticAggregatorX.scala | | LogisticAggregator.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/optim/loss/ | RDDLossFunctionX.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/optim/loss/ | RDDLossFunction.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/recommendation/ | ALS.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/recommendation/ | ALS.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression/ | DecisionTreeRegressor.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/regression/ | DecisionTreeRegressor.scala |
    | | GBTRegressor.scala | | GBTRegressor.scala |
    | | LinearRegression.scala | | LinearRegression.scala |
    | | RandomForestRegressor.scala | | RandomForestRegressor.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/stat/ | Correlation.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/stat/ | Correlation.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTrees.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTrees.scala |
    | | NodeIdCache.scala | | NodeIdCache.scala |
    | | RandomForest.scala | | RandomForest.scala |
    | | RandomForest4GBDTX.scala | | RandomForest.scala |
    | | RandomForestRaw.scala | | RandomForest.scala |
    | | DecisionForest.scala | | RandomForest.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/ | treeParams.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/ | treeParams.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering/ | KMACCm.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | KMeans.scala |
    | | KMeans.scala | | KMeans.scala |
    | | LDA.scala | | LDA.scala |
    | | LDAOptimizer.scala | | LDAOptimizer.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpan.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpan.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/distributed/ | RowMatrix.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/ | RowMatrix.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/ | EigenValueDecomposition.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/linalg/ | EigenValueDecomposition.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/stat/correlation/ | Correlation.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/ | Correlation.scala |
    | | PearsonCorrelation.scala | | PearsonCorrelation.scala |
    | | SpearmanCorrelation.scala | | SpearmanCorrelation.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/tree/ | DecisionTree.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/tree/ | DecisionTree.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/ | Node.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/ | Node.scala |
    | | Split.scala | | Split.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | BaggedPoint.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | BaggedPoint.scala |
    | | DTFeatureStatsAggregator.scala | | DTStatsAggregator.scala |
    | | DTStatsAggregator.scala | | DTStatsAggregator.scala |
    | | GradientBoostedTreesCore.scala | | RandomForest.scala |
    | | TreePointX.scala | | TreePoint.scala |
    | | TreePointY.scala | | TreePoint.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/clustering/ | LDAUtilsX.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDAUtils.scala |
    | | OnlineLDAOptimizerXObj.scala | | LDAOptimizer.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm/ | LocalPrefixSpan.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | LocalPrefixSpan.scala |
    | | PrefixSpanBase.scala | | PrefixSpan.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/tree/impurity/ | Entropy.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/ | Entropy.scala |
    | | Gini.scala | | Gini.scala |
    | | Impurities.scala | | Impurities.scala |
    | | Impurity.scala | | Impurity.scala |
    | | Variance.scala | | Variance.scala |

    Table 2 Breeze files to be placed in the Spark-ml-algo-lib project

    An empty directory cell means the same directory as the row above.

    | Spark-ml-algo-lib project directory | Spark-ml-algo-lib project file name | Breeze source directory | Breeze source file name |
    | --- | --- | --- | --- |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/breeze/optimize/ | FirstOrderMinimizerX.scala | breeze-releases-v0.13.1/math/src/main/scala/breeze/optimize/ | FirstOrderMinimizer.scala |
    | | LBFGSX.scala | | LBFGS.scala |
    | | OWLQNX.scala | | OWLQN.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/breeze/numerics/ | DigammaX.scala | breeze-releases-v0.13.1/math/src/main/scala/breeze/numerics/ | package.scala |

    Table 3 Files and directories to be deleted from the native xgboost code

    | File or directory to delete |
    | --- |
    | xgboost-1.1.0/.github |
    | xgboost-1.1.0/cub/.settings |
    | xgboost-1.1.0/cub/.project |
    | xgboost-1.1.0/dmlc-core/.github |
    | xgboost-1.1.0/dmlc-core/make/config.mk |
    | xgboost-1.1.0/dmlc-core/test/unittest/sample.rec |
    | xgboost-1.1.0/doc/_static |
    | xgboost-1.1.0/rabit/lib |
    | xgboost-1.1.0/R-package/data |
    | xgboost-1.1.0/.gitignore |
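The deletions listed for the xgboost source tree can be scripted so that none is missed. The following is a minimal sketch, assuming the entries are fed one per line as paths relative to a root directory. The function name `prune_paths` is a hypothetical helper introduced for this illustration.

```shell
# prune_paths ROOT   (hypothetical helper)
# Read paths relative to ROOT from standard input, one per line, and delete
# each one that exists. Prints every path it removes.
prune_paths() (
  root=$1
  # Refuse to run against a missing root so that no unintended path is built.
  [ -d "$root" ] || exit 1
  while read -r rel; do
    [ -n "$rel" ] || continue
    if [ -e "$root/$rel" ]; then
      rm -rf "$root/$rel"
      echo "removed $rel"
    fi
  done
)

# Example with the listed entries, relative to /opt:
#   printf '%s\n' \
#     xgboost-1.1.0/.github xgboost-1.1.0/cub/.settings \
#     xgboost-1.1.0/cub/.project xgboost-1.1.0/dmlc-core/.github \
#     xgboost-1.1.0/dmlc-core/make/config.mk \
#     xgboost-1.1.0/dmlc-core/test/unittest/sample.rec \
#     xgboost-1.1.0/doc/_static xgboost-1.1.0/rabit/lib \
#     xgboost-1.1.0/R-package/data xgboost-1.1.0/.gitignore \
#     | prune_paths /opt
```

Because step 7 already copies xgboost-1.1.0 to ml-xgboost, the same deletions may also be needed in the copy, with the `xgboost-1.1.0/` prefix replaced by `ml-xgboost/` and the root set to /opt/Spark-ml-algo-lib. This is an inference from the order of the steps, not something the procedure states.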

    Table 4 Directories to be renamed

    | Current Spark-ml-algo-lib project directory | Directory name after renaming |
    | --- | --- |
    | Spark-ml-algo-lib/ml-xgboost/jvm-packages/xgboost4j | Spark-ml-algo-lib/ml-xgboost/jvm-packages/boostkit-xgboost4j |
    | Spark-ml-algo-lib/ml-xgboost/jvm-packages/xgboost4j-example | Spark-ml-algo-lib/ml-xgboost/jvm-packages/boostkit-xgboost4j-example |
    | Spark-ml-algo-lib/ml-xgboost/jvm-packages/xgboost4j-flink | Spark-ml-algo-lib/ml-xgboost/jvm-packages/boostkit-xgboost4j-flink |
    | Spark-ml-algo-lib/ml-xgboost/jvm-packages/xgboost4j-spark | Spark-ml-algo-lib/ml-xgboost/jvm-packages/boostkit-xgboost4j-spark |
    | Spark-ml-algo-lib/ml-xgboost/jvm-packages/xgboost4j-tester | Spark-ml-algo-lib/ml-xgboost/jvm-packages/boostkit-xgboost4j-tester |
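The five renames all add the same `boostkit-` prefix, so they can be done in one loop. The following is a minimal sketch, with the function name `add_boostkit_prefix` being a hypothetical helper. It takes the directory that contains the xgboost4j modules as its argument (upstream xgboost names that directory `jvm-packages`).

```shell
# add_boostkit_prefix DIR   (hypothetical helper)
# Rename the five xgboost4j module directories under DIR by adding the
# "boostkit-" prefix. Directories that are absent or already renamed
# are skipped.
add_boostkit_prefix() (
  cd "$1" || exit 1
  for d in xgboost4j xgboost4j-example xgboost4j-flink \
           xgboost4j-spark xgboost4j-tester; do
    [ -d "$d" ] || continue
    mv "$d" "boostkit-$d"
  done
)
```

The module names are listed explicitly rather than matched with a wildcard, so that only the five directories named in this step are touched and running the function a second time changes nothing.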

    After step 8 is complete, the directory structure of the Spark-ml-algo-lib project and the files in each directory are as follows:

    Spark-ml-algo-lib
    ├── ml-accelerator
    │   └── src
    │       └── main
    │           └── scala
    │               ├── breeze
    │               │   └── optimize
    │               │       ├── FirstOrderMinimizerX.scala
    │               │       ├── LBFGSX.scala
    │               │       └── OWLQNX.scala
    │               └── org
    │                   └── apache
    │                       └── spark
    │                           ├── ml
    │                           │   ├── classification
    │                           │   │   ├── DecisionTreeClassifier.scala
    │                           │   │   ├── GBTClassifier.scala
    │                           │   │   ├── LinearSVC.scala
    │                           │   │   ├── LogisticRegression.scala
    │                           │   │   └── RandomForestClassifier.scala
    │                           │   ├── optim
    │                           │   │   ├── aggregator
    │                           │   │   │   ├── DifferentiableLossAggregatorX.scala
    │                           │   │   │   ├── HingeAggregatorX.scala
    │                           │   │   │   ├── HuberAggregatorX.scala
    │                           │   │   │   ├── LeastSquaresAggregatorX.scala
    │                           │   │   │   └── LogisticAggregatorX.scala
    │                           │   │   └── loss
    │                           │   │       └── RDDLossFunctionX.scala
    │                           │   ├── recommendation
    │                           │   │   └── ALS.scala
    │                           │   ├── regression
    │                           │   │   ├── DecisionTreeRegressor.scala
    │                           │   │   ├── GBTRegressor.scala
    │                           │   │   ├── LinearRegression.scala
    │                           │   │   └── RandomForestRegressor.scala
    │                           │   ├── stat
    │                           │   │   └── Correlation.scala
    │                           │   └── tree
    │                           │       ├── impl
    │                           │       │   ├── DecisionForest.scala
    │                           │       │   ├── GradientBoostedTrees.scala
    │                           │       │   ├── NodeIdCache.scala
    │                           │       │   ├── RandomForest4GBDTX.scala
    │                           │       │   ├── RandomForestRaw.scala
    │                           │       │   └── RandomForest.scala
    │                           │       └── treeParams.scala
    │                           └── mllib
    │                               ├── clustering
    │                               │   ├── KMACCm.scala
    │                               │   ├── KMeans.scala
    │                               │   ├── LDAOptimizer.scala
    │                               │   └── LDA.scala
    │                               ├── fpm
    │                               │   └── PrefixSpan.scala
    │                               ├── linalg
    │                               │   ├── distributed
    │                               │   │   └── RowMatrix.scala
    │                               │   └── EigenValueDecomposition.scala
    │                               ├── stat
    │                               │   └── correlation
    │                               │       ├── Correlation.scala
    │                               │       ├── PearsonCorrelation.scala
    │                               │       └── SpearmanCorrelation.scala
    │                               └── tree
    │                                   └── DecisionTree.scala
    ├── ml-core
    │   └── src
    │       └── main
    │           └── scala
    │               ├── breeze
    │               │   └── numerics
    │               │       └── DigammaX.scala
    │               └── org
    │                   └── apache
    │                       └── spark
    │                           ├── ml
    │                           │   └── tree
    │                           │       ├── impl
    │                           │       │   ├── BaggedPoint.scala
    │                           │       │   ├── DTFeatureStatsAggregator.scala
    │                           │       │   ├── DTStatsAggregator.scala
    │                           │       │   ├── GradientBoostedTreesCore.scala
    │                           │       │   ├── TreePointX.scala
    │                           │       │   └── TreePointY.scala
    │                           │       ├── Node.scala
    │                           │       └── Split.scala
    │                           └── mllib
    │                               ├── clustering
    │                               │   ├── LDAUtilsX.scala
    │                               │   └── OnlineLDAOptimizerXObj.scala
    │                               ├── fpm
    │                               │   ├── LocalPrefixSpan.scala
    │                               │   └── PrefixSpanBase.scala
    │                               └── tree
    │                                   └── impurity
    │                                       ├── Entropy.scala
    │                                       ├── Gini.scala
    │                                       ├── Impurities.scala
    │                                       ├── Impurity.scala
    │                                       └── Variance.scala
    └── ml-xgboost
        └──.../...
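A quick sanity check at this point is to count the Scala files in each module and compare the result with the listing above, which shows 38 files under ml-accelerator and 18 under ml-core. The following is a minimal sketch with a hypothetical helper named `count_scala`. A matching count does not prove that every file has the right name, so it complements a visual comparison rather than replacing it.

```shell
# count_scala DIR   (hypothetical helper)
# Print the number of *.scala files anywhere under DIR.
count_scala() {
  find "$1" -type f -name '*.scala' | wc -l | tr -d ' '
}

# Example:
#   count_scala /opt/Spark-ml-algo-lib/ml-accelerator
#   count_scala /opt/Spark-ml-algo-lib/ml-core
```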
  9. Download Spark-ml-algo-lib.patch to the "/opt/Spark-ml-algo-lib/" directory and apply it to the project to obtain the complete adaptation code of the machine learning algorithm acceleration library, Spark-ml-algo-lib.
    cd /opt/Spark-ml-algo-lib
    wget https://github.com/kunpengcompute/Spark-ml-algo-lib/releases/download/v1.3.0/Spark-ml-algo-lib.patch
    patch -p1 < Spark-ml-algo-lib.patch
    
    The directory structure of the complete adaptation code Spark-ml-algo-lib and the files in each directory are as follows:
    Spark-ml-algo-lib
    ├── LICENSE
    ├── ml-accelerator
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── scala
    │               ├── breeze
    │               │   └── optimize
    │               │       ├── FirstOrderMinimizerX.scala
    │               │       ├── LBFGSX.scala
    │               │       └── OWLQNX.scala
    │               └── org
    │                   └── apache
    │                       └── spark
    │                           ├── ml
    │                           │   ├── classification
    │                           │   │   ├── DecisionTreeClassifier.scala
    │                           │   │   ├── GBTClassifier.scala
    │                           │   │   ├── LinearSVC.scala
    │                           │   │   ├── LogisticRegression.scala
    │                           │   │   └── RandomForestClassifier.scala
    │                           │   ├── optim
    │                           │   │   ├── aggregator
    │                           │   │   │   ├── DifferentiableLossAggregatorX.scala
    │                           │   │   │   ├── HingeAggregatorX.scala
    │                           │   │   │   ├── HuberAggregatorX.scala
    │                           │   │   │   ├── LeastSquaresAggregatorX.scala
    │                           │   │   │   └── LogisticAggregatorX.scala
    │                           │   │   └── loss
    │                           │   │       └── RDDLossFunctionX.scala
    │                           │   ├── recommendation
    │                           │   │   └── ALS.scala
    │                           │   ├── regression
    │                           │   │   ├── DecisionTreeRegressor.scala
    │                           │   │   ├── GBTRegressor.scala
    │                           │   │   ├── LinearRegression.scala
    │                           │   │   └── RandomForestRegressor.scala
    │                           │   ├── stat
    │                           │   │   └── Correlation.scala
    │                           │   └── tree
    │                           │       ├── impl
    │                           │       │   ├── DecisionForest.scala
    │                           │       │   ├── GradientBoostedTrees.scala
    │                           │       │   ├── NodeIdCache.scala
    │                           │       │   ├── RandomForest4GBDTX.scala
    │                           │       │   ├── RandomForestRaw.scala
    │                           │       │   └── RandomForest.scala
    │                           │       └── treeParams.scala
    │                           └── mllib
    │                               ├── clustering
    │                               │   ├── KMACCm.scala
    │                               │   ├── KMeans.scala
    │                               │   ├── LDAOptimizer.scala
    │                               │   └── LDA.scala
    │                               ├── fpm
    │                               │   └── PrefixSpan.scala
    │                               ├── linalg
    │                               │   ├── distributed
    │                               │   │   └── RowMatrix.scala
    │                               │   └── EigenValueDecomposition.scala
    │                               ├── stat
    │                               │   └── correlation
    │                               │       ├── Correlation.scala
    │                               │       ├── PearsonCorrelation.scala
    │                               │       └── SpearmanCorrelation.scala
    │                               └── tree
    │                                   └── DecisionTree.scala
    ├── ml-core
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── scala
    │               ├── breeze
    │               │   └── numerics
    │               │       └── DigammaX.scala
    │               └── org
    │                   └── apache
    │                       └── spark
    │                           ├── ml
    │                           │   └── tree
    │                           │       ├── impl
    │                           │       │   ├── BaggedPoint.scala
    │                           │       │   ├── DTFeatureStatsAggregator.scala
    │                           │       │   ├── DTStatsAggregator.scala
    │                           │       │   ├── GradientBoostedTreesCore.scala
    │                           │       │   ├── TreePointX.scala
    │                           │       │   └── TreePointY.scala
    │                           │       ├── Node.scala
    │                           │       └── Split.scala
    │                           └── mllib
    │                               ├── clustering
    │                               │   ├── LDAUtilsX.scala
    │                               │   └── OnlineLDAOptimizerXObj.scala
    │                               ├── fpm
    │                               │   ├── LocalPrefixSpan.scala
    │                               │   └── PrefixSpanBase.scala
    │                               └── tree
    │                                   └── impurity
    │                                       ├── Entropy.scala
    │                                       ├── Gini.scala
    │                                       ├── Impurities.scala
    │                                       ├── Impurity.scala
    │                                       └── Variance.scala
    ├── ml-kernel-client
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── scala
    │               ├── breeze
    │               │   ├── linalg
    │               │   │   ├── blas
    │               │   │   │   ├── Dgemv.scala
    │               │   │   │   ├── Gramian.scala
    │               │   │   │   └── YTYUtils.scala
    │               │   │   ├── DenseMatrixUtil.scala
    │               │   │   ├── DenseVectorUtil.scala
    │               │   │   └── lapack
    │               │   │       └── EigenDecomposition.scala
    │               │   └── optimize
    │               │       ├── ACC.scala
    │               │       ├── LBFGSL.scala
    │               │       └── OWLQNL.scala
    │               └── org
    │                   └── apache
    │                       └── spark
    │                           ├── ml
    │                           │   ├── neighbors
    │                           │   │   ├── KNN.scala
    │                           │   │   └── KNNUtils.scala
    │                           │   ├── recommendation
    │                           │   │   └── ALSUtils.scala
    │                           │   ├── StaticUtils.scala
    │                           │   └── tree
    │                           │       └── impl
    │                           │           ├── DTUtils.scala
    │                           │           ├── GradientBoostedTreesUtil.scala
    │                           │           └── RFUtils.scala
    │                           ├── mllib
    │                           │   └── fpm
    │                           │       └── PrefixSpanUtils.scala
    │                           ├── mllib.clustering
    │                           │   ├── KmeansUtil.scala
    │                           │   └── LDAUtilsXOpt.scala
    │                           └── mllib.linalg.distributed
    │                               └── RowMatrixUtil.scala
    ├── ml-kernel-client-core
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── scala
    │               └── org
    │                   └── apache
    │                       └── spark
    │                           └── mllib
    │                               ├── clustering
    │                               │   └── GammaX.scala
    │                               └── fpm
    │                                   └── LocalPrefixSpanUtils.scala
    ├── ml-xgboost
    │   └── .../...
    ├── pom.xml
    ├── README.md
    └── scalastyle-config.xml
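The listing before the patch contains no pom.xml files, while the final listing shows one at the project root and one in each of ml-accelerator, ml-core, ml-kernel-client, and ml-kernel-client-core. Checking for them is a simple way to confirm that the patch was applied. The following is a minimal sketch with a hypothetical helper named `missing_poms`. The ml-xgboost module is not checked because its contents are elided in the listing.

```shell
# missing_poms ROOT   (hypothetical helper)
# Print each pom.xml shown in the final listing that is absent under ROOT.
# Succeeds only when none is missing.
missing_poms() (
  root=$1 status=0
  for rel in pom.xml ml-accelerator/pom.xml ml-core/pom.xml \
             ml-kernel-client/pom.xml ml-kernel-client-core/pom.xml; do
    if [ ! -f "$root/$rel" ]; then
      echo "$rel"
      status=1
    fi
  done
  exit "$status"
)

# Example:
#   missing_poms /opt/Spark-ml-algo-lib && echo "all pom.xml files present"
```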