Building the Spark Machine Learning Algorithm Library Adaptation Code
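The procedure below builds Spark-ml-algo-lib, the adaptation code for the machine learning algorithm acceleration library. It uses the Spark 2.3.2 adaptation as the example. Building the Spark 2.4.6 adaptation is similar and can follow the same steps.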
- Download the Spark 2.3.2 source zip package to the “/opt/” directory and decompress it to obtain the Spark source directory “/opt/spark-2.3.2”.
Download URL: https://github.com/apache/spark/archive/v2.3.2.zip

    cd /opt/
    wget https://github.com/apache/spark/archive/v2.3.2.zip
    unzip v2.3.2.zip
- Download the Breeze 0.13.1 source zip package to the “/opt/” directory and decompress it to obtain the Breeze source directory “/opt/breeze-releases-v0.13.1”.
Download URL: https://github.com/scalanlp/breeze/archive/releases/v0.13.1.zip

    cd /opt/
    wget https://github.com/scalanlp/breeze/archive/releases/v0.13.1.zip
    unzip v0.13.1.zip
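- Download the xgboost 1.1.0 source package to the “/opt/” directory and decompress it to obtain the xgboost source directory “/opt/xgboost-1.1.0”. The `-O` option saves the archive under the name that the unzip command expects.

    cd /opt/
    wget -O xgboost-1.1.0.zip https://github.com/dmlc/xgboost/archive/refs/tags/v1.1.0.zip
    unzip xgboost-1.1.0.zip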
- Download the cub source package into “/opt/xgboost-1.1.0” and decompress it to obtain the cub source directory “/opt/xgboost-1.1.0/cub-b20808b1b04ec3d6a625e51fbc1eb76f337754ad”. Then delete the “/opt/xgboost-1.1.0/cub” directory and rename “/opt/xgboost-1.1.0/cub-b20808b1b04ec3d6a625e51fbc1eb76f337754ad” to “/opt/xgboost-1.1.0/cub”.

    cd /opt/xgboost-1.1.0
    wget -O cub-b20808b1b04ec3d6a625e51fbc1eb76f337754ad.zip https://github.com/NVlabs/cub/archive/b20808b1b04ec3d6a625e51fbc1eb76f337754ad.zip
    unzip cub-b20808b1b04ec3d6a625e51fbc1eb76f337754ad.zip
    rm -rf cub
    mv cub-b20808b1b04ec3d6a625e51fbc1eb76f337754ad cub
- Download the dmlc-core source package into “/opt/xgboost-1.1.0” and decompress it to obtain the dmlc-core source directory “/opt/xgboost-1.1.0/dmlc-core-5df8305fe699d3b503d10c60a231ab0223142407”. Then delete the “/opt/xgboost-1.1.0/dmlc-core” directory and rename “/opt/xgboost-1.1.0/dmlc-core-5df8305fe699d3b503d10c60a231ab0223142407” to “/opt/xgboost-1.1.0/dmlc-core”.

    cd /opt/xgboost-1.1.0
    wget -O dmlc-core-5df8305fe699d3b503d10c60a231ab0223142407.zip https://github.com/dmlc/dmlc-core/archive/5df8305fe699d3b503d10c60a231ab0223142407.zip
    unzip dmlc-core-5df8305fe699d3b503d10c60a231ab0223142407.zip
    rm -rf dmlc-core
    mv dmlc-core-5df8305fe699d3b503d10c60a231ab0223142407 dmlc-core
- Download the rabit source package into “/opt/xgboost-1.1.0” and decompress it to obtain the rabit source directory “/opt/xgboost-1.1.0/rabit-4fb34a008db6437c84d1877635064e09a55c8553”. Then delete the “/opt/xgboost-1.1.0/rabit” directory and rename “/opt/xgboost-1.1.0/rabit-4fb34a008db6437c84d1877635064e09a55c8553” to “/opt/xgboost-1.1.0/rabit”.

    cd /opt/xgboost-1.1.0
    wget -O rabit-4fb34a008db6437c84d1877635064e09a55c8553.zip https://github.com/dmlc/rabit/archive/4fb34a008db6437c84d1877635064e09a55c8553.zip
    unzip rabit-4fb34a008db6437c84d1877635064e09a55c8553.zip
    rm -rf rabit
    mv rabit-4fb34a008db6437c84d1877635064e09a55c8553 rabit
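The cub, dmlc-core, and rabit steps repeat one pattern: unpack a pinned archive inside the xgboost tree, delete the placeholder directory, and rename the unpacked directory to take its place. A minimal sketch of that pattern as a shell function follows. The function name `swap_in_pinned_dir` and its argument order are hypothetical, and the sketch assumes the archive has already been downloaded and unzipped.

```shell
# Hypothetical helper (not part of the original procedure).
# Usage: swap_in_pinned_dir <xgboost_root> <name> <commit>
# Replaces <xgboost_root>/<name> with the already-unzipped
# directory <xgboost_root>/<name>-<commit>.
swap_in_pinned_dir() (
    root=$1 name=$2 commit=$3
    cd "$root" || exit 1
    # Refuse to delete anything if the unpacked directory is missing.
    if [ ! -d "$name-$commit" ]; then
        echo "missing $root/$name-$commit" >&2
        exit 1
    fi
    rm -rf "$name"
    mv "$name-$commit" "$name"
)

# Calls matching the three dependency steps. They are left commented
# out so that running this file changes nothing by itself.
# swap_in_pinned_dir /opt/xgboost-1.1.0 cub b20808b1b04ec3d6a625e51fbc1eb76f337754ad
# swap_in_pinned_dir /opt/xgboost-1.1.0 dmlc-core 5df8305fe699d3b503d10c60a231ab0223142407
# swap_in_pinned_dir /opt/xgboost-1.1.0 rabit 4fb34a008db6437c84d1877635064e09a55c8553
```

Checking for the unpacked directory before running `rm -rf` means that a failed download or unzip leaves the original placeholder directory untouched.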
- Under “/opt/”, create the Spark-ml-algo-lib project with the directory hierarchy shown below. The final command copies the xgboost source directory recursively, so it needs `cp -r`.

    cd /opt/
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/breeze/optimize
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/breeze/numerics
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/optim/aggregator
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/optim/loss
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/recommendation
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/stat
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/fpm
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/distributed
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/stat/correlation
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/tree
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/clustering
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/tree/impurity
    cp -r xgboost-1.1.0 Spark-ml-algo-lib/ml-xgboost
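Every directory in this step has the form `<module>/src/main/scala/<package path>`, so the long list of commands can be generated from the package paths alone. The sketch below shows one way to do that. The helper name `make_pkg_dirs` is hypothetical and is not part of the original procedure.

```shell
# Hypothetical helper (not part of the original procedure): create the
# Scala package directories of one module under <project_root>.
# Usage: make_pkg_dirs <project_root> <module> <package_dir>...
make_pkg_dirs() (
    root=$1 module=$2
    shift 2
    for pkg in "$@"; do
        mkdir -p "$root/$module/src/main/scala/$pkg" || exit 1
    done
)

# Calls that would create the same directories as this step. They are
# left commented out so that running this file creates nothing by itself.
# spark=org/apache/spark
# make_pkg_dirs /opt/Spark-ml-algo-lib ml-accelerator \
#     breeze/optimize \
#     "$spark/ml/classification" "$spark/ml/optim/aggregator" \
#     "$spark/ml/optim/loss" "$spark/ml/recommendation" \
#     "$spark/ml/regression" "$spark/ml/stat" "$spark/ml/tree/impl" \
#     "$spark/mllib/clustering" "$spark/mllib/fpm" \
#     "$spark/mllib/linalg/distributed" "$spark/mllib/stat/correlation" \
#     "$spark/mllib/tree"
# make_pkg_dirs /opt/Spark-ml-algo-lib ml-core \
#     breeze/numerics \
#     "$spark/ml/tree/impl" "$spark/mllib/clustering" \
#     "$spark/mllib/fpm" "$spark/mllib/tree/impurity"
```

Because `mkdir -p` also creates parent directories, `ml/tree` and `mllib/linalg` exist afterwards even though they are not listed on their own.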
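- Following Table 1 and Table 2, copy the listed source files from Spark 2.3.2 and Breeze 0.13.1 into the Spark-ml-algo-lib directory. In each table, the two left columns give the target directory and file name, and the two right columns give the directory and name of the source file to copy. Following Table 3, delete the parts of the native xgboost code that are not needed, then copy the remaining code to “Spark-ml-algo-lib/ml-xgboost”. Following Table 4, rename the listed directories. The first column is the current directory name and the second column is the new name. Because many files must be copied, only two example commands are given.
Some files must be renamed when they are copied to the target directory.
Example commands:

    cp /opt/spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala /opt/Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala

    cp /opt/breeze-releases-v0.13.1/math/src/main/scala/breeze/optimize/FirstOrderMinimizer.scala /opt/Spark-ml-algo-lib/ml-accelerator/src/main/scala/breeze/optimize/FirstOrderMinimizerX.scala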
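Table 1 Spark files to be placed in the Spark-ml-algo-lib project

| Spark-ml-algo-lib target directory | Target file name | Spark source directory | Spark source file name |
| --- | --- | --- | --- |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | GBTClassifier.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/classification/ | GBTClassifier.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | LinearSVC.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/classification/ | LinearSVC.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | RandomForestClassifier.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/classification/ | RandomForestClassifier.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | DecisionTreeClassifier.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/classification/ | DecisionTreeClassifier.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | LogisticRegression.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/classification/ | LogisticRegression.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/optim/aggregator/ | DifferentiableLossAggregatorX.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/ | DifferentiableLossAggregator.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/optim/aggregator/ | HingeAggregatorX.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/ | HingeAggregator.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/optim/aggregator/ | HuberAggregatorX.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/ | HuberAggregator.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/optim/aggregator/ | LeastSquaresAggregatorX.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/ | LeastSquaresAggregator.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/optim/aggregator/ | LogisticAggregatorX.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/ | LogisticAggregator.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/optim/loss/ | RDDLossFunctionX.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/optim/loss/ | RDDLossFunction.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/recommendation/ | ALS.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/recommendation/ | ALS.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression/ | DecisionTreeRegressor.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/regression/ | DecisionTreeRegressor.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression/ | GBTRegressor.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/regression/ | GBTRegressor.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression/ | LinearRegression.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/regression/ | LinearRegression.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression/ | RandomForestRegressor.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/regression/ | RandomForestRegressor.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/stat/ | Correlation.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/stat/ | Correlation.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTrees.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTrees.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | NodeIdCache.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | NodeIdCache.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest4GBDTX.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForestRaw.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | DecisionForest.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/ | treeParams.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/ | treeParams.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering/ | KMACCm.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | KMeans.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering/ | KMeans.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | KMeans.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering/ | LDA.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDA.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering/ | LDAOptimizer.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDAOptimizer.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpan.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpan.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/distributed/ | RowMatrix.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/ | RowMatrix.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/ | EigenValueDecomposition.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/linalg/ | EigenValueDecomposition.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/stat/correlation/ | Correlation.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/ | Correlation.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/stat/correlation/ | PearsonCorrelation.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/ | PearsonCorrelation.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/stat/correlation/ | SpearmanCorrelation.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/ | SpearmanCorrelation.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/tree/ | DecisionTree.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/tree/ | DecisionTree.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/ | Node.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/ | Node.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/ | Split.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/ | Split.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | BaggedPoint.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | BaggedPoint.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | DTFeatureStatsAggregator.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | DTStatsAggregator.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | DTStatsAggregator.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | DTStatsAggregator.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTreesCore.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | RandomForest.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | TreePointX.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | TreePoint.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | TreePointY.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | TreePoint.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/clustering/ | LDAUtilsX.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDAUtils.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/clustering/ | OnlineLDAOptimizerXObj.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDAOptimizer.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm/ | LocalPrefixSpan.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | LocalPrefixSpan.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpanBase.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpan.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/tree/impurity/ | Entropy.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/ | Entropy.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/tree/impurity/ | Gini.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/ | Gini.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/tree/impurity/ | Impurities.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/ | Impurities.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/tree/impurity/ | Impurity.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/ | Impurity.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/tree/impurity/ | Variance.scala | spark-2.3.2/mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/ | Variance.scala |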
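Table 2 Breeze files to be placed in the Spark-ml-algo-lib project

| Spark-ml-algo-lib target directory | Target file name | Breeze source directory | Breeze source file name |
| --- | --- | --- | --- |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/breeze/optimize/ | FirstOrderMinimizerX.scala | breeze-releases-v0.13.1/math/src/main/scala/breeze/optimize/ | FirstOrderMinimizer.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/breeze/optimize/ | LBFGSX.scala | breeze-releases-v0.13.1/math/src/main/scala/breeze/optimize/ | LBFGS.scala |
| Spark-ml-algo-lib/ml-accelerator/src/main/scala/breeze/optimize/ | OWLQNX.scala | breeze-releases-v0.13.1/math/src/main/scala/breeze/optimize/ | OWLQN.scala |
| Spark-ml-algo-lib/ml-core/src/main/scala/breeze/numerics/ | DigammaX.scala | breeze-releases-v0.13.1/math/src/main/scala/breeze/numerics/ | package.scala |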
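Because Table 1 and Table 2 list many copies and only two example commands are given, the rows can be fed to a small loop instead of being typed one by one. The sketch below assumes the rows have been written to a manifest with four whitespace-separated fields in the same order as the table columns. The helper name `copy_from_manifest` and the manifest format are hypothetical.

```shell
# Hypothetical helper (not part of the original procedure): copy files
# according to a manifest read from stdin. Each manifest line has four
# whitespace-separated fields, in the same order as the table columns:
#   <target dir> <target file name> <source dir> <source file name>
# Relative paths are resolved against <base_dir>.
copy_from_manifest() (
    base=$1
    cd "$base" || exit 1
    while read -r dst_dir dst_name src_dir src_name; do
        # Skip blank lines and comment lines.
        case $dst_dir in ''|'#'*) continue ;; esac
        mkdir -p "$dst_dir" || exit 1
        cp "$src_dir/$src_name" "$dst_dir/$dst_name" || exit 1
    done
)

# Example call using one row of Table 2 (commented out so that running
# this file copies nothing by itself):
# copy_from_manifest /opt <<'EOF'
# Spark-ml-algo-lib/ml-accelerator/src/main/scala/breeze/optimize FirstOrderMinimizerX.scala breeze-releases-v0.13.1/math/src/main/scala/breeze/optimize FirstOrderMinimizer.scala
# EOF
```

Keeping the target name and the source name as separate fields covers the rows where the file is renamed during the copy, such as `LBFGS.scala` to `LBFGSX.scala`.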
Table 3 Files and directories to delete from the native xgboost code

| File or directory to delete |
| --- |
| xgboost-1.1.0/.github |
| xgboost-1.1.0/cub/.settings |
| xgboost-1.1.0/cub/.project |
| xgboost-1.1.0/dmlc-core/.github |
| xgboost-1.1.0/dmlc-core/make/config.mk |
| xgboost-1.1.0/dmlc-core/test/unittest/sample.rec |
| xgboost-1.1.0/doc/_static |
| xgboost-1.1.0/rabit/lib |
| xgboost-1.1.0/R-package/data |
| xgboost-1.1.0/.gitignore |
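The deletions in Table 3 can be scripted so that every path is removed relative to one root directory. The following is a minimal sketch under that assumption. The helper name `prune_paths` is hypothetical, and the guard against absolute paths and `..` components is an added safety measure rather than something the original procedure specifies.

```shell
# Hypothetical helper (not part of the original procedure): delete the
# paths named on stdin (one per line), each taken relative to
# <xgboost_root>. Absolute paths and paths containing a ".." component
# are rejected.
prune_paths() (
    root=$1
    cd "$root" || exit 1
    while read -r rel; do
        case $rel in
            '') continue ;;
            /*|..|../*|*/..|*/../*) echo "refusing $rel" >&2; exit 1 ;;
        esac
        rm -rf -- "$rel"
    done
)

# Example call with the entries of Table 3 (commented out so that
# running this file deletes nothing by itself):
# prune_paths /opt/xgboost-1.1.0 <<'EOF'
# .github
# cub/.settings
# cub/.project
# dmlc-core/.github
# dmlc-core/make/config.mk
# dmlc-core/test/unittest/sample.rec
# doc/_static
# rabit/lib
# R-package/data
# .gitignore
# EOF
```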
Table 4 Directories to rename

| Current directory in Spark-ml-algo-lib | Directory name after renaming |
| --- | --- |
| Spark-ml-algo-lib/ml-xgboost/jvm-packages/xgboost4j | Spark-ml-algo-lib/ml-xgboost/jvm-packages/boostkit-xgboost4j |
| Spark-ml-algo-lib/ml-xgboost/jvm-packages/xgboost4j-example | Spark-ml-algo-lib/ml-xgboost/jvm-packages/boostkit-xgboost4j-example |
| Spark-ml-algo-lib/ml-xgboost/jvm-packages/xgboost4j-flink | Spark-ml-algo-lib/ml-xgboost/jvm-packages/boostkit-xgboost4j-flink |
| Spark-ml-algo-lib/ml-xgboost/jvm-packages/xgboost4j-spark | Spark-ml-algo-lib/ml-xgboost/jvm-packages/boostkit-xgboost4j-spark |
| Spark-ml-algo-lib/ml-xgboost/jvm-packages/xgboost4j-tester | Spark-ml-algo-lib/ml-xgboost/jvm-packages/boostkit-xgboost4j-tester |
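Every rename in Table 4 adds the same `boostkit-` prefix to a module directory, so the five renames reduce to one loop. A minimal sketch follows. The helper name `add_boostkit_prefix` is hypothetical, and the directory that holds the xgboost4j modules is passed in as an argument.

```shell
# Hypothetical helper (not part of the original procedure): add the
# "boostkit-" prefix to the xgboost4j module directories found directly
# under <modules_dir>.
add_boostkit_prefix() (
    modules_dir=$1
    cd "$modules_dir" || exit 1
    for m in xgboost4j xgboost4j-example xgboost4j-flink \
             xgboost4j-spark xgboost4j-tester; do
        # Skip modules that are absent (for example, already renamed).
        [ -d "$m" ] || continue
        mv "$m" "boostkit-$m" || exit 1
    done
)

# Example call (commented out so that running this file renames nothing
# by itself). Pass the parent directory of the modules listed in Table 4:
# add_boostkit_prefix <parent directory of the xgboost4j modules>
```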
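After completing step 8 (the copy, delete, and rename step above), the Spark-ml-algo-lib project has the following directory structure and files: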
    Spark-ml-algo-lib
    ├── ml-accelerator
    │   └── src
    │       └── main
    │           └── scala
    │               ├── breeze
    │               │   └── optimize
    │               │       ├── FirstOrderMinimizerX.scala
    │               │       ├── LBFGSX.scala
    │               │       └── OWLQNX.scala
    │               └── org
    │                   └── apache
    │                       └── spark
    │                           ├── ml
    │                           │   ├── classification
    │                           │   │   ├── DecisionTreeClassifier.scala
    │                           │   │   ├── GBTClassifier.scala
    │                           │   │   ├── LinearSVC.scala
    │                           │   │   ├── LogisticRegression.scala
    │                           │   │   └── RandomForestClassifier.scala
    │                           │   ├── optim
    │                           │   │   ├── aggregator
    │                           │   │   │   ├── DifferentiableLossAggregatorX.scala
    │                           │   │   │   ├── HingeAggregatorX.scala
    │                           │   │   │   ├── HuberAggregatorX.scala
    │                           │   │   │   ├── LeastSquaresAggregatorX.scala
    │                           │   │   │   └── LogisticAggregatorX.scala
    │                           │   │   └── loss
    │                           │   │       └── RDDLossFunctionX.scala
    │                           │   ├── recommendation
    │                           │   │   └── ALS.scala
    │                           │   ├── regression
    │                           │   │   ├── DecisionTreeRegressor.scala
    │                           │   │   ├── GBTRegressor.scala
    │                           │   │   ├── LinearRegression.scala
    │                           │   │   └── RandomForestRegressor.scala
    │                           │   ├── stat
    │                           │   │   └── Correlation.scala
    │                           │   └── tree
    │                           │       ├── impl
    │                           │       │   ├── DecisionForest.scala
    │                           │       │   ├── GradientBoostedTrees.scala
    │                           │       │   ├── NodeIdCache.scala
    │                           │       │   ├── RandomForest4GBDTX.scala
    │                           │       │   ├── RandomForestRaw.scala
    │                           │       │   └── RandomForest.scala
    │                           │       └── treeParams.scala
    │                           └── mllib
    │                               ├── clustering
    │                               │   ├── KMACCm.scala
    │                               │   ├── KMeans.scala
    │                               │   ├── LDAOptimizer.scala
    │                               │   └── LDA.scala
    │                               ├── fpm
    │                               │   └── PrefixSpan.scala
    │                               ├── linalg
    │                               │   ├── distributed
    │                               │   │   └── RowMatrix.scala
    │                               │   └── EigenValueDecomposition.scala
    │                               ├── stat
    │                               │   └── correlation
    │                               │       ├── Correlation.scala
    │                               │       ├── PearsonCorrelation.scala
    │                               │       └── SpearmanCorrelation.scala
    │                               └── tree
    │                                   └── DecisionTree.scala
    ├── ml-core
    │   └── src
    │       └── main
    │           └── scala
    │               ├── breeze
    │               │   └── numerics
    │               │       └── DigammaX.scala
    │               └── org
    │                   └── apache
    │                       └── spark
    │                           ├── ml
    │                           │   └── tree
    │                           │       ├── impl
    │                           │       │   ├── BaggedPoint.scala
    │                           │       │   ├── DTFeatureStatsAggregator.scala
    │                           │       │   ├── DTStatsAggregator.scala
    │                           │       │   ├── GradientBoostedTreesCore.scala
    │                           │       │   ├── TreePointX.scala
    │                           │       │   └── TreePointY.scala
    │                           │       ├── Node.scala
    │                           │       └── Split.scala
    │                           └── mllib
    │                               ├── clustering
    │                               │   ├── LDAUtilsX.scala
    │                               │   └── OnlineLDAOptimizerXObj.scala
    │                               ├── fpm
    │                               │   ├── LocalPrefixSpan.scala
    │                               │   └── PrefixSpanBase.scala
    │                               └── tree
    │                                   └── impurity
    │                                       ├── Entropy.scala
    │                                       ├── Gini.scala
    │                                       ├── Impurities.scala
    │                                       ├── Impurity.scala
    │                                       └── Variance.scala
    └── ml-xgboost
        └── .../...
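Before moving on, it can help to confirm that the files shown in the tree are actually in place. The sketch below reads a list of relative file paths and reports the ones that are missing. The helper name `report_missing` is hypothetical, and the list of paths to check is supplied by the caller.

```shell
# Hypothetical helper (not part of the original procedure): read
# relative file paths from stdin and print the ones that do not exist
# under <project_root>. Returns non-zero if any file is missing.
report_missing() (
    root=$1
    missing=0
    while read -r rel; do
        [ -n "$rel" ] || continue
        if [ ! -f "$root/$rel" ]; then
            echo "missing: $rel"
            missing=1
        fi
    done
    exit "$missing"
)

# Example call checking two files from the tree (commented out so that
# running this file checks nothing by itself):
# report_missing /opt/Spark-ml-algo-lib <<'EOF'
# ml-accelerator/src/main/scala/breeze/optimize/LBFGSX.scala
# ml-core/src/main/scala/breeze/numerics/DigammaX.scala
# EOF
```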
- Download Spark-ml-algo-lib.patch to the “/opt/Spark-ml-algo-lib/” directory and apply it to Spark-ml-algo-lib to obtain the complete adaptation code for the machine learning algorithm acceleration library.

    cd /opt/Spark-ml-algo-lib
    wget https://github.com/kunpengcompute/Spark-ml-algo-lib/releases/download/v1.3.0/Spark-ml-algo-lib.patch
    patch -p1 < Spark-ml-algo-lib.patch
The complete adaptation code Spark-ml-algo-lib has the following directories and files:

    Spark-ml-algo-lib
    ├── LICENSE
    ├── ml-accelerator
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── scala
    │               ├── breeze
    │               │   └── optimize
    │               │       ├── FirstOrderMinimizerX.scala
    │               │       ├── LBFGSX.scala
    │               │       └── OWLQNX.scala
    │               └── org
    │                   └── apache
    │                       └── spark
    │                           ├── ml
    │                           │   ├── classification
    │                           │   │   ├── DecisionTreeClassifier.scala
    │                           │   │   ├── GBTClassifier.scala
    │                           │   │   ├── LinearSVC.scala
    │                           │   │   ├── LogisticRegression.scala
    │                           │   │   └── RandomForestClassifier.scala
    │                           │   ├── optim
    │                           │   │   ├── aggregator
    │                           │   │   │   ├── DifferentiableLossAggregatorX.scala
    │                           │   │   │   ├── HingeAggregatorX.scala
    │                           │   │   │   ├── HuberAggregatorX.scala
    │                           │   │   │   ├── LeastSquaresAggregatorX.scala
    │                           │   │   │   └── LogisticAggregatorX.scala
    │                           │   │   └── loss
    │                           │   │       └── RDDLossFunctionX.scala
    │                           │   ├── recommendation
    │                           │   │   └── ALS.scala
    │                           │   ├── regression
    │                           │   │   ├── DecisionTreeRegressor.scala
    │                           │   │   ├── GBTRegressor.scala
    │                           │   │   ├── LinearRegression.scala
    │                           │   │   └── RandomForestRegressor.scala
    │                           │   ├── stat
    │                           │   │   └── Correlation.scala
    │                           │   └── tree
    │                           │       ├── impl
    │                           │       │   ├── DecisionForest.scala
    │                           │       │   ├── GradientBoostedTrees.scala
    │                           │       │   ├── NodeIdCache.scala
    │                           │       │   ├── RandomForest4GBDTX.scala
    │                           │       │   ├── RandomForestRaw.scala
    │                           │       │   └── RandomForest.scala
    │                           │       └── treeParams.scala
    │                           └── mllib
    │                               ├── clustering
    │                               │   ├── KMACCm.scala
    │                               │   ├── KMeans.scala
    │                               │   ├── LDAOptimizer.scala
    │                               │   └── LDA.scala
    │                               ├── fpm
    │                               │   └── PrefixSpan.scala
    │                               ├── linalg
    │                               │   ├── distributed
    │                               │   │   └── RowMatrix.scala
    │                               │   └── EigenValueDecomposition.scala
    │                               ├── stat
    │                               │   └── correlation
    │                               │       ├── Correlation.scala
    │                               │       ├── PearsonCorrelation.scala
    │                               │       └── SpearmanCorrelation.scala
    │                               └── tree
    │                                   └── DecisionTree.scala
    ├── ml-core
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── scala
    │               ├── breeze
    │               │   └── numerics
    │               │       └── DigammaX.scala
    │               └── org
    │                   └── apache
    │                       └── spark
    │                           ├── ml
    │                           │   └── tree
    │                           │       ├── impl
    │                           │       │   ├── BaggedPoint.scala
    │                           │       │   ├── DTFeatureStatsAggregator.scala
    │                           │       │   ├── DTStatsAggregator.scala
    │                           │       │   ├── GradientBoostedTreesCore.scala
    │                           │       │   ├── TreePointX.scala
    │                           │       │   └── TreePointY.scala
    │                           │       ├── Node.scala
    │                           │       └── Split.scala
    │                           └── mllib
    │                               ├── clustering
    │                               │   ├── LDAUtilsX.scala
    │                               │   └── OnlineLDAOptimizerXObj.scala
    │                               ├── fpm
    │                               │   ├── LocalPrefixSpan.scala
    │                               │   └── PrefixSpanBase.scala
    │                               └── tree
    │                                   └── impurity
    │                                       ├── Entropy.scala
    │                                       ├── Gini.scala
    │                                       ├── Impurities.scala
    │                                       ├── Impurity.scala
    │                                       └── Variance.scala
    ├── ml-kernel-client
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── scala
    │               ├── breeze
    │               │   ├── linalg
    │               │   │   ├── blas
    │               │   │   │   ├── Dgemv.scala
    │               │   │   │   ├── Gramian.scala
    │               │   │   │   └── YTYUtils.scala
    │               │   │   ├── DenseMatrixUtil.scala
    │               │   │   ├── DenseVectorUtil.scala
    │               │   │   └── lapack
    │               │   │       └── EigenDecomposition.scala
    │               │   └── optimize
    │               │       ├── ACC.scala
    │               │       ├── LBFGSL.scala
    │               │       └── OWLQNL.scala
    │               └── org
    │                   └── apache
    │                       └── spark
    │                           ├── ml
    │                           │   ├── neighbors
    │                           │   │   ├── KNN.scala
    │                           │   │   └── KNNUtils.scala
    │                           │   ├── recommendation
    │                           │   │   └── ALSUtils.scala
    │                           │   ├── StaticUtils.scala
    │                           │   └── tree
    │                           │       └── impl
    │                           │           ├── DTUtils.scala
    │                           │           ├── GradientBoostedTreesUtil.scala
    │                           │           └── RFUtils.scala
    │                           ├── mllib
    │                           │   └── fpm
    │                           │       └── PrefixSpanUtils.scala
    │                           ├── mllib.clustering
    │                           │   ├── KmeansUtil.scala
    │                           │   └── LDAUtilsXOpt.scala
    │                           └── mllib.linalg.distributed
    │                               └── RowMatrixUtil.scala
    ├── ml-kernel-client-core
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── scala
    │               └── org
    │                   └── apache
    │                       └── spark
    │                           └── mllib
    │                               ├── clustering
    │                               │   └── GammaX.scala
    │                               └── fpm
    │                                   └── LocalPrefixSpanUtils.scala
    ├── ml-xgboost
    │   └── .../...
    ├── pom.xml
    ├── README.md
    └── scalastyle-config.xml