Building the Adaptation Code of the Machine Learning Algorithm Acceleration Library

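  • The following procedure builds Spark-ml-algo-lib, the adaptation code of the machine learning algorithm acceleration library. It uses the build of the code adapted to Spark 3.3.1 as an example.
  • Perform the following operations in a Linux environment. This section is for reference only.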
  1. Download the Spark 3.3.1 source zip package to the "/opt/" directory and decompress it to obtain the Spark source directory.

    Download URL: https://github.com/apache/spark/archive/v3.3.1.zip

    wget https://github.com/apache/spark/archive/v3.3.1.zip
    
  2. Download the Breeze 1.0 source zip package to the "/opt/" directory and decompress it to obtain the Breeze source directory.

    Download URL: https://github.com/scalanlp/breeze/archive/releases/v1.0.zip

    wget https://github.com/scalanlp/breeze/archive/releases/v1.0.zip
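
The example cp commands later in this procedure read from "/opt/spark-3.3.1" and "/opt/breeze-releases-v1.0", so it can help to confirm that both archives were extracted under those names before continuing. The following is a minimal sketch only; `check_sources` is a hypothetical helper name that is not part of Spark-ml-algo-lib, and the two directory names are taken from those example commands.

```shell
# check_sources ROOT: report whether the two extracted source trees exist under ROOT.
# Hypothetical helper; the directory names come from the example cp commands.
check_sources() {
    cs_root=$1
    cs_status=0
    for cs_dir in spark-3.3.1 breeze-releases-v1.0; do
        if [ -d "$cs_root/$cs_dir" ]; then
            echo "found:   $cs_root/$cs_dir"
        else
            echo "missing: $cs_root/$cs_dir" >&2
            cs_status=1
        fi
    done
    # Non-zero when at least one source tree is absent.
    return "$cs_status"
}
```

For example, after running `unzip v3.3.1.zip` and `unzip v1.0.zip` in "/opt/" (assuming the unzip tool is installed), `check_sources /opt` should list both directories as found.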
    
  3. In the "/opt/" directory, create a project named Spark-ml-algo-lib with the following directory hierarchy.

    cd /opt/
    
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification 
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/feature
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/fpm
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/recommendation 
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression 
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tuning
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering 
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/feature
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/fpm 
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/distributed 
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/optimization
    mkdir -p Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/tree 
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/breeze/numerics 
    
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/classification
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/recommendation
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/regression
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl 
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tuning
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/clustering 
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/feature 
    mkdir -p Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm 
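
The mkdir commands above all share the same "src/main/scala" prefix, so the same tree can also be built from two package lists. This is a sketch under the assumption that a loop is preferred over the explicit commands; `make_tree` is a hypothetical function name, and the package lists simply repeat the directories from the commands above.

```shell
# make_tree ROOT: create the Spark-ml-algo-lib directory hierarchy under ROOT.
# Hypothetical helper; the package paths repeat the explicit mkdir commands.
make_tree() {
    mt_root=$1
    mt_base=src/main/scala

    # Packages of the ml-accelerator module.
    for mt_pkg in \
        org/apache/spark/ml/classification \
        org/apache/spark/ml/feature \
        org/apache/spark/ml/fpm \
        org/apache/spark/ml/recommendation \
        org/apache/spark/ml/regression \
        org/apache/spark/ml/tree/impl \
        org/apache/spark/ml/tuning \
        org/apache/spark/mllib/clustering \
        org/apache/spark/mllib/feature \
        org/apache/spark/mllib/fpm \
        org/apache/spark/mllib/linalg/distributed \
        org/apache/spark/mllib/optimization \
        org/apache/spark/mllib/tree
    do
        mkdir -p "$mt_root/ml-accelerator/$mt_base/$mt_pkg" || return 1
    done

    # Packages of the ml-core module.
    for mt_pkg in \
        breeze/numerics \
        org/apache/spark/ml/classification \
        org/apache/spark/ml/recommendation \
        org/apache/spark/ml/regression \
        org/apache/spark/ml/tree/impl \
        org/apache/spark/ml/tuning \
        org/apache/spark/mllib/clustering \
        org/apache/spark/mllib/feature \
        org/apache/spark/mllib/fpm
    do
        mkdir -p "$mt_root/ml-core/$mt_base/$mt_pkg" || return 1
    done
}
```

Calling `make_tree /opt/Spark-ml-algo-lib` would produce the same directories as the explicit commands. Because `mkdir -p` also creates parent directories, "ml/tree" and "mllib/linalg" exist afterwards even though they are not listed on their own.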
    
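  4. Copy the corresponding source files from Spark 3.3.1 and Breeze 1.0 to the "Spark-ml-algo-lib" directory according to the mappings in Table 1 and Table 2. In each table, the two left columns give the target directory and file name, and the two right columns give the directory and file name of the source file to copy. Because many files need to be copied, only two example commands are given.

    Some files are given a new name in the target folder.

    Example commands: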
    cp /opt/spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala /opt/Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala
    cp /opt/breeze-releases-v1.0/math/src/main/scala/breeze/numerics/package.scala /opt/Spark-ml-algo-lib/ml-core/src/main/scala/breeze/numerics/DigammaX.scala
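
Both example commands follow one pattern: a source directory and file name map to a target directory and a (possibly different) file name. Assuming the table rows are written out as whitespace-separated lines in the same column order, the copying could be scripted roughly as follows. `copy_mapped` is a hypothetical helper, not a tool shipped with Spark-ml-algo-lib.

```shell
# copy_mapped ROOT: read rows "target_dir target_name source_dir source_name"
# from standard input and copy each source file to its target under ROOT.
# Hypothetical helper; all directories are relative to ROOT (for example /opt).
copy_mapped() {
    cm_root=$1
    while read -r cm_dst_dir cm_dst_name cm_src_dir cm_src_name; do
        # Skip blank rows and rows that start with '#'.
        case $cm_dst_dir in ''|'#'*) continue ;; esac
        # Create the target directory if it does not exist yet.
        mkdir -p "$cm_root/${cm_dst_dir%/}" || return 1
        # Copy and rename in one step when the two file names differ.
        cp "$cm_root/${cm_src_dir%/}/$cm_src_name" \
           "$cm_root/${cm_dst_dir%/}/$cm_dst_name" || return 1
    done
}
```

With a file that holds one such line per table row, `copy_mapped /opt < mapping.txt` would perform all copies; the file name "mapping.txt" is only an illustration. The function stops at the first row whose source file is missing.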
    
    Table 1 Spark files to be placed in the Spark-ml-algo-lib project (an empty directory cell means the same directory as in the row above)

    | Spark-ml-algo-lib project directory | Spark-ml-algo-lib project file name | Spark source directory | Spark source file name |
    |---|---|---|---|
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/classification/ | GBTClassifier.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/ | GBTClassifier.scala |
    | | LinearSVC.scala | | LinearSVC.scala |
    | | RandomForestClassifier.scala | | RandomForestClassifier.scala |
    | | DecisionTreeClassifier.scala | | DecisionTreeClassifier.scala |
    | | FMClassifier.scala | | FMClassifier.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/feature/ | IDF.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/feature/ | IDF.scala |
    | | Word2Vec.scala | | Word2Vec.scala |
    | | DecisionTreeBucketizer.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/classification/ | RandomForestClassifier.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/fpm/ | PrefixSpan.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/fpm/ | PrefixSpan.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/recommendation/ | ALS.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/recommendation/ | ALS.scala |
    | | NMF.scala | | ALS.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/regression/ | DecisionTreeRegressor.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/regression/ | DecisionTreeRegressor.scala |
    | | GBTRegressor.scala | | GBTRegressor.scala |
    | | FMRegressor.scala | | FMRegressor.scala |
    | | RandomForestRegressor.scala | | RandomForestRegressor.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTrees.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | GradientBoostedTrees.scala |
    | | RandomForest.scala | | RandomForest.scala |
    | | RandomForest4GBDTX.scala | | RandomForest.scala |
    | | RandomForestRaw.scala | | RandomForest.scala |
    | | DecisionForest.scala | | RandomForest.scala |
    | | DecisionTreeBucket.scala | | RandomForest.scala |
    | | DecisionTreeMetadata.scala | | DecisionTreeMetadata.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tree/ | treeParams.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/ | treeParams.scala |
    | | treeModels.scala | | treeModels.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/ml/tuning/ | BayesianCrossValidator.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tuning/ | CrossValidator.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/clustering/ | LDA.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDA.scala |
    | | LDAOptimizer.scala | | LDAOptimizer.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/feature/ | IDF.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/feature/ | IDF.scala |
    | | Word2Vec.scala | | Word2Vec.scala |
    | | PCA.scala | | PCA.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpan.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | PrefixSpan.scala |
    | | FPGrowth.scala | | FPGrowth.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/distributed/ | RowMatrix.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/ | RowMatrix.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/linalg/ | EigenValueDecomposition.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/linalg/ | EigenValueDecomposition.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/optimization/ | LBFGSN.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/optimization/ | LBFGS.scala |
    | Spark-ml-algo-lib/ml-accelerator/src/main/scala/org/apache/spark/mllib/tree/ | DecisionTree.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/tree/ | DecisionTree.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/ | Node.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/ | Node.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/ml/tree/impl/ | BaggedPoint.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/ml/tree/impl/ | BaggedPoint.scala |
    | | DTFeatureStatsAggregator.scala | | DTStatsAggregator.scala |
    | | GradientBoostedTreesCore.scala | | GradientBoostedTrees.scala |
    | | TreePointX.scala | | TreePoint.scala |
    | | TreePointY.scala | | TreePoint.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/clustering/ | LDAUtilsX.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/clustering/ | LDAUtils.scala |
    | | OnlineLDAOptimizerXObj.scala | | LDAOptimizer.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/feature/ | VocabWord.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/feature/ | Word2Vec.scala |
    | Spark-ml-algo-lib/ml-core/src/main/scala/org/apache/spark/mllib/fpm/ | LocalPrefixSpan.scala | spark-3.3.1/mllib/src/main/scala/org/apache/spark/mllib/fpm/ | LocalPrefixSpan.scala |
    | | PrefixSpanBase.scala | | PrefixSpan.scala |
    | | FPGrowthCore.scala | | FPGrowth.scala |

    Table 2 Breeze files to be placed in the Spark-ml-algo-lib project

    | Spark-ml-algo-lib project directory | Spark-ml-algo-lib project file name | Breeze source directory | Breeze source file name |
    |---|---|---|---|
    | Spark-ml-algo-lib/ml-core/src/main/scala/breeze/numerics/ | DigammaX.scala | breeze-releases-v1.0/math/src/main/scala/breeze/numerics/ | package.scala |
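
Because the tables list dozens of files, a quick check that every target file is in place may save time before the patch is applied. The following sketch assumes the table rows are available as whitespace-separated lines with the target directory and target file name in the first two columns; `missing_targets` is a hypothetical helper name.

```shell
# missing_targets ROOT: read rows whose first two columns are
# "target_dir target_name" and print every target file that is absent under ROOT.
# Hypothetical helper; any further columns on a row are ignored.
missing_targets() {
    mi_root=$1
    mi_status=0
    while read -r mi_dir mi_name mi_rest; do
        # Skip blank rows and rows that start with '#'.
        case $mi_dir in ''|'#'*) continue ;; esac
        if [ ! -f "$mi_root/${mi_dir%/}/$mi_name" ]; then
            echo "${mi_dir%/}/$mi_name"
            mi_status=1
        fi
    done
    # Non-zero when at least one target file is missing.
    return "$mi_status"
}
```

For instance, `missing_targets /opt < mapping.txt` would print nothing and return 0 once all copies are done; "mapping.txt" is an illustrative file name only.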

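  5. Download the patch to the "/opt/Spark-ml-algo-lib/" directory. Taking Spark 3.3.1 as an example, apply the Spark 3.3.1 patch to Spark-ml-algo-lib to obtain the complete adaptation code of the machine learning algorithm acceleration library, Spark-ml-algo-lib.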
    cd /opt/Spark-ml-algo-lib
    wget https://github.com/kunpengcompute/Spark-ml-algo-lib/releases/download/v3.0.0-spark3.3.1/Spark-ml-algo-lib-Spark3.3.1.patch
    patch -p1 < Spark-ml-algo-lib-Spark3.3.1.patch
    

    The directory structure of the complete Spark-ml-algo-lib adaptation code is the same as that of the repository code.
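
If a copied file is missing or was taken from a different Spark version, the patch may apply only partially. One cautious approach, sketched below, is to test the patch first and apply it only when the test run is clean. This assumes a patch implementation that supports the `--dry-run` option (GNU patch does); `apply_patch_checked` is a hypothetical helper name.

```shell
# apply_patch_checked DIR PATCHFILE: apply PATCHFILE inside DIR only if a
# dry run succeeds. PATCHFILE should be an absolute path, or relative to DIR.
# Hypothetical helper; relies on the --dry-run option of GNU patch.
apply_patch_checked() {
    ap_dir=$1
    ap_patch=$2
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$ap_dir" || exit 1
        # Dry run: report problems without modifying any file.
        patch -p1 --dry-run < "$ap_patch" >/dev/null || exit 1
        # Real run, reached only when the dry run was clean.
        patch -p1 < "$ap_patch"
    )
}
```

For this procedure the call would be `apply_patch_checked /opt/Spark-ml-algo-lib /opt/Spark-ml-algo-lib/Spark-ml-algo-lib-Spark3.3.1.patch`.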
