Node2Vec
run API
- API
def run( edgeList: RDD[(Long, Long, Double)], userParams: Params ): RDD[(Long, Vector)]
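- Function description
- API description
- Package: package org.apache.spark.graphx.lib
- Class: Node2Vec
- Method: run
- Input: edgeList: RDD[(Long, Long, Double)], the edge list of the graph (every weight must be greater than 0)
- Parameter details: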
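| Parameter | Description | Type |
| --- | --- | --- |
| edgeList | Edge list of the graph, read from a file (every weight must be greater than 0) | RDD[(Long, Long, Double)] |
| userParams | Params wrapper class used to initialize the parameters the algorithm needs at run time | Params(directed = true, weighted = false, input = null, p = 1.0, q = 1.0, walkLength = 80, numWalks = 10, iter = 10, dim = 128, window = 10) |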
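- Whether the graph is directed, directed: Boolean, default true
- Whether the graph is weighted, weighted: Boolean, default false
- Return parameter (governs stepping back to the previous node), p: Double, default 1.0
- Forward parameter (governs moving on, away from the previous node), q: Double, default 1.0
- Walk length, walkLength: Int, default 80
- Number of walks sampled per node, numWalks: Int, default 10
- Number of iterations, iter: Int, default 10
- Embedding dimension, dim: Int, default 128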
- Sliding window size, window: Int, default 10
- Output: RDD[(Long, Vector)], the ID of each node in the graph paired with that node's embedding vector.
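The parameters p and q are easier to read with the walk bias written out. The sketch below follows the second-order transition rule from the original node2vec paper; whether this library implements the walk in exactly this way is an assumption. `Node2VecBiasSketch`, `biasedWeight` and `transitionProbs` are hypothetical names used only for illustration and are not part of the Node2Vec API.

```scala
// Illustrative sketch only: hypothetical helper, not part of org.apache.spark.graphx.lib.
object Node2VecBiasSketch {
  // Adjacency map: node -> (neighbor -> edge weight). Assumed in-memory for clarity.
  type Adjacency = Map[Long, Map[Long, Double]]

  /** Unnormalized weight of stepping from `cur` to `next` after arriving from `prev`. */
  def biasedWeight(adj: Adjacency, prev: Long, cur: Long, next: Long,
                   p: Double, q: Double): Double = {
    val w = adj(cur)(next)
    val prevNeighbors = adj.getOrElse(prev, Map.empty[Long, Double])
    if (next == prev) w / p                      // step back to the previous node
    else if (prevNeighbors.contains(next)) w     // stay close: next is also a neighbor of prev
    else w / q                                   // move on, away from prev
  }

  /** Normalized transition probabilities over the neighbors of `cur`. */
  def transitionProbs(adj: Adjacency, prev: Long, cur: Long,
                      p: Double, q: Double): Map[Long, Double] = {
    val raw = adj(cur).keys.map(n => n -> biasedWeight(adj, prev, cur, n, p, q)).toMap
    val total = raw.values.sum
    raw.map { case (n, w) => n -> w / total }
  }
}
```

For instance, on an undirected unit-weight graph with edges 1-2, 1-3, 2-3 and 3-4, a walk that reached node 3 from node 1 with p = 2.0 and q = 0.5 gets unnormalized weights 0.5, 1.0 and 2.0 for nodes 1, 2 and 4. A larger p therefore discourages backtracking, and a smaller q encourages moving outward. With the defaults p = 1.0 and q = 1.0 the walk reduces to an ordinary weighted random walk.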
- Usage example
val edgeRDD = sc.makeRDD(Seq(
  (1L, 2L, 1.0), (1L, 3L, 1.0), (1L, 4L, 1.0), (2L, 3L, 1.0),
  (3L, 4L, 1.0), (3L, 5L, 1.0), (4L, 5L, 1.0), (4L, 6L, 1.0),
  (5L, 7L, 1.0), (5L, 2L, 1.0), (5L, 6L, 1.0), (6L, 7L, 1.0)))
val res = Node2Vec.run(edgeRDD, Params(directed = true, weighted = false, p = 1.0, q = 1.0,
  walkLength = 80, numWalks = 10, iter = 10, dim = 2, window = 10))
res.collect().foreach(println)
- Sample result:
(7, [-1.1329193115234375, 0.8701171875])
(4, [-0.2488250732421875, 0.801239013671875])
(1, [-0.094482421875, 0.871551513671875])
(2, [-0.0323333740234375, 0.97137451171875])
(3, [-0.0724334716796875, 0.8855438232421875])
(5, [0.1421051025390625, 1.071990966796875])
(6, [0.7393035888671875, 1.4867706298828125])
Notes on the result:
The result is an RDD[(Node, Vector)], and the Vector values differ from one run of the program to the next.
In this example the input is an edge RDD over seven nodes numbered 1 to 7 and the embedding dimension is dim = 2, so each output record consists of a node ID and a two-dimensional embedding vector.
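A common next step is to compare the returned embeddings. The sketch below is a minimal example that assumes the result has been collected to the driver and each Vector converted to an Array[Double] (for example with toArray). `EmbeddingSimilaritySketch`, `cosine` and `mostSimilar` are hypothetical helpers, not part of the Node2Vec API.

```scala
// Illustrative sketch only: hypothetical helper for inspecting collected embeddings.
object EmbeddingSimilaritySketch {
  /** Cosine similarity of two equal-length vectors; 0.0 if either has zero norm. */
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  /** The k nodes whose embeddings are most similar to that of `node`, best first. */
  def mostSimilar(embeddings: Map[Long, Array[Double]], node: Long, k: Int): Seq[(Long, Double)] = {
    val target = embeddings(node)
    embeddings.toSeq
      .filter { case (id, _) => id != node }                  // skip the query node itself
      .map { case (id, vec) => id -> cosine(target, vec) }
      .sortBy { case (_, sim) => -sim }                       // highest similarity first
      .take(k)
  }
}
```

Because the embedding values change between runs, similarity rankings are only comparable within a single run's output.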