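Vector assembly (VectorAssembler): for each row, combines the values of several columns into a single vector, in the order the columns are listed.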
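Below is a minimal pure-Python sketch of what VectorAssembler does to one row. It assumes a row is modeled as a plain dict, and the helper name `assemble` is hypothetical, not part of PySpark. As far as I know, Spark also accepts vector-typed input columns and concatenates their elements in place, so the sketch flattens list values too. Null handling (`handleInvalid`) is not modeled.

```python
def assemble(row, input_cols):
    """Hypothetical helper: concatenate the listed columns of one row into a flat vector."""
    out = []
    for col in input_cols:
        value = row[col]
        if isinstance(value, (list, tuple)):
            # Vector-typed input: splice its elements into the output
            out.extend(float(v) for v in value)
        else:
            # Scalar input: contributes exactly one element
            out.append(float(value))
    return out


row = {"id1": 1, "id2": 1, "id3": 2, "id4": 3, "id5": 8, "id6": 4, "id7": 5}
print(assemble(row, ["id2", "id3", "id4"]))  # [1.0, 2.0, 3.0]
```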
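Cartesian product (Interaction): I'm not sure how best to render this name in Chinese. It first takes the Cartesian product of the input columns (each treated as a set of values), then multiplies together the elements of each resulting tuple, producing a column whose values are vectors.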
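To make "Cartesian product, then multiply" concrete, here is a pure-Python sketch of the per-row computation. `interact` is a hypothetical helper, not the PySpark API. It assumes each plain numeric input is treated as a one-element vector, which to my understanding is how Interaction handles numeric columns without metadata; columns carrying nominal (categorical) metadata are one-hot expanded first in Spark, and this sketch does not model that. The output length is the product of the input sizes, here 1 × 3 × 3 = 9.

```python
from itertools import product
from math import prod


def interact(*features):
    """Hypothetical helper: Cartesian product of the inputs, then the product within each tuple."""
    # Wrap plain numbers as 1-element vectors so every input is a sequence
    vectors = [f if isinstance(f, (list, tuple)) else [f] for f in features]
    # product() yields every combination (last input varies fastest);
    # prod() multiplies the elements of one combination together
    return [float(prod(combo)) for combo in product(*vectors)]


print(interact(1, [1, 2, 3], [8, 4, 5]))
# [8.0, 4.0, 5.0, 16.0, 8.0, 10.0, 24.0, 12.0, 15.0]
```

The element order produced by `itertools.product` appears to match the ordering in the Spark documentation's own Interaction example, but treat that as an observation rather than a guarantee.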
from pyspark.ml.feature import Interaction, VectorAssembler
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .appName("InteractionExample") \
    .getOrCreate()
df = spark.createDataFrame(
[(1, 1, 2, 3, 8, 4, 5),
(2, 4, 3, 8, 7, 9, 8),
(3, 6, 1, 9, 2, 3, 6),
(4, 10, 8, 6, 9, 4, 5),
(5, 9, 2, 7, 10, 7, 3),
(6, 1, 1, 4, 2, 8, 4)],
["id1", "id2", "id3", "id4", "id5", "id6", "id7"])
assembler1 = VectorAssembler(inputCols=["id2", "id3", "id4"], outputCol="vec1")
assembled1 = assembler1.transform(df)  # assemble id2, id3, id4 into one vector column "vec1"
assembler2 = VectorAssembler(inputCols=["id5", "id6", "id7"], outputCol="vec2")
assembled2 = assembler2.transform(assembled1).select("id1", "vec1", "vec2")
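# Take the Cartesian product of id1, vec1 and vec2, multiply the elements within each tuple,
# and collect the results into a vector column (1 x 3 x 3 = 9 elements per row)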
interaction = Interaction(inputCols=["id1", "vec1", "vec2"], outputCol="interactedCol")
interacted = interaction.transform(assembled2)
interacted.show(truncate=False)
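Because the data is small, the expected contents of `interactedCol` can be worked out by hand. The sketch below recomputes them in plain Python from the same rows, assuming `vec1` comes from id2..id4 and `vec2` from id5..id7 as in the two assemblers above; the helper name `expected_interaction` is hypothetical. Comparing its printout with the `show()` output is a quick sanity check.

```python
# Data copied from the createDataFrame call: (id1, id2, id3, id4, id5, id6, id7)
rows = [
    (1, 1, 2, 3, 8, 4, 5),
    (2, 4, 3, 8, 7, 9, 8),
    (3, 6, 1, 9, 2, 3, 6),
    (4, 10, 8, 6, 9, 4, 5),
    (5, 9, 2, 7, 10, 7, 3),
    (6, 1, 1, 4, 2, 8, 4),
]


def expected_interaction(row):
    """Hypothetical helper: what interactedCol should hold for one input row."""
    id1 = row[0]
    vec1 = row[1:4]  # what assembler1 builds from id2, id3, id4
    vec2 = row[4:7]  # what assembler2 builds from id5, id6, id7
    # vec1 varies in the outer loop, vec2 in the inner loop
    return [float(id1 * a * b) for a in vec1 for b in vec2]


for row in rows:
    print(row[0], expected_interaction(row))
```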