PySpark: Vector Assembly and Cartesian Product (Interaction)

Author: 米斯特芳 | Published 2021-08-21 15:13

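    Vector assembly (VectorAssembler): for each row, combines the values of several columns into a single vector.
    Cartesian product (Interaction; there is no obvious translation for this one): for each row, takes the Cartesian product of the input columns' values, one value from each input, then multiplies the elements of each resulting tuple. The products are collected into a single vector column.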
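
As a rough illustration of the two descriptions above, the following plain-Python sketch mimics what happens to a single row. It does not use Spark: `assemble` and `interact` are hypothetical helper names introduced here, not PySpark functions, and the values are those of the first row of the sample data used in this post. The ordering of the products (last input varies fastest) is what `itertools.product` yields and should match what Interaction produces for this input.

```python
from itertools import product
from math import prod  # Python 3.8+


def assemble(row, cols):
    """Hypothetical stand-in for VectorAssembler on one row:
    pick the named columns and return them as a list of floats."""
    return [float(row[c]) for c in cols]


def interact(*factors):
    """Hypothetical stand-in for Interaction on one row:
    multiply together every combination of one value per input."""
    return [prod(combo) for combo in product(*factors)]


# First row of the sample data used in this post.
row = {"id1": 1, "id2": 1, "id3": 2, "id4": 3, "id5": 8, "id6": 4, "id7": 5}

vec1 = assemble(row, ["id2", "id3", "id4"])  # [1.0, 2.0, 3.0]
vec2 = assemble(row, ["id5", "id6", "id7"])  # [8.0, 4.0, 5.0]

# A plain numeric column such as id1 acts like a one-element vector.
interacted_row = interact([float(row["id1"])], vec1, vec2)
print(interacted_row)
# [8.0, 4.0, 5.0, 16.0, 8.0, 10.0, 24.0, 12.0, 15.0]
```

The result has 1 × 3 × 3 = 9 elements, one per combination.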

    from pyspark.ml.feature import Interaction, VectorAssembler
    from pyspark.sql import SparkSession
    
    spark = SparkSession\
        .builder\
        .appName("InteractionExample")\
        .getOrCreate()
    
    df = spark.createDataFrame(
        [(1, 1, 2, 3, 8, 4, 5),
         (2, 4, 3, 8, 7, 9, 8),
         (3, 6, 1, 9, 2, 3, 6),
         (4, 10, 8, 6, 9, 4, 5),
         (5, 9, 2, 7, 10, 7, 3),
         (6, 1, 1, 4, 2, 8, 4)],
        ["id1", "id2", "id3", "id4", "id5", "id6", "id7"])
    
    # Assemble id2, id3 and id4 into a single vector column, vec1
    assembler1 = VectorAssembler(inputCols=["id2", "id3", "id4"], outputCol="vec1")
    assembled1 = assembler1.transform(df)
    # Assemble id5, id6 and id7 into vec2, then keep only the columns Interaction needs
    assembler2 = VectorAssembler(inputCols=["id5", "id6", "id7"], outputCol="vec2")
    assembled2 = assembler2.transform(assembled1).select("id1", "vec1", "vec2")
    # Take the Cartesian product of id1, vec1 and vec2 (one value from each) and multiply
    # the elements of each tuple; the products form the vector column interactedCol
    interaction = Interaction(inputCols=["id1", "vec1", "vec2"], outputCol="interactedCol")
    interacted = interaction.transform(assembled2)
    interacted.show(truncate=False)
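
To read the printed `interactedCol`, it can help to know which source columns each position comes from. The sketch below is plain Python, not Spark, and the `id1*id2*id5` label format is made up here for illustration; it is not the attribute naming Spark itself uses. It assumes the positions follow the order of `inputCols`, with the last input varying fastest.

```python
from itertools import product

# Source columns behind each Interaction input, in inputCols order:
# id1 (a scalar), vec1 (id2, id3, id4) and vec2 (id5, id6, id7).
inputs = [["id1"], ["id2", "id3", "id4"], ["id5", "id6", "id7"]]

# One illustrative label per output position.
labels = ["*".join(combo) for combo in product(*inputs)]

# First row of the sample DataFrame, keyed by column name.
columns = ["id1", "id2", "id3", "id4", "id5", "id6", "id7"]
first_row = dict(zip(columns, (1, 1, 2, 3, 8, 4, 5)))

# Compute the product behind each position of interactedCol.
values = []
for combo in product(*inputs):
    value = 1.0
    for name in combo:
        value *= first_row[name]
    values.append(value)

for position, (label, value) in enumerate(zip(labels, values)):
    print(position, label, value)
```

For the first row, position 4 is labelled `id1*id3*id6`, that is 1 × 2 × 4 = 8.0.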
    

Article link: https://www.haomeiwen.com/subject/ytadbltx.html