Adding a UUID Column in PySpark

Author: FireJohnny | Published: 2023-12-19 10:09

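    import uuid

    import pyspark.sql.functions as f
    from pyspark.sql.types import StringType

    # df is assumed to be an existing DataFrame.

    # Method 1: use a UDF. The lambda runs once per row,
    # so each row gets its own UUID.
    uuid_udf = f.udf(lambda: uuid.uuid4().hex, StringType())
    df_with_uuid = df.withColumn('uuid', uuid_udf())

    # Method 2: use lit. uuid.uuid4().hex is evaluated once on the driver,
    # so the same constant is written to every row.
    df_with_uuid = df.withColumn('uuid', f.lit(uuid.uuid4().hex))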
    

    Code source: https://elegantdata.blogspot.com/2021/03/add-uuid-column-to-spark-dataframe.html?lr=1

    Summary of the methods

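    Of the two methods above, only the first is correct. Method 1 calls the UDF for every row, so each row receives a different UUID. Method 2 evaluates uuid.uuid4().hex a single time before the column is built and passes that one string to lit, so every row ends up with the same value.
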
    Result of method 1:

    Name  Age  City      uuid
    John  25   New York  8a8d84e99b6f49aea...
    Emma  28   London    dff0676453494d7cb...
    Mike  30   Paris     db93842d82e34a11a...
    John  27   London    cd3e3cac967a471a8...

    Result of method 2:

    Name  Age  City      uuid2
    John  25   New York  98426e22f58442f59...
    Emma  28   London    98426e22f58442f59...
    Mike  30   Paris     98426e22f58442f59...
    John  27   London    98426e22f58442f59...
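
The difference between the two results comes down to when `uuid.uuid4()` is called. The sketch below reproduces it in plain Python, with no Spark involved. The `rows` list and the helper `make_uuid` are hypothetical stand-ins for the DataFrame and the UDF. It only illustrates the evaluation timing and is not a substitute for running the Spark code.

```python
import uuid

# Hypothetical stand-in for the DataFrame rows in the results above.
rows = [
    ("John", 25, "New York"),
    ("Emma", 28, "London"),
    ("Mike", 30, "Paris"),
    ("John", 27, "London"),
]


def make_uuid() -> str:
    """Plays the role of the UDF: produces a fresh UUID on every call."""
    return uuid.uuid4().hex


# Like method 1: the function is called once per row.
per_row = [row + (make_uuid(),) for row in rows]

# Like method 2: the UUID is computed once and reused as a constant.
constant = uuid.uuid4().hex
shared = [row + (constant,) for row in rows]

# Count the distinct UUID values produced by each approach.
distinct_per_row = len({r[-1] for r in per_row})
distinct_shared = len({r[-1] for r in shared})
print(distinct_per_row, distinct_shared)
```

With four rows, the per-row approach should yield four distinct values and the constant approach exactly one, which matches the two result tables.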


    Article link: https://www.haomeiwen.com/subject/trszgdtx.html