import uuid

import pyspark.sql.functions as f
from pyspark.sql.types import StringType

# Method 1: use a UDF, which is called once per row
uuid_udf = f.udf(lambda: uuid.uuid4().hex, StringType())
df_with_uuid = df.withColumn('uuid', uuid_udf())

# Method 2: use lit, which receives a value already computed on the driver
df_with_uuid = df.withColumn('uuid', f.lit(uuid.uuid4().hex))
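Code source: https://elegantdata.blogspot.com/2021/03/add-uuid-column-to-spark-dataframe.html?lr=1
Method summary
Of the two methods above for adding a UUID column, only the first is correct. The UDF is called once per row, so each row gets its own UUID, whereas f.lit(uuid.uuid4().hex) calls uuid.uuid4() a single time on the driver and writes that one constant into every row.
Result of method 1: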
Name | Age | City | uuid |
---|---|---|---|
John | 25 | New York | 8a8d84e99b6f49aea... |
Emma | 28 | London | dff0676453494d7cb... |
Mike | 30 | Paris | db93842d82e34a11a... |
John | 27 | London | cd3e3cac967a471a8... |
Result of method 2:
Name | Age | City | uuid2 |
---|---|---|---|
John | 25 | New York | 98426e22f58442f59... |
Emma | 28 | London | 98426e22f58442f59... |
Mike | 30 | Paris | 98426e22f58442f59... |
John | 27 | London | 98426e22f58442f59... |
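
The difference between the two results comes down to when uuid.uuid4() is called. The following is a minimal plain-Python sketch of that idea, not Spark code: the list of dicts stands in for the DataFrame, and the helpers with_column_per_row and with_column_constant are hypothetical names invented for this illustration, loosely mimicking a UDF column and a lit column.

```python
import uuid

# Hypothetical stand-in for the DataFrame: a list of row dicts.
rows = [
    {"Name": "John", "Age": 25, "City": "New York"},
    {"Name": "Emma", "Age": 28, "City": "London"},
    {"Name": "Mike", "Age": 30, "City": "Paris"},
    {"Name": "John", "Age": 27, "City": "London"},
]


def with_column_per_row(rows, name, fn):
    """Roughly like method 1: fn is called once for every row."""
    return [{**row, name: fn()} for row in rows]


def with_column_constant(rows, name, value):
    """Roughly like method 2: value was computed before this call,
    so every row receives the same constant."""
    return [{**row, name: value} for row in rows]


# The lambda defers the call, so uuid.uuid4() runs four times.
method1 = with_column_per_row(rows, "uuid", lambda: uuid.uuid4().hex)

# uuid.uuid4().hex is evaluated once, here, before the helper runs.
method2 = with_column_constant(rows, "uuid2", uuid.uuid4().hex)

distinct1 = len({r["uuid"] for r in method1})
distinct2 = len({r["uuid2"] for r in method2})
print(distinct1, distinct2)  # 4 distinct values vs. 1
```

Passing a callable defers the work to each row, while passing the result of the call fixes a single value up front, which is the same pattern seen in the two result tables.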