Spark DataFrame: Extract a Column and Modify It / Update a Column

Author: 雨笋情缘 | Published 2018-12-20 20:32


1. concat(exprs: Column*): Column

function note: Concatenates multiple input columns together into a single column. The function works with strings, binary and compatible array columns.

My problem: a column "XX_BM" in the DataFrame holds values such as 0008151223000316. I want every value in Column("XX_BM") to become, for example, 0008151223000316sfjd, i.e.

0008151223000316 + sfjd

Solution (in Scala):

// concat and lit come from org.apache.spark.sql.functions
import org.apache.spark.sql.functions.{concat, lit}

// take the existing column, append the literal suffix "sfjd",
// and overwrite the original column with the result
val tmp = dfval.col("XX_BM")
val result = concat(tmp, lit("sfjd"))
dfval = dfval.withColumn("XX_BM", result)
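A self-contained sketch of the same idea, runnable in a local Spark shell or application (the toy data below is made up for illustration; only the "XX_BM" column name comes from the original problem):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{concat, lit}

val spark = SparkSession.builder().appName("concat-suffix").master("local[*]").getOrCreate()
import spark.implicits._

// toy data standing in for the real "XX_BM" column
val demo = Seq("0008151223000316", "0008151223000317").toDF("XX_BM")

// append the literal suffix "sfjd" to every value of XX_BM
val withSuffix = demo.withColumn("XX_BM", concat($"XX_BM", lit("sfjd")))

withSuffix.show(false)   // 0008151223000316sfjd, 0008151223000317sfjd

spark.stop()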

2. regexp_replace(e: Column, pattern: String, replacement: String): Column

function note: Replace all substrings of the specified string value that match regexp with rep.

My problem: I have a DataFrame with 170 columns. One column holds a "name" string, and this string sometimes contains special symbols such as "'" that are not acceptable when I write the data to Postgres. Can I do something like this?

Df[$'name']=Df[$'name'].map(x => x.replaceAll("'","")) ?

But I don't want to scan the full DataFrame, because it is very large. Please help.

Solution: You can't mutate DataFrames; you can only transform them into new DataFrames with updated values. In this case you can use the regexp_replace function to perform the mapping on the name column:

import org.apache.spark.sql.functions._

// replace every apostrophe in the "name" column with an empty string;
// withColumn returns a new DataFrame, the original Df is left untouched
val updatedDf = Df.withColumn("name", regexp_replace(col("name"), "'", ""))
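Because the pattern argument is a regular expression, several unwanted characters can also be stripped in a single pass. A hedged sketch (the character class below is an assumption about which symbols are "not appropriate", not something the original answer specified):

import org.apache.spark.sql.functions.{col, regexp_replace}

// strip apostrophes, double quotes and backslashes in one pass;
// only the "name" column is rewritten, the other ~170 columns pass through unchanged
val cleanedDf = Df.withColumn("name", regexp_replace(col("name"), """['"\\]""", ""))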


3. regexp_replace(e: Column, pattern: Column, replacement: Column): Column

function note: Replace all substrings of the specified string value that match regexp with rep.
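This overload takes the pattern and the replacement as Columns rather than string literals, so each row can supply its own regex and its own substitute value. A minimal sketch (df, "pattern" and "replacement" are illustrative names assumed for this example, not from the original post):

import org.apache.spark.sql.functions.{col, regexp_replace}

// per-row replacement: the regex and the substitute value are read from
// other columns of the same row
val perRow = df.withColumn("name", regexp_replace(col("name"), col("pattern"), col("replacement")))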


For full documentation of these functions, see org.apache.spark.sql.functions.
