Spark Hello World (Loading Local Files and HDFS Files)

Author: 牛马风情 | Published 2017-04-12 23:41

This section uses spark-shell to load a local file and a file on HDFS, and to write the results back to each.

The files Spark processes may live on the local file system or on a distributed file system such as HDFS.

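The file system a path refers to is decided by its URI scheme. `file://` means the local file system and `hdfs://` means HDFS. A path with no scheme falls back to the default file system configured as `fs.defaultFS`. The plain-Scala sketch below is a simplified, hypothetical model of that rule, not Hadoop's actual resolution code, which handles many more cases. The object name `PathScheme` and the default-FS value `hdfs://sandbox:8020` are assumptions for illustration.

```scala
import java.net.URI

// Hypothetical, simplified model of how a path is mapped to a file system.
// Real resolution happens inside Hadoop; this only mirrors the basic rule:
// an explicit scheme wins, otherwise the default file system's scheme is used.
object PathScheme {
  // `defaultFs` plays the role of the fs.defaultFS setting (assumed value).
  def schemeOf(path: String, defaultFs: String): String =
    Option(new URI(path).getScheme).getOrElse(new URI(defaultFs).getScheme)
}
```

With `defaultFs = "hdfs://sandbox:8020"`, the first call below returns `"file"` and the second returns `"hdfs"`:

- `schemeOf("file:///home/guest/erhuan/hellospark", defaultFs)`
- `schemeOf("/user/erhuan/hellospark", defaultFs)`

This is why the local file below is given an explicit `file://` prefix. On a cluster whose default file system is HDFS, a bare local path would be looked up on HDFS instead.
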
Loading a local file

Create a test file

[root@sandbox home]# cd /home/guest/

// Create a directory under /home/guest
[root@sandbox guest]# mkdir erhuan
// Create a test file in the new directory
[root@sandbox guest]# cd erhuan/
[root@sandbox erhuan]# vi hellospark
// The file holds a single line: this is a hello word txt

Start spark-shell

[root@sandbox erhuan]# spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
17/04/12 14:45:41 INFO SecurityManager: Changing view acls to: root
17/04/12 14:45:41 INFO SecurityManager: Changing modify acls to: root
17/04/12 14:45:41 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/04/12 14:45:41 INFO HttpServer: Starting HTTP Server
17/04/12 14:45:41 INFO Utils: Successfully started service 'HTTP class server' on port 47623.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.1
      /_/

// (lots of output omitted)

Load the local file

Spark context available as sc.
// Load the file with sc.textFile(); the file:// prefix selects the local file system
scala> val textFile = sc.textFile("file:///home/guest/erhuan/hellospark")

// (lots of output omitted)
textFile: org.apache.spark.rdd.RDD[String] = file:///home/guest/erhuan/hellospark MappedRDD[1] at textFile at <console>:12
// Run an action
scala> textFile.first()
// (lots of output omitted)
17/04/12 14:53:27 WARN DomainSocketFactory: The short-circuit local reads feature cannot be 
17/04/12 14:53:27 INFO DAGScheduler: Job 0 finished: first at <console>:15, took 0.306226 s
res0: String = this is a hello word txt
// Spark records transformations such as textFile() without running them; only an action such as first() triggers all the recorded steps

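The laziness described in the last comment can be illustrated without Spark. Below is a toy model in plain Scala. `ToyRDD` is a hypothetical class, not Spark's RDD API or implementation.

- Transformations (`map`, `filter`) only wrap the recorded pipeline in a new function.
- Actions (`first`, `collect`) actually run it.

Like an uncached RDD, the whole pipeline is re-run on every action.

```scala
// Toy model of lazy evaluation (hypothetical; not Spark's implementation).
// A ToyRDD holds only a recipe (`compute`) for producing its elements.
final class ToyRDD[A] private (compute: () => Iterator[A]) {
  // Transformations: build a new recipe, run nothing yet.
  def map[B](f: A => B): ToyRDD[B] = new ToyRDD[B](() => compute().map(f))
  def filter(p: A => Boolean): ToyRDD[A] = new ToyRDD[A](() => compute().filter(p))

  // Actions: execute the whole recorded pipeline now.
  def first(): A = {
    val it = compute()
    if (it.hasNext) it.next()
    else throw new UnsupportedOperationException("empty collection")
  }
  def collect(): List[A] = compute().toList
}

object ToyRDD {
  // `load` stands in for reading a file; only an action ever calls it.
  def fromSource(load: () => Seq[String]): ToyRDD[String] =
    new ToyRDD[String](() => load().iterator)
}
```

For example, `ToyRDD.fromSource(load).map(_.toUpperCase)` calls `load` zero times. Calling `first()` on the result calls it once, and a later `collect()` calls it again.
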
Save the result to the local file system
scala> textFile.saveAsTextFile("file:///home/guest/erhuan/wordres")
17/04/12 14:59:31 INFO DefaultExecutionContext: Starting job: saveAsTextFile at <console>:15
17/04/12 14:59:31 INFO DAGScheduler: Got job 6 (saveAsTextFile at <console>:15) with 2 output partitions (allowLocal=false)
// (lots of output omitted)

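The log line above reports 2 output partitions. By default `sc.textFile` asks for at least `min(defaultParallelism, 2)` partitions, so even this one-line file is divided into two byte ranges. Each line is then read by the split in which it starts.

The plain-Scala sketch below is a simplified, hypothetical model of that rule. `SplitModel` is an illustrative name, and the model assumes equal-sized ranges and ASCII text. Hadoop's real input format and line reader also account for block size and for lines that begin exactly on a boundary. For the 25-byte test file, the model puts the only line in the first partition and leaves the second one empty.

```scala
// Simplified model of how a text file is divided among input splits
// (hypothetical; ignores block size and exact boundary handling).
object SplitModel {
  // Divide `totalBytes` into `numSplits` contiguous [start, end) ranges;
  // the last range absorbs any remainder.
  def splits(totalBytes: Int, numSplits: Int): Seq[(Int, Int)] = {
    require(numSplits > 0, "numSplits must be positive")
    val size = math.max(1, totalBytes / numSplits)
    (0 until numSplits).map { i =>
      val start = math.min(i * size, totalBytes)
      val end = if (i == numSplits - 1) totalBytes else math.min((i + 1) * size, totalBytes)
      (start, end)
    }
  }

  // Each line paired with the offset of its first character (ASCII assumed).
  def linesWithOffsets(text: String): Seq[(Int, String)] = {
    val out = Seq.newBuilder[(Int, String)]
    var start = 0
    while (start < text.length) {
      val nl = text.indexOf('\n', start)
      val end = if (nl < 0) text.length else nl
      out += ((start, text.substring(start, end)))
      start = end + 1
    }
    out.result()
  }

  // Assign every line to the split that contains its first character.
  def linesPerSplit(text: String, numSplits: Int): Seq[Seq[String]] = {
    val lines = linesWithOffsets(text)
    splits(text.length, numSplits).map { case (s, e) =>
      lines.collect { case (off, line) if off >= s && off < e => line }
    }
  }
}
```

For the test file, `linesPerSplit("this is a hello word txt\n", 2)` gives one line in the first split and nothing in the second. That matches the empty `part-00001` listed below.
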
Exit spark-shell and look at the contents of the /home/guest/erhuan/wordres directory

// Exit spark-shell
scala> exit
[root@sandbox erhuan]# cd wordres/
[root@sandbox wordres]# ll
total 4
-rw-r--r-- 1 root root 25 2017-04-12 14:59 part-00000
-rw-r--r-- 1 root root  0 2017-04-12 14:59 part-00001
-rw-r--r-- 1 root root  0 2017-04-12 14:59 _SUCCESS
[root@sandbox wordres]# more part-00000
this is a hello word txt
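// Spark has read the local file and written it back to the local file system.
// saveAsTextFile creates a directory: one part-NNNNN file per partition plus an empty _SUCCESS marker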

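Because `saveAsTextFile` always writes a directory rather than a single file, reading the result outside Spark means two steps:

1. Check for the `_SUCCESS` marker.
2. Concatenate the `part-NNNNN` files in name order. The names are zero-padded, so name order is partition order.

Inside Spark, the same directory can simply be passed back to `sc.textFile`. The plain-Scala sketch below handles a directory on the local file system. `PartFiles` and `readOutputDir` are hypothetical names. Treating a missing `_SUCCESS` marker as an unfinished job is an assumption of this sketch.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files

// Hypothetical helper: read back a directory written by saveAsTextFile
// on the local file system, e.g. /home/guest/erhuan/wordres.
object PartFiles {
  def readOutputDir(dir: File): Seq[String] = {
    // Assumption of this sketch: no _SUCCESS marker means the job did not finish.
    require(new File(dir, "_SUCCESS").isFile, s"no _SUCCESS marker in $dir")
    val parts = Option(dir.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.startsWith("part-"))
      .sortBy(_.getName) // part-00000, part-00001, ... in partition order
    parts.toSeq.flatMap(f => linesOf(f))
  }

  // Split a file into lines; an empty file (an empty partition) yields none.
  private def linesOf(f: File): Seq[String] = {
    val text = new String(Files.readAllBytes(f.toPath), StandardCharsets.UTF_8)
    if (text.isEmpty) Seq.empty
    else text.stripSuffix("\n").split("\n", -1).toSeq
  }
}
```

Applied to the `wordres` directory listed above, this would return the single line `this is a hello word txt`. The empty `part-00001` contributes nothing.
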
Loading an HDFS file

// First put the file on HDFS. To avoid permission problems, move it to /tmp first; the HDFS commands are then run as the hdfs user
[root@sandbox tmp]# mv /home/guest/erhuan/hellospark /tmp
[hdfs@sandbox tmp]$ hadoop fs -mkdir -p /user/erhuan
[hdfs@sandbox tmp]$ hadoop fs -put /tmp/hellospark /user/erhuan
// The two commands above create a directory on HDFS and push the local file into it
[root@sandbox erhuan]# spark-shell
// (lots of output omitted)

Load the HDFS file

Spark context available as sc.
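// Load the file with sc.textFile(); a path without a scheme is resolved against the default file system (fs.defaultFS), which is HDFS on this sandbox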
scala> val textFile = sc.textFile("/user/erhuan/hellospark")
17/04/12 15:33:29 INFO MemoryStore: ensureFreeSpace(277063) called with curMem=684755,
// (lots of output omitted)

// Run an action to check that the load succeeded
scala> textFile.first()
// (lots of output omitted)
17/04/12 15:33:32 INFO DAGScheduler: Job 0 finished: first at <console>:15, took 0.543566 s
res3: String = this is a hello word txt

// Write the result back to HDFS

scala> textFile.saveAsTextFile("/user/erhuan/res")
17/04/12 15:36:34 INFO DefaultExecutionContext: Starting job: saveAsTextFile at <console>:15
17/04/12 15:36:34 INFO DAGScheduler: Got job 1 (saveAsTextFile at <console>:15) with 2 output partitions (allowLocal=false)
// (lots of output omitted)

// Exit spark-shell
// Check the result
[hdfs@sandbox tmp]$ hadoop fs -ls /user/erhuan/res
Found 3 items
-rw-r--r--   1 hdfs hdfs          0 2017-04-12 15:36 /user/erhuan/res/_SUCCESS
-rw-r--r--   1 hdfs hdfs         25 2017-04-12 15:36 /user/erhuan/res/part-00000
-rw-r--r--   1 hdfs hdfs          0 2017-04-12 15:36 /user/erhuan/res/part-00001
[hdfs@sandbox tmp]$ hadoop fs -cat /user/erhuan/res/part-00000
this is a hello word txt

Article link: https://www.haomeiwen.com/subject/wgjvattx.html