1. Reading Data with Spark and Scala Programming

Author: 一杭oneline | Published 2020-04-04 20:11

The following is the Maven POM used in the IDEA project.
Versions:
Spark 2.3.1
Scala 2.11
Hadoop 3.1.1
Note that the _2.11 suffix on each Spark artifact is the Scala binary version, so it must match the Scala version above.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.attest.bigdata</groupId>
    <artifactId>spark-200329</artifactId>
    <version>1.0</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka_2.11</artifactId>
            <version>1.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
            <version>2.3.1</version>
        </dependency>
    </dependencies>
    <build>
        <finalName>WordCount</finalName>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
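
The scala-maven-plugin binds the compile and testCompile goals so that mvn compile builds the Scala sources, and mvn package produces WordCount.jar (per the finalName setting), which can be run locally or submitted with spark-submit. With these dependencies in place, the following WordCount program reads a text file from HDFS, splits each line into words, and sums a count for each word:
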
package com.sparktest.bigdata.spark

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Set the deployment mode (local) and the application name
    val config: SparkConf = new SparkConf().setMaster("local").setAppName("WordCount")

    val sc = new SparkContext(config)

    // Read the file from HDFS
    val lines: RDD[String] = sc.textFile("hdfs://192.168.56.101:9000/stream")
    // Split each line into words on spaces
    val words: RDD[String] = lines.flatMap(_.split(" "))
    // Pair each word with an initial count of 1
    val wordToOne: RDD[(String, Int)] = words.map((_, 1))
    // Sum the counts for each word
    val wordToSum: RDD[(String, Int)] = wordToOne.reduceByKey(_ + _)
    // Collect the result to the driver and print each pair
    // (println(result) alone would only print the Array's toString)
    val result: Array[(String, Int)] = wordToSum.collect()
    result.foreach(println)
    // Equivalent chained one-liner:
    // sc.textFile("input").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect

    sc.stop()
  }
}
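
Instead of collecting the counts to the driver, the result can also be written back to storage with saveAsTextFile. Below is a minimal sketch of that variant; the paths "input" and "output" are placeholder assumptions (not from the original program), and the output directory must not exist before the job runs:

package com.sparktest.bigdata.spark

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: the same word count, writing the result out instead of collecting it.
// "input" and "output" are placeholder paths for illustration only.
object WordCountSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("WordCountSave"))
    sc.textFile("input")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("output") // fails if the "output" directory already exists
    sc.stop()
  }
}

Writing with saveAsTextFile keeps the data distributed across executors, which is preferable to collect() when the result is too large to fit on the driver.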
