1. Reading data with Spark and Scala programming

Author: 一杭oneline | Published 2020-04-04 20:11

    Below are the Maven dependencies used in the IDEA project.
    Versions:
    spark 2.3.1
    scala 2.11
    hadoop 3.1.1

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>com.attest.bigdata</groupId>
        <artifactId>spark-200329</artifactId>
        <version>1.0</version>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>2.3.1</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-sql_2.11</artifactId>
                <version>2.3.1</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-streaming_2.11</artifactId>
                <version>2.3.1</version>
            </dependency>
            <dependency>
                <groupId>org.apache.kafka</groupId>
                <artifactId>kafka_2.11</artifactId>
                <version>1.0.0</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
                <version>2.3.1</version>
            </dependency>
        </dependencies>
        <build>
            <finalName>WordCount</finalName>
            <plugins>
                <plugin>
                    <groupId>net.alchim31.maven</groupId>
                    <artifactId>scala-maven-plugin</artifactId>
                    <version>3.2.2</version>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                                <goal>testCompile</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    </project>
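For readers who build with sbt instead of Maven, an equivalent build definition might look like the following. This is a sketch mirroring the POM above; the exact `scalaVersion` patch level (2.11.12) is an assumption, since the POM only pins the 2.11 binary version:

```scala
// build.sbt -- same dependency set as the Maven POM (Scala 2.11, Spark 2.3.1)
name := "spark-200329"
organization := "com.attest.bigdata"
version := "1.0"
scalaVersion := "2.11.12" // assumed patch level for the Scala 2.11 line

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"                % "2.3.1",
  "org.apache.spark" %% "spark-sql"                 % "2.3.1",
  "org.apache.spark" %% "spark-streaming"           % "2.3.1",
  "org.apache.kafka" %% "kafka"                     % "1.0.0",
  "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.3.1"
)
```

The `%%` operator appends the Scala binary suffix (`_2.11`) automatically, which is why the artifact names drop the suffix that appears in the POM.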
    
    package com.sparktest.bigdata.spark
    import org.apache.spark.rdd.RDD
    import org.apache.spark.{SparkConf, SparkContext}
    object WordCount {
      def main(args: Array[String]): Unit = {
        // Set the deployment mode (local) and the application name (app id)
        val config: SparkConf = new SparkConf().setMaster("local").setAppName("WordCount")
    
        val sc = new SparkContext(config)
    
        // Read the text file from HDFS (adjust the namenode address to your cluster)
        val lines: RDD[String] = sc.textFile("hdfs://192.168.56.101:9000/stream")
        val words: RDD[String] = lines.flatMap(_.split(" "))
        val wordToOne: RDD[(String, Int)] = words.map((_, 1))
        val wordToSum: RDD[(String, Int)] = wordToOne.reduceByKey(_ + _)
        val result: Array[(String, Int)] = wordToSum.collect()
        // println(result) would only print the Array's default toString, so print each element
        result.foreach(println)
        //sc.textFile("input").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect
    
        sc.stop() // stop the SparkContext and release resources
      }
    
    }
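The RDD pipeline above (flatMap → map → reduceByKey → collect) can be sanity-checked without a Spark cluster by mimicking it on plain Scala collections. In this illustrative sketch (not the Spark API), `groupBy` plus a per-group sum stands in for `reduceByKey(_ + _)`:

```scala
object WordCountLocal {
  // Same word-count logic as the RDD chain, on an in-memory Seq:
  // flatMap splits lines into words, map pairs each word with 1,
  // and groupBy + sum plays the role of reduceByKey(_ + _).
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .map((_, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    count(Seq("hello spark", "hello scala")).foreach(println)
}
```

The difference in a real job is that `reduceByKey` combines values on each partition before shuffling, whereas `groupBy` here materializes every pair in memory, so this is only a correctness check of the logic, not of Spark's execution.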
    
    
