wordcount

Author: 一只特立独行的猪1991 | Published: 2020-05-10 15:33
    1. Install pyspark

      • Install by copying the pyspark package that ships with Spark
      • Source directory: D:\software\spark-2.2.0-bin-hadoop2.6\python\pyspark
      • Target directory: D:\software\Anaconda3\Lib\site-packages
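The copy in step 1 can also be scripted. The following is only a minimal sketch, assuming the two directories listed above exist on the machine; the helper name `copy_package` is hypothetical and not part of Spark or Anaconda. It refuses to overwrite an existing copy rather than merging into it.

```python
import os
import shutil


def copy_package(src_dir, site_packages_dir):
    """Copy a package directory into site-packages and return the new path.

    Hypothetical helper for illustration; raises if the target already exists.
    """
    package_name = os.path.basename(os.path.normpath(src_dir))
    target = os.path.join(site_packages_dir, package_name)
    if os.path.exists(target):
        raise FileExistsError("target already exists: " + target)
    shutil.copytree(src_dir, target)
    return target


if __name__ == "__main__":
    # Paths taken from step 1; adjust them to the local installation.
    src = r"D:\software\spark-2.2.0-bin-hadoop2.6\python\pyspark"
    dst = r"D:\software\Anaconda3\Lib\site-packages"
    if os.path.isdir(src) and os.path.isdir(dst):
        print("copied to", copy_package(src, dst))
    else:
        print("source or target directory not found; nothing copied")
```

Copying by hand in a file manager gives the same result; the script is mainly useful when the setup has to be repeated.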
    2. Install py4j
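After steps 1 and 2, a quick check that both packages are visible to the interpreter can save a confusing ImportError later. This is a small sketch, assuming it is run with the same Anaconda3 interpreter that will run the word count; `is_importable` is a hypothetical helper name.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located on sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # pyspark depends on py4j, so both need to be found.
    for name in ("pyspark", "py4j"):
        print(name, "OK" if is_importable(name) else "MISSING")
```

If either name is reported as MISSING, the package is not on the search path of the interpreter in use.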

    3. Create the wordcount script

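      # coding: utf-8
      from __future__ import print_function
      
      from operator import add
      from pyspark.sql import SparkSession
      
      if __name__ == "__main__":
          # Reuse an existing SparkSession or start a new one.
          spark = SparkSession \
              .builder \
              .appName('PythonWordCount') \
              .getOrCreate()
      
          # Input text file; change this path to match the local project.
          path = "D:\\workspace\\IdeaProjects\\learning\\src\\main\\resources\\word_count.txt"
          # Each row of the DataFrame has a single column holding one line of text.
          lines = spark.read.text(path).rdd.map(lambda r: r[0])
          # Split each line on single spaces, pair every word with 1, then sum per word.
          counts = lines.flatMap(lambda x: x.split(' ')) \
              .map(lambda x: (x, 1)) \
              .reduceByKey(add)
          # collect() brings all results to the driver; fine for a small test file.
          output = counts.collect()
          for (word, count) in output:
              print("%s: %i" % (word, count))
      
          spark.stop()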
      
    4. Run and verify
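One way to verify the result is to compare it with a count computed without Spark. The sketch below is an assumption about how such a check might look, not part of the original post; `count_words` is a hypothetical helper. It splits on a single space, the same rule the script in step 3 uses, so consecutive spaces produce empty-string "words" in both versions.

```python
from collections import Counter


def count_words(lines):
    """Count words in an iterable of lines, splitting on single spaces."""
    counts = Counter()
    for line in lines:
        # Strip the line terminator first, since Spark's text reader
        # does not include it in the row value.
        counts.update(line.rstrip("\r\n").split(' '))
    return counts


if __name__ == "__main__":
    sample = ["hello spark", "hello world"]
    for word, count in sorted(count_words(sample).items()):
        print("%s: %i" % (word, count))
```

To check a real run, pass the lines of the same input file to `count_words` and compare the totals with the lines printed by the Spark job; the order of the Spark output is not guaranteed, so compare by word rather than by position.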
