Spark 2.3.1 Test Notes 2: SortExec Performance Test 1

Author: Kent_Yao | Published: 2018-06-14 19:03

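    Preface

    This post follows up on the guess raised in (1) "Spark 2.3.0 Test Notes 1: Shuffling to the Point of a Stomachache", (2) "Spark 2.3.0 Test Notes 2: Is This Still Usable?", and (3) "Spark 2.3.1 Test Notes 1: Is the Problem Still There?", namely that the SortExec physical operator in 2.3.1 may have a performance regression relative to 2.1.2. The benchmark below tests that guess.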

    Test Code

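    // Assumed package and imports (not shown in the original post). BenchmarkBase
    // lives in Spark's own sql/core test sources, so this class sits next to it.
    package org.apache.spark.sql.execution.benchmark

    import org.apache.spark.sql.execution.SortExec
    import org.apache.spark.sql.execution.joins.SortMergeJoinExec
    import org.apache.spark.sql.functions.col
    import org.apache.spark.util.Benchmark

    class SortExecBenchmark extends BenchmarkBase {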
    
      test("sort with one") {
        val N = 2 << 23
        runBenchmark("sort with one", N) {
          val df = sparkSession.range(N).selectExpr(s"-id * 2 as k1").sort("k1")
          assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortExec]).nonEmpty)
          df.count()
        }
      }
    
      test("sort with two") {
        val N = 2 << 23
        runBenchmark("sort with two", N) {
          val df = sparkSession.range(N)
            .selectExpr(s"-id * 2 as k1", "-id % 10000 as k2")
            .sort("k2", "k1")
          assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortExec]).nonEmpty)
          df.count()
        }
      }
    
      test("sort with three") {
        val N = 2 << 23
        runBenchmark("sort with three", N) {
          val df = sparkSession.range(N)
            .selectExpr(s"-id * 2 as k1", " -id % 100000 as k2", "-id % 10000 as k3")
            .sort("k3", "k2", "k1")
          assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortExec]).nonEmpty)
          df.count()
        }
      }
    
      test("merge join reversed") {
        val N = 2 << 21
        runBenchmark("merge join at the worst", N) {
          val df1 = sparkSession.range(N).selectExpr(s"-id * 2 as k1")
          val df2 = sparkSession.range(N).selectExpr(s"-id * 3 as k2")
          val df = df1.join(df2, col("k1") === col("k2"))
          assert(df.queryExecution.sparkPlan.find(_.isInstanceOf[SortMergeJoinExec]).isDefined)
          df.count()
        }
      }
    
      test("merge join with duplicates reversed") {
        val N = 2 << 21
        runBenchmark("sort merge join", N) {
          val df1 = sparkSession.range(N)
            .selectExpr(s"-(id * 15485863) % ${N*10} as k1")
          val df2 = sparkSession.range(N)
            .selectExpr(s"-(id * 15485867) % ${N*10} as k2")
          df1.join(df2, col("k1") === col("k2")).count()
        }
      }
    
      override def runBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
        val benchmark = new Benchmark(name, cardinality)
    
        benchmark.addCase(s"$name wholestage off", numIters = 2) { iter =>
          sparkSession.conf.set("spark.sql.codegen.wholeStage", value = false)
          f
        }
    
        benchmark.addCase(s"$name wholestage on", numIters = 3) { iter =>
          sparkSession.conf.set("spark.sql.codegen.wholeStage", value = true)
          f
        }
    
        benchmark.run()
      }
    }
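
    To make the input of the "merge join reversed" test more concrete: both sides use strictly descending keys (-2 * id and -3 * id), so two keys are equal only when they are multiples of 6. The sketch below is plain Scala without Spark; the object and method names are hypothetical and are not part of the benchmark. It checks a closed-form count of matching keys against a brute-force count on small inputs.

```scala
// Hypothetical helper, not part of the benchmark: counts how many rows the
// join k1 === k2 would produce when k1 = -2 * id and k2 = -3 * id.
object JoinKeys {
  /** Brute-force count of matches for ids in [0, n) on both sides. */
  def bruteForce(n: Long): Long = {
    val right = (0L until n).map(-3 * _).toSet
    (0L until n).count(i => right.contains(-2 * i)).toLong
  }

  /** Closed form: the common keys are the multiples of 6 in [-2(n-1), 0]. */
  def closedForm(n: Long): Long = 2 * (n - 1) / 6 + 1

  def main(args: Array[String]): Unit = {
    println(bruteForce(10))        // small input, checked by enumeration
    println(closedForm(2L << 21))  // N used by the merge join tests
  }
}
```

    Under these assumptions only about a third of the left-side keys find a partner, so most of the measured time in that case is likely spent sorting both inputs rather than emitting join output.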
    

    2.1.2 Benchmark records

    [info] SortExecBenchmark:
    Running benchmark: sort with one
      Running case: sort with one wholestage off
      Stopped after 2 iterations, 14683 ms
      Running case: sort with one wholestage on
      Stopped after 3 iterations, 18842 ms
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
    Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
    
    sort with one:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    sort with one wholestage off                  6538 / 7342          2.6         389.7       1.0X
    sort with one wholestage on                   6175 / 6281          2.7         368.1       1.1X
    
    [info] - sort with one (54 seconds, 387 milliseconds)
    Running benchmark: sort with two
      Running case: sort with two wholestage off
      Stopped after 2 iterations, 18571 ms
      Running case: sort with two wholestage on
      Stopped after 3 iterations, 26397 ms
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
    Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
    
    sort with two:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    sort with two wholestage off                  9196 / 9286          1.8         548.1       1.0X
    sort with two wholestage on                   8139 / 8799          2.1         485.1       1.1X
    
    [info] - sort with two (1 minute, 4 seconds)
    Running benchmark: sort with three
      Running case: sort with three wholestage off
      Stopped after 2 iterations, 28709 ms
      Running case: sort with three wholestage on
      Stopped after 3 iterations, 40878 ms
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
    Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
    
    sort with three:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    sort with three wholestage off              14038 / 14355          1.2         836.7       1.0X
    sort with three wholestage on               13018 / 13626          1.3         775.9       1.1X
    
    [info] - sort with three (1 minute, 37 seconds)
    Running benchmark: merge join at the worst
      Running case: merge join at the worst wholestage off
      Stopped after 2 iterations, 7851 ms
      Running case: merge join at the worst wholestage on
      Stopped after 3 iterations, 11256 ms
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
    Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
    
    merge join at the worst:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    merge join at the worst wholestage off        3870 / 3926          1.1         922.6       1.0X
    merge join at the worst wholestage on         3698 / 3752          1.1         881.7       1.0X
    
    [info] - merge join reverted (27 seconds, 471 milliseconds)
    Running benchmark: sort merge join
      Running case: sort merge join wholestage off
      Stopped after 2 iterations, 9358 ms
      Running case: sort merge join wholestage on
      Stopped after 3 iterations, 13661 ms
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
    Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
    
    sort merge join:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    sort merge join wholestage off                4617 / 4679          0.9        1100.7       1.0X
    sort merge join wholestage on                 4306 / 4554          1.0        1026.7       1.1X
    
    [info] - merge join with duplicates reverted (32 seconds, 826 milliseconds)
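
    The Rate and Per Row columns above are consistent with being derived from the best time and the row count N. The following is a minimal sketch of that arithmetic in plain Scala; the object and method names are hypothetical, and because the printed millisecond values are rounded, re-derived figures can differ from the table in the last digit.

```scala
// Hypothetical helper, not Spark's Benchmark class: re-derives the Rate and
// Per Row columns from the row count and the best time of a case.
object BenchmarkColumns {
  /** Nanoseconds per row: best time converted from ms to ns, divided by rows. */
  def perRowNs(bestMs: Double, rows: Long): Double = bestMs * 1e6 / rows

  /** Throughput in millions of rows per second. */
  def rateMPerSec(bestMs: Double, rows: Long): Double = rows / (bestMs * 1e3)

  def main(args: Array[String]): Unit = {
    val n = 2L << 23 // 16,777,216 rows, as in the three sort tests
    // "sort with one wholestage off" on 2.1.2: best time 6538 ms
    println(f"${perRowNs(6538, n)}%.1f ns/row, ${rateMPerSec(6538, n)}%.1f M/s")
  }
}
```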
    

    2.3.1 Benchmark records

    [info] SortExecBenchmark:
    Running benchmark: sort with one
      Running case: sort with one wholestage off
      Stopped after 2 iterations, 14670 ms
      Running case: sort with one wholestage on
      Stopped after 3 iterations, 18269 ms
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
    Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
    
    sort with one:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    sort with one wholestage off                  6936 / 7335          2.4         413.4       1.0X
    sort with one wholestage on                   6040 / 6090          2.8         360.0       1.1X
    
    [info] - sort with one (54 seconds, 443 milliseconds)
    Running benchmark: sort with two
      Running case: sort with two wholestage off
      Stopped after 2 iterations, 18748 ms
      Running case: sort with two wholestage on
      Stopped after 3 iterations, 25809 ms
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
    Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
    
    sort with two:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    sort with two wholestage off                  9195 / 9374          1.8         548.0       1.0X
    sort with two wholestage on                   8459 / 8603          2.0         504.2       1.1X
    
    [info] - sort with two (1 minute, 4 seconds)
    Running benchmark: sort with three
      Running case: sort with three wholestage off
      Stopped after 2 iterations, 28472 ms
      Running case: sort with three wholestage on
      Stopped after 3 iterations, 40225 ms
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
    Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
    
    sort with three:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    sort with three wholestage off              13708 / 14236          1.2         817.1       1.0X
    sort with three wholestage on               13291 / 13408          1.3         792.2       1.0X
    
    [info] - sort with three (1 minute, 36 seconds)
    Running benchmark: merge join at the worst
      Running case: merge join at the worst wholestage off
      Stopped after 2 iterations, 7856 ms
      Running case: merge join at the worst wholestage on
      Stopped after 3 iterations, 10573 ms
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
    Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
    
    merge join at the worst:                 Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    merge join at the worst wholestage off        3810 / 3928          1.1         908.4       1.0X
    merge join at the worst wholestage on         3487 / 3525          1.2         831.4       1.1X
    
    [info] - merge join reverted (26 seconds, 664 milliseconds)
    Running benchmark: sort merge join
      Running case: sort merge join wholestage off
      Stopped after 2 iterations, 9118 ms
      Running case: sort merge join wholestage on
      Stopped after 3 iterations, 13825 ms
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_65-b17 on Mac OS X 10.13.4
    Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
    
    sort merge join:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    sort merge join wholestage off                4450 / 4559          0.9        1061.0       1.0X
    sort merge join wholestage on                 4395 / 4608          1.0        1047.9       1.0X
    

    2.1.2 vs 2.3.1

    version  case                             Best/Avg Time(ms)   Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    2.1.2    sort with one wholestage on            6175 / 6281         2.7         368.1       1.1X
    2.3.1    sort with one wholestage on            6040 / 6090         2.8         360.0       1.1X
    2.1.2    sort with two wholestage on            8139 / 8799         2.1     **485.1**       1.1X
    2.3.1    sort with two wholestage on            8459 / 8603         2.0         504.2       1.1X
    2.1.2    sort with three wholestage on        13018 / 13626         1.3     **775.9**       1.1X
    2.3.1    sort with three wholestage on        13291 / 13408         1.3         792.2       1.0X

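    Notes

    1. Benchmarks fluctuate to some degree, and results may also differ with the performance of the machine.
    2. The figures above come from the third test run. The first run is affected by the memory that sbt compilation occupies, so I ran killall java to kill all Java processes; the second run then warms up the JVM, and the third run is the one recorded.
    3. The cases are fairly simple, and the difference between the two versions is not very pronounced. Perhaps Spark's AlphaSort-like sorting approach makes little difference for primitive types.
    4. With all-integer keys, 2.1.2 is slightly ahead of 2.3.1 in the two- and three-column cases (485.1 vs 504.2 and 775.9 vs 792.2 ns per row), while 2.3.1 is slightly ahead in the one-column case (360.0 vs 368.1 ns per row). The gap is negligible either way.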
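
    To put notes 1 and 4 into numbers, the sketch below compares the gap between versions with the spread between the best and average time inside a single 2.1.2 run. It is plain Scala with hypothetical names; all inputs are copied from the "wholestage on" rows of the records above.

```scala
// Hypothetical helper for reading the comparison table; not part of the benchmark.
object VersionDelta {
  /** Relative change of `b` against baseline `a`, in percent. */
  def pctChange(a: Double, b: Double): Double = (b - a) / a * 100

  final case class Case(name: String, perRow212: Double, perRow231: Double,
                        best212: Double, avg212: Double)

  val cases = Seq(
    Case("sort with one",   368.1, 360.0,  6175,  6281),
    Case("sort with two",   485.1, 504.2,  8139,  8799),
    Case("sort with three", 775.9, 792.2, 13018, 13626)
  )

  def main(args: Array[String]): Unit =
    cases.foreach { c =>
      val delta  = pctChange(c.perRow212, c.perRow231) // > 0 means 2.3.1 is slower
      val spread = pctChange(c.best212, c.avg212)      // avg vs best within 2.1.2
      println(f"${c.name}%-16s 2.3.1 vs 2.1.2: $delta%+.1f%%, 2.1.2 avg vs best: $spread%+.1f%%")
    }
}
```

    For the two- and three-column cases the gap between versions (roughly 4% and 2%) is smaller than the avg-vs-best spread within the 2.1.2 run itself (roughly 8% and 5%), so these runs alone cannot separate a regression from noise.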

    Conclusion

    1. No conclusion can be drawn yet. The next step is to broaden the test cases and keep trying to reproduce the issue.
