spark rpc config

Author: clive0x | Published: 2019-06-08 16:37

The following settings are read in TransportConf.java:

spark.rpc.io.mode                  => nio / epoll (default: nio)
spark.rpc.io.preferDirectBufs      => true  (prefer allocating off-heap byte buffers within Netty)
spark.rpc.io.connectionTimeout     => defaults to spark.network.timeout (120s)
spark.rpc.io.backLog               => -1 (use the OS default backlog)
spark.rpc.io.numConnectionsPerPeer => 1  (number of concurrent connections between two nodes for fetching data)
spark.rpc.io.serverThreads         => 0  (threads in the server thread pool; 0 means 2 * #cores)
spark.rpc.io.clientThreads         => 0  (threads in the client thread pool; 0 means 2 * #cores)

Note: the optimal size for the receive and send buffers is latency * network_bandwidth; assuming latency = 1ms and network_bandwidth = 10Gbps, the buffer size should be ~1.25MB.

spark.rpc.io.receiveBuffer         => -1 (SO_RCVBUF)
spark.rpc.io.sendBuffer            => -1 (SO_SNDBUF)
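The ~1.25MB figure is just the bandwidth-delay product spelled out. A minimal arithmetic sketch (the class name is mine, this is not Spark code), assuming the 1ms / 10Gbps numbers from the note above:

    // Bandwidth-delay product: buffer_bytes = bandwidth * latency / 8
    public class BufferSizing {
        public static void main(String[] args) {
            double bandwidthBitsPerSec = 10_000_000_000.0; // assumed 10 Gbps link
            double latencySec = 0.001;                     // assumed 1 ms latency
            double bytes = bandwidthBitsPerSec * latencySec / 8.0;
            // 10^7 bits = 1.25 * 10^6 bytes, i.e. ~1.25 MB
            System.out.printf("suggested buffer size = %.2f MB%n", bytes / 1_000_000.0);
        }
    }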

The remaining settings cover SASL, retries, and file-descriptor/metrics behaviour:

spark.rpc.sasl.timeout             => 30s (stored internally in ms)
spark.rpc.io.maxRetries            => 3
spark.rpc.io.retryWait             => 5s (stored internally in ms)
spark.rpc.io.lazyFD                => true  (initialize FileDescriptors lazily: file descriptors are created only when data is actually going to be transferred, which reduces the number of open files)
spark.rpc.io.enableVerboseMetrics  => false (if true, detailed metrics of Netty's PooledByteBufAllocator are collected; otherwise only general memory usage is tracked)
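None of these keys appear in Spark's documented configuration list; TransportConf picks them up through the "rpc" module prefix, but they can be set like any ordinary Spark property. A minimal sketch (the class name and the chosen values are mine), assuming spark-core is on the classpath:

    import org.apache.spark.SparkConf;

    // Sketch only: overriding a few of the spark.rpc.io.* keys listed above.
    // All values are passed as strings; the same lines could instead go into
    // spark-defaults.conf.
    public class RpcIoSettings {
        public static SparkConf apply(SparkConf conf) {
            return conf
                .set("spark.rpc.io.mode", "epoll")              // epoll transport (Linux only)
                .set("spark.rpc.io.numConnectionsPerPeer", "2")
                .set("spark.rpc.io.serverThreads", "16")        // 0 (default) means 2 * #cores
                .set("spark.rpc.io.clientThreads", "16");
        }
    }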

spark.rpc.io.mode is best changed to epoll, so that EpollEventLoopGroup is used instead of the default NioEventLoopGroup. The main advantage: the former uses edge-triggered (ET) mode, while the latter is level-triggered (LT). In short, LT is epoll's default operating mode: when epoll_wait detects an event it notifies the application, but the application does not have to handle it immediately, and subsequent epoll_wait calls will keep reporting the same event until it is handled.

In ET mode, as soon as epoll_wait detects an event the application is expected to handle it right away, and later epoll_wait calls will not report that event again. ET therefore greatly reduces how many times the same event is delivered by epoll, which makes it more efficient than LT.

If you are running on linux you can use EpollEventLoopGroup and so get better performance, less GC and have more advanced features that are only available on linux.
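A simplified sketch of what the io.mode switch amounts to (this is not Spark's NettyUtils verbatim; the class and method names are mine): use EpollEventLoopGroup when epoll is requested and the native transport is actually available, otherwise fall back to NioEventLoopGroup.

    import io.netty.channel.EventLoopGroup;
    import io.netty.channel.epoll.Epoll;
    import io.netty.channel.epoll.EpollEventLoopGroup;
    import io.netty.channel.nio.NioEventLoopGroup;

    public class EventLoopChooser {
        public static EventLoopGroup create(String ioMode, int numThreads) {
            if ("epoll".equalsIgnoreCase(ioMode) && Epoll.isAvailable()) {
                return new EpollEventLoopGroup(numThreads);  // native transport, edge-triggered, Linux only
            }
            return new NioEventLoopGroup(numThreads);        // JDK NIO selector, level-triggered
        }
    }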

I have not benchmarked this myself; for a comparison, see: https://juejin.im/entry/5a8ed33b6fb9a0634c26801c

Note: the spark.rpc.io.* settings all configure Netty; I had originally assumed they mapped to OS sysctl parameters. Netty's defaults are wired up as follows:

    return new PooledByteBufAllocator(
        allowDirectBufs && PlatformDependent.directBufferPreferred(),
        // nHeapArena   -- io.netty.allocator.numHeapArenas,   default 2 * cores
        Math.min(PooledByteBufAllocator.defaultNumHeapArena(), numCores),
        // nDirectArena -- io.netty.allocator.numDirectArenas, default 2 * cores
        Math.min(PooledByteBufAllocator.defaultNumDirectArena(), allowDirectBufs ? numCores : 0),
        // io.netty.allocator.pageSize, default 8192
        PooledByteBufAllocator.defaultPageSize(),
        // io.netty.allocator.maxOrder, default 11
        PooledByteBufAllocator.defaultMaxOrder(),
        allowCache ? PooledByteBufAllocator.defaultTinyCacheSize() : 0,
        allowCache ? PooledByteBufAllocator.defaultSmallCacheSize() : 0,
        allowCache ? PooledByteBufAllocator.defaultNormalCacheSize() : 0,
        allowCache ? PooledByteBufAllocator.defaultUseCacheForAllThreads() : false
    );

allowDirectBufs is the Spark setting spark.rpc.io.preferDirectBufs, which defaults to true.

PlatformDependent.directBufferPreferred() is governed by the Netty property io.netty.noPreferDirect=false; by default (the "no" and the "false" cancel out as a double negative) direct buffers are used.
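A small sketch of that double negative (the class name is mine; PlatformDependent is the Netty internal utility Spark relies on here): unless -Dio.netty.noPreferDirect=true is set, directBufferPreferred() returns true, and together with spark.rpc.io.preferDirectBufs=true the pooled allocator is created with direct (off-heap) arenas.

    import io.netty.util.internal.PlatformDependent;

    public class DirectBufferCheck {
        public static void main(String[] args) {
            boolean allowDirectBufs = true; // stands in for spark.rpc.io.preferDirectBufs
            boolean useDirect = allowDirectBufs && PlatformDependent.directBufferPreferred();
            System.out.println("pooled allocator prefers direct buffers: " + useDirect);
        }
    }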
