TransportConf.java
spark.rpc.io.mode => nio/epoll, defaults to nio
prefer allocating off-heap byte buffers within Netty
spark.rpc.io.preferDirectBufs => true
spark.rpc.io.connectionTimeout => falls back to spark.network.timeout (default 120s)
spark.rpc.io.backLog => default -1
Number of concurrent connections between two nodes for fetching data
spark.rpc.io.numConnectionsPerPeer => 1
Number of threads used in the server thread pool. Defaults to 0, which means 2x #cores
spark.rpc.io.serverThreads => 0
Number of threads used in the client thread pool. Defaults to 0, which means 2x #cores
spark.rpc.io.clientThreads => 0
/**
* Receive buffer size (SO_RCVBUF).
* Note: the optimal size for receive buffer and send buffer should be
* latency * network_bandwidth.
* Assuming latency = 1ms, network_bandwidth = 10Gbps
* buffer size should be ~ 1.25MB
*/
spark.rpc.io.receiveBuffer => -1
spark.rpc.io.sendBuffer => -1
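A quick arithmetic check of the sizing rule in the comment above (a bandwidth-delay product calculation; the class name below is only for illustration):

// Bandwidth-delay product: buffer size ≈ latency * bandwidth.
public class BufferSizeCheck {
  public static void main(String[] args) {
    long bandwidthBytesPerSec = 10_000_000_000L / 8; // 10 Gbps -> 1.25e9 bytes/s
    double latencySec = 0.001;                       // 1 ms
    long bufferBytes = (long) (bandwidthBytesPerSec * latencySec);
    System.out.println(bufferBytes);                 // 1250000 bytes, i.e. ~1.25 MB
  }
}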
spark.rpc.sasl.timeout => 30s (the value is in ms)
spark.rpc.io.maxRetries => 3
spark.rpc.io.retryWait => 5s (the value is in ms)
Whether to initialize FileDescriptor lazily or not. If true, file descriptors are
created only when data is going to be transferred. This can reduce the number of open files.
spark.rpc.io.lazyFD => true
Whether to track detailed Netty memory metrics. If true, the detailed metrics of Netty's
PooledByteBufAllocator will be collected, otherwise only general memory usage will be tracked.
spark.rpc.io.enableVerboseMetrics => false
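A minimal sketch of overriding a few of these keys through SparkConf (the same keys can also be passed with spark-submit --conf; the values below are illustrative, not recommendations):

import org.apache.spark.SparkConf;

public class RpcIoConfExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("rpc-io-conf-demo")               // illustrative app name
        .set("spark.rpc.io.mode", "epoll")            // default is nio; epoll is Linux-only
        .set("spark.rpc.io.preferDirectBufs", "true")
        .set("spark.rpc.io.numConnectionsPerPeer", "1")
        .set("spark.rpc.io.serverThreads", "0")       // 0 => 2x #cores
        .set("spark.rpc.io.clientThreads", "0");
    System.out.println(conf.toDebugString());
  }
}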
It is best to change spark.rpc.io.mode to epoll, so that EpollEventLoopGroup is used instead of NioEventLoopGroup. Advantages: 1. the former uses edge-triggered (ET) epoll, the latter is level-triggered (LT). In short, LT is epoll's default mode: when epoll_wait detects an event it notifies the application, but the application does not have to handle it immediately; later epoll_wait calls will keep reporting the event until it has been handled.
In ET mode, as soon as epoll_wait detects an event the application is notified and is expected to handle it right away; subsequent epoll_wait calls will not report that event again. ET therefore greatly reduces how many times the same event is delivered by epoll, so it is more efficient than LT.
If you are running on linux you can use EpollEventLoopGroup and so get better performance, less GC and have more advanced features that are only available on linux.
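Roughly what that switch amounts to inside Netty, as a simplified sketch (not Spark's exact NettyUtils code; an Epoll.isAvailable() fallback is added here for safety): pick the event loop group and channel class per mode.

import io.netty.channel.EventLoopGroup;
import io.netty.channel.epoll.Epoll;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollSocketChannel;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;

public class IoModeSketch {
  // Event loop: edge-triggered native epoll on Linux, otherwise the JDK NIO selector.
  static EventLoopGroup newEventLoopGroup(String mode, int numThreads) {
    if ("epoll".equalsIgnoreCase(mode) && Epoll.isAvailable()) {
      return new EpollEventLoopGroup(numThreads);
    }
    return new NioEventLoopGroup(numThreads);
  }

  // Matching client channel class to pass to Bootstrap#channel(...).
  static Class<? extends SocketChannel> clientChannelClass(String mode) {
    return ("epoll".equalsIgnoreCase(mode) && Epoll.isAvailable())
        ? EpollSocketChannel.class
        : NioSocketChannel.class;
  }
}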
I have not measured these metrics myself; for a comparison, see: https://juejin.im/entry/5a8ed33b6fb9a0634c26801c
Note: the spark.rpc.io.* settings all configure Netty; I had originally assumed they mapped to OS sysctl parameters. Netty's defaults are as follows:
return new PooledByteBufAllocator(
    allowDirectBufs && PlatformDependent.directBufferPreferred(),
    // nHeapArena: system property io.netty.allocator.numHeapArenas, default 2 * cores
    Math.min(PooledByteBufAllocator.defaultNumHeapArena(), numCores),
    // nDirectArena: system property io.netty.allocator.numDirectArenas, default 2 * cores
    Math.min(PooledByteBufAllocator.defaultNumDirectArena(), allowDirectBufs ? numCores : 0),
    // system property io.netty.allocator.pageSize, default 8192
    PooledByteBufAllocator.defaultPageSize(),
    // system property io.netty.allocator.maxOrder, default 11
    PooledByteBufAllocator.defaultMaxOrder(),
    allowCache ? PooledByteBufAllocator.defaultTinyCacheSize() : 0,
    allowCache ? PooledByteBufAllocator.defaultSmallCacheSize() : 0,
    allowCache ? PooledByteBufAllocator.defaultNormalCacheSize() : 0,
    allowCache ? PooledByteBufAllocator.defaultUseCacheForAllThreads() : false
  );
}
allowDirectBufs comes from the Spark setting spark.rpc.io.preferDirectBufs, default true.
PlatformDependent.directBufferPreferred() is controlled by the Netty property io.netty.noPreferDirect=false; by default ("no" plus false is a double negative) direct buffers are preferred.
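A small sketch (assuming Netty 4.x on the classpath) that prints the defaults the constructor call above falls back to; the io.netty.allocator.* and io.netty.noPreferDirect system properties must be set before Netty classes load, e.g. via -D flags on the JVM command line.

import io.netty.buffer.PooledByteBufAllocator;
import io.netty.util.internal.PlatformDependent;

public class NettyAllocatorDefaults {
  public static void main(String[] args) {
    // io.netty.noPreferDirect (default false) -> prefer direct buffers
    System.out.println("preferDirect    = " + PlatformDependent.directBufferPreferred());
    // io.netty.allocator.numHeapArenas / numDirectArenas (defaults quoted above)
    System.out.println("numHeapArenas   = " + PooledByteBufAllocator.defaultNumHeapArena());
    System.out.println("numDirectArenas = " + PooledByteBufAllocator.defaultNumDirectArena());
    // io.netty.allocator.pageSize and io.netty.allocator.maxOrder (defaults quoted above)
    System.out.println("pageSize        = " + PooledByteBufAllocator.defaultPageSize());
    System.out.println("maxOrder        = " + PooledByteBufAllocator.defaultMaxOrder());
    System.out.println("smallCacheSize  = " + PooledByteBufAllocator.defaultSmallCacheSize());
    System.out.println("normalCacheSize = " + PooledByteBufAllocator.defaultNormalCacheSize());
  }
}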