Kafka Source Code Analysis, Series 4: Producer - network

Author: 丸_子 | Published: 2016-10-04 12:14

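    In the previous article we looked at how Java NIO works and how to use it. This article goes further and analyzes how the Kafka client builds its own network layer on top of NIO.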

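    Layered architecture of the network layer

    The figure below shows how the layers stack up, from KafkaProducer at the top down to Java NIO at the bottom.
    In the figure, light purple boxes are interfaces or abstract classes, and white boxes are concrete implementations.

    The overall architecture also reflects the idea of "programming to interfaces": the Java NIO layer at the bottom is exposed upward entirely through interfaces, and each of the three layers above it defines its own interfaces and exposes them to the layer above.

    The interfaces (including KafkaClient, Selectable, and ChannelBuilder) are all instantiated in the constructor of the outermost container class, KafkaProducer. KafkaProducer therefore acts as a "factory" that assembles all of these lower-level components.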


    [Figure 1.png: layers from KafkaProducer down to Java NIO]

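    Mapping between network-layer components and NIO components

    The figure above also shows that:

    KafkaChannel is essentially a wrapper around SocketChannel, with one extra level of indirection in between, TransportLayer, which hides the difference between plaintext and encrypted channels;
    Send/NetworkReceive wrap ByteBuffer and represent the data packet for one request;
    Kafka's Selector wraps the NIO Selector and holds an NIO Selector object internally.

    How the Kafka Selector is implemented

    1. As the figure above shows, Selector holds a Map internally; in other words, it maintains a pool of all connections. The KafkaChannel objects in it are all created through the ChannelBuilder interface.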

    private final Map<String, KafkaChannel> channels;
    

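    2. All I/O operations (connect, read, write) are actually carried out inside a single function, poll. What does that mean in practice?

    NetworkClient's send() function calls selector.send(Send send), but at that point the data has not really been sent. It is only staged in the corresponding channel inside the selector. Here is the code: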

    //Selector  
        public void send(Send send) {  
            KafkaChannel channel = channelOrFail(send.destination());  //find the connection this packet belongs to  
            try {  
                channel.setSend(send);  //stage it in this connection (channel)  
            } catch (CancelledKeyException e) {  
                this.failedSends.add(send.destination());  
                close(channel);  
            }  
        }  
      
    //KafkaChannel  
        public void setSend(Send send) {  
            if (this.send != null)  //key point: the next send cannot be staged until the current one has gone out; more on this later  
                throw new IllegalStateException("Attempt to begin a send operation with prior send operation still in progress.");  
            this.send = send;   //stage this packet  
            this.transportLayer.addInterestOps(SelectionKey.OP_WRITE);  
        }  
      
    public class KafkaChannel {  
        private final String id;  
        private final TransportLayer transportLayer;  
        private final Authenticator authenticator;  
        private final int maxReceiveSize;  
        private NetworkReceive receive;  
        private Send send;   //key point: a channel holds only one packet at a time; the next one cannot be stored until the current send has been completely written  
        ...  
    }
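The one-send-per-channel rule can be sketched in isolation. This is a minimal toy, not Kafka's KafkaChannel: `ToyChannel`, `stage`, and `writeSome` are hypothetical names, and a per-call byte budget stands in for a non-blocking socket write that accepts only part of the data.

```java
import java.nio.ByteBuffer;

// Toy model (hypothetical names) of a channel that holds at most one pending send.
class ToyChannel {
    private ByteBuffer pending; // the single staged send, or null

    // Stage a send; refuses if the previous one has not been fully written.
    void stage(ByteBuffer data) {
        if (pending != null)
            throw new IllegalStateException("previous send still in progress");
        pending = data;
    }

    // Simulates one non-blocking write that accepts at most maxBytes.
    // Returns the send once it is fully written, otherwise null.
    ByteBuffer writeSome(int maxBytes) {
        if (pending == null)
            return null;
        int n = Math.min(maxBytes, pending.remaining());
        pending.position(pending.position() + n); // pretend n bytes reached the socket
        if (pending.hasRemaining())
            return null; // partial write: the slot stays occupied
        ByteBuffer done = pending;
        pending = null; // the slot is free again
        return done;
    }

    boolean hasPending() {
        return pending != null;
    }
}
```

Staging a second buffer while the first is only partly written throws IllegalStateException, which is the same guard the setSend code above applies.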
    

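    Once the send is staged in the channel, the poll function processes it. We can abstract this into an input-output model:
    Input: the staged send packets
    Output: completed sends, completed receives (responses to earlier sends), established connections, and dropped connections.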


    [Figure 2.png: input-output model of poll]
    @Override  
    public void poll(long timeout) throws IOException {  
        if (timeout < 0)  
            throw new IllegalArgumentException("timeout should be >= 0");  
        clear();  //key point: the "output" is cleared before every poll  
        if (hasStagedReceives())  
            timeout = 0;  
        /* check ready keys */  
        long startSelect = time.nanoseconds();  
        int readyKeys = select(timeout);  
        long endSelect = time.nanoseconds();  
        currentTimeNanos = endSelect;  
        this.sensors.selectTime.record(endSelect - startSelect, time.milliseconds());  
      
        if (readyKeys > 0) {  
            Set<SelectionKey> keys = this.nioSelector.selectedKeys();  
            Iterator<SelectionKey> iter = keys.iterator();  
            while (iter.hasNext()) {  
                SelectionKey key = iter.next();  
                iter.remove();  
                KafkaChannel channel = channel(key);  
      
                // register all per-connection metrics at once  
                sensors.maybeRegisterConnectionMetrics(channel.id());  
                lruConnections.put(channel.id(), currentTimeNanos);  
      
                try {  
                    /* complete any connections that have finished their handshake */  
                    if (key.isConnectable()) {  
                        channel.finishConnect();    //add the established connection to the output result set  
                        this.connected.add(channel.id());  
                        this.sensors.connectionCreated.record();  
                    }  
      
                    ...  
      
                    if (channel.ready() && key.isReadable() && !hasStagedReceive(channel)) {  
                        NetworkReceive networkReceive;  
                        while ((networkReceive = channel.read()) != null)  
                            addToStagedReceives(channel, networkReceive);  
                    }  
      
                    if (channel.ready() && key.isWritable()) {  
                        Send send = channel.write();  
                        if (send != null) {  
                            this.completedSends.add(send);  //add the completed send to the output result set  
                            this.sensors.recordBytesSent(channel.id(), send.size());  
                        }  
                    }  
      
                    if (!key.isValid()) {  
                        close(channel);  
                        this.disconnected.add(channel.id());  //add the dropped connection to the output result set  
                    }  
                } catch (Exception e) {  
                    String desc = channel.socketDescription();  
                    if (e instanceof IOException)  
                        log.debug("Connection with {} disconnected", desc, e);  
                    else  
                        log.warn("Unexpected error from {}; closing connection", desc, e);  
                    close(channel);  
                    this.disconnected.add(channel.id()); //add the dropped connection to the output result set  
                }  
            }  
        }  
      
        addToCompletedReceives(); //add the completed receives to the output result set  
      
        long endIo = time.nanoseconds();  
        this.sensors.ioTime.record(endIo - endSelect, time.milliseconds());  
        maybeCloseOldestConnection();  
    }
    

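    Core principle 1: splitting a message into multiple packets

    Why does the code above call addToStagedReceives? What are staged receives? To answer that, we have to start with how data gets split:

    NetworkClient passes down a complete ClientRequest, and what enters the Selector and is staged in the channel is likewise a complete Send object (one data packet). But when this Send object is handed to the underlying channel.write(ByteBuffer b), a single call does not necessarily send all of it; it may take several write calls to send one Send object completely. That is because write is non-blocking: it returns without waiting for all the data to go out. Hence the code above: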

    if (channel.ready() && key.isWritable()) {  
        Send send = channel.write(); //a non-null send means it went out completely and the Send object is returned; if it was only partly sent, null is returned  
        if (send != null) {    
            this.completedSends.add(send);  
            this.sensors.recordBytesSent(channel.id(), send.size());  
        }  
    }
    

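    Likewise, on the receiving side, a single response may need several calls to channel.read(ByteBuffer b) before it is completely received. channel.read() returns a NetworkReceive only once a whole response has arrived, and null otherwise, which is what the while loop above relies on: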

    if (channel.ready() && key.isReadable() && !hasStagedReceive(channel)) {  
        NetworkReceive networkReceive;  
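        while ((networkReceive = channel.read()) != null)  //each non-null result is one completely received response; the loop exits when read() returns null (no more data, or the next response is still incomplete)  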
            addToStagedReceives(channel, networkReceive);  
    }
    

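    Core principle 2: message boundaries

    As we saw above, the underlying communication on each channel consists of two continuous byte streams: a send stream and a receive stream.
    Sending is the easy case, because the size of a complete message is known before it is sent.
    But when receiving, how do we know where one response ends and the next one begins?

    This takes a small trick: every request and response starts with a fixed-length 4-byte header. When receiving, read is called at least twice: first to read these 4 bytes and obtain the length of the whole response, and then to read the message body.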

    public class NetworkReceive implements Receive {  
        private final String source;  
        private final ByteBuffer size;  //buffer for the 4-byte header  
        private final int maxSize;  
        private ByteBuffer buffer;  //buffer for the response body that follows  
      
        public NetworkReceive(String source) {  
            this.source = source;  
            this.size = ByteBuffer.allocate(4);   //allocate the 4-byte header first  
            this.buffer = null;  
            this.maxSize = UNLIMITED;  
       }  
    }
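The size-prefix trick can be illustrated with a small standalone decoder. This is a sketch under assumptions, not Kafka's NetworkReceive: `FrameDecoder` and `feed` are hypothetical names, and the header is assumed to be a 4-byte big-endian length (ByteBuffer's default byte order). Bytes may arrive in arbitrary chunks, so the decoder keeps its partial state between calls.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Toy decoder (hypothetical class) for frames of the form: 4-byte length + payload.
class FrameDecoder {
    private final ByteBuffer size = ByteBuffer.allocate(4); // header buffer
    private ByteBuffer body; // allocated once the header is complete

    // Feed whatever bytes arrived; returns every frame completed by this chunk.
    List<byte[]> feed(ByteBuffer chunk) {
        List<byte[]> frames = new ArrayList<>();
        while (chunk.hasRemaining()) {
            if (body == null) {
                // Still reading the 4-byte header
                while (size.hasRemaining() && chunk.hasRemaining())
                    size.put(chunk.get());
                if (size.hasRemaining())
                    break; // header incomplete, wait for more bytes
                size.flip();
                int length = size.getInt();
                size.clear();
                if (length < 0)
                    throw new IllegalStateException("invalid frame length " + length);
                body = ByteBuffer.allocate(length);
            }
            // Copy as much of the body as this chunk contains
            while (body.hasRemaining() && chunk.hasRemaining())
                body.put(chunk.get());
            if (body.hasRemaining())
                break; // body incomplete, wait for more bytes
            frames.add(body.array());
            body = null; // the next bytes start a new header
        }
        return frames;
    }
}
```

Feeding the same byte stream in chunks of any size should yield the same frames, which is the property the header makes possible.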
    

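    Core principle 3: guaranteeing message order

    InFlightRequests holds every request that has been sent but whose response has not yet come back. A request is enqueued when it is sent; when its response comes back, the corresponding request is dequeued.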

    final class InFlightRequests {  
      
        private final int maxInFlightRequestsPerConnection;  
        private final Map<String, Deque<ClientRequest>> requests = new HashMap<String, Deque<ClientRequest>>();  
    }
    

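    There is a key point here: the pairing of requests and responses is expressed with a queue, not a Map. The two are matched simply by enqueueing and dequeueing. For this to work, the server must preserve message order: on a given socket, if the requests sent are 0, 1, 2, the responses must also come back in the order 0, 1, 2.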
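A minimal sketch of queue-based matching follows. `ToyInFlight` is a hypothetical class and requests are reduced to string ids; the head-insert and tail-remove convention is an assumption chosen to match the peekFirst() call that appears later in this article.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy version (hypothetical class) of per-node in-flight tracking with a deque.
class ToyInFlight {
    private final Map<String, Deque<String>> requests = new HashMap<>();

    // Called when a request is sent: the newest request goes to the head.
    void add(String node, String requestId) {
        requests.computeIfAbsent(node, k -> new ArrayDeque<>()).addFirst(requestId);
    }

    // Called when a response arrives: it is matched to the oldest request, at the tail.
    String completeNext(String node) {
        Deque<String> queue = requests.get(node);
        if (queue == null || queue.isEmpty())
            throw new IllegalStateException("no in-flight request for node " + node);
        return queue.pollLast();
    }
}
```

No request id is compared anywhere: the match is purely positional, so it is only correct if responses arrive in the order the requests were sent.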

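    But the server uses a 1 + N + M threading model (1 acceptor thread, N processor threads, M request handler threads): all requests go into one requestQueue and are then processed in parallel by multiple threads. So how does it preserve message order?

    The answer is the mute/unmute mechanism: whenever a request is received on a channel, that channel is muted, and it is unmuted only after the response has been sent back. This guarantees that at most one request per connection is being processed at any time.

    Here is the server-side code: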

    selector.completedReceives.asScala.foreach { receive =>  
          try {  
            val channel = selector.channel(receive.source)  
            val session = RequestChannel.Session(new KafkaPrincipal(KafkaPrincipal.USER_TYPE, channel.principal.getName),  
              channel.socketAddress)  
            val req = RequestChannel.Request(processor = id, connectionId = receive.source, session = session, buffer = receive.payload, startTimeMs = time.milliseconds, securityProtocol = protocol)  
            requestChannel.sendRequest(req)  
          } catch {  
            case e @ (_: InvalidRequestException | _: SchemaException) =>  
              // note that even though we got an exception, we can assume that receive.source is valid. Issues with constructing a valid receive object were handled earlier  
              error("Closing socket for " + receive.source + " because of error", e)  
              close(selector, receive.source)  
          }  
          selector.mute(receive.source)    //a request was received: mute the channel it came from  
        }  
      
        selector.completedSends.asScala.foreach { send =>  
          val resp = inflightResponses.remove(send.destination).getOrElse {  
            throw new IllegalStateException(s"Send for ${send.destination} completed, but not in `inflightResponses`")  
          }  
          resp.request.updateRequestMetrics()  
          selector.unmute(send.destination)  //the response has been sent: unmute the channel it went to  
        }
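At the NIO level, muting a channel can be modeled as removing OP_READ from the key's interest set, so that pending data on that connection is simply not reported until it is unmuted. The sketch below assumes that this is how mute works and uses a java.nio.channels.Pipe in place of a socket so it runs without a network; `MuteSketch`, `mute`, `unmute`, and `demo` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

class MuteSketch {
    // Stop reporting readability for this key.
    static void mute(SelectionKey key) {
        key.interestOps(key.interestOps() & ~SelectionKey.OP_READ);
    }

    // Report readability again.
    static void unmute(SelectionKey key) {
        key.interestOps(key.interestOps() | SelectionKey.OP_READ);
    }

    // Returns {ready keys while muted, ready keys after unmute}.
    static int[] demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
                pipe.sink().write(ByteBuffer.wrap(new byte[] {42})); // data is now waiting
                mute(key);
                int whileMuted = selector.selectNow();   // read interest is off
                unmute(key);
                int afterUnmute = selector.select(1000); // the waiting data is reported
                return new int[] {whileMuted, afterUnmute};
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The byte written before mute is not lost; it stays in the pipe and shows up on the first select after unmute, which mirrors how a muted connection's next request waits until the previous response has gone out.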
    

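    How NetworkClient is implemented

    As discussed above:
    (1) Selector maintains the pool of all connections, and on every connection messages are sent and received through the poll function.
    (2) A channel can stage only one Send object at a time.

    But what if that Send object has not been completely sent after one poll? Let's see how the upper layer, NetworkClient, handles this:

    The key function: client.ready

    Start with Sender's run() function: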

    public void run(long now) {  
        Cluster cluster = metadata.fetch();  
        // get the list of partitions with data ready to send  
        RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);  
      
        if (result.unknownLeadersExist)  
            this.metadata.requestUpdate();  
      
        // remove any nodes we aren't ready to send to  
        Iterator<Node> iter = result.readyNodes.iterator();  
        long notReadyTimeout = Long.MAX_VALUE;  
        while (iter.hasNext()) {  
            Node node = iter.next();  
            if (!this.client.ready(node, now)) {   //the key function!  
                iter.remove();  
                notReadyTimeout = Math.min(notReadyTimeout, this.client.connectionDelay(node, now));  
            }  
        }  
      
        // create produce requests  
        Map<Integer, List<RecordBatch>> batches = this.accumulator.drain(cluster,  
                                                                         result.readyNodes,  
                                                                         this.maxRequestSize,  
                                                                         now);  
      
        List<RecordBatch> expiredBatches = this.accumulator.abortExpiredBatches(this.requestTimeout, cluster, now);  
        // update sensors  
        for (RecordBatch expiredBatch : expiredBatches)  
            this.sensors.recordErrors(expiredBatch.topicPartition.topic(), expiredBatch.recordCount);  
      
        sensors.updateProduceRequestMetrics(batches);  
        List<ClientRequest> requests = createProduceRequests(batches, now);  
      
        long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);  
        if (result.readyNodes.size() > 0) {  
            log.trace("Nodes with data ready to send: {}", result.readyNodes);  
            log.trace("Created {} produce requests: {}", requests.size(), requests);  
            pollTimeout = 0;  
        }  
      
        for (ClientRequest request : requests)  //each request targets a different Node  
            client.send(request, now);   //client.send simply calls selector.send; the message is staged in the channel, not sent yet  
      
        this.client.poll(pollTimeout, now); //calls selector.poll, which handles connecting, sending, and receiving  
    }
    

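    The code above contains a key function, client.ready(Node n, ..). It checks whether the node is ready; if it is not, the node is removed from readyNodes and no messages are sent to it in this pass.

    So what does ready mean? Let's look at the code: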

    public boolean ready(Node node, long now) {  
        if (isReady(node, now))  
            return true;  
      
        if (connectionStates.canConnect(node.idString(), now))  
            initiateConnect(node, now);  
        return false;  
    }  
      
    public boolean isReady(Node node, long now) {  
        return !metadataUpdater.isUpdateDue(now) && canSendRequest(node.idString());  
    }  
      
    private boolean canSendRequest(String node) {  
        return connectionStates.isConnected(node) && selector.isChannelReady(node) && inFlightRequests.canSendMore(node);  
    }  
      
    public boolean canSendMore(String node) {  
        Deque<ClientRequest> queue = requests.get(node);  
        return queue == null || queue.isEmpty() ||  
               (queue.peekFirst().request().completed() && queue.size() < this.maxInFlightRequestsPerConnection);  
    }  
      
    public boolean completed() {  
        return remaining <= 0 && !pending;  
    }
    

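    The code above is wrapped in several layers, but in summary, a Node is ready and can be sent requests when the following conditions hold:

    1. The metadata is fine and does not need an update: !metadataUpdater.isUpdateDue(now)
    2. The connection is up: connectionStates.isConnected(node)
    3. The channel is in the ready state: for PlaintextChannel this always returns true
    4. Either the channel currently has no in-flight requests, i.e. all requests have been processed,
    5. or the most recently queued request (queue.peekFirst()) has been completely sent, request.completed(), and the number of in-flight requests is below the configured maximum
       (the default is 5, i.e. at most 5 requests may be "in flight"; "in flight" means sent, but the response has not come back yet)

    Condition 5 is exactly what solves the problem raised above: if the Send object in a channel has only been partly sent, the node will not be in the ready state next time.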
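Conditions 4 and 5 can be exercised on their own with a stripped-down version of the check. `ToyRequest` and `SendGate` are hypothetical names; the boolean `fullySent` stands in for request.completed().

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a request; fullySent plays the role of completed().
class ToyRequest {
    final boolean fullySent;

    ToyRequest(boolean fullySent) {
        this.fullySent = fullySent;
    }
}

class SendGate {
    // Same shape as the canSendMore check quoted above:
    // nothing in flight, or newest request fully written and room left.
    static boolean canSendMore(Deque<ToyRequest> queue, int maxInFlight) {
        return queue == null || queue.isEmpty()
            || (queue.peekFirst().fullySent && queue.size() < maxInFlight);
    }
}
```

With a limit of 5, an empty deque passes, a deque whose head request is only partly sent is blocked, and a deque holding 5 fully sent requests is blocked until one response arrives and its request is removed.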

    The client.poll function

    Now let's look at how client.poll wraps selector.poll:

        public List<ClientResponse> poll(long timeout, long now) {  
            long metadataTimeout = metadataUpdater.maybeUpdate(now);  
            try {  
                this.selector.poll(Utils.min(timeout, metadataTimeout, requestTimeoutMs));  
            } catch (IOException e) {  
                log.error("Unexpected error during I/O", e);  
            }  
      
            //As mentioned above, selector.poll puts its results into a set of state variables (the output result set); this is where those results get processed.  
      
            long updatedNow = this.time.milliseconds();  
            List<ClientResponse> responses = new ArrayList<>();  
            handleCompletedSends(responses, updatedNow);  
            handleCompletedReceives(responses, updatedNow);  
            handleDisconnections(responses, updatedNow);  
            handleConnections();  
            handleTimedOutRequests(responses, updatedNow);  
      
            // invoke callbacks  
            for (ClientResponse response : responses) {  
                if (response.request().hasCallback()) {  
                    try {  
                        response.request().callback().onComplete(response);  
                    } catch (Exception e) {  
                        log.error("Uncaught error in request completion:", e);  
                    }  
                }  
            }  
      
            return responses;  
       }  
      
    //The state variables in Selector are cleared before each poll and filled in during each poll.  
    //client.poll then processes these results  
    public class Selector implements Selectable {  
        ...  
        private final List<Send> completedSends;  
        private final List<NetworkReceive> completedReceives;  
        private final Map<KafkaChannel, Deque<NetworkReceive>> stagedReceives;  
        private final List<String> disconnected;  
        private final List<String> connected;  
        ...  
    }
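The "clear before poll, fill during poll" contract has one practical consequence: results must be consumed before the next poll, or they are gone. A toy model, with `ToyPoller` as a hypothetical class and strings standing in for Send objects:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model (hypothetical class) of the "clear, then fill" contract of poll().
class ToyPoller {
    private final Deque<String> staged = new ArrayDeque<>();       // input: staged sends
    private final List<String> completedSends = new ArrayList<>(); // output of the last poll

    void send(String payload) {
        staged.addLast(payload);
    }

    void poll() {
        completedSends.clear(); // results of the previous poll are dropped
        while (!staged.isEmpty())
            completedSends.add(staged.pollFirst()); // pretend every staged send completes
    }

    // Valid only until the next poll() call.
    List<String> completedSends() {
        return completedSends;
    }
}
```

This is why client.poll handles the completed sends, receives, and disconnections immediately after selector.poll returns.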
    

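    Connection detection and automatic reconnection

    All programming with long-lived TCP connections has to solve one basic problem: how do you tell whether a connection has been dropped? The client needs to track the state of every connection (connecting, connected, disconnected) and act differently depending on that state.

    But NIO has no function that tells you directly whether a connection has been dropped, and NetworkClient does not start a thread that keeps sending heartbeat messages to probe the connection either. So how does it handle this?

    How a dropped connection is detected

    The NetworkClient implementation uses three techniques to decide whether a connection has been dropped:

    Technique 1: all the I/O functions (connect, finishConnect, read, write) can throw IOException, so whenever one of these calls throws, the connection is considered dropped.

    Technique 2: selectionKey.isValid()

    Technique 3: inFlightRequests. Every request that is sent has a deadline for its response. If the response has not come back within that time, the connection is considered dropped.

    The first two techniques both live in the Selector.poll function: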

    public void poll(long timeout) throws IOException {  
        if (timeout < 0)  
            throw new IllegalArgumentException("timeout should be >= 0");  
        clear();  
        if (hasStagedReceives())  
            timeout = 0;  
        /* check ready keys */  
        long startSelect = time.nanoseconds();  
        int readyKeys = select(timeout);  
        long endSelect = time.nanoseconds();  
        currentTimeNanos = endSelect;  
        this.sensors.selectTime.record(endSelect - startSelect, time.milliseconds());  
      
        if (readyKeys > 0) {  
            Set<SelectionKey> keys = this.nioSelector.selectedKeys();  
            Iterator<SelectionKey> iter = keys.iterator();  
            while (iter.hasNext()) {  
                SelectionKey key = iter.next();  
                iter.remove();  
                KafkaChannel channel = channel(key);  
      
                // register all per-connection metrics at once  
                sensors.maybeRegisterConnectionMetrics(channel.id());  
                lruConnections.put(channel.id(), currentTimeNanos);  
      
                try {  
                    /* complete any connections that have finished their handshake */  
                    if (key.isConnectable()) {  
                        channel.finishConnect();  
                        this.connected.add(channel.id());  
                        this.sensors.connectionCreated.record();  
                    }  
      
                    /* if channel is not ready finish prepare */  
                    if (channel.isConnected() && !channel.ready())  
                        channel.prepare();  
      
                    /* if channel is ready read from any connections that have readable data */  
                    if (channel.ready() && key.isReadable() && !hasStagedReceive(channel)) {  
                        NetworkReceive networkReceive;  
                        while ((networkReceive = channel.read()) != null)  
                            addToStagedReceives(channel, networkReceive);  
                    }  
      
                    /* if channel is ready write to any sockets that have space in their buffer and for which we have data */  
                    if (channel.ready() && key.isWritable()) {  
                        Send send = channel.write();  
                        if (send != null) {  
                            this.completedSends.add(send);  
                            this.sensors.recordBytesSent(channel.id(), send.size());  
                        }  
                    }  
      
                    if (!key.isValid()) {   //technique 2  
                        close(channel);  
                        this.disconnected.add(channel.id());  
                    }  
                } catch (Exception e) {  //technique 1: if any I/O function throws, the connection is considered dropped  
                    String desc = channel.socketDescription();  
                    if (e instanceof IOException)  
                        log.debug("Connection with {} disconnected", desc, e);  
                    else  
                        log.warn("Unexpected error from {}; closing connection", desc, e);  
                    close(channel);  
                    this.disconnected.add(channel.id());  
                }  
            }  
        }  
      
        addToCompletedReceives();  
      
        long endIo = time.nanoseconds();  
        this.sensors.ioTime.record(endIo - endSelect, time.milliseconds());  
        maybeCloseOldestConnection();  
    }
    

    The third technique is in NetworkClient:

    public List<ClientResponse> poll(long timeout, long now) {  
        long metadataTimeout = metadataUpdater.maybeUpdate(now);  
        try {  
            this.selector.poll(Utils.min(timeout, metadataTimeout, requestTimeoutMs));  
        } catch (IOException e) {  
            log.error("Unexpected error during I/O", e);  
        }  
      
        long updatedNow = this.time.milliseconds();  
        List<ClientResponse> responses = new ArrayList<>();  
        handleCompletedSends(responses, updatedNow);  
        handleCompletedReceives(responses, updatedNow);  
        handleDisconnections(responses, updatedNow);  
        handleConnections();  
        handleTimedOutRequests(responses, updatedNow); //technique 3: handle all timed-out requests  
      
        for (ClientResponse response : responses) {  
            if (response.request().hasCallback()) {  
                try {  
                    response.request().callback().onComplete(response);  
                } catch (Exception e) {  
                    log.error("Uncaught error in request completion:", e);  
                }  
            }  
        }  
      
        return responses;  
    }  
      
    private void processDisconnection(List<ClientResponse> responses, String nodeId, long now) {  
        connectionStates.disconnected(nodeId, now);  
        for (ClientRequest request : this.inFlightRequests.clearAll(nodeId)) {  
            log.trace("Cancelled request {} due to node {} being disconnected", request, nodeId);  
            if (!metadataUpdater.maybeHandleDisconnection(request)) //metadata requests are excluded; for every other request, a timeout means the connection is considered dropped  
                responses.add(new ClientResponse(request, now, true, null));  
        }  
    }
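The body of handleTimedOutRequests is not shown in this article. As a rough sketch of the idea behind technique 3 only, the scan below flags every node whose oldest in-flight request has waited longer than the timeout. `TimeoutScan`, `timedOutNodes`, and the map of send times are all hypothetical; this is not NetworkClient's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TimeoutScan {
    // oldestSendTimeMs: node id -> send time (ms) of that node's oldest in-flight request.
    // Returns the nodes whose oldest request has waited longer than requestTimeoutMs.
    static List<String> timedOutNodes(Map<String, Long> oldestSendTimeMs,
                                      long nowMs,
                                      long requestTimeoutMs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : oldestSendTimeMs.entrySet()) {
            long waitedMs = nowMs - entry.getValue();
            if (waitedMs > requestTimeoutMs)
                result.add(entry.getKey()); // treat this node's connection as dropped
        }
        return result;
    }
}
```

Only the oldest request per node needs checking, because responses come back in order: if the oldest one has not timed out, none of the later ones has either.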
    

    Besides the two places above, there is one more: when the connection is first initiated.

    private void initiateConnect(Node node, long now) {  
        String nodeConnectionId = node.idString();  
        try {  
            log.debug("Initiating connection to node {} at {}:{}.", node.id(), node.host(), node.port());  
            this.connectionStates.connecting(nodeConnectionId, now);  
            selector.connect(nodeConnectionId,  
                             new InetSocketAddress(node.host(), node.port()),  
                             this.socketSendBuffer,  
                             this.socketReceiveBuffer);  
        } catch (IOException e) { //the connection is detected as dropped  
            connectionStates.disconnected(nodeConnectionId, now);  
            metadataUpdater.requestUpdate();  
            log.debug("Error connecting to node {} at {}:{}:", node.id(), node.host(), node.port(), e);  
        }  
    }
    

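    When detection happens

    From the code above we can see that connections are checked at two points:
    one is when the connection is first established, and the other is every poll loop: each call to poll collects a set of dropped connections.

    Below are the data structures for connection state in Selector and in NetworkClient respectively: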

    //connection state in Selector  
    public class Selector implements Selectable {  
        private final List<String> disconnected;  
        private final List<String> connected;  
        ..  
    }
    
    //connection state tracking in NetworkClient  
    public class NetworkClient implements KafkaClient {  
        private final ClusterConnectionStates connectionStates;  
        ...  
    }  
      
    final class ClusterConnectionStates {  
        private final long reconnectBackoffMs; //interval between reconnection attempts  
        private final Map<String, NodeConnectionState> nodeState;  
    }  
      
        private static class NodeConnectionState {  
            ConnectionState state;  
            long lastConnectAttemptMs;  //time of the last connection attempt  
            ...  
        }  
      
    public enum ConnectionState {  
        DISCONNECTED, CONNECTING, CONNECTED  
    }
    

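    Summary:

    1. The connection state in Selector is cleared by clear() before each poll and collected again during the poll.
    2. The connection state in Selector is passed up to NetworkClient, which uses it to update its own connection state.
    3. Besides what comes from Selector, NetworkClient's own inFlightRequests (technique 3 above) is also used to detect connection state.

    Together, these mechanisms let NetworkClient keep an accurate, up-to-date view of the state of every connection.

    Automatic reconnection: the ready function

    With the state known, what remains is automatic reconnection. This happens one layer up, in Sender's run function: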

    //Sender  
        public void run(long now) {  
            Cluster cluster = metadata.fetch();  
            RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);  
      
            if (result.unknownLeadersExist)  
                this.metadata.requestUpdate();  
      
            Iterator<Node> iter = result.readyNodes.iterator();  
            long notReadyTimeout = Long.MAX_VALUE;  
            while (iter.hasNext()) {  
                Node node = iter.next();  
                if (!this.client.ready(node, now)) {  //the key ready function  
                    iter.remove();  
                    notReadyTimeout = Math.min(notReadyTimeout, this.client.connectionDelay(node, now));  
                }  
            }  
            // ... (rest of run() omitted)  
        }  
      
        public boolean ready(Node node, long now) {  
            if (isReady(node, now))  
                return true;  
      
            if (connectionStates.canConnect(node.idString(), now))  
                initiateConnect(node, now);   //initiate a reconnect  
      
            return false;  
        }  
      
        public boolean canConnect(String id, long now) {  
            NodeConnectionState state = nodeState.get(id);  
            if (state == null)  
                return true;  
            else  
                return state.state == ConnectionState.DISCONNECTED && now - state.lastConnectAttemptMs >= this.reconnectBackoffMs;  
        }
    

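    As these functions show, every time Sender is about to send data it first calls client.ready(node) to check whether the connection to that node is usable.

    Inside ready, if the connection is not in the connected state, it further checks whether an automatic reconnect may be initiated. There are two conditions:

    Condition 1: the connection must not be in the connecting state; it must be disconnected.
    Condition 2: reconnects must not be too frequent. A certain interval must have passed since the last connection attempt. If the broker is down, reconnecting too often does not help anyway.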
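The two conditions can be tried out with a small state tracker. `ToyConnectionStates` and its methods are hypothetical names loosely modeled on the canConnect code above; time is passed in explicitly so the backoff can be checked without sleeping.

```java
import java.util.HashMap;
import java.util.Map;

// Toy tracker (hypothetical class) for per-node connection state with reconnect backoff.
class ToyConnectionStates {
    enum State { DISCONNECTED, CONNECTING, CONNECTED }

    private static final class NodeState {
        State state;
        long lastConnectAttemptMs;

        NodeState(State state, long lastConnectAttemptMs) {
            this.state = state;
            this.lastConnectAttemptMs = lastConnectAttemptMs;
        }
    }

    private final long reconnectBackoffMs;
    private final Map<String, NodeState> nodes = new HashMap<>();

    ToyConnectionStates(long reconnectBackoffMs) {
        this.reconnectBackoffMs = reconnectBackoffMs;
    }

    // A never-seen node may connect; a known node must be disconnected and past the backoff.
    boolean canConnect(String id, long nowMs) {
        NodeState s = nodes.get(id);
        if (s == null)
            return true;
        return s.state == State.DISCONNECTED
            && nowMs - s.lastConnectAttemptMs >= reconnectBackoffMs;
    }

    // Record that a connection attempt started at nowMs.
    void connecting(String id, long nowMs) {
        nodes.put(id, new NodeState(State.CONNECTING, nowMs));
    }

    // Record that the connection dropped; the attempt timestamp is kept for the backoff.
    void disconnected(String id) {
        NodeState s = nodes.get(id);
        if (s != null)
            s.state = State.DISCONNECTED;
    }
}
```

With a backoff of 50 ms and an attempt at time 0, a node that drops at time 10 is refused a reconnect until time 50, which keeps the client from hammering a broker that is down.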

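    There is a key point here: because all calls are non-blocking, detecting a dropped connection in this pass only initiates a new connection; the code that follows runs without waiting for the connection to be established.
    Whether the connection has been established is determined after poll; the connection may only be ready by the next poll or a few polls later, and only then does ready return true.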
