Checking zookeeper node status with Python

Author: 平凡的运维之路 | Published 2023-12-19 19:29

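    Checking zookeeper node data status with Python

    Requirements

    • The project's core modules use zookeeper for distributed locking, but nothing monitors the state of the zookeeper nodes, so that monitoring needs to be added.
    • Monitor the state of each zookeeper node; if a node is abnormal or does not exist, output an error keyword or raise an alarm.

    Implementation

    • Use Python's zookeeper client module (kazoo) to read the state of each zookeeper node.
    • Have the Python script write error keywords to its log, so that abnormal node states are pushed to the network management system as alarms.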
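
The second bullet relies on the alarm side matching keywords in the log. As a rough illustration only, the sketch below scans log lines in the format this script produces and collects the node paths reported as missing. The function name `find_missing_nodes` and the choice of `ERROR` plus `不存在` ("does not exist") as the keywords are assumptions made for this example; the article does not describe how the network management system actually matches them.

```python
import re

# Assumed keyword pattern: an ERROR-level line that names a node and ends with "不存在".
# The log layout follows the script's formatter: time, thread id, file:line, level, message.
MISSING_RE = re.compile(r"\sERROR\s.*?节点\s+(\S+)\s+不存在")


def find_missing_nodes(lines):
    """Return the node paths reported as missing, in first-seen order, without duplicates."""
    seen = []
    for line in lines:
        match = MISSING_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    "2023-12-20 19:26:35,663 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti2 正常",
    "2023-12-20 19:26:35,664 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti3 不存在",
    "2023-12-20 19:26:45,676 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti3 不存在",
    "2023-12-20 19:26:35,665 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti4 不存在",
]
print(find_missing_nodes(sample))
```

Repeated reports for the same node collapse into one entry, which is usually what an alarm rule wants when the check runs every few seconds.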

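    Known issues

    • DataWatch only watches node data. When a monitored node does not exist and the data of another node (cti1) changes, the notification is reported against the non-existent node (cti5), as the log below shows. In the log messages, 不存在 means "does not exist" and 正常 means "normal".
    # When the node data is modified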
    [zk: localhost:2182(CONNECTED) 28] set /ctimanager/bj-cti1  {"agent":2,"test":1,"a1":2ok23121111111}
    2023-12-20 19:26:35,654 140538316216128 check_zookeeper_node.py:73 INFO get config file  zookeeper node list: ['/ctimanager/bj-cti1', '/ctimanager/bj-cti2', '/ctimanager/bj-cti3', '/ctimanager/bj-cti4', '/ctimanager/bj-cti5']
    2023-12-20 19:26:35,654 140538316216128 check_zookeeper_node.py:76 INFO zookeeper login host: 127.0.0.1:2182
    2023-12-20 19:26:35,660 140538316216128 check_zookeeper_node.py:81 INFO use user login auth conn zookeeper 
    2023-12-20 19:26:35,661 140538316216128 check_zookeeper_node.py:83 INFO zookeeper user info: digest admin:123456
    2023-12-20 19:26:35,662 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti1 的数据获取信息: b'{"agent":2,"test":1,"a1":2ok2}' 状态正常
    2023-12-20 19:26:35,662 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti1 正常
    2023-12-20 19:26:35,662 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti2 的数据获取信息: b'{"agent":1}' 状态正常
    2023-12-20 19:26:35,663 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti2 正常
    2023-12-20 19:26:35,664 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti3 不存在
    2023-12-20 19:26:35,665 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti4 不存在
    2023-12-20 19:26:35,666 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti5 不存在
    2023-12-20 19:26:40,915 140538087216896 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti5 的数据获取信息: b'{"agent":2,"test":1,"a1":2ok2222}' 状态正常
    2023-12-20 19:26:40,915 140538087216896 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti5 正常
    2023-12-20 19:26:45,674 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti1 的数据获取信息: b'{"agent":2,"test":1,"a1":2ok2222}' 状态正常
    2023-12-20 19:26:45,675 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti1 正常
    2023-12-20 19:26:45,675 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti2 的数据获取信息: b'{"agent":1}' 状态正常
    2023-12-20 19:26:45,675 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti2 正常
    2023-12-20 19:26:45,676 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti3 不存在
    2023-12-20 19:26:45,677 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti4 不存在
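
One plausible reading of this log, offered as an assumption and not as a confirmed diagnosis: the callback looks up `node_path` when it runs, not when it is registered. A notification that arrives later on another thread (note the different thread id on the 19:26:40 lines) is then logged with whatever the loop variable last held, here `/ctimanager/bj-cti5`, even though the data belongs to cti1. The sketch below reproduces that behaviour with plain Python closures, with no zookeeper involved, and shows a common remedy: bind the path when the callback is created. `make_callback` and the `callbacks_*` lists are hypothetical names used only for this illustration.

```python
node_paths = ["/ctimanager/bj-cti1", "/ctimanager/bj-cti5"]

# Late binding: each callback reads the loop variable when it is called.
callbacks_late = []
for node_path in node_paths:
    def on_change_late(data):
        return f"{node_path}: {data}"
    callbacks_late.append(on_change_late)


def make_callback(bound_path):
    """Return a callback that remembers the path it was created for."""
    def on_change(data):
        return f"{bound_path}: {data}"
    return on_change


# Early binding: the path is fixed when the callback is created.
callbacks_bound = [make_callback(p) for p in node_paths]

# Simulate a later notification for the first node (bj-cti1).
print(callbacks_late[0](b"new"))   # reports the last path in the list
print(callbacks_bound[0](b"new"))  # reports the first path
```

With kazoo, the same idea would mean registering a function produced by a factory such as `make_callback(node_path)` instead of defining the callback directly inside the loop.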
    

    Implementation code

    • Reading the state of the zookeeper nodes
    #!/usr/bin/python3 
    # -*- coding:utf-8 -*-
    ###############################################################
    # Platform monitoring: output an alarm when a module registered
    # in zookeeper goes down and its node disappears.
    # 2023-12-20 16:02
    ###############################################################
    
    import configparser
    import logging
    import os
    import time
    from logging.handlers import RotatingFileHandler

    from kazoo.client import KazooClient
    from kazoo.exceptions import KazooException
    
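    def watch_nodes():
        # Register a DataWatch on every configured node. kazoo calls the callback
        # once at registration and again whenever the node data changes.
        for node_path in node_paths:
            try:
                @zk.DataWatch(node_path)
                def on_data_change(data, stat):
                    # node_path is looked up when the callback runs, not when it is
                    # registered (this is behind the known issue described earlier).
                    if data is None:
                        # Message: node <path> does not exist in zookeeper
                        logger.error(f"从zookeeper获取节点 {node_path} 不存在")
                    else:
                        # Messages: data read from node <path>, state normal
                        logger.debug(f"从zookeeper节点 {node_path} 的数据获取信息: {data} 状态正常")
                        logger.info(f"从zookeeper获取节点 {node_path} 正常")
            except KazooException as e:
                # Message: exception while watching node <path>
                logger.error(f"从zookeeper监控节点 {node_path} 时发生异常:{e}")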
    
    def logger_func(LogLevel):
        # Create the logger. log_file, max_log_size and backup_count are
        # module-level names set in the main block before this is called.
        logger = logging.getLogger("my_logger")
        # Map the configured level name to a logging level;
        # an unrecognised name leaves the level unchanged.
        levels = {
            "DEBUG": logging.DEBUG,
            "INFO": logging.INFO,
            "WARNING": logging.WARNING,
            "ERROR": logging.ERROR,
        }
        if LogLevel in levels:
            logger.setLevel(levels[LogLevel])
        # Create the RotatingFileHandler
        handler = RotatingFileHandler(log_file, maxBytes=max_log_size, backupCount=backup_count)

        # Define the log format
        formatter = logging.Formatter("%(asctime)s %(thread)d %(filename)s:%(lineno)d %(levelname)s %(message)s")
        handler.setFormatter(formatter)

        # Attach the handler to the logger
        logger.addHandler(handler)
        return logger
    
    if __name__ == "__main__":
        # The config file is resolved relative to the current working directory
        dirpath = os.getcwd()
        cfgpath = os.path.join(dirpath, "cfg/config.ini")
        conf = configparser.ConfigParser()
        print("config file ---> ",cfgpath)
        # Read as UTF-8 so non-ASCII comments in the config file parse cleanly
        conf.read(cfgpath, encoding="utf-8")
        GetLogDir = conf.get("Base","LogDir")
        LogLevel = conf.get("Base","LogLevel")
        CheckintervalDate = conf.get("Base","CheckintervalDate")
        max_log_size = int(conf.get("Base","max_log_size"))
        backup_count = int(conf.get("Base","backup_count"))
        zk_hosts = conf.get("zookeeper","zookeeper_host")
        is_auth = conf.getboolean("zookeeper","is_auth")
        username = conf.get("zookeeper","zookeeper_user")
        password = conf.get("zookeeper","zookeeper_passwd")
        zookeeper_node = conf.get("zookeeper","zookeeper_node")

        # Write the log under the configured LogDir (the directory must already exist)
        log_file = os.path.join(GetLogDir, "watch_zoo.log")
        logger = logger_func(LogLevel)

        # Paths of the nodes to monitor
        node_paths = zookeeper_node.split(",")
        logger.info("get config file  zookeeper node list: " + str(node_paths))

        # Connect to zookeeper
        logger.info("zookeeper login host: " + zk_hosts) 
        zk = KazooClient(hosts=zk_hosts)
        try:
            zk.start()
            if is_auth:
                logger.info("use user login auth conn zookeeper ")
                zk.add_auth(scheme='digest',credential=username + ":" + password)
                # Caution: this writes the password to the log in plain text
                logger.info("zookeeper user info: digest " + str(username) + ":" + password )
            else:
                logger.info("not use user auth login zookeeper")

            # Each pass registers the watches again; the callback fires once on
            # registration, which yields one status line per node per interval.
            while True:
                watch_nodes()
                time.sleep(int(CheckintervalDate))

        except KazooException as e:
            # Log the error, then stop with exit status 110
            if "Connection refused" in str(e):
                logger.error("conn zookeeper error: " + str(e) + " exit!!!")
                raise SystemExit(110)
            else:
                logger.error("conn other zookeeper error: " + str(e) + " exit!!!" )
                raise SystemExit(110)
        finally:
            zk.stop()
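
Because the script calls `watch_nodes()` on every pass of the loop, a new `DataWatch` is registered for each node every interval. If periodic polling is all that is needed, a simpler alternative (not the approach used above) is to ask on each pass whether every node exists. The sketch below assumes a client object with an `exists(path)` method that returns a stat object or `None`, which is how kazoo's `KazooClient.exists` behaves. `check_nodes` and the stand-in `FakeClient` are hypothetical names, used so the example runs without a zookeeper server.

```python
import logging

logger = logging.getLogger("my_logger")


def check_nodes(client, node_paths):
    """Log the state of each node once and return the list of missing paths."""
    missing = []
    for path in node_paths:
        if client.exists(path) is None:
            logger.error(f"从zookeeper获取节点 {path} 不存在")
            missing.append(path)
        else:
            logger.info(f"从zookeeper获取节点 {path} 正常")
    return missing


class FakeClient:
    """Stand-in for a connected KazooClient; only exists() is modelled."""

    def __init__(self, present):
        self.present = set(present)

    def exists(self, path):
        # A real client returns a ZnodeStat here; any non-None value will do.
        return object() if path in self.present else None


client = FakeClient(["/ctimanager/bj-cti1", "/ctimanager/bj-cti2"])
paths = ["/ctimanager/bj-cti1", "/ctimanager/bj-cti2", "/ctimanager/bj-cti3"]
print(check_nodes(client, paths))
```

In the script, a call such as `check_nodes(zk, node_paths)` followed by the sleep could take the place of `watch_nodes()` inside the `while True` loop. Each path is passed in as an argument, so a log line always names the node that was actually checked.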
    
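    • Configuration file
    [Base]
    # Sleep interval between checks, in seconds
    CheckintervalDate=10
    # Log level: DEBUG, INFO, WARNING or ERROR
    LogLevel=INFO
    # Log directory
    LogDir=./log
    # Maximum log file size in bytes: 100 MB
    max_log_size = 104857600
    # Maximum number of rotated log backups to keep
    backup_count = 10
    
    [zookeeper]
    # zookeeper host to connect to (ip:port)
    zookeeper_host = 127.0.0.1:2182
    # User for authentication
    zookeeper_user = admin
    # Password for authentication
    zookeeper_passwd = 123456
    # Whether to connect to zookeeper with password authentication
    is_auth = True
    # Nodes to monitor. Keep this list in step with the CTI instances that register: configure one node per CTI that reports in
    zookeeper_node = /ctimanager/bj-cti1,/ctimanager/bj-cti2,/ctimanager/bj-cti3,/ctimanager/bj-cti4,/ctimanager/bj-cti5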
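
To show how these settings are read, here is a small sketch that parses an inline copy of part of the file with `configparser`. The `load_settings` helper and the whitespace stripping of node names are additions for illustration and are not part of the original script.

```python
import configparser

SAMPLE = """
[Base]
CheckintervalDate=10
LogLevel=INFO

[zookeeper]
zookeeper_host = 127.0.0.1:2182
is_auth = True
zookeeper_node = /ctimanager/bj-cti1, /ctimanager/bj-cti2,/ctimanager/bj-cti3
"""


def load_settings(text):
    """Parse the ini text and return the values the checker needs as a dict."""
    conf = configparser.ConfigParser()
    conf.read_string(text)
    raw_nodes = conf.get("zookeeper", "zookeeper_node")
    return {
        "interval": conf.getint("Base", "CheckintervalDate"),
        "log_level": conf.get("Base", "LogLevel"),
        "hosts": conf.get("zookeeper", "zookeeper_host"),
        "is_auth": conf.getboolean("zookeeper", "is_auth"),
        # Strip spaces and skip empty entries so "a, b," still yields clean paths.
        "nodes": [n.strip() for n in raw_nodes.split(",") if n.strip()],
    }


settings = load_settings(SAMPLE)
print(settings["interval"], settings["is_auth"], settings["nodes"])
```

Stripping each entry guards against a stray space after a comma, which would otherwise become part of the node path and make an existing node look missing.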
    

    Script output

    • Running the script
    [devops@my-dev watch_zookeeper]$ ./check_zookeeper_node.py 
    config file --->  /home/devops/Python/ABC/watch_zookeeper/cfg/config.ini
    2023-12-20 19:20:18,946 139836511770432 check_zookeeper_node.py:73 INFO get config file  zookeeper node list: ['/ctimanager/bj-cti1', '/ctimanager/bj-cti2', '/ctimanager/bj-cti3', '/ctimanager/bj-cti4', '/ctimanager/bj-cti5']
    2023-12-20 19:26:35,654 140538316216128 check_zookeeper_node.py:73 INFO get config file  zookeeper node list: ['/ctimanager/bj-cti1', '/ctimanager/bj-cti2', '/ctimanager/bj-cti3', '/ctimanager/bj-cti4', '/ctimanager/bj-cti5']
    2023-12-20 19:26:35,654 140538316216128 check_zookeeper_node.py:76 INFO zookeeper login host: 127.0.0.1:2182
    2023-12-20 19:26:35,660 140538316216128 check_zookeeper_node.py:81 INFO use user login auth conn zookeeper 
    2023-12-20 19:26:35,661 140538316216128 check_zookeeper_node.py:83 INFO zookeeper user info: digest admin:123456
    2023-12-20 19:26:35,662 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti1 的数据获取信息: b'{"agent":2,"test":1,"a1":2ok2}' 状态正常
    2023-12-20 19:26:35,662 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti1 正常
    2023-12-20 19:26:35,662 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti2 的数据获取信息: b'{"agent":1}' 状态正常
    2023-12-20 19:26:35,663 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti2 正常
    2023-12-20 19:26:35,664 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti3 不存在
    2023-12-20 19:26:35,665 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti4 不存在
    2023-12-20 19:26:35,666 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti5 不存在
    2023-12-20 19:26:40,915 140538087216896 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti5 的数据获取信息: b'{"agent":2,"test":1,"a1":2ok2222}' 状态正常
    2023-12-20 19:26:40,915 140538087216896 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti5 正常
    2023-12-20 19:26:45,674 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti1 的数据获取信息: b'{"agent":2,"test":1,"a1":2ok2222}' 状态正常
    2023-12-20 19:26:45,675 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti1 正常
    2023-12-20 19:26:45,675 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti2 的数据获取信息: b'{"agent":1}' 状态正常
    2023-12-20 19:26:45,675 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti2 正常
    2023-12-20 19:26:45,676 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti3 不存在
    2023-12-20 19:26:45,677 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti4 不存在
    2023-12-20 19:26:45,678 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti5 不存在
    2023-12-20 19:26:55,685 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti1 的数据获取信息: b'{"agent":2,"test":1,"a1":2ok2222}' 状态正常
    2023-12-20 19:26:55,685 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti1 正常
    2023-12-20 19:26:55,686 140538316216128 check_zookeeper_node.py:21 DEBUG 从zookeeper节点 /ctimanager/bj-cti2 的数据获取信息: b'{"agent":1}' 状态正常
    2023-12-20 19:26:55,686 140538316216128 check_zookeeper_node.py:22 INFO 从zookeeper获取节点 /ctimanager/bj-cti2 正常
    2023-12-20 19:26:55,687 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti3 不存在
    2023-12-20 19:26:55,688 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti4 不存在
    2023-12-20 19:26:55,689 140538316216128 check_zookeeper_node.py:19 ERROR 从zookeeper获取节点 /ctimanager/bj-cti5 不存在
    
