zabbix + python: monitoring a Redis cluster


Author: OrangeLoveMilan | Published 2017-12-21 21:28 | Read 276 times

    Contents

    • Monitoring requirements
    • Monitoring script
    • Monitoring dashboard
    • Summary

    Monitoring requirements

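    Our company runs a Redis cluster with three masters and three replicas. It is highly available, but it still needs monitoring so that problems are noticed promptly. There are ready-made Zabbix templates for Redis online, but their item sets are not lean enough, so I wrote my own in Python.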

    Monitoring script

    pip install "redis-py-cluster<2.0"   # provides the rediscluster module; StrictRedisCluster was dropped in 2.0

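    #!/usr/local/python/shims/python
    import sys
    
    from rediscluster import StrictRedisCluster
    
    import mylog  # the author's own logging helper, not included in the post
    
    # Startup nodes of the 3-master/3-replica cluster ('ip' and port are placeholders)
    redis_nodes =  [{'host':'ip','port':port},
                    {'host':'ip','port':port},
                    {'host':'ip','port':port},
                    {'host':'ip','port':port},
                    {'host':'ip','port':port},
                    {'host':'ip','port':port},
                   ]
    
    # Connect in cluster mode; if that fails, log the error and stop the script
    
    try:
        # decode_responses=True returns str instead of bytes on Python 3
        redisconn = StrictRedisCluster(startup_nodes=redis_nodes, password='pwd',
                                       decode_responses=True)
    except Exception as e:
        mylog.logging.error('%s' % e)
        sys.exit(1)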
    
    # Define variables
    
    
    data = {}
    NodeData = {}
    hit = 0
    misshit = 0
    hitrate = 0.00
    
    # Functions that fetch the monitored items
    
    
    def ClusterState(item):
        cluster_state = redisconn.execute_command('cluster','info')
        # CLUSTER INFO replies with "key:value" lines separated by CRLF;
        # skip lines without ':' (e.g. the empty string after the final CRLF)
        for line in cluster_state.split('\r\n'):
            if ':' in line:
                key, value = line.split(':', 1)
                data[key] = value
        if item == 'clusterstatus':
            # 1 if the cluster reports ok, 0 otherwise
            state = data['cluster_state']
            if state == 'ok':
                item = 1
            else:
                item = 0
            return item
    
        elif item == 'clusterslotsfail':
            item = data['cluster_slots_fail']
            return item
    
        elif item == 'clusterknownnodes':
            item = data['cluster_known_nodes']
            return item
        else:
            # unknown item
            return 9999
    
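    def NodeInfoServer(item):
        # In cluster mode info() returns {'host:port': {...}} for every node;
        # 'ip:port' stands for the node this item reports on
        node_info = redisconn.info('Server')
        NodeData = node_info['ip:port']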
        if item == 'uptime_in_days':
            item = NodeData['uptime_in_days']
            return item
        else:
            return 9999   
        
    def NodeInfoClients(item):
        node_info = redisconn.info('Clients')
        NodeData = node_info['ip:port']
        if item == 'connected_clients':
            item = NodeData['connected_clients']
            return item
        else:
            return 9999
    
    def NodeInfoMemory(item):
        node_info = redisconn.info('Memory')
        NodeData = node_info['ip:port']
        if item == 'used_memory_human':
            item = NodeData['used_memory_human']
            return item
        elif item == 'total_system_memory_human':
            item = NodeData['total_system_memory_human']
            return item
        else:
            return 9999
    
    def NodeInfoPersistence(item):
        node_info = redisconn.info('Persistence')
        NodeData = node_info['ip:port']
        
        if item == 'rdb_last_bgsave_status':
            item = NodeData['rdb_last_bgsave_status']    
            if item == 'ok' :
                item = 1
            else:
                item = 0
            return item
        else:
            return 9999
    
    def NodeInfoStats(item):
        node_info = redisconn.info('Stats')
        NodeData = node_info['ip:port']
        if item == 'instantaneous_ops_per_sec':
            item = NodeData['instantaneous_ops_per_sec']
            return item
        elif item == 'instantaneous_input_kbps':
            item = NodeData['instantaneous_input_kbps']
            return item
        elif item == 'instantaneous_output_kbps':
            item = NodeData['instantaneous_output_kbps']
            return item
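        elif item == 'hit':
            hit = NodeData['keyspace_hits']
            misshit = NodeData['keyspace_misses']
            # guard against ZeroDivisionError when the node has had no lookups yet
            if hit + misshit == 0:
                return 0.0
            hitrate = round(float(hit) / float(hit + misshit), 3)
            item = hitrate
            return item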
        else:
            return 9999
    
    # Script argument: Zabbix passes the item name, the script prints its value
    
    if len(sys.argv) < 2:
        print(9999)
        sys.exit(1)
    
    if sys.argv[1] == 'status':
        print(ClusterState('clusterstatus'))
    elif sys.argv[1] == 'slotsfail':
        print(ClusterState('clusterslotsfail'))
    elif sys.argv[1] == 'nodes':
        print(ClusterState('clusterknownnodes'))
    elif sys.argv[1] == 'day':
        print(NodeInfoServer('uptime_in_days'))
    elif sys.argv[1] == 'clients':
        print(NodeInfoClients('connected_clients'))
    elif sys.argv[1] == 'usememory':
        print(NodeInfoMemory('used_memory_human'))
    elif sys.argv[1] == 'sysmemory':
        print(NodeInfoMemory('total_system_memory_human'))
    elif sys.argv[1] == 'rdb':
        print(NodeInfoPersistence('rdb_last_bgsave_status'))
    elif sys.argv[1] == 'ops':
        print(NodeInfoStats('instantaneous_ops_per_sec'))
    elif sys.argv[1] == 'input_kbps':
        print(NodeInfoStats('instantaneous_input_kbps'))
    elif sys.argv[1] == 'output_kbps':
        print(NodeInfoStats('instantaneous_output_kbps'))
    elif sys.argv[1] == 'hit':
        print(NodeInfoStats('hit'))
    else:
        print(9999)
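
    The fragile step in ClusterState is parsing the CLUSTER INFO reply: it ends with a CRLF, so splitting on '\r\n' leaves an empty last element that has no ':'. A minimal sketch that pulls the parsing out into plain functions so it can be checked without a cluster. The function names are hypothetical, and the sample is a hand-written, shortened CLUSTER INFO reply, not output from the author's cluster.

```python
def parse_cluster_info(raw):
    """Turn the text reply of CLUSTER INFO into a dict of strings.

    The reply is "key:value" lines separated by CRLF. Lines without ':'
    (such as the empty string left by the trailing CRLF) are skipped
    instead of raising IndexError.
    """
    data = {}
    for line in raw.split('\r\n'):
        if ':' not in line:
            continue
        key, value = line.split(':', 1)
        data[key] = value
    return data


def cluster_status_flag(data):
    """1 if cluster_state is 'ok', else 0 (the encoding of the 'status' item)."""
    return 1 if data.get('cluster_state') == 'ok' else 0


# Hand-written, abbreviated sample reply (hypothetical values).
sample = ('cluster_state:ok\r\n'
          'cluster_slots_assigned:16384\r\n'
          'cluster_slots_fail:0\r\n'
          'cluster_known_nodes:6\r\n')

info = parse_cluster_info(sample)
print(cluster_status_flag(info), info['cluster_slots_fail'], info['cluster_known_nodes'])
# prints: 1 0 6
```

    Values stay strings, just as in the script; Zabbix converts them when the item type is numeric.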
    

    Monitoring dashboard

    [Screenshot: the Zabbix monitoring view]

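    The monitored parameters include input_kbps, output_kbps, ops, Redis uptime in days, memory usage, failed slots, connected clients, persistence (RDB) status, cache hit rate, cluster state and so on. Alert thresholds are set according to what is normal in production.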
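
    The cache hit rate in this list is keyspace_hits / (keyspace_hits + keyspace_misses) from INFO Stats. A minimal sketch of that arithmetic, extended to the whole cluster by summing the counters of every node. It assumes the per-node dict shape the script reads through node_info['ip:port']; the node addresses and counter values are made up, and returning 0.0 when there have been no lookups is a choice made here, not something Redis defines.

```python
def hit_rate(hits, misses, ndigits=3):
    """keyspace_hits / (keyspace_hits + keyspace_misses), rounded.

    Returns 0.0 when there has been no lookup yet (e.g. right after a
    restart) instead of raising ZeroDivisionError.
    """
    total = hits + misses
    if total == 0:
        return 0.0
    return round(float(hits) / total, ndigits)


def cluster_hit_rate(stats_by_node):
    """Hit rate over all nodes of a per-node INFO Stats dict.

    stats_by_node is assumed to look like
    {'host:port': {'keyspace_hits': ..., 'keyspace_misses': ...}, ...}.
    """
    hits = sum(int(s['keyspace_hits']) for s in stats_by_node.values())
    misses = sum(int(s['keyspace_misses']) for s in stats_by_node.values())
    return hit_rate(hits, misses)


# Hypothetical counters for two nodes, only to show the arithmetic.
example = {
    '10.0.0.1:7000': {'keyspace_hits': 900, 'keyspace_misses': 100},
    '10.0.0.2:7001': {'keyspace_hits': 300, 'keyspace_misses': 700},
}

print(hit_rate(900, 100))         # 0.9
print(cluster_hit_rate(example))  # 0.6  (1200 / 2000)
print(hit_rate(0, 0))             # 0.0
```

    Summing raw counters before dividing weights each node by its traffic, which averaging per-node rates would not do.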

    Summary

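    Because my understanding of Redis is limited, only these parameters are monitored for now. The Python code is not very concise either and has plenty of room for improvement. The functionality works for now; I will optimize it when I have time.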
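
    One possible way to make the script more concise, sketched under assumptions: replace the per-section functions and the long if/elif chain with a table that maps each Zabbix item key to an INFO section and field. NODE_ITEMS, node_item and fake_info are hypothetical names; get_info stands in for redisconn.info so the lookup can run without a cluster.

```python
# Zabbix item key -> (INFO section, field). Mirrors the per-node items of
# the script; the table layout is a suggestion, not the author's code.
NODE_ITEMS = {
    'day': ('Server', 'uptime_in_days'),
    'clients': ('Clients', 'connected_clients'),
    'usememory': ('Memory', 'used_memory_human'),
    'sysmemory': ('Memory', 'total_system_memory_human'),
    'rdb': ('Persistence', 'rdb_last_bgsave_status'),
    'ops': ('Stats', 'instantaneous_ops_per_sec'),
    'input_kbps': ('Stats', 'instantaneous_input_kbps'),
    'output_kbps': ('Stats', 'instantaneous_output_kbps'),
}

UNKNOWN = 9999  # same "unsupported item" value the script prints


def node_item(get_info, node, key):
    """Look up one per-node item.

    get_info(section) must return {'host:port': {field: value, ...}},
    the shape the script reads from redisconn.info(section).
    """
    if key not in NODE_ITEMS:
        return UNKNOWN
    section, field = NODE_ITEMS[key]
    value = get_info(section)[node][field]
    if key == 'rdb':
        # same 1/0 encoding as NodeInfoPersistence
        return 1 if value == 'ok' else 0
    return value


def fake_info(section):
    # Hypothetical reply for a single node, for illustration only;
    # it returns the same fields whatever section is requested.
    return {'10.0.0.1:7000': {'uptime_in_days': 12,
                              'rdb_last_bgsave_status': 'ok'}}


print(node_item(fake_info, '10.0.0.1:7000', 'day'))  # 12
print(node_item(fake_info, '10.0.0.1:7000', 'rdb'))  # 1
print(node_item(fake_info, '10.0.0.1:7000', 'foo'))  # 9999
```

    In the real script the call would be along the lines of print(node_item(redisconn.info, 'ip:port', sys.argv[1])), with the cluster-level items (status, slotsfail, nodes) and the hit rate still handled by their own code. Adding a new item then means adding one table row.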


        Article link: https://www.haomeiwen.com/subject/oblzwxtx.html