Monitoring Server Hardware Status with Zabbix IPMI

Author: ygqygq2 | Published 2016-11-18 11:44

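    The company has several branch offices whose server rooms have no dedicated on-duty staff and are not up to data-center standards. We still wanted real-time monitoring of the server-room environment, so we chose IPMI for the job. Since a Zabbix monitoring system was already deployed, this article uses Zabbix's built-in IPMI support to monitor server temperatures, fan speeds and similar readings.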

    1. Environment

    Monitored server model: Dell PowerEdge R510
    Planned IPMI address: 10.103.1.100

    2. Zabbix monitoring platform

    Zabbix version: 3.2.1, built without the --with-openipmi option
    The Zabbix server's network interface can reach 10.103.1.100

    3. Background reading

    Wikipedia, IPMI: http://zh.wikipedia.org/wiki/IPMI
    IBM developerWorks -- Managing servers via IPMI on Linux with ipmitool: http://www.ibm.com/developerworks/cn/linux/l-ipmi/
    Dell -- Managing Dell PowerEdge Servers Using IPMItool: http://www.dell.com/downloads/global/power/ps4q04-20040204-Murphy.pdf
    Zabbix IPMI checks: https://www.zabbix.com/documentation/2.0/manual/config/items/itemtypes/ipmi
    Console redirection with IPMItool (optional reading): http://docs.linuxtone.org/ebooks/Dell/ipmitool.pdf

    4. Configuring IPMI

    4.1. Configuring the IPMI address

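    Following "Managing Dell PowerEdge Servers Using IPMItool" from the background reading above, configure the IPMI address while the server boots and enable IPMI Over LAN.
    Alternatively, enable IPMI through Dell's iDRAC; see the references at the end of the article for details.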


    [Screenshot]
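    If the operating system on the server can already talk to its BMC in-band (OpenIPMI driver loaded), the same settings can also be made with ipmitool instead of the boot-time menu. A minimal sketch, assuming LAN channel 1 (common on Dell BMCs; check with `ipmitool lan print 1`), user ID 3, and a made-up netmask and gateway. The helper `print_ipmi_lan_setup` is hypothetical and only prints the commands, so they can be reviewed before being run on the server:

```bash
# Hypothetical helper: print (do not run) the ipmitool commands that give the
# BMC a static address, enable IPMI over LAN and create a USER-level account.
# Arguments: channel, IP, netmask, gateway, user ID, user name.
print_ipmi_lan_setup() {
    local ch="$1" ip="$2" mask="$3" gw="$4" uid="$5" user="$6"
    echo "ipmitool lan set $ch ipsrc static"
    echo "ipmitool lan set $ch ipaddr $ip"
    echo "ipmitool lan set $ch netmask $mask"
    echo "ipmitool lan set $ch defgw ipaddr $gw"
    echo "ipmitool lan set $ch access on"
    echo "ipmitool user set name $uid $user"
    # Placeholder: substitute the real password when running it by hand
    echo "ipmitool user set password $uid '<password>'"
    echo "ipmitool user priv $uid 2 $ch"    # privilege level 2 = USER
    echo "ipmitool user enable $uid"
}

# Example with this article's address; netmask and gateway are made up
print_ipmi_lan_setup 1 10.103.1.100 255.255.255.0 10.103.1.1 3 sensor
```

A USER-level account is enough for reading sensors, which matches the "Privilege level: User" setting used later in 4.5.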

    4.2. Retrieving sensor information

    Log in to the Zabbix server and read the Dell server's sensor information remotely with ipmitool (installed in 4.3):

    # ipmitool -I lan -H 10.103.1.100 -U root -P calvin -L user sensor list
    # ipmitool -I lan -H 10.103.1.100 -U root -P calvin -L user sensor get "FAN MOD 1B RPM"
    
    [Screenshots: output of the two commands above]
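    `ipmitool sensor list` prints one sensor per row as `|`-separated columns padded with spaces: name, reading, units, status, then the thresholds. To use a single reading in a script, that padding has to be trimmed. A minimal sketch, assuming this column layout; the helper `sensor_reading` is hypothetical. Note also that `-P` exposes the password in the process list, whereas ipmitool's `-E` option takes it from the `IPMI_PASSWORD` environment variable:

```bash
# Hypothetical helper: print the reading (2nd column) of the sensor whose
# name (1st column) matches exactly, from `ipmitool sensor list` on stdin.
sensor_reading() {
    awk -F'|' -v n="$1" '
        {
            name = $1;  gsub(/^ +| +$/, "", name)    # trim padding
            value = $2; gsub(/^ +| +$/, "", value)
            if (name == n) { print value; exit }
        }'
}

# Usage (needs a reachable BMC):
# export IPMI_PASSWORD=calvin
# ipmitool -I lan -H 10.103.1.100 -U root -E -L user sensor list | sensor_reading "FAN MOD 1B RPM"
```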

    4.3. Installing the IPMItool packages

    # yum -y install OpenIPMI OpenIPMI-devel ipmitool freeipmi
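    Section 4.2 already uses ipmitool, so these packages need to be in place first. A quick way to confirm that the binaries used in this article are on PATH (the helper `missing_tools` is hypothetical; ipmitool comes from the ipmitool package, ipmi-sensors and ipmi-sel from freeipmi):

```bash
# Hypothetical helper: print each named command that is not on PATH;
# returns non-zero if any of them is missing.
missing_tools() {
    local rc=0 t
    for t in "$@"; do
        if ! command -v "$t" >/dev/null 2>&1; then
            echo "$t"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: the binaries this article relies on
# missing_tools ipmitool ipmi-sensors ipmi-sel || echo "some IPMI tools are missing"
```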
    

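    4.4. Configuring Zabbix

    Note: IPMI support requires building the Zabbix server/proxy with the --with-openipmi option; the build described in section 2 did not use it, so it has to be rebuilt.

    Configure the IPMI pollers on the server side, in zabbix_server.conf (or zabbix_proxy.conf on a proxy):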

    # sed -i '/# StartIPMIPollers=0/aStartIPMIPollers=5' zabbix_server.conf
    # service zabbix-server restart
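    The `sed ... a` one-liner above appends a new line every time it runs, so running it twice leaves two `StartIPMIPollers` entries. An idempotent variant, assuming GNU sed (the helper name is hypothetical):

```bash
# Hypothetical helper: set StartIPMIPollers=<n> in a Zabbix server/proxy
# config file. Replaces an existing uncommented line, otherwise appends one.
set_ipmi_pollers() {
    local conf="$1" n="$2"
    if grep -q '^StartIPMIPollers=' "$conf"; then
        sed -i "s/^StartIPMIPollers=.*/StartIPMIPollers=$n/" "$conf"
    else
        echo "StartIPMIPollers=$n" >> "$conf"
    fi
}

# Usage:
# set_ipmi_pollers zabbix_server.conf 5 && service zabbix-server restart
```

Zabbix reads this setting only at startup, so the server (or proxy) still has to be restarted afterwards.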
    

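    4.5. Importing the monitoring template

    IPMI templates for two Dell models are provided below:
    template-ipmi-dell-poweredge-r510
    template-ipmi-dell-poweredge-2950
    Add the host to be monitored, link it to the template, and on the IPMI tab set Authentication algorithm: Default, Privilege level: User, Username: sensor, Password: sensor_pass, then save.
    In practice, collecting data this way was very inefficient: almost no data came back.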
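    With several branch offices, entering the IPMI credentials for every host by hand gets tedious. The Zabbix API's `host.update` method accepts the same settings as host properties (`ipmi_authtype` -1 = default, `ipmi_privilege` 2 = user, `ipmi_username`, `ipmi_password`). A minimal sketch that only builds the JSON-RPC request; the helper name, host ID, auth token and URL are placeholders, and the values are not JSON-escaped, so they must not contain quotes or backslashes:

```bash
# Hypothetical helper: build a Zabbix 3.2 JSON-RPC host.update request that
# applies the IPMI settings from section 4.5 to one host.
# Arguments: API auth token, host ID, IPMI user name, IPMI password.
build_ipmi_host_update() {
    printf '{"jsonrpc":"2.0","method":"host.update","params":{"hostid":"%s","ipmi_authtype":-1,"ipmi_privilege":2,"ipmi_username":"%s","ipmi_password":"%s"},"auth":"%s","id":1}\n' \
        "$2" "$3" "$4" "$1"
}

# Usage (token, host ID and URL are placeholders):
# build_ipmi_host_update "$TOKEN" 10105 sensor sensor_pass |
#     curl -s -H 'Content-Type: application/json-rpc' -d @- http://zabbix.example.com/api_jsonrpc.php
```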

    5. Custom IPMI monitoring with Zabbix external checks

    My first choice was the Nagios IPMI plugin check_ipmi_sensor, from the file check_ipmi_sensor_v3-v3.9.tar.gz.
    For detailed usage see: http://www.thomas-krenn.com/en/wiki/IPMI_Sensor_Monitoring_Plugin
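    The plugin's `-f` option (handed to FreeIPMI as `--config-file`, as the debug output in 5.2 shows) reads the IPMI credentials from a FreeIPMI-style config file, which the article never shows. A minimal sketch that writes one, assuming the `username`, `password` and `privilege-level` keys of FreeIPMI's configuration file and the sensor/sensor_pass account from 4.5 (the helper name is hypothetical):

```bash
# Hypothetical helper: write a FreeIPMI config file for check_ipmi_sensor -f
# and ipmi-sensors --config-file. Arguments: path, user name, password.
write_ipmi_cfg() {
    local path="$1" user="$2" pass="$3"
    (
        umask 077    # the file holds a password: owner-only permissions (600)
        cat > "$path" <<EOF
username $user
password $pass
privilege-level user
EOF
    )
}

# Usage (directory as in the script in 5.3):
# write_ipmi_cfg /usr/local/zabbix/shell/check_ipmi_sensor/ipmi.cfg sensor sensor_pass
```

Since the external check runs as the Zabbix server's user (zabbix by default), the file should be owned by that user so it stays readable despite the 600 mode.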

    5.1. Installing the perl-IPC-Run module

    yum -y install perl-IPC-Run perl-Getopt-Long
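    Before running the plugin, a quick way to confirm that Perl can actually load the modules it needs (the helper name is hypothetical):

```bash
# Hypothetical helper: succeed if Perl can load the given module.
perl_module_ok() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Usage:
# for m in IPC::Run Getopt::Long; do
#     perl_module_ok "$m" || echo "missing Perl module: $m"
# done
```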
    

    5.2. Trying check_ipmi_sensor

    However, it reported errors, and its output format is not friendly either:

    # ./check_ipmi_sensor -f ipmi.cfg -H 10.103.1.100 -vvv
    ------------- debug output for sel (-vvv is set): ------------
      /usr/sbin/ipmi-sel was executed with the following parameters:
        /usr/sbin/ipmi-sel -h 10.103.1.100 --config-file ipmi.cfg --driver-type=LAN_2_0 --output-event-state --interpret-oem-data --entity-sensor-names
      output of FreeIPMI:
    ID  | Date        | Time     | Name                                        | Type                     | State    | Event
    1   | Apr-08-2011 | 06:42:13 | System Board SEL                            | Event Logging Disabled   | Nominal  | Log Area Reset/Cleared
    2   | Jan-01-1970 | 08:00:31 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
    3   | Jan-01-1970 | 08:00:36 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
    4   | Aug-15-2011 | 23:09:53 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    5   | Aug-16-2011 | 11:38:25 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    6   | Aug-16-2011 | 11:38:25 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    7   | Aug-16-2011 | 11:38:55 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    8   | Jun-10-2012 | 22:41:13 | System Board Ambient Temp                   | Temperature              | Warning  | Upper Non-critical - going high ; Sensor Reading = 45.00 C ; Threshold = 45.00 C
    9   | Jun-11-2012 | 02:53:53 | System Board Ambient Temp                   | Temperature              | Nominal  | Upper Non-critical - going high ; Sensor Reading = 43.00 C ; Threshold = 45.00 C
    10  | Nov-05-2012 | 21:56:42 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    11  | Nov-14-2012 | 21:53:58 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    12  | Nov-14-2012 | 21:53:58 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    13  | Nov-14-2012 | 21:54:19 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    14  | Nov-15-2012 | 16:12:03 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    15  | Nov-17-2012 | 17:14:34 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    16  | Nov-17-2012 | 17:14:34 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    17  | Nov-17-2012 | 17:15:40 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    18  | Nov-19-2012 | 20:47:57 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    19  | Nov-19-2012 | 20:50:04 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    20  | Jan-01-1970 | 08:00:33 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
    21  | Jan-01-1970 | 08:00:38 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
    22  | Jun-27-2014 | 17:27:38 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    23  | Jun-27-2014 | 17:27:53 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    24  | Jan-01-1970 | 08:00:31 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
    25  | Jan-01-1970 | 08:00:36 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
    26  | Oct-31-2016 | 05:48:35 | System Board Ambient Temp                   | Temperature              | Warning  | Lower Non-critical - going low ; Sensor Reading = 8.00 C ; Threshold = 8.00 C
    27  | Oct-31-2016 | 09:00:38 | System Board Ambient Temp                   | Temperature              | Nominal  | Lower Non-critical - going low ; Sensor Reading = 10.00 C ; Threshold = 8.00 C
    ------------- debug output for sensors (-vvv is set): ------------
      script was executed with the following parameters:
        ./check_ipmi_sensor -f ipmi.cfg -H 10.103.1.100 -vvv
      check_ipmi_sensor version:
        3.9
      FreeIPMI version:
        ipmi-sensors - 1.2.9
      FreeIPMI was executed with the following parameters:
        /usr/sbin/ipmi-sensors -h 10.103.1.100 --config-file ipmi.cfg --quiet-cache --sdr-cache-recreate --interpret-oem-data --output-sensor-state --ignore-not-available-sensors --driver-type=LAN_2_0 --output-sensor-thresholds
      FreeIPMI return code: 0
      output of FreeIPMI:
    Record ID | Sensor Name | Sensor Group | Monitoring Status | Sensor Units | Sensor Reading
    5 | Ambient Temp | Temperature | Nominal | C | 28.000000
    7 | CMOS Battery | Battery | Nominal | N/A | 'OK'
    8 | VCORE PG | Voltage | Nominal | N/A | 'State Deasserted'
    9 | VCORE PG | Voltage | Nominal | N/A | 'State Deasserted'
    10 | 0.75 VTT PG | Voltage | Nominal | N/A | 'State Deasserted'
    11 | 0.75 VTT PG | Voltage | Nominal | N/A | 'State Deasserted'
    12 | CPU VTT PG | Voltage | Nominal | N/A | 'State Deasserted'
    13 | 1.5V PG | Voltage | Nominal | N/A | 'State Deasserted'
    14 | 1.8V PG | Voltage | Nominal | N/A | 'State Deasserted'
    15 | 5V PG | Voltage | Nominal | N/A | 'State Deasserted'
    16 | MEM CPU2 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
    17 | 5V Riser PG | Voltage | Nominal | N/A | 'State Deasserted'
    18 | MEM CPU1 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
    19 | VTT CPU2 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
    20 | VTT CPU1 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
    21 | 0.9V PG | Voltage | Nominal | N/A | 'State Deasserted'
    22 | CPU2 1.8 PLL PG | Voltage | Nominal | N/A | 'State Deasserted'
    23 | CPU1 1.8 PLL PG | Voltage | Nominal | N/A | 'State Deasserted'
    24 | 1.1 FAIL | Voltage | Nominal | N/A | 'State Deasserted'
    25 | 1.0 LOM FAIL | Voltage | Nominal | N/A | 'State Deasserted'
    26 | 1.0 AUX FAIL | Voltage | Nominal | N/A | 'State Deasserted'
    27 | Heatsink Pres | Entity Presence | Nominal | N/A | 'Entity Present'
    28 | iDRAC6 Ent Pres | Entity Presence | Critical | N/A | 'Entity Absent'
    29 | USB Cable Pres | Entity Presence | Nominal | N/A | 'Entity Present'
    31 | Riser Presence | Entity Presence | Nominal | N/A | 'Entity Present'
    32 | FAN MOD 1A RPM | Fan | Nominal | RPM | 3480.000000
    34 | FAN MOD 2A RPM | Fan | Nominal | RPM | 3480.000000
    36 | FAN MOD 3A RPM | Fan | Nominal | RPM | 3480.000000
    39 | FAN MOD 4A RPM | Fan | Nominal | RPM | 3480.000000
    40 | Presence | Entity Presence | Nominal | N/A | 'Entity Present'
    41 | Presence | Entity Presence | Nominal | N/A | 'Entity Present'
    42 | Presence | Entity Presence | Nominal | N/A | 'Entity Present'
    43 | Presence | Entity Presence | Nominal | N/A | 'Entity Present'
    44 | Presence  | Entity Presence | Nominal | N/A | 'Entity Present'
    45 | Status | Processor | Nominal | N/A | 'Processor Presence detected'
    46 | Status | Processor | Nominal | N/A | 'Processor Presence detected'
    47 | Status | Power Supply | Nominal | N/A | 'Presence detected'
    48 | Current | Current | Nominal | A | 0.400000
    49 | Current | Current | Nominal | A | 0.400000
    50 | Voltage | Voltage | Nominal | V | 218.000000
    51 | Voltage | Voltage | Nominal | V | 218.000000
    52 | Status | Power Supply | Nominal | N/A | 'Presence detected'
    53 | Status | Cable/Interconnect | Nominal | N/A | 'Cable/Interconnect is connected'
    54 | OS Watchdog | Watchdog 2 | Nominal | N/A | 'OK'
    56 | Intrusion | Physical Security | Nominal | N/A | 'OK'
    57 | PS Redundancy | Power Supply | Nominal | N/A | 'Fully Redundant'
    58 | Fan Redundancy | Fan | Nominal | N/A | 'Fully Redundant'
    60 | System Level | Current | Nominal | W | 168.000000
    61 | Power Optimized | OEM Reserved | Nominal | N/A | 'Good'
    62 | Drive | Drive Slot | Nominal | N/A | 'Drive Presence'
    65 | Cable SAS A | Cable/Interconnect | Nominal | N/A | 'Cable/Interconnect is connected'
    66 | Cable SAS B | Cable/Interconnect | Nominal | N/A | 'Cable/Interconnect is connected'
    67 | DKM Status | OEM Reserved | N/A | N/A | 'OEM Event = 0000h'
    119 | FAN MOD 5A RPM | Fan | Nominal | RPM | 3480.000000
    
    --------------------- end of debug output ---------------------
    IPMI Status: Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
    Use of uninitialized value in string ne at ./check_ipmi_sensor line 737.
    Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
    Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 738.
    Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
    Use of uninitialized value in string ne at ./check_ipmi_sensor line 749.
    Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
    Use of uninitialized value in concatenation (.) or string at ./check_ipmi_sensor line 750.
    Use of uninitialized value in string ne at ./check_ipmi_sensor line 759.
    ... (the same nine warnings, for lines 737, 738, 749, 750 and 759, repeat 10 more times)
    Critical [iDRAC6 Ent Pres = Critical ('Entity Absent'), System Board Intrusion = Critical (Physical Security), System Board Intrusion = Critical (Physical Security), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), System Board Ambient Temp = Warning (Temperature), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), Disk Drive Bay 1 Drive 2 = Critical (Drive Slot), System Board Intrusion = Critical (Physical Security), System Board Intrusion = Critical (Physical Security), System Board Intrusion = Critical (Physical Security), System Board Intrusion = Critical (Physical Security), System Board Ambient Temp = Warning (Temperature)] | 'Ambient Temp'=28.000000;:;: 'FAN MOD 1A RPM'=3480.000000;:;: 'FAN MOD 2A RPM'=3480.000000;:;: 'FAN MOD 3A RPM'=3480.000000;:;: 'FAN MOD 4A RPM'=3480.000000;:;: 'Current'=0.400000;:;: 'Current'=0.400000;:;: 'Voltage'=218.000000;:;: 'Voltage'=218.000000;:;: 'System Level'=168.000000;:;: 'FAN MOD 5A RPM'=3480.000000;:;:
    Ambient Temp = 28.000000 (Status: Nominal)
    CMOS Battery = 'OK' (Status: Nominal)
    VCORE PG = 'State Deasserted' (Status: Nominal)
    VCORE PG = 'State Deasserted' (Status: Nominal)
    0.75 VTT PG = 'State Deasserted' (Status: Nominal)
    0.75 VTT PG = 'State Deasserted' (Status: Nominal)
    CPU VTT PG = 'State Deasserted' (Status: Nominal)
    1.5V PG = 'State Deasserted' (Status: Nominal)
    1.8V PG = 'State Deasserted' (Status: Nominal)
    5V PG = 'State Deasserted' (Status: Nominal)
    MEM CPU2 FAIL = 'State Deasserted' (Status: Nominal)
    5V Riser PG = 'State Deasserted' (Status: Nominal)
    MEM CPU1 FAIL = 'State Deasserted' (Status: Nominal)
    VTT CPU2 FAIL = 'State Deasserted' (Status: Nominal)
    VTT CPU1 FAIL = 'State Deasserted' (Status: Nominal)
    0.9V PG = 'State Deasserted' (Status: Nominal)
    CPU2 1.8 PLL PG = 'State Deasserted' (Status: Nominal)
    CPU1 1.8 PLL PG = 'State Deasserted' (Status: Nominal)
    1.1 FAIL = 'State Deasserted' (Status: Nominal)
    1.0 LOM FAIL = 'State Deasserted' (Status: Nominal)
    1.0 AUX FAIL = 'State Deasserted' (Status: Nominal)
    Heatsink Pres = 'Entity Present' (Status: Nominal)
    iDRAC6 Ent Pres = 'Entity Absent' (Status: Critical)
    USB Cable Pres = 'Entity Present' (Status: Nominal)
    Riser Presence = 'Entity Present' (Status: Nominal)
    FAN MOD 1A RPM = 3480.000000 (Status: Nominal)
    FAN MOD 2A RPM = 3480.000000 (Status: Nominal)
    FAN MOD 3A RPM = 3480.000000 (Status: Nominal)
    FAN MOD 4A RPM = 3480.000000 (Status: Nominal)
    Presence = 'Entity Present' (Status: Nominal)
    Presence = 'Entity Present' (Status: Nominal)
    Presence = 'Entity Present' (Status: Nominal)
    Presence = 'Entity Present' (Status: Nominal)
    Presence = 'Entity Present' (Status: Nominal)
    Status = 'Processor Presence detected' (Status: Nominal)
    Status = 'Processor Presence detected' (Status: Nominal)
    Status = 'Presence detected' (Status: Nominal)
    Current = 0.400000 (Status: Nominal)
    Current = 0.400000 (Status: Nominal)
    Voltage = 218.000000 (Status: Nominal)
    Voltage = 218.000000 (Status: Nominal)
    Status = 'Presence detected' (Status: Nominal)
    Status = 'Cable/Interconnect is connected' (Status: Nominal)
    OS Watchdog = 'OK' (Status: Nominal)
    Intrusion = 'OK' (Status: Nominal)
    PS Redundancy = 'Fully Redundant' (Status: Nominal)
    Fan Redundancy = 'Fully Redundant' (Status: Nominal)
    System Level = 168.000000 (Status: Nominal)
    Power Optimized = 'Good' (Status: Nominal)
    Drive = 'Drive Presence' (Status: Nominal)
    Cable SAS A = 'Cable/Interconnect is connected' (Status: Nominal)
    Cable SAS B = 'Cable/Interconnect is connected' (Status: Nominal)
    FAN MOD 5A RPM = 3480.000000 (Status: Nominal)
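    Apart from the Perl warnings (which Perl prints on stderr), the first status line ends with Nagios-style performance data of the form `'label'=value;warn;crit`. If you wanted to keep using the plugin, a single value could be taken from there. A minimal sketch with a hypothetical helper name; for repeated labels such as `'Current'` it returns only the first match:

```bash
# Hypothetical helper: print the value of one perfdata label ('label'=value;...)
# from check_ipmi_sensor output on stdin.
perfdata_value() {
    awk -v key="'$1'=" '
        {
            i = index($0, key)
            if (i > 0) {
                rest = substr($0, i + length(key))
                sub(/[; ].*/, "", rest)    # value ends at ";" or a space
                print rest
                exit
            }
        }'
}

# Usage:
# ./check_ipmi_sensor -f ipmi.cfg -H 10.103.1.100 2>/dev/null | perfdata_value 'Ambient Temp'
```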
    

    However, the debug output shows the command the plugin actually runs, so it can be called directly:

    /usr/sbin/ipmi-sel -h 10.103.1.100 --config-file ipmi.cfg --driver-type=LAN_2_0 --output-event-state --interpret-oem-data --entity-sensor-names
    

    Result:

    # /usr/sbin/ipmi-sel -h 10.103.1.100 --config-file ipmi.cfg --driver-type=LAN_2_0 --output-event-state --interpret-oem-data --entity-sensor-names
    ID  | Date        | Time     | Name                                        | Type                     | State    | Event
    1   | Apr-08-2011 | 06:42:13 | System Board SEL                            | Event Logging Disabled   | Nominal  | Log Area Reset/Cleared
    2   | Jan-01-1970 | 08:00:31 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
    3   | Jan-01-1970 | 08:00:36 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
    4   | Aug-15-2011 | 23:09:53 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    5   | Aug-16-2011 | 11:38:25 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    6   | Aug-16-2011 | 11:38:25 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    7   | Aug-16-2011 | 11:38:55 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    8   | Jun-10-2012 | 22:41:13 | System Board Ambient Temp                   | Temperature              | Warning  | Upper Non-critical - going high ; Sensor Reading = 45.00 C ; Threshold = 45.00 C
    9   | Jun-11-2012 | 02:53:53 | System Board Ambient Temp                   | Temperature              | Nominal  | Upper Non-critical - going high ; Sensor Reading = 43.00 C ; Threshold = 45.00 C
    10  | Nov-05-2012 | 21:56:42 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    11  | Nov-14-2012 | 21:53:58 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    12  | Nov-14-2012 | 21:53:58 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    13  | Nov-14-2012 | 21:54:19 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    14  | Nov-15-2012 | 16:12:03 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    15  | Nov-17-2012 | 17:14:34 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    16  | Nov-17-2012 | 17:14:34 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Critical | Drive Fault ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    17  | Nov-17-2012 | 17:15:40 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    18  | Nov-19-2012 | 20:47:57 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    19  | Nov-19-2012 | 20:50:04 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    20  | Jan-01-1970 | 08:00:33 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
    21  | Jan-01-1970 | 08:00:38 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
    22  | Jun-27-2014 | 17:27:38 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    23  | Jun-27-2014 | 17:27:53 | Disk Drive Bay 1 Drive 2                    | Drive Slot               | Nominal  | Drive Presence ; OEM Event Data2 code = 01h ; OEM Event Data3 code = 02h
    24  | Jan-01-1970 | 08:00:31 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
    25  | Jan-01-1970 | 08:00:36 | System Board Intrusion                      | Physical Security        | Critical | General Chassis Intrusion ; Intrusion while system Off
    26  | Oct-31-2016 | 05:48:35 | System Board Ambient Temp                   | Temperature              | Warning  | Lower Non-critical - going low ; Sensor Reading = 8.00 C ; Threshold = 8.00 C
    27  | Oct-31-2016 | 09:00:38 | System Board Ambient Temp                   | Temperature              | Nominal  | Lower Non-critical - going low ; Sensor Reading = 10.00 C ; Threshold = 8.00 C
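    The SEL is a history log. The Critical intrusion and drive-fault entries above are old records (several dated 1970, others years back), while the current sensor readings in 5.2 show `Intrusion = 'OK'` and the drive present. Most of the items in the plugin's Critical summary come from these records. For a quick overview of the log, the State column (the 6th `|`-separated field) can be tallied. A minimal sketch with a hypothetical helper name:

```bash
# Hypothetical helper: count ipmi-sel rows (read from stdin) whose State
# column equals the given value, e.g. Critical, Warning or Nominal.
sel_count_state() {
    awk -F'|' -v s="$1" '
        {
            id = $1; gsub(/ /, "", id)
            st = $6; gsub(/^ +| +$/, "", st)
            # skip the header row: only numeric record IDs are counted
            if (id ~ /^[0-9]+$/ && st == s) n++
        }
        END { print n + 0 }'
}

# Usage:
# /usr/sbin/ipmi-sel -h 10.103.1.100 --config-file ipmi.cfg --driver-type=LAN_2_0 \
#     --output-event-state --interpret-oem-data --entity-sensor-names | sel_count_state Critical
```

For the 27-entry log above, this would print 12 for Critical, 2 for Warning and 13 for Nominal.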
    

    5.3. Writing the Zabbix external check script

    [root@HN-zabbix-proxy13 externalscripts]# pwd
    /usr/local/zabbix/share/zabbix/externalscripts
    [root@HN-zabbix-proxy13 externalscripts]# cat check_ipmi 
    

    The script:

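    #!/bin/bash
    # Zabbix external check: print the value of one IPMI sensor
    # Created on 2016-11-18
    # @author: Chinge_Yang
    
    args="$*"
    # Log every invocation for debugging
    echo "$(date +%F-%T) $args" >> /tmp/check_ipmi.debug
    
    check_ipmi_dir=/usr/local/zabbix/shell/check_ipmi_sensor
    check_ipmi_bin=$check_ipmi_dir/check_ipmi_sensor
    
    ipmi_sensors=/usr/sbin/ipmi-sensors
    ipmi_cfg=$check_ipmi_dir/ipmi.cfg    # FreeIPMI config file holding the IPMI credentials
    
    # Earlier attempts (Nagios plugin, SEL query), kept for reference:
    #$check_ipmi_bin -f $ipmi_cfg -v $args
    #${ipmi_sel} $args --config-file $ipmi_cfg --driver-type=LAN_2_0 --output-event-state --interpret-oem-data --entity-sensor-names
    # Same ipmi-sensors options the plugin uses (see its debug output above)
    options=(--quiet-cache --sdr-cache-recreate --interpret-oem-data --output-sensor-state
             --ignore-not-available-sensors --driver-type=LAN_2_0 --output-sensor-thresholds)
    
    usage() {
        echo "Usage: $(basename "$0") -h HOST -n NAME"
    }
    
    # Print the last '|'-separated column of the first row whose sensor name
    # (2nd column) equals $name exactly. A plain `grep "$name"` also matches
    # substrings and can return several rows (e.g. the two "Current" sensors),
    # which Zabbix cannot store as a single value.
    # As in the original, the value is taken from the last column; confirm this
    # against your own ipmi-sensors output, since output options change the layout.
    check() {
        "$ipmi_sensors" -h "$host" --config-file "$ipmi_cfg" "${options[@]}" \
            | awk -F' *[|] *' -v n="$name" '$2 == n { print $NF; exit }'
    }
    
    if [ $# -lt 4 ]; then
        usage
        exit 55
    fi
    
    # Usage: scriptname -options
    # Options must start with a dash (-); a colon after a letter means it requires a value
    while getopts ":h:n:" Option; do
        case $Option in
            h) host=$OPTARG ;;
            n) name=$OPTARG ;;
            *) usage; exit 55 ;;   # unknown option or missing value
        esac
    done
    
    # Discard the parsed options; $1 now refers to the first non-option
    # argument, if there is one
    shift $((OPTIND - 1))
    
    check
    
    exit 0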
    

    Make it executable:

    chmod a+x check_ipmi
    

    5.4. Creating a custom template

    I won't go into detail here; it is simply adapted from the templates above. One screenshot shows the whole thing:


    [Screenshot: contents of the custom template]
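    Each item in the custom template runs the external script with the arguments given in its item key, and Zabbix passes every key parameter to the script as a separate argument. A minimal sketch for generating those keys, assuming a hypothetical host-level user macro `{$IPMI_HOST}` that holds the BMC address (10.103.1.100 here); the helper name is also hypothetical:

```bash
# Hypothetical helper: print the Zabbix external-check item key that makes
# check_ipmi read one sensor. {$IPMI_HOST} is an assumed user macro.
ipmi_item_key() {
    printf 'check_ipmi["-h","{$IPMI_HOST}","-n","%s"]\n' "$1"
}

# Print keys for a few of the R510 sensors listed in 5.2
for s in "Ambient Temp" "FAN MOD 1A RPM" "FAN MOD 2A RPM" "System Level"; do
    ipmi_item_key "$s"
done
```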

    Two screenshots of the result:

    [Screenshots: monitoring results]
    In the end, though, even the custom script struggled to get data: the ipmi commands it ran kept timing out.
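    One mitigation the article did not try: the script's options pass `--sdr-cache-recreate`, which makes ipmi-sensors download the whole SDR repository again on every call, and a full sensor scan runs for every single item. Zabbix also stops external checks that exceed the server's `Timeout` setting (3 seconds by default, at most 30). Dropping `--sdr-cache-recreate` so FreeIPMI can reuse its SDR cache, and polling each host once per interval into a shared file, is likely to reduce the number and length of IPMI-over-LAN sessions considerably. A minimal sketch of the second idea; the cache directory, helper names and the 120-second age are assumptions:

```bash
# Hypothetical cache layer for the external check: poll the BMC at most once
# every MAX_AGE seconds per host and let every item read from the cached file.
CACHE_DIR="${CACHE_DIR:-/tmp/check_ipmi_cache}"
MAX_AGE="${MAX_AGE:-120}"

# refresh_cache HOST COMMAND [ARGS...]
# Re-runs COMMAND into the host's cache file if the file is missing or older
# than MAX_AGE seconds, then prints the cache file path.
refresh_cache() {
    local host="$1" file now mtime
    shift
    file="$CACHE_DIR/$host.txt"
    mkdir -p "$CACHE_DIR"
    now=$(date +%s)
    mtime=$(stat -c %Y "$file" 2>/dev/null || echo 0)
    if [ $((now - mtime)) -ge "$MAX_AGE" ]; then
        # Write to a temporary file first so readers never see a partial cache
        "$@" > "$file.$$" && mv "$file.$$" "$file" || rm -f "$file.$$"
    fi
    echo "$file"
}

# cached_reading FILE NAME
# Prints the last column of the first row whose sensor name (2nd column)
# equals NAME, using the same parsing as the check_ipmi script.
cached_reading() {
    awk -F' *[|] *' -v n="$2" '$2 == n { print $NF; exit }' "$1"
}

# Usage inside check_ipmi (ipmi-sensors options as in the script above):
# f=$(refresh_cache "$host" "$ipmi_sensors" -h "$host" --config-file "$ipmi_cfg" "${options[@]}")
# cached_reading "$f" "$name"
```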
    References:
    http://pengyao.org/zabbix-monitor-ipmi-1.html
    http://zh.community.dell.com/techcenter/w/techcenter_wiki/189.idrac-7
    http://www.weibo.com/p/1001603921723593500304
    http://www.thomas-krenn.com/en/wiki/IPMI_Sensor_Monitoring_Plugin
