1. Use case

Part of the company's current ESB architecture relies on ZooKeeper as its coordination service, so monitoring ZooKeeper well is important.

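2. Key points for monitoring ZooKeeper

System monitoring

Memory usage         ZooKeeper should run entirely in memory and must never touch swap. The Java heap must not be larger than the available memory.
Swap usage           Swapping degrades ZooKeeper performance; set vm.swappiness = 0.
Network bandwidth    If ZooKeeper performance drops, check bandwidth usage and packet loss. A typical ZooKeeper workload is about 20% writes and 80% reads.
Disk usage           Keep an eye on the space used by the ZooKeeper data directory.
Disk I/O             ZooKeeper writes to disk asynchronously, so it should not generate heavy I/O requests. If ZooKeeper shares disks with other I/O-intensive services, watch disk I/O.

ZooKeeper monitoring

zk_avg/min/max_latency           Time taken to respond to a client request. Alert when it exceeds 10 ticks.
zk_outstanding_requests          Number of queued requests. It grows when ZooKeeper is pushed beyond its processing capacity. Suggested alert threshold: 10.
zk_packets_received              Number of request packets received from clients.
zk_packets_sent                  Number of packets sent to clients, mainly responses and notifications.
zk_max_file_descriptor_count     Maximum number of open files allowed, controlled by ulimit.
zk_open_file_descriptor_count    Number of open files. Alert when it exceeds 85% of the allowed maximum.
Mode                             Role of the server: standalone if it has not joined a cluster, follower or leader if it has.
zk_followers                     Reported by the leader only: number of followers in the ensemble. The normal value is the ensemble size minus 1.
zk_pending_syncs                 Reported by the leader only: number of pending syncs.
zk_znode_count                   Number of znodes.
zk_watch_count                   Number of watches.
Java Heap Size                   Heap size of the ZooKeeper Java process.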
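The alert thresholds in the table above boil down to a few comparisons. The following is a minimal Python 3 sketch of them, not part of the original setup: the function name check_thresholds and the tick_time_ms parameter are hypothetical, the default of 2000 ms is only an assumed tickTime (use the value from your own zoo.cfg), and metrics is assumed to be a dict of integers keyed by the mntr names.

```python
def check_thresholds(metrics, tick_time_ms=2000):
    """Return a list of alert messages for a dict of mntr metrics.

    Hypothetical helper. The thresholds follow the table above:
    latency > 10 ticks, outstanding requests > 10, open fds > 85% of max.
    """
    alerts = []

    # mntr reports latency in milliseconds; 10 ticks = 10 * tickTime.
    if metrics.get("zk_avg_latency", 0) > 10 * tick_time_ms:
        alerts.append("average latency above 10 ticks")

    if metrics.get("zk_outstanding_requests", 0) > 10:
        alerts.append("outstanding requests above 10")

    max_fds = metrics.get("zk_max_file_descriptor_count", 0)
    open_fds = metrics.get("zk_open_file_descriptor_count", 0)
    # Skip the check when the limit is unknown (missing or zero).
    if max_fds and open_fds > 0.85 * max_fds:
        alerts.append("open file descriptors above 85% of the limit")

    return alerts
```

In a Zabbix deployment these comparisons would normally live in trigger expressions; the function is only meant to make the rules explicit.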

3. Configuration on every node, leader and followers alike

For setting up the ZooKeeper cluster itself, see:

http://blog.csdn.net/reblue520/article/details/52279486

How the monitoring works

Install the dependencies
yum install -y nc
yum install -y zabbix-sender
echo ruok|nc 127.0.0.1 2181
imok
echo mntr|nc 127.0.0.1 2181
zk_version  3.4.6-1569965, built on 02/20/2014 09:09 GMT
zk_avg_latency  0
zk_max_latency  6
zk_min_latency  0
zk_packets_received  93114
zk_packets_sent  93113
zk_num_alive_connections  4
zk_outstanding_requests  0
zk_server_state  leader
zk_znode_count  29
zk_watch_count  0
zk_ephemerals_count  14
zk_approximate_data_size  1087
zk_open_file_descriptor_count  39
zk_max_file_descriptor_count  1000000
zk_followers  4
zk_synced_followers  4
zk_pending_syncs  0
echo srvr|nc 127.0.0.1 2181
Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
Latency min/avg/max: 0/0/6
Received: 93121
Sent: 93120
Connections: 4
Outstanding: 0
Zxid: 0x900000020
Mode: leader
Node count: 29
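The same four-letter commands can be sent without nc. Below is a minimal Python 3 sketch under stated assumptions: the names exchange and four_letter_word are hypothetical, the default address 127.0.0.1:2181 is the one used in the commands above, and the server is assumed to close the connection once it has replied, so the reply is read until end of stream.

```python
import socket


def exchange(sock, cmd):
    """Send a four-letter command on a connected socket and read the reply until EOF."""
    sock.sendall(cmd.encode("ascii"))
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # the peer closed its side: the reply is complete
            break
        chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def four_letter_word(cmd, host="127.0.0.1", port=2181, timeout=1.0):
    """Open a TCP connection, send cmd (for example 'ruok' or 'mntr') and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return exchange(sock, cmd)
```

Calling four_letter_word("ruok") against a healthy server should return the same text that nc prints. Reading in a loop avoids cutting off a long mntr reply that does not fit into a single recv call.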

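4. Writing the script and configuration files for monitoring ZooKeeper with Zabbix

There are two ways to get this data into Zabbix. One is to fetch each item individually through the Zabbix agent, which works with both active and passive checks. The other is to push all of the data in one go with zabbix_sender. We use the second approach here. A script that pushes everything through zabbix_sender cannot be written like an agent script that fetches one item per call.

First collect the monitored items into a dictionary, then iterate over it and send each key:value pair with zabbix_sender, passing the key with -k and the value with -o.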
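The iteration step can be sketched as follows. This is a Python 3 illustration, not the author's script: build_sender_commands is a hypothetical name, while the zabbix_sender path, the agent configuration path and the zookeeper.status[...] key format are the ones this article uses.

```python
def build_sender_commands(metrics,
                          sender="/usr/bin/zabbix_sender",
                          conf="/etc/zabbix/zabbix_agentd.conf"):
    """Return one zabbix_sender argument list per metric in the dict."""
    commands = []
    for name, value in sorted(metrics.items()):
        key = "zookeeper.status[" + name + "]"
        # -c: agent config file, -k: item key, -o: item value
        commands.append([sender, "-c", conf, "-k", key, "-o", str(value)])
    return commands
```

Each returned list can be handed to subprocess.call as is. Building the commands separately from running them makes it easy to print them first and check the keys against the items defined in Zabbix.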
echo mntr|nc 127.0.0.1 2181
This command can be run from Python with the subprocess module, or the script can open a socket to port 2181 itself and send the command. Either way, the mntr output then has to be converted into a dictionary.
That is, data in the following form (each key is separated from its value by a tab) has to be converted:
zk_version  3.4.6-1569965, built on 02/20/2014 09:09 GMT
zk_avg_latency  0
zk_max_latency  0
zk_min_latency  0
zk_packets_received  5
zk_packets_sent  4
zk_num_alive_connections  1
zk_outstanding_requests  0
zk_server_state  standalone
zk_znode_count  4
zk_watch_count  0
zk_ephemerals_count  0
zk_approximate_data_size  27
zk_open_file_descriptor_count  23
zk_max_file_descriptor_count  4096
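The conversion itself is small. Here is a minimal Python 3 sketch of it, assuming the tab-separated key/value lines that mntr prints; parse_mntr is a hypothetical name used only for this illustration.

```python
def parse_mntr(text):
    """Convert tab-separated mntr output into a dict; numeric values become int."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            continue  # skip blank or malformed lines
        try:
            result[key] = int(value)
        except ValueError:
            result[key] = value  # non-numeric, e.g. zk_version or zk_server_state
    return result
```

For the sample above, the resulting dict maps zk_znode_count to the integer 4 and zk_server_state to the string 'standalone'.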
The full code is as follows:
vim /usr/local/zabbix-agent/scripts/check_zookeeper.py

#!/usr/bin/python
"""
Check Zookeeper Cluster

zookeeper version should be newer than 3.4.x

# echo mntr|nc 127.0.0.1 2181
zk_version  3.4.6-1569965, built on 02/20/2014 09:09 GMT
zk_avg_latency  0
zk_max_latency  4
zk_min_latency  0
zk_packets_received 84467
zk_packets_sent 84466
zk_num_alive_connections    3
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count  17159
zk_watch_count  2
zk_ephemerals_count 1
zk_approximate_data_size    6666471
zk_open_file_descriptor_count   29
zk_max_file_descriptor_count    102400

# echo ruok|nc 127.0.0.1 2181
imok
"""

# Note: this script targets Python 2 (StringIO module, print statement).
import sys
import socket
import re
import subprocess
from StringIO import StringIO
import os

zabbix_sender = '/usr/bin/zabbix_sender'
zabbix_conf = '/etc/zabbix/zabbix_agentd.conf'
send_to_zabbix = 1


############# get zookeeper server status
class ZooKeeperServer(object):

    def __init__(self, host='localhost', port='2181', timeout=1):
        self._address = (host, int(port))
        self._timeout = timeout
        self._result = {}

    def _create_socket(self):
        return socket.socket()

    def _send_cmd(self, cmd):
        """ Send a 4letter word command to the server """
        s = self._create_socket()
        s.settimeout(self._timeout)

        s.connect(self._address)
        s.send(cmd)

        data = s.recv(2048)
        s.close()

        return data

    def get_stats(self):
        """ Get ZooKeeper server stats as a map """
        data_mntr = self._send_cmd('mntr')
        data_ruok = self._send_cmd('ruok')
        # start from empty dicts so an empty reply cannot leave these undefined
        result_mntr = {}
        result_ruok = {}
        if data_mntr:
            result_mntr = self._parse(data_mntr)
        if data_ruok:
            result_ruok = self._parse_ruok(data_ruok)

        self._result = dict(result_mntr.items() + result_ruok.items())

        if not self._result.has_key('zk_followers') and not self._result.has_key('zk_synced_followers') and not self._result.has_key('zk_pending_syncs'):
            ##### the three metrics are only exposed on the leader role zookeeper server, we just set the followers' to 0
            leader_only = {'zk_followers': 0, 'zk_synced_followers': 0, 'zk_pending_syncs': 0}
            self._result = dict(result_mntr.items() + result_ruok.items() + leader_only.items())

        return self._result

    def _parse(self, data):
        """ Parse the output from the 'mntr' 4letter word command """
        h = StringIO(data)

        result = {}
        for line in h.readlines():
            try:
                key, value = self._parse_line(line)
                result[key] = value
            except ValueError:
                pass  # ignore broken lines

        return result

    def _parse_ruok(self, data):
        """ Parse the output from the 'ruok' 4letter word command """
        h = StringIO(data)

        result = {}

        ruok = h.readline()
        if ruok:
            result['zk_server_ruok'] = ruok

        return result

    def _parse_line(self, line):
        try:
            key, value = map(str.strip, line.split('\t'))
        except ValueError:
            raise ValueError('Found invalid line: %s' % line)

        if not key:
            raise ValueError('The key is mandatory and should not be empty')

        try:
            value = int(value)
        except (TypeError, ValueError):
            pass  # keep non-numeric values (version, server state) as strings

        return key, value

    def get_pid(self):
        # ps -ef|grep java|grep zookeeper|awk '{print $2}'
        pidarg = '''ps -ef|grep java|grep zookeeper|grep -v grep|awk '{print $2}' '''
        pidout = subprocess.Popen(pidarg, shell=True, stdout=subprocess.PIPE)
        pid = pidout.stdout.readline().strip('\n')
        return pid

    def send_to_zabbix(self, metric):
        key = "zookeeper.status[" + metric + "]"

        if send_to_zabbix > 0:
            #print key + ":" + str(self._result[metric])
            try:
                subprocess.call([zabbix_sender, "-c", zabbix_conf, "-k", key, "-o", str(self._result[metric])], stdout=FNULL, stderr=FNULL, shell=False)
            except OSError, detail:
                print "Something went wrong while executing zabbix_sender : ", detail
        else:
            print "Simulation: the following command would be executed :\n", zabbix_sender, "-c", zabbix_conf, "-k", key, "-o", self._result[metric], "\n"


def usage():
    """Display program usage"""
    print "\nUsage : ", sys.argv[0], " alive|all"
    print "Modes : \n\talive : Return pid of running zookeeper\n\tall : Send zookeeper stats as well"
    sys.exit(1)


accepted_modes = ['alive', 'all']

if len(sys.argv) == 2 and sys.argv[1] in accepted_modes:
    mode = sys.argv[1]
else:
    usage()

zk = ZooKeeperServer()
#  print zk.get_stats()
pid = zk.get_pid()

if pid != "" and mode == 'all':
    zk.get_stats()
    # print zk._result
    FNULL = open(os.devnull, 'w')
    for key in zk._result:
        zk.send_to_zabbix(key)
    FNULL.close()
    print pid

elif pid != "" and mode == "alive":
    print pid
else:
    print 0

Make the script executable
chmod +x  /usr/local/zabbix-agent/scripts/check_zookeeper.py

Zabbix agent configuration file

vim /etc/zabbix/zabbix_agentd.d/check_zookeeper.conf

UserParameter=zookeeper.status[*],/usr/bin/python /usr/local/zabbix-agent/scripts/check_zookeeper.py $1
Restart the zabbix-agent service
service zabbix-agent restart

5. Building the Zabbix template for ZooKeeper and setting the alert thresholds

zookeeper.xml

Contents of the exported template (export format version 2.0, dated 2016-02-27T15:15:09Z, group "Templates"). Priority 4 is High, priority 2 is Warning.

Triggers

Expression                                                                                  Name                                     Priority
{Template ZooKeeper:zookeeper.status[zk_outstanding_requests].last()}>10                    big outstanding requests number          4
{Template ZooKeeper:zookeeper.status[zk_pending_syncs].last()}>10                           big pending syncs                        4
{Template ZooKeeper:zookeeper.status[zk_avg_latency].last()}>10                             large average latency                    4
{Template ZooKeeper:zookeeper.status[zk_open_file_descriptor_count].last()} > {Template ZooKeeper:zookeeper.status[zk_max_file_descriptor_count].last()}*0.85
                                                                                            large file descriptor used               4
{Template ZooKeeper:zookeeper.status[zk_server_ruok].str(imok)}<>1                          zookeeper is abnormal                    4
{Template ZooKeeper:zookeeper.status[all].last()}=0                                         zookeeper is not running                 4
{Template ZooKeeper:zookeeper.status[zk_server_state].abschange()}>0                        zookeeper state role has been changed    2

Graphs (each 900 x 200, items from host "Template ZooKeeper")

Graph                               Item key                                       Color
ZooKeeper Alive Connections         zookeeper.status[zk_num_alive_connections]     00DDDD
ZooKeeper Data Size                 zookeeper.status[zk_approximate_data_size]     00C800
ZooKeeper Latency                   zookeeper.status[zk_avg_latency]               00C800
                                    zookeeper.status[zk_min_latency]               C80000
                                    zookeeper.status[zk_max_latency]               0000C8
ZooKeeper Packages Received/Sent    zookeeper.status[zk_packets_received]          00C800
                                    zookeeper.status[zk_packets_sent]              FF3333
ZooKeeper Watches Count             zookeeper.status[zk_watch_count]               660066
ZooKeeper Znodes Count              zookeeper.status[zk_znode_count]               FFCCFF

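Problem:

After deployment no graphs were drawn. A look at zoo.cfg showed that the servers are addressed by domain names resolved through internal and external DNS, so the localhost that /usr/local/zabbix-agent/scripts/check_zookeeper.py connects to was changed to the corresponding domain name.

In addition, the connection mode on the zabbix-server side was switched to DNS. After restarting the agent, the graphs finally appeared.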
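One way to avoid hard-coding the host is to read the names straight from zoo.cfg. The following Python 3 sketch is only an illustration under stated assumptions: parse_zoo_cfg is a hypothetical helper, the file is assumed to contain the usual clientPort=... and server.N=host:port:port entries, and the hostnames in the usage note are placeholders. Working out which server id belongs to the local node is left out.

```python
def parse_zoo_cfg(text):
    """Return (client_port, {server_id: hostname}) from the contents of zoo.cfg."""
    client_port = None
    servers = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and lines without a key=value pair
        name, value = (part.strip() for part in line.split("=", 1))
        if name == "clientPort":
            client_port = int(value)
        elif name.startswith("server."):
            # server.N=host:quorumPort:electionPort -> keep only the host part
            servers[int(name.split(".", 1)[1])] = value.split(":", 1)[0]
    return client_port, servers
```

Given a file with clientPort=2181 and server.1=zk1.example.com:2888:3888, the function returns 2181 and a dict that maps 1 to 'zk1.example.com'; that host and port could then be passed to the monitoring script instead of localhost.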

 

Reposted from: http://www.linuxidc.com/Linux/2016-11/137638p10.htm