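MHA (Master High Availability) is currently a relatively mature solution for MySQL high availability. It was developed by Yoshinori Matsunobu (yoshinorim) at DeNA in Japan (he later joined Facebook), and it is an excellent tool for failover and master promotion in MySQL high-availability environments. When a MySQL master fails, MHA can complete the failover automatically within 0 to 30 seconds, and during the failover it preserves data consistency as far as possible, which is what makes it highly available in a practical sense.

The software has two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a dedicated machine to manage several master-slave clusters, or it can run on one of the slaves. MHA Node runs on every MySQL server. MHA Manager periodically probes the master of each cluster. When the master fails, it automatically promotes the slave with the most recent data to be the new master and then repoints all other slaves to the new master. The whole failover is completely transparent to the application.

During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always feasible. For example, if the master has a hardware failure or cannot be reached over ssh, MHA cannot save the binary logs; it still fails over, and the most recent data is lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk, and MHA can be combined with it: as long as one slave has received the latest binary log events, MHA can apply them to all the other slaves and so keep every node consistent.

MHA currently mainly supports one-master, multiple-slave topologies. Building an MHA setup requires at least three database servers in a replication cluster, one master and two slaves: one acts as the master, one as the standby master, and one as a plain slave. Because three servers are a cost concern, Taobao modified MHA, and Taobao's TMHA now supports one master with one slave.

Official project page: https://code.google.com/p/mysql-master-ha/

[Figure: MHA Manager managing multiple master-slave replication groups]

MHA's working principle can be summarized by the phases that appear in the failover log later in this article: configuration check, dead master shutdown, master recovery (finding the latest slaves, saving the dead master's binlog, determining the new master, applying the logs), slave recovery, and new master cleanup.

The environment used in this walkthrough (CentOS 6.7) is as follows:

172.16.80.117  master (hostname centos02)
172.16.80.127  slave, candidate master
172.16.80.128  slave, MHA Manager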
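The promotion rule described above (promote the slave with the most recent data, then repoint the others) can be sketched in a few lines. This is a simplified illustration and not MHA's actual code: the `Slave` type and the `pick_new_master` function are hypothetical names, positions are compared as (binlog file, offset) pairs on the assumption that binlog file names sort lexicographically, and the handling of `candidate_master` and `no_master` only mirrors the options used in the configuration later in this article.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slave:
    """Hypothetical view of one slave as seen at failover time."""
    host: str
    log_file: str                  # master binlog file read so far
    log_pos: int                   # offset inside that file
    candidate_master: bool = False
    no_master: bool = False

    def position(self) -> Tuple[str, int]:
        # Assumes names like mysql-bin.000001 sort in creation order.
        return (self.log_file, self.log_pos)


def pick_new_master(slaves: List[Slave]) -> Optional[Slave]:
    """Pick the slave to promote, or None if no slave is eligible."""
    eligible = [s for s in slaves if not s.no_master]
    if not eligible:
        return None
    latest = max(s.position() for s in slaves)
    up_to_date = [s for s in eligible if s.position() == latest]
    preferred = [s for s in up_to_date if s.candidate_master]
    if preferred:
        return preferred[0]
    if up_to_date:
        return up_to_date[0]
    # Simplification: fall back to the most advanced eligible slave.
    return max(eligible, key=Slave.position)


if __name__ == "__main__":
    slaves = [
        Slave("172.16.80.127", "mysql-bin.000001", 107, candidate_master=True),
        Slave("172.16.80.128", "mysql-bin.000001", 107, no_master=True),
    ]
    print(pick_new_master(slaves).host)
```

With the two slaves of this walkthrough at the same position, the sketch selects 172.16.80.127, which matches the "New master is 172.16.80.127" line in the failover log.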
1. Configure ssh trust between the three servers

Run ssh-keygen -t rsa and press Enter at every prompt:

Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
c7:2e:ca:e2:c2:3b:30:63:97:b4:62:81:dd:27:e3:f9 root@centos02
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
|.. . |
|....+ . . |
| o.o= S o |
|++ +o o |
|o=o . . . |
| + ..E. . |
| .=..o |
+-----------------+

Copy the public key to each node:

[root@ansible mysql]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.80.117
[root@ansible mysql]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.16.80.128
[root@ansible mysql]# ssh-copy-id -i /root/.ssh/id_rsa.pub 172.16.80.127
The authenticity of host '172.16.80.127 (172.16.80.127)' can't be established.
RSA key fingerprint is 05:89:5e:3d:2a:c1:ae:90:27:d9:a5:48:4a:ab:b9:79.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.16.80.127' (RSA) to the list of known hosts.
root@172.16.80.127's password:
Now try logging into the machine, with "ssh '172.16.80.127'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

Test:

[root@ansible mysql]# ssh 172.16.80.117 ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:45:FE:30
          inet addr:172.16.80.117  Bcast:172.16.80.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe45:fe30/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1220176 errors:0 dropped:0 overruns:0 frame:0
          TX packets:980887 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1198343068 (1.1 GiB)  TX bytes:1318688106 (1.2 GiB)

[root@ansible mysql]# ssh 172.16.80.127 ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0C:29:FF:58:D9
          inet addr:172.16.80.127  Bcast:172.16.80.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:feff:58d9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:162129 errors:0 dropped:0 overruns:0 frame:0
          TX packets:27546 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:225287420 (214.8 MiB)  TX bytes:1921228 (1.8 MiB)

2. Configure the EPEL yum repository on all three nodes and install the dependencies

Install the EPEL release package with rpm -Uvh, import its GPG key, then install the dependency packages:

rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
yum -y install perl-DBD-MySQL ncftp

Configuration files of the three MySQL servers:

master 172.16.80.117

server-id = 1
read-only=1
log-bin=mysql-bin
relay-log = mysql-relay-bin
replicate-wild-ignore-table=mysql.%
replicate-wild-ignore-table=test.%
replicate-wild-ignore-table=information_schema.%

slave 172.16.80.127

server-id = 2
read-only=1
log-bin=mysql-bin
relay-log = mysql-relay-bin
replicate-wild-ignore-table=mysql.%
replicate-wild-ignore-table=test.%
replicate-wild-ignore-table=information_schema.%

slave-manager 172.16.80.128

server-id = 3
read-only=1
log-bin=mysql-bin
relay-log = mysql-relay-bin
replicate-wild-ignore-table=mysql.%
replicate-wild-ignore-table=test.%
replicate-wild-ignore-table=information_schema.%

Note that read-only=1 is also set on the master here. It blocks writes only from accounts without the SUPER privilege, and MHA sets read_only=0 on whichever server it promotes, as the failover log later in this article shows.

Grant privileges on all three MySQL nodes:

mysql> grant replication slave on *.* to 'martin'@'172.16.80.%' identified by '123456';
Query OK, 0 rows affected (0.05 sec)
mysql> grant all on *.* to 'root'@'172.16.80.%' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
Check the binary log status on the master:

mysql> show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |      107 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.01 sec)
3. Run the following on both slaves

change master to
    master_host='172.16.80.117',
    master_user='martin',
    master_password='123456',
    master_log_file='mysql-bin.000001',
    master_log_pos=107;
mysql> start slave;
Query OK, 0 rows affected (0.06 sec)
mysql> show slave status\G

The output shows that master-slave replication is running normally.
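Checking the replication state by eye is easy to get wrong, so the check can be scripted. The sketch below parses the text printed by `show slave status\G` and reports whether both replication threads are running. The function names and the sample text are made up for illustration; `Slave_IO_Running` and `Slave_SQL_Running` are the standard column names in that output.

```python
def parse_slave_status(text: str) -> dict:
    """Turn 'Key: value' lines of SHOW SLAVE STATUS\\G output into a dict."""
    status = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip the '*** 1. row ***' separator and lines without a key.
        if not stripped or stripped.startswith("*") or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        status[key.strip()] = value.strip()
    return status


def replication_healthy(status: dict) -> bool:
    """Both the I/O thread and the SQL thread must report 'Yes'."""
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
    )


# Hypothetical, shortened sample of the command output.
SAMPLE_STATUS = """
*************************** 1. row ***************************
                  Master_Host: 172.16.80.117
                  Master_User: martin
              Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
"""

if __name__ == "__main__":
    print(replication_healthy(parse_slave_status(SAMPLE_STATUS)))
```

In practice the text could come from running `mysql -e 'show slave status\G'` on each slave; how the output is collected is left open here.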
4. Install the MHA software

MHA can be installed from source or from RPM packages. With RPM packages the procedure is:

1) Install MHA Node on each of the three nodes in turn

[root@ansible tools]# rpm -ivh mha4mysql-node-0.56-0.el6.noarch.rpm
Preparing...                ########################################### [100%]
   1:mha4mysql-node         ########################################### [100%]

2) Finally, install mha4mysql-manager on the Slave/MHA Manager node

yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch \
    perl-Parallel-ForkManager perl-Config-IniFiles perl-Time-HiRes

[root@ansible tools]# rpm -ivh mha4mysql-manager-0.56-0.el6.noarch.rpm
Preparing...                ########################################### [100%]
   1:mha4mysql-manager      ########################################### [100%]
[root@ansible tools]# mkdir -p /etc/mha/scripts
The MHA configuration files are as follows:

[root@ansible etc]# cat masterha_default.cnf
[server default]
user=root
password=123456
ssh_user=root
repl_user=martin
repl_password=123456
ping_interval=1
secondary_check_script = masterha_secondary_check -s 172.16.80.117 -s 172.16.80.127 --user=repl_user --master_host=centos02 --master_ip=172.16.80.117 --master_port=3306
master_ip_failover_script="/etc/mha/scripts/master_ip_failover"
report_script="/etc/mha/scripts/send_report"

[root@ansible mha]# cat app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1

[server1]
candidate_master=1
hostname=172.16.80.117
master_binlog_dir="/application/mysql/data"

[server2]
candidate_master=1
hostname=172.16.80.127
master_binlog_dir="/application/mysql/data"
check_repl_delay=0

[server3]
hostname=172.16.80.128
master_binlog_dir="/application/mysql/data"
no_master=1
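Because app1.cnf is an INI-style file, the per-server roles can be read back programmatically, for example to double-check which hosts are allowed to become master before starting the manager. The following is a minimal sketch using Python's standard `configparser`; the `load_servers` helper and the inline `SAMPLE_CNF` text (copied from the file above) are illustrative and not part of MHA.

```python
import configparser

SAMPLE_CNF = """
[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1

[server1]
candidate_master=1
hostname=172.16.80.117

[server2]
candidate_master=1
hostname=172.16.80.127
check_repl_delay=0

[server3]
hostname=172.16.80.128
no_master=1
"""


def load_servers(text: str) -> dict:
    """Return {section: {hostname, candidate_master, no_master}}."""
    # interpolation=None keeps any literal '%' in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    servers = {}
    for name in parser.sections():
        if name == "server default":
            continue  # global settings, not a server entry
        section = parser[name]
        servers[name] = {
            "hostname": section.get("hostname"),
            "candidate_master": section.get("candidate_master", fallback="0") == "1",
            "no_master": section.get("no_master", fallback="0") == "1",
        }
    return servers


if __name__ == "__main__":
    for name, info in load_servers(SAMPLE_CNF).items():
        print(name, info)
```

Here server1 and server2 come back as candidate masters and server3 as excluded from promotion, which is the role split the rest of the walkthrough relies on.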
1. Use masterha_check_ssh to verify that ssh trust login works

[root@ansible scripts]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Thu Aug 11 19:29:03 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Aug 11 19:29:03 2016 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Thu Aug 11 19:29:03 2016 - [info] Reading server configuration from /etc/mha/app1.cnf..
Thu Aug 11 19:29:03 2016 - [info] Starting SSH connection tests..
Thu Aug 11 19:29:04 2016 - [debug]
Thu Aug 11 19:29:03 2016 - [debug] Connecting via SSH from root@172.16.80.117(172.16.80.117:22) to root@172.16.80.127(172.16.80.127:22)..
Thu Aug 11 19:29:03 2016 - [debug] ok.
Thu Aug 11 19:29:03 2016 - [debug] Connecting via SSH from root@172.16.80.117(172.16.80.117:22) to root@172.16.80.128(172.16.80.128:22)..
Thu Aug 11 19:29:04 2016 - [debug] ok.
Thu Aug 11 19:29:04 2016 - [debug]
Thu Aug 11 19:29:03 2016 - [debug] Connecting via SSH from root@172.16.80.127(172.16.80.127:22) to root@172.16.80.117(172.16.80.117:22)..
Thu Aug 11 19:29:04 2016 - [debug] ok.
Thu Aug 11 19:29:04 2016 - [debug] Connecting via SSH from root@172.16.80.127(172.16.80.127:22) to root@172.16.80.128(172.16.80.128:22)..
Thu Aug 11 19:29:04 2016 - [debug] ok.
Thu Aug 11 19:29:04 2016 - [debug]
Thu Aug 11 19:29:04 2016 - [debug] Connecting via SSH from root@172.16.80.128(172.16.80.128:22) to root@172.16.80.117(172.16.80.117:22)..
Thu Aug 11 19:29:04 2016 - [debug] ok.
Thu Aug 11 19:29:04 2016 - [debug] Connecting via SSH from root@172.16.80.128(172.16.80.128:22) to root@172.16.80.127(172.16.80.127:22)..
Thu Aug 11 19:29:04 2016 - [debug] ok.
Thu Aug 11 19:29:04 2016 - [info] All SSH connection tests passed successfully.

2. Use masterha_check_repl to verify that MySQL replication works

[root@ansible scripts]# masterha_check_repl --conf=/etc/mha/app1.cnf
Thu Aug 11 19:31:53 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Aug 11 19:31:53 2016 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Thu Aug 11 19:31:53 2016 - [info] Reading server configuration from /etc/mha/app1.cnf..
Thu Aug 11 19:31:53 2016 - [info] MHA::MasterMonitor version 0.56.
Thu Aug 11 19:31:54 2016 - [info] GTID failover mode = 0
Thu Aug 11 19:31:54 2016 - [info] Dead Servers:
Thu Aug 11 19:31:54 2016 - [info] Alive Servers:
Thu Aug 11 19:31:54 2016 - [info] 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:31:54 2016 - [info] 172.16.80.127(172.16.80.127:3306)
Thu Aug 11 19:31:54 2016 - [info] 172.16.80.128(172.16.80.128:3306)
Thu Aug 11 19:31:54 2016 - [info] Alive Slaves:
Thu Aug 11 19:31:54 2016 - [info] 172.16.80.127(172.16.80.127:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Thu Aug 11 19:31:54 2016 - [info] Replicating from 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:31:54 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Aug 11 19:31:54 2016 - [info] 172.16.80.128(172.16.80.128:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Thu Aug 11 19:31:54 2016 - [info] Replicating from 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:31:54 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Aug 11 19:31:54 2016 - [info] Current Alive Master: 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:31:54 2016 - [info] Checking slave configurations..
Thu Aug 11 19:31:54 2016 - [warning] relay_log_purge=0 is not set on slave 172.16.80.127(172.16.80.127:3306).
Thu Aug 11 19:31:54 2016 - [warning] relay_log_purge=0 is not set on slave 172.16.80.128(172.16.80.128:3306).
Thu Aug 11 19:31:54 2016 - [info] Checking replication filtering settings..
Thu Aug 11 19:31:54 2016 - [info] binlog_do_db= , binlog_ignore_db=
Thu Aug 11 19:31:54 2016 - [info] Replication filtering check ok.
Thu Aug 11 19:31:54 2016 - [info] GTID (with auto-pos) is not supported
Thu Aug 11 19:31:54 2016 - [info] Starting SSH connection tests..
Thu Aug 11 19:31:56 2016 - [info] All SSH connection tests passed successfully.
Thu Aug 11 19:31:56 2016 - [info] Checking MHA Node version..
Thu Aug 11 19:31:56 2016 - [info] Version check ok.
Thu Aug 11 19:31:56 2016 - [info] Checking SSH publickey authentication settings on the current master..
Thu Aug 11 19:31:57 2016 - [info] HealthCheck: SSH to 172.16.80.117 is reachable.
Thu Aug 11 19:31:57 2016 - [info] Master MHA Node version is 0.56.
Thu Aug 11 19:31:57 2016 - [info] Checking recovery script configurations on 172.16.80.117(172.16.80.117:3306)..
Thu Aug 11 19:31:57 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/application/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000001
Thu Aug 11 19:31:57 2016 - [info] Connecting to root@172.16.80.117(172.16.80.117:22)..
  Creating /var/tmp if not exists.. ok.
  Checking output directory is accessible or not.. ok.
  Binlog found at /application/mysql/data, up to mysql-bin.000001
Thu Aug 11 19:31:57 2016 - [info] Binlog setting check done.
Thu Aug 11 19:31:57 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Aug 11 19:31:57 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=172.16.80.127 --slave_ip=172.16.80.127 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.49-log --manager_version=0.56 --relay_log_info=/application/mysql/data/relay-log.info --relay_dir=/application/mysql/data/ --slave_pass=xxx
Thu Aug 11 19:31:57 2016 - [info] Connecting to root@172.16.80.127(172.16.80.127:22)..
  Checking slave recovery environment settings..
  Opening /application/mysql/data/relay-log.info ... ok.
  Relay log found at /application/mysql/data, up to mysql-relay-bin.000002
  Temporary relay log file is /application/mysql/data/mysql-relay-bin.000002
  Testing mysql connection and privileges.. done.
  Testing mysqlbinlog output.. done.
  Cleaning up test file(s).. done.
Thu Aug 11 19:31:58 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=172.16.80.128 --slave_ip=172.16.80.128 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.49-log --manager_version=0.56 --relay_log_info=/application/mysql/data/relay-log.info --relay_dir=/application/mysql/data/ --slave_pass=xxx
Thu Aug 11 19:31:58 2016 - [info] Connecting to root@172.16.80.128(172.16.80.128:22)..
  Checking slave recovery environment settings..
  Opening /application/mysql/data/relay-log.info ... ok.
  Relay log found at /application/mysql/data, up to mysql-relay-bin.000002
  Temporary relay log file is /application/mysql/data/mysql-relay-bin.000002
  Testing mysql connection and privileges.. done.
  Testing mysqlbinlog output.. done.
  Cleaning up test file(s).. done.
Thu Aug 11 19:31:58 2016 - [info] Slaves settings check done.
Thu Aug 11 19:31:58 2016 - [info]
172.16.80.117(172.16.80.117:3306) (current master)
 +--172.16.80.127(172.16.80.127:3306)
 +--172.16.80.128(172.16.80.128:3306)
Thu Aug 11 19:31:58 2016 - [info] Checking replication health on 172.16.80.127..
Thu Aug 11 19:31:58 2016 - [info] ok.
Thu Aug 11 19:31:58 2016 - [info] Checking replication health on 172.16.80.128..
Thu Aug 11 19:31:58 2016 - [info] ok.
Thu Aug 11 19:31:58 2016 - [info] Checking master_ip_failover_script status:
Thu Aug 11 19:31:58 2016 - [info] /etc/mha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.80.117 --orig_master_ip=172.16.80.117 --orig_master_port=3306

IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.16.80.200/24===

Checking the Status of the script.. OK
Thu Aug 11 19:31:58 2016 - [info] OK.
Thu Aug 11 19:31:58 2016 - [warning] shutdown_script is not defined.
Thu Aug 11 19:31:58 2016 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.

Prepare the failover script used to move the VIP:

[root@ansible ~]# cat /etc/mha/scripts/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

# The VIP, the alias index on eth0, and the commands that add/remove it.
my $vip           = '172.16.80.200/24';
my $key           = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig eth0:$key down";

# MHA 0.56 also passes --new_master_user and --new_master_password on
# "start"; they are not declared here, which is why the failover log
# prints "Unknown option" for both.
GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {
        # Remove the VIP from the old master.
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # Bring the VIP up on the new master.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

Start MHA

First bind the VIP to the current master (only needed the first time):

/sbin/ifconfig eth0:1 172.16.80.200

Then start MHA monitoring with masterha_manager:

[root@ansible scripts]# mkdir -p /var/log/masterha/app1
[root@ansible scripts]# touch /var/log/masterha/app1/manager.log
[root@ansible scripts]# nohup masterha_manager --conf=/etc/mha/app1.cnf \
    --remove_dead_master_conf --ignore_last_failover < /dev/null > \
    /var/log/masterha/app1/manager.log 2>&1 &

This redirects the manager's standard output and error to /var/log/masterha/app1/manager.log, whereas the manager_log setting in app1.cnf points to /var/log/mha/app1/manager.log.

Then check the MHA status with masterha_check_status:

[root@ansible scripts]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:58184) is running(0:PING_OK), master:172.16.80.117

Simulate a crash of the database on the master 172.16.80.117:

[root@centos02 .ssh]# /etc/init.d/mysqld stop
Shutting down MySQL................ [ OK ]
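The masterha_check_status line shown above has a regular shape, so a monitoring job could parse it before and after a test like this one to confirm which host MHA currently treats as master. The sketch below handles only the "is running" form that appears in this article; the function name is hypothetical, and any other output (for example from a stopped manager, which is not shown here) simply yields `None`.

```python
import re
from typing import Optional

# Matches e.g.: app1 (pid:58184) is running(0:PING_OK), master:172.16.80.117
STATUS_RE = re.compile(
    r"^(?P<app>\S+) \(pid:(?P<pid>\d+)\) is running"
    r"\((?P<code>\d+):(?P<state>\w+)\), master:(?P<master>\S+)$"
)


def parse_check_status(line: str) -> Optional[dict]:
    """Parse one masterha_check_status line; None if it does not match."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    info["code"] = int(info["code"])
    return info


if __name__ == "__main__":
    sample = "app1 (pid:58184) is running(0:PING_OK), master:172.16.80.117"
    print(parse_check_status(sample))
```

A wrapper could run the command with `subprocess.run` and feed its stdout to `parse_check_status`; that part is omitted so the sketch stays independent of an installed MHA.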
Now look at what is logged during the failover:

Checking the Status of the script.. OK
Thu Aug 11 19:40:45 2016 - [info] OK.
Thu Aug 11 19:40:45 2016 - [warning] shutdown_script is not defined.
Thu Aug 11 19:40:45 2016 - [info] Set master ping interval 1 seconds.
Thu Aug 11 19:40:45 2016 - [info] Set secondary check script: masterha_secondary_check -s 172.16.80.117 -s 172.16.80.127 --user=repl_user --master_host=centos02 --master_ip=172.16.80.117 --master_port=3306
Thu Aug 11 19:40:45 2016 - [info] Starting ping health check on 172.16.80.117(172.16.80.117:3306)..
Thu Aug 11 19:40:45 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Thu Aug 11 19:42:36 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Thu Aug 11 19:42:36 2016 - [info] Executing secondary network check script: masterha_secondary_check -s 172.16.80.117 -s 172.16.80.127 --user=repl_user --master_host=centos02 --master_ip=172.16.80.117 --master_port=3306 --user=root --master_host=172.16.80.117 --master_ip=172.16.80.117 --master_port=3306 --master_user=root --master_password=123456 --ping_type=SELECT
Thu Aug 11 19:42:36 2016 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/application/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin
Thu Aug 11 19:42:37 2016 - [info] HealthCheck: SSH to 172.16.80.117 is reachable.
Monitoring server 172.16.80.117 is reachable, Master is not reachable from 172.16.80.117. OK.
Thu Aug 11 19:42:37 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Aug 11 19:42:37 2016 - [warning] Connection failed 2 time(s)..
Monitoring server 172.16.80.127 is reachable, Master is not reachable from 172.16.80.127. OK.
Thu Aug 11 19:42:38 2016 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Thu Aug 11 19:42:38 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Aug 11 19:42:38 2016 - [warning] Connection failed 3 time(s)..
Thu Aug 11 19:42:39 2016 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Thu Aug 11 19:42:39 2016 - [warning] Connection failed 4 time(s)..
Thu Aug 11 19:42:39 2016 - [warning] Master is not reachable from health checker!
Thu Aug 11 19:42:39 2016 - [warning] Master 172.16.80.117(172.16.80.117:3306) is not reachable!
Thu Aug 11 19:42:39 2016 - [warning] SSH is reachable.
Thu Aug 11 19:42:39 2016 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect to all servers to check server status..
Thu Aug 11 19:42:39 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Thu Aug 11 19:42:39 2016 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Thu Aug 11 19:42:39 2016 - [info] Reading server configuration from /etc/mha/app1.cnf..
Thu Aug 11 19:42:39 2016 - [info] GTID failover mode = 0
Thu Aug 11 19:42:39 2016 - [info] Dead Servers:
Thu Aug 11 19:42:39 2016 - [info] 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:42:39 2016 - [info] Alive Servers:
Thu Aug 11 19:42:39 2016 - [info] 172.16.80.127(172.16.80.127:3306)
Thu Aug 11 19:42:39 2016 - [info] 172.16.80.128(172.16.80.128:3306)
Thu Aug 11 19:42:39 2016 - [info] Alive Slaves:
Thu Aug 11 19:42:39 2016 - [info] 172.16.80.127(172.16.80.127:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Thu Aug 11 19:42:39 2016 - [info] Replicating from 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:42:39 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Aug 11 19:42:39 2016 - [info] 172.16.80.128(172.16.80.128:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Thu Aug 11 19:42:39 2016 - [info] Replicating from 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:42:39 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Aug 11 19:42:39 2016 - [info] Checking slave configurations..
Thu Aug 11 19:42:39 2016 - [warning] relay_log_purge=0 is not set on slave 172.16.80.127(172.16.80.127:3306).
Thu Aug 11 19:42:39 2016 - [warning] relay_log_purge=0 is not set on slave 172.16.80.128(172.16.80.128:3306).
Thu Aug 11 19:42:39 2016 - [info] Checking replication filtering settings..
Thu Aug 11 19:42:39 2016 - [info] Replication filtering check ok.
Thu Aug 11 19:42:39 2016 - [info] Master is down!
Thu Aug 11 19:42:39 2016 - [info] Terminating monitoring script.
Thu Aug 11 19:42:39 2016 - [info] Got exit code 20 (Master dead).
Thu Aug 11 19:42:39 2016 - [info] MHA::MasterFailover version 0.56.
Thu Aug 11 19:42:39 2016 - [info] Starting master failover.
Thu Aug 11 19:42:39 2016 - [info]
Thu Aug 11 19:42:39 2016 - [info] * Phase 1: Configuration Check Phase..
Thu Aug 11 19:42:39 2016 - [info]
Thu Aug 11 19:42:40 2016 - [info] GTID failover mode = 0
Thu Aug 11 19:42:40 2016 - [info] Dead Servers:
Thu Aug 11 19:42:40 2016 - [info] 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:42:40 2016 - [info] Checking master reachability via MySQL(double check)...
Thu Aug 11 19:42:40 2016 - [info] ok.
Thu Aug 11 19:42:40 2016 - [info] Alive Servers:
Thu Aug 11 19:42:40 2016 - [info] 172.16.80.127(172.16.80.127:3306)
Thu Aug 11 19:42:40 2016 - [info] 172.16.80.128(172.16.80.128:3306)
Thu Aug 11 19:42:40 2016 - [info] Alive Slaves:
Thu Aug 11 19:42:40 2016 - [info] 172.16.80.127(172.16.80.127:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Thu Aug 11 19:42:40 2016 - [info] Replicating from 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:42:40 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Aug 11 19:42:40 2016 - [info] 172.16.80.128(172.16.80.128:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Thu Aug 11 19:42:40 2016 - [info] Replicating from 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:42:40 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Aug 11 19:42:40 2016 - [info] Starting Non-GTID based failover.
Thu Aug 11 19:42:40 2016 - [info]
Thu Aug 11 19:42:40 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Thu Aug 11 19:42:40 2016 - [info]
Thu Aug 11 19:42:40 2016 - [info] * Phase 2: Dead Master Shutdown Phase..
Thu Aug 11 19:42:40 2016 - [info]
Thu Aug 11 19:42:40 2016 - [info] Forcing shutdown so that applications never connect to the current master..
Thu Aug 11 19:42:40 2016 - [info] Executing master IP deactivation script:
Thu Aug 11 19:42:40 2016 - [info] /etc/mha/scripts/master_ip_failover --orig_master_host=172.16.80.117 --orig_master_ip=172.16.80.117 --orig_master_port=3306 --command=stopssh --ssh_user=root

IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.16.80.200/24===

Disabling the VIP on old master: 172.16.80.117
Thu Aug 11 19:42:40 2016 - [info] done.
Thu Aug 11 19:42:40 2016 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Aug 11 19:42:40 2016 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Thu Aug 11 19:42:40 2016 - [info]
Thu Aug 11 19:42:40 2016 - [info] * Phase 3: Master Recovery Phase..
Thu Aug 11 19:42:40 2016 - [info]
Thu Aug 11 19:42:40 2016 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Aug 11 19:42:40 2016 - [info]
Thu Aug 11 19:42:40 2016 - [info] The latest binary log file/position on all slaves is mysql-bin.000001:107
Thu Aug 11 19:42:40 2016 - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Aug 11 19:42:40 2016 - [info] 172.16.80.127(172.16.80.127:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Thu Aug 11 19:42:40 2016 - [info] Replicating from 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:42:40 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Aug 11 19:42:40 2016 - [info] 172.16.80.128(172.16.80.128:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Thu Aug 11 19:42:40 2016 - [info] Replicating from 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:42:40 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Aug 11 19:42:40 2016 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:107
Thu Aug 11 19:42:40 2016 - [info] Oldest slaves:
Thu Aug 11 19:42:40 2016 - [info] 172.16.80.127(172.16.80.127:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Thu Aug 11 19:42:40 2016 - [info] Replicating from 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:42:40 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Aug 11 19:42:40 2016 - [info] 172.16.80.128(172.16.80.128:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Thu Aug 11 19:42:40 2016 - [info] Replicating from 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:42:40 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Aug 11 19:42:40 2016 - [info]
Thu Aug 11 19:42:40 2016 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Thu Aug 11 19:42:40 2016 - [info]
Thu Aug 11 19:42:41 2016 - [info] Fetching dead master's binary logs..
Thu Aug 11 19:42:41 2016 - [info] Executing command on the dead master 172.16.80.117(172.16.80.117:3306): save_binary_logs --command=save --start_file=mysql-bin.000001 --start_pos=107 --binlog_dir=/application/mysql/data --output_file=/var/tmp/saved_master_binlog_from_172.16.80.117_3306_20160811194239.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
  Creating /var/tmp if not exists.. ok.
  Concat binary/relay logs from mysql-bin.000001 pos 107 to mysql-bin.000001 EOF into /var/tmp/saved_master_binlog_from_172.16.80.117_3306_20160811194239.binlog ..
  Dumping binlog format description event, from position 0 to 107.. ok.
  Dumping effective binlog data from /application/mysql/data/mysql-bin.000001 position 107 to tail(126).. ok.
  Concat succeeded.
Thu Aug 11 19:42:42 2016 - [info] scp from root@172.16.80.117:/var/tmp/saved_master_binlog_from_172.16.80.117_3306_20160811194239.binlog to local:/var/log/mha/app1/saved_master_binlog_from_172.16.80.117_3306_20160811194239.binlog succeeded.
Thu Aug 11 19:42:43 2016 - [info] HealthCheck: SSH to 172.16.80.127 is reachable.
Thu Aug 11 19:42:43 2016 - [info] HealthCheck: SSH to 172.16.80.128 is reachable.
Thu Aug 11 19:42:43 2016 - [info]
Thu Aug 11 19:42:43 2016 - [info] * Phase 3.3: Determining New Master Phase..
Thu Aug 11 19:42:43 2016 - [info]
Thu Aug 11 19:42:43 2016 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Thu Aug 11 19:42:43 2016 - [info] All slaves received relay logs to the same position. No need to resync each other.
Thu Aug 11 19:42:43 2016 - [info] Searching new master from slaves..
Thu Aug 11 19:42:43 2016 - [info] Candidate masters from the configuration file:
Thu Aug 11 19:42:43 2016 - [info] 172.16.80.127(172.16.80.127:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Thu Aug 11 19:42:43 2016 - [info] Replicating from 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:42:43 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Aug 11 19:42:43 2016 - [info] Non-candidate masters:
Thu Aug 11 19:42:43 2016 - [info] 172.16.80.128(172.16.80.128:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Thu Aug 11 19:42:43 2016 - [info] Replicating from 172.16.80.117(172.16.80.117:3306)
Thu Aug 11 19:42:43 2016 - [info] Not candidate for the new Master (no_master is set)
Thu Aug 11 19:42:43 2016 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Thu Aug 11 19:42:43 2016 - [info] New master is 172.16.80.127(172.16.80.127:3306)
Thu Aug 11 19:42:43 2016 - [info] Starting master failover..
Thu Aug 11 19:42:43 2016 - [info]
From:
172.16.80.117(172.16.80.117:3306) (current master)
 +--172.16.80.127(172.16.80.127:3306)
 +--172.16.80.128(172.16.80.128:3306)

To:
172.16.80.127(172.16.80.127:3306) (new master)
 +--172.16.80.128(172.16.80.128:3306)
Thu Aug 11 19:42:43 2016 - [info]
Thu Aug 11 19:42:43 2016 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Thu Aug 11 19:42:43 2016 - [info]
Thu Aug 11 19:42:43 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Aug 11 19:42:43 2016 - [info] Sending binlog..
Thu Aug 11 19:42:44 2016 - [info] scp from local:/var/log/mha/app1/saved_master_binlog_from_172.16.80.117_3306_20160811194239.binlog to root@172.16.80.127:/var/tmp/saved_master_binlog_from_172.16.80.117_3306_20160811194239.binlog succeeded.
Thu Aug 11 19:42:44 2016 - [info]
Thu Aug 11 19:42:44 2016 - [info] * Phase 3.4: Master Log Apply Phase..
Thu Aug 11 19:42:44 2016 - [info]
Thu Aug 11 19:42:44 2016 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Thu Aug 11 19:42:44 2016 - [info] Starting recovery on 172.16.80.127(172.16.80.127:3306)..
Thu Aug 11 19:42:44 2016 - [info] Generating diffs succeeded.
Thu Aug 11 19:42:44 2016 - [info] Waiting until all relay logs are applied.
Thu Aug 11 19:42:44 2016 - [info] done.
Thu Aug 11 19:42:44 2016 - [info] Getting slave status..
Thu Aug 11 19:42:44 2016 - [info] This slave(172.16.80.127)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:107). No need to recover from Exec_Master_Log_Pos.
Thu Aug 11 19:42:44 2016 - [info] Connecting to the target slave host 172.16.80.127, running recover script..
Thu Aug 11 19:42:44 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=172.16.80.127 --slave_ip=172.16.80.127 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_172.16.80.117_3306_20160811194239.binlog --workdir=/var/tmp --target_version=5.5.49-log --timestamp=20160811194239 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Thu Aug 11 19:42:46 2016 - [info] Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_172.16.80.117_3306_20160811194239.binlog on 172.16.80.127:3306. This may take long time...
Applying log files succeeded.
Thu Aug 11 19:42:46 2016 - [info] All relay logs were successfully applied.
Thu Aug 11 19:42:46 2016 - [info] Getting new master's binlog name and position..
Thu Aug 11 19:42:46 2016 - [info] mysql-bin.000001:245
Thu Aug 11 19:42:46 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.80.127', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=245, MASTER_USER='martin', MASTER_PASSWORD='xxx';
Thu Aug 11 19:42:46 2016 - [info] Executing master IP activate script:
Thu Aug 11 19:42:46 2016 - [info] /etc/mha/scripts/master_ip_failover --command=start --ssh_user=root --orig_master_host=172.16.80.117 --orig_master_ip=172.16.80.117 --orig_master_port=3306 --new_master_host=172.16.80.127 --new_master_ip=172.16.80.127 --new_master_port=3306 --new_master_user='root' --new_master_password='123456'
Unknown option: new_master_user
Unknown option: new_master_password

IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.16.80.200/24===

Enabling the VIP - 172.16.80.200/24 on the new master - 172.16.80.127
Thu Aug 11 19:42:47 2016 - [info] OK.
Thu Aug 11 19:42:47 2016 - [info] Setting read_only=0 on 172.16.80.127(172.16.80.127:3306)..
Thu Aug 11 19:42:47 2016 - [info] ok.
Thu Aug 11 19:42:47 2016 - [info] ** Finished master recovery successfully.
Thu Aug 11 19:42:47 2016 - [info] * Phase 3: Master Recovery Phase completed.
Thu Aug 11 19:42:47 2016 - [info]
Thu Aug 11 19:42:47 2016 - [info] * Phase 4: Slaves Recovery Phase..
Thu Aug 11 19:42:47 2016 - [info]
Thu Aug 11 19:42:47 2016 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Thu Aug 11 19:42:47 2016 - [info]
Thu Aug 11 19:42:47 2016 - [info] -- Slave diff file generation on host 172.16.80.128(172.16.80.128:3306) started, pid: 58855. Check tmp log /var/log/mha/app1/172.16.80.128_3306_20160811194239.log if it takes time..
Thu Aug 11 19:42:47 2016 - [info]
Thu Aug 11 19:42:47 2016 - [info] Log messages from 172.16.80.128 ...
Thu Aug 11 19:42:47 2016 - [info]
Thu Aug 11 19:42:47 2016 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Thu Aug 11 19:42:47 2016 - [info] End of log messages from 172.16.80.128.
Thu Aug 11 19:42:47 2016 - [info] -- 172.16.80.128(172.16.80.128:3306) has the latest relay log events.
Thu Aug 11 19:42:47 2016 - [info] Generating relay diff files from the latest slave succeeded.
Thu Aug 11 19:42:47 2016 - [info]
Thu Aug 11 19:42:47 2016 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Thu Aug 11 19:42:47 2016 - [info]
Thu Aug 11 19:42:47 2016 - [info] -- Slave recovery on host 172.16.80.128(172.16.80.128:3306) started, pid: 58857. Check tmp log /var/log/mha/app1/172.16.80.128_3306_20160811194239.log if it takes time..
Thu Aug 11 19:42:48 2016 - [info]
Thu Aug 11 19:42:48 2016 - [info] Log messages from 172.16.80.128 ...
Thu Aug 11 19:42:48 2016 - [info]
Thu Aug 11 19:42:47 2016 - [info] Sending binlog..
Thu Aug 11 19:42:47 2016 - [info] scp from local:/var/log/mha/app1/saved_master_binlog_from_172.16.80.117_3306_20160811194239.binlog to root@172.16.80.128:/var/tmp/saved_master_binlog_from_172.16.80.117_3306_20160811194239.binlog succeeded.
Thu Aug 11 19:42:47 2016 - [info] Starting recovery on 172.16.80.128(172.16.80.128:3306)..
Thu Aug 11 19:42:47 2016 - [info] Generating diffs succeeded.
Thu Aug 11 19:42:47 2016 - [info] Waiting until all relay logs are applied.
Thu Aug 11 19:42:47 2016 - [info] done.
Thu Aug 11 19:42:47 2016 - [info] Getting slave status..
Thu Aug 11 19:42:47 2016 - [info] This slave(172.16.80.128)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:107). No need to recover from Exec_Master_Log_Pos.
Thu Aug 11 19:42:47 2016 - [info] Connecting to the target slave host 172.16.80.128, running recover script..
Thu Aug 11 19:42:47 2016 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='root' --slave_host=172.16.80.128 --slave_ip=172.16.80.128 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_172.16.80.117_3306_20160811194239.binlog --workdir=/var/tmp --target_version=5.5.49-log --timestamp=20160811194239 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Thu Aug 11 19:42:48 2016 - [info] Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_172.16.80.117_3306_20160811194239.binlog on 172.16.80.128:3306. This may take long time...
Applying log files succeeded.
Thu Aug 11 19:42:48 2016 - [info] All relay logs were successfully applied.
Thu Aug 11 19:42:48 2016 - [info] Resetting slave 172.16.80.128(172.16.80.128:3306) and starting replication from the new master 172.16.80.127(172.16.80.127:3306)..
Thu Aug 11 19:42:48 2016 - [info] Executed CHANGE MASTER.
Thu Aug 11 19:42:48 2016 - [info] Slave started.
Thu Aug 11 19:42:48 2016 - [info] End of log messages from 172.16.80.128.
Thu Aug 11 19:42:48 2016 - [info] -- Slave recovery on host 172.16.80.128(172.16.80.128:3306) succeeded.
Thu Aug 11 19:42:48 2016 - [info] All new slave servers recovered successfully.
Thu Aug 11 19:42:48 2016 - [info]
Thu Aug 11 19:42:48 2016 - [info] * Phase 5: New master cleanup phase..
Thu Aug 11 19:42:48 2016 - [info]
Thu Aug 11 19:42:48 2016 - [info] Resetting slave info on the new master..
Thu Aug 11 19:42:49 2016 - [info] 172.16.80.127: Resetting slave info succeeded.
Thu Aug 11 19:42:49 2016 - [info] Master failover to 172.16.80.127(172.16.80.127:3306) completed successfully.
Thu Aug 11 19:42:49 2016 - [info]

----- Failover Report -----

app1: MySQL Master failover 172.16.80.117(172.16.80.117:3306) to 172.16.80.127(172.16.80.127:3306) succeeded

Master 
172.16.80.117(172.16.80.117:3306) is down!Check MHA Manager logs at ansible:/var/log/mha/app1/manager.log for details.Started automated(non-interactive) failover.Invalidated master IP address on 172.16.80.117(172.16.80.117:3306)The latest slave 172.16.80.127(172.16.80.127:3306) has all relay logs for recovery.Selected 172.16.80.127(172.16.80.127:3306) as a new master.172.16.80.127(172.16.80.127:3306): OK: Applying all logs succeeded.172.16.80.127(172.16.80.127:3306): OK: Activated master IP address.172.16.80.128(172.16.80.128:3306): This host has the latest relay log events.Generating relay diff files from the latest slave succeeded.172.16.80.128(172.16.80.128:3306): OK: Applying all logs succeeded. Slave started, replicating from 172.16.80.127(172.16.80.127:3306)172.16.80.127(172.16.80.127:3306): Resetting slave info succeeded.Master failover to 172.16.80.127(172.16.80.127:3306) completed successfully.
As the log shows, the remaining slave (172.16.80.128) automatically reconnected to the new master, 172.16.80.127.
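The failover log also records the exact CHANGE MASTER statement that other servers should use to follow the new master, which is handy later when the old master is brought back as a slave. The following is a minimal sketch for pulling that statement out of the manager log. The function name `extract_change_master` is hypothetical, and the log path in the usage note is the one named in the failover report. Note that MHA masks the password as 'xxx' in the log, so the real replication password has to be substituted by hand.

```shell
#!/bin/sh
# Hypothetical helper: print the CHANGE MASTER statement that MHA logged
# during failover.
# $1: path to the manager log (assumed to contain the failover log).
extract_change_master() {
    # The statement follows the literal text "Statement should be: " and
    # ends at the first semicolon. If several failovers were logged,
    # only the most recent statement is kept.
    sed -n 's/.*Statement should be: \(CHANGE MASTER TO [^;]*;\).*/\1/p' "$1" | tail -n 1
}
```

On the management node this could be called as `extract_change_master /var/log/mha/app1/manager.log`.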
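After the switchover completes, note the following changes:
1. The VIP moves automatically from the old master to the new master, and the monitoring process on the manager node exits by itself.
2. An app1.failover.complete file is created in the manager working directory (/var/log/mha/app1 according to the paths in the failover log above).
3. The old master's entry is removed from the /etc/mha/app1.cnf configuration file.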
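Two of these changes (the marker file and the edited configuration file) can be checked mechanically. The sketch below is only an illustration under stated assumptions: the function name `check_failover_artifacts` is hypothetical, the marker file name is hard-coded to app1.failover.complete, and the configuration is assumed to list each server with a `hostname=` line. It does not check the VIP, which would require SSH access to the MySQL hosts.

```shell
#!/bin/sh
# Hypothetical helper: verify two post-failover changes.
# $1: manager working directory, $2: MHA config file, $3: old master address.
check_failover_artifacts() {
    workdir=$1
    conf=$2
    old_master=$3
    rc=0

    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 2
    fi

    # Change 2: MHA leaves a marker file after a completed failover.
    if [ -e "$workdir/app1.failover.complete" ]; then
        echo "failover marker: present"
    else
        echo "failover marker: missing"
        rc=1
    fi

    # Change 3: the dead master should no longer appear as a hostname= value.
    if awk -F= -v host="$old_master" '
        $1 ~ /^[ \t]*hostname[ \t]*$/ {
            value = $2
            gsub(/[ \t\r]/, "", value)
            if (value == host) found = 1
        }
        END { exit !found }' "$conf"
    then
        echo "old master entry: still present"
        rc=1
    else
        echo "old master entry: removed"
    fi

    return "$rc"
}
```

For this setup the call would look like `check_failover_artifacts /var/log/mha/app1 /etc/mha/app1.cnf 172.16.80.117`; a non-zero exit status means at least one expected change was not found.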
Next, repeat the SSH and MySQL replication checks performed earlier.
Repair the old master, 172.16.80.117:
[root@centos02 .ssh]# /etc/init.d/mysqld start
Starting MySQL................ [ OK ]
Then check the replication status from the management node 172.16.80.128:
[root@ansible ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Fri Aug 12 14:03:15 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Aug 12 14:03:15 2016 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Aug 12 14:03:15 2016 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Aug 12 14:03:15 2016 - [info] MHA::MasterMonitor version 0.56.
Fri Aug 12 14:03:19 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln653] There are 2 non-slave servers! MHA manages at most one non-slave server. Check configurations.
Fri Aug 12 14:03:19 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 326
Fri Aug 12 14:03:19 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Fri Aug 12 14:03:19 2016 - [info] Got exit code 1 (Not master dead).
MySQL Replication Health is NOT OK!
The check fails because the restarted old master is not yet replicating from any server, so MHA sees two non-slave servers (172.16.80.117 and the new master 172.16.80.127). To fix this, the old master has to become a slave of the new master. Run the following on the old master:
mysql> reset slave
Then check the current status of the new master:
mysql> show master status;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |      245 |              |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
Note the binlog file name and position, then run the following on the old master:
mysql> reset slave;
Query OK, 0 rows affected (0.03 sec)
mysql> change master to
    -> master_host='172.16.80.127',
    -> master_user='martin',
    -> master_password='123456',
    -> master_log_file='mysql-bin.000001',
    -> master_log_pos=245;
Query OK, 0 rows affected (0.09 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
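The step of copying the file name and position from SHOW MASTER STATUS into CHANGE MASTER can be scripted. Below is a minimal sketch, assuming the status row arrives on standard input in the tab-separated form produced by the mysql client's batch mode; the function name `build_change_master` is hypothetical. One caveat: the current position equals the failover-time position (245 here) only if the new master has received no writes since the failover. Otherwise the position recorded in the failover log is the one to use.

```shell
#!/bin/sh
# Hypothetical helper: turn one tab-separated SHOW MASTER STATUS row
# (File<TAB>Position<TAB>...) read from stdin into a CHANGE MASTER statement.
# $1: new master host, $2: replication user, $3: replication password.
# Assumption: none of the values contain single quotes or backslashes.
build_change_master() {
    awk -F'\t' -v host="$1" -v user="$2" -v pass="$3" -v q="'" '
        NR == 1 {
            printf "CHANGE MASTER TO MASTER_HOST=%s%s%s, MASTER_USER=%s%s%s, MASTER_PASSWORD=%s%s%s, MASTER_LOG_FILE=%s%s%s, MASTER_LOG_POS=%d;\n",
                q, host, q, q, user, q, q, pass, q, q, $1, q, $2
        }'
}
```

A possible invocation, with the password as a placeholder: `mysql -N -B -h 172.16.80.127 -u root -p -e 'SHOW MASTER STATUS' | build_change_master 172.16.80.127 martin 'replication-password'`. The printed statement can then be reviewed and run on the old master after `reset slave;`.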
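On the old master, grant privileges (the password must be quoted):
mysql> grant all on *.* to root@'centos02' identified by '123456';
If the old master's entry was removed from /etc/mha/app1.cnf during failover, add it back before restarting monitoring. Then start the manager process on the management node: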
[root@ansible ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[root@ansible ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:62074) is running(0:PING_OK), master:172.16.80.127
[root@ansible ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Fri Aug 12 14:22:29 2016 - [info] Reading default configuration from /etc/masterha_default.cnf..
Fri Aug 12 14:22:29 2016 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Fri Aug 12 14:22:29 2016 - [info] Reading server configuration from /etc/mha/app1.cnf..
Fri Aug 12 14:22:29 2016 - [info] MHA::MasterMonitor version 0.56.
Fri Aug 12 14:22:30 2016 - [info] GTID failover mode = 0
Fri Aug 12 14:22:30 2016 - [info] Dead Servers:
Fri Aug 12 14:22:30 2016 - [info] Alive Servers:
Fri Aug 12 14:22:30 2016 - [info] 172.16.80.117(172.16.80.117:3306)
Fri Aug 12 14:22:30 2016 - [info] 172.16.80.127(172.16.80.127:3306)
Fri Aug 12 14:22:30 2016 - [info] 172.16.80.128(172.16.80.128:3306)
Fri Aug 12 14:22:30 2016 - [info] Alive Slaves:
Fri Aug 12 14:22:30 2016 - [info] 172.16.80.117(172.16.80.117:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Fri Aug 12 14:22:30 2016 - [info] Replicating from 172.16.80.127(172.16.80.127:3306)
Fri Aug 12 14:22:30 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Aug 12 14:22:30 2016 - [info] 172.16.80.128(172.16.80.128:3306) Version=5.5.49-log (oldest major version between slaves) log-bin:enabled
Fri Aug 12 14:22:30 2016 - [info] Replicating from 172.16.80.127(172.16.80.127:3306)
Fri Aug 12 14:22:30 2016 - [info] Not candidate for the new Master (no_master is set)
Fri Aug 12 14:22:30 2016 - [info] Current Alive Master: 172.16.80.127(172.16.80.127:3306)
Fri Aug 12 14:22:30 2016 - [info] Checking slave configurations..
Fri Aug 12 14:22:30 2016 - [warning] relay_log_purge=0 is not set on slave 172.16.80.117(172.16.80.117:3306).
Fri Aug 12 14:22:30 2016 - [warning] relay_log_purge=0 is not set on slave 172.16.80.128(172.16.80.128:3306).
Fri Aug 12 14:22:30 2016 - [info] Checking replication filtering settings..
Fri Aug 12 14:22:30 2016 - [info] binlog_do_db= , binlog_ignore_db=
Fri Aug 12 14:22:30 2016 - [info] Replication filtering check ok.
Fri Aug 12 14:22:30 2016 - [info] GTID (with auto-pos) is not supported
Fri Aug 12 14:22:30 2016 - [info] Starting SSH connection tests..
Fri Aug 12 14:22:36 2016 - [info] All SSH connection tests passed successfully.
Fri Aug 12 14:22:36 2016 - [info] Checking MHA Node version..
Fri Aug 12 14:22:37 2016 - [info] Version check ok.
Fri Aug 12 14:22:37 2016 - [info] Checking SSH publickey authentication settings on the current master..
Fri Aug 12 14:22:37 2016 - [info] HealthCheck: SSH to 172.16.80.127 is reachable.
Fri Aug 12 14:22:38 2016 - [info] Master MHA Node version is 0.56.
Fri Aug 12 14:22:38 2016 - [info] Checking recovery script configurations on 172.16.80.127(172.16.80.127:3306)..
Fri Aug 12 14:22:38 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/application/mysql/data --output_file=/var/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000001
Fri Aug 12 14:22:38 2016 - [info] Connecting to root@172.16.80.127(172.16.80.127:22)..
  Creating /var/tmp if not exists.. ok.
  Checking output directory is accessible or not.. ok.
  Binlog found at /application/mysql/data, up to mysql-bin.000001
Fri Aug 12 14:22:38 2016 - [info] Binlog setting check done.
Fri Aug 12 14:22:38 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Aug 12 14:22:38 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=172.16.80.117 --slave_ip=172.16.80.117 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.49-log --manager_version=0.56 --relay_log_info=/application/mysql/data/relay-log.info --relay_dir=/application/mysql/data/ --slave_pass=xxx
Fri Aug 12 14:22:38 2016 - [info] Connecting to root@172.16.80.117(172.16.80.117:22)..
  Checking slave recovery environment settings..
  Opening /application/mysql/data/relay-log.info ... ok.
  Relay log found at /application/mysql/data, up to mysql-relay-bin.000002
  Temporary relay log file is /application/mysql/data/mysql-relay-bin.000002
  Testing mysql connection and privileges.. done.
  Testing mysqlbinlog output.. done.
  Cleaning up test file(s).. done.
Fri Aug 12 14:22:39 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=172.16.80.128 --slave_ip=172.16.80.128 --slave_port=3306 --workdir=/var/tmp --target_version=5.5.49-log --manager_version=0.56 --relay_log_info=/application/mysql/data/relay-log.info --relay_dir=/application/mysql/data/ --slave_pass=xxx
Fri Aug 12 14:22:39 2016 - [info] Connecting to root@172.16.80.128(172.16.80.128:22)..
  Checking slave recovery environment settings..
  Opening /application/mysql/data/relay-log.info ... ok.
  Relay log found at /application/mysql/data, up to mysql-relay-bin.000002
  Temporary relay log file is /application/mysql/data/mysql-relay-bin.000002
  Testing mysql connection and privileges.. done.
  Testing mysqlbinlog output.. done.
  Cleaning up test file(s).. done.
Fri Aug 12 14:22:39 2016 - [info] Slaves settings check done.
Fri Aug 12 14:22:39 2016 - [info]
172.16.80.127(172.16.80.127:3306) (current master)
 +--172.16.80.117(172.16.80.117:3306)
 +--172.16.80.128(172.16.80.128:3306)
Fri Aug 12 14:22:39 2016 - [info] Checking replication health on 172.16.80.117..
Fri Aug 12 14:22:39 2016 - [info] ok.
Fri Aug 12 14:22:39 2016 - [info] Checking replication health on 172.16.80.128..
Fri Aug 12 14:22:39 2016 - [info] ok.
Fri Aug 12 14:22:39 2016 - [info] Checking master_ip_failover_script status:
Fri Aug 12 14:22:39 2016 - [info] /etc/mha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.80.127 --orig_master_ip=172.16.80.127 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 172.16.80.200/24===
Checking the Status of the script.. OK
Fri Aug 12 14:22:40 2016 - [info] OK.
Fri Aug 12 14:22:40 2016 - [warning] shutdown_script is not defined.
Fri Aug 12 14:22:40 2016 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
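The output of masterha_check_repl is long, and only two things usually matter: the final health line and any warnings. In the run above, the relay_log_purge and shutdown_script warnings did not prevent a healthy result. A small sketch for condensing the output follows; the function name `summarize_check_repl` is hypothetical, and it relies only on the message texts that appear in the two runs shown in this article.

```shell
#!/bin/sh
# Hypothetical helper: read masterha_check_repl output on stdin, print the
# warning lines, then a one-line verdict. Returns 0 only when the output
# contains the "Health is OK." line.
summarize_check_repl() {
    output=$(cat)

    # Show warnings (if any); a run without warnings is not an error.
    printf '%s\n' "$output" | grep -F '[warning]' || true

    # "is NOT OK!" does not match this pattern, so a failed check falls through.
    if printf '%s\n' "$output" | grep -q 'MySQL Replication Health is OK\.'; then
        echo "RESULT: OK"
        return 0
    fi
    echo "RESULT: NOT OK"
    return 1
}
```

It could be used as `masterha_check_repl --conf=/etc/mha/app1.cnf 2>&1 | summarize_check_repl`; the `2>&1` is there on the assumption that some of the messages may be written to standard error.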