[Pacemaker] MySQL, Percona replication manager - split brain
Andrew
nitr0 at seti.kr.ua
Sun Oct 26 19:18:51 UTC 2014
On 26.10.2014 17:44, Andrei Borzenkov wrote:
> On Sun, 26 Oct 2014 10:51:13 +0200
> Andrew <nitr0 at seti.kr.ua> wrote:
>
>> On 26.10.2014 08:32, Andrei Borzenkov wrote:
>>> On Sat, 25 Oct 2014 23:34:54 +0300
>>> Andrew <nitr0 at seti.kr.ua> wrote:
>>>
>>>> On 25.10.2014 22:34, Digimer wrote:
>>>>> On 25/10/14 03:32 PM, Andrew wrote:
>>>>>> Hi all.
>>>>>>
>>>>>> I use the Percona replication manager RA on a cluster (nothing
>>>>>> mission-critical currently - just Zabbix data); today, after restarting
>>>>>> the MySQL resource (crm resource restart p_mysql), I got a split-brain
>>>>>> state - for some reason MySQL started first on the ex-slave node, and
>>>>>> the ex-master started later (possibly I set the shutdown timeout too
>>>>>> small - only 120s - but I'm not sure).
>>>>>>
> Your logs do not show a resource restart - they show a pacemaker restart
> on node2.
Yes, you're right. This was a pacemaker restart.
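
By the way, in case the 120s shutdown timeout mentioned above really is
too small, the stop timeout can be raised on the resource operation; a
rough crm shell sketch (the 300s value is only an illustration):

# show the current operation definitions for the resource
crm configure show p_mysql
# raise the stop timeout in the editor, e.g.:
#   op stop interval=0 timeout=300s
crm configure edit p_mysql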
>
>>>>>> After restarting the resource on both nodes, MySQL replication seemed
>>>>>> to be OK - but then, after ~50 min, it fell into split brain again for
>>>>>> an unknown reason (no resource restart was noticed).
>>>>>>
>>>>>> In the 'SHOW SLAVE STATUS' output there is an error on a table caused
>>>>>> by a duplicate entry in a unique index.
>>>>>>
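
(For reference, the failing statement and the error code can be read from
the slave's status - a minimal sketch; error 1062 is the duplicate-key
case:)

# run on the broken slave
mysql -e 'SHOW SLAVE STATUS\G' | \
    grep -E 'Slave_SQL_Running|Last_SQL_Errno|Last_SQL_Error'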
>>>>>> So I have some questions:
>>>>>> 1) What causes the split brain, and how do I avoid it in the future?
>>>>> Cause:
>>>>>
>>>>> Logs?
>>>> Oct 25 13:54:13 node2 crmd[29248]: notice: do_state_transition: State
>>>> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
>>>> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>>>> Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_config: On loss
>>>> of CCM Quorum: Ignore
>>>> Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation
>>>> monitor found resource p_pgsql:0 active in master mode on node1.cluster
>>>> Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation
>>>> monitor found resource p_mysql:1 active in master mode on node2.cluster
>>> That seems too late. The real cause is that the resource was reported
>>> as being in master state on both nodes, and that happened earlier.
>> These are different resources (pgsql and mysql).
>>
>>>>> Prevent:
>>>>>
>>>>> Fencing (aka stonith). This is why fencing is required.
>>>> There was no node failure. Just the daemon was restarted.
>>>>
>>> "Split brain" == loss of communication. It does not matter whether
>>> communication was lost because node failed or because daemon was not
>>> running. There is no way for surviving node to know, *why*
>>> communication was lost.
>>>
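
(For reference, fencing would be configured with something like the
following - an illustrative crm shell sketch using the external/ipmi
agent; addresses and credentials are placeholders:)

# one fencing device per node, kept off the node it is meant to kill
crm configure primitive p_fence_node1 stonith:external/ipmi \
    params hostname=node1.cluster ipaddr=10.0.0.1 userid=admin \
    passwd=secret interface=lan \
    op monitor interval=60s
crm configure location l_fence_node1 p_fence_node1 -inf: node1.cluster
crm configure property stonith-enabled=true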
>> So how will stonith help in this case? If the daemon dies during a
>> restart it will be started again, and stonith will see a live
>> daemon...
>>
>> So what is the easiest split-brain recovery? Just stopping the daemons
>> and copying all MySQL data from the good node to the bad one?
> There is no split-brain visible in your log. Pacemaker on node2 was
> restarted, cleanly as far as I can tell, and reintegrated back into the
> cluster. Maybe node1 "lost" node2, but that needs logs from node1.
Here is the log from the other node:
Oct 25 13:54:13 node1 pacemakerd[21773]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:13 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (56)
Oct 25 13:54:13 node1 attrd[26079]: notice: attrd_perform_update: Sent
update 6993: master-p_mysql=56
Oct 25 13:54:16 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (53)
Oct 25 13:54:16 node1 attrd[26079]: notice: attrd_perform_update: Sent
update 6995: master-p_mysql=53
Oct 25 13:54:16 node1 pacemakerd[22035]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:18 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (60)
Oct 25 13:54:18 node1 attrd[26079]: notice: attrd_perform_update: Sent
update 6997: master-p_mysql=60
Oct 25 13:54:18 node1 pacemakerd[22335]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:19 node1 pacemakerd[22476]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:19 node1 mysql(p_mysql)[22446]: INFO: Ignoring post-demote
notification execpt for my own demotion.
Oct 25 13:54:19 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2423, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:19 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2425, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node1 pacemakerd[22597]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node1 mysql(p_mysql)[22540]: INFO: Ignoring post-demote
notification execpt for my own demotion.
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2433, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node1 IPaddr(ClusterIP)[22538]: INFO: Adding inet
address 192.168.253.254/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:54:20 node1 IPaddr2(pgsql_reader_vip)[22539]: INFO: Adding
inet address 192.168.253.31/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2435, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node1 IPaddr(ClusterIP)[22538]: INFO: Bringing device br0 up
Oct 25 13:54:20 node1 IPaddr2(pgsql_reader_vip)[22539]: INFO: Bringing
device br0 up
Oct 25 13:54:20 node1 IPaddr(ClusterIP)[22538]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.254 br0 192.168.253.254
auto not_used not_used
Oct 25 13:54:20 node1 IPaddr2(pgsql_reader_vip)[22539]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.31 br0 192.168.253.31 auto
not_used not_used
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation ClusterIP_start_0 (call=2429, rc=0, cib-update=653,
confirmed=true) ok
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_start_0 (call=2431, rc=0, cib-update=654,
confirmed=true) ok
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation ClusterIP_monitor_2000 (call=2441, rc=0, cib-update=655,
confirmed=false) ok
Oct 25 13:54:20 node1 crmd[26081]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_monitor_10000 (call=2443, rc=0,
cib-update=656, confirmed=false) ok
Oct 25 13:54:20 node1 mysql(p_mysql)[22808]: INFO: post-demote
notification for node2.cluster.
Oct 25 13:54:20 node1 mysql(p_mysql)[22808]: INFO: MySQL slave has
finished reading master binary log
Oct 25 13:54:20 node1 mysql(p_mysql)[22808]: INFO: MySQL slave has
finished processing relay log
Oct 25 13:54:20 node1 pacemakerd[22948]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node1 pgsql(p_pgsql)[22937]: INFO: Changing
p_pgsql-data-status on node2.cluster : STREAMING|SYNC->DISCONNECT.
Oct 25 13:54:21 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2445, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:21 node1 pgsql(p_pgsql)[22937]: INFO: Setup node2.cluster
into async mode.
Oct 25 13:54:21 node1 pgsql(p_pgsql)[22937]: INFO: server signaled
Oct 25 13:54:21 node1 pgsql(p_pgsql)[22937]: INFO: Reload configuration
file.
Oct 25 13:54:21 node1 pacemakerd[23094]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:21 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2450, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:21 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2453, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:22 node1 ntpd[1526]: Listen normally on 102 br0
192.168.253.254 UDP 123
Oct 25 13:54:22 node1 ntpd[1526]: Listen normally on 103 br0
192.168.253.31 UDP 123
Oct 25 13:54:22 node1 ntpd[1526]: peers refreshed
Oct 25 13:54:23 node1 pacemakerd[23336]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:25 node1 pacemakerd[23575]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:27 node1 pacemakerd[23851]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:29 node1 pacemakerd[24006]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:32 node1 pacemakerd[24205]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:34 node1 pacemakerd[24541]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:36 node1 pacemakerd[24705]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:38 node1 pacemakerd[24912]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:41 node1 pacemakerd[25150]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:43 node1 pacemakerd[25440]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:45 node1 pacemakerd[25641]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:47 node1 pacemakerd[25863]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:50 node1 pacemakerd[26016]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:52 node1 pacemakerd[26230]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:54 node1 pacemakerd[26563]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:56 node1 pacemakerd[26780]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:58 node1 pacemakerd[27008]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:01 node1 pacemakerd[27208]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:03 node1 pacemakerd[27540]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:05 node1 pacemakerd[27696]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:07 node1 pacemakerd[27958]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:10 node1 pacemakerd[28111]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:12 node1 pacemakerd[28310]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:14 node1 pacemakerd[28641]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:17 node1 pacemakerd[28812]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:19 node1 pacemakerd[29019]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:21 node1 pacemakerd[29219]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:23 node1 pacemakerd[29551]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:25 node1 pacemakerd[29706]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:28 node1 pacemakerd[29978]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:30 node1 pacemakerd[30177]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:32 node1 pacemakerd[30420]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:34 node1 pacemakerd[30708]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:37 node1 pacemakerd[30871]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2456, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2459, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:38 node1 mysql(p_mysql)[31063]: WARNING: Attempted to unset
the replication master on an instance that is not configured as a
replication slave
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (1060)
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_perform_update: Sent
update 7016: master-p_mysql=1060
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_promote_0 (call=2462, rc=0, cib-update=657,
confirmed=true) ok
Oct 25 13:55:38 node1 mysql(p_mysql)[31139]: INFO: This will be the new
master, ignoring post-promote notification.
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2465, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:38 node1 lrmd[26078]: notice: operation_finished:
p_mysql_monitor_5000:31173:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_monitor_5000 (call=2468, rc=8, cib-update=658,
confirmed=false) master
Oct 25 13:55:38 node1 IPaddr2(mysql_writer_vip)[31260]: INFO: Adding
inet address 192.168.253.64/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:55:38 node1 IPaddr2(mysql_writer_vip)[31260]: INFO: Bringing
device br0 up
Oct 25 13:55:38 node1 IPaddr2(mysql_writer_vip)[31260]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.64 br0 192.168.253.64 auto
not_used not_used
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_writer_vip_start_0 (call=2470, rc=0, cib-update=659,
confirmed=true) ok
Oct 25 13:55:38 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_writer_vip_monitor_10000 (call=2474, rc=0,
cib-update=660, confirmed=false) ok
Oct 25 13:55:38 node1 crmd[26081]: notice: peer_update_callback: Our
peer on the DC is dead
Oct 25 13:55:38 node1 crmd[26081]: notice: do_state_transition: State
transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION
cause=C_CRMD_STATUS_CALLBACK origin=peer_update_callback ]
Oct 25 13:55:38 node1 crmd[26081]: notice: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: Diff: --- 0.322.7
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: Diff: +++ 0.323.1
b52f4b73ea1a2704d36b65af7364a3f8
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: -- <nvpair
value="1.1.8-7.el6-394e906" id="cib-bootstrap-options-dc-version"/>
Oct 25 13:55:38 node1 cib[26076]: notice: cib:diff: ++ <nvpair
id="cib-bootstrap-options-dc-version" name="dc-version"
value="1.1.10-14.el6_5.3-368c726"/>
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Oct 25 13:55:38 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (1060)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_pgsql (1000)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-master-baseline
(000000000700FB08)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: readable (1)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (PRI)
Oct 25 13:55:39 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Oct 25 13:55:39 node1 pacemakerd[31397]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:40 node1 ntpd[1526]: Listen normally on 104 br0
192.168.253.64 UDP 123
Oct 25 13:55:40 node1 ntpd[1526]: peers refreshed
Oct 25 13:55:41 node1 pacemakerd[31594]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (1060)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_pgsql (1000)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-master-baseline
(000000000700FB08)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: readable (1)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (PRI)
Oct 25 13:55:42 node1 attrd[26079]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Oct 25 13:55:43 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:43 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_mysql:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_pgsql:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_nginx:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_perl-fpm:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Move
p_web_ip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Move
p_radius_ip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: LogActions: Start
p_radiusd:1#011(node2.cluster)
Oct 25 13:55:43 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-547.bz2
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 17: monitor ClusterIP_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 18: monitor mysql_reader_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 19: monitor mysql_writer_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 20: monitor pgsql_reader_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 21: monitor pgsql_writer_vip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 22: monitor p_mysql:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 23: monitor p_pgsql:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 24: monitor p_nginx:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 25: monitor p_perl-fpm:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 26: monitor p_web_ip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 27: monitor p_radius_ip_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 28: monitor p_radiusd:1_monitor_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 140: notify p_mysql_pre_notify_start_0 on node1.cluster (local)
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 151: notify p_pgsql_pre_notify_start_0 on node1.cluster (local)
Oct 25 13:55:43 node1 pacemakerd[31858]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2477, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2479, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 16: probe_complete probe_complete on node2.cluster - no waiting
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 44: start p_mysql:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 74: start p_pgsql:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 108: start p_nginx:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 121: stop p_web_ip_stop_0 on node1.cluster (local)
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 124: stop p_radius_ip_stop_0 on node1.cluster (local)
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 129: start p_radiusd:1_start_0 on node2.cluster
Oct 25 13:55:43 node1 pacemakerd[32056]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node1 IPaddr2(p_radius_ip)[32032]: INFO: IP status = ok,
IP_CIP=
Oct 25 13:55:43 node1 IPaddr2(p_web_ip)[32030]: INFO: IP status = ok,
IP_CIP=
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_radius_ip_stop_0 (call=2487, rc=0, cib-update=699,
confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_web_ip_stop_0 (call=2484, rc=0, cib-update=700,
confirmed=true) ok
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 125: start p_radius_ip_start_0 on node2.cluster
Oct 25 13:55:43 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 122: start p_web_ip_start_0 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 110: monitor p_nginx:1_monitor_30000 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 109: monitor p_nginx:1_monitor_10000 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 111: start p_perl-fpm:1_start_0 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 126: monitor p_radius_ip_monitor_10000 on node2.cluster
Oct 25 13:55:44 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 123: monitor p_web_ip_monitor_10000 on node2.cluster
Oct 25 13:55:44 node1 lrmd[26078]: notice: operation_finished:
p_mysql_monitor_5000:31992:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:45 node1 ntpd[1526]: Deleting interface #100 lo,
10.255.0.32#123, interface stats: received=0, sent=0, dropped=0,
active_time=44998 secs
Oct 25 13:55:45 node1 ntpd[1526]: Deleting interface #18 lo,
10.255.0.33#123, interface stats: received=0, sent=0, dropped=0,
active_time=47428 secs
Oct 25 13:55:45 node1 ntpd[1526]: peers refreshed
Oct 25 13:55:45 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 152: notify p_pgsql_post_notify_start_0 on node1.cluster (local)
Oct 25 13:55:45 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 153: notify p_pgsql:1_post_notify_start_0 on node2.cluster
Oct 25 13:55:45 node1 pacemakerd[32302]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Changing
p_pgsql-data-status on node2.cluster : DISCONNECT->STREAMING|ASYNC.
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: Diff: --- 0.323.42
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: Diff: +++ 0.324.1
5fa5332e25fae6e209da715c2ba76e73
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: -- <nvpair
value="DISCONNECT" id="nodes-node2.cluster-p_pgsql-data-status"/>
Oct 25 13:55:45 node1 cib[26076]: notice: cib:diff: ++ <nvpair
id="nodes-node2.cluster-p_pgsql-data-status" name="p_pgsql-data-status"
value="STREAMING|ASYNC"/>
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Setup node2.cluster
into sync mode.
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: server signaled
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Reload configuration
file.
Oct 25 13:55:45 node1 pgsql(p_pgsql)[32291]: INFO: Changing
p_pgsql-status on node2.cluster : HS:alone->HS:async.
Oct 25 13:55:45 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2493, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:46 node1 pacemakerd[32448]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:46 node1 pgsql(p_pgsql)[32437]: INFO: Changing
p_pgsql-data-status on node2.cluster : STREAMING|ASYNC->STREAMING|SYNC.
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: Diff: --- 0.324.2
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: Diff: +++ 0.325.1
78a796d8af0b652875a2fff886b4b5bd
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: -- <nvpair
value="STREAMING|ASYNC" id="nodes-node2.cluster-p_pgsql-data-status"/>
Oct 25 13:55:46 node1 cib[26076]: notice: cib:diff: ++ <nvpair
id="nodes-node2.cluster-p_pgsql-data-status" name="p_pgsql-data-status"
value="STREAMING|SYNC"/>
Oct 25 13:55:46 node1 pgsql(p_pgsql)[32437]: INFO: Changing p_pgsql
master score on node2.cluster : -INFINITY->100.
Oct 25 13:55:46 node1 pgsql(p_pgsql)[32437]: INFO: Changing
p_pgsql-status on node2.cluster : HS:async->HS:sync.
Oct 25 13:55:48 node1 pacemakerd[32701]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:49 node1 lrmd[26078]: notice: operation_finished:
p_mysql_monitor_5000:444:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 141: notify p_mysql_post_notify_start_0 on node1.cluster (local)
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 142: notify p_mysql:1_post_notify_start_0 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2496, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:55:49 node1 crmd[26081]: notice: run_graph: Transition 0
(Complete=50, Pending=0, Fired=0, Skipped=5, Incomplete=1,
Source=/var/lib/pacemaker/pengine/pe-input-547.bz2): Stopped
Oct 25 13:55:49 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:49 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:49 node1 pengine[26080]: notice: LogActions: Move
mysql_reader_vip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:49 node1 pengine[26080]: notice: LogActions: Move
pgsql_reader_vip#011(Started node1.cluster -> node2.cluster)
Oct 25 13:55:49 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-548.bz2
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 22: stop mysql_reader_vip_stop_0 on node1.cluster (local)
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 27: stop pgsql_reader_vip_stop_0 on node1.cluster (local)
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 38: monitor p_mysql_monitor_2000 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 69: monitor p_pgsql_monitor_7000 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 106: monitor p_perl-fpm_monitor_10000 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 123: monitor p_radiusd_monitor_10000 on node2.cluster
Oct 25 13:55:49 node1 IPaddr2(mysql_reader_vip)[512]: INFO: IP status =
ok, IP_CIP=
Oct 25 13:55:49 node1 IPaddr2(pgsql_reader_vip)[513]: INFO: IP status =
ok, IP_CIP=
Oct 25 13:55:49 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_reader_vip_stop_0 (call=2500, rc=0, cib-update=702,
confirmed=true) ok
Oct 25 13:55:49 node1 crmd[26081]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_stop_0 (call=2503, rc=0, cib-update=703,
confirmed=true) ok
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 23: start mysql_reader_vip_start_0 on node2.cluster
Oct 25 13:55:49 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 28: start pgsql_reader_vip_start_0 on node2.cluster
Oct 25 13:55:50 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 24: monitor mysql_reader_vip_monitor_10000 on node2.cluster
Oct 25 13:55:50 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 29: monitor pgsql_reader_vip_monitor_10000 on node2.cluster
Oct 25 13:55:50 node1 crmd[26081]: notice: run_graph: Transition 1
(Complete=11, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-548.bz2): Complete
Oct 25 13:55:50 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:50 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:50 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-549.bz2
Oct 25 13:55:50 node1 crmd[26081]: notice: run_graph: Transition 2
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-549.bz2): Complete
Oct 25 13:55:50 node1 crmd[26081]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:50 node1 pacemakerd[631]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:51 node1 ntpd[1526]: Deleting interface #103 br0,
192.168.253.31#123, interface stats: received=0, sent=0, dropped=0,
active_time=89 secs
Oct 25 13:55:51 node1 ntpd[1526]: Deleting interface #101 br0,
192.168.253.63#123, interface stats: received=0, sent=0, dropped=0,
active_time=42685 secs
Oct 25 13:55:51 node1 ntpd[1526]: peers refreshed
Oct 25 13:55:52 node1 crmd[26081]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:55:52 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:52 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:52 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-550.bz2
Oct 25 13:55:52 node1 crmd[26081]: notice: run_graph: Transition 3
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-550.bz2): Complete
Oct 25 13:55:52 node1 crmd[26081]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:53 node1 pacemakerd[846]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:54 node1 crmd[26081]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:55:54 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:54 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:54 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-551.bz2
Oct 25 13:55:54 node1 crmd[26081]: notice: run_graph: Transition 4
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-551.bz2): Complete
Oct 25 13:55:54 node1 crmd[26081]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:55 node1 pacemakerd[1123]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:56 node1 crmd[26081]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:56 node1 pengine[26080]: notice: LogActions: Move
mysql_reader_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:55:56 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 5: /var/lib/pacemaker/pengine/pe-input-552.bz2
Oct 25 13:55:56 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 26: stop mysql_reader_vip_stop_0 on node2.cluster
Oct 25 13:55:56 node1 crmd[26081]: notice: run_graph: Transition 5
(Complete=1, Pending=0, Fired=0, Skipped=3, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-552.bz2): Stopped
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:56 node1 pengine[26080]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:56 node1 pengine[26080]: notice: LogActions: Start
mysql_reader_vip#011(node1.cluster)
Oct 25 13:55:56 node1 pengine[26080]: notice: process_pe_message:
Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-553.bz2
Oct 25 13:55:56 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 25: start mysql_reader_vip_start_0 on node1.cluster (local)
Oct 25 13:55:57 node1 IPaddr2(mysql_reader_vip)[1299]: INFO: Adding inet
address 192.168.253.63/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:55:57 node1 IPaddr2(mysql_reader_vip)[1299]: INFO: Bringing
device br0 up
Oct 25 13:55:57 node1 IPaddr2(mysql_reader_vip)[1299]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.63 br0 192.168.253.63 auto
not_used not_used
Oct 25 13:55:57 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_reader_vip_start_0 (call=2509, rc=0, cib-update=709,
confirmed=true) ok
Oct 25 13:55:57 node1 crmd[26081]: notice: te_rsc_command: Initiating
action 26: monitor mysql_reader_vip_monitor_10000 on node1.cluster (local)
Oct 25 13:55:57 node1 crmd[26081]: notice: process_lrm_event: LRM
operation mysql_reader_vip_monitor_10000 (call=2512, rc=0,
cib-update=710, confirmed=false) ok
Oct 25 13:55:57 node1 crmd[26081]: notice: run_graph: Transition 6
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-553.bz2): Complete
Oct 25 13:55:57 node1 crmd[26081]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:55:57 node1 pacemakerd[1528]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:58 node1 ntpd[1526]: Listen normally on 105 br0
192.168.253.63 UDP 123
Oct 25 13:55:58 node1 ntpd[1526]: peers refreshed
> You are probably misusing "split brain" here. Split brain means the
> nodes lost communication with each other, so each node is unaware of
> the state of the resources on the other node. Here "nodes" means
> corosync/pacemaker, not individual resources.
I mean a MySQL database split brain - the databases on the nodes are out
of sync.
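
If so, the recovery I have in mind is to rebuild the slave from the good
copy; a rough sketch, assuming node1 holds the good data (hostnames,
paths and options are illustrative):

# keep pacemaker from touching mysqld on the bad node
crm node standby node2.cluster

# on node1: take a consistent dump that records the binlog position
mysqldump --all-databases --single-transaction --master-data=2 \
    > /tmp/resync.sql
scp /tmp/resync.sql node2:/tmp/

# on node2: reload the data, point the slave back at node1
mysql < /tmp/resync.sql
# then CHANGE MASTER TO ... with the coordinates from the dump header,
# and START SLAVE;

# hand the node back to pacemaker
crm node online node2.cluster

# (the divergence itself can be confirmed beforehand with
# pt-table-checksum from Percona Toolkit)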