[Pacemaker] MySQL, Percona replication manager - split brain
Andrew
nitr0 at seti.kr.ua
Sat Oct 25 20:34:54 UTC 2014
On 25.10.2014 22:34, Digimer wrote:
> On 25/10/14 03:32 PM, Andrew wrote:
>> Hi all.
>>
>> I use the Percona replication manager as the RA on a cluster (nothing
>> mission-critical currently - just zabbix data); today, after restarting
>> the MySQL resource (crm resource restart p_mysql), I got a split-brain
>> state - for some reason MySQL started first on the ex-slave node, and the
>> ex-master started later (possibly I've set too small a shutdown timeout -
>> only 120s - but I'm not sure).
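>>
>> For reference, checking and raising that timeout in crmsh looks roughly
>> like this (a sketch only - the 300s value is illustrative, not my actual
>> config):
>>
>>   # show the resource definition, including its op timeouts
>>   crm configure show p_mysql
>>   # raise the stop timeout so a slow mysqld shutdown isn't killed mid-way
>>   crm configure edit p_mysql    # change: op stop interval=0 timeout=300s
>>   # then restart the master/slave set cleanly
>>   crm resource restart p_mysql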
>>
>> After restarting the resource on both nodes, MySQL replication seemed to
>> be fine - but then, after ~50 min, it fell into split brain again for an
>> unknown reason (no resource restart was noticed).
>>
>> In the 'SHOW SLAVE STATUS' output there is an error on a table, caused
>> by a duplicate entry in a unique index.
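>>
>> (The relevant fields can be pulled out on the slave like this - standard
>> MySQL commands, shown just as a sketch:)
>>
>>   mysql -e "SHOW SLAVE STATUS\G" | grep -E 'Running|Last_SQL_Err'
>>   # typically reports errno 1062: Duplicate entry '...' for key ...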
>>
>> So I have some questions:
>> 1) What caused the split brain, and how do I avoid it in the future?
>
> Cause:
>
> Logs?
Oct 25 13:54:13 node2 crmd[29248]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:13 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:13 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3459: /var/lib/pacemaker/pengine/pe-input-254.bz2
Oct 25 13:54:13 node2 crmd[29248]: notice: run_graph: Transition 3459
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-254.bz2): Complete
Oct 25 13:54:13 node2 crmd[29248]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:14 node2 pacemakerd[11626]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:16 node2 crmd[29248]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:16 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:16 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:16 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:16 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3460: /var/lib/pacemaker/pengine/pe-input-255.bz2
Oct 25 13:54:16 node2 crmd[29248]: notice: run_graph: Transition 3460
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-255.bz2): Complete
Oct 25 13:54:16 node2 crmd[29248]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:18 node2 crmd[29248]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Oct 25 13:54:18 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:18 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:18 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:18 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3461: /var/lib/pacemaker/pengine/pe-input-256.bz2
Oct 25 13:54:18 node2 crmd[29248]: notice: run_graph: Transition 3461
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-256.bz2): Complete
Oct 25 13:54:18 node2 crmd[29248]: notice: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
Oct 25 13:54:19 node2 pacemaker: Signaling Pacemaker Cluster Manager to
terminate
Oct 25 13:54:19 node2 pacemakerd[29237]: notice: pcmk_shutdown_worker:
Shuting down Pacemaker
Oct 25 13:54:19 node2 pacemakerd[29237]: notice: stop_child: Stopping
crmd: Sent -15 to process 29248
Oct 25 13:54:19 node2 crmd[29248]: notice: crm_shutdown: Requesting
shutdown, upper limit is 1200000ms
Oct 25 13:54:19 node2 crmd[29248]: notice: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN
origin=crm_shutdown ]
Oct 25 13:54:19 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: shutdown (1414234459)
Oct 25 13:54:19 node2 pacemaker: Waiting for cluster services to unload
Oct 25 13:54:19 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7041: shutdown=1414234459
Oct 25 13:54:19 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:19 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:19 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:19 node2 pengine[29247]: notice: stage6: Scheduling Node
node2.cluster for shutdown
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Move
ClusterIP#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Move
mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Move
pgsql_reader_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Promote
p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Demote
p_mysql:1#011(Master -> Stopped node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop
p_pgsql:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop
p_nginx:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop
p_perl-fpm:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: LogActions: Stop
p_radiusd:1#011(node2.cluster)
Oct 25 13:54:19 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3462: /var/lib/pacemaker/pengine/pe-input-257.bz2
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 24: stop ClusterIP_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 32: stop pgsql_reader_vip_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 2: cancel p_mysql_cancel_2000 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 143: notify p_mysql_pre_notify_demote_0 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 145: notify p_mysql_pre_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 149: notify p_pgsql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 151: notify p_pgsql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 105: stop p_perl-fpm_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 120: stop p_radiusd_stop_0 on node2.cluster (local)
Oct 25 13:54:19 node2 lrmd[29245]: notice: operation_finished:
p_perl-fpm_stop_0:12107:stderr [ grep: /var/run/perl-fpm/perl-fpm.pid:
No such file or directory ]
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_perl-fpm_stop_0 (call=2537, rc=0, cib-update=4282,
confirmed=true) ok
Oct 25 13:54:19 node2 pacemakerd[12173]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:19 node2 mysql(p_mysql)[12101]: INFO: post-demote
notification for node2.cluster
Oct 25 13:54:19 node2 IPaddr2(pgsql_reader_vip)[12100]: INFO: IP status
= ok, IP_CIP=
Oct 25 13:54:19 node2 IPaddr(ClusterIP)[12099]: INFO: IP status = ok,
IP_CIP=
Oct 25 13:54:19 node2 avahi-daemon[6932]: Withdrawing address record for
192.168.253.31 on br0.
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2534, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:19 node2 avahi-daemon[6932]: Withdrawing address record for
192.168.253.254 on br0.
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_stop_0 (call=2528, rc=0, cib-update=4283,
confirmed=true) ok
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM
operation ClusterIP_stop_0 (call=2525, rc=0, cib-update=4284,
confirmed=true) ok
Oct 25 13:54:19 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2532, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_radiusd_stop_0 (call=2540, rc=0, cib-update=4285,
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: run_graph: Transition 3462
(Complete=15, Pending=0, Fired=0, Skipped=29, Incomplete=18,
Source=/var/lib/pacemaker/pengine/pe-input-257.bz2): Stopped
Oct 25 13:54:20 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:20 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:20 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:20 node2 pengine[29247]: notice: stage6: Scheduling Node
node2.cluster for shutdown
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Start
ClusterIP#011(node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Move
mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Start
pgsql_reader_vip#011(node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Promote
p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Demote
p_mysql:1#011(Master -> Stopped node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Stop
p_pgsql:1#011(node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: LogActions: Stop
p_nginx:1#011(node2.cluster)
Oct 25 13:54:20 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3463: /var/lib/pacemaker/pengine/pe-input-258.bz2
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 19: start ClusterIP_start_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 26: start pgsql_reader_vip_start_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 134: notify p_mysql_pre_notify_demote_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 136: notify p_mysql_pre_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 140: notify p_pgsql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 142: notify p_pgsql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 97: stop p_nginx_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 pacemakerd[12402]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node2 mysql(p_mysql)[12371]: INFO: post-demote
notification for node2.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=2552, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 64: stop p_pgsql_stop_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2550, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 34: demote p_mysql_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 20: monitor ClusterIP_monitor_2000 on node1.cluster
Oct 25 13:54:20 node2 pacemakerd[12542]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 27: monitor pgsql_reader_vip_monitor_10000 on node1.cluster
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (1)
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7043: master-p_mysql=1
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_mysql_demote_0 (call=2567, rc=0, cib-update=4287,
confirmed=true) ok
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 135: notify p_mysql_post_notify_demote_0 on node1.cluster
Oct 25 13:54:20 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 137: notify p_mysql_post_notify_demote_0 on node2.cluster (local)
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_pgsql (-INFINITY)
Oct 25 13:54:20 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7045: master-p_pgsql=-INFINITY
Oct 25 13:54:20 node2 mysql(p_mysql)[12605]: INFO: Ignoring post-demote
notification for my own demotion.
Oct 25 13:54:20 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2571, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:20 node2 ntpd[7193]: Deleting interface #110 br0:0,
192.168.253.254#123, interface stats: received=0, sent=0, dropped=0,
active_time=47343 secs
Oct 25 13:54:20 node2 ntpd[7193]: Deleting interface #172 br0,
192.168.253.31#123, interface stats: received=0, sent=0, dropped=0,
active_time=45246 secs
Oct 25 13:54:20 node2 cib[29243]: notice: cib:diff: Diff: --- 0.320.2419
Oct 25 13:54:20 node2 cib[29243]: notice: cib:diff: Diff: +++ 0.321.1
17dd8c556d2c592e066c52d67b1a5b87
Oct 25 13:54:20 node2 cib[29243]: notice: cib:diff: -- <nvpair
value="STREAMING|SYNC" id="nodes-node2.cluster-p_pgsql-data-status"/>
Oct 25 13:54:20 node2 cib[29243]: notice: cib:diff: ++ <nvpair
id="nodes-node2.cluster-p_pgsql-data-status" name="p_pgsql-data-status"
value="DISCONNECT"/>
Oct 25 13:54:21 node2 nginx(p_nginx)[12373]: INFO: Killing nginx PID 29997
Oct 25 13:54:21 node2 nginx(p_nginx)[12373]: INFO: nginx stopped.
Oct 25 13:54:21 node2 lrmd[29245]: notice: operation_finished:
p_nginx_stop_0:12373:stderr [ /usr/lib/ocf/resource.d/heartbeat/nginx:
line 467: kill: (29997) - No such process ]
Oct 25 13:54:21 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_nginx_stop_0 (call=2556, rc=0, cib-update=4288,
confirmed=true) ok
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: waiting for server to
shut down.... done server stopped
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: PostgreSQL is down
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: Changing
p_pgsql-status on node2.cluster : HS:sync->STOP.
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (STOP)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7047: p_pgsql-status=STOP
Oct 25 13:54:21 node2 pgsql(p_pgsql)[12524]: INFO: Set all nodes into
async mode.
Oct 25 13:54:21 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_pgsql_stop_0 (call=2562, rc=0, cib-update=4289,
confirmed=true) ok
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 141: notify p_pgsql_post_notify_stop_0 on node1.cluster
Oct 25 13:54:21 node2 crmd[29248]: notice: run_graph: Transition 3463
(Complete=28, Pending=0, Fired=0, Skipped=18, Incomplete=9,
Source=/var/lib/pacemaker/pengine/pe-input-258.bz2): Stopped
Oct 25 13:54:21 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:54:21 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:54:21 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:54:21 node2 pengine[29247]: notice: stage6: Scheduling Node
node2.cluster for shutdown
Oct 25 13:54:21 node2 pengine[29247]: notice: LogActions: Stop
mysql_writer_vip#011(Started node1.cluster)
Oct 25 13:54:21 node2 pengine[29247]: notice: LogActions: Promote
p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:54:21 node2 pengine[29247]: notice: LogActions: Stop
p_mysql:1#011(node2.cluster)
Oct 25 13:54:21 node2 pengine[29247]: crit: stage8: Cannot shut down
node 'node2.cluster' because of mysql_writer_vip: blocked
Oct 25 13:54:21 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3464: /var/lib/pacemaker/pengine/pe-input-259.bz2
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 125: notify p_mysql_pre_notify_stop_0 on node1.cluster
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 127: notify p_mysql_pre_notify_stop_0 on node2.cluster (local)
Oct 25 13:54:21 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=2576, rc=0, cib-update=0,
confirmed=true) ok
Oct 25 13:54:21 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 32: stop p_mysql_stop_0 on node2.cluster (local)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (<null>)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_perform_update: Sent
delete 7049: node=node2.cluster, attr=master-p_mysql, id=<n/a>,
set=(null), section=status
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: readable (0)
Oct 25 13:54:21 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7052: readable=0
Oct 25 13:54:25 node2 nginx(p_nginx)[12778]: INFO: nginx not running
Oct 25 13:54:25 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_nginx_monitor_10000 (call=2442, rc=7, cib-update=4291,
confirmed=false) not running
Oct 25 13:54:25 node2 crmd[29248]: warning: update_failcount: Updating
failcount for p_nginx on node2.cluster after failed monitor: rc=7
(update=value++, time=1414234465)
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-p_nginx (1)
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7055: fail-count-p_nginx=1
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-p_nginx (1414234465)
Oct 25 13:54:25 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7057: last-failure-p_nginx=1414234465
Oct 25 13:54:26 node2 nginx(p_nginx)[12844]: INFO: nginx not running
Oct 25 13:54:26 node2 nginx(p_nginx)[12845]: INFO: nginx not running
Oct 25 13:54:26 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_nginx_monitor_10000 (call=333, rc=7, cib-update=4292,
confirmed=false) not running
Oct 25 13:54:26 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_nginx_monitor_10000 (call=2469, rc=7, cib-update=4293,
confirmed=false) not running
Oct 25 13:54:26 node2 crmd[29248]: warning: update_failcount: Updating
failcount for p_nginx on node2.cluster after failed monitor: rc=7
(update=value++, time=1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-p_nginx (2)
Oct 25 13:54:26 node2 crmd[29248]: warning: update_failcount: Updating
failcount for p_nginx on node2.cluster after failed monitor: rc=7
(update=value++, time=1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7059: fail-count-p_nginx=2
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-p_nginx (1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7061: last-failure-p_nginx=1414234466
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-p_nginx (3)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7063: fail-count-p_nginx=3
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-p_nginx (1414234466)
Oct 25 13:54:26 node2 attrd[29246]: notice: attrd_perform_update: Sent
update 7065: last-failure-p_nginx=1414234466
Oct 25 13:54:35 node2 nginx(p_nginx)[13037]: INFO: nginx not running
Oct 25 13:54:36 node2 nginx(p_nginx)[13099]: INFO: nginx not running
Oct 25 13:54:36 node2 nginx(p_nginx)[13098]: INFO: nginx not running
Oct 25 13:54:45 node2 nginx(p_nginx)[13291]: INFO: nginx not running
Oct 25 13:54:47 node2 nginx(p_nginx)[13353]: INFO: nginx not running
Oct 25 13:54:47 node2 nginx(p_nginx)[13352]: INFO: nginx not running
Oct 25 13:54:55 node2 nginx(p_nginx)[13549]: INFO: nginx not running
Oct 25 13:54:57 node2 nginx(p_nginx)[13628]: INFO: nginx not running
Oct 25 13:54:57 node2 nginx(p_nginx)[13627]: INFO: nginx not running
Oct 25 13:55:05 node2 nginx(p_nginx)[13870]: INFO: nginx not running
Oct 25 13:55:07 node2 nginx(p_nginx)[13931]: INFO: nginx not running
Oct 25 13:55:07 node2 nginx(p_nginx)[13932]: INFO: nginx not running
Oct 25 13:55:15 node2 nginx(p_nginx)[14151]: INFO: nginx not running
Oct 25 13:55:17 node2 nginx(p_nginx)[14214]: INFO: nginx not running
Oct 25 13:55:17 node2 nginx(p_nginx)[14213]: INFO: nginx not running
Oct 25 13:55:25 node2 nginx(p_nginx)[14405]: INFO: nginx not running
Oct 25 13:55:27 node2 nginx(p_nginx)[14469]: INFO: nginx not running
Oct 25 13:55:27 node2 nginx(p_nginx)[14468]: INFO: nginx not running
Oct 25 13:55:35 node2 nginx(p_nginx)[14659]: INFO: nginx not running
Oct 25 13:55:37 node2 nginx(p_nginx)[14723]: INFO: nginx not running
Oct 25 13:55:37 node2 nginx(p_nginx)[14724]: INFO: nginx not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL is not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL is not running
Oct 25 13:55:38 node2 mysql(p_mysql)[12733]: INFO: MySQL stopped
Oct 25 13:55:38 node2 crmd[29248]: notice: process_lrm_event: LRM
operation p_mysql_stop_0 (call=2579, rc=0, cib-update=4294,
confirmed=true) ok
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 126: notify p_mysql_post_notify_stop_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: run_graph: Transition 3464
(Complete=10, Pending=0, Fired=0, Skipped=6, Incomplete=4,
Source=/var/lib/pacemaker/pengine/pe-input-259.bz2): Stopped
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:55:38 node2 pengine[29247]: warning: unpack_rsc_op:
Processing failed op monitor for p_nginx:1 on node2.cluster: not running (7)
Oct 25 13:55:38 node2 pengine[29247]: notice: stage6: Scheduling Node
node2.cluster for shutdown
Oct 25 13:55:38 node2 pengine[29247]: notice: LogActions: Stop
mysql_writer_vip#011(Started node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]: notice: LogActions: Promote
p_mysql:0#011(Slave -> Master node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]: crit: stage8: Cannot shut down
node 'node2.cluster' because of mysql_writer_vip: blocked
Oct 25 13:55:38 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3465: /var/lib/pacemaker/pengine/pe-input-260.bz2
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 126: notify p_mysql_pre_notify_promote_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 30: promote p_mysql_promote_0 on node1.cluster
Oct 25 13:55:38 node2 cib[29243]: notice: cib:diff: Diff: --- 0.321.15
Oct 25 13:55:38 node2 cib[29243]: notice: cib:diff: Diff: +++ 0.322.1
92b30c7922072633f9dbc336e4f1a066
Oct 25 13:55:38 node2 cib[29243]: notice: cib:diff: -- <nvpair
value="192.168.253.4|mysqld-bin.000409|106"
id="mysql_replication-p_mysql_REPL_INFO"/>
Oct 25 13:55:38 node2 cib[29243]: notice: cib:diff: ++ <nvpair
id="mysql_replication-p_mysql_REPL_INFO" name="p_mysql_REPL_INFO"
value="192.168.253.5|mysqld-bin.000471|106"/>
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 127: notify p_mysql_post_notify_promote_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: run_graph: Transition 3465
(Complete=9, Pending=0, Fired=0, Skipped=1, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-260.bz2): Stopped
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_config: On loss
of CCM Quorum: Ignore
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_pgsql:0 active in master mode on node1.cluster
Oct 25 13:55:38 node2 pengine[29247]: notice: unpack_rsc_op: Operation
monitor found resource p_mysql:1 active in master mode on node2.cluster
Oct 25 13:55:38 node2 pengine[29247]: warning: unpack_rsc_op:
Processing failed op monitor for p_nginx:1 on node2.cluster: not running (7)
Oct 25 13:55:38 node2 pengine[29247]: notice: stage6: Scheduling Node
node2.cluster for shutdown
Oct 25 13:55:38 node2 pengine[29247]: notice: LogActions: Move
mysql_writer_vip#011(Started node2.cluster -> node1.cluster)
Oct 25 13:55:38 node2 pengine[29247]: notice: process_pe_message:
Calculated Transition 3466: /var/lib/pacemaker/pengine/pe-input-261.bz2
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 21: stop mysql_writer_vip_stop_0 on node2.cluster (local)
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 32: monitor p_mysql_monitor_5000 on node1.cluster
Oct 25 13:55:38 node2 IPaddr2(mysql_writer_vip)[14851]: INFO: IP status
= ok, IP_CIP=
Oct 25 13:55:38 node2 avahi-daemon[6932]: Withdrawing address record for
192.168.253.64 on br0.
Oct 25 13:55:38 node2 crmd[29248]: notice: process_lrm_event: LRM
operation mysql_writer_vip_stop_0 (call=2586, rc=0, cib-update=4300,
confirmed=true) ok
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 22: start mysql_writer_vip_start_0 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: te_rsc_command: Initiating
action 23: monitor mysql_writer_vip_monitor_10000 on node1.cluster
Oct 25 13:55:38 node2 crmd[29248]: notice: run_graph: Transition 3466
(Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-261.bz2): Complete
Oct 25 13:55:38 node2 crmd[29248]: notice: lrm_state_verify_stopped:
Stopped 1 recurring operations at shutdown... waiting (0 ops remaining)
Oct 25 13:55:38 node2 lrmd[29245]: warning: qb_ipcs_event_sendv:
new_event_notification (29245-29248-6): Bad file descriptor (9)
Oct 25 13:55:38 node2 lrmd[29245]: warning: send_client_notify:
Notification of client crmd/d068e1e6-7c6e-48b0-bb86-49d35fe11673 failed
Oct 25 13:55:38 node2 crmd[29248]: notice: do_lrm_control:
Disconnected from the LRM
Oct 25 13:55:38 node2 crmd[29248]: notice: terminate_cs_connection:
Disconnecting from Corosync
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit:
Client crmd (conn=0x1afff00, async-conn=0x1afff00) left
Oct 25 13:55:38 node2 cib[29243]: warning: qb_ipcs_event_sendv:
new_event_notification (29243-29248-13): Broken pipe (32)
Oct 25 13:55:38 node2 cib[29243]: warning: do_local_notify: A-Sync
reply to crmd failed: No message of desired type
Oct 25 13:55:38 node2 pacemakerd[29237]: warning: qb_ipcs_event_sendv:
new_event_notification (29237-29248-14): Broken pipe (32)
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping
pengine: Sent -15 to process 29247
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.crmd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping
attrd: Sent -15 to process 29246
Oct 25 13:55:38 node2 attrd[29246]: notice: main: Exiting...
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit:
Client attrd (conn=0x1af3480, async-conn=0x1af3480) left
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping
lrmd: Sent -15 to process 29245
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping
stonith-ng: Sent -15 to process 29244
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit:
Client stonith-ng (conn=0x1af7800, async-conn=0x1af7800) left
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: stop_child: Stopping
cib: Sent -15 to process 29243
Oct 25 13:55:38 node2 cib[29243]: notice: terminate_cs_connection:
Disconnecting from Corosync
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] info: pcmk_ipc_exit:
Client cib (conn=0x1afbb80, async-conn=0x1afbb80) left
Oct 25 13:55:38 node2 pacemakerd[29237]: notice: pcmk_shutdown_worker:
Shutdown complete
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.cib failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.cib failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.cib failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:38 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.cib failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.cib failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.cib failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.attrd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.attrd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.attrd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.attrd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.attrd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.attrd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:39 node2 pacemaker: Starting Pacemaker Cluster Manager
Oct 25 13:55:39 node2 pacemakerd[14916]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 pacemakerd[14916]: notice: main: Starting
Pacemaker 1.1.10-14.el6_5.3 (Build: 368c726): generated-manpages
agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc
nagios corosync-plugin cman
Oct 25 13:55:39 node2 pacemakerd[14916]: notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.stonith-ng failed: ipc
delivery failed (rc=-2)
Oct 25 13:55:39 node2 cib[14922]: notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 attrd[14925]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 cib[14922]: notice: main: Using legacy config
location: /var/lib/heartbeat/crm
Oct 25 13:55:39 node2 attrd[14925]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 pengine[14926]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 lrmd[14924]: notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 crmd[14927]: notice: crm_add_logfile: Additional
logging available in /var/log/cluster/corosync.log
Oct 25 13:55:39 node2 crmd[14927]: notice: main: CRM Git Version: 368c726
Oct 25 13:55:39 node2 attrd[14925]: notice: get_node_name: Defaulting
to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc:
Recorded connection 0x1af31c0 for attrd/0
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc:
Recorded connection 0x1af7540 for stonith-ng/0
Oct 25 13:55:39 node2 attrd[14925]: notice: get_node_name: Defaulting
to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 cib[14922]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:39 node2 stonith-ng[14923]: notice: get_node_name:
Defaulting to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 attrd[14925]: notice: main: Starting mainloop...
Oct 25 13:55:39 node2 cib[14922]: notice: get_node_name: Defaulting to
uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc:
Recorded connection 0x1afb8c0 for cib/0
Oct 25 13:55:39 node2 corosync[29219]: [pcmk ] info: pcmk_ipc:
Sending membership update 1608 to cib
Oct 25 13:55:39 node2 cib[14922]: notice: get_node_name: Defaulting to
uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:39 node2 cib[14922]: notice: plugin_handle_membership:
Membership 1608: quorum acquired
Oct 25 13:55:39 node2 cib[14922]: notice: crm_update_peer_state:
plugin_handle_membership: Node node2.cluster[83732672] - state is now
member (was (null))
Oct 25 13:55:39 node2 cib[14922]: notice: crm_update_peer_state:
plugin_handle_membership: Node node1.cluster[100509888] - state is now
member (was (null))
Oct 25 13:55:40 node2 corosync[29219]: [pcmk ] WARN:
route_ais_message: Sending message to local.crmd failed: ipc delivery
failed (rc=-2)
Oct 25 13:55:40 node2 cib[14922]: notice: crm_ipc_prepare: Message
exceeds the configured ipc limit (51200 bytes), consider configuring
PCMK_ipc_buffer to 161796 or higher to avoid compression overheads
Oct 25 13:55:40 node2 crmd[14927]: notice: crm_cluster_connect:
Connecting to cluster infrastructure: classic openais (with plugin)
Oct 25 13:55:40 node2 crmd[14927]: notice: get_node_name: Defaulting
to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:40 node2 corosync[29219]: [pcmk ] info: pcmk_ipc:
Recorded connection 0x1affc40 for crmd/0
Oct 25 13:55:40 node2 corosync[29219]: [pcmk ] info: pcmk_ipc:
Sending membership update 1608 to crmd
Oct 25 13:55:40 node2 stonith-ng[14923]: notice: setup_cib: Watching
for stonith topology changes
Oct 25 13:55:40 node2 crmd[14927]: notice: get_node_name: Defaulting
to uname -n for the local classic openais (with plugin) node name
Oct 25 13:55:40 node2 stonith-ng[14923]: notice: unpack_config: On
loss of CCM Quorum: Ignore
Oct 25 13:55:40 node2 crmd[14927]: notice: plugin_handle_membership:
Membership 1608: quorum acquired
Oct 25 13:55:40 node2 crmd[14927]: notice: crm_update_peer_state:
plugin_handle_membership: Node node2.cluster[83732672] - state is now
member (was (null))
Oct 25 13:55:40 node2 crmd[14927]: notice: crm_update_peer_state:
plugin_handle_membership: Node node1.cluster[100509888] - state is now
member (was (null))
Oct 25 13:55:40 node2 crmd[14927]: notice: do_started: The local CRM
is operational
Oct 25 13:55:40 node2 crmd[14927]: notice: do_state_transition: State
transition S_STARTING -> S_PENDING [ input=I_PENDING
cause=C_FSA_INTERNAL origin=do_started ]
Oct 25 13:55:40 node2 ntpd[7193]: Deleting interface #76 br0,
192.168.253.64#123, interface stats: received=0, sent=0, dropped=0,
active_time=3555898 secs
Oct 25 13:55:42 node2 crmd[14927]: notice: do_state_transition: State
transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE
origin=do_cl_join_finalize_respond ]
Oct 25 13:55:42 node2 attrd[14925]: notice: attrd_local_callback:
Sending full refresh (origin=crmd)
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_perl-fpm_monitor_0 (call=41, rc=7, cib-update=8,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_radiusd_monitor_0 (call=54, rc=7, cib-update=9,
confirmed=true) not running
Oct 25 13:55:43 node2 pacemakerd[15105]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 mysql(p_mysql)[14949]: INFO: MySQL is not running
Oct 25 13:55:43 node2 crmd[14927]: error: generic_get_metadata:
Failed to retrieve meta-data for ocf:percona:mysql
Oct 25 13:55:43 node2 crmd[14927]: warning: get_rsc_metadata: No
metadata found for mysql::ocf:percona: Input/output error (-5)
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_mysql_monitor_0 (call=26, rc=7, cib-update=10,
confirmed=true) not running
Oct 25 13:55:43 node2 pgsql(p_pgsql)[14953]: INFO: Don't check
/var/lib/pgsql/9.1/data/ during probe
Oct 25 13:55:43 node2 pgsql(p_pgsql)[14953]: INFO: PostgreSQL is down
Oct 25 13:55:43 node2 nginx(p_nginx)[14956]: INFO: nginx not running
Oct 25 13:55:43 node2 pacemakerd[15368]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_pgsql_monitor_0 (call=31, rc=7, cib-update=11,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_nginx_monitor_0 (call=36, rc=7, cib-update=12,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_web_ip_monitor_0 (call=45, rc=7, cib-update=13,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation pgsql_writer_vip_monitor_0 (call=21, rc=7, cib-update=14,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation ClusterIP_monitor_0 (call=5, rc=7, cib-update=15,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_radius_ip_monitor_0 (call=49, rc=7, cib-update=16,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation mysql_writer_vip_monitor_0 (call=13, rc=7, cib-update=17,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_monitor_0 (call=17, rc=7, cib-update=18,
confirmed=true) not running
Oct 25 13:55:43 node2 crmd[14927]: notice: process_lrm_event: LRM
operation mysql_reader_vip_monitor_0 (call=9, rc=7, cib-update=19,
confirmed=true) not running
Oct 25 13:55:43 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: probe_complete (true)
Oct 25 13:55:43 node2 pacemakerd[15481]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:43 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: readable (0)
Oct 25 13:55:43 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (0)
Oct 25 13:55:43 node2 mysql(p_mysql)[15438]: INFO: MySQL is not running
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: nginx: the
configuration file /etc/nginx/nginx.conf syntax is ok nginx:
configuration file /etc/nginx/nginx.conf test is successful
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: Starting
/usr/sbin/nginx - nginx version: nginx/1.0.15
Oct 25 13:55:43 node2 nginx(p_nginx)[15440]: INFO: /usr/sbin/nginx build
configuration: configure arguments: --prefix=/usr/share/nginx
--sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf
--error-log-path=/var/log/nginx/error.log
--http-log-path=/var/log/nginx/access.log
--http-client-body-temp-path=/var/lib/nginx/tmp/client_body
--http-proxy-temp-path=/var/lib/nginx/tmp/proxy
--http-fastcgi-temp-path=/var/lib/nginx/tmp/fastcgi
--http-uwsgi-temp-path=/var/lib/nginx/tmp/uwsgi
--http-scgi-temp-path=/var/lib/nginx/tmp/scgi
--pid-path=/var/run/nginx.pid --lock-path=/var/lock/subsys/nginx
--user=nginx --group=nginx --with-file-aio --with-ipv6
--with-http_ssl_module --with-http_realip_module
--with-http_addition_module --with-http_xslt_module
--with-http_image_filter_module --with-http_geoip_module
--with-http_sub_module --with-http_dav_module --with-http_flv_module
--with-http_mp4_module --with-http_gzip_static_module
--with-http_random_index_module --with-http_secure_link_module
--with-http_degradation_module --with-http_stub_status_module
--with-http_perl_module --with-mail --with-mail_ssl_module
--with-cc-opt='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic'
--with-ld-opt=-Wl,-E
Oct 25 13:55:43 node2 pgsql(p_pgsql)[15439]: INFO: Changing
p_pgsql-status on node2.cluster : ->STOP.
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_nginx_start_0 (call=72, rc=0, cib-update=20, confirmed=true) ok
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (STOP)
Oct 25 13:55:44 node2 IPaddr2(p_radius_ip)[15582]: INFO: Adding inet
address 10.255.0.33/32 to device lo
Oct 25 13:55:44 node2 IPaddr2(p_web_ip)[15589]: INFO: Adding inet
address 10.255.0.32/32 to device lo
Oct 25 13:55:44 node2 IPaddr2(p_radius_ip)[15582]: INFO: Bringing device
lo up
Oct 25 13:55:44 node2 IPaddr2(p_web_ip)[15589]: INFO: Bringing device lo up
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_radius_ip_start_0 (call=76, rc=0, cib-update=21,
confirmed=true) ok
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_web_ip_start_0 (call=78, rc=0, cib-update=22, confirmed=true) ok
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_pgsql (-INFINITY)
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_nginx_monitor_30000 (call=81, rc=0, cib-update=23,
confirmed=false) ok
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: Set all nodes into
async mode.
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_radius_ip_monitor_10000 (call=89, rc=0, cib-update=24,
confirmed=false) ok
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_web_ip_monitor_10000 (call=91, rc=0, cib-update=25,
confirmed=false) ok
Oct 25 13:55:44 node2 mysql(p_mysql)[15438]: INFO: MySQL is not running
Oct 25 13:55:44 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_nginx_monitor_10000 (call=83, rc=0, cib-update=26,
confirmed=false) ok
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: server starting
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: INFO: PostgreSQL start
command sent.
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: WARNING: Can't get
PostgreSQL recovery status. rc=2
Oct 25 13:55:44 node2 pgsql(p_pgsql)[15439]: WARNING: Connection error
(connection to the server went bad and the session was not interactive)
occurred while executing the psql command.
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 4: master-p_mysql=0
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 7: master-p_pgsql=-INFINITY
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 11: readable=0
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 17: p_pgsql-status=STOP
Oct 25 13:55:44 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 20: probe_complete=true
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_perl-fpm_start_0 (call=85, rc=0, cib-update=27,
confirmed=true) ok
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_radiusd_start_0 (call=74, rc=0, cib-update=28,
confirmed=true) ok
Oct 25 13:55:45 node2 pgsql(p_pgsql)[15439]: INFO: PostgreSQL is started.
Oct 25 13:55:45 node2 pgsql(p_pgsql)[15439]: INFO: Changing
p_pgsql-status on node2.cluster : STOP->HS:alone.
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (HS:alone)
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 22: p_pgsql-status=HS:alone
Oct 25 13:55:45 node2 lrmd[14924]: notice: operation_finished:
p_pgsql_start_0:15439:stderr [ psql: could not connect to server: No
such file or directory ]
Oct 25 13:55:45 node2 lrmd[14924]: notice: operation_finished:
p_pgsql_start_0:15439:stderr [ #011Is the server running locally and
accepting ]
Oct 25 13:55:45 node2 lrmd[14924]: notice: operation_finished:
p_pgsql_start_0:15439:stderr [ #011connections on Unix domain socket
"/tmp/.s.PGSQL.5432"? ]
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_pgsql_start_0 (call=70, rc=0, cib-update=29, confirmed=true) ok
Oct 25 13:55:45 node2 pacemakerd[16158]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:45 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_pgsql_notify_0 (call=100, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:45 node2 ntpd[7193]: Listening on interface #174 lo,
10.255.0.33#123 Enabled
Oct 25 13:55:45 node2 ntpd[7193]: Listening on interface #175 lo,
10.255.0.32#123 Enabled
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_cs_dispatch: Update
relayed from node1.cluster
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (HS:async)
Oct 25 13:55:45 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 24: p_pgsql-status=HS:async
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_cs_dispatch: Update
relayed from node1.cluster
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_pgsql (100)
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 26: master-p_pgsql=100
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_cs_dispatch: Update
relayed from node1.cluster
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: p_pgsql-status (HS:sync)
Oct 25 13:55:46 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 28: p_pgsql-status=HS:sync
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: INFO: Changing MySQL
configuration to replicate from node1.cluster.
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: ERROR: check_slave invoked
on an instance that is not a replication slave.
Oct 25 13:55:48 node2 mysql(p_mysql)[15438]: INFO: Restored master pos
for 192.168.253.5 : mysqld-bin.000471:106
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (1)
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 30: master-p_mysql=1
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (60)
Oct 25 13:55:48 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 32: master-p_mysql=60
Oct 25 13:55:49 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: readable (1)
Oct 25 13:55:49 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 34: readable=1
Oct 25 13:55:49 node2 mysql(p_mysql)[15438]: INFO: MySQL started
Oct 25 13:55:49 node2 lrmd[14924]: notice: operation_finished:
p_mysql_start_0:15438:stderr [ Error performing operation: No such
device or address ]
Oct 25 13:55:49 node2 lrmd[14924]: notice: operation_finished:
p_mysql_start_0:15438:stderr [ resource ms_MySQL is NOT running ]
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_mysql_start_0 (call=68, rc=0, cib-update=30, confirmed=true) ok
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_mysql_notify_0 (call=104, rc=0, cib-update=0, confirmed=true) ok
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_perl-fpm_monitor_10000 (call=111, rc=0, cib-update=31,
confirmed=false) ok
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_radiusd_monitor_10000 (call=113, rc=0, cib-update=32,
confirmed=false) ok
Oct 25 13:55:49 node2 pacemakerd[16380]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:49 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_pgsql_monitor_7000 (call=109, rc=0, cib-update=33,
confirmed=false) ok
Oct 25 13:55:49 node2 IPaddr2(mysql_reader_vip)[16445]: INFO: Adding
inet address 192.168.253.63/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:55:49 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO: Adding
inet address 192.168.253.31/24 with broadcast address 192.168.253.255 to
device br0
Oct 25 13:55:49 node2 avahi-daemon[6932]: Registering new address record
for 192.168.253.63 on br0.IPv4.
Oct 25 13:55:49 node2 avahi-daemon[6932]: Registering new address record
for 192.168.253.31 on br0.IPv4.
Oct 25 13:55:50 node2 IPaddr2(mysql_reader_vip)[16445]: INFO: Bringing
device br0 up
Oct 25 13:55:50 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO: Bringing
device br0 up
Oct 25 13:55:50 node2 IPaddr2(mysql_reader_vip)[16445]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.63 br0 192.168.253.63 auto
not_used not_used
Oct 25 13:55:50 node2 IPaddr2(pgsql_reader_vip)[16449]: INFO:
/usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
/var/run/resource-agents/send_arp-192.168.253.31 br0 192.168.253.31 auto
not_used not_used
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM
operation mysql_reader_vip_start_0 (call=117, rc=0, cib-update=34,
confirmed=true) ok
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_start_0 (call=119, rc=0, cib-update=35,
confirmed=true) ok
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM
operation mysql_reader_vip_monitor_10000 (call=124, rc=0, cib-update=36,
confirmed=false) ok
Oct 25 13:55:50 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (52)
Oct 25 13:55:50 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 36: master-p_mysql=52
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM
operation pgsql_reader_vip_monitor_10000 (call=126, rc=0, cib-update=37,
confirmed=false) ok
Oct 25 13:55:50 node2 crmd[14927]: notice: process_lrm_event: LRM
operation p_mysql_monitor_2000 (call=107, rc=0, cib-update=38,
confirmed=false) ok
Oct 25 13:55:51 node2 ntpd[7193]: Listening on interface #176 br0,
192.168.253.63#123 Enabled
Oct 25 13:55:51 node2 ntpd[7193]: Listening on interface #177 br0,
192.168.253.31#123 Enabled
Oct 25 13:55:52 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (50)
Oct 25 13:55:52 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 38: master-p_mysql=50
Oct 25 13:55:54 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (51)
Oct 25 13:55:54 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 40: master-p_mysql=51
Oct 25 13:55:56 node2 mysql(p_mysql)[17074]: ERROR: MySQL instance
configured for replication, but replication has failed.
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: readable (0)
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 42: readable=0
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_trigger_update:
Sending flush op to all hosts for: master-p_mysql (0)
Oct 25 13:55:56 node2 attrd[14925]: notice: attrd_perform_update: Sent
update 44: master-p_mysql=0
Oct 25 13:55:56 node2 IPaddr2(mysql_reader_vip)[17141]: INFO: IP status
= ok, IP_CIP=
Oct 25 13:55:56 node2 avahi-daemon[6932]: Withdrawing address record for
192.168.253.63 on br0.
Oct 25 13:55:56 node2 crmd[14927]: notice: process_lrm_event: LRM
operation mysql_reader_vip_stop_0 (call=132, rc=0, cib-update=39,
confirmed=true) ok
Oct 25 13:55:57 node2 pacemakerd[17206]: notice: crm_add_logfile:
Additional logging available in /var/log/cluster/corosync.log
Oct 25 13:55:58 node2 ntpd[7193]: Deleting interface #176 br0,
192.168.253.63#123, interface stats: received=0, sent=0, dropped=0,
active_time=7 secs
Oct 25 13:55:58 node2 mysql(p_mysql)[17267]: ERROR: MySQL instance
configured for replication, but replication has failed.
>
> Prevent:
>
> Fencing (aka stonith). This is why fencing is required.
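>
> Enabling it is roughly (a sketch only - the fence agent and its
> parameters depend entirely on your hardware; fence_ipmilan is just an
> example):
>
>   crm configure primitive p_fence_node1 stonith:fence_ipmilan \
>     params pcmk_host_list="node1.cluster" ipaddr="..." login="..." passwd="..."
>   crm configure primitive p_fence_node2 stonith:fence_ipmilan \
>     params pcmk_host_list="node2.cluster" ipaddr="..." login="..." passwd="..."
>   crm configure property stonith-enabled=true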
There was no node failure - just the daemon was restarted.
>
>> 2) How do I resolve the split-brain state? Is it enough to just wait for
>> the failure, then restart mysql by hand, clean the row with the duplicate
>> index out of the slave DB, and then run the resource again? Or is there
>> some automation for such cases?
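>>
>> (The by-hand fix I have in mind is roughly the following, run on the
>> slave - a sketch only; skipping the event discards it, so it is only
>> safe when the conflicting row is already present:)
>>
>>   # skip the duplicate-key event and resume replication
>>   mysql -e "STOP SLAVE; SET GLOBAL sql_slave_skip_counter = 1; START SLAVE;"
>>   mysql -e "SHOW SLAVE STATUS\G" | grep Running
>>   # clear the failure so Pacemaker manages the resource again
>>   crm resource cleanup p_mysql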
>
> How are you sharing data? Can you give us a better understanding of
> your setup?
>
Semi-synchronous MySQL replication, if by "sharing data" you mean shipping
the DB log between nodes.
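
It's the stock MySQL semisync setup, roughly like this (a sketch from
memory, using the standard MySQL 5.5 plugin names):

  # on the master:
  mysql -e "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so'"
  mysql -e "SET GLOBAL rpl_semi_sync_master_enabled = 1"
  # on the slave:
  mysql -e "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so'"
  mysql -e "SET GLOBAL rpl_semi_sync_slave_enabled = 1"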