[Pacemaker] Resource ordering/colocating question (DRBD + LVM + FS)
Heikki Manninen
hma at iki.fi
Fri Sep 6 11:58:53 UTC 2013
On 6.9.2013, at 10.24, Heikki Manninen <heikki.manninen at gmail.com> wrote:
>> 2) Enable logging and look out which node is the DC.
>> There in the logs you find many many informations showing
>> what is going on. Hint: Open a terminal session with an
>> opened tail -f logfile. Watch it while inserting commands.
>> You'll get used to it.
>
> Seems that node #2 was the DC (also visible in the pcs status output). I have looked at the logs all the time, just not yet too familiar with the contents of pacemaker logging. Here's the thing that keeps repeating everytime those LVM and FS resources stay in stopped state:
>
> Sep 3 20:01:23 pgdbsrv02 pengine[1667]: notice: LogActions: Start LVM_vgdata01#011(pgdbsrv01.cl1.local - blocked)
> Sep 3 20:01:23 pgdbsrv02 pengine[1667]: notice: LogActions: Start FS_data01#011(pgdbsrv01.cl1.local - blocked)
> Sep 3 20:01:23 pgdbsrv02 pengine[1667]: notice: LogActions: Start LVM_vgdata02#011(pgdbsrv01.cl1.local - blocked)
> Sep 3 20:01:23 pgdbsrv02 pengine[1667]: notice: LogActions: Start FS_data02#011(pgdbsrv01.cl1.local - blocked)
>
> So what does blocked mean here? Is it that the node #1 in this case is in need of fencing/stonithing and thus being blocked or something else (I have a backgroud in the RHCS/HACMP/LifeKeeper etc. world). No quorum policy is set to ignore.
Some logs from the situation. Here's what I've done before these logs (the problem remains):
- STONITH configured and enabled (though the Fusion python agent is not working, but pacemaker trying to stonith would appear in logs anyway right?)
- LVM resources changed to "exclusive=false" to remove LVM tagging function (just to minimize moving part although the tagging seemed to work properly)
- cluster property migration-threshold=1 set (was rsc defaults) as per the earlier e-mail about error in the quick start documentation
Everything running on node #1, issue standby node #1:
Sep 4 00:39:27 pgdbsrv02 crmd[1858]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Sep 4 00:39:27 pgdbsrv02 cib[1853]: notice: cib:diff: Diff: --- 0.174.67
Sep 4 00:39:27 pgdbsrv02 cib[1853]: notice: cib:diff: Diff: +++ 0.175.1 14210c59954a4f036d168d60b7ed7baf
Sep 4 00:39:27 pgdbsrv02 cib[1853]: notice: cib:diff: -- <cib admin_epoch="0" epoch="174" num_updates="67" />
Sep 4 00:39:27 pgdbsrv02 cib[1853]: notice: cib:diff: ++ <nvpair id="nodes-pgdbsrv01.cl1.local-standby" name="standby" value="on" />
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Demote DRBD_data01:0#011(Master -> Stopped pgdbsrv01.cl1.local)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Promote DRBD_data01:1#011(Slave -> Master pgdbsrv02.cl1.local)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Demote DRBD_data02:0#011(Master -> Stopped pgdbsrv01.cl1.local)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Promote DRBD_data02:1#011(Slave -> Master pgdbsrv02.cl1.local)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Stop LVM_vgdata01#011(Started pgdbsrv02.cl1.local)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Stop FS_data01#011(Started pgdbsrv02.cl1.local)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Stop LVM_vgdata02#011(Started pgdbsrv02.cl1.local)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Stop FS_data02#011(Started pgdbsrv02.cl1.local)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-188.bz2
Sep 4 00:39:27 pgdbsrv02 crmd[1858]: notice: run_graph: Transition 3 (Complete=8, Pending=0, Fired=0, Skipped=37, Incomplete=26, Source=/var/lib/pacemaker/pengine/pe-input-188.bz2): Stopped
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Demote DRBD_data01:0#011(Master -> Stopped pgdbsrv01.cl1.local)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Promote DRBD_data01:1#011(Slave -> Master pgdbsrv02.cl1.local)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Demote DRBD_data02:0#011(Master -> Stopped pgdbsrv01.cl1.local)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Promote DRBD_data02:1#011(Slave -> Master pgdbsrv02.cl1.local)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Stop LVM_vgdata01#011(Started pgdbsrv02.cl1.local)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Start FS_data01#011(pgdbsrv02.cl1.local - blocked)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Stop LVM_vgdata02#011(Started pgdbsrv02.cl1.local)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: LogActions: Start FS_data02#011(pgdbsrv02.cl1.local - blocked)
Sep 4 00:39:27 pgdbsrv02 crmd[1858]: warning: destroy_action: Cancelling timer for action 6 (src=97)
Sep 4 00:39:27 pgdbsrv02 crmd[1858]: warning: destroy_action: Cancelling timer for action 5 (src=98)
Sep 4 00:39:27 pgdbsrv02 pengine[1857]: notice: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-189.bz2
Sep 4 00:39:28 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data02_notify_0 (call=81, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:39:28 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data01_notify_0 (call=79, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:39:28 pgdbsrv02 kernel: block drbd1: peer( Primary -> Secondary )
Sep 4 00:39:28 pgdbsrv02 kernel: block drbd2: peer( Primary -> Secondary )
Sep 4 00:39:28 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data01_notify_0 (call=85, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:39:28 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data02_notify_0 (call=87, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:39:28 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data01_notify_0 (call=91, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:39:28 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data02_notify_0 (call=93, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data01: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data01: asender terminated
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data01: Terminating drbd_a_data01
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data01: Connection closed
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data01: conn( TearDown -> Unconnected )
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data01: receiver terminated
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data01: Restarting receiver thread
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data01: receiver (re)started
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data01: conn( Unconnected -> WFConnection )
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data02: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data02: asender terminated
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data02: Terminating drbd_a_data02
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data02: Connection closed
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data02: conn( TearDown -> Unconnected )
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data02: receiver terminated
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data02: Restarting receiver thread
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data02: receiver (re)started
Sep 4 00:39:28 pgdbsrv02 kernel: d-con data02: conn( Unconnected -> WFConnection )
Sep 4 00:39:28 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data01_notify_0 (call=97, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:39:28 pgdbsrv02 attrd[1856]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-DRBD_data01 (1000)
Sep 4 00:39:28 pgdbsrv02 attrd[1856]: notice: attrd_perform_update: Sent update 28: master-DRBD_data01=1000
Sep 4 00:39:28 pgdbsrv02 attrd[1856]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-DRBD_data02 (1000)
Sep 4 00:39:28 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data02_notify_0 (call=99, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:39:28 pgdbsrv02 crmd[1858]: notice: run_graph: Transition 4 (Complete=48, Pending=0, Fired=0, Skipped=11, Incomplete=8, Source=/var/lib/pacemaker/pengine/pe-input-189.bz2): Stopped
Sep 4 00:39:28 pgdbsrv02 pengine[1857]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 4 00:39:28 pgdbsrv02 pengine[1857]: notice: LogActions: Promote DRBD_data01:0#011(Slave -> Master pgdbsrv02.cl1.local)
Sep 4 00:39:28 pgdbsrv02 attrd[1856]: notice: attrd_perform_update: Sent update 30: master-DRBD_data02=1000
Sep 4 00:39:29 pgdbsrv02 pengine[1857]: notice: LogActions: Promote DRBD_data02:0#011(Slave -> Master pgdbsrv02.cl1.local)
Sep 4 00:39:29 pgdbsrv02 pengine[1857]: notice: LogActions: Start LVM_vgdata01#011(pgdbsrv02.cl1.local)
Sep 4 00:39:29 pgdbsrv02 pengine[1857]: notice: LogActions: Start FS_data01#011(pgdbsrv02.cl1.local)
Sep 4 00:39:29 pgdbsrv02 pengine[1857]: notice: LogActions: Start LVM_vgdata02#011(pgdbsrv02.cl1.local)
Sep 4 00:39:29 pgdbsrv02 pengine[1857]: notice: LogActions: Start FS_data02#011(pgdbsrv02.cl1.local)
Sep 4 00:39:29 pgdbsrv02 pengine[1857]: notice: process_pe_message: Calculated Transition 5: /var/lib/pacemaker/pengine/pe-input-190.bz2
Sep 4 00:39:29 pgdbsrv02 pengine[1857]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 4 00:39:29 pgdbsrv02 pengine[1857]: notice: LogActions: Promote DRBD_data01:0#011(Slave -> Master pgdbsrv02.cl1.local)
Sep 4 00:39:29 pgdbsrv02 pengine[1857]: notice: LogActions: Promote DRBD_data02:0#011(Slave -> Master pgdbsrv02.cl1.local)
Sep 4 00:39:29 pgdbsrv02 pengine[1857]: notice: LogActions: Start LVM_vgdata01#011(pgdbsrv02.cl1.local)
Sep 4 00:39:29 pgdbsrv02 pengine[1857]: notice: LogActions: Start FS_data01#011(pgdbsrv02.cl1.local)
Sep 4 00:39:29 pgdbsrv02 pengine[1857]: notice: LogActions: Start LVM_vgdata02#011(pgdbsrv02.cl1.local)
Sep 4 00:39:29 pgdbsrv02 pengine[1857]: notice: LogActions: Start FS_data02#011(pgdbsrv02.cl1.local)
Sep 4 00:39:29 pgdbsrv02 pengine[1857]: notice: process_pe_message: Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-191.bz2
Sep 4 00:39:29 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data02_notify_0 (call=105, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:39:29 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data01_notify_0 (call=103, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:39:29 pgdbsrv02 kernel: block drbd2: role( Secondary -> Primary )
Sep 4 00:39:29 pgdbsrv02 kernel: block drbd2: new current UUID 8C003D59CE4067EB:21AC79F0DD0AE428:C6D75DB020A4DE3C:C6D65DB020A4DE3C
Sep 4 00:39:29 pgdbsrv02 kernel: block drbd1: role( Secondary -> Primary )
Sep 4 00:39:29 pgdbsrv02 kernel: block drbd1: new current UUID 9E9B1EAB443C3EEF:A94A7B84E93B1CA8:F0225BC8D96E7F1A:F0215BC8D96E7F1A
Sep 4 00:39:29 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data01_promote_0 (call=111, rc=0, cib-update=47, confirmed=true) ok
Sep 4 00:39:29 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data02_promote_0 (call=108, rc=0, cib-update=48, confirmed=true) ok
Sep 4 00:39:29 pgdbsrv02 attrd[1856]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-DRBD_data01 (10000)
Sep 4 00:39:29 pgdbsrv02 attrd[1856]: notice: attrd_perform_update: Sent update 32: master-DRBD_data01=10000
Sep 4 00:39:29 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data01_notify_0 (call=117, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:39:29 pgdbsrv02 attrd[1856]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-DRBD_data02 (10000)
Sep 4 00:39:29 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data02_notify_0 (call=121, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:39:29 pgdbsrv02 attrd[1856]: notice: attrd_perform_update: Sent update 34: master-DRBD_data02=10000
Sep 4 00:39:29 pgdbsrv02 lrmd[1855]: notice: operation_finished: LVM_vgdata01_start_0:3224 [ 2013/09/04_00:39:29 INFO: Activating volume group vgdata01 ]
Sep 4 00:39:29 pgdbsrv02 lrmd[1855]: notice: operation_finished: LVM_vgdata01_start_0:3224 [ 2013/09/04_00:39:29 INFO: Reading all physical volumes. This may take a while... Found volume group "vgdata02" using metadata type lvm2 Found volume group "vg_pgdbsrv02" using metadata type lvm2 Found volume group "vgdata01" using metadata type lvm2 ]
Sep 4 00:39:29 pgdbsrv02 lrmd[1855]: notice: operation_finished: LVM_vgdata01_start_0:3224 [ 2013/09/04_00:39:29 INFO: 1 logical volume(s) in volume group "vgdata01" now active ]
Sep 4 00:39:29 pgdbsrv02 lrmd[1855]: notice: operation_finished: LVM_vgdata02_start_0:3230 [ 2013/09/04_00:39:29 INFO: Activating volume group vgdata02 ]
Sep 4 00:39:29 pgdbsrv02 lrmd[1855]: notice: operation_finished: LVM_vgdata02_start_0:3230 [ 2013/09/04_00:39:29 INFO: Reading all physical volumes. This may take a while... Found volume group "vgdata02" using metadata type lvm2 Found volume group "vg_pgdbsrv02" using metadata type lvm2 Found volume group "vgdata01" using metadata type lvm2 ]
Sep 4 00:39:29 pgdbsrv02 lrmd[1855]: notice: operation_finished: LVM_vgdata02_start_0:3230 [ 2013/09/04_00:39:29 INFO: 1 logical volume(s) in volume group "vgdata02" now active ]
Sep 4 00:39:30 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation LVM_vgdata01_start_0 (call=115, rc=0, cib-update=49, confirmed=true) ok
Sep 4 00:39:30 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation LVM_vgdata02_start_0 (call=119, rc=0, cib-update=50, confirmed=true) ok
Sep 4 00:39:30 pgdbsrv02 crmd[1858]: notice: run_graph: Transition 6 (Complete=22, Pending=0, Fired=0, Skipped=8, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-191.bz2): Stopped
Sep 4 00:39:30 pgdbsrv02 pengine[1857]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 4 00:39:30 pgdbsrv02 pengine[1857]: notice: LogActions: Start FS_data01#011(pgdbsrv02.cl1.local)
Sep 4 00:39:30 pgdbsrv02 pengine[1857]: notice: LogActions: Start FS_data02#011(pgdbsrv02.cl1.local)
Sep 4 00:39:30 pgdbsrv02 pengine[1857]: notice: process_pe_message: Calculated Transition 7: /var/lib/pacemaker/pengine/pe-input-192.bz2
Sep 4 00:39:30 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation LVM_vgdata01_monitor_30000 (call=127, rc=0, cib-update=52, confirmed=false) ok
Sep 4 00:39:30 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation LVM_vgdata02_monitor_30000 (call=131, rc=0, cib-update=53, confirmed=false) ok
Sep 4 00:39:30 pgdbsrv02 kernel: EXT4-fs (dm-3): warning: maximal mount count reached, running e2fsck is recommended
Sep 4 00:39:30 pgdbsrv02 lrmd[1855]: notice: operation_finished: FS_data02_start_0:3393 [ 2013/09/04_00:39:30 INFO: Running start for /dev/mapper/vgdata02-lvdata02 on /data02 ]
Sep 4 00:39:30 pgdbsrv02 kernel: EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts:
Sep 4 00:39:30 pgdbsrv02 kernel: EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts:
Sep 4 00:39:30 pgdbsrv02 lrmd[1855]: notice: operation_finished: FS_data01_start_0:3391 [ 2013/09/04_00:39:30 INFO: Running start for /dev/mapper/vgdata01-lvdata01 on /data01 ]
Sep 4 00:39:31 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation FS_data02_start_0 (call=133, rc=0, cib-update=54, confirmed=true) ok
Sep 4 00:39:31 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation FS_data01_start_0 (call=129, rc=0, cib-update=55, confirmed=true) ok
Sep 4 00:39:32 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation FS_data01_monitor_30000 (call=141, rc=0, cib-update=56, confirmed=false) ok
Sep 4 00:39:32 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation FS_data02_monitor_30000 (call=139, rc=0, cib-update=57, confirmed=false) ok
Sep 4 00:39:32 pgdbsrv02 crmd[1858]: notice: run_graph: Transition 7 (Complete=10, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-192.bz2): Complete
Sep 4 00:39:32 pgdbsrv02 crmd[1858]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
-> Everything running ok on node #2, issue unstandby node #1:
Sep 4 00:40:19 pgdbsrv02 crmd[1858]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Sep 4 00:40:19 pgdbsrv02 cib[1853]: notice: cib:diff: Diff: --- 0.175.31
Sep 4 00:40:19 pgdbsrv02 cib[1853]: notice: cib:diff: Diff: +++ 0.176.1 89700ebf57ae9b557e7336b873c95c08
Sep 4 00:40:19 pgdbsrv02 cib[1853]: notice: cib:diff: -- <nvpair id="nodes-pgdbsrv01.cl1.local-standby" name="standby" value="on" />
Sep 4 00:40:19 pgdbsrv02 cib[1853]: notice: cib:diff: ++ <cib epoch="176" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" update-origin="pgdbsrv01.cl1.local" update-client="crm_attribute" cib-last-written="Wed Sep 4 00:39:27 2013" crm_feature_set="3.0.7" have-quorum="1" dc-uuid="pgdbsrv02.cl1.local" />
Sep 4 00:40:19 pgdbsrv02 pengine[1857]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 4 00:40:19 pgdbsrv02 pengine[1857]: notice: LogActions: Start DRBD_data01:1#011(pgdbsrv01.cl1.local)
Sep 4 00:40:19 pgdbsrv02 pengine[1857]: notice: LogActions: Start DRBD_data02:1#011(pgdbsrv01.cl1.local)
Sep 4 00:40:19 pgdbsrv02 pengine[1857]: notice: LogActions: Move fusion-fencing#011(Started pgdbsrv02.cl1.local -> pgdbsrv01.cl1.local)
Sep 4 00:40:19 pgdbsrv02 pengine[1857]: notice: process_pe_message: Calculated Transition 8: /var/lib/pacemaker/pengine/pe-input-193.bz2
Sep 4 00:40:19 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation fusion-fencing_stop_0 (call=145, rc=0, cib-update=59, confirmed=true) ok
Sep 4 00:40:19 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data01_notify_0 (call=148, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:40:19 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data02_notify_0 (call=150, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:40:20 pgdbsrv02 kernel: d-con data01: Handshake successful: Agreed network protocol version 101
Sep 4 00:40:20 pgdbsrv02 kernel: d-con data01: conn( WFConnection -> WFReportParams )
Sep 4 00:40:20 pgdbsrv02 kernel: d-con data01: Starting asender thread (from drbd_r_data01 [2205])
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: drbd_sync_handshake:
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: self 9E9B1EAB443C3EEF:A94A7B84E93B1CA8:F0225BC8D96E7F1A:F0215BC8D96E7F1A bits:1 flags:0
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: peer A94A7B84E93B1CA8:0000000000000000:F0225BC8D96E7F1A:F0215BC8D96E7F1A bits:0 flags:0
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: uuid_compare()=1 by rule 70
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0%
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0%
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: helper command: /sbin/drbdadm before-resync-source minor-1
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: helper command: /sbin/drbdadm before-resync-source minor-1 exit code 0 (0x0)
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: Began resync as SyncSource (will sync 4 KB [1 bits set]).
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: updated sync UUID 9E9B1EAB443C3EEF:A94B7B84E93B1CA8:A94A7B84E93B1CA8:F0225BC8D96E7F1A
Sep 4 00:40:20 pgdbsrv02 kernel: d-con data02: Handshake successful: Agreed network protocol version 101
Sep 4 00:40:20 pgdbsrv02 kernel: d-con data02: conn( WFConnection -> WFReportParams )
Sep 4 00:40:20 pgdbsrv02 kernel: d-con data02: Starting asender thread (from drbd_r_data02 [2209])
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: drbd_sync_handshake:
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: self 8C003D59CE4067EB:21AC79F0DD0AE428:C6D75DB020A4DE3C:C6D65DB020A4DE3C bits:1 flags:0
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: peer 21AC79F0DD0AE428:0000000000000000:C6D75DB020A4DE3C:C6D65DB020A4DE3C bits:0 flags:0
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: uuid_compare()=1 by rule 70
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0%
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0%
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: helper command: /sbin/drbdadm before-resync-source minor-2
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: updated UUIDs 9E9B1EAB443C3EEF:0000000000000000:A94B7B84E93B1CA8:A94A7B84E93B1CA8
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd1: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: helper command: /sbin/drbdadm before-resync-source minor-2 exit code 0 (0x0)
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: Began resync as SyncSource (will sync 4 KB [1 bits set]).
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: updated sync UUID 8C003D59CE4067EB:21AD79F0DD0AE428:21AC79F0DD0AE428:C6D75DB020A4DE3C
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: updated UUIDs 8C003D59CE4067EB:0000000000000000:21AD79F0DD0AE428:21AC79F0DD0AE428
Sep 4 00:40:20 pgdbsrv02 kernel: block drbd2: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Sep 4 00:40:21 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data01_notify_0 (call=154, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:40:21 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation DRBD_data02_notify_0 (call=156, rc=0, cib-update=0, confirmed=true) ok
Sep 4 00:40:22 pgdbsrv02 crmd[1858]: notice: run_graph: Transition 8 (Complete=23, Pending=0, Fired=0, Skipped=2, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-193.bz2): Stopped
Sep 4 00:40:22 pgdbsrv02 pengine[1857]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 4 00:40:22 pgdbsrv02 pengine[1857]: notice: LogActions: Stop LVM_vgdata01#011(Started pgdbsrv02.cl1.local)
Sep 4 00:40:22 pgdbsrv02 pengine[1857]: notice: LogActions: Stop FS_data01#011(Started pgdbsrv02.cl1.local)
Sep 4 00:40:22 pgdbsrv02 pengine[1857]: notice: LogActions: Stop LVM_vgdata02#011(Started pgdbsrv02.cl1.local)
Sep 4 00:40:22 pgdbsrv02 pengine[1857]: notice: LogActions: Stop FS_data02#011(Started pgdbsrv02.cl1.local)
Sep 4 00:40:22 pgdbsrv02 pengine[1857]: notice: process_pe_message: Calculated Transition 9: /var/lib/pacemaker/pengine/pe-input-194.bz2
Sep 4 00:40:22 pgdbsrv02 lrmd[1855]: notice: operation_finished: FS_data02_stop_0:3826 [ 2013/09/04_00:40:22 INFO: Running stop for /dev/mapper/vgdata02-lvdata02 on /data02 ]
Sep 4 00:40:22 pgdbsrv02 lrmd[1855]: notice: operation_finished: FS_data02_stop_0:3826 [ 2013/09/04_00:40:22 INFO: Trying to unmount /data02 ]
Sep 4 00:40:22 pgdbsrv02 lrmd[1855]: notice: operation_finished: FS_data02_stop_0:3826 [ 2013/09/04_00:40:22 INFO: unmounted /data02 successfully ]
Sep 4 00:40:22 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation FS_data02_stop_0 (call=164, rc=0, cib-update=61, confirmed=true) ok
Sep 4 00:40:22 pgdbsrv02 lrmd[1855]: notice: operation_finished: FS_data01_stop_0:3825 [ 2013/09/04_00:40:22 INFO: Running stop for /dev/mapper/vgdata01-lvdata01 on /data01 ]
Sep 4 00:40:22 pgdbsrv02 lrmd[1855]: notice: operation_finished: FS_data01_stop_0:3825 [ 2013/09/04_00:40:22 INFO: Trying to unmount /data01 ]
Sep 4 00:40:22 pgdbsrv02 lrmd[1855]: notice: operation_finished: FS_data01_stop_0:3825 [ 2013/09/04_00:40:22 INFO: unmounted /data01 successfully ]
Sep 4 00:40:22 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation FS_data01_stop_0 (call=161, rc=0, cib-update=62, confirmed=true) ok
Sep 4 00:40:22 pgdbsrv02 crmd[1858]: notice: run_graph: Transition 9 (Complete=6, Pending=0, Fired=0, Skipped=5, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-194.bz2): Stopped
Sep 4 00:40:22 pgdbsrv02 pengine[1857]: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 4 00:40:22 pgdbsrv02 pengine[1857]: notice: LogActions: Stop LVM_vgdata01#011(Started pgdbsrv02.cl1.local)
Sep 4 00:40:22 pgdbsrv02 pengine[1857]: notice: LogActions: Start FS_data01#011(pgdbsrv02.cl1.local - blocked)
Sep 4 00:40:22 pgdbsrv02 pengine[1857]: notice: LogActions: Stop LVM_vgdata02#011(Started pgdbsrv02.cl1.local)
Sep 4 00:40:22 pgdbsrv02 pengine[1857]: notice: LogActions: Start FS_data02#011(pgdbsrv02.cl1.local - blocked)
Sep 4 00:40:22 pgdbsrv02 pengine[1857]: notice: process_pe_message: Calculated Transition 10: /var/lib/pacemaker/pengine/pe-input-195.bz2
Sep 4 00:40:22 pgdbsrv02 lrmd[1855]: notice: operation_finished: LVM_vgdata01_stop_0:3971 [ 2013/09/04_00:40:22 INFO: Deactivating volume group vgdata01 ]
Sep 4 00:40:22 pgdbsrv02 lrmd[1855]: notice: operation_finished: LVM_vgdata01_stop_0:3971 [ 2013/09/04_00:40:22 INFO: 0 logical volume(s) in volume group "vgdata01" now active ]
Sep 4 00:40:22 pgdbsrv02 lrmd[1855]: notice: operation_finished: LVM_vgdata01_stop_0:3971 [ 2013/09/04_00:40:22 INFO: LVM Volume vgdata01 is not available (stopped) ]
Sep 4 00:40:22 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation LVM_vgdata01_stop_0 (call=171, rc=0, cib-update=64, confirmed=true) ok
Sep 4 00:40:22 pgdbsrv02 lrmd[1855]: notice: operation_finished: LVM_vgdata02_stop_0:3972 [ 2013/09/04_00:40:22 INFO: Deactivating volume group vgdata02 ]
Sep 4 00:40:22 pgdbsrv02 lrmd[1855]: notice: operation_finished: LVM_vgdata02_stop_0:3972 [ 2013/09/04_00:40:22 INFO: 0 logical volume(s) in volume group "vgdata02" now active ]
Sep 4 00:40:22 pgdbsrv02 lrmd[1855]: notice: operation_finished: LVM_vgdata02_stop_0:3972 [ 2013/09/04_00:40:22 INFO: LVM Volume vgdata02 is not available (stopped) ]
Sep 4 00:40:22 pgdbsrv02 crmd[1858]: notice: process_lrm_event: LRM operation LVM_vgdata02_stop_0 (call=174, rc=0, cib-update=65, confirmed=true) ok
Sep 4 00:40:22 pgdbsrv02 crmd[1858]: notice: run_graph: Transition 10 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-195.bz2): Complete
Sep 4 00:40:22 pgdbsrv02 crmd[1858]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
So it looks like it stops LVM and FS resources running on node #2 (why?) and then tries to start FS resources but never starts LVM resources.
I'm thinking that this could be just some simple error in the ordering/colocating config by me. Here's the config again:
# crm configure show
node pgdbsrv01.cl1.local
node pgdbsrv02.cl1.local
primitive DRBD_data01 ocf:linbit:drbd \
params drbd_resource="data01" \
op monitor interval="30s"
primitive DRBD_data02 ocf:linbit:drbd \
params drbd_resource="data02" \
op monitor interval="30s"
primitive FS_data01 ocf:heartbeat:Filesystem \
params device="/dev/mapper/vgdata01-lvdata01" directory="/data01" fstype="ext4" \
op monitor interval="30s"
primitive FS_data02 ocf:heartbeat:Filesystem \
params device="/dev/mapper/vgdata02-lvdata02" directory="/data02" fstype="ext4" \
op monitor interval="30s"
primitive LVM_vgdata01 ocf:heartbeat:LVM \
params volgrpname="vgdata01" exclusive="false" \
op monitor interval="30s"
primitive LVM_vgdata02 ocf:heartbeat:LVM \
params volgrpname="vgdata02" exclusive="false" \
op monitor interval="30s"
primitive fusion-fencing stonith:fence_fusion \
params action="reboot" ipaddr="xxxxxxxxxxx" login="root" passwd="xxxxxxxx" pcmk_host_list="pgdbsrv01.cl1.local pgdbsrv02.cl1.local" pcmk_host_map="pgdbsrv01.cl1.local:00_0C_29_AB_30_BD;pgdbsrv02.cl1.local:00_0C_29_97_43_A2"
group GRP_data01 LVM_vgdata01 FS_data01
group GRP_data02 LVM_vgdata02 FS_data02
ms DRBD_ms_data01 DRBD_data01 \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms DRBD_ms_data02 DRBD_data02 \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation colocation-GRP_data01-DRBD_ms_data01-INFINITY inf: GRP_data01 DRBD_ms_data01:Master
colocation colocation-GRP_data02-DRBD_ms_data02-INFINITY inf: GRP_data02 DRBD_ms_data02:Master
order order-DRBD_data01-GRP_data01-mandatory : DRBD_data01:promote GRP_data01:start
order order-DRBD_data02-GRP_data02-mandatory : DRBD_data02:promote GRP_data02:start
property $id="cib-bootstrap-options" \
dc-version="1.1.8-7.el6-394e906" \
cluster-infrastructure="cman" \
no-quorum-policy="ignore" \
migration-threshold="1" \
stonith-enabled="true"
rsc_defaults $id="rsc_defaults-options" \
resource-stickiness="0" \
migration-threshold="1"
--
Heikki M
More information about the Pacemaker
mailing list